Computational Problems in Multiagent Systems by Albert X i n Jiang B . S c , The University of British Columbia, 2003 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F THE REQUIREMENTS FOR T H E D E G R E E OF Master of Science in The Faculty of Graduate Studies (Computer Science) The University Of British Columbia March 31, 2006 © Albert X i n Jiang 2006 ii Abstract There has been recent interest from the computer science community on multiagent systems, where multiple autonomous agents, each with their own utility functions, act according to their own interests. In this thesis, we apply techniques developed in other areas of CS to solve two computational problems in multiagent systems: Action Graph Games (AGGs), first proposed in [Bhat and Leyton-Brown 2004], are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players' utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions. A c t i o n G r a p h Games: There is much active research into the design of automated bidding agents, particularly for environments that involve multiple auctions. These settings are complex partly because an agent's optimal strategy depends on information about other bidder's preferences. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we proposed an EM-base algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques. Bidding agents using the estimated distributions from our EM approach were able to outperform bidding agents using the straightforward estimates, in both decision-theoretic and game-theoretic settings. L e a r n i n g a n d p l a n n i n g i n online a u c t i o n environments: iii Contents Abstract ii Contents iii List of Figures v 1 Introduction 1.1 Multiagent Systems 1.2 Overview 1 1 2 2 Action Graph Games 4 2.1 2.2 2.3 Introduction Action Graph Games 2.2.1 Definition 2.2.2 Examples 2.2.3 Size of an A G G Representation Computing with A G G s 2.3.1 Notation 2.3.2 Computing Vy (a_i) 2.3.3 Proof of correctness 2.3.4 Complexity 2.3.5 Discussion A G G with Function Nodes 2.4.1 Motivating Example: Coffee Shop 2.4.2 Function Nodes 2.4.3 Representation Size 2.4.4 Computing with A G G F N s Applications 2.5.1 Application: Computing Payoff Jacobian Experiments 2.6.1 Representation Size 2.6.2 Expected Payoff Computation 2.6.3 Computing Payoff Jacobians 2.6.4 Finding Nash Equilibria using the Govindan-Wilson algorithm Conclusions 4 2.4 2.5 2.6 i 2.7 4 5 5 6 7 9 9 10 13 14 15 16 16 17 18 19 21 21 23 24 24 25 26 27 3 Contents iv B i d d i n g A g e n t s for O n l i n e A u c t i o n s w i t h H i d d e n B i d s . . . . 3.1 Introduction 3.1.1 Game-Theoretic and Decision-Theoretic Approaches . . . 3.1.2 Overview 3.2 A Model of Online Auctions 3.2.1 Bidding Dynamics 3.2.2 Hidden Bids 3.2.3 Discussion 3.3 Learning the Distributions 3.3.1 The Simple Approach 3.3.2 E M Learning Approach 3.3.3 Learning Distributions in a Game-Theoretic Setting . . . 3.4 Constructing a Bidding Agent 3.4.1 A Decision-Theoretic Approach to Repeated Auctions . . 3.4.2 A Game-Theoretic Approach to Bidding in Online Auctions without Proxies 3.5 Experiments 3.5.1 Repeated Online Auctions 3.5.2 Online Auctions without Proxies 3.6 Related Work 3.7 Conclusion 29 29 30 31 32 32 32 33 34 34 35 37 37 38 Bibliography 40 44 45 52 53 55 57 V List of Figures 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.1 3.2 A G G representation of an arbitrary 3-player, 3-action game . . . A G G representation of a 3-player, 3-action graphical game . . . . A G G representation of the ice cream vendor game Projection of the action graph. Left: action graph of the ice cream vendor game. Right: projected action graph and action sets with respect to the action C I . A 5 x 6 Coffee Shop Game: Left: the A G G representation without function nodes (looking at only the neighborhood of the a node s). Middle: we introduce two function nodes. Right: s now has only 3 incoming edges Comparing Representation Sizes of the Coffee Shop Game (logscale). Left: 5 x 5 grid with 3 to 16 players. Right: 4-player r. x 5 grid with r varying from 3 to 10 Running times for payoff computation in the Coffee Shop Game. Left: 5 x 5 grid with 3 to 16 players. Right: 4-player r x 5 grid with r varying from 3 to 10 Running times for Jacobian computation in the Coffee Shop Game. Left: 5 x 5 grid with 3 to 10 players. Right: 4-player r x 5 grid with r varying from 3 to 10 Ratios of Running times for the Govindan-Wilson algorithm in the Coffee Shop Game. Left: 4 x 4 grid with 3 to 5 players. Right: 4-player r x 4 grid with r varying from 3 to 9. The error bars indicate standard deviation over 10 random initial perturbations. The constant lines at 1.0 indicating equal running times are also shown A n example of the bidding process of an auction with 7 potential bidders Results for Data Set 1, Distribution Estimation: distribution of bids ./(CE) (left); distribution of highest bids f (x) (right) Results for Data Set 1, Bidding: bidding strategies in the first auction (left); box plot of payoff regrets of the two approaches (right) Box plot of expected payoff regrets for overlapping auctions . . . Results for Data Set 2: Linear relationship between the mean of f(x\a) and a (left). Box plot of payoff regrets (right) 1 3.3 3.4 3.5 6 6 6 11 19 24 25 26 27 33 47 47 48 48 List of Figures 3.6 Results for Data Set 3: Distribution f(x) (top-left). Distribution g(m) (top-right). Distribution f (x) (bottom-left). Box plot of payoff regrets (bottom-right) Box plot of payoff regrets on the eBay Data Set Results for Online Auctions without Proxies: the value distributions f(v) (left); the distributions of number of bidders g(m) (right) Results for Online Auctions without Proxies: box plot of epsilonequilibria using the simple and E M approaches vi l 3.7 3.8 3.9 50 52 53 54 [ vii Glossary C (m) D D(s) s G N 5 Si - Vg (o-i) A A^ ' ) 5 1 A^ '^((T_i) S i A( ) s E Si s I er <Ji VVg '-^. (c?) 1 v n St u the contribution of action s to node m, 20 a configuration, 5 the number of players that chose action s, 5 a configuration over v(s), 5 the action graph, 5 the set of players, 5 the set of distinct actions; also the set of action nodes in the action graph, 5 player i's set of actions, 5 expected utility to agent i for playing pure strategy Si, given that all other agents play the mixed strategy profile cr_ 10 the set of configurations, 5 the set of configurations over v(s) given that player i has played s, 5 the set of configurations over v(si) that have positive probability of occurring under the mixed strategy (s,,<7_j), 14 the set of configurations over v(s) given that some player has played s, 5 set of all mixed strategy profiles, 9 set of all mixed strategies of player i, 9 an action profile, 5 the maximum in-degree of the action graph, 7 mixed strategy profile, 9 mixed strategy of player i, 9 payoff Jacobian's entry at row (i, Si) and column (j.*j),21 the neighbor relation of the action graph. i/(s) is the set of neighbors of s., 5 the number of players, 5 an action of player i, 5 utility function , 5 i; Glossary u s the utility function associated with action s; stores utilities given that the current player has played s, 6 viii 1 Chapter 1 Introduction 1.1 Multiagent Systems There has been recent interest from the computer science community on (noncooperative) multiagent systems, where multiple autonomous agents, each with their own utility functions, act according to their own interests. These situations can be modeled as games, and analyzed using game theory. Game theory has received a great deal of study, and is perhaps the dominant paradigm in microeconomics. However, the computational aspects of game theory have received relatively less attention until recently. The intersection of computer science and game theory includes many exciting topics. One line of research is to apply game theory to resource allocation problems in computing systems. Internet-based systems, due to their distributed and multiagent nature, are a popular area for game-theoretic analysis. Examples include network routing [Roughgarden and Tardos 2004], C P U scheduling [Regev and Nisan 1998; Waldspurger et al. 1992], peer-to-peer file sharing [Golle et al. 2001] and search engines' ranking systems [Altman and Tennenholtz 2005]. Another line of research, which includes the topic of this thesis, is to apply CS techniques to analyze and solve computational problems in multiagent systems. One type of problems is the computation of game-theoretic solutions such as Nash equilibria (and restricted versions such as pure-strategy equilibria and social welfare maximizing equilibria) and correlated equilibria, given a description of the game. Recently, complexity analysis from theoretical computer science has been used to analyze these problems [Conitzer and Sandholm 2003; Goldberg and Papadimitriou 2005; Daskalakis et al. 2005; Chen and Deng 2005]. There are also some recent research on practical algorithms for computing Nash equilibria [Roller et al. 1994; Porter et al. 2004]. The representation of games has also received recent interest. Traditionally, simultaneous-move games are represented in normal form, and dynamic games are represented in extensive form. However, for large games, these representations are impractically large, and any non-trivial computation on these representations would be intractable. Fortunately, most large games of any practical interest have highly structured payoff functions, thus it is possible to compactly represent them. One influential approach is to represent structure in payoffs as graphs. A l researchers have used graphs to represent structure in joint probability distributions (as Bayes nets or Markov random fields) with great success. For multiagent systems, several graphical representations have been proposed to represent the (strict) independence between players' payoffs; Chapter 1. Introduction 2 this includes multi-agent influence diagrams [Roller and Milch 2001] for dynamic games and graphical games [Kearns et al. 2001] for simultaneous-move games. Another type of graphical representations focus on context-specific independencies in agents' utility functions - that is, games in which agents' abilities to affect each other depend on the actions they choose. This includes local effect games [Leyton-Brown and Tennenholtz 2003] and action graph games [Bhat and Leyton-Brown 2004]; the latter is the topic of Chapter 2 of this thesis. Another type of computational problems often encountered in multiagent systems are learning problems. Game-theoretic solution concepts such as Nash equilibria assume that the players have perfect knowledge about the game being played, and are perfectly rational. In practice these conditions often do not hold. How should agents act in such environments? This is often cast as a multiagent reinforcement learning problem: instead of assuming other agents to be perfectly rational, assume instead that agent tries to learn an optimal policy through repeated interaction with the other players. Many approaches have been proposed [Bowling and Veloso 2001; Bowling 2004; Powers and Shoham 2004]. Most of these multiagent reinforcement learning approaches study settings of repeated games where players know the game being played. However, in many situations such as auctions, each player does not know the complete payoff function, and each player has certain private information about the payoffs. In the case of auctions, the private information are players' valuations on the item(s) being auctioned. If we know the joint probability distribution of the valuations, we can formulate the situation as a Bayesian game. Usually we do not know the distribution of valuations; instead the bidding history of previous auctions are available, and we could try to estimate the distribution from data. This machine learning problem is an essential component of the problem of building automated bidding agents, which has attracted interest partly due to the Trading Agent Competitions (TAC-Classic and T A C Supply Chain Management) [Wellman et al. 2002]. In Chapter 3 of this thesis, we look at another auction setting with practical interest, namely online auction systems such as the popular eBay. Learning the distributions in this setting presents unique challenges due to the fact that not all bids are shown in the bidding history. 1.2 Overview M y thesis contains results from two research projects on computational problems in game theory and multiagent systems. One is a way of compactly representing games in general, and the other is about a learning problem in a specific kind of multiagent systems, namely online auction environments. The two projects may seem to be about vastly different things, but a common theme is that in both cases we apply techniques developed in other areas of CS (graphical models and dynamic programming in the former, and the Expectation-Maximization algorithm in the latter) to solve computational problems in multiagent systems. Chapter 2 is about Action Graph Games (AGGs). First proposed in [Bhat and Leyton-Brown 2004], A G G s are a graphical representation of games that Chapter 1. Introduction 3 exploits the context-specific independence and anonymity structure in many games. Using A G G s we are able to compactly represent large structured games with many players, and furthermore efficiently compute expected payoffs under mixed strategies, which is an essential step in many game-theoretic computations, e.g. best response, Nash equilibria and correlated equilibria. In this thesis, we make several significant improvements to to results in [Bhat and Leyton-Brown 2004]. First we significantly improved the algorithm for computing expected payoffs of AGGs. Now our dynamic programming algorithm is able to compute expected payoffs in polynomial time with respect to the size of the representation, whereas in [Bhat and Leyton-Brown 2004] a polynomial time algorithm was found for only symmetric games with symmetric strategies. Furthermore, we extended the A G G representation to include function nodes in the action graph. This allows us to compactly represent a larger class of context-specific independence structure in games. Results of computational experiments on structured games confirm our theoretical predictions of compactness and computational speedup. In Chapter 3 we try to solve the problem of learning bidders' valuation and number distributions in online auction environments. There is much active research into the design of automated bidding agents, particularly for environments that involve multiple auctions. These settings are complex partly because an agent's optimal strategy depends on information about other bidder's preferences. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we proposed an EM-based algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques. Bidding agents using the estimated distributions from our E M approach were able to outperform bidding agents using the straightforward estimates, in both decision-theoretic and game-theoretic settings. 4 Chapter 2 Action Graph Games 2.1 Introduction Game-theoretic models have recently been very influential in the computer science community. In particular, simultaneous-action games have received considerable study, which is reasonable as these games are in a sense the most fundamental. In order to analyze these models, it is often necessary to compute game-theoretic quantities ranging from expected utility to Nash equilibria. Most of the game theoretic literature presumes that simultaneous-action games will be represented in normal form. This is problematic because quite often games of interest have a large number of players and a large set of action choices. In the normal form representation, we store the game's payoff function as a matrix with one entry for each player's payoff under each combination of all players' actions. As a result, the size of the representation grows exponentially with the number of players. Even if we had enough space to store such games, most of the computations we'd like to perform on these exponential-sized objects take exponential time. Fortunately, most large games "of any practical interest have highly structured payoff functions, and thus it is possible to represent them compactly. (Intuitively, this is why humans are able to reason about these games in the first place: we understand the payoffs in terms of simple relationships rather than in terms of enormous look-up tables.) One influential class of representations exploit strict independencies between players' utility functions; this class include graphical games [Kearns et al. 2001], multi-agent influence diagrams [Roller and Milch 2001], and game nets [LaMura 2000]. A second approach to compactly representing games focuses on context-specific independencies in agents' utility functions - that is, games in which agents' abilities to affect each other depend on the actions they choose. Since the context-specific independencies considered here are conditioned on actions and not agents, it is often natural to also exploit anonymity in utility functions, where each agent's utilities depend on the distribution of agents over the set of actions, but not on the identities of the agents. Examples include congestion games [Rosenthal 1973] and local effect games (LEGs) [Leyton-Brown and Tennenholtz 2003]. Both of these representations make assumptions about utility functions, and as a result cannot represent arbitrary games. Bhat and Leyton-Brown [2004] introduced 1 A version of this chapter has been submitted for publication. Jiang, A . X . and LeytonBrown, K. (2006) A Polynomial-Time Algorithm for Action-Graph Games. Submitted to AAAI. 1 Chapter 2. Action Graph Games 5 action graph games (AGGs). Similar to LEGs, A G G s use graphs to represent the context-specific independencies of agents' utility functions, but unlike L E G s , A G G s can represent arbitrary games. Bhat & Leyton-Brown proposed an algorithm for computing expected payoffs using the A G G representation. For A G G s with bounded in-degree, their algorithm is exponentially faster than normal-form-based algorithms, yet still exponential in the number of players. In this chapter we make several significant improvements to results in [Bhat and Leyton-Brown 2004], In Section 2.3, we present an improved algorithm for computing expected payoffs. Our new algorithm is able to better exploit anonymity structure in utility functions. For A G G s with bounded in-degree, our algorithm is polynomial in the number of players. In Section 2.4, we extend the A G G representation by introducing function nodes. This feature allows us to compactly represent a wider range of structured utility functions. We also describe computational experiments in Section 2.6 which confirm our theoretical predictions of compactness and computational speedup. 2.2 2.2.1 Action G r a p h Games Definition A n action-graph game (AGG) is a tuple (N, S, v, u). Let N = {1,... ,n} denote the set of agents. Denote by S = FJieiv the set of action profiles, where Y[' the Cartesian product and 5, is agent i's set of actions. We denote by Si G Si one of agent i's actions, and s G S an action profile. Agents may have actions in common. Let S = [j Si denote the set of. distinct actions choices in the game. Let A denote the set of configurations of agents over actions. A configuration D G A is an ordered tuple of | 5 | integers (D(s), D(s'),...), with one integer for each action in S. For each s G S, D(s) specifies the number of agents that chose action s G S. Let V : S — i > A be the function that maps from an action profile s to the corresponding configuration D. These shared actions express the game's anonymity structure: agent i's utility depends only on her action S j and the configuration V(s). Let G be the action graph: a directed graph having one node for each action s G S. The neighbor relation is given by v : S >—> 2 . If s' G is(s) there is an edge from s' to s. Let denote a configuration over ^(s), i.e. is a tuple of |^(s)| integers, one for each action in v(s). Intuitively, agents are only counted in if they take an action which is an element of v(s). is the set of configurations over v(s) given that some player has played s. Similarly we .define : S >-* A< ) which maps from an action profile to the corresponding configuration over v(s). The action graph expresses context-specific independencies of utilities of the game: V i G N, if i chose action s, G S, then i's utility depends only on the s i€N s 2 8 If action s is in multiple players' action sets (say players i, j), and these action sets do not completely overlap, then it is possible that the set of configurations given that i played s (denoted A ^ ) ) is different from the set of configurations given that j played s. A ' ) is the union of these sets of configurations. 2 s,i s 6 Chapter 2. Action Graph Games Figure 2.1: A G G representation of an arbitrary 3-player, 3-action game Figure 2.2: A G G representation of a 3player, 3-action graphical game Figure 2.3: A G G representation of the ice cream vendor game numbers of agents who chose actions connected to s, which is the configuration Si In other words, the configuration of actions not in ^(SJ) does not affect i's utility. We represent the agents' utilities using a tuple of \S\ functions u = (u , u" ,...) one for each action s G 5. Each u is a function u : >—» R . So if agent i chose action s, and the configuration over v(s) is D^ \ then agent i's utility is u (D^). Observe that all agents have the same utility function, i.e. conditioned on choosing the same action s, the utility each agent receives does not depend on the identity of the agent. For notational convenience, we define w ( s , = u { D ^ ) and (s) = u(si,V^{s)). £>()(s). s s s s s a Ui 2.2.2 Examples Any arbitrary game can be encoded as an A G G as follows. Create a unique node St for each action available to each agent i. Thus Vs e S, D(s) G {0,1}, and Vi, J2ses D(s) t equal 1. The distribution simply indicates each agent's action choice, and the representation is no more or less compact than the normal form (see Section 2.2.3 for a detailed analysis). m u s Example 1 . Figure 2.1 shows an arbitrary 3-player, 3-action game encoded as an AGG. As always, nodes represent actions and directed edges represent membership in a node's neighborhood. The dotted boxes represent the players' action sets: player 1 has actions 1, 2 and 3; etc. Observe that there is always an edge between pairs of nodes belonging to different action sets, and that there is never an edge between nodes in the same action set. In a graphical game [Kearns et al. 2001] nodes denote agents and there is an edge connecting each agent i to each other agent whose actions can affect i's utility. Each agent then has a payoff matrix representing his local game with neighboring agents; this representation is more compact than normal form whenever the graph is not a clique. Graphical games can be represented as A G G s by replacing each node i in the graphical game by a distinct cluster of nodes 5, representing the action set of agent i . If the graphical game has an Chapter 2. Action Graph Games 7 edge from i to j, create edges so that V s , € 5 i , V s j € S j , s, € ^ ( j ) - T h e resulting A G G representations are as compact as the original graphical game representations. s Example 2. Figure 2.2 shows the AGG representation of a graphical game having three nodes and two edges between them (i.e., player 1 and player 3 do not directly affect each others' payoffs). The AGG may appear more complex than the graphical game; in fact, this is only because players' actions are made explicit. T h e A G G representation becomes even more compact when agents have actions in common, w i t h utility functions depending only on the number of agents taking these actions rather than on the identities of the agents. Example 3 . The action graph in Figure 2.3 represents a setting in which n vendors sell chocolate or vanilla ice creams, and must choose one of four locations along a beach. There are three kinds of vendors: nc chocolate (C) vendors, n y vanilla vendors, and nw vendors that can sell both chocolate and vanilla, but only on the west side. Chocolate (vanilla) vendors are negatively affected by the presence of other chocolate (vanilla) vendors in the same or neighboring locations, and are simultaneously positively affected by the presence of nearby vanilla (chocolate) vendors. Note that this game exhibits context-specific independence without any strict independence, and that the graph structure is independent of n. Other examples of compact A G G s that cannot be compactly represented as graphical games include: location games, role formation games, traffic routing games, product placement games and party affiliation games. 2.2.3 Size of an AGG Representation W e have claimed that action graph games provide a way of representing games compactly. B u t what exactly is the size of an A G G representation? A n d how does this size grow as the number of agents n grows? F r o m the definition of A G G in Section 2.2.1, we observe that we need the following to completely specify an A G G : • T h e set of agents A = { l , . . . , n } . T h i s can be specified by the integer n. r • T h e set of actions S. • E a c h agent's action set 5 , C S. • T h e action graph G. T h e set of nodes is 5 , which is already specified. T h e neighbor relation v can be straightforwardly represented as neighbor lists: for each node s € S we specify its list of neighbors v(s) C S. T h e space S required is X ] s s which is bounded by \S\T, where T = m a x | f ( s ) | , i.e. the m a x i m u m in-degree of the action graph. s IK)I' s Chapter 2. Action Graph Games 8 • For each action s, the utility function u : A^ ' — i > E. We need to specify a utility value for each distinct configuration € A^ ^. The set of configurations A' ) can be derived from the action graph, and can be sorted in lexicographical order. So we do not need to explicitly specify A ^ ; we can just specify a list of |A^ S '| utility values that correspond to the (ordered) set of configurations. |AW|, the number of distinct configurations over v(s), in general does not have a closed-form expression. Instead, we consider the operation of extending all agents' action sets via V i : Si i—> S. Now the number of configurations over v(s) is an upper bound on |A(")]. The bound is the number of (ordered) combinatorial compositions of n — 1 (since one player has already chosen s) into |i/(s)| +1 nonnegative integers, which is ^(n-\)^]}(s)^<. • Then the total space required for the utilities is bounded from above by |5| ^ " l ^ i j / • s s s s 3 Therefore the size of an AGG representation is dominated by the size of its utility functions, which is bounded by \S\ • If ^ is bounded by a constant as n grows, the representation size grows like O^Sln ), i.e. polynomially with respect to n. The AGG representation achieves compactness by exploiting two types of structure in the utilities: 1 1. Anonymity: agent i's utility depends only on her action Si and the configuration (i.e. number of players that play each action), but not on the identities of the players. Since the number of configurations |A| is usually less than the number of action profiles |S| = Y[ \Si\ and is never greater, we need fewer numbers to represent the utilities in AGG compared to the normal form. i 2. Context-specific independence: for each node s € S, the utility function u only needs to be denned over A^ '. Since |A^ '| is usually less s s S than |A| and is never greater, this further reduces the numbers we need to specify. For each AGG, there exists a unique induced normal form representation with the same set of players and actions for each i ; its utility function is a matrix that specifies each player i's payoff for each possible action profile s S S. This implies a space complexity of n f l " = 1 \Si\. When Si = S for all i , this n becomes n|5| , which grows exponentially with respect to n. The number of payoff values stored in an AGG representation is always less or equal to the number of payoff values in the induced normal form representation. This is because for each entry in the normal form which represents i's utility under action profile s, there exists a unique action profile s in the AGG with the This is the most compact way of representing the utility functions, but does not provide easy random access of the utilities. Therefore, when we want to do computation using A G G , we may convert each, utility function u" to a data structure that efficiently implements a mapping from sequences of integers to (floating-point) numbers, (e.g. tries, hash tables or Red-Black trees), with space complexity in the order of 0 ( I | A ^ ' | ) . 3 s Chapter 2. Action Graph Games 9 corresponding action for each player. T h i s s induces a unique configuration £>(s) over the A G G ' s action nodes. B y construction of the A G G u t i l i t y functions, X>(s) together w i t h s» determines a unique u t i l i t y w (Z?( )(s)) i n the A G G . Furthermore, there are no entries in the A G G u t i l i t y functions that do not correspond to any action profile (si,S-i) i n the n o r m a l form. T h i s means t h a t there exists a many-to-one m a p p i n g from entries of n o r m a l form to utilities i n the A G G . O f course, the A G G representation has the e x t r a overhead of representing the action graph, which is bounded by \S\J. B u t asymptotically, A G G ' s space complexity is never worse t h a n the equivalent n o r m a l form. Si 2.3 Si Computing with A G G s One of the m a i n motivations of compactly representing games is to do efficient c o m p u t a t i o n on the games. We have introduced A G G as a compact represent a t i o n of games; now we would like to exploit the compactness of the A G G representation when we do c o m p u t a t i o n . W e focus on the c o m p u t a t i o n a l task of c o m p u t i n g expected payoffs under a m i x e d strategy profile. Besides being important i n itself, this task is an essential component of many game-theoretic applications, e.g. computing best responses, G o v i n d a n a n d W i l s o n ' s continuation methods for finding N a s h equilibria [Govindan and W i l s o n 2003; G o v i n dan and W i l s o n 2004], the simplicial subdivision a l g o r i t h m for finding N a s h equilibria [van der L a a n et al. 1987], and finding correlated equilibria using P a p a d i m i t r i o u ' s algorithm [Papadimitriou 2005]. Besides exploiting the compactness of the representation, we would also like to be able to exploit the fact that quite often the mixed strategy profile given w i l l have small support. T h e support of a m i x e d strategy CT, is the set of pure strategies played w i t h positive probability (i.e. cri(si) > 0). Q u i t e often games have N a s h equilibria w i t h small support. Porter et a l . [2004] proposed algorithms that explicitly search for N a s h equilibria w i t h small support. In other algorithms for c o m p u t i n g N a s h equilibria such as G o v i n d a n - W i l s o n a n d simplicial subdivision, quite often we w i l l also be c o m p u t i n g expected payoffs for m i x e d strategy profiles w i t h s m a l l support. O u r a l g o r i t h m appropriately exploits strategy profiles w i t h small supports. 2.3.1 Notation Let <p(X) denote the set of all probability distributions over a set X. Define the set of mixed strategies for i as S j = <p(Si), and the set of all m i x e d strategy profiles as E = YlieN We denote an element of by CTJ, a n element of £ by a , and the probability that i plays action s as o~i(s). Define the expected u t i l i t y to agent i for playing pure strategy s ; , given that all other agents play the m i x e d strategy profile c _ j , as (2.1) \ Chapter 2. Action Graph Games 10 where P r ( s _ i | < 7 _ i ) = Ylj7n j( j) the probability of s_{ under the mixed strategy <7_j. T h e set of i's pure strategy best responses t o a mixed strategy profile cr_j is arg m a x Vy(<7_i), and hence the full set of i's pure and mixed strategy best responses t o <T_J is (7 s 1S s = v?(arg max vy'(a-i))s A strategy profile a is a N a s h equilibrium iff (2.2) BRi(cr-i) V i € N, 2.3.2 Oi € BRiip-i). (2.3) Computing V \(a-i) s E q u a t i o n (2.1) is a sum over the set S-i of action profiles of players other t h a n i. T h e number of terms is Ilj^i l^'l' which grows exponentially i n n . T h u s E q u a t i o n (2.1) is an exponential time algorithm for computing Vy (er_j). If we were using the normal form representation, there really would be \S-i\ different outcomes to consider, each w i t h potentially distinct payoff values, so evaluation E q u a t i o n (2.1) is the best we could do for computing V . C a n we do better using the A G G representation? Since A G G s are fully expressive, representing a game without any structure as a n A G G would not give us any computational savings compared to the normal form. Instead, we are interested i n structured games that have a compact A G G representation. In this section we present an algorithm that given any i , Si and <7_,, computes the expected payoff Vy ( a _ i ) i n time p o l y n o m i a l w i t h respect t o the size of the A G G representation. I n other words, our algorithm is efficient if the A G G is compact, and requires time exponential i n n if i t is not. I n particular, recall that for classes of A G G s whose in-degrees are bounded by a constant, their sizes are polynomial i n n. A s a result our algorithm will be p o l y n o m i a l i n n for such games. F i r s t we consider how to take advantage of the context-specific independence structure of the A G G , i.e. the fact that i's payoff when playing Sj only depends on the configurations i n the neighborhood of i . T h i s allows us t o project the other players' strategies into smaller action spaces that are relevant given s^. T h i s is illustrated i n Figure 2.4, using the ice cream vendor game (Figure 2.3). Intuitively we construct a graph from the point of view of an agent who took a particular action, expressing his indifference between actions that do not affect his chosen action. T h i s can be thought of as inducing a context-specific graphical game. Formally, for every action s € S define a reduced graph G^ by including only the nodes u(s) and a new node denoted 0. T h e only edges included i n G^ are the directed edges from each of the nodes v(s) t o the node s. Player j ' s action Sj is projected t o a node i n the reduced graph G^ b y the following mapping: l s (s) = I SJ S J e v{s) , . Chapter 2. Action Graph 11 Games S C Sy F i g u r e 2.4: P r o j e c t i o n of the action graph. Left: action graph of the ice cream vendor game. R i g h t : projected action graph a n d action sets w i t h respect to the action C I . In other words, actions that are not i n u(s) (and therefore do not affect the payoffs of agents playing s) are projected to 0. T h e resulting projected action set Sj ^ has c a r d i n a l i t y at most min(|5,-|, |^(s)| + 1). S W e define the set of m i x e d strategies o n the projected action set by T,^ = <p(Sj ^). A mixed strategy Oj o n the original action set Sj is projected S (s) to Oj (s) G£ v by the following m a p p i n g : *M(8 )=l ^ W [ 2^ 'eSi\v(s) s ^ " S j( a s I j W (2 5) ~ v s So given s , a n d o-i, we can compute i n <9(n|5|) time i n the worst case. N o w we c a n operate entirely o n the projected space, a n d write the expected payoff as where P r ^ * V - ^ ) = Uj^i ^(sf^). the worst case has (|^(SJ)| + l ) ( n _ 1 T h e s u m m a t i o n is over S^p, w h i c h i n ) terms. So for A G G s w i t h strict or context- specific independence structure, c o m p u t i n g this w a y is m u c h faster V^cT-i) t h a n doing the s u m m a t i o n i n (2.1) directly. However, the time complexity of this approach is still exponential i n n. N e x t we want to take advantage of the a n o n y m i t y structure of the A G G . R e c a l l from our discussion of representation size that the number of distinct configurations is usually smaller t h a n the number of distinct pure action profiles. So ideally, we want to compute the expected payoff V ' (er_i) as a s u m over the s Chapter 2. Action Graph Games 12 possible configurations, weighted b y their probabilities: (2.6) where <r = (si,a_l ) a n d (si) N (2.7) s:X>< «'(s) = D< i> J= l s s w h i c h is the probability of D - ^ given the m i x e d strategy profile cr - ^. E q u a t i o n (2.6) is a s u m m a t i o n of size | A ( ' * ) | , the number of configurations given that i played s , , which is p o l y n o m i a l i n n if X is bounded. T h e difficult task is to compute P r ( D |i7 ) for a l l G A( \ i.e. the probability d i s t r i b u t i o n 1 3 1 3 Si ( s i ) S i , 1 ( s i ) over A ( ' ) induced by <r( \ W e observe that the sum i n E q u a t i o n (2.7) is over the set of a l l action profiles corresponding to the configuration D^ \ T h e size of this set is exponential i n the number of players. Therefore directly c o m p u t i n g the probability distribution using E q u a t i o n (2.7) would take exponential time i n n. Indeed this is the approach proposed i n [Bhat a n d L e y t o n - B r o w n 2004]. S i 1 Si Si C a n we do better? W e observe that the players' m i x e d strategies are independent, i.e. a is a product probability d i s t r i b u t i o n cr(s) = F L ^ ^ i ) - A l s o , each player affects the configuration D independently. T h i s structure allows us to use d y n a m i c p r o g r a m m i n g ( D P ) t o efficiently compute the probability dist r i b u t i o n P r ^ ^ ^ l c r ^ ) ) . T h e i n t u i t i o n behind o u r algorithm is t o a p p l y one agent's m i x e d strategy at a time. I n other words, we a d d one agent at a time (s ) to the action graph. L e t cr\ {l,...,fc}. k denote the projected strategy profile of agents Denote b y AJj. *^ t h e set of configurations induced b y actions of 8 agents { 1 , . . . , k}. S i m i l a r l y denote D ^ a k € AJj. *\ Denote by Pk the probability s d i s t r i b u t i o n o n AJj."*^ induced by o-[ '\, a n d by Pk[D] the probability of configus ration D. A t iteration k of the algorithm, we compute Pk from Pk-i a n d cr \ s, k After iteration n , the a l g o r i t h m stops a n d returns P . n T h e pseudocode of our D P algorithm is shown as A l g o r i t h m 1. Each D is represented as a sequence of integers, so Pk is a m a p p i n g from sequences of integers to real numbers. W e need a d a t a structure to manipulate such probability distributions over configurations (sequences of integers) w h i c h permits quick lookup, insertion a n d enumeration. A n efficient d a t a structure for this purpose is a trie. Tries are c o m m o n l y used i n text processing to store strings of characters, e.g. as dictionaries for spell checkers. Here we use tries to store strings of integers rather t h a n characters. B o t h lookup a n d insertion complexity is linear i n | ^ ( . S i ) | . T o achieve efficient enumeration of a l l elements of a trie, we store the elements i n a list, i n the order of their insertions. k Chapter 2. Action Graph Games 13 Algorithm 1 C o m p u t i n g the induced probability d i s t r i b u t i o n P r ( D ( s i ) |a ^). ( s Algorithm ComputeP Input: Si, cr( ) Si Output: P , which is the d i s t r i b u t i o n Pr(D^\cr - ' ) < represented as a trie. Si 1 n D = ((),... ,0) P [D ] = 1.0 / / Initialization: A ^ { Si) 0 { Si) 0 0 = S I ) {D } [ Si) 0 forfc= 1 t o n do Initialize Pk to be a n empty trie for all D l\ from P -i do { k k for all s[ G S Si) { Si) k such that a[ (s{ ) Si) > 0 do Si) if s^ ^ 0 then 3i) £>l (4 ) + = s,) s,) / / pp y 1 A ] a c t i o n 4 Si) end if if Pk[D^^] does not exist yet then =0.0 P [D ] ( Si) k k end if P [D^} +=P - [Dt\]xol \s^) Si k k 1 end for end for end for return P n 2.3.3 Proof of correctness It is straightforward to see that A l g o r i t h m 1 is computing the following recurrence i n iteration k: V £ F C G A < S I , ) Y, P [D }= k k Pnp -i]xa[ ( < ) Si) t Si) S ( 2 ' 8 ) where T>( \Dk-i, sk '^) denotes the configuration resulting from a p p l y i n g /c's projected action s ^ to the configuration D ^\ G A £ " \ O n the other h a n d , the probability d i s t r i b u t i o n o n A ^ induced b y oi... is by definition Si s s k k s k k k £ Pr(£» |a ... ) = fc 1 fc II^( i) s Si... :I5(»i)(si... ) fc fe ( - ) 2 9 = £> i=l fc N o w we want to prove that our D P algorithm is indeed c o m p u t i n g the correct probability dis t r ibut ion, i.e. Pk[Dk] as defined by E q u a t i o n 2.8 is equal to Pr(Dfc|ffi... ). fc Chapter 2. Action Graph Games Theorem 1. For all k, and for all D k G 14 A[ , P [D ] = Pr(£> |cri... ). Si) k k fc fc Proof by induction on k. Base case: A p p l y i n g E q u a t i o n (2.8) for k = 1, it is straightforward to verify that Pi[Di] = P r ( £ > i | ( 7 i ) for a l l Di G A{ . Inductive case: N o w assume Pk-\[Dk-i\ = Pr(Dfc_i|<7i...fc_i) for a l l Dk~i € Si) A ( s i ) fc-i' k (2.10) P -r\D yxa {s ) P [D }= k k k k k F>k-i,s : k 2?(D _i,s ) = D f c f c k E " * w D ^,S : k x x n°j( j)j s \si...k-i:'D(si...fc-l)=£>fc- k l i=i / (2.11) = ' E f D _ ,s :'D(D - ,s ) k 1 k k 1 =D k k E (- ) 2 12 \s ...fc_i:r>(si... _ )=£) _ 1 f c 1 f c 1 j=l / fc = E EE i [ p ( O f c - i , ^ ) = D ] • i[D( ... _ )=D _ ] • t S 1 f e 1 f c 1 n^( j) s (2.13) =E IE 1 [D(D _ , f c 1 S f c )=D ] • f c 1[X>( ... _ ) = S1 fc 1 £>_] J C FL -n^jCSj) ( - ) 2 14 fe «l...fc j= l E n^^) (-) 21 6 si...k--V(<>i...k) = D j=l k = Pr(DtK.. f c ) (2.17) N o t e that from (2.13) t o (2.14) we use the fact that given a n action profile Z> Si...fc_i, (8l) there is a unique configuration Dk-i (*i...fc-i)- 2.3.4 G A J £ l j such that Dk-i = • Complexity O u r a l g o r i t h m for c o m p u t i n g V* (o~-i) consists of first c o m p u t i n g the projected strategies using (2.5), then following A l g o r i t h m 1, and finally doing the weighted s u m given i n E q u a t i o n (2.6). L e t A ( ' ^ ( c r _ i ) denote the set of configurations over u(si) that have positive p r o b a b i l i t y of o c c u r r i n g under the m i x e d strategy 5 i 15 Chapter 2. Action Graph Games (sj,<7_i). In other words, this is the number of terms we need to add together when doing the weighted sum in Equation (2.6). When <7_j has full support, A( ' )(er_j) = A ^ ' ) . Since looking up an entry in a trie takes time linear in the size of the key, which is |^(s,)| in our case, the complexity of doing the weighted sum in Equation (2.6) is 0(|f (si)||A( '*)(cr_t)|). Algorithm 1 requires n iterations; in iteration k, we look at all possible combinations of and s *\ and in each case do a trie look-up which costs 0(Ksi)|). Since |S[ | < + 1, and l A J ^ I < l A ^ I , the complexity of Algorithm 1 is 0(n|i/(si)| |A' )(<7_j)|). This dominates the complexity of summing up (2.6). Adding the cost of computing o_\.\, we get the overall complexity of expected payoff computation 0(n|5| + n | i / ( s j ) | | A ( ' * ) ( < 7 _ ; ) | ) . Since l A ^ V - i ) ! ) < l A ^ I < | A ( ) | , and |A< ')| is the number of payoff values stored in payoff function u , this means that expected payoffs can be computed in polynomial time with respect to the size of the AGG. Furthermore, our algorithm is able to exploit strategies with small supports which lead to a small l A ^ - ^ c r - i ) ! ) . Since\A^\ is bounded by ^'^jffj^i, this implies that if the in-degree of the graph is bounded by a constant, then the complexity of computing expected payoffs is 0(n|5| + n ). Si l 1 Si s k Si) 2 SI,T 2 S i SI S Si 1+1 T h e o r e m 2 . Given an AGG representation of a game, i's expected payoff V (cr_j) can be computed in polynomial time with respect to the representation size, and if the in-degree of the action graph is bounded by a constant, the complexity is polynomial in n. l s 2.3.5 Discussion Of course it is not necessary to apply the agents' mixed strategies in the order 1... n. In fact, we can apply the strategies in any order. Although the number of configurations |A' ^(cr_i)| remains the same, the ordering does affect the intermediate configurations A '\ We can use the following heuristic to try to minimize the number of intermediate configurations: sort the players by the sizes of their projected action sets, in ascending order. This would reduce the amount of work we do in earlier iterations of Algorithm 1, but does not change the overall complexity of our algorithm. In fact, we do not even have to apply one agent's strategy at a time. We could partition the set of players into sub-groups, compute the distributions induced by each of these sub-groups, then combine these distributions together. Algorithm 1 can be straightforwardly extended to deal with such distributions instead of mixed strategies of single agents. In Section 2.5.1 we apply this approach to compute Jacobians efficiently. Si, s k R e l a t i o n to P o l y n o m i a l M u l t i p l i c a t i o n We observe that the problem of computing Pr(D\o^) can be expressed as one of multiplication of multivariate polynomials. For each action node s G v(si), Chapter 2. Action Graph Games 16 let x be a variable corresponding to s. T h e n consider the following expression: s f[K S i ) E (0)+ fc=i \ (2.18) ^{s )x \ k Sk s eS nv( ) k k j Si T h i s is a multiplication of n multivariate polynomials, each corresponding to one player's projected mixed strategy. T h i s expression expands to a sum of | A ( ' ) | terms. E a c h term can be identified b y the tuple of exponents of the x variables, (D(s), D(s'),...). In other words, the set of terms corresponds to the set of configurations A ^ ' ) . T h e coefficient of the t e r m w i t h exponents D G A ^ * ' ^ is S i l 1 E (n- (4 )) (si) s< «>:£>(•'<)(s( »>)=£> s si) \fc=l a / which is exactly Pr(D\o- - ^) by E q u a t i o n (2.7)! So the whole expression (2.18) evaluates to ( Si E Pr{D\*M) *? (a) T h u s the problem of computing Pr(D\a - ^) is equivalent to the problem of computing the coefficients i n (2.18). O u r D P algorithm corresponds to the strategy of m u l t i p l y i n g one polynomial at a time, i.e. at iteration k we multiply the p o l y n o m i a l corresponding to player k's strategy w i t h the expanded p o l y n o m i a l of 1 . . . (k — 1) that we computed i n the previous iteration. l 2.4 Si A G G with Function Nodes There are games w i t h certain kinds of context-specific independence structures t h a t A G G s are not able to exploit. In E x a m p l e 4 we show a class of games w i t h one such k i n d of structure. O u r solution is to extend the A G G representation by introducing function nodes, which allows us to exploit a much wider variety of structures. 2.4.1 M o t i v a t i n g Example: Coffee Shop Example 4. In the Coffee Shop Game there are n players; each player is planning to open a new coffee shop in an downtown area, but has to decide on the location. The downtown area is represented by a r x c grid. Each player can choose to open the shop at any of the B = rc blocks, or decide not to enter the market. Conditioned on player i choosing some location s, her utility depends on: • the number of players that chose the same block, • the number of players that chose any of the surrounding blocks, and 17 Chapter 2. Action Graph Games • the number of players that chose any other location. The normal form representation of this game has size n\S\ = n(B + l ) . Since there are no strict independencies in the utility function, the size of the graphical game representation would be similar. Let us now represent the game as an AGG. We observe that if agent i chooses an action s corresponding to one of the B locations, then her payoff is affected by the configuration over all B locations. Hence, u(s) would consist of B action nodes corresponding to the B locations. The action graph has in-degree T = B. Since the action sets completely overlap, the representation size is 0(|5||AW|) = 0(B^"l^igf )• If we hold B constant, this becomes 0(Bn ), which is exponentially more compact than the normal form and the graphical game representation. If we instead hold n constant, the size of the representation is 0(B ), which is only slightly better than the normal form and graphical game representations. Intuitively, the AGG representation is only able to exploit the anonymity structure in this game. However, this game's payoff function does have contextspecific structure. Observe that u depends only on three quantities: the number of players that chose the same block, the number of players who chose surrounding blocks, and the number of players who chose other locations. In other words, u can be written as a function g of only 3 integers: u (D^) — g(D(s), J2 '£S' D(s'), J2 "eS" ^K ")) where S' is the set of actions that surrounds s and S" the set of actions corresponding to the other locations. Because the AGG representation is not able to exploit this context-specific information, utility values are duplicated in the representation. n n B n s s s 5 S 2.4.2 s Function Nodes In the above example we showed a kind of context-specific independence structure that AGGs cannot exploit. It is easy to think of similar examples, where u could be written as a function of a small number of intermediate parameters. One example is a "parity game" where u depends only on whether Es'ei/fs) -^( ') * °dd- Thus u would have just two distinct values, but the AGG representation would have to specify a value for every configuration s s s s e v e n o r s This kind of structure can be exploited within the AGG framework by introducing function nodes to the action graph G. Now G's vertices consist of both the set of action nodes S and the set of function nodes P. We require that no function node p € P can be in any player's action set, i.e. 5 D P = {}, so the total number of nodes in G is \S\ + \P\. Each node in G can have action nodes and/or function nodes as neighbors. For each p £ P, we introduce a function f : A ^ ' >—> IN, where € A( ) denotes configurations over p's neighbors. The configurations D are extended over the entire set of nodes, by defining D(p) = f (D^). Intuitively, D(p) are the intermediate parameters that players' utilities depend on. To ensure that the AGG is meaningful, the graph G restricted to nodes in P is required to be a directed acyclic graph (DAG). Furthermore it is required that p p p Chapter 2. A c t i o n Graph Games 18 every p £ P has at least one neighbor (i.e. incoming edge). These conditions ensure that D(s) for a l l s a n d D(p) for a l l p are well-defined. T o ensure that every p £ P is " u s e f u l " , we also require that p has at least one out-going edge. A s before, for each action node s we define a u t i l i t y function u : A^ i - » E . We call this extended representation (N, S, P, v, { / } g p , u) a n A c t i o n G r a p h G a m e w i t h F u n c t i o n Nodes ( A G G F N ) . a p p 2.4.3 Representation Size G i v e n a n A G G F N , we can construct a n equivalent A G G w i t h the same players N and actions S and equivalent u t i l i t y functions, but represented without a n y function nodes. We p u t an edge from s' t o s i n the A G G if either there is an edge from s' to s i n the A G G F N , or there is a p a t h from s' to s through a chain of function nodes. T h e number of utilities stored i n a n A G G F N is no greater t h a n the number of utilities i n the equivalent A G G without function nodes. W e c a n show this b y following similar arguments as before, establishing a many-to-one m a p p i n g from utilities i n the A G G representation to utilities i n the A G G F N . O n the other hand, A G G F N s have to represent the functions f , which can either be implemented using elementary operations, or represented as mappings similar to u . W e could construct examples w i t h huge number of function nodes, such that the space complexity of representing { / } g p would be greater t h a n that of the utility functions. In other words, blindly adding function nodes will not make the representation more compact. W e want to a d d function nodes only when they represent meaningful intermediate parameters and hence reduce the number of incoming edges on action nodes. Consider our coffee shop example. F o r each action node s corresponding to a location, we introduce function nodes p' and p " . L e t v(p' ) consist of actions surrounding s, a n d v(p") consist of actions for the other locations. T h e n we modify v(s) so that it has 3 nodes: i/(s) = {s,p' ,p"}, as shown i n F i g u r e 2.5. F o r all function nodes p £ P, we define f (D^) = V J „ ( ) D(m). N o w each is a configuration over only 3 nodes. Since f is a s u m m a t i o n operator, | A ( ) | is the number of compositions of n — 1 into 4 nonnegative integers, ^ " ^ j ^ , = n ( n + l ) ( n + 2)/6 = 0 ( n ) . We must therefore store 0(Bn ) u t i l i t y values. T h i s is significantly more compact than the A G G representation without function nodes, which had a representation size of 0(B ^ " " V ) ^ )• p s P P s s s m g p p S p 3 3 Remark 1. One property of the A G G representation as defined i n Section 2.2.1 is that u t i l i t y function u is shared b y a l l players that have s i n their action sets. W h a t i f we want to represent games w i t h agent-specific u t i l i t y functions, where utilities depend not only on s a n d D^ \ b u t also o n the identity of the player playing s? W e could split s into i n d i v i d u a l player's actions S j , Sj etc., so t h a t each action node has its own u t i l i t y function, however the resulting A G G would not be able to take advantage of the fact that the actions s , , Sj affect the other players' utilities i n the same way. U s i n g function nodes, we are able to compactly represent this k i n d of structure. We again split s into separate action nodes s;, Sj, but also introduce a function node p w i t h S j , Sj as its neighbors, s s Chapter 2. Action Graph Games 19 m m ai ..3 3 '9 ,3 a 1 m I Figure 2.5: A 5 x 6 Coffee Shop Game: Left: the AGG representation without function nodes (looking at only the neighborhood of the a node s). Middle: we introduce two function nodes. Right: s now has only 3 incoming edges. and define f to be the summation operator f (D^ >) = ^2 D(m). This way the function node p with its configuration D(p) acts as if Si and Sj had been merged into one node. Action nodes could then include p instead of both Si and Sj as a neighbor. This way agents can have different utility functions, without sacrificing representational compactness. p p 2.4.4 p meup Computing with A G G F N s Our expected-payoff algorithm cannot be directly applied to AGGFNs with arbitrary f . First of all, projection of strategies does not work directly, because a player j playing an action Sj v(s) could still affect Z?' ' via function nodes. Furthermore, our DP algorithm for computing the probabilities does not work because for an arbitrary function node p G v{s), each player would not be guaranteed to affect D(p) independently. Therefore in the worst case we need to convert the AGGFN to an AGG without function nodes in order to apply our algorithm. This means that we are not always able to translate the extra compactness of AGGFNs over AGGs into more efficient computation. p s Definition 1. An AGGFN is contribution-independent (CI) if • For all p G P, v(p) C S, i.e. the neighbors of function nodes are action nodes. • There exists a commutative and associative operator *, and for each node s G S an integer w , such that given an action profile s, for all p G P, s D{p) = *ieN: S i e i /(p)W S i . Note that this definition entails that D(p) can be written as a function of by collecting terms: D(p) = f (D^) = * »( )(*k=i *>,)• The coffee shop game is an example of a contribution-independent AGGFN, with the summation operator serving as *, and w = 1 for all s. For the parity game mentioned earlier, * is instead addition mod 2. If we are modeling an auction, and want D(p) to represent the amount of the winning bid, we would let w be the bid amount corresponding to action s, and * be the max operator. DM p se P s s Chapter 2. A c t i o n Graph Games 20 For contribution-independent A G G F N s , it is the case that for a l l function nodes p, each player's strategy affects D(p) independently. T h i s fact allows us to adapt our algorithm to efficiently compute the expected payoff Vy(cr_t). F o r simplicity we present the algorithm for the case where we have one operator * for a l l p € P, but our approach can be directly applied to games w i t h different operators associated w i t h different function nodes, and likewise w i t h a different set of w for each operator. We define the contribution of action s to node m G S U P, denoted C {m), s s as 1 if m = s, 0 if m G S \ {s}, and * '£v(m)(*k=i ^ w ) i f m G P . T h e n it is easy to verify that given an action profile s, D(s) = 5Z™=1 C (s) for all s G S and D{p) = * ! ? C (p) for all p G P. G i v e n that player i played s , , we define the projected contribution of action s, denoted Ci \ as the tuple (C (m)) ( y Note that different actions may have identical projected contributions. Player j's m i x e d strategy Oj i n duces a probability distribution over j ' s projected contributions, P r ( C ' * ) | < 7 j ) = 2~2 ..c< i)_(7(»i) i( j')- N o w we can operate entirely using the probabilities o n projected contributions instead of the m i x e d strategy probabilities. T h i s is (s •) analogous to the projection of Oj to <r) i n our algorithm for A G G s without function nodes. A l g o r i t h m 1 for computing the distribution Pr[p^\o) can be straightforl s m aj Sj = 1 St s 7n€u Si s s CT s wardly adopted to work w i t h contribution-independent A G G F N s : whenever we apply player k's contribution Cil'^ to D ^\, s k the resulting configuration D ^ is a k computed componentwise as follows: D \m) = C s ^ ( m ) + D z\(m) if m G 5 , and D '\m) = ci '\m) * D ^ j ( m ) if m G P. Following similar complexity analysis, if an A G G F N is contribution-independent, expected payoffs can be computed in polynomial time w i t h respect t o the representation size. A p p l i e d to S 3 4 the coffee shop example, since |A( )| = 0(n ), our algorithm takes (9(n|5| +n ) time, which grows linearly i n \S\. Remark 2. We note that similar ideas are emloyed i n the variable elimination algorithms that exploit causal independence i n Bayes nets [Zhang and Poole 1996]. Bayes nets are compact representations of probability distributions that graphically represent independencies between r a n d o m variables. A Bayes net is a D A G where nodes represent r a n d o m variables and edges represent direct probabilistic dependence between random variables. Efficient algorithms have been developed to compute conditional probabilities i n Bayes nets, such as clique tree propagation and variable elimination. C a u s a l independence refers to the situation where a node's parents (which may represent causes) affect the node independently. T h e conditional probabilities of the node can be defined using a binary operator that can be applied to values from each of the parent variables. Zhang and Poole [1996] proposed a variable elimination algorithm that exploits causal independence by factoring the conditional probability distribut i o n into factors corresponding to the causes. T h e way factors are combined together is similar i n spirit to our D P algorithm that combines the independent contributions of the players' strategies to the configuration D^ \ Sl k s k s k s k Si Chapter 2. Action Graph Games 21 T h i s parallel between Bayes nets a n d action graphs are not surprising. In A G G F N s , we are t r y i n g to compute the probability distribution over configurations Pr(D^\o-^ ^). If we see each node m i n the action graph as a r a n d o m variable D(m), this is the joint distribution of variables v(si). However, whereas edges i n Bayes nets represent probabilistic dependence, edges i n the action graph have different semantics depending on the target. Incoming edges of action nodes specifies the neighborhood v(s) that we are interested i n computing the probabilities of. Incoming edges of a function node represents the deterministic dependence between the r a n d o m variable of the function node D(p) and its parents. T h e only probabilistic components of action graphs are the players' mixed strategies. These are probability distributions of r a n d o m variables associated w i t h players, b u t are not explicitly represented i n the act i o n graph. Whereas A G G F N s i n general are not D A G s , given a n action s, we can construct a n induced Bayes net consisting of ^ ( s ) , the neighbors of function nodes i n v(s), a n d the neighbors of any new function nodes included, a n d so on until no more function nodes are included, a n d finally augmented w i t h n nodes representing the players' mixed strategies. Whereas for C I A G G F N s , the Bayes net formulation has a simple structure and does not yield a more efficient algorithm compared to A l g o r i t h m 1, this formulation could be useful for nonC I A G G F N s w i t h a complex network of function nodes, as standard Bayes net algorithms can be used to exploit the independencies i n the induced Bayes net. Si 2.5 Applications 2.5.1 Application: Computing Payoff Jacobian A game's payoff J a c o b i a n under a mixed strategy er is defined as a Yli \Si\ by Y2i m a t r i x w i t h entries defined as follows: (2.20) = Y, ( iMsi,Si>,s))Pr{s\cr) u s ses Here whenever we use an overbar i n our notation, it is shorthand for the subscript — {i, i'}. For example, s = s_{^/j. T h e rows of the m a t r i x are indexed by i and Si while the columns are indexed by i' a n d Si>. G i v e n entry V V y . ' . , (<f), l s we call S{ its primary action node, and s;/ its secondary action node. One of the m a i n reasons we are interested i n computing Jacobians is that it is the computational bottleneck i n G o v i n d a n and W i l s o n ' s continuation method for finding mixed-strategy N a s h equilibria i n multi-player games [Govindan a n d W i l s o n 2003]. T h e G o v i n d a n - W i l s o n algorithm starts by perturbing the payoffs to obtain a game w i t h a known equilibrium. It then follows a p a t h that is guaranteed t o give us one or more equilibria of the unperturbed game. I n each step, we need to compute the payoff J a c o b i a n under the current mixed strategy 22 Chapter 2. Action Graph Games i n order to get the direction of the p a t h ; we then take a small step along the p a t h and repeat. Efficient c o m p u t a t i o n of the payoff J a c o b i a n is important for more t h a n this continuation m e t h o d . For example, the iterated p o l y m a t r i x a p p r o x i m a t i o n ( I P A ) method [Govindan and W i l s o n 2004] has the same c o m p u t a t i o n a l problem at its core. A t each step the I P A method constructs a p o l y m a t r i x game that is a linearization of the current game w i t h respect t o the m i x e d strategy profile, the L e m k e - H o w s o n algorithm is used to solve this game, and the result updates the mixed strategy profile used i n the next iteration. T h o u g h theoretically it offers no convergence guarantee, I P A is typically much faster t h a n the continuation method. A l s o , it is often used t o give the continuation m e t h o d a quick start. The payoff J a c o b i a n m a y also be useful to multiagent reinforcement learning algorithms that perform policy search. E q u a t i o n (2.20) shows that the V V ^ ' Y , (tf) element of the J a c o b i a n can be interpreted as the expected u t i l i t y of agent i when she takes action s ; , agent i ' takes action s^, and a l l other agents use mixed strategies according to o. S o a straightforward approach is to use our D P algorithm to compute each entry of the J a c o b i a n . However, the J a c o b i a n m a t r i x has certain e x t r a structure that allows us t o achieve further speedup. F i r s t , we observe that some entries of the J a c o b i a n are identical. If two entries have same p r i m a r y action node s, then they are expected payoffs o n the same u t i l i t y function u , i.e. they have the same value if their induced probability distributions over A^ are the same. W e need to consider two cases: s 1. Suppose the two entries come from the same row of the J a c o b i a n , say player i's action S j . T h e r e are two sub-cases to consider: (a) Suppose the columns of the two entries belong t o the same player j, but different actions Sj and s'j. If sj- ^ = s'j \ i.e. Sj and s'j b o t h 8 Si project t o the same projected action i n S j ' s projected action graph, then V V . V . = V V „ H ' • S (b) Suppose the columns of the entries correspond t o actions of different players. W e observe that for all j and Sj such t h a t a^ \sj ^) Si St = 1, VV^ .{a) = Vl(<r-i). A s a special case, if S J = {0}, i.e. agent j does not affect i ' s payoff when i plays s$, then for a l l Sj G Sj, 8 i ) s V % ( f f ) = y *(a_i). s 2. If Si and Sj correspond to the same action node s (but owned by agents i and j respectively), thus sharing the same payoff function u , then V V , V , . = V V / ' Y - Furthermore, if there exist s' G Si,s' G 5,- such s t that s\ {s) = s' ( s ) , then VV''{, = 4 VV/'Y. E v e n if the entries are not equal, we can exploit the s i m i l a r i t y of the projected strategy profiles (and thus the s i m i l a r i t y of the induced distributions) between entries, and re-use intermediate results when c o m p u t i n g the induced Chapter 2. Action Graph Games 23 distributions of different entries. Since c o m p u t i n g the induced probability distributions is the bottleneck of o u r expected payoff algorithm, this provides significant speedup. F i r s t we observe that if we fix the row (i, S j ) and the column's player j, then o is the same for a l l secondary actions Sj £ Sj. W e can compute the probability d i s t r i b u t i o n P r ( D _ i | s , , c ? ' ^ ) , then for a l l Sj £ Sj, we just need to apply the action Sj t o get the induced probability d i s t r i b u t i o n for the entry V V ' ' ^ . N o w suppose we fix the row (i,Si). F o r two c o l u m n players j and j', their corresponding strategy profiles c-{tj} and a_^jiy are very similar, i n fact they are identical i n n—3 of the n —2 components. For A G G s without function nodes, we c a n exploit this similarity by c o m p u t i n g the d i s t r i b u t i o n P r ( D _ i | c r ^ ) , then for each j ^ i, we " u n d o " j's m i x e d strategy t o get the distribution induced b y cr-{ij}. R e c a l l from Section 2.3.5 that the distributions are coefficients of the m u l t i p l i c a t i o n of certain polynomials. So we can undo j ' s strategy by c o m p u t i n g the long division of the p o l y n o m i a l for a^i b y the p o l y n o m i a l for <jj. 8 n 8 n T h i s method does not work for contribution-independent A G G F N s , because i n general a player's contribution t o the configurations are not reversible, i.e. given Pr(D -i\a ^l ) a n d Oj, it is not always possible to undo the contributions of <Tj. Instead, we can efficiently compute the distributions b y recursively bisecting the set of players i n to sub-groups, c o m p u t i n g probability distributions induced b y the strategies of these sub-groups a n d combining t h e m . F o r example, suppose n = 9 a n d i = 9, so O-i = o\...%. W e need t o compute the distributions induced b y < 7 _ { } , . . . , c_.r g}, respectively. N o w we bisect cr_j into <7i...4 and 05...8. Suppose we have computed the distributions induced b y (T1...4 as well as (T234,CT134,<xi24,CT123,a n d s i m i l a r l y for the other group of 5 . . . 8. T h e n we c a n compute P r ( • |af_ |\ y) b y combining Pr(-.|o23*4) d P ('l°5678)' compute P r ( - | c r ^ 2 j ) b y combining P r ^ l c r ^ ) a n d Pr(-|05g* g), etc. W e have reduced the problem into two smaller problems over the sub-groups 1 . . . 4 a n d 5 . . . 8, which can then be solved recursively by further bisecting the sub-groups. T h i s m e t h o d saves the re-computation of sub-groups of strategies when comp u t i n g the induced distributions for each row of the J a c o b i a n , a n d it works w i t h any contribution-independent A G G F N s because it does not use long division to undo strategies. < ) n 8i lj9 s a n r g 9 2.6 7 Experiments W e implemented the A G G representation and o u r a l g o r i t h m for c o m p u t i n g expected payoffs a n d payoff Jacobians i n C + + . W e r a n several experiments to compare the performance of our implementation against the (heavily optimized) G a m e T r a c e r implementation [Blum et a l . 2002] which performs the same comp u t a t i o n for a n o r m a l f o r m representation. W e used the Coffee Shop game (with randomly-chosen payoff values) as a benchmark. W e varied b o t h the number of players and the number of actions. Chapter 2. A c t i o n Graph Games 00000000 1 00000000 -•-AGG 10000000 o 1000000 W (A ro >. n a. 100000 24 ------ 10000000 1000000 ••-NF * 100000 B 10000 10000 1000 1000 100 10 100 10 —»-AGG - » - NF 1 1 3 4 5 6 7 8 9 10 It 12 13 14 15 16 4 number of players 5 6 7 8 9 number of rows 10 F i g u r e 2.6: C o m p a r i n g Representation Sizes of the Coffee Shop G a m e (logscale). Left: 5 x 5 grid w i t h 3 to 16 players. R i g h t : 4-player r x 5 grid w i t h r varying from 3 t o 10. 2.6.1 Representation Size F o r each game instance we counted the number of payoff values that need t o be stored i n each representation. Since for b o t h n o r m a l form and A G G , the size of the representation is dominated by the number of payoff values stored, the number of payoff values is a good indication of the size of the representation. We first looked at Coffee Shop games w i t h 5 x 5 blocks, w i t h varying number of players. F i g u r e 2.6 has a log-scale plot of the number of payoff values i n each representation versus the number of players. T h e n o r m a l form representation grew exponentially w i t h respect to the number of players, and quickly becomes i m p r a c t i c a l for large number of players. T h e size of the A G G representation grew p o l y n o m i a l l y w i t h respect to n. W e then fixed the number of players at 4, a n d varied the number of blocks. For ease of comparison we fixed the number of colums at 5, and only changed the number of rows. F i g u r e 2.6 has a log-scale plot of the number of payoff values versus the number of rows. T h e size of the A G G representation grew linearly w i t h the number of rows, whereas the size of the n o r m a l form representation grew like a higher-order p o l y n o m i a l . T h i s was consistent w i t h our theoretical prediction that A G G F N s store 0(\S\n ) payoff values for Coffee Shop games while n o r m a l form representations store n\S\ payoff values. 3 n 2.6.2 Expected Payoff Computation Second, we tested the performance of our d y n a m i c programming algorithm against G a m e T r a c e r ' s n o r m a l form based a l g o r i t h m for computing expected payoffs, o n Coffee Shop games of different sizes. F o r each game instance, we generated 1000 r a n d o m strategy profiles w i t h full support, a n d measured the C P U (user) time spent c o m p u t i n g the expected payoffs under these strategy profiles. W e fixed the size of blocks at 5 x 5 and varied the number of players. F i g u r e 2.7 shows plots of the results. F o r very small games the n o r m a l form based algorithm is faster due t o its smaller bookkeeping overhead; as the n u m - Chapter 2. Action Graph Games co •D 120 100 80 60 40 —-AGG - « - NF 8 25 —•—AGG " -»-NF a • ill .a * 1,1,1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 number of players 3 4 5 6 7 8 9 10 number of rows F i g u r e 2.7: R u n n i n g times for payoff computation i n the Coffee Shop G a m e . Left: 5 x 5 g r i d w i t h 3 to 16 players. R i g h t : 4-player r x 5 grid w i t h r varying from 3 to 10. ber of players grows larger, our A G G F N - b a s e d algorithm's running time grows polynomially, while the n o r m a l form based algorithm scales exponentially. F o r more t h a n five players, we were not able to store the normal form representation i n memory. N e x t , we fixed the number of players at 4 and number of columns at 5, a n d varied the number of rows. O u r algorithm's running time grew roughly linearly i n the number of rows, while the n o r m a l form based algorithm grew like a higherorder p o l y n o m i a l . T h i s was consistent w i t h our theoretical prediction that our algorithm take 0 ( n | 5 | + n ) time for this class of games while normal-form based algorithms take 0 ( | 5 | ) time. L a s t , we considered strategy profiles having p a r t i a l support. W h i l e ensuring that each player's support included at least one action, we generated strategy profiles w i t h each action included i n the support w i t h probability 0.4. G a m e Tracer took about 6 0 % of its full-support running times to compute expected payoffs i n this d o m a i n , while our algorithm required about 2 0 % of its fullsupport r u n n i n g times. 4 r a _ 1 2.6.3 Computing Payoff Jacobians We have also r u n similar experiments on computing payoff Jacobians. A s discussed i n Section 2.5.1, the entries of a J a c o b i a n can be formulated as expected payoffs, so a J a c o b i a n c a n be computed by doing a n expected payoff c o m p u t a t i o n for each of its entry. In Section 2.5.1 we discussed methods that exploits the structure of the J a c o b i a n to further speedup the computation. G a m e T r a c e r ' s normal-form based implementation also exploits the structure of the J a c o b i a n by re-using p a r t i a l results of expected-payoff computations. W h e n comparing our A G G - b a s e d J a c o b i a n algorithm as described i n Section 2.5.1 against G a meTracer's implementation, the results are very similar to the above results for computing- expected payoffs, i.e. our implementation scales p o l y n o m i a l l y i n n while G a m e T r a c e r scales exponentially i n n. W e instead focus o n the question of how much speedup does the methods i n Section 2.5.1 provide, by comparing Chapter 2. Action AGG Jacobian Graph Games 26 T3 C O o a in £ a | a. u 4 5 6 7 8 9 10 number of players 5 6 7 8 9 10 number of rows F i g u r e 2.8: R u n n i n g times for J a c o b i a n computation in the Coffee Shop G a m e . Left: 5 x 5 grid w i t h 3 to 10 players. Right: 4-player r x 5 grid w i t h r varying from 3 to 10. our algorithm in Section 2.5.1 against the algorithm that computes expected payoffs (using our A G G - b a s e d algorithm described in Section 2.3) for each of the Jacobian's entries. T h e results are shown in F i g u r e 2.8. O u r a l g o r i t h m is about 50 times faster. T h i s confirms that the methods discussed in Seciton 2.5.1 provide significant speedup for computing payoff Jacobians. 2.6.4 Finding Nash Equilibria using the Govindan-Wilson algorithm G o v i n d a n and W i l s o n ' s algorithm [Govindan and W i l s o n 2003] is one of the most competitive algorithms for finding N a s h equilibria for multi-player games. T h e computational bottleneck of the algorithm is repeated c o m p u t a t i o n of payoff Jacobians as defined i n Section 2.5.1. N o w we show experimentally that the speedup we achieved for computing Jacobians using the A G G representation leads to a speedup in the G o v i d a n - W i l s o n algorithm. W e compared two versions of the G o v i n d a n - W i l s o n a l g o r i t h m : one is the implementation in G a m e T r a c e r , where the J a c o b i a n c o m p u t a t i o n is based on the n o r m a l form representation; the other is identical to the G a m e T r a c e r i m plementation, except that the Jacobians are computed using our algorithm for the A G G representation. B o t h techniques compute the Jacobians exactly. A s a result, given an initial perturbation to the original game, these two implementations would follow the same p a t h and return exactly the same answers. So the difference in their running times would be due to the different speeds of c o m p u t i n g Jacobians. A g a i n , we tested the two algorithms on Coffee Shop games of varying sizes: first we fixed the size of blocks at 4 x 4 and varied the number of players; then we fixed the number of players at 4 and number of columns at 4, and varied the number of rows. F o r each game instance, we r a n d o m l y generated 10 i n i t i a l perturbation vectors, and for each initial perturbation we r u n the two versions of the G o v i n d a n - W i l s o n algorithm. Since the running time of the G o v i n d a n W i l s o n algorithm highly depends on the initial perturbation, it is not meaningful Chapter 2. Action Graph Games number of players 27 number of rows F i g u r e 2.9: Ratios of R u n n i n g times for the G o v i n d a n - W i l s o n a l g o r i t h m i n the Coffee Shop G a m e . Left: 4 x 4 grid w i t h 3 to 5 players. R i g h t : 4-player r x 4 grid w i t h r varying from 3 to 9. T h e error bars indicate standard deviation over 10 r a n d o m i n i t i a l perturbations. T h e constant lines at 1.0 indicating equal running times are also shown. to compare the running times w i t h different i n i t i a l perturbations. Instead, we look at the ratio of running times between the n o r m a l form implementation and the A G G implementation. T h u s a ratio greater t h a n 1 means the A G G implementation spent less time t h a n the n o r m a l form implementation. W e plotted the results i n Figure 2.9. T h e results confirmed our theoretical prediction that as the size of the games grows (either i n the number of players or i n the number of actions), the speedup of the A G G implementation compared to the n o r m a l from implementation increases. 2.7 Conclusions We presented a polynomial-time a l g o r i t h m for computing expected payoffs i n action-graph games. For A G G s w i t h bounded in-degree, our algorithm achieves an exponential speed-up compared to normal-form based algorithms and B h a t and L e y t o n - B r o w n [2004] 's algorithm. W e also extended the A G G representation by introducing function nodes, which allows us to compactly represent a wider range of structured utility functions. W e showed that if a n A G G F N is contribution-independent, expected payoffs can be computed i n p o l y n o m i a l time. O u r current and future research includes two directions: C o m p u t a t i o n a l l y , we p l a n to apply our expected payoff algorithm to speed up other game-theoretic . computations, such as computing best responses and the simplicial subdivision a l g o r i t h m for finding N a s h equilibria. A l s o , as a direct corollary of our T h e o r e m 2 and P a p a d i m i t r i o u [2005]'s result, correlated equilibria can be computed i n time p o l y n o m i a l i n the size of the A G G . Representationally, we plan to extend the A G G framework to represent more types of structure such as a d d i t i v i t y of payoffs. In particular, we intend to study Chapter 2. Action Graph Games 28 is Bayesian games. In a Bayesian game, players are uncertain about which game (i.e. payoff function) they are playing, and each receives certain private information about the underlying game. Bayesian games are heavily used in economics for modeling competitive scenarios involving information asymmetries, e.g: for modeling auctions and other kinds of markets. A Bayesian game can be seen as a compact representation, since it is much more compact than its induced normal form. We plan to use the AGG framework to represent not only the structure inherent in Bayesian games, but also context-specific independence structures such as the ones we have considered here. 29 Chapter 3 Bidding Agents for Online Auctions with Hidden Bids 3.1 Introduction T h e r e has been much research o n the study of automated bidding agents for auctions a n d other market-based environments. T h e T r a d i n g Agent C o m p e t i tions ( T A C - C l a s s i c a n d T A C S u p p l y C h a i n Management) have attracted m u c h interest [Wellman et a l . 2002]. T h e r e have also been research efforts o n b i d ding agents a n d bidding strategies i n other auction environments [Byde 2002; B o u t i l i e r et a l . 1999; Greenwald' and B o y a n 2004; A r o r a et a l . 2003; C a i a n d W u r m a n 2003; A n t h o n y et a l . 2001]. A l t h o u g h this b o d y of work considers m a n y different auction environments, b i d d i n g agents always face a similar task: given a valuation function, the bidding agent needs to compute a n o p t i m a l b i d ding strategy that maximizes expected surplus. (Some environments such as T A C - S C M also require agents to solve a d d i t i o n a l , e.g., scheduling tasks.) 1 T h e " W i l s o n D o c t r i n e " in mechanism design argues that mechanisms should be constructed so that they are "detail-free"—that is, so that agents can behave rationally i n these mechanisms even without information about the distribution of other agents' valuations. F o r example, the V C G mechanism is detail-free because under this mechanism it is a weakly dominant strategy to b i d exactly one's valuation, regardless of other agents' beliefs, valuations or actions. U n der c o m m o n assumptions (in particular, independent private values) single-item E n g l i s h auctions are similar: a n agent should remain i n the auction u n t i l the bidding reaches the amount of her valuation. W h i l e detail-free mechanisms are desirable, they are not ubiquitous. V e r y often, agents are faced w i t h the problem of deciding how to behave in games that do not have dominant strategies a n d where other agents' preferences are strategically relevant. F o r example, we m a y want to participate i n a series of auctions r u n by different sellers at different times. T h i s is a c o m m o n scenario at online auction sites such as eBay. In Section 3.4 we consider a sequential auction model of this scenario, a n d show that information about other bidders' preferences is very relevant i n constructing a bidding strategy. A version of this chapter has been submitted for publication. Jiang, A . X . and L e y t o n B r o w n , K . (2005) B i d d i n g Agents for Online Auctions with H i d d e n Bids. Submitted to the Machine L e a r n i n g Journal. : Chapter 3. Bidding Agents for Online Auctions with Hidden Bids 3.1.1 30 Game-Theoretic and Decision-Theoretic Approaches H o w should a bidding agent be constructed? Depending on the assumptions we choose to make about other bidders, two broad approaches to computing b i d ding strategies suggest themselves: a game-theoretic approach a n d a decisiontheoretic approach. T h e game theoretic approach assumes that a l l agents are perfectly rational a n d that this rationality is c o m m o n knowledge; the auction is modeled as a Bayesian game. U n d e r this approach, a bidding agent would compute a Bayes-Nash equilibrium of the auction game, a n d play the equilibr i u m bidding strategy. M u c h of the extensive economics literature o n auctions follows this approach (see, e.g., the survey i n [Klemperer 2000]). F o r example, i n environments w i t h multiple, sequential auctions for identical items a n d in which each bidder wants only a single item, the B a y e s - N a s h e q u i l i b r i u m is well-known [Milgrom a n d Weber 2000; Weber 1983]. Such equilibria very often depend o n the distribution of agents' valuation functions a n d the number of bidders. A l t h o u g h this information is rarely available i n practice, it is usually possible to estimate these distributions from the bidding history of previous auctions of similar items. Note that this involves m a k i n g the assumption t h a t past a n d future bidders w i l l share the same valuation distribution. T h e game-theoretic approach has received a great deal of study, a n d is perhaps t h e dominant p a r a d i g m i n microeconomics. In particular, there are very good reasons for seeking strategy profiles that are resistant to unilateral deviation. However, this approach is not always useful to agents who need to decide what to do i n a particular setting, especially when the rationality of other b i d ders is i n doubt, when the c o m p u t a t i o n of equilibria is intractable, or when the game has multiple equilibria. In such settings, it is sometimes more appropriate to rely o n decision theory. A decision-theoretic approach effectively treats other bidders as part of the environment, a n d ignores the possibility that they m a y change their behavior i n response to the agent's actions. W e again make the assumption that the other bidders come from a stationary p o p u l a t i o n ; however, i n this case we model agents' b i d amounts directly rather t h a n modeling their valuations and then seeking a n equilibrium strategy. W e then solve the resulting single-agent decision problem to find a bidding strategy that maximizes expected payoff. W e could also use a reinforcement-learning approach, where we continue to learn the bidding behavior of other bidders while participating i n the auctions. M u c h recent literature o n bidding agent design follows the decision-theoretic approach, e.g. [Boutilier et a l . 1999; B y d e 2002; Greenwald and B o y a n 2004; Stone et a l . 2002; M a c k i e - M a s o n et a l . 2004; Osepayshvili et a l . 2005]. T h i s chapter does not attempt to choose between these two approaches; it is our opinion that each has domains for which it is the most appropriate. T h e i m p o r t a n t point is that regardless of which approach we elect to take, we are faced w i t h the subproblem of estimating two distributions f r o m the bidding history of past auctions: the distribution o n the number of bidders, a n d the distribution of b i d amounts (for decision-theoretic approaches) or of valuations Chapter 3. Bidding Agents for Online Auctions with Hidden Bids (for game-theoretic approaches). Consequently, this problem has received much discussion in the literature. F o r example, A t h e y and Haile [2002] and various others in econometrics study the estimation of valuation distributions in various standard auction types given observed bids, assuming that bidders are perfectly rational and follow equilibrium strategies. O n the decision-theoretic front, a popular approach is to estimate the distribution of the final prices in auctions based on observed selling prices, and then to use this distribution to compute the o p t i m a l bidding strategy. E x a m p l e s of this include Stone et al.'s [2002] A T T a c agent for the T r a d i n g Agent C o m p e t i t i o n , M a c k i e - M a s o n et al.'s [2004] study of bidding in simultaneous ascending auctions a n d the follow-up work i n [Osepayshvili et al. 2005], B y d e ' s [2002] study of bidding in simultaneous online E n g l i s h auctions, and Greenwald and B o y a n ' s [2004] analysis of sequential E n g l i s h auctions. A paper by B o u t i l i e r et al. [1999] takes a different decisiontheoretic approach which is relevant to the approach we propose in this chapter. W e defer discussion of this work u n t i l Section 3.6, after we have presented our approach. 3.1.2 Overview In this chapter we consider sequential E n g l i s h auctions in which a full bidding history is revealed, such as the online auctions r u n by eBay. It might seem that there is very little left to say: we learn the distributions of interest f r o m the bidding history, then compute a bidding strategy based on that information for the current and future auctions. However, we show that under realistic bidding dynamics (described i n Section 3.2) the observed bidding histories omit some relevant information. F i r s t , some bidders may come to the auction when it is already in progress, find that the current price already exceeds their valuation, a n d leave without placing a b i d . Second, the amount the winner was willing to pay is never revealed. Ignoring these sources of bias can lead to poor estimates of the underlying valuation distribution and distribution of number of bidders. In Section 3.3 we propose a learning approach based on the E x p e c t a t i o n M a x i m i z a t i o n ( E M ) algorithm, which iteratively generates hidden bids consistent w i t h the observed bids, and then computes m a x i m u m - l i k e l i h o o d estimations of the valuation distribution based on the completed set of bids. T h e learned distributions can then be used in c o m p u t i n g decision-theoretic or game-theoretic bidding strategies. Section 3.4 discusses the c o m p u t a t i o n of the o p t i m a l strategy for a n agent i n a repeated E n g l i s h auction setting under the decision-theoretic approach, and i n a similar (non-repeated) setting under the game-theoretic approach. In Section 3.5 we present experimental results on synthetic d a t a sets as well as on d a t a collected from eBay, which show that our E M learning approach makes better estimates of the distributions, gets more payoff under the decision-theoretic m o d e l , and makes a better a p p r o x i m a t i o n to the B a y e s - N a s h equilibrium under the game-theoretic model, as compared to the straightforward approach which ignores hidden bids. 31 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids 3.2 32 A M o d e l of Online Auctions Consider a (possibly repeated) auction held online, e.g., o n eBay. T h e r e are TO potential bidders interested i n a certain auction for a single item. W e assume that bidders are risk-neutral, have independent private values ( I P V ) , a n d that utilities are quasilinear. T h e number of bidders TO is drawn f r o m a discrete distribution g(m) w i t h support [2, oo). B i d d e r s ' potential valuations (in the game-theoretic context) or bids (in the decision-theoretic context) are independently d r a w n from a continuous distribution f{x). 3.2.1 Bidding Dynamics T h e m potential bidders arrive at the auction site sequentially. W h e n each bidder arrives, she observes the bids that have been accepted i n the auction so far, places a single proxy b i d a n d then leaves the system. T h e auctioneer processes new bids as follows: 2 1. W h e n a proxy b i d is submitted, the auctioneer compares it to the current price level, which is the second-highest proxy b i d so far plus a small b i d increment. 3 (a) If the submitted b i d is not greater t h a n the current price level the b i d is dropped a n d no record of the b i d is recorded i n the auction's history. (b) If the submitted b i d is higher t h a n the current price level but lower t h a n the highest proxy b i d so far, then the submitted b i d is accepted, the b i d amount of the currently w i n n i n g b i d is increased to equal the submitted b i d (i.e., this b i d remains the w i n n i n g bid), a n d the submitted b i d loses. (c) If the submitted b i d is higher t h a n the previously w i n n i n g b i d then the price level is increased to equal the previously w i n n i n g b i d a n d the submitted b i d is made the current winner. 2. A t the end of the auction, the item is awarded to the bidder who placed the highest b i d , a n d the final price level is the amount of the second highest bid. 3.2.2 Hidden Bids A c c o r d i n g to our model, some bidders' proxy b i d amounts w i l l be revealed. T h i s includes any b i d that was higher t h a n the current price level at the time it was placed, even if that b i d was immediately o u t b i d b y another proxy b i d . However, other bids w i l l not be observed. T h e r e are two types of hidden bids: 2 A proxy b i d d i n g system asks bidders for their m a x i m u m willingness to pay, and then bids up to this amount on the bidder's behalf. Such systems are common on online auction sites such as eBay; see e.g., h t t p : / / p a g e s . e b a y . c o m / h e l p / b u y / p r o x y - b i d d i n g . h t m l 3 F o r simplicity in this chapter we ignore this small increment and assume that the current price level is the second-highest proxy bid so far. Chapter 3. Bidding Agents for Online Auctions with Hidden Bids F i g u r e 3.1: A n example of the bidding process of a n auction w i t h 7 potential bidders. 1. T h e highest b i d of each auction XM2. D r o p p e d bids Xd that were lower than the current price when they were placed. Let us denote the set of visible bids as x and the set of hidden bids as XhL e t n denote the number of bidders that appears in the bidding history. T h i s means that x will always contain (n — 1) bids, since the winning bidder's proxy b i d is never observed. Since there are m potential bidders in total, (n — 1) bids are visible, and one b i d is the highest b i d Xhi, there are (m — n) dropped bids i n Xdv v Figure 3.1 shows an example of the bidding process for an auction according to our online auction model, and illustrates how bids are dropped. B i d s arrive sequentially from left to right. B i d s 3, 4 and 7 (the grey bars) will be dropped because they are lower t h a n the current bid amount at their respective time steps. T h e amount of the winning bid (6) will not be revealed, although an observer would know that it was at least the amount of b i d 2, w h i c h would be the amount paid by bidder 6. 3.2.3 Discussion O u r model of the bidding process is quite general. Notice that when a bidder observes that the price level is higher than her potential b i d , she may decide not to b i d in this auction. T h i s is equivalent to our model in w h i c h she always submits 33 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids the b i d , because dropped bids do not appear i n the bidding history (Indeed, this is the motivation for our model). A l s o our model covers the case of last-minute bidding, which happens quite often i n eBay auctions [Roth and Ockenfels 2002]: even though last-minute bids m a y be submitted almost simultaneously, e B a y processes the bids i n sequence. Observe that w i t h a proxy bidding system, a n d when agents have independent private values, bidders would not benefit from bidding more t h a n once i n an auction. However, i n practice eBay bidders quite often make multiple bids in one auction. One possible motivation for these bids is a desire to learn more information about the proxy b i d of the current high bidder [Shah et a l . 2003]. However, only the last b i d of the bidder represents her willingness to pay. G i v e n a bidder's last b i d , her earlier bids carry no extra information. Therefore, we w i l l be interested i n only the last b i d from each b i d d e r . W e c a n preprocess the bidding histories by removing a l l bids except the last bids from each bidder, without losing much information. 4 3.3 Learning the Distributions G i v e n the model of the bidding process, the first task of our bidding agent is to estimate the distributions f(x) and g(m) from the bidding history. Suppose we have access to the bidding history of K auctions for identical items. 3.3.1 The Simple Approach A simple approach is to ignore the hidden bids, a n d to directly estimate f(x) and g(m) from observed data. T h e observed number of bidders, n, is used to estimate g(m). T o estimate f(x) we use the observed bids x , which consists of (n — 1) bids for each auction. A n y standard density estimation technique m a y be used. Because it ignores hidden bids, this approach can be expected to produce biased estimates of / and g: v • g(m) will be skewed towards small values because n <m. • f(v) m a y be skewed towards small values because it ignores the winning b i d Xhi, or it m a y be skewed towards large values because it ignores the dropped bids XdA popular variation of this approach (mentioned in Section 3.1) is to d i rectly estimate the distribution of final prices from the selling prices of previous auctions, then to use this distribution to compute a decision-theoretic bidding strategy. T h e problem w i t h this approach is that for E n g l i s h auctions, the selling prices are the second-highest bids. A s we will show i n Section 3.4, to compute a decision-theoretic strategy we really need the distribution of the other bidders' 4 I n a common value model, the earlier bids do carry some information, and we would not be able to simply ignore those bids. 34 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids highest bids. U s i n g the distribution of final prices introduces bias, as this distrib u t i o n is skewed towards small values compared to the distribution of highest bids. 3.3.2 E M Learning Approach W e would like to have a n estimation strategy that accounts for the hidden bids Xh and any bias introduced by their absence. Suppose f(x) belongs to a class of distributions parameterized by 9, f[x\0). Further suppose that g(m) belongs to a class of distributions parameterized by A, g(m\\). F o r example, }(x) could be a N o r m a l distribution parameterized by its mean fx and variance a , whereas g(m) could be a Poisson distribution parameterized by its mean, A. W e want to find the m a x i m u m likelihood estimates of 9 a n d A given the observed data x . Suppose that we could actually observe the hidden bids Xh i n addition to x . T h e n estimating 9 and A from the completed data set (x ,Xh) would be easy. Unfortunately we do not have Xh- G i v e n x , a n d w i t h the knowledge of the bidding process, we could generate Xh if we knew 9 a n d A. Unfortunately we do not know 9 and A. A popular strategy for learning this k i n d of model w i t h missing d a t a is the E x p e c t a t i o n M a x i m i z a t i o n ( E M ) algorithm [Dempster et a l . 1977). E M is a n iterative procedure that alternates between E steps which generate the missing d a t a given current estimates for the parameters and M steps wjiich compute the m a x i m u m likelihood (or m a x i m u m a posteriori) estimates for the parameters based o n the completed data, which consists of the observed data a n d current estimates for the missing data. Formally, the E step computes 2 v v v v (3.1) Q{9) = j\og{p{x ,x \9))p{x \x y° \X^)dx , ld h v h v h a n d the M step solves the optimization 0("«") = a r g m a x ( Q ( 0 ) ) . (3.2) 6 Analogous computations are done to estimate A, the parameter for g(m\X). T h e E M algorithm terminates when A a n d 9 converge. It can be shown that E M w i l l converge to a local m a x i m u m of the observed d a t a likelihood function p{x \9, A). E M is a particularly appropriate algorithm i n our auction setting w i t h hidden bids, because v 1. E M was designed for finding m a x i m u m likelihood ( M L ) estimates for probabilistic models w i t h unobserved latent variables. 2. E M is especially helpful when the observed data likelihood function p(x \9,\) (which we want to maximize) is hard to evaluate, but the M L estimation given complete d a t a (x ,Xh) is relatively easy (because the M step corresponds to m a x i m i z i n g the expected log-likelihood of (x ,Xh))T h i s is exactly the case i n our auction setting. v v v 35 36 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids The E Step T h e integral i n E q u a t i o n (3.1) is generally intractable for our complex b i d d i n g process. However, we can compute a M o n t e C a r l o approximation of this integral by drawing N samples from the distribution p(xh\x ,0^ \ \(° ^), a n d approximating the integral b y a small s u m over the samples (see e.g. [Andrieu et a l . 2003]). A p p l i e d to our model, i n each E step our task is therefore to generate samples from the distribution p(xf \x , 9^ \ \( )). Recall that Xh consists of the highest b i d x^ and the dropped bids Xdold ld v old l old v Given a n d the second highest b i d (which is observed), the highest b i d Xhi can easily be sampled. A c c o r d i n g to the bidding process described i n Section 3.2, the highest b i d was generated by repeatedly drawing from f(x\9^ ^) u n t i l we get a b i d higher t h a n the previously highest (now second-highest) b i d . T h i s is exactly the rejection-sampling procedure for the distribution f(x\9^° ^) truncated at the second highest b i d a n d renormalized. F o r distributions w i t h a simple functional form (e.g. n o r m a l distributions), it m a y be easier to sample directly from the truncated distribution by reversing the C D F (see e.g. [West 1994]). S a m p l i n g the dropped bids Xd is a more difficult task. W e use the following procedure, which is based o n simulating the bidding process: old ld 1. SampleTOfrom g(m\\( ). cldS> 2. If m < n, reject the sample and go back to step 1. 3. Simulate the bidding process using x and m — n dropped bids: v (a) Repeatedly draw a sample b i d from f(x\6^ ^), and compare it to the current price level. If it is lower t h a n the price level, add the b i d to the set of dropped bids Xd- Otherwise, the current price level is increased to the next visible b i d from x . old v (b) If the number of bids i n Xd exceeds m — n, or if we used up a l l the bids i n x before we haveTO— n dropped bids i n Xd, we reject this sample a n d go back to step 1. O n l y when we used u p a l l bids i n x and we haveTO— n bids i n Xd, do we accept the sample of Xdv v 4. Repeat u n t i l we have generated N samples of Xd- The M Step O u r task at each M step is t o compute the m a x i m u m likelihood ( M L ) estimates of A a n d 9 from x a n d the generated samples of Xh- F o r many standard parametric families of distributions, there are analytical solutions for the M L estimates. F o r example, if f(x) is a normal distribution N(p, a), then given the complete set of bids (x ,Xh), the M L estimate of u, is the sample mean, a n d the M L estimate of a is the sample standard deviation. If g(m) is a Poisson distribution, then the M L estimate of the mean parameter A is the mean of the number of bidders per auction. If analytical solutions do not exist we can use v v Chapter 3. Bidding Agents for Online Auctions with Hidden Bids numerical optimization methods such as simulated annealing. If prior distributions on A a n d 6 are available, we m a y instead compute m a x i m u m a posteriori ( M A P ) estimates, which are point estimates of A a n d 9 that m a x i m i z e their posterior probabilities. 3.3.3 Learning Distributions in a Game-Theoretic Setting T h e approach we just described is decision-theoretic because we estimate the distribution of b i d amounts without considering how other agents would react to our behavior. W h a t if we want to take a game-theoretic approach? A t h e y and Haile [2002] discussed estimation i n the game-theoretic setting, however they generally assume that the number of bidders is known (there is a brief discussion of unknown number of bidders, but it is not relevant to our online auction setting). L e t f(v) be the distribution of bidder's valuations (instead of b i d amounts), and let g(m) denote the distribution of number of bidders, as before. G i v e n a bidder's valuation v, her b i d amount i n the game-theoretic setting is given b y the Bayes-Nash equilibrium of the auction game. M a n y (but not all) auction games have symmetric pure strategy equilibria. Assume there is a symmetric pure strategy equilibrium given by the b i d function b(v\f, g). O u r task is to estimate f(v) a n d g(m) given the observed bids. We can use a n E M learning a l g o r i t h m similar to the one i n Section 3.3.2 to estimate f(v\9) and g(m\\), where 9 a n d A are parameters for / and g: • E step: for each auction w i t h observed bids x : v — C o m p u t e observed bidders' valuations v bid function. from x v v by inverting the 5 — Generate bidders w i t h valuations Vh who place hidden bids Xh = b( h\f( \g( ^)T h i s is done by a similar procedure to the one i n Section 3.3.2 that simulates the auction's bidding process. v old old • M step: update 9 a n d A to maximize the likelihood of the valuations (Vv,Vh)- 3.4 Constructing a Bidding Agent In Sections 3.2 and 3.3 we presented a model of the bidding process for a single auction, and proposed methods to estimate the distributions of bids and number of bidders i n an auction. B u t our work is not done yet: how do we make use of these estimated distributions to compute a bidding strategy? 5 If the bidding function does not have a single-valued inverse function, or if the equilibrium has mixed strategies, we just generate bidders with valuation v v the equilibrium. that would likely b i d x v under 37 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids If we only participate i n one E n g l i s h auction, bidding is simple: under the I P V model it is a dominant strategy to b i d up t o our valuation for the item, a n d we do not even need to estimate the distributions. However if we are interested in more complex environments, good estimates of the distributions f(x) a n d g(m) are essential for computing a good bidding strategy. In this section we discuss two such environments: finitely-repeated online auctions and unrepeated online auctions without proxy bidding. W e consider the first problem from a decision-theoretic point of view, a n d take a game-theoretic approach to the second problem. 3.4.1 A Decision-Theoretic Approach to Repeated Auctions In this section we develop a decision-theoretic bidding agent for finitely repeated auctions. W e choose this setting because it is a reasonable model of the decisiontheoretic problem we would face if we wanted to b u y one item from a n online auction site. O u r estimation algorithm can straightforwardly be applied to more complex decision-theoretic models such as infinite horizon models w i t h discount factors and combinatorial valuation models. The Auction Environment Suppose we only want to b u y one item (say a P l a y s t a t i o n 2) i n a n environment (say eBay) where multiple auctions for similar, b u t not necessarily identical, P l a y s t a t i o n 2 systems are held regularly. R e c a l l that we have assumed that utility is quasilinear: thus if we successfully w i n one item, our utility will be equal t o our valuation for the item minus the price we paid. So our bidding agent's task is to compute a bidding strategy that will maximize this utility. Assume that we are only interested i n the first k auctions that w i l l be held after we arrived at the auction site. One motivation for such a restriction is that we prefer to have the item w i t h i n a bounded amount of time. If we fail t o w i n a n item from the k auctions, we lose interest i n the item and leave the auction site, and our utility is 0. (Alternatively, we could say that if we fail t o w i n a n auction then we could b u y the item from a store after leaving the auction site, i n which case we would get some other known and constant amount of utility.) Some of the k auctions m a y overlap i n time, b u t since e B a y auctions have strict closing times, this can be modeled as a sequential decision problem, where our agent makes bidding decisions right before each auction closes. N u m b e r the auctions 1 . . . k according t o their closing times. L e t Vj denote our valuation for the item from auction j. N o t e that this allows the items i n the auctions t o be non-identical. L e t bj denote our agent's b i d for auction j. L e t Uj denote our agent's expected payoff from participating i n auctions j .. .k, assuming we d i d not w i n before auction j. L e t Uk+i be our payoff if we fail t o w i n any of the auctions. F o r simplicity we define Uk+i = 0, though the analysis would be similar for any constant value. Suppose that for each auction j the number of other bidders is drawn from gj{m) a n d each bidder's b i d is drawn from fj(x). 38 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids39 Since each auction j is an E n g l i s h auction, only the highest b i d from other b i d ders affects our payoff. L e t fj (x) and Fj (x) respectively denote the probability density function and cumulative density function ( C D F ) of the highest b i d from other bidders i n the j auction. T h e n th oo F}(x)=J29 (rn)(F (x)) , m J j m=2 where Fj(x) is the C D F of fj(x). N o w Uj c a n be expressed as the following function of the future bids bj-.k = (bj, • • •, b ) and valuations Vj-.k = (VJ, ..., Vk)'k (VJ - x)fj(x)dx + (1 - Fj(bj))Uj i{b i:k,Vj i:k) (3.3) T h e first t e r m i n E q u a t i o n (3.3) is the expected payoff from the j second term is the expected payoff from the later auctions. auction; the + j+ + -oo th The Optimal Strategy Greenwald and B o y a n [2004] and A r o r a et al. [2003] have analyzed similar auction models. Following similar reasoning, we can derive the o p t i m a l b i d d i n g strategy for o u r auction model. L e t b*j. be the o p t i m a l b i d d i n g strategy for auctions j,...., k. L e t Uj(vj-k) denote the expected payoff under o p t i m a l strategy, i.e. Uj(vj-.k) = Uj(bj. ,Vj-k)W e c a n optimize Uj b y working from the k auction to the first one i n a manner similar t o backward i n d u c t i o n . B y solving the first-order conditions of Uj, we obtain the o p t i m a l bidding strategy: k th k (3.4) b*=Vj-U* (v .. ) +l j+1 k In other words, our agent should shade her bids b y the " o p t i o n v a l u e " , i.e. the expected payoff of participating i n future auctions. course the k th T h e exception is of auction; i n this case there are no future auctions and the o p t i m a l b i d is b* = Vk- T h u s we see that the standard result that honest b i d d i n g is k o p t i m a l i n an unrepeated auction is recovered as the special case k = 1. T h e computation of the o p t i m a l bidding strategies requires the c o m p u t a t i o n of the expected payoffs f * = Uj(b*j. ,Vj k), k d i s t r i b u t i o n fj(x) : which involves an integral over the ( E q u a t i o n 3.3). I n general this integral cannot be solved an- alytically, but we can compute its M o n t e C a r l o a p p r o x i m a t i o n i f we can sample from fj(x). If we can sample from fj(x) and gj(m), we can use the following straightforward procedure to generate a sample from 1. draw m from gj(m); 2. draw m samples from fj(x); 3. keep the m a x i m u m of these m samples. fj(x): Chapter 3. Bidding Agents for Online Auctions with Hidden Bids40 T h e bidding strategy b\. computed using E q u a t i o n s (3.4) a n d (3.3) is o p t i m a l , provided that the distributions fj(x) a n d gj(m) are the correct distributions of bids a n d number of bidders for a l l j € 1... k. O f course i n general we do not know the true fj{x) a n d gj(m) a n d the focus of this chapter is to estimate the distributions from the bidding history a n d use the estimated distributions to compute the bidding strategy. A s a result, the computed bidding strategy should be expected to achieve less t h a n the o p t i m a l expected payoff. However, it is reasonable to t h i n k that better estimates of f(x) and g(m) should give bidding strategies w i t h higher expected payoffs. T h i s is confirmed i n our experiments across a wide range of d a t a sets, which we discuss i n Section 3.5. k Auctions that Overlap in Time: Exploiting Early Bidding We observe that while the o p t i m a l b i d i n auction j does not depend o n fj, it does depend on / / for I > j. So far we have been estimating f}[x) using fi(x) and gi(m). In practice, auctions overlap i n time, and we often observe some early bidding a c t i v i t y by other bidders i n auctions j +1,..., k before we have to make a b i d o n auction j. T h i s extra information allows us to make even more informed (posterior) estimates o n fl(x), I > j, based o n fi(x), gi(m) a n d the observed bids for auction /, which leads to a better b i d for auction j. Suppose we have observed n — 1 early bids, denoted by x ; the current highest b i d Xh% is not revealed (but c a n be sampled from f(x) truncated at the current price). Since the auction is not over, there will be some set of future bids xfuture (possibly empty). W h e n the auction closes, the highest b i d from the other bidders w i l l be max{xhi, xfuture}- W e can generate Xf re if we know the number of future bids. We know the t o t a l number of bids m is d r a w n from g(m), and the number of bids made so far is n + \xd\, where xj, are the dropped bids so far, so the number of future bids is m — n — \xd\- N o w we have a procedure that samples from f (x): v utU 1 1. Simulate the auction using our model i n Section 3.2 to generate Xd, the dropped bids so far. 2. Sample the t o t a l number of bids m from g(m). 3. C o m p u t e the number of future bids, m — n — \xd\(a) If this quantity is negative, reject the sample. (b) Otherwise generate 4. Generate 3.4.2 Xhi Xf ture u as ( m — n — \x&\) bids d r a w n from f(x). and take the m a x i m u m of Xf uture and Xhi- A Game-Theoretic Approach to Bidding in Online Auctions without Proxies In Section 3.4.1 we discussed b u i l d i n g a decision-theoretic bidding agent for repeated E n g l i s h auctions. W h a t happens if we t r y to use the game-theoretic Chapter 3. Bidding Agents for Online Auctions with Hidden Bids41 approach on our auction model to build an agent t h a t plays a Bayes-Nash equil i b r i u m of the auction game? T h e difficulty turns out to be identifying the equilibrium. If each bidder other than our agent only participates i n one auction, then the situation is easy. In this case the other bidders' dominant strategies will be to bid t r u t h f u l l y up to the amount of their valuations. A s s u m i n g all bidders follow this dominant strategy as a game-theoretic approach would have us do, we find that the dist r i b u t i o n of bids is the same as the distribution of valuations. T h u s , our agent's strategy should be the same as the decision-theoretic o p t i m a l strategy described in Section 3.4.1. It is easy to verify that this strategy together w i t h the other players' t r u t h f u l bidding forms a B a y e s - N a s h equilibrium. If the other bidders participate i n more t h a n one auction then the equilibrium strategy gets more complex, b o t h strategically and computationally. M i l g r o m and Weber [2000] have derived equilibrium strategies for sequential auctions under some strong simplifying assumptions, such as identical items, a fixed number of bidders, and all bidders entering the auctions at the same time. In a n online auction setting, these assumptions are not reasonable: the items are often heterogenous, the number of bidders is u n k n o w n ex ante, and bidders have different entry times and exit policies. T h e equilibrium strategy in the general case is not known. Therefore, it is an open problem to find a game-theoretic model for repeated online auctions that is realistic and yet still tractable. Online Auctions without Proxies Because of the difficulties described above, we t u r n to a slightly different auct i o n setting in order to show how our distribution learning techniques can be used to b u i l d a game-theoretic bidding agent. Specifically, we will consider an online auction setting which differs in one crucial respect from the setting defined in Section 3.2. Bidders still participate in an online ascending auction against an u n k n o w n number of opponents and still arrive and b i d sequentially; however, now we add the restriction that bidders cannot place proxy bids. Instead, bidders can now place only, a single b i d before they leave the auction forever; the game now takes on a flavor similar to a first-price auction, but w i t h different information disclosure rules. W e chose this game because it has hidden-bid characteristics similar to the online auctions we discussed earlier, but at the same time it also has a k n o w n , computationally t r a c t a b l e — a n d yet n o n - t r i v i a l — B a y e s - N a s h equilibrium. M o r e formally, suppose that there are m potential bidders interested in a single item. T h e number m is d r a w n from a distribution g(m), and is observed by the auctioneer but not the bidders. B i d d e r s have independent private valuations, d r a w n from the distribution f(v). A b i d history records every b i d which is placed along w i t h the b i d amount, and is observable by all bidders. T h e auction has the following rules: - • T h e auctioneer determines a r a n d o m sequential order to use in approaching bidders. 3. Chapter Bidding Agents for Online Auctions with Hidden 42 Bids • E a c h bidder is approached and asked to make a b i d . E a c h bidder is asked once a n d only once. • W h e n asked, a bidder has to either make a b i d which is at least the current highest b i d plus a constant m i n i m u m increment S, or decide not to b i d . 1. If the bidder places a b i d , it is recorded i n the b i d d i n g history. 2. Otherwise, no record is made of the fact that the auctioneer approached the bidder. • T h e auction ends after the last bidder makes her decision. T h e highest submitted b i d wins, a n d the winner pays her b i d amount. T h e winner's u t i l i t y is her valuation minus her payment; other bidders get zero utility. C o m p u t i n g E q u i l i b r i u m Strategies W e now t u r n to the construction of a game-theoretic b i d d i n g agent for the auction described i n Section 3.4.2. W e begin by supposing that the distributions f(v) a n d g(m) are c o m m o n knowledge, which allows us to treat the auction as a Bayesian game and find its Bayes-Nash equilibrium. Suppose there exists a pure-strategy e q u i l i b r i u m of this game. Consider bidder i w i t h valuation v, who is asked to make the next b i d when the b i d ding history is x . Denote b(v\x ) the e q u i l i b r i u m b i d d i n g strategy. Before we compute b(v\x ), let us first eliminate dominated strategies. A s i n classical sealed-bid first-price auctions, it is obvious that we should never b i d higher t h a n our valuation v. L e t b denote the highest submitted b i d so far, thus b + 5 is the m i n i m u m b i d amount. It immediately follows that if tv < b + 5 then we should not b i d . O n the other h a n d , if v > b + 5, then we have non-negative expected payoffs by m a k i n g a b i d (as long as we b i d below v). v v v 0 0 0 0 Define as P the class of strategies w h i c h take the following form: • if v < b + 5, then do not b i d 0 • if v > b + 6, then b i d a n amount i n the interval [b + 5, v] 0 0 Following the above reasoning, a n y strategy not i n P is weakly dominated. T h u s , i n e q u i l i b r i u m a l l bidders w i l l play some strategy i n P. Suppose bidder i's valuation is v > b + S. T h u s she places a b i d b £ [b + S, v]. N o t e that bidder i's expected payoff depends on her valuation, her b i d b, and what future bidders do. Since future bidders play strategies i n P, we know i w i l l be o u t b i d if a n d only if at least one future bidder has valuation greater t h a n or equal to b + 5. In other words, as long as everyone plays a strategy i n P, a bidder's expected payoff depends o n future bidders' valuations b u t not their exact b i d d i n g functions. Denote by mf the number of future bidders, a n d denote by p — Pr(raj — Q\x ) the probability that there w i l l be no future bidders. L e t F (v) be the C D F of the (posterior) d i s t r i b u t i o n of the highest future b i d given x , conditioned o n having at least one future bidder. In other words, 0 0 0 x v v F^v) = E [F(v) f} m mf]mf>0 (3.5) Chapter 3. Bidding Agents for Online Auctions with Hidden Bids where F(v) is the C D F of the value distribution. F a n d p depend o n the posterior distribution of the number of future bidders, Pr(nif\x ). If we can generate samples ofTO/,then F and p can be straightforwardly approximated. Note that the set of bidders consists of past bidders (observed and hidden), the current bidder, a n d future bidders. T h u s we can generate m / using essentially the same procedure as described i n Section 3.4.1, by drawing m from g(m), simulating the auction so far to get the number of already dropped bids \xd\, then letting TO/ = m — la;,,! — \xd\ — 1. N o t e that we do not need to know the exact equilibrium strategy to generate the hidden bidders; the knowledge that the equilibrium strategy is i n P is enough to determine whether a bidder should b i d or not. l 0 v 1 0 N o w we can write the expected payoff U(b, v) as: U(b, v) = (1 - p )F (b + 6)(v-b) + (v - b) l 0 Po (3.6) Since v is fixed for each bidder, U(b, v) is then a function of b. W e can then use any standard numerical technique for function m a x i m i z a t i o n i n order to find the o p t i m a l b* £ [6 + 6, v] that maximizes U. 0 Theorem 3. The following strategy (played by all bidders) is a Bayes-Nash equilibrium of our auction setting: • if v < b + 6, do not bid. 0 • ifv>b 0 + 6, bid b* = a r g m a x b 6 [ 6 o + ^ ] U(b,v). T h e proof is straightforward: first observe that this strategy is i n P. W e have showed above that if everyone else is playing a strategy i n P, then b* as defined w i l l m a x i m i z e the bidder's utility. It follows that if everyone is playing the above strategy, then each bidder is playing a best response to the other bidders' strategies and so we have a n equilibrium. Learning the Distributions A t the beginning of Section 3.4.2 we assumed that the distributions f(v) a n d g(m) were commonly-known. N o w we relax this assumption. W e assume that we have access to the bidding histories of K past auctions, a n d consider the problem of learning the distributions. A s i n o u r online E n g l i s h auctions, some information is hidden from the bidding history: the number a n d valuations of the bidders w h o decided not to b i d . A s discussed i n Section 3.3.1, the simple approach of ignoring these hidden bidders could be expected to introduce bias to the estimates of / a n d g. Furthermore, the observed bids are not equal to the bidders' valuations; as shown above the equilibrium b i d is always lower than the valuation. T o correctly account for the hidden information, we use our E M algorithm for game-theoretic settings, as described i n Section 3.3.3. T o implement the E M algorithm, an important subproblem is reversing the equilibrium bidding strategy. Formally, given a complete bidding history x of a n auction a n d the v 43 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids 44 current estimates f{v\9) and g(m\\), compute the corresponding valuations v such that bidders w i t h valuations v playing the equilibrium strategy would have placed the bids x . v v v Consider each b i d 6; € x . Denote by b the previous observed b i d . F r o m Section 3.4.2 we know that bi = argmaxbg^+^t,] U(b,v). There are two cases: v 0 1. If bi > b + 5, then 6j must be a local m a x i m u m in [b + 5, 0 0 the first-order condition b v] satisfying = 0. Solving for v, we get Q '^ dU v b F (b +5)+p /(l- ) l i v = b i 0 Po fWTI) + . ( 3 ' 7 ) where f is the derivative of F . W e can numerically approximate F , f and p by generating samples of m / i n the same way as i n Section 3.4.2. l 1 1 1 0 2. Otherwise, bi = b„ + 5. Intuitively, this means that the bidder's valuation is high enough to make a b i d (i.e. v > b + 6), but not high enough to fall into Case 1. A n y bidder w i t h a valuation i n the interval 0 . c , + d, , . r . F (b 1 b + 6 H 0 0 + 6)+p /(l-p ) 0 f (b l 0 0 + 5) would b i d i n this way. T h u s we generate samples of v by drawing from f(v\9) truncated to the above interval. Once the E M a l g o r i t h m has estimated the distributions / and g, our agent can use these distributions to compute the Bayes-Nash equilibrium strategy as described in Section 3.4.2. 3.5 Experiments W e evaluated b o t h our E M learning approach and the simple a p p r o a c h on several synthetic d a t a sets and on real world d a t a collected from eBay. W e compared the approaches i n three ways: 6 1. W h i c h approach gives better estimates of the distributions / ( C E ) , g(m) and f (x)? T h i s is important because better estimation of these distributions should give better results, regardless of whether agents take a decisiontheoretic approach or a game-theoretic approach to bidding. We measure the closeness of a n estimated distribution to the true distribution using the K u l l b a c k - L e i b l e r ( K L ) Divergence from the true distribution to the estimated distribution. T h e smaller the K L Divergence, the closer the estimated distribution to the true one. l We also looked at the variant of the simple approach that directly estimates the distribution f (x) using the selling prices. Our results show that this approach consistently underestimates f (x) as expected, and its performance is much worse than both the E M approach and the simple approach. For clarity, detailed results on this approach are not shown. 6 1 1 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids 2. F o r repeated online auction data, which approach gives better expected payoff under the decision-theoretic bidding model, as described i n Section 3.4.1? 3. For d a t a from online auctions without proxies, which approach gives closer estimates to the Bayes-Nash equilibrium under the game-theoretic bidding model (i.e., computes e-equilibria for smaller values of e), as described i n Section 3.4.2? O u r experiments show that the E M learning approach outperforms the simple approach i n all three ways, across a wide range of d a t a sets. 3.5.1 Repeated Online Auctions In this section we consider repeated online auctions, and thus attempt to answer the first two questions above. W e present results from four d a t a sets: • D a t a Set 1 has auctions of identical items, and we know the family of distributions that f(x) a n d g(m) belong to. • D a t a Set 2 has auctions of non-identical items, but we know the b i d dist r i b u t i o n /(x) is influenced linearly by a n attribute a. • D a t a Set 3 has auctions of identical items, but we do not know what k i n d of distributions f(x) a n d g(m) are. W e use nonparametric estimation techniques to estimate the distributions. • D a t a Set 4 is real-world auction d a t a on identical items, collected from eBay. S y n t h e t i c D a t a Set 1: Identical Items In this d a t a set, the items for sale in all auctions are identical, so the number of bidders and b i d amounts come from stationary distributions g(m) a n d f(x). f(x) is a N o r m a l distribution N(4, 3.5). g(m) is a Poisson distribution shifted to the right: g(m — 2) = P ( 4 0 ) , i.e. the number of bidders is always at least 2. T h e bidding history is generated using our model of the bidding process as described i n Section 3.2. E a c h instance of the d a t a set consists of bidding history from 40 auctions. W e generated 15 instances of the d a t a set. E s t i m a t i n g the D i s t r i b u t i o n s B o t h estimation approaches are informed of the parametric families from which f(x) and g(m) are drawn; their task is to estimate the parameters of the distributions, (/x, cr) for /(x) a n d A for g(m). A t the M step of the E M algorithm, standard M L estimates for /x, a , a n d A are computed, i.e. sample mean of the b i d amounts for /x, standard deviation of the b i d amounts for cr, and the mean of the number of bidders minus 2 (due to the shifting) for the Poisson parameter A. 45 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids O u r results show that the E M approach outperforms the simple approach i n the quality of its estimates for the distributions f(x), g(m) and f (x). F i g u r e 3.2 shows typical estimated d i s t r i b u t i o n s a n d the true distributions. W e observe that the plot of the estimated f(x) by the simple approach is significantly shifted to the right of the true distribution, i.e. the simple approach overestimated f(x). W e have also calculated K L Divergences from the true distributions to the estimated distributions, a n d the E M estimations have consistently lower K L Divergences. T h i s difference was verified to be significant, using the nonparametric W i l c o x o n sign-rank test. l 7 Bidding in Repeated Auctions Estimates from b o t h approaches were used to compute bidding strategies for a n auction environment w i t h 8 sequentially held auctions of the same k i n d of items, using the decision-theoretic model introduced i n Section 3.4.1. T h e agent's " a c t u a l " expected payoffs U\(b, v) under these bidding strategies were then computed, using the true distributions. T h e o p t i m a l bidding strategy and its expected payoff were also computed. O u r results show that the E M approach gives rise to bidding strategies closer to the o p t i m a l strategy, and achieves higher expected payoffs, as compared to the simple approach. F i g u r e 3.3 shows a plot of the bidding strategies i n the first auction, a n d a box plot of the mean regrets, which is the differences between o p t i m a l expected payoffs and actual expected payoffs. Formally, let b* denote the optimal strategy and b the strategy computed using estimated distributions, then the regret given our agent's valuation v is R(v) = Ui(b*,v)-Ui(b,v) and the mean regret is the expected value of R(v) over the distribution of v, which we set to be the same as the other bidders' b i d distribution / : F r o m the box plot we observe that the mean regret of the E M approach is much smaller t h a n that of the simple approach. T h e ratio between the mean regrets of the E M and simple approaches is 1 : 56. We also used the estimated distributions on the decision-theoretic model w i t h overlapping auctions, as described i n Section 3.4.1. W e again tested b o t h approaches on 8 sequentially held auctions, b u t now some early bidding activity on each auction was generated. These results are shown i n F i g u r e 3.4. A g a i n we see that the E M approach achieves higher expected payoffs (and thus less regret) compared to the simple approach. T h e E M approach seemed to benefit more f r o m the extra information of early bids t h a n the simple approach: the ratio between the mean regrets of the E M and simple approaches increased to 1 : 390. 7 T h e distributions shown were randomly chosen from the 15 instances of the d a t a set. W e have verified that the plots of the other distributions are qualitatively similar. 46 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Distribution of the Highest Bid Figure 3.2: Results for Data Set 1, Distribution Estimation: distribution of bids f(x) (left); distribution of highest bids f (x) (right). l Bidding Strategies in the First Auction "2 4 6 8 10 12 14 16 18 Payoff Regrets 20 simple EM Bidder Valuation Figure 3.3: Results for Data Set 1, Bidding: bidding strategies in the first auction (left); box plot of payoff regrets of the two approaches (right). Synthetic Data Set 2: Non-identical Items In our second data set, the items on sale are not identical; instead the distribution of valuations are influenced by an observable attribute a. In this data set the dependence is linear: f(x\a) = N(l.la + 1.0,3.5). g{m) is a Poisson distribution as before: g(m — 2) = P(35). For each auction, a is sampled uniformly from the interval [3,9]. In other words, this data set is similar to Data Set 1, except that the bid distribution f(x) is drawn from a different parametric family. Both approaches now use linear regression to estimate the linear coefficients. Estimating the Distributions Again, our results show that the EM approach outperforms the simple approach for this data set, in terms of its estimates for f(x) and g(m). Figure 3.5 (left) shows the estimated linear relation between the mean of f(x\a) and a. From thefigurewe can see that the EM approach gives a much better estimate to the linear function. The simple approach again significantly overestimates the bid amounts. In fact the simple 47 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Overlapping Auctions: Payoff Regrets Figure 3.4: B o x plot of expected payoff regrets for overlapping auctions The mean of f(x|a) versus Payoff Regrets Figure 3.5: Results for D a t a Set 2: Linear relationship between the m e a n of f(x\a) and a (left). B o x plot of payoff regrets (right). approach has consistently overestimated f(x) tested. for all the synthetic d a t a sets we Bidding in Repeated Auctions We then used the estimated distributions to compute a decision-theoretic agent's bidding strategies a n d expected payoffs of an auction environment w i t h 8 sequential auctions, where the attribute a of each item is observed. T h e E M approach also gives better expected payoff, the statistical significance of which is confirmed by W i l c o x o n ' s sign-rank test. Figure 3.5 (right) shows a box plot of regrets from different instances of d a t a sets, which shows that the E M approach achieved consistently higher payoffs. Synthetic Data Set 3: Unknown Distributions We go back to the identical items model w i t h stationary distributions f(x) a n d g(m). F o r this d a t a set, f(x) is a G a m m a distribution w i t h shape parameter 2 and scale parameter 3. g(m) is a mixture of two Poisson distributions: P ( 4 ) w i t h probability 0.6 a n d P ( 6 0 ) w i t h probability 0.4. B u t now the estimation 48 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids approaches does not know the types of the true distributions. Instead, b o t h use kernel density estimation (kernel smoothing), a nonparametric estimation strategy. Essentially, given N samples from a distribution p(x), we estimate p(x) by a mixture of N kernel functions centered at the N samples. E s t i m a t i n g the D i s t r i b u t i o n s A Gaussian kernel is used for estimating f(x) and a uniform kernel is used for estimating g(m). A t each M step of the E M algorithm, the b a n d w i d t h parameters of the two kernel estimations need to be selected. W e use the simple "rule of t h u m b " strategy [Silverman 1986] for b a n d w i d t h selection. W e used the same kernel estimation and b a n d w i d t h selection technique for the simple approach. O u r results show that the E M approach gives better estimates t h a n the simple approach. F i g u r e 3.6 shows typical estimated distributions and true distributions. F r o m the figure we can observe that the E M estimates of f(x), g(m) and f (x) are much closer to the true distributions that the simple estimates. T h e E M estimates have significantly smaller K L Divergences compared to the simple estimates, verified by W i l c o x o n ' s sign-rank test. 1 B i d d i n g i n R e p e a t e d A u c t i o n s W e then computed the expected payoffs under the decision-theoretic model w i t h 8 sequential auctions. T h e expected payoffs of the E M approach were not significantly better t h a n that of the simple approach, as shown by the b o x plot i n F i g u r e 3.6. O n e possible explanation is that although K L divergence is a good measure of similarity of distributions in general, under this particular sequential auction decision-theoretic model K L divergence might not be the best measure of quality. T h e bidding strategy as defined i n (3.4) a n d (3.3) is o p t i m a l if the distributions / and g are the true underlying distributions. U s i n g o u r estimated distributions, the resulting b i d ding strategy may be higher or lower t h a n the o p t i m a l bids. However from our experience i n these experiments, bidding too high is more costly t h a n bidding too low. T h i s is not taken into account i n the K L divergence c o m p u t a t i o n as well as o u r M L estimation procedure. T h i s suggests that a possible future research direction is to identify a loss function suited for the sequential auction bidding model, a n d compute estimated distributions by m i n i m i z i n g that loss function. A n o t h e r possible approach is to compute a posterior d i s t r i b u t i o n i n stead of point estimates of the parameters i n each iteration. F o r example, West [1994] used G i b b s sampling techniques to compute the posterior distribution of parameters i n a relatively simple model of incomplete d a t a . B i d d i n g strategies computed using the distribution of parameters should be free of the problem mentioned above. However, this k i n d of approach would be more c o m p u t a t i o n ally expensive. e B a y D a t a o n Sony P l a y s t a t i o n 2 Systems O u r experiments o n synthetic d a t a sets showed that our E M approach gave good estimates of the true distributions i n several different settings. However, the synthetic d a t a sets are generated using our model for the b i d d i n g process. 49 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Figure 3.6: Results for Data Set 3: Distribution f(x) (top-left). Distribution g{m) (top-right). Distribution f (x) (bottom-left). Box plot of payoff regrets (bottom-right). l Thus, the above experiments do not tell us whether our model for the bidding process is an accurate description of what happens in real world online auctions. To answer this question, we wanted to test our approach on real world bid data. On eBay, the bidding histories of completed auctions are available for 30 days. Unfortunately, information on the hidden bids, especially the proxy bids of the winners of the auctions, is not publicly available. So unlike in the synthetic data experiments, we cannot compare our estimated distributions with the "true" distributions. To get around this problem, we used the following approach. First we collected bidding histories from a set of eBay auctions. Next we pretended that those highest bids were not placed, and the previously second highest bids were the highest bids. We then "hid" these new highest bids of each auction. This gave us a "ground truth" for the hidden bids which allowed us to evaluate our 8 8 O f course we could have used our techniques to generate values for the missing bids; however, this would have been unfair when the goal was to test these techniques! 50 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids different approaches. It is true that this approach of hiding bids changed the underlying distribution of bids; however, this d i d not worry us as our goal was not to learn the true distribution of bids i n our eBay auctions. Instead, our goal was to evaluate our algorithms' performance on a realistic, non-synthetic d a t a set. W e believe that despite the removal of the high b i d our d a t a set preserves qualitative characteristics of the original bidding history d a t a . If our model of the bidding process is correct, then o u r E M approach should be able to correctly account for the hidden bids i n this d a t a set and produce good estimates of/ ^). 1 W e collected bidding history of eBay auctions on b r a n d new Sony P l a y s t a t i o n 2 (Slim Model) consoles, over the m o n t h of M a r c h 2005. W e chose to study these auctions because they had been previously studied by Shah et al. [2003], who argued that bidders' valuations on PlayStations tend to be close to the private value model. W e considered only auctions that lasted one day and had at least 3 bidders. W e found 60 auctions that satisfied these requirements. W e then r a n d o m l y divided o u r d a t a into a training set and a testing set. 9 E s t i m a t i n g the D i s t r i b u t i o n s We tested four learning approaches: the E M and simple approaches that estimate a N o r m a l distribution for f(x) and a Poisson distribution for g(m), and the E M and simple approaches t h a t use kernel density estimation to estimate f(x) a n d g(m). O f course we d i d not have a ground t r u t h for these distributions, so it was not possible to compare the accuracy of the predictions. However, we could use b o t h approaches' estimates to estimate f (x) based o n the training set, and compare these estimates against the highest bids from the test set. W e d i d 8 runs of this experiment w i t h different r a n d o m partitions of training set a n d testing set, a n d aggregated the results. T h e K L Divergences of f (x) of the approaches were similar, and no one approach was significantly better t h a n the others. l 1 B i d d i n g i n R e p e a t e d A u c t i o n s W e then computed the expected payoffs under the decision-theoretic model. T h e E M approaches achieved significantly higher payoffs t h a n the simple approaches, as shown i n F i g u r e 3.7. T h e a p proaches using parametric models achieved similar payoffs to the corresponding approaches w i t h kernels. T h e good performance of the parametric estimation E M approach for the eBay d a t a set indicates that the N o r m a l and Poisson m o d els for f(x) and g{m) may be adequate models for modeling bidding on eBay. T h e E M approaches d i d not have better K L divergence t h a n the simple approaches, but nevertheless outperformed the simple approaches i n the repeated auction experiments. T h i s is similar to our situation i n D a t a Set 3. O u r experiments have shown that there is a positive correlation between better K L divergence and better performance i n repeated auctions, b u t that this correlat i o n is not perfect. 'We needed at least 3 bidders so that we could drop one and still have one observed bid. 51 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Payoff Regrets simple simple kernel Figure 3.7: Box plot of payoff regrets on the eBay Data Set 3.5.2 Online Auctions without Proxies We built a data set modeling online auctions without proxies in order to compare the EM approach against the simple approach in the game-theoretic setting described in Section 3.4.2. Thus, in this section we consider questions 1 and 3 from the list at the beginning of Section 3.5. Similar to Section 3 . 5 . 1 , we use a normal distribution J V ( 5 . 0 , 2 . 0 ) for f(v) and a shifted Poisson distribution (with A = 10) for g(m). Each instance of the data set consists of bidding histories from 30 auctions. As might be expected from the results in the previous section, we found that the EM approach gave much better estimates of f(v) and g(m) than the simple approach. Figure 3.8 shows typical estimated distributions and the true distributions. E s t i m a t i n g the D i s t r i b u t i o n s We then compared the gametheoretic bidding agents that would have been built using the distributions estimated by the two learning approaches. Unlike in Section 3 . 5 . 1 , we cannot simply use expected payoff as a measure of performance, since in a game-theoretic setting our payoff depends other agents' strategies; for example, a bad approximation to an equilibrium could actually lead to increased payoffs for all agents. (Consider e.g., the prisoner's dilemma.) Instead, what we can measure is the amount that each player could gain by unilaterally deviating from the current strategy at the "equilibrium" strategy profile computed using each learning approach. One way of thinking of this value relates to the concept of e-equilibrium. (Recall that a strategy profile is an e-equilibrium if each agent can gain at most e by unilaterally deviating from the strategy profile.) Every strategy profile is E s t i m a t i n g the B a y e s - N a s h E q u i l i b r i u m 52 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Distribution ol Valuations Distribution ot Number ot Bidders Figure 3.8: Results for Online Auctions without Proxies: the value distributions f(v) (left); the distributions of number of bidders g(rn) (right). an e-equilibrium for some e; what we compute is the smallest e for which the learned strategies form an e-equilibrium. For our auction setting, we observe (from Theorem 3) that no matter what distributions we use for / and g, our agents will play a strategy in P. As a result, the true equilibrium strategy (computed from Theorem 3 using the true distributions) is always a best response to the strategies our agents play. Given our agent's valuation v and observed bidding history so far x , we can compute the difference between the expected payoff of the best response and the expected payoff of our agent's strategy. The e for the game is then the expectation of this difference over v, m, and bidding history x . We approximate this expectation by sampling v, m, x and taking the mean of the payoff differences. We generated bidding histories for 20 auctions using each approach, and for each bidder in these auctions we computed expected payoff differences for 50 valuations v drawn from the true f(v). Our results show that strategy profiles computed using the EM approach are epsi/on-equilibria for much smaller values of e than the strategy profiles computed using the simple approach. This means that the EM bidding agent's strategy is much closer to the true equilibrium. We computed e for 15 instances of the training data set, and Figure 3.9 gives a box plot of the resulting e's. v v v 3.6 Related Work Our EM approach is similar in spirit to an approach by Boutilier et al. [1999]. This work concerns a decision-theoretic MDP approach to bidding in sequential first-price auctions for complementary goods. For the case when these sequential auctions are repeated, this paper discusses learning a distribution of other agents' highest bids for each good, based on the winning bids in past auctions. If the agent's own bid wins in an auction, the highest bid by the other agents is hidden because only the winning bid is revealed. To overcome this problem of hidden bids, the paper uses an EM approach to learn the distribution of the 53 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids Figure 3.9: Results for Online Auctions without Proxies: box plot of epsilonequilibria using the simple and E M approaches highest bid. In our E n g l i s h auction setting the highest bids are always hidden, and thus cannot be directly estimated. Instead we have to estimate the distributions of bids f(x) and of number of bidders g(m)—which requires us to take into account bids other t h a n the highest—and then compute the distribution of highest bids using f(x) and g(m). A s a result the learning task i n our d o m a i n is more complicated t h a n the problem considered by Boutilier et a l . [1999]. Several other papers have tried to solve the hidden b i d problem in various auction settings. Rogers et al. [2005] studied E n g l i s h auctions w i t h discrete b i d levels. T h e y discussed estimating the distributions of valuations and numbers of bidders (/ and g) from data, then using the estimated distributions to compute a n o p t i m a l set of b i d levels (including the reserve price) for the auctioneer. T h e i r learning approach is to look at only the final prices of the auctions, and then to use Bayesian inference to compute posterior distributions of the parameters of / and g given the final prices of previous auctions. W e note that this approach ignores all the earlier bids, which carry information about / and g, while our approach uses all bids observed. Furthermore, Rogers et al.'s [2005] approach works only for parametric distributions. If the true underlying distributions are not in the chosen parametric family of distributions, parametric estimation tends to give poor results. In Section 3.5.1 we showed that our approach can use nonparametric estimation techniques (kernel density estimation). Finally, the method of [Rogers et a l . 2005] needs to compute the joint distribution of multiple parameters, which takes exponential time in the number of parameters. O u r E M approach only tries to compute M L / M A P estimates of the parameters, so each iteration of E M scales roughly linearly w i t h the number of parameters. A n o t h e r relevant paper is [Song 2004]. L i k e us, Song studies the estimation problem in online E n g l i s h auctions in eBay-like environments. However she used a different approach, based on the theoretical result that the second- and thirdhighest valuations uniquely identify the underlying value distribution. However, applying this fact to the eBay model presents a problem: while the highest 54 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids observed b i d corresponds to the second-highest valuations, the second-highest observed b i d is not necessarily the third-highest valuation. T h i s is because the first- or second- highest bidders may have submitted bids higher t h a n the t h i r d highest value before the third-highest valued bidder had a chance to submit her final b i d , which means her bid can be hidden from the bidding history. T h u s the second-highest observed bid is sometimes less than the third-highest valuation, and ignoring this fact would introduce bias to the estimation. Song recognizes this problem, and her solution is to use d a t a from auctions in which the first- or second-highest bidder submitted bids higher t h a n the second-highest observed b i d late in the auction, because these auctions have relatively higher probability that their second-highest observed bids are third-highest valuations. However, this approach merely reduces expected bias rather than avoiding it entirely. Indeed, we note that there is empirical evidence [Roth and Ockenfels 2002] that bidding activity late in auctions has much higher density t h a n earlier b i d d i n g activity. T h i s suggests that even for the selected auctions of Song's approach, there may be significant probability that second-highest observed bids are less t h a n third-highest valuations. O u r approach models the hidden bidders, so does not suffer from the bias introduced in her approach, and does not need to drop any auctions from consideration. .Finally, Song [2004] d i d not discuss how to estimate the distribution of the number of bidders, which is essential for computing o p t i m a l bidding strategies in our repeated auctions setting (see Section 3.4). Haile and Tamer [2003] analyzed a different form of the hidden b i d problem: a bidder may have submitted an earlier b i d below her valuation, then the current price rises above her valuation, so she does not b i d again. A s a result, bidders' final observed bids may be below their valuations. Haile and Tamer [2003] proposed a method to compute bounds on the value distributions given the observed bids. In our model, bidders' final observed bids coincide w i t h their willingness to pay, so this issue is not addressed in our current model. O n the other h a n d , Haile and Tamer's [2003] approach was intended for physical (as opposed to online) auctions, where there are no fixed closing times, and the number of potential bidders are assumed to be known. T h u s their approach cannot be directly applied to online auctions where the number of bidders are inherently unknown. A n interesting future research direction is to combine our approach that deal w i t h hidden bidders w i t h Haile and Tamer [2003] 's technique for bounding the valuation distribution. 3.7 Conclusion In this chapter we have described techniques for building bidding agents in online auction settings that include hidden bids. In particular, we have addressed the issue of estimating the distributions of the number of bidders and b i d amounts f r o m incomplete auction data. We proposed a learning approach based on the E M algorithm that takes into account the missing bids by iteratively generating missing bids and doing m a x i m u m likelihood estimates on the completed set of 55 Chapter 3. Bidding Agents for Online Auctions with Hidden Bids bids. We applied our approach to both decision-theoretic and game-theoretic settings, and conducted experiments on both synthetic data as well as on eBay data. Our results show that our approach never did worse and often did much better than the the straightforward approach of ignoring the missing data, both in terms of the quality of the estimates and in terms of expected payoffs under a decision theoretic bidding model. 56 57 Bibliography A l t m a n , A l o n a n d Moshe Tennenholtz (2005). R a n k i n g systems: T h e P a g e R ank axioms. ACM Conference on Electronic Commerce. A n d r i e u , C , N . de Freitas, A . Doucet a n d M . I . J o r d a n (2003). A n introduction to M C M C for machine learning. Machine Learning. Anthony, P., W . H a l l , V . D . D a n g a n d N . Jennings (2001). A u t o n o m o u s agents for participating i n multiple online auctions. IJCAI Workshop on EBusiness and the Intelligent Web. A r o r a , A . , H . X u , R. P a d m a n and W . Vogt (2003). quential online auctions. W o r k i n g P a p e r . O p t i m a l bidding i n se- Athey, S. a n d P. Haile (2002). Identification i n standard auction models. Econometrica, 70(6), 2107-2140. B h a t , N . a n d K . L e y t o n - B r o w n (2004). C o m p u t i n g N a s h equilibria of actiongraph games. UAL Blum, B., C . Shelton and D. R o l l e r (2002). http: / / d a g s . s t a n f o r d . e d u / G a m e s / g a m e t r a c e r . h t m l . Gametracer. Boutilier, C , M . G o l d s z m i d t and B . S a b a t a (1999). Sequential auctions for the allocation of resources w i t h complementarities. IJCAI. B o w l i n g , M . (2004). Convergence a n d no-regret i n multiagent learning. NIPS. B o w l i n g , M . and M . Veloso (2001). Convergence of gradient dynamics w i t h a variable learning rate. ICML. B y d e , A (2002). A comparison among bidding algorithms for multiple auctions. Agent-Mediated Electronic Commerce IV. C a i , G . and P . R . W u r m a n (2003). Monte Carlo approximation in incompleteinformation, sequential-auction games (Technical R e p o r t ) . N o r t h C a r o l i n a State University. C h e n , X . a n d X . Deng (2005). Settling the complexity of 2-player Nash equilibrium (Technical R e p o r t T R 0 5 - 1 5 0 ) . E C C C . Conitzer, V . a n d T . S a n d h o l m (2003). C o m p l e x i t y results about N a s h equilibria. IJCAI. Bibliography 58 Daskalakis, C , P. W . G o l d b e r g and C . H . P a p a d i m i t r i o u (2005). The complexity of computing a Nash equilibrium (Technical R e p o r t T R 0 5 - 1 1 5 ) . E C C C . Dempster, A . , N . L a i r d a n d D. R u b i n (1977). M a x i m u m likelihood from i n complete d a t a v i a the E M a l g o r i t h m . 39(1), Journal of the Royal Statistical Society, 1-38. G o l d b e r g , P . W . and C . H . P a p a d i m i t r i o u (2005). Reducibility rium problems (Technical R e p o r t T R 0 5 - 0 9 0 ) . E C C C . among equilib- Golle, P h i l i p p e , K e v i n L e y t o n - B r o w n , Ilya M i r o n o v a n d M a r k L i l l i b r i d g e (2001). Incentives for sharing i n peer-to-peer networks. International Workshop on Electronic Commerce (WELCOM). G o v i n d a n , S. and R. W i l s o n (2003). A global N e w t o n method to compute N a s h equilibria. Journal Theory. of Economic G o v i n d a n , S. a n d R. W i l s o n (2004). C o m p u t i n g N a s h equilibria by iterated polymatrix approximation. Journal of Economic Dynamics and Control, 28, 1229-1241. Greenwald, A . a n d J . B o y a n (2004). B i d d i n g under uncertainty: T h e o r y a n d experiments. UAL Haile, P h i l i p A . a n d E l i e T a m e r (2003). Inferences w i t h a n incomplete model of E n g l i s h auctions. Journal of Political Economy, 111(1), 1-51. K e a r n s , M . J . , M . L . L i t t m a n and S . P . Singh (2001). G r a p h i c a l models for game theory. UAL K l e m p e r e r , P. (2000). A u c t i o n theory: A guide to the literature. In P. K l e m perer (Ed.), The economic theory of auctions. E d w a r d E l g a r . K o l l e r , D., N . M e g i d d o a n d B . v o n Stengel (1994). Fast algorithms for finding randomized strategies i n game trees. STOC. K o l l e r , D. and B . M i l c h (2001). Multi-agent influence diagrams for representing and solving games. IJCAI. L a M u r a , P. (2000). G a m e networks. UAL L e y t o n - B r o w n , K . and M . Tennenholtz (2003). Local-effect games. IJCAI. M a c k i e - M a s o n , J . K . , A . Osepayshvili, D . M . Reeves a n d M . P . W e l l m a n (2004). P r i c e prediction strategies for market-based scheduling. Fourteenth International Conference on Automated Planning and Scheduling (pp. 244-252). M i l g r o m , P. a n d R. Weber (2000). A theory of auctions a n d competitive bidding, II. In P. K l e m p e r e r ( E d . ) , The economic theory of auctions. E d w a r d Elgar. Bibliography 59 Osepayshvili, A . , M . P . W e l l m a n , D . M . Reeves and J . K . M a c k i e - M a s o n (2005). Self-confirming price prediction for simultaneous ascending auctions. UAL P a p a d i m i t r i o u , C H . (2005). games. STOC. C o m p u t i n g correlated equilibria i n multiplayer P o r t e r , R., E . N u d e l m a n and Y . S h o h a m (2004). Simple search methods for finding a N a s h equilibrium. AAAI (pp. 664-669). Powers, R. a n d Y . S h o h a m (2004). N e w criteria a n d a new algorithm for learning i n multiagent systems. NIPS. Regev, O . a n d N . N i s a n (1998). T h e popcorn market: c o m p u t a t i o n a l resources. International putation Conference O n l i n e markets for on Information and Com- Economies. Rogers, A . , E . D a v i d , J . Schiff, S. K r a u s and N . R. Jennings (2005). L e a r n ing environmental parameters for the design of o p t i m a l english auctions w i t h discrete b i d levels. International Workshop on Agent-Mediated E-Commerce. Utrecht, Netherlands. Rosenthal, R . W . (1973). A class of games possessing pure-strategy N a s h equilibria. Int. J. Game Theory, 2, 65-67. R o t h , A . E . a n d A . Ockenfels (2002). L a s t - m i n u t e bidding and the rules for ending second-price auctions: Evidence from e B a y and A m a z o n auctions o n the internet. American Economic Review. Roughgarden, T . and E . Tardos (2004). B o u n d i n g the inefficiency of equilibria i n nonatomic congestion games. Games and Economic Behavior, ^7(2), 3 8 9 403. S h a h , H . S . , N . R . J o s h i , A . S u r e k a a n d P R . W u r m a n (2003). M i n i n g for bidding strategies o n eBay. Lecture Notes on Artificial Silverman, B . W . (1986). Density estimation. Intelligence. L o n d o n : C h a p m a n and H a l l . Song, U n j y (2004). N o n p a r a m e t r i c estimation of a n e B a y auction model w i t h an u n k n o w n number of bidders. University of B r i t i s h C o l u m b i a . Stone, P., R . E . Schapire, J . A . C s i r i k , M . L . L i t t m a n and D . M c A l l e s t e r (2002). Attac-2001: A learning, autonomous bidding agent. Agent-Mediated Electronic Commerce IV (pp. 143-160). van der L a a n , G . , A . J . J . T a l m a n and L . van der H e y d e n (1987). S i m p l i c i a l variable dimension algorithms for solving the nonlinear complementarity problem o n a product of unit simplices using a general labelling. Mathematics of OR, 12(3), 377-397. Bibliography 60 Waldspurger, C . A . , T . H o g g , B . A . H u b e r m a n , J . 0 . K e p h a r t and W . S. Stornetta (1992). Spawn: A distributed computational economy. IEEE Transactions on Software Engineering. Weber, R . (1983). M u l t i - o b j e c t auctions. In R . Engelbercht-Wiggans, M . Shubik a n d R. S t a r k ( E d s . ) , Auctions, bidding, and contracting: Uses and theory, 165-191. N e w Y o r k University Press. W e l l m a n , M . P . , A . Greenwald, P. Stone a n d P . R . W u r m a n (2002). T h e 2001 T r a d i n g Agent C o m p e t i t i o n . IAAI. West, M . (1994). Discovery sampling and selection models. In J . O . Berger and S.S. G u p t a (Eds.), Decision theory and related topics IV. N e w Y o r k : Springer Verlag. Zhang, N . L . and D . Poole (1996). E x p l o i t i n g causal independence i n bayesian network inference. JAIR, 5, 301-328.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Computational problems in multiagent systems
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Computational problems in multiagent systems Jiang, Albert Xin 2006
pdf
Page Metadata
Item Metadata
Title | Computational problems in multiagent systems |
Creator |
Jiang, Albert Xin |
Date Issued | 2006 |
Description | There has been recent interest from the computer science community on multiagent systems, where multiple autonomous agents, each with their own utility functions, act according to their own interests. In this thesis, we apply techniques developed in other areas of CS to solve two computational problems in multiagent systems: Action Graph Games: Action Graph Games (AGGs), first proposed in [Bhat and Leyton-Brown 2004], are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players' utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions. Learning and planning i n online auction environments: There is much active research into the design of automated bidding agents, particularly for environments that involve multiple auctions. These settings are complex partly because an agent's optimal strategy depends on information about other bidder's preferences. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we proposed an EM-base algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques. Bidding agents using the estimated distributions from our EM approach were able to outperform bidding agents using the straightforward estimates, in both decision-theoretic and game-theoretic settings. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-06 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051584 |
URI | http://hdl.handle.net/2429/17668 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2006-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2006-0209.pdf [ 5.26MB ]
- Metadata
- JSON: 831-1.0051584.json
- JSON-LD: 831-1.0051584-ld.json
- RDF/XML (Pretty): 831-1.0051584-rdf.xml
- RDF/JSON: 831-1.0051584-rdf.json
- Turtle: 831-1.0051584-turtle.txt
- N-Triples: 831-1.0051584-rdf-ntriples.txt
- Original Record: 831-1.0051584-source.json
- Full Text
- 831-1.0051584-fulltext.txt
- Citation
- 831-1.0051584.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051584/manifest