Representing and Reasoning with Large Games

by

Xin Jiang

B. Science, University of British Columbia, 2003
M. Science, University of British Columbia, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE STUDIES

(Computer Science)

The University Of British Columbia

(Vancouver)

December 2011

© Xin Jiang, 2011

Abstract

In the last decade, there has been much research at the interface of computer science and game theory. One important class of problems at this interface is the computation of solution concepts (such as Nash equilibrium or correlated equilibrium) of a finite game. In order to take advantage of the highly-structured utility functions in games of practical interest, it is important to design compact representations of games as well as efficient algorithms for computing solution concepts on such representations. In this thesis I present several novel contributions in this direction:

The design and analysis of Action-Graph Games (AGGs), a fully-expressive modeling language for representing simultaneous-move games. We propose a polynomial-time algorithm for computing expected utilities given arbitrary mixed-strategy profiles, and leverage the algorithm to achieve exponential speedups over existing algorithms for computing Nash equilibria.

Designing efficient algorithms for computing pure-strategy Nash equilibria in AGGs. For symmetric AGGs with bounded treewidth our algorithm runs in polynomial time.

Extending the AGG framework beyond simultaneous-move games. We propose Temporal Action-Graph Games (TAGGs) for representing dynamic games and Bayesian Action-Graph Games (BAGGs) for representing Bayesian games. For certain subclasses of TAGGs and BAGGs we give efficient algorithms for equilibria that achieve exponential speedups over existing approaches.

Efficient computation of correlated equilibria.
In a landmark paper, Papadimitriou and Roughgarden described a polynomial-time algorithm (“Ellipsoid Against Hope”) for computing sample correlated equilibria of compactly represented games. Recently, Stein, Parrilo and Ozdaglar showed that this algorithm can fail to find an exact correlated equilibrium. We present a variant of the Ellipsoid Against Hope algorithm that guarantees the polynomial-time identification of an exact correlated equilibrium.

Efficient computation of optimal correlated equilibria. We show that the polynomial-time solvability of what we call the deviation-adjusted social welfare problem is a sufficient condition for the tractability of the optimal correlated equilibrium problem.

Preface

Certain chapters of this thesis are based on publications (or submissions to publications) by my collaborators and me (under the name Albert Xin Jiang). As required by the UBC Faculty of Graduate Studies, I describe here the relative contributions of all collaborators.

Chapter 3 is based on the article Action-Graph Games by Albert Xin Jiang, Kevin Leyton-Brown and Navin Bhat, published in Games and Economic Behavior, Volume 71, Issue 1, January 2011, Pages 141–173, Elsevier. Navin and Kevin first proposed Action-Graph Games without function nodes (called AGG-∅s in this thesis), proposed an algorithm for computing expected utility for the symmetric case, and proposed an approach for computing sample Nash equilibria in symmetric AGG-∅s by adapting Blum et al. [2006]’s approach for speeding up Govindan and Wilson’s [2003] global Newton method.
My main contributions include: 1) extending the basic AGG-∅ representation by introducing function nodes and additive structure, yielding the more general representations AGGs with Function Nodes (AGG-FNs) and AGG-FNs with Additive Structure (AGG-FNAs); 2) proposing and implementing an algorithm for computing expected utility for general AGGs, and proving that it runs in polynomial time; 3) implementing software packages for game-theoretic analysis using AGGs, including programs that speed up existing algorithms for computing sample Nash equilibria [Govindan and Wilson, 2003, van der Laan et al., 1987] by leveraging the expected utility algorithm; 4) carrying out computational experiments; 5) preparation of the manuscript. Kevin has played a supervisory role throughout the project.

Chapter 4 is based on the paper Computing Pure Nash Equilibria in Symmetric Action Graph Games by Albert Xin Jiang and Kevin Leyton-Brown, published in the Proceedings of AAAI, 2007, although the chapter contains a significant amount of new material. My main contributions include: 1) identification of the research problem and the design of the overall approach; 2) working out the details of our algorithm and proving its correctness and running time; 3) preparation of the manuscript. Kevin has played a supervisory role throughout the project.

Chapter 5 is based on the paper Temporal Action-Graph Games: A New Representation for Dynamic Games by Albert Xin Jiang, Kevin Leyton-Brown and Avi Pfeffer, published in the Proceedings of UAI, 2009. The identification and design of the overall research program was done via joint discussions by all three co-authors. My other contributions include: 1) working out the details of the Temporal Action-Graph Game representation and our algorithm for computing expected utility, and proving their properties; 2) implementing our algorithm and carrying out computational experiments; 3) preparation of a majority of the text in the manuscript.
Kevin has played a supervisory role throughout the project.

Chapter 6 is based on the paper Bayesian Action-Graph Games, published in the Proceedings of NIPS, 2010. The identification and design of the overall research program was done via joint discussions by both co-authors. My other contributions include: 1) working out the details of the Bayesian Action-Graph Game representation, our algorithm for computing expected utility and our approach for computing Bayes-Nash equilibrium, and proving their properties; 2) implementing our algorithm and carrying out computational experiments; 3) preparation of the manuscript. Kevin has played a supervisory role throughout the project.

Chapter 7 is based on the paper Polynomial-time Computation of Exact Correlated Equilibrium in Compact Games by Albert Xin Jiang and Kevin Leyton-Brown, published in the Proceedings of ACM-EC, 2011. My main contributions include: 1) identification of the research program; 2) design of our algorithm and analysis of its properties; 3) preparation of the manuscript. Kevin has played a supervisory role throughout the project.

Chapter 8 is based on the manuscript A General Framework for Computing Optimal Correlated Equilibria in Compact Games by Albert Xin Jiang and Kevin Leyton-Brown, published in the Proceedings of the Seventh Workshop on Internet and Network Economics (WINE), 2011. My main contributions include: 1) identification of the research program; 2) design of our algorithm and analysis of its properties; 3) preparation of the manuscript. Kevin has played a supervisory role throughout the project.

Table of Contents

Abstract
Preface
Table of Contents
List of Figures
Acknowledgments
1 Introduction
2 A Brief Survey on the Computation of Solution Concepts
2.1 Representations of Games
2.1.1 Representing Complete-information Static Games
2.1.2 Representing Dynamic Games
2.1.3 Representing Games of Incomplete Information
2.2 Computation of Game-theoretic Solution Concepts
2.2.1 Computing Sample Nash Equilibria for Normal-Form Games
2.2.2 Computing Sample Nash Equilibria for Compact Representations of Static Games
2.2.3 Computing Sample Bayes-Nash Equilibria for Incomplete-information Static Games
2.2.4 Computing Sample Nash Equilibria for Dynamic Games
2.2.5 Questions about the Set of All Nash Equilibria of a Game
2.2.6 Computing Pure-Strategy Nash Equilibria
2.2.7 Computing Correlated Equilibrium
2.2.8 Computing Other Solution Concepts
2.3 Software
3 Action-Graph Games
3.1 Introduction
3.1.1 Our Contributions
3.2 Action Graph Games
3.2.1 Basic Action Graph Games
3.2.2 AGGs with Function Nodes
3.2.3 AGG-FNs with Additive Structure
3.3 Further Examples
3.3.1 A Job Market
3.3.2 Representing Anonymous Games as AGG-FNs
3.3.3 Representing Polymatrix Games as AGG-FNAs
3.3.4 Congestion Games with Action-Specific Rewards
3.4 Computing Expected Payoff with AGGs
3.4.1 Computing Expected Payoff for AGG-∅s
3.4.2 Computing Expected Payoff with AGG-FNs
3.4.3 Computing Expected Payoff with AGG-FNAs
3.5 Computing Sample Equilibria with AGGs
3.5.1 Complexity of Finding a Nash Equilibrium
3.5.2 Computing a Nash Equilibrium: The Govindan-Wilson Algorithm
3.5.3 Computing a Nash Equilibrium: The Simplicial Subdivision Algorithm
3.5.4 Computing a Correlated Equilibrium
3.6 Experiments
3.6.1 Software Implementation and Experimental Setup
3.6.2 Representation Size
3.6.3 Expected Utility Computation
3.6.4 Computing Payoff Jacobians
3.6.5 Finding a Nash Equilibrium Using Govindan-Wilson
3.6.6 Finding a Nash Equilibrium Using Simplicial Subdivision
3.6.7 Visualizing Equilibria on the Action Graph
3.7 Conclusions
4 Computing Pure-strategy Nash Equilibria in Action-Graph Games
4.1 Introduction
4.2 Preliminaries
4.2.1 AGGs
4.2.2 Complexity of Computing PSNE
4.3 Computing PSNE in AGGs with Bounded Number of Action Nodes
4.4 Computing PSNE in Symmetric AGGs
4.4.1 Restricted Games and Partial Solutions
4.4.2 Combining Partial Solutions
4.4.3 Dynamic Programming via Characteristics
4.4.4 Algorithm for Symmetric AGGs with Bounded Treewidth
4.4.5 Finding PSNE
4.4.6 Computing Optimal PSNE
4.5 Beyond symmetric AGGs
4.5.1 Algorithm for k-Symmetric AGG-∅s
4.5.2 General AGG-∅s and the Augmented Action Graph
4.6 Conclusions and Open Problems
5 Temporal Action-Graph Games: A New Representation for Dynamic Games
5.1 Introduction
5.2 Representation
5.2.1 Temporal Action-Graph Games
5.2.2 Strategies
5.2.3 Expected Utility
5.2.4 The Induced MAID of a TAGG
5.2.5 Expressiveness
5.3 Computing Expected Utility
5.3.1 Exploiting Causal Independence
5.3.2 Exploiting Temporal Structure
5.3.3 Exploiting Context-Specific Independence
5.4 Computing Nash Equilibria
5.5 Experiments
5.6 Conclusions
6 Bayesian Action-Graph Games
6.1 Introduction
6.2 Preliminaries
6.2.1 Complete-information interpretations
6.3 Bayesian Action-Graph Games
6.3.1 BAGGs with Function Nodes
6.4 Computing a Bayes-Nash Equilibrium
6.4.1 Computing Expected Utility in BAGGs
6.5 Experiments
7 Polynomial-time Computation of Exact Correlated Equilibrium in Compact Games
7.1 Introduction
7.1.1 Recent Uncertainty About the Complexity of Exact CE
7.1.2 Our Results
7.2 Preliminaries
7.3 The Ellipsoid Against Hope Algorithm
7.4 Our Algorithm
7.4.1 The Purified Separation Oracle
7.4.2 The Simplified Ellipsoid Against Hope Algorithm
7.5 Uncoupled Dynamics with Polynomial Communication Complexity
7.6 Computing Extensive-form Correlated Equilibria
7.7 Conclusion
8 A General Framework for Computing Optimal Correlated Equilibria in Compact Games
8.1 Introduction
8.2 Problem Formulation
8.2.1 Correlated Equilibrium
8.3 The Deviation-Adjusted Social Welfare Problem
8.3.1 The Weighted Deviation-Adjusted Social Welfare Problem
8.3.2 The Coarse Deviation-Adjusted Social Welfare Problem
8.4 The Deviation-Adjusted Social Welfare Problem for Specific Representations
8.4.1 Reduced Forms
8.4.2 Linear Reduced Forms
8.4.3 Representations with Action-Specific Structure
8.5 Conclusion and Open Problems
Bibliography
Appendix
A Software
A.1 File Formats
A.1.1 The AGG File Format
A.1.2 The BAGG File Format
A.2 Solvers for finding Nash Equilibria
A.3 AGG Graphical User Interface
A.4 AGG Generators in GAMUT
A.5 Software Projects Under Development

List of Figures

Figure 3.1 AGG-∅ representation of the Ice Cream Vendor game.
Figure 3.2 AGG-∅ representation of a 3-player, 3-action graphical game.
Figure 3.3 A 5 × 6 Coffee Shop game. Left: the AGG-∅ representation without function nodes (looking at only the neighborhood of α). Middle: we introduce two function nodes, p′ (bottom) and p′′ (top). Right: α now has only 3 neighbors.
Figure 3.4 Left: a two-player congestion game with three facilities. The actions are shown as ovals containing their respective facilities. Right: the AGG-FNA representation of the same congestion game.
Figure 3.5 AGG-∅ representation of the Job Market game.
Figure 3.6 AGG-FN representation of a game with agent-specific utility functions.
Figure 3.7 AGG-FNA representation of a 3-player polymatrix game. Function node U_{AB} represents player A’s payoffs in his bimatrix game against B, U_{BA} represents player B’s payoffs in his bimatrix game against A, and so on. To avoid clutter we do not show the edges from the action nodes to the function nodes in this graph. Such edges exist from A and B’s actions to U_{AB} and U_{BA}, from A and C’s actions to U_{AC} and U_{CA}, and from B and C’s actions to U_{BC} and U_{CB}.
Figure 3.8 Projection of the action graph. Left: action graph of the Ice Cream Vendor game. Right: projected action graph and action sets with respect to the action C1.
Figure 3.9 Representation sizes of Coffee Shop games. Top left: 5 × 5 grid with 3 to 16 players (log scale). Top right: AGG only, 5 × 5 grid with up to 80 players (log scale). Bottom left: 4-player r × 5 grid, r varying from 3 to 15 (log scale). Bottom right: AGG only, up to 80 rows.
Figure 3.10 Running times for payoff computation in the Coffee Shop game. Top left: 5 × 5 grid with 3 to 16 players. Top right: AGG only, 5 × 5 grid with up to 80 players. Bottom left: 4-player r × 5 grid, r varying from 3 to 15. Bottom right: AGG only, up to 80 rows.
Figure 3.11 Job Market games, varying numbers of players. Left: comparing representation sizes. Right: running times for computing 1000 expected utilities.
Figure 3.12 Govindan-Wilson algorithm; Coffee Shop game. Top row: 4 × 4 grid, varying number of players. Bottom row: 4-player r × 4 grid, r varying from 3 to 12. For each row, the left figure shows the ratio of running times; the right figure shows a log-scale plot of CPU times for the AGG-based implementation. The dashed horizontal line indicates the one-day cutoff time.
Figure 3.13 Govindan-Wilson algorithm; Job Market games, varying numbers of players. Left: ratios of running times. Right: log-scale plot of CPU times for the AGG-based implementation.
Figure 3.14 Ratios of running times of simplicial subdivision algorithms on Coffee Shop games. Left: 4 × 4 grid with 3 to 4 players. Right: 3-player r × 3 grid, r varying from 4 to 7.
Figure 3.15 Simplicial subdivision algorithm; symmetric AGG-∅s on small world graphs. Top row: 5 actions, varying number of players. Bottom row: 4 players, varying number of actions. The left figures show ratios of running times; the right figures show log-scale plots of CPU times for the AGG-based implementation. The dashed horizontal line indicates the one-day cutoff time.
Figure 3.16 Visualization of a Nash equilibrium of a 16-player Coffee Shop game on a 4 × 4 grid. The function nodes and the edges of the action graph are not shown. The action node at the bottom corresponds to not entering the market.
Figure 3.17 Visualization of a Nash equilibrium of a Job Market game with 20 players. Left: expected configuration of the equilibrium. Right: two mixed equilibrium strategies.
Figure 3.18 Visualization of a Nash equilibrium of an Ice Cream Vendor game.
Figure 4.1 The road game with m = 8 and the action graph of its AGG representation.
Figure 4.2 Restricted game on the rightmost 6 actions.
Figure 4.3 A partial solution on the rightmost 6 actions describes the configuration over these 8 actions.
Figure 4.4 Characteristic function ch_{P,Q} for the rightmost 6 actions with P = {T6, B6} and Q = {T5, T6, T7, B5, B6, B7}.
Figure 4.5 An action graph G.
Figure 4.6 The primal graph G′.
Figure 4.7 Tree decomposition of und(G).
Figure 4.8 Tree decomposition of primal graph G′, satisfying the conditions of Lemma 4.4.11.
Figure 5.1 Induced BN of the TAGG of Example 5.1.1, with 2 time steps, 3 lanes, and 3 players per time step. Squares represent behavior strategy variables, circles represent action count variables, diamonds represent utility variables and shaded diamonds represent decision-payoff variables. To avoid cluttering the graph, we only show utility variables at time step 2 and a decision-payoff variable for one of the decisions.
Figure 5.2 The transformed BN of the tollbooth game from Figure 5.1 with 3 lanes and 3 cars per time step.
Figure 5.3 Running times for expected utility computation. Triangle data points represent Approach 1 (induced BN), diamonds represent Approach 2 (transformed BN), squares represent Approach 3 (proposed algorithm).
Figure 6.1 Action graph for a symmetric Bayesian game with n players, 2 types, 2 actions per type.
Figure 6.2 BAGG representation for a Coffee Shop game with 2 types per player on a 1 × k grid.
Figure 6.3 GW, varying players.
Figure 6.4 GW, varying locations.
Figure 6.5 GW, varying types.
Figure 6.6 Simplicial subdivision.
Acknowledgments

First and foremost I would like to thank my parents, for their unconditional love and support, for their wisdom, and for encouraging me to pursue my interests. I am the person I am because of them, and I am very lucky to have them as parents.

Kevin Leyton-Brown has been my advisor since my MSc degree. He introduced me to game theory, and mentored me through all the research projects described in this thesis. I am eternally grateful to him for being a great teacher and communicator, for showing me his research vision yet giving me the freedom to explore and find my research topics, for helping me refine my half-formed ideas, for giving me concrete advice and pushing me to be better at all aspects of being a researcher, and for the career opportunities he introduced me to. I can honestly say that I really enjoyed my Ph.D. experience.

I would like to thank fellow members of Kevin’s game theory group and my office mates, David Thompson, James Wright and Baharak Rastegari, for stimulating discussions on research and otherwise, and for the camaraderie. I have also had many enjoyable discussions with Chris Ryan while he was doing his Ph.D. in Operations Research at UBC, during which he introduced me to quite a few interesting mathematical concepts including algebraic geometry and generating functions.

I would like to thank David Poole and Joel Friedman for serving on my supervisory committee, my university examiners Michael Friedlander and Sergei Severinov, and my external examiner David Parkes. They gave me very helpful feedback on my thesis. Many members of the algorithmic game theory research community have given me encouragement and help during my studies; I would especially like to mention Vince Conitzer, Christos Papadimitriou, Tim Roughgarden, Tuomas Sandholm, and Ted Turocy.
I would also like to thank all my collaborators, some of whom I have mentioned above: Kevin Leyton-Brown, Navin Bhat, Avi Pfeffer, Mohammad Ali Safari, Chris Ryan, Nando de Freitas, Michael Buro, David Thompson, James Wright, and Damien Bargiacchi.

During my Ph.D. studies I was supported by UBC’s University Graduate Fellowship for one year, the NSERC Canada Graduate Scholarship for three years, and partially by a Google Research Award “Advanced Computational Analysis of Position Auction Games”. I would like to thank them for their financial support.

Chapter 1

Introduction

Game theory is a mathematical theory of games: interactions in which multiple autonomous agents, each with their own utility function, act according to their own interests. Game theory has been studied extensively, and is perhaps the dominant paradigm in microeconomics [e.g., Fudenberg and Tirole, 1991]. In the last decade, there has been much research at the interface of computer science and game theory [e.g., Nisan et al., 2007, Shoham and Leyton-Brown, 2009]. This interdisciplinary field has been called “algorithmic game theory”, “computational economics”, and “multiagent systems” by various researchers. This recent interest in game theory by the computer science community has been partially motivated by the explosion in the popularity of the Internet, which is essentially a network of computers controlled by selfish agents. There is thus much recent effort to apply game theory to various subdomains of the Internet such as TCP/IP routing, peer-to-peer sharing, auction environments including eBay and AdWords, and social networks.

One fundamental class of computational problems in game theory is the computation of solution concepts of a finite game. Examples of solution concepts include Nash equilibrium and correlated equilibrium.
Intuitively, these solution concepts answer the following type of question: what are the likely outcomes of the game, under certain models of the agents’ rationality? Thus the task of computing these solution concepts can be understood in the language of AI as reasoning about the game. The goal is to be able to carry out such reasoning efficiently for real-world multiagent systems. One application of such game-theoretic reasoning is the development of autonomous agents that can act intelligently by taking into account the strategic behavior of other agents. Another application is to help the designer of a system predict its likely outcomes and optimize the parameters of the system to achieve preferred outcomes. Furthermore, some computer scientists argue that the complexity of these computational problems has implications for whether equilibria can be reached in practice. As Kamal Jain famously put it, “if your laptop cannot find the equilibrium, neither can the market.”

The input to such computational problems is a description of the game. Most of the game theory literature presumes that simultaneous-action games will be represented in normal form. This is problematic because in many domains of interest the number of players and/or the number of actions per player is large. In the normal form representation, the game’s payoff function is stored as a matrix with one entry for each player’s payoff under each combination of all players’ actions. As a result, the size of the representation grows exponentially with the number of players. A similar problem arises in dynamic games, for which the extensive form serves as the standard representation. For large games it becomes infeasible even to store the game in memory, and computations that require time polynomial in the input size are nevertheless impractical, because the input itself is exponentially large.
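To make this growth concrete, the following minimal Python sketch counts the payoff values stored by the normal form, assuming for simplicity that each of the n players has the same number m of actions, so that there are m^n pure-strategy profiles and n·m^n payoff values. The function name is hypothetical and chosen only for illustration.

```python
def normal_form_size(num_players: int, num_actions: int) -> int:
    """Number of payoff values stored by the normal form, assuming (for
    simplicity) that every player has the same number of actions."""
    num_profiles = num_actions ** num_players  # one cell per pure-strategy profile
    return num_players * num_profiles          # one utility per player per cell


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"{n:>2} players, 5 actions each: {normal_form_size(n, 5):,} payoff values")
```

With 20 players and 5 actions each, this is already 20 · 5^20, roughly 1.9 × 10^15 numbers.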
Fortunately, most large games of practical interest have highly-structured payoff functions, and thus it is possible to represent them compactly, by which we mean a representation that is exponentially smaller than its induced normal form. Intuitively, this helps to explain why people are able to reason about these games in the first place: we understand the payoffs in terms of simple relationships rather than in terms of enormous lookup tables. Of course, there are any number of ways of representing games compactly. For example, games of interest could be assigned short ID numbers. But we ultimately want to be able to compute solution concepts of the games, and we would like the running time of our algorithms to depend on the size of the compact representation rather than the size of the corresponding normal form. Can we design representations of games that are able to compactly encode a wide range of interesting games and are amenable to efficient computation? And how do we design efficient algorithms for computing solution concepts in these compactly represented games? These are the central questions I tackle in this thesis.

Before discussing my contributions, I will first briefly summarize the relevant literature; I will give a more in-depth survey in Chapter 2. One thread of recent work in the literature has explored compact game representations (also called concise or succinct representations) that are able to succinctly describe games that exhibit certain types of structure. Examples of such representations for complete-information simultaneous-action games include anonymous games, graphical games [Kearns et al., 2001], and congestion games [Rosenthal, 1973]. Examples of structure include symmetry/anonymity, strict and action-specific independence, and additivity. However, the existing representations either only capture a subset of these types of structure, or are only able to represent a subset of games that exhibit a specific structure.
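As a rough illustration of how one of these types of structure, symmetry, can yield an exponentially smaller representation, consider a symmetric game: every player shares one utility function, which depends only on the player’s own action and on how many of the other n − 1 players choose each action. A table indexed by those quantities therefore suffices. The Python sketch below compares the two sizes, assuming equal action sets of size m; the function names and this particular table layout are hypothetical choices made for illustration, not a representation defined in this thesis.

```python
from math import comb


def normal_form_size(num_players: int, num_actions: int) -> int:
    """Payoff values in the normal form: n players times m**n profiles."""
    return num_players * num_actions ** num_players


def symmetric_game_size(num_players: int, num_actions: int) -> int:
    """Payoff values in a table indexed by (own action, counts of the others).

    In a symmetric game the shared utility function depends only on the
    player's own action and on how many of the other n - 1 players chose
    each action, so player identities need not be stored.
    """
    # Ways to place n - 1 indistinguishable players into m actions ("stars and bars").
    others_configurations = comb(num_players - 1 + num_actions - 1, num_actions - 1)
    return num_actions * others_configurations


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, normal_form_size(n, 5), symmetric_game_size(n, 5))
```

For a fixed number of actions m, the symmetric table grows only polynomially in n (as a polynomial of degree m − 1), while the normal form grows exponentially; with 20 players and 5 actions the counts are 44,275 versus about 1.9 × 10^15.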
There is a lack of a general modeling language that is fully expressive (able to express arbitrary games) while also able to compactly encode utility functions exhibiting commonly-encountered types of structure.

Nash equilibrium (NE) is perhaps the most well-known and well-studied game-theoretic solution concept. There is a line of recent results from the computational complexity theory community on the hardness of various computational problems regarding Nash equilibria, perhaps most prominently the series of papers [Chen and Deng, 2006, Daskalakis et al., 2006b, Goldberg and Papadimitriou, 2006] establishing the PPAD-completeness of the problem of finding a sample mixed-strategy Nash equilibrium in normal-form games of two or more players. I take the view that although these hardness results are important for understanding the problems, they do not imply that practical algorithms cannot be built. For example, there have been great advances in the design and implementation of practical solvers for theoretically hard problems such as SAT and integer programming.

In terms of algorithms for finding a Nash equilibrium, earlier literature from economics and operations research focused on algorithms for the normal form [e.g., Govindan and Wilson, 2003, van der Laan et al., 1987]. In the last decade, with more compact game representations being proposed, there has been more effort from the computer science community on algorithms for compact representations. Such efforts can roughly be divided into two categories, “black-box” approaches and “special-purpose” approaches. A black-box algorithm requires certain subroutines provided by the representation to work, but otherwise treats the representation as a black box. Examples include efforts to adapt algorithms designed for the normal form to compact representations [Bhat and Leyton-Brown, 2004, Blum et al., 2006]. The computation of expected utility has emerged as a key subtask required by many black-box algorithms.
The ability to carry out this computation efficiently has become an important design criterion for compact representations. Fortunately, most existing representations admit polynomial-time algorithms for expected utility. The existing black-box approaches address the problem of finding a sample Nash equilibrium; while this problem is very important, we are often interested in questions about the set of equilibria, such as finding an optimal equilibrium. By contrast, a special-purpose approach exploits specific structure of the game, and is thus tied to a particular representation. Although not as general as the black-box approach, a special-purpose approach can often identify tractable subclasses of games even when the general case is hard; furthermore, it can sometimes compute a concise description of the set of equilibria, allowing us, e.g., to compute the optimal equilibrium. Examples include algorithms for computing pure-strategy Nash equilibria for tree graphical games [Daskalakis and Papadimitriou, 2006, Gottlob et al., 2005] and singleton congestion games [Ieong et al., 2005], and for computing mixed-strategy Nash equilibria for symmetric games [Papadimitriou and Roughgarden, 2005] and anonymous games [Daskalakis and Papadimitriou, 2007]. In terms of software implementations, the GAMBIT [McKelvey et al., 2006] package contains many of the existing algorithms for the normal form and the extensive form. There is a relative lack of publicly-available implementations of algorithms for compact representations, except for the Gametracer [Blum et al., 2002] package, which provides implementations of black-box adaptations of two of Govindan and Wilson’s algorithms [Govindan and Wilson, 2003, 2004] for finding a sample Nash equilibrium.
In summary, although there have been many advances in the theoretical understanding of how certain types of structure in games can be exploited for efficient computation, the lack of a general representation and of publicly available software implementations for structured games has meant that the computational analysis of large games has not become practical. Much of this thesis can be understood as my efforts to address these problems. Below I give an outline of my contributions, including the design of game representations that can capture a wide variety of computation-friendly structure, novel algorithms for computing sample equilibria as well as optimal equilibria in compact games, and software implementations of tools for modeling and reasoning about structured games. In Chapter 3 I present work (joint with Kevin Leyton-Brown and Navin Bhat) regarding Action-graph games (AGGs), a compact representation of complete-information simultaneous-action games first proposed by Bhat and Leyton-Brown [2004]. We make several contributions that significantly extend Bhat and Leyton-Brown’s [2004] original work. First, we extended the original definition of AGGs by introducing function nodes and additive utility functions, capturing a wider variety of utility structure. The resulting AGG representation is a fully-expressive modeling language that both extends and unifies previous approaches: it can compactly express games with structure such as strict or context-specific independence, anonymity, and additivity; it can be used to compactly encode all games that are compact when represented as graphical games, symmetric games, anonymous games, congestion games, and polymatrix games, as well as additional realistic games that would take exponential space to represent using these existing representations.
Second, we gave a polynomial-time algorithm for the important task of computing expected utility for AGGs, which then allows us to speed up existing normal-form-based equilibrium-finding algorithms including Govindan and Wilson’s [2003] Global Newton Method and the simplicial subdivision algorithm of van der Laan et al. [1987]. Third, we implemented and made available software tools for constructing, visualizing, and reasoning with AGGs. We present results of experiments showing that using AGGs leads to a dramatic increase in the size of games accessible to computational analysis. Pure-strategy Nash equilibrium (PSNE) is a more restricted concept than Nash equilibrium, and has certain theoretically and practically attractive properties. In Chapter 4 I present work (joint with Kevin Leyton-Brown) on computing pure-strategy Nash equilibria for AGGs. Unlike our black-box approach in Chapter 3 for computing equilibria, here we use a special-purpose approach that exploits the graph-theoretical properties of the action graph. In particular, we propose a dynamic-programming algorithm that constructs equilibria of the game from equilibria of restricted games played on subgraphs of the action graph. If the game is symmetric and the action graph has bounded treewidth, our algorithm determines the existence of a pure-strategy Nash equilibrium in polynomial time. We also extend our approach to certain classes of asymmetric AGGs. Just as AGGs unify and extend existing representations, our approach can be understood as a generalization of existing special-purpose approaches for representations including singleton congestion games [Ieong et al., 2005] and graphical games [Daskalakis and Papadimitriou, 2006, Gottlob et al., 2005]. So far we have focused on representing and reasoning with simultaneous-action games. However, many multi-agent interactions involve decisions made sequentially over time; such situations are modeled as dynamic games in game theory.
The standard representation for dynamic games, the extensive form, is inefficient for large, structured games, while the state-of-the-art compact representation, multi-agent influence diagrams (MAIDs), only captures strict utility independence structure. In Chapter 5 I present work (joint with Kevin Leyton-Brown and Avi Pfeffer), in which we propose temporal action-graph games (TAGGs), an extension of AGGs that can compactly represent dynamic games exhibiting a wide range of structure including anonymity or context-specific utility independencies. We also show that TAGGs can be understood as indirect MAID encodings in which many deterministic chance nodes are introduced. We provide an efficient algorithm for computing expected utility for TAGGs, and show both theoretically and empirically that our approach improves significantly on MAIDs. Games of incomplete information, or Bayesian games, are an important game-theoretic model in which players are uncertain about the utilities of the game. Despite having many applications in economics, Bayesian games have received relatively little attention on the computational side, in areas such as compact representations and practical algorithms for computing solution concepts like Bayes-Nash equilibria. In Chapter 6 we extend AGGs to the incomplete-information setting and present Bayesian action-graph games (BAGGs), a compact representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games exhibiting commonly encountered types of structure including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions. We provide an algorithm for computing expected utility in BAGGs, and discuss conditions under which the algorithm runs in polynomial time. Sample Bayes-Nash equilibria of BAGGs can be computed by adapting existing algorithms for complete-information normal form games and leveraging our expected utility algorithm.
First proposed by Aumann [1974, 1987], correlated equilibrium (CE) is another important solution concept. In a landmark paper, Papadimitriou and Roughgarden [2008] described a polynomial-time black-box algorithm (“Ellipsoid Against Hope”) for computing sample correlated equilibria of concisely-represented simultaneous-move games. Recently, Stein, Parrilo and Ozdaglar [2010] showed that this algorithm can fail to find an exact correlated equilibrium, but can be easily modified to efficiently compute approximate correlated equilibria. It remained an open problem whether the algorithm could be modified to compute an exact correlated equilibrium. In Chapter 7 we show that it can, presenting a variant of the Ellipsoid Against Hope algorithm that guarantees the polynomial-time identification of exact correlated equilibrium. Also, our algorithm is the first to tractably compute correlated equilibria with polynomial-sized supports; such correlated equilibria are more natural solutions than the mixtures of product distributions produced previously, and have several advantages including requiring fewer bits to represent, being easier to sample from, and being easier to verify. However, since in general there can be an infinite number of correlated equilibria in a game, finding an arbitrary one is of limited value. In Chapter 8 we focus on the problem of computing a correlated equilibrium that optimizes some objective (e.g., social welfare). Papadimitriou and Roughgarden [2008] gave a sufficient condition for the tractability of the problem; however, it only applies to a subset of existing representations. We propose a different algorithmic approach for the optimal CE problem that applies to all compact representations, and give a sufficient condition that generalizes Papadimitriou and Roughgarden’s condition.
In particular, we reduce the optimal CE problem to the deviation-adjusted social welfare problem, a combinatorial optimization problem closely related to the optimal social welfare outcome problem. Our algorithm can be understood as an instance of the black-box approach, with the solution of the deviation-adjusted social welfare problem as the key subroutine provided by the game representation. This framework allows us to identify new classes of games for which the optimal CE problem is tractable, including graphical polymatrix games on tree graphs. We also study the problem of computing the optimal coarse correlated equilibrium, a solution concept closely related to CE. Using a similar approach we derive a sufficient condition for this problem, and use it to prove that the problem is tractable for singleton congestion games. In Appendix A I describe software packages we implemented and made available at http://agg.cs.ubc.ca. Taken together, this thesis presents several basic components of an algorithmic framework for computational analysis of large games: compact representations for complete-information and incomplete-information simultaneous-action games as well as dynamic games, a collection of implemented algorithms for computing sample Nash and correlated equilibria given such games, and some theoretical foundations for computing PSNE and optimal correlated equilibria. These are parts of a larger ongoing effort by our research group, which aims to apply computational game-theoretic analysis to real-world systems, especially the design and analysis of market mechanisms such as auctions. Such mechanism design problems have traditionally been attacked via purely analytical means, but computational analysis allows us to tackle settings for which theoretical analysis is difficult or impossible.
Position auctions for advertising slots, such as the Generalized Second-Price auction used by Google AdWords, have received much recent interest from computer scientists and economists. Thompson and Leyton-Brown [2009] were able to use AGGs to compactly represent complete-information position auctions and compute their Nash equilibria, allowing them to analyze economic properties of such auctions such as revenue and efficiency. Building on their work, I am currently working with David Thompson and Kevin Leyton-Brown to extend this analysis to incomplete-information models of position auctions using BAGGs. Finally, I mention a couple of papers on related topics that I co-authored but do not include in this thesis. In [Jiang and Safari, 2010], Mohammad Ali Safari and I analyzed the problem of deciding the existence of pure-strategy Nash equilibria for graphical games on restricted classes of graphs, and showed that the problem is solvable in polynomial time if and only if the class of graphs has bounded treewidth (after iterated removal of sinks). We proved our result by applying Grohe’s characterization of the complexity of homomorphism problems. This result illustrated a limitation of a class of graph-based special-purpose approaches that includes the algorithm of Chapter 4, namely that such approaches cannot be extended much beyond bounded-treewidth graphs. It influenced my later focus on more general approaches such as those in Chapters 7 and 8. In [Ryan et al., 2010], Chris Ryan, Kevin Leyton-Brown and I analyzed the problem of computing pure-strategy Nash equilibria in symmetric games whose utilities are compactly represented, such that the number of players can be exponential in the representation size. We showed that if the utility functions are represented as piecewise-linear functions, there exist polynomial-time algorithms for finding a pure-strategy Nash equilibrium and for counting the number of equilibria. Our approach made use of the rational generating function method developed by Barvinok and Woods.
I do not include these papers here because they do not fit in with the focus of the thesis.

Chapter 2

A Brief Survey on the Computation of Solution Concepts

In this chapter we give a brief survey of the economics and computer science literature on the computation of game-theoretic solution concepts, focusing on Nash equilibrium and correlated equilibrium. There have been several surveys on various aspects of this topic: von Stengel [2002] focused on two-player games; McKelvey and McLennan [1996] focused on algorithms for the normal form; Papadimitriou [2007] focused on complexity results. In this survey we emphasize the topics most relevant to this thesis, i.e., results that are relevant to large, structured games. The goal of this chapter is to present a bird’s-eye view of the state of the art. We will largely follow the narrative outlined in Chapter 1. In Section 2.1 we look at representations of games and the types of structure they capture. In Section 2.2 we look at algorithmic and complexity-theoretic results, with emphasis on algorithms for compact representations. In Section 2.3 we survey software packages for game-theoretic modeling and computation.

2.1 Representations of Games

A game is a mathematical model of interaction among self-interested agents. Informally, to specify a game we need to specify a set of agents (also known as players), a set of strategies for each agent, and a utility function for each agent that assigns a utility value (also known as payoff) to each outcome of the game. Such models can be further divided into complete-information static games, incomplete-information static games and dynamic games. A game representation is a data structure that stores all information needed to specify a game. An instance of a representation is a game encoded in that representation.
Thus it is often useful to think of a game representation as a class (or type) in the language of object-oriented software engineering, and an instance of a representation as an object of that class. The size of a representation is then the amount of data required to specify a game instance (i.e., initialize an object) of that representation. In this section we survey the existing literature on representing games. Section 2.1.1 focuses on representing complete-information static games; Section 2.1.2 focuses on representing dynamic games; Section 2.1.3 focuses on representing incomplete-information games.

2.1.1 Representing Complete-information Static Games

In static games, also known as simultaneous-move games, each agent chooses a strategy simultaneously (e.g., Rock-Paper-Scissors). By complete information we mean that each agent knows the utility functions of all agents.

Definition 2.1.1. A complete-information static game is a tuple $(N, \{A_i\}_{i\in N}, \{u_i\}_{i\in N})$ where
• $N = \{1, \dots, n\}$ is the set of agents;
• for each agent $i$, $A_i$ is the nonempty set of $i$’s actions (or pure strategies). We denote by $a_i \in A_i$ one of agent $i$’s actions. An action profile (or pure-strategy profile) $a = (a_1, \dots, a_n) \in \prod_{i\in N} A_i$ is a tuple of actions of the $n$ agents. We also denote by $a_{-i}$ the $(n-1)$-tuple of actions by agents other than $i$ under the action profile $a$.¹
• $u_i : \prod_{j\in N} A_j \to \mathbb{R}$ is $i$’s utility function, which specifies $i$’s utility given any action profile.

A game representation is fully expressive if it can represent arbitrary games. We say a game representation has polynomial type [Daskalakis et al., 2006a] if the number of players and the number of actions for each player are bounded by polynomials in the representation size. For example, if the set of players and the sets of actions are encoded explicitly, then the representation has polynomial type. This is the case for all representations of static games discussed in this section.
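To make the class/object view above concrete, the following Python sketch implements the most direct representation, the normal form (discussed next), and reports the amount of data needed to initialize an instance. The class `NormalFormGame` and the example utility `coordination` are hypothetical names chosen for illustration; actions are assumed to be indexed $0, \dots, |A_j|-1$.

```python
from itertools import product
from math import prod


class NormalFormGame:
    """Normal-form representation: one full utility table per player.

    Illustrative sketch only; the class and method names are hypothetical.
    """

    def __init__(self, num_actions, utility_fn):
        # num_actions[j] = |A_j|; player j's actions are 0, ..., |A_j| - 1.
        self.num_actions = list(num_actions)
        self.n = len(self.num_actions)
        # Tabulate u_i(a) for every player i and every action profile a.
        profiles = list(product(*(range(m) for m in self.num_actions)))
        self.tables = [{a: utility_fn(i, a) for a in profiles} for i in range(self.n)]

    def size(self):
        # Number of stored utility values: n * prod_j |A_j|, i.e. O(n m^n).
        return self.n * prod(self.num_actions)

    def utility(self, i, profile):
        return self.tables[i][tuple(profile)]


def coordination(i, a):
    # Hypothetical utility: one point for every other player who matches i's action.
    return sum(1 for j, aj in enumerate(a) if j != i and aj == a[i])


game = NormalFormGame([2, 2, 2], coordination)
print(game.size())                 # 3 * 2**3 = 24
print(game.utility(0, (1, 1, 0)))  # only player 1 matches player 0, so 1
```

Even though `coordination` itself is a one-line rule, the object must store every value explicitly: with 10 players and 2 actions each, `size()` is already $10 \cdot 2^{10} = 10240$.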
¹ While in complete-information static games the concepts of actions and pure strategies coincide, we will see that this is no longer the case for incomplete-information games and dynamic games. For the cases when pure strategies are distinct from actions, we denote pure strategies by $s_i \in S_i$ and pure-strategy profiles by $s \in S$. For complete-information static games, both the $a$-based notation and the $s$-based notation are commonly used in the literature to denote pure strategies/actions [e.g., Fudenberg and Tirole, 1991, Shoham and Leyton-Brown, 2009].

Normal Form

A normal form representation of a game uses a multi-dimensional matrix $U_i \in \mathbb{R}^{\prod_{j\in N} A_j}$ to represent each utility function $u_i$. The size of this representation is approximately $n \prod_{j\in N} |A_j|$, which is $O(nm^n)$ where $m = \max_{i\in N} |A_i|$. Two-player normal-form games are also called bimatrix games, since the utility functions of such a game can be specified by two $|A_1| \times |A_2|$ matrices. Although the normal form is fully expressive, the size of the representation grows exponentially in the number of players. As a result, the normal form is unsuitable for representing large systems. Although several computational tasks such as finding pure Nash equilibria and computing expected payoff under mixed strategies are polynomial-time in the size of the normal form representation, they are intractable for large games because the representation size itself is exponential.

Graphical Games

Fortunately, most real-world large games have structure that allows them to be represented compactly. A popular compact representation of games is graphical games, proposed by Kearns et al. [2001]. A game is associated with a graph whose nodes correspond to the players of the game and whose edges correspond to payoff influence between players. In other words, each player’s payoffs depend only on his actions and those of his neighbors in the graph. We call this kind of structure strict utility independence.

Definition 2.1.2.
A graphical game is a tuple $(G, \{U_i\}_{i\in N})$ where
• $G = (N, E)$ is a directed graph,² with the set of vertices corresponding to the set of agents. $E$ is a set of ordered pairs corresponding to the arcs of the graph, i.e., $(i, j) \in E$ means there is an arc from $i$ to $j$. Vertex $j$ is a neighbor of $i$ if $(j, i) \in E$.
• for each $i \in N$, a local utility function $U_i : \prod_{j\in\nu(i)} A_j \to \mathbb{R}$, where $\nu(i) = \{i\} \cup \{j \in N \mid (j, i) \in E\}$ is the neighborhood of $i$.

Each local utility function $U_i$ is represented as a matrix of size $\prod_{j\in\nu(i)} |A_j|$. Since the size of the local utility functions dominates the size of the graph $G$, the total size of the representation is $O(nm^{I+1})$, where $I$ is the maximum in-degree of $G$. A graphical game $(G, \{U_i\})$ specifies a game $(N, \{A_i\}, \{u_i\})$ where each $A_i$ is specified by the domain of agent $i$ in $U_i$, and for all $i \in N$ and all action profiles $a$ we have $u_i(a) \equiv U_i(a_{\nu(i)})$, where $a_{\nu(i)} = (a_j)_{j\in\nu(i)}$. Graphical games are fully expressive: an arbitrary game can be represented as a graphical game on a complete graph.

² Kearns et al. [2001] originally defined graphical games on undirected graphs, while some later authors [e.g., Daskalakis and Papadimitriou, 2006, Gottlob et al., 2005] used the directed graph version given here. An undirected graphical game is equivalent to a directed graphical game in which each edge $\{i, j\}$ from the undirected graph is replaced by two directed edges $(i, j)$ and $(j, i)$. Thus the directed graph version is more general.

Symmetric Games and Anonymous Games

A game is symmetric when all players are identical and interchangeable. Formally, a game is symmetric if each player has an identical set of actions and, for every permutation of players $\pi : \{1, \dots, n\} \to \{1, \dots, n\}$, every player $i$ and every action profile $a$, $u_i(a_1, \dots, a_n) = u_{\pi(i)}(a_{\pi^{-1}(1)}, \dots, a_{\pi^{-1}(n)})$; that is, relabeling the players so that player $\pi(i)$ takes over $i$’s action leaves every player’s utility unchanged. Symmetric games have been studied since the beginning of noncooperative game theory. For example, Nash proved that symmetric games always have symmetric mixed Nash equilibria [Nash, 1951].
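On small instances, the permutation condition in the definition of a symmetric game can be checked by brute force. The following sketch is only an illustration (the function `is_symmetric` and the two example games are hypothetical): for each permutation it builds the profile in which player $\pi(j)$ takes over player $j$’s action and compares utilities. Its running time grows like $m^n \cdot n!$, so it is usable only for tiny games.

```python
from itertools import permutations, product


def is_symmetric(n, m, u):
    """Brute-force check of the symmetry condition on a small game.

    u(i, a) is player i's utility at profile a (a tuple of length n); every
    player has actions 0, ..., m - 1. For each permutation pi we build the
    profile b in which player pi(j) takes over player j's action, i.e.
    b[pi[j]] = a[j] (equivalently b_k = a_{pi^{-1}(k)}), and require
    u_i(a) == u_{pi(i)}(b) for all i. Runs in O(m^n * n!) time.
    """
    for a in product(range(m), repeat=n):
        for pi in permutations(range(n)):
            b = [None] * n
            for j in range(n):
                b[pi[j]] = a[j]
            b = tuple(b)
            if any(u(i, a) != u(pi[i], b) for i in range(n)):
                return False
    return True


def crowding(i, a):
    # Depends only on i's own action and how many players chose it: symmetric.
    return -a.count(a[i])


def favors_player_0(i, a):
    # Player 0 is treated differently from the others: not symmetric.
    return a[i] if i == 0 else 0


print(is_symmetric(3, 2, crowding))         # True
print(is_symmetric(3, 2, favors_player_0))  # False
```

Note that `crowding` depends only on the player’s own action and on how many players chose it, which anticipates the anonymity structure exploited by compact encodings of symmetric games.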
In a symmetric game, a player’s utility depends only on the player’s chosen action and the configuration, which is the vector of integers specifying the numbers of players choosing each of the actions. We say such a utility function exhibits anonymity. As a result, symmetric games can be represented more compactly than the normal form: we only need to specify a utility value for each action and each configuration. For a symmetric game with $n$ players and $m$ actions per player, the number of configurations is $\binom{n+m-1}{m-1}$. For fixed $m$, this grows like $n^{m-1}$, in which case $\Theta(n^{m-1})$ numbers are required to specify the game. A straightforward generalization of symmetric games is k-symmetric games, in which there are $k$ equivalence classes of players. Nash’s [1951] result applies to a very general notion of symmetry: roughly, if a game is invariant under a permutation group, then there exists a Nash equilibrium strategy profile that is invariant under the same group. Specialized to k-symmetric games, it implies that they always have k-symmetric Nash equilibria, in which strategies within each class are identical. Any game is a k-symmetric game with $k = n$. On the other hand, when $k$ is small compared to $n$, k-symmetric games can be compactly represented by specifying utilities for each k-configuration, where a k-configuration is a tuple of $k$ configurations, one for each equivalence class. There has also been research [e.g., Brandt et al., 2009, Daskalakis and Papadimitriou, 2007] on a generalization of symmetric games called anonymous games, in which a given player’s utility depends on his identity as well as the action chosen and the configuration. Anonymous games can be compactly represented in a similar manner, requiring $\Theta(n^m)$ numbers for fixed $m$.

Polymatrix Games

Polymatrix games are a class of games in which each player’s utility is the sum of utilities resulting from her bilateral interactions with each of the $n - 1$ other players.
This can be represented by specifying for each pair of players $i$ and $j$ a bimatrix game (two-player normal-form game) with sets of actions $A_i$ and $A_j$. When a utility function can be expressed as a sum of other functions, as in polymatrix games, we say it exhibits additive structure.

Congestion Games

A congestion game [Rosenthal, 1973] is a tuple $(N, M, (A_i)_{i\in N}, (K_{jk})_{j\in M, k\le n})$, where $N = \{1, \dots, n\}$ is the set of players; $M = \{1, \dots, m\}$ is a set of facilities (or resources); $A_i$ is player $i$’s set of actions, where each action $a_i \in A_i$ is a subset of the facilities, $a_i \subseteq M$; and $K_{jk}$ is the cost of using facility $j$ when a total of $k$ players have chosen actions that include facility $j$. For notational convenience we also define $K_j(k) \equiv K_{jk}$. Let $\#(j, a)$ be the number of players whose actions include facility $j$ under the action profile $a$. The total cost (or disutility) of player $i$ under pure strategy profile $a = (a_i, a_{-i})$ is the sum of the costs on each of the facilities in $a_i$:

$$\mathrm{Cost}_i(a_i, a_{-i}) = -u_i(a_i, a_{-i}) = \sum_{j\in a_i} K_j(\#(j, a)). \tag{2.1.1}$$

Only $nm$ numbers are needed to specify the costs $(K_{jk})_{j\in M, k\le n}$. The representation also needs to specify the $\sum_{i\in N} |A_i|$ actions, each of which is a subset of $M$. If we use an $m$-bit binary string to represent each of these subsets, the total size of the congestion game representation is $O(mn + m\sum_{i\in N} |A_i|)$. From the above definition we can see that congestion games exhibit a specific combination of anonymity and additive structure, plus a type of utility independence which we call context-specific independence (CSI). This means that the independence structure of player $i$’s utility function (i.e., which subset of players affects player $i$’s utility) changes depending on the context, which is a certain feature of the players’ strategies (in this case the facilities included in $i$’s chosen action). This is a more general type of independence structure than the strict independencies captured by graphical games.
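Equation (2.1.1) translates almost directly into code. The sketch below is illustrative only: the function name `congestion_costs` and the small instance are hypothetical, facilities are indexed from 0, and costs are stored as `K[j][k - 1]` for $k = 1, \dots, n$ users.

```python
from collections import Counter


def congestion_costs(profile, K):
    """Cost_i(a) = sum_{j in a_i} K_j(#(j, a)) for every player i (Eq. 2.1.1).

    profile: one action per player; each action is a set of facility indices.
    K: K[j][k - 1] is the cost of facility j when k players use it (k = 1..n).
    Utilities are the negated costs, u_i(a) = -Cost_i(a).
    """
    usage = Counter(j for action in profile for j in action)  # #(j, a)
    return [sum(K[j][usage[j] - 1] for j in action) for action in profile]


# Hypothetical instance: 3 players, facility 0 gets costlier with use,
# facility 1 has a constant cost of 2.
K = [[1, 2, 3],
     [2, 2, 2]]
profile = [{0}, {0}, {0, 1}]
print(congestion_costs(profile, K))  # facility 0 has 3 users -> [3, 3, 5]
```

Only the $nm = 6$ cost values in `K` and the action sets are stored, yet every player’s utility for any profile can be recovered; this is the anonymity-plus-additivity structure described above.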
On the other hand, congestion games are not fully expressive.

Local Effect Games

Local Effect Games (LEGs), proposed by Leyton-Brown and Tennenholtz [2003], were the first graphical representation of games that focused on actions. In an LEG, we have a graph whose nodes correspond to the actions of the game. Each player can choose any one of the nodes. Define configuration as in symmetric games, and let the configuration over node $k$, denoted $c(k)$, be the number of players choosing node $k$. There is a node function $U_k$ associated with each node $k$, which maps the configuration of node $k$ to a real number. There is an edge function $U_{k,m}$ associated with each edge $(k, m)$ of the graph, which maps the configuration over nodes $k$ and $m$ to a real number. The utility of a player $i$ choosing node $k$ is the sum of the node function $U_k$ and all incoming edge functions, evaluated at the current configuration $c$:

$$U_k(c(k)) + \sum_{m\in\nu(k)} U_{m,k}(c(m), c(k)),$$

where $\nu(k)$ is the set of nodes $m$ with an edge $(m, k)$ into $k$.

Like congestion games, LEGs also exhibit a combination of anonymity, additivity and context-specific independence structure. In this case the context for player $i$’s utility independence is the action chosen by $i$. We call such structure action-specific independence. Unfortunately, like congestion games, LEGs are also not fully expressive.

Action-Graph Games

We have seen representations that capture various types of structure such as strict and context-specific independence, anonymity, and additivity. However, the existing representations either only capture a subset of these types of structure (graphical games, symmetric/anonymous games, polymatrix games), or are only able to represent a subset of games (symmetric/anonymous games, polymatrix games, congestion games, local-effect games). Action-graph games (AGGs), proposed by Bhat and Leyton-Brown [2004] and extended by Jiang et al. [2011], are a compact representation of simultaneous-move games that extends and unifies these previous approaches.
AGGs are fully expressive (able to represent arbitrary games), can compactly express games whose utility functions exhibit action-specific independence, anonymity or additivity, and furthermore have useful computational properties, such as admitting a polynomial-time algorithm for computing expected utility. Chapter 3 gives a detailed discussion of AGGs.

2.1.2 Representing Dynamic Games

In dynamic games, agents move sequentially. When agents are able to perfectly observe all moves, dynamic games are said to exhibit perfect information; otherwise, dynamic games exhibit imperfect information. The standard representation for dynamic games is the extensive form, which is a tree whose edges represent moves of players. Thus each node of the tree corresponds to a unique sequence of moves. Utilities for all players are specified for each leaf of the tree. Each internal node is assigned to a player, who can choose among the edges below that node. Imperfect information is specified using information sets: each player’s set of internal nodes is partitioned into information sets, and a player is unable to distinguish between the nodes within any one of his information sets. Randomness in the environment can be represented as nodes for the Nature (also known as Chance) player, who randomizes over his actions according to some fixed distribution. See, e.g., [Shoham and Leyton-Brown, 2009] for a formal definition of the extensive form. Each extensive-form game can be transformed to an induced normal form, in which each pure strategy of a player prescribes an action for each of her information sets. The number of pure strategies can be exponential in the size of the extensive form, so transforming to the induced normal form entails an exponential blowup in representation size. In this sense the extensive form can be seen as a compact representation of dynamic games. However, this representation requires us to specify utilities for every possible sequence of moves; when the game exhibits more structure than this, a more compact representation is needed.
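The blowup from the extensive form to the induced normal form comes from the fact that a pure strategy picks an action at every information set independently, so the number of pure strategies is the product of the numbers of actions available at the player’s information sets. A minimal sketch follows; the information-set labels and function names are hypothetical.

```python
from itertools import product
from math import prod


def pure_strategies(infoset_actions):
    """Enumerate one player's pure strategies in the induced normal form.

    infoset_actions maps each of the player's information sets to the list of
    actions available there; a pure strategy picks one action per set.
    """
    sets = list(infoset_actions)
    for choice in product(*(infoset_actions[h] for h in sets)):
        yield dict(zip(sets, choice))


def num_pure_strategies(infoset_actions):
    # Product of the number of actions at each information set.
    return prod(len(actions) for actions in infoset_actions.values())


# Hypothetical player with 10 information sets and 2 actions at each.
player = {f"h{t}": ["L", "R"] for t in range(10)}
print(num_pure_strategies(player))  # 2**10 = 1024
print(next(pure_strategies({"h0": ["L", "R"], "h1": ["x", "y", "z"]})))  # {'h0': 'L', 'h1': 'x'}
```

Since each information set contains at least one node of the tree, the number of information sets is at most the size of the extensive form, while the count returned here is exponential in that number.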
For imperfect-information dynamic games, the most influential compact representation is multiagent influence diagrams (MAIDs) [Koller and Milch, 2003], which generalize single-agent influence diagrams to multiple agents. A MAID is represented as a directed graph consisting of decision nodes, chance nodes and utility nodes. Each chance node corresponds to a random variable; its domain and its probability distribution conditioned on its parents (the nodes with edges into it) are specified as input. Each decision node represents a decision (over a finite number of choices) taken by some player, given her observations, which are the instantiated values of the decision node’s parents. Each utility node represents the payoff to some player, as a function of the instantiated values of the node’s parents. MAIDs are compact when players’ utility functions exhibit strict independencies, but are unable to compactly represent utility functions with anonymity or action-specific independencies. In Chapter 5 we discuss temporal action-graph games (TAGGs), which are a generalization of AGGs to the dynamic setting, and are able to compactly represent dynamic games with anonymity or context-specific utility independencies.

2.1.3 Representing Games of Incomplete Information

In many multi-agent situations, players are uncertain about the game being played. Harsanyi [1967] proposed games of incomplete information (or Bayesian games) as a mathematical model of such interactions.

Definition 2.1.3. A Bayesian game is a tuple $(N, \{A_i\}_{i\in N}, \Theta, P, \{u_i\}_{i\in N})$ where $N = \{1, \dots, n\}$ is the set of players; each $A_i$ is player $i$’s action set, and $A = \prod_i A_i$ is the set of action profiles; $\Theta = \prod_i \Theta_i$ is the set of type profiles, where $\Theta_i$ is player $i$’s set of types; $P : \Theta \to \mathbb{R}$ is the type distribution; and $u_i : A \times \Theta \to \mathbb{R}$ is the utility function for player $i$.

As in the complete-information case, we denote by $a_i$ an element of $A_i$, and by $a = (a_1, \dots, a_n)$ an action profile.
Furthermore, we denote by $\theta_i$ an element of $\Theta_i$, and by $\theta$ a type profile. The game is played as follows. A type profile $\theta = (\theta_1, \dots, \theta_n) \in \Theta$ is drawn according to the distribution $P$. Each player $i$ observes her type $\theta_i$ and, based on this observation, chooses from her set of actions $A_i$. Each player $i$’s utility is then given by $u_i(a, \theta)$, where $a$ is the resulting action profile. Intuitively, player $i$’s type represents her private information about the game. Bayesian games can be encoded as dynamic games with an initial move by Nature. Thus dynamic game representations such as the extensive form can be used to represent Bayesian games. This is also why we do not discuss dynamic games of incomplete information here: they too can be encoded using existing dynamic game representations. However, incomplete-information static games do have independent interest apart from their dynamic game interpretation, as they are more similar to complete-information static games than to dynamic games.

In specifying a Bayesian game, the space bottlenecks are the type distribution and the utility functions. Without additional structure, we cannot do better than representing each utility function $u_i : A \times \Theta \to \mathbb{R}$ as a table and the type distribution as a table as well. We call this representation the Bayesian normal form. The size of this representation is $n \prod_{i=1}^{n} (|\Theta_i| \cdot |A_i|) + \prod_{i=1}^{n} |\Theta_i|$. A Bayesian game can be converted to its induced normal form, which is a complete-information game with the same set of $n$ players, in which each player’s set of actions is her set of pure strategies in the Bayesian game (i.e., mappings from her types to her actions). Each player’s utility under an action profile is defined to be equal to the player’s expected utility under the corresponding pure strategy profile in the Bayesian game. Alternatively, a Bayesian game can be transformed to its agent form, where each type of each player in the Bayesian game is turned into one player in a complete-information game.
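The gap between the Bayesian normal form and its two complete-information interpretations can be seen with a little arithmetic. The sketch below assumes types and actions are stored explicitly (the function names are hypothetical); in the induced normal form player $i$ has $|A_i|^{|\Theta_i|}$ pure strategies, and in the agent form there are $\sum_i |\Theta_i|$ agents, each agent of player $i$ choosing from $A_i$.

```python
from math import prod


def bayesian_normal_form_size(type_counts, action_counts):
    # n * prod_i (|Theta_i| * |A_i|) utility entries + prod_i |Theta_i| probabilities.
    n = len(type_counts)
    return n * prod(t * a for t, a in zip(type_counts, action_counts)) + prod(type_counts)


def induced_normal_form_size(type_counts, action_counts):
    # Player i has |A_i| ** |Theta_i| pure strategies (one action per type).
    n = len(type_counts)
    return n * prod(a ** t for t, a in zip(type_counts, action_counts))


def agent_form_size(type_counts, action_counts):
    # One agent per (player, type) pair; each agent of player i chooses from A_i.
    return sum(type_counts) * prod(a ** t for t, a in zip(type_counts, action_counts))


type_counts, action_counts = [5, 5, 5], [2, 2, 2]  # n = 3, |Theta_i| = 5, |A_i| = 2
print(bayesian_normal_form_size(type_counts, action_counts))  # 3 * 10**3 + 5**3 = 3125
print(induced_normal_form_size(type_counts, action_counts))   # 3 * (2**5)**3 = 98304
print(agent_form_size(type_counts, action_counts))            # 15 * (2**5)**3 = 491520
```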
The sizes of the normal forms for the two complete-information interpretations are both exponential in the size of the Bayesian normal form. Singh et al. [2004] proposed an incomplete-information version of the graphical game representation. Gottlob et al. [2007] considered a similar extension of the graphical game representation. Like graphical games, such representations are limited in that they can only exploit strict utility independencies. In Chapter 6 we discuss Bayesian Action-Graph Games (BAGGs), a fully-expressive compact representation for Bayesian games that can compactly express Bayesian games exhibiting commonly encountered types of structure, including symmetry, action- and type-specific utility independence, and probabilistic independence of type distributions.

2.2 Computation of Game-theoretic Solution Concepts

Being able to compactly represent structured games is necessary, but often not sufficient for our purposes. We would like to efficiently reason about these games by computing game-theoretic solution concepts such as Nash equilibrium and correlated equilibrium.

2.2.1 Computing Sample Nash Equilibria for Normal-Form Games

In this subsection, we survey the literature on computing Nash equilibria in games represented in normal form. We start with the definition of Nash equilibrium and some theoretical results on the complexity of finding a sample Nash equilibrium, then look at existing algorithms, focusing on approaches for games with more than two players. In summary, the problem of computing one Nash equilibrium is PPAD-complete, so polynomial-time algorithms are unlikely to exist. Unsurprisingly, all existing approaches require time exponential in the size of the normal form in the worst case.

In a simultaneous-move game, a player i plays a pure strategy when she deterministically chooses an action from her action set Ai. She can also randomize over her actions, in which case we say that she plays a mixed strategy.
Formally, let ϕ(X) denote the set of all probability distributions over a set X. Define the set of mixed strategies for i as Σi ≡ ϕ(Ai); then a mixed strategy σi ∈ Σi is a probability distribution over Ai. Define the set of all mixed strategy profiles as Σ ≡ ∏i∈N Σi; then a mixed strategy profile σ ∈ Σ is a tuple of the n players’ mixed strategies. The expected utility (also known as expected payoff) of player i under the mixed strategy profile σ, denoted by ui(σ), is

$$u_i(\sigma) = \sum_{a \in A} u_i(a) \prod_{j \in N} \sigma_j(a_j), \qquad (2.2.1)$$

where σj(aj) denotes the probability that player j plays aj. The support of a mixed strategy σj is the set of actions with positive probability under the distribution σj. A support profile is a tuple of all players’ supports. Given σ−i, a tuple of mixed strategies of players other than i, we define the best response set of i to be the set of i’s mixed strategies that maximize her expected utility:

$$BR_i(\sigma_{-i}) = \arg\max_{\sigma_i \in \Sigma_i} u_i(\sigma_i, \sigma_{-i}).$$

Given σ−i, the expected utility of i playing mixed strategy σi is a convex combination of the expected utilities of playing pure strategies in Ai, so at least one of the pure strategies must be a best response. Thus to check whether σi is a best response, we just need to compare its expected utility against the expected utilities of playing each of i’s pure strategies.

One of the central solution concepts in game theory is Nash equilibrium.

Definition 2.2.1 (Nash Equilibrium). A mixed strategy profile σ is a Nash equilibrium if for all i ∈ N, σi ∈ BRi(σ−i).

Intuitively, a Nash equilibrium is strategically stable: no player can profit by unilaterally deviating from her current mixed strategy. From the above discussion on best response, an equivalent condition for Nash equilibrium is that for all i ∈ N and all ai ∈ Ai, ui(σ) ≥ ui(ai, σ−i), where, by a slight abuse of notation, we denote by (ai, σ−i) the mixed strategy profile in which i plays pure strategy ai and the other players play according to σ−i.
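Equation (2.2.1) and the pure-deviation characterization of Nash equilibrium translate directly into code. The following Python sketch assumes a dense normal-form encoding (a dictionary from action profiles to utility tuples), which is an illustrative choice rather than anything prescribed here; the helper names are hypothetical. Because it enumerates all of A, its running time is proportional to the size of the normal form.

```python
from itertools import product

def expected_utility(payoffs, sigma, i):
    """Eq. (2.2.1): sum over action profiles a of u_i(a) * prod_j sigma_j(a_j).
    payoffs maps each action profile (a tuple of action indices) to a tuple of
    utilities, one per player; sigma[j] lists player j's action probabilities."""
    total = 0.0
    for a in product(*(range(len(s)) for s in sigma)):
        prob = 1.0
        for j, a_j in enumerate(a):
            prob *= sigma[j][a_j]
        total += payoffs[a][i] * prob
    return total

def pure_deviation(sigma, i, a_i):
    """The profile (a_i, sigma_{-i}): player i plays a_i with probability 1."""
    dev = list(sigma)
    dev[i] = [1.0 if k == a_i else 0.0 for k in range(len(sigma[i]))]
    return dev

def max_deviation_gain(payoffs, sigma):
    """Largest gain any player obtains by a unilateral deviation to a pure
    strategy; since mixed deviations are convex combinations of pure ones,
    no mixed deviation can gain more."""
    gain = 0.0
    for i in range(len(sigma)):
        base = expected_utility(payoffs, sigma, i)
        for a_i in range(len(sigma[i])):
            dev_value = expected_utility(payoffs, pure_deviation(sigma, i, a_i), i)
            gain = max(gain, dev_value - base)
    return gain

# Matching pennies: player 0 wants to match, player 1 wants to mismatch.
pennies = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(max_deviation_gain(pennies, [[0.5, 0.5], [0.5, 0.5]]))  # 0.0
print(max_deviation_gain(pennies, [[1.0, 0.0], [1.0, 0.0]]))  # 2.0
```

A profile is a Nash equilibrium exactly when the returned gain is 0 (up to floating-point rounding), and the returned value is the smallest ε for which the profile is an ε-Nash equilibrium in the sense of Definition 2.2.3.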
One of the most famous results in game theory is Nash’s proof that every finite game has a Nash equilibrium [Nash, 1951]. For a tutorial on Nash’s proof (as well as a derivation of Brouwer’s fixed-point theorem, which is used by his proof), see [Jiang and Leyton-Brown, 2007b]. Although a Nash equilibrium always exists, the existence proofs do not give an efficient algorithm for finding one. The central computational problem we consider here is the problem of finding a sample Nash equilibrium:

Problem 2.2.2 (NASH). Given a game represented in normal form, find one Nash equilibrium.

McKelvey and McLennan [1996] showed that this problem can be formulated as instances of other computational problems, e.g.,
• finding a fixed point of a continuous function;
• finding a global minimum of a continuous function;
• solving a system of polynomial equations and inequalities.

A frequently-used notion of approximation for Nash equilibrium is the so-called ε-Nash equilibrium:

Definition 2.2.3 (ε-Nash Equilibrium). A mixed strategy profile σ is an ε-Nash equilibrium for some ε ≥ 0 if for all i ∈ N and all ai ∈ Ai, ui(σ) + ε ≥ ui(ai, σ−i).

Intuitively, no player can gain more than ε by deviating from her mixed strategy. When ε = 0, we recover Nash equilibrium.

Complexity

The NASH problem is different from the decision problems studied in complexity theory (e.g., SAT), which have a yes/no answer. Since a Nash equilibrium always exists, the decision problem asking about the existence of a Nash equilibrium can be solved by a trivial algorithm that always returns “yes”. Instead, we are interested in finding a Nash equilibrium. This is an example of a function problem, which requires more complex answers than yes/no. Because we can check whether a given mixed strategy profile is a Nash equilibrium by computing expected utilities, the NASH problem is in FNP, the function problem version of NP.
In fact it belongs to TFNP, the class of FNP problems whose solutions are guaranteed to exist. Another issue is that a Nash equilibrium of a game with more than two players may require irrational numbers in the probabilities, even if the game itself involves only rational payoffs. Such a solution cannot be represented exactly using floating-point numbers. Instead, in such cases we look for algorithms that, given a game and an error tolerance ε represented in binary, compute an ε-Nash equilibrium. As always, we evaluate complexity as a function of the input size, which here includes the encoding of ε.

A recent series of papers [Chen and Deng, 2006, Daskalakis et al., 2006b, Goldberg and Papadimitriou, 2006] established that the NASH problem is PPAD-complete for normal form games, even if the game has only two players. The complexity class PPAD, introduced by Papadimitriou [1994], stands for Polynomial Parity Argument (Directed version). It is the class of TFNP problems whose solutions are guaranteed by a parity argument. It is widely believed that PPAD-complete problems are unlikely to be in P [e.g., Papadimitriou, 2007].

Although every Nash equilibrium is also an ε-Nash equilibrium, a given ε-Nash equilibrium may be arbitrarily far (in the space of mixed strategy profiles) from every Nash equilibrium of the game. Etessami and Yannakakis [2007] studied the complexity of finding an ε-Nash equilibrium close to some exact Nash equilibrium. They showed that this problem is at least as hard as the square-root sum problem, which is not even known to belong to NP.

Algorithms for Two-Player Games

A two-player game is zero-sum if for all action profiles a, we have u1(a) + u2(a) = 0. For zero-sum games, Nash equilibria can be computed in polynomial time by linear programming (see, e.g., [Shoham and Leyton-Brown, 2009, von Neumann and Morgenstern, 1944]). For general two-player games, the NASH problem can be formulated as a linear complementarity problem (LCP).
The canonical method for solving such games is the Lemke-Howson Algorithm [Lemke and Howson, 1964]. Sets of labels are assigned to mixed-strategy profiles, and Nash equilibria are characterized as “completely-labeled” mixed-strategy profiles. The algorithm uses pivoting techniques similar to those of the Simplex Algorithm to trace a path that ends at a completely-labeled point (i.e., a Nash equilibrium). It is guaranteed to find a Nash equilibrium, but in the worst case it may require exponential time [Savani and von Stengel, 2004]. Lemke’s algorithm [Lemke, 1965] is a related method that uses similar pivoting techniques.

Lipton et al. [2003] used the probabilistic method to show that every two-player game has an ε-Nash equilibrium whose support size is logarithmic in the number of actions (for fixed ε). Their result implies a quasi-polynomial-time algorithm for finding an ε-Nash equilibrium. Another interesting property of two-player games is that if both payoff matrices have small rank (say k), then there exists a Nash equilibrium with small (size k) support. Such a Nash equilibrium can be found efficiently by going through the small-sized support profiles. This was discussed by Lipton et al. [2003], who noted that the result was known earlier. For bimatrix games in which the sum of the two payoff matrices has small rank, Kannan and Theobald [2009] proposed a polynomial-time algorithm for finding approximate Nash equilibria. More recently, Adsul et al. [2011] showed that if the rank of the sum of the two matrices is 1, a Nash equilibrium can be computed in polynomial time.

Fictitious Play

We now focus on algorithms for n-player games, where n > 2. We start with Fictitious Play [e.g., Brown, 1951, Shoham and Leyton-Brown, 2009], which is well known in the study of learning in games but can also be used as an algorithm for finding Nash equilibria.
It is an iterative process; at each step, each player i plays a best response assuming each of the other players j chooses a mixed strategy corresponding to the empirical distribution of j’s past actions. For certain classes of games (e.g., zero-sum games and potential games), the empirical distribution of this process converges to a Nash equilibrium. However, it is not guaranteed to converge for all games, hence it is only a heuristic for general games.

Simplicial Subdivision

One influential class of algorithms for computing Nash equilibria in n-player games is the class of simplicial subdivision algorithms, which are based on Scarf’s algorithm [1967]. A modern version is due to van der Laan, Talman & van der Heyden [1987]. At a high level, the algorithm does the following:
1. The space of mixed strategy profiles Σ = ∏i Σi is partitioned into a set of subsimplexes.
2. We assign labels to vertices of the subsimplexes in such a way that a “completely labeled” subsimplex corresponds to an approximate Nash equilibrium.
3. The algorithm follows a path of “almost completely labeled” subsimplexes, and eventually reaches a “completely labeled” subsimplex.
4. The approximate equilibrium is refined by restarting the algorithm near the approximate equilibrium, but using a finer grid.

It can be proven (using Sperner’s Lemma) that the algorithm will always find an ε-Nash equilibrium for any given ε. However, the running time is exponential; in particular, the path could go through an exponential number of subsimplexes. Within each step of the path, one of the computational bottlenecks is the computation of the labels of the subsimplex, which in turn depends on the computation of expected utilities under mixed strategy profiles.

Function Minimization

McKelvey and McLennan [1996] discussed formulating Nash equilibria as solutions of a function minimization problem.
Given a mixed strategy profile σ, let gij(σ) be the amount player i could gain by deviating to action j (and 0 if j is worse), i.e., gij(σ) = max(0, ui(j, σ−i) − ui(σ)). A Nash equilibrium then corresponds to a global minimum of the function

$$v(\sigma) = \sum_{i \in N} \sum_{j \in A_i} [g_{ij}(\sigma)]^2,$$

subject to σ being a mixed strategy profile. Note that the global minimum of v(σ) is always 0, due to the existence of Nash equilibria. Standard function minimization techniques can then be applied. In order to find a global minimum, a good starting point is essential. According to McKelvey and McLennan [1996], this approach is “generally slower than other methods”.

Homotopy Methods and the Global Newton Method

At a high level, a homotopy method starts with a game that has a simple solution, then continuously deforms the payoffs of the game until it ends at the original game of interest. Meanwhile, the method traces the path of Nash equilibria for these games, starting at a Nash equilibrium of the simple game and ending at a Nash equilibrium of the game of interest. Several homotopy methods for computing Nash equilibria have been proposed (a recent survey is [Herings and Peeters, 2009]). One such approach is Govindan and Wilson’s [2003] global Newton method (also known as the continuation method [e.g., Blum et al., 2006]), which can be thought of as a generalization of the Lemke-Howson algorithm to the n-player case. It starts at a deformed game in which one action per player is given a large bonus, such that there exists a unique equilibrium. At each iteration, it computes the direction of the next step by following a gradient. Since the path is nonlinear, the algorithm needs to periodically correct accumulated error using a local Newton method. One implementation of the algorithm is available in GameTracer [Blum et al., 2002]. The bottleneck of each iteration is the computation of the so-called payoff Jacobian matrix given a mixed strategy profile.
Entries of the Jacobian correspond to the expected utility of player i when i plays action a, player i′ plays action a′, and all other players play according to the given mixed strategy profile.

Iterated Polymatrix Approximation

Iterated Polymatrix Approximation is another algorithm proposed by Govindan and Wilson [2004]. At a high level, the algorithm can be summarized as follows.
1. Start at some strategy profile σ⁰.
2. Consider the problem linearized at σ⁰: we get a polymatrix game, which (as we will see in Section 2.2.2) can be solved using a variant of the Lemke-Howson algorithm, to find an equilibrium σ¹. The payoffs of the polymatrix game correspond to entries of the payoff Jacobian.
3. Repeat with starting point σ¹.

If this process converges, it converges to a Nash equilibrium. However, the algorithm is not guaranteed to converge. Thus, like fictitious play, it belongs to the category of heuristics. In cases of non-convergence, the authors propose using the result of the algorithm as a starting point for the Govindan-Wilson global Newton method.

Support Enumeration

Porter et al. [2008] proposed an algorithm that finds Nash equilibria by searching through support profiles. The algorithm can be summarized as follows.
1. Enumerate all support profiles, starting with small support sizes.
2. Given a support profile, determine whether there exists a Nash equilibrium having that support profile.
  • For 2-player games, this involves solving a linear feasibility program.
  • For n-player games, this involves solving a system of polynomial equations and inequalities³ of degree n − 1.
3. Stop when one equilibrium is found.

Since the number of possible support profiles is exponential in the numbers of actions, and for n-player games step 2 requires exponential time, the above algorithm has exponential worst-case complexity. Nevertheless, the motivation behind the algorithm is the observation that many games have small-support Nash equilibria.
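For the two-player case, step 2 of the support-enumeration procedure above can be sketched in Python as follows. This is a simplified illustration, not Porter et al.’s implementation: it assumes a nondegenerate game (so that equilibrium supports have equal size), solves each pair of indifference systems exactly with Fractions, and then checks nonnegativity and the absence of profitable actions outside the supports, which together amount to the linear feasibility check for that support pair. It omits the ordering and pruning refinements of Porter et al.’s algorithm, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_linear(rows, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination
    over Fractions; return None if the system is singular."""
    n = len(rows)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def indifference_mix(payoff, own, other):
    """Distribution q over the opponent's support `other` (plus a value v) that
    makes every action in `own` earn exactly v; payoff[r][c] is the utility of
    playing r against c. Requires len(own) == len(other)."""
    k = len(other)
    rows = [[payoff[r][c] for c in other] + [-1] for r in own]
    rows.append([1] * k + [0])          # probabilities sum to one
    sol = solve_linear(rows, [0] * len(own) + [1])
    return None if sol is None else (sol[:k], sol[k])

def support_enumeration(A, B):
    """Yield Nash equilibria (x, y) of the bimatrix game (A, B), trying
    equal-size support pairs in order of increasing size."""
    m, n = len(A), len(A[0])
    BT = [[B[i][j] for i in range(m)] for j in range(n)]  # player 2's view
    for k in range(1, min(m, n) + 1):
        for S1 in combinations(range(m), k):
            for S2 in combinations(range(n), k):
                ry = indifference_mix(A, S1, S2)    # y makes player 1 indifferent on S1
                rx = indifference_mix(BT, S2, S1)   # x makes player 2 indifferent on S2
                if ry is None or rx is None:
                    continue
                (qy, v1), (qx, v2) = ry, rx
                if min(qy) < 0 or min(qx) < 0:
                    continue
                x = [Fraction(0)] * m
                y = [Fraction(0)] * n
                for i, p in zip(S1, qx):
                    x[i] = p
                for j, p in zip(S2, qy):
                    y[j] = p
                # No action outside a support may do better than the support's value.
                if all(sum(A[i][j] * y[j] for j in range(n)) <= v1 for i in range(m)) and \
                   all(sum(B[i][j] * x[i] for i in range(m)) <= v2 for j in range(n)):
                    yield x, y

pennies_A = [[1, -1], [-1, 1]]
pennies_B = [[-1, 1], [1, -1]]
print(list(support_enumeration(pennies_A, pennies_B)))
```

For matching pennies, no pure support pair passes the deviation check, and the generator yields a single equilibrium in which both players mix with probability 1/2 on each action. Because it is a generator, stopping at the first equilibrium (step 3) only requires calling `next` on it.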
When such equilibria exist, the algorithm can quickly find them. Another effective speedup that Porter et al.’s algorithm employs is to prune support profiles by eliminating strategies that are dominated conditioned on the current support profile.

2.2.2 Computing Sample Nash Equilibria for Compact Representations of Static Games

So far we have focused on the NASH problem for normal form games. In this section we give an overview of the literature on the computation of Nash equilibria under compact representations. Overall, we will see that (1) for many representations the NASH problem is in PPAD, and it is PPAD-complete for fully-expressive representations; and (2) algorithms for the NASH problem can roughly be divided into two categories: “black-box” approaches, which treat the representation as a black box, and “special-purpose” approaches, which are representation-specific algorithms that exploit the structure of the representation, such as symmetry and graph-theoretic properties.

³ One may wonder why not just solve the system of polynomial equations and inequalities characterizing the Nash equilibria of the game (see Section 2.2.1). There are two reasons one might prefer to solve the support-profile-specific system here: (1) for small support profiles, the resulting systems are much smaller; (2) it is known that for generic games, the solution set of a support-profile-specific system minus all the inequality constraints has dimension zero, i.e., it consists of isolated points. This means one method for solving this system is to solve the system minus all the inequality constraints (which is a system of polynomial equations), then check the solutions against the inequality constraints. Compared to the problem of solving systems of polynomial equations and inequalities, a wider variety of algorithms are available for solving polynomial equations, including ones based on (complex) algebraic geometry such as Groebner basis methods and polynomial homotopy continuation methods.
Complexity

Fully-expressive game representations such as graphical games and AGGs can encode arbitrary normal form games. Therefore finding Nash equilibria for these representations is PPAD-hard; in other words, polynomial-time algorithms are unlikely to exist. On the other hand, Daskalakis et al. [2006a] proved the following result:

Theorem 2.2.4 ([Daskalakis et al., 2006a]). If a game representation satisfies the following properties: (1) the representation has polynomial type (defined in Section 2.1.1), and (2) expected utility can be computed using an arithmetic binary circuit with polynomial length, with nodes evaluating to constant values or performing addition, subtraction, or multiplication on their inputs, then the NASH problem for this representation can be polynomially reduced to the NASH problem for some two-player, normal-form game.

Since the NASH problem is in PPAD for two-player, normal-form games, the theorem implies that if the above properties hold, the NASH problem for such a compact game representation is in PPAD. Many of the existing representations satisfy these conditions. This is a positive result: since the NASH problem for such a compact representation reduces to NASH for a two-player game with size polynomial in the size of the compact representation, solving such a two-player game can be much easier than solving the normal form of the original game.

The above result suggests that the computation of expected utility is of fundamental importance for the NASH problem. Another example of its importance is the observation that if we can compute expected utilities, we can verify a solution of the NASH problem. We will see more applications of expected utility computation throughout this survey.

Speeding up Existing Algorithms and the Black-box Approach

Quite a few of the existing algorithms for finding Nash equilibria of normal form games use computation of expected utility as a subroutine.
Examples include Govindan and Wilson’s Global Newton Method and Iterated Polymatrix Approximation, as well as the simplicial subdivision algorithm. For many compact representations (including all compact representations introduced in Section 2.1.1), there exist efficient algorithms for computing expected utility that scale polynomially in the representation size [e.g., Papadimitriou and Roughgarden, 2008]. Using these methods instead of normal-form-based methods for the expected utility subroutine, we can achieve exponential speedups of these existing Nash equilibrium algorithms without introducing any change in the algorithms’ behavior or output. Blum et al. [2006] were the first to propose such an approach, speeding up Govindan and Wilson’s algorithms [2003, 2004] for graphical games and MAIDs. In Chapter 3 we discuss our work on speeding up Govindan and Wilson’s Global Newton Method and the simplicial subdivision algorithm for AGGs.

From a software-engineering point of view, such algorithms have a nice modular structure: an algorithm calls certain subroutines provided by the representation that access information about the game, but is otherwise unaware of the internal structure of the representation. At the same time, the representation-specific subroutines do not need to know about the details of the calling algorithm. We call such algorithms black-box algorithms. Another example of the black-box approach is the very recent adaptation of the support-enumeration approach to AGGs and graphical games [Thompson et al., 2011]. Here there are several required subroutines; one is the formulation of the polynomial system given a support profile. The polynomial system contains expressions for expected utilities, the construction of which can be thought of as symbolic computation of expected utilities. Many techniques for the expected utility problem in compact games translate to the symbolic problem.
Another subroutine is the elimination of dominated strategies conditioned on a support profile. The black-box approach is not limited to the problem of computing a sample Nash equilibrium. For example, in Section 2.2.7 we look at Papadimitriou and Roughgarden’s [2008] algorithm for the problem of computing a correlated equilibrium, which requires a polynomial-time expected utility subroutine. This is also an example of a black-box algorithm that is not a direct adaptation of an existing algorithm for the normal form.

On the other hand, specific representations may exhibit certain structure that can be exploited for efficient computation. We call these representation-specific algorithms special-purpose algorithms. Intuitively, black-box algorithms and special-purpose algorithms both exploit the compact representation’s structure, albeit at different levels: a black-box algorithm exploits structure to speed up a subroutine of the algorithm, keeping the rest of the algorithm intact across different representations, while in a special-purpose approach the entire algorithm is designed with a specific representation in mind. We now go through several representations and their corresponding special-purpose algorithms.

Polymatrix Games

Yanovskaya [1968] showed that Nash equilibria of a polymatrix game are solutions of an LCP. Such equilibria can be computed using a variant of the Lemke-Howson algorithm [Howson Jr, 1972].

Symmetric Games

As mentioned in Section 2.1.1, Nash [1951] proved that every finite symmetric game has a symmetric Nash equilibrium. The space of symmetric strategy profiles has lower dimension than the space of mixed strategy profiles, so one might expect the problem of finding symmetric Nash equilibria to be easier than NASH in the general case. Gale et al. [1950] showed that NASH for bimatrix games can be reduced to finding a symmetric Nash equilibrium of a symmetric bimatrix game.
Therefore, the recent PPAD-completeness result for bimatrix games implies that finding a symmetric Nash equilibrium is also PPAD-complete. On the other hand, for symmetric games with a large number of players but a small number of actions, Papadimitriou and Roughgarden [2005] proposed a polynomial-time algorithm for finding a symmetric Nash equilibrium. The algorithm is based on the enumeration of all symmetric support profiles and the solution of a polynomial system for each support profile.

Anonymous Games

For anonymous games, the existence of symmetric equilibria is no longer guaranteed. Thus the above algorithm for symmetric games with a small number of actions does not apply. Nevertheless, in a series of papers Daskalakis and Papadimitriou [2007, 2008, 2009] proposed polynomial-time algorithms for finding approximate Nash equilibria in anonymous games having a constant number of actions per player.

Graphical Games

Kearns et al. [2001] presented a polynomial-time algorithm for finding approximate Nash equilibria in graphical games on tree graphs. The algorithm is based on a discretization of the mixed strategy space and a message-passing approach similar to probabilistic inference algorithms for Bayesian networks. For computing approximate Nash equilibria in graphical games on general graphs, Ortiz and Kearns [2003] and Vickrey and Koller [2002] proposed several approaches based on similar ideas. Elkind et al. [2006] presented a polynomial-time algorithm for finding exact Nash equilibria in graphical games on path graphs. The problem of finding exact Nash equilibria for tree graphs is still open.

Symmetric AGGs

Besides the black-box algorithms that we discuss in Chapter 3, Daskalakis et al. [2009] presented a polynomial-time special-purpose algorithm for finding an approximate symmetric Nash equilibrium in symmetric AGGs on tree graphs. Their algorithm is based on a discretization of the space of symmetric mixed strategies and a message-passing/dynamic programming approach.
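The structure exploited in symmetric and anonymous games also illustrates the black-box idea discussed above: a player’s utility depends only on her own action and on how many of the other players choose each action, so her expected utility can be computed from the distribution of these counts rather than by summing over all of A. The following Python sketch is an illustration in that spirit, not a reproduction of any specific published algorithm. It builds the count distribution by dynamic programming, adding one player at a time; with m actions there are at most C(n − 1 + m − 1, m − 1) count vectors, which is polynomial in n for fixed m, whereas the brute-force version enumerates mⁿ profiles. The function names and the congestion-style utility are hypothetical.

```python
from collections import defaultdict
from itertools import product

def count_distribution(strategies, num_actions):
    """Distribution over count vectors (how many of the given players choose
    each action), built by adding one independent player at a time."""
    dist = {(0,) * num_actions: 1.0}
    for sigma_j in strategies:
        nxt = defaultdict(float)
        for counts, p in dist.items():
            for a, q in enumerate(sigma_j):
                if q > 0:
                    c = list(counts)
                    c[a] += 1
                    nxt[tuple(c)] += p * q
        dist = nxt
    return dist

def eu_by_counts(u, sigma, i, num_actions):
    """Expected utility of player i when u(a_i, counts) depends only on i's own
    action and on how many of the other players choose each action."""
    others = [s for j, s in enumerate(sigma) if j != i]
    dist = count_distribution(others, num_actions)
    return sum(sigma[i][a] * p * u(a, counts)
               for a in range(num_actions) for counts, p in dist.items())

def eu_brute_force(u, sigma, i, num_actions):
    """The same quantity via Eq. (2.2.1), enumerating every action profile."""
    total = 0.0
    for prof in product(range(num_actions), repeat=len(sigma)):
        prob = 1.0
        for j, a in enumerate(prof):
            prob *= sigma[j][a]
        counts = [0] * num_actions
        for j, a in enumerate(prof):
            if j != i:
                counts[a] += 1
        total += prob * u(prof[i], tuple(counts))
    return total

# Hypothetical congestion-style utility: pay 1 for yourself and 1 for each
# other player who picks the same action.
def congestion(a, counts):
    return -(counts[a] + 1)

sigma = [[0.3, 0.7], [0.5, 0.5], [0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(eu_by_counts(congestion, sigma, 0, 2), eu_brute_force(congestion, sigma, 0, 2))
```

Both functions return the same value (about −2.92 for player 0 here, up to rounding). Because they share a signature, a black-box routine such as a deviation check could call either one without knowing which representation is behind it.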
2.2.3 Computing Sample Bayes-Nash Equilibria for Incomplete-information Static Games

Bayes-Nash equilibrium is a solution concept for Bayesian games that is analogous to Nash equilibrium for complete-information games. Before giving its definition, we first need to define strategies in Bayesian games. In a Bayesian game, player i can deterministically choose a pure strategy si, in which, given each θi ∈ Θi, she deterministically chooses an action si(θi). Player i can also randomize and play a mixed strategy σi, in which her probability of choosing ai given θi is σi(ai | θi). That is, given a type θi ∈ Θi, she plays according to the distribution σi(· | θi) over her set of actions Ai. A mixed strategy profile σ = (σ1, . . . , σn) is a tuple of the players’ mixed strategies. The expected utility of i given θi under a mixed strategy profile σ is the expected value of i’s utility under the resulting joint distribution of a and θ, conditioned on i receiving type θi:

$$u_i(\sigma \mid \theta_i) = \sum_{\theta_{-i}} P(\theta_{-i} \mid \theta_i) \sum_{a \in A} u_i(a, \theta) \prod_{j \in N} \sigma_j(a_j \mid \theta_j). \qquad (2.2.2)$$

A mixed strategy profile σ is a Bayes-Nash equilibrium if for all i, for all θi, and for all ai ∈ Ai, $u_i(\sigma \mid \theta_i) \ge u_i(\sigma^{\theta_i \to a_i} \mid \theta_i)$, where $\sigma^{\theta_i \to a_i}$ is the mixed strategy profile that is identical to σ except that i plays ai with probability 1 given θi.

Computing Bayes-Nash Equilibria via Complete-information Interpretations

Harsanyi [1967] showed that a Bayesian game can be interpreted as an equivalent complete-information game in two ways, via the “induced normal form” and the “agent form” interpretations. Specifically, the Nash equilibria of these complete-information games correspond to Bayes-Nash equilibria of the Bayesian game. (A detailed description of these correspondences is given in Chapter 6.) Thus one approach is to interpret a Bayesian game as a complete-information game, enabling the use of existing Nash-equilibrium-finding algorithms.
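Equation (2.2.2) and the Bayes-Nash condition can be checked directly on a Bayesian normal form. The following Python sketch assumes a dense encoding (a dictionary P over type profiles, a utility callback, and per-type mixed strategies), all of which are illustrative choices; the function names and the example game are hypothetical. Like the normal-form computation of Eq. (2.2.1), it enumerates every type profile and action profile, so its cost grows with the size of the Bayesian normal form.

```python
from itertools import product

def bayes_expected_utility(P, u, sigma, i, theta_i, num_actions):
    """Eq. (2.2.2): u_i(sigma | theta_i).
    P maps type profiles (tuples) to probabilities; u(i, a, theta) is player i's
    utility; sigma[j][theta_j] lists player j's action probabilities given theta_j."""
    marginal = sum(p for th, p in P.items() if th[i] == theta_i)  # P(theta_i)
    total = 0.0
    for theta, p in P.items():
        if theta[i] != theta_i or p == 0:
            continue
        cond = p / marginal                                      # P(theta_{-i} | theta_i)
        for a in product(*(range(k) for k in num_actions)):
            prob = 1.0
            for j, a_j in enumerate(a):
                prob *= sigma[j][theta[j]][a_j]
            total += cond * prob * u(i, a, theta)
    return total

def is_bayes_nash(P, u, sigma, num_actions, tol=1e-9):
    """Check u_i(sigma | theta_i) >= u_i(sigma^{theta_i -> a_i} | theta_i) for every
    player i, every type theta_i with positive probability, and every a_i."""
    for i in range(len(num_actions)):
        for theta_i in {th[i] for th, p in P.items() if p > 0}:
            base = bayes_expected_utility(P, u, sigma, i, theta_i, num_actions)
            for a_i in range(num_actions[i]):
                dev = [dict(s) for s in sigma]
                dev[i][theta_i] = [1.0 if k == a_i else 0.0 for k in range(num_actions[i])]
                if bayes_expected_utility(P, u, dev, i, theta_i, num_actions) > base + tol:
                    return False
    return True

# Hypothetical example: two players with independent uniform types {0, 1};
# each earns 1 for choosing the action equal to her own type, plus 0.5 if
# the two actions coincide.
P = {theta: 0.25 for theta in product(range(2), repeat=2)}
def u(i, a, theta):
    return 1.0 * (a[i] == theta[i]) + 0.5 * (a[0] == a[1])

truthful = [{0: [1.0, 0.0], 1: [0.0, 1.0]} for _ in range(2)]
swapped = [{0: [0.0, 1.0], 1: [1.0, 0.0]} for _ in range(2)]
print(bayes_expected_utility(P, u, truthful, 0, 0, [2, 2]))  # 1.25
print(is_bayes_nash(P, u, truthful, [2, 2]), is_bayes_nash(P, u, swapped, [2, 2]))  # True False
```

The check treats each pair (i, θi) as a separate deviator, which mirrors the agent-form interpretation: a profile passes exactly when no agent of the agent form has a profitable pure deviation.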
However, as mentioned in Section 2.1.3, generating the normal form representations under both of these complete-information interpretations leads to an exponential blowup in representation size. Howson and Rosenthal [1974] applied the agent form transformation to 2-player Bayesian games, resulting in a complete-information polymatrix game, which (recall from Section 2.2.2) can be solved using a variant of the Lemke-Howson algorithm. Their approach was able to avoid the aforementioned exponential blowup because in this case the agent forms admit a more compact representation (as polymatrix games). However, for n-player Bayesian games the corresponding agent forms do not correspond to polymatrix games or any other known representation. Nevertheless, in Chapter 6 we propose a general approach for computing sample Bayes-Nash equilibria in n-player Bayesian games (and BAGGs in particular). Specifically, our approach solves the agent form of the BAGG using black-box versions of the Global Newton Method [Govindan and Wilson, 2003] and the simplicial subdivision algorithm [van der Laan et al., 1987]; instead of explicitly constructing the normal form of the agent form, we use the BAGG as a compact representation of its agent form.

Special-purpose Approaches

Singh et al. [2004] proposed an incomplete-information version of the graphical game representation, and presented efficient algorithms for computing approximate Bayes-Nash equilibria in the case of tree games. Gottlob et al. [2007] considered a similar extension of the graphical game representation and analyzed the problem of finding a pure-strategy Bayes-Nash equilibrium. Oliehoek et al. [2010] proposed a heuristic search algorithm for common-payoff Bayesian games, which has applications to cooperative multi-agent problems.

2.2.4 Computing Sample Nash Equilibria for Dynamic Games

In perfect-information extensive-form games, all information sets contain a single node.
As a result, each subtree of the extensive-form game tree forms a subgame that can be solved independently of the rest of the tree. The backward induction algorithm computes a Nash equilibrium of the game by solving subgames from the leaves to the root. The running time is linear in the size of the extensive form. Furthermore, when the game is zero-sum, it is possible to prune parts of the game tree that are not optimal. The canonical algorithm, Alpha-Beta pruning, has been influential in the design of high-performance game-playing systems for perfect-information games such as chess and checkers.

For extensive-form games with imperfect information, transforming to the induced normal form entails an exponential blowup in representation size. This is the main difficulty of the Nash equilibrium problem for dynamic games compared to the simultaneous-move case, and avoiding this exponential blowup is the focus of considerable existing literature. One common assumption is perfect recall: roughly, that each player remembers all her decisions and observations. For dynamic games with perfect recall, there always exists a Nash equilibrium in behavior strategies, where a player independently chooses a distribution over actions at each of her information sets [Kuhn, 1953]. Computationally, behavior strategies are easier to work with, since representing a behavior strategy requires space linear in the extensive form, while representing a mixed strategy (i.e., a distribution over pure strategies) requires exponential space. For MAIDs, a behavior strategy for a player entails choosing, at each of her decision nodes and for each possible instantiation of the node’s parents, a probability distribution over her choices. The sequence form formulation of Koller, Megiddo and von Stengel [1996] encodes a behavior strategy as a vector of “realization probabilities”.
Using this formulation, the Nash equilibrium problem for zero-sum dynamic games can be formulated as a linear program of size polynomial in the extensive form representation. For two-player general-sum dynamic games, using the sequence form the Nash equilibrium problem can be formulated as a linear complementarity problem (LCP) and solved using Lemke’s algorithm. For n-player games, Govindan and Wilson [2002] proposed an extension of their Global Newton Method to perfect-recall extensive-form games. As with the sequence form, strategies are encoded as realization probabilities. Daskalakis et al. [2006a] showed that the problem of finding a Nash equilibrium in behavior strategies for perfect-recall extensive-form games is in PPAD.

For compact representations, existing approaches can again be divided into black-box and special-purpose ones. Koller and Milch [2001] proposed a special-purpose approach for decomposing a MAID into subgraphs, each of which can be solved independently. As in the simultaneous-move case, the computation of expected utility is again an important subtask used by many game-theoretic computations. For example, such a subroutine can be used to run fictitious play, although (as in the simultaneous-move case) it is not guaranteed to converge. Blum et al. [2006] proposed a black-box approach for adapting Govindan and Wilson’s Global Newton Method for extensive-form games to MAIDs, by speeding up the subtask of computing the Jacobian matrix using a MAID-specific subroutine. In Chapter 5 we show that this algorithm can also be adapted to TAGGs.

2.2.5 Questions about the Set of All Nash Equilibria of a Game

So far we have focused on finding one arbitrary Nash equilibrium. Since in general a game can have more than one Nash equilibrium, we are sometimes more interested in questions about the set of all Nash equilibria.
Such problems include finding all Nash equilibria, counting the number of equilibria, and finding optimal Nash equilibria according to some objective such as social welfare, which is defined to be the sum of the players' utilities. Unsurprisingly, such problems are usually intractable in the worst case (see, e.g., [Conitzer and Sandholm, 2008]). For the problem of finding all Nash equilibria, Mangasarian [1964] proposed an algorithm for bimatrix games. More recently, Avis et al. [2010] described and implemented two algorithms for bimatrix games. Herings and Peeters [2005] proposed an algorithm that computes all Nash equilibria in an n-player normal-form game by enumerating all support profiles. Compared to the support-enumeration method for finding a sample Nash equilibrium discussed in Section 2.2.1, here the algorithm does not stop at a single Nash equilibrium but keeps going until all support profiles have been visited. At each support profile, the corresponding polynomial system is solved by either polynomial homotopy continuation or Gröbner basis methods. For the problem of computing optimal Nash equilibria, Sandholm et al. [2005] proposed and evaluated a practical approach for bimatrix games using mixed-integer programming.

2.2.6 Computing Pure-Strategy Nash Equilibria

A pure-strategy Nash equilibrium (PSNE), also known as a pure Nash equilibrium or pure equilibrium, is a pure strategy profile that is a Nash equilibrium. Equivalently:

Definition 2.2.5. An action profile $a \in A$ is a pure-strategy Nash equilibrium (PSNE) of the game $\Gamma$ if for all $i \in N$ and for all $a'_i \in A_i$, $u_i(a_i, a_{-i}) \ge u_i(a'_i, a_{-i})$.

Unlike mixed-strategy Nash equilibria, PSNEs do not always exist in a game. Nevertheless, in many ways PSNE is a more attractive solution concept than mixed-strategy Nash equilibrium. First, PSNE can be easier to justify because it does not require the players to randomize.
Second, it can be easier to analyze because of its discrete nature (see, e.g., [Brandt et al., 2009]). There are several versions of the problem of computing PSNEs: deciding whether a PSNE exists, finding one, counting the number of PSNEs, enumerating them, and finding the optimal equilibrium according to some objective (e.g., social welfare). Unlike the NASH problem, for games in normal form these problems can be solved in time polynomial in the input size, by enumerating all pure strategy profiles. Of course, since the size of the normal-form representation grows exponentially in the number of players, this is problematic in practice. We thus focus on the problem for compact representations. The problem is hard in the most general case, when utility functions are arbitrary efficiently-computable functions represented as circuits [Schoenebeck and Vadhan, 2006] or Turing machines [Alvarez et al., 2005]. This is in contrast to the NASH case, where the problems for both the normal form and fully-expressive compact representations are PPAD-complete.

Iterated Best Response

Iterated best response is well known both as a learning dynamic and as a heuristic algorithm for PSNE [e.g., Shoham and Leyton-Brown, 2009]. It is an iterative process starting at some arbitrary pure strategy profile. At each step, if some player is not playing a best response to the current pure strategy profile, that player changes her strategy to a best response. The process stops when all players are playing best responses, in which case we have reached a PSNE. A related process is iterated better response, in which a deviating player only has to pick a pure strategy that is better than her current one. These processes can be carried out for all representations that provide efficient evaluation of utilities under arbitrary pure-strategy profiles. However, like fictitious play, these processes are not guaranteed to converge for games in general.

Graphical Games

Gottlob et al.
[2005] were the first to analyze the existence problem of pure-strategy Nash equilibria in graphical games. They proved that while the problem is NP-complete in general, on games with graphs of bounded hypertree-width there exists a dynamic-programming algorithm that determines the existence of PSNE (and finds one if it exists) in time polynomial in the size of the representation. Daskalakis and Papadimitriou [2006] reduced the problem to a Markov Random Field (MRF), and then applied the standard clique tree algorithm to the resulting MRF. Among their results, they showed that for graphical games on graphs with log-sized treewidth, bounded neighborhood size and a bounded number of actions per player, the existence of pure Nash equilibria can be decided in polynomial time. Jiang and Safari [2010] analyzed the problem of deciding the existence of pure-strategy Nash equilibria for graphical games on restricted classes of graphs, and gave a complete characterization of hard and easy classes of graphical games with bounded in-degree, showing that the only tractable classes of graphs are those with bounded treewidth (after iterated removal of sinks). Daskalakis and Papadimitriou [2005] analyzed the complexity of finding pure and mixed Nash equilibria of graphical games on highly regular graphs (specifically, the d-dimensional grid) with identical local payoff functions for every player. Such games can be represented very compactly, as only the local payoff function at one neighborhood needs to be stored. They showed that finding pure-strategy Nash equilibria is tractable if d = 1 and NEXP-complete otherwise.

Symmetric Games

For symmetric games, questions about PSNE can be answered straightforwardly by checking all configurations, which requires time polynomial in the size of the representation, and polynomial in n when the number of actions is fixed. Indeed, Brandt et al.
[2009] proved that the existence problem for PSNE of symmetric games with a constant number of actions is in the complexity class $\mathrm{AC}^0$, the set of problems that can be solved by polynomial-sized constant-depth circuits with unbounded fan-in AND and OR gates. For anonymous games, efficient algorithms for PSNE have also been proposed [Brandt et al., 2009, Daskalakis and Papadimitriou, 2007]. Ryan et al. [2010] considered the problem of finding pure-strategy Nash equilibria in symmetric games whose utilities are very compactly represented, such that the number of players can be exponential in the representation size, and showed that if the utility functions are represented as piecewise-linear functions, there exist polynomial-time algorithms for finding a pure-strategy Nash equilibrium and for counting the number of equilibria.

Congestion Games

For congestion games, a PSNE always exists [Rosenthal, 1973]. Furthermore, iterated best-response dynamics always converge to a PSNE [Monderer and Shapley, 1996]. However, Fabrikant et al. [2004] showed that such dynamics may require an exponential number of steps to converge, and furthermore that the problem of finding a PSNE for congestion games is complete for the complexity class PLS (Polynomial Local Search), which implies that a polynomial-time algorithm is unlikely to exist. For singleton congestion games, where the game is symmetric and each action consists of choosing only a single resource, Ieong et al. [2005] presented a polynomial-time algorithm for finding an optimal PSNE.

AGGs

Since AGGs can compactly encode arbitrary graphical games, the existence problem is NP-complete for AGGs. Conitzer [pers. comm., 2004] and Daskalakis et al. [2009] showed that the problem is NP-complete even for symmetric AGGs. In Chapter 4 we present a dynamic programming approach for computing PSNE in AGGs.
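The observation above that PSNE questions for symmetric games can be answered by checking all configurations can be sketched in Python as follows. We assume, for illustration only, that the symmetric game is given as a function `u(action, configuration)` returning the utility of a player who chose that action when the full configuration is as given; a configuration is then a PSNE if no player on an occupied action can strictly gain by moving to another action. The function names and the toy game are hypothetical.

```python
import math

def configurations(n, k):
    """Yield every configuration: a k-tuple of nonnegative counts summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in configurations(n - first, k - 1):
            yield (first,) + rest

def num_configurations(n, k):
    """Stars and bars: polynomial in n when the number of actions k is fixed."""
    return math.comb(n + k - 1, k - 1)

def is_symmetric_psne(c, u):
    """c is a PSNE iff no player on an occupied action gains by switching actions."""
    for a, count in enumerate(c):
        if count == 0:
            continue  # nobody plays a, so nobody can deviate from it
        current = u(a, c)
        for b in range(len(c)):
            if b == a:
                continue
            moved = list(c)
            moved[a] -= 1  # the deviating player leaves a ...
            moved[b] += 1  # ... and joins b
            if u(b, tuple(moved)) > current:
                return False
    return True

def symmetric_psne(n, k, u):
    """Enumerate all configurations that are PSNE."""
    return [c for c in configurations(n, k) if is_symmetric_psne(c, u)]

# Toy congestion-style game: utility is minus the number of players on one's own action.
u = lambda action, c: -c[action]
print(symmetric_psne(3, 2, u))
```

Because only counts matter in a symmetric game, each candidate is a configuration rather than an action profile, and for a fixed number of actions the number of candidates, `num_configurations(n, k)`, grows only polynomially in n.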
For symmetric AGGs with bounded treewidth, our algorithm determines the existence of PSNE (and returns one if any exists) in polynomial time. We also show that our approach can be extended to certain classes of asymmetric AGGs.

2.2.7 Computing Correlated Equilibrium

First proposed by Aumann [1974, 1987], correlated equilibrium (CE) is another important solution concept. Whereas in a mixed-strategy Nash equilibrium players randomize independently, in a correlated equilibrium the players are allowed to coordinate their behavior based on signals from an intermediary. CE has interesting connections to the theory of online learning: the empirical distribution of no-internal-regret learning dynamics converges to the set of CE [e.g., Hart and Mas-Colell, 2000, Nisan et al., 2007].

A correlated equilibrium is defined as a distribution x over action profiles such that, when a trusted intermediary draws a strategy profile a from this distribution and privately announces to each player i her own component $a_i$, player i has no incentive to choose another strategy, assuming the others follow their suggestions. This requirement can be written as a set of linear incentive constraints on x. Combining these with the constraints that x is a distribution, the set of correlated equilibria can be formulated as a linear feasibility program with size polynomial in the size of the normal form. (A detailed description of this formulation is given in Chapter 7.) Thus it takes polynomial time in the size of the normal form to compute one CE, and indeed to compute an optimal CE according to some linear objective function. For compact representations, however, the same LP has one variable per action profile, so its size can be exponential in the input size, since a compact representation can be exponentially smaller than the normal form. Thus, the above approach is not efficient for compact representations. Another challenge is that even explicitly representing a solution vector x can take exponential space.
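The linear incentive constraints described above can be checked directly when x is given explicitly. The Python sketch below (function names hypothetical) tests whether a distribution x over the action profiles of a normal-form game is a correlated equilibrium: for every player i, recommended action $a_i$ and deviation $a'_i$, the expected gain from deviating, summed over the profiles in which i is told $a_i$, must be nonpositive. Here x is a dictionary with up to one entry per action profile, which is exactly the explicit representation that becomes infeasible for large games. The example uses a standard game of Chicken whose payoffs are chosen for illustration.

```python
from itertools import product

def is_correlated_equilibrium(action_sets, utility, x, tol=1e-9):
    """Check the CE incentive constraints for a distribution x over action profiles.

    action_sets: list of lists; action_sets[i] holds player i's actions
    utility:     function (i, profile) -> payoff of player i
    x:           dict mapping profile tuples to probabilities (missing = 0)
    """
    if any(p < -tol for p in x.values()) or abs(sum(x.values()) - 1.0) > tol:
        return False  # x must be a probability distribution
    profiles = list(product(*action_sets))
    for i, actions in enumerate(action_sets):
        for recommended in actions:
            for deviation in actions:
                if deviation == recommended:
                    continue
                # expected gain of switching recommended -> deviation, weighted by x
                gain = 0.0
                for a in profiles:
                    p = x.get(a, 0.0)
                    if a[i] != recommended or p == 0.0:
                        continue
                    a_dev = a[:i] + (deviation,) + a[i + 1:]
                    gain += p * (utility(i, a_dev) - utility(i, a))
                if gain > tol:
                    return False
    return True

# Chicken: action 0 = Dare, 1 = Chicken; PAYOFFS[profile] = (player 0's, player 1's payoff)
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}
utility = lambda i, a: PAYOFFS[a][i]
x = {(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
print(is_correlated_equilibrium([[0, 1], [0, 1]], utility, x))
```

The check loops over every action profile for every (player, recommendation, deviation) triple, so it is polynomial in the size of the normal form but, like the LP itself, exponential in the number of players.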
Thus, a compact representation for the distribution x is required. Furthermore, in order for the intermediary to be able to tractably implement such a correlated equilibrium, we also need an efficient algorithm for sampling from the distribution.

In a landmark paper, Papadimitriou and Roughgarden [2008] proposed a black-box algorithm for computing a sample CE, which runs in polynomial time when the game representation has polynomial type and when there is a polynomial-time algorithm for computing expected utility given mixed strategy profiles. The solutions are represented as mixtures of product distributions. Recently, Stein, Parrilo and Ozdaglar [2010] showed that this algorithm can fail to find an exact correlated equilibrium, but can be (easily) modified to efficiently compute approximate correlated equilibria. In Chapter 7 we present a variant of the Ellipsoid Against Hope algorithm that guarantees the polynomial-time identification of exact correlated equilibrium.

For the problem of computing the optimal CE, Papadimitriou and Roughgarden [2008] showed that the problem is NP-hard for many existing representations, and gave a sufficient condition for the problem to be tractable. They showed that symmetric games, anonymous games and graphical games on tree graphs satisfy this condition. In Chapter 8 we give a sufficient condition that generalizes Papadimitriou and Roughgarden's condition. In particular, we reduce the optimal CE problem to the deviation-adjusted social welfare problem, a combinatorial optimization problem closely related to the optimal social welfare outcome problem. This framework allows us to identify new classes of games for which the optimal CE problem is tractable, including graphical polymatrix games on tree graphs. Our algorithm can be understood as a black-box algorithm, with the deviation-adjusted social welfare problem as the required subroutine. A couple of special-purpose approaches have been proposed for graphical games.
Kakade et al. [2003] proposed an algorithm for computing a CE with maximum entropy in tree graphical games in polynomial time. More recently, Kamisetty et al. [2011] proposed a practical approach for approximating the optimal CE in graphical games.

Computing Coarse Correlated Equilibria

Coarse correlated equilibrium (CCE) [Hannan, 1957] is a solution concept closely related to CE. The difference between the two is the class of deviations they consider. Whereas CE requires that each player have no profitable deviation even if she takes into account the signal she receives from the intermediary, CCE only requires that each player have no profitable unconditional deviation. CCE is also related to online learning: the empirical distribution of no-external-regret learning dynamics converges to the set of CCE. As in the case of CE, the set of CCE can be formulated as an LP. A formal description is given in Chapter 8. Every CE is also a CCE, and hence results for the polynomial-time computation of a sample CE also apply to the computation of a sample CCE. On the other hand, since the optimal CE problem is not always tractable, the optimal CCE problem could be easier than the optimal CE problem for some representations. In Chapter 8 we show that for singleton congestion games, the optimal CCE problem can be solved in polynomial time, while the complexity of the optimal CE problem for this class of games is unknown.

Computing Extensive-form Correlated Equilibria

Recently, von Stengel and Forges [2008] proposed extensive-form correlated equilibrium (EFCE), a solution concept for perfect-recall extensive-form games that is closely related to correlated equilibrium. Recall that in an extensive-form game, each pure strategy of a player prescribes a move for each of her information sets. Like correlated equilibria, an EFCE is a distribution over pure-strategy profiles.
Whereas in a CE of the induced normal form of the game the intermediary recommends a pure strategy to each player at the start of the game, in an EFCE the intermediary recommends a move to the player only when the corresponding information set is reached. Huang and von Stengel [2008] described a polynomial-time algorithm for computing sample extensive-form correlated equilibria. Their algorithm has a structure very similar to that of Papadimitriou and Roughgarden's Ellipsoid Against Hope algorithm, and the flaws of the Ellipsoid Against Hope algorithm pointed out by Stein et al. [2010] carry over as well. As a result, the algorithm can fail to find an exact EFCE. In Chapter 7 we extend our fix for Papadimitriou and Roughgarden's Ellipsoid Against Hope algorithm to Huang and von Stengel's algorithm, allowing it to compute an exact EFCE.

2.2.8 Computing Other Solution Concepts

Other solution concepts have been proposed in the economics literature to represent different notions of rational behavior. Computer scientists have studied the corresponding computational problems, including the computation of (iterated) elimination of dominated strategies [Conitzer and Sandholm, 2005], Stackelberg equilibrium [Conitzer and Sandholm, 2006, Paruchuri et al., 2008], closed under rational behavior (CURB) sets [M. Benisch and Sandholm, 2010], and sink equilibrium [Goemans et al., 2005]. While these are interesting problems, they are not directly related to this thesis, and we refer interested readers to the papers cited above.

2.3 Software

GAMBIT [McKelvey et al., 2006] is a collection of software tools for game-theoretic analysis. It includes implementations of many of the existing algorithms for the normal form and the extensive form. It also provides a graphical user interface for creating normal-form and extensive-form games, running algorithms for computing Nash equilibria, and visualizing the resulting profiles. It is available at http://www.gambit-project.org.
Gametracer [Blum et al., 2002] provides black-box adaptations of two of Govindan and Wilson's algorithms for finding a sample Nash equilibrium: the Global Newton Method [Govindan and Wilson, 2003] and Iterated Polymatrix Approximation [Govindan and Wilson, 2004]. The algorithms are written as C++ functions that take an instance of "gnmgame", an abstract class with an abstract method⁴ for computing expected utilities.⁵ As a result, in order to apply these algorithms to a specific game representation, one merely has to implement the representation as a subclass of gnmgame. The package itself only provides a subclass for the normal-form representation. Gametracer's source code is available for download at http://dags.stanford.edu/Games/gametracer.html. It has also been adapted and incorporated into GAMBIT.

GAMUT [Nudelman et al., 2004] is a suite of game instance generators. It includes many classes of games studied in the economics and computer science literature, and parameterization options for the dimensions of the game, the types of utility functions and randomization. The stated purpose of GAMUT is to support the evaluation of game-theoretic algorithms. The main output format of GAMUT is the normal form. GAMUT is available at http://gamut.stanford.edu.

In Appendix A we describe the software tools we implemented and make available at http://agg.cs.ubc.ca. They include command-line programs for finding sample Nash equilibria in AGGs and BAGGs, a graphical user interface for creating, editing and visualizing AGGs, and extensions of GAMUT that generate AGG instances.

⁴ An abstract method in C++ is one for which only the interface is given; any subclass that is not also abstract needs to provide an implementation of the method.

⁵ Another abstract method is for computing payoff Jacobians (see Chapter 3 for the definition), which usually requires computations similar to those for expected utilities.
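The design described above, in which equilibrium algorithms access a game only through an abstract expected-utility method, can be illustrated with a short Python analogue. The class and method names below (`Game`, `expected_utility`, `NormalFormGame`, `max_regret`) are hypothetical and do not reproduce Gametracer's actual C++ interface. The generic `max_regret` routine plays the role of a black-box algorithm: it computes the smallest ε for which a mixed-strategy profile is an ε-Nash equilibrium, using nothing but calls to `expected_utility`.

```python
from abc import ABC, abstractmethod
from itertools import product

class Game(ABC):
    """Hypothetical abstract game: algorithms only call num_actions and expected_utility."""

    def __init__(self, num_players):
        self.num_players = num_players

    @abstractmethod
    def num_actions(self, i):
        ...

    @abstractmethod
    def expected_utility(self, i, mixed):
        """Expected utility of player i; mixed[j] is player j's distribution over actions."""

class NormalFormGame(Game):
    """Concrete subclass: stores one payoff tuple per pure action profile."""

    def __init__(self, action_counts, payoffs):
        super().__init__(len(action_counts))
        self.action_counts = action_counts
        self.payoffs = payoffs  # dict: profile tuple -> tuple of payoffs

    def num_actions(self, i):
        return self.action_counts[i]

    def expected_utility(self, i, mixed):
        total = 0.0
        for profile in product(*(range(m) for m in self.action_counts)):
            prob = 1.0
            for j, a in enumerate(profile):
                prob *= mixed[j][a]  # players randomize independently
            total += prob * self.payoffs[profile][i]
        return total

def max_regret(game, mixed):
    """Black-box routine: largest gain any player gets by switching to a pure strategy."""
    worst = 0.0
    for i in range(game.num_players):
        current = game.expected_utility(i, mixed)
        for a in range(game.num_actions(i)):
            pure = [1.0 if b == a else 0.0 for b in range(game.num_actions(i))]
            deviated = list(mixed)
            deviated[i] = pure
            worst = max(worst, game.expected_utility(i, deviated) - current)
    return worst

# Matching pennies: player 0 wins (+1) when the actions match, player 1 wins otherwise.
pennies = NormalFormGame([2, 2], {(a, b): (1, -1) if a == b else (-1, 1)
                                  for a in range(2) for b in range(2)})
print(max_regret(pennies, [[0.5, 0.5], [0.5, 0.5]]))
```

Supporting a new, more compact representation would only require another subclass with its own `expected_utility`; `max_regret` itself would not change, which is the point of the black-box design.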
Chapter 3

Action-Graph Games

3.1 Introduction

In this chapter we focus on complete-information simultaneous-action games. An overview of the literature on compact representations and computation of solution concepts for such games is given in Chapter 2, specifically Sections 2.1.1, 2.2.2 and 2.2.7. As we summarized in Chapter 1, the existing representations either capture only a subset of the known types of structure (anonymity, strict and action-specific independence, and additivity), or are able to represent only a subset of games. Meanwhile, the computation of expected utility has emerged as a key subtask required by many black-box algorithms for computing solution concepts.

3.1.1 Our Contributions

Action-graph games (AGGs) are a general game representation that can be understood as offering the advantages of (and, indeed, unifying) existing representations including graphical games and congestion games. Like graphical games, AGGs can represent any game, and important game-theoretic computations can be performed efficiently when the AGG representation is compact. Hence, AGGs offer a general representational framework for game-theoretic computation. Like congestion games, AGGs compactly represent context-specific independence, anonymity, and additivity, though unlike congestion games they do not require any of these. Finally, AGGs can also compactly represent many games that are not compact as either graphical games or congestion games.

We begin this chapter in Section 3.2 by defining action-graph games, including the basic representation and extensions with function nodes and additive utility functions, and characterizing their representation sizes. In Section 3.3 we provide several more examples of structured games which can be compactly represented as AGGs. Then we turn from representational to computational issues.
In Section 3.4 we present a dynamic programming algorithm for computing an agent's expected utility under an arbitrary mixed-strategy profile, analyze its complexity, and explore several elaborations. In Section 3.5 we show that (as a corollary of the polynomial complexity of our expected utility algorithm) the problem of finding an ε-Nash equilibrium of an AGG is in PPAD: this is a positive result, as AGGs can be exponentially smaller than normal-form games. We also show how to use our dynamic programming algorithm to speed up existing methods for computing sample ε-Nash and ε-correlated equilibria. Finally, in Section 3.6 we present the results of extensive experiments with some of these algorithms, demonstrating that AGGs can feasibly be used to reason about interesting games that were inaccessible to any previous techniques. The largest game that we tackled in our experiments had 20 agents and 13 actions per agent; we found its Nash equilibrium in 14.3 minutes. A normal-form representation of this game would have involved $9.4 \times 10^{134}$ numbers, requiring an outrageous $7.5 \times 10^{126}$ gigabytes even to store.

Finally, let us describe the relationship between this chapter and past work on AGGs. Leyton-Brown and Tennenholtz [2003] introduced local-effect games, which can be understood as symmetric AGGs in which utility functions are required to satisfy a particular linearity property. Bhat and Leyton-Brown [2004] introduced the basic AGG representation and some of the computational ideas for reasoning with them. The dynamic programming algorithm was first proposed in Jiang and Leyton-Brown [2006], as was the idea of function nodes. An extended version of that paper appeared as Chapter 2 of my MSc thesis [Jiang, 2006]. The current chapter is based on the journal publication [Jiang et al., 2011], which substantially elaborates upon and extends the representations and methods from these earlier papers.
Specifically, [Jiang et al., 2011] introduced the additive structure model and the encoding of congestion games, several of the examples, our computational methods for k-symmetric games and for additive structure, our speedup of the simplicial subdivision algorithm, and all experiments presented in this chapter (Section 3.6).

3.2 Action Graph Games

This section has three parts, each of which defines a different AGG variant. In Section 3.2.1 we define the basic AGG representation (which we dub AGG-∅), characterize its representation size, and show how it can be used to represent normal-form, graphical, and symmetric games. In Section 3.2.2 we introduce the idea of function nodes, show how AGGs with function nodes (AGG-FNs) can capture additional structure in several example games, and show how to represent anonymous games as AGG-FNs. In Section 3.2.3 we introduce AGG-FNs with additive structure (AGG-FNAs), which compactly represent additive structure in the utility functions of AGGs, and show how congestion games can be succinctly written as AGG-FNAs.

3.2.1 Basic Action Graph Games

We begin with an intuitive description of basic action-graph games. Consider a directed graph with nodes $\mathcal{A}$ and edges $E$, and a set of agents $N = \{1, \ldots, n\}$. Each agent $i \in N$ is given an identical token. To play the game, each agent i simultaneously places her token on a node $a_i \in A_i$, where $A_i \subseteq \mathcal{A}$. Each node in the graph thus corresponds to an action choice that is available to one or more of the agents; this is where action-graph games get their name. Each agent's utility is calculated according to an arbitrary function of the node she chose and the numbers of tokens placed on the nodes that neighbor that chosen node in the graph. We will argue below that any simultaneous-move game can be represented in this way, and that action-graph games are often much more compact than games represented in other ways. We now turn to a formal definition of basic action-graph games.
Let $N = \{1, \ldots, n\}$ be the set of agents. Central to our model is the action graph.

Definition 3.2.1 (Action graph). An action graph $G = (\mathcal{A}, E)$ is a directed graph where:
• $\mathcal{A}$ is the set of nodes. We call each node $\alpha \in \mathcal{A}$ an action, and $\mathcal{A}$ the set of distinct actions. For each agent $i \in N$, let $A_i$ be the set of actions available to i, with $\mathcal{A} = \bigcup_{i \in N} A_i$.¹ We denote by $a_i \in A_i$ one of agent i's actions. An action profile (or pure strategy profile) is a tuple $a = (a_1, \ldots, a_n)$. Denote by $A$ the set of action profiles. Then $A = \prod_{i \in N} A_i$, where $\prod$ is the Cartesian product.
• $E$ is a set of directed edges, where self edges are allowed. We say $\alpha'$ is a neighbor of $\alpha$ if there is an edge from $\alpha'$ to $\alpha$, i.e., $(\alpha', \alpha) \in E$. Let the neighborhood of $\alpha$, denoted $\nu(\alpha)$, be the set of neighbors of $\alpha$, i.e., $\nu(\alpha) \equiv \{\alpha' \in \mathcal{A} \mid (\alpha', \alpha) \in E\}$.

Given an action graph and a set of agents, we can further define a configuration, which is a feasible arrangement of agents across nodes in an action graph.

Definition 3.2.2 (Configuration). Given an action graph $(\mathcal{A}, E)$ and a set of action profiles $A$, a configuration $c$ is a tuple of $|\mathcal{A}|$ non-negative integers $(c(\alpha))_{\alpha \in \mathcal{A}}$, where $c(\alpha)$ is interpreted as the number of agents who chose action $\alpha \in \mathcal{A}$, and where there exists some $a \in A$ that would give rise to $c$. Denote the set of all configurations as $C$. Let $\mathcal{C}: A \to C$ be the function that maps from an action profile $a$ to the corresponding configuration $c$. Formally, if $c = \mathcal{C}(a)$ then $c(\alpha) = |\{i \in N : a_i = \alpha\}|$ for all $\alpha \in \mathcal{A}$.

We can also restrict a configuration to a given node's neighborhood.

Definition 3.2.3 (Configuration over a neighborhood). Given a configuration $c \in C$ and a node $\alpha \in \mathcal{A}$, let the configuration over the neighborhood of $\alpha$, denoted $c^{(\alpha)}$, be the restriction of $c$ to $\nu(\alpha)$, i.e., $c^{(\alpha)} = (c(\alpha'))_{\alpha' \in \nu(\alpha)}$.
Similarly, let $C^{(\alpha)}$ denote the set of configurations over $\nu(\alpha)$ in which at least one player plays $\alpha$.² Let $\mathcal{C}^{(\alpha)}: A \to C^{(\alpha)}$ be the function which maps from an action profile to the corresponding configuration over $\nu(\alpha)$.

¹ Different agents' action sets $A_i$, $A_j$ may (partially or completely) overlap. The implications of this will become clear once we define the utility functions.

² If action $\alpha$ is in multiple players' action sets (say players i, j), and these action sets do not completely overlap, then it is possible that the set of configurations given that i played $\alpha$ (denoted $C^{(\alpha,i)}$) is different from the set of configurations given that j played $\alpha$. $C^{(\alpha)}$ is the union of these sets of configurations.

Now we can state the formal definition of basic action-graph games as follows.

Definition 3.2.4 (Basic action-graph game). A basic action-graph game (AGG-∅) is a tuple $(N, A, G, u)$ where
• $N$ is the set of agents;
• $A = \prod_{i \in N} A_i$ is the set of action profiles;
• $G = (\mathcal{A}, E)$ is an action graph, where $\mathcal{A} = \bigcup_{i \in N} A_i$ is the set of distinct actions;
• $u = (u^\alpha)_{\alpha \in \mathcal{A}}$ is a tuple of $|\mathcal{A}|$ functions, where each $u^\alpha: C^{(\alpha)} \to \mathbb{R}$ is the utility function for action $\alpha$. Semantically, $u^\alpha(c^{(\alpha)})$ is the utility of an agent who chose $\alpha$, when the configuration over $\nu(\alpha)$ is $c^{(\alpha)}$.

For notational convenience, we define $u(\alpha, c^{(\alpha)}) \equiv u^\alpha(c^{(\alpha)})$ and $u_i(a) \equiv u(a_i, \mathcal{C}^{(a_i)}(a))$. We also define $A_{-i} \equiv \prod_{j \neq i} A_j$ as the set of action profiles of agents other than i, and denote an element of $A_{-i}$ by $a_{-i}$.

Example: Ice Cream Vendors

The following example helps to illustrate the elements of the AGG-∅ representation, and also exhibits context-specificity and anonymity in utility functions. This example would not be compact under the existing game representations discussed in the introduction. It was inspired by Hotelling [1929], and elaborates an example used in Leyton-Brown and Tennenholtz [2003].

Example 3.2.5 (Ice Cream Vendor game).
Consider a setting in which n vendors sell ice cream or strawberries, and must choose one of four locations along a beach. There are three kinds of vendors: $n_I$ ice cream vendors, $n_S$ strawberry vendors, and $n_W$ vendors who can sell both ice cream and strawberries, but only on the west side. Ice cream (strawberry) vendors are negatively affected by the presence of other ice cream (strawberry) vendors in the same or neighboring locations, and are simultaneously positively affected by the presence of nearby strawberry (ice cream) vendors.

The AGG-∅ representation of this game is illustrated in Figure 3.1. As always, nodes represent actions and directed edges represent membership in a node's neighborhood. The dotted boxes represent the action sets for each group of players; for example, the ice cream vendors have action set $A_I$. Note that this game exhibits context-specific independence without any strict independence, and that the graph structure is independent of n.

[Figure 3.1: AGG-∅ representation of the Ice Cream Vendor game. Action nodes I1 to I4 (ice cream) and S1 to S4 (strawberries); action sets $A_I$, $A_S$ and $A_W$.]

Size of an AGG-∅ Representation

Intuitively, AGG-∅s capture two types of structure in games:
1. Shared actions capture the game's anonymity structure: agent i's utility depends only on her action $a_i$ and the configuration. Thus, agent i cares about the number of players that play each action, but not the identities of those players.
2. The (lack of) edges between nodes in the action graph expresses context-specific independencies of utilities of the game: for all $i \in N$, if i chose action $\alpha \in \mathcal{A}$, then i's utility depends only on the configuration over the neighborhood of $\alpha$. In other words, the configuration over actions not in $\nu(\alpha)$ does not affect i's utility.

We have claimed informally that action graph games provide a way of representing games compactly. But what exactly is the size of an AGG-∅ representation, and how does it grow with the number of agents n?
In this subsection we give a bound on the size of an AGG-∅, and show that asymptotically it is never worse than the size of the equivalent normal form.

From Definition 3.2.4 we observe that to completely specify an AGG-∅ we need to specify (1) the set of agents, (2) each agent's set of actions, (3) the action graph, and (4) the utility functions. The first three can easily be compactly represented:
1. The set of agents $N = \{1, \ldots, n\}$ can be specified by the integer n.
2. The set of actions $\mathcal{A}$ can be specified by the integer $|\mathcal{A}|$. Each agent's action set $A_i \subseteq \mathcal{A}$ can be specified in $O(|\mathcal{A}|)$ space.
3. The action graph $G = (\mathcal{A}, E)$ can be straightforwardly represented as neighbor lists: for each node $\alpha \in \mathcal{A}$ we specify its list of neighbors $\nu(\alpha) \subseteq \mathcal{A}$. The space required is $\sum_{\alpha \in \mathcal{A}} |\nu(\alpha)|$, which is bounded by $|\mathcal{A}|\mathcal{I}$, where $\mathcal{I} = \max_\alpha |\nu(\alpha)|$, i.e., the maximum in-degree of G.

We observe that whereas the first three components of an AGG-∅ $(N, A, G, u)$ can always be represented in space polynomial in n and $|A_i|$, the size of the utility functions is worst-case exponential. So the size of the utility functions determines whether an AGG-∅ can be tractably represented. Indeed, for the rest of this chapter we will refer to the number of payoff values stored as the representation size of the AGG-∅. The following proposition gives an upper bound on the number of payoff values stored.

Proposition 3.2.6. Given an AGG-∅, the number of payoff values stored by its utility functions is at most $|\mathcal{A}| \frac{(n-1+\mathcal{I})!}{(n-1)!\,\mathcal{I}!}$. If $\mathcal{I}$ is bounded by a constant as n grows, the number of payoff values is $O(|\mathcal{A}| n^{\mathcal{I}})$, i.e., polynomial with respect to n.

Proof. For each utility function $u^\alpha: C^{(\alpha)} \to \mathbb{R}$, we need to specify a utility value for each distinct configuration $c^{(\alpha)} \in C^{(\alpha)}$. The set of configurations $C^{(\alpha)}$ can be derived from the action graph, and can be sorted in lexicographical order.
Thus, we can just specify a list of $|C^{(\alpha)}|$ utility values that correspond to the (ordered) set of configurations.³ In general there is no closed-form expression for $|C^{(\alpha)}|$, the number of distinct configurations over $\nu(\alpha)$. Instead, we consider the operation of extending all agents' action sets via $\forall i: A_i \to \mathcal{A}$. The number of configurations over $\nu(\alpha)$ under the new action sets is an upper bound on $|C^{(\alpha)}|$. This is the number of (ordered) combinatorial compositions of $n-1$ (since one player has already chosen $\alpha$) into $|\nu(\alpha)|+1$ nonnegative integers, which is $\binom{n-1+|\nu(\alpha)|}{|\nu(\alpha)|} = \frac{(n-1+|\nu(\alpha)|)!}{(n-1)!\,|\nu(\alpha)|!}$. Then the total space required for the utilities is bounded from above by $|\mathcal{A}| \frac{(n-1+\mathcal{I})!}{(n-1)!\,\mathcal{I}!}$. If $\mathcal{I}$ is bounded by a constant as n grows, this grows like $O(|\mathcal{A}| n^{\mathcal{I}})$.

For each AGG-∅, there exists a unique induced normal-form representation with the same set of players and $|A_i|$ actions for each i; its utility function is a matrix that specifies each player i's payoff for each possible action profile $a \in A$. This implies a space complexity of $n \prod_{i=1}^{n} |A_i|$. When $|A_i| \ge 2$ for all i, the size of the induced normal-form representation grows exponentially with respect to n. On the other hand, we observe that the number of payoff values stored in an AGG-∅ representation is always less than or equal to the number of payoff values in the induced normal-form representation. Of course, the AGG-∅ representation has the extra overhead of representing the action graph, which is bounded by $|\mathcal{A}|\mathcal{I}$. But this overhead is dominated by the size of the induced normal form, $n \prod_j |A_j|$. Thus, an AGG-∅'s asymptotic space complexity is never worse than that of its induced normal-form game.

It is also possible to describe a reverse transformation that encodes any arbitrary game in normal form as an AGG-∅. Specifically, a unique node $a_i$ must be created for each action available to each agent i. Thus $\forall \alpha \in \mathcal{A}$, $c(\alpha) \in \{0, 1\}$, and $\forall i$, $\sum_{\alpha \in A_i} c(\alpha)$ must equal 1.
The configuration simply indicates each agent's action choice, and expresses no anonymity or context-specific independence structure. This representation is no more or less compact than the normal form. More precisely, the number of distinct configurations over $\nu(a_i)$ is the number of action profiles of the other players, which is $\prod_{j\neq i}|A_j|$. Since $i$ has $|A_i|$ actions, $\prod_j|A_j|$ payoff values are needed to represent $i$'s payoffs. So in total $n\prod_j|A_j|$ payoff values are stored, exactly the number in the normal form.

³ This is the most compact way of representing the utility functions, but it does not provide easy random access to the utilities. Therefore, when we want to do computation using AGGs, we may convert each utility function $u^\alpha$ to a data structure that efficiently implements a mapping from sequences of integers to (floating-point) numbers (e.g., tries, hash tables or red-black trees), with space complexity $O(I|C^{(\alpha)}|)$.

Figure 3.2: AGG-∅ representation of a 3-player, 3-action graphical game.

One might ask whether AGG-∅s can compactly represent known classes of structured games. Consider the graphical game representation as defined in Definition 2.1.2. Graphical games can be represented as AGG-∅s by replacing each node $i$ in the graphical game by a distinct cluster of nodes $A_i$ representing the action set of agent $i$. If the graphical game has an edge from $i$ to $j$, edges must be created in the AGG-∅ so that $\forall a_i\in A_i$, $\forall a_j\in A_j$, $a_i\in\nu(a_j)$. The resulting AGG-∅s are as compact as the original graphical games. Figure 3.2 shows the AGG-∅ representation of a graphical game having three nodes and two edges (i.e., player 1 and player 3 do not directly affect each other's payoffs).

Another important class of structured games are symmetric games as defined in Section 2.1.1. An arbitrary symmetric game can be encoded as an AGG-∅ without an increase in asymptotic size. Specifically, let $A_i = \mathcal{A}$ for all $i \in N$.
The resulting action graph is a clique, i.e., $\nu(\alpha) = \mathcal{A}$ for all $\alpha\in\mathcal{A}$.

3.2.2 AGGs with Function Nodes

There are games with certain kinds of context-specific independence structures that AGG-∅s are not able to exploit (see, e.g., Example 3.2.7 below). In this section we extend the AGG-∅ representation by introducing function nodes, allowing us to exploit a much wider variety of utility structures. Of course, as always, compact representation is not interesting as an end in itself. In Section 3.4.2 we identify broad subclasses of AGG-FNs (indeed, rich enough to encompass all AGG-FN examples presented in this chapter) which are amenable to efficient computation.

Examples: Coffee Shops and Parity

Example 3.2.7 (Coffee Shop game). Consider a game involving $n$ players; each player plans to open a coffee shop in a downtown area, represented by an $r\times k$ grid. Each player can choose to open a shop located within any of the $B \equiv rk$ blocks or decide not to enter the market. Conditioned on player $i$ choosing some location $\alpha$, her utility depends on the numbers of players who chose (i) the same block; (ii) any of the surrounding blocks; and (iii) any other location.

The normal form representation of this game has size $n|\mathcal{A}|^n = n(B+1)^n$. Since there are no strict independencies in the utility function, the asymptotic size of the graphical game representation is the same. Let us now represent the game as an AGG-∅. We observe that if agent $i$ chooses an action $\alpha$ corresponding to one of the $B$ locations, then her payoff is affected by the configuration over all $B$ locations. Hence, $\nu(\alpha)$ must consist of $B$ action nodes corresponding to the $B$ locations, and so the action graph has in-degree $I = B$. Since the action sets completely overlap, the representation size is $\Theta(|\mathcal{A}||C^{(\alpha)}|) = \Theta\left(B\frac{(n-1+B)!}{(n-1)!\,B!}\right)$. If we hold $B$ constant, this becomes $\Theta(Bn^B)$, which is exponentially more compact than the normal form and the graphical game representation.
If we instead hold $n$ constant, the size of the representation is $\Theta(B^n)$, which is only slightly better than the normal form and graphical game representations. Intuitively, the AGG-∅ representation is able to exploit anonymity structure in this game. However, this game's payoff function also has context-specific structure that the AGG-∅ does not capture. Observe that $u^\alpha$ depends only on three quantities: the number of players who chose the same block, the number of players who chose an adjacent block, and the number of players who chose another location. In other words, $u^\alpha$ can be written as a function $g$ of only three integers: $u^\alpha(c^{(\alpha)}) = g\left(c(\alpha), \sum_{\alpha'\in\mathcal{A}'}c(\alpha'), \sum_{\alpha''\in\mathcal{A}''}c(\alpha'')\right)$, where $\mathcal{A}'$ is the set of actions surrounding $\alpha$ and $\mathcal{A}''$ the set of actions corresponding to other locations. The AGG-∅ representation is not able to exploit this context-specific information, and so duplicates some utility values. There exist many similar examples in which the utility functions $u^\alpha$ can be expressed as functions of a small number of intermediate parameters. Here we give one more.

Example 3.2.8 (Parity game). In a "parity game", each $u^\alpha$ depends only on whether the number of agents at neighboring nodes is even or odd, as follows:
$$u^\alpha = \begin{cases} 1 & \text{if } \sum_{\alpha'\in\nu(\alpha)} c(\alpha') \bmod 2 = 0,\\ 0 & \text{otherwise.}\end{cases}$$
Observe that in the Parity game $u^\alpha$ can take just two distinct values; however, the AGG-∅ representation must specify a value for every configuration $c^{(\alpha)}$.

Definition of AGG-FNs

Structure such as that in Examples 3.2.7 and 3.2.8 can be exploited within the AGG framework by introducing function nodes to the action graph $G$; intuitively, we use them to describe intermediate parameters upon which players' utilities depend. Now $G$'s vertices consist of both the set of action nodes $\mathcal{A}$ and the set of function nodes $P$, i.e., $G = (\mathcal{A}\cup P, E)$. We require that no function node $p\in P$ can be in any player's action set: $\mathcal{A}\cap P = \emptyset$. Thus, the total number of nodes in $G$ is $|\mathcal{A}|+|P|$.
Each node in $G$ can have action nodes and/or function nodes as neighbors. We associate a function $f^p: C^{(p)}\to\mathbb{R}$ with each $p\in P$, where $c^{(p)}\in C^{(p)}$ denotes configurations over $p$'s neighbors. The configurations $c$ are extended to include the function nodes by the definition $c(p)\equiv f^p(c^{(p)})$. If $p\in P$ has no neighbors, $f^p$ is a constant function. To ensure that the AGG is meaningful, the graph $G$ restricted to nodes in $P$ is required to be a directed acyclic graph (DAG). This condition ensures that for all $\alpha$ and $p$, $c^{(\alpha)}$ and $c(p)$ are well defined. To ensure that every $p\in P$ is "useful", we also require that $p$ has at least one outgoing edge. As before, for each action node $\alpha$ we define a utility function $u^\alpha: C^{(\alpha)}\to\mathbb{R}$. We call this extended representation an Action Graph Game with Function Nodes (AGG-FN), and define it formally as follows.

Definition 3.2.9 (AGG-FN). An Action Graph Game with Function Nodes (AGG-FN) is a tuple $(N, A, P, G, f, u)$, where:
• $N$ is the set of agents;
• $A = \prod_{i\in N}A_i$ is the set of action profiles;
• $P$ is a finite set of function nodes;
• $G = (\mathcal{A}\cup P, E)$ is an action graph, where $\mathcal{A} = \bigcup_{i\in N}A_i$ is the set of distinct actions. We require that the restriction of $G$ to the nodes $P$ is acyclic and that for every $p\in P$ there exists an $m\in\mathcal{A}\cup P$ such that $(p,m)\in E$;
• $f$ is a tuple $(f^p)_{p\in P}$, where each $f^p: C^{(p)}\to\mathbb{R}$ is an arbitrary mapping from configurations over the neighbors of $p$ to real numbers;
• $u$ is a tuple $(u^\alpha)_{\alpha\in\mathcal{A}}$, where each $u^\alpha: C^{(\alpha)}\to\mathbb{R}$ is the utility function for action $\alpha$.

Given an AGG-FN, we can construct an equivalent AGG-∅ with the same players $N$ and actions $A$ and equivalent utility functions, but without any function nodes. We call this the induced AGG-∅ of the AGG-FN. There is an edge from $\alpha'$ to $\alpha$ in the induced AGG-∅ either if there is an edge from $\alpha'$ to $\alpha$ in the AGG-FN, or if there is a path from $\alpha'$ to $\alpha$ through a chain consisting entirely of function nodes.
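To make the acyclicity requirement concrete, here is a small Python sketch (an illustration, not code from the thesis) that computes $c(m)$ for every node of a hypothetical AGG-FN. Action-node counts come straight from the action profile, and function nodes are evaluated in a topological order of the function-node subgraph, which exists precisely because that subgraph is a DAG. It uses the standard-library `graphlib` module (Python 3.9+). The node names `x`, `y`, `z`, `p`, `q` and both functions $f^p$ are made up.

```python
from collections import Counter
from graphlib import TopologicalSorter

# Hypothetical AGG-FN with action nodes x, y, z and function nodes p, q.
ACTIONS = ("x", "y", "z")
# nu(p) for each function node p: may mix action nodes and function nodes.
FN_NEIGHBOURS = {"p": ("x", "y"), "q": ("p", "z")}
# f^p maps the configuration c^(p) (a tuple ordered like nu(p)) to a number.
FN = {
    "p": lambda cfg: sum(cfg),               # a simple aggregator
    "q": lambda cfg: (cfg[0] + cfg[1]) % 2,  # parity of c(p) + c(z)
}


def node_values(profile):
    """Return c(m) for every action node and every function node m."""
    counts = Counter(profile)
    c = {alpha: counts[alpha] for alpha in ACTIONS}
    # Only function-node predecessors constrain the evaluation order.
    deps = {p: [m for m in nbrs if m in FN] for p, nbrs in FN_NEIGHBOURS.items()}
    # static_order() raises graphlib.CycleError if the function nodes form a cycle.
    for p in TopologicalSorter(deps).static_order():
        c[p] = FN[p](tuple(c[m] for m in FN_NEIGHBOURS[p]))
    return c


print(node_values(("x", "y", "z", "x")))
# {'x': 2, 'y': 1, 'z': 1, 'p': 3, 'q': 0}
```

Because `q` reads `p`, the sorter always emits `p` first. A utility function $u^\alpha$ could then read its configuration $c^{(\alpha)}$ from the returned dictionary in the same way.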
From the definition of AGG-FNs, the utility of playing action $\alpha$ is uniquely determined by the configuration $c^{(\alpha)}$, which is uniquely determined by the configuration over the actions that are neighbors of $\alpha$ in the induced AGG-∅. As a result, the utility tables of the induced AGG-∅ can be filled in unambiguously. We observe that the number of utility values stored in an AGG-FN is no greater than the number of utility values in the induced AGG-∅. On the other hand, AGG-FNs have to represent the functions $f^p$ for each $p\in P$. In the worst case, these functions can be represented as explicit mappings similar to the utility functions $u^\alpha$. However, it is often possible to define these functions algebraically by combining elementary operations, as we do in most of the examples given in this chapter. In this case the functions' representations require a negligible amount of space.

Representation Size

What is the size of an AGG-FN $(N, A, P, G, f, u)$? The following proposition gives a sufficient condition for the representation size to be polynomial. Here we speak about a class of AGG-FNs because our statement is about the asymptotic behavior of the representation size. This is in contrast to Proposition 3.2.6, where we gave an exact bound on the size of an individual AGG-∅.

Proposition 3.2.10. A class of AGG-FNs has representation size bounded by a function polynomial in $n$, $|\mathcal{A}|$ and $|P|$ if the following conditions hold:
1. for all function nodes $p\in P$, the size of $p$'s range $|R(f^p)|$ is bounded by a function polynomial in $n$, $|\mathcal{A}|$ and $|P|$; and
2. $\max_{m\in\mathcal{A}\cup P}|\nu(m)|$ (the maximum in-degree in the action graph) is bounded by a constant.

Proof. Given an AGG-FN $(N, A, P, G, f, u)$, it is straightforward to check that all components except $u$ and $f$ are polynomial in $n$, $|\mathcal{A}|$ and $|P|$.

First, consider an action node $\alpha\in\mathcal{A}$. Recall that the size of the utility function $u^\alpha$ is $|C^{(\alpha)}|$.
Partition $\nu(\alpha)$, the set of $\alpha$'s neighbors, into $\nu_{\mathcal{A}}(\alpha) = \nu(\alpha)\cap\mathcal{A}$ and $\nu_P(\alpha) = \nu(\alpha)\cap P$ (neighboring action nodes and function nodes, respectively). Since $c(\alpha')\in\{0,\dots,n\}$ for each action $\alpha'\in\nu_{\mathcal{A}}(\alpha)$, and $c(p')\in R(f^{p'})$ for each $p'\in\nu_P(\alpha)$, we have $|C^{(\alpha)}| \le (n+1)^{|\nu_{\mathcal{A}}(\alpha)|}\prod_{p\in\nu_P(\alpha)}|R(f^p)|$. This is polynomial because all action node in-degrees are bounded by a constant and, by condition 1, each factor $|R(f^p)|$ is polynomial.

Now consider a function node $p\in P$. Without loss of generality, assume that its function $f^p$ is represented explicitly as a mapping. (Any other representation of $f^p$ can be transformed into this explicit representation.) The representation size of $f^p$ is then $|C^{(p)}|$. Using the same reasoning as above, we have $|C^{(p)}| \le (n+1)^{|\nu_{\mathcal{A}}(p)|}\prod_{q\in\nu_P(p)}|R(f^q)|$, which is polynomial since all function node in-degrees are bounded by a constant.

When the functions $f^p$ do not have to be represented explicitly, we can drop the requirement on the in-degree of function nodes.

Corollary 3.2.11. A class of AGG-FNs has representation size bounded by a function polynomial in $n$, $|\mathcal{A}|$ and $|P|$ if the following conditions hold:
1. for all function nodes $p\in P$, the function $f^p$ has a representation whose size is polynomial in $n$, $|\mathcal{A}|$ and $|P|$;
2. for each function node $p\in P$ that is a neighbor of some action node $\alpha$, the size of $p$'s range $|R(f^p)|$ is bounded by a function polynomial in $n$, $|\mathcal{A}|$ and $|P|$; and
3. $\max_{\alpha\in\mathcal{A}}|\nu(\alpha)|$ (the maximum in-degree among action nodes) is bounded by a constant.

A very useful type of function node is the simple aggregator.

Definition 3.2.12 (Simple aggregator). A function node $p\in P$ is a simple aggregator if all of its neighbors $\nu(p)$ are action nodes and $f^p$ is the summation function: $f^p(c^{(p)}) = \sum_{m\in\nu(p)}c(m)$.

Simple aggregator function nodes take the value of the total number of players who chose any of the node's neighbors. Since these functions can be specified in constant space, and since $R(f^p) = \{0,\dots,n\}$ for all $p$, Corollary 3.2.11 applies.
That is, the representation sizes of AGG-FNs whose function nodes are all simple aggregators are polynomial whenever the in-degrees of action nodes are bounded by a constant. In fact, under certain assumptions we can prove an even tighter bound on the representation size, analogous to Proposition 3.2.6 for AGG-∅s. Intuitively, this works because both configurations on action nodes and configurations on simple aggregators count the numbers of players who behave in certain ways.

Proposition 3.2.13. Consider a class of AGG-FNs whose function nodes are all simple aggregators. For each $m\in\mathcal{A}\cup P$, define the function
$$\beta(m) = \begin{cases}\{m\} & m\in\mathcal{A},\\ \nu(m) & \text{otherwise.}\end{cases}$$
Intuitively, $\beta(m)$ is the set of nodes whose counts are aggregated by node $m$. If for each $\alpha\in\mathcal{A}$ and for each $m, m'\in\nu(\alpha)$, $\beta(m)\cap\beta(m') = \emptyset$ unless $m = m'$ (i.e., no action node affects $\alpha$ in more than one way), then the AGG-FNs' representation sizes are bounded by $|\mathcal{A}|\binom{n-1+I}{I}$, where $I = \max_{\alpha\in\mathcal{A}}|\nu(\alpha)|$ is the maximum in-degree of action nodes.

Proof. Consider the utility function $u^\alpha$ for an arbitrary action $\alpha$. Each neighbor $m\in\nu(\alpha)$ is either an action or a simple aggregator. Observe that a configuration $c^{(\alpha)}\in C^{(\alpha)}$ is a tuple of integers specifying the numbers of players choosing each action in the set $\beta(m)$ for each $m\in\nu(\alpha)$. As in the proof of Proposition 3.2.6, we extend each player's set of actions to $\mathcal{A}$, making the game symmetric. This weakly increases the number of configurations. Since the sets $\beta(m)$ are non-overlapping, the number of configurations possible in the extended action space is equal to the number of (ordered) combinatorial compositions of $n-1$ into $|\nu(\alpha)|+1$ nonnegative integers, which is $\binom{n-1+|\nu(\alpha)|}{|\nu(\alpha)|}$. This includes one bin for each action or simple aggregator in $\nu(\alpha)$, plus one bin for agents that take an action that is neither in $\nu(\alpha)$ nor in the neighborhood of any simple aggregator in $\nu(\alpha)$. Then the total space required for representing $u$ is bounded by $|\mathcal{A}|\binom{n-1+I}{I}$, where $I = \max_{\alpha\in\mathcal{A}}|\nu(\alpha)|$.

Figure 3.3: A 5 × 6 Coffee Shop game. Left: the AGG-∅ representation without function nodes (looking at only the neighborhood of $\alpha$). Middle: we introduce two function nodes, $p'$ (bottom) and $p''$ (top). Right: $\alpha$ now has only 3 neighbors.

Consider the Coffee Shop game from Example 3.2.7. For each action node $\alpha$ corresponding to a location, we introduce two simple aggregator function nodes, $p'_\alpha$ and $p''_\alpha$. Let $\nu(p'_\alpha)$ be the set of actions surrounding $\alpha$, and $\nu(p''_\alpha)$ be the set of actions corresponding to other locations. Then we set $\nu(\alpha) = \{\alpha, p'_\alpha, p''_\alpha\}$, as shown in Figure 3.3. Now each $c^{(\alpha)}$ is a configuration over only three nodes. Since each $f^p$ is a simple aggregator, Corollary 3.2.11 applies and the size of this AGG-FN is polynomial in $n$ and $|\mathcal{A}|$. In fact, since the game is symmetric and the $\beta(\cdot)$'s as defined in Proposition 3.2.13 are non-overlapping, we can calculate the exact value of $|C^{(\alpha)}|$ as the number of compositions of $n-1$ into four nonnegative integers, $\frac{(n+2)!}{(n-1)!\,3!} = n(n+1)(n+2)/6 = O(n^3)$. We must therefore store $Bn(n+1)(n+2)/6 = O(Bn^3)$ utility values. This is significantly more compact than the AGG-∅ representation, which has a representation size of $O\left(B\frac{(n-1+B)!}{(n-1)!\,B!}\right)$.

We can represent the parity game from Example 3.2.8 in a similar way. For each action $\alpha$ we create a function node $p_\alpha$, and let $\nu(p_\alpha) = \nu(\alpha)$. We then modify $\nu(\alpha)$ so that it has only one member, $p_\alpha$. For each function node $p$ we define $f^p$ as $f^p(c^{(p)}) = \sum_{\alpha\in\nu(p)}c(\alpha) \bmod 2$. Since $R(f^p) = \{0,1\}$, Corollary 3.2.11 applies. In fact, each utility function just needs to store two values, and so the representation size is $O(|\mathcal{A}|)$ plus the size of the action graph.

3.2.3 AGG-FNs with Additive Structure

So far we have assumed that the utility functions $u^\alpha: C^{(\alpha)}\to\mathbb{R}$ are represented explicitly, i.e., by specifying the payoffs for all $c^{(\alpha)}\in C^{(\alpha)}$.
This is not the only way to represent a mapping; the utility functions could be defined as analytical functions, decision trees, logic programs, circuits, or even arbitrary algorithms. These alternative representations might be more natural for humans to specify, and in many cases are more compact than the explicit representation. However, this extra compactness does not always allow us to reason more efficiently with the games. In this section, we look at utility functions with additive structure. These functions can be represented compactly and do allow more efficient computation.

Definition of AGG-FNs with Additive Structure

We say that a multivariate function has additive structure if it can be written as a (weighted) sum of functions of subsets of the variables. This form is more compact because we only need to represent the summands, which have lower dimensionality than the entire function. We extend the AGG-FN representation by allowing $u^\alpha$ to be represented as a weighted sum of the configuration of the neighbors of $\alpha$.⁴

Definition 3.2.14. A utility function $u^\alpha$ of an AGG-FN is additive if for all $m\in\nu(\alpha)$ there exist $\lambda_m\in\mathbb{R}$ such that
$$u^\alpha(c^{(\alpha)}) \equiv \sum_{m\in\nu(\alpha)}\lambda_m c(m). \tag{3.2.1}$$
Such an additive utility function can be represented as the tuple $(\lambda_m)_{m\in\nu(\alpha)}$. This is a very versatile representation of additivity, because the neighbors of $\alpha$ can be function nodes. Thus additive utility functions can represent weighted sums of arbitrary functions of configurations over action nodes. We now formally define an AGG-FN representation where some of the utility functions are additive.

⁴ Such a utility function could also be represented using standard function nodes representing summation. However, we treat the common case of additivity separately because it is amenable to special-purpose computational methods (intuitively, leveraging the linearity of expectation; see Section 3.4.3).

Definition 3.2.15.
An AGG-FN with additive structure (AGG-FNA) is a tuple $(N, A, P, G, f, \mathcal{A}^+, \Lambda, u)$ where $N, A, P, G, f$ are as defined in Definition 3.2.9, and
• $\mathcal{A}^+\subseteq\mathcal{A}$ is the set of actions whose utility functions are additive;
• $\Lambda = (\lambda^{\alpha})_{\alpha\in\mathcal{A}^+}$, where each $\lambda^{\alpha} = (\lambda^{\alpha}_m)_{m\in\nu(\alpha)}$ is the tuple of coefficients representing the additive utility function $u^{\alpha}$;
• $u = (u^\alpha)_{\alpha\in\mathcal{A}\setminus\mathcal{A}^+}$, where each $u^\alpha$ is as defined in Definition 3.2.9. These are the non-additive utility functions of the game, which are represented explicitly.

Representation Size

We only need $|\nu(\alpha)|$ numbers to represent the coefficients of an additive utility function $u^\alpha$, whereas the explicit representation requires $|C^{(\alpha)}|$ numbers. Of course we also need to take into account the sizes of the neighboring function nodes $p\in\nu(\alpha)$ and their corresponding functions $f^p$, which represent the summands of the additive functions. Each $f^p$ either has a simple description requiring negligible space, or is represented explicitly as a mapping. In the latter case its size can be analyzed the same way as utility functions on action nodes. That is, when the neighbors of $p$ are all actions then Proposition 3.2.6 applies; otherwise the discussion in Section 3.2.2 applies.

Representing Congestion Games as AGG-FNAs

An arbitrary congestion game can be encoded as an AGG-FNA with no loss of compactness, where all $u^\alpha$ are represented as additive utility functions. Given a congestion game $(N, M, (A_i)_{i\in N}, (K_{jk})_{j\in M, k\le n})$ as defined in Definition 2.1.1, we construct an AGG-FNA with the same number of players and same number of actions for each player as follows.
• Create $\sum_{i\in N}|A_i|$ action nodes, corresponding to the actions in the congestion game. In other words, the action sets do not overlap.
• Create $2m$ function nodes, where $m = |M|$, labeled $(p_1,\dots,p_m,q_1,\dots,q_m)$. For each $j\in M$, there is an edge from $p_j$ to $q_j$.
For all $j\in M$ and for all $\alpha\in\mathcal{A}$, if facility $j$ is included in action $\alpha$ in the congestion game, then in the action graph there is an edge from the action node $\alpha$ to $p_j$, and also an edge from $q_j$ to $\alpha$.
• For each $p_j$, define $c(p_j)\equiv\sum_{\alpha\in\nu(p_j)}c(\alpha)$, i.e., $p_j$ is a simple aggregator. Since its neighbors are the actions that include facility $j$, $c(p_j)$ is the number of players that chose facility $j$, which is $\#(j,a)$.
• Assign each $q_j$ only one neighbor, namely $p_j$, and define $c(q_j)\equiv f^{q_j}(c(p_j))\equiv K_j(c(p_j))$. In other words, $c(q_j)$ is exactly $K_j(\#(j,a))$, the cost on facility $j$.
• For each action node $\alpha$, represent the utility function $u^\alpha$ as an additive function with weight $-1$ for each of its neighbors:
$$u^\alpha(c^{(\alpha)}) = \sum_{q_j\in\nu(\alpha)}-c(q_j) = -\sum_{q_j\in\nu(\alpha)}K_j(\#(j,a)). \tag{3.2.2}$$

Figure 3.4: Left: a two-player congestion game with three facilities. The actions are shown as ovals containing their respective facilities. Right: the AGG-FNA representation of the same congestion game.

Figure 3.5: AGG-∅ representation of the Job Market game.

Example 3.2.16 (Congestion game). Consider the AGG-FNA representation of a two-player congestion game (see Figure 3.4). The congestion game has three facilities labeled $\{1,2,3\}$. Player A has actions $A1 = \{1\}$ and $A2 = \{1,2\}$; Player B has actions $B1 = \{2,3\}$ and $B2 = \{3\}$.

Now let us consider the representation size of this AGG-FNA. The action graph has $|\mathcal{A}|+2m$ nodes and $O(m|\mathcal{A}|)$ edges; the function nodes $p_1,\dots,p_m$ are simple aggregators and each only requires constant space; each $f^{q_j}$ requires $n$ numbers to specify, so the total size of the AGG-FNA is $\Theta(mn+m|\mathcal{A}|) = \Theta(mn+m\sum_{i\in N}|A_i|)$. Thus this AGG-FNA representation has the same space complexity as the original congestion game representation.
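The congestion-game construction can be checked on Example 3.2.16. The Python sketch below builds the node values $c(p_j)$ and $c(q_j)$ as in the bullet list, evaluates the additive utility with weight $-1$ on each $q_j$, and compares the result with the direct congestion-game cost for every pure profile. The cost tables $K_j(1), K_j(2)$ are invented for illustration (Figure 3.4 does not give them), and the function names are hypothetical.

```python
from itertools import product

# Example 3.2.16: facilities {1, 2, 3}; each action is a set of facilities.
FACILITIES = {"A1": {1}, "A2": {1, 2}, "B1": {2, 3}, "B2": {3}}
ACTION_SETS = (("A1", "A2"), ("B1", "B2"))  # player A, player B
# Hypothetical costs: K[j][k - 1] is K_j(k), the cost of facility j with k users.
K = {1: [1.0, 3.0], 2: [2.0, 5.0], 3: [1.0, 4.0]}


def agg_fna_utility(alpha, profile):
    # p_j: simple aggregator over the action nodes that contain facility j.
    c_p = {j: sum(1 for a in profile if j in FACILITIES[a]) for j in K}
    # q_j: single neighbour p_j, with f^{q_j}(c(p_j)) = K_j(c(p_j)).
    # u^alpha is only evaluated when someone plays alpha, so c(p_j) >= 1 whenever
    # q_j is read; the value used for c(p_j) = 0 is just a placeholder.
    c_q = {j: (K[j][c_p[j] - 1] if c_p[j] > 0 else 0.0) for j in K}
    # u^alpha is additive: weight -1 on each neighbouring q_j (j in alpha).
    weights = {j: -1.0 for j in FACILITIES[alpha]}
    return sum(w * c_q[j] for j, w in weights.items())


def congestion_utility(alpha, profile):
    """Direct definition: minus the summed cost K_j(#(j, a)) of alpha's facilities."""
    load = {j: sum(j in FACILITIES[a] for a in profile) for j in K}
    return -sum(K[j][load[j] - 1] for j in FACILITIES[alpha])


for profile in product(*ACTION_SETS):
    for alpha in profile:
        assert agg_fna_utility(alpha, profile) == congestion_utility(alpha, profile)
print(agg_fna_utility("A2", ("A2", "B1")))  # -6.0, i.e. -(K_1(1) + K_2(2))
```

Only the $n$ numbers per facility in `K` and the $-1$ weights are stored, which mirrors the $\Theta(mn + m|\mathcal{A}|)$ size count above.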
One extension of congestion games is player-specific congestion games [Milchtaich, 1996, Monderer, 2007]. Instead of all players having the same costs $K_{jk}$, in these games each player has a different set of costs. This can be easily represented as an AGG-FNA by following the construction above, but using a different set of function nodes $q^i_1,\dots,q^i_m$ for each player $i$.

3.3 Further Examples

In this section we provide several more examples of structured games that can be compactly represented as AGGs.

3.3.1 A Job Market

Here we describe a class of example games that can be compactly represented as AGG-∅s. Unlike the Ice Cream Vendor game, the following example does not involve choosing among actions that correspond to geographical locations.

Example 3.3.1 (Job Market game). Consider individuals competing in a job market. Each player chooses a field of study and a level of education to achieve. The utility of player $i$ is the sum of two terms: (a) a constant cost depending only on the chosen field and education level, capturing the difficulty of studies and the cost of tuition and forgone wages; and (b) a variable reward, depending on (i) the number of players who chose the same field and education level as $i$, (ii) the number of players who chose a related field at the same education level, and (iii) the number of players who chose the same field at one level above or below $i$.

Figure 3.5 gives an action graph modeling one such job market scenario, in which there are three fields: Economics, Computer Science and Electrical Engineering. For each field there are four levels of postsecondary study: Diploma, Bachelor, Master and PhD. Economics and Computer Science are considered related fields, and so are Computer Science and Electrical Engineering. There is another action representing high school education, which does not require a choice of field.
The maximum in-degree of the action graph is five, whereas a naive representation of the game as a symmetric game (see Section 3.2.1) would correspond to a complete action graph with in-degree 13. Thus this AGG-∅ representation is able to take advantage of anonymity as well as context-specific independence structure.

3.3.2 Representing Anonymous Games as AGG-FNs

One property of the AGG-∅ representation as defined in Section 3.2.1 is that utility function $u^\alpha$ is shared by all players who have $\alpha$ in their action sets. What if we want to represent games with agent-specific utility functions, where utilities depend not only on $\alpha$ and $c^{(\alpha)}$, but also on the identity of the player playing $\alpha$? As mentioned in Section 2.1.1, researchers have studied anonymous games, which deviate from symmetric games by allowing agent-specific utility functions [Daskalakis and Papadimitriou, 2007, Kalai, 2004, 2005]. To represent games of this type as AGGs, we cannot just let multiple players share action $\alpha$, because that would force those players to have the same utility function $u^\alpha$. It does work to give agents non-overlapping action sets, replicating each action once for each agent. However, the resulting AGG-∅ is not compact; it does not take advantage of the fact that each of the replicated actions affects other players' utilities in the same way.

Using function nodes, it is possible to compactly represent this kind of structure. We again split $\alpha$ into separate action nodes $\alpha_i$ for each player $i$ able to take the action. Now we also introduce a function node $p$ with every $\alpha_i$ as a neighbor, and define $f^p$ to be a simple aggregator. Now $p$ gives the total number of agents who chose action $\alpha$, expressing anonymity, and action nodes include $p$ as a neighbor instead of each $\alpha_i$. This allows agents to have different utility functions without sacrificing representational compactness.

Figure 3.6: AGG-FN representation of a game with agent-specific utility functions.
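The construction can be illustrated with a small Python sketch. Everything in it is hypothetical: three players, each with her own copy `("go", i)` of a shared action plus a private outside option `("stay", i)`, and invented agent-specific payoff tables. Each player's payoff for going reads only the simple aggregator $c(p)$, so the game stays anonymous even though the utility tables differ from player to player.

```python
N = 3
# Hypothetical game: player i's action set is {("go", i), ("stay", i)}, so the
# shared action "go" is split into one action node per player.
ACTION_SETS = [(("go", i), ("stay", i)) for i in range(N)]
# Function node p: a simple aggregator whose neighbours are all the ("go", i).
P_NEIGHBOURS = [("go", i) for i in range(N)]
# Agent-specific utility tables for ("go", i): GO_TABLE[i][c(p) - 1].
GO_TABLE = [[5.0, 3.0, 1.0], [4.0, 4.0, 0.0], [6.0, 2.0, -1.0]]


def c_p(profile):
    """Simple aggregator: number of players whose action is some ("go", i)."""
    return sum(1 for a in profile if a in P_NEIGHBOURS)


def payoff(i, profile):
    kind, owner = profile[i]
    if owner != i:
        raise ValueError("each player may only use her own action nodes")
    if kind == "stay":
        return 0.0  # ("stay", i) has no neighbours, so its utility is constant
    return GO_TABLE[i][c_p(profile) - 1]  # reads only c(p), not who chose "go"


# Player 0's payoff is the same whichever other player joins her,
# yet players 0 and 1 value the same crowd size differently.
a = (("go", 0), ("go", 1), ("stay", 2))
b = (("go", 0), ("stay", 1), ("go", 2))
print(payoff(0, a), payoff(0, b), payoff(1, a))  # 3.0 3.0 4.0
```

Each `("go", i)` node stores at most $n$ values, one per possible value of $c(p)$, instead of one value per joint choice of the replicated action nodes.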
Example 3.3.2 (Anonymous game). Consider an anonymous game with two classes of players, each class sharing the same utility functions. The AGG-FN representation of the game is shown in Figure 3.6. Players from the first class have action set $\{A1, A2, A3\}$, and players from the second class have action set $\{B1, B2, B3\}$. Furthermore, the utility functions of the second class of players exhibit certain context-specific independence structure, which is expressed by the absence of some of the possible edges from function nodes to action nodes $B1$, $B2$, $B3$.

3.3.3 Representing Polymatrix Games as AGG-FNAs

A polymatrix game (defined in Section 2.1.1) can be compactly represented as an AGG-FNA. The encoding is as follows. The AGG-FNA has non-overlapping action sets. For each pair of players $(i,j)$, we create two function nodes to represent $i$'s and $j$'s payoffs under the bimatrix game between them. Each of these function nodes has incoming edges from all of $i$'s and $j$'s actions. For each player $i$ and each of his actions $a_i$, there are incoming edges from the $n-1$ function nodes representing $i$'s payoffs in his bimatrix games against each of the other players. The utility function $u^{a_i}$ is an additive utility function with weights equal to 1. Based on arguments similar to those in Section 3.2.1, this AGG-FNA representation has the same space complexity as the total size of the bimatrix games.

Figure 3.7: AGG-FNA representation of a 3-player polymatrix game. Function node $U_{AB}$ represents player A's payoffs in his bimatrix game against B, $U_{BA}$ represents player B's payoffs in his bimatrix game against A, and so on. To avoid clutter we do not show the edges from the action nodes to the function nodes in this graph. Such edges exist from A's and B's actions to $U_{AB}$ and $U_{BA}$, from A's and C's actions to $U_{AC}$ and $U_{CA}$, and from B's and C's actions to $U_{BC}$ and $U_{CB}$.

Example 3.3.3 (Polymatrix game).
Consider the AGG-FNA representation of a three-player polymatrix game, given in Figure 3.7. Each player's payoff is the sum of her payoffs in the $2\times 2$ games played with each of the other players; she is only able to choose her action once. This additive utility function can be captured by introducing a function node $U_{ij}$ to represent each player $i$'s utility in the bimatrix game played with player $j$.

3.3.4 Congestion Games with Action-Specific Rewards

So far the only use we have shown for AGG-FNAs is bringing existing game representations into the AGG framework. Of course, another key advantage of our approach is the ability to compactly represent games that would not have been compact under these existing game representations. We now give such an example.

Example 3.3.4 (Congestion game with action-specific rewards). Consider the following game with $n$ players. As in a congestion game, there is a set of facilities $M$, each action involves choosing a subset of the facilities, and the cost for facility $j$ depends only on the number of players that chose facility $j$. Now further assume that, in addition to the cost of using the facilities, each player $i$ also derives some utility $R_i$ depending only on her own action $a_i$, i.e., the set of facilities she chose. This utility is not necessarily additive across facilities. That is, if $A, B\subset M$ and $A\cap B = \emptyset$, then $R_i(A\cup B)$ need not equal $R_i(A)+R_i(B)$. So $i$'s total utility is
$$u_i(a) = R_i(a_i) - \sum_{j\in a_i}K_j(\#(j,a)). \tag{3.3.1}$$
This game can model a situation in which the players use the facilities to complete a task, and the utility of the task depends on the facilities chosen. Another interpretation is given by Ben-Sasson et al. [2006], in their analysis of "congestion games with strategy costs," which also have exactly this type of utility function. This work interpreted (the negative of) $R_i(a_i)$ as the computational cost of choosing the pure strategy $a_i$ in a congestion game.
Due to the extra $R_i(a_i)$ term in the utility expression (3.3.1), this game cannot be directly represented as a congestion game or a player-specific congestion game,⁵ but it can be compactly represented as an AGG-FNA. We create $\sum_i|A_i|$ action nodes, giving the agents non-overlapping action sets. We have shown in Section 3.2.3 that we can use function nodes and additive utility functions to represent the congestion-game-like costs. Beyond this construction, we just need to create a function node $r_i$ for each player $i$ and define $c(r_i)$ to be equal to $R_i(a_i)$; each of $i$'s action nodes then has $r_i$ as an additional neighbor, with weight $+1$ in its additive utility function. The neighbors of $r_i$ are $i$'s entire action set: $\nu(r_i) = A_i$. Since the action sets do not overlap, there are only $|A_i|$ distinct configurations over $A_i$. In other words, $|C^{(r_i)}| = |A_i|$ and we need only $O(|A_i|)$ space to represent each $R_i$. The total size of the representation is $O(mn+m\sum_{i\in N}|A_i|)$.

⁵ Interestingly, Ben-Sasson et al. [2006] showed that this game belongs to the set of potential games, which implies that there exists an equivalent congestion game. However, building such a congestion game from the potential function following Monderer and Shapley's [1996] construction yields an exponential number of facilities, meaning that this congestion game representation is exponentially larger than the AGG-FNA representation presented here.

3.4 Computing Expected Payoff with AGGs

Up to this point, we have concentrated on how AGGs may be used to compactly represent games of interest. But compact representation is only half the story, and indeed by itself is relatively easy to achieve. Our goal is to identify a compact representation that can be used directly (e.g., without conversion to its induced normal form) for the computation of game-theoretic quantities of interest. We now turn to this computational perspective, and show that we can indeed leverage AGGs' representational compactness in the computation of game-theoretic quantities.
In this section we focus on the computational task of computing an agent's expected payoff under a mixed strategy profile. As we discussed in Section 2.2, this task is important as an inner-loop problem in the computation of many game-theoretic quantities, including Govindan and Wilson's [2003, 2004] algorithms for finding Nash equilibria, the simplicial subdivision algorithm for finding Nash equilibria [van der Laan et al., 1987], and Papadimitriou and Roughgarden's [2008] algorithm for finding correlated equilibria. We discuss some of these applications in Section 3.5.

Our main result of this section is an algorithm that efficiently computes expected payoffs of AGGs by exploiting their context-specific independence, anonymity and additivity structure. In Section 3.4.1 we introduce our expected payoff algorithm for AGG-∅s, and show (in Theorem 3.4.1) that the algorithm runs in time polynomial in the size of the input AGG-∅. For the special case of symmetric strategies in symmetric AGG-∅s, we present a different algorithm in Section 3.4.1 which runs asymptotically faster than our general algorithm for AGG-∅s; in Section 3.4.1 we extend this approach to the broader class of k-symmetric AGG-∅s. Finally, in Sections 3.4.2 and 3.4.3 we extend our expected payoff algorithm to AGG-FNs and AGG-FNAs respectively, and identify (in Theorems 3.4.5 and 3.4.6) conditions under which these extended algorithms run in polynomial time.

3.4.1 Computing Expected Payoff for AGG-∅s

Following the notation of Section 2.2, we denote a mixed strategy of $i$ by $\sigma_i\in\Sigma_i$, a mixed-strategy profile by $\sigma\in\Sigma$, and the probability that $i$ plays action $\alpha$ as $\sigma_i(\alpha)$. Now we can write the expected utility to agent $i$ for playing pure strategy $a_i$, given that all other agents play the mixed strategy profile $\sigma_{-i}$, as
$$V^i_{a_i}(\sigma_{-i}) \equiv \sum_{a_{-i}\in A_{-i}}u_i(a_i,a_{-i})\Pr(a_{-i}\mid\sigma_{-i}), \tag{3.4.1}$$
$$\Pr(a_{-i}\mid\sigma_{-i}) \equiv \prod_{j\neq i}\sigma_j(a_j). \tag{3.4.2}$$
Note that Equation (3.4.2) gives the probability of $a_{-i}$ under the mixed strategy $\sigma_{-i}$. In the rest of this section we focus on the problem of computing $V^i_{a_i}(\sigma_{-i})$ given $i$, $a_i$ and $\sigma_{-i}$. Having established the machinery to compute $V^i_{a_i}(\sigma_{-i})$, we can then compute the expected utility of player $i$ under a mixed strategy profile $\sigma$ as $\sum_{a_i\in A_i}\sigma_i(a_i)V^i_{a_i}(\sigma_{-i})$.

One might wonder why Equations (3.4.1) and (3.4.2) are not the end of the story. Notice that Equation (3.4.1) is a sum over the set $A_{-i}$ of action profiles of players other than $i$. The number of terms is $\prod_{j\neq i}|A_j|$, which grows exponentially in $n$. If we were to use the normal form representation, there really would be $|A_{-i}|$ different outcomes to consider, each with potentially distinct payoff values. Thus, for the normal form, direct evaluation of Equation (3.4.1) would be the best possible algorithm for computing $V^i_{a_i}$. Since AGGs are fully expressive, the same is true for games without any structure represented as AGGs. However, what about games that are exponentially more compact when represented as AGGs than when represented in the normal form? For these games, evaluating Equation (3.4.1) amounts to an exponential-time algorithm.

In this section we present an algorithm that, given any $i$, $a_i$ and $\sigma_{-i}$, computes the expected payoff $V^i_{a_i}(\sigma_{-i})$ in time polynomial in the size of the AGG-∅ representation. In other words, our algorithm is efficient if the AGG-∅ is compact, and requires time exponential in $n$ if it is not. In particular, recall from Proposition 3.2.6 that any AGG-∅ with maximum in-degree bounded by a constant has a representation size that is polynomial in $n$. As a result our algorithm is polynomial in $n$ for such games.

Exploiting Context-Specific Independence: Projection

First, we consider how to take advantage of the context-specific independence structure of an AGG-∅: the fact that $i$'s payoff when playing $a_i$ only depends on configurations over the neighborhood of $a_i$.
The key idea is that we can project other players' strategies onto a smaller action space that is strategically the same from the point of view of an agent who chose action $a_i$. That is, we construct a graph from the point of view of a given agent, expressing his sense that actions that do not affect the payoff of his chosen action are in a sense the "same action." This can be seen as inducing a context-specific graphical game. Formally, for every action $\alpha \in \mathcal{A}$ define a reduced graph $G^{(\alpha)}$ by including only the nodes $\nu(\alpha)$ and a new node denoted $\emptyset$. The only edges included in $G^{(\alpha)}$ are the directed edges from each of the nodes $\nu(\alpha)$ to the node $\alpha$. Player $j$'s action $a_j$ is projected to a node $a_j^{(\alpha)}$ in the reduced graph $G^{(\alpha)}$ by the mapping

$$a_j^{(\alpha)} \equiv \begin{cases} a_j & a_j \in \nu(\alpha) \\ \emptyset & a_j \notin \nu(\alpha) \end{cases}. \tag{3.4.3}$$

In other words, actions that are not in $\nu(\alpha)$ (and therefore do not affect the payoffs of agents playing $\alpha$) are projected onto a new action, $\emptyset$. The resulting projected action set $A_j^{(\alpha)}$ has cardinality at most $\min(|A_j|, |\nu(\alpha)| + 1)$. This is illustrated in Figure 3.8, using the Ice Cream Vendor game described in Example 3.2.5.

We define the set of mixed strategies on the projected action set $A_j^{(\alpha)}$ by $\Sigma_j^{(\alpha)} \equiv \varphi(A_j^{(\alpha)})$. A mixed strategy $\sigma_j$ on the original action set $A_j$ is projected to $\sigma_j^{(\alpha)} \in \Sigma_j^{(\alpha)}$ by the mapping

$$\sigma_j^{(\alpha)}\big(a_j^{(\alpha)}\big) \equiv \begin{cases} \sigma_j(a_j) & a_j \in \nu(\alpha) \\ \sum_{\alpha' \in A_j \setminus \nu(\alpha)} \sigma_j(\alpha') & a_j^{(\alpha)} = \emptyset \end{cases}. \tag{3.4.4}$$

So given $a_i$ and $\sigma_{-i}$, we can compute $\sigma_{-i}^{(a_i)}$ in $O(n|\mathcal{A}|)$ time in the worst case. Now we can operate entirely on the projected space, and write the expected payoff as

$$V^i_{a_i}(\sigma_{-i}) = \sum_{a_{-i}^{(a_i)} \in A_{-i}^{(a_i)}} u_i\big(a_i, C^{(a_i)}(a_i, a_{-i}^{(a_i)})\big) \Pr\big(a_{-i}^{(a_i)} \mid \sigma_{-i}^{(a_i)}\big),$$

$$\Pr\big(a_{-i}^{(a_i)} \mid \sigma_{-i}^{(a_i)}\big) = \prod_{j \neq i} \sigma_j^{(a_i)}\big(a_j^{(a_i)}\big).$$

The summation is over $A_{-i}^{(a_i)}$, which in the worst case has $(|\nu(a_i)| + 1)^{(n-1)}$ terms.

Figure 3.8: Projection of the action graph. Left: action graph of the Ice Cream Vendor game.
Right: projected action graph and action sets with respect to the action C1.

So for AGG-∅s with strict or context-specific independence structure, computing $V^i_{a_i}(\sigma_{-i})$ in this way is exponentially faster than doing the summation in (3.4.1) directly. However, the time complexity of this approach is still exponential in $n$.

Exploiting Anonymity: Summing over Configurations

Next, we want to take advantage of the anonymity structure of the AGG-∅. Recall from our discussion of representation size that the number of distinct configurations is usually smaller than the number of distinct pure action profiles. So ideally, we want to compute the expected payoff $V^i_{a_i}(\sigma_{-i})$ as a sum over the possible configurations, weighted by their probabilities:

$$V^i_{a_i}(\sigma_{-i}) = \sum_{c^{(a_i)} \in C^{(a_i,i)}} u_i\big(a_i, c^{(a_i)}\big) \Pr\big(c^{(a_i)} \mid \sigma^{(a_i)}\big), \tag{3.4.5}$$

$$\Pr\big(c^{(a_i)} \mid \sigma^{(a_i)}\big) = \sum_{a:\, C^{(a_i)}(a) = c^{(a_i)}} \; \prod_{j=1}^{n} \sigma_j(a_j), \tag{3.4.6}$$

where $\sigma^{(a_i)} \equiv (a_i, \sigma_{-i}^{(a_i)})$ and $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$ is the probability of $c^{(a_i)}$ given the mixed strategy profile $\sigma^{(a_i)}$. Recall that $C^{(a_i,i)}$ is the set of configurations over $\nu(a_i)$ given that $i$ played $a_i$. So Equation (3.4.5) is a summation of size $|C^{(a_i,i)}|$, the number of configurations given that $i$ played $a_i$, which is polynomial in $n$ if $|\nu(a_i)|$ is bounded by a constant. The difficult task is to compute $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$ for all $c^{(a_i)} \in C^{(a_i,i)}$, i.e., the probability distribution over $C^{(a_i,i)}$ induced by $\sigma^{(a_i)}$. We observe that the sum in Equation (3.4.6) is over the set of all action profiles corresponding to the configuration $c^{(a_i)}$. The size of this set is exponential in the number of players. Therefore, directly computing the probability distribution using Equation (3.4.6) would take time exponential in $n$.

Can we do better? We observe that the players' mixed strategies are independent, i.e., $\sigma$ is a product probability distribution $\sigma(a) = \prod_i \sigma_i(a_i)$. Also, each player affects the configuration $c$ independently.
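As a point of reference, a direct evaluation of Equations (3.4.5) and (3.4.6) can be sketched in Python as follows. It enumerates every pure profile of the other players, so its loop runs $|A_{-i}|$ times, which is the exponential cost the dynamic-programming approach avoids. This is an illustrative sketch under assumptions, not code from the thesis. The function names, the string encoding of actions, the made-up payoff function and the toy game are all hypothetical, and configurations count only action nodes in $\nu(a_i)$, as in an AGG-∅.

```python
from collections import defaultdict
from itertools import product
from math import prod


def config_distribution_bruteforce(neighborhood, a_i, sigma_others):
    """Equation (3.4.6), evaluated literally.

    neighborhood: tuple of action nodes nu(a_i); a configuration counts how
        many players chose each of them.
    a_i: the action played (with certainty) by player i.
    sigma_others: one dict per player j != i, mapping action -> probability.
    Returns a dict mapping configuration tuples to probabilities.
    """
    dist = defaultdict(float)
    # One term per pure profile a_{-i}: exponential in the number of players.
    for choice in product(*(s.items() for s in sigma_others)):
        actions = [a_i] + [action for action, _ in choice]
        p = prod(q for _, q in choice)                  # Equation (3.4.2)
        config = tuple(actions.count(m) for m in neighborhood)
        dist[config] += p
    return dict(dist)


def expected_payoff_bruteforce(u, neighborhood, a_i, sigma_others):
    """Equation (3.4.5): payoff of each configuration weighted by its probability."""
    dist = config_distribution_bruteforce(neighborhood, a_i, sigma_others)
    return sum(u(c) * p for c, p in dist.items())


def u(c):
    """Hypothetical payoff u^A as a function of the configuration (c(A), c(B))."""
    return 10 - 3 * c[0] - c[1]


# Hypothetical game: actions "A", "B", "C"; player i plays "A", nu("A") = ("A", "B").
sigma_others = [{"A": 0.5, "B": 0.5}, {"B": 0.25, "C": 0.75}]
print(config_distribution_bruteforce(("A", "B"), "A", sigma_others))
print(expected_payoff_bruteforce(u, ("A", "B"), "A", sigma_others))  # 4.75
```

With two other players the enumeration has only four terms. With $n$ players it would have $\prod_{j \neq i} |A_j|$ terms.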
This structure allows us to use dynamic programming (DP) to efficiently compute the probability distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$. The intuition behind our algorithm is to apply one agent's mixed strategy at a time, effectively adding one agent at a time to the action graph. Let $\sigma_{1\ldots k}^{(a_i)}$ denote the projected strategy profile of agents $\{1, \ldots, k\}$. Denote by $C_k^{(a_i)}$ the set of configurations induced by actions of agents $\{1, \ldots, k\}$. Similarly, write $c_k^{(a_i)} \in C_k^{(a_i)}$. Denote by $P_k$ the probability distribution on $C_k^{(a_i)}$ induced by $\sigma_{1\ldots k}^{(a_i)}$, and by $P_k[c]$ the probability of configuration $c$. At iteration $k$ of the algorithm, we compute $P_k$ from $P_{k-1}$ and $\sigma_k^{(a_i)}$. After iteration $n$, the algorithm stops and returns $P_n$. The pseudocode of our DP algorithm is shown as Algorithm 1, and our full algorithm for computing $V^i_{a_i}(\sigma_{-i})$ is summarized in Algorithm 2.

Each $c_k^{(a_i)}$ is represented as a sequence of integers, so $P_k$ is a mapping from sequences of integers to real numbers. We need a data structure to manipulate such probability distributions over configurations (sequences of integers) which permits quick lookup, insertion and enumeration. An efficient data structure for this purpose is a trie [Fredkin, 1962]. Tries are commonly used in text processing to store strings of characters, e.g., as dictionaries for spell checkers. Here we use tries to store strings of integers rather than characters. Both lookup and insertion take time linear in $|\nu(a_i)|$. To achieve efficient enumeration of all elements of a trie, we store the elements in a list, in the order of their insertion. We omit the proof of correctness of our algorithm, which is relatively straightforward.

Algorithm 1: Computing the induced probability distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$.
Input: $a_i$, $\sigma^{(a_i)}$
Output: $P_n$, which is the distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$ represented as a trie.
  $c_0^{(a_i)} = (0, \ldots, 0)$;
  $P_0[c_0^{(a_i)}] = 1.0$;   // Initialization: $C_0^{(a_i)} = \{c_0^{(a_i)}\}$
  for $k = 1$ to $n$ do
    Initialize $P_k$ to be an empty trie;
    foreach $c_{k-1}^{(a_i)}$ from $P_{k-1}$ do
      foreach $a_k^{(a_i)} \in A_k^{(a_i)}$ such that $\sigma_k^{(a_i)}(a_k^{(a_i)}) > 0$ do
        $c_k^{(a_i)} = c_{k-1}^{(a_i)}$;
        if $a_k^{(a_i)} \neq \emptyset$ then
          $c_k^{(a_i)}(a_k^{(a_i)})$ += 1;   // Apply action $a_k^{(a_i)}$
        if $P_k[c_k^{(a_i)}]$ does not exist yet then
          $P_k[c_k^{(a_i)}] = 0.0$;
        $P_k[c_k^{(a_i)}]$ += $P_{k-1}[c_{k-1}^{(a_i)}] \times \sigma_k^{(a_i)}(a_k^{(a_i)})$;
  return $P_n$

Complexity

Let $C^{(a_i,i)}(\sigma_{-i})$ denote the set of configurations over $\nu(a_i)$ that have positive probability of occurring under the mixed strategy $(a_i, \sigma_{-i})$. In other words, $|C^{(a_i,i)}(\sigma_{-i})|$ is the number of terms we need to add together when computing the weighted sum in Equation (3.4.5). When $\sigma_{-i}$ has full support, $C^{(a_i,i)}(\sigma_{-i}) = C^{(a_i,i)}$.

Theorem 3.4.1. Given an AGG-∅ representation of a game, $i$'s expected payoff $V^i_{a_i}(\sigma_{-i})$ can be computed in $\Theta(n|\mathcal{A}| + n|\nu(a_i)|^2 |C^{(a_i,i)}(\sigma_{-i})|)$ time, which is polynomial in the size of the representation. If $\mathcal{I}$, the in-degree of the action graph, is bounded by a constant, $V^i_{a_i}(\sigma_{-i})$ can be computed in time polynomial in $n$.

Proof. Since looking up an entry in a trie takes time linear in the size of the key, which is $|\nu(a_i)|$ in our case, the complexity of computing the weighted sum in Equation (3.4.5) is $O(|\nu(a_i)|\,|C^{(a_i,i)}(\sigma_{-i})|)$. Algorithm 1 requires $n$ iterations. In iteration $k$, we look at all possible combinations of $c_{k-1}^{(a_i)}$ and $a_k^{(a_i)}$, and in each case perform a trie look-up, which costs $\Theta(|\nu(a_i)|)$. Since $|A_k^{(a_i)}| \le |\nu(a_i)| + 1$ and $|C_{k-1}^{(a_i)}| \le |C^{(a_i,i)}|$, the complexity of Algorithm 1 is $\Theta(n|\nu(a_i)|^2 |C^{(a_i,i)}(\sigma_{-i})|)$. This dominates the complexity of computing the sum in Equation (3.4.5). Adding the cost of computing $\sigma_{-i}^{(a_i)}$, we get an overall complexity for expected payoff computation of $\Theta(n|\mathcal{A}| + n|\nu(a_i)|^2 |C^{(a_i,i)}(\sigma_{-i})|)$. Since $|C^{(a_i,i)}(\sigma_{-i})| \le |C^{(a_i,i)}| \le |C^{(a_i)}|$, and $|C^{(a_i)}|$ is the number of payoff values stored in the payoff function $u^{a_i}$, expected payoffs can be computed in polynomial time with respect to the size of the AGG-∅. Furthermore, our algorithm is able to exploit strategies with small supports, which lead to a small $|C^{(a_i,i)}(\sigma_{-i})|$. Since $|C^{(a_i)}|$ is bounded by $\frac{(n-1+|\nu(a_i)|)!}{(n-1)!\,|\nu(a_i)|!}$, it follows that if the in-degree of the graph is bounded by a constant, then the complexity of computing expected payoffs is $O(n|\mathcal{A}| + n^{\mathcal{I}+1})$.

Algorithm 2: Computing expected utility $V^i_{a_i}(\sigma_{-i})$, given $a_i$ and $\sigma_{-i}$.
1. For each $j \neq i$, compute the projected mixed strategy $\sigma_j^{(a_i)}$ using Equation (3.4.4):
$$\sigma_j^{(a_i)}\big(a_j^{(a_i)}\big) \equiv \begin{cases} \sigma_j(a_j) & a_j \in \nu(a_i) \\ \sum_{\alpha' \in A_j \setminus \nu(a_i)} \sigma_j(\alpha') & a_j^{(a_i)} = \emptyset \end{cases}.$$
2. Compute the probability distribution $\Pr(c^{(a_i)} \mid a_i, \sigma_{-i}^{(a_i)})$ by following Algorithm 1.
3. Calculate the expected utility using the following weighted sum (Equation (3.4.5)):
$$V^i_{a_i}(\sigma_{-i}) = \sum_{c^{(a_i)} \in C^{(a_i,i)}} u_i\big(a_i, c^{(a_i)}\big) \Pr\big(c^{(a_i)} \mid \sigma^{(a_i)}\big).$$

The proof of Theorem 3.4.1 shows that, besides exploiting the compactness of the AGG-∅ representation, our algorithm can also exploit cases where the given mixed strategy profiles have small support sizes. This is because the time complexity depends on $|C^{(a_i,i)}(\sigma_{-i})|$, which is small when support sizes are small. This is important in practice, since we will often need to carry out expected utility computations for strategy profiles with small supports. Porter et al. [2008] observed that games quite often have Nash equilibria with small support, and proposed algorithms that explicitly search for such equilibria. In other algorithms for computing Nash equilibria, such as Govindan-Wilson and simplicial subdivision, it is also quite often necessary to compute expected payoffs for mixed strategy profiles with small support.

Of course, it is not necessary to apply the agents' mixed strategies in the order $1 \ldots n$. In fact, we can apply the strategies in any order.
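Algorithms 1 and 2 can be sketched in Python as below. This is a sketch under stated assumptions, not the thesis implementation. A plain dict keyed by configuration tuples stands in for the trie; hashing a tuple is still linear in $|\nu(a_i)|$. `None` stands for the null action $\emptyset$. Player $i$ is treated as a player whose strategy puts probability 1 on $a_i$. The names `project`, `config_distribution_dp` and `expected_utility`, and the toy game, are hypothetical.

```python
from collections import defaultdict

NULL = None  # stands in for the projected null action (written as the empty set in the text)


def project(sigma_j, neighborhood):
    """Equation (3.4.4): keep the probability of each action in nu(a_i) and
    lump the remaining probability mass onto the null action."""
    projected = defaultdict(float)
    for action, p in sigma_j.items():
        projected[action if action in neighborhood else NULL] += p
    return dict(projected)


def config_distribution_dp(neighborhood, projected_strategies):
    """Algorithm 1, with a dict keyed by configuration tuples in place of the
    trie: apply one player's projected strategy per iteration."""
    index = {m: k for k, m in enumerate(neighborhood)}
    P = {tuple(0 for _ in neighborhood): 1.0}           # P_0
    for sigma_k in projected_strategies:                 # iteration k
        P_next = defaultdict(float)                      # P_k, initially empty
        for config, p in P.items():
            for action, q in sigma_k.items():
                if q <= 0:
                    continue                             # only actions with positive probability
                c = list(config)
                if action is not NULL:
                    c[index[action]] += 1                # apply action a_k
                P_next[tuple(c)] += p * q
        P = dict(P_next)
    return P


def expected_utility(u, neighborhood, a_i, sigma_others):
    """Algorithm 2: project, run the dynamic program, then apply Equation (3.4.5)."""
    strategies = [project({a_i: 1.0}, neighborhood)]
    strategies += [project(s, neighborhood) for s in sigma_others]
    dist = config_distribution_dp(neighborhood, strategies)
    return sum(u(c) * p for c, p in dist.items())


def u(c):
    """Hypothetical payoff u^A as a function of the configuration (c(A), c(B))."""
    return 10 - 3 * c[0] - c[1]


nbhd = ("A", "B")
sigma_others = [{"A": 0.5, "B": 0.5}, {"B": 0.25, "C": 0.75}]
print(expected_utility(u, nbhd, "A", sigma_others))  # 4.75
```

Each iteration touches at most $|C_{k-1}^{(a_i)}| \cdot |A_k^{(a_i)}|$ pairs, matching the loop structure of Algorithm 1. Reversing the order in which the projected strategies are applied, e.g. `config_distribution_dp(nbhd, list(reversed(strategies)))`, returns the same final distribution.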
Although the number of configurations $|C^{(a_i,i)}(\sigma_{-i})|$ remains the same, the ordering does affect the intermediate configurations $C_k^{(a_i)}$. We can use the following heuristic to try to minimize the number of intermediate configurations: sort the players in ascending order of the sizes of their projected action sets. This reduces the amount of work we do in earlier iterations of Algorithm 1, but does not change its overall complexity.

The Case of Symmetric Strategies in Symmetric AGG-∅s

As described in Section 3.2.1, if a game is symmetric it can be represented as an AGG-∅ with $A_i = \mathcal{A}$ for all $i \in N$. Given a symmetric game, we are often interested in computing expected utilities under symmetric mixed strategy profiles, where a mixed strategy profile $\sigma$ is symmetric if $\sigma_i = \sigma_j \equiv \sigma_*$ for all $i, j \in N$. In Section 3.5.2 we will discuss algorithms that make use of expected utility computation under symmetric strategy profiles to compute a symmetric Nash equilibrium of symmetric games.

To compute the expected utility $V^i_{a_i}(\sigma_*)$, we could use the algorithm we proposed for general AGG-∅s under arbitrary mixed strategies, which requires time polynomial in the size of the AGG-∅. But we can gain additional computational speedup by exploiting the symmetry in the game and the strategy profile. As before, we want to use Equation (3.4.5) to compute the expected utility, so the crucial task is again computing the probability distribution over projected configurations, $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$. Recall that $\sigma^{(a_i)} \equiv (a_i, \sigma_{-i}^{(a_i)})$. Define $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ to be the distribution induced by $\sigma_{-i}^{(a_i)}$, the partial mixed strategy profile of players other than $i$, each playing the symmetric strategy $\sigma_*^{(a_i)}$. Once we have the distribution $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$, we can then compute the distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$ straightforwardly by applying player $i$'s strategy $a_i$. In the rest of this section we focus on computing $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$.
Define $S(c^{(a_i)})$ to be the set containing all action profiles $a^{(a_i)}$ such that $C^{(a_i)}(a^{(a_i)}) = c^{(a_i)}$. Since all agents have the same mixed strategies, each pure action profile in $S(c^{(a_i)})$ is equally likely, so for any $a^{(a_i)} \in S(c^{(a_i)})$,

$$\Pr\big(c^{(a_i)} \mid \sigma_*^{(a_i)}\big) = \big|S(c^{(a_i)})\big| \Pr\big(a^{(a_i)} \mid \sigma_*^{(a_i)}\big), \tag{3.4.7}$$

$$\Pr\big(a^{(a_i)} \mid \sigma_*^{(a_i)}\big) = \prod_{\alpha \in \mathcal{A}^{(a_i)}} \big(\sigma_*^{(a_i)}(\alpha)\big)^{c^{(a_i)}(\alpha)}. \tag{3.4.8}$$

The sizes of $S(c^{(a_i)})$ are given by the multinomial coefficient

$$\big|S(c^{(a_i)})\big| = \frac{(n-1)!}{\prod_{\alpha \in \mathcal{A}^{(a_i)}} c^{(a_i)}(\alpha)!}. \tag{3.4.9}$$

Better still, using a Gray code technique we can avoid reevaluating these equations for every $c^{(a_i)} \in C^{(a_i)}$. Denote the configuration obtained from $c^{(a_i)}$ by decrementing by one the number of agents taking action $\alpha \in \mathcal{A}^{(a_i)}$ and incrementing by one the number of agents taking action $\alpha' \in \mathcal{A}^{(a_i)}$ as $c'^{(a_i)} \equiv c^{(a_i)}_{(\alpha \to \alpha')}$. Then consider the graph $H_{C^{(a_i)}}$ whose nodes are the elements of the set $C^{(a_i)}$, and whose directed edges indicate the effect of the operation $(\alpha \to \alpha')$. This graph is a regular triangular lattice inscribed within a $(|\mathcal{A}^{(a_i)}| - 1)$-dimensional simplex. Having computed $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ for one node of $H_{C^{(a_i)}}$ corresponding to configuration $c^{(a_i)}$, we can compute the result for an adjacent node in $O(1)$ time:

$$\Pr\big(c^{(a_i)}_{(\alpha \to \alpha')} \mid \sigma_*^{(a_i)}\big) = \frac{\sigma_*^{(a_i)}(\alpha')\, c^{(a_i)}(\alpha)}{\sigma_*^{(a_i)}(\alpha)\,\big(c^{(a_i)}(\alpha') + 1\big)} \Pr\big(c^{(a_i)} \mid \sigma_*^{(a_i)}\big). \tag{3.4.10}$$

$H_{C^{(a_i)}}$ always has a Hamiltonian path (attributed to an unpublished result of Knuth by Klingsberg [1982]). So having computed $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ for an initial $c^{(a_i)}$ using Equation (3.4.8), the results for all other projected configurations (nodes in $H_{C^{(a_i)}}$) can be computed by using Equation (3.4.10) at each subsequent step on the path. Generating the Hamiltonian path corresponds to finding a combinatorial Gray code for compositions; an algorithm with constant amortized running time is given by Klingsberg [1982].
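The Python sketch below checks Equations (3.4.7) to (3.4.10) numerically. It computes one multinomial probability and then walks a Hamiltonian path, updating the probability in $O(1)$ per step. It relies on several simplifying assumptions. It handles only three projected actions, so $H_{C^{(a_i)}}$ is a triangle of configurations $(x, y, z)$ with $x + y + z = n - 1$. It uses a simple boustrophedon ("lawnmower") path in place of Klingsberg's general Gray code algorithm, which is not implemented here. Every projected action must have positive probability, because Equation (3.4.10) divides by $\sigma_*^{(a_i)}(\alpha)$. All function names are hypothetical.

```python
from math import factorial, prod


def multinomial_prob(config, probs):
    """Equations (3.4.7)-(3.4.9): probability that sum(config) players, each
    drawing independently from the projected strategy `probs`, produce the
    counts `config` (one entry per projected action, including the null action)."""
    size_of_S = factorial(sum(config)) // prod(factorial(c) for c in config)  # (3.4.9)
    return size_of_S * prod(p ** c for p, c in zip(probs, config))             # (3.4.7), (3.4.8)


def gray_step(prob, config, probs, src, dst):
    """Equation (3.4.10): move one agent from projected action `src` to `dst`
    and update the probability in O(1). Requires probs[src] > 0."""
    new_config = list(config)
    new_config[src] -= 1
    new_config[dst] += 1
    ratio = probs[dst] * config[src] / (probs[src] * (config[dst] + 1))
    return prob * ratio, tuple(new_config)


def lawnmower_moves(m):
    """A boustrophedon Hamiltonian path over configurations (x, y, z) with
    x + y + z = m, starting from (0, 0, m); returns (src, dst) index pairs."""
    moves, upward = [], True
    for x in range(m + 1):
        # sweep y up (z -> y) or down (y -> z) within the row with this x
        moves += [(2, 1) if upward else (1, 2)] * (m - x)
        if x < m:
            # step to the next row: take the agent from y if the sweep ended at the top, else from z
            moves.append((1, 0) if upward else (2, 0))
        upward = not upward
    return moves


def symmetric_distribution(m, probs):
    """Distribution over configurations of m = n - 1 symmetric players with
    three projected actions: one multinomial evaluation, then O(1) per step."""
    config = (0, 0, m)
    prob = multinomial_prob(config, probs)
    dist = {config: prob}
    for src, dst in lawnmower_moves(m):
        prob, config = gray_step(prob, config, probs, src, dst)
        dist[config] = prob
    return dist


dist = symmetric_distribution(4, (0.2, 0.3, 0.5))
print(len(dist), round(sum(dist.values()), 12))  # 15 1.0
```

The path visits all $\binom{m+2}{2}$ configurations exactly once, so only the first probability needs the factorials.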
Intuitively, it is easy to see that a simple "lawnmower" Hamiltonian path exists for any lower-dimensional projection of $H_{C^{(a_i)}}$, with the only state required to compute the next node in the path being a direction value for each dimension. Our algorithm for computing the distribution $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ is summarized in Algorithm 3. For computing expected utility, we again use Algorithm 2, except with Algorithm 3 replacing Algorithm 1 as the subroutine for computing the distribution $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$.

Algorithm 3: Computing distribution $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ in a symmetric AGG-∅
1. Let $c^{(a_i)} = c_0^{(a_i)}$, where $c_0^{(a_i)}$ is the initial node of a Hamiltonian path of $H_{C^{(a_i)}}$.
2. Compute $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ using Equation (3.4.7):
$$\Pr\big(c^{(a_i)} \mid \sigma_*^{(a_i)}\big) = \frac{(n-1)!}{\prod_{\alpha \in \mathcal{A}^{(a_i)}} c^{(a_i)}(\alpha)!} \prod_{\alpha \in \mathcal{A}^{(a_i)}} \big(\sigma_*^{(a_i)}(\alpha)\big)^{c^{(a_i)}(\alpha)}.$$
3. While there are more configurations in $C^{(a_i)}$:
   (a) get the next configuration $c^{(a_i)}_{(\alpha \to \alpha')}$ in the Hamiltonian path, using Klingsberg's algorithm [Klingsberg, 1982];
   (b) compute $\Pr(c^{(a_i)}_{(\alpha \to \alpha')} \mid \sigma_*^{(a_i)})$ using Equation (3.4.10):
$$\Pr\big(c^{(a_i)}_{(\alpha \to \alpha')} \mid \sigma_*^{(a_i)}\big) = \frac{\sigma_*^{(a_i)}(\alpha')\, c^{(a_i)}(\alpha)}{\sigma_*^{(a_i)}(\alpha)\,\big(c^{(a_i)}(\alpha') + 1\big)} \Pr\big(c^{(a_i)} \mid \sigma_*^{(a_i)}\big);$$
   (c) let $c^{(a_i)} = c^{(a_i)}_{(\alpha \to \alpha')}$.
4. Output $\Pr(c^{(a_i)} \mid \sigma_*^{(a_i)})$ for all $c^{(a_i)} \in C^{(a_i)}$.

Theorem 3.4.2. Computation of the expected utility $V^i_{a_i}(\sigma_*)$ under a symmetric strategy profile for symmetric action-graph games using Equations (3.4.5), (3.4.7), (3.4.8) and (3.4.10) takes time $O(|\mathcal{A}| + |\nu(a_i)|\,|C^{(a_i)}(\sigma^{(a_i)})|)$.

Proof. Projection to $\sigma_*^{(a_i)}$ takes $O(|\mathcal{A}|)$ time since the strategies are symmetric. Equation (3.4.5) has $|C^{(a_i)}(\sigma^{(a_i)})|$ summands. The probability for the initial configuration requires $O(n)$ time. Using Gray codes, the computation of subsequent probabilities can be done in constant amortized time for each configuration.
Since each look-up of the utility function takes $O(|\nu(a_i)|)$ time, the total complexity of the algorithm is $O(|\mathcal{A}| + |\nu(a_i)|\,|C^{(a_i)}(\sigma^{(a_i)})|)$.

Algorithm 4: Computing the probability distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$ in a $k$-symmetric AGG-∅ under a $k$-symmetric mixed strategy profile $\sigma^{(a_i)}$.
1. Partition the players according to $\{N_1, \ldots, N_k\}$.
2. For each $l \in \{1, \ldots, k\}$, compute $\Pr(c^{(a_i)} \mid \sigma_{N_l}^{(a_i)})$, the probability distribution induced by $\sigma_{N_l}^{(a_i)}$, the partial strategy profile of players in $N_l$. Since $\sigma_{N_l}^{(a_i)}$ is symmetric, this can be computed efficiently using Algorithm 3 as discussed in Section 3.4.1.
3. Combine the $k$ probability distributions together using Algorithm 1, resulting in the distribution $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$.

Note that the symmetric-strategy algorithm of Theorem 3.4.2 is faster than our dynamic programming algorithm for general AGG-∅s under arbitrary strategies, whose complexity is $\Theta(n|\mathcal{A}| + n|\nu(a_i)|^2 |C^{(a_i)}(\sigma^{(a_i)})|)$ by Theorem 3.4.1. In the usual case where the second term dominates the first, the algorithm for symmetric strategies is faster by a factor of $n|\nu(a_i)|$.

k-symmetric Games

We now move to a generalization of symmetry in games that we call $k$-symmetry.

Definition 3.4.3. An AGG-∅ is $k$-symmetric if there exists a partition $\{N_1, \ldots, N_k\}$ of $N$ such that for all $l \in \{1, \ldots, k\}$ and for all $i, j \in N_l$, $A_i = A_j$.

Intuitively, $k$-symmetric AGG-∅s represent games with $k$ classes of agents, where agents within each class are identical. Note that all games are trivially $n$-symmetric. The Ice Cream Vendor game of Example 3.2.5 is a nontrivial $k$-symmetric AGG-∅ with $k = 3$. Given a $k$-symmetric AGG-∅ with partition $\{N_1, \ldots, N_k\}$, a mixed strategy profile $\sigma$ is $k$-symmetric if for all $l \in \{1, \ldots, k\}$ and for all $i, j \in N_l$, $\sigma_i = \sigma_j$. We are often interested in computing expected utility under $k$-symmetric strategy profiles.
For example, in Section 3.5.2 we will discuss algorithms that make use of such expected utility computations to find $k$-symmetric Nash equilibria in $k$-symmetric games. To compute expected utility under a $k$-symmetric mixed strategy profile, we can use a hybrid approach when computing the probability distribution over configurations, shown in Algorithm 4. Observe that this algorithm combines our specialized Algorithm 3 for handling symmetric games from Section 3.4.1 with the idea of running Algorithm 1 on the joint mixed strategies of subgroups of agents discussed at the end of Section 3.4.1.

3.4.2 Computing Expected Payoff with AGG-FNs

Algorithm 1 cannot be directly applied to AGG-FNs with arbitrary $f^p$. First of all, projection of strategies does not work directly, because a player $j$ playing an action $a_j \notin \nu(\alpha)$ could still affect $c^{(\alpha)}$ via function nodes. Furthermore, the general idea of using dynamic programming to build up the probability distribution by adding one player at a time does not work, because for an arbitrary function node $p \in \nu(\alpha)$, players are not guaranteed to affect $c(p)$ independently. We could convert the AGG-FN to an AGG-∅ in order to apply our algorithm, but then we would not be able to translate the extra compactness of AGG-FNs over AGG-∅s into more efficient computation.

In this section we identify two subclasses of AGG-FNs for which expected utility can be efficiently computed. We first show that when all function nodes belong to a restricted class of contribution-independent function nodes, expected utility can be computed in polynomial time. We then reinterpret the expected utility problem as a Bayesian network inference problem, which can be solved in polynomial time if the resulting Bayesian network has bounded treewidth.

Contribution-Independent Function Nodes

Definition 3.4.4.
A function node $p$ in an AGG-FN is contribution-independent (CI) if
• $\nu(p) \subseteq \mathcal{A}$, i.e., the neighbors of $p$ are action nodes.
• There exists a commutative and associative operator $*$, and for each $\alpha \in \nu(p)$ an integer $w_\alpha$, such that given an action profile $a = (a_1, \ldots, a_n)$, $c(p) = *_{i \in N:\, a_i \in \nu(p)}\, w_{a_i}$.
• The running time of each $*$ operation is bounded by a polynomial in $n$, $|\mathcal{A}|$ and $|P|$. Furthermore, $*$ can be represented in space polynomial in $n$, $|\mathcal{A}|$ and $|P|$.

An AGG-FN is contribution-independent if all its function nodes are contribution-independent.

Note that it follows from this definition that $c(p)$ can be written as a function of $c^{(p)}$ by collecting terms: $c(p) \equiv f^p(c^{(p)}) = *_{\alpha \in \nu(p)} \big( *_{k=1}^{c(\alpha)} w_\alpha \big)$.

Simple aggregators can be represented as contribution-independent function nodes, with the $+$ operator serving as $*$, and $w_\alpha = 1$ for all $\alpha$. The Coffee Shop game is thus an example of a contribution-independent AGG-FN. For the parity game in Example 3.2.8, $*$ is instead addition mod 2. An example of a non-additive CI function node arises in a perfect-information model of an (advertising) auction in which actions correspond to bid amounts [Thompson and Leyton-Brown, 2009]. Here we want $c(p)$ to represent the amount of the winning bid, and so we let $w_\alpha$ be the bid amount corresponding to action $\alpha$, and $*$ be the max operator.

The advantage of contribution-independent AGG-FNs is that for all function nodes $p$, each player's strategy affects $c(p)$ independently. This fact allows us to adapt our algorithm to efficiently compute the expected utility $V^i_{a_i}(\sigma_{-i})$. For simplicity we present the algorithm for the case where we have one operator $*$ for all $p \in P$, but our approach can be directly applied to games with different operators and $w_\alpha$ associated with different function nodes. We define the contribution of action $\alpha$ to node $m \in \mathcal{A} \cup P$, denoted $\delta_\alpha(m)$, as

$$\delta_\alpha(m) = \begin{cases} 1 & \text{if } m = \alpha, \\ 0 & \text{if } m \in \mathcal{A} \setminus \{\alpha\}, \\ *_{m' \in \nu(m)} \big( *_{k=1}^{\delta_\alpha(m')} w_\alpha \big) & \text{if } m \in P. \end{cases}$$
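The following Python sketch illustrates $\delta_\alpha(m)$ for the auction example above. It also shows how Algorithm 1 can be adapted to such nodes by combining contributions componentwise, with addition at action nodes and $*$ (here `max`) at function nodes. This is an illustrative sketch built on assumptions, not the thesis's code. The bid actions `b1` to `b3`, their weights, the function node `p`, the neighborhood $\nu(b_2) = (b_2, p)$, and all function names are hypothetical. The empty $*$-combination is also taken to be 0, which is only valid because the bids here are non-negative.

```python
from collections import defaultdict

# Hypothetical auction: bid actions with bid amounts w_alpha, and a CI function
# node "p" whose value is the winning (maximum) bid among its neighbours.
ACTION_NODES = ("b1", "b2", "b3")
WEIGHT = {"b1": 1, "b2": 2, "b3": 3}
NEIGHBORS = {"p": {"b1", "b2", "b3"}}
OP = max
IDENTITY = 0  # assumption: bids are non-negative, so 0 is an identity element for max


def contribution(alpha, m):
    """delta_alpha(m): 1 or 0 at action nodes; at a CI function node, w_alpha
    if alpha is one of its neighbours, otherwise the identity element."""
    if m in ACTION_NODES:
        return 1 if m == alpha else 0
    return WEIGHT[alpha] if alpha in NEIGHBORS[m] else IDENTITY


def ci_config_distribution(nbhd, strategies):
    """Algorithm 1 adapted to contribution-independent function nodes:
    combine contributions with + at action nodes and with OP at function nodes."""
    start = tuple(0 if m in ACTION_NODES else IDENTITY for m in nbhd)
    P = {start: 1.0}
    for sigma in strategies:
        # distribution over this player's projected contributions
        contrib = defaultdict(float)
        for alpha, q in sigma.items():
            contrib[tuple(contribution(alpha, m) for m in nbhd)] += q
        P_next = defaultdict(float)
        for config, p in P.items():
            for delta, q in contrib.items():
                new = tuple(c + d if m in ACTION_NODES else OP(c, d)
                            for m, c, d in zip(nbhd, config, delta))
                P_next[new] += p * q
        P = dict(P_next)
    return P


# Player i bids b2, so the configuration is over ("b2", "p"); the other two players mix.
strategies = [{"b2": 1.0}, {"b1": 0.5, "b3": 0.5}, {"b1": 0.5, "b2": 0.5}]
dist = ci_config_distribution(("b2", "p"), strategies)
print(dist)  # each of (1, 2), (2, 2), (1, 3), (2, 3) with probability 0.25
```

Actions with identical projected contributions are merged before the inner loop. This is what keeps each player's contribution set small.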
Then it is easy to verify that, given an action profile $a = (a_1, \ldots, a_n)$, $c(\alpha) = \sum_{j=1}^{n} \delta_{a_j}(\alpha)$ for all $\alpha \in \mathcal{A}$ and $c(p) = *_{j=1}^{n}\, \delta_{a_j}(p)$ for all $p \in P$. Given that player $i$ played $a_i$, and for all $\alpha \in \mathcal{A}$, we define the projected contribution of action $\alpha$ under $a_i$, denoted $\delta_\alpha^{(a_i)}$, as the tuple $(\delta_\alpha(m))_{m \in \nu(a_i)}$. Note that different actions $\alpha$ may have identical projected contributions under $a_i$. Player $j$'s mixed strategy $\sigma_j$ induces a probability distribution over $j$'s projected contributions, $\Pr(\delta^{(a_i)} \mid \sigma_j) = \sum_{a_j:\, \delta_{a_j}^{(a_i)} = \delta^{(a_i)}} \sigma_j(a_j)$. Now we can operate entirely using the probabilities on projected contributions instead of the mixed strategy probabilities. This is analogous to the projection of $\sigma_j$ to $\sigma_j^{(a_i)}$ in our algorithm for AGG-∅s.

Algorithm 1 for computing the distribution $\Pr(c^{(a_i)} \mid \sigma)$ can be straightforwardly adapted to work with contribution-independent AGG-FNs. Whenever we apply player $k$'s contribution $\delta_{a_k}^{(a_i)}$ to $c_{k-1}^{(a_i)}$, the resulting configuration $c_k^{(a_i)}$ is computed componentwise as follows: $c_k^{(a_i)}(m) = \delta_{a_k}^{(a_i)}(m) + c_{k-1}^{(a_i)}(m)$ if $m \in \mathcal{A}$, and $c_k^{(a_i)}(m) = \delta_{a_k}^{(a_i)}(m) * c_{k-1}^{(a_i)}(m)$ if $m \in P$.

To analyze the complexity of computing expected utility, it is necessary to know the representation size of a contribution-independent AGG-FN. For each function node $p$ we need to specify $*$ and $(w_\alpha)_{\alpha \in \nu(p)}$ instead of $f^p$ directly. Let $\|*\|$ denote the representation size of $*$. Then the total size of a contribution-independent AGG-FN is $O(\sum_{\alpha \in \mathcal{A}} |C^{(\alpha)}| + \|*\|)$. As discussed in Section 3.2.2, this size is not necessarily polynomial in $n$, $|\mathcal{A}|$ and $|P|$, although when the conditions in Corollary 3.2.11 are satisfied, the representation size is polynomial.

Theorem 3.4.5. Expected utility can be computed in time polynomial in the size of a contribution-independent AGG-FN.
Furthermore, if the in-degrees of the action nodes are bounded by a constant and the sizes of ranges $|R(f^p)|$ for all $p \in P$ are bounded by a polynomial in $n$, $|\mathcal{A}|$ and $|P|$, then expected utility can be computed in time polynomial in $n$, $|\mathcal{A}|$ and $|P|$.

Proof Sketch. Following a complexity analysis similar to that of Theorem 3.4.1, if an AGG-FN is contribution-independent, expected utility $V^i_{a_i}(\sigma_{-i})$ can be computed in $O(n|\mathcal{A}|\,|C^{(a_i)}|\,(T_* + |\nu(a_i)|))$ time, where $T_*$ denotes the maximum running time of an $*$ operation. Since $T_*$ is polynomial in $n$, $|\mathcal{A}|$ and $|P|$ by Definition 3.4.4, the running time for computing expected utility is polynomial in the size of the AGG-FN representation. The second part of the theorem follows from a direct application of Corollary 3.2.11.

For AGG-FNs whose function nodes are all simple aggregators, each player's set of projected contributions has size at most $|\nu(a_i)| + 1$, as opposed to $|\mathcal{A}|$ in the general case. This leads to a running time of $O(n|\mathcal{A}| + n|\nu(a_i)|^2 |C^{(a_i)}|)$, which is better than the complexity of the general case proved in Theorem 3.4.5. Applied to the Coffee Shop game, since $|C^{(\alpha)}| = O(n^3)$ and all function nodes are simple aggregators, our algorithm takes $O(n|\mathcal{A}| + n^4)$ time, which grows linearly in $|\mathcal{A}|$.

Beyond Contribution Independence

What about the case where not all function nodes are contribution-independent? Is there anything we can do besides converting the AGG-FN into its induced AGG-∅? It turns out that by reducing the problem of computing expected utility to a Bayesian network inference problem, we can still efficiently compute expected utilities for certain additional classes of AGG-FNs.

Bayesian networks compactly represent probability distributions exhibiting conditional independence structure (see, e.g., [Pearl, 1988, Russell and Norvig, 2003]). A Bayesian network is a DAG in which nodes represent random variables and edges represent direct probabilistic dependence.
Each node $X$ is associated with a conditional probability distribution (CPD) specifying the probability of each realization of random variable $X$ conditional on the realizations of its parent random variables.

A key step in our approach for computing expected utility in AGG-FNs is computing the probability distribution over configurations $\Pr(c^{(a_i)} \mid \sigma^{(a_i)})$. If we treat each node $m$'s configuration $c(m)$ as a random variable, then the distribution over configurations can be interpreted as the joint probability distribution over the set of random variables $\{c(m)\}_{m \in \nu(a_i)}$. Given an AGG-FN, a player $i$ and an action $a_i \in A_i$, we can construct an induced Bayesian network $B^i_{a_i}$:
• The nodes of $B^i_{a_i}$ consist of (i) one node for each element of $\nu(a_i)$; (ii) one node for each neighbor of a function node belonging to $\nu(a_i)$; and (iii) one node for each neighbor of a function node added in the previous step, and so on until no more function nodes are added. Each of these nodes $m$ represents the random variable $c(m)$. We further introduce another kind of node: (iv) $n$ nodes $\sigma_1, \ldots, \sigma_n$, representing each player's mixed strategy. The domain of each random variable $\sigma_i$ is $A_i$.
• The edges of $B^i_{a_i}$ are constructed by keeping all edges that go into the function nodes that are included in $B^i_{a_i}$, ignoring edges that go into action nodes. Furthermore, for each player $j$, we create an edge from $\sigma_j$ to each of $j$'s actions $a_j \in A_j$.
• The conditional probability distribution (CPD) at each function node $p$ is just the deterministic function $f^p$. The CPD at each action node $\alpha'$ is a deterministic function that returns the number of its parents (observe that these are all mixed strategy nodes) that take the value $\alpha'$. Mixed strategy nodes have no incoming edges; their (unconditional) probability distributions are the mixed strategies of the corresponding players, except for player $i$, whose node $\sigma_i$ takes the deterministic value $a_i$.
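The construction above can be made concrete with a small Python sketch. It evaluates the induced network by exhaustive enumeration over the mixed-strategy root nodes. Each assignment's probability is the product of the root probabilities, and the deterministic CPDs are evaluated in topological order. Enumeration is exponential in $n$, so it only illustrates the semantics. It is a stand-in for clique tree propagation or variable elimination, which are what make bounded-treewidth networks tractable. The toy AGG-FN, including the function node `p` computing $c(A) \cdot c(B)$ (chosen because it is not a simple aggregator), and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def induced_bn_distribution(strategies, action_nodes, function_nodes, nbhd):
    """Joint distribution of {c(m) : m in nbhd} in the induced Bayesian network,
    computed by enumerating the mixed-strategy root nodes sigma_1..sigma_n.

    strategies: one dict per player; player i is given as {a_i: 1.0}.
    action_nodes: action nodes in the network; CPD = number of parents choosing it.
    function_nodes: dict name -> (parents, f); parents must be action nodes or
        function nodes listed earlier, so that insertion order is topological.
    """
    dist = defaultdict(float)
    for assignment in product(*(s.items() for s in strategies)):
        p = 1.0
        for _, q in assignment:
            p *= q                                        # root nodes are independent
        chosen = [action for action, _ in assignment]
        value = {alpha: chosen.count(alpha) for alpha in action_nodes}
        for name, (parents, f) in function_nodes.items():
            value[name] = f(*(value[m] for m in parents))  # deterministic CPD f^p
        dist[tuple(value[m] for m in nbhd)] += p
    return dict(dist)


# Hypothetical AGG-FN: function node "p" with parents A and B computes c(A) * c(B).
# Player i plays C, and the configuration of interest is over ("C", "p").
strategies = [{"C": 1.0}, {"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]
function_nodes = {"p": (("A", "B"), lambda a, b: a * b)}
print(induced_bn_distribution(strategies, ("A", "B", "C"), function_nodes, ("C", "p")))
# {(1, 0): 0.5, (1, 1): 0.5}
```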
It is straightforward to verify that $B^i_{a_i}$ is a DAG, and that the joint distribution on random variables $\{c(m)\}_{m \in \nu(a_i)}$ is exactly the distribution over configurations $\Pr(c^{(a_i)} \mid (a_i, \sigma_{-i}^{(a_i)}))$. This joint distribution can then be computed using a standard algorithm such as clique tree propagation or variable elimination. The running times of such algorithms are worst-case exponential; however, for Bayesian networks with bounded treewidth, their running times are polynomial.

Further speedups are possible at nodes in the induced Bayesian network that correspond to action nodes and contribution-independent function nodes. The deterministic CPDs at such nodes can be formulated using independent contributions from each player's strategy. This is an example of causal independence structure in Bayesian networks studied by Heckerman and Breese [1996] and Zhang and Poole [1996], who proposed different methods for exploiting such structure to speed up Bayesian network inference. Such methods share the common underlying idea of decomposing the CPDs into independent contributions, which is intuitively similar to our approach in Algorithm 1.⁶

⁶ This approach of reducing expected utility computation to Bayesian network inference is further developed in Chapters 5 and 6, for Temporal Action-Graph Games and Bayesian Action-Graph Games respectively.

3.4.3 Computing Expected Payoff with AGG-FNAs

Due to the linearity of expectation, the expected utility of $i$ playing an action $a_i$ with an additive utility function with coefficients $(\lambda_m)_{m \in \nu(a_i)}$ is

$$V^i_{a_i}(\sigma_{-i}) = \sum_{m \in \nu(a_i)} \lambda_m\, E[c(m) \mid a_i, \sigma_{-i}], \tag{3.4.11}$$

where $E[c(m) \mid a_i, \sigma_{-i}]$ is the expected value of $c(m)$ given the strategy profile $(a_i, \sigma_{-i})$. Thus we can compute these expected values for each $m \in \nu(a_i)$, then sum them up as in Equation (3.4.11) to get the expected utility. If $m$ is an action node, then $E[c(m) \mid a_i, \sigma_{-i}]$ is the expected number of players that chose $m$, which is $\sum_{j \in N} \sigma_j(m)$, where player $i$'s strategy is taken to be the pure strategy $a_i$.
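When every neighbor of $a_i$ is an action node, Equation (3.4.11) needs only the expected counts. The Python sketch below computes them as described, taking player $i$'s strategy to be the pure strategy $a_i$. It then compares the result with a brute-force evaluation of Equation (3.4.1). The coefficients $\lambda_m$, the toy strategies and the function names are hypothetical.

```python
from itertools import product
from math import prod


def expected_count(m, a_i, sigma_others):
    """E[c(m) | a_i, sigma_{-i}] for an action node m: player i contributes 1 if
    a_i == m, and each other player j contributes sigma_j(m)."""
    return (1.0 if a_i == m else 0.0) + sum(s.get(m, 0.0) for s in sigma_others)


def additive_expected_utility(lambdas, a_i, sigma_others):
    """Equation (3.4.11) when every neighbour m of a_i is an action node."""
    return sum(lam * expected_count(m, a_i, sigma_others) for m, lam in lambdas.items())


def brute_force_expected_utility(lambdas, a_i, sigma_others):
    """Same quantity via Equation (3.4.1), for comparison only (exponential in n)."""
    total = 0.0
    for choice in product(*(s.items() for s in sigma_others)):
        actions = [a_i] + [action for action, _ in choice]
        p = prod(q for _, q in choice)
        total += p * sum(lam * actions.count(m) for m, lam in lambdas.items())
    return total


lambdas = {"A": -2.0, "B": 0.5}  # hypothetical coefficients lambda_m, with nu(A) = {A, B}
sigma_others = [{"A": 0.5, "B": 0.5}, {"B": 0.25, "C": 0.75}]
print(additive_expected_utility(lambdas, "A", sigma_others))     # -2.625
print(brute_force_expected_utility(lambdas, "A", sigma_others))  # -2.625
```

The additive version runs in time linear in $n\,|\nu(a_i)|$, with no distribution over configurations at all.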
The more interesting case is when $m$ is a function node. Recall that $c(m) \equiv f^m(c^{(m)})$, where $c^{(m)}$ is the configuration over the neighbors of $m$. We can write the expected value of $c(m)$ as

$$E[c(m) \mid a_i, \sigma_{-i}] = \sum_{c^{(m)} \in C^{(m)}} f^m\big(c^{(m)}\big) \Pr\big(c^{(m)} \mid a_i, \sigma_{-i}\big). \tag{3.4.12}$$

This has the same form as Equation (3.4.5) for the expected utility $V^i_{a_i}(\sigma_{-i})$, except that we have $f^m$ instead of $u^\alpha$. Thus our results for the computation of Equation (3.4.5) also apply here. That is, if the neighbors of $m$ are action nodes and/or contribution-independent function nodes, then $E[c(m) \mid a_i, \sigma_{-i}]$ can be computed in polynomial time.

Theorem 3.4.6. Suppose $u^\alpha$ is represented as an additive utility function in a given AGG-FNA. If each of the neighbors of $\alpha$ is either (i) an action node, or (ii) a function node whose neighbors are action nodes and/or contribution-independent function nodes, then the expected utility $V^i_\alpha(\sigma_{-i})$ can be computed in time polynomial in the size of the representation. Furthermore, if the in-degrees of the neighbors of $\alpha$ are bounded by a constant, and the sizes of ranges $|R(f^p)|$ for all $p \in P$ are bounded by a polynomial in $n$, $|\mathcal{A}|$ and $|P|$, then the expected utility can be computed in time polynomial in $n$, $|\mathcal{A}|$ and $|P|$.

It is straightforward to verify that our AGG-FNA representations of polymatrix games, congestion games, player-specific congestion games and the game in Example 3.3.4 all satisfy the conditions of Theorem 3.4.6.

3.5 Computing Sample Equilibria with AGGs

In this section we consider some theoretical and practical applications of our expected utility algorithm. In Section 3.5.1 we analyze the complexity of finding a sample $\varepsilon$-Nash equilibrium in an AGG and show that it is PPAD-complete. In Section 3.5.2 we extend our expected utility algorithm to the computation of payoff Jacobians, which is a key step in several algorithms for computing $\varepsilon$-Nash equilibria, including the Govindan-Wilson algorithm.
In Section 3.5.3 we show that it can also speed up the simplicial subdivision algorithm, and in Section 3.5.4 we show that it can be used to find a correlated equilibrium in polynomial time.

3.5.1 Complexity of Finding a Nash Equilibrium

In this section we consider the complexity of finding a Nash equilibrium of an AGG. As discussed in Section 2.2.1, a Nash equilibrium of a game with more than two players may require irrational numbers in the probabilities. For practical computation it is therefore necessary to consider approximations to Nash equilibria. Here we consider the frequently used notion of ε-Nash equilibrium as defined in Definition 2.2.3. Recall from Section 2.2 that for any game representation, its NASH problem is defined to be the problem of finding an ε-Nash equilibrium of a game encoded in that representation, for some ε given as part of the input. Also recall from Section 2.2.1 that the NASH problem for n-player normal-form games with n ≥ 2 is complete for the complexity class PPAD, which is contained in NP but not known to be in P.

Turning to compact representations, recall from Section 2.2.2, and in particular Theorem 2.2.4, that the complexity of computing expected utility plays a vital role in the complexity of finding an ε-Nash equilibrium. By leveraging Algorithm 1, we are able to apply Theorem 2.2.4 to AGGs.

Corollary 3.5.1. The complexity of NASH for AGG-∅s is PPAD-complete.

Remark. It may not be clear why this would be surprising or encouraging; indeed, the PPAD-hardness part of the claim is neither. However, the PPAD-membership part of the claim is a positive result. Specifically, it implies that the problem of finding a Nash equilibrium in an AGG-∅ can be reduced to the problem of finding a Nash equilibrium in a two-player normal-form game whose size is polynomial in the size of the AGG-∅. This is in contrast to the normal-form representation of the original game, which can be exponentially larger than the AGG-∅. In other words, if we instead tried to solve for a Nash equilibrium using the normal-form representation of the original game, we would face a PPAD-complete problem with an input exponentially larger than the AGG-∅ representation.

Proof sketch. All AGG variants satisfy the first condition of Theorem 2.2.4, polynomial type, since action sets are represented explicitly. We first show that the problem belongs to PPAD. To do so, we construct a circuit that computes expected utility and satisfies the second condition of Theorem 2.2.4.⁷ Recall that our expected utility algorithm consists of Equation (3.4.4), then Algorithm 1, and finally Equation (3.4.5). Equations (3.4.4) and (3.4.5) can be straightforwardly translated into arithmetic circuits using addition and multiplication nodes. Algorithm 1 involves for loops that cannot be directly translated into an arithmetic circuit. However, we can unroll the for loops and still end up with a polynomial number of operations. The resulting circuit resembles a lattice with n levels; at the k-th level there are $|C^{(a_i)}_k|$ addition nodes. Each addition node corresponds to a configuration $c^{(a_i)}_k \in C^{(a_i)}_k$ and calculates $P_k[c^{(a_i)}_k]$ as in iteration k of Algorithm 1. In addition, there are $|A^{(a_i)}_k|$ multiplication nodes for each $c^{(a_i)}_k$, in order to carry out the multiplications in iteration k of Algorithm 1.

To show PPAD-hardness, we observe that an arbitrary graphical game can be encoded as an AGG-∅ without loss of compactness (see Section 3.2.1). Thus the problem of finding a Nash equilibrium in a graphical game can be reduced to the problem of finding a Nash equilibrium in an AGG-∅. Since finding a Nash equilibrium in a graphical game is known to be PPAD-hard, finding a Nash equilibrium in an AGG-∅ is PPAD-hard.
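The unrolling step of the proof sketch can be illustrated with a minimal Python sketch of the simplest case. Here i's projected action graph has a single action node, and each other player k either chooses it (probability $p_k$) or not (probability $q_k = 1 - p_k$). The loop of Algorithm 1 is unrolled into a flat list of "add" and "mul" gates, and $q_k$ is supplied as a separate input so that no subtraction gate is needed. The gate encoding and function names are our own illustration; they are not the construction used in the formal proof.

```python
def build_circuit(n):
    """Unroll the per-player dynamic program for the count at one action node
    into a straight-line arithmetic circuit with only 'add' and 'mul' gates."""
    gates = []                              # gate = (op, args)

    def gate(op, *args):
        gates.append((op, args))
        return len(gates) - 1               # gate id

    level = {0: gate("const", 1.0)}         # P_0[0] = 1
    for k in range(n):
        p = gate("input", f"p{k}")          # Pr(player k chooses the node)
        q = gate("input", f"q{k}")          # Pr(player k does not), fed in as 1 - p_k
        nxt = {}
        for c in range(k + 2):              # counts reachable after k + 1 players
            terms = []
            if c in level:
                terms.append(gate("mul", level[c], q))
            if c - 1 in level:
                terms.append(gate("mul", level[c - 1], p))
            nxt[c] = terms[0] if len(terms) == 1 else gate("add", *terms)
        level = nxt
    return gates, level                     # level[c] = gate computing P_n[c]

def evaluate(gates, inputs):
    """Evaluate every gate in order; returns the list of gate values."""
    vals = []
    for op, args in gates:
        if op == "const":
            vals.append(args[0])
        elif op == "input":
            vals.append(inputs[args[0]])
        elif op == "mul":
            vals.append(vals[args[0]] * vals[args[1]])
        else:                               # "add"
            vals.append(vals[args[0]] + vals[args[1]])
    return vals
```

Each level adds a number of gates linear in k, so the whole circuit has $O(n^2)$ gates in this single-node case. This is consistent with the polynomial size claimed in the proof sketch.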
For AGG-FNs that satisfy the conditions of Theorem 3.4.5, or AGG-FNAs that satisfy those of Theorem 3.4.6, similar arguments apply. We can prove PPAD-completeness for those subclasses of games under a reasonable assumption: that the operator ∗ used to define the CI function nodes can be implemented as an arithmetic circuit of polynomial length that satisfies the second condition of Theorem 2.2.4.

⁷ Observe that the second condition in Theorem 2.2.4 implies that the expected utility algorithm must take polynomial time; however, some polynomial algorithms (e.g., those that rely on division) do not satisfy this condition.

3.5.2 Computing a Nash Equilibrium: The Govindan-Wilson Algorithm

Now we move from the theoretical to the practical. The PPAD-hardness result of Corollary 3.5.1 implies that a polynomial-time algorithm for Nash equilibrium is unlikely to exist, and indeed known algorithms for identifying sample Nash equilibria have worst-case exponential running times. Nevertheless, we will show that our dynamic programming algorithm for expected utility can be used to achieve exponential speedups in such algorithms. The same holds for an algorithm for computing a sample correlated equilibrium. Specifically, we use a black-box approach as discussed in Section 2.2.2.

First we consider Govindan and Wilson's [2003] global Newton method, a state-of-the-art method for finding mixed-strategy Nash equilibria in multi-player games. Recall from Sections 2.2.1 and 2.3 that a bottleneck of the algorithm is the computation of payoff Jacobians. The GameTracer package provides a black-box implementation of the global Newton method that allows one to plug in representation-specific subroutines for this task directly. The payoff Jacobian is defined to be the Jacobian of the function $V : \Sigma \to \mathbb{R}^{\sum_i |A_i|}$, whose $(i, a_i)$-th component is the expected utility $V^i_{a_i}(\sigma_{-i})$.
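$V^i_{a_i}(\sigma_{-i})$ is multilinear in the other players' mixed strategies. As a result, its partial derivative with respect to a single probability $\sigma_{i'}(a_{i'})$ is simply player i's expected utility with $i'$ fixed to $a_{i'}$. The following minimal Python/NumPy sketch checks this on a small random normal-form game, so it says nothing about AGG speedups. The payoff-array layout, the function names and the finite-difference check are our own illustrative assumptions. As in a partial derivative, the probabilities are treated as independent coordinates and are not renormalized after the bump.

```python
import numpy as np

def contract_except(u, sigma, keep):
    """Average the payoff array u over the players not in `keep`,
    weighting axis j by player j's mixed strategy sigma[j]."""
    t = u
    for j in reversed(range(len(sigma))):   # highest axis first, so lower axis indices stay valid
        if j not in keep:
            t = np.tensordot(t, sigma[j], axes=([j], [0]))
    return t

def expected_utilities(payoffs, sigma, i):
    """Player i's block of V: V^i_{a_i}(sigma_{-i}) for every action a_i."""
    return contract_except(payoffs[i], sigma, keep={i})

def jacobian_entry(payoffs, sigma, i, a_i, k, a_k):
    """Entry at row (i, a_i), column (k, a_k): i's expected utility when i plays a_i,
    k plays a_k and everyone else follows sigma; zero when k == i."""
    if k == i:
        return 0.0
    t = contract_except(payoffs[i], sigma, keep={i, k})  # remaining axes: i and k, in player order
    return float(t[a_i, a_k] if i < k else t[a_k, a_i])
```

Because V is linear in each single coordinate, a one-sided finite difference reproduces `jacobian_entry` up to rounding error.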
The corresponding Jacobian at σ is a $(\sum_i |A_i|) \times (\sum_i |A_i|)$ matrix with entries

$$\frac{\partial V^i_{a_i}(\sigma_{-i})}{\partial \sigma_{i'}(a_{i'})} \equiv \nabla V^{i,i'}_{a_i,a_{i'}}(\sigma) \qquad (3.5.1)$$

$$= \sum_{\bar{a} \in \bar{A}} u\big(a_i, C^{(a_i)}(a_i, a_{i'}, \bar{a})\big) \Pr(\bar{a} \mid \bar{\sigma}) \qquad (3.5.2)$$

if $i \neq i'$, and zero otherwise. Here an overbar is shorthand for the subscript $-\{i, i'\}$, where $i \neq i'$ are two players; e.g., $\bar{a} \equiv a_{-\{i,i'\}}$. The rows of the matrix are indexed by i and $a_i$, while the columns are indexed by $i'$ and $a_{i'}$. Given an entry $\nabla V^{i,i'}_{a_i,a_{i'}}(\sigma)$, we call $a_i$ its primary action node and $a_{i'}$ its secondary action node.

We note that efficient computation of the payoff Jacobian matters beyond Govindan and Wilson's global Newton method. For example, recall from Section 2.2.1 that the iterated polymatrix approximation (IPA) method [Govindan and Wilson, 2004] has the same computational problem at its core.

Computing the Payoff Jacobian

Now we consider how the payoff Jacobian may be computed. Equation (3.5.2) shows how to interpret the element $\nabla V^{i,i'}_{a_i,a_{i'}}(\sigma)$ of the Jacobian. It is the expected utility of agent i when she takes action $a_i$, agent $i'$ takes action $a_{i'}$, and all other agents use mixed strategies according to σ. So a straightforward, and quite effective, approach is to use our expected utility algorithm to compute each entry of the Jacobian.

However, the Jacobian matrix has extra structure that allows us to achieve further speedups. For example, observe that some entries of the Jacobian are identical. Suppose two entries have the same primary action node α. They are then expected payoffs on the same utility function $u^\alpha$, and so have the same values if their induced probability distributions over $C^{(\alpha)}$ are the same. We need to consider two cases:

1. The two entries come from the same row of the Jacobian, say player i's action $a_i$. There are two sub-cases to consider:
(a) The columns of the two entries belong to the same player j, but to different actions $a_j$ and $a'_j$.
If $a_j^{(a_i)} = a'^{(a_i)}_j$, i.e., $a_j$ and $a'_j$ both project to the same projected action in $a_i$'s projected action graph,⁸ then $\nabla V^{i,j}_{a_i,a_j} = \nabla V^{i,j}_{a_i,a'_j}$. This implies that when $a_j, a'_j \notin \nu(a_i)$, $\nabla V^{i,j}_{a_i,a_j} = \nabla V^{i,j}_{a_i,a'_j}$.
(b) The columns of the entries correspond to actions of different players. We observe that for all j and $a_j$ such that $\sigma^{(a_i)}_j(a^{(a_i)}_j) = 1$, $\nabla V^{i,j}_{a_i,a_j}(\sigma) = V^i_{a_i}(\sigma_{-i})$. As a special case, if $A^{(a_i)}_j = \{\emptyset\}$, i.e., agent j does not affect i's payoff when i plays $a_i$, then for all $a_j \in A_j$, $\nabla V^{i,j}_{a_i,a_j}(\sigma) = V^i_{a_i}(\sigma_{-i})$.
2. If $a_i$ and $a_j$ correspond to the same action node α (but owned by agents i and j respectively), thus sharing the same payoff function $u^\alpha$, then $\nabla V^{i,j}_{a_i,a_j} = \nabla V^{j,i}_{a_j,a_i}$. Furthermore, if there exist $a'_i \in A_i$ and $a'_j \in A_j$ such that $a'^{(\alpha)}_i = a'^{(\alpha)}_j$ (or $\delta^{(\alpha)}_{a'_i} = \delta^{(\alpha)}_{a'_j}$ for contribution-independent AGG-FNs), then $\nabla V^{i,j}_{a_i,a'_j} = \nabla V^{j,i}_{a_j,a'_i}$.

⁸ For contribution-independent AGG-FNs, the condition becomes $\delta^{(a_i)}_{a_j} = \delta^{(a_i)}_{a'_j}$, i.e., $a_j$ and $a'_j$ have the same projected contribution under $a_i$.

A consequence of 1(a) is that any Jacobian of an AGG has at most $\sum_i \sum_{a_i \in A_i} (n-1)(|\nu(a_i)|+1)$ distinct entries. For AGGs with bounded in-degree, this is $O(n \sum_i |A_i|)$. For each set of identical entries, we only need to perform the expected utility computation once. Even when two entries in the Jacobian are not identical, the projected strategy profiles of the two entries may be similar, and so may their induced distributions. We can exploit this by reusing intermediate results when computing the induced distributions of different entries. Since computing the induced probability distributions is the bottleneck of our expected payoff algorithm, this provides a significant speedup.

First we observe that if we fix the row $(i, a_i)$ and the column's player j, then $\bar{\sigma}$ is the same for all secondary actions $a_j \in A_j$.
We can compute the probability distribution $\Pr(c_{n-1} \mid a_i, \bar{\sigma}^{(a_i)})$ once. Then, for each $a_j \in A_j$, we only need to apply the action $a_j$ to get the induced probability distribution for the entry $\nabla V^{i,j}_{a_i,a_j}$.

Now suppose we fix the row $(i, a_i)$. For two column players j and j′, the corresponding strategy profiles $\sigma_{-\{i,j\}}$ and $\sigma_{-\{i,j'\}}$ are very similar; in fact they are identical in n − 3 of their n − 2 components. For AGG-∅s, we can exploit this similarity as follows. We compute the distribution $\Pr(c_{n-1} \mid \sigma^{(a_i)}_{-i})$ once. Then, for each j ≠ i, we "undo" j's mixed strategy to get the distribution induced by $\sigma_{-\{i,j\}}$. To do this, we treat the distribution $\Pr(c_{n-1} \mid \sigma^{(a_i)}_{-i})$ and $\sigma_j$ as coefficients of polynomials and compute their quotient using long division. (See Section 2.3.5 of [Jiang, 2006] for a more detailed discussion of interpreting distributions over configurations as polynomials.)

Finding equilibria of symmetric and k-symmetric games

Nash [1951] proved that all finite symmetric games have at least one symmetric Nash equilibrium. The Govindan-Wilson algorithm can be adapted to find symmetric Nash equilibria in symmetric AGG-∅s. The modified algorithm operates in the space of symmetric mixed strategy profiles $\Sigma^* = \varphi(\mathcal{A})$. It follows a path of symmetric equilibria of perturbed symmetric games to a symmetric equilibrium of the unperturbed game. A key step of the algorithm is the computation of the Jacobian of the function $V : \Sigma^* \to \mathbb{R}^{|\mathcal{A}|}$. The α-th entry $V_\alpha(\sigma^*)$ of this function is the expected utility of one player choosing α while the others play the mixed strategy $\sigma^*$. This Jacobian at $\sigma^*$ is an $|\mathcal{A}| \times |\mathcal{A}|$ matrix whose entry at row α and column α′ is n − 1 multiplied by a specific expected utility. That expected utility is the one a player gets from choosing action α when another player is choosing action α′ and the rest of the players play the mixed strategy $\sigma^*$.
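The factor n − 1 in the symmetric Jacobian entry comes from the n − 1 identical opponents: each opponent's probability of choosing α′ enters $V_\alpha(\sigma^*)$. The minimal Python sketch below checks this numerically on a tiny symmetric game. For simplicity it assumes that $u^\alpha$ is given as a function of the full vector of action counts, as in a complete action graph. It enumerates count vectors with multinomial weights. This is only a brute-force stand-in for the symmetric techniques of Section 3.4.1, not a reproduction of them. All names are hypothetical.

```python
from math import factorial
import numpy as np

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    """Probability of the count vector when sum(counts) players each draw from probs."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)              # stays an exact integer at every step
    return coef * np.prod([p ** c for p, c in zip(probs, counts)])

def expected_u(u, fixed, m, sigma):
    """E[u(c)], where players in `fixed` play those pure actions and m other
    players independently play the symmetric mixed strategy sigma."""
    total = 0.0
    for counts in compositions(m, len(sigma)):
        c = list(counts)
        for a in fixed:
            c[a] += 1
        total += multinomial_pmf(counts, sigma) * u(tuple(c))
    return total

def symmetric_jacobian_entry(u, n, alpha, alpha2, sigma):
    """(n - 1) times the expected utility of a player at alpha when one other player
    is at alpha2 and the remaining n - 2 players play sigma."""
    return (n - 1) * expected_u(u, [alpha, alpha2], n - 2, sigma)
```

$V_\alpha(\sigma^*)$ is `expected_u(u, [alpha], n - 1, sigma)`, which is a polynomial of degree n − 1 in the entries of σ*. A central finite difference of it with respect to σ*(α′) therefore matches the entry up to an error of order h².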
Such an entry can be computed efficiently using the techniques for symmetric expected utility computation discussed in Section 3.4.1, which are faster than our expected utility algorithm for general AGGs. The techniques discussed in the current section can further speed up the computation of Jacobians in the symmetric case. In particular, it is straightforward to check that the Jacobian has at most $\sum_{\alpha \in \mathcal{A}} (|\nu(\alpha)| + 1) = O(|E|)$ distinct entries, where E is the set of edges of the action graph.

A straightforward corollary of Nash's [1951] proof is that any k-symmetric AGG-∅ has at least one k-symmetric Nash equilibrium. For each equivalence class ℓ of the players, let $\Sigma^*_\ell$ denote the set of symmetric strategy profiles for $N_\ell$, and let $\mathcal{A}_\ell$ denote the set of actions of a player in $N_\ell$. Relying on arguments similar to those above, we can adapt the Govindan-Wilson algorithm to find k-symmetric equilibria in k-symmetric AGG-∅s. The bottleneck is the computation of the Jacobian of the function $V : \prod_\ell \Sigma^*_\ell \to \mathbb{R}^{\sum_\ell |\mathcal{A}_\ell|}$. Its $(\ell, \alpha)$-th entry is the expected utility of a player in $N_\ell$ playing action α while the others play according to the given k-symmetric strategy profile $(\sigma^*_1, \ldots, \sigma^*_k)$. The entry at row $(\ell, \alpha)$ and column $(\ell', \alpha')$ of the Jacobian matrix is equal to $(|N_{\ell'}| - \mathbb{1}_{\ell = \ell'})$ multiplied by a specific expected utility. That expected utility is the one a player in $N_\ell$ gets from choosing action α when another player in $N_{\ell'}$ is choosing action α′ and the others play according to the given k-symmetric strategy profile. Such expected utilities can be computed efficiently using the techniques discussed in Section 3.4.1.

3.5.3 Computing a Nash Equilibrium: The Simplicial Subdivision Algorithm

Another algorithm for computing a sample Nash equilibrium is van der Laan, Talman & van der Heyden's [1987] simplicial subdivision algorithm.
Recall from Section 2.2.1 that one of its bottlenecks is the computation of the labels of a given subsimplex in a simplicial subdivision of Σ. This in turn depends on the computation of expected utilities under mixed strategy profiles. The GAMBIT package [McKelvey et al., 2006] provides an implementation of the simplicial subdivision algorithm for the normal form. We adapted this code into a black-box implementation that allows one to plug in representation-specific subroutines for expected utility computation. Combining this with an implementation of our AGG-based Algorithm 2 is then sufficient for an exponential speedup compared to the normal-form-based implementation of the simplicial subdivision algorithm. An advantage of the black-box implementation is that it is useful for other representations besides AGGs; e.g., in Chapter 6 we use it to compute sample Bayes-Nash equilibria of Bayesian Action-Graph Games.

3.5.4 Computing a Correlated Equilibrium

In Section 2.2.7 we gave an overview of the literature on the computation of a sample correlated equilibrium. In summary, Papadimitriou and Roughgarden [2008] proposed a polynomial-time algorithm for computing a sample correlated equilibrium. It requires a game representation with polynomial type and a polynomial-time subroutine for computing expected utility under mixed strategy profiles. Recently, Stein et al. [2010] showed that Papadimitriou and Roughgarden's algorithm can fail to find an exact correlated equilibrium. They also presented a slight modification of the algorithm that efficiently computes an ε-correlated equilibrium. (An ε-correlated equilibrium is an approximation of the correlated equilibrium solution concept, where ε measures the extent to which the incentive constraints for correlated equilibrium are violated.) Incorporating this fix, we have the following.

Theorem 3.5.2 ([Papadimitriou and Roughgarden, 2008]).
If a game representation has polynomial type and has a polynomial algorithm for computing expected utility, then an ε-correlated equilibrium can be computed in time polynomial in $\log(1/\varepsilon)$ and the representation size.

In Chapter 7 we present a modified version of Papadimitriou and Roughgarden's algorithm that is able to compute an exact correlated equilibrium in polynomial time.

Theorem 3.5.3 (Restatement of Theorem 7.4.5; also [Jiang and Leyton-Brown, 2011]). If a game representation has polynomial type and has a polynomial algorithm for computing expected utility, then a correlated equilibrium can be computed in time polynomial in the representation size.

The second condition in both theorems involves the computation of expected utility. As a direct corollary of Theorem 3.5.3 and Theorem 3.4.1, there exists a polynomial algorithm for computing an exact correlated equilibrium given an AGG-∅.

Corollary 3.5.4. Given a game represented as an AGG-∅, an exact correlated equilibrium can be computed in time polynomial in the size of the AGG-∅.

Similarly, for AGG-FNs and AGG-FNAs for which the expected utility problem can be solved in polynomial time (see Theorems 3.4.5 and 3.4.6), correlated equilibria can be computed in polynomial time.

3.6 Experiments

Although our theoretical results show that there are significant benefits to working with AGGs, they might leave the reader with two worries. First, the reader might be concerned that while AGGs offer asymptotic computational benefits, they might not be practically useful. Second, even if convinced about the usefulness of AGGs, the reader might want to know the size of problems that can be tackled by the computational tools we have developed so far. We address both of these worries in this section by reporting the results of extensive computational experiments.
Specifically, we compare the performance of the AGG representation and our AGG-based algorithms against normal-form-based solutions using the (highly optimized) GameTracer package [Blum et al., 2002]. As benchmarks, we used AGG and normal-form representations of instances of Coffee Shop games, Job Market games, and symmetric AGG-∅s on random graphs. We compared the sizes of the AGG and normal-form representations. We also compared the performance obtained by using each representation for three tasks: computing expected utility, computing Nash equilibria using the Govindan-Wilson algorithm, and computing Nash equilibria using the simplicial subdivision algorithm. Finally, we show how sample equilibria of these games can be visualized on action graphs.

3.6.1 Software Implementation and Experimental Setup

We implemented our algorithms in a freely available software package, in order to make it easy for other researchers to use AGGs to model problems of interest. Our software is capable of:
• reading in a description of an AGG;
• computing expected utility and the payoff Jacobian given a mixed strategy profile;
• computing Nash equilibria by adapting GameTracer's [Blum et al., 2002] implementation of Govindan and Wilson's [2003] global Newton method; and
• computing Nash equilibria by adapting GAMBIT's [McKelvey et al., 2006] implementation of the simplicial subdivision algorithm [van der Laan et al., 1987].

We extended GAMUT [Nudelman et al., 2004], a suite of game instance generators, by implementing generators of AGG instances. These include Ice Cream Vendor games (Example 3.2.5), Coffee Shop games (Example 3.2.7), Job Market games (Example 3.3.1) and symmetric AGG-∅s on a random action graph with random payoffs. Finally, with Damien Bargiacchi, we also developed a graphical user interface for creating and editing AGGs. More details on these, as well as software implementations of other algorithms from this thesis, are given in Appendix A.
All of our software is freely available at http://agg.cs.ubc.ca.

When using Coffee Shop games in our experiments, we set payoffs randomly in order to test on a wide set of utility functions. For the visualization of equilibria in Section 3.6.7 we set the Coffee Shop game utility functions to be

$$u^\alpha(c(\alpha), c(p'_\alpha), c(p''_\alpha)) = 20 - [c(\alpha)]^2 - c(p'_\alpha) - \log(c(p''_\alpha) + 1),$$

where $p'_\alpha$ is the function node representing the number of players choosing adjacent locations and $p''_\alpha$ is the function node representing the number of players choosing other locations. When using Job Market games in our experiments, we set the utility functions to be

$$u^\alpha(c^{(\alpha)}) = \frac{R_\alpha}{c(\alpha) + \sum_{\alpha' \in \nu(\alpha) \setminus \{\alpha\}} 0.1\, c(\alpha')} - K_\alpha,$$

with $R_\alpha$ set to 2, 4, 6, 8, 10 and $K_\alpha$ set to 1, 2, 3, 4, 5 for the five levels from high school to PhD. We also used Ice Cream Vendor games for the visualization of equilibria in Section 3.6.7. There we set the utilities so that for a player i choosing action α, each vendor choosing a location $\alpha' \in \nu(\alpha)$ contributes $w_f w_l$ to i's utility. $w_f$ is −1 when α′ has the same food type as α, and 0.8 otherwise; $w_l$ is 1 when α′ and α correspond to the same location, and 0.6 when they correspond to different (but neighboring) locations. In other words, there is a negative effect from players choosing the same food type, and a weaker positive effect from players choosing a different food type. Furthermore, effects from neighboring locations are weaker than effects from the same location.

All our experiments were performed using a computer cluster consisting of 55 machines with dual Intel Xeon 3.2GHz CPUs, 2MB cache and 2GB RAM, running Suse Linux 10.1.

3.6.2 Representation Size

First, we compared the representation sizes of AGG-FNs and their induced normal forms. For each game instance we counted the number of payoff values that needed to be stored. We first looked at Coffee Shop games on a 5 × 5 grid of blocks, varying the number of players.
Figure 3.9 (top left) shows a log-scale plot of the number of payoff values in each representation versus the number of players. The normal-form representation grew exponentially with the number of players and quickly became impractical, while the size of the AGG representation grew polynomially in n. As we can see from Figure 3.9 (top right), even for a game instance with 80 players, the AGG-FN representation stored only about 2 million numbers. In contrast, the corresponding normal-form representation would have had to store $1.2 \times 10^{115}$ numbers.

We then fixed the number of players at 4 and varied the number of actions; for ease of comparison we fixed the number of columns at 5 and changed only the number of rows. Recall from Section 3.2.2 how the representation size of Coffee Shop games behaves, both as AGGs and in the normal form. It depends only on the number of players and the number of actions, not on the shape of the region. (Recall that the number of actions is B + 1, where B is the total number of blocks.) Figure 3.9 (bottom left) shows a log-scale plot of the number of payoff values versus the number of actions, and Figure 3.9 (bottom right) gives a plot for the AGG-FN representation alone. The size of the AGG representation grew linearly with the number of rows, whereas the size of the normal-form representation grew like a higher-order polynomial.
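The normal-form side of this comparison follows from a simple count. If every player has the same B + 1 actions, the normal form stores one payoff per player for each of the $(B+1)^n$ pure strategy profiles. The back-of-the-envelope Python sketch below assumes exactly that count; the function names are ours. It reproduces the orders of magnitude of the normal-form sizes quoted in this section. It does not model the AGG-FN side, whose size depends on the action graph.

```python
def coffee_shop_actions(rows, cols):
    """B = rows * cols blocks, plus one action for not entering the market."""
    return rows * cols + 1

def normal_form_size(n_players, n_actions):
    """Payoff values stored by the normal form when every player has n_actions actions:
    one payoff per player for each of the n_actions ** n_players pure profiles."""
    return n_players * n_actions ** n_players

# 80 players on a 5 x 5 grid, 4 players on an 80 x 5 grid, 12 players on a 4 x 4 grid
for n, r, c in [(80, 5, 5), (4, 80, 5), (12, 4, 4)]:
    size = normal_form_size(n, coffee_shop_actions(r, c))
    print(f"{n} players, {r} x {c} grid: {float(size):.2e} payoffs")
```

Python integers are unbounded, so the exact count is computed before it is rounded for printing.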
For a Coffee Shop game with 4 players on an 80 × 5 grid, the AGG-FN representation stores only about 8000 numbers, whereas the normal-form representation would have to store $1.0 \times 10^{11}$ numbers.

[Figure 3.9: Representation sizes of Coffee Shop games (payoffs stored, AGG vs. NF). Top left: 5 × 5 grid with 3 to 16 players (log scale). Top right: AGG only, 5 × 5 grid with up to 80 players (log scale). Bottom left: 4-player r × 5 grid, r varying from 3 to 15 (log scale). Bottom right: AGG only, up to 80 rows.]

We also tested on Job Market games from Example 3.3.1, which have 13 actions, varying the number of players from 3 to 24. The results are similar, as shown in Figure 3.11 (left). This is consistent with our theoretical observation that the sizes of normal-form representations grow exponentially in n while the sizes of AGG representations grow polynomially in n.

3.6.3 Expected Utility Computation

We tested the performance of our dynamic programming algorithm for computing expected utilities in AGG-FNs against GameTracer's normal-form-based algorithm for computing expected utilities. For each game instance, we generated 1000 random strategy profiles with full support. We then measured the CPU (user) time spent computing $V^n_{a_n}(\sigma_{-n})$ under these strategy profiles. Dividing this measurement by 1000 gave the average CPU time.

We first looked at Coffee Shop games of different sizes. We fixed the grid at 5 × 5 blocks and varied the number of players. Figure 3.10 shows plots of the results.
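The timing protocol above needs random mixed strategy profiles. The thesis does not say how probabilities within a support were drawn, so the following Python sketch is one plausible reading under stated assumptions. Weights are uniform random numbers normalized to sum to one. The optional `include_prob` argument covers the partial-support variant used in these experiments: each action is kept with probability 0.4, with at least one action per player. The function name and these sampling choices are ours, not the authors'.

```python
import random

def random_profile(action_counts, include_prob=1.0, rng=None):
    """Draw one mixed strategy per player (assumed sampling scheme).

    include_prob=1.0 gives full support. A smaller value keeps each action in the
    support independently with that probability; a player whose support comes out
    empty gets one uniformly chosen action instead."""
    rng = rng or random.Random()
    profile = []
    for m in action_counts:
        support = [a for a in range(m) if rng.random() < include_prob]
        if not support:
            support = [rng.randrange(m)]
        weights = [1.0 - rng.random() for _ in support]   # in (0, 1], so the total is never zero
        total = sum(weights)
        strategy = [0.0] * m
        for a, w in zip(support, weights):
            strategy[a] = w / total
        profile.append(strategy)
    return profile
```

Passing a seeded `random.Random` makes a batch of 1000 profiles reproducible across the AGG-based and normal-form-based runs.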
For very small games the normal-form-based algorithm is faster due to its smaller bookkeeping overhead. As the number of players grows, our AGG-based algorithm's running time grows polynomially, while the normal-form-based algorithm scales exponentially. For more than five players, we were not able to store the normal-form representation in memory. Meanwhile, our AGG-based algorithm scaled to much larger numbers of players, averaging about a second to compute an expected utility for an 80-player Coffee Shop game.

Next, we fixed the number of players at 4 and the number of columns at 5, and varied the number of rows. Our algorithm's running time grew roughly linearly with the number of rows, while that of the normal-form-based algorithm grew like a higher-order polynomial. This is consistent with our theoretical observation that our algorithm takes $O(n|\mathcal{A}| + n^4)$ time for this class of games, while normal-form-based algorithms take $O(|\mathcal{A}|^{n-1})$ time.

We also considered strategy profiles with partial support. While ensuring that each player's support included at least one action, we generated strategy profiles in which each action was included in the support with probability 0.4. On the Coffee Shop game instances mentioned above, GameTracer took about 60% of its full-support running times to compute expected utilities. Our AGG-based algorithm required about 20% of its full-support running times.

We also tested on Job Market games, varying the number of players. The results are shown in Figure 3.11 (right). The normal-form-based implementation ran out of memory for more than 6 players. The AGG-based implementation averaged about a quarter of a second to compute an expected utility in a 24-player game.

3.6.4 Computing Payoff Jacobians

We ran similar experiments to investigate the computation of payoff Jacobians.
As discussed in Section 3.5.2, the entries of a Jacobian can be formulated as expected payoffs, so a Jacobian can be computed by doing an expected payoff computation for each of its entries. In Section 3.5.2 we discussed methods that exploit the structure of the Jacobian to further speed up the computation. GameTracer's normal-form-based implementation also exploits the structure of the Jacobian by reusing partial results of expected payoff computations. We compared our AGG-based Jacobian algorithm (as described in Section 3.5.2) to GameTracer's implementation. The results were very similar to those for computing expected payoffs: our implementation scaled polynomially in n while GameTracer scaled exponentially in n.

[Figure 3.10: Running times (CPU time in seconds) for payoff computation in the Coffee Shop game. Top left: 5 × 5 grid with 3 to 16 players. Top right: AGG only, 5 × 5 grid with up to 80 players. Bottom left: 4-player r × 5 grid, r varying from 3 to 15. Bottom right: AGG only, up to 80 rows.]

[Figure 3.11: Job Market games, varying numbers of players. Left: comparing representation sizes. Right: running times for computing 1000 expected utilities.]
We instead focus on how much speedup the methods in Section 3.5.2 provided. To measure it, we compared our algorithm from Section 3.5.2 against computing expected payoffs separately for each of the Jacobian's entries (using our AGG-based algorithm described in Section 3.4). We tested on Coffee Shop games on a 5 × 5 grid with 3 to 10 players, as well as Coffee Shop games with 4 players, 5 columns and varying numbers of rows. For each game instance we randomly generated 100 strategy profiles with partial support. For each of these game instances, our algorithm as described in Section 3.5.2 was consistently about 50 times faster than computing expected payoffs for each of the Jacobian's entries. This confirms that the methods discussed in Section 3.5.2 provide a significant speedup for computing payoff Jacobians.

3.6.5 Finding a Nash Equilibrium Using Govindan-Wilson

Now we show experimentally that the speedup we achieved for computing Jacobians using the AGG representation led to a speedup in the Govindan-Wilson algorithm. We compared two versions of the Govindan-Wilson algorithm. One is the implementation in GameTracer, where the Jacobian computation is based on the normal-form representation. The other is identical to the GameTracer implementation, except that the Jacobians are computed using our algorithm for the AGG representation. Both techniques compute the Jacobians exactly. As a result, given an initial perturbation of the original game, the two implementations follow the same path and return exactly the same Nash equilibrium.

Again, we tested the two algorithms on Coffee Shop games of varying sizes. First we fixed the grid at 4 × 4 and varied the number of players; then we fixed the number of players at 4 and the number of columns at 4 and varied the number of rows. For each game instance, we randomly generated 10 initial perturbation vectors, and for each initial perturbation we ran the two versions of the Govindan-Wilson algorithm.
Although the algorithm can sometimes find more than one equilibrium, we stopped both versions of the algorithm after one equilibrium was found. The running time of the Govindan-Wilson algorithm is very sensitive to the initial perturbation. Consequently, for each game instance the running times with different initial perturbations had large variance. To control for this, for each initial perturbation we looked at the ratio of running times between the normal-form implementation and the AGG implementation. A ratio greater than 1 means that the AGG implementation ran faster than the normal-form implementation. We present the results in Figure 3.12 (left). As the games grew (either in the number of players or in the number of actions), the speedup of the AGG implementation over the normal-form implementation increased.

The normal-form implementation ran out of memory for game instances with more than 5 players, preventing us from reporting ratios above n = 5. Thus, we ran the AGG-based implementation alone on game instances with larger numbers of players, with a one-day cutoff time. Figure 3.12 (top right) shows a log-scale boxplot of these CPU times. For game instances with up to 12 players, the algorithm terminated within one day for most initial perturbations. A normal-form representation of such a game would have needed to store $7.0 \times 10^{15}$ numbers. Figure 3.12 (bottom right) shows a boxplot of the CPU times for the AGG-based implementation, varying the number of actions while fixing the number of players at 4. For game instances with up to 49 actions (a 4 × 12 grid plus one action for not entering the market), the algorithm terminated within an hour.

We also tested on Job Market games with varying numbers of players. The results are shown in Figure 3.13. For the game instance with 6 players, the AGG-based implementation was about 100 times faster than the normal-form-based implementation.
While the normal-form-based implementation ran out of memory for Job Market games with more than 6 players, the AGG-based implementation was able to solve games with 16 players in an average of 24 minutes.

[Figure 3.12: Govindan-Wilson algorithm; Coffee Shop game. Top row: 4 × 4 grid, varying number of players. Bottom row: 4-player r × 4 grid, r varying from 3 to 12. For each row, the left figure shows the ratio of running times (NF/AGG); the right figure shows a log-scale plot of CPU times for the AGG-based implementation. The dashed horizontal line indicates the one-day cutoff time.]

3.6.6 Finding a Nash Equilibrium Using Simplicial Subdivision

As discussed in Section 3.5.3, we can speed up the normal-form-based simplicial subdivision algorithm by replacing the subroutine that computes expected utility with our AGG-based algorithm. We did so for GAMBIT's implementation of simplicial subdivision. As with the Govindan-Wilson algorithm, both the original version of simplicial subdivision and our AGG version follow a deterministic path from a given starting point. They therefore arrive at exactly the same equilibrium. Thus, all performance differences are due to the choice of representation. We compared AGG-based simplicial subdivision against normal-form-based simplicial subdivision on two kinds of instances: Coffee Shop games, and randomly generated symmetric AGG-∅s on small world graphs. We always started from the mixed strategy profile in which each player gives equal probability to each of her actions.

We first considered instances of Coffee Shop games with 4 rows, 4 columns and varying numbers of players.
For each game size we generated 10 instances with random payoffs. Figure 3.14 (left) gives a boxplot of the ratio of running times between the two implementations. The AGG-based implementation was about 3 times faster for the 3-player instances and about 30 times faster for the 4-player instances. We also tested on Coffee Shop games with 3 players, 3 columns and numbers of rows varying from 4 to 7, again generating 10 instances with random payoffs at each size. Figure 3.14 (right) gives a boxplot of the ratio of running times. As expected, the AGG-based implementation was faster and the gap in performance widened as games grew.

Figure 3.13: Govindan-Wilson algorithm; Job Market games, varying numbers of players. Left: ratios of running times (3 to 6 players). Right: log-scale plot of CPU times in seconds for the AGG-based implementation (3 to 16 players).

Figure 3.14: Ratios of running times of simplicial subdivision algorithms on Coffee Shop games. Left: 4 × 4 grid with 3 to 4 players. Right: 3-player r × 3 grid, r varying from 4 to 7 (13 to 22 actions).

We then investigated symmetric AGG-∅s on randomly generated small-world graphs with random payoffs. The small-world graphs were generated using GAMUT’s implementation with parameters K = 1 and p = 0.5. For each game size we generated 10 instances. We first fixed the number of action nodes at 5 and varied the number of players. Results are shown in Figure 3.15 (top row). While there was large variance in the absolute running times across different instances, the ratios of running times between normal-form-based and AGG-based implementations showed a clear increasing trend as the number of players increased.
The normal-form-based implementation ran out of memory for instances with more than 5 players. Meanwhile, we ran the AGG-based implementation on larger instances with a one-day cutoff time. As shown by the boxplot, the AGG-based implementation solved most instances with up to 8 players within 24 hours. We then fixed the number of players at 4 and varied the number of action nodes from 4 to 16. Results are shown in Figure 3.15 (bottom row). Again, while the actual running times on different instances varied substantially, the ratios of running times showed a clear increasing trend as the number of actions increased. The AGG-based implementation was able to solve a 16-action instance in an average of about 3 minutes, while the normal-form-based implementation averaged about 2 hours.

Figure 3.15: Simplicial subdivision algorithm; symmetric AGG-∅s on small-world graphs. Top row: 5 actions, varying number of players (3 to 5 in the ratio panel, 3 to 8 in the CPU-time panel). Bottom row: 4 players, varying number of actions (4 to 16). The left panels show ratios of running times; the right panels show log-scale plots of CPU times in seconds for the AGG-based implementation. The dashed horizontal line indicates the one-day cutoff time.

3.6.7 Visualizing Equilibria on the Action Graph

Besides facilitating representation and computation, the action graph can also be used to visualize strategy profiles in a natural way. A strategy profile σ (e.g., a Nash equilibrium) can be visualized on the action graph by displaying the expected number of players that choose each of the actions. We call such a tuple the expected configuration under σ. It is easily computed from σ: for each action node α, we sum the probabilities of playing α, i.e., E[c(α)] = ∑_{i∈N} σ_i(α), where σ_i(α) is 0 when α ∉ A_i. When the strategy profile consists of pure strategies, the result is simply the corresponding configuration. The expected configuration often has a natural interpretation. For example, in Coffee Shop games and other scenarios where actions correspond to location choices, an expected configuration can be seen as a density map describing expected player locations.

Figure 3.16: Visualization of a Nash equilibrium of a 16-player Coffee Shop game on a 4 × 4 grid. The function nodes and the edges of the action graph are not shown. The action node at the bottom corresponds to not entering the market.

We illustrate using a 16-player Coffee Shop game on a 4 × 4 grid. We ran the (AGG-based) Govindan-Wilson algorithm, finding a Nash equilibrium in 77 seconds. The expected configuration of this (pure-strategy) equilibrium is visualized in Figure 3.16. We also examined a Job Market game with 20 players. A normal-form representation of this game would have needed to store $9.4 \times 10^{134}$ numbers. We ran the AGG-based Govindan-Wilson algorithm, finding a Nash equilibrium in 860 seconds. The expected configuration of this equilibrium is visualized in Figure 3.17 (left). Note that the expected configuration takes non-integer values on some of the nodes, as a result of mixed strategies played by some of the players. We also visualize two players’ mixed equilibrium strategies in Figure 3.17 (right).

Figure 3.17: Visualization of a Nash equilibrium of a Job Market game with 20 players. Left: expected configuration of the equilibrium. Right: two mixed equilibrium strategies.

Finally, we examined an Ice Cream Vendor game (Example 3.2.5) with 4 locations, 6 ice cream vendors, 6 strawberry vendors, and 4 west-side vendors. The Govindan-Wilson algorithm found an equilibrium in 9 seconds.
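The expected-configuration formula E[c(α)] = ∑_{i∈N} σ_i(α) amounts to a single pass over the players. A minimal sketch, assuming (our choice of encoding) that each σ_i is a dictionary from the actions in A_i to probabilities, so that actions outside A_i are simply absent and contribute 0; the function name is hypothetical.

```python
from collections import defaultdict


def expected_configuration(sigma):
    """Return {action: expected number of players choosing it}.

    sigma is a list with one dict per player i, mapping each action in A_i
    to sigma_i(action); actions outside A_i are absent, i.e. contribute 0.
    """
    expected = defaultdict(float)
    for sigma_i in sigma:
        for action, prob in sigma_i.items():
            expected[action] += prob
    return dict(expected)


# Three players with different action sets; player 0 plays a pure strategy.
sigma = [
    {"A": 1.0},
    {"A": 0.5, "B": 0.5},
    {"B": 0.25, "C": 0.75},
]
print(expected_configuration(sigma))  # {'A': 1.5, 'B': 0.75, 'C': 0.75}

# For a pure-strategy profile the result is exactly the configuration.
print(expected_configuration([{"A": 1.0}, {"A": 1.0}, {"C": 1.0}]))  # {'A': 2.0, 'C': 1.0}
```

Non-integer entries, as in the Job Market equilibrium, can only arise when some players mix.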
The expected configuration of the Ice Cream Vendor equilibrium (a pure-strategy equilibrium) is visualized in Figure 3.18. Observe that the west side is relatively denser due to the west-side vendors. The locations at the east and west ends were chosen relatively more often than the middle locations, because the ends have fewer neighbors and thus experience less competition.

Figure 3.18: Visualization of a Nash equilibrium of an Ice Cream Vendor game.

3.7 Conclusions

We proposed action-graph games (AGGs), a fully expressive game representation that can compactly express utility functions with structure such as context-specific independence and anonymity. We also extended the basic AGG representation by introducing function nodes and additive utility functions, allowing us to compactly represent a wider range of structured utility functions. We showed that AGGs can efficiently represent games from many previously studied compact classes, including graphical games, symmetric games, anonymous games, and congestion games. We presented a polynomial-time algorithm for computing expected utilities in AGG-∅s and contribution-independent AGG-FNs. For symmetric and k-symmetric AGG-∅s, we gave more efficient, specialized algorithms for computing expected utilities under symmetric and k-symmetric strategy profiles, respectively. We also showed how to use these algorithms to achieve exponential speedups of existing methods for computing a sample Nash equilibrium and a sample correlated equilibrium. We showed experimentally that using AGGs allows us to model and analyze dramatically larger games than can be addressed with the normal-form representation. In several later chapters of this thesis we present our efforts to extend and generalize our AGG framework. In Chapter 4 we consider the problem of computing PSNE.
In Chapter 5 we propose temporal action-graph games (TAGGs) for representing imperfect-information dynamic games, and in Chapter 6 we propose Bayesian action-graph games (BAGGs) for representing Bayesian games.

Chapter 4 Computing Pure-strategy Nash Equilibria in Action-Graph Games

4.1 Introduction

In this chapter, we analyze the problem of computing pure-strategy Nash equilibria (PSNE) in AGGs. Recall from Section 2.2.6 that a PSNE does not always exist in a game. We focus on the problems of deciding whether a PSNE exists and of finding a PSNE, and later extend our analysis to the problem of computing a PSNE with optimal social welfare. The existence problem for AGGs is known to be NP-complete, even for symmetric AGG-∅s with bounded in-degree. Our goal in this chapter is to identify classes of AGGs for which this problem is tractable. We propose a dynamic-programming approach and show that if the AGG-∅ is symmetric and the action graph has bounded treewidth, our algorithm determines the existence of pure equilibria in polynomial time. We then extend our approach beyond symmetric AGG-∅s.¹

¹ This chapter is based on joint work with Kevin Leyton-Brown. Our earlier publication [Jiang and Leyton-Brown, 2007a] was restricted to the case of symmetric AGG-∅s, and furthermore the proposed algorithm contained an error. In the current chapter we describe the corrected algorithm for symmetric AGGs, and furthermore extend the algorithm to certain classes of asymmetric AGGs.

We give a brief overview of our approach, and contrast it with some of the related literature mentioned in Section 2.2.6. Recall from Definition 2.2.5 that a PSNE is a pure-strategy profile satisfying certain incentive constraints. For symmetric AGGs, we can cast the problem in terms of configurations and constraints on configurations.
With the graphical structure of AGGs, a natural idea is to construct global solutions (i.e., configurations corresponding to PSNE) from partial solutions, which are configurations over a subset of action nodes satisfying certain local constraints on the corresponding subgraph of the action graph. One difficulty when combining partial solutions from subgraphs is that of inconsistency. For the PSNE problem on graphical games, Gottlob et al. [2005] and Daskalakis and Papadimitriou [2006] showed that an effective technique for dealing with inconsistency is tree decomposition (and the related concept of hypertree decomposition). Roughly, a tree decomposition [Robertson and Seymour, 1986] of a graph consists of a family of overlapping subsets of vertices of the graph, and a tree structure with these subsets as nodes, satisfying certain properties such that algorithms for trees can be adapted to work on the tree decomposition, with running time exponential only in the tree decomposition’s width (which measures the size of the largest subset). The treewidth of a graph is defined to be the width of the best tree decomposition for that graph. As a result, many NP-hard problems on graphs can be solved in polynomial time for graphs with bounded treewidth (see, e.g., the recent survey by Bodlaender [2007]). For graphical games on bounded-treewidth graphs, it is sufficient to combine partial solutions from the leaves to the root of the tree decomposition while maintaining consistency across adjacent subsets, resulting in a polynomial-time algorithm for PSNE [Daskalakis and Papadimitriou, 2006]. However, whereas in graphical games the incentive constraints can be defined locally at each neighborhood, for AGGs we face an additional difficulty, because an agent could profitably deviate from an action in one part of the action graph to an action in another part.
That is, the incentive constraints for PSNE in an AGG cannot be entirely captured by local constraints on subgraphs of the action graph. A simplified version of this difficulty was successfully dealt with in Ieong et al. [2005]’s polynomial-time algorithm for finding PSNE in singleton congestion games, which correspond to symmetric AGGs with only self-edges. Their dynamic-programming algorithm is able to check against such deviations without having to store the exponential-sized set of partial solutions, by maintaining sufficient statistics (specifically, bounds on utilities) that summarize the partial solutions compactly. Recall from Chapter 3 that AGGs unify these existing representations; it turns out that our algorithm for AGGs also generalizes the existing algorithms for graphical games and singleton congestion games. Specifically, we define restricted games as AGGs played on subgraphs, whose equilibria satisfy the local incentive constraints; we then use tree-decomposition techniques to divide the action graph into subgraphs, allowing us to construct equilibria of the game from equilibria of restricted games while maintaining consistency; and we use sufficient statistics (corresponding to the concept of characteristics [e.g., Bodlaender, 2007]) to check against deviations across partial solutions. Compared to the case of singleton congestion games, the edges (i.e., utility dependencies) between action nodes in AGGs complicate the design of the sufficient statistics. Nevertheless, we are able to overcome this technical challenge by further exploiting properties of tree decompositions.

4.2 Preliminaries

4.2.1 AGGs

We refer readers to Chapter 3 for definitions of AGG-∅s, symmetric AGG-∅s and k-symmetric AGG-∅s. Recall that I is the maximum in-degree of the action graph. For an AGG-∅ Γ = (N, A, G, u), let ||Γ|| denote the number of utility values the representation stores. Recall from Proposition 3.2.6 that this number is less than or equal to

$$|A| \, \frac{(n-1+I)!}{(n-1)!\, I!},$$

with equality holding when the AGG-∅ is symmetric. Let U be the set of distinct utilities of the game Γ.

Whereas in Chapter 3 we only needed to consider configurations restricted to the neighborhood of some action node, in this chapter we will need to talk about configurations over arbitrary sets of action nodes. For a configuration c and a set of actions X ⊂ A, let c[X] denote the restriction of c to X, i.e., c[X] = (c[α])_{α∈X}, where c[α] is the number of players choosing action α. Let C[X] denote the set of restricted configurations over X.

Given an action graph G = (A, E) and a set of actions X ⊂ A, let G_X be the action graph restricted to the action nodes X. Formally, G_X ≡ (X, {(α, α′) ∈ E | α, α′ ∈ X}). For a set of actions X ⊂ A, define ν(X) ≡ {α ∈ A \ X | ∃x ∈ X such that (α, x) ∈ E}: the set of actions not in X that are neighbors of some action in X. Also define X̄ ≡ A \ X to be the complement of X. Then ν(X̄) ≡ {x ∈ X | ∃α ∈ A \ X such that (x, α) ∈ E} is the set of actions in X that are neighbors of some action not in X. Define τ(X) ≡ {x ∈ X | ∃α ∈ A \ X such that (x, α) ∈ E or (α, x) ∈ E}, the set of actions in X with an edge to or from some action outside X. Given a configuration c[X], let #c[X] ≡ ∑_{x∈X} c[x].

4.2.2 Complexity of Computing PSNE

Consider the problem of determining whether a PSNE exists in a given AGG-∅. Recall from Section 2.2.6 that the obvious algorithm of checking every possible action profile runs in time linear in the size of the normal-form representation of the game. However, since AGGs can be exponentially more compact than the normal form, the running time of this algorithm is worst-case exponential in the size of the AGG. Indeed, the PSNE problem becomes NP-complete when the input is an AGG-∅.

Proposition 4.2.1. The problem of determining whether a pure Nash equilibrium exists in an AGG-∅ is NP-complete.

Proof Sketch.
It is straightforward to see that the problem is in NP, because given a pure-strategy profile it takes polynomial time to verify whether that profile is a Nash equilibrium. NP-hardness follows from the fact that any graphical game can be transformed (in polynomial time) into an equivalent AGG-∅ having the same space complexity, and the fact that the problem of determining the existence of a pure equilibrium in graphical games is NP-hard [Daskalakis and Papadimitriou, 2006, Gottlob et al., 2005].

Perhaps more interestingly, the problem remains hard even if we restrict the games to be symmetric, in which case we cannot leverage existing results about graphical games. The following theorem was proved independently by Vincent Conitzer (personal communication) and Daskalakis et al. [2009].

Theorem 4.2.2 (Conitzer [pers. comm., 2004], Daskalakis et al. [2009]). The problem of determining whether a pure Nash equilibrium exists in a symmetric AGG is NP-complete, even when the in-degree of the action graph is at most 3.

4.3 Computing PSNE in AGGs with Bounded Number of Action Nodes

Now we look at classes of AGGs in which |A|, the number of action nodes, is bounded by some constant. We show that in this case, the problem of finding pure equilibria can be solved in polynomial time. While this is a very restricted class of AGGs, we will use these results as building blocks for our dynamic-programming approach for solving more complex AGGs.

We first look at symmetric AGGs. We restate the following well-known property of symmetric games [e.g., Brandt et al., 2009] in the language of AGGs:

Lemma 4.3.1. Suppose Γ is a symmetric AGG. If a and a′ induce the same configuration, then a is a PSNE of Γ iff a′ is a PSNE of Γ.

This is because the configuration determines the utilities, and since in a symmetric AGG any player can choose any action in A, the configuration determines whether the incentive constraints for PSNE are satisfied.
Note that this argument requires the symmetry property; in particular, the lemma no longer holds for asymmetric AGGs. Lemma 4.3.1 allows us to consider only configurations instead of all pure-strategy profiles. We say a configuration c is a PSNE of Γ if its corresponding pure-strategy profiles are PSNE. The following straightforward lemma (a specialization of known facts about symmetric games [e.g., Brandt et al., 2009]) gives the incentive constraints for PSNE in terms of configurations.

Lemma 4.3.2. A configuration c* is a PSNE of a symmetric game iff for all α, α′ ∈ A such that c*[α] > 0,

$$u_{\alpha}(c^*) \ge u_{\alpha'}(c^*_{\alpha \to \alpha'}), \tag{4.3.1}$$

where c*_{α→α′} is the configuration that results when one agent playing α in c* deviates to α′. Formally, for all x ∈ A,

$$c^*_{\alpha \to \alpha'}[x] = \begin{cases} c^*[x] - 1 & \text{if } x = \alpha, \\ c^*[x] + 1 & \text{if } x = \alpha', \\ c^*[x] & \text{otherwise.} \end{cases}$$

Given a configuration c, we can check whether it is a pure equilibrium in polynomial time.

Theorem 4.3.3. The problem of determining whether a pure Nash equilibrium exists in a symmetric AGG with bounded |A| is in P.

Proof. A polynomial-time algorithm is to check all configurations. Since |A| is bounded, the number of configurations, $\binom{n+|A|-1}{|A|-1} = O(n^{|A|-1})$, is polynomial.

This can easily be extended to k-symmetric AGGs.

Definition 4.3.4. Suppose Γ is a k-symmetric AGG in which the players are partitioned into equivalence classes {N_1, ..., N_k} with the corresponding distinct action sets {A^1, ..., A^k}. Then given a pure-strategy profile a, its corresponding k-configuration is a tuple (c_ℓ)_{1≤ℓ≤k}, where c_ℓ is the configuration over A^ℓ induced by the players in N_ℓ. In other words, for all α ∈ A^ℓ, c_ℓ[α] = |{i ∈ N_ℓ | a_i = α}|.

Just as configurations capture all relevant information about pure-strategy profiles in symmetric games, k-configurations capture all relevant information about pure-strategy profiles in k-symmetric games. Thus we can determine the existence of a pure equilibrium by checking all k-configurations.
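For the symmetric case, the enumeration in the proof of Theorem 4.3.3 combined with the test of Lemma 4.3.2 can be sketched in a few lines. This is a sketch under our own assumptions: actions are indexed 0, ..., m−1, a configuration is a tuple of counts, and utilities are supplied as a Python callable `utility(alpha, c)` standing in for the AGG’s u_α evaluated on the neighborhood of α. The example utility is a hypothetical singleton congestion game (self-edges only), chosen only for illustration.

```python
from itertools import combinations
from math import comb


def configurations(n, m):
    """Yield every configuration of n players over m actions (stars and bars)."""
    for bars in combinations(range(n + m - 1), m - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)  # players between consecutive bars
            prev = b
        counts.append(n + m - 2 - prev)  # players after the last bar
        yield tuple(counts)


def deviate(c, alpha, beta):
    """The configuration c_{alpha -> beta}: one player moves from alpha to beta."""
    d = list(c)
    d[alpha] -= 1
    d[beta] += 1
    return tuple(d)


def is_psne(c, utility):
    """Lemma 4.3.2: u_alpha(c) >= u_beta(c_{alpha->beta}) whenever c[alpha] > 0."""
    m = len(c)
    for alpha in range(m):
        if c[alpha] == 0:
            continue
        current = utility(alpha, c)
        for beta in range(m):
            if beta != alpha and utility(beta, deviate(c, alpha, beta)) > current:
                return False
    return True


def all_psne(n, m, utility):
    """Theorem 4.3.3: test all C(n+m-1, m-1) configurations."""
    return [c for c in configurations(n, m) if is_psne(c, utility)]


# Hypothetical example: 4 players, 3 actions, u_alpha(c) = base[alpha] - c[alpha].
base = (5, 3, 1)


def utility(alpha, c):
    return base[alpha] - c[alpha]


print(sum(1 for _ in configurations(4, 3)) == comb(4 + 3 - 1, 3 - 1))  # True
print(all_psne(4, 3, utility))  # [(3, 1, 0)]
```

Both the enumeration and each check are polynomial only because m = |A| is treated as a constant; for unbounded |A| the loop over configurations is exactly what becomes superpolynomial.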
When k is bounded by a constant, there is a polynomial number of k-configurations.

Lemma 4.3.5. The problem of determining whether a pure Nash equilibrium exists in a k-symmetric AGG with bounded |A| and bounded k is in P.

Proof. A polynomial-time algorithm is to check all k-configurations. Since |A| is bounded, for each ℓ ∈ {1, ..., k} the number of distinct c_ℓ is $\binom{|N_\ell|+|A^\ell|-1}{|A^\ell|-1} = O(|N_\ell|^{|A|-1})$. Therefore the number of distinct k-configurations is $O(n^{k(|A|-1)})$, which is polynomial when k is bounded. For each k-configuration, checking whether it forms a Nash equilibrium takes polynomial time. Therefore the algorithm runs in polynomial time.

Now consider the full class of AGGs with bounded |A|. Interestingly, the problem remains easy to solve.

Theorem 4.3.6. The problem of determining whether a pure Nash equilibrium exists in an arbitrary AGG with bounded |A| is in P.

Proof. Any AGG Γ is k-symmetric by definition, where k is the number of distinct action sets. Since A_i ⊆ A for all i, the number of distinct nonempty action sets is at most 2^{|A|} − 1. This is bounded, since |A| is bounded by a constant. Thus Γ is k-symmetric with bounded k, and Lemma 4.3.5 applies.

4.4 Computing PSNE in Symmetric AGGs

We now consider classes of AGGs in which |A| is not bounded. We first focus on symmetric AGG-∅s. Since in this case all players have the same action set A, we can identify a symmetric AGG-∅ by the tuple ⟨n, G = (A, E), u⟩. Whereas enumerating the configurations works well for AGGs with bounded |A|, this approach is less effective in the general case with unbounded |A|: in a symmetric AGG-∅, the number of configurations over A is $\binom{n+|A|-1}{|A|-1}$, which is superpolynomial in ||Γ|| when I is bounded. Our approach is to use dynamic programming to construct PSNE of the game from PSNE of games restricted to parts of the action graph.
This approach belongs to a large family of tree-decomposition-based dynamic-programming algorithms for problems on graphs. In particular, in this section we adapt the standard concepts of partial solutions and characteristics [e.g., Bodlaender, 1997] to the PSNE problem in AGGs.

4.4.1 Restricted Games and Partial Solutions

We first introduce the concept of a restricted game on R ⊂ A, which intuitively is the game played by a subset of players when we “restrict” them to the subgraph G_R, i.e., require them to choose their actions from R. Of course, the utility functions of this restricted game are not defined until we specify a configuration on ν(R).

Definition 4.4.1. Given a symmetric AGG-∅ Γ, a set of actions R ⊂ A, a configuration c[ν(R)] and n′ ≤ n, we define the restricted game Γ(n′, R, c[ν(R)]) to be a symmetric AGG with n′ players and with G_R as the action graph. Each action α ∈ R has the utility function u_α|_{c[ν(R)]}, which is the same as u_α as defined in Γ except that the configuration of nodes outside R is assigned by c[ν(R)]. Formally, Γ(n′, R, c[ν(R)]) = ⟨n′, G_R, (u_α|_{c[ν(R)]})_{α∈R}⟩.

Figure 4.1: The road game with m = 8 and the action graph of its AGG representation (action nodes T1–T8 and B1–B8).

Figure 4.2: Restricted game on the rightmost 6 actions.

Example 4.4.2. Suppose each of n agents is interested in opening a business, and can choose to locate in any block along either side of a road of length m. Multiple agents can choose the same block. Agent i’s payoff depends on the number of agents who chose the same block as he did, as well as the numbers of agents who chose each of the adjacent blocks of land. This game can be compactly represented as a symmetric AGG, whose action graph is illustrated in Figure 4.1.
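To make the neighborhood operators of Section 4.2.1 concrete, the sketch below builds one possible action graph for the road game and computes ν and τ. The edge set is our assumption, since the text only says that a block’s utility depends on its own block and the adjacent blocks: we let T_i depend on T_{i−1}, T_{i+1} and the block B_i across the road, and symmetrically for B_i. An edge (a, b) means that the count on a affects the utility of b. All function names are hypothetical.

```python
def road_game(m):
    """Nodes and directed edges of an assumed road-game action graph.

    Edge (a, b) means the number of players on a affects the utility of b.
    Each block depends on itself, its left/right neighbors on the same side,
    and the block directly across the road (an assumption for illustration).
    """
    nodes = {f"{side}{i}" for side in "TB" for i in range(1, m + 1)}
    edges = set()
    for side, other in (("T", "B"), ("B", "T")):
        for i in range(1, m + 1):
            v = f"{side}{i}"
            edges.add((v, v))              # self-edge
            edges.add((f"{other}{i}", v))  # block across the road
            for j in (i - 1, i + 1):       # neighbors on the same side
                if 1 <= j <= m:
                    edges.add((f"{side}{j}", v))
    return nodes, edges


def nu(X, edges):
    """nu(X): actions outside X with an edge into some action in X."""
    return {a for (a, b) in edges if b in X and a not in X}


def tau(X, edges):
    """tau(X): actions in X with an edge to or from some action outside X."""
    return {v for (a, b) in edges for v in (a, b)
            if v in X and ((a in X) != (b in X))}


nodes, edges = road_game(8)
R = {"T6", "T7", "T8", "B6", "B7", "B8"}
print(sorted(nu(R, edges)))          # ['B5', 'T5']
print(sorted(nu(nodes - R, edges)))  # nu of the complement: ['B6', 'T6']
print(sorted(tau(R, edges)))         # ['B6', 'T6']
```

Under this edge set, ν(R) = {T5, B5}, which agrees with the restricted game discussed for Figure 4.2; a graph with diagonal adjacencies across the road would give the same ν(R).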
To specify a restricted game on the rightmost 6 action nodes R = {T6, T7, T8, B6, B7, B8} of the road game of Figure 4.1, we need to specify the number of players on R as well as the configuration over ν(R) = {T5, B5}. This is illustrated in Figure 4.2, with R enclosed by the shaded rectangle and ν(R) in green.

Lemma 4.3.1 tells us that we only need to consider configurations instead of strategy profiles. Likewise, for a restricted game on the subgraph X ⊂ A, we only need to consider restricted configurations c[X]. The following lemma is straightforward.

Lemma 4.4.3. If c* is a pure equilibrium of Γ, then c*[X] is a pure equilibrium of the restricted game Γ(#c*[X], X, c*[ν(X)]).

We want to use equilibria of restricted games as building blocks to construct equilibria of the entire game. Of course, a restricted game on X ⊂ A is not well-defined until we specify c[ν(X)]. Thus we define a partial solution as a configuration on X ∪ ν(X) that describes a restricted game on X as well as a pure equilibrium of it.

Definition 4.4.4. A partial solution on X ⊆ A is a configuration c[X ∪ ν(X)] such that c[X] is a pure equilibrium of the restricted game Γ(#c[X], X, c[ν(X)]).

Figure 4.3: A partial solution on the rightmost 6 actions describes the configuration over these 8 actions.

For the restricted game in Figure 4.2, the corresponding partial solution on R = {T6, T7, T8, B6, B7, B8} is a configuration over R ∪ ν(R), illustrated in Figure 4.3 as green nodes. We say a partial solution c[X ∪ ν(X)] can be extended if there exists a configuration c* such that c* is a PSNE of Γ and c*[X ∪ ν(X)] = c[X ∪ ν(X)].

4.4.2 Combining Partial Solutions

In order to combine partial solutions to form a partial solution on a larger subgraph, we need to make sure that the result is a valid restricted strategy profile.
We say two partial solutions c′[X] and c′′[Y] are consistent if there exists a configuration c of the AGG-∅ such that c[X] = c′[X] and c[Y] = c′′[Y]. The following lemma shows that it is simple to check whether c[X] and c′[Y] are consistent.

Lemma 4.4.5. Given X, Y ⊆ A, c[X] is consistent with c′[Y] iff
1. for all α ∈ X ∩ Y, c[α] = c′[α], and
2. n′ ≤ n, where n′ = #c[X] + #c′[Y \ X]; furthermore, if X ∪ Y = A then n′ = n.

We omit the straightforward proof. For two configurations c[X], c′[Y] that are consistent with each other, we define c[X] ∪ c′[Y] to be the (unique) configuration on X ∪ Y that is consistent with both c[X] and c′[Y].

However, if we simply combine two consistent partial solutions that describe equilibria of restricted games on two disjoint sets X, Y ⊆ A, the result would not necessarily induce an equilibrium of the restricted game on X ∪ Y. This is because an agent who was playing an action in X might profitably deviate by playing an action in Y, and vice versa. We could deal with this problem by keeping track of all pure equilibria of each restricted game, and determining case by case whether two equilibria can be combined (by checking whether agents could profitably deviate from one restricted game to the other). But as we combine the restricted games to form larger restricted games and eventually the unrestricted game on the entire action graph G, the number of equilibria we would have to store could grow exponentially.

4.4.3 Dynamic Programming via Characteristics

Perhaps we don’t need to keep track of all partial solutions. Imagine we had a function ch that summarized them, i.e., it mapped each partial solution to a characteristic from a finite set C that is smaller than the set of partial solutions. For this characteristic function to be useful, it needs to be equilibrium-preserving, defined as follows.

Definition 4.4.6.
For X ⊂ A, a function ch() that maps partial solutions to their characteristics is equilibrium-preserving if for all pairs of partial solutions c[X] and c′[X], if ch(c[X]) = ch(c′[X]) then (c[X] can be extended) ⇔ (c′[X] can be extended).

Thus an equilibrium-preserving characteristic function ch() induces a partition of the set of partial solutions into equivalence classes. All partial solutions with the same characteristic behave the same way, so we only need to consider the set of all distinct characteristics. For X ⊂ A, we define C_X ⊂ C to be the set of characteristics of partial solutions on X. Formally, C_X = {ch(c[X ∪ ν(X)]) | c[X ∪ ν(X)] is a partial solution on X}.

Given such a function ch, a dynamic-programming algorithm for determining the existence of PSNE of Γ has the following high-level structure:
1. Construct 𝒳 = {X_1, ..., X_m} such that ⋃_{1≤j≤m} X_j = A.
2. For each X_i ∈ 𝒳, compute C_{X_i}, the set of characteristics of partial solutions on X_i.
3. While |𝒳| ≥ 2:
   (a) Take X, Y ∈ 𝒳. Remove them from 𝒳.
   (b) Compute C_{X∪Y} from C_X and C_Y.
   (c) Add X ∪ Y to 𝒳.
4. Now 𝒳 has only one member, A. Return TRUE iff C_A is not empty.

Since a partial solution on A is by definition a pure equilibrium of Γ, there exists a pure equilibrium of Γ if and only if C_A is not empty. For this algorithm to run in polynomial time, the function ch() must satisfy the following properties:

Property 1: At all times during the algorithm, for all X ∈ 𝒳, the size of C_X is polynomial. This property is needed because all restricted strategy profiles could potentially be partial solutions, in which case C_X would be the set of all possible characteristics for X; hence the set of possible characteristics must itself be of polynomial size.
Property 2: For each of the initial X_j, C_{X_j} can be computed in polynomial time.
Property 3: C_{X∪Y} can be computed from C_X and C_Y in polynomial time.

One algorithm having the above structure is Ieong et al. [2005]’s algorithm for computing PSNE in singleton congestion games (corresponding to symmetric AGG-∅s with only self-edges). Given such an AGG-∅, the algorithm starts by partitioning A into sets each containing one action, and combines them in an arbitrary order. Consider two restricted games Γ′ and Γ′′ on two disjoint sets of action nodes X and Y, respectively. Observe that in this case, to check consistency between two equilibria of Γ′ and Γ′′, it is sufficient to check the numbers of players in Γ′ and Γ′′. Given a restricted game Γ′ on X ⊂ A and an equilibrium c* of Γ′, define the worst current utility WCU(c*, Γ′) to be the utility of the worst-off player in Γ′, or ∞ if Γ′ has 0 players. Define the best entrance utility BEU(c*, Γ′) to be the best payoff a player currently playing an action outside of X can get by playing an action in X, assuming the current players in Γ′ play c*. If Γ′ already has all n players, BEU(c*, Γ′) = −∞. Since all players in a symmetric game are identical, if any player can profitably deviate out of Γ′, then the worst-off player (with utility WCU(c*, Γ′)) can profitably deviate out of Γ′; similarly, if an agent can profitably deviate to any action in Γ′, then she can achieve utility BEU(c*, Γ′). Therefore, to check whether agents could profitably deviate from Γ′ currently in equilibrium c′ to Γ′′ in equilibrium c′′, we just need to check whether WCU(c′, Γ′) is less than BEU(c′′, Γ′′). Thus WCU(c′, Γ′) and BEU(c′, Γ′) can be used as sufficient statistics for checking the existence of profitable deviations out of and into the restricted game Γ′, and #c′[X] for checking consistency. The resulting characteristics are equilibrium-preserving, and require less space than keeping track of the partial solutions on X, because WCU and BEU are utility values and thus there are at most ||Γ||² possible pairs. We adapt Ieong et al. [2005]’s characteristic function to general symmetric AGGs.
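The characteristic-based dynamic program described above for singleton congestion games can be sketched as follows. The sketch is ours, not Ieong et al.’s code, and rests on assumptions of ours: each action α is given as a list u_α[0..n] with u_α[k] the payoff of each of k players on α; characteristics are triples (#players, WCU, BEU); single-action sets are merged left to right (any order works); and the function names are hypothetical. Two characteristics are combined only if the player counts are consistent and neither side’s worst-off player gains by entering the other side.

```python
import math
from functools import reduce

INF = math.inf


def single_action_characteristics(u_alpha, n):
    """Characteristics (players, WCU, BEU) of the restricted games on one action.

    With a single action every player count k gives exactly one configuration,
    which is trivially an equilibrium of the restricted game.
    """
    chars = set()
    for k in range(n + 1):
        wcu = u_alpha[k] if k > 0 else INF       # worst current utility
        beu = u_alpha[k + 1] if k < n else -INF  # best entrance utility
        chars.add((k, wcu, beu))
    return chars


def combine(cx, cy, n):
    """Characteristics on X u Y from those on two disjoint sets X and Y."""
    out = set()
    for kx, wx, bx in cx:
        for ky, wy, by in cy:
            k = kx + ky
            if k > n:
                continue  # inconsistent: more than n players
            if wx < by or wy < bx:
                continue  # a worst-off player gains by switching sides
            out.add((k, min(wx, wy), max(bx, by) if k < n else -INF))
    return out


def psne_exists(utilities, n):
    """utilities[alpha][k] = payoff of each of k players choosing alpha."""
    per_action = [single_action_characteristics(u, n) for u in utilities]
    final = reduce(lambda cx, cy: combine(cx, cy, n), per_action)
    return any(k == n for k, _, _ in final)


n = 4
utilities = [[b - k for k in range(n + 1)] for b in (5, 3, 1)]
print(psne_exists(utilities, n))  # True
```

Each characteristic set has at most (n + 1)(|U| + 1)² elements, counting the ±∞ sentinels, which is the kind of bound Property 1 asks for.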
In general symmetric AGGs, we first of all need c[ν(X)] in order to specify restricted games and partial solutions on X. As a result, to check consistency between a partial solution on X and partial solutions on other parts of the graph, we need to keep track of the number of players in X, the configuration over ν(X), and the configuration over ν(X̄). Furthermore, in general action graphs, we may have sets X, Y ⊂ A such that ν(X) ∩ Y ≠ ∅. In such cases, deviating from an action in ν(X) ∩ Y to a restricted game Γ′ on X changes the configuration on ν(X), which in turn affects the utility functions of Γ′. In other words, the best utility a player originally playing an action α ∈ X̄ can get by deviating into Γ′ on X with current configuration c* is a quantity that depends on (1) whether α is in ν(X) and (2) if so, α itself. As a result, simply using BEU(c*, Γ′) and WCU(c*, Γ′) is no longer sufficient for checking profitable deviations. We thus need more sophisticated sufficient statistics for checking deviations in this case.

One approach is to extend our definition of BEU(c*, Γ′) by making it vector-valued, specifying the best utilities when the deviating player is an outside player and when the player is playing each of the actions in ν(X). The length of the resulting vector is thus |ν(X)| + 1. Furthermore, we could extend WCU(c*, Γ′) by making it a vector consisting of the worst utility from X \ ν(X̄) and from each of the actions in ν(X̄). Although this approach is intuitive, it turns out to yield a polynomial-time algorithm only in the case of symmetric AGG-∅s with bounded treewidth and bounded in-degree. Instead, in this chapter we describe a different approach that yields a polynomial-time algorithm for bounded-treewidth symmetric AGG-∅s, thus eliminating the separate requirement on in-degree. First, we redefine BEU(c*, Γ′) in terms of deviations from players outside of X ∪ ν(X).

Definition 4.4.7.
Given a restricted game $\Gamma'$ on $X \subset \mathcal{A}$ and an equilibrium $c^*$ of $\Gamma'$, the best entrance utility $\mathrm{BEU}(c^*, \Gamma')$ is the best payoff an outside player (a player currently playing an action outside of $X \cup \nu(X)$) can get by playing an action in $X$, assuming the current players in $\Gamma'$ play $c^*$. If there are 0 outside players, $\mathrm{BEU}(c^*, \Gamma') = -\infty$.

In order to check deviations into and out of $X$, we partition $X$ into $P$ and $X \setminus P$, and check the corresponding restricted games separately. We will specify $P$ in Section 4.4.4; for now we only require that $X \supseteq P \supseteq \tau(X)$. Recall that $\tau(X)$ is the set of nodes in $X$ with outgoing edges to and/or incoming edges from nodes outside $X$. Intuitively, $P$ contains all nodes in $X$ to which we cannot apply BEU and WCU. This implies $\nu(X \setminus P) \cap \overline{X} = \emptyset$ and $\nu(\overline{X}) \cap (X \setminus P) = \emptyset$. Thus we can use WCU and BEU for restricted games on $X \setminus P$ as sufficient statistics for checking deviations between $X \setminus P$ and nodes outside $X$. The remaining task is to check deviations between $P$ and nodes outside $X$. We do this by explicitly keeping track of configurations on $Q \supseteq P \cup \nu(P)$; we will specify $Q$ exactly in Section 4.4.4. In other words, we keep track of the partial solutions on $P$. Note in particular that this provides enough information to specify the corresponding restricted games on $P$. Finally, since configurations over $X \setminus P$ will not be referred to by partial solutions on any $Y \subset \mathcal{A}$ that is disjoint from $X$, in order to maintain consistency it is sufficient to keep track of the number of players playing in $X$ and the configuration over $P \cup \nu(X)$, which is a subset of $Q$. Taking these together, we have the following characteristic function.

Lemma 4.4.8. Given $X \subset \mathcal{A}$, $P \subseteq X$ such that $P \supseteq \tau(X)$, and $Q \supseteq P \cup \nu(P)$, consider the characteristic function $\mathrm{ch}_{P,Q}$ that maps a partial solution $c[X \cup \nu(X)]$ to $\mathrm{ch}_{P,Q}(c[X \cup \nu(X)]) = (c[Q], \#c[X], \mathrm{WCU}(c[X'], \Gamma'), \mathrm{BEU}(c[X'], \Gamma'))$, where $\Gamma' = \Gamma(\#c[X'], X', c[\nu(X')])$ and $X' = X \setminus P$.
Then $\mathrm{ch}_{P,Q}$ is equilibrium-preserving.

Proof. Suppose we have two partial solutions $c[X \cup \nu(X)]$ and $c'[X \cup \nu(X)]$ such that $\mathrm{ch}_{P,Q}(c[X \cup \nu(X)]) = \mathrm{ch}_{P,Q}(c'[X \cup \nu(X)])$. Furthermore, suppose $c[X \cup \nu(X)]$ can be extended, i.e., there exists a PSNE $c^*$ of the game such that $c^*[X \cup \nu(X)] = c[X \cup \nu(X)]$. We need to show that $c'[X \cup \nu(X)]$ can be extended. Since $c^*[\overline{X} \cup \nu(\overline{X})]$ and $c[X \cup \nu(X)]$ are consistent, and since $c[X \cup \nu(X)]$ and $c'[X \cup \nu(X)]$ have the same characteristic (in particular, the same configuration on $\nu(\overline{X}) \cup \nu(X)$ and the same number of players in $X$), $c^*[\overline{X} \cup \nu(\overline{X})]$ and $c'[X \cup \nu(X)]$ are consistent. Consider the configuration $c'^* \equiv c^*[\overline{X} \cup \nu(\overline{X})] \cup c'[X \cup \nu(X)]$. We claim that $c'^*$ is a PSNE of the game (which directly implies that $c'[X \cup \nu(X)]$ can be extended). To show this, observe that since $c^*[\overline{X} \cup \nu(\overline{X})]$ and $c'[X \cup \nu(X)]$ are already partial solutions on $\overline{X}$ and $X$ respectively (and are consistent with each other), we only need to make sure there are no profitable deviations between them. We partition $X$ into $P$ and $X' = X \setminus P$. Since there were no profitable deviations between the partial solutions $c[P \cup \nu(P)]$ and $c^*[\overline{X} \cup \nu(\overline{X})]$, and since $c[P \cup \nu(P)] = c'[P \cup \nu(P)]$, there are no profitable deviations between the partial solutions $c'[P \cup \nu(P)]$ and $c^*[\overline{X} \cup \nu(\overline{X})]$. Suppose there is a profitable deviation from $X'$ under partial solution $c'[X' \cup \nu(X')]$ to $\overline{X}$ under partial solution $c^*[\overline{X} \cup \nu(\overline{X})]$. Then the worst-off player in $X'$ under $c'[X' \cup \nu(X')]$ also has a profitable deviation. Since her utility is equal to that of the worst-off player in $X'$ under $c[X' \cup \nu(X')]$, there must be a profitable deviation from the partial solution $c[X' \cup \nu(X')]$ to $c^*[\overline{X} \cup \nu(\overline{X})]$, a contradiction. A similar argument shows that there is no profitable deviation from $\overline{X}$ under $c^*[\overline{X} \cup \nu(\overline{X})]$ to $X'$ under $c'[X' \cup \nu(X')]$.

We denote by $C_X^{P,Q}$ the set of characteristics on $X$ under the characteristic function $\mathrm{ch}_{P,Q}$.
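A minimal sketch of $\mathrm{ch}_{P,Q}$ from Lemma 4.4.8 follows, assuming configurations are dicts from action names to counts, that each $\nu(\alpha)$ lists $\alpha$ itself when $\alpha$ has a self-edge, and that each utility is a function of $c[\nu(\alpha)]$. All names are hypothetical; the example graph (a chain A, B, C, D with $X = \{A, B, C\}$, $P = \tau(X) = \{C\}$ and $Q = \nu(P) \cup P = \{B, C\}$) is made up for illustration, and returning $-\infty$ when $X \setminus P$ is empty is our own convention.

```python
import math
from typing import Callable, Dict, Iterable, Set, Tuple

Config = Dict[str, int]                              # action node -> number of players
Neighbors = Dict[str, Set[str]]                      # alpha -> nu(alpha), incl. alpha if self-edge
LocalUtility = Dict[str, Callable[[Config], float]]  # alpha -> u_alpha(c[nu(alpha)])


def restrict(c: Config, nodes: Iterable[str]) -> Config:
    """c[S]: the configuration restricted to node set S (missing nodes count as 0)."""
    return {v: c.get(v, 0) for v in nodes}


def wcu(c: Config, X1: Set[str], nu: Neighbors, u: LocalUtility) -> float:
    payoffs = [u[a](restrict(c, nu[a])) for a in X1 if c.get(a, 0) > 0]
    return min(payoffs) if payoffs else math.inf


def beu(c: Config, X1: Set[str], nu: Neighbors, u: LocalUtility, n: int) -> float:
    # Outside players are those playing outside X1 ∪ nu(X1) (Definition 4.4.7).
    inside = set(X1).union(*(nu[a] for a in X1))
    outside_players = n - sum(c.get(v, 0) for v in inside)
    if outside_players == 0 or not X1:  # empty X1 -> -inf is our own convention
        return -math.inf
    best = -math.inf
    for a in X1:
        entered = dict(c)
        entered[a] = entered.get(a, 0) + 1  # one outside player moves onto a
        best = max(best, u[a](restrict(entered, nu[a])))
    return best


def characteristic(c: Config, X: Set[str], P: Set[str], Q: Set[str],
                   nu: Neighbors, u: LocalUtility, n: int) -> Tuple:
    """(c[Q], #c[X], WCU, BEU), with WCU/BEU taken on the restricted game over X minus P."""
    X1 = set(X) - set(P)
    cQ = tuple(sorted(restrict(c, Q).items()))  # hashable encoding of c[Q]
    return (cQ, sum(c.get(v, 0) for v in X), wcu(c, X1, nu, u), beu(c, X1, nu, u, n))


# Hypothetical chain A -> B -> C -> D (plus self-edges), n = 3 players.
nu = {"A": {"A"}, "B": {"A", "B"}, "C": {"B", "C"}, "D": {"C", "D"}}
u = {"A": lambda loc: 5 - loc["A"], "B": lambda loc: 6 - loc["A"] - loc["B"],
     "C": lambda loc: 4 - loc["C"], "D": lambda loc: 3 - loc["C"] - loc["D"]}
c = {"A": 1, "B": 1, "C": 1, "D": 0}
print(characteristic(c, {"A", "B", "C"}, {"C"}, {"B", "C"}, nu, u, n=3))
# ((('B', 1), ('C', 1)), 3, 4, 3)
```

Two partial solutions that map to the same tuple are interchangeable as far as extension to a full PSNE is concerned, which is what lets the characteristic sets stay small.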
For the restricted game in Example 4.4.2, we can use $P = \{T6, B6\}$ and $Q = P \cup \nu(P) = \{T5, T6, T7, B5, B6, B7\}$. These are illustrated in Figure 4.4. The following lemma shows how sets of characteristics from two subsets $X'$ and $X''$ of $\mathcal{A}$ (with characteristic functions $\mathrm{ch}_{P',Q'}$ and $\mathrm{ch}_{P'',Q''}$ respectively) can be combined. Here we require that $X'$ and $X''$ have a limited amount of overlap; specifically, we require that $X' \cap X'' \subseteq P' \cup P''$. Intuitively, the combination of subsets with such overlap is manageable because (1) we can calculate the total number of players in $X' \cup X''$ from the characteristics, since we know the configuration of (and thus the number of players in) $X' \cap X''$; and (2) since the configuration of $X' \cap X''$ is already "in equilibrium" with both sides, it is sufficient to check deviations from $X'' \setminus X'$ to $X' \setminus X''$ and vice versa. We do this by partitioning the former into $X'' \setminus P''$ and $P'' \setminus X'$, and the latter into $X' \setminus P'$ and $P' \setminus X''$, then checking the resulting set of deviations using information provided by the characteristics.

[Figure 4.4: Characteristic function $\mathrm{ch}_{P,Q}$ for the rightmost 6 actions of the road game (nodes T1–T8 and B1–B8), with $P = \{T6, B6\}$ and $Q = \{T5, T6, T7, B5, B6, B7\}$.]

Lemma 4.4.9. Suppose that $X, P, Q, X', P', Q', X'', P'', Q''$ are subsets of $\mathcal{A}$ such that $\tau(X) \subseteq P \subseteq X$, $\tau(X') \subseteq P' \subseteq X'$, $\tau(X'') \subseteq P'' \subseteq X''$, $Q \supseteq P \cup \nu(P)$, $Q' \supseteq P' \cup \nu(P')$, $Q'' \supseteq P'' \cup \nu(P'')$, $X' \cap X'' \subseteq P' \cup P''$, and $X' \cup X'' = X$. For all $c[Q] \in \mathcal{C}[Q]$, integers $B \le n$, and $U_c, U_e \in \mathcal{U}$, the tuple $(c[Q], B, U_c, U_e) \in C_X^{P,Q}$ if and only if there exist $c'[Q']$, $c''[Q'']$, $B'$, $B''$, $U'_c$, $U''_c$, $U'_e$ and $U''_e$ such that
1. $(c'[Q'], B', U'_c, U'_e) \in C_{X'}^{P',Q'}$,
2. $(c''[Q''], B'', U''_c, U''_e) \in C_{X''}^{P'',Q''}$,
3. $c'[Q']$ is consistent with $c''[Q'']$,
4. $c[Q] = c'''[Q]$ where $c''' = c'[Q'] \cup c''[Q'']$,
5. $B = B' + B'' - \#c'''[X' \cap X'']$, and if $X = \mathcal{A}$ then $B = n$,
6.
$U'_c \ge U''_e$ and $U''_c \ge U'_e$,
7. $U'_c \ge \mathrm{BEU}(c''[P'' \setminus X'], \Gamma'')$, $\mathrm{WCU}(c'[P' \setminus X''], \Gamma') \ge U''_e$, $U''_c \ge \mathrm{BEU}(c'[P' \setminus X''], \Gamma')$, and $\mathrm{WCU}(c''[P'' \setminus X'], \Gamma'') \ge U'_e$, where $\Gamma' = \Gamma(\#c'[P' \setminus X''], P' \setminus X'', c'[\nu(P' \setminus X'')])$ and $\Gamma'' = \Gamma(\#c''[P'' \setminus X'], P'' \setminus X', c''[\nu(P'' \setminus X')])$,
8. $c[P' \cup P'']$ is an equilibrium of $\Gamma(\#c[P' \cup P''], P' \cup P'', c'''[\nu(P' \cup P'')])$,
9. $U_c = \min\{U'_c, U''_c, \mathrm{WCU}(c'''[Z], \Gamma_Z)\}$ and $U_e = \max\{U'_e, U''_e, \mathrm{BEU}(c'''[Z], \Gamma_Z)\}$, where $Z = (P' \cup P'') \setminus P$ and $\Gamma_Z = \Gamma(\#c'''[Z], Z, c'''[\nu(Z)])$.

Proof Sketch. ⇒ ("only if") part: Suppose $c[X \cup \nu(X)]$ is a partial solution on $X$ with characteristic $(c[Q], B, U_c, U_e)$. Let $c'[X' \cup \nu(X')] = c[X' \cup \nu(X')]$. It is straightforward to see that $c'[X']$ is an equilibrium of the restricted game $\Gamma(\#c'[X'], X', c[\nu(X')])$. Therefore $c'[X' \cup \nu(X')]$ is a partial solution on $X'$. Similarly, let $c''[X'' \cup \nu(X'')] = c[X'' \cup \nu(X'')]$; the same argument applies. It is then straightforward to verify that the characteristics of $c'[X' \cup \nu(X')]$ and $c''[X'' \cup \nu(X'')]$ satisfy the above conditions.

⇐ ("if") part: Suppose $c'[X' \cup \nu(X')]$ and $c''[X'' \cup \nu(X'')]$ are partial solutions with characteristics $(c'[Q'], B', U'_c, U'_e)$ and $(c''[Q''], B'', U''_c, U''_e)$ respectively, and there exist $c[Q], B, U_c, U_e$ such that conditions 3 to 9 are satisfied. Then conditions 3 and 5, together with Lemma 4.4.5, imply that $c'[X' \cup \nu(X')]$ and $c''[X'' \cup \nu(X'')]$ are consistent. Let $c = c'[X' \cup \nu(X')] \cup c''[X'' \cup \nu(X'')]$. By an argument similar to the proof of Lemma 4.4.8, conditions 6 to 8 ensure that there are no profitable deviations between the partial solutions $c'[X' \cup \nu(X')]$ and $c''[X'' \cup \nu(X'')]$, and therefore $c[X]$ is an equilibrium of the restricted game $\Gamma(B, X, c[\nu(X)])$. Let $Y = X \setminus P$. Then $X' \setminus P'$, $X'' \setminus P''$ and $Z$ partition $Y$.
By the definition of worst current utility, $\mathrm{WCU}(c[Y], \Gamma(\#c[Y], Y, c[\nu(Y)]))$ is the minimum of $\{U'_c, U''_c, \mathrm{WCU}(c'''[Z], \Gamma_Z)\}$, which are the worst current utilities on $X' \setminus P'$, $X'' \setminus P''$ and $Z$ respectively. Therefore $\mathrm{WCU}(c[Y], \Gamma(\#c[Y], Y, c[\nu(Y)])) = U_c$. Similarly, $\mathrm{BEU}(c[Y], \Gamma(\#c[Y], Y, c[\nu(Y)])) = U_e$. Therefore $c[X \cup \nu(X)]$ is a partial solution with characteristic $(c[Q], B, U_c, U_e)$.

Lemma 4.4.9 implies that it takes polynomial time to check whether two characteristics $(c'[Q'], B', U'_c, U'_e) \in C_{X'}^{P',Q'}$ and $(c''[Q''], B'', U''_c, U''_e) \in C_{X''}^{P'',Q''}$ are consistent and free of profitable deviations between them, and if so to construct a characteristic in $C_X^{P,Q}$ for their combined partial solutions. Thus if we iterate over all pairs of characteristics in $C_{X'}^{P',Q'}$ and $C_{X''}^{P'',Q''}$, we can construct $C_X^{P,Q}$ in time polynomial in the sizes of $C_{X'}^{P',Q'}$ and $C_{X''}^{P'',Q''}$.

Let us now consider the size of $C_X^{P,Q}$ for an arbitrary $X \subseteq \mathcal{A}$. Recall that WCU and BEU are utility values, and thus each has at most $|\mathcal{U}| \le \|\Gamma\|$ distinct values. Also $\#c[X] \in \{0, \ldots, n\}$ by definition. So the number of distinct characteristics can be much smaller than the number of corresponding partial solutions $c[X \cup \nu(X)]$ when $|Q| \ll |X \cup \nu(X)|$.

[Figure 4.5: An action graph $G$.]
[Figure 4.6: The primal graph $G'$.]
[Figure 4.7: Tree decomposition of $\mathrm{und}(G)$, with bags $R_1 = \{A,B\}$, $R_2 = \{B,C\}$, $R_3 = \{C,D\}$, $R_4 = \{C,F\}$, $R_5 = \{D,E\}$, $R_6 = \{F,G\}$.]
[Figure 4.8: Tree decomposition of the primal graph $G'$, satisfying the conditions of Lemma 4.4.11, with bags $X_1 = \{A,B,C\}$, $X_2 = \{A,B,C,D,F\}$, $X_3 = \{B,C,D,E,F\}$, $X_4 = \{B,C,D,F,G\}$, $X_5 = \{C,D,E\}$, $X_6 = \{C,F,G\}$.]

However, since $Q \supseteq \nu(X)$ and $|\nu(X)|$ is $|X|\mathcal{I}$
in the worst case, the number of possible configurations over $Q$ is superpolynomial in $\|\Gamma\|$. Since $C_X^{P,Q}$ could potentially include every distinct tuple $(c[Q], B, U_c, U_e)$, the size of $C_X^{P,Q}$ is superpolynomial in the worst case. Indeed, Theorem 4.2.2 showed that we will not find a polynomial-time algorithm for general symmetric AGGs unless P = NP. Nevertheless, we next show that if the action graph $G$ has bounded treewidth, we can combine the restricted games in such a way that the number of configurations $|\mathcal{C}[Q]|$ (and thus $|C_X^{P,Q}|$) remains polynomial in $\|\Gamma\|$ as $X$ grows.

4.4.4 Algorithm for Symmetric AGGs with Bounded Treewidth

We first introduce some notation. Given an action graph $G = (\mathcal{A}, E)$, define $\mathcal{H}(G)$ to be the hypergraph $(\mathcal{A}, \mathcal{E})$ with $\mathcal{E} = \{\{\alpha\} \cup \nu(\alpha) \mid \alpha \in \mathcal{A}\}$. In other words, for each action $\alpha \in \mathcal{A}$ there is a hyperedge containing $\alpha$ and its neighbors; duplicate hyperedges are removed. Let $G'$ be the primal graph of the hypergraph $\mathcal{H}(G)$: an undirected graph on the same set of vertices, with an edge between two nodes if they are in some hyperedge of $\mathcal{H}(G)$, i.e., $G' = (\mathcal{A}, \{\{u, v\} \mid \exists h \in \mathcal{E} \text{ such that } u, v \in h\})$. Thus for each $\alpha \in \mathcal{A}$, $\alpha$ and its neighbors in $G$ form a clique in $G'$. In the Bayes net literature $G'$ is also known as the moral graph of $G$. For example, Figure 4.5 shows the action graph $G$ of a symmetric AGG. Its hypergraph $\mathcal{H}(G)$ has the same set of vertices and the hyperedges $\{A,B\}$, $\{A,B,C\}$, $\{D,E\}$, $\{C,D,E\}$, $\{F,G\}$, $\{C,F,G\}$, and $\{B,C,D,F\}$. Figure 4.6 shows $G$'s primal graph $G'$.

The concepts of tree decomposition and treewidth were introduced by Robertson and Seymour [1986].

Definition 4.4.10. A tree decomposition of an undirected graph $G' = (V, E)$ is a pair $(\mathcal{X}, T)$ with $T = (I, F)$ a tree (where $I$ and $F$ are the nodes and edges of the tree respectively), and $\mathcal{X} = \{X_i \mid i \in I\}$ a family of subsets of $V$, one for each node of $T$, such that
1. $\bigcup_{i \in I} X_i = V$,
2. for all edges $\{v, w\} \in E$ there exists an $i \in I$ with $v \in X_i$ and $w \in X_i$, and
3.
for all $i, j, k \in I$: if $j$ is on the path from $i$ to $k$ in $T$, then $X_i \cap X_k \subseteq X_j$.

The width of a tree decomposition is $\max_{i \in I} |X_i| - 1$. The treewidth $\mathrm{tw}(G')$ of a graph $G'$ is the minimum width over all tree decompositions of $G'$. Condition 3 of the definition can be equivalently stated as follows: for all $v \in V$, the set $\{i \in I \mid v \in X_i\}$ induces a subtree of $T$.

Let the treewidth $\mathrm{tw}(\Gamma)$ of an AGG $\Gamma$ be the treewidth of $\mathrm{und}(G)$, the undirected version of its action graph $G$ (excluding self-edges). Figure 4.7 shows a tree decomposition $(\{R_i \mid i \in I\}, T = (I, F))$ of the undirected version of the action graph $G$ in Figure 4.5. In this case $\mathrm{und}(G)$ is a tree. The width of the tree decomposition is 1, since each tree node contains at most 2 vertices of $\mathrm{und}(G)$. This is a tree decomposition of minimum width, since any tree decomposition must have nodes containing, e.g., both $A$ and $B$, because $\{A, B\}$ is an edge in $\mathrm{und}(G)$. In fact, it is known in general that the treewidth of a tree with at least one edge is 1.

A tree decomposition of $\mathrm{und}(G)$ provides a family of subsets ($R_1, \ldots, R_6$ in Figure 4.7) of vertices that cover $\mathcal{A}$, and if the width of the decomposition is bounded by a constant, then the sizes of the $R_i$ are bounded. We will use the $R_i$ as the $P$'s in Lemmas 4.4.8 and 4.4.9. However, we also need to control the size of $Q \supseteq P \cup \nu(P)$ in those lemmas in order to control the running time of the resulting dynamic programming algorithm. It turns out that a tree decomposition of the primal graph can be constructed that yields the appropriate $Q$'s for Lemmas 4.4.8 and 4.4.9. Given a tree $T = (I, F)$ and $J \subset I$, let $T_J$ be the subgraph of $T$ restricted to $J$.

Lemma 4.4.11. Given a symmetric AGG-∅ $\Gamma$ with treewidth $w$, there exists a tree decomposition $(\{X_i \mid i \in I\}, T = (I, F))$ of the primal graph $G'$ of width at most $(w + 1)(\mathcal{I} + 1) - 1$, and $\{R_i \mid i \in I\}$ such that
1. $\bigcup_{i \in I} R_i = \mathcal{A}$, and $R_i \cup \nu(R_i) \subseteq X_i$ for all $i \in I$,
2.
Let $J \subset I$ be such that $T_J$ is a connected graph and connects to the rest of the tree via only one edge $\{j, j'\} \in F$ with $j \in J$. Let $Y_J = \bigcup_{i \in J} R_i$. Then $\tau(Y_J) \subseteq R_j$.

Proof. By assumption there exists a tree decomposition of $\mathrm{und}(G)$ of width $w$. Denote this decomposition $(\{R_i \mid i \in I\}, T = (I, F))$. Then $\bigcup_{i \in I} R_i = \mathcal{A}$. Let $X_i = R_i \cup \nu(R_i)$ for all $i \in I$. Daskalakis and Papadimitriou [2006] proved that the resulting $(\{X_i \mid i \in I\}, T)$ is a tree decomposition of the primal graph $G'$ with width at most $(w + 1)(\mathcal{I} + 1) - 1$. Clearly $R_i \cup \nu(R_i) \subseteq X_i$.

Given $J$, $j$ and $Y_J$ as defined in the statement of the lemma, we claim that $\tau(Y_J) \subseteq R_j$. To see this, consider any $\alpha \in \tau(Y_J)$. By definition there must be an $\alpha' \in \overline{Y_J}$ such that $\{\alpha, \alpha'\}$ is an edge in $\mathrm{und}(G)$. We note that $T_{I \setminus J}$ is also connected. Since $Y_J = \bigcup_{i \in J} R_i$ and the $R_i$ cover $\mathcal{A}$, we have $\overline{Y_J} \subseteq \bigcup_{i \in I \setminus J} R_i = Y_{I \setminus J}$, and thus $\alpha' \in Y_{I \setminus J}$. Since $\{\alpha, \alpha'\}$ is an edge in $\mathrm{und}(G)$, by condition 2 of Definition 4.4.10 there exists $i' \in I$ such that $\alpha, \alpha' \in R_{i'}$. Furthermore, such an $i'$ must be in $I \setminus J$, since $\alpha' \notin Y_J$. Since $\alpha$ is contained in some $R_i$ with $i \in J$, by condition 3 of Definition 4.4.10 $\alpha$ must be contained in every $R_{i''}$ such that $i''$ is on the path from $i$ to $i'$ in $T$. Since $j$ is on this path, $\alpha \in R_j$.

Since the undirected version of the action graph in Figure 4.5 has treewidth 1, Lemma 4.4.11 guarantees a tree decomposition of the primal graph with width at most 7 satisfying the above conditions. Figure 4.8 shows such a tree decomposition (with width 4) of the primal graph $G'$ from Figure 4.6; each node $i \in I$ of the tree is labeled with $X_i$. Lemma 4.4.11 together with Lemma 4.4.8 implies the following.

Corollary 4.4.12. Given any $J$, $j$ and $Y_J$ satisfying condition 2 of Lemma 4.4.11, $\mathrm{ch}_{R_j, X_j}$ is an equilibrium-preserving characteristic function on $Y_J$.

Also observe that for all $i \in I$, $\mathrm{ch}_{R_i, X_i}$ is trivially an equilibrium-preserving characteristic function on $R_i$. Pick an arbitrary node $r \in I$ to be the root of $T$.
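To make Definition 4.4.10 and the construction in the proof of Lemma 4.4.11 concrete, the sketch below checks the three tree-decomposition conditions and builds $X_i = R_i \cup \nu(R_i)$ for the running example. The neighbor sets $\nu(\alpha)$ are not stated explicitly in the text; we inferred them from the bags listed for Figures 4.7 and 4.8, and took the tree edges from the worked $Z_i$ example in this section, so treat both as assumptions. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Bags = Dict[int, Set[str]]


def is_tree_decomposition(bags: Bags, tree_edges: List[Tuple[int, int]],
                          vertices: Set[str], edges: Iterable[Tuple[str, str]]) -> bool:
    """Check conditions 1-3 of Definition 4.4.10 (tree_edges assumed to form a tree)."""
    if set().union(*bags.values()) != vertices:                              # condition 1
        return False
    if not all(any({v, w} <= b for b in bags.values()) for v, w in edges):  # condition 2
        return False
    adj: Dict[int, Set[int]] = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:                                                       # condition 3
        holders = {i for i, b in bags.items() if v in b}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:  # search restricted to the tree nodes whose bags contain v
            for j in adj[stack.pop()] & (holders - seen):
                seen.add(j)
                stack.append(j)
        if seen != holders:  # bags containing v must induce a connected subtree
            return False
    return True


def width(bags: Bags) -> int:
    return max(len(b) for b in bags.values()) - 1


# nu(alpha) for Figure 4.5, inferred (assumption) from the bags of Figures 4.7 and 4.8.
nu = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D", "F"}, "D": {"C", "E"},
      "E": {"D"}, "F": {"C", "G"}, "G": {"F"}}
V = set(nu)
und_edges = {tuple(sorted((a, b))) for a, nb in nu.items() for b in nb}
primal_edges = {e for a, nb in nu.items() for e in combinations(sorted({a} | nb), 2)}

R = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}, 4: {"C", "F"}, 5: {"D", "E"}, 6: {"F", "G"}}
tree = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 6)]
X = {i: Ri.union(*(nu[a] for a in Ri)) for i, Ri in R.items()}  # X_i = R_i ∪ nu(R_i)

print(is_tree_decomposition(R, tree, V, und_edges), width(R))     # True 1
print(is_tree_decomposition(X, tree, V, primal_edges), width(X))  # True 4
```

The $R_i$ alone do not form a tree decomposition of the primal graph (for example, no $R_i$ contains both A and C), which is why the construction enlarges each bag by $\nu(R_i)$.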
We say node $j$ is a descendant of node $i$ (equivalently, $i$ is an ancestor of $j$) if $i$ is on the path from $r$ to $j$. Define $Z_i = \{v \in R_j \mid j = i \text{ or } j \text{ is a descendant of } i\}$. Then $Z_r \equiv \mathcal{A}$. Intuitively, when we combine the restricted games associated with node $i$ and its descendants in $T$, we get a restricted game on $Z_i$. For each node $i \in I$ with children $q_1, \ldots, q_m \in I$ and each $j \le m$, define $Z_{i,j} = R_i \cup Z_{q_1} \cup \cdots \cup Z_{q_j}$. This implies that $Z_{i,m} \equiv Z_i$. Corollary 4.4.12 then implies that for any $Z_{i,j}$, $\mathrm{ch}_{R_i, X_i}$ is an equilibrium-preserving characteristic function. We write $C_{Z_{i,j}} \equiv C_{Z_{i,j}}^{R_i, X_i}$.

For our tree decomposition in Figure 4.8, if we let node 1 be the root $r$, then $Z_5 = R_5$, $Z_6 = R_6$, $Z_3 = R_3 \cup R_5 = \{C, D, E\}$, $Z_4 = R_4 \cup R_6 = \{C, F, G\}$, $Z_2 = R_2 \cup R_3 \cup R_4 \cup R_5 \cup R_6 = \{B, C, D, E, F, G\}$, and $Z_1 = \mathcal{A}$. Since node 2 has two children $q_1 = 3$ and $q_2 = 4$, we have $Z_{2,1} = R_2 \cup Z_3 = \{B, C, D, E\}$ and $Z_{2,2} = Z_{2,1} \cup Z_4 = Z_2 = \{B, C, D, E, F, G\}$.

We adapt our dynamic programming algorithm from the previous section so that $\{R_i \mid i \in I\}$ is the initial family of subsets that covers $\mathcal{A}$, and the order in which the subsets are combined is guided by the tree decomposition, from the leaves to the root.
1. For each $R_i$, compute $C_{R_i}$. This can be done by enumerating all possible configurations $c[X_i]$ and keeping those that induce a pure equilibrium of the restricted game on $R_i$.
2. Initialize the set $\mathit{Done} \subseteq I$ to contain the leaves of the tree $T$.
3. While $\exists i \in I \setminus \mathit{Done}$ such that $\{i' \in I \mid i' \text{ is a child of } i\} \subseteq \mathit{Done}$:
   (a) Let $C_{Z_{i,0}} := C_{R_i}$.
   (b) Let $q_1, \ldots, q_m$ be the children of $i$.
   (c) For $j = 1$ to $m$, compute $C_{Z_{i,j}}$ from $C_{Z_{i,j-1}}$ and $C_{Z_{q_j}}$ by applying Lemma 4.4.9.
   (d) $C_{Z_i} := C_{Z_{i,m}}$.
   (e) Add $i$ to $\mathit{Done}$.
4. Return TRUE iff $C_{Z_r}$ is nonempty.
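The bottom-up pass (steps 1 to 3) can be sketched generically, with the characteristic sets and the Lemma 4.4.9 combination step abstracted into hypothetical `base` and `combine` callables. In the illustration below we replace each characteristic set by the node set it covers, so `combine` is plain set union and the table reproduces the $Z_i$ computed above for Figure 4.8; with real characteristic sets, step 4 would return `bool(table[root])`. Processing ready nodes in increasing index order is our own tie-breaking choice.

```python
from typing import Callable, Dict, List, TypeVar

S = TypeVar("S")


def bottom_up(children: Dict[int, List[int]], base: Callable[[int], S],
              combine: Callable[[S, S], S]) -> Dict[int, S]:
    """Leaves-to-root pass of steps 1-3; table[i] plays the role of C_{Z_i}."""
    nodes = set(children)
    done = {i for i in nodes if not children[i]}  # step 2: the leaves
    table = {i: base(i) for i in done}            # for a leaf, C_{Z_i} = C_{R_i}
    while True:
        ready = sorted(i for i in nodes - done if set(children[i]) <= done)
        if not ready:
            break
        i = ready[0]
        acc = base(i)                             # (a) C_{Z_{i,0}} := C_{R_i}
        for q in children[i]:                     # (b)-(c) fold in the children one by one
            acc = combine(acc, table[q])          # C_{Z_{i,j}} from C_{Z_{i,j-1}} and C_{Z_{q_j}}
        table[i] = acc                            # (d)
        done.add(i)                               # (e)
    return table


# Stand-in for Figure 4.8 rooted at node 1: "characteristic sets" are just node sets.
R = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}, 4: {"C", "F"}, 5: {"D", "E"}, 6: {"F", "G"}}
children = {1: [2], 2: [3, 4], 3: [5], 4: [6], 5: [], 6: []}
trace = []


def union_and_log(a, b):
    trace.append(a | b)  # record each intermediate Z_{i,j}
    return a | b


Z = bottom_up(children, lambda i: set(R[i]), union_and_log)
print(sorted(Z[2]))  # ['B', 'C', 'D', 'E', 'F', 'G']
```

With this tie-breaking, `trace` lists $Z_3$, $Z_4$, $Z_{2,1}$, $Z_2$ and $Z_1$ in that order, the same combination order as the walk-through that follows.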
For the tree decomposition in Figure 4.8 with node 1 as the root, our algorithm would start from the leaves 5 and 6, then compute $C_{Z_3} = C_{Z_{3,1}}$ by combining $C_{R_3}$ and $C_{R_5}$, compute $C_{Z_4} = C_{Z_{4,1}}$ by combining $C_{R_4}$ and $C_{R_6}$, compute $C_{Z_{2,1}} = C_{\{B,C,D,E\}}$ by combining $C_{R_2}$ and $C_{Z_3}$, then compute $C_{Z_2} = C_{Z_{2,2}} = C_{\{B,C,D,E,F,G\}}$ by combining $C_{Z_{2,1}}$ and $C_{Z_4}$, and finally compute $C_{Z_1}$ by combining $C_{R_1}$ and $C_{Z_2}$.

Theorem 4.4.13. Deciding the existence of pure equilibrium in symmetric AGG-∅s with bounded treewidth is in P.

Proof. Suppose the treewidth of the AGG is bounded by a constant $w$. Then a tree decomposition of $\mathrm{und}(G)$ having width at most $w$ can be constructed in time exponential only in $w$, i.e., in polynomial time (see e.g. [Bodlaender, 1996, Kloks, 1994]). Then we can apply Lemma 4.4.11 to construct in polynomial time the tree decomposition $(\{X_i \mid i \in I\}, T = (I, F))$ of the primal graph $G'$ and $\{R_i \mid i \in I\}$. It is straightforward to check that our algorithm above correctly computes all $C_{Z_{i,j}}$. Specifically, at step 3c, since $Z_{i,j-1}$ and $Z_{q_j}$ correspond to disjoint subgraphs of $T$ connected by the edge $\{i, q_j\} \in F$, we have $Z_{i,j-1} \cap Z_{q_j} \subseteq R_i$. Therefore we can apply Lemma 4.4.9. Since $Z_r \equiv \mathcal{A}$, the algorithm correctly determines the existence of pure equilibrium in $\Gamma$.

The running time of the algorithm is polynomial in the sizes of the $C_{Z_i}$'s. The size of each $C_{Z_i}$ is bounded by $n\|\Gamma\|^2 |\mathcal{C}[X_i]|$. Since the tree decomposition has width at most $(w + 1)(\mathcal{I} + 1) - 1$,
$$|\mathcal{C}[X_i]| \le \binom{n + (w+1)(\mathcal{I}+1)}{(w+1)(\mathcal{I}+1)}.$$
The latter is the number of ordered combinatorial compositions of $n$ into $(w + 1)(\mathcal{I} + 1) + 1$ nonnegative integers. An equivalent way of counting this number is as follows:
1. break $n$ into $w + 1$ nonnegative integers $x_1, \ldots, x_{w+1}$ such that $\sum_{i=1}^{w+1} x_i = n$;
2. then break each of the first $w$ integers into $\mathcal{I} + 1$ nonnegative parts in the same way, and the last one ($x_{w+1}$) into $\mathcal{I} + 2$ nonnegative parts.
There are $\binom{n+w}{w}$ different ways of carrying out step 1.
Since each integer considered in step 2 is at most $n$, there are at most $\binom{n+\mathcal{I}+1}{\mathcal{I}+1}$ ways of breaking each integer. Therefore
$$\binom{n + (w+1)(\mathcal{I}+1)}{(w+1)(\mathcal{I}+1)} \le \binom{n+w}{w} \binom{n+\mathcal{I}+1}{\mathcal{I}+1}^{w+1}.$$
Since $w$ is a constant, this is polynomial in $\|\Gamma\|$. Hence our algorithm runs in polynomial time.

When the input is an AGG-∅ encoding of a singleton congestion game, i.e., a symmetric AGG-∅ with only self-edges, the resulting $\mathrm{und}(G)$ has treewidth 0, and by Theorem 4.4.13 the existence of PSNE can be determined in polynomial time. Of course, our result applies to a much larger class of games. Road games (Example 4.4.2) have treewidth 2 for all $m$. Thus by Theorem 4.4.13 the existence of PSNE can be determined in polynomial time for these games.

Our approach can be straightforwardly extended to the computation of related solution concepts such as pure-strategy $\epsilon$-Nash equilibrium and strict equilibrium. For example, for pure-strategy $\epsilon$-Nash equilibrium, we define partial solutions such that they induce $\epsilon$-Nash equilibria of the corresponding restricted games, and use a modified version of Lemma 4.4.9 where the conditions that compare best entrance utilities and worst current utilities are relaxed by $\epsilon$; e.g., $U'_c \ge U''_e$ is replaced by $U'_c + \epsilon \ge U''_e$.

4.4.5 Finding PSNE

So far we have focused on the problem of deciding the existence of PSNE. Our dynamic programming approach can also be used to find these equilibria if they exist. We first consider the problem of constructing a single PSNE. After the bottom-up pass of the tree decomposition as discussed above, if $C_{Z_r}$ is not empty, we do a top-down pass as follows:
1. Initialize $\mathit{Done} \subseteq I$ to be $\{r\}$.
2. Pick an arbitrary $(c[X_r], B_r, U_c^r, U_e^r) \in C_{Z_r}$.
3. Set $C_{Z_r} = \{(c[X_r], B_r, U_c^r, U_e^r)\}$.
4. While $\mathit{Done} \neq I$:
   (a) Take a non-leaf $i \in \mathit{Done}$ such that $\{i' \mid i' \text{ is a child of } i\} \cap \mathit{Done} = \emptyset$.
   (b) Let $q_1, \ldots, q_m$ be the children of $i$.
   (c) $C_{Z_i} \equiv C_{Z_{i,m}}$ will have a single element $(c[X_i], B_i, U_c^i, U_e^i)$.
   (d) Let $C_{Z_{i,0}} := C_{R_i} = \{\mathrm{ch}(c[X_i])\}$.
   (e) For $j = m, m-1, \ldots, 1$:
      i. pick $(c[X_{q_j}], B_{q_j}, U_c^{q_j}, U_e^{q_j}) \in C_{Z_{q_j}}$ and $(c[X_i], B_{i,j-1}, U_c^{i,j-1}, U_e^{i,j-1}) \in C_{Z_{i,j-1}}$ such that they combine to form the single element of $C_{Z_{i,j}}$ while satisfying the conditions of Lemma 4.4.9;
      ii. set $C_{Z_{q_j}} := \{(c[X_{q_j}], B_{q_j}, U_c^{q_j}, U_e^{q_j})\}$ and $C_{Z_{i,j-1}} := \{(c[X_i], B_{i,j-1}, U_c^{i,j-1}, U_e^{i,j-1})\}$;
      iii. add $q_j$ to $\mathit{Done}$.
5. Now each $C_{R_i}$ contains a single element $\mathrm{ch}(c[X_i])$. Output the configuration $\bigcup_{i \in I} c[X_i]$.

Since the bottom-up pass has established the correct $C_{Z_{i,j}}$, step 4(e)i can always be carried out. Therefore the algorithm is correct, and by the same argument as in the proof of Theorem 4.4.13 the algorithm runs in polynomial time. This proves:

Corollary 4.4.14. The problem of finding a PSNE is in P for symmetric AGG-∅s with bounded treewidth.

A similar top-down pass would make sure that each $C_{Z_{i,j}}$ contains exactly the characteristics of extendable partial solutions. Although the number of pure equilibria of an AGG could be exponential in the representation size $\|\Gamma\|$, the resulting set of $C_{Z_{i,j}}$ along with the tree decomposition constitutes a succinct description of the set of PSNE of the game, analogous to Daskalakis and Papadimitriou [2006]’s construction of succinct descriptions of the set of PSNE of graphical games. Given a symmetric AGG-∅ with bounded treewidth, such a succinct description can be computed in polynomial time. The succinct description can be used, e.g., to enumerate the set of all PSNE in time polynomial in the size of the input and output, and to check whether there exists a PSNE with a specific configuration at certain action nodes.

4.4.6 Computing Optimal PSNE

Recall from Chapter 2 that the social welfare is the sum of the players’ utilities. Given a configuration $c$ in a symmetric AGG-∅ $\Gamma$, the social welfare can be written as $W_\Gamma(c) = \sum_{\alpha \in \mathcal{A}} c[\alpha]\, u_\alpha(c[\nu(\alpha)])$.
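The welfare formula, and the overlap correction that Corollary 4.4.15 below relies on, can be checked on a toy example. This sketch assumes utilities are functions of the local configuration $c[\nu(\alpha)]$ and that every term is evaluated under the same full configuration $c$ (which is what knowing the configuration over the overlap and its neighbors provides); the three-action game and the function name are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

Config = Dict[str, int]


def welfare(c: Config, nodes: Iterable[str], nu: Dict[str, Set[str]],
            u: Dict[str, Callable[[Config], float]]) -> float:
    """Sum over alpha in `nodes` of c[alpha] * u_alpha(c[nu(alpha)])."""
    return sum(c[a] * u[a]({v: c[v] for v in nu[a]}) for a in nodes)


# Hypothetical 3-action example (nu includes self-edges).
nu = {"A": {"A"}, "B": {"A", "B"}, "C": {"B", "C"}}
u = {"A": lambda loc: 5 - loc["A"],
     "B": lambda loc: 6 - loc["A"] - loc["B"],
     "C": lambda loc: 4 - loc["C"]}
c = {"A": 1, "B": 1, "C": 1}

W = welfare(c, nu.keys(), nu, u)
X1, X2 = {"A", "B"}, {"B", "C"}
# Adding the two parts double-counts the overlap {B}, so it is subtracted once.
W_union = welfare(c, X1, nu, u) + welfare(c, X2, nu, u) - welfare(c, X1 & X2, nu, u)
print(W, W_union)  # 11 11
```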
Our algorithm can be extended to compute the socially optimal PSNE if one exists. The characteristics now also store the social welfare of the restricted games. Specifically, we use the characteristic function $\mathrm{ch}^{\mathrm{opt}}(c[Z_{i,j} \cup \nu(Z_{i,j})]) = (\mathrm{ch}_{R_i, X_i}(c[Z_{i,j} \cup \nu(Z_{i,j})]), W_{\Gamma'}(c[Z_{i,j}]))$, where $\Gamma' = \Gamma(\#c[Z_{i,j}], Z_{i,j}, c[\nu(Z_{i,j})])$ is the restricted game on $Z_{i,j}$ induced by the partial solution. Let $C_{Z_{i,j}}^{\mathrm{opt}}$ be the corresponding set of characteristics. The way characteristics from two sets $X', X'' \subseteq \mathcal{A}$ are combined is also slightly different from Lemma 4.4.9. Once we have checked consistency and profitable deviations as in Lemma 4.4.9, we need to compute the social welfare of the resulting characteristic from the given characteristics of $X'$ and $X''$. Simply adding the social welfare values would not be correct, due to the possible overlap of $X'$ and $X''$; fortunately, we know the configuration over $X' \cap X''$ and its neighbors (by the assumptions of Lemma 4.4.9), so we are able to calculate the social welfare of the overlap and subtract it from the sum.

Corollary 4.4.15. Suppose $X = X' \cup X''$, and $X', X'', P, P', P'', Q, Q', Q''$ satisfy the prerequisites of Lemma 4.4.9. For all $c[Q], B, U_c, U_e, W_\cup$, we have $(c[Q], B, U_c, U_e, W_\cup) \in C_X^{\mathrm{opt}}$ if and only if there exist $(c'[Q'], B', U'_c, U'_e, W') \in C_{X'}^{\mathrm{opt}}$ and $(c''[Q''], B'', U''_c, U''_e, W'') \in C_{X''}^{\mathrm{opt}}$ satisfying the conditions of Lemma 4.4.9, and $W_\cup = W' + W'' - W_{\Gamma_\cap}(c[X' \cap X''])$, where $\Gamma_\cap = \Gamma(\#c[X' \cap X''], X' \cap X'', c[\nu(X' \cap X'')])$.

Using this characteristic function together with the bottom-up pass above, we can compute the optimal social welfare achieved by a PSNE, if one exists. A top-down pass then constructs such a PSNE. One issue with this approach is that, due to the additional social welfare term in a characteristic, the number of characteristics in each $C_{Z_{i,j}}^{\mathrm{opt}}$ can be greater than $|C_{Z_{i,j}}|$. Fortunately, it is straightforward to show that:

Lemma 4.4.16.
Suppose partial solutions $c[X \cup \nu(X)]$ and $c'[X \cup \nu(X)]$ induce the same characteristic under $\mathrm{ch}^{\mathrm{opt}}$, except that the former’s social welfare is less than the latter’s. Then the former can be extended to a PSNE if and only if the latter can be extended to a PSNE with greater social welfare.

This implies that whenever we have multiple characteristics in $C_{Z_{i,j}}^{\mathrm{opt}}$ that differ only in their social welfare values, we can safely prune away all but the one with the greatest social welfare. The resulting $C_{Z_{i,j}}^{\mathrm{opt}}$ has the same cardinality as $C_{Z_{i,j}}$; therefore the algorithm runs in polynomial time.

Corollary 4.4.17. Computing a maximum social welfare PSNE in symmetric AGG-∅s with bounded treewidth is in P.

4.5 Beyond symmetric AGGs

4.5.1 Algorithm for k-Symmetric AGG-∅s

Our results for symmetric AGG-∅s can be straightforwardly extended to k-symmetric AGG-∅s with bounded $k$. Consider a k-symmetric AGG-∅ $\Gamma$ with player classes $N_1, \ldots, N_k$. As discussed in Section 4.3, it is sufficient to consider k-configurations. Define the restricted game $\Gamma((n'_\ell)_{1 \le \ell \le k}, X, (c_\ell[\nu(X)])_{1 \le \ell \le k})$ to be the k-symmetric AGG-∅ played on $G_X$, in which each player class $\ell \in \{1, \ldots, k\}$ has $n'_\ell \le |N_\ell| - \#c_\ell[\nu(X)]$ players, and the utility function for each $\alpha \in X$ is $u_\alpha|_{(c_\ell[\nu(X)])_{1 \le \ell \le k}}$, i.e., the same as $u_\alpha$ of $\Gamma$ except that the configuration of nodes outside $X$ is given by the k-configuration $(c_\ell[\nu(X)])_{1 \le \ell \le k}$. We define a partial solution on $X$ to be a k-configuration $(c_\ell[X \cup \nu(X)])_{1 \le \ell \le k}$ such that $(c_\ell[X])_{1 \le \ell \le k}$ is a PSNE of the restricted game $\Gamma((\#c_\ell[X])_{1 \le \ell \le k}, X, (c_\ell[\nu(X)])_{1 \le \ell \le k})$. Similarly, we extend the characteristic functions of Section 4.4 by replacing each component of the characteristic with its k-tuple version.

Definition 4.5.1. Given a restricted game $\Gamma'$ on $X \subset \mathcal{A}$ and a PSNE $(c^*_\ell)_{1 \le \ell \le k}$ of $\Gamma'$, player class $\ell$’s worst current utility $\mathrm{WCU}_\ell((c^*_\ell)_{1 \le \ell \le k}, \Gamma')$ is the utility of the worst-off player from class $\ell$ in $\Gamma'$, or $\infty$ if $\Gamma'$ has 0 players in class $\ell$.
Player class $\ell$’s best entrance utility $\mathrm{BEU}_\ell((c^*_\ell)_{1 \le \ell \le k}, \Gamma')$ is the best payoff an outside player (a player currently playing an action outside of $X \cup \nu(X)$) from class $\ell$ can get by playing an action in $X \cap \mathcal{A}^\ell$, assuming the current players in $\Gamma'$ play $(c^*_\ell)_{1 \le \ell \le k}$. If there are 0 outside players from class $\ell$ or $X \cap \mathcal{A}^\ell = \emptyset$, then $\mathrm{BEU}_\ell((c^*_\ell)_{1 \le \ell \le k}, \Gamma') = -\infty$.

Lemma 4.5.2. Given a k-symmetric AGG-∅ $\Gamma$, $X \subset \mathcal{A}$, $P \subseteq X$ such that $P \supseteq \tau(X)$, and $Q \supseteq P \cup \nu(P)$, consider the characteristic function $\mathrm{ch}^k_{P,Q}$ that maps a partial solution $(c_\ell[X \cup \nu(X)])_{1 \le \ell \le k}$ to $(c_\ell[Q], \#c_\ell[X], \mathrm{WCU}_\ell(c[X'], \Gamma'), \mathrm{BEU}_\ell(c[X'], \Gamma'))_{1 \le \ell \le k}$, where $\Gamma' = \Gamma((\#c_\ell[X'])_{1 \le \ell \le k}, X', (c_\ell[\nu(X')])_{1 \le \ell \le k})$ and $X' = X \setminus P$. Then $\mathrm{ch}^k_{P,Q}$ is equilibrium-preserving.

Lemma 4.4.9 can be similarly extended to the k-symmetric case. Therefore we can use this characteristic function together with our bottom-up pass algorithm to determine the existence of PSNE in k-symmetric AGG-∅s, and use the top-down algorithm to find a PSNE if one exists. For k-symmetric AGG-∅s with bounded $k$ and bounded treewidth, each of the $k$ components of the output of $\mathrm{ch}^k_{R_i, X_i}$ can take at most $\mathrm{poly}(\|\Gamma\|)$ values, and as a result the number of characteristics is polynomial in $\|\Gamma\|$. We thus have the following generalization of Theorem 4.4.13, Corollary 4.4.14 and Corollary 4.4.17.

Corollary 4.5.3. For k-symmetric AGG-∅s with bounded $k$ and bounded treewidth, the problems of determining the existence of PSNE, of constructing a PSNE, and of finding a maximum social welfare PSNE are all in P.

We observe that when $k = 1$, i.e., when the game is symmetric, $\mathrm{ch}^k_{P,Q}$ degenerates into the $\mathrm{ch}_{P,Q}$ we previously defined for the symmetric case, and this algorithm simplifies into our algorithm for symmetric AGG-∅s.

4.5.2 General AGG-∅s and the Augmented Action Graph

We now consider the case of general AGG-∅s. We note that such games can still be viewed as k-symmetric (with $k$ at most $n$), but now $k$ may grow with the input size.
Our approach in Section 4.5.1 for k-symmetric AGG-∅s works well only when $k$ is bounded by a constant, since the number of characteristics under $\mathrm{ch}^k_{P,Q}$ grows exponentially in $k$. Can this approach be extended to the case of general AGG-∅s? We observe that in order to check deviations out of and into $X \setminus P$, we do not need to keep track of information about player classes whose action sets are either (1) fully contained in $X \setminus P$, or (2) disjoint from $X \setminus P$. In the former case, no player of that class can deviate outside $X \setminus P$; this is reflected in $\mathrm{ch}^k_{P,Q}$ as best entrance utilities of $-\infty$ for that class in the restricted game on $X \setminus P$, but we also do not need to keep track of the worst current utilities for the class. Similarly, in the latter case, no player of that class can deviate into $X \setminus P$. To check deviations out of and into $P$, we only need to keep track of information on player classes whose action sets intersect $Q$. In other words, it is sufficient to define a characteristic function in terms of the player classes that are relevant to the current subset of nodes. Formally:

Lemma 4.5.4. Consider a k-symmetric AGG-∅ $\Gamma$ with player classes $1, \ldots, k$ corresponding to sets of players $N_1, \ldots, N_k$ and action sets $\mathcal{A}^1, \ldots, \mathcal{A}^k$. Given $X \subset \mathcal{A}$, $P \subseteq X$ such that $P \supseteq \tau(X)$, and $Q \supseteq P \cup \nu(P)$, let $L(X, P) = \{\ell \mid 1 \le \ell \le k,\ \mathcal{A}^\ell \not\subseteq (X \setminus P),\ \mathcal{A}^\ell \cap (X \setminus P) \neq \emptyset\}$, and let $K(Q) = \{\ell \mid 1 \le \ell \le k,\ \mathcal{A}^\ell \cap Q \neq \emptyset\}$. Consider the characteristic function $\mathrm{ch}^+_{P,Q}$ that maps a partial solution $(c_\ell[X \cup \nu(X)])_{1 \le \ell \le k}$ to $((c_\ell[Q])_{\ell \in K(Q)}, (\#c_\ell[X \setminus P])_{\ell \in L(X,P)}, (\mathrm{WCU}_\ell(c[X'], \Gamma'))_{\ell \in L(X,P)}, (\mathrm{BEU}_\ell(c[X'], \Gamma'))_{\ell \in L(X,P)})$, where $\Gamma' = \Gamma((\#c_\ell[X'])_{1 \le \ell \le k}, X', (c_\ell[\nu(X')])_{1 \le \ell \le k})$ and $X' = X \setminus P$. Then $\mathrm{ch}^+_{P,Q}$ is equilibrium-preserving.

Lemma 4.4.9 can be similarly extended. The number of characteristics under $\mathrm{ch}^+_{P,Q}$ is exponential in $|Q|$, $|K(Q)|$ and $|L(X, P)|$.
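The index sets $K(Q)$ and $L(X, P)$ of Lemma 4.5.4 are simple set computations. The Python sketch below (hypothetical function names and player classes) shows which classes the characteristic $\mathrm{ch}^+_{P,Q}$ would track, including a class that drops out of $L(X, P)$ because its whole action set already lies in $X \setminus P$.

```python
from typing import Dict, Set


def K(Q: Set[str], action_sets: Dict[int, Set[str]]) -> Set[int]:
    """Player classes whose action sets intersect Q."""
    return {cls for cls, A_cls in action_sets.items() if A_cls & Q}


def L(X: Set[str], P: Set[str], action_sets: Dict[int, Set[str]]) -> Set[int]:
    """Classes that meet X minus P without being contained in it."""
    inner = X - P
    return {cls for cls, A_cls in action_sets.items()
            if (A_cls & inner) and not A_cls <= inner}


action_sets = {1: {"a", "b"}, 2: {"b", "c", "d"}, 3: {"e"}}  # hypothetical classes
print(L({"a", "b", "c"}, {"c"}, action_sets))  # {2}: class 1 is "finished off"
print(K({"c", "d"}, action_sets), K({"b", "c"}, action_sets))  # {2} {1, 2}
```

Class 3 never appears because its action set is disjoint from both $X \setminus P$ and $Q$, so it contributes nothing to the characteristic at this stage.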
Intuitively, as we combine these characteristics to form characteristics on larger subgraphs, $|L(X, P)|$ will also grow, unless we "finish off" certain player classes, i.e., player classes $\ell$ such that $\mathcal{A}^\ell$ becomes a subset of $X \setminus P$. Can we divide the action graph and combine the restricted games in a way that keeps $|Q|$, $|K(Q)|$ and $|L(X, P)|$ small? A natural idea is to turn to tree decompositions of $G$, as we did in Section 4.4.4. However, Daskalakis et al. [2009] proved that the problem of determining the existence of PSNE is NP-hard even for AGGs with treewidth 1 and constant in-degree. In other words, we cannot hope for a polynomial-time algorithm for general AGGs with constant treewidth unless P = NP. On the other hand, there exist classes of asymmetric AGGs that are polynomial-time solvable, e.g., those corresponding to tree graphical games. This implies that looking at the action graph alone is insufficient for identifying such tractable classes of AGGs. We have seen that information about the action sets of the AGG is needed in order to define $\mathrm{ch}^+_{P,Q}$. Thus a natural idea is to define an object that incorporates information about the action sets as well as the action graph of the AGG.

Definition 4.5.5. Given an AGG-∅ $\Gamma$ with player classes $1, \ldots, k$, define the augmented action graph² to be the directed graph $AG = (V^+, E^+) = (\mathcal{A} \cup \{1, \ldots, k\},\ E \cup \{(\ell, \alpha) \mid (\{\alpha\} \cup \nu(\alpha)) \cap \mathcal{A}^\ell \neq \emptyset\})$. Let $\mathcal{I}^+$ be the maximum in-degree of $AG$.

In other words, we add to the action graph $G$ new vertices $\{1, \ldots, k\}$ corresponding to the player classes, and an edge from each player class $\ell$ to action $\alpha$ if $\alpha$ or any of its neighbors are in the action set of class $\ell$. Intuitively, the edges from player class nodes to action nodes in the augmented action graph ensure that in the resulting tree decomposition, the set of tree nodes to which a player class is relevant forms a connected subgraph of the tree.
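Definition 4.5.5 can be sketched as an edge-set construction. We assume, hypothetically, that action nodes are strings, player-class nodes are the integers $1, \ldots, k$, and each original edge is oriented from $\beta$ to $\alpha$ for $\beta \in \nu(\alpha)$; the small path-shaped example has no self-edges and is made up for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, str]


def augmented_action_graph(nu: Dict[str, Set[str]],
                           action_sets: Dict[int, Set[str]]) -> Set[Edge]:
    """Edge set E+ of Definition 4.5.5 (class nodes are the integer labels)."""
    edges: Set[Edge] = {(b, a) for a, nbrs in nu.items() for b in nbrs}  # beta -> alpha
    for cls, A_cls in action_sets.items():
        for a, nbrs in nu.items():
            if ({a} | nbrs) & A_cls:  # alpha or one of its neighbors is in the class's action set
                edges.add((cls, a))
    return edges


def max_in_degree(edges: Set[Edge]) -> int:
    indeg: Dict[Hashable, int] = {}
    for _, head in edges:
        indeg[head] = indeg.get(head, 0) + 1
    return max(indeg.values(), default=0)


nu = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}  # hypothetical path a - b - c
E_plus = augmented_action_graph(nu, {1: {"a"}, 2: {"c"}})
print((1, "c") in E_plus)       # False: neither c nor its neighbor b is in class 1's set
print(max_in_degree(E_plus))    # 4
```

Node `b` receives edges from both of its action neighbors and from both classes, so $\mathcal{I}^+ = 4$ here even though the in-degree of the original action graph is 2.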
This connectivity property is formalized in the following result for augmented action graphs, which is analogous to Lemma 4.4.11 for action graphs.

Lemma 4.5.6. Given a k-symmetric AGG-∅ $\Gamma$ whose augmented action graph AG has treewidth w, there exist a tree decomposition $(\{X_i \cup K_i \mid X_i \subseteq \mathcal{A}, K_i \subseteq \{1, \dots, k\}, i \in I\}, T = (I, F))$ of AG's primal graph $AG'$ of width at most $(w+1)(I^+ + 1) - 1$, and sets $\{R_i \subseteq \mathcal{A} \mid i \in I\}$, such that
1. $\bigcup_{i \in I} R_i = \mathcal{A}$, and $R_i \cup \nu(R_i) \subseteq X_i$ for all $i \in I$;
2. Let $J \subset I$ be such that $T_J$ is a connected graph that connects to the rest of the tree via only one edge $\{j, j'\} \in F$ with $j \in J$, and let $Y_J = \bigcup_{i \in J} R_i$. Then $\tau(Y_J) \subseteq R_j$, $K(X_j) \subseteq K_j$, and $L(Y_J, R_j) \subseteq L_j$.

Proof. The construction is very similar to that of Lemma 4.4.11. Given a tree decomposition $(\{R_i \cup L_i \mid R_i \subseteq \mathcal{A}, L_i \subseteq \{1, \dots, k\}, i \in I\}, T = (I, F))$ of AG, we build a tree decomposition of $AG'$ by adding to each tree node $i \in I$ the neighboring vertices of $R_i$ (vertices in $L_i$ have no neighbors). Lemma 4.5 of [Daskalakis and Papadimitriou, 2006] ensures that the result is a tree decomposition of $AG'$ with width at most $(w+1)(I^+ + 1) - 1$.

The resulting tree decomposition $(\{X_i \cup K_i \mid X_i \subseteq \mathcal{A}, K_i \subseteq \{1, \dots, k\}, i \in I\}, T = (I, F))$ has $X_i = R_i \cup \nu(R_i)$, as in the proof of Lemma 4.4.11. It also has $K_i = L_i \cup \{\ell \mid 1 \le \ell \le k,\ \mathcal{A}_\ell \cap (R_i \cup \nu(R_i)) \neq \emptyset\}$. This implies $K(X_i) \subseteq K_i$ for all $i \in I$. By the same argument as in the proof of Lemma 4.4.11, we have $\tau(Y_J) \subseteq R_j$.

It remains to show that $L(Y_J, R_j) \subseteq L_j$. Consider an arbitrary $\ell \in L(Y_J, R_j)$. Then $\mathcal{A}_\ell \cap (Y_J \setminus R_j) \neq \emptyset$ and $\mathcal{A}_\ell \not\subseteq Y_J \setminus R_j$. This implies that there exists $\alpha \in Y_J \setminus R_j$ such that $(\ell, \alpha) \in E^+$, and there exists $\alpha' \notin Y_J \setminus R_j$ such that $(\ell, \alpha') \in E^+$. The tree nodes that contain α must be in $J \setminus \{j\}$. By condition 2 of Definition 4.4.10, the edge $(\ell, \alpha)$ must be contained in some tree node, so we must have $\ell \in L_i$ for some $i \in J \setminus \{j\}$. Similarly, we must have $\ell \in L_{i'}$ for some $i' \notin J \setminus \{j\}$. Since the path in T between i and i′ passes through j, condition 3 of Definition 4.4.10 gives $\ell \in L_j$, and therefore $L(Y_J, R_j) \subseteq L_j$.

² We note that our definition of the augmented action graph is different from the augmented graph of Daskalakis et al. [2009]. The computational problem that Daskalakis et al. [2009] were trying to solve (finding approximate mixed-strategy Nash equilibria) is different from the PSNE problem considered in this chapter.

Lemma 4.5.6 implies that we can apply the bottom-up pass algorithm using the characteristic function $\mathrm{ch}^+_{R_i, X_i}$ for $Z_{i,j}$, and correctly determine the existence of PSNE. If a PSNE exists, a top-down pass then constructs one. Let us consider the running time of this approach. If AG has bounded in-degree and bounded treewidth, then $|X_i|$ and $|K_i|$ are bounded for all $i \in I$, and the number of characteristics is polynomial in n and $|\mathcal{A}|$. This in turn implies that our algorithm runs in polynomial time in this case.

Proposition 4.5.7. For AGG-∅s whose augmented action graphs have bounded in-degree and bounded treewidth, the problems of determining the existence of a PSNE and of finding a PSNE are in P.

One question is whether the running time can be shown to be polynomial in the input size when the augmented action graph has bounded treewidth, i.e., without any requirement on the in-degree. This turns out to be more difficult than in the symmetric case. To prove such a result without any requirement on the in-degree, we would need to compare the running time with (a lower bound on) the input size. For symmetric AGGs we have exact estimates of the input size, but for general AGGs we only proved upper bounds in Chapter 3. The complexity of PSNE for AGG-∅s with bounded-treewidth augmented action graphs remains an open problem. One interesting case is when the input is an AGG-∅ encoding of a bounded-treewidth graphical game.
Recall that the PSNE problem for such games is known to be tractable [Daskalakis and Papadimitriou, 2006, Gottlob et al., 2005]. We show that our algorithm runs in polynomial time given AGG-∅ encodings of such games, thus providing another proof of this result.

Proposition 4.5.8. Determining the existence of PSNE in bounded-treewidth graphical games is in P.

Proof. Recall from Chapter 3 that the size of the AGG-∅ encoding is proportional to that of the graphical game, which is $\Theta\big(\sum_{\ell \in N} |A_\ell| \prod_{j \in \nu_g(\ell)} |A_j|\big)$, where $\nu_g(\ell)$ is the set of neighboring players of ℓ in the graphical game. The AGG has $k = n$ player classes, each containing a single player. We denote by ℓ the player class corresponding to player $\ell \in N$. Suppose the underlying graph $(N, E_g)$ of the graphical game has treewidth w and maximum in-degree $I_g$. The corresponding action graph $G = (\mathcal{A}, E)$ is as given in Chapter 3, and the corresponding augmented action graph is $AG = (\mathcal{A} \cup N,\ E \cup \{(\ell, \alpha) \mid \ell \in N, \alpha \in A_\ell\})$.

Given a tree decomposition $(\{L_i \mid i \in I\}, T = (I, F))$ of the graph $(N, E_g)$ with width w, it is straightforward to show that $(\{R_i \cup L_i \mid i \in I\}, T)$ is a tree decomposition of the augmented action graph AG, where $R_i = \bigcup_{\ell \in L_i} A_\ell$ for all $i \in I$. The width of this decomposition is $O(\max_{\ell \in N} |A_\ell|\, w)$. Construct the tree decomposition $(\{X_i \cup K_i \mid i \in I\}, T)$ for the primal graph $AG'$ according to Lemma 4.5.6. It is straightforward to verify that $K_i = L_i \cup \nu_g(L_i)$ and $X_i = \bigcup_{\ell \in K_i} A_\ell$. Therefore $|K_i| = O(w I_g)$ and $|X_i| = O(\max_{\ell \in K_i} |A_\ell|\, w I_g)$, and the width of the decomposition is $O(\max_{\ell \in N} |A_\ell|\, w I_g)$.

Now consider the number of characteristics under $\mathrm{ch}^+_{R_i, X_i}$. For each $\ell \in N$ and $J \subseteq I$ we have either $A_\ell \subseteq Y_J$ or $A_\ell \cap Y_J = \emptyset$, so $L(Y_J, R_j) = \emptyset$ for all $j \in I$ and $J \subseteq I$. Thus the only nontrivial component of the characteristic is $(c_\ell[X_i])_{\ell \in K_i}$. Since each $\ell \in K_i$ corresponds to a single player, $|C_\ell[X_i]| = |A_\ell|$.
Thus the number of possible $(c_\ell[X_i])_{\ell \in K_i}$ is $\prod_{\ell \in K_i} |A_\ell|$, which is polynomial in the input size since $|K_i| = O(w I_g)$. Hence the number of characteristics is polynomial in $\|\Gamma\|$, which implies that our algorithm runs in polynomial time.

The above proof shows that in this case the characteristic degenerates into $(c_\ell[X_i])_{\ell \in K_i}$. This carries the same information as the partial pure-strategy profile of the players in $K_i$, which is exactly the sufficient statistic used by the algorithm of Daskalakis and Papadimitriou [2006] for graphical games. As a result, when given an AGG-∅ encoding of a graphical game, our algorithm reduces to the equivalent of their algorithm.

We also note that our algorithms for symmetric and k-symmetric AGG-∅s can be seen as special cases of our augmented-action-graph-based algorithm. In particular, consider a k-symmetric AGG-∅ with action graph G, and suppose und(G) has a tree decomposition $(\{R_i \mid i \in I\}, T = (I, F))$. Then our algorithm for k-symmetric AGG-∅s corresponds to applying the augmented-action-graph-based algorithm to the tree decomposition $(\{R_i \cup \{1, \dots, k\} \mid i \in I\}, T)$ for $AG'$. That is, every tree node of the decomposition contains all k player classes.

4.6 Conclusions and Open Problems

In this chapter we analyzed the problem of computing PSNE in AGGs. We proposed a dynamic-programming algorithm and showed that, for symmetric AGG-∅s with bounded treewidth, it determines the existence of PSNE in polynomial time. We extended our approach to certain classes of asymmetric AGG-∅s. We also showed that our algorithm generalizes existing dynamic-programming approaches for computing PSNE in graphical games and singleton congestion games. One question is whether our approach captures all the tractable classes of AGG-∅s for the PSNE problem. The answer is no.
For example, consider an asymmetric AGG-∅ whose action graph has no inter-vertex edges and only self edges. This is the same as the singleton congestion games studied by Ieong et al. [2005], except that here the game is not symmetric. It is straightforward to see that this game corresponds to a congestion game, and thus a PSNE always exists. Furthermore, by an argument similar to that of Ieong et al. [2005], a PSNE of such a game can be found by iterated best-response dynamics in polynomial time. On the other hand, the augmented action graph of such an AGG-∅ might have large treewidth.

This example can be generalized. Suppose the action graph contains a set X of such singleton nodes, and the action sets that intersect X contain no node outside X. Then the subgraph of the singleton nodes does not affect the existence of PSNE: a PSNE exists in the game if and only if a PSNE exists in the restricted game on the rest of the graph. We can generalize this even further. Consider a subgraph $G_X$ such that (as above) the action sets that intersect X contain no node outside X, and such that X has only incoming edges from the rest of G and no outgoing edges (i.e., $\nu(X) = \emptyset$). Then $G_X$ does not affect the existence of PSNE, and we can safely delete the subgraph and solve the rest of the graph. This process can be repeated. (This is analogous to, and indeed a generalization of, the case of graphical games with sinks discussed in [Jiang and Safari, 2010].) Note that for these examples, a greedy approach is used instead of (or in addition to) the dynamic-programming approach of this chapter. For the problem of existence of PSNE in graphical games, Jiang and Safari [2010] were able to completely characterize the tractable classes of bounded-in-degree graphs.
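The iterated best-response procedure mentioned above for the self-edge-only case can be sketched in Python. This is an illustration under stated assumptions, not a transcription of the procedure of Ieong et al. [2005]. Each action's utility is assumed to be a Python function of the number of players choosing that action (self edges only), and each player may have an arbitrary action set. The names `best_response_dynamics` and `is_psne` are hypothetical. Such a game is a congestion game, so every strict improvement increases Rosenthal's potential and the loop terminates; the `max_rounds` guard is only a safety net.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Utility = Dict[str, Callable[[int], float]]


def is_psne(profile: Sequence[str], action_sets: Sequence[Sequence[str]],
            utility: Utility) -> bool:
    """True if no player can strictly gain by a unilateral deviation."""
    counts = Counter(profile)
    for cur, acts in zip(profile, action_sets):
        here = utility[cur](counts[cur])
        # Moving to b != cur raises b's count by one.
        if any(utility[b](counts[b] + 1) > here for b in acts if b != cur):
            return False
    return True


def best_response_dynamics(action_sets: Sequence[Sequence[str]], utility: Utility,
                           max_rounds: int = 10_000) -> List[str]:
    profile = [acts[0] for acts in action_sets]   # arbitrary starting profile
    counts = Counter(profile)
    for _ in range(max_rounds):
        improved = False
        for i, acts in enumerate(action_sets):
            cur = profile[i]
            best, best_u = cur, utility[cur](counts[cur])
            for b in acts:
                if b != cur and utility[b](counts[b] + 1) > best_u:
                    best, best_u = b, utility[b](counts[b] + 1)
            if best != cur:                       # switch to a strict best response
                counts[cur] -= 1
                counts[best] += 1
                profile[i] = best
                improved = True
        if not improved:                          # nobody can improve: a PSNE
            return profile
    raise RuntimeError("no equilibrium reached within max_rounds")


# Hypothetical instance: four players, each able to choose node x or node y.
utility = {"x": lambda c: 10 - 3 * c, "y": lambda c: 7 - c}
action_sets = [["x", "y"]] * 4
eq = best_response_dynamics(action_sets, utility)
print(eq, is_psne(eq, action_sets, utility))  # ['y', 'y', 'x', 'x'] True
```

Ties are resolved in favor of the current action, so a player moves only on a strict gain. This is what makes the potential argument apply.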
An open problem is to completely characterize the types of restrictions on the graphical structure of AGG-∅s that make the PSNE problem tractable, perhaps by leveraging some of the techniques developed in [Jiang and Safari, 2010].

Another future direction is to extend our approach to AGG-FNs. Recall that the configuration on a function node is the value of a deterministic function of the configuration of its neighbors. Thus the PSNE of a symmetric AGG-FN correspond to configurations over its action nodes and function nodes that satisfy two conditions. First, the configuration over each function node equals the appropriate value. Second, the configuration over action nodes satisfies the incentive and consistency constraints as before. Assuming that the deterministic functions for the function nodes are explicitly represented, it is then relatively straightforward to extend our dynamic-programming approach to the action graphs of symmetric AGG-FNs. An interesting question is whether this can be extended to efficiently handle compactly represented function nodes, such as summation function nodes. Finally, as we have seen in this chapter, one faces additional technical challenges when going beyond the symmetric case. It would be interesting to see whether the approaches discussed in Section 4.5 can be extended to AGG-FNs.

Chapter 5
Temporal Action-Graph Games: A New Representation for Dynamic Games

5.1 Introduction

In this chapter we¹ turn our focus to compact representations of dynamic games. As mentioned in Section 2.1.2, the most influential compact representation for imperfect-information dynamic games is multiagent influence diagrams, or MAIDs [Koller and Milch, 2003]. MAIDs are compact when players' utility functions exhibit independencies, and this compactness can also be leveraged for computational benefit (see Section 2.2.4). Consider the following example of a dynamic game.

Example 5.1.1. Twenty cars are approaching a tollbooth with three lanes.
The drivers must decide which lane to use. The cars arrive in four waves of five cars each. In each wave, the drivers must pick lanes simultaneously, and can see the number of cars before them in each lane. A driver's utility decreases with the number of cars that chose the same lane either before him or at the same time.

¹ This chapter is based on published joint work with Kevin Leyton-Brown and Avi Pfeffer [Jiang et al., 2009].

A straightforward MAID representation of the game of Example 5.1.1 contains very little structure. In particular, each player has a utility node whose parents are all the decision nodes of the drivers before her. As the number of players grows, the representation size of the utility functions grows exponentially, and computation using such a representation would be highly inefficient. However, the game is in fact highly structured. Agents' payoffs exhibit context-specific independence: utility depends only on the number of cars in the chosen lane. They also exhibit anonymity: utility depends on the numbers of other agents taking given actions, not on these agents' identities. A straightforward MAID representation of this game captures neither kind of payoff structure.

As we saw in Chapter 2, a wider variety of compact game representations exists for simultaneous-move games. In particular, several of these representations (including congestion games and local-effect games) can compactly represent anonymity and context-specific independence (CSI) structures. We saw in Chapter 3 that AGGs unify these past representations by compactly representing both anonymity and CSI, while still retaining the ability to represent any game. Furthermore, structure in AGGs can be leveraged for computational benefit. However, AGGs cannot represent the game of Example 5.1.1 because they cannot describe sequential moves or imperfect information.
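A rough count of stored utility values makes the size contrast for Example 5.1.1 concrete. The calculation below rests on assumptions of our own. In the MAID encoding, each driver's utility node is taken to have as parents the decisions of every driver in her own wave and all earlier waves, including her own decision, with one stored value per joint assignment of those parents. In the count-based encoding, one value is stored per lane, per wave, and per possible number of cars in that lane (0 to n), reflecting the anonymity and CSI structure described above. The exact figures depend on these assumptions; the exponential-versus-linear gap is what the sketch is meant to show.

```python
def maid_utility_entries(lanes: int, waves: int, per_wave: int) -> int:
    """Table entries if each driver's utility node lists every relevant decision as a parent."""
    total = 0
    for w in range(1, waves + 1):
        parents = w * per_wave          # drivers in waves 1..w, including herself
        total += per_wave * lanes ** parents
    return total


def count_based_entries(lanes: int, waves: int, per_wave: int) -> int:
    """Entries if utility depends only on (lane, wave, number of cars in that lane)."""
    n = waves * per_wave
    return lanes * waves * (n + 1)


print(maid_utility_entries(3, 4, 5))   # 17505963000
print(count_based_entries(3, 4, 5))    # 252
```

Almost all of the MAID total comes from the last wave, whose utility nodes have 20 parents each.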
In this chapter we present a new representational framework called Temporal Action-Graph Games (TAGGs) that allows us to capture this kind of structure. Like AGGs, TAGGs can represent anonymity and CSI, but unlike AGGs they can also represent games with dynamics, imperfect information and uncertainty. We first define the representation of TAGGs, and then show formally how they define a game using an induced Bayesian network (BN). We demonstrate that TAGGs can represent any MAID, but can also represent situations that are hard to capture naturally as MAIDs. If the TAGG representation of a game contains anonymity or CSI, the induced BN will have special structure that can be exploited by inference algorithms. We present an algorithm for computing expected utility in TAGGs that exploits this structure. Our algorithm first transforms the induced BN into another BN that represents the structure more explicitly, then computes expected utility using a specialized inference algorithm on the transformed BN. We show that it performs better than using a MAID in which the structure is not represented explicitly, and better than using a standard BN inference algorithm on the transformed BN.

5.2 Representation

5.2.1 Temporal Action-Graph Games

At a high level, Temporal Action-Graph Games (TAGGs) extend the AGG representation by introducing the concepts of time, uncertainty and imperfect information, while adapting the AGG concepts of action nodes and action-specific utility functions to the dynamic setting. We first give an informal description of these concepts.

Temporal structure. A TAGG describes a dynamic game played over a series of time steps $1, \dots, T$, on a set of action nodes $\mathcal{A}$. At each time step a version of a static AGG is played by a subset of agents on $\mathcal{A}$, and the action counts on the action nodes are accumulated.

Chance variables. TAGGs model uncertainty via chance variables.
Like random variables in a BN, a chance variable is associated with a set of parents and a conditional probability table (CPT). The parents may be action nodes or other chance variables. Each chance variable is associated with an instantiation time; once instantiated, its value stays the same for the rest of the game. Chance variables can be thought of as a generalization of the (deterministic) function nodes in AGG-FNs.

Decisions. At each time step one or more agents move simultaneously, represented by agent-specific decisions. TAGGs model imperfect information by allowing each agent to condition his decision on the observed values of a given subset of decisions, chance variables, and the previous time step's action counts.

Action nodes. Each decision is a choice of one action node from a number of available ones. As in AGGs, the same action may be available to more than one player. Action nodes provide a time-dependent tally: the action count for each action A at each time step τ is the number of times A has been chosen during the time period $1, \dots, \tau$.

Utility functions. There is a utility function $U^\tau_A$ associated with each action A at each time τ, which specifies the utility a player receives at time τ for having chosen action A. Each $U^\tau_A$ has a set of parents, which must be action nodes or chance variables. The utility of playing action A depends only on what happens over these parents. An agent who took action A (once) may receive utility at multiple times (e.g., a short-term cost and a long-term benefit); this is captured by associating a set of payoff times with each decision. An agent's overall utility is defined as the sum of the utilities received at all time steps.

Play of a TAGG can be summarized as follows:
1. At time 0, action counts are initialized to zero; chance variables with instantiation time 0 are instantiated.
2. At each time $\tau \in \{1, \dots, T\}$:
(a) all agents with decisions at τ observe the appropriate action counts, chance variables, and decisions, if any;
(b) all decisions at τ are made simultaneously;
(c) action counts at τ are tallied;
(d) chance variables at time τ are instantiated;
(e) for each action A, utility function $U^\tau_A$ is evaluated, and this amount of utility accrues to every agent who took action A at a decision whose payoff times include τ; the result is not revealed to any of the players.²
3. At the end of the game, each agent receives the sum of all utility allocations throughout the game.

² If an agent plays action A for two decisions that have the same payoff time τ, then the agent receives twice the value of $U^\tau_A$.

Intuitively, the process can be seen as a sequence of simultaneous-move AGGs played over time. At each time step τ, the players that have a decision at time τ participate in a simultaneous-move AGG on the set of action nodes, whose action counts are initialized to the counts at τ − 1. Each action A's utility function is $U^\tau_A$, and A's neighbors in the action graph correspond to the parents of $U^\tau_A$.

Decisions and chance variables in TAGGs are similar to decision nodes and chance nodes (respectively) in MAIDs, except that here their parents can be time-dependent action counts. Hence we need to specify the time steps at which decisions and chance variables in a TAGG are instantiated; once instantiated, their values stay fixed. The time-dependent nature of action counts in TAGGs also resembles the way dynamic Bayesian networks (DBNs) [Dean and Kanazawa, 1989, Murphy, 2002], a probabilistic graphical model of temporal domains, model their time-dependent random variables. Just as a DBN can be unrolled into a BN, we will see later that a TAGG can be unrolled into a MAID.
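The play procedure summarized earlier in this section can be simulated directly. The Python sketch below is a minimal illustration restricted to Example 5.1.1, under the following assumptions:

- there are no chance variables, so step (d) is omitted;
- every decision observes all action counts from the previous time step;
- each decision's payoff time equals its decision time;
- the utility is a hypothetical function equal to minus the lane's count.

Strategies are pure and are passed in as functions from the observed count tuple to a lane index. The names `play_tagg` and `least_crowded` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Strategy = Callable[[Tuple[int, ...]], int]


def play_tagg(waves: Sequence[Sequence[Tuple[int, Strategy]]], num_lanes: int,
              utility: Callable[[int], float]) -> Tuple[Dict[int, float], List[int]]:
    counts = [0] * num_lanes                            # step 1: counts start at zero
    payoff: Dict[int, float] = {}
    for wave in waves:                                  # step 2: tau = 1, ..., T
        observed = tuple(counts)                        # (a) counts at tau - 1
        choices = [(p, s(observed)) for p, s in wave]   # (b) simultaneous choices
        for _, lane in choices:                         # (c) tally counts at tau
            counts[lane] += 1
        for p, lane in choices:                         # (e) pay U_A^tau at payoff time tau
            payoff[p] = payoff.get(p, 0.0) + utility(counts[lane])
    return payoff, counts                               # step 3: summed utilities


def least_crowded(observed: Tuple[int, ...]) -> int:
    """Pick the lane with the fewest cars so far (lowest index on ties)."""
    return min(range(len(observed)), key=lambda a: observed[a])


# Two waves of two cars on three lanes; every driver uses least_crowded.
waves = [[(0, least_crowded), (1, least_crowded)],
         [(2, least_crowded), (3, least_crowded)]]
payoff, counts = play_tagg(waves, 3, lambda c: -c)
print(payoff, counts)  # {0: -2.0, 1: -2.0, 2: -2.0, 3: -2.0} [2, 2, 0]
```

Drivers in the same wave choose without seeing each other, so identical deterministic strategies pile into the same lane. Each driver's payoff therefore depends on how many others picked her lane, not on who they are.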
Before formally defining TAGGs, we need to define the concept of a configuration at time τ over a set of action nodes, decisions and chance variables. Intuitively, this is an instantiation at time τ of a corresponding set of variables.

Definition 5.2.1. Given a set of action nodes $\mathcal{A}$, a set of decisions $\mathcal{D}$, a set of chance variables $\mathcal{X}$, and a set $B \subseteq \mathcal{A} \cup \mathcal{X} \cup \mathcal{D}$, a configuration at time τ over B, denoted $C^\tau_B$, is a $|B|$-tuple of values, one for each node in B. For each node $b \in B$, the corresponding element of $C^\tau_B$, denoted $C^\tau(b)$, must satisfy the following:
• if $b \in \mathcal{A}$, $C^\tau(b)$ is an integer in $\{0, \dots, |\mathcal{D}|\}$ specifying the action count on b at τ, i.e., the number of times action b has been chosen during the time period $1, \dots, \tau$;
• if $b \in \mathcal{D}$, $C^\tau(b)$ is an action in $\mathcal{A}$, specifying the action chosen at b;
• if $b \in \mathcal{X}$, $C^\tau(b)$ is a value from the domain of the random variable, $\mathrm{Dom}[b]$.
Let $\mathcal{C}^\tau_B$ be the set of all configurations at τ over B.

We now offer formal definitions of chance variables, decisions, and utility functions.

Definition 5.2.2. A chance variable X is defined by:
1. a domain $\mathrm{Dom}[X]$, which is a nonempty finite set;
2. a set of parents $\mathrm{Pa}[X]$, which consists of chance variables and/or actions;
3. an instantiation time $t(X)$, which specifies the time at which the action counts in $\mathrm{Pa}[X]$ are instantiated;
4. a CPT $\Pr(X \mid \mathrm{Pa}[X])$, which specifies the conditional probability distribution of X given each configuration $C^{t(X)}_{\mathrm{Pa}[X]}$.
We require that each chance variable's instantiation time be no earlier than the instantiation times of its parent chance variables, i.e., if chance variable $X' \in \mathrm{Pa}[X]$, then $t(X') \le t(X)$.

Definition 5.2.3. A decision D is defined by:
1. the player making the decision, $pl(D)$. A player may make multiple decisions; the set of decisions belonging to a player ℓ is denoted by $\mathrm{Decs}[\ell]$.
2. its decision time $t(D) \in \{1, \dots, T\}$. Each player has at most one decision at each time step.
3. its action set $\mathrm{Dom}[D]$, a nonempty set of actions.
4. the set of payoff times $pt(D) \subseteq \{1, \dots, T\}$. We assume that $\tau \ge t(D)$ for all $\tau \in pt(D)$.
5. its observation set $O[D]$: a set of decisions, actions, and chance variables whose configuration at time $t(D) - 1$ (i.e., $C^{t(D)-1}_{O[D]}$) is observed by $pl(D)$ prior to making the decision. We require that if decision $D'$ is an observation of D, then $t(D') < t(D)$. Furthermore, if chance variable X is an observation of D, then $t(X) < t(D)$.

Definition 5.2.4. Each action A at each time τ is associated with one utility function $U^\tau_A$. Each $U^\tau_A$ is associated with a set of parents $\mathrm{Pa}[U^\tau_A]$, which is a set of actions and chance variables. We require that if chance variable $X \in \mathrm{Pa}[U^\tau_A]$, then $t(X) \le \tau$. Each utility function $U^\tau_A$ is a mapping from the set of configurations $\mathcal{C}^\tau_{\mathrm{Pa}[U^\tau_A]}$ to a real value.

We can now formally define TAGGs.

Definition 5.2.5. A Temporal Action-Graph Game (TAGG) is a tuple $(N, T, \mathcal{A}, \mathcal{X}, \mathcal{D}, \mathcal{U})$, where:
1. $N = \{1, \dots, n\}$ is a set of players.
2. T is the duration of the game.
3. $\mathcal{A}$ is a set of actions.
4. $\mathcal{X}$ is a set of chance variables. Let G be the induced directed graph over $\mathcal{X}$. We require that G be a directed acyclic graph (DAG).
5. $\mathcal{D}$ is the set of decisions. We require that each decision D's action set satisfy $\mathrm{Dom}[D] \subseteq \mathcal{A}$.
6. $\mathcal{U} = \{U^\tau_A : A \in \mathcal{A}, 1 \le \tau \le T\}$ is the set of utility functions.

First, let us see how to represent Example 5.1.1 as a TAGG. The set N corresponds to the cars, and the duration is T = 4. We have one action node for each lane. For each time τ, we have five decisions, each belonging to a car that arrives at time τ. The action set for each decision is the entire set $\mathcal{A}$. The payoff time for each decision is the time at which the decision is made, i.e., $pt(D) = \{t(D)\}$. Each decision has all actions as observations. For each A and τ, the utility $U^\tau_A$ has A as its only parent. The representation size of each utility function is at most n, so the size of the entire TAGG is $O(|\mathcal{A}| T n)$.
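The timing constraints of Definitions 5.2.2 to 5.2.5 translate into simple checks. The Python sketch below is a hypothetical encoding, not code from the thesis. It represents decisions and chance variables as frozen dataclasses that refer to each other by name, and assumes names are unique across actions, decisions and chance variables. It verifies the ordering requirements and builds the TAGG of Example 5.1.1 described above. The DAG requirement on chance variables is not checked, and utility functions are left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Decision:
    name: str
    player: int                    # pl(D)
    time: int                      # t(D)
    actions: FrozenSet[str]        # Dom[D]
    payoff_times: FrozenSet[int]   # pt(D)
    observes: FrozenSet[str]       # O[D]: names of actions, decisions or chance variables


@dataclass(frozen=True)
class ChanceVariable:
    name: str
    time: int                      # t(X)
    parents: FrozenSet[str]        # Pa[X]: names of actions or chance variables


def validate(T: int, actions: FrozenSet[str], decisions: Iterable[Decision],
             chance: Iterable[ChanceVariable] = ()) -> None:
    """Raise ValueError if a timing constraint of Definitions 5.2.2-5.2.5 fails.

    The DAG requirement on chance variables is not checked in this sketch.
    """
    ds, xs = list(decisions), list(chance)
    d_time = {d.name: d.time for d in ds}
    x_time = {x.name: x.time for x in xs}
    seen = set()
    for d in ds:
        if not 1 <= d.time <= T:
            raise ValueError(f"{d.name}: decision time outside 1..T")
        if not d.actions or not d.actions <= actions:
            raise ValueError(f"{d.name}: Dom[D] must be a nonempty subset of A")
        if any(not d.time <= tau <= T for tau in d.payoff_times):
            raise ValueError(f"{d.name}: payoff time outside t(D)..T")
        if (d.player, d.time) in seen:
            raise ValueError(f"player {d.player} has two decisions at time {d.time}")
        seen.add((d.player, d.time))
        for o in d.observes:
            t_obs: Optional[int] = d_time.get(o, x_time.get(o))
            if t_obs is None and o not in actions:
                raise ValueError(f"{d.name}: unknown observation {o}")
            if t_obs is not None and t_obs >= d.time:
                raise ValueError(f"{d.name}: observation {o} is not strictly earlier")
    for x in xs:
        for p in x.parents:
            if p in x_time and x_time[p] > x.time:
                raise ValueError(f"{x.name}: parent {p} is instantiated later")
            if p not in x_time and p not in actions:
                raise ValueError(f"{x.name}: parent {p} is not an action or chance variable")


def tollbooth(waves: int = 4, per_wave: int = 5,
              lanes: Tuple[str, ...] = ("L1", "L2", "L3")):
    """Hypothetical encoding of Example 5.1.1: one decision per car."""
    A = frozenset(lanes)
    decisions: List[Decision] = [
        Decision(name=f"D{p}", player=p, time=tau, actions=A,
                 payoff_times=frozenset({tau}), observes=A)
        for tau in range(1, waves + 1)
        for p in range((tau - 1) * per_wave, tau * per_wave)
    ]
    return waves, A, decisions


T, A, decisions = tollbooth()
validate(T, A, decisions)     # raises nothing for this encoding
print(len(decisions))         # 20
```

Observations of actions are not time-checked. Under Definition 5.2.3 an observed action is always read at $t(D) - 1$, so no ordering constraint applies to it.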
The TAGG representation is useful beyond compactly representing MAIDs. It can also be used to specify information structures that would be difficult to represent in a MAID. For example, we can represent games in which agents' abilities to observe the decisions made by previous agents depend on what actions these agents took.

Example 5.2.6. There are 2T ice cream vendors, each of which must choose a location along a beach. On every day from 1 to T, two of the vendors simultaneously set up their ice cream stands. Each vendor lives in one of the locations. When a vendor chooses an action, it knows the locations of vendors who set up stands on previous days, either in the location where it lives or in one of the neighboring locations. The payoff to a vendor on a given day depends on how many vendors set up stands in the same location or in a neighboring location.

Example 5.2.6 can be represented as a TAGG, the key elements of which are as follows. There is an action A for each location. Each player j has one decision $D_j$, whose observations include the actions for the location j lives in and the neighboring locations. The payoff time for each decision is T, and the utility function $U^T_A$ has A and its neighboring locations as parents.

Let us consider the size of a TAGG. It follows from Definition 5.2.5 that the space bottlenecks of the representation are the CPTs $\Pr(X \mid \mathrm{Pa}[X])$ and the utility functions $U^\tau_A$. These have polynomial sizes when the numbers of their parents are bounded by a constant.

Lemma 5.2.7. Given a TAGG $(N, T, \mathcal{A}, \mathcal{X}, \mathcal{D}, \mathcal{U})$, if $\max_{X \in \mathcal{X}} |\mathrm{Pa}[X]|$ and $\max_{U \in \mathcal{U}} |\mathrm{Pa}[U]|$ are bounded by a constant, then the size of the TAGG is bounded by a polynomial in $\max_{X \in \mathcal{X}} |\mathrm{Dom}[X]|$, $|\mathcal{X}|$, $|\mathcal{D}|$, $|\mathcal{U}|$, and T.

5.2.2 Strategies

In Section 2.2.4 we introduced the standard concepts of pure, mixed and behavior strategies in dynamic games. We now apply these concepts to TAGGs.
We start with pure strategies, in which the action at each decision D is chosen deterministically as a function of the observed information, i.e., the configuration $C^{t(D)-1}_{O[D]}$. A mixed strategy of a player i is a probability distribution over pure strategies of i. Recall that a dynamic game can have an exponential number of pure strategies, so a mixed strategy is generally an exponential-sized object. We thus restrict our attention to behavior strategies, in which the action choices at different decisions are randomized independently.

Definition 5.2.8. A behavior strategy at decision D is a function $\sigma^D : \mathcal{C}^{t(D)-1}_{O[D]} \to \varphi(\mathrm{Dom}[D])$, where $\varphi(\mathrm{Dom}[D])$ is the set of probability distributions over $\mathrm{Dom}[D]$. A behavior strategy for player i, denoted $\sigma_i$, is a tuple consisting of a behavior strategy for each of her decisions. A behavior strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ consists of a behavior strategy $\sigma_i$ for each player i.

An agent has perfect recall when she never forgets her action choices and observations at earlier decisions. The TAGG representation does not enforce perfect recall; TAGGs can represent perfect-recall games as well as non-perfect-recall games. Representing a perfect-recall game as a TAGG raises a technical issue. To preserve the perfect-recall property, each decision D of player i should observe all of i's earlier decisions and observations. However, recall that if an action A is in the observation set of one of i's earlier decisions at time $t' < t(D)$, then the action count at time $t' - 1$ was observed. Directly including A in $O[D]$ would instead imply that D observes the action count of A at time $t(D) - 1$. The information structure of the TAGG would then differ from that of the original game, so the TAGG would not be a faithful representation.
Instead, we model the situation by creating a deterministic chance variable $X^{t'-1}_A$ with instantiation time $t' - 1$, whose only parent is A and whose value is the action count of A at time $t' - 1$. We then include $X^{t'-1}_A$ in $O[D]$. It is straightforward to see that observing $X^{t'-1}_A$ is equivalent to observing the action count of A at time $t' - 1$. The resulting TAGG therefore correctly represents the perfect-recall game.

5.2.3 Expected Utility

We now use the language of Bayesian networks to formally define an agent's expected utility in a TAGG given a behavior strategy profile σ. Specifically, we define an induced BN that formally describes how the TAGG is played out. Given a behavior strategy profile, decisions, chance variables and utilities can naturally be understood as random variables. Action counts, on the other hand, are time-dependent, so we have a separate action count variable for each action at each time step.

Definition 5.2.9. Let $A \in \mathcal{A}$ be an action and $\tau \in \{1, \dots, T\}$ be a time point. $A^\tau$ denotes the action count variable representing the number of times A was chosen from time 1 to time τ. Let $A^0$ be the variable that is 0 with probability 1.

We would like to define the expected utility for each player as the sum of the expected utilities of the player's decisions. However, the utility functions in TAGGs are action-specific. To bridge the gap, we create new decision-payoff variables in the induced BN that represent the utilities of decisions received at each of their payoff time points.

Definition 5.2.10.
Given a TAGG and a behavior strategy profile σ, the induced BN is defined over the following variables:
• for each decision $D \in \mathcal{D}$, a behavior strategy variable, which by abuse of notation we also denote by D;
• for each chance variable $X \in \mathcal{X}$, a variable that we also denote by X;
• a variable $A^\tau$ for each action $A \in \mathcal{A}$ and time step $\tau \in \{1, \dots, T\}$;
• for each utility function $U^\tau_A$, with $A \in \mathcal{A}$ and $\tau \in \{1, \dots, T\}$, a utility variable also denoted by $U^\tau_A$;
• for each decision D and each time $\tau \in pt(D)$, a decision-payoff variable $u^\tau_D$.

We define the actual parents of each variable V, denoted $\mathrm{APa}[V]$, as follows:
• The actual parents of a behavior strategy variable D are the variables corresponding to $O[D]$, with each action $A_k \in O[D]$ replaced by $A_k^{t(D)-1}$.
• The actual parents of an action count variable $A^\tau$ are all behavior strategy variables D with decision time $t(D) \le \tau$ and $A \in \mathrm{Dom}[D]$.
• The actual parents of a chance variable X are the variables corresponding to $\mathrm{Pa}[X]$, with each action $A_k \in \mathrm{Pa}[X]$ replaced by $A_k^{t(X)}$.
• The actual parents of a utility variable $U^\tau_A$ are the variables corresponding to $\mathrm{Pa}[U^\tau_A]$, with each action $A_k \in \mathrm{Pa}[U^\tau_A]$ replaced by $A^\tau_k$.
• The actual parents of a decision-payoff variable $u^\tau_D$ are D and the utility variables $U^\tau_{A_1}, \dots, U^\tau_{A_\ell}$, where $\{A_1, \dots, A_\ell\} = \mathrm{Dom}[D]$.

The CPDs are defined as follows:
• The CPDs of chance variables are the CPDs of the corresponding chance variables in the TAGG.
• The CPD of each behavior strategy variable D is the behavior strategy $\sigma^D$.
• The CPD of each utility variable $U^\tau_A$ is the deterministic function given by the corresponding utility function $U^\tau_A$.
• The CPD of each action count variable $A^\tau$ is a deterministic function that counts the number of decisions in $\mathrm{APa}[A^\tau]$ that are assigned value A.
• The CPD of each decision-payoff variable $u^\tau_D$ is a multiplexer, i.e., a deterministic function that selects the value of its utility variable parent according to the choice of its decision parent. For example, if the value of D is $A_k$, then the value of $u^\tau_D$ is the value of $U^\tau_{A_k}$.

Theorem 5.2.11.
Given a TAGG, let F be the directed graph over the variables of the induced BN in which there is an edge from $V_1$ to $V_2$ iff $V_1$ is an actual parent of $V_2$. Then F is acyclic.

This follows from the definition of TAGGs and the way we set up the actual parents in Definition 5.2.10. By Theorem 5.2.11, the induced BN defines a joint probability distribution over its variables, which we denote by $P^\sigma$. Given σ, denote by $E^\sigma[V]$ the expected value of variable V in the induced BN. We are now ready to define the expected utility to players under behavior strategy profiles.

Definition 5.2.12. The expected utility to player ℓ under behavior strategy profile σ is $EU^\sigma(\ell) = \sum_{D \in \mathrm{Decs}[\ell]} \sum_{\tau \in pt(D)} E^\sigma[u^\tau_D]$.

Figure 5.1: Induced BN of the TAGG of Example 5.1.1, with 2 time steps, 3 lanes, and 3 players per time step. Squares represent behavior strategy variables, circles represent action count variables, diamonds represent utility variables and shaded diamonds represent decision-payoff variables. To avoid cluttering the graph, we only show utility variables at time step 2 and a decision-payoff variable for one of the decisions.

Figure 5.1 shows an induced BN of a TAGG based on Example 5.1.1 with six cars and three lanes. Although we use squares to represent behavior strategy variables, they are random variables and not actual decisions as in influence diagrams.

5.2.4 The Induced MAID of a TAGG

Given a TAGG we can construct a MAID that describes the same game. The construction is similar to that of the induced Bayesian network, with two differences. First, instead of behavior strategy variables with CPDs assigned by σ, the MAID has decision nodes. Second, each decision-payoff variable $u^\tau_D$ becomes a utility node for player $pl(D)$ in the MAID.
The resulting MAID describes the same game as the TAGG, because it offers agents the same strategies and their expected utilities are defined by the same BN. We call this the induced MAID of the TAGG.

5.2.5 Expressiveness

It is natural to ask about the expressiveness of TAGGs: what games can we represent? It turns out that TAGGs are able to compactly represent all MAIDs.

Lemma 5.2.13. Any MAID can be represented as a TAGG with the same space complexity.

Proof. Recall that a MAID consists of a set of decisions, a set of chance nodes and a set of utility nodes. Given a MAID, we construct a TAGG in the following way:
• For each decision $D'$ of the MAID and each value $d' \in Dom[D']$, create a unique action $A_{d'}$ in the TAGG.
• Decisions and chance nodes of the MAID can be directly copied over to the TAGG.
• Utility nodes in MAIDs are player-specific: each utility node is associated with some player. Utility nodes in TAGGs are action-specific. We can encode MAID utility nodes as TAGG utility nodes as follows. Given a MAID utility node $U'$ associated with player $j$, create a dummy decision $D_{U'}$ belonging to player $j$, whose action set contains exactly one action $A_{U'}$. We then encode the utility function for $U'$ in the MAID as the utility associated with action $A_{U'}$ in the TAGG.
• One difference between MAIDs and TAGGs is that in MAIDs decisions can be parents of chance and utility nodes, whereas in TAGGs only chance variables and actions can be parents of chance and utility nodes. Nevertheless, MAID chance nodes and utility nodes can be encoded in TAGGs by replacing each decision parent $D'$ with the corresponding set of actions in $Dom[D']$.
• Decisions and chance nodes of MAIDs are not associated with time points. Nevertheless, since the MAID is a directed acyclic graph, we can assign decision times to decisions and instantiation times to chance variables that are consistent with the topological order of the MAID.
The payoff times of each decision are assigned to be the singleton $\{T\}$, i.e., the end of the game.

As a result, TAGGs can represent any extensive-form game representable as a MAID. These include all perfect-recall games, and the subclass of imperfect-recall games in which no information set involves multiple time steps.

Now consider the converse problem of reducing TAGGs to MAIDs. Since the induced MAID of a TAGG is payoff equivalent to the TAGG, it trivially follows that any TAGG can be represented by a MAID. However, the induced MAID has large in-degree, and can thus be exponentially larger than the TAGG. For example, in the games of Examples 5.1.1 and 5.2.6, the maximum in-degree of the induced MAID equals the number of decisions. The size of the MAID therefore grows exponentially with the number of decisions, whereas the size of the TAGG for the same game grows linearly. This is not surprising, since TAGGs can exploit more kinds of structure in the game (CSI, anonymity) than a straightforward MAID representation. In Section 5.3.1 we show that the induced MAID can be transformed into a MAID that explicitly represents the underlying structure. The size of the transformed MAID is polynomial in the size of the TAGG.

The TAGG representation is also a true generalization of AGGs, since any AGG-∅ can be straightforwardly represented as a TAGG with $T = 1$. Function nodes in AGG-FNs and AGG-FNAs can be modeled as chance nodes with deterministic CPTs. Thus AGG-FNs and AGG-FNAs can also be represented as TAGGs with $T = 1$.

5.3 Computing Expected Utility

In this section, we consider the task of computing the expected utility $EU^\sigma[j]$ to a player $j$ given a mixed strategy profile $\sigma$.
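The last step of the construction in the proof of Lemma 5.2.13 only needs some assignment of times that is consistent with the MAID's topological order. Here is a minimal sketch. It assumes the MAID's decision and chance nodes are given as a dictionary from each node to its parents, which is a hypothetical input format. It gives each node its own time step following a topological order computed by Python's standard `graphlib` module. This is one valid choice among many, and it may use more time steps than necessary.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Iterable, Tuple


def assign_tagg_times(maid_parents: Dict[str, Iterable[str]]) -> Tuple[Dict[str, int], int]:
    """Map each MAID decision/chance node to a time step that follows a topological
    order (here: one node per step), so every parent precedes its children.
    Returns (times, T); every decision's payoff times are then the singleton {T}.
    graphlib raises CycleError if the input is not a DAG."""
    order = list(TopologicalSorter(maid_parents).static_order())
    times = {node: step for step, node in enumerate(order, start=1)}
    return times, len(order)
```

For a hypothetical chain "C → D1 → D2", this yields times 1, 2, 3 and $T = 3$.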
As mentioned in Section 2.2.4, computing EU is an essential step in many game-theoretic computations for dynamic games. Examples include finding a best response to the other players' strategy profile, checking whether a strategy profile is a Nash equilibrium, and running heuristic algorithms such as fictitious play and iterated best response. In Section 5.4 we discuss extending the methods of this section to a subtask in the Govindan-Wilson algorithm for computing Nash equilibria.

One benefit of formally defining EU in terms of BNs is that the problem of computing EU can now be naturally cast as a BN inference problem. (In Chapter 3 we discussed such a reduction in the context of AGGs.) By Definition 5.2.12, $EU^\sigma[j]$ is the sum of a polynomial number of terms of the form $E^\sigma[u_D^\tau]$. We thus focus on computing one such $E^\sigma[u_D^\tau]$. This can be done by applying a standard BN inference algorithm to the induced BN. In fact, BN inference is the standard approach for computing expected utility in MAIDs [Koller and Milch, 2003]. Thus the above approach for TAGGs is computationally equivalent to the standard approach for a natural MAID representation of the same game. In this section, we show that the induced BNs of TAGGs have special structure that can be exploited to speed up computation, and present an algorithm that exploits this structure.

5.3.1 Exploiting Causal Independence

The standard BN inference approach for computing EU does not take advantage of some kinds of TAGG structure. In particular, recall that in the induced network, the parents of each action count variable $A^\tau$ are all decisions at or before time $\tau$ that have $A$ in their action sets. This implies large in-degrees for action count variables. In the clique-tree algorithm, for example, large in-degrees mean large cliques, which is problematic because running time scales exponentially in the largest clique size of the clique tree. However, the CPDs of these action count variables are structured counting functions.
Such structure is an instance of causal independence in BNs [Heckerman and Breese, 1996]. It also corresponds to anonymity structure in static game representations such as symmetric games and AGGs. We can exploit this structure to speed up computation of expected utility in TAGGs. Our approach is a specialization of Heckerman and Breese's [1996] method for exploiting causal independence in BNs. At a high level, their method transforms the original BN by creating new nodes that represent intermediate results and rewiring some of the arcs. The result is an equivalent BN with small in-degree, on which conventional inference algorithms are then run.

For example, given an action count variable $A_k^\tau$ with parents $\{D_1, \dots, D_\ell\}$, create a node $M_i$ for each $i \in \{1, \dots, \ell - 1\}$, representing the count induced by $D_1, \dots, D_i$. Then, instead of having $D_1, \dots, D_\ell$ as parents of $A_k^\tau$, its parents become $D_\ell$ and $M_{\ell-1}$. The parents of each $M_i$ are $D_i$ and $M_{i-1}$, except that $M_1$ has only $D_1$ as its parent. The resulting graph has in-degree at most 2 for $A_k^\tau$ and the $M_i$'s.

In our induced BN, the action count variables $A_k^t$ at earlier time steps $t < \tau$ already represent some of these intermediate counts, so we do not need to duplicate them. Formally, we modify the original BN as follows. For each action count variable $A_k^\tau$, first remove the edges from its current parents. Instead, $A_k^\tau$ now has two parents: the action count variable $A_k^{\tau-1}$ and a new node $M_{A_k}^\tau$ representing the contribution of decisions at time $\tau$ to the count of $A_k$. If more than one decision at time $\tau$ has $A_k$ in its action set, we create intermediate variables as in Heckerman and Breese's method. We call the resulting BN the transformed BN of the TAGG. Figure 5.2 shows the transformed BN of the tollbooth game whose induced BN was given in Figure 5.1.

Figure 5.2: The transformed BN of the tollbooth game from Figure 5.1 with 3 lanes and 3 cars per time step.
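The chain of intermediate nodes can be illustrated numerically. The sketch below makes two assumptions: the decisions $D_1, \dots, D_\ell$ are independent, as they are at a single time step once the earlier interface is fixed (Section 5.3.2), and only whether each decision picks $A_k$ matters. Under these assumptions it propagates the distribution of $M_i$ one decision at a time. It then checks the result against the full $2^\ell$ enumeration that the original high in-degree CPD would require. The function names are illustrative.

```python
from itertools import product
from typing import List, Sequence


def count_distribution_chain(probs: Sequence[float]) -> List[float]:
    """P(A_k = c) for c = 0..l, where probs[i] is the probability that decision i+1
    chooses A_k (decisions assumed independent). After processing D_1..D_i,
    `dist` is the distribution of the intermediate node M_i."""
    dist = [1.0]  # before any decision the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            nxt[c] += mass * (1.0 - p)  # D_i does not choose A_k: count unchanged
            nxt[c + 1] += mass * p      # D_i chooses A_k: count increases by one
        dist = nxt
    return dist


def count_distribution_brute_force(probs: Sequence[float]) -> List[float]:
    """The same distribution, obtained by enumerating all 2^l joint choices."""
    dist = [0.0] * (len(probs) + 1)
    for chosen in product((0, 1), repeat=len(probs)):
        mass = 1.0
        for bit, p in zip(chosen, probs):
            mass *= p if bit else 1.0 - p
        dist[sum(chosen)] += mass
    return dist
```

The chain needs $O(\ell^2)$ arithmetic operations, while the enumeration needs $O(2^\ell \ell)$. This gap mirrors the gap between the transformed and the induced BN.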
We can then use standard algorithms to compute the probabilities $P(u_D^{t'})$ on the transformed BN. For classes of BNs with bounded treewidth, these probabilities (and thus $E[u_D^{t'}]$) can be computed in polynomial time.

5.3.2 Exploiting Temporal Structure

In practice, standard inference approaches use heuristics to find an elimination ordering, which might not be optimal for our BNs. We present an algorithm based on the idea of eliminating variables in temporal order. For the rest of the section, we fix $D$ and a time $t' \in pt(D)$ and consider the computation of $E^\sigma[u_D^{t'}]$.

We first group the variables of the induced network by time step. Variables at time $\tau$ include:
- the decisions at $\tau$;
- the action count variables $A^\tau$;
- the chance variables $X$ with instantiation time $\tau$;
- the intermediate nodes between decisions and action counts at $\tau$;
- the utility variables $U_A^\tau$.

As we are only concerned with $E^\sigma[u_D^{t'}]$ for a given $t' \in pt(D)$, we can safely discard the variables after time $t'$, as well as utility variables before $t'$. It is straightforward to verify that the actual parents of variables at time $\tau$ are either at $\tau$ or before $\tau$. We say a network satisfies the Markov property if the actual parents of variables at time $\tau$ are either at $\tau$ or at $\tau - 1$. Parts of the induced BN (e.g., the action count variables) already satisfy the Markov property, but in general the network does not. Exceptions include chance variable parents and decision parents from more than one time step ago.

Given an induced BN, we can transform it into an equivalent network satisfying the Markov property. Suppose a variable $V_1$ at time $t_1$ is a parent of a variable $V_2$ at time $t_2$, with $t_2 - t_1 > 1$. Then for each $t_1 < \tau < t_2$ we create a dummy variable $V_1^\tau$ belonging to time $\tau$. Its only parent is the previous copy, or $V_1$ itself for $\tau = t_1 + 1$, so that the value of $V_1$ is copied forward to $V_1^{t_2-1}$. We then delete the edge from $V_1$ to $V_2$ and add an edge from $V_1^{t_2-1}$ to $V_2$.
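Here is a minimal sketch of the Markov transformation. It assumes the network is given as a map from each variable to its time step and a map from each variable to its actual parents, which is a hypothetical format. The `name@tau` labels for the dummy copies are illustrative. Copies of the same variable are shared among all of its late children, so each variable gets at most one copy per intermediate time step.

```python
from typing import Dict, List, Tuple


def markovize(time: Dict[str, int],
              parents: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Return an equivalent (time, parents) pair in which every edge spans at most one
    time step. An edge V1 -> V2 with t2 - t1 > 1 is replaced by a chain of dummy
    copies V1 -> V1@(t1+1) -> ... -> V1@(t2-1) -> V2; each copy just copies V1."""
    new_time = dict(time)
    new_parents: Dict[str, List[str]] = {v: [] for v in time}
    for child, ps in parents.items():
        for parent in ps:
            prev = parent
            for tau in range(time[parent] + 1, time[child]):
                copy = f"{parent}@{tau}"  # hypothetical name for the dummy V1^tau
                if copy not in new_time:  # reuse the chain shared by other children
                    new_time[copy] = tau
                    new_parents[copy] = [prev]
                prev = copy
            new_parents[child].append(prev)
    return new_time, new_parents
```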
The Markov property is computationally desirable because the variables in time $\tau$ d-separate past variables from future variables. A straightforward way to exploit the Markov property is the following: as $\tau$ goes from 1 to $t'$, compute the joint distribution over the variables at $\tau$ using the joint distribution over the variables at $\tau - 1$. In fact, we can do better by adapting the interface algorithm [Darwiche, 2001] for dynamic Bayesian networks to our setting.³

³ Whereas in DBNs the set of variables for each time step remains the same, in our setting this is no longer the case. It turns out that the interface algorithm can be adapted to work on our transformed BNs. Also, the transformed BNs of TAGGs have more structure than DBNs, particularly within the same time step, which we exploit for further computational speedup.

Define the interface $\mathbf{I}^\tau$ to be the set of variables in time $\tau$ that have children in time $\tau + 1$. $\mathbf{I}^\tau$ d-separates the past from the future. Here the past consists of all variables before $\tau$ together with the non-interface variables in $\tau$, and the future consists of all variables after $\tau$. In an induced BN, $\mathbf{I}^\tau$ consists of:
- the action count variables at time $\tau$;
- the chance variables $X$ at time $\tau$ that have children in the future;
- the decisions at $\tau$ that are observed by future decisions;
- the decision $D$, which is a parent of $u_D^{t'}$;
- the dummy variables created by the transformation.

We define the set of effective variables at time $\tau$, denoted by $\mathbf{V}^\tau$, as the subset of $\mathbf{I}^\tau$ consisting of ancestors of $u_D^{t'}$. For time $t'$, we let $\mathbf{V}^{t'} = \{u_D^{t'}\}$. Intuitively, at each time step $\tau$ we only need to keep track of the distribution $P(\mathbf{V}^\tau)$, which acts as a sufficient statistic as we go forward in time. For each $\tau$, we calculate $P(\mathbf{V}^\tau)$ by conditioning on instantiations of $\mathbf{V}^{\tau-1}$. The interface algorithm for TAGGs can be summarized as follows:
1. Compute the distribution $P(\mathbf{V}^0)$.
2. For $\tau = 1$ to $t'$:
   (a) for each instantiation $\mathbf{v}_j^{\tau-1}$ of $\mathbf{V}^{\tau-1}$, compute the distribution over $\mathbf{V}^\tau$: $P(\mathbf{V}^\tau \mid \mathbf{V}^{\tau-1} = \mathbf{v}_j^{\tau-1})$;
   (b) $P(\mathbf{V}^\tau) = \sum_{\mathbf{v}} P(\mathbf{V}^\tau \mid \mathbf{V}^{\tau-1} = \mathbf{v})\, P(\mathbf{V}^{\tau-1} = \mathbf{v})$.
3. Since $\mathbf{V}^{t'} = \{u_D^{t'}\}$, we now have $P(u_D^{t'})$.
4. Return the expected value $E[u_D^{t'}]$.

We can further improve on this, in particular on the subtask of computing $P(\mathbf{V}^\tau \mid \mathbf{V}^{\tau-1})$. We observe that there is also a temporal order among the variables within each time $\tau$: first the decisions and intermediate variables, then the action count variables, and finally the chance variables. Partition $\mathbf{V}^\tau$ into four subsets: action count variables $\mathbf{A}^\tau$, chance variables $\mathbf{X}^\tau$, behavior strategy variables $\mathbf{D}^\tau$ and dummy copy variables $\mathbf{C}^\tau$. Then $P(\mathbf{V}^\tau \mid \mathbf{V}^{\tau-1})$ factors as $P(\mathbf{C}^\tau \mid \mathbf{V}^{\tau-1})\, P(\mathbf{D}^\tau, \mathbf{A}^\tau \mid \mathbf{V}^{\tau-1})\, P(\mathbf{X}^\tau \mid \mathbf{A}^\tau, \mathbf{V}^{\tau-1})$. This allows us to first focus on decisions and action count variables to compute $P(\mathbf{D}^\tau, \mathbf{A}^\tau \mid \mathbf{V}^{\tau-1})$, and then carry out inference on the chance variables.

Calculating $P(\mathbf{D}^\tau, \mathbf{A}^\tau \mid \mathbf{V}^{\tau-1})$ involves eliminating all behavior strategy variables not in $\mathbf{D}^\tau$, as well as the intermediate variables. Note that, conditioned on $\mathbf{V}^{\tau-1}$, all decisions at time $\tau$ are independent. This allows us to efficiently eliminate variables along the chains of intermediate variables. Let the decisions at time $\tau$ be $\{D_1^\tau, \dots, D_\ell^\tau\}$. Let $\mathbf{M}^\tau$ be the set of intermediate variables corresponding to action count variables in $\mathbf{A}^\tau$, and let $\mathbf{M}_k^\tau$ be the subset of $\mathbf{M}^\tau$ that summarizes the contribution of $D_1^\tau, \dots, D_k^\tau$. We eliminate variables in the order $D_1^\tau, D_2^\tau, \mathbf{M}_2^\tau, D_3^\tau, \mathbf{M}_3^\tau, \dots, \mathbf{M}_\ell^\tau$, skipping the decisions in $\mathbf{D}^\tau$. The tables in the variable elimination algorithm need to keep track of at most $|\mathbf{D}^\tau| + |\mathbf{A}^\tau|$ variables. Thus the complexity of computing $P(\mathbf{D}^\tau, \mathbf{A}^\tau \mid \mathbf{V}^{\tau-1})$ for an instantiation of $\mathbf{V}^{\tau-1}$ is exponential only in $|\mathbf{D}^\tau| + |\mathbf{A}^\tau|$.

Computing $P(\mathbf{X}^\tau \mid \mathbf{A}^\tau, \mathbf{V}^{\tau-1})$ for each instantiation of $\mathbf{A}^\tau, \mathbf{V}^{\tau-1}$ involves eliminating the chance variables not in $\mathbf{X}^\tau$. Any standard inference algorithm can be applied here.
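Steps 1 to 4 of the interface algorithm can be sketched as a forward pass over explicit probability tables. This is a sketch only. It assumes each conditional $P(\mathbf{V}^\tau \mid \mathbf{V}^{\tau-1} = \mathbf{v})$ is supplied by a caller-provided function; in the algorithm above these come from the factored computation. It also assumes the final step maps into the numeric payoff variable $u_D^{t'}$. The toy transition used in the checks is hypothetical: at each step one car joins a lane with probability 0.5, and the payoff is minus the final count.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Dist = Dict[Hashable, float]


def interface_forward(p_v0: Dist, transitions: List[Callable[[Hashable], Dist]]) -> Dist:
    """Steps 1-3: starting from P(V^0), compute
    P(V^tau) = sum_v P(V^tau | V^{tau-1} = v) P(V^{tau-1} = v) for tau = 1..t'.
    transitions[tau - 1](v) returns the table P(V^tau | V^{tau-1} = v)."""
    dist = dict(p_v0)
    for step in transitions:
        nxt: Dist = defaultdict(float)
        for v, p_v in dist.items():  # condition on each instantiation of V^{tau-1}
            for w, p_w in step(v).items():
                nxt[w] += p_w * p_v
        dist = dict(nxt)
    return dist


def expected_value(dist: Dist) -> float:
    """Step 4: E[u_D^{t'}] from the distribution over the numeric payoff variable."""
    return sum(value * p for value, p in dist.items())
```

With three "join" steps followed by the payoff step, the count is Binomial(3, 0.5), so the expected payoff is $-1.5$.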
The complexity is exponential in the treewidth of the induced BN restricted to all chance variables at time $\tau$, which we denote by $G^\tau$. Putting everything together, the bottleneck of our algorithm is constructing the tables for the joint distributions on $\mathbf{V}^\tau$, together with inference on $G^\tau$.

Theorem 5.3.1. Given a TAGG and behavior strategy profile $\sigma$, suppose that for all $\tau$ both $|\mathbf{V}^\tau|$ and the treewidth of $G^\tau$ are bounded by a constant. Then for any player $j$ the expected utility $EU^\sigma[j]$ can be computed in time polynomial in the size of the TAGG representation and the size of $\sigma$.

Our algorithm is especially effective for induced networks that are close to having the Markov property, in which case we only add a small number of dummy copy variables to $\mathbf{V}^\tau$. If only a constant number of dummy copy variables are added, the time complexity of computing expected utility grows linearly in the duration of the game. On the other hand, for induced networks far from having the Markov property, $|\mathbf{V}^\tau|$ can grow linearly as $\tau$ increases, implying that the time complexity is exponential.

5.3.3 Exploiting Context-Specific Independence

TAGGs have action-specific utility functions, which allows them to express context-specific payoff independence: which utility function is used depends on which action is chosen at the decision. This translates into context-specific independence structure in the induced BN, specifically in the CPD of $u_D^\tau$. Conditioned on the value of $D$, $u_D^\tau$ depends on only one of its utility variable parents. There are several ways of exploiting such structure computationally. One is conditioning on the value of the decision $D$ [Boutilier et al., 1996]. Another is exploiting the context-specific independence in a variable elimination algorithm [Poole and Zhang, 2003]. One particularly simple approach that works for multiplexer utility nodes is to decompose the utility into a sum of utilities [Pfeffer, 2000].
For each utility node parent $U_k^t$ of $u_D^t$, there is a utility function $u_{D,k}^t$ that depends on $U_k^t$ and $D$. If $D = k$, $u_{D,k}^t$ is equal to $U_k^t$; otherwise, $u_{D,k}^t$ is 0. It is easy to see that $u_D^t(U_1^t, \dots, U_m^t, D) = \sum_{k=1}^{m} u_{D,k}^t(U_k^t, D)$. We can then modify our algorithm to compute each $E[u_{D,k}^t]$ instead of $E[u_D^t]$. This results in a reduction in the set of effective variables $\mathbf{V}_k^\tau$, which are now the variables at $\tau$ that are ancestors of $u_{D,k}^t$. Furthermore, whenever $\mathbf{V}_k^\tau = \mathbf{V}_{k'}^\tau$ for some $k, k'$, the distributions over them are identical and thus can be reused.

For static games represented as TAGGs with $T = 1$, our algorithm is equivalent to the polynomial-time expected utility algorithm for AGGs described in Chapter 3. We applied our algorithm to the tollbooth games of Example 5.1.1 and the ice cream games of Example 5.2.6. In both cases, $\mathbf{V}^\tau$ consists of a subset of the action count variables at $\tau$ plus the decision whose utility we are computing. Therefore the expected utilities of these games can be computed in polynomial time if $|\mathcal{A}|$ is bounded by a constant.

5.4 Computing Nash Equilibria

Since the induced MAID of a TAGG is payoff equivalent to the TAGG, algorithms for computing the Nash equilibria of MAIDs [Blum et al., 2006, Koller and Milch, 2003, Milch and Koller, 2008] can be applied directly to the induced MAID to find Nash equilibria of the TAGG. However, this approach does not exploit all TAGG structure. We can do better by constructing a transformed MAID, in a manner similar to the transformed BN, exploiting causal independence and CSI as in Sections 5.3.1 and 5.3.3. We can do better still and exploit the temporal structure as described in Section 5.3.2, if we use a solution algorithm that requires computation of probabilities and expected utilities. Govindan and Wilson [2002] presented an algorithm for computing equilibria in perfect-recall extensive-form games. Blum, Shelton and Koller [2006] adapted this algorithm to MAIDs.
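The decomposition $u_D^t = \sum_k u_{D,k}^t$ holds pointwise. It therefore also holds in expectation under any joint distribution, even when $D$ and the utility variables are correlated. The sketch below checks this on a hypothetical joint distribution over $D \in \{0, 1\}$ and two utility variables. For brevity, actions are assumed to be indexed $0, \dots, m-1$.

```python
from typing import Callable, Dict, Sequence, Tuple

Outcome = Tuple[int, Tuple[float, ...]]  # (index k of D's value, (U_1^t, ..., U_m^t))


def multiplexer(d: int, utilities: Sequence[float]) -> float:
    """u_D^t(U_1^t, ..., U_m^t, D): the utility parent selected by D."""
    return utilities[d]


def component(k: int, d: int, utilities: Sequence[float]) -> float:
    """u_{D,k}^t(U_k^t, D): equals U_k^t if D = k, and 0 otherwise."""
    return utilities[k] if d == k else 0.0


def expectation(joint: Dict[Outcome, float],
                f: Callable[[int, Tuple[float, ...]], float]) -> float:
    """E[f(D, U)] under an explicit joint distribution over (D, U_1^t, ..., U_m^t)."""
    return sum(p * f(d, us) for (d, us), p in joint.items())
```

Each $E[u_{D,k}^t]$ only involves the ancestors of $U_k^t$ and $D$. This is why computing the components separately can shrink the effective variable sets.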
A key step in the algorithm is the following: for each pair of players $i$ and $j$ and one of $i$'s utility nodes, compute the marginal distribution over $i$'s decisions and their parents, $j$'s decisions and their parents, and the utility node. Our algorithm in Section 5.3.2 can be straightforwardly adapted to compute this distribution. This approach is efficient if each player has only a small number of decisions, as in the games of Examples 5.1.1 and 5.2.6.

However, we did not implement these algorithms for TAGGs, because no publicly available implementations of these algorithms exist. In particular, GameTracer [Blum et al., 2002] provided an implementation of Govindan and Wilson's [2003] global Newton method for normal-form games, but not of Govindan and Wilson's [2002] algorithm for extensive-form games.

5.5 Experiments

We have implemented our algorithm for computing expected utility in TAGGs, and run experiments on its efficiency and scalability. We compared three approaches for computing expected utility given a TAGG:
- Approach 1: applying the standard clique tree algorithm (as implemented by the Bayes Net Toolbox [Murphy, 2007]) to the induced BN;
- Approach 2: applying the same clique tree algorithm to the transformed BN;
- Approach 3: our proposed algorithm from Section 5.3.

All approaches were implemented in MATLAB.

Figure 5.3: Running times for expected utility computation, shown as CPU time (seconds, log scale). The left and right panels plot against T, the duration of the TAGG; the middle panel plots against the number of cars per time step. Triangle data points represent Approach 1 (induced BN), diamonds represent Approach 2 (transformed BN), squares represent Approach 3 (proposed algorithm).
All our experiments were performed on a computer cluster consisting of machines with dual Intel Xeon 3.2GHz CPUs, 2MB cache and 2GB RAM.

We ran experiments on tollbooth game instances of varying sizes. For each game instance we measured the CPU time for computing the expected utility of 100 random behavior strategy profiles. Figure 5.3 (left) shows the results in log scale for tollbooth games with 3 lanes and 5 cars per time step, with the duration varying from 1 to 15. Approach 1 ran out of memory for games with more than 1 time step. Approach 2 was more scalable, but ran out of memory for games with more than 5 time steps. Approach 3 was the most scalable. On smaller instances it was faster than the other two approaches by an order of magnitude, and it did not run out of memory as we increased the size of the TAGGs to at least 20 time steps. For the tollbooth game with 14 time steps it took 1279 seconds, which is approximately the time Approach 2 took for the game instance with 5 time steps.

Figure 5.3 (middle) shows the results in log scale for tollbooth games with 3 time steps and 3 lanes, varying the number of cars per time step from 1 to 20. Approach 1 ran out of memory for games with more than 3 cars per time step. Approach 2 ran out of memory for games with more than 6 cars per time step. Again, Approach 3 was the most scalable.

We also ran experiments on the ice cream games of Example 5.2.6. Figure 5.3 (right) shows the results in log scale for ice cream games with 4 locations, two vendors per time step, and durations varying from 1 to 15. The home locations for each vendor were generated randomly. Approaches 1 and 2 ran out of memory for games with more than 3 and 4 time steps, respectively. Approach 3 finished games with 15 time steps in about the same time that Approach 2 took for games with 4 time steps.

5.6 Conclusions

TAGGs are a novel graphical representation of imperfect-information extensive-form games.
They extend simultaneous-move AGGs to the dynamic setting. A TAGG can be thought of as a sequence of AGGs played over $T$ time steps, with action counts accumulating as time progresses. This process can be formally described by the induced BN. For situations with anonymity or CSI structure, the TAGG representation can be exponentially more compact than a direct MAID representation. We presented an algorithm for computing expected utility in TAGGs that exploits their anonymity, CSI, and temporal structure. We showed both theoretically and empirically that our approach is significantly more efficient than the standard approach on a direct MAID representation of the same game.

Another interesting solution concept is extensive-form correlated equilibrium (EFCE) [von Stengel and Forges, 2008]. EFCE was defined for perfect-recall extensive-form games, but the concept can be applied to other representations of perfect-recall dynamic games. One interesting direction is to adapt Huang and von Stengel's [2008] polynomial-time algorithm for computing a sample EFCE to compact representations like MAIDs and TAGGs.

As mentioned in Section 2.2.4, dynamic games with perfect recall have nice properties, including the existence of Nash equilibria in behavior strategies. Furthermore, most existing algorithmic approaches for dynamic games assume perfect recall. However, strategies in perfect-recall games can be computationally expensive to represent and reason about. For example, in a perfect-recall TAGG, each decision of a player has to condition on all of that player's previous decisions and observations. The representation size of a behavior strategy therefore grows exponentially in the number of previous decisions of that player. Representations like MAIDs and TAGGs can compactly express the utility functions, but this exponential blow-up of the strategy space is an inherent property of perfect recall. The blow-up already arises in two-player zero-sum games such as poker.
Perfect recall is thus also problematic as a realistic model of rationality, since real-life agents do not have unlimited memory. In light of this, an interesting direction is to explore imperfect-recall models, together with solution concepts and algorithms for such models. In single-agent settings, there has been research on relaxing perfect recall using limited memory influence diagrams (LIMIDs) [Nilsson and Lauritzen, 2000]. However, for multi-agent imperfect-recall games, the existence of Nash equilibria in behavior strategies is not guaranteed.

There has been some research on classes of imperfect-recall games in which such equilibria do exist. One approach starts from certain classes of perfect-recall games and "forgets" certain "payoff-irrelevant" information. It then shows that the resulting imperfect-recall game has a Nash equilibrium in behavior strategies that is also a Nash equilibrium of the original perfect-recall game. Such equilibria are called Markov Perfect Equilibria (MPE) [e.g., Fudenberg and Tirole, 1991]. Milch and Koller [2008] took such an approach for MAIDs, in which case forgetting information corresponds to deleting certain edges into decision nodes. However, even if Nash equilibria in behavior strategies exist in the resulting imperfect-recall game, there is currently no general-purpose algorithm for finding such equilibria.

For the zero-sum game of poker, Waugh et al. [2009] formulated imperfect-recall models in which players forget certain information. The reduction in strategy space allowed them to solve larger instances (corresponding to finer abstractions of the game of poker) than previously possible. They solved the resulting imperfect-recall game using counterfactual regret minimization. In this imperfect-recall setting, that algorithm is a heuristic without theoretical guarantees, but it empirically appeared to converge to approximate equilibria.
Unlike in the MPE case, this transformation is not lossless: a Nash equilibrium of the imperfect-recall game is not necessarily a Nash equilibrium of the original game. Nevertheless, they showed empirically that agents using the resulting strategies performed well. There has also been research on solution concepts weaker than MPE that allow players to ignore more information, such as Mean Field Equilibrium [e.g., Adlakha et al., 2010, Iyer et al., 2011].

Another approach is to consider restricted settings that admit stronger theoretical and practical properties. For instance, in Chapter 6 we consider Bayesian games. Recall from Section 2.1.3 that these can be formulated as dynamic games, but their specific structure makes them computationally friendlier than arbitrary dynamic games. In particular, these games do not suffer from the exponential blow-up of the strategy space. We are able to leverage techniques from simultaneous-move games for representing and computing with Bayesian games.

Chapter 6
Bayesian Action-Graph Games

6.1 Introduction

In this chapter we¹ consider static games of incomplete information (or Bayesian games) [Harsanyi, 1967]. In such games, as recalled from Section 2.1.3, players are uncertain about the underlying game. Bayesian games have found many applications in economics, most notably in auction theory and mechanism design. Our interest is in computing with Bayesian games, and particularly in identifying sample Bayes-Nash equilibria. We surveyed the relevant literature in Chapter 2, specifically Sections 2.1.3 and 2.2.3. To summarize, there are two key obstacles to performing such computations efficiently. The first is representational: recall that the straightforward tabular representation of Bayesian game utility functions (the Bayesian normal form) requires space exponential in the number of players. The second obstacle is the lack of existing algorithms for identifying sample Bayes-Nash equilibria of arbitrary Bayesian games.
Recall that a Bayesian game can be interpreted as an equivalent complete-information game via the "induced normal form" or "agent form" interpretations. Thus one approach is to interpret a Bayesian game as a complete-information game, enabling the use of existing Nash-equilibrium-finding algorithms. However, generating the normal-form representations under either of these complete-information interpretations causes an exponential blowup in representation size, even when the Bayesian game has only two players.

¹ This chapter is based on joint work with Kevin Leyton-Brown [2010].

In this chapter we propose Bayesian Action-Graph Games (BAGGs), a compact representation for Bayesian games. BAGGs can represent arbitrary Bayesian games, and furthermore can compactly express Bayesian games with commonly encountered types of structure. The type profile distribution is represented as a Bayesian network, which can exploit conditional independence structure among the types. BAGGs represent utility functions in a way similar to the AGG representation and, like AGGs, are able to exploit anonymity and action-specific utility independencies. Furthermore, BAGGs can compactly express Bayesian games exhibiting type-specific independence: each player's utility function can have different kinds of structure depending on her instantiated type.

We provide an algorithm for computing expected utility in BAGGs, a key step in many algorithms for game-theoretic solution concepts. As in Chapter 5, our approach interprets expected utility computation as a probabilistic inference problem on an induced Bayesian network. In particular, our algorithm runs in polynomial time for the important case of independent type distributions. To compute Bayes-Nash equilibria for BAGGs, we consider the agent form interpretation of the BAGG.
Howson and Rosenthal [1974] showed that the agent form of an arbitrary two-player Bayesian game is a polymatrix game. Such a game can be represented compactly (thus avoiding the aforementioned blowup) and solved using a variant of the Lemke-Howson algorithm. However, for n-player BAGGs the corresponding agent forms do not correspond to polymatrix games or any other known representation, and the Lemke-Howson algorithm cannot be applied. Nevertheless, we are able to generalize Howson and Rosenthal's approach to obtain an algorithm for finding sample Bayes-Nash equilibria of arbitrary BAGGs. Specifically, we show that BAGGs can act as a general compact representation of the agent form. In particular, computational tasks on the agent form can be done efficiently by leveraging our expected utility algorithm for BAGGs. We then apply the black-box approaches for Nash equilibria in complete-information games discussed in Sections 2.2.2 and 3.4, specifically the simplicial subdivision algorithm [van der Laan et al., 1987] and Govindan and Wilson's [2003] global Newton method. We show empirically that our approach outperforms the existing approaches of solving for Nash equilibria on the induced normal form or on the normal-form representation of the agent form.

Bayesian games can be interpreted as dynamic games with an initial move by Nature. The literature on representations for dynamic games, including MAIDs and TAGGs, is therefore also related. Compared to these representations for dynamic games, BAGGs focus explicitly on structure common to Bayesian games; in particular, only BAGGs can efficiently express type-specific utility structure. Also, by representing utility functions and type distributions as separate components, BAGGs can be more versatile. For example, one future direction made possible by this separation is to model Bayesian games without common type distributions.
Another future direction is to answer computational questions that do not depend on the type distribution, such as computing ex-post equilibria. Furthermore, we will see that BAGGs enjoy nicer computational properties than arbitrary dynamic games. For example, BAGGs can be solved by adapting Govindan and Wilson's global Newton method [2003] for static games (see Section 2.2.1). This is generally more practical than their related Nash equilibrium algorithm [2002] that works directly on dynamic games. Both approaches avoid the exponential blowup of transforming to the induced normal form, but the global Newton method for dynamic games has to solve an additional quadratic program at each step of the homotopy.

A limitation of BAGGs is that they require the types to be discrete. There has been some research on heuristic methods for finding Bayes-Nash equilibria of Bayesian games with continuous types. This includes Reeves and Wellman's [2004] work on iterated best response for certain classes of auction games and Rabinovich et al.'s [2009] work on fictitious play. Developing general compact representations and efficient algorithms for Bayes-Nash equilibria of such games remains an interesting open problem.

6.2 Preliminaries

The standard definition of a Bayesian game $(N, \{A_i\}_{i \in N}, \Theta, P, \{u_i\}_{i \in N})$ is given in Definition 2.1.3. The standard concepts of pure strategy $s_i$, mixed strategy $\sigma_i$, expected utility for Bayesian games, and Bayes-Nash equilibrium are introduced in Section 2.2.3. Recall from Section 2.1.3 that the space bottlenecks of representing a Bayesian game are the type distribution and the utility function. Representing them as tables, the Bayesian normal form requires $n \times \prod_{i=1}^{n} (|\Theta_i| \times |A_i|) + \prod_{i=1}^{n} |\Theta_i|$ numbers to specify. We say a Bayesian game has independent type distributions if players' types are drawn independently, i.e., the type-profile distribution $P(\theta)$ is a product distribution: $P(\theta) = \prod_i P(\theta_i)$.
In this case the distribution $P$ can be represented compactly using $\sum_i |\Theta_i|$ numbers.

Given a permutation of players $\pi: N \to N$ and an action profile $a = (a_1, \ldots, a_n)$, let $a^\pi = (a_{\pi(1)}, \ldots, a_{\pi(n)})$. Similarly let $\theta^\pi = (\theta_{\pi(1)}, \ldots, \theta_{\pi(n)})$. We say the type distribution $P$ is symmetric if $|\Theta_i| = |\Theta_j|$ for all $i, j \in N$, and if for all permutations $\pi: N \to N$, $P(\theta) = P(\theta^\pi)$. We say a Bayesian game has symmetric utility functions if $|A_i| = |A_j|$ and $|\Theta_i| = |\Theta_j|$ for all $i, j \in N$, and if for all permutations $\pi: N \to N$, we have $u_i(a, \theta) = u_{\pi(i)}(a^\pi, \theta^\pi)$ for all $i \in N$. A Bayesian game is symmetric if its type distribution and utility functions are symmetric. The utility functions of such a game range over at most

$$|\Theta_i||A_i| \binom{n - 2 + |\Theta_i||A_i|}{|\Theta_i||A_i| - 1}$$

unique utility values.

A Bayesian game exhibits conditional utility independence if each player $i$'s utility depends on the action profile $a$ and her own type $\theta_i$, but does not depend on the other players' types. Then the utility function of each player $i$ ranges over at most $|A||\Theta_i|$ unique utility values.

6.2.1 Complete-information interpretations

Harsanyi [1967] showed that any Bayesian game can be interpreted as one of two complete-information games, in each of which the Nash equilibria correspond to Bayes-Nash equilibria of the Bayesian game. A Bayesian game can be converted to its induced normal form, which is a complete-information game with the same set of $n$ players, in which each player's set of actions is her set of pure strategies in the Bayesian game. Each player's utility under an action profile is defined to be equal to the player's expected utility under the corresponding pure strategy profile in the Bayesian game. Alternatively, a Bayesian game can be transformed to its agent form, where each type of each player in the Bayesian game is turned into one player in a complete-information game.
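The counting formulas above can be made concrete with a short sketch. The Python helpers below use illustrative names that do not come from the thesis. They evaluate four quantities: the Bayesian normal form size, the size of an independent type distribution, the bound for symmetric Bayesian games, and the bound under conditional utility independence. The symmetric bound counts the multisets of (type, action) pairs that the other $n-1$ players can occupy.

```python
from math import comb, prod


def bayesian_normal_form_size(num_types, num_actions):
    """Numbers needed to tabulate a Bayesian game in Bayesian normal form.

    num_types[i] = |Theta_i| and num_actions[i] = |A_i|. The utility tables
    take n * prod_i(|Theta_i| * |A_i|) numbers and the joint type
    distribution takes prod_i |Theta_i| numbers.
    """
    n = len(num_types)
    utilities = n * prod(t * a for t, a in zip(num_types, num_actions))
    distribution = prod(num_types)
    return utilities + distribution


def independent_distribution_size(num_types):
    """A product distribution P(theta) = prod_i P(theta_i) needs sum_i |Theta_i| numbers."""
    return sum(num_types)


def symmetric_utility_count(n, num_types, num_actions):
    """Bound on distinct utility values in a symmetric Bayesian game.

    With m = |Theta_i| * |A_i| (type, action) pairs, a player's utility depends
    on her own pair (m choices) and on the multiset of pairs chosen by the
    other n - 1 players, of which there are comb(n - 2 + m, m - 1).
    """
    m = num_types * num_actions
    return m * comb(n - 2 + m, m - 1)


def conditional_independence_count(num_actions, own_types):
    """Under conditional utility independence: |A| * |Theta_i| values for player i."""
    return prod(num_actions) * own_types


# Three players, two types and two actions each.
print(bayesian_normal_form_size([2, 2, 2], [2, 2, 2]))  # 3 * 4**3 + 2**3 = 200
print(symmetric_utility_count(3, 2, 2))                 # 4 * comb(5, 3) = 40
```

For three players with two types and two actions each, the full tables need 200 numbers. Symmetry leaves at most 40 distinct utility values.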
Formally, given a Bayesian game $(N, \{A_i\}_{i\in N}, \Theta, P, \{u_i\}_{i\in N})$, we define its agent form as the complete-information game $(\tilde N, \{\tilde A_{j,\theta_j}\}_{(j,\theta_j)\in \tilde N}, \{\tilde u_{j,\theta_j}\}_{(j,\theta_j)\in \tilde N})$, where $\tilde N$ consists of $\sum_{j\in N} |\Theta_j|$ players, one for every type of every player of the Bayesian game. We index the players by the tuple $(j, \theta_j)$ where $j \in N$ and $\theta_j \in \Theta_j$. For each player $(j, \theta_j) \in \tilde N$ of the agent form game, her action set $\tilde A_{(j,\theta_j)}$ is $A_j$, the action set of $j$ in the Bayesian game. The set of action profiles is then $\tilde A = \prod_{j,\theta_j} \tilde A_{(j,\theta_j)}$. The utility function of player $(j, \theta_j)$ is $\tilde u_{j,\theta_j}: \tilde A \to \mathbb{R}$. For all $\tilde a \in \tilde A$, $\tilde u_{j,\theta_j}(\tilde a)$ is equal to the expected utility of player $j$ of the Bayesian game given type $\theta_j$, under the pure strategy profile $s^{\tilde a}$, where for all $i$ and all $\theta_i$, $s^{\tilde a}_i(\theta_i) = \tilde a_{(i,\theta_i)}$. Observe that there is a one-to-one correspondence between action profiles in the agent form and pure strategy profiles of the Bayesian game. A similar correspondence exists for mixed strategy profiles: each mixed strategy profile $\sigma$ of the Bayesian game corresponds to a mixed strategy profile $\tilde\sigma$ of the agent form, with $\tilde\sigma_{(i,\theta_i)}(a_i) = \sigma_i(a_i|\theta_i)$ for all $i, \theta_i, a_i$. It is straightforward to verify that $\tilde u_{i,\theta_i}(\tilde\sigma) = u_i(\sigma|\theta_i)$ for all $i, \theta_i$. This implies a correspondence between Bayes-Nash equilibria of a Bayesian game and Nash equilibria of its agent form.

Proposition 6.2.1. $\sigma$ is a Bayes-Nash equilibrium of a Bayesian game if and only if $\tilde\sigma$ is a Nash equilibrium of its agent form.

6.3 Bayesian Action-Graph Games

In this section we introduce Bayesian Action-Graph Games (BAGGs), a compact representation of Bayesian games. First consider representing the type distribution: $P$ is specified by a Bayesian network (BN) containing at least $n$ random variables corresponding to the $n$ players' types $\theta_1, \ldots, \theta_n$.
For example, when the types are independently distributed, $P$ can be specified by the simple BN with $n$ variables $\theta_1, \ldots, \theta_n$ and no edges. Now consider representing the utility functions. Our approach is to adapt concepts from the AGG representation (see Chapter 3) to the Bayesian game setting. At a high level, a BAGG is a Bayesian game on an action graph, a directed graph on a set of action nodes $\mathcal{A}$. To play the game, each player $i$, given her type $\theta_i$, simultaneously chooses an action node from her type-action set $A_{i,\theta_i} \subseteq \mathcal{A}$. Each action node thus corresponds to an action choice that is available to one or more of the players. Once the players have made their choices, an action count is tallied for each action node $\alpha \in \mathcal{A}$, which is the number of agents that have chosen $\alpha$. A player's utility depends only on the action node she chose and the action counts on the neighbors of the chosen node. We observe that the main difference between the AGG and BAGG representations is that whereas in an AGG each player's set of available actions is specified by her action set, in a BAGG we have type-action sets, meaning each player's set of available actions can depend on her instantiated type.

We now turn to a formal description of BAGGs' utility function representation. Central to our model is the action graph.² An action graph $G = (\mathcal{A}, E)$ is a directed graph where $\mathcal{A}$ is the set of action nodes, and $E$ is a set of directed edges, with self edges allowed. We say $\alpha'$ is a neighbor of $\alpha$ if there is an edge from $\alpha'$ to $\alpha$, i.e., if $(\alpha', \alpha) \in E$. Let the neighborhood of $\alpha$, denoted $\nu(\alpha)$, be the set of neighbors of $\alpha$. For each player $i$ and each instantiation of her type $\theta_i \in \Theta_i$, her type-action set $A_{i,\theta_i} \subseteq \mathcal{A}$ is the set of possible action choices of $i$ given $\theta_i$. These subsets are unrestricted: different type-action sets may (partially or completely) overlap. Define player $i$'s total action set to be $A^{\cup}_i = \bigcup_{\theta_i \in \Theta_i} A_{i,\theta_i}$.
We denote by $A = \prod_i A^{\cup}_i$ the set of action profiles, and by $a \in A$ an action profile. Observe that the action profile $a$ provides sufficient information about the type profile to be able to determine the outcome of the game; there is no need to additionally encode the realized type profile. We note that for different types $\theta_i, \theta_i' \in \Theta_i$, $A_{i,\theta_i}$ and $A_{i,\theta_i'}$ may have different sizes; i.e., $i$ may have different numbers of available action choices depending on her realized type.

A configuration $c$ is a vector of $|\mathcal{A}|$ non-negative integers, specifying for each action node the number of players choosing that action. Let $c(\alpha)$ be the element of $c$ corresponding to the action $\alpha$. Let $\mathcal{C}: A \to C$ be the function that maps from an action profile $a$ to the corresponding configuration $c$. Formally, if $c = \mathcal{C}(a)$ then $c(\alpha) = |\{i \in N : a_i = \alpha\}|$ for all $\alpha \in \mathcal{A}$. Define $C = \{c : \exists a \in A \text{ such that } c = \mathcal{C}(a)\}$. In other words, $C$ is the set of all possible configurations in the BAGG. Observe that the concept of configurations in BAGGs is related to the concept of configurations in AGGs in the following way: $C$ in a BAGG is isomorphic to the set of configurations in an AGG-∅ with the same action graph $G = (\mathcal{A}, E)$ but with action sets corresponding to total action sets of the BAGG, i.e., $A_i \equiv A^{\cup}_i$.

² The definition of action graph coincides with the corresponding concept in AGGs. We repeat the definition here in order to give a complete description of BAGGs.

We can also define a configuration over a subset of nodes. In particular, we will be interested in configurations over a node's neighborhood. Given a configuration $c \in C$ and a node $\alpha \in \mathcal{A}$, let the configuration over the neighborhood of $\alpha$, denoted $c^{(\alpha)}$, be the restriction of $c$ to $\nu(\alpha)$, i.e., $c^{(\alpha)} = (c(\alpha'))_{\alpha' \in \nu(\alpha)}$. Similarly, let $C^{(\alpha)}$ denote the set of configurations over $\nu(\alpha)$ in which at least one player plays $\alpha$.
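Here is a minimal sketch of total action sets, the configuration map $\mathcal{C}(a)$ and the restriction $c^{(\alpha)}$. It assumes action nodes are plain string labels and that a player's type-action sets are given as a dictionary from types to sets of nodes. All names are illustrative.

```python
from collections import Counter


def total_action_set(type_action_sets):
    """A_i^cup: union of player i's type-action sets (dict: type -> set of action nodes)."""
    nodes = set()
    for available in type_action_sets.values():
        nodes |= set(available)
    return nodes


def configuration(action_profile, action_nodes):
    """C(a): for each action node alpha, the number of players i with a_i == alpha."""
    counts = Counter(action_profile)
    return {alpha: counts[alpha] for alpha in action_nodes}


def restrict(config, neighborhood):
    """c^(alpha): restriction of a configuration to nu(alpha), in a fixed (sorted) order."""
    return tuple(config[node] for node in sorted(neighborhood))


# Hypothetical two-type player whose type-action sets overlap on node "y".
sets_i = {"low": {"x", "y"}, "high": {"y", "z"}}
print(sorted(total_action_set(sets_i)))   # ['x', 'y', 'z']
c = configuration(["x", "x", "z"], ["x", "y", "z"])
print(c)                                  # {'x': 2, 'y': 0, 'z': 1}
print(restrict(c, {"x", "y"}))            # (2, 0)
```

Sorting the neighborhood is one simple way to give $c^{(\alpha)}$ a canonical order, so that it can index a utility table $u^\alpha$.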
Let $\mathcal{C}^{(\alpha)}: A \to C^{(\alpha)}$ be the function that maps from an action profile to the corresponding configuration over $\nu(\alpha)$.

Definition 6.3.1. A Bayesian action-graph game (BAGG) is a tuple $(N, \Theta, P, \{A_{i,\theta_i}\}_{i\in N, \theta_i\in\Theta_i}, G, \{u^\alpha\}_{\alpha\in\mathcal{A}})$ where $N$ is the set of agents; $\Theta = \prod_i \Theta_i$ is the set of type profiles; $P$ is the type distribution, represented as a Bayesian network; $A_{i,\theta_i} \subseteq \mathcal{A}$ is the type-action set of $i$ given $\theta_i$; $G = (\mathcal{A}, E)$ is the action graph; and for each $\alpha \in \mathcal{A}$, the utility function is $u^\alpha: C^{(\alpha)} \to \mathbb{R}$.

As in the case of AGGs, shared actions in a BAGG capture the game's anonymity structure. Furthermore, the (lack of) edges between nodes in the action graph of a BAGG expresses action- and type-specific independencies of utilities of the game: depending on player $i$'s chosen action node (which also encodes information about her type), her utility depends on configurations over different sets of nodes.

Lemma 6.3.2. An arbitrary Bayesian game given in Bayesian normal form can be encoded as a BAGG storing the same number of utility values.

Proof. Given an arbitrary Bayesian game $(N, \{A_i\}_{i\in N}, \Theta, P, \{u_i\}_{i\in N})$ represented in Bayesian normal form, we construct the BAGG $(N, \Theta, P, \{A'_{i,\theta_i}\}_{i\in N, \theta_i\in\Theta_i}, G, \{u^\alpha\}_{\alpha\in\mathcal{A}})$ as follows. The Bayesian normal form's tabular representation of the type profile distribution $P$ can be straightforwardly represented as a BN, e.g. by creating a random variable representing $\theta$ as the only parent of the random variables $\theta_1, \ldots, \theta_n$. To represent utility functions, we create an action graph $G$ with $\sum_i |\Theta_i||A_i|$ action nodes; in other words, all type-action sets $A'_{i,\theta_i}$ are disjoint. Each action $a_i \in A_i$ of the Bayesian normal form corresponds to $|\Theta_i|$ action nodes in the BAGG, one for each type instantiation $\theta_i$. For each player $i$ and each type $\theta_i \in \Theta_i$, each action node $\alpha \in A'_{i,\theta_i}$ has incoming edges from all action nodes in the type-action sets $A'_{j,\theta_j}$ for all $j \neq i$ and $\theta_j \in \Theta_j$, i.e. all action nodes of the other players.
For each action node $\alpha \in A'_{i,\theta_i}$ corresponding to $a_i \in A_i$, the utility function $u^\alpha$ is defined as follows: given configuration $c^{(\alpha)}$ we can infer the action profile $a'_{-i} \in A'_{-i}$ of the BAGG, which then tells us the corresponding $a_{-i}$ and $\theta_{-i}$ of the Bayesian normal form, which gives us the utility $u_i(a, \theta)$. The number of utility values stored in this BAGG is the same as in the Bayesian normal form.

Bayesian games with symmetric utility functions exhibit anonymity structure, which can be expressed in BAGGs by sharing action nodes. Specifically, we label each $\Theta_i$ as $\{1, \ldots, T\}$, so that each $t \in \{1, \ldots, T\}$ corresponds to a class of equivalent types. Then for each $t \in \{1, \ldots, T\}$, we have $A_{i,t} = A_{j,t}$ for all $i, j \in N$, i.e. type-action sets for equivalent types are identical. Figure 6.1 shows the action graph for a symmetric Bayesian game with two types and two actions per type.

Figure 6.1: Action graph for a symmetric Bayesian game with n players, 2 types, 2 actions per type.

6.3.1 BAGGs with Function Nodes

In this section we extend the basic BAGG representation by introducing function nodes to the action graph, as we did for AGG-FNs in Chapter 3. Function nodes allow us to exploit a much wider variety of utility structures in BAGGs. In this extended representation,³ the action graph $G$'s vertices consist of both the set of action nodes $\mathcal{A}$ and the set of function nodes $\mathcal{P}$. We require that no function node $p \in \mathcal{P}$ can be in any player's action set. Each function node $p \in \mathcal{P}$ is associated with a function $f^p: C^{(p)} \to \mathbb{R}$. We extend $c$ by defining $c(p)$ to be the result of applying $f^p$ to the configuration over $p$'s neighbors, $f^p(c^{(p)})$. Intuitively, $c(p)$ can be used to describe intermediate parameters that players' utilities depend on. To ensure that the BAGG is meaningful, the graph restricted to nodes in $\mathcal{P}$ is required to be a directed acyclic graph. As before, for each action node $\alpha$ we define a utility function $u^\alpha: C^{(\alpha)} \to \mathbb{R}$.

³ The definitions of function nodes and contribution-independent function nodes coincide with the corresponding concepts in AGGs. We repeat them here for completeness.

Of particular computational interest is the subclass of contribution-independent function nodes. A function node $p$ in a BAGG is contribution-independent if $\nu(p) \subseteq \mathcal{A}$, there exists a commutative and associative operator $*$, and for each $\alpha \in \nu(p)$ an integer $w_\alpha$, such that given an action profile $a = (a_1, \ldots, a_n)$, $c(p) = *_{i \in N : a_i \in \nu(p)}\, w_{a_i}$. A BAGG is contribution-independent if all its function nodes are contribution-independent. Intuitively, if function node $p$ is contribution-independent, each player's strategy affects $c(p)$ independently. A very useful kind of contribution-independent function node is the simple aggregator function node, which sets $*$ to the summation operator $+$ and all weights to 1. Such a function node $p$ simply counts the number of players that chose any action in $\nu(p)$.

Let us consider the size of a BAGG representation. The representation size of the Bayesian network for $P$ is exponential only in the in-degree of the BN. The utility functions store $\sum_\alpha |C^{(\alpha)}|$ values. Recall that $C$, and thus $C^{(\alpha)}$, correspond to configurations in a related AGG. We can thus apply the same analysis as for the representation size of AGGs in Chapter 3. As in Chapter 3, estimates of this size generally depend on what types of function nodes are included. We state only the following (relatively straightforward) result since in this chapter we are mostly concerned with BAGGs with simple aggregator function nodes.

Theorem 6.3.3. Consider BAGGs whose only function nodes, if any, are simple aggregator function nodes. If the in-degrees of the action nodes as well as the in-degrees of the Bayesian networks for $P$ are bounded by a constant, then the sizes of the BAGGs are bounded by a polynomial in $n$, $|\mathcal{A}|$, $|\mathcal{P}|$, $\sum_i |\Theta_i|$ and the sizes of domains of variables in the BN.
The proof is by a direct application of Corollary 3.2.11. This theorem shows a nice property of simple aggregator function nodes: representation size does not grow exponentially in the in-degrees of these function nodes. The next example (an extension of Example 3.2.7) illustrates the usefulness of simple aggregator function nodes, including for expressing conditional utility independence.

Example 6.3.4 (Coffee Shop game). Consider a symmetric Bayesian game involving $n$ players; each player plans to open a new coffee shop in a downtown area, but has to decide on the location. The downtown area is represented by an $r \times k$ grid. Each player can choose to open a shop located within any of the $B \equiv rk$ blocks or decide not to enter the market. Each player has one of $T$ types, representing her private information about her cost of opening a coffee shop. Players' types are independently distributed. Conditioned on player $i$ choosing some location, her utility depends on: (a) her own type; (b) the number of players that chose the same block; (c) the number of players that chose any of the surrounding blocks; and (d) the number of players that chose any other location. The Bayesian normal form representation of this game has size $n[T(B+1)]^n$. The game can be expressed as a BAGG as follows. Since the game is symmetric, we label the types as $\{1, \ldots, T\}$. $\mathcal{A}$ contains one action $O$ corresponding to not entering and $TB$ other action nodes, with each location corresponding to a set of $T$ action nodes, each representing the choice of that location by a player with a different type. For each $t \in \{1, \ldots, T\}$, the type-action sets satisfy $A_{i,t} = A_{j,t}$ for all $i, j \in N$, and each consists of the action $O$ and the $B$ actions corresponding to locations for type $t$.
For each location $(x, y)$ we create three function nodes: $p_{xy}$ representing the number of players choosing this location, $p'_{xy}$ representing the number of players choosing any surrounding blocks, and $p''_{xy}$ representing the number of players choosing any other block. Each of these function nodes is a simple aggregator function node, whose neighbors are action nodes corresponding to the appropriate locations (for all types). Each action node for location $(x, y)$ has three neighbors, $p_{xy}$, $p'_{xy}$, and $p''_{xy}$. Figure 6.2 shows the action graph for the game with $T = 2$ on a $1 \times k$ grid. Since the BAGG action graph has maximum in-degree 3, by Theorem 6.3.3 the representation size is polynomial in $n$, $B$ and $T$.

Figure 6.2: BAGG representation for a Coffee Shop game with 2 types per player on a 1 × k grid.

6.4 Computing a Bayes-Nash Equilibrium

In this section we consider the problem of finding a sample Bayes-Nash equilibrium given a BAGG. Our overall approach is to interpret the Bayesian game as a complete-information game, and then to apply existing algorithms for finding Nash equilibria of complete-information games. We consider two state-of-the-art Nash equilibrium algorithms, van der Laan et al.'s simplicial subdivision [1987] and Govindan and Wilson's global Newton method [2003]. Recall from Section 6.2.1 that a Bayesian game can be transformed into its induced normal form or its agent form. In the induced normal form, each player $i$ has $|A_i|^{|\Theta_i|}$ actions (corresponding to her pure strategies of the Bayesian game). Solving such a game would be infeasible for large $|\Theta_i|$; just representing a Nash equilibrium requires space exponential in $|\Theta_i|$. A more promising approach is to consider the agent form. Note that we can straightforwardly adapt the agent-form transformation described in Section 6.2.1 to the setting of BAGGs: now the action set of player $(i, \theta_i)$ of the agent form corresponds to the type-action set $A_{i,\theta_i}$ of the BAGG.
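The size gap between the two complete-information interpretations of a BAGG can be made concrete with a small sketch. The helper names below are hypothetical. The sketch counts one player's pure strategies in the induced normal form. It also computes three agent-form quantities from the size formulas stated in this section: the number of agent-form players, the numbers needed to write down a mixed strategy profile, and the size of the agent form written out as a normal-form game.

```python
from math import prod


def induced_normal_form_actions(type_action_sets):
    """Pure strategies of one player: a choice from A_{i,theta} for every type theta."""
    return prod(len(available) for available in type_action_sets.values())


def agent_form_summary(players):
    """Sizes for the agent form of a BAGG.

    players[i] maps each type theta_i to the type-action set A_{i,theta_i}.
    Returns (number of agent-form players, numbers needed to write down a
    mixed strategy profile, size of the agent form as a normal-form game).
    """
    agent_action_counts = [len(s) for sets in players for s in sets.values()]
    num_agents = len(agent_action_counts)       # sum_i |Theta_i|
    strategy_size = sum(agent_action_counts)    # sum_i sum_theta |A_{i,theta}|
    # one utility table entry per agent-form player and per pure profile
    normal_form_size = num_agents * prod(agent_action_counts)
    return num_agents, strategy_size, normal_form_size


# Hypothetical player: two types, each with three available action nodes.
player = {"t1": {"O", "a", "b"}, "t2": {"O", "c", "d"}}
print(induced_normal_form_actions(player))  # 9 pure strategies
print(agent_form_summary([player] * 3))     # (6, 18, 4374)
```

The mixed strategy profile stays linear in the number of type-action pairs (18 numbers here). The normal-form table of the agent form, by contrast, is already in the thousands for three players.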
The agent form of a BAGG has $\sum_{i\in N} |\Theta_i|$ players and $|A_{i,\theta_i}|$ actions for each player $(i, \theta_i)$; a Nash equilibrium can be represented using just $\sum_i \sum_{\theta_i} |A_{i,\theta_i}|$ numbers. However, the normal form representation of the agent form has size $\sum_{j\in N} |\Theta_j| \prod_{i,\theta_i} |A_{i,\theta_i}|$, which grows exponentially in $n$ and $|\Theta_i|$. Applying the Nash equilibrium algorithms to this normal form would be infeasible for large games.

Fortunately, we do not have to explicitly represent the agent form as a normal form game. Instead, we treat a BAGG as a compact representation of its agent form, and carry out any required computation on the agent form by operating directly on the BAGG. Recall from Section 2.2.1 that a key computational task required by both Nash equilibrium algorithms in their inner loops is the computation of expected utility of the agent form. Recall from Section 6.2.1 that for all $(i, \theta_i)$ the expected utility $\tilde u_{i,\theta_i}(\tilde\sigma)$ of the agent form is equal to the expected utility $u_i(\sigma|\theta_i)$ of the Bayesian game. Thus in the remainder of this section we focus on the problem of computing expected utility in BAGGs.

6.4.1 Computing Expected Utility in BAGGs

Recall from Section 2.2.3 that $\sigma^{\theta_i \to a_i}$ is the mixed strategy profile that is identical to $\sigma$ except that $i$ plays $a_i$ given $\theta_i$. The main quantity we are interested in is $u_i(\sigma^{\theta_i \to a_i}|\theta_i)$, player $i$'s expected utility given $\theta_i$ under the strategy profile $\sigma^{\theta_i \to a_i}$. Note that the expected utility $u_i(\sigma|\theta_i)$ can then be computed as the sum $u_i(\sigma|\theta_i) = \sum_{a_i} u_i(\sigma^{\theta_i \to a_i}|\theta_i)\,\sigma_i(a_i|\theta_i)$. One approach is to directly apply Equation (2.2.2), which has $|\Theta_{-i}| \times |A|$ terms in the summation. For games represented in Bayesian normal form, this algorithm runs in time polynomial in the representation size. Since BAGGs can be exponentially more compact than their equivalent Bayesian normal form representations, this algorithm runs in exponential time for BAGGs.
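The direct approach can be sketched as follows. The sketch assumes a Bayesian game small enough that its joint type distribution and strategies fit in explicit Python dictionaries; this encoding is hypothetical and is not a BAGG. It sums $P(\theta_{-i}|\theta_i)\prod_{j\neq i}\sigma_j(a_j|\theta_j)\,u_i(a, \theta)$ over all type profiles and all action profiles of the other players. That enumeration is exactly what makes the approach exponential for BAGGs.

```python
from itertools import product


def naive_expected_utility(i, theta_i, a_i, prior, sigma, utility):
    """u_i(sigma^{theta_i -> a_i} | theta_i) by direct summation.

    prior: dict mapping type profiles (tuples) to probabilities (full joint table).
    sigma[j][theta_j]: dict mapping player j's actions to probabilities.
    utility(i, actions, types) -> player i's payoff.
    The loops visit every type profile and every action profile of the others,
    so the running time grows exponentially with the number of players.
    """
    n = len(next(iter(prior)))
    others = [j for j in range(n) if j != i]
    p_theta_i = sum(p for types, p in prior.items() if types[i] == theta_i)
    total = 0.0
    for types, p in prior.items():
        if types[i] != theta_i or p == 0:
            continue
        # Actions (with probabilities) of each other player under her realized type.
        choices = [list(sigma[j][types[j]].items()) for j in others]
        for combo in product(*choices):
            actions = [None] * n
            actions[i] = a_i          # player i plays a_i given theta_i
            weight = p / p_theta_i    # P(theta_{-i} | theta_i)
            for j, (a_j, q) in zip(others, combo):
                actions[j] = a_j
                weight *= q
            total += weight * utility(i, tuple(actions), types)
    return total


def match(i, actions, types):
    """Hypothetical payoff: 1 if both players chose the same action, else 0."""
    return 1.0 if actions[0] == actions[1] else 0.0


# Two players, types 0/1 drawn uniformly and independently.
prior = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
sigma = [
    {0: {"L": 1.0}, 1: {"R": 1.0}},  # player 0 (not used when i = 0)
    {0: {"L": 1.0}, 1: {"R": 1.0}},  # player 1 plays L iff her type is 0
]
print(naive_expected_utility(0, 0, "L", prior, sigma, match))  # 0.5
```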
In this section we present a more efficient algorithm that exploits BAGG structure. We first formulate the expected utility problem as a Bayesian network inference problem. Given a BAGG and a mixed strategy profile $\sigma^{\theta_i \to a_i}$, we construct the induced Bayesian network (IBN) as follows.

We start with the BN representing the type distribution $P$, which includes (at least) the random variables $\theta_1, \ldots, \theta_n$. The conditional probability distributions (CPDs) for the network are unchanged. We add the following random variables: one strategy variable $D_j$ for each player $j$; one action count variable for each action node $\alpha \in \mathcal{A}$, representing its action count, denoted $c(\alpha)$; one function variable for each function node $p \in \mathcal{P}$, representing its configuration value, denoted $c(p)$; and one utility variable $U^\alpha$ for each action node $\alpha$. We then add the following edges: an edge from $\theta_j$ to $D_j$ for each player $j$; for each player $j$ and each $\alpha \in A^{\cup}_j$, an edge from $D_j$ to $c(\alpha)$; for each function variable $c(p)$, all incoming edges corresponding to those in the action graph $G$; and for each $\alpha \in \mathcal{A}$ and each action or function node $m \in \nu(\alpha)$ in $G$, an edge from $c(m)$ to $U^\alpha$ in the IBN.

The CPDs of the newly added random variables are defined as follows. Each strategy variable $D_j$ has domain $A^{\cup}_j$, and given its parent $\theta_j$, its CPD chooses an action from $A^{\cup}_j$ according to the mixed strategy $\sigma^{\theta_i \to a_i}_j$. In other words, if $j \neq i$ then $\Pr(D_j = a_j|\theta_j)$ is equal to $\sigma_j(a_j|\theta_j)$ for all $a_j \in A_{j,\theta_j}$ and 0 for all $a_j \in A^{\cup}_j \setminus A_{j,\theta_j}$; and if $j = i$ we have $\Pr(D_j = a_i|\theta_j) = 1$. For each action node $\alpha$, the parents of its action count variable $c(\alpha)$ are the strategy variables that have $\alpha$ in their domains. The CPD is a deterministic function that returns the number of its parents that take value $\alpha$; i.e., it calculates the action count of $\alpha$. For each function variable $c(p)$, its CPD is the deterministic function $f^p$.
The CPD for each utility variable $U^\alpha$ is a deterministic function specified by $u^\alpha$.

Remark 6.4.1. Observe that our construction of the IBN here is similar to the construction of the induced BN from a TAGG in Chapter 5. One difference is that in a BAGG, type affects utility indirectly through type-action sets, resulting in a different construction of CPDs at the strategy variables $D_j$ from the TAGG case. Also, each strategy variable in a BAGG has in-degree 1, whereas in a perfect-recall TAGG the in-degree of a decision of player $i$ grows linearly in the number of $i$'s previous decisions.

It is straightforward to verify that the IBN is a directed acyclic graph (DAG) and thus represents a valid joint distribution. Furthermore, the expected utility $u_i(\sigma^{\theta_i \to a_i}|\theta_i)$ is exactly the expected value of the variable $U^{a_i}$ conditioned on the instantiated type $\theta_i$.

Lemma 6.4.2. For all $i \in N$, all $\theta_i \in \Theta_i$ and all $a_i \in A_{i,\theta_i}$, we have $u_i(\sigma^{\theta_i \to a_i}|\theta_i) = E[U^{a_i}|\theta_i]$.

Standard BN inference methods could be used to compute $E[U^{a_i}|\theta_i]$. However, such standard algorithms do not take advantage of structure that is inherent in BAGGs. In particular, recall that in the induced network, each action count variable $c(\alpha)$'s parents are all strategy variables that have $\alpha$ in their domains, implying large in-degrees for action count variables. As in the TAGG case, the CPDs of action count variables exhibit causal independence, and we can apply a version of Heckerman and Breese's method [1996] to transform the IBN into an equivalent BN with small in-degree. Given an action count variable $c(\alpha)$ with parents (say) $\{D_1, \ldots, D_n\}$, for each $i \in \{1, \ldots, n-1\}$ we create a node $M_{\alpha,i}$, representing the count induced by $D_1, \ldots, D_i$. Then, instead of having $D_1, \ldots, D_n$ as parents of $c(\alpha)$, its parents become $D_n$ and $M_{\alpha,n-1}$, and each $M_{\alpha,i}$'s parents are $D_i$ and $M_{\alpha,i-1}$. The resulting graph has in-degree at most 2 for $c(\alpha)$ and the $M_{\alpha,i}$'s.
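The chain $M_{\alpha,1}, \ldots, M_{\alpha,n}$ suggests a simple dynamic program for the distribution of an action count. The sketch below assumes independent types. Under that assumption, each strategy variable can be summarized by its marginal $\Pr(D_j = \alpha) = \sum_{\theta_j} P(\theta_j)\,\sigma_j(\alpha|\theta_j)$, as in the proof of Theorem 6.4.4, and the $D_j$ are mutually independent. With correlated types the TBN has to be handled by general BN inference instead. The function names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an illustrative choice.

```python
from fractions import Fraction


def strategy_marginal(type_prior, sigma_j, alpha):
    """Pr(D_j = alpha) = sum over theta_j of P(theta_j) * sigma_j(alpha | theta_j)."""
    return sum(p * sigma_j[theta].get(alpha, 0) for theta, p in type_prior.items())


def count_distribution(probs):
    """Distribution of c(alpha), built link by link along M_{alpha,1}, ..., M_{alpha,n}.

    probs[j] = Pr(D_j = alpha). Assumes the strategy variables are independent,
    so each link adds one Bernoulli contribution and every intermediate table
    has at most n + 1 entries.
    """
    dist = [Fraction(1)]  # the empty prefix has count 0 with probability 1
    for p in probs:
        nxt = [Fraction(0)] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)  # this player does not choose alpha
            nxt[k + 1] += q * p    # this player chooses alpha
        dist = nxt
    return dist


# Hypothetical player j with types "lo"/"hi" (prior 1/3, 2/3).
half, third = Fraction(1, 2), Fraction(1, 3)
sigma_j = {"lo": {"a": Fraction(1)}, "hi": {"a": half, "b": half}}
p_a = strategy_marginal({"lo": third, "hi": 2 * third}, sigma_j, "a")
print(p_a)                              # 2/3
print(count_distribution([p_a, half]))  # [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)]
```

Each step only touches the running count and one new player. This mirrors the in-degree-2 structure of $M_{\alpha,i}$, whose parents are $D_i$ and $M_{\alpha,i-1}$.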
The CPDs of function variables corresponding to contribution-independent function nodes also exhibit causal independence, and thus we can use a similar transformation to reduce their in-degree to 2. We call the resulting Bayesian network the transformed Bayesian network (TBN) of the BAGG. As in Chapter 5, it is straightforward to verify that the representation size of the TBN is polynomial in the size of the BAGG. We can then use standard inference algorithms to compute $E[U^\alpha|\theta_i]$ on the TBN. For classes of BNs with bounded treewidth, this can be computed in polynomial time. Since the graph structure (and thus the treewidth) of the TBN does not depend on the strategy profile (but, rather, only on the BAGG itself), we have the following result.

Theorem 6.4.3. For BAGGs whose TBNs have bounded treewidth, expected utility can be computed in time polynomial in $n$, $|\mathcal{A}|$, $|\mathcal{P}|$ and $\sum_i |\Theta_i|$.

Bayesian games with independent type distributions are an important class of games and have many applications, such as independent-private-value auctions. When contribution-independent BAGGs have independent type distributions, expected utility can be efficiently computed.

Theorem 6.4.4. For contribution-independent BAGGs with independent type distributions, expected utility can be computed in time polynomial in the size of the BAGG.

Note that this result is stronger than that of Theorem 6.4.3, which only guarantees efficient computation when TBNs have constant treewidth.

Proof. We reduce the problem of computing expected utility $u_i(\sigma^{\theta_i \to a_i}|\theta_i)$ for BAGGs with independent type distributions to the problem of computing expected utility for AGGs. Given a BAGG with players $N$, action graph $G$ and utility functions $\{u^\alpha\}_{\alpha\in\mathcal{A}}$, we consider the AGG $\Gamma$ specified by $(N, \{A^{\cup}_i\}_{i\in N}, G, \{u^\alpha\}_{\alpha\in\mathcal{A}})$, i.e., an AGG with the same set of players, the same action graph and the same utility functions, but with action sets corresponding to total action sets of the BAGG.
The representation size of the AGG $\Gamma$ is proportional to the size of the BAGG. Furthermore, since the BAGG is contribution-independent, all function nodes in the AGG $\Gamma$ are contribution-independent. Given $i$, $\theta_i$ and $\sigma^{\theta_i \to a_i}$, for each player $j \neq i$ we can calculate $\Pr(D_j)$ by summing out $\theta_j$: $\Pr(D_j = a_j) = \sum_{\theta_j} P(\theta_j)\,\sigma_j(a_j|\theta_j)$. Observe that this distribution of the strategy variable $D_j$ can be interpreted as a (complete-information) mixed strategy $\sigma'_j$ of the AGG $\Gamma$'s player $j$. Similarly for player $i$, the distribution $\Pr(D_i|\theta_i)$ can be interpreted as a mixed strategy $\sigma'_i$ for $\Gamma$'s player $i$. Furthermore, these distributions are independent, so they induce the same distribution over configurations of the BAGG as the distribution over configurations of the AGG $\Gamma$ induced by the mixed-strategy profile $\sigma' = (\sigma'_1, \ldots, \sigma'_n)$. Therefore the expected utility $u_i(\sigma^{\theta_i \to a_i}|\theta_i)$ for the BAGG is equal to the expected utility of $i$ in the AGG $\Gamma$ under the mixed strategy profile $\sigma'$. Expected utility for contribution-independent AGGs can be computed in polynomial time by running the algorithm described in Section 3.4.2.

An alternative approach for proving Theorem 6.4.4 is to work on the TBN of the BAGG, which can be shown to have treewidth at most $|\nu(a_i)|$. Although $|\nu(a_i)|$ is not necessarily a constant, meaning that Theorem 6.4.3 cannot be directly applied, it can be shown that a variable elimination algorithm needs to store at most $|C^{(a_i)}|$ numbers in each of its tables, which is polynomial in the size of the BAGG. These two proof approaches can be thought of as two interpretations of the same expected utility algorithm.

6.5 Experiments

We have implemented our approach for computing a Bayes-Nash equilibrium given a BAGG by applying Nash equilibrium algorithms on the agent form of the BAGG.
We adapted two algorithms, GAMBIT's [McKelvey et al., 2006] implementation of simplicial subdivision and GameTracer's [Blum et al., 2002] implementation of Govindan and Wilson's global Newton method, by replacing calls to expected utility computations of the complete-information game with corresponding expected utility computations of the BAGG. Recall from Section 3.5 that we have adapted GAMBIT's implementation of simplicial subdivision to a black-box implementation, and that GameTracer's implementation is already black-box; thus further adaptation of the algorithms to the BAGG case was relatively straightforward to implement once we had the expected utility subroutine.

Figure 6.3: GW, varying players. Figure 6.4: GW, varying locations. Figure 6.5: GW, varying types. Figure 6.6: Simplicial subdivision. (Each plot shows CPU time in seconds, on a logarithmic axis, against the number of players, the number of locations, or the number of types per player; curves: BAGG-AF, NF-AF and, in the GW plots, INF.)

We ran experiments that tested the performance of our approach (denoted by BAGG-AF) against two approaches that compute a Bayes-Nash equilibrium for arbitrary Bayesian games. The first (denoted INF) computes a Nash equilibrium on the induced normal form; the second (denoted NF-AF) computes a Nash equilibrium on the normal form representation of the agent form. Both were implemented using the original, normal-form-based implementations of simplicial subdivision and the global Newton method. We thus studied six concrete algorithms, two for each game representation. We tested these algorithms on instances of the Coffee Shop Bayesian game described in Example 6.3.4.
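The construction of Example 6.3.4 can be sketched as a plain adjacency structure. Several encoding choices here are assumptions rather than details taken from the example: "surrounding" blocks are taken to be the side-adjacent ones, "any other block" means every remaining block, and the no-entry node $O$ is given no neighbors. The sketch also compares a crude bound on stored utility values against the Bayesian normal form size $n[T(B+1)]^n$. All function names are hypothetical.

```python
def coffee_shop_bagg(r, k, T):
    """Action graph of the Coffee Shop BAGG on an r x k grid with T types.

    Returns (action_nodes, function_nodes, parents), where parents[node] lists
    the nodes with an edge into `node`. Hypothetical encoding: "surrounding"
    means side-adjacent blocks, "any other block" means blocks that are neither
    the block itself nor surrounding, and the no-entry node "O" has no parents.
    """
    blocks = [(x, y) for x in range(r) for y in range(k)]

    def adjacent(b, c):
        return abs(b[0] - c[0]) + abs(b[1] - c[1]) == 1

    def nodes_at(locations):
        # one action node per (location, type) pair; all types feed the aggregator
        return [("loc", c, t) for c in locations for t in range(T)]

    action_nodes = ["O"] + nodes_at(blocks)
    function_nodes, parents = [], {"O": []}
    for b in blocks:
        groups = {
            "p": [b],
            "p'": [c for c in blocks if adjacent(b, c)],
            "p''": [c for c in blocks if c != b and not adjacent(b, c)],
        }
        for name, locations in groups.items():
            function_nodes.append((name, b))
            parents[(name, b)] = nodes_at(locations)  # simple aggregator: a count
        for t in range(T):
            parents[("loc", b, t)] = [("p", b), ("p'", b), ("p''", b)]
    return action_nodes, function_nodes, parents


def stored_utility_bound(n, r, k, T):
    """Crude bound: each location node has 3 aggregator parents valued in {0..n}; O ignored."""
    return T * r * k * (n + 1) ** 3


def coffee_shop_normal_form_size(n, r, k, T):
    """n[T(B + 1)]^n with B = rk, as stated in Example 6.3.4."""
    return n * (T * (r * k + 1)) ** n


actions, functions, parents = coffee_shop_bagg(1, 3, 2)
print(len(actions), len(functions))           # 7 9
print(max(len(parents[a]) for a in actions))  # 3
print(stored_utility_bound(3, 1, 3, 2), coffee_shop_normal_form_size(3, 1, 3, 2))  # 384 1536
```

The maximum in-degree of action nodes is 3 regardless of grid size, which is the condition Theorem 6.3.3 needs. The normal-form size, by contrast, grows exponentially in $n$.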
We created games of different sizes by varying the number of players, the number of types per player and the number of locations. For each size we generated 10 game instances with random integer payoffs, and measured the running (CPU) times. Each run was cut off after 10 hours if it had not yet finished. All our experiments were performed using a computer cluster consisting of 55 machines with dual Intel Xeon 3.2GHz CPUs, 2MB cache and 2GB RAM, running Suse Linux 11.1.

We first tested the three approaches based on the Govindan-Wilson (GW) algorithm. Figure 6.3 shows running time results for Coffee Shop games with $n$ players, 2 types per player on a $2 \times 3$ grid, with $n$ varying from 3 to 7. Figure 6.4 shows running time results for Coffee Shop games with 3 players, 2 types per player on a $2 \times x$ grid, with $x$ varying from 3 to 10. Figure 6.5 shows results for Coffee Shop games with 3 players, $T$ types per player on a $1 \times 3$ grid, with $T$ varying from 2 to 8. The data points represent the median running time of 10 game instances, with the error bars indicating the maximum and minimum running times. All results show that our BAGG-based approach (BAGG-AF) significantly outperformed the two normal-form-based approaches (INF and NF-AF). Furthermore, as we increased the dimensions of the games, the normal-form-based approaches quickly ran out of memory (hence the missing data points), whereas BAGG-AF did not.

We also ran experiments on BAGG-AF and NF-AF using the simplicial subdivision algorithm. Figure 6.6 shows running time results for Coffee Shop games with $n$ players, 2 types per player on a $1 \times 3$ grid, with $n$ varying from 3 to 7. Again, BAGG-AF significantly outperformed NF-AF, and NF-AF ran out of memory for game instances with more than 4 players.

Chapter 7

Polynomial-time Computation of Exact Correlated Equilibrium in Compact Games

7.1 Introduction

So far we have focused on the AGG representation and its extensions.
For the remaining two technical chapters (this chapter and Chapter 8) we switch our attention to algorithms that work for a wide class of compact representations including AGGs. Specifically, we consider problems regarding correlated equilibrium (CE) [Aumann, 1974, 1987]. In this chapter we consider the problem of computing a sample correlated equilibrium. In Section 2.2.7 we gave an overview of the literature on this problem; in order to motivate our results in this chapter we first take a more in-depth look at some of the relevant papers.

The “Ellipsoid Against Hope” algorithm [Papadimitriou, 2005, Papadimitriou and Roughgarden, 2008] is a polynomial-time method for identifying (a polynomial-size representation of) a CE, given a game representation satisfying two properties: polynomial type and the polynomial expectation property, which requires access to a polynomial-time algorithm that computes the expected utility of any player under any mixed-strategy profile. Recall that most existing compact game representations discussed in Section 2.1.1 (including graphical games, symmetric games, congestion games, polymatrix games and action-graph games) satisfy these properties.

At a high level, the Ellipsoid Against Hope algorithm works by solving an infeasible dual LP (D) using the ellipsoid method (exploiting the existence of a separation oracle), and arguing that the LP (D′) formed by the generated cutting planes must also be infeasible. Solving the dual of this latter LP (which has polynomial size) yields a CE, which is represented as a mixture of the product distributions generated by the separation oracle. The Ellipsoid Against Hope algorithm is an instance of the black-box approach: it calls the expected utility subroutine as part of its separation oracle computation, but does not access the internal details of the representation.
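CE are defined by linear incentive constraints. For a small game given explicitly in normal form, a candidate distribution can therefore be checked exactly with rational arithmetic. The sketch below does this for the game of Chicken. It only illustrates the constraints and is not the Ellipsoid Against Hope algorithm, which works through the expected utility subroutine and avoids enumerating the exponentially many pure profiles of a compact game. Function and variable names are hypothetical.

```python
from fractions import Fraction


def is_correlated_equilibrium(utilities, action_sets, dist):
    """Exactly check the CE incentive constraints of an explicit normal-form game.

    utilities[i][a] is player i's payoff at pure profile a (a tuple); dist maps
    pure profiles to probabilities. For every player i and actions s, s', we need
    sum over a with a_i = s of dist[a] * (u_i(a) - u_i(s', a_-i)) >= 0.
    """
    for i, actions in enumerate(action_sets):
        for s in actions:
            for dev in actions:
                gain = Fraction(0)  # expected gain from switching s -> dev
                for a, p in dist.items():
                    if a[i] != s:
                        continue
                    deviated = a[:i] + (dev,) + a[i + 1:]
                    gain += p * (utilities[i][deviated] - utilities[i][a])
                if gain > 0:
                    return False
    return True


# Game of Chicken: D = dare, C = chicken; payoffs listed as (player 0, player 1).
payoffs = {("D", "D"): (0, 0), ("D", "C"): (7, 2), ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
utilities = [{a: v[i] for a, v in payoffs.items()} for i in range(2)]
action_sets = [["D", "C"], ["D", "C"]]
third = Fraction(1, 3)
mixture = {("D", "C"): third, ("C", "D"): third, ("C", "C"): third}
print(is_correlated_equilibrium(utilities, action_sets, mixture))                    # True
print(is_correlated_equilibrium(utilities, action_sets, {("C", "C"): Fraction(1)}))  # False
```

The uniform mixture over the three non-(D, D) profiles is a rational CE. The constraints hold with exact arithmetic, with no tolerance $\varepsilon$.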
7.1.1 Recent Uncertainty About the Complexity of Exact CE

In a recent paper, Stein, Parrilo and Ozdaglar [2010] raised two interrelated concerns about the Ellipsoid Against Hope algorithm. First, they identified a symmetric 3-player, 2-action game with rational¹ utilities on which the algorithm can fail to compute an exact CE. Indeed, they showed that the same problem arises on this game for a whole class of related algorithms. Specifically, if an algorithm (a) outputs a rational solution, (b) outputs a convex combination of product distributions, and (c) outputs a convex combination of symmetric product distributions when the game is symmetric, then that algorithm fails to find an exact CE on their game, because the only CE of their game that satisfies properties (b) and (c) has irrational probabilities. This implies that any algorithm for exact rational CE must violate (b) or (c). Second, Stein, Parrilo and Ozdaglar also showed that the original analysis by Papadimitriou and Roughgarden [2008] incorrectly handles certain numerical precision issues, which we now briefly describe. Recall that a run of the ellipsoid method requires as inputs an initial bounding ball with radius R and a volume bound v, such that the algorithm stops when the ellipsoid’s volume is smaller than v. To correctly certify the (in)feasibility of an LP using the ellipsoid method, R and v need to be set to appropriate values, which depend on the maximum encoding size of a constraint in the LP. However (as pointed out by Papadimitriou and Roughgarden [2008]), each cut returned by the separation oracle is a convex combination of the constraints of the original dual LP (D) and thus may require more bits to represent than any of the constraints in (D); as a result, the infeasibility of the LP (D′) formed by these cuts is not guaranteed.

¹ Throughout this chapter, by “rational” we mean rational numbers (ratios of integers) rather than rationality of players.
Papadimitriou and Roughgarden [2008] proposed a method to overcome this difficulty, but Stein et al. showed that this method is insufficient for finding an exact CE. For the related problem of finding an approximate correlated equilibrium (ε-CE), Stein et al. gave a slightly modified version of the Ellipsoid Against Hope algorithm that runs in time polynomial in log(1/ε) and the game representation size.² For problems that can have necessarily irrational solutions, it is typical to consider such approximations as efficient; however, the computation of a sample CE is not such a problem: since CE are defined by linear constraints, a game with rational utilities always has a rational CE. It remains an open problem to determine whether the Ellipsoid Against Hope algorithm can be modified to compute an exact, rational correlated equilibrium.³

7.1.2 Our Results

In this chapter, we use an alternate approach, completely sidestepping the issues just discussed, to derive a polynomial-time algorithm for computing an exact (and rational) correlated equilibrium given a game representation that has polynomial type and satisfies the polynomial expectation property. Specifically, our approach is based on the observation that if we use a separation oracle (for the same dual LP formulation proposed by Papadimitriou and Roughgarden [2008]) that generates cuts corresponding to pure-strategy profiles (instead of Papadimitriou and Roughgarden’s separation oracle, which generates nontrivial product distributions), then these cuts are actual constraints of the dual LP, as opposed to convex combinations of constraints. As a result we no longer encounter the numerical accuracy issues that prevented the previous approaches from finding exact correlated equilibria. Both the resulting algorithm and its analysis are also considerably simpler than the original: standard techniques from the theory of the ellipsoid method suffice to show that our algorithm computes an exact CE using a polynomial number of oracle queries.

The key issue is the identification of pure-strategy-profile cuts. It is relatively straightforward to show that such cuts always exist: since the product distribution generated by the Ellipsoid Against Hope algorithm ensures the nonnegativity of a certain expected value, a simple application of the probabilistic method shows that there must exist a pure-strategy profile that also ensures the nonnegativity of that expected value. The challenge is to go beyond this nonconstructive proof of existence and also compute pure-strategy-profile cuts in polynomial time. We show how to do this by applying the method of conditional probabilities [Erdős and Selfridge, 1973, Raghavan, 1988, Spencer, 1994], an approach for derandomizing probabilistic proofs of existence. At a high level, our new separation oracle begins with the product distribution generated by Papadimitriou and Roughgarden’s separation oracle, then sequentially fixes a pure strategy for each player in a way that guarantees that the corresponding conditional expectation given the choices so far remains nonnegative. Since our separation oracle goes through players sequentially, the cuts generated can be asymmetric even for symmetric games. Indeed, we can confirm (see Section 7.4.2) that it makes such asymmetric cuts on Stein, Parrilo and Ozdaglar’s symmetric game, thus violating their condition (c), as it must, because our algorithm always identifies a rational CE.

² An ε-CE is defined to be a distribution that violates the CE incentive constraints by at most ε.

³ In a recent addendum to their original paper, Papadimitriou and Roughgarden [2010] acknowledged the flaw in the original algorithm. We note also that Stein et al. subsequently withdrew their paper from arXiv. It is our belief that their results are nevertheless correct; we discuss them here because they help to motivate our alternate approach.
As with the Ellipsoid Against Hope algorithm and Stein et al.’s modified algorithm, our algorithm is a black-box algorithm that calls the expected utility subroutine. Another effect of our use of pure-strategy-profile cuts is that the correlated equilibria generated by our algorithm are guaranteed to have polynomial-sized supports; i.e., they are mixtures over a polynomial number of pure strategy profiles. Correlated equilibria with polynomial-sized supports are known to exist in every game (e.g., [Germano and Lugosi, 2007]); intuitively, this is because CE are defined by a polynomial number of linear constraints, so a basic feasible solution of the linear feasibility program has a polynomial number of nonzero entries. Such small-support correlated equilibria are more natural solutions than the mixtures of product distributions produced by the Ellipsoid Against Hope algorithm: because of their simpler form they require fewer bits to represent and fewer random bits to sample from. Furthermore, verifying whether a given polynomial-support distribution is a CE only requires evaluating the utilities of a polynomial number of pure strategy profiles, whereas verifying whether a mixture of product distributions is a CE would require evaluating expected utilities under product distributions, which is generally more expensive. No tractable algorithm had previously been proposed for identifying such a CE; thus ours is the first algorithm that computes, in polynomial time, a CE with polynomial support given a compactly-represented game. In fact, we show that any CE computed by our algorithm corresponds to a basic feasible solution of the linear feasibility program that defines CE, and is thus an extreme point of the set of CE of the game.
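To make the verification-cost claim concrete, here is a minimal Python sketch (our illustration, not code from this thesis). It assumes a hypothetical callback `utility(p, s)` returning $u^p_s$ for a pure profile `s` given as a tuple of 0-based action indices, and uses a toy game of Chicken whose payoffs are made up for illustration. Only the support profiles and their unilateral deviations are evaluated, so the number of utility calls grows with |support| · ∑_p |S_p| rather than with the number M of all pure profiles.

```python
from collections import defaultdict

# Hypothetical toy game of Chicken: action 0 = Dare, 1 = Chicken.
# PAYOFFS[s] = (utility of player 0, utility of player 1); values are illustrative.
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def support_is_ce(support, sizes, utility, tol=1e-9):
    """Check the CE incentive constraints (7.2.1) for a sparse distribution.

    support maps pure profiles (tuples) to positive probabilities; sizes[p] = |S_p|.
    Only support profiles and their unilateral deviations are evaluated.
    Returns (is_ce, number_of_utility_calls).
    """
    calls = 0
    gain = defaultdict(float)  # gain[(p, i, j)] = left-hand side of (7.2.1)
    for s, prob in support.items():
        for p, k in enumerate(sizes):
            base = utility(p, s)
            calls += 1
            for j in range(k):
                deviated = s[:p] + (j,) + s[p + 1:]
                gain[(p, s[p], j)] += (base - utility(p, deviated)) * prob
                calls += 1
    # Constraints (p, i, j) with no support profile where p plays i are 0 >= 0.
    return all(v >= -tol for v in gain.values()), calls


ok, calls = support_is_ce({(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}, [2, 2], chicken)
print(ok, calls)  # True 18
```

In a genuinely compact representation, `utility` would evaluate a single pure profile directly from the representation; the dictionary here only stands in for that.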
Since Papadimitriou and Roughgarden [2008] proposed the Ellipsoid Against Hope algorithm for computing a CE, researchers have proposed algorithms for related problems that use a similar approach (which we call the Ellipsoid Against Hope approach): first solve an infeasible LP using the ellipsoid method with some separation oracle, then argue that the LP formed by the cutting planes is also infeasible, and finally solve the dual of the latter, polynomial-sized LP. For example, Hart and Mansour [2010] considered the setting where each player initially knows only her own utility function, and proposed a communication procedure that finds a CE with polynomial communication complexity using a straightforward adaptation of the Ellipsoid Against Hope algorithm. Huang and Von Stengel [2008] proposed a polynomial-time algorithm for computing an extensive-form correlated equilibrium (EFCE) [von Stengel and Forges, 2008], a solution concept for extensive-form games, by applying the Ellipsoid Against Hope approach to the LP formulation of EFCE. For both algorithms, the separation oracle outputs a mixture of the original constraints, and hence the flaws of the Ellipsoid Against Hope algorithm pointed out by Stein et al. [2010] also apply. We show that our techniques can be adapted to these two algorithms, yielding exact solutions with polynomial-sized supports in both cases. In particular, we replace the original separation oracles with “purified” versions that output cutting planes corresponding to the original constraints, which ensures that the resulting algorithms avoid the numerical issues.

The rest of the chapter is organized as follows. We start with basic definitions and notation in Section 7.2. In Section 7.3 we summarize Papadimitriou and Roughgarden’s Ellipsoid Against Hope algorithm. In Section 7.4 we describe our algorithm and prove its correctness.
In Sections 7.5 and 7.6 we describe our fixes to Hart and Mansour’s [2010] and Huang and Von Stengel’s [2008] algorithms respectively, and Section 7.7 concludes. This chapter is based on published joint work with Kevin Leyton-Brown [2011]. New material that does not appear in [Jiang and Leyton-Brown, 2011] includes Sections 7.5 and 7.6.

7.2 Preliminaries

In this chapter and Chapter 8 we largely follow the notation of Papadimitriou [2005] and Papadimitriou and Roughgarden [2008], which has become standard in the literature on CE computation. This notation differs slightly from the one used in the previous (AGG-specific) chapters. Consider a simultaneous-move game with $n$ players. Denote a player by $p$, and player $p$’s set of pure strategies (i.e., actions) by $S_p$. Let $m = \max_p |S_p|$. Denote a pure strategy profile by $s = (s_1, \dots, s_n) \in S$, with $s_p$ being player $p$’s pure strategy. Denote by $S_{-p}$ the set of partial pure strategy profiles of the players other than $p$. Player $p$’s utility under pure strategy profile $s$ is $u^p_s$. We assume that utilities are nonnegative integers (but results in this chapter can be straightforwardly adapted to rational utilities). Denote the largest utility of the game by $u$. A correlated distribution is a probability distribution over pure strategy profiles, represented by a vector $x \in \mathbb{R}^M$, where $M = \prod_p |S_p|$. Then $x_s$ is the probability of pure strategy profile $s$ under the distribution $x$. A correlated distribution $x$ is a product distribution when it can be achieved by each player $p$ randomizing independently over her actions according to some distribution $x^p$, i.e., $x_s = \prod_p x^p_{s_p}$. Such a product distribution is also known as a mixed-strategy profile, with each player $p$ playing the mixed strategy $x^p$.
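The product-distribution formula $x_s = \prod_p x^p_{s_p}$ can be illustrated with a short Python sketch (ours; 0-based action indices and marginals given as plain lists are assumptions of the sketch). Materializing all $M = \prod_p |S_p|$ entries is exactly what compact representations avoid, so this is only sensible for tiny games.

```python
from itertools import product
from math import prod


def product_distribution(marginals):
    """Expand per-player marginals x^p into the correlated distribution x.

    marginals[p][a] is the probability that player p plays action a.
    Returns {s: x_s} with x_s = prod_p x^p_{s_p}, one entry per pure profile.
    """
    sizes = [len(m) for m in marginals]
    return {
        s: prod(marginals[p][a] for p, a in enumerate(s))
        for s in product(*(range(k) for k in sizes))
    }


x = product_distribution([[0.5, 0.5], [0.25, 0.75], [1.0, 0.0, 0.0]])
print(len(x))        # M = 2 * 2 * 3 = 12 profiles
print(x[(0, 1, 0)])  # 0.5 * 0.75 * 1.0 = 0.375
```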
Throughout this chapter we assume that a game is given in a representation satisfying two properties, following Papadimitriou and Roughgarden [2008]:

• polynomial type: recall from Section 2.1.1 that this means the number of players and the number of actions for each player are bounded by polynomials in the size of the representation.
• the polynomial expectation property: we have access to an algorithm that computes the expected utility of any player $p$ under any product distribution $x$, i.e., $\sum_{s \in S} u^p_s x_s$, in time polynomial in the size of the representation.

Definition 7.2.1. A correlated distribution $x$ is a correlated equilibrium (CE) if it satisfies the following incentive constraints: for each player $p$ and each pair of her actions $i, j \in S_p$,

$$\sum_{s \in S_{-p}} \left[u^p_{is} - u^p_{js}\right] x_{is} \ge 0, \qquad (7.2.1)$$

where the subscript “$is$” (respectively “$js$”) denotes the pure strategy profile in which player $p$ plays $i$ (respectively $j$) and the other players play according to the partial profile $s \in S_{-p}$. We write these incentive constraints in matrix form as $Ux \ge 0$. Thus $U$ is an $N \times M$ matrix, where $N = \sum_p |S_p|^2$. The rows of $U$, corresponding to the left-hand sides of the constraints (7.2.1), are indexed by $(p, i, j)$, where $p$ is a player and $i, j \in S_p$ are a pair of $p$’s actions. Denote by $U_s$ the column of $U$ corresponding to pure strategy profile $s$. These incentive constraints, together with the constraints

$$x \ge 0, \qquad \sum_{s \in S} x_s = 1, \qquad (7.2.2)$$

which ensure that $x$ is a probability distribution, form a linear feasibility program that defines the set of CE. The largest value in $U$ is at most $u$. We define the support of a correlated equilibrium $x$ as the set of pure strategy profiles assigned positive probability by $x$. Germano and Lugosi [2007] showed that for any $n$-player game, there always exists a correlated equilibrium with support size at most $1 + \sum_p |S_p|(|S_p| - 1) = N + 1 - \sum_p |S_p|$.
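For intuition about the shape of $U$, the following Python sketch (ours, not the thesis’s) builds the full $N \times M$ matrix of Definition 7.2.1 for a tiny game, stored as nested dicts keyed by $(p, i, j)$ and by profile $s$, and checks $Ux \ge 0$. The `utility(p, s)` callback and the Chicken payoffs are hypothetical stand-ins. It also evaluates the Germano and Lugosi support bound $N + 1 - \sum_p |S_p|$ for this game.

```python
from itertools import product

# Hypothetical toy game of Chicken (action 0 = Dare, 1 = Chicken).
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def incentive_matrix(sizes, utility):
    """Return (rows, profiles, U) with U[(p, i, j)][s] as in (7.2.1).

    The entry is u^p_s - u^p_{s with p switched to j} when s_p = i, and 0 otherwise,
    giving N = sum_p |S_p|^2 rows and M = prod_p |S_p| columns.
    """
    profiles = list(product(*(range(k) for k in sizes)))
    rows = [(p, i, j) for p, k in enumerate(sizes) for i in range(k) for j in range(k)]
    U = {}
    for p, i, j in rows:
        U[(p, i, j)] = {
            s: utility(p, s) - utility(p, s[:p] + (j,) + s[p + 1:]) if s[p] == i else 0
            for s in profiles
        }
    return rows, profiles, U


def is_ce(x, sizes, utility, tol=1e-9):
    """Check U x >= 0 for x given as {profile: probability} (missing entries = 0)."""
    rows, profiles, U = incentive_matrix(sizes, utility)
    return all(sum(U[r][s] * x.get(s, 0.0) for s in profiles) >= -tol for r in rows)


sizes = [2, 2]
rows, profiles, U = incentive_matrix(sizes, chicken)
print(len(rows), len(profiles))    # N = 8, M = 4
print(len(rows) + 1 - sum(sizes))  # support bound: 5
print(is_ce({(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}, sizes, chicken))  # True
print(is_ce({(0, 0): 1.0}, sizes, chicken))  # False
```

The rows with $i = j$ are identically zero; they are kept only so that the row count matches $N = \sum_p |S_p|^2$.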
Intuitively, correlated equilibria with such small supports are basic feasible solutions of the linear feasibility program for CE, i.e., vertices of the polyhedron defining the feasible region. Furthermore, these basic feasible solutions involve only rational numbers for games with rational payoffs (see e.g. Lemma 6.2.4 of [Grötschel et al., 1988]).

7.3 The Ellipsoid Against Hope Algorithm

In this section, we summarize Papadimitriou and Roughgarden’s [2008] Ellipsoid Against Hope algorithm for finding a sample CE, which can be seen as an efficiently constructive version of earlier proofs [Hart and Schmeidler, 1989, Myerson, 1997, Nau and McCardle, 1990] of the existence of CE. We will concentrate on the main algorithm and only briefly point out the numerical issues discussed at length by both Papadimitriou and Roughgarden [2008] and Stein et al. [2010], as our analysis will ultimately sidestep these issues. Papadimitriou and Roughgarden’s approach considers the linear program

$$\max \sum_{s \in S} x_s \quad \text{subject to} \quad Ux \ge 0,\ x \ge 0, \qquad (P)$$

which is obtained from the linear feasibility program for CE by replacing the constraint $\sum_{s \in S} x_s = 1$ from (7.2.2) with the maximization objective. (P) either has $x = 0$ as its optimal solution or is unbounded; in the latter case, taking a nonzero feasible solution and scaling it to be a distribution yields a correlated equilibrium. Thus one way to prove the existence of CE is to show the infeasibility of the dual problem

$$U^T y \le -1, \quad y \ge 0. \qquad (D)$$

The Ellipsoid Against Hope algorithm uses the following lemma, versions of which were also used by Nau and McCardle [1990] and Myerson [1997].

Lemma 7.3.1 ([Papadimitriou and Roughgarden, 2008]). For every dual vector $y \ge 0$, there exists a product distribution $x$ such that $xU^T y = 0$. Furthermore, there exists an algorithm that, given any $y \ge 0$, computes the corresponding $x$ (represented by $x^1, \dots, x^n$) in time polynomial in $n$ and $m$.
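Lemma 7.3.1 leaves the construction of $x$ abstract. For illustration only, here is one construction that satisfies the lemma’s condition; it is our own derivation and may differ in detail from Papadimitriou and Roughgarden’s. For a product $x$, expanding $xU^Ty$ shows that the term multiplying $u^p_{is}\,x^{-p}_s$ is $x^p_i \sum_j y_{p,i,j} - \sum_k x^p_k\, y_{p,k,i}$, so choosing each $x^p$ as a stationary distribution of the Markov chain on $S_p$ that moves from $i$ to $j$ at rate $y_{p,i,j}$ makes $xU^Ty = 0$. The Python sketch uses floating-point power iteration (the thesis works with exact rationals), stores `y` as a dict keyed by `(p, i, j)`, and includes a brute-force `dual_value` check that is exponential in $n$; the Chicken payoffs and the particular `y` are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical toy game of Chicken (action 0 = Dare, 1 = Chicken).
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def stationary(rates, iters=100_000, tol=1e-14):
    """Stationary distribution of the chain that jumps i -> j at rate rates[i][j].

    Power iteration on P = I + Q / c, where Q is the rate matrix and c exceeds
    every exit rate, so each state keeps a self-loop (the chain is aperiodic).
    """
    k = len(rates)
    out = [sum(rates[i][j] for j in range(k) if j != i) for i in range(k)]
    c = max(out) + 1.0
    x = [1.0 / k] * k
    for _ in range(iters):
        nxt = [x[j] * (1.0 - out[j] / c)
               + sum(x[i] * rates[i][j] / c for i in range(k) if i != j)
               for j in range(k)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def product_oracle(y, sizes):
    """Marginals x^1..x^n with x U^T y = 0 (one construction satisfying Lemma 7.3.1)."""
    return [stationary([[y.get((p, i, j), 0.0) for j in range(k)] for i in range(k)])
            for p, k in enumerate(sizes)]


def dual_value(marginals, y, sizes, utility):
    """Brute-force x U^T y = sum_s x_s (U_s)^T y; exponential, for checking only."""
    total = 0.0
    for s in product(*(range(k) for k in sizes)):
        x_s = prod(marginals[p][a] for p, a in enumerate(s))
        for p, k in enumerate(sizes):
            for j in range(k):
                dev = s[:p] + (j,) + s[p + 1:]
                total += y.get((p, s[p], j), 0.0) * (utility(p, s) - utility(p, dev)) * x_s
    return total


y = {(0, 0, 1): 1.0, (0, 1, 0): 3.0, (1, 0, 1): 2.0, (1, 1, 0): 0.5}
x = product_oracle(y, [2, 2])
print(x)                                  # approximately [[0.75, 0.25], [0.2, 0.8]]
print(dual_value(x, y, [2, 2], chicken))  # approximately 0
```

Note that if $y$ is symmetric across players, each player faces the same chain and the sketch returns identical marginals, consistent with the symmetry remark that follows the lemma.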
We will not discuss the details of the algorithm in Lemma 7.3.1; we will only need the facts that the resulting $x$ is a product distribution and can be computed in polynomial time. Note also that the resulting $x$ is symmetric if $y$ is symmetric. Lemma 7.3.1 implies that the dual problem (D) is infeasible (and therefore a CE must exist): $xU^T y$ is a convex combination of the left-hand sides of the rows of the dual, so for any feasible $y$ it would be less than or equal to $-1$, contradicting $xU^T y = 0$.

The Ellipsoid Against Hope algorithm runs the ellipsoid algorithm on the dual (D), with the algorithm from Lemma 7.3.1 as separation oracle, which we call the Product Separation Oracle. At each step of the ellipsoid algorithm, the separation oracle is given a dual vector $y^{(i)}$. The oracle then generates the corresponding product distribution $x^{(i)}$ and indicates to the ellipsoid algorithm that $(x^{(i)}U^T)y \le -1$ is violated by $y^{(i)}$. The ellipsoid algorithm will stop after a polynomial number of steps and determine that the program is infeasible. Let $X$ be the matrix whose rows are the generated product distributions $x^{(1)}, \dots, x^{(L)}$. Consider the linear program

$$[XU^T]\, y \le -1, \quad y \ge 0, \qquad (D')$$

and observe that the rows of $[XU^T]\,y \le -1$ are the cuts generated by the ellipsoid method. If we apply the same ellipsoid method to (D′) and use a separation oracle that returns the cut $x^{(i)}U^T y \le -1$ given query $y^{(i)}$, the ellipsoid algorithm would go through the same sequence of queries $y^{(i)}$ and cutting planes $x^{(i)}U^T y \le -1$ and return infeasible. Presuming that numerical problems do not arise,⁴ we will find that (D′) is infeasible. This implies that its dual, $[UX^T]\,\alpha \ge 0,\ \alpha \ge 0$, is unbounded and has polynomial size, and thus can be solved for a nonzero feasible $\alpha$. We can thus scale $\alpha$ to obtain a probability distribution. We then observe that $X^T\alpha$ satisfies the incentive constraints (7.2.1) and the probability distribution constraints (7.2.2), and is therefore a correlated equilibrium.
⁴ Since each row of (D′)’s constraint matrix $XU^T$ may require more bits to represent than any row of the constraint matrix $U^T$ for (D), running the ellipsoid algorithm on (D′) with the original bounding ball and volume lower bound for (D) would not be sound, and as a result (D′) is not guaranteed to be infeasible. Indeed, Stein et al. [2010] showed that when running the algorithm on their symmetric game example, (D′) would remain feasible, and thus the output of the algorithm would not be an exact CE. Furthermore, since the only CE of that game that is a mixture of symmetric product distributions is irrational, there is no way to resolve this issue without breaking at least one of the symmetry and product-distribution properties of the Ellipsoid Against Hope algorithm. For more on these issues and possible ways to address them, please see Papadimitriou and Roughgarden [2008, 2010] and Stein et al. [2010].

The distribution $X^T\alpha$ is the mixture of the product distributions $x^{(1)}, \dots, x^{(L)}$ with weights $\alpha$, and thus can be represented in polynomial space and efficiently sampled from. One issue remains. Although the matrix $XU^T$ is polynomial-sized, computing it using matrix multiplication would involve an exponential number of operations. On the other hand, the entries of $XU^T$ are differences between expected utilities that arise under product distributions. Since we have assumed that the game representation admits a polynomial-time algorithm for computing such expected utilities, $XU^T$ can be computed in polynomial time.

Lemma 7.3.2 ([Papadimitriou and Roughgarden, 2008]). There exists an algorithm that, given a game representation with polynomial type and satisfying the polynomial expectation property, and given an arbitrary product distribution $x$, computes $xU^T$ in polynomial time. As a result, $XU^T$ can be computed in polynomial time.
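The reasoning behind Lemma 7.3.2 rests on the identity $x_{is} = x^p_i\, x^{-p}_s$ for a product distribution, which gives $(xU^T)_{(p,i,j)} = x^p_i\,\big(E[u^p \mid p \text{ plays } i] - E[u^p \mid p \text{ plays } j]\big)$, with both expectations taken under $x^{-p}$. The Python sketch below (ours) therefore needs only $\sum_p |S_p|$ calls to an expected-utility routine. The `expected_utility` function is a brute-force stand-in for the representation-specific oracle guaranteed by the polynomial expectation property; it is exponential and only meant for a toy game with hypothetical Chicken payoffs.

```python
from itertools import product
from math import prod

# Hypothetical toy game of Chicken (action 0 = Dare, 1 = Chicken).
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def expected_utility(p, marginals, utility):
    """Brute-force stand-in for the polynomial expectation oracle."""
    sizes = [len(m) for m in marginals]
    return sum(utility(p, s) * prod(marginals[q][a] for q, a in enumerate(s))
               for s in product(*(range(k) for k in sizes)))


def pure(k, a):
    """Marginal that puts all probability on action a out of k."""
    return [1.0 if b == a else 0.0 for b in range(k)]


def x_times_UT(marginals, utility):
    """Row vector x U^T, keyed by (p, i, j), from sum_p |S_p| oracle calls."""
    row = {}
    for p, m in enumerate(marginals):
        k = len(m)
        # eu[a] = expected utility of player p for playing a against x^{-p}.
        eu = [expected_utility(p, marginals[:p] + [pure(k, a)] + marginals[p + 1:], utility)
              for a in range(k)]
        for i in range(k):
            for j in range(k):
                row[(p, i, j)] = m[i] * (eu[i] - eu[j])
    return row


row = x_times_UT([[0.3, 0.7], [0.6, 0.4]], chicken)
print(round(row[(0, 1, 0)], 6))  # 0.7 * ((2 - 0) * 0.6 + (6 - 7) * 0.4) = 0.56
```

Applying `x_times_UT` to each generated $x^{(i)}$ yields the rows of $XU^T$ without ever touching the $M$ columns of $U$ explicitly (apart from inside the brute-force stand-in).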
7.4 Our Algorithm

In this section we present our modification of the Ellipsoid Against Hope algorithm, and prove that it computes an exact CE. There are two key differences between our approach and the original algorithm for computing approximate CE:

1. Our modified separation oracle produces pure-strategy-profile cuts.
2. The algorithm is simplified, no longer requiring a special mechanism to deal with numerical issues (because pure-strategy-profile cuts can be represented directly as rows of (D)’s constraint matrix).

7.4.1 The Purified Separation Oracle

We start with a “purified” version of Lemma 7.3.1.

Lemma 7.4.1. Given any dual vector $y \ge 0$, there exists a pure strategy profile $s$ such that $(U_s)^T y \ge 0$.

Proof. Recall that Lemma 7.3.1 states that given a dual vector $y \ge 0$, a product distribution $x$ can be computed in polynomial time such that $xU^T y = 0$. Since $x[U^T y]$ is a convex combination of the entries of the vector $U^T y$, some entry of $U^T y$ must be nonnegative. In other words, there exists a pure strategy profile $s$ such that $(U_s)^T y \ge xU^T y = 0$.

The proof of Lemma 7.4.1 is a straightforward application of the probabilistic method: since $xU^T y$ is the expected value of $(U_s)^T y$ under distribution $x$, which we denote $E_{s \sim x}[(U_s)^T y]$, the nonnegativity of this expectation implies the existence of some $s$ such that $(U_s)^T y \ge 0$. Like many other probabilistic proofs, this proof is not efficiently constructive; note that there are an exponential number of possible pure strategy profiles. It turns out that for game representations with polynomial type and satisfying the polynomial expectation property, an appropriate $s$ can indeed be identified in polynomial time. Our approach can be seen as derandomizing the probabilistic proof using the method of conditional probabilities [Erdős and Selfridge, 1973, Raghavan, 1988, Spencer, 1994].
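On a toy game, the probabilistic-method argument of Lemma 7.4.1 can be checked directly by enumerating all $M$ pure profiles, which is precisely the exponential step that derandomization avoids. In the Python sketch below (ours), `column_value` computes $(U_s)^T y$ from the nonzero entries of column $U_s$, namely the rows $(p, s_p, j)$. The Chicken payoffs, the dual vector `y`, and the marginals are hypothetical; the marginals were chosen as a stationary product distribution for this `y`, so that $xU^Ty = 0$.

```python
from itertools import product
from math import prod

# Hypothetical toy game of Chicken (action 0 = Dare, 1 = Chicken).
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def column_value(s, y, sizes, utility):
    """(U_s)^T y = sum_p sum_j y[(p, s_p, j)] * (u^p_s - u^p_{s with p -> j})."""
    total = 0.0
    for p, k in enumerate(sizes):
        for j in range(k):
            dev = s[:p] + (j,) + s[p + 1:]
            total += y.get((p, s[p], j), 0.0) * (utility(p, s) - utility(p, dev))
    return total


def expectation(marginals, y, sizes, utility):
    """E_{s ~ x}[(U_s)^T y], which equals x U^T y for the product distribution x."""
    return sum(prod(marginals[p][a] for p, a in enumerate(s)) * column_value(s, y, sizes, utility)
               for s in product(*(range(k) for k in sizes)))


def brute_force_cut(y, sizes, utility):
    """Enumerate all M profiles and return one maximizing (U_s)^T y."""
    return max(product(*(range(k) for k in sizes)),
               key=lambda s: column_value(s, y, sizes, utility))


sizes = [2, 2]
y = {(0, 0, 1): 1.0, (0, 1, 0): 3.0, (1, 0, 1): 2.0, (1, 1, 0): 0.5}
marginals = [[0.75, 0.25], [0.2, 0.8]]
best = brute_force_cut(y, sizes, chicken)
print(expectation(marginals, y, sizes, chicken))    # approximately 0
print(best, column_value(best, y, sizes, chicken))  # (1, 0) 8.0
```

The maximum over profiles is at least the expectation, which is the whole content of the existence argument; the cost is a loop over all $M$ profiles.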
At a high level, for each player $p$ our algorithm picks a pure strategy $s_p$ such that the conditional expectation of $(U_s)^T y$ given the choices so far remains nonnegative. This requires us to compute the conditional expectations, but this can be done efficiently using the expected utility subroutine guaranteed by the polynomial expectation property.

Lemma 7.4.2. There exists a polynomial-time algorithm that, given

• an instance of a game in a representation satisfying polynomial type and the polynomial expectation property,
• a polynomial-time subroutine for computing expected utility under any product distribution (as guaranteed by the polynomial expectation property), and
• a dual vector $y \ge 0$,

finds a pure strategy profile $s \in S$ such that $(U_s)^T y \ge 0$.

Proof. Given a product distribution $x$, let $x_{(p \to s_p)}$ be the product distribution in which player $p$ plays $s_p$ and all other players play according to $x$. Since $x$ is a product distribution, $x_{(p \to s_p)} U^T y$ is the conditional expectation of $(U_s)^T y$ given that $p$ plays $s_p$, and furthermore we have, for any $p$,

$$xU^T y = \sum_{s_p \in S_p} \left[x_{(p \to s_p)} U^T y\right] x^p_{s_p}. \qquad (7.4.1)$$

Since $x^p$ is a distribution, the right-hand side of (7.4.1) is a convex combination, and thus there must exist an action $s_p \in S_p$ such that $x_{(p \to s_p)} U^T y \ge xU^T y \ge 0$. Since $x_{(p \to s_p)}$ is a product distribution, this process can be repeated for each player to yield a pure strategy profile $s$ such that $(U_s)^T y \ge xU^T y \ge 0$. This is formalized in Algorithm 5.

Algorithm 5. Computes a pure strategy profile $s$ such that $(U_s)^T y \ge 0$.
1. Given $y \ge 0$, identify a product distribution $x$ satisfying $xU^T y = 0$, using the algorithm described in Lemma 7.3.1.
2. Sequentially for each player $p \in \{1, \dots, n\}$:
   (a) iterate through actions $s_p \in S_p$, computing $x_{(p \to s_p)} U^T$ using the algorithm described in Lemma 7.3.2, until we find an action $s^*_p \in S_p$ such that $x_{(p \to s^*_p)} U^T y \ge 0$;
   (b) set $x$ to be $x_{(p \to s^*_p)}$.
3. The resulting $x$ corresponds to a pure strategy profile $s$. Output $s$.
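Algorithm 5 can be sketched in Python as follows (our illustration, not code from the thesis). The conditional expectations $x_{(p \to s_p)}U^Ty$ are assembled from expected utilities in the manner of Lemma 7.3.2. `expected_utility` is a brute-force stand-in for the representation’s oracle, the Chicken payoffs and `y` are hypothetical, and the starting marginals are assumed to come from a Lemma 7.3.1 oracle (here they are the stationary product distribution for this particular `y`, so $xU^Ty = 0$). A small tolerance replaces the exact comparison because the sketch uses floats.

```python
from itertools import product
from math import prod

# Hypothetical toy game of Chicken (action 0 = Dare, 1 = Chicken).
PAYOFFS = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def chicken(p, s):
    return PAYOFFS[s][p]


def expected_utility(p, marginals, utility):
    """Brute-force stand-in for the polynomial expectation oracle."""
    return sum(utility(p, s) * prod(marginals[q][a] for q, a in enumerate(s))
               for s in product(*(range(len(m)) for m in marginals)))


def pure(k, a):
    return [1.0 if b == a else 0.0 for b in range(k)]


def dual_value(marginals, y, utility):
    """x U^T y for a product distribution x, via expected utilities (Lemma 7.3.2)."""
    total = 0.0
    for p, m in enumerate(marginals):
        k = len(m)
        eu = [expected_utility(p, marginals[:p] + [pure(k, a)] + marginals[p + 1:], utility)
              for a in range(k)]
        total += sum(y.get((p, i, j), 0.0) * m[i] * (eu[i] - eu[j])
                     for i in range(k) for j in range(k))
    return total


def purify(marginals, y, utility, tol=1e-12):
    """Algorithm 5: fix each player's action in turn, keeping x U^T y >= 0."""
    x = [list(m) for m in marginals]
    for p in range(len(x)):
        k = len(x[p])
        for a in range(k):  # Step 2a: scan actions until the expectation stays >= 0
            candidate = x[:p] + [pure(k, a)] + x[p + 1:]
            if dual_value(candidate, y, utility) >= -tol:
                x = candidate  # Step 2b
                break
        else:
            raise ValueError("starting x did not satisfy x U^T y >= 0")
    return tuple(m.index(1.0) for m in x)  # Step 3: x is now a pure profile


y = {(0, 0, 1): 1.0, (0, 1, 0): 3.0, (1, 0, 1): 2.0, (1, 1, 0): 0.5}
s = purify([[0.75, 0.25], [0.2, 0.8]], y, chicken)
print(s)  # (0, 1)
```

Each player triggers at most $|S_p|$ calls to `dual_value`, which mirrors the running-time argument for Step 2a; the brute-force oracle is the only exponential part of the sketch.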
We now consider the running time of Algorithm 5. We observe that $x$ remains a product distribution throughout the algorithm and can thus be represented by its marginals $x^1, \dots, x^n$, requiring only polynomial space. Due to the polynomial expectation property, the algorithm described in Lemma 7.3.2 runs in polynomial time, which implies that in Step 2a, for each $s_p \in S_p$, $x_{(p \to s_p)}U^T$ can be computed in polynomial time. Since Step 2a requires at most $|S_p|$ such computations, and since polynomial type implies that $n$ and $|S_p|$ are polynomial in the input size, the algorithm runs in polynomial time. A straightforward corollary is the following:

Corollary 7.4.3. Algorith
Item Metadata
Title: Representing and reasoning with large games
Creator: Jiang, Xin
Publisher: University of British Columbia
Date Issued: 2011
Genre: Thesis/Dissertation
Type: Text
Language: eng
Date Available: 2012-01-09
Provider: Vancouver : University of British Columbia Library
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
DOI: 10.14288/1.0052175
URI: http://hdl.handle.net/2429/39951
Degree: Doctor of Philosophy - PhD
Program: Computer Science
Affiliation: Science, Faculty of; Computer Science, Department of
Degree Grantor: University of British Columbia
GraduationDate: 2012-05
Campus: UBCV
Scholarly Level: Graduate
Rights URI: http://creativecommons.org/licenses/by-nc-nd/4.0/
AggregatedSourceRepository: DSpace