UBC Theses and Dissertations

Essays in political economy and on networks Canen, Nathan Joseph 2018

Essays in Political Economy and on Networks

by

Nathan Joseph Canen

BEcon, Fundação Getulio Vargas (FGV/EPGE), 2011
MEcon, Fundação Getulio Vargas (FGV/EPGE), 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

The Faculty of Graduate and Postdoctoral Studies
(Economics)

The University of British Columbia
(Vancouver)

September 2018

© Nathan Joseph Canen, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Essays in Political Economy and on Networks, submitted by Nathan Joseph Canen in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics.

Examining Committee:
Francesco Trebbi, Economics, Supervisor
Paul Schrimpf, Economics, Supervisory Committee Member
Sergei Severinov, Economics, University Examiner
Richard Johnston, Political Science, University Examiner
Arianna Degan, Economics, UQAM, External Examiner

Abstract

This thesis studies topics in political economy and the economics of networks.

In Chapter 2, we present and structurally estimate a model of endogenous network formation and legislative activity of politicians. Employing data on social and legislative effort of members of the 105th-110th U.S. Congresses (1997-2009), we find that there are substantial complementarities between the efforts of politicians, both within and across parties.

Chapter 3 considers the econometrics of incomplete information games on networks. This chapter develops a tractable empirical model of linear interactions where each agent, after observing part of his neighbors’ types, not knowing the full network of how information is transmitted, uses linear best responses. This allows the researcher to perform asymptotic inference without having to observe all the players in the game or having to know precisely the sampling process.
The usefulness of this procedure is shown with an application to the provision of public goods across municipalities in Colombia.

Chapter 4 studies the sources of party polarization in the U.S. Congress. Polarization is not just the result of changes in the ideology of individual legislators, but also of changes in the ability of political parties to discipline (whip) their members and of the deliberate agenda setting by their leadership. This chapter evaluates quantitatively the importance of these three components in driving polarization through a novel identification approach based on previously untapped whip count data and a structural model of legislative activity.

In the final chapter, I turn my attention to the voters’ side in political economy models. Surveys, polling data and media reports indicate that voters often choose whom to vote for at different stages in the political campaign. I develop a model of costly information acquisition that rationalizes these observations. The model implies a key tradeoff between the cost of acquiring information and the gain such information brings. Under this framework, I show that information blackouts (i.e. forbidding the release of campaigning or polling information before the election) generate welfare losses of around 1-2%.

Lay Summary

This thesis is based on a series of contributions to the fields of political economy and the economics of networks. In Chapter 2, we explore how politicians form social connections in Congress. We provide an empirical model that allows an in-depth analysis of these strategic decisions. Chapter 3 provides methodological contributions to the study of strategic interactions when agents may be connected on a network. In particular, we focus on tools for models of incomplete information where agents are better informed about their neighbors than about others.
Leveraging new data and a new theoretical framework, Chapter 4 shows how observed polarization may be driven by three separate factors: the polarization of individual ideologies, changes in the legislative agenda, and party discipline. We show how to empirically decompose its sources. Finally, I look at the effect of the timing of information on voters’ decisions through an empirical model of information accumulation.

Preface

Chapter 2 of this thesis is coauthored with Matthew O. Jackson and Francesco Trebbi. All coauthors contributed equally to all aspects of the project.

Chapter 3 of this thesis is based on joint work with Kyungchul Song and Jacob Schwartz. All coauthors contributed equally to all aspects of the project, including developing the model, the Monte Carlo simulations and the empirical example. Part of this work also appears in Jacob Schwartz’s Ph.D. dissertation at the University of British Columbia. He has given his consent to sharing this chapter.

Chapter 4 of this thesis is coauthored with Chad Kendall and Francesco Trebbi. All coauthors contributed equally to all aspects of the project.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Endogenous Networks and Legislative Activity
  2.1 Introduction
    2.1.1 Relation to the Literature
  2.2 The Model
    2.2.1 Legislators, Parties, and Partisanship
    2.2.2 Legislative effort and Preferences
    2.2.3 A Micro-Foundation Built Upon Reelection Preferences
    2.2.4 Solving For Equilibrium
    2.2.5 Pareto Efficient Efforts
    2.2.6 Preliminaries to Estimation
  2.3 Data
  2.4 Estimation
    2.4.1 Moment Equations
    2.4.2 Estimation via GMM
  2.5 Results
    2.5.1 Fit and Discussion
  2.6 Assessment and Counterfactuals
    2.6.1 Stability of Equilibrium and Other Equilibrium Properties
    2.6.2 Counterfactuals
  2.7 Conclusions
3 Estimating Local Interactions Among Many Agents Who Observe Their Neighbors
  3.1 Introduction
  3.2 Strategic Interactions with Information Sharing
    3.2.1 A Model of Interactions with Information Sharing
    3.2.2 Predictions from Rationality
    3.2.3 Belief Projection and Best Linear Responses
    3.2.4 The External Validity of Network Externality
  3.3 Econometric Inference
    3.3.1 General Overview
    3.3.2 Asymptotic Theory
  3.4 A Monte Carlo Simulation Study
  3.5 Empirical Application: State Presence across Municipalities
    3.5.1 Motivation and Background
    3.5.2 Empirical Set-up
    3.5.3 Model Specification
    3.5.4 Results
  3.6 Conclusion
4 Unbundling Polarization
  4.1 Introduction
  4.2 Model
    4.2.1 Preliminaries
    4.2.2 Preferences
    4.2.3 Information and Timing
  4.3 Analysis
    4.3.1 Roll Call Votes
    4.3.2 Whip Decisions
    4.3.3 The Whip Count
    4.3.4 Optimal Policy Choices
    4.3.5 The Whip Count and Bill Pursuit Decisions
  4.4 Data
  4.5 Transition to Estimation and Identification
    4.5.1 Identification of the Model
    4.5.2 Krehbiel’s critique: Lack of identification of $\theta_i$ and of party effects without whip counts
  4.6 Estimation
  4.7 Results
  4.8 Counterfactuals
    4.8.1 The importance of party discipline for the approval of legislation
  4.9 Conclusion
  4.10 Tables and Figures
5 Information Accumulation and the Timing of Voting Decisions
  5.1 Introduction
  5.2 Model
    5.2.1 Timing
    5.2.2 Environment - Voters
    5.2.3 Political Parties and Voting
    5.2.4 Voting Decisions
    5.2.5 Results and Solution
  5.3 Moving to the Data
  5.4 Data
  5.5 Identification
  5.6 Estimation
  5.7 Results
  5.8 Extensions
    5.8.1 Different Distributions for $\eta$
  5.9 Model Fit and Specification Tests
  5.10 Policy Implications - Pre-election Silence (Blackouts)
  5.11 Conclusion
Bibliography
A Appendix to Chapter 2
  A.1 Proofs
    A.1.1 Best Response Dynamics
  A.2 Parametric Identification, with set identification of $\phi$
    A.2.1 The need for exponential measurement errors even when $\alpha$ is parametric
  A.3 Rewriting the Model in terms of Moment conditions over $i$
  A.4 Details on Estimation
    A.4.1 OLS and plug-in Approach as Starting Values for Optimization
    A.4.2 Computation of the Optimal Weighting Matrix and of Standard Errors
  A.5 Computation of Comparative Statics
  A.6 Identification and Estimation using second moments of the proxies of $(s_i^*, x_i^*)$
    A.6.1 Estimation with Second Moment Conditions on the Proxies of $s_i^*, x_i^*$
    A.6.2 Results under Restrictions on the Second Moments of the Proxies for $(s_i^*, x_i^*)$
  A.7 Identification with Nonparametric $\alpha$
B Appendix to Chapter 3
  B.1 Construction of the Estimator for the Asymptotic Covariance Matrix
  B.2 Proofs of Theorems 3.2.1 - 3.3.1
C Appendix to Chapter 4
  C.1 Proofs
  C.2 The choices from Party “R”
    C.2.1 Agenda Setting
D Appendix to Chapter 5
  D.1 Additional Tables and Figures
  D.2 Proofs
  D.3 Israeli Political System
    D.3.1 Israel in 2006
  D.4 First Stage Estimation of the Priors
  D.5 Optimization Routine

List of Tables

Table 2.1 Summary Statistics
Table 2.2 Main Results, Specification 1
Table 2.3 Differences in the Distributions of $\alpha_i$ Across Parties
Table 2.4 Main Results, Specification 2
Table 2.5 Model Fit: Correlation of Estimated Network of the Model to the Cosponsorship Networks in the Data
Table 2.6 Estimated and High Effort Equilibria
Table 2.7 Counterfactual in $\gamma$: Predicted Probability of Bill Approval
Table 2.8 Counterfactuals in $c$: Predicted (Proportional) Change in the (Mean) Probability of Bill Approval
Table 2.9 Counterfactuals in $\alpha$: Looking at the Changes in (Ex-Ante) predicted probability of Emergency Crisis bills in the 110th Congress, if the Republicans who lost their seats remained
Table 3.1 The Characteristics of the Payoff Graphs
Table 3.2 The Degree Characteristics of the Graphs Used in the Simulation Study
Table 3.3 The Empirical Coverage Probability and Average Length of Confidence Intervals for $\beta_0$ at 95% Nominal Level
Table 3.4 The Empirical Coverage Probability and Average Length of Confidence Intervals for $a'\rho_0$ at 95% Nominal Level
Table 3.5 State Presence and Networks Effects across Colombian Municipalities
Table 4.1 Summary Statistics on Bill Selection
Table 4.2 Number of Whips per Party
Table 4.3 Main Estimates from the First Step
Table 4.4 Decomposition of Polarization in Ideologies and Whipping
Table 4.5 Distance of $\gamma_1, \gamma_2$ to the Party Medians
Table 4.6 Model Fit
Table 4.7 Likelihood Ratio Test for Constant ymax
Table 4.8 Counterfactual: Voting Outcomes on Salient Bills
Table 5.1 When Did You Finally Decide to Vote for the Party? 2006 survey
Table 5.2 Answer to “Israeli condition in general”, Pre-Election survey
Table 5.3 Different Measurements for Policy Vectors
Table 5.4 Results of the Structural Model
Table 5.5 Results of Extensions to the Structural Model
Table A.1 Results, Specification 1, second moments of the $(s_i^*, x_i^*)$ proxy
Table D.1 Summary statistics
Table D.2 Distribution of Votes and Seats in the Knesset, 2006 Elections
Table D.3 How Voters Are Deciding
Table D.4 Israeli Civilian Fatalities in Terrorist Attacks, 2003-2005
Table D.5 Who stops earlier?

List of Figures

Figure 2.1 Total Number of Cosponsorships per Congressional cycle
Figure 2.2 Number of Cosponsorships Within and Across Parties per Congressional cycle
Figure 2.3 Correlation between the raw data of log(1+Words) in Floor Speeches and Cosponsorship decisions
Figure 2.4 Correlation between the raw data of Roll Call Effort and Cosponsorship decisions
Figure 2.5 Distribution of Estimated $\alpha$ over Time
Figure 2.6 Estimated Probability of Approval - Democrats, Congresses 105-110
Figure 2.7 Estimated Probability of Approval - Republicans, Congresses 105-110
Figure 3.1 Network Externality Comparison Between Equilibrium and Behavioral Models: Erdős-Rényi Graphs
Figure 3.2 Network Externality Comparison Between Equilibrium and Behavioral Models: Barabási-Albert Graphs
Figure 3.3 Degree Distribution of $G_P$
Figure 3.4 Average Network Externality from being a Department Capital
Figure 4.1 Timeline
Figure 4.2 Example of Value Functions
Figure 4.3 Votes with the Majority Party at Whip Counts and Roll Calls
Figure 4.4 Estimated ideologies, $\theta_i$, per Party over Time
Figure 4.5 Estimated Ideologies from the Model $\theta_i$ compared to DWNominate
Figure 4.6 Estimated ymax per Party over Time
Figure 5.1 Distribution of Estimated Priors (Precision)
Figure 5.2 Model Fit

Acknowledgments

I am deeply indebted to my supervisor, Francesco Trebbi, for all of his dedication and support. I am particularly thankful for his immense generosity, including the many hours we spent exchanging ideas, from which I learned a great deal and which were invaluable to my growth. I would also like to thank my committee members, Hiro Kasahara and Paul Schrimpf, for their continuous support throughout the years. I am also very grateful to my examining committee, Richard Johnston, Sergei Severinov and, especially, Arianna Degan, for their valuable comments and thorough reading of this dissertation.

I have benefited from the support of many other UBC faculty and staff. In particular, I would like to thank Maureen Chin for all her help. Furthermore, I am grateful for the generous funding from UBC and the Vancouver School of Economics.

I have learned much from my other co-authors, Matthew O. Jackson, Chad Kendall, Jacob Schwartz and Kevin Song, to whom I would like to express my gratitude. Our long conversations have given me a deeper understanding of the nature of economics and econometric models. I take these lessons on with me. Support has also come from my friends and colleagues. I couldn’t have wished for a better environment within the School of Economics. I would particularly like to thank Anderson, Anujit, Denis, Gaëlle, Hugo, Jacob, João, Pierluca, Rogerio, Tom and Tzu-Ting, who have shared this path with me at different times. To my friends outside of economics, I owe so much to your emotional support.

Last but not least, I would like to thank my family.
My parents, Ana and Alberto, and my sister, Doris, have always been present, encouraging and nurturing. They have inspired me to pursue my dreams, to do my best, while grounded in what really matters. The Fortes-Kursan family provided me with a family away from home. Finally, thanks to Cadi, who has been with me through the best and the worst.

Chapter 1
Introduction

This thesis explores topics in political economy and the economics of networks. In particular, it investigates questions related to the organization of political institutions, such as the social networks formed by politicians, the organization and disciplining of politicians by their party leaders, and the differential impact of information on the timing of voting decisions. To do so, I also introduce new methodologies and new data that can address these questions.

The second chapter of my dissertation, in joint work with Matthew O. Jackson and Francesco Trebbi, addresses how politicians form networks in Congress. These networks can be beneficial for the approval of a politician’s legislation, as they can gather support from their own network, which may help to pass a bill. Previous work had focused on analyzing the shape of the network of politicians, with connections measured by bill cosponsorships. However, the decision of whom to connect with is itself a choice, taken while anticipating future returns of such connections. In our model, we characterize the choices of social effort (used to connect with others) and legislative effort, when passing a bill requires both. The model provides a rich set of testable predictions. These include who becomes more connected, how the amount of effort changes according to the cost of effort, as well as the impacts of changing the composition of Congress.
The model is then estimated, and we discuss how changes to the composition of Congress can affect the passage of major bills, such as the Emergency Economic Stabilization Act in the 110th Congress.

Chapter 3 provides new methodological tools for the estimation of such models. This chapter is entitled “Estimating Local Interactions Among Many Agents Who Observe Their Neighbors” and is based on joint work with Jacob Schwartz and Kyungchul Song. We study games on networks when there might be incomplete information: agents are better informed about their neighbors than about those to whom they are not connected. Our results provide an estimator and statistical tests with good properties. One key issue addressed with our methodology is that usual sampling techniques do not work with networks: a random sample of a network will look very different from the original network studied. This is because each member’s connections are unlike another’s. Another key issue addressed is how to estimate models of games on networks, even when the researcher does not observe the full network in the data. This has multiple applications in political economy, as illustrated in an empirical application to the provision of public goods across municipalities in Colombia.

Having studied the choices of individual legislators in a previous chapter, Chapter 4 then studies the decisions and incentives within parties. This is done with a particular focus on understanding the sources of polarization, in joint work with Chad Kendall and Francesco Trebbi. We quantify the importance of agenda setting, drifting ideologies and increasing party discipline in explaining observed polarization in roll call votes. Separately identifying such effects is nontrivial: we only observe outcomes that are a result of all of them together. To solve this, we use new data with a new theoretical framework.
The new data come from whip counts conducted by party leaders, which gauge legislators’ positions on a bill before they actually vote on it on the floor of Congress. We model the inner workings of party functioning in the U.S. Congress, with a theoretical model that addresses both agenda setting and party discipline. Crucially, our empirical model allows us to disentangle party effects from party discipline, agenda setting and ideology.

My final chapter then looks at voters’ decisions. I start from the observation that voters make decisions on whom to vote for at different points in time. While many of them decide whom to vote for on the last day or in the last week of the campaign, others know all along whom they are going to vote for. This means they are willing to commit to their decision, even when they might learn useful new information in the future. To explain these differences, I propose a model of costly information accumulation. For some voters, stopping early is due to high costs of information, while for others, the benefit of information is small. The model allows me to disentangle what drives the stopping times observed for different subgroups in the data. I then look at the impacts of a blackout policy, under which information is withheld from voters in the day or week before the election. Such a policy, although aiming for fairness, harms voters as it affects only those who still wish to accumulate information.

Chapter 2
Endogenous Networks and Legislative Activity¹

2.1 Introduction

Deliberative bodies, especially larger ones, rely on informal interactions in order to function productively. Information percolates through informal political networks and individuals form relationships with each other in order to craft and pass legislation.
Because of the salient role of interpersonal ties in the legislative process, their study dates back at least to the 1930s ([154]), but only in the last fifteen years has this area of research risen to greater prominence ([115]).

¹ This chapter is joint work with Matthew O. Jackson and Francesco Trebbi.

The challenge of simultaneously modeling network formation and political decision-making may be a reason for this delayed uptake, and it still presents a significant hurdle for advancing our understanding of how legislatures internally operate. The construction and analysis of the model presented here helps fill this gap.

Our model generalizes the tractable and powerful framework of [36]. In their model, agents (who in our setting are to be thought of as elected representatives) choose both how much socializing to do with other politicians and how much legislative effort to exert. Socializing efforts result in randomly formed relationships that increase the success of legislative efforts, and so social and legislative efforts are complements. Importantly, social and legislative efforts are also complementary to those of the other politicians with whom a given politician has ties, both within and outside of his/her party. In [36], however, relationships form completely at random. This misses realistic biases in interaction (such as homophily) that characterize many social environments, especially ones political in nature. In our generalization, we allow social ties to form at a different rate within compared to across groups, so that, for instance, legislators can collaborate with members of their own party at a different rate than with members of the opposition.

We structurally estimate our model employing data on cosponsorship and legislative effort of members of the House of Representatives from the 105th-110th U.S. Congresses.

A first empirical finding is that the complementarities among politicians are significant and stable across our sample period.
The social marginal multiplier on legislative effort is estimated to be between a tenth and a third of the direct incentive for legislative effort.² This means that a nontrivial fraction of incentives for efforts of politicians are driven by what other politicians are doing. In summary, socializing appears vital to bill passage and, in addition, this has been consistently true over time.

We then examine differences between Democrats and Republicans. We find that the two parties have different base payoffs from passing legislation, both in terms of average and variance across party members (both are higher for the Democrats). These differences lead to higher levels of both social and legislative effort by Democrats, all else held equal.

Further, our generalization allows social interaction to be biased towards one’s own party. This turns out to be quantitatively important, as it allows for asymmetries in behavior across political parties, which clearly appear in the congressional data we explore in our application. Specifically, we find evidence that partisan bias is an empirically relevant feature and that a model with biased interactions fits the data significantly better than a model with no bias. We also show evidence, however, that social interaction in the U.S. Congress is far from being an exclusively partisan affair. The data appear more nuanced than the common narrative of a completely balkanized Congress, segregated along party lines, that has emerged from recent literature mostly based on post-1980 congressional roll call evidence ([128]; [69]). Intermediate levels of partisanship in our environment (for example, a partisan bias in the range of 8-10 percent, which we can estimate in an extended version of our model) fit the data better than a fully partisan model in which 100 percent of interactions are exclusively within party.
Intuitively, it is hard to reconcile the thousands of bipartisan cosponsorships observed in recent congressional data with a hypothesis of unmitigated polarization between parties. Our conjecture is that the stark posturing required by the media, the focus on divisive language ([78], [77]), and some metrics of formal political activity may miss some dimensions of bipartisan interaction that may take place more informally among legislators.

² As will be clear in the analysis that follows, the parameter multiplying the full product of social and legislative efforts ranges from 0.05 to 0.08, which when multiplied by other legislators’ efforts of 3 to 4.5, and social efforts around 1, leads to a multiplier of 0.1 to 0.32. This is compared to direct incentives for legislative activity ranging from 1.0 to 1.4.

In the final part of this chapter, we assess the specific equilibrium at play within each Congress among the multiple equilibria typically present in this class of games.³ We show that the estimated equilibria are interior and stable and that social effort is (inefficiently) under-provided in all Congresses.

Finally, our estimated model enables us to perform counterfactuals. We show large responses to changes in the relative costs of socializing. In addition, we examine whether the amount of legislation that was generated in the emergency response to the 2008-09 financial crisis would have changed if the Democrats had not taken over the House and it had instead remained as it was before the crisis.
Here we find a quantitatively small change (between 3 and 5 percent) in the amount of legislation: even though the preferences changed, the difference would be in the fractions of social to legislative efforts, but not in the end outcome of this particular episode.

³ For a complete review of approaches to empirical models with multiple equilibria, see [57].

2.1.1 Relation to the Literature

From the theoretical perspective, this chapter contributes to a literature that examines peer-influenced behavior when accounting for the endogeneity of networks.⁴ In particular, we generalize the model of [36] to include biases (homophily) in network formation. The partisan biases in interactions matter significantly in our empirical application, and should also matter in a variety of other applications where social interaction may have in-group versus out-group components.

⁴ See [83], [82], [121], [96], [15], [97] (and see [94], [95], [98] for surveys of the network formation and games on networks literatures).

This chapter also contributes to the earlier literature that showed that social networks matter in legislative environments. For instance, [74] used a connectedness measure based on cosponsorship to show that more connected members of Congress are able to get more amendments approved and have more success on roll call votes on their sponsored bills.⁵ Again using cosponsorship links, [48] show that Congress can be understood as a small-world network, as it appears subdivided in multiple dense parts tied together by some intermediaries. These network features correlate with legislative productivity over time (number of important laws passed, as defined by [123]).⁶

⁵ A similar study is [175].

⁶ There is other work on cosponsorship. For example, [8], [109], and [31] study the incentives for cosponsoring in different settings (focusing on ideological similarity, tenure, etc.). Beyond their role in social networks, [170] study the signaling content of cosponsorships, noting that cosponsorship is a cheap way of signaling to the median voter about one's congressional activity. They identify three different explanations for cosponsorships and their possible signaling impact: (i) bandwagoning (signaling strong support for the bill), (ii) ideology, and (iii) expertise. They find a null to moderate effect of cosponsorship on bill success, measured as successive progress of the bills through Congress hurdles. [104] instead point out that the timing of cosponsorships would indicate that it is not so much a signal to voters as to other politicians (for example, they show that extremists seem to cosponsor earlier). Still in the context of bill sponsorships, [10] find correlations with legislative productivity (i.e. the bill passing through different stages in Congress) for Congress members who sponsor more bills and use more floor time (albeit at a declining marginal rate).

The network analysis of legislation is growing, and provides increasing evidence that social relationships matter substantially and are causal in nature. For example, [106] shows a correlation between bill survival and weak ties of the sponsor for eight state legislatures and for the US House of Representatives. [50] employ identification restrictions aimed at ascertaining causal effects of networks on voting behavior (using the quasi-random seating arrangements of Freshman Senators). [70] studies the role of exogenously shifted social connections within the European Parliament, also using seating arrangements.

Importantly, all of these papers take networks as exogenous. In contrast, our endogenous analysis of the network enables us to see how incentives to socialize differ across parties and have changed over time.

The rest of this chapter is organized as follows. Section 2.2 derives the empirical model and discusses identification. Section 2.3 presents the data. Section 2.4 illustrates the estimation and the moment conditions. The estimation results and the assessment of the model's fit are reported in Section 2.5. Section 2.6 contains the main counterfactuals and discusses the equilibria of the model from an empirical perspective. Section 2.7 concludes.

2.2 The Model

2.2.1 Legislators, Parties, and Partisanship

The legislature, henceforth referred to as "Congress", is composed of $n$ members and, for simplicity, we focus on one chamber (e.g. the House). $\mathcal{N} = \{1, 2, \ldots, n\}$ represents the set of politicians in Congress. Clearly, although we refer to Congress, the model applies to a variety of deliberative bodies, legislatures, committees, and organizations more generally.

The set of politicians is partitioned into parties, with a generic party denoted $P_\ell$.

Each party $P_\ell$ has a level of partisanship $p_\ell$. In particular, members of party $\ell$ spend a fraction of their interaction, $p_\ell$, at exclusively party $\ell$ events, so only mixing and meeting with own-party members, and the remainder, $1 - p_\ell$, at events in which they mix with members of other parties. This can include party and caucus meetings, joint sessions, fund-raising events, committee work, social gatherings and formal events, etc. For our period of interest, examples of party-specific events are closed sessions called Party Conferences for Republicans and Party Caucuses for Democrats (their respective chairs represent the number 3 position of official party leadership rankings).

Politician $i$ is from party $P(i)$, and $p(i)$ denotes the level of partisanship of politician $i$'s party.

In our empirical analysis, there are two parties, 1 and 2, and we index the $n$ politicians so that the first $q$ of them belong to $P_1 = \{1, \ldots, q\}$ and the remainder to $P_2 = \{q+1, \ldots, n\}$. Let $q \geq n/2$, so that party 1 is the majority party.

Socializing

Each politician chooses an effort level of how much he or she socializes (i.e., how much time he or she spends interacting with other politicians), denoted $s_i \in \mathbb{R}_+$.
It is via this socializing that he or she forms connections with other politicians.

The network $\{g_{i,j}\}_{i,j \in \mathcal{N}}$ that arises from the vector of social efforts, $s$, is described by:⁷

$$g_{ij}(s) = s_i s_j m_{ij}(s), \tag{2.1}$$

where if $j \in P(i)$ then

$$m_{ij}(s) = \frac{p(i)\,p(j)}{\sum_{k \in P(i),\, k \neq i} p(k)\, s_k} + \frac{(1 - p(i))(1 - p(j))}{\sum_{k \neq i} (1 - p(k))\, s_k}, \tag{2.2}$$

and if $j \notin P(i)$ then

$$m_{ij}(s) = \frac{(1 - p(i))(1 - p(j))}{\sum_{k \neq i} (1 - p(k))\, s_k}. \tag{2.3}$$

So, politicians meet their own party members in two ways: at their own events and at bipartisan events. They meet members of the other party only at the bipartisan events. Politicians are met with the relative frequency with which they are present at events.

⁷ This is for the case in which $s_j > 0$ for at least two people in each party. If other agents are not putting in social effort, then there can be nobody to match with, and then some of these equations do not apply (they would divide by 0, for instance if all $s$ are 0). In those cases the matching is described as follows. If at most one $s_j > 0$, then set $m_{ij} = 0$ for all $ij$ and the entire network equal to 0. If there are at least two people with $s_j > 0$, but also at least one party with $s_j > 0$ for no more than one agent, then set $m_{ij} = g_{i,j} = 0$ for all members of a party that has no more than one $s_j > 0$, but use the remaining equations specified above for the $g_{i,j}$'s for any other combinations.

In the Appendix, we show that:

$$\sum_{j \neq i} s_i s_j m_{ij}(s) = s_i, \tag{2.4}$$

so that the total number of connections that $i$ makes is proportional to $s_i$.⁸

When $p_\ell = 0$ for each $\ell$, this simplifies to coincide with the model of [36]. When $p_\ell = 1$ for each $\ell$, instead, each party is completely cut off from the other. Then, within each party, [36] again applies.

2.2.2 Legislative Effort and Preferences

The other choice of politicians is their legislative effort $x_i \in \mathbb{R}_+$. The benefits from legislative efforts are described by:

$$\alpha_i x_i + \phi \sum_{j \neq i} s_i s_j m_{ij}(s)\, x_i x_j. \tag{2.5}$$

As in a large class of models, of which [36] is a salient instance, there is a direct benefit from private effort, with idiosyncratic weight $\alpha_i$. In addition, there are complementarities in legislative efforts between politicians who have formed connections: the more effort they both expend, the more likely their legislation is to pass. The size of this interaction effect is governed by the parameter $\phi$, whose quantification is a relevant goal in the empirical analysis that follows.

Both forms of effort are costly for a politician. The cost of legislative effort is given by $\frac{c}{2} x_i^2$, with $c > 0$, and the total cost of socializing is given by $\frac{1}{2} s_i^2$. The parameter $c$ governs the relative cost of legislative effort to social effort.

Taken together, the politician's preferences are the amount of legislation that he or she produces less the costs of legislative and social efforts. This is given by:

$$\tilde{u}_i(x_i, x_{-i}, s_i, s_{-i}) = \alpha_i x_i + \phi \sum_{j \neq i} s_i s_j m_{ij}(s)\, x_i x_j - \frac{c}{2} x_i^2 - \frac{1}{2} s_i^2. \tag{2.6}$$

⁸ Although we use cosponsorships of Congressional bills as proxies for socializing efforts, we choose not to model the directed cosponsoring by $i$ of any specific Congress member $j$ (nor other $i, j$ directed linkages). Rather, we posit a generic social effort $s_i$. Pairwise cosponsorships between $i, j \in \mathcal{N}$ have been extensively investigated in a literature largely led by [74]. Here, we make use of this information in Section 2.5, where we consider how well our model-predicted links $g_{i,j}$ fit the auxiliary information of $i, j$ pairwise cosponsorships as a form of external validation of our approach.

2.2.3 A Micro-Foundation Built Upon Reelection Preferences

There are many different ways in which we could justify the preferences in (2.6), as the natural presumption is that politicians care to maximize the legislation that they pass.
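To make the preferences in (2.6) and the matching technology in (2.1)-(2.3) concrete, the following is a minimal numerical sketch, not the estimation code used in this chapter. The function names `network` and `utility`, the five-member chamber, and all parameter values are hypothetical and chosen only for illustration. The sketch assumes at least two members with positive social effort in each party, so the degenerate cases of footnote 7 are not handled beyond skipping an empty meeting pool.

```python
def network(s, party, p):
    """Return g[i][j] = s_i * s_j * m_ij(s), following eqs. (2.1)-(2.3).

    s     : list of social efforts, one per politician
    party : list of party labels, one per politician
    p     : dict mapping party label -> partisanship level in [0, 1]
    """
    n = len(s)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pi = p[party[i]]
        # Own-party pool: sum over k in P(i), k != i, of p(k) * s_k
        own = sum(p[party[k]] * s[k]
                  for k in range(n) if k != i and party[k] == party[i])
        # Bipartisan pool: sum over all k != i of (1 - p(k)) * s_k
        mixed = sum((1.0 - p[party[k]]) * s[k] for k in range(n) if k != i)
        for j in range(n):
            if j == i:
                continue
            pj = p[party[j]]
            m = 0.0
            if party[j] == party[i] and own > 0:
                m += pi * pj / own                     # first term of (2.2)
            if mixed > 0:
                m += (1.0 - pi) * (1.0 - pj) / mixed   # term shared by (2.2) and (2.3)
            g[i][j] = s[i] * s[j] * m
    return g


def utility(i, x, s, g, alpha, phi, c):
    """Payoff (2.6) of politician i, given legislative efforts x and network g."""
    interaction = sum(g[i][j] * x[i] * x[j] for j in range(len(x)) if j != i)
    return alpha[i] * x[i] + phi * interaction - 0.5 * c * x[i] ** 2 - 0.5 * s[i] ** 2


# Hypothetical chamber: three members of party 1 and two members of party 2.
s = [1.0, 2.0, 0.5, 1.5, 1.0]
party = [1, 1, 1, 2, 2]
p = {1: 0.1, 2: 0.2}
g = network(s, party, p)
row_sums = [sum(row) for row in g]  # each row sum should reproduce s_i, as in (2.4)
```

In this sketch the row sums of `g` coincide with `s`, which is the identity (2.4), and `g` is generally not symmetric because each meeting pool excludes the politician whose links are being computed.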
Here, we posit a realistic microfoundation, which we will bring directly to the data.

Politicians care about being reelected and can affect the probability of being reelected by exerting effort in Congress and by building connections instrumental to having specific legislation passed (e.g. policy favorable to the politician's constituents).

Each politician anticipates these effects on his/her reelection chances. More specifically, each congressional cycle has two periods, 1 and 2, where the second period provides the reelection incentives that drive activity in the first period. Politicians are career motivated and exert costly efforts with the aim of increasing their chances of being reelected.

In period 1, each Congress member can present a policy proposal, which for brevity we refer to as a "bill". The bill consists of a policy goal the Congress member intends to fulfill, for instance passing a statute targeted to his or her constituency, landing a subsidy, or obtaining an earmark beneficial to firms in the home district. We describe below how getting $i$'s policy goal fulfilled maps into an increase in $i$'s chances of being reelected.

Suppose a politician's utility is given by:

$$u_i = \Pr(\text{reelected}) - \frac{c}{2} x_i^2 - \frac{1}{2} s_i^2. \tag{2.7}$$

The choice of $x_i$, the level of legislative activity exerted by $i$, affects the support for $i$'s legislation, $Y_i$, through a function:

$$Y_i = \varepsilon_i\, x_i \Big( \sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_j \Big). \tag{2.8}$$

Both $i$'s own legislative effort, $x_i$, and that of his or her connections in the network, $\sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_j$, matter for the ultimate support received by $i$'s bill.

$Y_i$ is stochastic and depends also on a random shock $\varepsilon_i$, assumed to be standard Pareto distributed with scale parameter $\gamma > 0$ and i.i.d. across politicians. We assume that $\varepsilon_i$ is realized after the choice of $x$, the vector of $x_j$ across all politicians $j \in \mathcal{N}$. Because $\varepsilon_i$ is realized only after efforts are chosen, $i$ must take expectations over its value when choosing $(x_i, s_i)$.
Also notice that each link between politicians $i$ and $j$ is an endogenous function of the social efforts of everybody else, hence the dependency of $g_{i,j}(s)$ on $s$, the vector of $s_j$ efforts across all politicians $j \in \mathcal{N}$.

The bill is approved if $Y_i > m$, where $m > 0$ is a generic institutional threshold.⁹ The probability of having the bill approved is thus given by:

$$\Pr(Y_i > m) = \Pr\Big( \varepsilon_i > \frac{m}{x_i \big( \sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_j \big)} \Big) = \Big( \frac{\gamma}{m} \Big) \Big( \sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_j \Big) x_i, \tag{2.9}$$

where we use the distributional assumption on $\varepsilon$.¹⁰ Actual passage of the bill sponsored by $i$ is represented by the indicator function $I_{[Y_i > m]}$.

We interpret $x_i$ as the observable legislative effort by $i$, instrumental to the approval of $i$'s bill, and we postulate that voters prefer politicians exerting higher legislative effort to lower effort. We also allow for voters to care about whether in fact the bill passes conditional on effort. That is, we allow for the political principals (the voters) to reward their agent $i$ for effort $x_i$, networking $s_i$, and ultimately luck $\varepsilon_i$.

To get reelected, the politician must have an approval rate in his/her electoral district that is sufficiently large. Similarly to [18], the electoral approval rate of $i$ is modeled as a variable $V_i$:

$$V_i = \rho V_{i,0} + \zeta I_{[Y_i > m]} + \alpha_i x_i + \eta_i \tag{2.10}$$

where $\eta_i$ is assumed to be a mean zero electoral shock, uniformly distributed on $[-0.5, 0.5]$, and where $V_{i,0} \geq 0$ stands for the baseline approval rate before the start of the term (i.e. before period 1 in the model). Hence, this set-up allows for approval rates to be persistent, but also to react when a politician is capable of getting a bill approved, $I_{[Y_i > m]}$ (with a gain $\zeta$), or when $i$ exerts high legislative effort $x_i$. $\rho > 0$ measures persistence in approval rates, which may be due to the politician's characteristics (such as incumbency advantage, committee membership, majority party affiliation). The parameter $\zeta$, which could be equal to zero empirically, governs the relative importance of a bill actually passing vis-à-vis legislative effort.
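As a purely illustrative check of the approval probability in (2.9), the sketch below compares the closed form with a Monte Carlo simulation of $Y_i$ from (2.8). It assumes that "standard Pareto" means a Pareto law with shape 1 and scale $\gamma$, so that $\Pr(\varepsilon_i > t) = \gamma / t$ for $t \geq \gamma$, which is the tail that (2.9) implies. The function names and the parameter values are hypothetical, and the closed form is only meaningful while it stays between 0 and 1.

```python
import random


def approval_probability(x_i, weighted_effort, gamma, m):
    """Closed form (2.9): (gamma / m) * x_i * sum_j g_ij(s) x_j.

    weighted_effort stands for sum_j g_ij(s) x_j.
    """
    prob = (gamma / m) * x_i * weighted_effort
    if not 0.0 <= prob <= 1.0:
        raise ValueError("closed form is only a probability while it lies in [0, 1]")
    return prob


def simulate_approval(x_i, weighted_effort, gamma, m, draws=200_000, seed=0):
    """Monte Carlo frequency of the event Y_i > m, with Y_i as in (2.8)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Inverse-CDF draw from a Pareto with shape 1 (assumed) and scale gamma:
        # 1 - rng.random() lies in (0, 1], so eps >= gamma and no division by zero.
        eps = gamma / (1.0 - rng.random())
        if eps * x_i * weighted_effort > m:
            hits += 1
    return hits / draws


# Hypothetical values: x_i = 2, sum_j g_ij x_j = 1.5, gamma = 0.1, m = 1.
closed_form = approval_probability(2.0, 1.5, 0.1, 1.0)
simulated = simulate_approval(2.0, 1.5, 0.1, 1.0)
```

With these values the closed form equals 0.3 up to floating-point rounding, and the simulated frequency should agree with it up to sampling error.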
The direct effect of $x_i$ is captured by $\alpha_i$, which is $i$-specific and may depend on party affiliation, majority status, congressional delegation, etc. Finally, notice that, while there is no direct value to the voters of the politician socializing more, the value of $s_i$ matters implicitly, being instrumental in getting legislation approved.

⁹ Naturally, $m$ can be a function of a simple majority requirement or even supermajority restrictions.

¹⁰ More generally, one can take $Y_i$ to represent the average approval rate of $i$'s multiple bills. In this case, each $b$ is a separate bill by a politician $i$. The conditions for our model are unchanged, as long as bills are not strategically introduced (i.e. specifically, shocks $\varepsilon_b$ are still i.i.d. within $i$).

In period 2, $i$ is reelected if his/her electoral approval level, $V_i$, is larger than an electoral threshold $w < 1$. So, the probability of being reelected is given by:

$$\Pr\big( \rho V_{i,0} + \zeta I_{(Y_i > m)} + \alpha_i x_i + \eta_i > w \big) = \min\big\{ (1 + 0.5 - w) + \rho V_{i,0} + \zeta I_{(Y_i > m)} + \alpha_i x_i,\ 1 \big\},$$

where we have used the distributional assumption on $\eta$. The above expression is non-negative, since its terms are non-negative. If the first expression in the brackets is larger than 1, then the probability of reelection is 1. We proceed with the empirically relevant case in which reelection is uncertain (i.e.
a politician does not know for sure whether (s)he will lose or win).¹¹

Note that, in period 1 when making his effort decisions, $i$ does not know the value of $\varepsilon_i$. So taking the expectation over $\varepsilon$ of the above implies an expected probability of reelection, when choosing $(s_i, x_i)$, given by:

$$\Pr(\text{reelected}) = E_\varepsilon \Pr\big( \rho V_{i,0} + \zeta I_{(Y_i > m)} + \alpha_i x_i + \eta_i > w \big) = (1.5 - w) + \rho V_{i,0} + \zeta \frac{\gamma}{m} \sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_i x_j + \alpha_i x_i, \tag{2.11}$$

where $E_\varepsilon I_{(Y_i > m)} = \Pr(Y_i > m)$, as given by (2.9).

Replacing (2.11) into the utility function (2.7) yields:

$$u_i(x_i, x_{-i}) = (1.5 - w) + \rho V_{i,0} + \zeta \frac{\gamma}{m} \sum_{j \in \mathcal{N}} g_{i,j}(s)\, x_i x_j + \alpha_i x_i - \frac{c}{2} x_i^2 - \frac{1}{2} s_i^2. \tag{2.12}$$

Since the terms $(1.5 - w)$ and $\rho V_{i,0}$ do not affect the maximization problem, an equivalent utility function is:

$$\tilde{u}_i(x_i, x_{-i}, s_i, s_{-i}) = \alpha_i x_i + \phi \sum_{j \neq i} s_i s_j m_{ij}(s)\, x_i x_j - \frac{c}{2} x_i^2 - \frac{1}{2} s_i^2, \tag{2.13}$$

where $\phi = \zeta \frac{\gamma}{m}$. This is the specification given in (2.6).

¹¹ When the probability of reelection is either 0 or 1, we have that $s_i = x_i = 0$. This can be seen from equation (2.7). If a politician $i$ cannot influence his reelection prospects, he will not undertake costly effort. This is contrary to what is observed in the data, as described in Section 2.3, as well as to theories of legislative behavior (e.g. [122]).

2.2.4 Solving For Equilibrium

We examine the pure strategy Nash equilibria of the game in which all politicians simultaneously choose $s_i$ and $x_i$.

The first order conditions with respect to $s_i$ and $x_i$ that characterize the best response of politician $i$ imply that interior equilibrium levels of $(s_i^*, x_i^*)$ must satisfy:¹²

$$s_i^* = \phi \sum_{j \neq i} s_j^* m_{ij}(s^*)\, x_i^* x_j^* \tag{2.14}$$

and

$$c\, x_i^* = \alpha_i + \phi \sum_{j \neq i} s_i^* s_j^* m_{ij}(s^*)\, x_j^*. \tag{2.15}$$

We rewrite (2.14) as

$$\frac{s_i^*}{x_i^*} = \phi \sum_{j \neq i} s_j^* m_{ij}(s^*)\, x_j^*. \tag{2.16}$$

To fully characterize equilibria, we work with the same approximation as in [36]. In particular, we work "at the limit", when the number of politicians grows.¹³ That is, we solve for equilibrium under the assumption that $\sum_{j \neq i} s_j^* m_{ij}(s^*)\, x_j^*$ is the same for all $i$ of the same party.

This implies that $s_i^* / x_i^*$ is the same for all agents within a party.
Using (2.16) in (2.15) yields:

$$c\, x_i^* = \alpha_i + s_i^*\, \phi \sum_{j \neq i} s_j^* m_{ij}(s^*)\, x_j^* = \alpha_i + \frac{s_i^{*2}}{x_i^*}.$$

Dividing through by $x_i^*$ implies that

$$c = \frac{\alpha_i}{x_i^*} + \frac{s_i^{*2}}{x_i^{*2}}. \tag{2.17}$$

Since $s_i^* / x_i^*$ is the same for all agents within a party, (2.17) implies that $\alpha_i / x_i^*$ is the same for all agents within a party. This further implies that:

$$x_i^* = \alpha_i X_{P(i)}, \tag{2.18}$$

for some $X_{P(i)}$. In addition, the fact that $s_i^* / x_i^*$ is the same for all agents within a party implies that

$$s_i^* = \alpha_i S_{P(i)}, \tag{2.19}$$

in equilibrium for some $S_{P(i)}$.

¹² Note that second derivatives are everywhere negative.

¹³ Alternatively, this could be justified via a continuum of politicians of each type, or by examining an epsilon equilibrium with a large $n$.

To get explicit expressions for our empirical analysis of Congress, we now specialize the analysis to the case of two parties.

For each party $j = 1, 2$ define

$$A_j = \sum_{i \in P_j} \alpha_i, \qquad B_j = \sum_{i \in P_j} \alpha_i^2.$$

Proposition 2.2.1. The (interior) Nash equilibria of the limit game of this model are positive solutions to the system given by:

$$x_i^* = \alpha_i X_{P(i)}, \tag{2.20}$$

and

$$s_i^* = \alpha_i S_{P(i)}, \tag{2.21}$$

where

$$\frac{S_1}{X_1} = \phi \left( \frac{p_1 B_1 X_1}{A_1} + \frac{(1 - p_1)^2 B_1 S_1 X_1 + (1 - p_1)(1 - p_2) B_2 S_2 X_2}{(1 - p_1) A_1 S_1 + (1 - p_2) A_2 S_2} \right), \tag{2.22}$$

$$\frac{S_2}{X_2} = \phi \left( \frac{p_2 B_2 X_2}{A_2} + \frac{(1 - p_2)^2 B_2 S_2 X_2 + (1 - p_1)(1 - p_2) B_1 S_1 X_1}{(1 - p_1) A_1 S_1 + (1 - p_2) A_2 S_2} \right), \tag{2.23}$$

$$c X_1^2 = X_1 + S_1^2, \qquad c X_2^2 = X_2 + S_2^2. \tag{2.24}$$

All proofs appear in the Appendix.

If $p_1 = 1$ or $p_2 = 1$, then things reduce to the case of two separate parties with no interaction across them. That is, they are two copies of the model in [36]. Similarly, if $p_1 = p_2 = 0$ then there is no impact of party affiliation, and again the model simplifies to that of [36]. The novel case is when at least one partisanship level is positive, yet both levels are below 1. This biases the interaction of at least one party, leaving room for interaction across parties. In this case there will be both social mixing across different parties and partisanship in socializing.

Generally, there are multiple equilibria. For instance, there is always an (unstable) equilibrium in which $s_i = 0$ for all $i$.
In that case, since no other politician provides effort, a given politician's efforts result in no connections and so the best response is also to provide no effort.

In addition, a sufficient condition for existence of an interior equilibrium is as follows.

Proposition 2.2.2. A sufficient condition for the existence of an interior equilibrium is

$$\frac{2 c^{3/2}}{3\sqrt{3}} \geq \phi \max\left[ \frac{B_1}{A_1}, \frac{B_2}{A_2} \right]. \tag{2.25}$$

In this setting with two parties and nontrivial partisanship, there will generally be either two or four interior equilibria (except at a degenerate set of values where the system switches from two to four additional equilibria).¹⁴

¹⁴ These equilibria correspond to when both parties exert high levels or low levels of social efforts, and then for some parameters there are also two additional equilibria in which one party does medium-high and the other does medium-low socializing.

2.2.5 Pareto Efficient Efforts

Before proceeding with the empirical analysis, we comment on the Pareto inefficiency of the equilibrium outcomes of the model. This is relevant for a welfare analysis that checks whether there is over-provision or under-provision of social and legislative effort in the strategic setting.

Generally, the fact that there are positive externalities in efforts, in particular in legislative efforts, implies that there is under-provision of effort. In particular, the Pareto optimal social and legislative effort levels are unbounded: any finite level of efforts is Pareto dominated by some higher level.¹⁵ Hence, all equilibria are characterized by an "under-provision" of efforts.

¹⁵ The Pareto analysis in [36] only applies if actions are bounded at some small enough finite level. The second derivatives in their proof flip signs if actions are large enough. Thus, there is a local maximum of a weighted sum of utilities that is interior (which is the one identified in their analysis of first-best actions), but the global maximum is actually unbounded. With a strict enough bound on efforts, there would exist an interior maximizer. Effectively, once the efforts are large enough, the interaction effects dominate the costs. One needs to constrain efforts to be below that level in order to obtain Pareto optimal effort solutions. Note, however, that even with bounds, the equilibrium efforts tend to be inefficient, given the externalities.

To see this, we first note that the interaction term in equation (2.13) multiplies three variables together: $s_i$, $x_i$ and $x_j$. This gives the interaction term a cubic property in efforts: doubling all efforts produces an eight times higher interaction effect. Meanwhile, the costs of the social effort ($s_i$) and legislative effort ($x_i$) are quadratic. Hence, doubling those only quadruples costs. It is then direct to check that the gains from increasing effort grow faster than their costs, which implies the following result.

Proposition 2.2.3. Every finite profile of efforts is Pareto dominated by some larger level of efforts.

This implies that, although higher payoffs are possible, the selfish attention to individual costs limits the amount of effort that is produced in any equilibrium.¹⁶

If we were to cap effort levels at some high level, then there would exist Pareto optimal efforts bound by the caps.

The message here is that generally, given the complementarities and positive externalities, there is under-provision of effort. A political party, or a government, could help overcome some of the inefficiencies, for instance, by subsidizing meetings and interactions.

2.2.6 Preliminaries to Estimation

Generally, effort levels $s_i^*, x_i^*$ are not measured exactly and are observed with noise. For instance, the bill cosponsorships often used as the basis for the construction of political networks are end products that miss other forms of socializing (e.g. closed-door meetings, fund-raisers, and so on).
Similarly, although we can partially observe legislative effort through standard proxies (e.g. times the Congress member was present on the floor for speeches, presence in roll call voting, or number of bills written¹⁷), these are imperfect proxies for the legislative efforts that politicians exert. Thus, we account for measurement error in our analysis.

Let $\mathcal{N}_\tau$ denote the politicians comprising Congress $\tau$ and note that this is a set which varies across different $\tau$.¹⁸ Introducing classical measurement error, for politician $i$ in Congress $\tau$, we observe:

$$\tilde{s}_{i,\tau} = s^*_{i,\tau}\, e^{-\lambda_{i,\tau}} \tag{2.26}$$

$$\tilde{x}_{i,\tau} = x^*_{i,\tau}\, e^{-v_{i,\tau}}. \tag{2.27}$$

¹⁶ It should be noted that, with high enough effort levels, the best responses increase without bound in response to increases in others' efforts. There is no equilibrium, because of the unbounded feedback. Again, if one imposed a cap, then there would be an equilibrium at those capped levels in the model.

¹⁷ Both highlighted as important for legislative success in [10].

¹⁸ The data is observed for multiple Congresses and we provide identification results for parameters specific to each Congress. This means we allow our parameters to differ across different Congresses and we can construct time-series estimates of the parameters.

$s_i^*$ denotes what is chosen, but it is hit with independent noise and $\tilde{s}_i$ is observed, and similarly for $\tilde{x}_i$. The measurement error, conditional on this observation (and on the data we have), is mean zero, and independent of all the other measurement errors across individuals and time. We do not need to impose that the measurement errors in both types of effort have the same distribution.

From Proposition 2.2.1, $(S_1, S_2, X_1, X_2)$ are completely determined by the parameters that govern the system. Then, all individual choices are functions of parameters and of the set of types $\{\alpha_j\}_{j \in \mathcal{N}_\tau}$.

Let:

$$\alpha_{i,\tau} = e^{z'_{i,\tau} \beta_{P(i),\tau}} \tag{2.28}$$

where $z_{i,\tau}$ indicates a vector of individual observables¹⁹ (e.g.
ideology, tenure, committee membership), and $\beta_{P(i)}$ are party-specific parameters that will be estimated.²⁰

The information we employ in the analysis is the following. Let $\{y_{i,\tau} = I_{(Y_{i,\tau} > m)}\}$ indicate whether each bill was approved or not, where $i \in \mathcal{N}_\tau$ and $\tau$ is a given Congressional cycle.²¹ $\{\tilde{s}_{i,\tau}\}$ indicates the (log of hundreds of) cosponsorship decisions per politician $i \in \mathcal{N}_\tau$. This is our proxy for the equilibrium social effort $\{s^*_{i,\tau}\}$. The use of logs and rescaling allows us to keep this effort proxy in the same scale as our proxy variable for legislative effort. $\{\tilde{x}_{i,\tau}\}$ indicates a vector of observable proxies for legislative effort $\{x^*_{i,\tau}\}$. As discussed in more detail in the following section, this is constructed using data on floor speeches (word counts per politician during a term) and roll call presence/votes. We employ a procedure (Non-Negative Matrix Factorization, see [166]) to reduce the dimensionality of this set of proxies to a single dimension.²²

¹⁹ Unobservables are already present when we introduce measurement errors. Note that if we had $\alpha_i = e^{z_i'\beta + \eta_i}$, we could rewrite equation (2.26), using the equilibrium results, simply as:

$$\tilde{s}_i\, e^{\lambda_i - \eta_i} = e^{z_i'\beta} S_{P(i)}, \tag{2.29}$$

and a redefinition of the measurement error to $\lambda_i - \eta_i$ (still mean-zero and i.i.d.) would suffice in returning to the model presented in the main text, as long as $A_1, A_2, B_1, B_2$ were not functions of $\eta_i$.

²⁰ Identification of the model does not rely on the parametrization of $\alpha$, as we prove in the Appendix. However, this is useful for estimation purposes. A nonparametric $\alpha$ would require us to estimate a parameter $\alpha_i$ for each politician in each Congress, when we only observe one set of $(\tilde{s}_i, \tilde{x}_i)$ per period.

²¹ A complete data description section follows below.

²² Sponsorship of bills is already included, as we use the separate bills independently. Further details on this, on the data, and on the procedure leading to the effort proxy are in the next section.

As we perform our analysis within a Congress, we suppress the notation $\tau$. We assume that a single pure strategy Nash equilibrium, as defined in Proposition 2.2.1, is played in each Congress. We do not impose, however, that the same equilibrium is played across different Congresses; rather, we characterize the equilibrium played empirically in Section 2.6.

Given $\{y_i, \tilde{s}_i, \tilde{x}_i, z_i\}_{i \in \mathcal{N}}$, we estimate the parameters $(c, \phi, \zeta, \gamma, p_1, p_2, \beta_1, \beta_2)$. For identification, we set $m = 1$, so that the random variable $\varepsilon_i$ is scaled in terms of the institutional threshold. The basis for identification is Proposition 2.2.1 and the systems of equations that it provides. For identification of the parameters of our model it is not necessary to identify the full set of equilibria, but instead just to use the implications that we are observing some (interior) equilibrium. More precisely, we show that, given the observed data, one can uniquely pin down the equilibrium that is played: although there are multiple interior pure strategy Nash equilibria, conditional on observing $\{\tilde{s}_i, \tilde{x}_i\}_{i \in \mathcal{N}}$, there is only one set of values that is most consistent with it. Formal identification of our model, given the information available to the econometrician, is demonstrated in the Appendix.

2.3 Data

We use the cosponsorship data from [74], compiled from the Library of Congress, covering the 105th to the 110th United States Congress (from 1997 to 2009). This data contains cosponsorship decisions by politician, and within that data, who sponsors and who cosponsors each bill. It also contains information on whether each bill was approved in Congress or not (we focus on passage in the House of Representatives). Figure 2.1 shows that measures of inter-connectedness of Congress, for example the total number of cosponsorship links in legislative acts across members of the House ([74]), have been steadily increasing. Figure 2.2 then breaks down how cosponsorships vary within and across parties.

Figure 2.1: Total Number of Cosponsorships per Congressional cycle. The figure shows the evolution of the total number of (unique) cosponsorships during a congressional cycle (i.e. anytime a politician has cosponsored another in a directed way) over time.

Figure 2.2: Number of Cosponsorships Within and Across Parties per Congressional cycle. The figure shows the evolution of the total number of (unique) cosponsorships within and across parties during a congressional cycle (i.e. anytime a politician has cosponsored another in a directed way) over time.

Per Congressional cycle, we compute the log of how many hundred bills each politician cosponsors, which is the variable Cosponsorships. This function of cosponsorships acts as an empirical proxy for the social effort $\{s^*_{i,\tau}\}_{i \in \mathcal{N}_\tau}$.

We note here that cosponsorship differs from bill sponsorship. Sponsoring a bill refers to the introduction of a bill for consideration (and can be done by multiple legislators drafting the bill, the "sponsors"). Instead, cosponsorships refer to the decision of adding one's name as a supporter of the bill (becoming a "cosponsor" of the bill). This decision per se does not involve any writing of legislation. Cosponsorships are prevalent in Congress, as can be seen in Table 2.1, and the presence of cosponsorship across party lines is still quite common, notwithstanding the trends in polarization discussed in [69], as evident from the time series in Figure 2.2.

The individual bill success outcome (i.e. whether the bill passes or not) maps into $\{y_{i,\tau}\}_{i \in \mathcal{N}}$. We then use the sponsorship information to link the outcome of the bill to the network characteristics and individual decisions.

To compute our proxies for legislative effort, $\{x^*_{i,\tau}\}_{i \in \mathcal{N}_\tau}$, we first collect data on Roll Call voting and floor speeches in Congress.
Data for Roll Call voting comes from VoteView. We compute an index, for each politician and for each term in Congress, as the times the Congress member voted as a proportion of total Roll Call votes. This measure, which we call Roll Call Effort, is defined as 1 − (number of times $i$ was "Not Voting" / total number of Roll Call votes in a Congress).

Following [10], we also use data on floor speeches as a measure of individual legislative effort. To do so, we compile the number of words that each Congress member used in his/her floor speeches across the duration of one term (we call this variable Words). Our Floor Speeches variable is constructed as $\log(1 + \text{Words}_{i,\tau}/300)$. We log and rescale this variable to a scale comparable to other legislative activities. That is, we divide the number of words by 300, which is the congressional limit for each short speech: House Rules explicitly limit one minute speeches to 300 words.²³ Data on floor speeches comes from [78], available on ICPSR.²⁴

²³ One minute speeches are usually conducted at the beginning of the legislative day. They allow politicians to address the chamber on a topic of their choice, in an unrestricted way. During these short speeches, congressmembers often share information about bills and amendments for the day, pay tributes or share their views on policy. As described in [156], this is a very valuable tool for politicians, including junior ones. It allows congressmembers to be seen and heard easily, as these speeches are often broadcast due to their short length during electoral campaigns.

²⁴ As there are changes in the composition of Congress within a term, for instance due to death or resignation among other reasons, we have some observations whose cosponsorship numbers and word counts do not correspond to a full term. To mend this, we scale up values proportionally to the recorded behavior while in Congress. In other words, if a politician leaves halfway through his term, we double the values of these observations.

That these measures of social interaction and legislative activity may be germane to one another is evident from the significant and positive raw correlation of link formation and proxies of legislative activity and effort, for instance floor speeches in Figure 2.3 and Roll Call Effort in Figure 2.4. This complementarity between effort choices is fully consistent with our theoretical setup.

Figure 2.3: Correlation between the raw data of log(1+Words) in Floor Speeches and Cosponsorship decisions. The figure shows the positive correlation between proxies for socializing (log number of cosponsorships) and legislative effort (log number of words in floor speeches). The graph presents the variables in raw form, without rescaling or removal of members with low cosponsorship. The raw correlation is 0.174. In red, we present a LOWESS (locally weighted scatterplot smoothing) fit, with bandwidth (span) equal to 0.9, fitting the relationship between the variables. We do remove, as described in the Data section, observations that have total words equal to zero, which are mostly due to death/resignations in that term.

We proceed to construct $\{\tilde{x}_{i,\tau}\}_{i \in \mathcal{N}_\tau}$ by using both Roll Call Effort and Floor Speeches. An appropriate combination of these variables can be obtained through dimensionality reduction methods. Since effort should be non-negative, we employ a procedure that guarantees positive values (i.e. we do not use methodologies like principal components analysis that involve a centering of data and negative values).²⁵ We employ Non-Negative Matrix Factorization (NNMF), a dimensionality reduction procedure which imposes constraints so that the resulting elements are all non-negative. [168] provide a discussion of this methodology.

²⁵ Our qualitative results still hold if we use either of these variables individually. However, the magnitudes of the estimates change due to the different scales of Roll Call Effort (between 0 and 1) and the floor speech data (in hundreds of words).
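The construction of the two legislative proxies and their reduction to a single non-negative factor can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the estimates in this chapter: the helper names, the three politicians and their vote and word counts are hypothetical, and the factorization is a rank-one version of NNMF computed with standard multiplicative updates under squared-error loss.

```python
import math


def roll_call_effort(not_voting, total_votes):
    """1 - (times recorded as "Not Voting" / total roll call votes)."""
    return 1.0 - not_voting / total_votes


def floor_speeches(words):
    """log(1 + Words / 300), with 300 words being the one minute speech limit."""
    return math.log(1.0 + words / 300.0)


def rank_one_nmf(A, iterations=500, eps=1e-12):
    """Approximate a non-negative matrix A (rows: politicians, columns: proxies)
    by the outer product w h^T, minimizing squared error with multiplicative
    updates. Returns the non-negative factors (w, h)."""
    n_rows, n_cols = len(A), len(A[0])
    w = [1.0] * n_rows
    h = [1.0] * n_cols
    for _ in range(iterations):
        # Update h: h_j <- h_j * (w^T A)_j / ((w^T w) * h_j)
        ww = sum(v * v for v in w)
        h = [h[j] * sum(w[i] * A[i][j] for i in range(n_rows)) / (ww * h[j] + eps)
             for j in range(n_cols)]
        # Update w: w_i <- w_i * (A h)_i / (w_i * (h^T h))
        hh = sum(v * v for v in h)
        w = [w[i] * sum(A[i][j] * h[j] for j in range(n_cols)) / (w[i] * hh + eps)
             for i in range(n_rows)]
    return w, h


# Hypothetical raw records: (times "Not Voting", total roll call votes, words spoken).
records = [(10, 1000, 30000), (50, 1000, 3000), (0, 1000, 60000)]
A = [[roll_call_effort(nv, total), floor_speeches(words)] for nv, total, words in records]
w, h = rank_one_nmf(A)  # w plays the role of the single-dimensional effort proxy
```

The factors are identified only up to scale (multiplying `w` by a constant and dividing `h` by it leaves the product unchanged), so any use of `w` as a proxy requires a normalization choice, which this sketch leaves open.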
NNMF works by factorizing a matrix, call it A, into two non-negative matrices W, H, under a quadratic loss function. The product WH is an approximation to A of smaller dimension, as there are fewer columns in W than rows in A. We then use the main factor in W as our proxy.

We also use observable characteristics, namely ideology (measured by DWNominate from VoteView), tenure (how many terms a politician has served in Congress, with data coming from the Library of Congress), and committee memberships.

Figure 2.4: Correlation between the raw data of Roll Call Effort and Cosponsorship decisions.
The figure shows the correlation between proxies for socializing (Cosponsorships) and legislative effort (times the politician is Voting in Roll Call). The graph presents the variables in raw form, without rescaling or removal of members with low cosponsorship. We present a LOWESS (locally weighted scatterplot smoothing) fit, with bandwidth (span) equal to 0.9, fitting the relationship between the variables. This is shown to illustrate that a low correlation across both variables on the full support is driven by the observations with Roll Call Effort close to 1. For example, a positive correlation of 0.135 holds for the data with Roll Call Effort less than 0.95. For our estimation, this will be the useful variation in the Roll Call Effort variable that approximates legislative effort.

Data on committee memberships comes from the work of [162]. To quantify the value of the committees a politician is in, we use the Grosewart measure ([90]). [90] and [161] estimate a cardinal value of how valuable an assignment to a given committee is to politicians. Such estimates are based upon data on how often politicians accept transfers from one committee to another. The more desirable committees are those that politicians often accept transfers to, but rarely accept transfers away from. The Grosewart measure sums up the values of the committees in which a politician is present.
We use the estimates given in [161] for our study, since they are the updated values for the period we study.^26

Summary statistics for all our variables of interest can be found for reference in Table 2.1.

We restrict the data to Congresses 105th-110th for multiple reasons. First, the data we employ to compute effort from floor speeches is only available from the 104th Congress onwards. Second, the 104th Congress (corresponding to the Republican Revolution) provides a structural break in the analysis of Congressional behavior. With multiple changes to Congressional composition and structure during the 104th, it becomes hard to compare the costs and socializing of this specific Congress to others, preceding or following, without having to further delve into the exceptionality of this particular congressional cycle, which is not the aim of this work.^27

Finally, we perform an additional trimming of the data across all Congresses. We drop 7 (of the 2681) observations in which politicians have cosponsorship figures of fewer than 3 bills over a full term. From our identification equations, we must use Congress members that have at least one cosponsored bill. For those with fewer than 3 cosponsorships (and given that most politicians cosponsor in the hundreds), scaling is also inappropriate. We also remove a set of 19 observations that have the number of words in Floor Speeches set to 0 in the data of [78]. These observations relate almost exclusively to a politician who either resigned or died during that term.^28

Table 2.1: Summary Statistics

Congress                      105         106         107         108         109         110
Cosponsorships
  Mean                        185.74      234.57      229.79      226.75      230.74      269.65
  Standard Deviation          85.79       102.91      127.03      124.08      119.48      135.90
Floor Speeches (Words)
  Mean                        32938.633   36282.23    27906.61    33490.47    33985.21    37416.96
  Standard Deviation          38503.19    39234.14    34421.74    42334.30    45922.73    51212.574
Roll Call Effort
  Mean                        0.9620      0.9524      0.9556      0.9505      0.9605      0.9551
  Standard Deviation          0.0514      0.0574      0.0579      0.0665      0.0380      0.0497
Ideology (DWNominate)
  Mean                        0.0674      0.0695      0.0865      0.1116      0.1276      0.0784
  Standard Deviation          0.4428      0.4549      0.4682      0.4823      0.4966      0.5031
Tenure
  Mean                        4.8439      5.1839      5.4498      5.6073      6.0479      6.0584
  Standard Deviation          3.9562      3.7690      3.7741      3.9005      4.0137      4.2412
Grosewart
  Mean                        0.2725      0.2797      0.2896      0.2352      0.3046      0.3180
  Standard Deviation          1.0815      1.1207      1.1224      1.1545      1.1591      1.1654
Approval of House Bills
  Mean                        0.1087      0.1246      0.0981      0.1138      0.0957      0.1285
  Standard Deviation          0.3758      0.3782      0.3092      0.3439      0.3690      0.3687
Number of Politicians N       442         435         440         439         438         445
Number of Bills               4874        5681        5767        5431        6436        7340

The table presents summary statistics for the variables used in the structural estimation, across Congresses. Roll Call Effort is defined as the proportion of Roll Call votes in which the politician does not appear as "Not Voting". Number of words said in floor speeches aggregates the number of words said by a politician across all his speeches in a term. Cosponsorships and number of words are scaled to full term length (i.e. if a politician leaves mid-term and is replaced mid-term, then both he and the replacement have those variables multiplied by 2). For estimation, we remove the observations (bills and politicians) we do not have or cannot match to identifying numbers, and those with fewer than 3 Cosponsorships (see the Data section). These are mostly Congressmen who substitute others mid-term. Data used for bills is House bills (H.R.).

^26 Below, we also consider an alternative measure for committee memberships. There, we construct dummy variables for whether a politician has been assigned to a given committee during that congressional term. We then focus on the main committees for parsimony: Appropriations, Energy and Commerce, Oversight and Government Reform, Rules, Transportation and Infrastructure, and Ways and Means. We also include a variable Leadership of whether the politician was the Speaker, the Majority or Minority Leader, or the Majority or Minority Whip.

^27 In addition, without ad-hoc modifications to the estimating model specifically designed to accommodate the idiosyncrasies of the 104th Congress, this lack of stability would also likely undermine any effort of structural estimation.

^28 Examples are the deaths of Representatives Jo Ann Davis in the 110th Congress and Sonny Bono in the 105th, or resignations such as that of Representative Bobby Jindal in the 110th. Since the data is zero, the rescaling above does not prove to be adequate, so we drop these observations.

2.4 Estimation

2.4.1 Moment Equations

Let $\tilde{z}_i = [1, I_{i \in P_2}, z_i', z_i' I_{i \in P_2}]$, where $I_{i \in P_2}$ denotes a dummy variable of whether politician i is in Party 2. Further define: $\beta_s = [\log(S_1), \log(S_2) - \log(S_1), \beta_1, \beta_2 - \beta_1]$, $\beta_x = [\log(X_1), \log(X_2) - \log(X_1), \beta_1, \beta_2 - \beta_1]$.

The moment conditions necessary to identify and estimate the model's parameters are:^29

$$\mathbb{E}\left[\tilde{z}_i\left(\log(\tilde{s}_i) - \tilde{z}_i'\beta_s\right)\right] = 0 \tag{2.30}$$

$$\mathbb{E}\left[\tilde{z}_i\left(\log(\tilde{x}_i) - \tilde{z}_i'\beta_x\right)\right] = 0 \tag{2.31}$$

$$\mathbb{E}\left[2\left(\log(\tilde{s}_i) - \log(\tilde{x}_i)\right) - \log\left(c - \frac{1}{X_1}\right) - \left(\log\left(c - \frac{1}{X_2}\right) - \log\left(c - \frac{1}{X_1}\right)\right) I_{i \in P_2}\right] = 0 \tag{2.32}$$

$$\mathbb{E}\left[\log P(y_i = 1) - \log\left(\frac{1}{\zeta}\right) - \log(\tilde{s}_i^2)\right] = 0 \tag{2.33}$$

$$S_1 = \phi X_1\left(B_1 S_1 X_1 m_{11} + B_2 S_2 X_2 m_{12}\right) \tag{2.34}$$

$$S_2 = \phi X_2\left(B_2 S_2 X_2 m_{22} + B_1 S_1 X_1 m_{12}\right) \tag{2.35}$$

^29 See Appendix for a full derivation of Identification and a description of the moment conditions. We also prove in Appendix that identification holds in a nonparametric version of the model, where we do not impose the functional form for $\alpha$.

These moment conditions are based on rewriting the equations of Proposition 2.2.1 using our parameterization for $\alpha$ and measurement errors, given in equations (2.28) and (2.26). They allow us to identify $(c, \zeta, S_1, S_2, X_1, X_2, \beta_1, \beta_2)$ and set identify $\phi$.
Specifically, lack of point identification of $\phi$ is the result of lack of point identification of $p_1$ and $p_2$. The parameters $p_1$ and $p_2$ enter nonlinearly in equations (2.34) and (2.35) through $m_{ij}(s)$, identifying a ridge of $(p_1, p_2)$ pairs satisfying (2.34) and (2.35).

In Appendix, we demonstrate how to obtain point identification of $\phi$ when we impose additional restrictions on the proxy variables $(\tilde{s}_i, \tilde{x}_i)$. The restrictions there are on the second moments of the effort proxies, similar in spirit to random coefficient models. Such restrictions are justified under the assumption that more partisanship may result in noisier measurement of social interactions and legislative effort. This alternative approach is heavier in terms of assumptions and for this reason we do not adopt it for the derivation of our main results in Section 2.5. Instead, we report estimated values for all parameters (including $p_1, p_2$) under these additional assumptions and specification in Appendix.^30

For our main empirical exercise, we let Party 1 denote the Democratic Party (with its variables denoted by the subscript Dem) and Party 2 denote the Republican Party (analogously denoted with a Rep subscript).

We carry out the estimation process using a two step procedure. In the first step, we compute estimates for the parameters $(c, \zeta, S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep}, \beta_{Dem}, \beta_{Rep})$ from the moment equations (2.30) - (2.33) above, via the Generalized Method of Moments.

^30 As second moment restrictions allow us to obtain point estimates of $p_1$ and $p_2$, we use such estimates in the last part of Section 2.5 for assessing additional implications of our model and validation.

In the second step, we use the first-step estimates to derive a set estimate for $\phi$. This is done by using equations (2.34) and (2.35). We grid all pairs $(p_{Dem}, p_{Rep}) \in [0,1] \times [0,1]$, and, employing the estimated $A_{Dem}, A_{Rep}, S_{Dem}, S_{Rep}$ from the first step, we calculate the values of $m_{ij}$ for each pair $(p_{Dem}, p_{Rep})$.
The set estimate for $\phi$ consists of all the values that satisfy equations (2.34) and (2.35) for at least one pair $(p_{Dem}, p_{Rep})$.

Concerning the information of whether a bill passed or not, $\{y_{i,\tau}\}_{i \in N}$, the model is agnostic on how many bills a politician proposes. Because a good fraction of members of Congress sponsor multiple bills, however, we work with L > N bills in the actual data. This is easily accommodated in the estimation. Recall that the $\varepsilon$ are i.i.d. across time and bills. For each politician i, all of i's bills have the same associated network $g_{i,j}$, as they come from the same politician with the same network and effort choices (as well as those of his network). The different $\varepsilon$ realizations, however, represent different bill qualities or institutional arrangements within politician, meaning that the same politician may have one bill approved and not another. The dimensionality of the problem can be decreased by simply averaging out each bill's success by politician. This is made possible by the fact that equation (2.33) holds for all bills, implying that it must hold for all politicians as well. Hence, we use the average pass rate of bills for politician i as the estimate of his probability of bill approval.

2.4.2 Estimation via GMM

To estimate the model, we replace equations (2.30)-(2.33) by their empirical counterparts and stack them into a vector of the form $\frac{1}{n}\sum_{i=1}^{n} g(\tilde{s}_i, \tilde{x}_i, y_i, z_i; \theta)$. Since all moments have expectations taken over $\lambda_i, v_i$, which are i.i.d. and mean zero for all politicians, the empirical counterpart replaces the expectation operator by the mean over i.^31 Furthermore, we average over the approval rates for bills for each politician to get the estimated probability of approval at the politician level.

We then minimize the quadratic form:

$$\left(\frac{1}{n}\sum_{i=1}^{n} g(\tilde{s}_i, \tilde{x}_i, y_i, z_i; \theta)\right)' W \left(\frac{1}{n}\sum_{i=1}^{n} g(\tilde{s}_i, \tilde{x}_i, y_i, z_i; \theta)\right), \tag{2.40}$$

where $\frac{1}{n}\sum_{i=1}^{n} g(\tilde{s}_i, \tilde{x}_i, y_i, z_i; \theta)$ is given by stacking up the empirical counterparts of equations (2.30)-(2.33), for a total of 2k+2 equations (k being the dimensionality of $\tilde{z}_i$). W is the weighting matrix, which can be taken as the identity matrix (an inefficient choice), or the optimal weighting matrix for the GMM.

Given these first stage estimates, we then estimate the set of feasible values for $\phi$ as previously described. Further details about the empirical implementation are discussed in Appendix.

^31 That is, the expectation operator has one observation for each politician, and averages across all politicians. For example, the empirical counterparts to (2.30)-(2.31) are:

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i\left(\log(\tilde{s}_i) - \tilde{z}_i'\beta_s\right) = 0, \tag{2.36}$$

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{z}_i\left(\log(\tilde{x}_i) - \tilde{z}_i'\beta_x\right) = 0, \tag{2.37}$$

or in matrix form:

$$\frac{\tilde{Z}'\left(\log(\tilde{s}) - \tilde{Z}\beta_s\right)}{n} = 0, \tag{2.38}$$

$$\frac{\tilde{Z}'\left(\log(\tilde{x}) - \tilde{Z}\beta_x\right)}{n} = 0, \tag{2.39}$$

where $\tilde{Z}$ stacks up the $\tilde{z}_i$, and $\log(\tilde{s}), \log(\tilde{x})$ stack up $\log(\tilde{s}_i), \log(\tilde{x}_i)$ respectively.

2.5 Results

Table 2.2 presents our parameter estimates. Figure 2.5 and Table 2.3 show the distributions of the estimated $\alpha_i$ over time (Figure 2.5) and by party (Table 2.3). These distributions appear stable across Congresses.

Figure 2.5: Distribution of Estimated $\alpha$ over Time

Splitting the samples by party, we observe important differences in the estimated distributions of Republicans and Democrats. Democrats have a higher average and dispersion, while Republicans have tighter distributions.
This implies different social effort patterns across parties, as Democrats socialize more and (by our social meeting function) more often with other Democrats.

Table 2.2: Main Results, Specification 1

Congress                105              106              107              108              109              110
c                       0.327            0.324            0.359            0.365            0.352            0.336
                        (0.0089)         (0.0083)         (0.0102)         (0.0114)         (0.0095)         (0.0085)
φ                       [0.0476,0.0591]  [0.0487,0.0660]  [0.0537,0.0762]  [0.0520,0.0724]  [0.0526,0.0709]  [0.0546,0.0698]
ζ                       16.050           23.609           20.185           16.387           20.548           19.154
                        (0.918)          (1.889)          (1.364)          (1.041)          (1.323)          (1.060)
S_Dem                   0.876            1.054            1.006            0.970            1.009            1.156
                        (0.0329)         (0.0362)         (0.0355)         (0.0342)         (0.0312)         (0.0295)
S_Rep                   0.850            0.978            0.936            0.847            0.881            0.994
                        (0.0380)         (0.0406)         (0.0502)         (0.0422)         (0.0471)         (0.0590)
X_Dem                   3.423            3.586            3.155            3.132            3.297            3.723
                        (0.130)          (0.127)          (0.117)          (0.131)          (0.113)          (0.0938)
X_Rep                   4.056            4.422            4.168            3.813            3.885            4.093
                        (0.180)          (0.186)          (0.220)          (0.193)          (0.209)          (0.262)
Ideology                -0.572           -0.493           -0.710           -0.766           -0.679           -0.509
                        (0.0888)         (0.0825)         (0.0839)         (0.0927)         (0.0818)         (0.0701)
Tenure                  0.00364          0.00586          0.00452          0.00518          0.00320          -0.00101
                        (0.00285)        (0.00242)        (0.00275)        (0.00311)        (0.00259)        (0.00222)
Grosewart               -0.0145          -0.0180          -0.0300          -0.0151          -0.0111          -0.00404
                        (0.00952)        (0.00973)        (0.00944)        (0.0114)         (0.00960)        (0.00843)
Ideology×Rep            0.0754           -0.0189          -0.0706          0.0929           0.127            0.126
                        (0.0764)         (0.0699)         (0.0876)         (0.0749)         (0.0783)         (0.0823)
Tenure×Rep              0.0110           0.00728          0.00489          0.00776          0.00196          -0.000345
                        (0.00329)        (0.00271)        (0.00347)        (0.00302)        (0.00318)        (0.0037)
Grosewart×Rep           -0.0177          -0.00613         -0.0205          -0.0239          -0.0326          -0.0341
                        (0.0104)         (0.0100)         (0.0115)         (0.0112)         (0.0104)         (0.0124)
N                       436              433              435              437              433              437

Notes: Standard errors in parentheses. The table presents the GMM estimates using the Optimal Weighting Matrix for the parameters of interest, as described in the Estimation section. Standard Errors are computed from estimates of the variance for a GMM estimator with the Optimal Weighting Matrix. Details are in Appendix. The estimate of φ is its estimated identified set. Rep represents the dummy variable of whether a politician was in the Republican Party. Hence, a variable such as Tenure×Rep represents the additional estimate of the Tenure variable for the Republican Party, as compared to the Democratic one.

Table 2.3: Differences in the Distributions of $\alpha_i$ Across Parties

Congress                        105      106      107      108      109      110
Democrats:
  Mean α_i                      1.256    1.234    1.329    1.375    1.323    1.196
  Standard Deviation of α_i     0.109    0.0946   0.137    0.149    0.125    0.088
Republicans:
  Mean α_i                      1.085    1.027    0.985    1.094    1.081    1.073
  Standard Deviation of α_i     0.0408   0.0264   0.0277   0.0390   0.0424   0.0465

We show the mean and the standard deviation of the (estimated) distributions of $\alpha_i$ for each party, highlighting the differences in those distributions. The estimates presented are those from Table 2.2.

We see that $\phi$ has an estimated set that excludes 0 and is stable over time. Across Congresses, the estimated sets lie within [0.047, 0.076]. To put this into context, note that the marginal utility of an increase in $x_i$ is

$$\alpha_i + \phi s_i \sum_{j \neq i} x_j s_j m_{ij}(s) - c x_i.$$

The direct benefit $\alpha_i$ ranges from 1 to 1.4, while the social part $\phi s_i \sum_{j \neq i} x_j s_j m_{ij}(s)$ ranges from about 0.15 to 0.32. So, the social incentive is somewhere between a tenth and a third of the direct incentives.

This provides evidence that the returns to socializing are quantitatively sizeable (significantly away from 0 for all Congresses and of a significant magnitude relative to c, which ranges in [0.32, 0.37]) and have not changed much over time. In the context of the model, these returns represent the (expected) gain to the politician of the interaction with other politicians and having a bill approved relative to the direct return to legislative effort. The stability of $\phi$ also suggests that the returns of having a bill approved have not significantly changed over the period of time studied.

The relative cost of legislative effort to social effort, c, is estimated to be increasing over time.
This may be consistent with an increase in the complexity of extant statutes, as evident for example from the rise in the average number of pages per statute from 3.6 in 1965-66 to 18.8 in 2015-16,^32 making interaction between politicians more important in drafting and drumming up support for legislation, or with improvements in the technology of social interaction among members of the House, inside and outside the Capitol. We explore how changes in c affect the choices of social and legislative efforts in our estimated equilibrium in the next section.

The estimates of $\zeta$ are also significant and large in magnitude. This indicates that politicians see positive gains from having bills approved. This larger magnitude is needed for the model to be internally consistent. This is because $\zeta$ also operates as a normalizer that guarantees that the probability of bill approval (in equation (2.33)) is between 0 and 1. Using the estimated values of $\zeta$, we can then calculate the probability of bill approval for each politician. We show these in Figure 2.6 for Democrats and Figure 2.7 for Republicans.

Figure 2.6: Estimated Probability of Approval - Democrats, Congresses 105-110

By comparing Figures 2.6 - 2.7 with the average bill passage rates in the summary statistics (Table 2.1), we can see that the model can generate a good match at the mean approval rate (which we observe), while our structural assumptions allow us to represent the whole distribution of expected probabilities of having a bill approved across different politicians. These indicate some variation over time. Later Congresses (108th and 110th) show a higher predicted approval rate for most politicians. For Congress 110, such effects are also driven by an increase in the average $\alpha_i$.

We can also discuss the statistical significance of different observable characteristics in explaining the individual $\alpha_i$.
With our baseline specification that uses Ideology, Tenure, Grosewart for $z_i$, we see that ideology is statistically significant (especially in later Congresses). The estimates suggest that those on the left of the ideological spectrum (extremist Democrats, moderate Republicans) have higher returns to exerting legislative effort. Meanwhile, the Grosewart variable, capturing the impacts of committee assignments, appears to be noisy.

Figure 2.7: Estimated Probability of Approval - Republicans, Congresses 105-110

^32 Vital Statistics of Congress 2017, Chapter 6, available at www.brookings.edu

We also consider another specification where we replace the Grosewart variable by dummy variables of committee assignments to each of Congress's main committees. This is shown in Table 2.4. We can see that the results from our main specification are robust. It is noteworthy in that case, though, that the estimate of being in the Rules Committee is positive and significant. The Rules Committee is the committee in charge of determining the rules that allow each bill to come to the floor, fundamental for the progress of legislation. It seems consistent that politicians in that committee are rewarded for effort in it, even conditional on having the same ideology, party, and tenure.

2.5.1 Fit and Discussion

To conclude the section, consider the following out-of-sample validation of our approach. Although our analysis employs i's total cosponsorship figures to proxy for social effort $s^*_{i,\tau}$, the more fine-grained data on pairwise cosponsorship information between politicians i, j, directional in nature, is not used in estimation. In this subsection, we predict the cosponsorship i, j links at the level of each Congress member based on what is predicted by our $g_{i,j}(s)$ function separately for each Congress in our sample.
The goal is to show that our estimated network model fits such proxies for social ties (common in the literature, see [74]), even without explicitly modeling pairwise socialization decisions.

Table 2.4: Main Results, Specification 2

Congress                105              106              107              108              109              110
c                       0.326            0.327            0.351            0.360            0.346            0.330
                        (0.00907)        (0.00839)        (0.0102)         (0.0107)         (0.00935)        (0.00751)
φ                       [0.0473,0.0594]  [0.0483,0.0661]  [0.0536,0.0761]  [0.0517,0.0715]  [0.0526,0.0705]  [0.0550,0.0693]
ζ                       16.132           23.457           20.250           16.417           20.238           19.211
                        (0.907)          (1.839)          (1.346)          (1.025)          (1.278)          (1.018)
S_Dem                   0.881            1.037            1.027            0.978            1.018            1.146
                        (0.0325)         (0.0348)         (0.0369)         (0.0356)         (0.0312)         (0.0289)
S_Rep                   0.870            0.989            1.018            0.875            0.918            1.091
                        (0.0442)         (0.0441)         (0.0585)         (0.0495)         (0.0539)         (0.0704)
X_Dem                   3.420            3.504            3.210            3.177            3.338            3.708
                        (0.127)          (0.119)          (0.119)          (0.125)          (0.111)          (0.0894)
X_Rep                   4.166            4.491            4.522            3.933            4.034            4.450
                        (0.217)          (0.211)          (0.259)          (0.224)          (0.240)          (0.293)
Ideology                -0.568           -0.535           -0.681           -0.769           -0.635           -0.510
                        (0.0852)         (0.0801)         (0.0821)         (0.0908)         (0.0827)         (0.0719)
Tenure                  0.00349          0.00446          0.00188          0.00389          0.00378          0.000119
                        (0.00298)        (0.00219)        (0.00237)        (0.00254)        (0.00231)        (0.00209)
Appropriations          -0.0480          -0.0158          -0.0814          -0.0990          -0.0900          -0.0507
                        (0.0402)         (0.0330)         (0.0356)         (0.0407)         (0.0383)         (0.0289)
EnergyandCommerce       0.0206           0.0296           -0.0178          0.0227           0.0363           0.0302
                        (0.0339)         (0.0307)         (0.0346)         (0.0349)         (0.0311)         (0.0223)
Oversight               0.0387           0.0420           0.0603           0.0405           0.0500           0.00186
                        (0.0286)         (0.0323)         (0.0333)         (0.0408)         (0.0431)         (0.0387)
Rules                   0.184            0.165            0.146            0.188            0.198            0.142
                        (0.0258)         (0.0279)         (0.0340)         (0.0392)         (0.0368)         (0.0247)
Leadership              0.0812           0.0182           0.0997           -0.0703          -0.0560          -0.735
                        (0.0759)         (0.0510)         (0.0964)         (0.0807)         (0.0522)         (0.233)
Transportation          -0.0132          0.0444           0.0216           0.00783          0.000136         0.00706
                        (0.0310)         (0.0297)         (0.0254)         (0.0294)         (0.0259)         (0.0267)
WaysAndMeans            -0.0674          -0.0521          -0.0306          -0.0388          -0.0313          -0.0178
                        (0.0384)         (0.0313)         (0.0365)         (0.0410)         (0.0421)         (0.0345)
Ideology×Rep            0.0453           -0.0527          -0.145           0.0771           0.0839           0.0401
                        (0.0774)         (0.0699)         (0.0847)         (0.0750)         (0.0784)         (0.0862)
Tenure×Rep              0.00991          0.00720          0.00198          0.00668          0.00233          0.000134
                        (0.00316)        (0.00266)        (0.00311)        (0.00303)        (0.00291)        (0.00333)
Appropriations×Rep      -0.00958         -0.0412          -0.159           -0.113           -0.124           -0.143
                        (0.0293)         (0.0271)         (0.0390)         (0.0399)         (0.0398)         (0.0431)
EnergyandCommerce×Rep   -0.0794          -0.0109          -0.0423          -0.05403         -0.0295          -0.0438
                        (0.0307)         (0.0266)         (0.0309)         (0.0318)         (0.0337)         (0.0375)
Oversight×Rep           0.0780           0.105            0.0892           0.0615           0.0801           0.0363
                        (0.0333)         (0.0235)         (0.0312)         (0.0298)         (0.0351)         (0.0416)
Rules×Rep               0.102            0.0961           0.0702           0.0476           0.142            0.117
                        (0.0403)         (0.0275)         (0.0291)         (0.0355)         (0.0415)         (0.0497)
Leadership×Rep          -0.0792          -0.0502          -0.158           -0.203           -0.234           -0.0146
                        (0.0668)         (0.0920)         (0.163)          (0.137)          (0.159)          (0.0917)
Transportation×Rep      -0.0406          -0.0367          -0.0592          -0.000666        -0.0230          -0.0457
                        (0.0324)         (0.0285)         (0.0310)         (0.0310)         (0.0311)         (0.0341)
WaysAndMeans×Rep        -0.0434          0.000758         -0.0117          -0.0226          -0.0781          -0.150
                        (0.0362)         (0.0412)         (0.0411)         (0.0372)         (0.0469)         (0.0486)
N                       436              433              435              437              433              436

Notes: Standard errors in parentheses. The table presents the results from the GMM estimation under the second specification. That is, we replace the Grosewart measure by dummy variables for the most important committees. The variable Leadership represents a dummy of whether the politician was the Speaker, the Majority or Minority Leader, or the Majority or Minority Whip. Rep is a dummy variable for belonging to the Republican Party. The estimate of φ is its estimated identified set. The optimal weighting matrix is used, and standard errors are estimated as discussed in Appendix. All other notes follow those in Table 2.2.

The correlations between the estimated $g_{i,j}(s)$ and the i, j pairwise cosponsorships are reported in Table 2.5. The Table illustrates the correlations for two possible definitions of links based on i, j cosponsorship in the data: in the top panel cosponsorships are considered directed from i to j and in the second panel cosponsorships are considered undirected.
In the two cases the correlations with the model-implied $g_{i,j}(s)$ are 0.25 and 0.31 respectively and statistically significant. The model appears able to fit disaggregated socializing proxies emerging from individual cosponsorships that are not directly targeted in estimation.

We can draw an additional relevant conclusion from this exercise. Results of Fisher's z-transformation tests also suggest that our model with $p_{Dem} > 0, p_{Rep} > 0$ is better at capturing the relationships from the pairwise cosponsorship data than alternative models with full partisanship (at least one of $p_{Dem} = 1$ or $p_{Rep} = 1$) or without any partisanship ($p_{Dem} = p_{Rep} = 0$). These comparisons are possible since different $g_{i,j}(s)$ can be generated using different values for $p_{Dem}, p_{Rep}$.^33

We underscore that, although recent political economy research highlights a hollowing out of the moderate middle ground in congressional voting ([128]; [69]), our model with $p_{Dem}, p_{Rep}$ below 0.1 (values in this range for $p_{Dem}, p_{Rep}$ are obtained in Appendix) produces a substantially better fit of the cosponsorship data than a model with complete polarization $p_{Dem} = p_{Rep} = 1$, which is statistically dominated.

While the exact point estimates of $p_{Dem}, p_{Rep}$ rely on our assumptions on second moments, we believe that the rejection of $p_{Dem} = p_{Rep} = 1$ has to be considered more general. The raw data in Figure 2.2 itself displays a sufficient degree of cross-party cosponsorship to cast doubt on a hypothesis of "full sorting" among party members in the House. Possibly, a reconciliation of a world of more polarized legislators with the thousands of cosponsorships across party lines reported in Figure 2.2 may come from noting that, as ideology diverges, engagement across party lines becomes more important for getting legislation to the floor and passed. Our model appears to capture such phenomena.

2.6 Assessment and Counterfactuals

Our theory allows for multiplicity of equilibria.
In this Section, we first discuss what we can say about the equilibrium being played in each Congress. In the second part of this Section, we present counterfactual exercises using the estimated model parameters.

Table 2.5: Model Fit: Correlation of Estimated Network of the Model to the Cosponsorship Networks in the Data

Model                                   Correlation   Fisher's z-statistic
Data from Directed Cosponsorships:
  Model: p_Dem > 0, p_Rep > 0           0.2544        -
  Model: p_Dem = 0, p_Rep = 0           0.2516        2.219**
  Model: p_Dem = 1                      0.1840        55.615***
Data from "Combined" Cosponsorships:
  Model: p_Dem > 0, p_Rep > 0           0.3125        -
  Model: p_Dem = 0, p_Rep = 0           0.3091        2.825***
  Model: p_Dem = 1                      0.2261        70.207***

We compare the performance of the partisan model (with $p_{Dem} > 0, p_{Rep} > 0$) to the performance of the model without partisanship ($p_{Dem} = p_{Rep} = 0$) and with complete partisanship ($p_{Dem} = 1$), in explaining the observed cosponsorships in the data. In the first panel, cosponsorships are measured by the directed number: how many times i cosponsors j. In the second panel, "combined cosponsorships" are measured by the number of times i cosponsors j and j cosponsors i, creating a symmetric undirected graph. To calculate the statistics, we first generate the links using the theoretical definition $g_{ij}(s) = s_i s_j m_{ij}(s)$ under our estimated parameters. That is done for the 3 cases. We then show the correlations of the model links to the values of the cosponsorships in the data. The estimated values for $p_{Dem} > 0, p_{Rep} > 0$ come from the estimates using second moments of $(\tilde{s}_i, \tilde{x}_i)$, from Table A.1 in Appendix. We present Fisher's z-transformation statistic, for the test that the correlation of the Model with $p_{Dem} > 0, p_{Rep} > 0$ is equal to the correlation of the alternative model (without partisanship/complete partisanship). *** represents that the null hypothesis of equal correlations can be rejected at the 1% significance level, ** at 5%. Note that, when estimating the model, we did not use cosponsorships at the ij level. We aggregate all Congresses in the analysis above.

^33 For these tests, absent an estimate for $p_1$ and $p_2$ from the baseline model where these parameters are not identified, we employ the estimates generated by a specification imposing second moment restrictions, reported in Appendix. This shows that our model captures part of the empirically observed relationship.

2.6.1 Stability of Equilibrium and Other Equilibrium Properties

Let us first discuss the stability of the equilibria that we find.

A preliminary technical consideration deals with the fact that we only infer noisy estimates of social and legislative effort $(S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep})$ from the data. Those values, however, do not necessarily solve the original (non-noisy) system defined in Proposition 2.2.1 under our set of estimated parameters; they only approximate its solution. To find the equilibrium values $(S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep})$ that are consistent with our estimated parameters for $(c, \zeta, \beta_{Dem}, \beta_{Rep})$, we solve the system in Proposition 2.2.1 exactly. Those solutions are presented in the upper panel of Table 2.6. They are used to compute the model consistent bill approval rates shown in Figures 2.6 - 2.7, as well as the counterfactuals presented below. This procedure is described in more detail in Appendix.

Based on the results in Figures 2.6 - 2.7 and in the upper panel of Table 2.6, Congress is always at an interior equilibrium of our model. As there can be multiple interior equilibria, we check that the estimated equilibria are stable.^34

To perform this analysis we use the best response dynamics.^35 Starting at some vectors $s^0, x^0$ and iterating through t, we get that the best response dynamics are described by:

$$s_i^t = x_i^{t-1} \phi \sum_{j \neq i} m_{ij}(s^{t-1}) s_j^{t-1} x_j^{t-1}, \tag{2.41}$$

and

$$x_i^t = \frac{\alpha_i}{c} + s_i^{t-1} \frac{\phi}{c} \sum_{j \neq i} m_{ij}(s^{t-1}) s_j^{t-1} x_j^{t-1}. \tag{2.42}$$

We check whether perturbations away from equilibrium socialization and legislative efforts converge back to the estimated equilibrium efforts through the best responses of the players in the network.
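The iteration in (2.41)-(2.42) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the implementation used for the results: the meeting function is taken to be the hypothetical $m_{ij}(s) = 1/\sum_k s_k$ (the functional form used in the chapter, which depends on the partisanship parameters, is not reproduced here), and the values of $\alpha_i$, $\phi$ and c are round illustrative numbers of the same order as the estimates reported above, not the estimates themselves. All function names are hypothetical.

```python
def meeting_weight(s):
    """Hypothetical meeting function m_ij(s) = 1 / sum_k s_k (an assumption)."""
    total = sum(s)
    return 1.0 / total if total > 0 else 0.0


def best_response_step(s, x, alpha, phi, c):
    """One round of the best response dynamics in (2.41)-(2.42)."""
    m = meeting_weight(s)
    sx = [sj * xj for sj, xj in zip(s, x)]
    total_sx = sum(sx)
    new_s, new_x = [], []
    for i in range(len(s)):
        # phi * sum over j != i of m_ij(s) * s_j * x_j, using last round's values
        social = phi * m * (total_sx - sx[i])
        new_s.append(x[i] * social)                     # equation (2.41)
        new_x.append(alpha[i] / c + s[i] * social / c)  # equation (2.42)
    return new_s, new_x


def iterate_best_responses(s, x, alpha, phi, c, n_iter=500):
    """Iterate the dynamics n_iter times from the starting vectors (s, x)."""
    for _ in range(n_iter):
        s, x = best_response_step(s, x, alpha, phi, c)
    return s, x


if __name__ == "__main__":
    n = 50
    # Illustrative values only (assumptions): alpha_i spread over [1.0, 1.4].
    alpha = [1.0 + 0.4 * i / (n - 1) for i in range(n)]
    phi, c = 0.05, 0.33

    # Locate an interior rest point of the dynamics, then perturb it by 5%.
    s_star, x_star = iterate_best_responses([0.9] * n, [4.0] * n, alpha, phi, c)
    s_back, x_back = iterate_best_responses(
        [1.05 * v for v in s_star], [1.05 * v for v in x_star], alpha, phi, c
    )
    gap = max(abs(a - b) for a, b in zip(s_back + x_back, s_star + x_star))
    print("largest distance from the rest point after perturbation:", gap)
```

Under the assumed meeting function, starting from $s = 0$ keeps social effort at zero and sets $x_i = \alpha_i / c$ after one step, which mirrors the semi-corner case discussed later in this subsection.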
This is done by starting slightly below or above the values of Table 2.6, and successively iterating the best response dynamics.^36 In all Congresses 105th-110th, we find that best-response dynamics converge back to our estimated equilibrium (from the upper panel in Table 2.6) after few iterations.

^34 Generally, when there are multiple interior equilibria, only some are stable. This is in contradiction with Proposition 1 in [36] which claims stability of all interior equilibria. In their model, contrary to the original proof, the largest equilibrium is unstable. In the proof of that proposition the matrix $\Pi$ cannot be approximated by setting off-diagonal terms to 0. In fact, the eigenvalue can change sign if the off-diagonal terms are included and are on the order of 1/n. This reverses their conclusion.

^35 For details, we refer the reader to the Appendix. Here $s_i^t$ takes $x_i^{t-1}$ as given, but one could also solve for simultaneous best reply dynamics and get the same results.

^36 This is not a full check of stability, as we are not verifying all perturbations. However, given the structure of equilibria, all best responses in a party are proportional to the same X, S and have similar dynamics.

Table 2.6: Estimated and High Effort Equilibria

Congress     105      106      107      108      109      110
Effort Level in the Estimated Equilibrium:
  S_Dem      0.835    0.975    0.957    0.906    0.960    1.052
  S_Rep      0.822    0.956    0.914    0.873    0.934    1.034
  X_Dem      3.639    3.851    3.508    3.397    3.570    3.832
  X_Rep      3.623    3.826    3.455    3.357    3.537    3.808
Effort Level in the Higher Equilibrium:
  S_Dem      2.893    2.470    2.290    2.396    2.320    2.181
  S_Rep      2.759    2.362    2.079    2.196    2.183    2.111
  X_Dem      6.808    6.150    5.456    5.561    5.577    5.530
  X_Rep      6.584    5.973    5.127    5.250    5.361    5.419

We numerically assess whether the equilibrium we have estimated is the equilibrium with the highest values of social and legislative efforts $(S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep})$. To do so, for each Congress we compute two (possibly) distinct solutions to the system of equations in Proposition 2.2.1 under our estimated parameters. First, we find the solutions under our estimated parameters, with starting values for effort levels at those estimated in Table 2.2 (upper panel). Second, we search for a higher effort equilibrium (lower panel). This is done by searching for a solution to equations (2.22) - (2.24), while starting from a vector with large values of effort relative to the estimated ones (namely, (100,100,100,100)). The Table shows that there exists an equilibrium with higher levels of both social and legislative effort in all Congresses. These effort levels are higher than the equilibrium ones we estimated and that we observe in the data.

Second, we compare our estimated equilibrium effort levels to those from a possible equilibrium with higher levels of effort in each Congress. To do so, we take the system defined by Proposition 2.2.1 under the estimated parameters and numerically search for solutions of this system at higher values of $(S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep})$.

These results are reported in the lower panel of Table 2.6. We find that there exists a higher effort level equilibrium for every Congress considered in our analysis. These other equilibria are distinct from the ones we estimate, with effort levels that are approximately double the magnitude of the empirically assessed ones. We also verify that these unobserved equilibria are unstable. To do so, we repeat the stability exercise, starting below the values of the higher equilibria. Here, we find that our dynamics diverge away from these higher equilibria.

From these exercises we deduce that Congress is generally in an interior low effort equilibrium and, moreover, all Congresses operate at effort levels lower than at an unobserved, unstable, high effort equilibrium, which Pareto dominates the observed equilibrium.

We conclude this discussion by briefly noting that there also always exists a "semi-corner" equilibrium. In this equilibrium legislative effort is exerted and is chosen at level $x_i^* = \alpha_i / c$, but there is no socializing, i.e. $s_i^* = 0$ for all i.
This occurs since there is no return to social effort if no other politician is socializing. Each politician acts in autarky. Effort is still provided in the model, because there are direct incentives $\alpha_i$ for legislative effort, but no law is passed. Such an outcome is not desirable for politicians or voters due to Proposition 2.2.3. Furthermore, this semi-corner equilibrium is unstable, in the sense that, were any politician to deviate to a positive social effort $s_j$, so would all the other politicians.

The semi-corner equilibrium is not observed, given that we observe positive socialization in Congress. Such a complete shutdown of socialization effort would be unstable, and so should not be observed for any length of time.

2.6.2 Counterfactuals

To conclude, we use the model to answer some hypothetical questions. We focus first on changing $\gamma$ keeping $\phi$ constant (i.e. changing the bill quality shock). Subsequently, we explore changes in the relative cost of legislative effort $c$. Finally, we conclude with a counterfactual analysis of the Congressional response to the 2008-09 financial crisis by modifying the partisan composition of the House.

Table 2.7: Counterfactual in $\gamma$: Predicted Probability of Bill Approval

Congress                           105      106      107      108      109      110
Democrats
Decrease of 10% in \hat{\gamma}   0.0568   0.0495   0.0506   0.0626   0.0514   0.0706
Republicans
Decrease of 10% in \hat{\gamma}   0.0570   0.0496   0.0509   0.0629   0.0516   0.0707

We calculate the average probability of bill approval by party when $\gamma$ is decreased by 10% (keeping $\phi$ constant). This is done by calculating the probability of bill approval, given by $P(y_i = 1) = \frac{1}{\zeta}(s_i^*)^2$, with $s_i^*$ the solution to the non-noisy equilibrium system from Proposition 2.2.1 (further details in the Appendix). We then decrease $\gamma$ by 10%.

Change in $\gamma$ with $\phi$ constant

For each of the 105th-110th Congresses, Table 2.7 shows what would happen to equilibrium efforts if $\gamma$ were reduced while keeping $\phi$ constant (noting that $\zeta$ must adjust inversely). Recall that the shock to bill passage $\varepsilon$ is assumed to be standard Pareto distributed with scale parameter $\gamma > 0$; hence a lower $\gamma$ implies a lower median draw of the positive shock $\varepsilon$ and a lower chance of legislative success. As the system of equations in Proposition 2.2.1 does not change (the equations depend on $\phi$ directly), only the probability of approval is affected, as per equation (2.9). Hence, $\gamma$ only changes the shape of the bill approval function. Quantitatively, a decrease in $\gamma$ of 10 percent leads to a sizeable leftward shift of the probability-of-approval curve (these probabilities are shown in Table 2.1 to vary from 9.57 to 12.85 percent). This is shown in Table 2.7, which reports the average probabilities of bill approval under a smaller $\gamma$. As we only change the values of $\gamma$ in equation (2.33), the percentage change in the expected probability of bill passage is linear, dropping by 10 percent as well.

Change in $c$

We now examine some counterfactuals with respect to $c$. By lowering $c$ we lower the cost of legislative effort relative to social effort. Table 2.8 reports what would happen to the equilibrium if $c$ were reduced.

Table 2.8: Counterfactuals in $c$: Predicted (Proportional) Change in the (Mean) Probability of Bill Approval

Congress                     105     106     107     108     109     110
Decrease of 1% in \hat{c}   0.082   0.108   0.116   0.104   0.115   0.140
Decrease of 2% in \hat{c}   0.177   0.244   0.267   0.233   0.263   0.346

The table presents the change in the average probability of bill approval under the counterfactual, where the estimated cost $c$ is reduced by either 1% or 2% (of the estimated value $\hat{c}$). We do this by calculating the implied optimal $\{x_i^*, s_i^*\}$ from Proposition 2.2.1 under the appropriate value of $c$ and calculating the probability of approval, defined as $\frac{1}{\zeta}(s_i^*)^2$. We then find the percentage change over the predicted values under Table 2.2.

In our example we decrease the estimated value of $c$ by 1 and by 2 percent, and assess the changes in the probability of bill approval at our estimated equilibrium. We find a positive effect of decreasing $c$ on the likelihood of bill passage. Moreover, this effect is quantitatively large, substantially more than linear, consistent with the evidence so far pointing towards a substantial role for social complementarities across legislators. There is an increase in bill success probability of approximately 10 percent on average across all Congresses, vis-à-vis a decrease of 1 percent in $c$. Similarly large effects are present for a 2 percent cut. These magnitudes appear consistent with the overall importance of social interactions in legislative activity reported in Section 5.

The rationale for the sign of this effect is that a decrease in $c$ leads to an increased choice of legislative effort, which has a sufficiently strong complementarity with social effort to also drive up socializing by the individual at this equilibrium (similarly to what happens at the low effort equilibrium of [36]). This increases the returns to socializing by others, who then further increase their own legislative efforts until a new equilibrium is reached. This feeds back and leads to a large positive effect, highlighting the importance of socializing in the approval of bills. These dynamics contrast with an opposite, negative feedback effect that can occur, for example, in the higher Pareto equilibrium of [36]. In that case, a decrease in $c$ may lead to a decrease in bill approval. This is because, at the high equilibrium, a decrease in $c$ makes socialization relatively more costly, which reduces the incentives for socialization.
In turn, this reduces the returns to $x_i$ due to the complementarity of effort choices.

Counterfactual of the Democratic Party Takeover in the 110th Congress

We conclude this section by proposing a counterfactual of congressional behavior during the 110th cycle.

In the November 2006 election, the House of Representatives gained a Democratic majority after twelve years of consecutive Republican control. This proved to be a particularly consequential election, as it was the 110th Congress that voted, between the Summer and the Fall of 2008, on a host of emergency economic measures in response to the 2008-09 financial crisis (for an analysis of these Congressional votes, see [134]). Some of this legislative activity was extremely momentous, including the vote on the Emergency Economic Stabilization Act of 2008 (EESA, also known as the "TARP" from the Troubled Asset Relief Program), which initially failed passage in the House, inducing one of the largest intra-day losses in the NYSE's history.

An important counterfactual is to assess how relevant the role of congressional networks was in eventually guaranteeing a responsive legislative intervention to the financial crisis. How different would legislative activity have looked absent the Democratic Party takeover in 2006?

Within our framework, this counterfactual corresponds to keeping the composition of the 109th Congress in the 110th Congress. That is, we keep the observed characteristics and estimated $\alpha_i$ for the members of the 109th Congress in an analysis of outcomes in the 110th Congress. Meanwhile, the institutional setting of the 110th Congress remains the same (with the cost and returns to social effort, as well as the other institutional parameters, kept at their estimated values for the 110th Congress).

Figure 2.5 shows that the distribution of $\alpha_i$ from the 109th Congress lies slightly to the left of the distribution estimated for the 110th Congress. However, even small differences in the vector of $\alpha$ could potentially produce substantial effects through the network in terms of the likelihood of bill success. In Table 2.9 we inspect the magnitude of the counterfactual reductions in the likelihood of bill passage for some of the most important emergency response legislation during the Fall of 2008. This set includes, in addition to the EESA, the Housing and Economic Recovery Act of 2008 (aimed at foreclosure prevention, also studied by [134]), the Economic Stimulus Act of 2008 and the Supplementary Appropriations Act, both large bills that were precursors of the fiscal intervention of 2009. Table 2.9 reports the relative differences in bill passage probabilities between the counterfactual and the estimated model, as well as the baseline probabilities.^37 All differences are in the range of a 3-5 percent reduction in the likelihood of success, a quantitatively small effect considering the baseline probability of approval. That is, the social network composition of the House would not have changed in a sufficiently different way to substantially affect final voting outcomes. This runs counter to the claim that, absent the Democratic takeover of the House in 2006, the financial crisis response would have been substantially different, with a more restrained government intervention under a Republican Congress (obviously, conditional on the same set of emergency legislation being pushed forth by Treasury Secretary Hank Paulson and Federal Reserve Chairman Ben Bernanke). Of course, our exercise can only speak to the quantity of legislation and not to its content.

^37 Some of these bills' complex histories align with a low likelihood of success. The EESA of 2008 failed the House. In addition, about H.R. 3221, [134] write: "Roll call 301: "On Agreeing to the Senate Amendment with Amendment No. 1: H.R. 3221 Foreclosure Prevention Act of 2008." This vote is considered by many the first crucial roll call in the political economy of the crisis and was characterized by strong opposition (and a veto threat) by the executive branch. The Wall Street Journal (May 9, 2008) refers to the vote as follows: "The House voted 266-154 in favor of the centerpiece of the legislation \$300 billion in federal loan guarantees-despite a White House veto threat." In particular, "The heart of the legislation is a program to help struggling homeowners by providing them with new mortgages backed by the Federal Housing Administration. The guarantees would be provided if lenders agree to reduce the principal of a borrower's existing mortgage." H.R. 3221 had also previously failed cloture in the Senate in February 2008.

Table 2.9: Counterfactuals in $\alpha$: Changes in the (Ex-Ante) Predicted Probability of Emergency Crisis Bills in the 110th Congress, if the Republicans Who Lost Their Seats Had Remained

Act                                                             Proportional Change   Baseline Probability of Success
Emergency Economic Stabilization Act of 2008 (H.R. 1424)             -0.0505                  0.0848
  Sponsor: Patrick Kennedy, Democrat - RI
Housing and Economic Recovery Act of 2008 (H.R. 3221)                -0.0391                  0.0898
  Sponsor: Nancy Pelosi, Democrat - CA
Economic Stimulus Act of 2008 (H.R. 5140)                            -0.0391                  0.0898
  Sponsor: Nancy Pelosi, Democrat - CA
Supplementary Appropriations Act, 2008 (H.R. 2642)                   -0.0518                  0.0706
  Sponsor: Chet Edwards, Democrat - TX

The table presents the proportional change, (Counterfactual - Model)/Model, of the probability of each bill passing under our counterfactual scenario. The counterfactual scenario keeps the Republican majority and composition from the 109th Congress in the 110th Congress. To do so, we keep the characteristics of all politicians from the 109th Congress with the estimated parameters of the 110th Congress. The only difference in characteristics is that we add 1 to each politician's Tenure variable, as those politicians would have stayed 1 extra term in the counterfactual. We do not change the Committee composition or ideology. We then calculate the projected probability of bill approval using the estimated parameters from the 110th Congress. The baseline (model) probability is shown in the second column, computed using $\hat{p}_{Dem}, \hat{p}_{Rep}$ from Table A.1 in the Appendix.

2.7 Conclusions

We have developed and estimated a structural model of legislative activity for the U.S. Congress in which endogenous, partisan social interactions play an important role in promoting bill passage. We estimate that social effort matters substantially and significantly for legislative activity.

By endogenizing both legislative and social efforts, we are able to accommodate complementarities in actions that appear to be strong. In particular, we find that the complementarities among politicians are quantitatively substantial (on the order of a tenth to a third of the direct incentives), and are fairly stable across our sample period. We also find that the two parties have different base payoffs from passing legislation, both in terms of the average and the variance across party members (both are higher for the Democrats). Overall, we show how the process of informal social interaction among legislators may paint a less extreme, although still partisan, picture of the internal operation of Congress.

Multiple equilibria arise naturally within our theoretical setting (as is typical of models of endogenous network formation). By careful consideration of the theoretical model and its behavior around the estimated equilibrium, we are able to show that Congress appears to be in a stable, low-socialization equilibrium, with effort levels lower than in a Pareto superior, but unstable, equilibrium present in all Congresses.

Finally, our estimated model enables us to perform some counterfactuals. We show substantial impacts from changing the relative cost of social effort in terms of the probability of bill passage, and we estimate that the response to the 2008-09 financial crisis would not have changed in terms of levels of legislation if the Democrats had not taken over the House. With more recent waves of data, further counterfactuals (for example, related to the role of the Tea Party movement) could be performed.

Chapter 3

Estimating Local Interactions Among Many Agents Who Observe Their Neighbors^1

3.1 Introduction

Interactions between agents, for example through personal or business relations, generally lead to their actions being correlated. In fact, such correlated behaviors form the basis of identifying and estimating peer effects, neighborhood effects, or more generally social interactions in the literature. (See [25] and [62] for a review of this literature.)

Empirical modeling becomes nontrivial when one takes seriously the fact that people are often connected directly or indirectly on a large complex network, observing some others' types, and that the econometrician observes only a small fraction of those on the network. Furthermore, strategic environments are highly heterogeneous across agents, as each agent occupies a nearly "unique" position in the network. Information sharing potentially creates a complex form of cross-sectional dependence among the observed actions, and yet the econometrician rarely has precise information about the actual network on which people observe other people.

The main contribution of this chapter is to develop a tractable empirical model of linear interactions among agents with the following three major features.
First, assuming a large game on a complex exogenous network, the empirical model allows the agents not to observe the full network, but to observe only part of the types of their neighbors.^2

^1 This chapter is joint work with Jacob Schwartz and Kyungchul Song.

Second, our model explains strategic interdependence among agents through correlated observed behaviors. In this model, the locality of cross-sectional dependence among the observed actions reflects the locality of strategic interdependence among the agents. Most importantly, unlike most incomplete information game models in the literature, our set-up allows for information sharing on unobservables, i.e., each agent is allowed to observe his neighbors' payoff relevant signals that are not observed by the econometrician.

Third, the econometrician does not need to observe the whole set of players in the game for inference. It suffices that he observes many (potentially) non-random samples of local interactions. The inference procedure that this chapter proposes is asymptotically valid independently of the actual sampling process, as long as the sampling process satisfies certain weak conditions. Accommodating a wide range of sampling processes is useful because random sampling is rarely used for the collection of network data, and a precise formulation of the actual sampling process is often difficult in practice.

A standard approach to modeling interactions among agents is to model them as a game and use equilibrium strategies from the game to obtain predictions and testable implications. Such an approach is cumbersome in our set-up. Since a particular realization of any agent's type affects all the other agents' actions in equilibrium through a chain of information sharing, each agent needs to form a "correct" belief about the entire information graph.
Apart from such an assumption being highly unrealistic, it also implies that predictions from an equilibrium that the econometrician uses to form testable implications generally involve all the players in the game, when it is often the case that only part of the players are observed in practice. Thus an empirical analysis which regards the players in the sample as coincident with the actual set of players in the game will suffer from a lack of external validity when the target "population" is the original large game involving many more players than those in the sample.

Instead, this chapter adopts an approach of behavioral modeling, where it is assumed that each agent, not knowing fully the information sharing relations, optimizes according to his simple beliefs about other players' strategies. The crucial part of our behavioral assumption is a primitive form of belief projection which says that each agent, not knowing who his payoff neighbors observe, projects his own beliefs about other players onto his payoff neighbors. More specifically, if agent $i$ gives more weight to agent $j$ than to agent $k$, agent $i$ believes that each of his payoff neighbors does the same in comparing agents $j$ and $k$.

^2 For example, a recent paper by [32] documents that people in a social network have a substantial lack of knowledge of the network, and that the violation of this assumption may have significant implications for the predictions of the model.

Belief projection in our chapter is a variant of inter-personal projection studied in behavioral economics. A related behavioral concept is the projection bias of [118], which refers to the tendency of a person to project his own current taste onto his future taste. See also [167], who reported experimental results on the interpersonal projection of tastes onto other agents.
Since formation of belief is often tied to the information set the agent has, belief projection is closely related to information projection in [119], which focuses on the tendency of a person to project his information onto other agents' information. The main difference here is that our focus is to formulate the assumption in a way that is useful for inference using observational data on actions on a network.

We show that our primitive form of belief projection yields an explicit form of the best linear response which has intuitive features. For example, the best linear response is such that each agent $i$ gives more weight to those agents with a higher local centrality to him, where the local centrality of agent $j$ to agent $i$ is defined to be high if and only if a high fraction of the agents whose actions affect agent $i$'s payoff have their payoffs affected by agent $j$'s action. Also, each agent's action responds to a change in his own type more sensitively when there are stronger strategic interactions, due to what we call the reflection effect. The reflection effect of player $i$ captures the way player $i$'s type affects his own action through his payoff neighbors whose payoffs are affected by player $i$'s types and actions.

One might wonder how close the predictions from our behavioral model are to the predictions from an equilibrium model. For this we consider a simple linear interactions model as a complete information game where one can compute the equilibrium explicitly. The equilibrium strategies are given in a primitive form of a spatial autoregressive model. We compare the network externality from our behavioral model and that from the complete information game model using simulated graphs, one from Erdős-Rényi random graphs and the other from the scale-free random graph generation of Barabási-Albert. In both cases, it is shown that both models have similar predictions when the payoff externality parameter is less than or equal to 0.5.
However, when it is close to one, the network externality becomes much higher in the equilibrium model than in the behavioral model. This is because, while strong local interactions induce global cross-sectional dependence in the equilibrium model due to extensive information transmission, they do not in our behavioral model. Also, as the network size increases, the network externality from our behavioral model changes more stably than that from the equilibrium strategies from a complete information game.

We investigate the finite sample properties of asymptotic inference through Monte Carlo simulations using various payoff graphs. The results show reasonable performance of the inference procedures. In particular, the size and the power of the test for the strategic interaction parameter work well in finite samples. We also apply our method to an empirical application on decisions of municipalities on state presence, revisiting the study by [4]. We consider an incomplete information game model which permits information sharing. The fact that our best linear responses explicitly reveal the local dependence structure means that it is unnecessary to separately correct for spatial correlation following, for example, the procedure of [51].

The literature on social interactions often looks for evidence of interactions through correlated behaviors. For example, linear interactions models investigate correlation between $Y_i$ and the average of outcomes over agent $i$'s neighbors. See for example [120], [55], [30] and [26] for identification analysis in linear interactions models, and see [38] for an application in the study of peer effects. [84] considers nonlinear interactions on a social network and discusses endogenous network formation.
These models often assume that we observe many independent samples of such interactions, where each independent sample constitutes a game which contains the entire set of the players in the game.

In the context of a complete information game, linear interactions models on a large social network can generally be estimated without assuming independent samples. The outcome equations frequently take the form of spatial autoregressive models, which have been actively studied in the literature of spatial econometrics (see [11]). A recent study by [100] considers a model of linear interactions on a large social network which allows for endogenous network formation. Developing inference on a large game model with nonlinear interactions is more challenging. See [132], [172], [160], [173], and [174] for a large game model of nonlinear interactions. This large game approach is suitable when the data set does not have many independent samples of interactions. One of the major issues in the large game approach is that the econometrician often observes only part of the agents in the original game.^3

^3 [160], [172], [100], [173] and [174] assume observing all the players in the large game. In contrast, [132] allows for observing i.i.d. samples from the many players, but assumes that each agent's payoff involves all the other agents' actions exchangeably.

Our approach of empirical modeling is also based on a large game model which is closer to the tradition of linear interactions models, in the sense that our approach attempts to explain strategic interactions through correlated behaviors among neighbors. In our set-up, the cross-sectional dependence of the observed actions is not merely a nuisance that complicates asymptotic inference; it provides the very piece of information that reveals the strategic interdependence among agents. The correlated behaviors also arise in equilibrium in models of complete information games or games with types that are either privately or commonly observable.
(See [30] and [26].) However, as emphasized before, such an approach can be cumbersome in our context of a large game, primarily because the testable implications from the model typically involve the entire set of players, when in many applications the econometrician observes only a small subset of the players in the large game. [64] model the interactions as a Bayesian game on a large network with private link information. Like this chapter, they permit the agents not to observe the full network, and show identification of the model primitives adopting a Bayesian Nash equilibrium as a solution concept. One of the major differences of our work from theirs is that we permit information sharing on unobservables, so that the actions of neighboring agents are potentially correlated even after controlling for observables.

A departure from the equilibrium approach in econometrics is not new in the literature. [13] studied the implications of various rationality assumptions for identification of the parameters in a game. Unlike their approach, our focus is on a large game where many agents interact with each other on a single complex network, and, instead of considering all the beliefs which rationalize observed choices, we consider a particular set of beliefs that satisfy a simple rule and yield an explicit form of best linear responses. (See also [81] and [93] for empirical research adopting behavioral modeling for interacting agents.)

This chapter is organized as follows. In Section 3.2, we introduce an incomplete information game of interactions with information sharing. This section derives the crucial result of best linear responses under simple belief rules. In this section, we discuss the issue of external validity of network externality, comparing two simple interactions models: a complete information game with equilibrium strategies and our behavioral model. Section 3.3 focuses on econometric inference.
This section presents inference procedures, explains a situation where we can measure the role of information sharing on unobservables, and compares our approach with a standard linear-in-means model. Section 3.4 investigates the finite sample properties of our inference procedure through a study of Monte Carlo simulations. Section 3.5 presents an empirical application on state capacity among municipalities. Section 3.6 concludes. The technical proofs of the results are found in the Appendix.

3.2 Strategic Interactions with Information Sharing

3.2.1 A Model of Interactions with Information Sharing

Strategic interactions among a large number of information-sharing agents can be modeled as an incomplete information game. Let $N$ be a finite but large set of players. Each player $i \in N$ is endowed with his type vector $(T_i, \eta_i)$, where $\eta_i$ is a private type and $T_i$ a sharable type. As we will elaborate later, information $\eta_i$ is kept private to player $i$, whereas $T_i$ is observed by his "neighbors", which we define later. Throughout this chapter, we set $T_i = (X_i', \varepsilon_i)'$, where $X_i$ is the vector of characteristics of player $i$ that are observed by the econometrician, and $\varepsilon_i$ the unobserved characteristic of player $i$. Thus the model permits information sharing on unobservables $\varepsilon_i$. This feature in fact makes a significant departure from many existing incomplete information interactions models, which assume that variables that the econometrician observes are public among the agents whereas the variables that the econometrician does not observe are kept private among themselves.
(E.g., [26].)

To capture the strategic interactions among players, let us introduce an undirected graph $G_P = (N, E_P)$, where $E_P$ denotes the set of edges $ij$, $i, j \in N$ with $i \neq j$, and each edge $ij \in E_P$ represents that the action of player $i$ affects player $j$'s payoff.^4 We denote $N_P(j)$ to be the $G_P$-neighborhood of player $j$, i.e., the collection of players whose actions affect the payoff of player $j$:

$$N_P(j) = \{i \in N : ij \in E_P\},$$

and let $n_P(j) = |N_P(j)|$. We define $\bar{N}_P(i) = N_P(i) \cup \{i\}$ and let $\bar{n}_P(i) = |\bar{N}_P(i)|$.

^4 A graph $G = (N, E)$ is undirected if $ij \in E$ whenever $ji \in E$ for all $i, j \in N$.

Player $i$ choosing action $y_i \in Y$ with the other players choosing $y_{-i} = (y_j)_{j \neq i}$ obtains payoff:

$$u_i(y_i, y_{-i}, T, \eta_i) = y_i \left( X_{i,1}'\gamma_0 + \tilde{X}_{i,2}'\delta_0 + \beta_0 \tilde{y}_i + \varepsilon_i + \eta_i \right) - \frac{1}{2} y_i^2,$$

where $T = (T_i)_{i \in N}$, $X_{i,1}$ and $X_{i,2}$ are subvectors of $X_i$,

$$\tilde{X}_{i,2} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} X_{k,2}, \quad \text{and} \quad \tilde{y}_i = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} y_k,$$

if $N_P(i) \neq \emptyset$, and $\tilde{X}_{i,2} = 0$ and $\tilde{y}_i = 0$ otherwise. The factor $r_{ik}$ measures the "relative weight" of individual $k$ in the network from the viewpoint of $i$. Here, we consider two specifications.

$$\text{Specification A}: \ r_{ik} = 1, \ \text{for all } i, k \in N. \qquad (3.1)$$
$$\text{Specification B}: \ r_{ik} = n_P(k)/n_P(i), \ \text{for all } i, k \in N.$$

The simple choice $r_{ik} = 1$ gives equal weight to every other agent, but the choice $r_{ik} = n_P(k)/n_P(i)$ gives more weight to those who have more edges with others relative to agent $i$. Thus the payoff depends on other players' actions and types only through those of his $G_P$-neighbors. We call $G_P$ the payoff graph.

The parameter $\beta_0$ measures the payoff externality among agents. In the terminology of Manski (1993), $\delta_0$ captures the exogenous effect and $\beta_0$ the endogenous effect of social interactions. As for $\beta_0$, we make the following assumption:

Assumption 3.2.1. $0 \leq |\beta_0| < 1$.

This assumption is often used to derive a characterization of a unique pure strategy equilibrium in the literature. (See e.g. [30] and [26] for its use.)
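As an illustration only, the payoff above and the role of Assumption 3.2.1 can be sketched in Python for a complete-information benchmark under Specification A ($r_{ik} = 1$). The first-order condition of $u_i$ in $y_i$ is $y_i = v_i + \beta_0 \tilde{y}_i$, where $v_i$ is shorthand introduced here (not the chapter's notation) for $X_{i,1}'\gamma_0 + \tilde{X}_{i,2}'\delta_0 + \varepsilon_i + \eta_i$. The function names and the fixed-point iteration are assumptions of this sketch, not the procedure used in the chapter, which studies an incomplete information game.

```python
def neighbor_average(i, y, neighbors):
    """Return y-tilde_i under Specification A: the mean action of i's
    G_P-neighbors, or 0 when i has no neighbors."""
    nbrs = neighbors[i]
    if not nbrs:
        return 0.0
    return sum(y[k] for k in nbrs) / len(nbrs)


def payoff(i, y, v, beta0, neighbors):
    """u_i = y_i * (v_i + beta0 * y-tilde_i) - y_i^2 / 2, where v_i
    (hypothetical shorthand) collects all non-strategic payoff terms."""
    return y[i] * (v[i] + beta0 * neighbor_average(i, y, neighbors)) - 0.5 * y[i] ** 2


def complete_info_fixed_point(v, beta0, neighbors, tol=1e-12, max_iter=10_000):
    """Iterate the first-order conditions y <- v + beta0 * y-tilde(y).

    Under Specification A each y-tilde_i is an average, so the map is a
    contraction in the sup norm whenever |beta0| < 1 (Assumption 3.2.1),
    and the iteration converges to the unique fixed point.
    """
    if not abs(beta0) < 1:
        raise ValueError("Assumption 3.2.1 requires |beta0| < 1")
    y = list(v)
    for _ in range(max_iter):
        y_new = [v[i] + beta0 * neighbor_average(i, y, neighbors)
                 for i in range(len(v))]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Here `neighbors` maps each player index to the list of his $G_P$-neighbors. The contraction argument is the reason the sketch rejects $|\beta_0| \geq 1$: without the bound, the iteration need not converge and the fixed point need not be unique.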
When $\beta_0 > 0$, the game is called a game of strategic complements and, when $\beta_0 < 0$, it is called a game of strategic substitutes.

Let us introduce information sharing relations in the form of a directed graph (or a network) $G_I = (N, E_I)$ on $N$ so that each $ij$ in $E_I$ represents the edge from player $i$ to player $j$, where the presence of edge $ij$ joining players $i$ and $j$ indicates that $T_i$ is observed by player $j$. Hence the presence of an edge $ij$ between agents $i$ and $j$ represents information flow from $i$ to $j$. We call graph $G_I$ the information graph. For each $j \in N$, define

$$N_I(j) = \{i \in N : ij \in E_I\},$$

that is, the set of $G_I$-neighbors observed by player $j$.^5 Also write

$$\bar{N}_I(i) = N_I(i) \cup \{i\},$$

i.e., the $G_I$-neighborhood of $i$ including $i$ himself. We define $\bar{n}_I(i) = |\bar{N}_I(i)|$.

^5 More precisely, the neighbors in $N_I(j)$ are called in-neighbors and $n_I(j) = |N_I(j)|$ the in-degree. Throughout this chapter, we simply use the terms neighbors and degrees, unless specified otherwise.

We do not assume that each agent knows the whole information graph $G_I$ and the payoff graph $G_P$. To be precise about each agent's information set, let us introduce some notation. For each $i \in N$, we set $\bar{N}_{P,1}(i) = \bar{N}_P(i)$ and $\bar{N}_{I,1}(i) = \bar{N}_I(i)$, and for $k \geq 2$, define recursively

$$\bar{N}_{P,k}(i) = \bigcup_{j \in \bar{N}_P(i)} \bar{N}_{P,k-1}(j), \quad \text{and} \quad \bar{N}_{I,k}(i) = \bigcup_{j \in \bar{N}_I(i)} \bar{N}_{I,k-1}(j).$$

Thus $\bar{N}_{P,k}(i)$ denotes the set of players consisting of player $i$ and those players who are connected to player $i$ through at most $k$ edges in $G_P$, and similarly with $\bar{N}_{I,k}(i)$. Also, define $N_{P,k}(i) = \bar{N}_{P,k}(i) \setminus \{i\}$ and $N_{I,k}(i) = \bar{N}_{I,k}(i) \setminus \{i\}$. For each $k \geq 1$, let $\mathcal{N}_{i,k-1}$ be the $\sigma$-field generated by $N_{P,k+1}(i)$, $N_I(i)$ and some additional information $\mathcal{C}_i$ which potentially causes correlation between types across different players. (We will explain $\mathcal{C}_i$ later.) That is, for $k \geq 1$,

$$\mathcal{N}_{i,k-1} = \sigma(N_{P,k+1}(i), N_{P,k}(i), \ldots, N_{P,2}(i), N_I(i)) \vee \mathcal{C}_i,$$

where $\vee$ between two $\sigma$-fields is the smallest $\sigma$-field among those which contain the two $\sigma$-fields. Define for each $k \geq 0$,

$$\mathcal{I}_{i,k} = \sigma(T_{N_I(i)}, \eta_i) \vee \mathcal{N}_{i,k},$$

where $T_{N_I(i)} = (T_j)_{j \in N_I(i)}$.
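The recursive definition of the $k$-step neighborhoods in the payoff graph can be sketched as follows. This is a minimal illustration that reads the recursion with closed neighborhoods (each set includes player $i$ himself), which is an assumption about notation lost in extraction. The function names and the adjacency-dictionary representation `adj` are hypothetical.

```python
def closed_neighborhood(i, adj):
    """Closed G_P-neighborhood of i: his neighbors together with i himself."""
    return set(adj[i]) | {i}


def closed_k_neighborhood(i, k, adj):
    """Players within at most k edges of i in G_P, including i, built by the
    recursion: the k-step set of i is the union, over j in the closed
    neighborhood of i, of the (k-1)-step sets of j."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return closed_neighborhood(i, adj)
    out = set()
    for j in closed_neighborhood(i, adj):
        out |= closed_k_neighborhood(j, k - 1, adj)
    return out


def open_k_neighborhood(i, k, adj):
    """The k-step neighborhood of i with i himself removed."""
    return closed_k_neighborhood(i, k, adj) - {i}
```

The plain recursion mirrors the definition but revisits players many times. For large graphs a breadth-first search truncated at depth $k$ would return the same sets more cheaply.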
We use $\mathcal{I}_{i,k}$ to represent the information set of agent $i$. For example, when agent $i$ has $\mathcal{I}_{i,1}$ as his information set, it means that agent $i$ knows the set of agents whose types he observes (i.e., $N_I(i)$), the set of agents $j$ whose actions affect his payoff (i.e., $N_{P,1}(i)$) and the set of agents whose actions affect the payoff of his $G_P$-neighbors $j$ (i.e., $N_{P,2}(i)$), and the sharable types of his $G_I$-neighbors (i.e., $T_{N_I(i)}$) and his own private signal $\eta_i$.

Throughout the chapter, it is not assumed that any agent $i$ knows $N_I(k)$ for any of his $G_P$-neighbors $k$. In other words, there might be some $G_P$-neighbor $k$ who may observe other agents that agent $i$ does not observe, and agent $i$ does not know who such a $G_P$-neighbor $k$ is or who those other agents that player $k$ observes are.

Regarding the joint distribution of the profile of sharable types $T$, we make the following assumption:

Assumption 3.2.2. For each $i \in N$, $T_{N \setminus N_I(i)}$ and $T_{N_I(i)}$ are conditionally independent given $(G_P, N_I(i))$ and $\mathcal{C}$, where

$$\mathcal{C} = \vee_{i \in N} \mathcal{C}_i.$$

This assumption allows the individual types to be correlated unconditionally. Each player $i$ has information $\mathcal{C}_i$ which can cause correlation between his type and other agents' types. For example, any two types $T_i$ and $T_j$ may contain a common signal which comes from a common observation by the two agents $i$ and $j$. Assumption 3.2.2 says that the sharable types between two non-neighbors in $G_I$ are independent conditional on all such pieces of information $\mathcal{C}_i$.

The assumption permits the situation where the payoff network $G_P$ is exogenously formed, for example, as in a dyadic regression model with degree heterogeneity $a_i$ and errors $u_{ij}$ that are independent of the $\varepsilon_i$'s, $\eta_j$'s, $X_i$'s and $a_i$'s. (See e.g. [89].)
In this case, if we set $\mathcal{C}_i = \sigma(X_i, a_i)$, Assumption 3.2.2 reduces to the requirement that for each $i \in N$, $\varepsilon_{N \setminus N_I(i)}$ and $\varepsilon_{N_I(i)}$ are conditionally independent given $(G_P, N_I(i), X, a)$, where $X = (X_i)_{i \in N}$ and $a = (a_i)_{i \in N}$.

3.2.2 Predictions from Rationality

Each player chooses a strategy that maximizes his expected payoff according to his beliefs. This provides predictions for their actions given their beliefs. To characterize predictions from rationality, we introduce some notation. For $i, j, k \in N$, let $w^i_{kj}$ denote the weight that player $i$ believes that player $k$ gives to player $j$. Suppose that the strategy of player $k$ as believed by player $i$ is given as follows:
$$s^i_k(\mathcal{I}_k) = \sum_{j \in N^i_I(k)} T_j' w^i_{kj} + \eta_k, \tag{3.2}$$
where $N^i_I(k)$ denotes the set of players (including player $k$) who player $i$ believes that player $k$ observes. Given player $i$'s strategy and his expected strategy of other players $s^i_{-i} = (s^i_k)_{k \in N \setminus \{i\}}$, the (interim) expected payoff of player $i$ is defined as
$$U_i(s_i, s^i_{-i}; \mathcal{I}_i) = E[u_i(s_i(\mathcal{I}_i), s^i_{-i}(\mathcal{I}_{-i}), T, \eta_i) \mid \mathcal{I}_i],$$
where $s^i_{-i}(\mathcal{I}_{-i}) = (s^i_k(\mathcal{I}_k))_{k \in N \setminus \{i\}}$, $\mathcal{I}_{-i} = \vee_{k \neq i} \mathcal{I}_k$ and $T = (T_i)_{i \in N}$. A best linear response $s^{BR}_i$ of player $i$ corresponding to the strategies $s^i_{-i}$ of the other players as expected by player $i$ is a linear strategy such that for any linear strategy $s_i$,
$$U_i(s^{BR}_i, s^i_{-i}; \mathcal{I}_i) \ge U_i(s_i, s^i_{-i}; \mathcal{I}_i), \quad \text{a.e.}$$

Under the assumptions of the model, the best linear responses can be shown to produce a map from beliefs to actions. To see this, first let
$$w^B = (w^1, \ldots, w^n)$$
be the belief profile of all the agents, where $w^i = (w^i_{kj})_{k,j \in N}$. Then the rationality of agents (i.e., their choosing a best linear response given their beliefs) gives the following relation:
$$w = M w^B,$$
where $w = (w_{ij})_{i,j \in N}$ corresponds to best responses and $M$ is the best response operator which assigns a strategy profile (in terms of weights $w_{ij}$) to a given belief profile $w^B$. (The explicit form of the best response operator is found in the Appendix.)

In order to generate predictions, one needs to deal with the beliefs $w^B$. There are three approaches to modeling these beliefs.
The first approach is an equilibrium approach where the beliefs $w^B$ coincide with the actual weights implemented by the agents in equilibrium. The second approach uses rationalizability where all the linear strategies that are rationalizable given some belief $w^B$ are in consideration. The third approach is a behavioral approach where one considers a set of simple behavioral assumptions on the beliefs $w^B$ and focuses on the best linear responses corresponding to these beliefs.

There are pros and cons among the three approaches. One of the main differences between the equilibrium approach and the behavioral approach is that the former requires the beliefs $w^i_{-i}$ to be "correct" for all players $i$ in equilibrium. However, since each player $i$ generally does not know who each of his $G_P$-neighbors observes, a Bayesian player in a standard model with rational expectations would need to know the distribution of the entire information graph $G_I$ (or at least have a common prior on the information graph commonly agreed upon by all the players) to form a "correct" belief given his information. Given a potentially complex form of $G_P$ (partially observed in data) and that the econometrician rarely observes $G_I$ with precision, producing a testable implication from this equilibrium model appears far from a trivial task.

The rationalizability approach can be used to relax this rational expectations assumption by eliminating the requirement that the beliefs be correct. The approach considers all the predictions that are rationalizable given some beliefs. However, in our context, the best response operator $M$ depends on unknown parameters in general, and hence the set of predictions from rationalizability can potentially be large and may fail to produce sharp predictions that would be useful in practice.

As we explain later in detail, this chapter takes the third approach.
We adopt a set of simple behavioral assumptions on players' beliefs which can be incorrect from the viewpoint of a person with full knowledge of the distribution of the information graph, yet useful as rule-of-thumb guidance for an agent in a complex decision-making environment such as the one in our model. As we shall see later, this approach can give a sharp prediction that is intuitive and analytically tractable.

3.2.3 Belief Projection and Best Linear Responses

In this chapter, we consider the following set of behavioral assumptions on the beliefs.

Condition BP (Belief Projection): (i) For each $i \in N$ and $k \in N_P(i)$,
(a) $w^i_{kk} = w_{ii}$,
(b) $w^i_{kj} = \tau^i_{kj} w_{ij}$ for all $j \in N_I(i) \cap N^i_I(k)$ for some positive number $\tau^i_{kj}$, where
$$\tau^i_{ki} = 1/(r_{ik} n_P(k)), \text{ and} \tag{3.3}$$
$$\tau^i_{kj} = 1/r_{ik}, \text{ for all } j \in N_I(i) \cap N^i_I(k), \text{ and}$$
(ii) $w^i_{kj} = 0$ for all $j \notin \bar N_P(k)$.

As mentioned before, each player $i$ does not know who his $G_P$-neighbors observe, and Condition BP describes a simple rule of belief formation in this environment. The main premise of Condition BP is that each agent projects his own beliefs about himself and other players onto his $G_P$-neighbors. Condition BP (i)(a) says that each player $i$ believes that the self-weight his $G_P$-neighbor $k$ gives to himself is the same as the self-weight of player $i$ himself. Condition BP (i)(b) says that player $i$'s belief on his $G_P$-neighbor $k$'s weight to player $j$ is formed in reference to his own weight to player $j$.
This assumption says that each agent believes that his $G_P$-neighbors follow the same ranking of other agents as he does. The belief projection is taken as a rule of thumb for each agent $i$ who needs to form an expectation about his $G_P$-neighbors' actions when he does not know who his $G_P$-neighbors observe.

The specification of $\tau^i_{ki}$ in (3.3) reflects that player $i$ believes that player $k$ does not care much about player $i$'s type in choosing an action if player $k$ has many $G_P$-neighbors. The specification of $\tau^i_{kj}$ in (3.3) says that each player $i$ believes that the weight that each of his $G_P$-neighbors $k$ gives to a $G_P$-neighbor $j$ is $(1/r_{ik}) w_{ij}$. For example, if $r_{ik} = \bar n_P(k)/\bar n_P(i)$, we have
$$w^i_{kj} = \frac{\bar n_P(i)}{\bar n_P(k)} w_{ij}.$$
Therefore, player $i$ believes that when player $k$ has more $G_P$-neighbors than he does, player $k$ gives less weight to player $j$ than he does. Not knowing who player $k$ observes, player $i$ employs this rule-of-thumb belief regarding player $k$'s weights given to other players.

Condition BP(ii) is concerned with player $i$'s belief about the players that his $G_P$-neighbors observe. A standard approach in an incomplete information game with Bayesian players assumes that the players agree on a common prior on the entire information graph $G_I$. From this, each agent $i$ derives his posterior on the $G_I$-neighbors of each of his $G_P$-neighbors. Instead, Condition BP(ii) states that player $i$ simply considers only those players in $N_P(k)$ when he deliberates on those players whose actions affect the payoff of player $k$. This is because while player $i$ knows player $k$'s $G_P$-neighborhood, he does not know player $k$'s $G_I$-neighborhood.

Let us distinguish between different environments with different information structures of the game.

Definition 3.2.1.
(i) Each agent $i \in N$ is said to be of simple type if she (a) has beliefs about the other players' strategies as in Condition BP, (b) believes that other players play a linear strategy of the form in (3.2), and (c) has information set $\mathcal{I}_i = \mathcal{I}_{i,0}$ with $N_{P,2}(i) \subset N_I(i)$.
(ii) Each agent $i \in N$ is said to be of first order sophisticated type if she believes that the other players are of simple type and has information set $\mathcal{I}_i = \mathcal{I}_{i,1}$ with $N_{P,3}(i) \subset N_I(i)$.⁶

⁶ One can also define a higher order sophisticated type, although this chapter does not fully elaborate on such a case. More specifically, for $k \ge 2$, each agent $i \in N$ who believes that the other players are of the $(k-1)$-th order sophisticated type and has information set $\mathcal{I}_i = \mathcal{I}_{i,k}$ with $N_{P,k+2}(i) \subset N_I(i)$ is said to be of the $k$-th order sophisticated type.

The difference between the simple type and a sophisticated type lies not only in the rationality type but also in the information set. A first order sophisticated type agent knows who the neighbors of the neighbors of their neighbors in $G_P$ (i.e., $N_{P,3}(i)$) are, whereas a simple type agent knows only who the neighbors of their neighbors in $G_P$ (i.e., $N_{P,2}(i)$) are.

Regarding the sophistication of agents, we make explicit the following basic assumption, which we maintain throughout.

Assumption 3.2.3. The game is populated by agents with the same order of sophistication.

Different levels of reasoning for agents of the game are assumed in level-$k$ models, which have received a great deal of attention as a behavioral model in the experimental literature. (See Chapter 5 of [39] for a review.) In these experiments, a simple type agent is often much simpler than those in our set-up: the agent chooses an action without considering any strategic interdependence. In contrast, our simple type agent already considers strategic interdependence and forms a best linear response.
On the other hand, the experimental literature on level-$k$ models allows the agents to be of different rationality types in the same game. In our set-up, which focuses on observational data, identification of the unknown proportion of each rationality type appears far from trivial. Hence, we consider a game where all the agents have the same order of sophistication.

Our focus on linear strategies in combination with other assumptions gives an explicit form of best linear responses. For the expression, let us introduce some notation: for each $i \in N$ and $j \in N_I(i)$,
$$c_{ij} \equiv \frac{1}{n_P(i)} \sum_{k \in N_P(i)} 1\{j \in N_P(k)\}, \text{ if } i \neq j, \text{ and} \tag{3.4}$$
$$c_{ii} \equiv \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{1\{i \in N_P(k)\}}{\bar n_P(k)} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{1}{\bar n_P(k)},$$
where the last equality follows from $G_P$ being undirected. Note that $c_{ij}$ is the proportion of player $i$'s $G_P$-neighbors whose payoffs are influenced by the type and action of player $j$. Hence $c_{ij}$ represents the local centrality of player $j$ to player $i$ in terms of player $j$'s influence on player $i$'s $G_P$-neighbors. On the other hand, $c_{ii}$ is the average of $1/\bar n_P(k)$ among player $i$'s $G_P$-neighbors $k$ whose payoffs are affected by player $i$'s sharable type and action.

Using the explicit form of the best response operator $M$ and Condition BP, we can derive the explicit form of best linear responses. The following theorem gives the form in the case where all the players are of simple type.

Theorem 3.2.1. Suppose that Assumptions 3.2.1 - 3.2.3 hold and all the players are of simple type. Suppose further that for each $i \in N$ and $k \neq i$, $E[\eta_k \mid \mathcal{I}_i] = 0$. Then each player $i$'s best linear response $s^{BR}_i$ takes the following form:
$$s^{BR}_i(\mathcal{I}_i) = \lambda_{ii}\left(\gamma_0' X_{i,1} + \varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij}(\gamma_0' X_{j,1} + \varepsilon_j)\right) + \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \delta_0' X_{j,2} + \eta_i,$$
where $\lambda_{ij} \equiv r_{ij}/(1 - \beta_0 c_{ij})$.

The result in Theorem 3.2.1 shows multiple intuitive features. First, it shows that each player $i$'s best linear response does not depend on the types of payoff-irrelevant agents, that is, agents whose types player $i$ observes but whose actions do not affect player $i$'s payoff.
Note that agents indirectly connected to agent $i$ in $G_P$ can still shape the player's strategies through the local centralities $c_{ij}$. (Later, we also consider the case of sophisticated type, where the types of indirectly connected agents are permitted to influence agent $i$'s actions.)⁷ Furthermore, observe that for $j \in N_P(i)$,
$$\frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial x_{j,1}} = \frac{\beta_0 r_{ij}}{n_P(i)(1 - \beta_0 c_{ii})(1 - \beta_0 c_{ij})} \gamma_0 \quad \text{and} \tag{3.5}$$
$$\frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial x_{j,2}} = \frac{\delta_0 r_{ij}}{n_P(i)(1 - \beta_0 c_{ij})},$$
both of which measure the response of the action of agent $i$ to a change in the observed type of his $G_P$-neighbors. Hence, these quantities capture the network externality in the strategic interactions.

First, note that the network externality for agent $i$ from a particular agent $j$ decreases in the neighborhood size $n_P(i)$ of agent $i$. More importantly, the network externality for each agent $i$ is different across $i$'s and across their $G_P$-neighbors $j$ depending on their "importance" to agent $i$ in the payoff graph. This is seen from the network externality (3.5) being an increasing function of agent $j$'s local centrality to agent $i$, i.e., $c_{ij}$, when the game is that of strategic complements (i.e., $\beta_0 > 0$). In other words, the larger the fraction of agent $i$'s $G_P$-neighbors whose payoff is affected by agent $j$'s action, the higher the network externality of agent $i$ from agent $j$'s type change becomes. Therefore, in our model the network externality is heterogeneous across agents, depending on the local features of the payoff graph around each agent.

It is interesting to note that the network externality for agent $i$ with respect to his own type $X_{i,1}$ has a factor $\lambda_{ii} \equiv r_{ii}/(1 - \beta_0 c_{ii}) = 1/(1 - \beta_0 c_{ii})$ which is increasing in $c_{ii}$ when $\beta_0 > 0$. We call
$$\frac{1}{1 - \beta_0 c_{ii}} - 1$$
the reflection effect, which captures the way player $i$'s type affects his own action through his $G_P$-neighbors whose payoffs are affected by player $i$'s types and actions. The reflection effect arises because each agent, in decision making, considers the fact that his type affects other $G_P$-neighbors' decision making.
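To make the reflection effect concrete, the following minimal sketch evaluates it on a small star-shaped payoff graph. It is an illustration only, not part of the original analysis: the adjacency dictionary `star` and the function names are hypothetical, and we assume that the barred degree in the definition of $c_{ii}$ in (3.4) counts player $k$ himself, i.e. equals $n_P(k) + 1$.

```python
def self_centrality(adj, i):
    """c_ii from (3.4): average of 1 / (barred n_P(k)) over i's G_P-neighbors k.

    Assumption: the barred degree counts player k himself, i.e. n_P(k) + 1.
    """
    neighbors = adj[i]
    return sum(1.0 / (len(adj[k]) + 1) for k in neighbors) / len(neighbors)


def reflection_effect(beta0, c_ii):
    """Reflection effect 1 / (1 - beta0 * c_ii) - 1."""
    return 1.0 / (1.0 - beta0 * c_ii) - 1.0


# Hypothetical star graph: player 0 is the hub, players 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

for beta0 in (0.0, 0.3, 0.6):
    hub = reflection_effect(beta0, self_centrality(star, 0))
    leaf = reflection_effect(beta0, self_centrality(star, 1))
    print(f"beta0={beta0:.1f}  hub={hub:.4f}  leaf={leaf:.4f}")
```

Under these assumptions the hub, whose neighbors all have small neighborhoods, has the larger $c_{ii}$ and hence the larger reflection effect for any $\beta_0 > 0$.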
When there is no payoff externality (i.e., $\beta_0 = 0$), the reflection effect is zero. However, when there are strong strategic interactions or when a majority of player $i$'s $G_P$-neighbors have a small $G_P$-neighborhood (i.e., small $\bar n_P(k)$ in the definition of $c_{ii}$ in (3.4)), the reflection effect is large. Note that for those agents whose $c_{ii}$ the econometrician observes, the reflection effect is easily recovered once one estimates the payoff externality $\beta_0$.

⁷ The local dependence of actions from best linear responses, regardless of what value $\beta_0$ takes in $(-1,1)$, is in contrast with the complete information version of the game, where a high value of $\beta_0$ makes the dependence close to global.

Now let us turn to the case where the game is played among first-order sophisticated players.

Theorem 3.2.2. Suppose that Assumptions 3.2.1 - 3.2.3 hold and that all the players are of first-order sophisticated type. Suppose further that for each $i \in N$ and $k \neq i$, $E[\eta_k \mid \mathcal{I}_i] = 0$. Then each player $i$'s best linear response $s^{BR.FS}_i$ takes the following form:
$$s^{BR.FS}_i(\mathcal{I}_{i,1}) = \gamma_0' X_{i,1} + \varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{jj}(\gamma_0' X_{j,1} + \varepsilon_j) + \beta_0^2 \sum_{j \in \bar N_{P,2}(i)} \tilde\lambda_{ij}(\gamma_0' X_{j,1} + \varepsilon_j) + \delta_0' \tilde X_{i,2} + \delta_0' \beta_0 \sum_{j \in \bar N_{P,2}(i)} \bar\lambda_{ij} X_{j,2} + \eta_i,$$
where
$$\bar\lambda_{ij} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{r_{ik} \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)}, \quad \text{and} \quad \tilde\lambda_{ij} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{r_{ik} \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)(1 - \beta_0 c_{kk})}.$$

Note that as compared to the case of the game with agents of simple type, the game with agents of the first order sophisticated type predicts outcomes with broader network externality. For example, in contrast to the case of simple type agents, the types of neighbors whose actions do not affect player $i$'s payoff can affect his best response.
More specifically, note that for $j \in N_{P,2}(i) \setminus N_P(i)$,
$$\frac{\partial s^{BR.FS}_i(\mathcal{I}_i)}{\partial x_{j,1}} = \beta_0^2 \gamma_0 \tilde\lambda_{ij} \quad \text{and} \quad \frac{\partial s^{BR.FS}_i(\mathcal{I}_i)}{\partial x_{j,2}} = \beta_0 \delta_0 \bar\lambda_{ij}.$$
This externality from player $j$ on player $i$ is strong when the $c_{kj}$'s are large for many $k \in N_P(i)$, i.e., when player $j$ has a high local centrality to a large fraction of player $i$'s $G_P$-neighbors.⁸

3.2.4 The External Validity of Network Externality

Through a simple model of linear interactions, we explore two issues of external validity. The first issue is about generalizing the results that come from a model with a smaller graph to the population with a larger graph. We see how sensitively the network externality changes as the network grows. If the sensitivity is not high, this supports the external validity of a model toward a larger graph. The second issue is about misspecification of behavioral assumptions. Here we set the benchmark (true) model to be a complete information model with equilibrium strategies, but assume that the econometrician adopts our behavioral model to make the analysis tractable. Then we explore how close the network externality from the behavioral model is to that from the true model of the complete information game.

Both models assume the same payoff function and the same payoff graph. For simplicity, we remove the $X_i$'s and $\eta_i$'s. The main focus here is on the stability of the prediction of the network externalities as we progressively move from a small payoff graph to a large payoff graph. Let $Y_i$ be the observed outcome of player $i$ as predicted from either of the two game models.

The complete information game model assumes that every agent observes all the types $\varepsilon_i$ of other agents. This model yields the following equilibrium equation:
$$Y_i = \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} Y_j + \varepsilon_i.$$
Then the reduced form for the $Y_i$'s can be written as
$$y = (I - \beta_0 A)^{-1} \varepsilon,$$
where $y = (Y_1, \ldots, Y_n)'$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$, and $A$ is a row-normalized adjacency matrix of the payoff graph $G_P$, i.e., the $(i,j)$-th entry of $A$ is $1/n_P(i)$ if $j \in N_P(i)$ and zero otherwise.
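As a minimal sketch of this reduced form (an illustration, assuming NumPy and a hypothetical adjacency dictionary `adj` with nodes labeled $0, \ldots, n-1$), one can build the row-normalized matrix $A$ and solve the linear system rather than inverting it explicitly:

```python
import numpy as np


def row_normalized_adjacency(adj):
    """Matrix A with (i, j) entry 1 / n_P(i) if j is a G_P-neighbor of i, else 0."""
    n = len(adj)
    A = np.zeros((n, n))
    for i, neighbors in adj.items():
        for j in neighbors:
            A[i, j] = 1.0 / len(neighbors)
    return A


def equilibrium_outcomes(adj, beta0, eps):
    """Solve (I - beta0 * A) y = eps, i.e. y = (I - beta0 * A)^{-1} eps."""
    A = row_normalized_adjacency(adj)
    return np.linalg.solve(np.eye(len(adj)) - beta0 * A, eps)


# Hypothetical example: a 4-cycle with simulated types.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = np.random.default_rng(0)
eps = rng.standard_normal(4)
y = equilibrium_outcomes(adj, beta0=0.5, eps=eps)
```

The resulting vector `y` satisfies the equilibrium equation $Y_i = \beta_0 n_P(i)^{-1} \sum_{j \in N_P(i)} Y_j + \varepsilon_i$ for every $i$, which is a convenient sanity check.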
Thus in the complete information equilibrium model, each $Y_i$ is a linear combination of all the $\varepsilon_i$'s. The model implies that when $\beta_0$ is close to one (i.e., the local interaction becomes strong), the equilibrium outcome can exhibit extensive cross-sectional dependence.

⁸ Using the explicit form of the best response operator $M$ and Condition BP, we can derive best linear responses in a game populated by agents of a higher order sophisticated type. As the sophistication of agents becomes of higher order, the network externality of each agent broadens to a wider set of agents.

Table 3.1: The Characteristics of the Payoff Graphs

                  Erdős-Rényi                        Barabási-Albert
         Network A  Network B  Network C    Network A  Network B  Network C
n           162.0      766.4     3067.4        432.2     2080.1     4663.5
d_mx        10.72      12.50      14.14        76.62      98.82     113.66
d_av        2.043      2.296      3.186        1.437      1.902      2.233

Notes: This table gives average characteristics of the payoff graphs, $G_P$, used in the simulation study, where the average was over 50 simulations. $d_{av}$ and $d_{mx}$ denote the average and maximum degrees of the payoff graphs.

On the other hand, our behavioral model (with specification A: $r_{ik} = 1$ in (3.1) and with the assumption that all the players are of simple type) predicts the outcomes in the following simple reduced form:
$$Y_i = \lambda_{ii}\left(\varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \varepsilon_j\right),$$
which comes from Theorem 3.2.1 without the $X_i$'s and $\eta_i$'s. It is important to note that the two models have the same payoff with the same payoff externality parameter $\beta_0$. The only difference lies in the information set assumptions and the solution concepts of the game.

The parameter of interest is the average network externality:
$$\frac{1}{n} \sum_{i \in N} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial \varepsilon_j} = \begin{cases} \dfrac{1}{n} \sum_{i \in N} \dfrac{1}{n_P(i)} \sum_{j \in N_P(i)} [(I - \beta_0 A)^{-1}]_{ij}, & \text{from the equilibrium model,} \\[2ex] \dfrac{1}{n} \sum_{i \in N} \dfrac{1}{n_P(i)} \sum_{j \in N_P(i)} \dfrac{\beta_0 \lambda_{ii} \lambda_{ij}}{n_P(i)}, & \text{from the behavioral model,} \end{cases}$$
where $[(I - \beta_0 A)^{-1}]_{ij}$ denotes the $(i,j)$-th entry of the matrix $(I - \beta_0 A)^{-1}$.

Note that the network externalities depend only on $\beta_0$ and the payoff graph $G_P$.
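The two average network externalities can be compared numerically along the following lines. This is a hedged sketch rather than the code used for the simulations: it assumes NumPy, a hypothetical adjacency dictionary `adj` with nodes labeled $0, \ldots, n-1$ and no isolated nodes, specification A ($r_{ij} = 1$), the behavioral derivative $\beta_0 \lambda_{ii} \lambda_{ij} / n_P(i)$ implied by the reduced form above, and a barred degree equal to $n_P(k) + 1$ in $c_{ii}$.

```python
import numpy as np


def local_centralities(adj):
    """c_ij = |N_P(i) & N_P(j)| / n_P(i) for i != j, and
    c_ii = average of 1 / (n_P(k) + 1) over k in N_P(i) (assumed barred degree)."""
    n = len(adj)
    nb = {i: set(adj[i]) for i in adj}
    c = np.zeros((n, n))
    for i in adj:
        for j in adj:
            if i == j:
                c[i, i] = sum(1.0 / (len(nb[k]) + 1) for k in nb[i]) / len(nb[i])
            else:
                c[i, j] = len(nb[i] & nb[j]) / len(nb[i])
    return c


def avg_externality_equilibrium(adj, beta0):
    """Average over i and over j in N_P(i) of the (i, j) entry of (I - beta0 A)^{-1}."""
    n = len(adj)
    A = np.zeros((n, n))
    for i in adj:
        for j in adj[i]:
            A[i, j] = 1.0 / len(adj[i])
    inv = np.linalg.inv(np.eye(n) - beta0 * A)
    return float(np.mean([np.mean([inv[i, j] for j in adj[i]]) for i in adj]))


def avg_externality_behavioral(adj, beta0):
    """Average over i and over j in N_P(i) of beta0 * lam_ii * lam_ij / n_P(i)."""
    lam = 1.0 / (1.0 - beta0 * local_centralities(adj))  # r_ij = 1 (specification A)
    return float(np.mean([
        np.mean([beta0 * lam[i, i] * lam[i, j] / len(adj[i]) for j in adj[i]])
        for i in adj
    ]))


# Hypothetical example: a 6-cycle, evaluated on a grid of beta0 values.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
grid = [(b, avg_externality_equilibrium(cycle, b), avg_externality_behavioral(cycle, b))
        for b in (0.0, 0.3, 0.6, 0.9)]
```

Evaluating both functions on nested subgraphs of a larger simulated graph would give a rough analogue of the comparison reported in this subsection.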
For the payoff graph $G_P$, we considered two different models for random graph generation. The first kind of random graph is an Erdős-Rényi (ER) random graph with link probability equal to $5/n$. The second kind is a Barabási-Albert (BA) random graph, which begins with an Erdős-Rényi random graph of size 20 with each link forming with equal probability $1/19$ and grows by including each new node with two links formed with the existing nodes with probability proportional to the degree of the nodes.

For each kind of random graph, we first generate a random graph of size 10,000, and then construct three subgraphs A, B, C such that network A is a subgraph of network B and network B is a subgraph of network C. We generate these subgraphs as follows. First, we take subgraph A to be the one that consists of agents within distance $k$ from agent $i = 1$. Then network B is constructed to consist of the neighbors of the agents in network A, and network C is constructed to consist of the neighbors of the agents in network B. For an ER random graph, we took $k = 3$ and for a BA random graph, we took $k = 2$. We repeated the process 50 times to construct an average behavior of the network externality as we increase the network. Table 3.1 shows the average network sizes and degree characteristics as we move from Network A to B to C.

First, we would like to see how sensitive the predicted average network externality is as we move across three networks of increasing sizes. The results are in Figures 3.1 and 3.2. Figure 3.1 captures the relation between $\beta_0$ and the average network externality for the case of ER graphs and Figure 3.2 captures that for the case of BA graphs.
The left panel depicts the relation from the complete information equilibrium model and the right panel depicts the relation from the behavioral model.

As shown in Figures 3.1-3.2, the predicted network externality from the behavioral model is less sensitive to the change of the networks than that from the equilibrium model. In particular, this contrast is stark when $\beta_0$ is close to 1. The main reason behind this contrast is that in the case of the equilibrium model, stronger local strategic interactions induce extensive cross-sectional dependence. This extensiveness depends sensitively on the size and the shape of the network. On the other hand, the behavioral model limits the extent of the cross-sectional dependence even when $\beta_0$ is high. Hence the predicted network externality does not vary as much as that of the equilibrium model as we change the network. The result illustrates the point that our behavioral model, which translates local strategic interactions into local stochastic dependence of observed actions, has better external validity properties than the complete information equilibrium model.

Suppose that the econometrician believes the true model is an equilibrium model, but uses our behavioral model as a proxy for the equilibrium model. If these two models generate "similar" predictions, using our behavioral model as a proxy will not be a bad idea. The results in Figures 3.1 and 3.2 again show that the answer depends on the payoff externality $\beta_0$. Unless the parameter $\beta_0$ is very high (say larger than or equal to 0.5), both the equilibrium approach and the behavioral approach give similar network externality. However, the discrepancy widens when $\beta_0$ is high.
Hence in this set-up, using our behavioral approach as a proxy for an equilibrium approach makes sense only when strategic interdependence is not too high.

Figure 3.1: Network Externality Comparison Between Equilibrium and Behavioral Models: Erdős-Rényi Graphs
Notes: Each line gives the average network externality as a function of $\beta_0$, where the network is generated through an ER graph. The complete information game shows how the relationship between the network externality and $\beta_0$ changes as we expand the graph from a subgraph of agents within distance $k$ from agent 1. (Networks A, B, and C correspond to networks with $k = 3, 4, 5$, from a small graph to a large one.) The figures show that the average network externality from the behavioral model behaves more stably across different networks than that from the equilibrium model, in particular when $\beta_0$ (the local interaction parameter) is high.

The comparison here uses a set-up where the econometrician observes all the players in the game. However, it should be kept in mind that, as we shall see later when we propose inference, the behavioral model naturally accommodates the case where one observes only part of the players whereas the complete information game model does not in general. Hence when the local strategic interactions are not very high, the behavioral model can be a good proxy for a complete information game model with predictions from an equilibrium when only part of the players are observed in the sample.

Figure 3.2: Network Externality Comparison Between Equilibrium and Behavioral Models: Barabási-Albert Graphs
Notes: The figure is similar to the previous one except that the graph is now BA. The complete information game shows how the relation changes as we expand the graph from a subgraph of agents within distance $k$ from agent 1.
Again, the behavioral model gives a prediction of the relation that tends to be more stable than the complete information game in this network generation.

3.3 Econometric Inference

3.3.1 General Overview

Partial Observation of Interactions

A large network data set is often obtained through a non-random sampling process. (See e.g. [110].) The main difficulty in practice is that the actual sampling process by which the network data are gathered is hard to formulate with accuracy. Our approach of empirical modeling can be useful in such a situation where interactions are observed only partially through a certain non-random sampling scheme that is not precisely known. In this section, we make explicit the data requirements for the econometrician and propose inference procedures. We mainly focus on the game where all the players in the game are of simple type.

Suppose that the original game of interactions consists of a large number of agents whose set we denote by $N$. Let the set of players $N$ be on a payoff graph $G_P$ and an information graph $G_I$, facing the strategic environment as described in the preceding section. Denote the best response as an observed dependent variable $Y_i$: for $i \in N$,
$$Y_i = s^{BR}_i(\mathcal{I}_i).$$
Let us make the following additional assumption on this original large game. Let us first define
$$\mathcal{F} = \sigma(X, G_P, G_I) \vee \mathcal{C},$$
i.e., the $\sigma$-field generated by $X = (X_i)_{i \in N}$, $G_P$, $G_I$ and $\mathcal{C}$.

Assumption 3.3.1. (i) The $\varepsilon_i$'s and $\eta_i$'s are conditionally i.i.d. across $i$'s given $\mathcal{F}$.
(ii) $\{\varepsilon_i\}_{i=1}^n$ and $\{\eta_i\}_{i=1}^n$ are conditionally independent given $\mathcal{F}$.
(iii) For each $i \in N$, $E[\varepsilon_i \mid \mathcal{F}] = 0$ and $E[\eta_i \mid \mathcal{F}] = 0$.

The last condition (iii) excludes endogenous formation of $G_P$ or $G_I$, because the condition requires that the unobserved type components $\varepsilon_i$ and $\eta_i$ be conditionally mean independent of these graphs, given $X = (X_i)_{i \in N}$ and $\mathcal{C}$. However, the condition does not exclude the possibility that $G_P$ and $G_I$ are formed based on $(X, \mathcal{C})$.
Hence the formation of networks by agents using information in $X$ or $\mathcal{C}$ is permitted.

The econometrician observes only a subset $N^* \subset N$ of agents and part of $G_P$ through a potentially stochastic sampling process of unknown form. We assume for simplicity that $n^* \equiv |N^*|$ is nonstochastic. This assumption is satisfied, for example, if one collects the data for agents with a predetermined sample size $n^*$. We assume that, though being a small fraction of $N$, the set $N^*$ is still a large set, justifying our asymptotic framework that sends $n^*$ to infinity. Most importantly, constituting only a small fraction of $N$, the observed sample $N^*$ of agents induces a payoff subgraph which one has no reason to view as "approximating" or "similar to" the original payoff graph $G_P$. Let us make precise the data requirements.

Condition A: The stochastic elements of the sampling process are conditionally independent of $\{(T_i', \eta_i)'\}_{i \in N}$ given $\mathcal{F}$.

Condition B: For each $i \in N^*$, the econometrician observes $N_P(i)$ and $(Y_i, X_i)$, and for each $j \in N_P(i)$, the econometrician observes $|N_P(i) \cap N_P(j)|$, $n_P(j)$ and $X_j$.

Condition C: Either of the following two conditions is satisfied:
(a) For $i, j \in N^*$ such that $i \neq j$, $N_P(i) \cap N_P(j) = \emptyset$.
(b) For each agent $i \in N^*$, and for any agent $j \in N^*$ such that $N_P(i) \cap N_P(j) \neq \emptyset$, the econometrician observes $Y_j$, $|N_P(j) \cap N_P(k)|$, $n_P(k)$ and $X_k$ for all $k \in N_P(j)$.

Before we discuss the conditions, it is worth noting that these conditions are trivially satisfied when we observe the full payoff graph $G_P$ and $N^* = N$. Condition A is satisfied, for example, if the sampling process is based on observed characteristics $X$ and some characteristics of the strategic environment that are commonly observed by all the players. This condition is violated if the sampling is based on the outcomes $Y_i$ or unobserved payoff-relevant signals such as $\varepsilon_i$ or $\eta_i$.
Condition B essentially requires that in the data set, we observe $(Y_i, X_i)$ of many agents $i$, and for each $G_P$-neighbor $j$ of agent $i$, observe the number of agents who are common $G_P$-neighbors of $i$ and $j$ and the size of the $G_P$-neighborhood of $j$ along with the observed characteristics $X_j$.⁹ As for a $G_P$-neighbor $j$ of agent $i \in N^*$, this condition does not require that agent $j$'s action $Y_j$ or the full set of his $G_P$-neighbors be observed. Condition C(a) is typically satisfied when the sample of agents $N^*$ is randomly selected from a much larger set of agents so that no two agents have overlapping $G_P$-neighbors in the sample.¹⁰ In practice for use in inference, one can take the set $N^*$ to include only those agents that satisfy Conditions A-C as long as the resulting $N^*$ is still large and the selection is based only on $(X, G_P)$. One can simply use only those agents whose $G_P$-neighborhoods are not overlapping, as long as there are many such agents in the data.

Estimating Payoff Parameters and the Average Network Externality

In order to introduce inference procedures for $\beta_0$ and other payoff parameters, let us define for $i \in N$,
$$Z_{i,1} = \lambda_{ii} X_{i,1} + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,1}, \text{ and} \tag{3.6}$$
$$Z_{i,2} = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,2}.$$
(Note that $Z_{i,1}$ and $Z_{i,2}$ rely on $\beta_0$ although this is suppressed from the notation for simplicity, as we do frequently below for other quantities.) By Theorem 3.2.1, we can write
$$Y_i = Z_{i,1}' \gamma_0 + Z_{i,2}' \delta_0 + v_i,$$

⁹ Note that this condition is violated when the neighborhoods are top-coded in practice. For example, the maximum number of friends in the survey for a peer effects study can be set to be lower than the actual number of friends for many students. The impact of this top-coding upon the inference procedure is an interesting question on its own.

¹⁰ This random selection does not need to be a random sampling from the population of agents.
Note that random sampling is extremely hard to implement in practice in this situation, because one needs to use an equal probability for selecting each agent into the collection $N^*$, and this equal probability is feasible only when one has at least the catalog of the entire population $N$.

where
$$v_i = \lambda_{ii} \varepsilon_i + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \varepsilon_j + \eta_i.$$
Note that the observed actions $Y_i$ are cross-sectionally dependent (conditional on the $X_i$'s) due to information sharing on the unobservables $\varepsilon_i$. However, since only the types of $G_P$-neighbors turn out to be relevant in the best linear response, the correlation between $Y_i$ and $Y_j$ is non-zero only when agents $i$ and $j$ are $G_P$-neighbors.

We define $Z_i = [Z_{i,1}', Z_{i,2}']' \in \mathbb{R}^{d_{x_1} + d_{x_2}}$ and $\rho_0 = [\gamma_0', \delta_0']' \in \mathbb{R}^{d_{x_1} + d_{x_2}}$, where $X_{i,1} \in \mathbb{R}^{d_{x_1}}$ and $X_{i,2} \in \mathbb{R}^{d_{x_2}}$, so that we can rewrite the linear model as
$$Y_i = Z_i' \rho_0 + v_i.$$
Suppose that $\varphi_i$ is an $M \times 1$ vector of instrumental variables (which potentially depend on $\beta_0$) with $M > d \equiv d_{x_1} + d_{x_2}$ such that for all $i \in N$,
$$E[v_i \varphi_i] = 0.$$
Note that the orthogonality condition above holds for any $\varphi_i$ as long as for each $i \in N$, $\varphi_i$ is $\mathcal{F}$-measurable, i.e., once $\mathcal{F}$ is realized, there is no extra randomness in $\varphi_i$. This is the case, for example, when $\varphi_i$ is a function of $X = (X_i)_{i \in N}$. We also allow each $\varphi_i$ to depend on $\beta_0$.

While the asymptotic validity of our inference procedure admits a wide range of choices for the $\varphi_i$'s, one needs to choose them with care to obtain sharp inference on the payoff parameters. In particular, it is important to consider instrumental variables which involve the characteristics of $G_P$-neighbors to obtain sharp inference on the payoff externality parameter $\beta_0$.
This is because the cross-sectional dependence of observations carries substantial information for estimating strategic interdependence among agents.

The moment function is nonlinear in the payoff externality $\beta_0$ and it is not easy to ensure that these moment conditions uniquely determine the true parameter vector even in the limit as $n^*$ goes to infinity.¹¹ In this chapter, we adopt a Bonferroni procedure in which we first obtain a confidence interval for $\beta_0$ and, using this, we perform inference on $\rho_0$. This approach works well even when $\beta_0$ is not consistently estimable.

¹¹ One might consider following the nonlinear iterated least squares approach of [27]. However, it is not clear in our context whether the parameter $\beta_0$ is consistently estimable across various payoff graph configurations as $n^*$ diverges to infinity. Thus, we consider a Bonferroni approach.

We proceed first to estimate $\rho_0$ assuming knowledge of $\beta_0$. Define
$$S_{\varphi\varphi} = \varphi' \varphi / n^*, \quad \text{and} \quad \tilde\varphi = \varphi S_{\varphi\varphi}^{-1/2},$$
where $\varphi$ is an $n^* \times M$ matrix whose $i$-th row is given by $\varphi_i'$, $i \in N^*$. Define
$$\Lambda = \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N^*} E[v_i v_j \mid \mathcal{F}] \tilde\varphi_i \tilde\varphi_j', \tag{3.7}$$
and let $\hat\Lambda$ be a consistent estimator of $\Lambda$. (We will explain how we construct this estimator later.) Define
$$S_{Z\tilde\varphi} = Z' \tilde\varphi / n^*, \quad \text{and} \quad S_{\tilde\varphi y} = \tilde\varphi' y / n^*,$$
where $Z$ is an $n^* \times d$ matrix whose $i$-th row is given by $Z_i'$ and $y$ is an $n^* \times 1$ vector whose $i$-th entry is given by $Y_i$, $i \in N^*$. Since (from the fact that $G_P$ is undirected)
$$c_{ij} = \frac{|N_P(i) \cap N_P(j)|}{n_P(i)},$$
we can construct $Z_i$ for each $i \in N^*$ from the data satisfying Conditions A-C. Then we estimate
$$\hat\rho = \left[S_{Z\tilde\varphi} \hat\Lambda^{-1} S_{Z\tilde\varphi}'\right]^{-1} S_{Z\tilde\varphi} \hat\Lambda^{-1} S_{\tilde\varphi y}. \tag{3.8}$$
Using this estimator, we construct a vector of residuals $\hat v = [\hat v_i]_{i \in N^*}$, where
$$\hat v_i = Y_i - Z_i' \hat\rho. \tag{3.9}$$
Finally, we form a test statistic as follows:
$$T(\beta_0) = \frac{\hat v' \tilde\varphi \hat\Lambda^{-1} \tilde\varphi' \hat v}{n^*}, \tag{3.10}$$
making it explicit that the test statistic depends on $\beta_0$. Later we show that
$$T(\beta_0) \to_d \chi^2_{M-d}, \quad \text{as } n^* \to \infty,$$
where $\chi^2_{M-d}$ denotes the $\chi^2$ distribution with $M - d$ degrees of freedom.
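For a fixed candidate value of $\beta_0$, the algebra in (3.8)-(3.10) can be sketched as follows. This is a minimal illustration assuming NumPy, not the thesis's implementation: the function names are ours, the matrices `Z` and `phi` are taken as already constructed for the given $\beta_0$, and the estimator $\hat\Lambda$ (whose construction is described later in the chapter) is simply passed in as an argument.

```python
import numpy as np


def inverse_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def gmm_estimate_and_statistic(y, Z, phi, Lambda_hat):
    """Return (rho_hat, T) following (3.8)-(3.10).

    y: (n,) outcomes; Z: (n, d) regressors built from (3.6);
    phi: (n, M) instruments; Lambda_hat: (M, M) estimate of Lambda,
    supplied by the user (its construction is not shown here).
    """
    n = y.shape[0]
    phi_tilde = phi @ inverse_sqrt(phi.T @ phi / n)   # normalized instruments
    S_Zphi = Z.T @ phi_tilde / n                      # d x M
    S_phiy = phi_tilde.T @ y / n                      # M
    Lambda_inv = np.linalg.inv(Lambda_hat)
    V_hat = np.linalg.inv(S_Zphi @ Lambda_inv @ S_Zphi.T)
    rho_hat = V_hat @ S_Zphi @ Lambda_inv @ S_phiy    # (3.8)
    v_hat = y - Z @ rho_hat                           # (3.9)
    moments = phi_tilde.T @ v_hat
    T = float(moments @ Lambda_inv @ moments / n)     # (3.10)
    return rho_hat, T


# Hypothetical noiseless example: y is exactly linear in Z, so rho is recovered.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 2))
phi = np.column_stack([Z, rng.standard_normal((200, 2))])
rho0 = np.array([1.0, -0.5])
rho_hat, T = gmm_estimate_and_statistic(Z @ rho0, Z, phi, np.eye(4))
```

In the Bonferroni procedure one would repeat this computation over a grid of candidate values of $\beta$ in $(-1, 1)$, rebuilding `Z` (and `phi`, if it depends on $\beta$) each time.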
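Let $C^{\beta}_{1-(\alpha/2)}$ be the $(1-(\alpha/2))100\%$ confidence set for $\beta_0$ defined as
\[ C^{\beta}_{1-(\alpha/2)} \equiv \{\beta \in (-1,1) : T(\beta) \le c_{1-(\alpha/2)}\}, \]
where $T(\beta)$ is computed as $T(\beta_0)$ with $\beta_0$ replaced by $\beta$ and the critical value $c_{1-(\alpha/2)}$ is the $(1-(\alpha/2))$-quantile of $\chi^2_{M-d}$.

Then we establish that under regularity conditions,
\[ \sqrt{n^*}\, \hat V^{-1/2} (\hat\rho - \rho_0) \to_d N(0, I), \]
as $n^* \to \infty$, where
\[ \hat V = \left[ S_{Z\tilde\varphi} \hat\Lambda^{-1} S_{Z\tilde\varphi}' \right]^{-1}. \]
(See Section 3.3.2 below for conditions and formal results.) Using this estimator $\hat\rho$, we can construct a $(1-\alpha)100\%$ confidence interval for $a'\rho_0$ for any non-zero vector $a$. For this, define
\[ \hat\sigma^2(a) = a'\hat V a. \]
Let $c^a_{1-(\alpha/4)}$ be the $(1-(\alpha/4))$-quantile of $N(0,1)$. Define, for a vector $a$ with the same dimension as $\rho$,
\[ C^{\rho}_{1-(\alpha/2)}(\beta_0, a) = \left[ a'\hat\rho - \frac{c^a_{1-(\alpha/4)} \hat\sigma(a)}{\sqrt{n^*}},\; a'\hat\rho + \frac{c^a_{1-(\alpha/4)} \hat\sigma(a)}{\sqrt{n^*}} \right]. \]
Then the confidence set for $a'\rho$ is given by
\[ C^{\rho}_{1-\alpha}(a) = \bigcup_{\beta \in C^{\beta}_{1-(\alpha/2)}} C^{\rho}_{1-(\alpha/2)}(\beta, a). \]
Notice that since $\beta$ runs in $(-1,1)$ and the estimator $\hat\rho$ has an explicit form, the confidence interval is not computationally costly to construct in general.

Often the eventual parameter of interest is one that captures how strongly the agents' decisions are inter-dependent through the network. Here let us introduce parameters representing this sensitivity. Let $s_i^{BR}(I_i)$ be the best linear response of agent $i$ having information set $I_i$. Let us define the average network externality with respect to variable $X_{i,1,r}$ (where $X_{i,1,r}$ represents the $r$-th entry of $X_{i,1}$) to be
\[ \theta_1(\beta_0, \gamma_{0,r}) = \frac{1}{n^*} \sum_{i \in N^*} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\partial s_i^{BR}(I_i)}{\partial x_{j,1,r}} = \frac{1}{n^*} \sum_{i \in N^*} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\beta_0 r_{ij}}{n_P(i)(1-\beta_0 c_{ii})(1-\beta_0 c_{ij})} \gamma_{0,r}, \]
where $\gamma_{0,r}$ denotes the $r$-th entry of $\gamma_0$. See (3.5). Thus the confidence interval for $\theta_1(\beta_0, \gamma_{0,r})$ can be constructed from the confidence intervals for $\beta_0$ and $\gamma_{0,r}$ as follows:
\[ C^{\theta_1}_{1-\alpha} = \left\{ \theta_1(\beta, \gamma_r) : \beta \in C^{\beta}_{1-\alpha}, \text{ and } \gamma_r \in C^{\gamma_r}_{1-\alpha} \right\}, \]
where $C^{\gamma_r}_{1-\alpha}$ denotes the confidence interval for $\gamma_{0,r}$. We can define similarly the average network externality with respect to an entry of $X_{i,2}$ and construct a confidence interval for it.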
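The Bonferroni construction for $a'\rho_0$ described above amounts to a grid search over $\beta$. The following Python sketch illustrates the idea under stated assumptions: the function name `bonferroni_interval` and the callables `T_of_beta` and `est_of_beta` are hypothetical placeholders that the user would supply, `n` stands for the number of sampled agents, and the chi-square critical value is passed in rather than computed. The sketch reports the smallest interval containing the union of the accepted intervals, which may be slightly wider than the union itself if the accepted set of $\beta$ values is not connected.

```python
import numpy as np
from statistics import NormalDist

def bonferroni_interval(beta_grid, T_of_beta, est_of_beta, chi2_crit, n, alpha=0.05):
    """Grid-based sketch of the Bonferroni confidence set for a'rho.

    T_of_beta(beta)   -> test statistic T(beta)
    est_of_beta(beta) -> (a'rho_hat, sigma_hat(a)), both computed at beta
    chi2_crit         -> (1 - alpha/2)-quantile of chi-square(M - d)
    Returns the accepted beta values and the hull (lo, hi) of the union.
    """
    z = NormalDist().inv_cdf(1 - alpha / 4)     # normal quantile at 1 - alpha/4
    accepted = [b for b in beta_grid if T_of_beta(b) <= chi2_crit]
    if not accepted:
        return accepted, None                   # empty confidence set for beta
    lo, hi = np.inf, -np.inf
    for b in accepted:
        est, sig = est_of_beta(b)
        half = z * sig / np.sqrt(n)             # half-length at this beta
        lo, hi = min(lo, est - half), max(hi, est + half)
    return accepted, (lo, hi)
```

Because $\hat\rho$ is available in closed form for each $\beta$, the cost of this loop is dominated by the number of grid points.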
Details are omitted.

Downweighting Players with High Degree Centrality

When there are players who are linked to many other players in $G_P$, the graph $G_P$ tends to be denser, making it difficult to obtain good variance estimators that perform stably in finite samples. To remedy this situation, this chapter proposes a downweighting of those players with high degree centrality in $G_P$. More specifically, in choosing an instrument vector $\varphi_i$, we may consider the following:
\[ \varphi_i(X) = \frac{1}{\sqrt{\bar n_P(i)}}\, g_i(X), \tag{3.11} \]
where $g_i(X)$ is a function of $X$. This choice of $\varphi_i$ downweights players $i$ who have a large $G_P$-neighborhood. Thus we rely less on the variations of the characteristics of those players who have many neighbors in $G_P$.

Downweighting agents too heavily may hurt the power of the inference because the actions of agents with high centrality contain information about the parameter of interest through the moment restrictions. On the other hand, downweighting them too lightly will hurt the finite sample stability of the inference due to the strong cross-sectional dependence they cause in the observations. Since a model with agents of higher order sophisticated type results in observations with more extensive cross-sectional dependence, the role of downweighting can be prominent in securing finite sample stability in such a model.

Comparison with Linear-in-Means Models

One of the most frequently used interaction models in the econometrics literature is a linear-in-means model specified as follows:
\[ Y_i = X_{i,1}'\gamma_0 + X_{i,2}'\delta_0 + \beta_0 \mu_i^e(\bar Y_i) + v_i, \tag{3.12} \]
where $\mu_i^e(\bar Y_i)$ denotes player $i$'s expectation of $\bar Y_i$, and
\[ \bar Y_i = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} Y_j \quad \text{and} \quad \bar X_i = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} X_j. \]
The literature assumes rational expectations by equating $\mu_i^e(\bar Y_i)$ to $E[\bar Y_i \mid I_i]$, and then proceeds to identification analysis of the parameters $\gamma_0$, $\delta_0$ and $\beta_0$. For actual inference, one needs to use an estimated version of $E[\bar Y_i \mid I_i]$.
One standard way in the literature is to replace it by $\bar Y_i$ so that we have
\[ Y_i = X_{i,1}'\gamma_0 + X_{i,2}'\delta_0 + \beta_0 \bar Y_i + \tilde v_i, \]
where $\tilde v_i$ is an error term defined as $\tilde v_i = \beta_0 (E[\bar Y_i \mid I_i] - \bar Y_i) + v_i$. The complexity arises due to the presence of $\bar Y_i$, which is an endogenous variable that is involved in the error term $\tilde v_i$.12

12 A similar observation applies in the case of a complete information version of the model, where one directly uses $\bar Y_i$ in place of $\mu_i^e(\bar Y_i)$ in (3.12). Still, due to the simultaneity of the equations, $\bar Y_i$ necessarily involves the error terms $v_i$ not only of agent $i$ but of other agents as well.

One of the frequently used approaches is to use instrumental variables. There are two types of instrumental variables. The first kind is a peers-of-peers type instrumental variable which is based on the observed characteristics of the neighbors of the neighbors. This strategy was proposed by [102], [30] and [55]. The second kind uses observed characteristics that are excluded from the group characteristics. (See [33] and [63].) However, finding such an instrumental variable in practice is not always a straightforward task in empirical research.

Our approach of empirical modeling is different in several aspects. Our modeling uses behavioral assumptions instead of rational expectations, and produces a reduced form for observed actions $Y_i$ from using best linear responses. This reduced form gives a rich set of testable implications and makes explicit the source of cross-sectional dependence in relation to the payoff graph. Our approach permits any nontrivial functions of $\mathcal{F}$ as instrumental variables, at least for the validity of the inference. Furthermore, one does not need to observe many independent interactions for inference.

Estimation of Asymptotic Covariance Matrix

The inference requires the estimator $\hat V$. First, let us find the population version of $\hat V$.
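After some algebra, it is not hard to see that the population version (conditional on $\mathcal{F}$) of $\hat V$ is given by
\[ V = \left[ S_{Z\tilde\varphi} \Lambda^{-1} S_{Z\tilde\varphi}' \right]^{-1}. \tag{3.13} \]
For estimation, it suffices to estimate $\Lambda$ defined in (3.7). For this, we need to incorporate the cross-sectional dependence of the residuals $v_i$ properly. From the definition of $v_i$, it turns out that $v_i$ and $v_j$ can be correlated if $i$ and $j$ are connected indirectly through two edges in $G_P$. However, constructing an estimator of $\Lambda$ simply by imposing this dependence structure and replacing $v_i$ by $\hat v_i$ can result in a conservative estimator with unstable finite sample properties, especially when each player has many players connected through two edges. Instead, we propose an alternative estimator of $\Lambda$ as follows. This estimator is found to work well in our simulation studies. We present the explanation and construction of this estimator in the Appendix.

To obtain an estimator $\hat\Lambda$ of $\Lambda$ (up to $\beta_0$), we first obtain a first-step estimator of $\rho$ as follows:
\[ \tilde\rho = \left[ S_{Z\tilde\varphi} S_{Z\tilde\varphi}' \right]^{-1} S_{Z\tilde\varphi} S_{\tilde\varphi y}. \tag{3.14} \]
Using this estimator, we construct a vector of residuals $\tilde v = [\tilde v_i]_{i \in N^*}$, where
\[ \tilde v_i = Y_i - Z_i'\tilde\rho. \tag{3.15} \]
Then we estimate13
\[ \hat\Lambda = \hat\Lambda_1 + \hat\Lambda_2, \]
where
\[ \hat\Lambda_1 = \frac{1}{n^*} \sum_{i \in N^*} \tilde v_i^2\, \tilde\varphi_i \tilde\varphi_i', \quad \text{and} \quad \hat\Lambda_2 = \frac{\hat s_\varepsilon}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i} :\, N_P(i) \cap N_P(j) \neq \emptyset} q_{\varepsilon,ij}\, \tilde\varphi_i \tilde\varphi_j', \]
and
\[ \hat s_\varepsilon = \frac{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} \tilde v_i \tilde v_j}{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} q_{\varepsilon,ij}}, \tag{3.16} \]
\[ q_{\varepsilon,ij} = \frac{\lambda_{ji}\lambda_{ii}\lambda_{jj}}{n_P(j)} + \frac{\lambda_{ij}\lambda_{ii}\lambda_{jj}}{n_P(i)} + \frac{\beta_0 \lambda_{ii}\lambda_{jj}}{n_P(i)\, n_P(j)} \sum_{k \in N_P(i) \cap N_P(j)} \lambda_{ik}\lambda_{jk}. \]
(Note that the quantity $q_{\varepsilon,ij}$ can be evaluated once $\beta_0$ is fixed.) Using $\hat\Lambda$, we construct the estimator for the covariance matrix $V$, i.e.,
\[ \hat V = \left[ S_{Z\tilde\varphi} \hat\Lambda^{-1} S_{Z\tilde\varphi}' \right]^{-1}. \tag{3.17} \]
Later we provide conditions for the estimator to be consistent for $V$.14

13 Under Condition C(a) for sample $N^*$, we have $\Lambda_2 = 0$ because the second sum in the expression for $\Lambda_2$ is empty. Hence in this case, we can simply set $\hat\Lambda_2 = 0$.

3.3.2 Asymptotic Theory

In this section, we present the assumptions and formal results of asymptotic inference. We introduce some technical conditions.

Assumption 3.3.2.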
There exists $c > 0$ such that for all $n^* \ge 1$, $\lambda_{\min}(S_{\varphi\varphi}) \ge c$, $\lambda_{\min}(S_{Z\tilde\varphi} S_{Z\tilde\varphi}') \ge c$, $\lambda_{\min}(S_{Z\tilde\varphi} \Lambda^{-1} S_{Z\tilde\varphi}') \ge c$, $\lambda_{\min}(\Lambda) \ge c$, and
\[ \frac{1}{n^*} \sum_{i \in N^*} \frac{\lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i) \cap N^*} \lambda_{ij} > c, \]
where $\lambda_{\min}(A)$ for a symmetric matrix $A$ denotes the minimum eigenvalue of $A$.

Assumption 3.3.3. There exists a constant $C > 0$ such that for all $n^* \ge 1$,
\[ \max_{i \in N^\circ} \|X_i\| + \max_{i \in N^\circ} \|\tilde\varphi_i\| \le C \]
and $E[\varepsilon_i^2 \mid \mathcal{F}] + E[\eta_i^2 \mid \mathcal{F}] < C$, where $n^\circ = |N^\circ|$ and
\[ N^\circ = \bigcup_{i \in N^*} \bar N_P(i). \]

14 In finite samples, $\hat V$ is not guaranteed to be positive definite. We can modify the estimator by using a spectral decomposition, similarly as in [40]. More specifically, we first take a spectral decomposition $\hat V = \hat B \hat A \hat B'$, where $\hat A$ is a diagonal matrix of eigenvalues $\hat a_j$ of $\hat V$. We replace each $\hat a_j$ by the maximum between $\hat a_j$ and some small number $c > 0$ in $\hat A$ to construct $A^*$. Then the modified version $\tilde V \equiv \hat B A^* \hat B'$ is positive definite. Taking $c = 0.005$ seems to work well in the simulation studies.

Assumption 3.3.2 is used to ensure that the asymptotic distribution is nondegenerate. This regularity condition is reasonable, because an asymptotic scheme that gives a degenerate distribution would not be adequate to derive a finite sample, nondegenerate distribution of an estimator. Assumption 3.3.3 can be weakened at the expense of complexity in the conditions and the proofs.

We introduce an assumption which requires the payoff graph to have a bounded degree over $i$ in the observed sample $N^*$.

Assumption 3.3.4. There exists $C > 0$ such that for all $n^* \ge 1$,
\[ \max_{i \in N^*} |N_P(i)| \le C. \]

We may relax the assumption to a weaker, yet more complex condition at the expense of longer proofs, but in our view, this relaxation does not give additional insights. When $N^*$ is large, one can remove very high-degree nodes to obtain stable inference. As such removal is solely based on the payoff graph $G_P$, the removal does not lead to any violation of the conditions given in this chapter.

The following theorem establishes the asymptotic validity of the inference based on the best linear responses in Theorem 3.2.1.
The proof is found in the Appendix.

Theorem 3.3.1. Suppose that the conditions of Theorem 3.2.1 and Assumptions 3.3.1 - 3.3.4 hold. Then,
\[ T(\beta_0) \to_d \chi^2_{M-d}, \quad \text{and} \quad \hat V^{-1/2} \sqrt{n^*}\, (\hat\rho - \rho_0) \to_d N(0, I), \]
as $n^* \to \infty$.

3.4 A Monte Carlo Simulation Study

In this section, we investigate the finite sample properties of the asymptotic inference across various configurations of the payoff graph, $G_P$. The payoff graphs are generated according to two models of random graph formation, which we call Specifications 1 and 2. Specification 1 uses the Barabási-Albert model of preferential attachment, with $m$ representing the number of edges each new node forms with existing nodes. The number $m$ is chosen from $\{1,2,3\}$. Specification 2 is the Erdős-Rényi random graph with probability $p = \lambda/n$, where $\lambda$ is also chosen from $\{1,2,3\}$.15 In the first table, we report degree characteristics of the payoff graphs used in the simulation study.

For the simulations, we also set the following: $\rho_0 = (\gamma_0', \delta_0')'$, with $\gamma_0 = (2,4,1)'$ and $\delta_0 = (3,4)'$. We choose $a$ to be a vector of ones so that $a'\rho_0 = 14$. The variables $\varepsilon$ and $\eta$ are drawn i.i.d. from $N(0,1)$. The first column of $X_{i,1}$ is a column of ones, while the remaining columns of $X_{i,1}$ are drawn independently from $N(1,1)$. The columns of $X_{i,2}$ are drawn independently from $N(3,1)$.

For instruments, we consider the following nonlinear transformations of $X_1$ and $X_2$:
\[ \varphi_i = [\tilde Z_{i,1}, X_{i,1}^2, X_{i,2}^2, X_{i,2}^3]', \]
where we define
\[ \tilde Z_{i,1} \equiv \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \lambda_{jj} X_{j,1}. \]
We generate $Y_i$ from the best response function in Theorem 3.2.1. While the instruments $X_{i,1}^2, X_{i,2}^2, X_{i,2}^3$ capture the nonlinear impact of the $X_i$'s, the instrument $\tilde Z_{i,1}$ captures the cross-sectional dependence along the payoff graph. The use of this instrumental variable is crucial in obtaining sharp inference for $\beta_0$. Note that since we have already concentrated out $\rho$ in forming the moment conditions, we cannot use linear combinations of $X_{i,1}$ and $X_{i,2}$ as our instrumental variables.
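To make the graph-generation step concrete, the following Python sketch draws an undirected Erdős-Rényi graph with link probability $\lambda/n$ (Specification 2) and computes its average and maximum degree, the two summary statistics reported for the payoff graphs. It is a minimal illustration, not the simulation code used in this chapter: the function names and the seed are hypothetical choices, and the Barabási-Albert graphs of Specification 1 are not covered.

```python
import numpy as np

def erdos_renyi_adjacency(n, lam, rng):
    """Symmetric 0/1 adjacency matrix; each pair is linked with prob. lam / n."""
    upper = np.triu(rng.random((n, n)) < lam / n, k=1)   # strict upper triangle
    return (upper | upper.T).astype(int)                 # symmetrize, no self-links

def degree_summary(A):
    """Average and maximum degree of the graph with adjacency matrix A."""
    deg = A.sum(axis=1)
    return deg.mean(), deg.max()

rng = np.random.default_rng(0)            # illustrative seed
A = erdos_renyi_adjacency(500, 2.0, rng)
d_av, d_mx = degree_summary(A)
```

With $p = \lambda/n$ the expected degree is $\lambda(n-1)/n$, so the average degree should be close to $\lambda$ for the sample sizes considered here.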
The nominal size in all the experiments is set at $\alpha = 0.05$.

Overall, the simulation results illustrate the good power and size properties of the asymptotic inference on $\beta_0$ and $a'\rho_0$. As expected, the average length of the confidence intervals for both $\beta_0$ and $a'\rho_0$ becomes shorter as the sample size increases. We find that the confidence interval for $\beta_0$ exhibits empirical coverage close to the 95% nominal level, while the confidence interval for $a'\rho_0$ is somewhat conservative. This conservativeness is expected, given the fact that the interval is constructed using a Bonferroni approach.

15 Note that in Specification 1, the Barabási-Albert graph is generated with an Erdős-Rényi seed graph, where the number of nodes in the seed is set to equal the smallest integer above $5\sqrt{n}$. All graphs in the simulation study are undirected.

Table 3.2: The Degree Characteristics of the Graphs Used in the Simulation Study

                   Specification 1               Specification 2
n              m = 1     m = 2     m = 3     λ = 1     λ = 2     λ = 3
500    d_mx    17        21        30        5         8         11
       d_av    1.7600    3.2980    4.8340    0.9520    1.9360    2.9600
1000   d_mx    18        29        34        6         7         9
       d_av    1.8460    3.5240    5.2050    0.9960    1.9620    3.0020
5000   d_mx    32        78        70        7         10        11
       d_av    1.9308    3.7884    5.6466    0.9904    2.0032    3.0228

Notes: This table gives characteristics of the payoff graphs, $G_P$, used in the simulation study. d_av and d_mx denote the average and maximum degrees of the payoff graphs.

3.5 Empirical Application: State Presence across Municipalities

3.5.1 Motivation and Background

State capacity (i.e., the capability of a country to provide public goods, basic services, and the rule of law) can be limited for various reasons. (See e.g. [24] and [76].) A "weak state" may arise due to political corruption and clientelism, and result in spending inadequately on public goods ([3]), accommodating armed opponents of the government ([151]), and war ([125]). Empirical evidence has shown how these weak states can persist from precolonial times.
Higher state capacities seem related to the current level of prosperity at the ethnic and national levels ([75] and [136]).

Our empirical application is based on a recent study by [4], who investigate the local choices of state capacity in Colombia, using a model of a complete information game on an exogenously formed network. In their set-up, municipalities choose a level of spending on public goods and state presence (as measured by either the number of state employees or state agencies). There is a network externality in a municipality's choice because municipalities that are adjacent to each other can benefit from their neighbors' choices of public goods provisions, such as increased security, infrastructure and bureaucratic connections. Thus, a municipality's choice of state capacity can be thought of as a strategic decision on a geographic network.

It is not obvious that public good provision in one municipality leads to higher spending on public goods in neighboring municipalities. Some neighbors may free-ride and under-invest in state presence if they anticipate others will invest highly.
Rent-seeking by municipal politicians would also limit the provision of public goods. On the other hand, economies of scale could yield complementarities in state presence across neighboring municipalities.

In our study, we extend the model in [4] to an incomplete information game where information may be shared across municipalities. In particular, we do not assume that all municipalities know and observe all characteristics and decisions of the others. It seems reasonable that the decisions made across the country may not be observed or well known by those municipalities that are geographically remote.

Table 3.3: The Empirical Coverage Probability and Average Length of Confidence Intervals for $\beta_0$ at 95% Nominal Level.

                      Specification 1               Specification 2
β0     n          m = 1     m = 2     m = 3     λ = 1     λ = 2     λ = 3
-0.5   500        0.9760    0.9715    0.9710    0.9740    0.9685    0.9700
       1000       0.9665    0.9720    0.9660    0.9785    0.9705    0.9755
       5000       0.9745    0.9705    0.9680    0.9715    0.9765    0.9690
-0.3   500        0.9730    0.9690    0.9690    0.9740    0.9690    0.9655
       1000       0.9665    0.9710    0.9660    0.9750    0.9700    0.9710
       5000       0.9700    0.9670    0.9645    0.9710    0.9755    0.9645
0      500        0.9610    0.9660    0.9715    0.9735    0.9670    0.9625
       1000       0.9640    0.9690    0.9690    0.9745    0.9665    0.9670
       5000       0.9685    0.9670    0.9655    0.9710    0.9745    0.9660
0.3    500        0.9640    0.9665    0.9725    0.9780    0.9720    0.9675
       1000       0.9670    0.9695    0.9655    0.9770    0.9700    0.9730
       5000       0.9705    0.9660    0.9635    0.9725    0.9755    0.9675
0.5    500        0.9725    0.9735    0.9735    0.9800    0.9715    0.9700
       1000       0.9690    0.9705    0.9660    0.9770    0.9770    0.9785
       5000       0.9720    0.9695    0.9650    0.9770    0.9755    0.9730

                      Specification 1               Specification 2
β0     n          m = 1     m = 2     m = 3     λ = 1     λ = 2     λ = 3
-0.5   500        0.2735    0.3069    0.3334    0.3318    0.2933    0.2832
       1000       0.2264    0.2513    0.2749    0.2874    0.2489    0.2491
       5000       0.1771    0.1783    0.1971    0.2368    0.1917    0.1818
-0.3   500        0.2541    0.2971    0.3245    0.3189    0.2818    0.2703
       1000       0.2132    0.2445    0.2680    0.2810    0.2398    0.2412
       5000       0.1710    0.1739    0.1952    0.2387    0.1901    0.1791
0      500        0.2434    0.2881    0.3175    0.3043    0.2692    0.2606
       1000       0.2051    0.2377    0.2627    0.2730    0.2316    0.2332
       5000       0.1639    0.1715    0.1925    0.2373    0.1877    0.1746
0.3    500        0.2481    0.2891    0.3149    0.2982    0.2652    0.2629
       1000       0.2114    0.2377    0.2639    0.2673    0.2260    0.2314
       5000       0.1657    0.1729    0.1913    0.2325    0.1884    0.1710
0.5    500        0.2613    0.2974    0.3186    0.2977    0.2693    0.2691
       1000       0.2193    0.2429    0.2688    0.2620    0.2267    0.2350
       5000       0.1703    0.1770    0.1912    0.2253    0.1851    0.1687

Notes: The first half of the table reports the empirical coverage probability of the asymptotic confidence interval for $\beta_0$ and the second half reports its average length. The simulated rejection probability at the true parameter is close to the nominal size of $\alpha = 0.05$ and the average lengths decrease with $n$. The simulation number is $R = 2000$.

Table 3.4: The Empirical Coverage Probability and Average Length of Confidence Intervals for $a'\rho_0$ at 95% Nominal Level.

                      Specification 1               Specification 2
β0     n          m = 1     m = 2     m = 3     λ = 1     λ = 2     λ = 3
-0.5   500        0.9945    0.9955    0.9925    0.9955    0.9950    0.9910
       1000       0.9925    0.9930    0.9955    0.9955    0.9970    0.9910
       5000       0.9850    0.9875    0.9900    0.9915    0.9925    0.9930
-0.3   500        0.9935    0.9960    0.9940    0.9960    0.9950    0.9905
       1000       0.9905    0.9935    0.9955    0.9955    0.9960    0.9910
       5000       0.9795    0.9880    0.9895    0.9885    0.9940    0.9895
0      500        0.9935    0.9965    0.9945    0.9955    0.9925    0.9925
       1000       0.9925    0.9935    0.9945    0.9960    0.9970    0.9920
       5000       0.9785    0.9875    0.9895    0.9840    0.9910    0.9860
0.3    500        0.9950    0.9940    0.9935    0.9955    0.9915    0.9925
       1000       0.9940    0.9940    0.9940    0.9920    0.9965    0.9925
       5000       0.9790    0.9895    0.9925    0.9790    0.9860    0.9850
0.5    500        0.9940    0.9925    0.9935    0.9940    0.9915    0.9920
       1000       0.9945    0.9940    0.9945    0.9890    0.9930    0.9930
       5000       0.9835    0.9900    0.9910    0.9645    0.9850    0.9860

                      Specification 1               Specification 2
β0     n          m = 1     m = 2     m = 3     λ = 1     λ = 2     λ = 3
-0.5   500        1.5840    1.6752    1.7643    1.3326    1.4123    1.8075
       1000       1.3626    1.4522    1.4819    1.1218    1.1474    1.4152
       5000       0.9995    0.9979    1.0333    0.8658    0.7786    0.8503
-0.3   500        1.5337    1.6361    1.7225    1.4037    1.4196    1.7567
       1000       1.3263    1.4140    1.4466    1.2028    1.1714    1.3966
       5000       0.9896    0.9749    1.0145    0.9656    0.8244    0.8650
0      500        1.5068    1.6007    1.6761    1.5646    1.4632    1.6976
       1000       1.3060    1.3607    1.4031    1.3721    1.2259    1.3798
       5000       0.9840    0.9486    0.9873    1.1527    0.9154    0.8922
0.3    500        1.5516    1.6019    1.6501    1.8290    1.5653    1.6869
       1000       1.3416    1.3257    1.3754    1.6133    1.3204    1.3959
       5000       1.0066    0.9412    0.9690    1.3938    1.0199    0.9213
0.5    500        1.6553    1.6353    1.6567    2.1022    1.6772    1.7101
       1000       1.4069    1.3146    1.3731    1.8362    1.4272    1.4376
       5000       1.0552    0.9420    0.9542    1.5865    1.1029    0.9593

Notes: The true $a'\rho_0$ is equal to 14. The first half of the table reports the empirical coverage probability of the asymptotic confidence interval for $a'\rho_0$ and the second half its average length. The empirical coverage probability of the confidence interval for $a'\rho_0$ is generally conservative, which is expected from the use of the Bonferroni approach. Nevertheless, the length of the confidence interval is reasonably small. The simulation number, $R$, is 2000.

Figure 3.3: Degree Distribution of $G_P$
Notes: The figure presents the degree distribution of the graph $G_P$ used in the empirical specification. The average degree is 5.48, the maximum degree is 20, and the minimum degree is 1.

3.5.2 Empirical Set-up

Let $y_i$ denote the state capacity in municipality $i$ (as measured by the (log) number of public employees in municipality $i$) and $G_P$ denote the geographic network, where an edge joins two municipalities that are geographically adjacent.16 We assume that $G_P$ is exogenously formed. The degree distribution of $G_P$ is shown in Figure 3.3. We study the optimal choice of $y_i$, where $y_i$ leads to a larger prosperity $p_i$. Prosperity in municipality $i$ is modeled as:
\[ p_i = \left( \beta \bar y + x_{1,i}\gamma + \eta_i + \varepsilon_i + \varsigma^D_i \right) y_i, \tag{3.18} \]
where $\varsigma^D_i$ is a district specific dummy variable, $\varepsilon_i$ and $\eta_i$ are our sharable and non-sharable private information, and $\bar y = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} y_j$. The term $x_{1,i}$ represents municipality characteristics.

16 This corresponds to the case of $\delta_1 = \delta_2 = 0$ in [4].
These include geographic characteristics, such as land quality, altitude, latitude, rainfall; and municipal characteristics, such as distance to highways, distance to royal roads and Colonial State Presence.17

17 Note that here we take $x_{2,i} = 0$. This is done for a closer correspondence to the specification in [4]. Finally, note that $p_i$ is only a function of terms that are multiplied by $y_i$. This is a simplification from their specification. We do so because we will focus on the best response equation. The best response equation, derived from the first order condition to this problem, would not include any term that is not a function of $y_i$ itself.

The welfare of a municipality is given by
\[ u_i(y_i, y_{-i}, T, \eta_i) = p_i(y_i, \bar y, T, \eta_i) - \frac{1}{2} y_i^2, \tag{3.19} \]
where the second term refers to the cost of higher state presence, and the first term is the prosperity $p_i$.

We can rewrite the welfare of the municipality by substituting (3.18) into (3.19):
\[ u_i(y_i, y_{-i}, T, \eta_i) = \left( \beta \bar y + x_{1,i}\gamma + \eta_i + \varepsilon_i + \varsigma^D_i \right) y_i - \frac{1}{2} y_i^2, \tag{3.20} \]
which is our model from the previous section. We assume that municipalities (or the mayor in charge) wish to maximize welfare by choosing state presence, given their beliefs about the types of the other municipalities.

In our specification, we allow for incomplete information. This is reflected in the terms $\varepsilon_i$, $\eta_i$, which will be present in the best response function. The municipality, when choosing state presence $y_i$, will be able to observe $\varepsilon_i$ of its neighbors and will use its beliefs over the types of the others to generate its best response. The best response will follow the results from Theorem 3.2.1.

3.5.3 Model Specification

We follow closely Table 3 in [4] for the choice of specifications and variables. Throughout the specifications, we include longitude, latitude, surface area, elevation, annual rainfall, department fixed effects and a department capital dummy (all in $X_1$). We further consider the effect of the variables distance to current highways, land quality and presence of rivers in the municipality.

For the choice of instruments, we consider two separate types of instruments. The first is the sum of neighbor values (across $G_P$) of the historical variables (denoted as $C_i$).18 The historical variables used are Total Crown Employees (also called Colonial State Officials), Distance to Royal Roads, Colonial State Agencies and Historical Population, as well as Colonial State Presence Index squared and Distance to Royal Roads squared. The latter two add additional power to the inference. We also use the variable $\tilde Z_i = n_P(i)^{-1} \sum_{j \in N_P(i)} \lambda_{ij}\lambda_{jj} X_{j,1}$ as part of the instrumental variable, which was shown to perform well in the Monte Carlo simulations in Section 3.4. This variable captures cross-sectional dependence as a crucial source of variation for inference on the strategic interactions. We use downweighting of our instruments as explained in a preceding section.

3.5.4 Results

The results across a range of specifications are presented in Table 3.5. In these results, we see that the effect is statistically different from 0 and stable across specifications. It indicates that there is complementarity in the provision of public goods and state presence ($\beta > 0$).

Let us compare our results to those in [4]. There, the authors report the average marginal effects over their weighted graph. The (weighted) average degree is 0.0329, so our results can be compared in an approximation, by considering $0.0329\,\hat\beta$.

In general, our estimates have the same sign and significance as those of [4]. Our estimates are in the range of [0.002, 0.013], after reweighting as mentioned before, somewhat comparable to theirs of [0.016, 0.022] (in the case of the outcome of the number of public employees, in Table 3). Hence, we find similar qualitative effects, although of a smaller magnitude.
Recall that our confidence set is built without assuming that $\beta_0$ is consistently estimable.

In Figure 3.4, we show the results of our estimated network externalities for the estimates from Table 3.5, for the importance of being a department capital. The average network externality is computed from
\[ \frac{1}{N} \sum_{i \in N} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\beta_0 \hat\gamma_{dc}}{n_P(i)(1-\beta_0 c_{ii})(1-\beta_0 c_{ij})}, \]
where $\hat\gamma_{dc}$ is the estimated parameter of the $X_1$ variable department capital, and we vary $\beta_0$ within its confidence set. The parameter is defined in Section 3.3, and captures the average effect of a neighbour being a department capital.

The figure shows that there is a strong and increasing network externality from being a department capital over the range of the confidence set of $\beta$. This indicates that the effect of being a capital has spillovers on other municipalities: since $\beta > 0$, and one expects that department capitals have more state presence and resources, being a department capital yields increasing returns the stronger the complementarity.

18 For this, we assume the exclusion restriction in [4], namely that historical variables only affect prosperity in the same municipality. This means that although one's historical variables (Total Crown Employees, Distance to Royal Roads, Colonial State Agencies and Historical Population, as well as functions thereof) can affect the same municipality's prosperity, they can only affect that of the neighbors by impacting the choice of state capacity in the first, which then impacts the choice of state capacity in the neighbors.

Table 3.5: State Presence and Networks Effects across Colombian Municipalities

Outcome: The Number of State Employees

                                           Baseline           Distance to Highway   Land Quality       Rivers
                                           (1)                (2)                   (3)                (4)
β0                                         [0.16, 0.31]       [0.15, 0.32]          [0.17, 0.39]       [0.09, 0.38]
dyi/d(colonial state officials)            [-0.051, 0.001]    [-0.045, -0.001]      [-0.043, 0.000]    [-0.024, 0.003]
Average dyi/d(colonial state agencies)     [-1.138, 3.760]    [-1.335, 2.742]       [-0.609, 3.388]    [-1.775, 1.987]
Average dyi/d(distance to Royal Roads)     [-0.010, 0.009]    [-0.008, 0.010]       [-0.007, 0.015]    [-0.005, 0.012]
n                                          1018               1018                  1003               1003

Notes: Confidence sets for $\beta$ are presented in the table, obtained from inverting the test statistic $T(\beta)$ from Section 3.3, with confidence level of 95%. The critical values in the first row come from the asymptotic statistic. Downweighting is used. The average marginal effects for historical variables upon state capacity are also shown. This is computed from finding the confidence set for the appropriate $\gamma$ estimate. For Colonial State Agencies and Distance to Royal Roads, since they enter in quadratic form in $X_1$, we show the average marginal effect. All specifications include controls of latitude, longitude, surface area, elevation, rainfall, as well as Department and Department capital dummies. Instruments are constructed from the sum over $G_P$-neighbors of the historical variables Total Crown Employees, Colonial State Agencies, Colonial State Agencies squared, population in 1843, distance to Royal Roads, distance to Royal Roads squared, together with the non-linear function $\tilde Z_i = n_P(i)^{-1} \sum_{j \in N_P(i)} \lambda_{ij}\lambda_{jj} X_{j,1}$. Column (2) includes distance to current highway in $X_1$. Column (3) expands the specification of Column (2) by also including controls for land quality (share in each quality level). Column (4) controls for rivers in the municipality and land quality, in addition to those controls from Column (1). One can see that the results are very stable across specifications.

Figure 3.4: Average Network Externality from being a Department Capital
Notes: The figure presents the average network externalities from being a department capital. We use the estimated results from Column (3) in Table 3.5.
This captures the externality for a municipality from being a department capital, which involves higher state presence and centralization of resources. This effect is not only the direct effect, but it also quantifies a reflection effect: neighbors of department capitals also benefit from it. The grey shaded area represents the 95% confidence interval for $\beta_0$.

3.6 Conclusion

This chapter proposes a new approach of empirical modeling for interactions among many agents when the agents observe the types of their neighbors. The main challenge arises from the fact that the information sharing relations are typically connected among a large number of players whereas the econometrician observes only a fraction of those agents. Using a behavioral model of belief formation, this chapter produces an explicit form of best linear responses from which an asymptotic inference procedure for the payoff parameters is developed. As we showed, this explicit form gives a reduced form for the observed actions, and exhibits various intuitive features. For example, the best linear responses show that network externality is heterogeneous across agents depending on the relations of their payoff neighbors.

The advantage of our chapter's approach is two-fold. First, the empirical modeling according to our approach accommodates a wide range of sampling processes. Such a feature is crucial because the econometrician rarely has precise knowledge about the actual sampling process through which data are generated. Second, the model can be used when only part of the players are observed from a large connected network of agents.

Chapter 4

Unbundling Polarization1

4.1 Introduction

This chapter focuses on a set of open questions in the literature on political polarization, a phenomenon that has increased sharply since the mid-1970s [126, 129]. A first, main question concerns the exact assessment of the role played by parties in legislative activity. This typically takes (at least) two perspectives.
The first perspective concerns itself with the role of political parties in setting extreme agendas, selecting the policy issues under deliberation [21, 53]. A second perspective focuses on party discipline, how much parties have been a source of voting divergence by inducing more adversarial vote choices of their members. Both perspectives appear germane to the debate on polarization, as the power of party leaders vis-à-vis rank and file has been increasing over time.2

1 This chapter is a joint work with Chad Kendall and Francesco Trebbi.

2 See [16] for a discussion of whether political polarization is the result of better internal enforcement by party leaders. Also [139] for a recent discussion.

A related and equally important open question extends to the primitive problem of assessing the ideological stance of politicians in the first place - that is, absent any equilibrium disciplining by parties on floor votes [113]. Polarization appears driven at least in part by the replacement of more moderate with more extreme legislators [164] and to a much lower extent by changes in individual preferences (consistent with [54, 116]). To answer who these politicians are - or more precisely where their policy ideal points are located - requires recovering an unbiased distribution of within-party individual ideologies before legislators are persuaded by the leadership (we will refer to this latter action as "whipping"). Such distributions are of great interest to the political economy and political science scholarship focused on the behavior of national legislators [117, 129, 133, 150].

The goal of this chapter is to present a credibly identified approach for unbundling these different components - agenda setting, party discipline, and ideology - in driving polarization. How much of political polarization is the result of more ideologically polarized members of Congress relative to more polarized party leaderships setting divisive agendas and forcing their members to toe the party line?
Rather than seeking a mono-causal explanation, we try to gauge the magnitude of such drivers.

A first contribution of this chapter is to provide an economic model of legislative activity for a two-party legislature. It is designed to capture strategic considerations on multiple nested dimensions. The first dimension is what issues and, given an issue, what policy alternatives are selected by parties. Policies that are not valuable vis-à-vis a specific status quo, or too hard to pass given the chamber composition, may not even be pursued. The second dimension is whether, once a certain alternative to a status quo is proposed, the leadership decides to invest in acquiring extra information to ascertain the prospects of that specific policy alternative (i.e. "to whip count" a bill) or not. Certain policy alternatives may be worth the extra informational effort, others not. Again, policies that appear not promising once more information is acquired may not be further pursued by the leadership. A third dimension for consideration by the party is, if a bill is eventually brought to the floor for a vote, which legislators to discipline (i.e. "to whip"), and by exactly how much, in order to maximize the likelihood of legislative success, given that the opposing party may whip as well. As the model formalizes, member voting decisions on legislation are ultimately endogenous to all phases of this process.3

Unbundling empirically the multiple elements of this process is a second contribution of the chapter. We identify and estimate our model structurally. This exercise is ultimately possible thanks to the use of identifying variation supplementary to standard floor votes alone.
We make use of a complete corpus of whip count votes compiled from historical3Seminal work from [52], [53] and [7] emphasizes the importance of parties for the functioning of Congress.It focuses on how parties use available institutions to coordinate and set policies to their benefit, as well ashow party leaders work for that goal with their party members. Cox and McCubbins emphasize institutionalmechanisms by which parties get their policies on the floor, and block the minority’s policies. They discuss theincentives to do so, which included the “brand” value of a party, increasing reelection chances for politicians,increasing the coordination of policies that the politicians may be unsure of, setting policy positions, as wellas helping the enforcement and coordination of policies and votes. Evidence such as in [71] has shown thatthese mechanisms of policy positioning and agenda setting are present, as measured by the attendance rates andtranscripts from party caucuses, and affect legislative roll call voting. [7] and his Conditional Party Governmenttheory proposes that parties play an important role in pushing policies of interest to the rank and file. Economistssuch as [37] have also taken a similar stance to party organization, emphasizing internal control issues, but witha focus on electoral success.82sources by [65] for the 95th to 99th Congress (1977 to 1986), at the inflection point of con-temporary party polarization. These counts, which are run by the party leadership in orderto ascertain the floor prospects of specific bills, allow us to introduce information on theactual position of members before a floor vote occurs. Whip counts provide informationneeded to pin down ideological positions of legislators at a point before party control isexerted and, in fact, these counts are precisely used to identify members on the fence on abill. An informational argument at the base of our use of whip counts relies on two argu-ments. 
The information revelation value of whip counts resides in the repeated interaction between members and the leadership, which limits the ability of rank-and-file politicians to systematically lie to or deceive powerful observers, their own party leaders. Their interactions are frequent and stakes are generally high. In addition, by a revealed preference argument, the fact that costly whip counts are systematically employed by the leadership on crucial bills bears witness to their usefulness and informational value - why would the leaders spend time on these counts otherwise? In fact, the Majority and Minority Whips who organize these counts hold high-ranking positions in the party hierarchy.4

From whip count information one also learns about party discipline. Switching behavior in Yea/Nay between the whip count stage and the roll call stage provides the identifying variation useful to pin down the extent of whipping - who is the target of party influence and how much legislators close to the marginal voter for a bill need to be persuaded.

Finally, the new data provides identifying variation for assessing agenda setting. This arises from the fact that not all bills that are voted on the floor are previously whip counted and that certain bills that are whip counted are subsequently dropped without a vote.5 From flexible assumptions on the distribution of latent status quo policies and from theoretically identified thresholds determining which bills are roll called and/or whip counted, we can identify policies that are never proposed and never voted.

4 The data structure of whip counts has been explored before, such as in the work of [152] and [61], for example, but with different scope or objective. In both papers, the data was collected as the authors worked within the Whip Offices (as American Political Science Association Congressional Fellows). Our final data provides a comprehensive set-up: for many bills over different Congresses, we can track the voting intentions of politicians, how those changed in the final vote, and the whips who were responsible for making these changes happen. Two works, in particular, have looked at whip counts in the context of parties and party discipline. [35] look at 16 whip counts and their roll calls and find that most of the switching of votes has gone in the direction of party leaders. They argue that even if this understates the true impact of whips (as many of the votes are guaranteed by leaders in equilibrium, without actually being changed), it still presents evidence of high effectiveness of this measure. [66] also use whip counts, and provide an extensive survey of whipping in the House of Representatives and the Senate, drawing attention to some historical examples.
5 For a recent instance, consider the early 2017 efforts to repeal the Affordable Care Act by the Republican leadership in the House, which were repeatedly whip counted, but not voted.

To conduct such an analysis, we must first recover legislator ideal points. Standard approaches to the estimation of ideal points based on random utility models or optimal classification approaches to roll call votes alone, such as the popular DW-Nominate [150], miss important density in the middle of the support of the ideological distribution. These methods, which conflate party discipline and agenda setting in the estimation of individual ideology [158], show a polarization level of ideal points much larger than the actual one based on our unbiased estimates. This rationale finds exact quantification in our environment. Across our Congresses the distance between party medians is on average one third of the corresponding distance based on standard DW-Nominate estimates.
According to our estimates the share of total polarization attributed to party discipline as opposed to ideological drift over time varies from 64.7 percent in the 96th Congress to 71.2 percent in the 99th Congress.

The chapter tackles several paradoxes in the literature on the political economy of legislatures, including the observation by [111, 112] that party unity in floor voting may not necessarily be conclusive evidence of discipline. It is at its core an identification critique. Politicians from the same party are likely to share a similar ideology, hence they could be voting the same way, regardless of any role for party leadership.6

6 The main difficulty lies in being able to compare outcomes with parties to outcomes with none. In a series of works, Keith Krehbiel ([111], [112], [113]) argued that the previous literature failed to address the confounding issue of whether parties are effective, or whether they are only a grouping of similar minded politicians. This identification problem comes from using outcomes such as roll call votes, party cohesion, or party unity scores. These measures are a combination of politicians’ preferences and of party effects. Politicians from the same party are likely to share similar ideologies, so they could be voting in the same way regardless of party discipline. The paradox, as stated by [112], is that this would make it seem that parties are strongest when they are most homogeneous ideologically (and hence, when they are needed the least). That, in turn, leads to an empirically difficult problem: how does one separate individual ideology measurements from party effects? In particular, how does one estimate party effects, when ideology measures confound both parties and individual ideologies?

With our structural estimates at hand, we show that, quantitatively, parties matter substantially. To assess the importance of party discipline, we show counterfactuals that shut down the whipping phase of floor voting and produce alternative outcomes in roll calls. Eliminating party discipline in the form of whipping is precisely rejected relative to a model with party discipline using model selection tests. The extent of party discipline is statistically different from zero, quantitatively sizable, and growing between 1977 and 1986. Given the specific time period over which our whip count data is available, we are also able to assess the role of parties in steering particularly salient bills in the early 1980s, including the National Energy Act of 1977 (H.R. 8444), the Foreign Intelligence Surveillance Act of 1978 (H.R. 7308), the prohibition of covert paramilitary activity in Nicaragua during the Contra affair (H.R. 5399) in 1984, the lifting of the arms embargo on Turkey in 1978 (H.R. 12514), the implementation of the Panama Canal Treaty (H.R. 111) in 1979, and several key tax bills.

This chapter touches several strands of literature. It is primarily concerned with polarization in the political elite. The empirical literature on political polarization has a rich history [148] and has experienced a recent resurgence in interest due to glaring increases in partisanship ([126], but also media reports7). Exemplifying one of the most popular existing procedures to estimate legislator ideology from the political economy and political science perspectives8, [129] offers a broad discussion of this research area and links it to parallel relevant phenomena, such as the co-determined evolution of U.S. income inequality [146]. Rising political polarization has been detected not only in legislator ideology assessments based on roll calls but also in candidate survey responses [139], congressional speech scores [79], and campaign contributions measures [28].
Considerations on polarization from the economic perspective and related to the seemingly increasing policy gridlock after the 2008 financial crisis are offered in [135].

We contribute to this discussion from the empirical perspective, in an effort to parse quantitatively some of the deep determinants of polarization. In this respect our work complements other recent attempts, such as [139]. It differs in terms of theory, identification strategy, and in the use of a structural approach.

These decomposition efforts are rooted in an older related literature that seeks ways to separate a politician’s true policy preferences from those of the party, by focusing on situations where one or the other factor would not be present. [158] propose one such method of separating party effects from politician ideology, which has been widely used and adapted (e.g. [127, 138]). The argument is that parties concentrate their efforts on results they can impact, such as close legislative votes. Seemingly, expected lopsided votes would neither attract nor need party intervention. Absent party effects on lopsided votes, [158] argue in favor of estimating individual ideologies from a first stage on lopsided roll calls alone. After recovering estimates of individual preferences, in a second stage they study close votes to recover party effects, given the previously estimated true preferences of legislators. There are two main methodological obstacles to this approach. First, which vote is lopsided and which is contested is endogenous to the choice of policy alternative by the agenda setter (see the discussion in [21]).
This selection mechanism will be explicit in our framework. Secondly, [127] note that this method provides poor identification, due to minimal variation of vote choices within a party in lopsided votes, and offer a nonparametric alternative. Evidence of party effects appears quantitatively small according to these studies.

7 See, for instance, Philip Bump, December 21, 2016, “Farewell to the most polarized Congress in more than 100 years!” Washington Post.
8 Among the standard approaches to estimation are [49, 92, 149].

Our chapter attempts to address both methodological issues and does not rely on an arbitrary selection of votes where parties are active or not. Other closely related papers such as [49], who use Bayesian methods to estimate ideal points, also employ lopsided bills to recover party discipline. [12] use a survey directly targeted at candidate ideology (NPAT, also used in [139]) to estimate ideal points, hence moving away from roll calls.9

The rest of our work is organized as follows. Section 4.2 presents our model and Section 4.3 our main analytical results. Section 4.4 describes our data, with emphasis on our application of whip count information. Section 4.5 focuses on the identification of the model and Section 4.6 presents our estimator. Section 4.7 discusses our estimation and Section 4.8 our counterfactual exercises and benchmarks our analysis to extant metrics of polarization. Section 4.9 concludes. For convenience, all Figures and Tables referred to in the Results are shown after the Conclusion. The Appendix contains all proofs and additional empirical supporting material.

4.2 Model

Two parties compete for votes on a series of issues that make up a congressional term. To discipline members, each party employs whips who serve two purposes: they aggregate information, and, at a personal cost, can persuade members to vote along party lines.
For a given status quo policy, a party chooses the alternative policy (the bill) to be voted upon. It does so accounting for its own ability to discipline (whip) its members, for that of the other party, and for the value and likelihood of passage of the alternative policy. Because floor votes are costly, not all status quos are contested. In addition, the proposing party can employ a formal whip count which allows it to obtain additional information about a bill’s probability of success before a floor vote, and to drop bills that are unlikely to pass conditional on these counts.10 Whether the proposing party chooses to conduct a formal whip count depends upon its option value relative to the fixed cost of undertaking this process. Not all bills are therefore brought to the floor and not all status quo policies are even contested to begin with.

9 Another approach looks at politicians who change party and sees how their voting behavior changes. As [142] finds, congressmembers who switch party do change voting patterns, suggesting that ideology is not their sole decision factor. Our model produces this change in behavior. Nonetheless, other methods, such as DW-Nominate, rely on distinct ideology estimates for the same legislator depending on their party affiliation. One interesting historical approach is presented by [99]. By looking at congressmembers who served in the U.S. House and then went on to serve in the Confederate House during the American Civil War, he finds striking differences in the estimated ideologies for the same politician from voting behavior in the different Houses. Since legislators were the same, under very similar institutional settings, he concludes (with further evidence) that differences were due to agenda setting and party discipline rather than mere ideology.
10 The party not setting the agenda may also conduct a whip count, but this occurs less frequently in our data so we do not model its reason for doing so.

4.2.1 Preliminaries

Party members vote on a series of policies at times $t = 1, 2, \ldots, T$ with the majority vote determining the winning policy. We work in a single-dimensional ideological space. Each party, $p \in \{D,R\}$, has a mass of $N_p$ members whose underlying ideologies, $\theta$, are continuously distributed with cumulative distribution functions (CDFs), $F_p(\theta)$. We assume that the corresponding probability density functions (PDFs), $f_p(\theta)$, have unbounded support. The median member(s) of a party are identified by $\theta_{m,p}$ and represent the preference of the party overall. We assume without loss of generality that $\theta_{m,D} < \theta_{m,R}$.

In each period, with probability $\gamma$ party D is randomly recognized to set the policy alternative, $x_t$, to be put to a vote. With the remaining probability $1-\gamma$, party R is recognized. The recognized party draws a status quo policy, $q_t$, from the party-specific continuous CDF, $W_p(q)$, with corresponding PDF, $w_p(q)$, which is assumed to have unbounded support.11

4.2.2 Preferences

There are three sets of actors for each party: non-whip members, whip members, and the party itself.

Whips are a ‘technology’ that a party uses to discipline its members. We take the mass and ideologies of whips as exogenous and assume an exogenous matching of whips to the members for whom they are responsible, such that each member is controlled by exactly one whip. Whips acquire information from members and are eventually rewarded for obtaining votes that the party desires.

All party members (whips and non-whips) derive expressive utility from the policy, $k_t \in \{q_t, x_t\}$, that they vote for. This utility is given by $u(k_t, \omega_{i,t})$, where $\omega_{i,t} = \theta_i + \delta^1_{i,t} + \delta^2_{i,t} + \eta^1_t + \eta^2_t$ determines their position on a particular bill.
We assume a symmetric, strictly concave utility function: $u(k_t,\omega_{i,t}) = u(|k_t - \omega_{i,t}|)$ with $u(\omega_{i,t},\omega_{i,t}) = u_k(\omega_{i,t},\omega_{i,t}) = 0$ and $u_{kk}(k_t,\omega_{i,t}) < 0$.

$\theta_i$ is a member’s fundamental ideology, which we assume to be a constant trait of $i$.12 A member’s position on any particular bill is determined by this ideology, two idiosyncratic shocks, $\delta^1_{i,t}$ and $\delta^2_{i,t}$, and two aggregate shocks, $\eta^1_t$ and $\eta^2_t$. The need for the accrual of multiple shocks will become clearer below, where we model the information acquisition problem for the proposing party. The aggregate shocks are independent draws from normal distributions with mean zero and standard deviations $\sigma_1$ and $\sigma_2$, respectively. The aggregate shocks $\eta^1_t$ and $\eta^2_t$ are common across all members of both parties. The idiosyncratic shocks $\delta^1_{i,t}$ and $\delta^2_{i,t}$ are identically and independently distributed across $i$ and $t$ according to the continuous, unbounded, and mean zero CDF, $G(\delta)$, with corresponding PDF, $g(\delta)$.

11 In our application D is the majority party and one can assume γ 1/2. We do not model how the frequency of recognition can be affected by the leadership of both parties, but to allow more flexibility in the structure of issue selection we allow the distributions of primitive status quos to vary by party $p$.
12 In this we follow the discussion and evidence from [116, 139].

Whip members, in addition to their utility from voting, receive a payment of $r_p$ (which may differ across parties) for each member $i$ for whom the whip is responsible who votes with the party. $r_p$ may represent, for example, improved future career opportunities within the party hierarchy.13 We model whip influence over the members for whom he is responsible as an ability to persuade a member to change his position on a particular bill. To influence a member’s position by an amount, $y_{i,t}$ (i.e. to move his bliss point to $\omega_{i,t} + y_{i,t}$), a whip bears an increasing cost, $c(y_{i,t})$ ($c' > 0$), which can be thought of most simply as an effort cost.14 We assume $c(0) < r_p$ so that the whip always exerts a non-zero amount of influence. The contribution to a whip’s utility from whipping is therefore given by $\sum_i \left( r_p I(i \text{ votes with party}) - c(y_{i,t}) \right)$, where $I(\cdot)$ is the indicator function and the summation is over the members for whom he is responsible.

Each party derives utility from that of its median member, $u(k_t,\theta_{m,p})$, where $k_t \in \{q_t, x_t\}$ is the winning policy. For simplicity, we assume that the party’s position, represented by its median member, is not subject to idiosyncratic or aggregate shocks.15 Because the party does not directly bear the cost of whipping members, whipping is costless to the party (and thus both parties always whip votes to the maximum extent possible).

13 Rewarding the whip only if he switches a member’s vote does not change the results.
14 Having the shocks and influence operate on the ideological bliss point rather than as changes in utility (i.e. $u(k_t,\theta_i) + \delta^1_{i,t} + \delta^2_{i,t} + \eta^1_t + \eta^2_t + y_{i,t}$) simplifies the model in two ways. First, it ensures that the maximum influence exerted by a whip (see Section 4.3.2) is a constant, independent of the locations of the policies and the distance between them. Second, it ensures the expected number of votes monotonically decreases in the extremeness of the alternative policy, $x_t$ (see the proof of Proposition 4.3.4), which need not be the case for utility shocks.
15 This assumption simplifies the policy setting decision because aggregate shock realizations never cause the proposing party to prefer the status quo over the proposed alternative.

4.2.3 Information and Timing

Figure 4.1: Timeline

The timing of the model is as follows (see Figure 4.1). At each time $t$:
1. Party D is randomly recognized to be the proposing party with probability $\gamma$. With the remaining probability, $1-\gamma$, the proposing party is R.
The status quo policy, $q_t$, is drawn and observed by all.
2. The proposing party chooses the policy $x_t$ as an alternative to the status quo $q_t$ and decides whether or not to conduct a whip count at a cost, $C_w > 0$.
3. The first aggregate and idiosyncratic shocks, $\eta^1_t$ and $\delta^1_{i,t}$, are realized and observed noisily: each member observes his idiosyncratic shock, $\delta^1_{i,t}$, and the policy he prefers, $u(x_t,\theta_i+\delta^1_{i,t}+\eta^1_t) \lessgtr u(q_t,\theta_i+\delta^1_{i,t}+\eta^1_t)$, but not the realization of $\eta^1_t$.
4. If a whip count is undertaken, each member makes a report, $m_{i,t} \in \{yes, no\}$, to his whip, answering the question, ‘Given your position, will you vote with the party?’. The outcome of the whip count is common knowledge. In the aggregate, the whip count reveals the realization of $\eta^1_t$ (see Section 4.3.3).
5. The proposing party (conditional on the whip count, if taken) decides whether or not to proceed with the bill, taking it to a roll call vote at a cost, $C_b > 0$.
6. The second aggregate and idiosyncratic utility shocks, $\eta^2_t$ and $\delta^2_{i,t}$, are realized and observed as in the case of the first shocks. Each member observes his idiosyncratic shock, $\delta^2_{i,t}$, and the policy he prefers, $u(x_t,\omega_{i,t}) \geq u(q_t,\omega_{i,t})$, but not the realization of $\eta^2_t$.
7. Whips communicate with their members in order to learn the sum of the aggregate shocks, $\eta^1_t + \eta^2_t$.
8. Whips learn the sum of the idiosyncratic shocks, $\delta^1_{i,t} + \delta^2_{i,t}$, of the members for whom they are responsible and then choose the amount of influence to exert, $y_{i,t}$, with each member.
9. The roll call vote occurs.

The information structure (who knows what and when) is a formalization of the two main duties of a whip. First, whips aggregate information - no single member is likely to know the outcome of the bill, so information must be aggregated across all members in order to (i) decide whether or not to continue with a bill (whip count) and (ii) decide how much to try to influence members.
Second, whips, by maintaining close relationships with the rank-and-file members they are responsible for, obtain information about individual positions, and use this information to decide which members can be most easily persuaded to toe the party line.

4.3 Analysis

We solve the model via backwards induction. In Sections 4.3.1 and 4.3.2, we determine the decisions of members and whips. These decisions are the same in each party, so we drop the party label for convenience. In Sections 4.3.3 through 4.3.5, we turn to the decisions unique to the proposing party: which, if any, alternative policy to pursue, and whether or not to conduct a floor vote and a whip count.

4.3.1 Roll Call Votes

Prior to the roll call vote, whips communicate with the members for whom they are responsible in order to learn the value of $\eta^1_t + \eta^2_t$, which is necessary for deciding how much influence to exert (see Section 4.3.2). To do so, each whip asks each member for whom he is responsible which policy he intends to vote for. In the aggregate, this process reveals the aggregate shocks as in the case of a whip count (see Section 4.3.3). Whips then communicate the values of the aggregate shocks to all members, so that they have full information at the time of their vote.

A member votes for $x_t$ if and only if $u(x_t,\omega_{i,t} + y_{i,t}) \geq u(q_t,\omega_{i,t} + y_{i,t})$, where $\omega_{i,t} + y_{i,t}$ is the member’s ideological bliss point after whip influence.16 It is convenient to define the marginal voter as the ideological position of the voter who is indifferent between the two policies. Given symmetric utility functions, this voter is located at $\omega_{i,t} = MV_t = \frac{x_t + q_t}{2}$, absent any party discipline.

4.3.2 Whip Decisions

At the time of the whipping decision (just prior to roll call), each whip has full information about the aggregate shocks and the idiosyncratic shocks of his members.
He therefore knows whether a given (conditional) transfer induces a vote for the party’s preferred policy, and so either exerts the minimal influence necessary to make the member indifferent between policies, or exerts no influence at all. The maximum influence, $y^{max}_p$, he is willing to exert is defined by $r_p = c(y^{max}_p)$, or $y^{max}_p = c^{-1}(r_p)$. $y^{max}_p$ is strictly greater than zero by assumption ($c(0) < r_p$).

16 Ties have measure zero due to the continuous nature of the shocks and therefore the vote tie-breaking rule is immaterial.

Given $y^{max}_p$, Lemma 4.3.1 establishes that only members who would not otherwise vote for the party’s preferred policy, and who are within a fixed distance of the marginal voter, are influenced (whipped).

Lemma 4.3.1. Assume a party strictly prefers policy $k_t$ over policy $k'_t$. Then, the whips of the party whip only members, $i$, whose realized ideologies are on the opposite side of $MV_t$ from $k_t$ and such that $|\omega_{i,t} - MV_t| \leq y^{max}_p$.

4.3.3 The Whip Count

The proposing party may conduct a whip count in order to learn about the first aggregate shock, $\eta^1_t$. Whips receive reports, $m_{i,t} \in \{yes, no\}$, from each member for whom they are responsible and subsequently make these reports public. If each member reports truthfully, he reports $m_{i,t} = yes$ if $u(x_t,\theta_i + \delta^1_{i,t} + \eta^1_t) \geq u(q_t,\theta_i + \delta^1_{i,t} + \eta^1_t)$ and $m_{i,t} = no$ otherwise. Given the continuum of reports, $\{m_{i,t}\}$, by the law of large numbers, $E[\eta^1_t \mid \{m_{i,t}\}] = \hat{\eta}^1_t$, where $\hat{\eta}^1_t$ is the realized value of $\eta^1_t$.

Furthermore, all members reporting truthfully forms part of an equilibrium strategy of the overall game because no single member can influence beliefs about $\hat{\eta}^1_t$, and hence cannot influence the eventual policy outcome by misreporting.17 We therefore assume in what follows that members play a truth-telling strategy.18

We formalize these claims in Lemma 4.3.2.

Lemma 4.3.2.
Truth-telling at the whip count stage forms part of an equilibrium strategy. Under truth-telling, the realization of the first aggregate shock, $\hat{\eta}^1_t$, is known with probability one.

17 In addition, misreporting does not change the amount of influence a member’s whip exerts because the whip learns the member’s true position before exerting influence.
18 As usual, there also exists an equilibrium of the whip count subgame in which each member babbles, so that nothing is learned about $\hat{\eta}^1_t$.

4.3.4 Optimal Policy Choices

After observing $q_t$, the proposing party can choose to do one of three things. One, it can decide not to pursue any alternative policy. Two, it can choose an alternative policy to pursue, $x_t$, without conducting a whip count. In this case, the party pays the cost, $C_b$, of pursuing the bill to the roll call stage. Three, the party can choose an alternative policy to pursue and conduct a whip count at a cost, $C_w$. In this case, after observing the results of the whip count, the party can decide whether or not to continue with the bill at a cost of $C_b$.

Choosing to undertake the whip count is analogous to purchasing an option: the option to save the cost of pursuing the bill should the initial aggregate shock $\eta^1_t$ turn out unfavorably.

For status quo policies to the left of the proposing party’s ideal point, $\theta_{m,p}$, the alternative policy pursued (if any) must lie to the right of the status quo: any policy to the left of $q_t$ is less preferred than $q_t$ and $q_t$ can be obtained at no cost. Similarly, for status quo policies to the right of $\theta_{m,p}$, the proposed alternative policy must lie to the left of the status quo. In choosing how far from the status quo to set the alternative policy, the proposing party faces an intuitive trade-off: policies closer to its ideal point are more valuable, should they be successfully voted in, but are less likely to obtain the necessary votes to pass.

Lemma 4.3.3 formalizes this intuition.

Lemma 4.3.3.
The expected number of votes that the alternative policy, $x_t$, obtains strictly decreases with the distance between $x_t$ and the proposing party’s ideal point.

The result of Lemma 4.3.3 guarantees that the alternative policy proposed must lie between the party’s ideal point and the status quo policy. An alternative policy on the opposite side of the ideal point from the status quo is dominated by $x_t = \theta_{m,p}$, which is both more preferred and obtains more votes in expectation.

For ease of exposition, for the remainder of the analysis we present the case in which party D is the proposer - the case of party R is symmetric. Given the whipping technologies available to each party defined by the maximum influence their whips are willing to exert, $y^{max}_R$ and $y^{max}_D$, we can define the position of the marginal voter when the alternative policy is such that it obtains exactly half of the votes. Denote this position $\widehat{MV}_{i,j}$, where the subscripts $i, j \in \{L,R\}$ indicate the directions of the policy that parties D and R whip for, respectively.19 As shown in the proof of Lemma 4.3.3, for a given realized marginal voter, $\widetilde{MV}_t = MV_t - \eta^1_t - \eta^2_t$, the number of votes for $x_t$ is known with probability one due to the continuum of members in each party. Denoting the number of votes for $x_t$ as a function of the realized marginal voter, $Y(\widetilde{MV}_t)$, each $\widehat{MV}_{i,j}$ is then given by $Y(\widehat{MV}_{i,j}) = \frac{N_R + N_D}{2}$.

19 Each $\widehat{MV}_{i,j}$ is a function of many parameters of the model, so we suppress their dependencies for convenience. Note, however, that each is independent of $q_t$ and $x_t$.

In the absence of a whip count, if party D pursues an alternative policy, the alternative policy $x_t$ must maximize
\[
EU^{\text{no count}}_D(q_t,x_t) = \Pr(x_t \text{ wins})\, u(x_t,\theta_{m,D}) + \Pr(x_t \text{ loses})\, u(q_t,\theta_{m,D}) - C_b
\]
where the cost of proceeding with the bill, $C_b$, is paid with certainty.

For status quo policies to the left of $\theta_{m,D}$, since $x_t \in (q_t,\theta_{m,D}]$, both parties prefer and whip for $x_t$, the rightmost policy.
Because $Y(\widetilde{MV}_t)$ is monotonically decreasing in $x_t$, and therefore $\widetilde{MV}_t$, $x_t$ wins if and only if $\widetilde{MV}_t < \widehat{MV}_{R,R}$ so that $\Pr(x_t \text{ wins}) = \Pr(\widetilde{MV}_t < \widehat{MV}_{R,R})$.20 The sum of the aggregate shocks, $\eta^1_t + \eta^2_t$, is normally distributed with a variance of $\sigma^2 = \sigma_1^2 + \sigma_2^2$ so that we can write $\Pr(x_t \text{ wins} \mid x_t > q_t) = 1 - \Phi\left(\frac{\widetilde{MV}_t - \widehat{MV}_{R,R}}{\sigma}\right)$, where $\Phi$ denotes the CDF of the standard normal distribution.

For status quo policies to the right of $\theta_{m,D}$, we have $x_t \in [\theta_{m,D}, q_t)$. Party D therefore whips for the leftmost policy, $x_t$, but party R may whip for either policy depending on where $q_t$ and $x_t$ lie with respect to $\theta_{m,R}$. As a simplification, we assume party R always whips for $q_t$ in this case.21 Under this assumption, $x_t$ wins if and only if $\widetilde{MV}_t > \widehat{MV}_{L,R}$, so that $\Pr(x_t \text{ wins} \mid x_t < q_t) = \Phi\left(\frac{\widetilde{MV}_t - \widehat{MV}_{L,R}}{\sigma}\right)$.

Conducting a whip count provides the option value of dropping the bill and avoiding the cost, $C_b$, if the first aggregate shock makes it unlikely the bill will pass. After conducting the whip count to learn the first aggregate shock, party D continues to pursue the bill if and only if
\[
\Pr(x_t \text{ wins} \mid \eta^1_t = \hat{\eta}^1_t)\,\big(u(x_t,\theta_{m,D}) - u(q_t,\theta_{m,D})\big) + u(q_t,\theta_{m,D}) - C_b \geq u(q_t,\theta_{m,D}),
\]
where $\hat{\eta}^1_t$ is the realized value of $\eta^1_t$ and $u(q_t,\theta_{m,D})$ is the party’s utility from the outside option of dropping the bill. $\Pr(x_t \text{ wins} \mid \eta^1_t = \hat{\eta}^1_t)$ is easily shown to be strictly monotonic in $\hat{\eta}^1_t$, so that we can define cutoff values of $\eta^1_t$, $\underline{\eta}^1_t$ and $\overline{\eta}^1_t$, such that party D continues to pursue the bill if and only if $\eta^1_t > \underline{\eta}^1_t$ (for status quos to the left of $\theta_{m,D}$) or $\eta^1_t < \overline{\eta}^1_t$ (for status quos to the right).

20 Ties occur with measure zero so any tie-breaking rule suffices.
21 Similarly, if party R proposes an alternative to a status quo policy, $q_t < \theta_{m,R}$, we assume party D always whips for the status quo. We can solve the model without these assumptions, and the results are qualitatively similar.
The only difference is that the proposing party may choose to set the alternative policy such that the other party is exactly indifferent between policies in order to gain its support, rather than pushing for an alternative policy closer to the proposing party’s ideal point. Thus, the model predicts a mass of bills for which the marginal voter is at exactly the opposing party’s ideal point. In reality, uncertainty about party positions is likely to prevent such fine-tuning of policies.

Given these continuation policies, prior to the whip count, party D chooses $x_t$ to maximize
\[
EU^{\text{count}}_D(q_t,x_t) = \Pr(\eta^1_t > \underline{\eta}^1_t)\Big[\Pr(x_t \text{ wins} \mid \eta^1_t > \underline{\eta}^1_t)\,\big(u(x_t,\theta_{m,D}) - C_b\big) + \big(1 - \Pr(x_t \text{ wins} \mid \eta^1_t > \underline{\eta}^1_t)\big)\,\big(u(q_t,\theta_{m,D}) - C_b\big)\Big] + \Pr(\eta^1_t < \underline{\eta}^1_t)\, u(q_t,\theta_{m,D})
\]
for status quo policies to the left of $\theta_{m,D}$ and
\[
EU^{\text{count}}_D(q_t,x_t) = \Pr(\eta^1_t < \overline{\eta}^1_t)\Big[\Pr(x_t \text{ wins} \mid \eta^1_t < \overline{\eta}^1_t)\,\big(u(x_t,\theta_{m,D}) - C_b\big) + \big(1 - \Pr(x_t \text{ wins} \mid \eta^1_t < \overline{\eta}^1_t)\big)\,\big(u(q_t,\theta_{m,D}) - C_b\big)\Big] + \Pr(\eta^1_t > \overline{\eta}^1_t)\, u(q_t,\theta_{m,D})
\]
for status quo policies to the right of $\theta_{m,D}$.

We define $x^{\text{count}}_t$ and $x^{\text{no count}}_t$ to be the optimal alternative policies pursued (if any alternative is pursued) when a whip count is conducted and when it is not, respectively. Proposition 4.3.4 shows that, provided that the cost of pursuing a bill, $C_b$, is not too large, these optimal policies are unique and bounded away from the party’s ideal point.

Furthermore, the alternative policy pursued with a whip count is closer to the party’s ideal policy. Intuitively, the fact that a whip count allows the party to drop bills that are unlikely to pass after observing the first aggregate shock allows it to pursue policies that are more difficult to pass.

Proposition 4.3.4.
There exists a strictly positive cutoff cost of pursuing a bill, $\hat{C}_b > 0$, such that for all $C_b < \hat{C}_b$, the optimal alternative policies, $x_t^{\text{count}}$ and $x_t^{\text{no count}}$, are unique and contained in $(q_t, \theta_{m,D})$ for $q_t < \theta_{m,D}$, contained in $(\theta_{m,D}, q_t)$ for $q_t > \theta_{m,D}$, and equal to $\theta_{m,D}$ for $q_t = \theta_{m,D}$.

The requirement in Proposition 4.3.4 that $C_b$ be sufficiently small is for analytical purposes only, in the whip counted case. Numerically, we have been unable to find a counterexample to the proposition.

4.3.5 The Whip Count and Bill Pursuit Decisions

To complete the analysis, we determine for which status quo policies alternative policies are pursued and, when they are pursued, whether or not a whip count is conducted.

Define the value functions, $V_D^{\text{count}}(q_t) = EU_D^{\text{count}}(q_t, x_t^{\text{count}}) - u(q_t,\theta_{m,D})$ and $V_D^{\text{no count}}(q_t) = EU_D^{\text{no count}}(q_t, x_t^{\text{no count}}) - u(q_t,\theta_{m,D})$, as the gains from pursuing an alternative policy with and without conducting a whip count, respectively (note that these definitions account for the cost of pursuing a bill, $C_b$, but ignore the cost of the whip count, $C_w$).

Lemma 4.3.5 characterizes the value functions as a function of the status quo policy.

Lemma 4.3.5. Fix $C_b < \hat{C}_b$ such that the optimal alternative policies, $x_t^{\text{count}}$ and $x_t^{\text{no count}}$, are unique. Then, for all $q_t \neq \theta_{m,D}$, the value of pursuing an alternative policy with a whip count, $V_D^{\text{count}}(q_t)$, strictly exceeds that without, $V_D^{\text{no count}}(q_t)$. Furthermore, both value functions strictly decrease as $|q_t - \theta_{m,D}|$ decreases, but the difference between them, $V_D^{\text{count}}(q_t) - V_D^{\text{no count}}(q_t)$, strictly increases.

Intuitively, both value functions decrease as the status quo approaches the proposing party's ideal point because there is less to gain from an alternative policy. More interestingly, the difference between the value functions increases as the status quo approaches the party's ideal point because the whip count is an option that allows the proposing party to initially pursue a bill, but drop it if the initial aggregate shock turns out to be unfavorable (thus avoiding the cost, $C_b$). This option value is always positive because the party could always ignore the result of the whip count. It increases as the status quo nears the party's ideal point because passing an alternative policy becomes more difficult (fixing $x_t$, as $q_t$ approaches $\theta_{m,D}$, the marginal voter approaches $\theta_{m,D}$, resulting in a lower probability of passing). Therefore, exercising the option becomes more likely, and thus more valuable.

Using the nature of the value functions, Proposition 4.3.6 shows which bills are pursued with and without a whip count, accounting for the fact that whipping is costly.

Proposition 4.3.6. Fix $C_b < \hat{C}_b$ such that the optimal alternative policies, $x_t^{\text{count}}$ and $x_t^{\text{no count}}$, are unique and fix the cost of a whip count, $C_w > 0$. Then, we can define a set of cutoff status quo policies, $\underline{q}_l$, $\overline{q}_l$, $\underline{q}_r$, and $\overline{q}_r$, with $\underline{q}_l \le \overline{q}_l < \theta_{m,D} < \underline{q}_r \le \overline{q}_r$ such that:

1. for $q_t \in (-\infty, \underline{q}_l] \cup [\overline{q}_r, \infty)$, the optimal alternative policy, $x_t^{\text{no count}}$, is pursued without conducting a whip count.

2. for $q_t \in (\underline{q}_l, \overline{q}_l] \cup [\underline{q}_r, \overline{q}_r)$, the optimal alternative policy, $x_t^{\text{count}}$, is pursued and a whip count is conducted.

3. for $q_t \in (\overline{q}_l, \underline{q}_r)$, no alternative policy is pursued.

We illustrate Proposition 4.3.6 by example in Figure 4.2.

Figure 4.2: Example of Value Functions
Note: Value functions of pursuing an alternative policy with and without a whip count. The x-axis shows $q$. Party D is the proposing party. The value functions are simulated using $\theta_{m,D} = -0.5$, $\theta_{m,R} = 0.5$, $\widehat{MV}_{R,R} = \widehat{MV}_{L,R} = -0.5$, $\sigma_1 = \sigma_2 = 1$, $C_b = 0.5$, $C_w = 0.025$, and quadratic utility.

For status quo policies nearest to party D's ideal policy, alternative policies are never pursued. There are two reasons for this.
First, the optimal policy alternative that could be proposed would be strongly opposed by R and have a low chance of success. Second, because $q_t$ is close to $\theta_{m,D}$ to begin with, any additional policy change is not very valuable. For status quo policies farther away, alternative policies may be pursued with or without a whip count, but when both are possible (as in the empirically relevant case illustrated), it is always policies farthest from the party's ideal policy that are pursued without a whip count, as they are easier to pass in roll call.

4.4 Data

The data used in this chapter come from two main sources. We use data on whip counts, compiled from historical sources by Professor Lawrence Evans (College of William and Mary) and roll call voting data from VoteView (e.g. [149, 150]).

By merging the data on whip counts with the roll call voting data, we can see the variation in a Congress member's outcomes that is due to whipping.

The data collected by Professor Evans is a comprehensive set of whip counts retrieved from a variety of historical sources. These are mostly from historical archives holding former whip and party leaders' papers. The data collection procedure is described in depth in [65], and involved visits to the archives, collection of the data, and treatment by his team of researchers. We focus on his data from 1977-1986. Our specific choice of time period is driven by the fact that whip count data as comprehensive and complete as that for the 95th-99th Congresses is not systematically available for other Congresses. This is mostly due to idiosyncratic differences in the effort and diligence in record keeping by the Whips. The period under analysis in the chapter is nonetheless interesting, sitting at the inflection point of modern political polarization in American politics. Our data hence captures a crucial turning point.

For the Republican Party, between 1977 and 1980, the data originally comes from the Robert H.
Michel Collection, in the Dirksen Congressional Center, Pekin, Illinois, Leadership Files, 1963-1996. This part of the data “appears to be nearly comprehensive about whip activities on that side of the partisan aisle, 1975-1980”.

For the Democratic Party from 1977 to 1986, data comes from the Congressional Papers of Thomas S. Foley, Manuscripts, Archives and Special Collections Department, Holland Library, Washington State University, Boxes 197-203. Although John Brademas was the Majority whip from 1977 to 1980, his papers are collected within the Thomas Foley Collection (his successor). According to [65], “the Brademas records are extensive and very well organized, and I am confident that they are nearly comprehensive. For that matter, I also have a similar sense of the archival file from Foley’s time in the position”. The data also allows us to merge with Roll Call data, since Professor Evans associates each whip count with the bill that was voted on the floor (if the latter was sufficiently close to the one that had a whip count).

In total, we have 340 bills with whip counts covering the period of 1977 to 1986. 70 of the bills are Republican Whip Counts, from the years 1977-1980. The remaining 270 are Democratic Whip Counts, from 1977 to 1986. Within the file for each bill, we have data on how each Congress member responded “Yea” or “Nay” to the party's question at the whip count stage. Some bills include further whip counts (i.e. a second, third whip count), which follow the same structure. Table 4.1 shows aggregate statistics for the progression of bills in our time frame. These include the number of bills whip counted (dropped and pursued in roll calls), as well as those roll called.^22

The bills included in this data address a variety of questions about foreign aid, domestic policy, economic policy, among others. Some bills that are covered include the National Energy Act of 1977 (H.R.
8444), the Foreign Intelligence Surveillance Act of 1978 (H.R. 7308), Healthcare for the Unemployed Act of 1983 (H.R. 3021), the Dr. Martin Luther King Jr. National Holiday Bill of 1983 (H.R. 3345), the Contra affair in Nicaragua: prohibiting covert paramilitary activity in Nicaragua (H.R. 5399) in 1984 (as well as other bills regarding funds for those), the lifting of the arms embargo to Turkey in 1978 (H.R. 12514), the implementation of the Panama Canal Treaty (H.R. 111) in 1979, and successive votes for increasing the debt limit.

We merge the data from whip counts and whip identities to the final votes for those bills on the floor. The roll call data comes from VoteView.org ([149]), a standard reference. From the total of 340 whip counts, we obtain 238 cases which can be directly associated with a subsequent floor vote in the House.

4.5 Transition to Estimation and Identification

For the reader's convenience, we omit time subscripts where not necessary and recall that the CDF of $\delta_1$ and $\delta_2$ is denoted as $G$. We will assume that it is the same across parties and whips/non-whips and denote the CDF of the convolution $\delta_1 + \delta_2$ as $G_{1+2}$. Finally, we recall that $\eta_1$ and $\eta_2$ are independent of each other and i.i.d. over $t$, with Normal distributions $N(0,\sigma_1^2)$ and $N(0,\sigma_2^2)$, respectively.

From the model, a politician (from party D) will say “Yea” at the whip count if:^23

$$\delta_{1,i,t} + \eta_{1,t} + \theta_i \le MV_t, \tag{4.1}$$

where $MV_t = \frac{x_t + q_t}{2}$ is the marginal voter. Let us define two auxiliary variables:

$$\gamma_{1,t} = MV_t - \eta_{1,t} \tag{4.2}$$

$$\gamma_{2,t} = MV_t - \eta_{1,t} - \eta_{2,t}. \tag{4.3}$$

^22 We also obtained data on the identity of whips (including the regional and assistant whips that compose part of the party ranks in addition to the main party whip - Majority Whip or Minority Whip) for each party since the 1970s, originally compiled by [131]. The data covers the periods between the 95th and the 106th Congresses (1977 to 2000).
This was originally collected from the editions of the Congressional Quarterly Almanac and Congressional Quarterly's Politics in America and provides lists for Democratic and Republican whip membership. The data on the number of such whips by party and Congress is reported in Table 4.2. The Table shows intuitively how large the apparatus for the enforcement of party discipline within each party is.

^23 The case with party R can be found in Appendix.

These represent the realized Marginal Voters at the whip counts and at the roll call stages. Hence, the probability of a “Yea” at the whip count stage is given by:

$$P(Y^{wc}_{i,t} = 1) = P(\delta_{1,i,t} + \theta_i \le MV_t - \eta_{1,t}) = P(\delta_{1,i,t} \le \gamma_{1,t} - \theta_i) = G(\gamma_{1,t} - \theta_i). \tag{4.4}$$

Now, for the roll call vote, we have a “Yea” (given that the party may whip) if:

$$\delta_{1,i,t} + \delta_{2,i,t} + \eta_{1,t} + \eta_{2,t} + \theta_i \le MV_t + y^{max}_D. \tag{4.5}$$

Hence, the probability of a “Yea” at the roll call stage is given by:

$$P(Y^{rc}_{i,t} = 1) = P(\delta_{1,i,t} + \delta_{2,i,t} \le MV_t - \eta_{1,t} - \eta_{2,t} - \theta_i + y^{max}_D) = P(\delta_{1,i,t} + \delta_{2,i,t} \le \gamma_{2,t} - \theta_i + y^{max}_D) = G_{1+2}(\gamma_{2,t} - \theta_i + y^{max}_D). \tag{4.6}$$

We now proceed with some parametrization assumptions, and prove identification of the parameters of the model.

Consider the following Assumptions:

Assumption 1 (Normalization): We normalize one politician (without loss of generality, politician “0”) such that $\theta_0 = 0$.

Assumption 2 (Distributions): (i) $G$ is the CDF of a standard Normal distribution, denoted by $\Phi(\cdot)$. It is the same for whips and non-whips, and across both parties.^24 Furthermore, (ii) $q$ follows a Normal distribution, $N(\mu_q, \sigma_q^2)$. We will allow this distribution to be party specific as well.

Assumption 1 normalizes the ideology of one politician to 0. Without this normalization, we cannot identify the individual ideologies $\theta_i$ (just as in fixed effects regressions). We would only recover the difference of ideologies across legislators.

^24 It is not essential for it to be a Normal distribution, we will only need it to be a standard distribution (no parameters to be estimated).
The Normal distribution is convenient as it has a simple closed form for the convolution $G_{1+2}$, which also becomes a Normal distribution.

Assumption 2 (i) implies that the variance of the distribution $G_{1+2}$ is equal to 2. We need to standardize the distribution of $\delta_1$, because the decision at the whip count stage is analogous to a discrete choice model (e.g. Probit or Logit model). The variance and mean of the errors are not identified in this class of models. We had already assumed that $\delta_1, \delta_2$ were mean zero.

Meanwhile, Assumption 2 (ii) describes $q_t$ with a flexible (parametric) distribution that satisfies the main assumptions in the theoretical model. A parametric distribution is needed, since we can only recover $q_t$ from bills that are pursued (see Proposition 4.3.6). This parametrization allows us to infer the distribution of the status quo over the bills that are not pursued as well, once we have estimated its conditional version. Note that, although we use a Normal distribution for $q_t$, the resulting distribution of marginal voters, $MV_t$, can be very different from a Normal distribution.

We now prove that we can identify the set of parameters, $\Theta = \{\{\theta_i\}_i, \{\gamma_{1,t}\}_t, \{\gamma_{2,t}\}_t, \sigma_1^2, \sigma_2^2, y^{max}_R, y^{max}_D, \mu_q, \sigma_q^2, \underline{q}_l, \overline{q}_r\}$, as well as the mass of bills that are whip counted.^25

4.5.1 Identification of the Model

We focus on identifying the party-specific parameters for the Democratic Party. The argument is analogous for Republicans.

From equation (4.4) under Assumption 2(i), we have that, for every $i$ and $t$:

$$\Phi^{-1}(P(Y^{wc}_{i,t} = 1)) = \gamma_{1,t} - \theta_i. \tag{4.7}$$

The left hand side of this equation is (a transformation of) the probability of “Yea” at the whip count stage.

Taking the difference of equation (4.7) across politicians $i$ and 0 in period $t$ gives:

$$\Phi^{-1}(P(Y^{wc}_{0,t} = 1)) - \Phi^{-1}(P(Y^{wc}_{i,t} = 1)) = \theta_i, \tag{4.8}$$

where we have used that $\theta_0 = 0$ (Assumption 1). Intuitively, $\theta_i$ is identified by the differences in the probability of saying “Yea” at the whip count stage for different politicians relative to the normalizer.
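As a small numerical illustration of equations (4.7) and (4.8), the sketch below inverts whip count “Yea” probabilities to recover $\theta_i$ and $\gamma_{1,t}$. It is only a sketch under stated assumptions: the function names and the parameter values are hypothetical, and the probabilities are treated as known population quantities rather than estimated from data.

```python
from statistics import NormalDist

# G = Phi, the standard Normal CDF (Assumption 2(i)).
STD_NORMAL = NormalDist()


def yea_prob_whip_count(gamma_1t, theta_i):
    """Equation (4.4): P(Y^wc_{i,t} = 1) = Phi(gamma_{1,t} - theta_i)."""
    return STD_NORMAL.cdf(gamma_1t - theta_i)


def recover_theta(p_normalizer, p_i):
    """Equation (4.8): theta_i = Phi^{-1}(P_0) - Phi^{-1}(P_i), using theta_0 = 0."""
    return STD_NORMAL.inv_cdf(p_normalizer) - STD_NORMAL.inv_cdf(p_i)


# Hypothetical values, chosen only for illustration (not estimates from the chapter).
gamma_1t = 0.4
theta_true = -0.3

p_0 = yea_prob_whip_count(gamma_1t, 0.0)         # the normalizer, theta_0 = 0
p_i = yea_prob_whip_count(gamma_1t, theta_true)  # politician i

theta_hat = recover_theta(p_0, p_i)
gamma_hat = STD_NORMAL.inv_cdf(p_0)  # equation (4.7) evaluated at the normalizer

print(round(theta_hat, 6), round(gamma_hat, 6))
```

The same inversion applied to the normalizer alone returns $\gamma_{1,t}$, which is why the bill-level parameter is pinned down once the normalization is imposed.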
The normalization of $\theta_0 = 0$ allows us to pin down exactly the distribution of $\theta_i$ and not of the differences. The whip count stage serves as a baseline, giving us how ideology affects the probability of saying “Yea” or “Nay” before party discipline takes place.

^25 That is, the mass of $q$ within the theoretical set from Proposition 4.3.6, given by $(\underline{q}_l, \overline{q}_l] \cup [\underline{q}_r, \overline{q}_r)$.

Since $\theta_i$ is known, we have that $\gamma_{1,t}$ is known for an arbitrary $t$ from equation (4.7). Therefore, the realized marginal voter at the whip count, $\gamma_{1,t}$, is identified from the probability of saying “Nay” for a known politician at the whip count stage.

We now move on to the roll call period. Using Assumption 2, we rewrite equation (4.6) as:

$$P(Y^{rc}_{i,t} = 1) = G_{1+2}(\gamma_{2,t} - \theta_i + y^{max}_D) = \Phi\left(\frac{\gamma_{2,t} - \theta_i + y^{max}_D}{\sqrt{2}}\right), \tag{4.9}$$

where $G_{1+2}$ is a Normal distribution CDF with variance 2 by Assumption 2(i).

Equation (4.9) implies that:

$$\Phi^{-1}(P(Y^{rc}_{i,t} = 1)) = \frac{\gamma_{2,t} - \theta_i + y^{max}_D}{\sqrt{2}}, \tag{4.10}$$

for every $i$, $t$.

Note that, by their definition:

$$\gamma_{1,t} - \gamma_{2,t} = \eta_{2,t}. \tag{4.11}$$

Therefore, using equations (4.7), (4.10) and (4.11), we have that for an arbitrary bill $t$:

$$\Phi^{-1}(P(Y^{wc}_{i,t} = 1)) - \sqrt{2}\,\Phi^{-1}(P(Y^{rc}_{i,t} = 1)) = \gamma_{1,t} - \theta_i - (\gamma_{2,t} - \theta_i + y^{max}_D) = \eta_{2,t} - y^{max}_D. \tag{4.12}$$

Taking expectations (over $t$) on both sides implies that:

$$E_t\left(\Phi^{-1}(P(Y^{wc}_{i,t} = 1)) - \sqrt{2}\,\Phi^{-1}(P(Y^{rc}_{i,t} = 1))\right) = -y^{max}_D, \tag{4.13}$$

since $\eta_{2,t}$ is mean zero. The intuition is that the average change of voting behavior from the whip count stage to the roll call stage for party D is given by the whipping parameter, $y^{max}_D$. Using an average is important: there are idiosyncratic ideology shocks with every bill between both stages, but their average is zero. The changes that are not mean zero are those originating from party discipline.
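The averaging argument behind equations (4.12) and (4.13) can be checked in a short simulation. The sketch below is illustrative only: the function name, the uniform draw for $MV_t$, and all parameter values are assumptions made for the example, and the “Yea” probabilities are taken as known population quantities.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard Normal CDF, as in Assumption 2(i)


def simulate_discipline_estimate(y_max_true, theta_i, n_bills, sigma_1, sigma_2, seed):
    """Average the left-hand side of (4.12) over bills, as in (4.13).

    Returns (estimate of y^max_D, sample mean of eta_2).
    """
    rng = random.Random(seed)
    diffs, eta2_draws = [], []
    for _ in range(n_bills):
        mv = rng.uniform(-0.5, 0.5)      # hypothetical marginal voter MV_t
        eta1 = rng.gauss(0.0, sigma_1)   # first aggregate shock
        eta2 = rng.gauss(0.0, sigma_2)   # second aggregate shock
        gamma1 = mv - eta1               # equation (4.2)
        gamma2 = mv - eta1 - eta2        # equation (4.3)
        p_wc = PHI.cdf(gamma1 - theta_i)                                # (4.4)
        p_rc = PHI.cdf((gamma2 - theta_i + y_max_true) / math.sqrt(2))  # (4.9)
        # Left-hand side of (4.12); equals eta_2 - y^max_D bill by bill.
        diffs.append(PHI.inv_cdf(p_wc) - math.sqrt(2) * PHI.inv_cdf(p_rc))
        eta2_draws.append(eta2)
    return -fmean(diffs), fmean(eta2_draws)


y_hat, mean_eta2 = simulate_discipline_estimate(
    y_max_true=0.3, theta_i=-0.2, n_bills=5000, sigma_1=0.5, sigma_2=0.5, seed=0
)
print(y_hat, mean_eta2)
```

In this construction the estimate differs from the true value only by the sample mean of $\eta_{2,t}$, which shrinks toward zero as the number of bills grows.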
This argument can be repeated for every Congress and allows us to estimate $y^{max}_D$ for every congressional cycle.^26

Since $y^{max}_D$ is identified, we recover the individual values of $\gamma_{2,t}$ from equation (4.10). The set of $\gamma_{2,t}$ that is recovered includes bills with only roll calls (which, for convenience, we will denote as $\gamma^{\text{rc only}}_{2,t}$), as well as those that have both roll calls and whip counts.

Since $\gamma_{1,t}, \gamma_{2,t}$ have been identified, equation (4.11) implies that the distribution of $\eta_{2,t}$ is semiparametrically identified. It follows that we can recover its variance, $\sigma_2^2$. To bound the variance of $\eta_{1,t}$, denoted by $\sigma_1^2$, we then use the following information. We know that for bills with whip counts:

$$Var(\gamma_{1,t}) = Var(MV^{\text{count}}_t) + \sigma_1^2. \tag{4.14}$$

The left hand side is known, and we know that $Var(MV^{\text{count}}_t) \ge 0$. Hence, we have an upper bound for $\sigma_1^2$ given by $Var(\gamma_{1,t})$, which must be satisfied. This bound will be sufficient for empirical purposes, although we can construct an improved pointwise value through a more involved recursive argument.^27

Now, let us consider bills that only have roll calls.^28 By Proposition 4.3.6, these bills with only roll calls have status quos that satisfy $q \in (-\infty, \underline{q}_l] \cup [\overline{q}_r, \infty)$. For these bills, we know the distribution of:

$$MV^{\text{rc only}}_t = \eta_{1,t} + \eta_{2,t} + \gamma^{\text{rc only}}_{2,t}. \tag{4.15}$$

This is because $\eta_{1,t} + \eta_{2,t} \sim N(0, \sigma_1^2 + \sigma_2^2)$ and the distribution of $\gamma^{\text{rc only}}_{2,t}$ is identified based on estimated bill fixed effects in the roll call votes.

The left hand side of (4.15) is given by $\frac{x^{\text{no count},*}(q) + q}{2}$, which is a known invertible function of $q$ (Lemma C.1.1 (2) in Appendix). Hence, the distribution of $\{q : q \in (-\infty, \underline{q}_l] \cup [\overline{q}_r, \infty)\}$ is identified. This includes the truncation points, $\underline{q}_l, \overline{q}_r$, and the parameters $\{\mu_q, \sigma_q^2\}$, as they uniquely define this truncated distribution.^29 Since we observe the proportion of bills that only have roll calls relative to those that also have whip counts in the data, it follows that we know the mass of bills that have been whip counted.^30 This completes the identification of the model.^31

^26 For the Congresses where we do not have Republican whip counts, we can still recover its $y^{max}_R$. This is because we have data for the Democrats for each Congress, and hence, we can recover $y^{max}_D$ for each Congress by the argument above. Finally, note that with only roll calls, we can always identify the sum $y^{max}_R + y^{max}_D$. This is done by taking the difference of equation (4.10) and its Republican counterpart for members of opposing parties.

^27 As we show in the next paragraphs, given an initial value of $\sigma_1^2$, including its upper bound, we can recover the distribution of $q_t$ and the remaining parameters consistent with this initial value. We can then check whether equation (4.14) holds with our initial value. If it doesn't, we can generate a new estimate of $\sigma_1^2$ using the observed $Var(\gamma_{1,t})$ minus the $Var(MV^{\text{count}}_t)$ from our previous estimate. This algorithm can work recursively until convergence of a $\sigma_1^2$ estimate that exactly satisfies equation (4.14).

^28 Bills with both whip counts and roll calls are a selected subsample. A truncation on $\eta_1$ after the whip count, denoted by the threshold $\bar{\eta}_1$ in our model, indicates the draw of the first aggregate shock below which the bill is not brought to the floor after a whip count (because there is insufficient support). Looking at bills with only roll calls avoids this specific selection issue for the identification of the distribution of $q$.

^29 Although $\widehat{MV}$ is still unknown, we can recover it from its definition: as we know the distribution of $\theta$ for each party, we know $y^{max}_P$ for each party and the number of politicians from each party, $N_R, N_D$.

^30 We do not know, however, how that mass is divided to the left and to the right of the party median.

^31 Although the parameters of the agenda setting part of the model are identified, we do not pursue their estimation in this chapter due to finite sample limitations.

4.5.2 Krehbiel's critique: Lack of identification of $\theta_i$ and of party effects without whip counts

Without whip count data, the whipping parameter $y^{max}_D$ is not identified. To see this, we can look at equation (4.6).

If we did not know $\theta_i$ and had to estimate it from roll call data, we could redefine $\tilde{\theta}_i = \theta_i - y^{max}_D$ and have that:

$$P(Y^{rc}_{i,t} = 1) = G_{1+2}(\gamma_{2,t} - \theta_i + y^{max}_D) = G_{1+2}(\gamma_{2,t} - \tilde{\theta}_i). \tag{4.16}$$

Hence, with roll call data alone, we cannot separate a shift in everyone's (true) ideology from the party discipline effect due to whipping (the basis of the critique in [111]). It is further important to note that a correct estimation of the $y^{max}_D$ “shift” is crucial to correctly position legislator ideology distributions and therefore to assess the extent of polarization - typically measured as the inter-party distance between median ideologies.

4.6 Estimation

Given that we have identified the set of parameters of interest, we can now proceed to estimation. We observe the outcome of votes for both parties $p \in \{D,R\}$ at the whip count stage (denoted $Y^{p,wc}_{i,t}$) and at the roll call stage (denoted $Y^{p,rc}_{i,t}$), for each politician $i \in \{1, ..., N\}$ and bills $t \in \{1, ..., T\}$.

To estimate the model, we find the distribution of $\{\theta_i, \gamma_{1,t}, \gamma_{2,t}, y^{max}_D, y^{max}_R, \sigma_1^2, \sigma_2^2\}$ for politicians and bills that have whip counts and/or roll calls. This is done by Maximum Likelihood.

By replacing the conditional probability of voting “Yea” on the roll call given a “Yea” on the whip count by the unconditional one, we can define a pseudo-likelihood for the first step given by:

$$\mathcal{L}(\Theta; Y^{wc}_{i,t}, Y^{rc}_{i,t}) = \prod_{p \in \{D,R\}} \prod_{t=1}^{T} \prod_{i=1}^{N} P(Y^{p,wc}_{i,t} = 1)^{Y^{p,wc}_{i,t}}\, P(Y^{p,wc}_{i,t} = 0)^{1 - Y^{p,wc}_{i,t}}\, P(Y^{p,rc}_{i,t} = 1)^{Y^{p,rc}_{i,t}}\, P(Y^{p,rc}_{i,t} = 0)^{1 - Y^{p,rc}_{i,t}}. \tag{4.17}$$

Operating with the pseudo-likelihood as opposed to the more cumbersome original likelihood has no effect on the consistency of the estimation ([88], [171]).
This is because our model is identified despite the nuisance of the dependency between the roll call and the whip count stages.

Focusing on the Democratic Party, we can use equations (4.4) and (4.6), together with our parametrization, to reexpress (4.17) as:

$$\mathcal{L}_D(\Theta; Y^{wc}_{i,t}, Y^{rc}_{i,t}) = \prod_{t=1}^{T} \prod_{i=1}^{N_D} \Phi(\gamma_{1,t} - \theta_i)^{Y^{wc}_{i,t}} \big(1 - \Phi(\gamma_{1,t} - \theta_i)\big)^{1 - Y^{wc}_{i,t}} \times \Phi\left(\frac{\gamma_{2,t} - \theta_i + y^{max}_D}{\sqrt{2}}\right)^{Y^{rc}_{i,t}} \left(1 - \Phi\left(\frac{\gamma_{2,t} - \theta_i + y^{max}_D}{\sqrt{2}}\right)\right)^{1 - Y^{rc}_{i,t}}, \tag{4.18}$$

where we use that $G$ is a standard Normal distribution CDF, $G_{1+2}$ is a Normal distribution CDF with variance 2, $N_D$ denotes the number of politicians in D, and $P(Y^{m}_{i,t} = 1) = 1 - P(Y^{m}_{i,t} = 0)$, for $m = wc, rc$.

Analogously, for the Republican Party^32, we get:

$$\mathcal{L}_R(\Theta; Y^{wc}_{i,t}, Y^{rc}_{i,t}) = \prod_{t=1}^{T} \prod_{i=1}^{N_R} \big(1 - \Phi(\gamma_{1,t} - \theta_i)\big)^{Y^{wc}_{i,t}}\, \Phi(\gamma_{1,t} - \theta_i)^{1 - Y^{wc}_{i,t}} \times \left(1 - \Phi\left(\frac{\gamma_{2,t} - \theta_i - y^{max}_R}{\sqrt{2}}\right)\right)^{Y^{rc}_{i,t}} \Phi\left(\frac{\gamma_{2,t} - \theta_i - y^{max}_R}{\sqrt{2}}\right)^{1 - Y^{rc}_{i,t}}. \tag{4.19}$$

^32 The derivations are shown in Appendix, resulting in equations (C.12) and (C.14).

Our estimation problem in the first step is to maximize equation (4.17), using (4.18) and (4.19), subject to $\theta_0 = 0$ (the normalization in Assumption 1). In practice, we set the politician in our sample with DW-Nominate score closest to 0 as our normalizer. This is done to help comparisons with previous estimates in the literature. For starting values for this optimization, we use the results of estimating (4.17) for bills with both whip counts and roll calls.

To estimate it, we must also consider two additional details of the data. First, we must assign what constitutes a “Yea” and a “Nay”, given that the questions from whip counts and roll calls might not be clear cut.^33 For this, we use the party leader votes to assign whether that bill was a “Yea” for the party or a “Nay”. In order of priority, if available, we use first the (majority/minority) party leader's direction of voting, then that of the (majority/minority) party whip, and then that of the majority of the party, if needed. For the large majority of bills, it is sufficient to look at the party leader votes.
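The priority rule just described can be sketched as a small function. This is a hypothetical illustration rather than the procedure used to build the data: the function name, the string encoding of votes, and the handling of ties or missing information (returning `None`) are assumptions, since the text does not specify them.

```python
from typing import Optional, Sequence


def party_direction(
    leader_vote: Optional[str],
    whip_vote: Optional[str],
    member_votes: Sequence[str],
) -> Optional[str]:
    """Assign the party's direction on a bill, in order of priority:
    party leader, then party whip, then the majority of the party."""
    # Use the leader's vote if recorded, otherwise the whip's.
    for vote in (leader_vote, whip_vote):
        if vote in ("Yea", "Nay"):
            return vote
    # Fall back on the majority of the party's recorded votes.
    yeas = sum(v == "Yea" for v in member_votes)
    nays = sum(v == "Nay" for v in member_votes)
    if yeas == nays:
        return None  # assumption: ties and empty records are left unassigned
    return "Yea" if yeas > nays else "Nay"


# Hypothetical examples.
print(party_direction("Yea", "Nay", ["Nay", "Nay"]))     # leader takes priority
print(party_direction(None, "Nay", ["Yea", "Yea"]))      # whip used when leader is missing
print(party_direction(None, None, ["Yea", "Nay", "Yea"]))  # majority of the party
```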
Second, because bills in the theory can originate from status quos both to the left and to the right of a party median and we do not observe from which side, we must be able to assign whipping directions. Again, we use party leader decisions. In our theoretical framework, party leaders propose the bills, and should therefore say “Yea” to the bills they proposed. Hence, bills that have one party leader saying “Yea” at the Roll Call with the other saying “Nay” are assigned as proposed by the first. Bills that have both party leaders saying “Yea” are assigned to the minority, the Republicans.^34 Finally, there is a small minority of bills that have both party leaders saying “Nay”, for which the theory offers no guidance, yet we do not omit them, in order to be conservative in avoiding selection.^35

Once the first step is estimated, we estimate $\sigma_2^2$ by applying the variance operator on equation (4.11). With it, we find the variance of our estimated $\gamma_{1,t} - \gamma_{2,t}$ for bills that had both whip counts and roll calls. We then use the upper bound for $\sigma_1^2$ given by $Var(\gamma_{1,t})$ as its estimate. We also allow some of the parameters to be time-varying. Given our identification arguments and our interest in changes over time to whipping technology, we estimate different $y^{max}_D, y^{max}_R$ by Congress.

^33 For example, often for the minority party, but not always, a whip count will be framed in the negative: “Will you vote against...” These questions can change directions from whip counts to roll calls.

^34 In the theory, these bills can originate from either party. Nonetheless, we show in our empirical results that the results are consistent with proposals by the Republicans. This is because the estimated mass of $\gamma_2$ for these bills lies to the right of the Republican median.

^35 We assign these bills to the Democrats, as the majority party, as these may be token votes conceded to certain members for idiosyncratic reasons outside the model.
Our choice holds under an argument of negative agenda control (for example, [53]). Absent support by both the majority party and the minority, the bill would most likely never reach the floor to begin with. Therefore, not observing a party D leader voting “Yea” for these bills is just a limitation of the proxy that we use for this assignment.

4.7 Results

We now present Maximum Likelihood estimates based on our approach. Our first step leverages whip counts as revealing of true legislator ideology. That different - and we argued, revealing - information is contained in whip counts relative to roll calls can be intuitively displayed. For bills presented by the majority party that have both whip counts and roll calls, Figure 4.3 plots the distribution of individual vote choices aligned with the party leadership at the whip count phase and the roll call phase. The number of members voting with the leadership dramatically increases. The mean of the distribution shifts from around 160 votes aligned with the leadership at the whip count phase to around 218 votes aligned with the leadership at the roll call phase. Notice that 218 is the simple majority threshold for the chamber - what is needed to pass a bill at roll call. Around 58 members are persuaded to toe the party line on average, exactly the phenomenon we aim to capture.

Table 4.3 presents our estimates from the Maximum Likelihood Estimation for each party of the ideology and whipping parameters of the model. In this step we recover from 315 whip counts and 5424 roll call votes the estimated legislator ideologies $\theta_i$ for 711 members (the Table reports the party medians for each Congress). We also recover the party discipline parameters $y^{max}_D$ and $y^{max}_R$ for each Congress and two standard deviation parameters for the aggregate shocks, $\sigma_1$ and $\sigma_2$.
All parameters are precisely identified. The model's overall identification and convergence were tested repeatedly in Monte Carlo simulations, recovering the assumed parameter vectors and experimenting on a wide range of initial values.

As can be seen from the Table, party polarization in terms of the distance $\theta_{m,R} - \theta_{m,D}$ clearly widens over time. In addition, $y^{max}_D$ and $y^{max}_R$ for each Congress are positive and statistically different from 0, the value implied by a model absent party discipline (i.e. with no whipping).

Figure 4.4 plots kernel densities of the estimated legislator ideologies, $\theta_i$, by party and over time from our full model. It also offers, as a way of comparison, the corresponding ideological distribution which would obtain when estimating a misspecified model in which we impose by construction $y^{max}_D = 0$ and $y^{max}_R = 0$. Figure 4.4 is the most intuitive representation of the bias induced by misspecification of a model where estimation of legislator ideal points does not take into consideration party discipline. For reference, the reader may consider the DW-Nominate optimal classification scores, Heckman-Snyder linear probability model scores or Markov chain Monte Carlo approaches based on roll calls alone. The models are comparable as they are under the same assumptions: that voting is sincere and legislators know the position of the bill they have to vote on.

As can be seen from Figure 4.4, the distance between party medians is accentuated by the omission of $y^{max}_D + y^{max}_R$ in the misspecified model (represented by the dashed kernel densities) relative to the model correcting for party discipline (represented by the solid kernel densities). The ideology distributions are much closer together in reality than in the misspecified model. As a proof of concept, Figure 4.5 shows the estimated legislator ideologies $\theta_i$ from our full model and from the misspecified model absent party discipline compared to DW-Nominate scores. The misspecified model and DW-Nominate trace each other accurately. However, our full model reveals a gap in density over the ideological middle ground driven by the loading on legislator ideology of a significant component of party discipline omitted using DW-Nominate. This is substantial bias in DW-Nominate, amounting to around 0.30 in DW-Nominate units.

The contribution of changes in the ability of party leaders to persuade their members through whipping is apparent in our estimates. Figure 4.6 reports estimated party discipline $y^{max}_p$ by party across all our Congresses and their 95% confidence intervals. The $y^{max}_D$ and $y^{max}_R$ are not only precisely estimated, but also statistically different when compared at the beginning of our sample in 1977 relative to the end of our sample in 1986. The trend in $y^{max}_p$ for both parties is clearly positive, tracing an increase in the reach of party leaders over the rank and file. This is a consequential finding.^36

^36 This rise in polarization and party discipline in the mid 1970s coincides with large reforms conducted in the House of Representatives. During this period, power was heavily concentrated in the party leadership's hands. Amongst the changes, leaders were now responsible for committee assignments, including the Rules Committee (instead of it being by seniority), larger control of the agenda progress was given to the Speaker, new tactics emerged such as packaging legislation into “megabills” and the Democratic Steering and Policy Committee was formed. The latter met regularly to gather information and determine tactics and policies, with the leadership controlling half the votes. One strong motivation for these reforms was policy: to guarantee that more liberal policies could pass and not be held back by certain Committee chairmen. See [153] for a thorough description of the reforms and motivation.

The change in party discipline turns out to be a major factor in explaining party polarization. Table 4.4 presents a decomposition of political polarization based on differences in the distributions of legislator ideologies, represented by $\theta_{m,R} - \theta_{m,D}$, and party discipline, given by $y^{max}_D + y^{max}_R$. The share of polarization due to party discipline ranges from 67 percent to 71 percent and also appears to be increasing over time.

In terms of further probing of our model, Table 4.5 reports a useful check. The theoretical model predicts that the equilibrium marginal voter for a bill where a party goes for a whip count should be closer to the proposer party median relative to bills where the party goes for roll call directly. The intuition is that the former types of policies should be riskier in terms of passage but have sufficiently high value to warrant the use of whip counts as an option to explore the likelihood of success of the bill. Averaging over the relevant $t = 1, ..., T$ and comparing the estimated $\gamma_{1,t}$ under whip count and the estimated $\gamma_{2,t}$ under roll call only bills, the model predicts that $\sum_t |\gamma_{1,t} - \theta_{m,p}| < \sum_t |\gamma_{2,t} - \theta_{m,p}|$ for both parties $p \in \{D,R\}$. This is a (subtle) implication of the theory that we do not impose anywhere in estimation. It appears strongly satisfied in Table 4.5.

Additional probing of the model is in terms of fit. Table 4.6 reports in-sample model fit based on individual vote choices correctly predicted by the model. The overall fit for roll calls correctly predicted (with and without whip counts) is 82.4% and for whip counts, 66%. Because whip counts are fewer and the MLE does not penalize an incorrectly predicted whip count any differently from an incorrectly predicted roll call, the fit is higher in the more numerous roll call sample. Overall, the fit of the model is very good, especially considering that not a single roll call is dropped (either lopsided or close). This differs from extant approaches that condition on (occasionally hard to justify) selected subsamples of votes.
For comparison, over our sample the DW-Nominate prediction rate is 85.9%, but the procedure drops 892 roll calls that we instead use.

We conclude this section with a specification selection test for a constant party discipline parameter $y^{\max}_p$. This alternative is broader than simply including a party discipline parameter equal to zero (i.e. $y^{\max}_p = 0$, no whipping for $p \in \{D, R\}$). Our full model is estimated against a model with strategic agenda setting in terms of choice of the status quo $q_t$ to pursue and optimally selected alternative $x_t$ but with $y^{\max}_p$ constant over time. A likelihood ratio test, presented in Table 4.7, rejects the alternative model with constant $y^{\max}_p$ relative to our baseline full model at high confidence levels (p-value < 0.001). In particular, this includes the rejection of the hypothesis $y^{\max}_D = y^{\max}_R = 0$.

4.8 Counterfactuals

We assess the importance of party discipline with counterfactual exercises.

We analyze the importance of party discipline for the approval of legislation. To do so, we keep the policy alternatives to be voted on as in the data, but assume that parties cannot whip their members for given $x_t$. The legislators vote solely according to their ideologies. This exercise illustrates the extent to which “Yea” votes are driven exclusively by party discipline, complementing the analysis of polarization in Table 4.4. We focus on a series of key bills from our sample. The results suggest that whipping was decisive in the outcomes of key votes, but not always in the same direction.

4.8.1 The importance of party discipline for the approval of legislation

Our first exercise looks at important legislation within our sample. It considers counterfactual roll call outcomes had parties not been able to whip. Within our model, this means parties cannot discipline individual legislators (i.e. we set $y^{\max}_D = y^{\max}_R = 0$). Here, we still assume that party leaders present the same actual legislation as in our data, subject to the same shocks.
This means that we maintain $\gamma_{2,t} = MV_t - \eta_{1,t} - \eta_{2,t}$ at their estimated values.

Among the bills that we consider are key legislation in international relations, security and for the economy. These include the lifting of the arms embargo to Turkey, the Panama Canal Treaty, the funding of the Contras of Nicaragua, as well as important economic policies. The latter include the National Energy Act of 1977, which brought considerable changes to the industry, and Reagan’s 1984 Tax Reform.

The first and second columns of Table 4.8 show that our model fits these votes well. The third column of Table 4.8 presents the results of the counterfactual exercise. It shows that party discipline is quantitatively important for the outcomes of these bills as, in some cases, the approval of these bills would have been reversed. There are also a series of less obvious considerations.

Interestingly, the number of “Yea” votes can increase or decrease depending on the bill. For a bill such as the National Energy Act, absent party discipline, we would have had 7 fewer “Yea” votes in support. For the Tax Reform Act of 1984, the result would have been a decrease closer to 100 votes. For other votes, such as the Budget Authorization banning aid to the Contras (H.R. 5399), the counterfactual displays an increase in “Yea” votes absent party discipline. The explanation for the different results depends on the policy being voted on and how that leads to different numbers of Democrats and Republicans being whipped by their parties.

To see why this is the case, consider H.R. 5399 banning aid to the Contras. For this bill, the Democrats were whipping in favor of the policy, while the Republicans were whipping for the status quo. The estimated value of $\gamma_2$ is 0.678.³⁷ As this bill is realized relatively far to the right, there are not many Democrats between the cutpoint of no whipping, 0.678, and the cutpoint with whipping, $0.678 + y^{\max}_D$.
However, there are many Republicans in between the whipped cutpoint, $0.678 - y^{\max}_R$, and their no whipping cutpoint, 0.678. Therefore, if neither party is able to whip, there is little change to the Democrats’ number of “Yea” votes, but we observe a large increase in “Yea” votes for Republicans who no longer are whipped to the status quo. An analogous argument, with the opposite signs, holds for the National Energy Act and for the 1984 Tax Reform.

³⁷ This number rationalizes the large number of both Democrats and Republicans voting “Yea”, even if the Republican leadership voted against it.

Even larger effects can occur when parties are whipping in the same direction. This is the case for the lifting of the Turkey arms embargo (H.R. 12514, 1978), which has a decrease of about 200 “Yea” votes without whipping. For this bill, both parties whipped in the same direction.³⁸ The estimated $\gamma_2$ is −0.84, a value at which very few politicians would have voted “Yea” absent party discipline. Therefore, removing whipping from both parties removes most “Yea” votes. A similar case is the Panama Canal Treaty (H.R. 111 in 1979).

4.9 Conclusion

Polarization of political elites is a major empirical phenomenon that has recently reached historical highs. It has consequential implications, ranging from heightened policy uncertainty (and its deleterious consequences on investment and trade) to gridlock and inability of political elites to respond to shocks and crises.

There are contrasting views of what has been driving polarization. Some researchers point squarely at ideological polarization of legislators as a consequence of more polarized electorates, possibly in a perverse cycle of segmentation of the voting population along economic and social cleavages driving the election of extremists.
Other researchers caution on the role of ideology and emphasize changes in the rules of controlling the legislative agenda, increases in the leadership’s grip on policy platforms, and the capacity of parties to more precisely reward and punish members through appointments and campaign resources.

³⁸ This can be seen by both leaders (Rhodes (GOP) and Wright (Dem)) voting in favor of the policy.

This chapter provides an identification strategy useful in separating these different drivers (all of which are at play). It provides a structural economic assessment of their role over the initial phase of modern congressional polarization, at its inflection point between the 95th and 99th Congresses.

This exercise requires an effort in solving extant political economy problems speaking to the internal organization of parties, for instance in terms of internal aggregation of information from the rank-and-file and persuasion of party members on the fence. Our theoretical setting attempts to rationalize these problems within a unified structure. It offers a tractable but realistic environment that we also estimate, leveraging whip count information. A series of counterfactual exercises indicates a quantitatively relevant role for party discipline, almost twice as important as legislator ideology in explaining polarization.

Future research, including by the authors, should address the possibility of extending our estimation methodology to periods where whip count data as precise and comprehensive as the one we employ here may not be available. In a separate paper we are working on an approach able to project some of the methods developed in this chapter beyond the 99th Congress.
One aim is providing tools for the assessment of party discipline and unbiased estimation of legislator ideology designed to be integrated within extant optimal classification or Markov chain Monte Carlo methods.

4.10 Tables and Figures

Figure 4.3: Votes with the Majority Party at Whip Counts and Roll Calls
Notes: The figure presents the kernel density of the number of votes from Democrats with their party leader at the whip count and at the roll call stages. This is done for bills that had both whip counts and roll calls. The vertical line is plotted at 218, the majority needed to pass a bill in the House of Representatives.

Figure 4.4: Estimated ideologies, $\theta_i$, per Party over Time
Notes: We show the distributions of estimated ideologies, $\theta_i$, for each party and Congress in the thick lines. In the dashed lines, we show the estimated $\theta_i$ under a misspecified model that assumes no whipping and only uses Roll Call votes. The misspecified model overestimates polarization by a (theoretical) factor of $y^{\max}_D + y^{\max}_R$, see main text. In practice, we reestimate the misspecified model to allow for numerical and robustness corrections.

Figure 4.5: Estimated Ideologies from the Model $\theta_i$ compared to DW-Nominate
Notes: We show the correlation between our estimates of ideology and those of DW-Nominate. In the left panel, we fit our model with only roll call data (a misspecified version, assuming there is no whipping). The correlation of this model to the DW-Nominate estimates is 0.951. In the right panel, we present the estimates from our full model. The correlation of ideology estimates from our full model to DW-Nominate is 0.829. The wedge generated by party discipline is shown by the difference across both graphs.
Quantitatively, two politicians from different parties that our full model estimates as having the same ideology are estimated by DW-Nominate to be approximately 0.3 DW-Nominate units apart.

Figure 4.6: Estimated $y^{\max}$ per Party over Time
Notes: We show the time series estimates of the whipping technology parameter, for each party. These parameters are in units of our ideology estimates.

Table 4.1: Summary Statistics on Bill Selection

| Congress | 95 | 96 | 97* | 98* | 99* |
|---|---|---|---|---|---|
| A: Total Number of Bills Whip Counted | 131 | 58 | 28 | 50 | 48 |
| B: Number of Bills Whip Counted, but not Roll Called | 50 | 16 | 8 | 15 | 13 |
| C: Total Number of Bills Roll Called | 1540 | 1276 | 812 | 906 | 890 |

Notes: The table presents summary statistics on bill selection. It shows how many bills were whip counted, whip counted but not roll called, and bills that were roll called over Congresses 95-99. *We do not have data for Republican Whip Counts for Congresses 97-99, see the Data section.

Table 4.2: Number of Whips per Party

| Whips, by Congress | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|
| Democrats (appointed) | 14 | 14 | 20 | 26 | 41 |
| Democrats (elected) | 21 | 23 | 23 | 23 | 23 |
| Republicans (appointed) | 16 | 17 | 23 | 22 | 25 |

Notes: The table presents the number of whips per Party over the different Congresses. Data is from Meinke (2008).
Both party leaderships appointed whips; however, between the 95th and 106th Congresses, the Democrats also elected assistant/zone whips independently of the party leaders (Meinke (2008)).

Table 4.3: Main Estimates from the First Step

| Parameter, by Congress | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|
| $y^{\max}$, Democrats | 0.540 | 0.630 | 0.475 | 0.821 | 1.157 |
| | (0.027) | (0.027) | (0.027) | (0.027) | (0.027) |
| $y^{\max}$, Republicans | 0.570 | 0.550 | 0.851 | 0.868 | 0.655 |
| | (0.007) | (0.007) | (0.008) | (0.008) | (0.008) |
| $\sigma_1$ | | | 0.450 | | |
| | | | (0.006) | | |
| $\sigma_2$ | | | 0.742 | | |
| | | | (0.178) | | |
| Party Median - Democrats, $\theta_{m,D}$ | -0.547 | -0.552 | -0.524 | -0.552 | -0.552 |
| | (0.101) | (0.040) | (0.095) | (0.040) | (0.073) |
| Party Median - Republicans, $\theta_{m,R}$ | 0.00 | 0.091 | 0.126 | 0.142 | 0.181 |
| | (0.030) | (0.137) | (0.105) | (0.117) | (0.043) |

N: 711
T: 315 Whip Counted bills, 5424 Roll Called bills

Notes: The table presents the main estimates for the first step parameters. Standard errors are in parentheses. For time-varying parameters, such as the party specific $y^{\max}$, we write each Congress specific parameter. $\sigma_1, \sigma_2$ are not time varying, so we present their estimates centered in the table.

Table 4.4: Decomposition of Polarization in Ideologies and Whipping

| Implications of Table 4.3 for Polarization, by Congress | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|
| A: Polarization due to ideology ($\theta_{m,R} - \theta_{m,D}$) | 0.547 | 0.643 | 0.650 | 0.694 | 0.733 |
| B: Polarization due to whipping ($y^{\max}_R + y^{\max}_D$) | 1.110 | 1.180 | 1.326 | 1.689 | 1.812 |
| C: Share of Polarization due to whipping (B/(A+B)) | 0.670 | 0.647 | 0.671 | 0.709 | 0.712 |

Notes: The table shows how (perceived) polarization changes over Congresses. The perceived polarization is the change in “perceived” ideology due to both whipping and drifting ideologies.
The proportion of this aggregate distance due to whipping is approximately 2/3 throughout the sample period.

Table 4.5: Distance of $\gamma_1, \gamma_2$ to the Party Medians

| Party | Average distance of $\gamma_{1,t}$ to the Party Median | Average distance of $\gamma_{2,t}$, only Roll Calls, to the Party Median |
|---|---|---|
| Democrats | 0.440 | 0.824 |
| Republicans | 0.707 | 1.677 |

Notes: The theoretical model gives a prediction that the marginal voter with whip counts should be closer to the party median than those with only roll calls (i.e. $\sum_t |\gamma_{1,t} - \theta_{m,p}| < \sum_t |\gamma_{2,t} - \theta_{m,p}|$ for both parties $p \in \{Dem, Rep\}$). This holds true in our estimates, even though it was not imposed in estimation.

Table 4.6: Model Fit

| Model | Variable | Share of Correctly Predicted Votes (“Yea/Nay”) |
|---|---|---|
| Full Model | Roll Call Votes | 0.824 |
| | Whip Count Votes | 0.660 |

Notes: The table presents the model fit in terms of correctly predicted “Yea” and “Nay” votes at both the Whip Counts and Roll Call stages. The fit is better for Roll Calls as they are the vast majority of bills, and we do not penalize the likelihood to improve inference for Whip Counts.

Table 4.7: Likelihood Ratio Test for Constant $y^{\max}$

| Model | Estimated $y^{\max}$ | Log-Likelihood |
|---|---|---|
| Time Varying $y^{\max}$ | See Table 4.3 | $-8.769 \times 10^5$ |
| Constant $y^{\max}$ | Dem: 0.700, Rep: 0.689 | $-8.795 \times 10^5$ |

p-value for LR test, with 8 degrees of freedom: 0.00

Notes: We test whether the whipping parameter, $y^{\max}$, is constant across all Congresses in our sample. To do so, we fit a restricted version of our model where each party’s $y^{\max}$ is the same throughout all periods. We compare it to our original model, and reject the hypothesis of a constant $y^{\max}$ with a Likelihood Ratio test.

Table 4.8: Counterfactual: Voting Outcomes on Salient Bills

| Bill | Yea Votes (Data) | Yea Votes (Model Predicted) | Yea Votes (Counterfactual, no whipping) |
|---|---|---|---|
| Aid to Turkey/Lifting of Arms Embargo, 1978 (H.R. 12514) | 212 | 242 | 29 |
| Foreign Intelligence Surveillance Act of 1978 (H.R. 7308) | 261 | 291 | 330 |
| National Energy Act, 1978 (H.R. 8444) | 247 | 284 | 277 |
| Panama Canal Treaty, 1979 (H.R. 111) | 224 | 259 | 59 |
| Tax Reform Act of 1984 (H.R. 4170) | 319 | 409 | 286 |
| Contra Aid, 1984 (H.R. 5399) | 294 | 285 | 392 |

Notes: The table shows how the outcomes of votes on certain key bills in our sample would have changed without whipping. To do so, we consider a subset of 6 bills and show (i) how many votes each received in the actual roll call, (ii) how many votes the model predicted it would receive, and (iii) how many votes the model predicts it would receive, absent whipping. For (iii), we set the whipping parameters $y^{\max}_D = y^{\max}_R = 0$, while keeping the realized marginal voter $\gamma_{2,t} = MV_t - \eta_{1,t} - \eta_{2,t}$ at their estimated values. Hence, the table just shows the effect of whipping, but without the party reoptimizing the agenda under that assumption.

Chapter 5
Information Accumulation and the Timing of Voting Decisions

5.1 Introduction

Understanding how voters acquire information and learn is key to understanding voter decisions. A large theoretical and empirical literature proposes how information and learning can shape electoral outcomes, incentives for politicians and voters, and ultimately, policy outcomes.¹

Empirical evidence since the 1940s, however, suggests there is only a limited role for learning by voters. The minimal effects hypothesis ([114], [22] and, more recently, [101]) finds only small changes to voter behaviour when they are exposed to more information.² These results are hard to reconcile with observed behaviour, which suggests that voters must respond to information. The latter includes the expenditures in advertisement, focus groups, polling and debates by political campaigns, which aim at informing the electorate, together with previously cited research.

In this chapter, I provide an answer to this apparent contradiction. I use a structural model of information acquisition, which highlights how the timing in which information is acquired explains both sets of results.
Some voters may not use new information that becomes available, as they may have decided on their candidate long before election day. In this case, they may appear to have persistent preferences and not respond to information. Other voters might choose to use that new information to update their beliefs as they are still undecided.³ Both the voting decision and its timing are observed with the survey data used in this paper. This allows a model based empirical specification that can capture both the timing of decisions (and how that presents evidence of learning or changing of beliefs) as well as the voting decisions themselves (and the extent to which that information is used to change voting behaviour).

¹ These include the model of [68], on information and turnout, models of learning about candidates ([5], [80]) and recent empirical evidence of learning in primaries ([60], [108]) and elections ([103]).

² As [101] conclude from their meta-study of experimental evidence on the effects of campaigning information on voters’ decisions, “the best estimate for the persuasive effects of campaign contact and advertising...is zero”. The two exceptions that the authors find are that persuasive effects may exist if (i) information is introduced months before election day, although that effect decays, and (ii) an election day effect may exist when campaigns spend disproportionately targeting persuadable voters in the presence of extremist views.

The theoretical model has the key feature of endogenous information accumulation. Voters accumulate information until the cost is larger than the benefit. When that occurs, they decide on whom to vote for. Voters benefit from information as it yields more precise beliefs about the current state of their world, upon which they must choose their candidates. But acquiring information is costly, and as voters learn, the benefit of information decreases. It may decrease until this benefit no longer outweighs the cost.
For some voters, this occurs early in the campaign, while for others, it might only be on election day. Such a tradeoff implies an optimal stopping time behaviour by voters.

I estimate the model using panel survey data from Israel, which samples the same set of voters at the beginning of the campaign and after election day. Observed outcomes include when they decided on whom to vote for, and for whom they voted. I estimate the model using a simulation based maximum likelihood approach. The results show that I am able to match several observed features of the timing of voting decisions. These include the dispersion in those decisions across the population (40% of voters know all along who they will vote for, while many others decide in the last week) and the characteristics of late deciders (they are often younger and more moderate).⁴ Furthermore, the structural model allows me to disentangle the sources of such timing decisions. For some voters, it is the cost of acquiring information that makes them decide earlier in the campaign. For others, it is having tighter priors, which makes information less valuable.

I find that younger voters have the lowest costs of information. Meanwhile, more educated and politically knowledgeable voters decide earlier because they have tighter priors to begin with. The distribution of information signals is also estimated. I find that information is noisy and gets noisier over time.

³ Information early in the campaign may reach a greater number of voters, as more voters are still undecided; information later may be more salient, and swing the remaining undecided voters towards or away from a candidate.

⁴ See [34] for a survey of the evidence on the characteristics of late deciders.

Finally, I consider the implications of the results for pre-election blackouts. Pre-election blackouts are a widely used policy throughout the world ([2]) that bans electoral polls or advertisements for a period just before election day.
I show how this policy can lead to welfare losses, even if it was designed for “fairness” (i.e. so that all voters would have the same amount of information before deciding). This welfare loss is driven by the ban affecting only the subset of voters who would have acquired additional information during the blackout. I find that the effect of increasing the blackout from 1 day to 1 week in the empirical context is 2% of the welfare of the affected voters, and around 0.7% of the welfare averaged across all voters.

This chapter speaks to recent works about the timing of endorsements, learning in elections and campaign effects. [108] also focus on the effects of the timing of information upon voting decisions. They are interested in how votes in early contests in the US primaries may disproportionately influence outcomes, since they may change the decisions of voters in subsequent states. [60] also consider the impacts of learning via primaries. However, their focus is on how the timing of primaries affects voters’ incentives for coordination. My present work complements both of these papers, as I focus on a wider class of elections that do not have primaries or such publicly observed signals, and show how learning by different subgroups may lead to campaigning effects. This helps rationalize how extant evidence of learning may coexist with the previously mentioned results that voters do not respond to information in randomized experiments (e.g. [101]). Meanwhile, works that have studied the timing of endorsements, such as the theoretical approach in [159] and [47], show evidence that the timing of information does matter. This is evidence that voters respond to information by updating their beliefs.

This work is closely related to a literature in political science that studies the characteristics of late deciders in elections.
For example, [73] point out that the timing of voting decisions can play a key role in explaining the lack of campaign effects: who is learning, and when they do so, matters for whether voters will change their decisions. Nonetheless, these papers usually focus on the characteristics of late deciders from comparisons of survey data ([44], [169], [29]) or in the form of logistic or probit regressions (for example, [86] and [73]). Lacking in this literature is a compelling theoretical framework for why this happens. As [34] summarize, even studies with large samples “do little to clarify the situation, as they come to differing conclusions about the character and behavior of these voters”. A notable exception is [124], which emphasizes the importance of rational voters who acquire information. Similar ideas about learning and the impacts on a decision maker who has limited time are present in models of marketing, such as in [141]. Meanwhile, the model by [58] also studied endogenous information acquisition, focusing instead on the impacts of information acquisition on turnout.

The rest of the chapter is organized as follows. In Section 5.2, I present the structural model of costly information acquisition. In Section 5.3, I discuss how this model can be estimated. In Sections 5.4 and 5.5, I describe the empirical setting and the Data. Sections 5.6 through 5.8 present the Identification, Estimation and Results. Some extensions are discussed in Section 5.9, with the policy implications of the model discussed in Section 5.10. I conclude in Section 5.11. Proofs and additional Tables and Figures are shown in the Appendix.

5.2 Model

The model is one of dynamic information acquisition, with an environment reminiscent of [143]. There is a set of voters (citizens) denoted $i = 1, \dots, N$. Time is discrete, $t = 0, 1, \dots, T$, where $t = 0$ denotes the beginning of the political campaign, and $T$ the election.

There is an underlying unobserved state of nature $x \in \mathbb{R}$.
This can be interpreted as the state of the country, the state of the economy, or whether it is welfare enhancing or not to take up a policy. For simplicity, it is one dimensional, representing a possible projection of the issues voters care about onto a left-right spectrum. It is drawn at the beginning of time from a $\mathcal{N}(0, 1/\tau)$ distribution, where $\tau$ denotes the precision of this distribution. Starting from their priors, voters choose whether (and how much) they want to be informed about the issue at hand.

The realized state $x$ is unobserved to the voter. She can attempt to learn it over time, to make the best informed voting decision. Voters have priors given by $\mathcal{N}(\mu_i, 1/\tau_i)$. I assume $\bar{\tau} = \tau$, meaning that, on average, the precision of voters’ priors equals that of nature’s distribution.⁵

At $t = 0$, after $x$ has been drawn, voters can choose whether they are going to take part in the political process (i.e. observe and learn about politics from the campaign). This means that they choose whether to access the learning technology or not. This (access) is deemed costly, parametrized by $\kappa > 0$. We will describe this further.⁶

If they choose to have access, each voter can then accumulate information from $t = 0, \dots, T-1$, by acquiring a new signal at each period. The cost of acquiring a signal (given they have the technology to do so) is given by $c_i > 0$.

⁵ Priors can be from previous experiences that make them persist with their choices (or perception of the world), such as due to cognitive dissonance ([140]), due to past experiences from before the elections (in the generational model [20]), from party identification which influences their views of the world ([42]) or from media from other sources or political issues ([1]).

⁶ Allowing one to learn involves signing up for costly subscriptions, for reputation losses or for time commitments. Assuming this cost is not essential for the model, but useful for empirical purposes. It allows me to capture voters who do not care or pay attention at all to campaigning.

Note that this model assumes that voters start the information accumulation process together, at the beginning of the campaign. They end this process at different times. An alternative set-up of this same model would be one in which voters choose different starting points, but have the same stopping time (election day, $T$). The latter interpretation, however, is inconsistent with the data on the timing of voting decisions. In the data we observe, such as the one used in this paper (e.g. Table 5.1 for Israel), the American National Election Studies and others, the voters report whether they decided on their candidate at different points of the campaign, relative to the election day. They do not report whether they started making the decision at different points of the campaign.

5.2.1 Timing

I now review the timing of the model, for convenience to the reader. In the following sections, I then expand on each decision branch by deriving the associated equations that determine what the voter chooses to do.

At $t = 0$, the voter decides whether to have the option of learning. This will be seen by equation (5.6), in a following section.
If she does not, she chooses the politician/party that maximizes her utility according to her priors.

If she does allow for learning, from $t = 0$ to $t = T-1$ (the day before the election), she chooses whether to acquire information, by solving the (sequential) problem of acquiring a signal, given that the utility at a period is given by equation (5.5), derived in the next section.

The beliefs are updated following Bayes’ rule (see equation (5.3)).

Finally, given the beliefs, at $t = T$ the voter chooses the candidate that maximizes her utility given her beliefs.

5.2.2 Environment - Voters

On election day, $T$, each voter $i$ has quadratic preferences of choosing party $j$ at period $t$ under beliefs about $x$, given by:

\[
E(u_{i,j,t} \mid \{e_{i,1}, \dots, e_{i,T}\}) = -E\left[(a_j - b_i - x)^2 \mid \{e_{i,1}, \dots, e_{i,T}\}\right] - \sum_{t=0}^{T-1} c_i y_{i,t}, \tag{5.1}
\]

where $a_j$ is the policy proposed by party $j$, $b_i$ is her ideology, $e_{i,t}$ a signal received at period $t$ and $x$ is the state of nature. $c_i$ is the cost of acquiring a signal; and $y_{i,t}$ refers to whether a signal was acquired by $i$ (value of 1 if yes, and 0 if not) at period $t$. The sum in the second component represents the total cost of the signals bought, with $e_{i,1}$ being a signal received at period 1.

If she decides to take part, a signal about $x$ can be obtained at each period $t = 0, \dots, T-1$, by paying the cost $c_i$⁷ and follows:

\[
e_{i,t} = x + \varepsilon_{i,t}, \tag{5.2}
\]

with $\varepsilon_{i,t} \sim N(0, \sigma^2)$ and $\varepsilon_{i,t}$ i.i.d. across time and individuals. This signal can represent a new source of information, the processing of information available at that day ([130]), or a compilation of news feed from the day. The costly acquisition can mean simply that it is costly to process or take new information in, even if available freely.
The information structure is taken as exogenous.

Beliefs are updated following Bayes’ rule, as in [5], [80].⁸ Let the signal history at period $t$ be denoted $\mathcal{H}_{i,t}$ and $i$’s likelihood of the state being $x$ represented by $\mathcal{L}_i(x)$. Then, by Bayes’ rule:

\begin{align}
\mathcal{L}_i(x \mid \mathcal{H}_{i,t} \cup \{e_{i,t+1}\}) &\propto \mathcal{L}_i(e_{i,t+1} \mid \mathcal{H}_{i,t}, x)\, \mathcal{L}_i(\mathcal{H}_{i,t} \mid x)\, \mathcal{L}_{i,0}(x) \tag{5.3} \\
&\propto \mathcal{L}_i(e_{i,t+1}, e_{i,t}, \dots, e_{i,1} \mid x)\, \mathcal{L}_{i,0}(x) \tag{5.4}
\end{align}

with the densities described before and $\mathcal{L}_{i,0}(x)$ being the likelihood with the priors of $i$.

Once the sequence of information has been gathered, at the moment of elections ($T$), the voter will make a decision on whom to vote for.

We begin by reviewing the choice of the voter in the branch in which she accumulates information (or gains access to the technology to do so).

⁷ Note that here, the voter wants to learn about $x$. This differentiates the model from other ones of Bayesian learning such as [5], [80], [19] where the voter is learning about the differences in campaign policy. I instead allow them to learn about the issues upon which the policy is being made. This is consistent with the data in Table D.3, which suggest that voters attribute more significance in these elections to the economy/security, and not to valence issues.

⁸ I am empirically supported in this assumption by evidence of updating of information ([103]) and rational choice in voting decisions in the context of private information in elections, such as the evidence about the swing voter’s curse, [91] and [9]. In our model, there are no strategic interactions, so the environment is considerably simpler and would be expected to lead to outcomes close to rational ones [46].

Accumulating Information

Since $x$ is unknown, the voter will have her expected utility of choosing $j$, at moment $T$, given the received signal history ($\mathcal{H}_{i,T}$), given by:

\begin{align}
E_i(u_{i,j,T}(y) \mid \mathcal{H}_{i,T}) &= E_i\left(-(a_j - b_i - x)^2 \mid \mathcal{H}_{i,T}\right) - \sum_{t=0}^{T-1} c_i y_{i,t} \notag \\
&= -(a_j - b_i - E_i[x \mid \mathcal{H}_{i,T}])^2 - E_i\left[(x - E_i(x \mid \mathcal{H}_{i,T}))^2 \mid \mathcal{H}_{i,T}\right] - \sum_{t=0}^{T-1} c_i y_{i,t} \notag \\
&= -(a_j - b_i - E_i[x \mid \mathcal{H}_{i,T}])^2 - \mathrm{Var}_i[x \mid \mathcal{H}_{i,T}] - \sum_{t=0}^{T-1} c_i y_{i,t}, \tag{5.5}
\end{align}

where the expectation is taken according to $i$’s beliefs about $x$, conditional on $\mathcal{H}_{i,T}$; and the second line follows from adding and subtracting the conditional expectation of $x$ inside the quadratic term.

The utility function (and hence, the first term) can be interpreted as a civic duty ([67], [58]) component in the voting: here, the voter gains utility in making the correct choice. This correct choice for her is dependent upon what she knows and her ideology.

Note that the utility, in the second line, is written as the sum of 3 components: (i) the utility from voting for $j$, given the knowledge at $T$; (ii) the (expected) quadratic loss function from trying to estimate $x$, given her information, and (iii) the cost of acquiring another signal.

Term (i) is taken as a statement of what the voter would gain from voting for $j$ with her information set at $T$. As in Proposition 8 of [143], only the last two terms impact the decision of information accumulation in the sequential decisions. This is because the first term is not known in advance to the voter: at any period $t$, she expects $E(x \mid \mathcal{H}_{i,t})$ to be the same as $E(x \mid \mathcal{H}_{i,t+1})$. This is because the signals are i.i.d., and she cannot anticipate that the mean is different from what she already believes.

The second and third terms are the ones that determine the information accumulation. Their values in any period $t+1$ are known at $t$ (as we will show in the next section).
These terms have an underlying clear statistical interpretation: the voter tries to estimate $x$ by acquiring signals. Her estimate, given the history of her signals, is $E(x \mid \mathcal{H}_{i,T})$. She is trying to minimize the loss from thinking that the true value is $E(x \mid \mathcal{H}_{i,T})$ when it is actually $x$. This expected loss can be decreased by acquiring new signals. The gain is exactly the gain in the precision of the beliefs.

Out of the Political Process

If the citizen is not taking part in the political process, then she will choose $j$ to maximize:

\[
\max_{j \in \{1, \dots, J\}} -(a_j - b_i - \mu_i)^2,
\]

where I have replaced the expectation of $x$ with the prior mean $\mu_i$. Denote by $\tilde{p}$ the party that solves the problem above.

At the beginning of $t = 0$ (after $x$ has been drawn, but before the choice to acquire the first signal), each voter has to choose whether to be “In” or “Out” of the process. The voter chooses to be “In” if she deems that it is likely she will change her mind on whom to vote for. This is because the benefits of information are then sufficient to pay the cost, $\kappa > 0$, of accessing the learning technology.

She will choose “In” if her vote choice at $T$, $v_i^*$, satisfies:

\[
P(v_i^* \neq \tilde{p} \mid m^*) - \kappa > 0, \tag{5.6}
\]

where $m^*$ is the number of signals she anticipates acquiring. This captures that, although the agent does not know the signals that will come, she can anticipate how much information she will need. If it is unlikely that she will change her mind, even when confronted with all the information she would like to have, then there is no reason to actively take part in costly learning.

5.2.3 Political Parties and Voting

I model political parties by attributing them a policy value $a_j \in \mathbb{R}$, for $j = 1, \dots, J$ where $J$ is the number of parties. It can be interpreted as their campaigned policy if the state was $x$ (such as their economic policy if $x$ represents the state of the economy). Another interpretation is the ideological position of that party.

This captures that as the citizen learns, she finds what party most suits her beliefs and ideology.
I can consider an outside option for those who choose not to vote, but our sample consists mostly of individuals who vote, so I do not consider turnout.^9

^9 Israeli elections, the empirical application of this chapter, have a high turnout, around 65%-70% in general. Furthermore, our data shows that over half of the individuals who suggested they were not going to turn out ended up voting.

5.2.4 Voting Decisions

At $t = T$, the agent solves:

\[ \max_{j \in \{1,\dots,J\}} -(a_j - b_i - E_i[x \mid H_{i,T}])^2, \qquad (5.7) \]

where this comes from equation (5.5), noting that at $T$ the last two terms are sunk. I denote the party voted for by $i$ as:

\[ v_i^* = \arg\max_{j \in \{1,\dots,J\}} -(a_j - b_i - E_i[x \mid H_{i,T}])^2, \qquad (5.8) \]

and I denote the optimal time at which the voter makes her decision (i.e. stops acquiring more information) as $t_i^*$, which is the last period in which the voter acquired information.

I now characterize the solution to the problem. The characterizations of the functional forms and of comparative statics provided next will guide the empirical approach.

5.2.5 Results and Solution

An Overview of the Theoretical Results

The model builds from the need of voters to understand what is truly happening (the state of nature). All voters, starting from their priors, understand how much they do not know and what information can give them. They do not know what the information will be (the values of the signals), but they understand that more information will lead to better decisions.

First, I present two standard results that will help find closed form solutions to the beliefs about $x$, and that will be used to construct the empirical specification. Proofs are presented in the Appendix. Assume that at period $t$, agent $i$ has received $m_i$ signals.

Lemma 5.2.1. The variance of $i$'s beliefs about $x$ is given by $\mathrm{Var}_i[x \mid H_{i,t}] = \frac{\sigma^2}{m_i + \tau_i \sigma^2}$ and is decreasing in the number of signals, in $\tau_i$ (the precision of the priors about $x$), and increasing in $\sigma^2$ (the variance of the signal). 
It also does not depend upon the (realized) values of the signals acquired.

This lemma, a well known statistical result for Normal distributions (see [59]), states that the more precise the voter's prior beliefs about $x$, the more precise her posteriors. More precision of either the signals or the prior leads to more precise posteriors.

Note that this Lemma implies that the values of the signals do not matter for the computation of the variance (and, hence, the decision to acquire information). This follows from the normal distribution assumption on the signals: the risk of a Normal distribution does not depend upon its values (see [59]). This is important as it implies that the decision of accumulating information does not depend upon the value of the signals themselves, and hence, a voter knows in advance the amount of information she wants to have, as well as the decision trajectory at every moment. If there are no signals, there are no gains in information and, hence, no changes to the optimal policy. The following Lemma formalizes this.

Lemma 5.2.2. The voter's optimal decision is characterized by a fixed number of signal acquisitions such that a signal is acquired if (and only if):

\[ \mathrm{Var}_i[x \mid H_{i,t}] - \mathrm{Var}_i[x \mid H_{i,t} \cup \{e_{i,t+1}\}] > c_i, \qquad (5.9) \]

for each period $t = 0,\dots,T-1$, and the optimal number of signals to be acquired is characterized by:

\[ m_i^* = \left\lfloor \left(\frac{\sigma^2}{c_i}\right)^{1/2} - \tau_i \sigma^2 \right\rfloor, \]

where $\lfloor w \rfloor$ denotes the largest non-negative integer smaller than $w$.

By receiving an additional signal, the agent pays cost $c_i$ and updates her beliefs about $x$ by Bayes' rule. 
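As an illustration only, the acquisition rule (5.9) and the closed form in Lemma 5.2.2 can be sketched numerically. The function names, the parameter values and the truncation of the closed form to the range from 0 to T are assumptions made for this sketch and are not taken from the thesis.

```python
import math


def posterior_variance(m, tau, sigma2):
    """Posterior variance of x after m signals (Lemma 5.2.1)."""
    return sigma2 / (m + tau * sigma2)


def marginal_gain(m, tau, sigma2):
    """Variance reduction from the (m+1)-th signal: the left side of (5.9)."""
    return posterior_variance(m, tau, sigma2) - posterior_variance(m + 1, tau, sigma2)


def optimal_signals_sequential(c, tau, sigma2, T):
    """Apply rule (5.9) period by period: acquire while the gain exceeds the cost."""
    m = 0
    while m < T and marginal_gain(m, tau, sigma2) > c:
        m += 1
    return m


def optimal_signals_closed_form(c, tau, sigma2, T):
    """Closed form of Lemma 5.2.2; clipping to [0, T] is an assumption of this sketch."""
    m = math.floor(math.sqrt(sigma2 / c) - tau * sigma2)
    return max(0, min(T, m))


# Hypothetical parameter values, chosen only for illustration.
print(optimal_signals_sequential(c=0.02, tau=0.5, sigma2=4.0, T=90))
print(optimal_signals_closed_form(c=0.02, tau=0.5, sigma2=4.0, T=90))
```

For these illustrative values the two computations coincide. The sketch does not establish that they agree for every parameter configuration.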
An additional signal increases the precision of her beliefs by:

\[ \mathrm{Var}_i[x \mid H_{i,t}] - \mathrm{Var}_i[x \mid H_{i,t} \cup \{e_{i,t+1}\}] = \frac{\sigma^2}{m_i + \tau_i \sigma^2} - \frac{\sigma^2}{m_i + 1 + \tau_i \sigma^2}. \]

Then, I can substitute the values of the variance to get the following Corollary:

Corollary. The stopping time for voter $i$ is given by the $m$ such that:

\[
m_i^* =
\begin{cases}
0 & \text{if } c_i > \frac{1}{\tau_i} - \frac{\sigma^2}{1 + \tau_i \sigma^2}, \\
m & \text{such that } \frac{\sigma^2}{m - 1 + \tau_i \sigma^2} - \frac{\sigma^2}{m + \tau_i \sigma^2} > c_i > \frac{\sigma^2}{m + \tau_i \sigma^2} - \frac{\sigma^2}{m + 1 + \tau_i \sigma^2}, \\
T & \text{if } c_i < \frac{\sigma^2}{T - 1 + \tau_i \sigma^2} - \frac{\sigma^2}{T + \tau_i \sigma^2}.
\end{cases}
\]

This result shows that the optimal stopping time for accumulating information does not depend upon the values of the signals, and leads to a closed form solution for the stopping time, similar to [59]. The cost of an extra signal is compared to the precision gains.

As time goes on, voters stop as they acknowledge they know enough (or the benefit of knowing more is small compared to the cost), and are ready to make their decisions on whom to vote for. This is done by looking at the best fit of a political party/politician, which balances one's ideology and one's belief about the state. Note that, as $T < \infty$ and information is costly, beliefs need not converge. In fact, in general there will be dispersion of posterior beliefs about $x$ across voters, even conditional on ideology and cost. After every citizen stops, they may have conflicting views because of the signals they received, leading to dispersion in choices for even seemingly identical individuals (as we see in the data). This can explain why works such as [86], which focus only on controlling for observed characteristics, might find that late deciders have "less predictable" choices.

With the results above, I have fully characterized the optimal stopping problem of information accumulation. Now, I provide a characterization of the posterior mean belief, which is useful for modeling the voting decision in the next section.

Lemma 5.2.3. $E_i(x \mid H_{i,t}) = \frac{\sum_{t=1}^{m_i} e_{i,t} + \sigma^2 \tau_i \mu_i}{m_i + \tau_i \sigma^2}$. 
It has ambiguous sign in the number of signals, depending upon the value of the history of signals themselves.

This result shows that the expected value of $x$ is a weighted average of the signals and the prior mean, with weights that are functions of the precision of the signals and of the prior. As importantly, the term is linear in the values of the signals. The priors become less relevant with more signals. Although I do not observe the values of the signals in the data, the fact that it is a sum of them allows us to use the Normal distribution property of the signals. This means that the conditional belief of a citizen follows a normal distribution.

This Bayesian learning result also accommodates the exceptions for the null/zero effect in [101]. After reviewing all experimental evidence for U.S. general elections, they found no effect for campaigns except when (i) information is introduced months before election day, although that effect decays and is not present on election day, and (ii) an election day effect may exist when campaigns spend disproportionately targeting persuadable voters in the presence of extremist views.

Lemma 5.2.3 and the Corollary explain these results. Information introduced early on is received by more voters who are updating their beliefs. Hence, the campaign effects may persist (since they are part of voters' beliefs through the result in Lemma 5.2.3). However, information introduced on election day might not show up in average effects, since the majority of voters are not accumulating that information and do not update their beliefs with it.

The Corollary and Lemma 5.2.3 will allow us to construct the likelihood I will use for the estimation of the model. As signals are i.i.d. across agents and time, Lemmas 5.2.1 and 5.2.3 are valid for every individual.

Finally, we look at the following comparative statics, which are relevant to the literature.

Lemma 5.2.4. As the cost of acquiring information $c_i$ increases, the voter does not choose to acquire more information. 
For a large enough ideology $|b_i|$, if $|b_i|$ increases (i.e. the voter is more ideologically extreme), the voter will acquire less information.

This is due to the more extreme voter staying out of the information acquisition. Conditional on acquiring some information, there is no effect of ideology on the stopping time.

This concurs with the early observations in the literature that issues drive the decisions of late voters, while ideology drives those of the early deciders, as seen in [44], as cited by [169]: "(1) Partisan precommitment is sufficient to preclude campaign effects, (2) in the absence of precommitment, those exposed to the campaign will make their decisions primarily on the basis of campaign-specific information". In the model, partisan precommitment is equivalent to a prior $\mu_i$ close to that of a specific party, with a large enough precision $\tau_i$. In the data, many "know all along" who they would vote for, many months before the campaign was even to start.

From [144], we have that ideology matters in the time to stop. This is also true in our data. So I reconcile these two pieces of evidence.

5.3 Moving to the Data

In the data, I observe individual characteristics, such as age, education and gender, which I denote by $z_i$. I observe the ideology, on a one-dimensional spectrum, $b_i$. I observe the political party positions on this same spectrum, and finally, I also observe the stopping times of voters (when they decided on whom to vote for), as well as who they voted for.

I make the following commonly used parametric assumption about cost:

\[ c_i = e^{z_i'\beta + \eta_i}, \qquad (5.10) \]

where $z_i$ is a vector of observables (e.g. education, age, media access) which affects the cost of processing and acquiring information, and $\eta_i \sim N(0, \sigma_\eta^2)$, i.i.d. across individuals, captures unobserved heterogeneity.

This specification is parsimonious, and ensures that the cost is positive. Under (5.10), we can calculate the probability of a voter stopping at any period, given the results from the previous section. 
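To make this step concrete, the following is a minimal sketch of how stopping probabilities could be computed when the log-normal cost (5.10) is combined with the thresholds in the Corollary. It is a derivation under stated assumptions and is not necessarily the expression given in Lemma D.2.1: it works with the number of signals acquired rather than calendar time, takes the index `z_beta` (standing for $z_i'\beta$) as given, and uses hypothetical parameter values.

```python
import math


def normal_cdf(u):
    """Standard Normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def marginal_gain(m, tau, sigma2):
    """Variance reduction from the (m+1)-th signal, as in (5.9)."""
    return sigma2 / (m + tau * sigma2) - sigma2 / (m + 1 + tau * sigma2)


def prob_cost_below(threshold, z_beta, sigma_eta):
    """P(c_i < threshold) when ln c_i = z_beta + eta_i, eta_i ~ N(0, sigma_eta^2)."""
    return normal_cdf((math.log(threshold) - z_beta) / sigma_eta)


def signal_count_probs(z_beta, tau, sigma2, sigma_eta, T):
    """Return [P(m* = 0), ..., P(m* = T)] implied by the Corollary and (5.10)."""
    # below[m] = P(cost < gain of the (m+1)-th signal) = P(m* >= m + 1)
    below = [prob_cost_below(marginal_gain(m, tau, sigma2), z_beta, sigma_eta)
             for m in range(T)]
    probs = [1.0 - below[0]]                                  # never starts
    probs += [below[m - 1] - below[m] for m in range(1, T)]   # stops after m signals
    probs.append(below[T - 1])                                # acquires until T
    return probs


# Hypothetical values, for illustration only.
probs = signal_count_probs(z_beta=-7.0, tau=1.0, sigma2=6.6, sigma_eta=2.5, T=90)
print(round(sum(probs), 6))
```

Because the marginal gain falls with each additional signal, the thresholds are ordered and the probabilities telescope to one.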
Given the distribution of signals, one can also calculate the probability of voting for a given party. Denote the probability of voter $i$ choosing party $j$ and stopping at period $t$ as $P(v_i = j, t_i = t \mid z_i, x; \theta)$. The explicit form for this is presented in the Appendix, as Lemma D.2.1.

Having observed individual decisions on whom to vote for and stopping times of decisions, $\{v_i, t_i\}$, I can construct the likelihood of these choices directly from the lemmas above. We would like to estimate the parameters $\theta = (\sigma, \sigma_\eta, \beta, \{\mu_i, \tau_i\})$. $\tau$ follows from knowing $\tau_i$.

Since $x$ is unknown, I integrate it out (as we know it follows a Normal distribution with mean 0 and variance $1/\tau$, which we denote below as $F(\cdot)$). It follows that an individual likelihood, when we observe $i$ making the choice for $j$ at $t$ while accumulating information, is:

\[ L_i(\theta; v_i, t_i, z_i) = \prod_{j=1}^{J} \prod_{t=0}^{T} \left( \int P(v_i = j, t_i = t \mid z_i, x; \theta)\, dF(x) \right)^{I\{v_i = j,\, t_i = t\}\, I\{P(v_i \neq \tilde{p} \mid t_i = t,\, b_i) - \kappa > 0\}}, \qquad (5.11) \]

where the first indicator is 1 if the voting choice and stopping time correspond to the observed ones, and the second indicator denotes the choice of "In". If the voter chooses "Out", then their voting and stopping choices are trivial (equal to 1 for the best party according to their prior and involving no parameters), as seen in Lemma D.2.1.

Since the signals are i.i.d. across individuals, I can write the likelihood as:

\[
\begin{aligned}
\mathcal{L}(\theta; v_i, t_i, z_i) &= \prod_{i=1}^{N} \prod_{j=1}^{J} \prod_{t=0}^{T} \left( \int P(v_i = j, t_i = t \mid z_i, x; \theta)\, dF(x) \right)^{I\{v_i = j,\, t_i = t\}\, I\{P(v_i \neq \tilde{p} \mid t_i = t,\, b_i) - \kappa > 0\}} \\
&= \prod_{i=1}^{N} \prod_{j=1}^{J} \prod_{t=0}^{T} \left( \int P(v_i = j \mid t_i = t, z_i, x; \theta)\, P(t_i = t \mid z_i, x; \theta)\, dF(x) \right)^{I\{v_i = j,\, t_i = t\}\, I\{P(v_i \neq \tilde{p} \mid t_i = t,\, b_i) - \kappa > 0\}}, \qquad (5.12)
\end{aligned}
\]

where the indicators capture that the vote was for $j$ and the decision was made at $t$, with choices being made in the information accumulation stage.

Since we do not observe $x$, yet it is present in (D.7) and we know its prior distribution, the parameters of the model can be estimated via simulation techniques. I will use Maximum Simulated Likelihood. 
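The simulation idea can be sketched generically: the integral over $x$ is replaced by an average over draws from Normal(0, 1/τ). This is only a schematic illustration. The callables in `prob_fns` are hypothetical stand-ins for the observed-choice probability of each voter as a function of $x$ (the actual expression is in Appendix Lemma D.2.1 and is not reproduced here), and reusing the same draws for every voter is an assumption of this sketch.

```python
import math
import random


def simulated_log_likelihood(prob_fns, tau, n_draws, seed=0):
    """Sum over voters of ln( (1/R) * sum_r prob_fn(x_r) ), with x_r ~ N(0, 1/tau).

    prob_fns: one callable per voter, mapping a draw x to the probability of
    that voter's observed (vote, stopping time) pair. Hypothetical interface.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(tau)  # standard deviation implied by precision tau
    draws = [rng.gauss(0.0, sd) for _ in range(n_draws)]  # reused for every voter
    total = 0.0
    for prob_fn in prob_fns:
        simulated_prob = sum(prob_fn(x) for x in draws) / n_draws
        total += math.log(simulated_prob)
    return total


# Toy stand-in probability (logistic in x), used only to exercise the code.
def toy_prob(x):
    return 1.0 / (1.0 + math.exp(-x))


print(simulated_log_likelihood([toy_prob, toy_prob], tau=2.35, n_draws=150))
```

In an actual estimation routine this quantity would be evaluated at each candidate parameter vector with the draws held fixed across evaluations, which keeps the objective smooth in the parameters.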
Before describing the estimation approach, I first discuss the data and context of the empirical application, which will guide the identification.

5.4 Data

The main source of data in this chapter comes from a series of surveys conducted by Israel National Election Studies, Tel Aviv University. These consist of a two-period panel dataset, conducted before and after the election for the same individuals. The stated aim of these surveys is to "investigate voting patterns, public opinion, and political participation in Israel". Although the same individuals are surveyed before and after the election in each year, the sample is different across years. I will focus on data from 2006. The choice to focus on 2006 comes from the extended number of measures for parties in 2006, the extended number of questions in the 2006 survey compared to the others, as well as an appropriate setting that allows more variation for identification. This will be described more thoroughly in the Identification Section.

This dataset has been used by such work as [17], in the context of understanding voters' different beliefs about government formation, and [87], for the effects of terrorism on electoral outcomes and preferences; it is also the basis for the series of books by Arian and Shamir (who were also behind the collection of this data) discussing the election and electoral context for each year (for example, [14]).

The Appendix provides further details on the Israeli political system and the context. It is particularly useful for the data to be from Israel, because of its proportional representation, multiparty and nationwide election. 
Proportional representation allows us to focus on non-strategic components of the decision: the utility function of our voters reflects that they only wish to vote for the party which most closely represents their interests. The multiple parties in the sample give us useful variation in decisions and across the policy spectrum, allowing us to identify key parameters of the model. Finally, the nationwide election means I focus on the learning of issues, and not politicians. This is because the nationwide election presents a closed list of politicians, who run under the banner of the party's national policy. Table D.3 shows that these were, in fact, the more relevant considerations in voting. These contrast significantly with the data used for research on late deciders, based mostly on the US ([169], [34], [44]) or Canada ([73]).

In the pre-election survey, citizens are asked about their views on the state of the country, political preferences, individual characteristics (from education and age to work status, place of birth, religiosity and gender) and further information on their knowledge of and participation in the political process (for example, whether they know who is the Speaker of the Knesset, if they know the threshold to enter government, if they access media, if they are affiliated to a political party). The summary statistics are presented in Table D.1.

In the post-election survey, they are asked about who they voted for and, importantly for this chapter, when they made their decision on whom to vote for. For 2006, the distribution of the answers to this question is in Figures D.1a - D.1d, and in Table 5.1.

Table 5.1: When Did You Finally Decide to Vote for the Party? 
2006 survey

Timing                                        Frequency   Percent
Election Day                                        278        21
2-3 days before the Elections                       120         9
A Week before the Elections                         117         9
A Month before the Elections                        102         8
A Few Months before the Elections                   127         9
From the Beginning I knew what I would do           594        44
Total                                             1,338       100

We can see that there is variation in the timing of decisions: many voters decided "All along", but almost 40% decided in the last week. It seems that for 2006, 2009 and 2015, there are strong movements and outcomes in the last week. This would suggest that the very competitive elections in 2015 were decided by those deciding in the last week, as Likud won with a narrow majority. Meanwhile, the election in 2013 seems to have been decided from the beginning, as most voters already were sure of voting Likud at that time. For 2006, there seem to be large variations in the timing of decisions and across parties.

Regarding the reliability of this answer in survey data, this is deemed appropriate by the literature in similar contexts. [72] showed, using data from the 1997 Canadian Election Study, that the answer to the question of when a voter decided on whom to vote for was highly reliable (around 80% accuracy) when compared to the one constructed from the multi-wave version (that asks voting intentions at many moments of the campaign). For Germany, [155] also presented high values of reliability. These contrast with results from the U.S., which have shown much lower concordance (between 40% and 60%, as seen in [147], [45]). According to [72], this is due to the different contexts studied. Parliamentary systems (such as Israel, Canada and Germany) have much shorter campaigns when compared to the U.S. In the Israeli case, the campaigns are 3 months long, while the U.S. ones are much longer (often close to 10 months, from the first primaries in January up to election day in November).

5.5 Identification

I wish to identify the following parameters: $(\{\tau_i, \mu_i\}_{i \in N}, \beta, \sigma^2, \sigma_\eta, \kappa)$. 
I have constructed the likelihood above, and I observe the votes, the timing decisions $t_i^*$, individual characteristics ($z_i$), and the party policy positions ($\{a_j\}_{j=1}^{J}$).

For that, I use a 2-step procedure. First, I begin by identifying all parameters conditional on knowing the priors $\{\tau_i, \mu_i\}_{i \in N}$. In the next subsection, I discuss the separate identification of the priors.

Let us focus first on equation (D.7). Note that $\sigma_\eta$ and $\beta$ do not appear in this term (they only show up in equation (D.6)). Conditional on having the same stopping time, if we observe two agents who have the same ideology $b_i$ in the data, then different voting decisions must be due to the different signals they have received. The dispersion in voters' choices for those who have the same number of signals and the same stopping time identifies $\sigma$.

Consider two individuals who are identical in their observables $z_i$, but who have different stopping times. Then this must be due to individual heterogeneity (and its dispersion across the population). This identifies $\sigma_\eta$. Intuitively, a different number of signals obtained by apparently identical individuals must mean they are different in an unobserved way.

Given $\sigma$, the priors and $\sigma_\eta$, we can look at equation (D.6). $\beta$ is the only unknown. It is identified from variations in observables (for example, education) mapping out to differential outcomes in stopping times of accumulating information. Any differences in stopping times (given we have controlled for unobservables) must be coming from the observed characteristics.

Finally, $\kappa$ is not identified. When an individual chooses not to acquire information, she could be doing so because: (i) $\kappa$ is too high, (ii) the benefit of an additional signal is too low (i.e. $\sigma^2$ is too high), or (iii) her cost $c_i$ is too high. 
I cannot separately identify whether the cost of choosing into the acquisition process is the one which leads to not acquiring information, or whether $\kappa$ was low enough that she would want to, but the information was not valuable enough to do so.

However, I can still estimate the parameters $\theta$ from the likelihood above. This is done conditionally on those who accumulate some information - i.e. decide at some point of the campaign. Note that, if we observe some information accumulation (i.e. the voter did not respond that she knew "all along"), she must be in the branch of the game tree of information accumulation. Finally, since it would not be optimal to pay $\kappa$ and not accumulate information (as one knows $m^*$ in advance), it must be the case that those who accumulate some information are the set of agents that satisfy the restriction.

So it must be that $P(v_i^* \neq \tilde{p} \mid t_i^* = t, b_i) - \kappa > 0$ for the voters with an optimal stopping time $t_i^* > 0$. Hence, I can estimate all the other parameters, with the indicator function related to the choice of "In" being 1 if $t_i^* > 0$.

Identification of the Priors

The identification arguments in the previous section relied on having identified the priors, $\{\mu_i, \tau_i\}_{i=1}^{N}$. I now proceed with the identification of the priors themselves in a separate step.

The identification of $\mu_i$ comes directly from the survey data. I use the answer to the question: "In your opinion, what is Israel's general situation?". This question identifies exactly the content of $\mu_i$ in the model, which is the median belief about the state of the country. 
I categorize the qualitative answers to this question (very good, good, so-so, bad, very bad), shown in Table 5.2, into a quantifiable measure on the real line.^10

^10 To do so, I begin by setting the median (so-so) to 0, with the others at symmetric intervals of length 1 apart (i.e. 2, 1, 0, -1, -2).

Table 5.2: Answer to "Israeli condition in general", Pre-Election survey

Answer        Frequency   Percent
Very Good            69         4
Good                434        23
So-So               740        39
Not Good            280        15
Bad                 390        20
Total             1,913       100

Notes: The table shows the variation in answers (used as first-stage estimates) for the median prior of the state of the country/nature.

The identification of $\tau_i$ (the precision) is not as immediate. This is because there is no information in the data about how "sure" voters are about the state of the country, the ideal measure for $\tau_i$. To solve this identification problem, I use an additional source of variation prior to the beginning of the campaign. The intuition is the following. Assume that, before $t = 0$ in our model - that is, before the game begins - there were two identical voters with the same median prior, $\mu_i$. One of those voters is exogenously exposed to additional signals before the campaign began (i.e. a treatment). The treated voter then has median beliefs, observed in the data at $t = 0$, given by a different prior, $\mu_i'$. This $\mu_i'$ is different from $\mu_i$, the observed prior of the untreated voter, because of the additional information the treated voter received.

This difference, $\mu_i' - \mu_i$, will help us identify the precision of voter beliefs. This is because, as voters update their beliefs following Bayes' rule, the distance $\mu_i' - \mu_i$ is inversely proportional to $\tau_i$. To see this formally, denote by $q_i$ the exogenous amount of signals that this treated voter received. Using a first order linear approximation of $\mu_i'$ around $q_i = 0$, we find that:^11

$\mu_i'(q_i) \approx \mu_i + \frac{\mu_i}{\tau_i \sigma_p^2} q_i.$ 
(5.13)

^11 The new prior $\mu_i'$ is derived by Bayes' rule on the original prior $\mu_i$ (as in Lemma 5.2.3).

The lower the precision of prior beliefs, the faster the voter updates beliefs with new information.

The variation I use empirically is terrorist attacks across different regions of the country, before the beginning of the 2006 electoral campaign. I use fatalities of Israeli civilians in terrorist attacks in their cities of residence as an exogenous variation in (pre-election) signals.^12 This is possible for two reasons: (i) the pre-2006 period was during the Second Intifada, with terrorist attacks and fatalities across different parts of Israel^13, and (ii) I can observe the city in which voters reside in the data.

^12 The attacks used are those prior to January 2006, the beginning of the campaign. I will focus on attacks in the 2 years before January 2006. These choices follow [87].

^13 Data from B'Tselem (e.g. [87]) computes the number of fatalities of Israeli civilians during this period, shown in Table D.4.

As shown by Table 3 in [87], this variation also appears to be exogenous. Terror does not seem to be related to the location of residence. This suggests that demographic conditions or migration patterns are not related to terrorism at the local level. Another threat to identification could be that the timing of terrorist attacks could impact the timing of elections. Further empirical evidence suggests, though, that terrorist attacks do not influence the timing of the elections ([6]).

This variation is related to voters' beliefs about the state of the country, as shown in Table D.3. The issues that voters care about in that election included security, which is encompassed by the random variable $x$ in the model. Terrorist attacks are informative about the state of the country: they inform voters about the state of security and the provision of public goods. 
Such attacks are also relevant: they not only change informational content, but can shift voters' actions, as terrorist attacks lead to an increased vote share for the right bloc ([23], [87], [105]). In this chapter, the mechanism for these actions is through updates in beliefs.

The likelihood in (5.12) requires knowledge of the priors to be estimated. This suggests we must first recover estimates of the priors. To implement this first step, we must first parametrize prior beliefs as a function of individual characteristics, $z_i$. This will allow me to recover latent beliefs ($\mu_i$) for those who are treated. I begin by assuming that:

\[ \mu_i = z_i'\delta + \nu_i, \qquad (5.14) \]

\[ \frac{1}{\tau_i} = z_i'\gamma + \xi_i, \qquad (5.15) \]

where $E(\nu_i \mid z_i, q_i) = E(\xi_i \mid z_i, q_i) = 0$. These conditions are satisfied if the variation is indeed exogenous. I normalize the variance of the signals before the campaign to 1 ($\sigma_p^2 = 1$). Substituting into equation (5.13) leads to:

\[ \mu_i'(q_i) = z_i'\delta + (z_i'\delta)(z_i'\gamma) q_i + \tilde{v}_i, \qquad (5.16) \]

where $\tilde{v}_i = \nu_i + \nu_i \xi_i q_i + \nu_i z_i'\gamma q_i + z_i'\delta\, \xi_i q_i$, which is a mean zero component. $q_i$ is the treatment of there being civilian fatalities in the location of $i$'s residence between January 2004 and January 2006. Equation (5.16) encompasses the intuition described previously. It will also form the basis for the estimation of the priors, together with (5.14)-(5.15).

5.6 Estimation

Estimation of the parameters $(\{\mu_i, \tau_i\}, \beta, \sigma, \sigma_\eta, \delta, \gamma)$ follows a two-step procedure. In the first stage, I use the data to recover $\{\mu_i\}$ from the individual (mode) answer to the question "In your opinion, what is Israel's general situation?", which captures exactly the function of $\mu$ in the model. The answers are {Very good, Good, So-So, Not good, Bad}. I take the median to be 0, and each answer to be spaced a measure 1 apart from each other. The distribution of this answer (and hence, of $\mu_i$) is seen in Table 5.2.

With this measured, I run an OLS regression of equation (5.16), which recovers estimates of $\delta$ and $\gamma$. Using equation (5.15), I recover estimates of $\tau_i$ for all voters. Due to
Due to137the complexity of the cross sectional structure of (z′iδ )(z′iγ), I focus on a single dimensionalvariable generated from the first principal component of zi. This variable corresponds toover 95% of the explanatory power of zi. This is discussed further in the Appendix.I then proceed to the second stage of estimating (β ,σ ,ση). These remaining param-eters are then estimated by Maximum Simulated Likelihood (MSL) on the analogous log-likelihood objective function to (5.12), given by:lnL s(θ ;vi, ti,zi)=N∑i=1J∑k=1T∑t=0I{v∗i = j,t∗i =t}I{t∗i >0}ln(1RR∑r=1P(vi = j | ti = t,zi,xr;θ)P(ti = t | zi;θ)),(5.17)where the terms are given by (D.7) and (D.6); R is the number of draws; and x is drawnfrom a Normal distribution with mean 0 and variance τ . Details about the Optimizationroutine to estimate the parameters are given in the Appendix.The simulated part comes from not observing x in the data. Although I know the distri-bution of x, since I do not observe its value, I have to integrate it out. One can simulate itsrealized values by drawing from the assumed Normal distribution.The MSL estimator for the parameters is consistent, according to [165]; with n→ ∞and R→ ∞. If R rises faster than √n, then MSL and ML are equivalent. Note that theobjective function is continuous and well defined, as it involves (smoothed) probabilities ofvoting one candidate (and not a discontinuous indicator function). Monte Carlo simulationshave shown that this model and likelihood work well. The remaining parameters δ and γare consistent under the conditions of conditional mean zero errors νi,ξi described.The data does not have a daily outcome of when each individual decide on who to vote(which I interpret as the stopping of accumulating information). Instead, it provides inter-vals (last day, 2-3 days before the election, and so forth). 
The model can remain unaltered, but instead of comparing the intervals from $t$ to $t+1$ as in Lemma 5.2.2, I adjust it to be the interval between the sets we see in the data. For example, those who stop "around one week before the election" have then stopped earlier than 3 days before the election (the next interval), but after around 1 month (30 days before). I set $T = 90$ as it refers to the beginning of the campaign. This is the time between the dissolution of the Knesset, on December 29, and the election on March 28.

For estimation, I have to construct values for the policy dimension of political parties.^14 This is a one-dimensional variable located in the left-right spectrum (with usual support of 0-10, with 0 being left, and 10, right). To construct this value, I create 2 different measures and show that they give very similar results. For the first, I use the average values of the answers to the question of where the voter locates the parties in the spectrum, for the 2006 survey data.^15 They represent the average perceived location of each political party by the sample.^16 My second measure is from the Duke Accountability and Linkages Project for the political parties in Israel (see [107]). These measures are computed from experts' answers to surveys about political parties. I use the variable "dw", which stands for overall left-right placement. Table 5.3 shows these measures and that they are highly correlated with each other, as well as with another measure constructed from the 2009 survey from the Israel National Election Studies.

^14 Denoted there as $a_j$, where $j = 1,\dots,J$ are the political parties.

^15 This question is only presented in these two years.

^16 The surveys for 2013 and 2015 do not ask this question. The data is also only available for the 8 "major" parties, although these represent the large majority of votes. 
In the estimation, I will focus on the parties for which we do have data, as seen in Table D.2, which are a large majority of the total number of votes.

Table 5.3: Different Measurements for Policy Vectors

                  2006 Computation   2009 Computation    DALP
Kadima                       4.987              5.314   5.135
Likud                        6.824              7.188   7.800
Labor                        3.615              4.053   3.400
Israel Beiteinu              7.003              7.792   8.985
Mafdal                       7.030              7.003   8.942
UTJ                            N/A                N/A   7.274
Shas                         6.119              6.517   7.433
Meretz                       2.378              3.500   1.158

Notes: The table presents different computed policy measures ($\{a_j\}_{j \in J}$ in our model) that are used in the estimation. The first column uses the average across the surveyed population (in the 2006 Pre-Election Survey, in our main data) for the position of each political party on the Left (0) to Right (10) spectrum. The second column repeats this measure for the 2009 Pre-Election Survey. (These questions are unavailable for 2013 and 2015.) The third column presents the measures on the left-right spectrum from the Duke Accountability and Linkages Project (DALP), [107], computed from an aggregate of specialists' opinions. This is re-scaled from the 1-10 spectrum to the 0-10 spectrum of the other measures. The correlation between the first and second column is 0.9835; between the first and the third is 0.994, and between the second and third is 0.9792. This shows that the measures do not vary much with the choice of source. In Table 5.5, I present results using the DALP measure.

I show estimates that make these sets more flexible: using the exact days in the set, using values between those intervals, and so forth. The results do not significantly change.

5.7 Results

Table 5.4 shows the main set of results. Across specifications, the estimated variance of signals, $\sigma^2$, and the variance of unobserved cost heterogeneity, $\sigma_\eta$, are estimated to be positive and relatively large. All results are robust across specifications. The standard deviation of signals is estimated to be around 2.5 (or 25% of the support of ideology, which is between [0,10]). 
Yet, since we observe the acquisition of information, it must be the case that it is still worth it for those with smaller costs to acquire it. This would mean that campaigns or available information do not inform very precisely about the state of the country or issues, but are still useful for learning.

With regards to observable heterogeneity, some of these effects are also statistically significant. We can see that age has a significant and positive coefficient, pointing out that older voters have a larger cost of acquiring/processing information about the political scenario. Education seems to have the inverse sign on the pointwise estimates, implying more educated ones have a smaller cost of acquiring information. However, this is not significant. This indicates that the effects of education are coming through the priors: more educated voters have different priors and initial beliefs than others. This explains the reduced form evidence from Table D.5 that more educated voters stop earlier. Meanwhile, the unobserved heterogeneity implies that the dispersion in stopping times comes not only from observable traits, such as age and education: within those groups there are significant differences.

Non-Hebrew speakers (interviews conducted in another language) and religious voters do not seem to have higher costs. However, in the reduced form results this was the case. This is because all of the correlation in the reduced form, under our structure, is shown to be due to different priors across these groups. All of these results seem robust to the addition of covariates (Column (2)), and changes in the definition of the stopping time dates (Columns (3)-(4)). They are also robust to changing the policy measure to the Duke Accountability and Linkages Project one (DALP), as shown in Column (2) of Table 5.5.

Regarding the priors, the estimates of the precisions/variances are shown in Figure 5.1. They show that most voters have priors concentrated around $\tau = 1$, although a very few have estimated high precisions. 
The average precision is 2.35, implying that the variance of beliefs is quite small on average.

Table 5.4: Results of the Structural Model

                                  (1)                (2)                (3)                (4)
                                  Specification (1)  Specification (2)  Robust Date 1      Robust Date 2
$\sigma^2$                        6.601***           6.713***           6.707***           5.074***
                                  (0.083)            (0.084)            (0.084)            (0.055)
$\sigma_\eta$                     2.517***           2.482***           2.435***           3.040***
                                  (0.176)            (0.171)            (0.158)            (0.218)
$\beta$ - Constant                -6.98***           -6.781***          -7.175***          -7.421***
                                  (0.813)            (0.891)            (0.872)            (1.090)
Age                               0.038***           0.036***           0.035***           0.0435***
                                  (0.008)            (0.009)            (0.009)            (0.011)
Education                         0.026              0.020              0.017              0.0307
                                  (0.048)            (0.047)            (0.046)            (0.058)
Gender (Female)                   -0.533*            -0.432             -0.406             -0.523
                                  (0.280)            (0.287)            (0.282)            (0.350)
Language (Arabic)                                    0.535              0.486              0.752
                                                     (0.414)            (0.405)            (0.517)
Language (Russian)                                   0.900              0.703              1.515
                                                     (1.744)            (1.781)            (2.291)
Religiosity (observes a little)                      0.140              0.091              0.204
                                                     (0.330)            (0.324)            (0.400)
Religiosity (observes a lot)                         -0.674             -0.679             -0.808
                                                     (0.452)            (0.444)            (0.547)
Religiosity (observes all of it)                     0.230              0.174              0.135
                                                     (0.834)            (0.816)            (1.016)
Rooms Per Household Member                           -0.180             -0.198             -0.212
                                                     (0.198)            (0.197)            (0.235)
Knowledgeable                                        0.521              0.498              0.645
                                                     (0.330)            (0.322)            (0.402)
N                                 819                819                819                819

Standard errors in parentheses, computed by Outer-Product Gradient Approximation.
* p < 0.10, ** p < 0.05, *** p < 0.01
Notes: The table presents results across specifications of the main model, presented in Section 5.3. The results use 150 draws from the simulated distribution of the state of nature, $x$, assumed to be Normal(0, 1/$\tau$), where $\tau$ is the average of the first stage estimates of $\tau_i$, which equals 2.35. The optimization routine used is shown in the Appendix. The policy measure used for $a$ was the average perceived location of each party according to the voters’ sample in 2006. Column (1) shows a simpler specification, where the cost function only depends on education, age and gender. Column (2) expands this specification, controlling for relevant variables in this setting. Columns (3) and (4) change our definitions of stopping times. For Column (3), I take the later of the range: “a couple of months before” is 45 days before the election, “about a month before” is 15 days, “about a week” is 5 days, and “a couple of days before the election” is 2 days before. For Column (4), we use 75 days before, 45 days before, 10 days before and 3 days before, with 1 day before for “on the day”. The results are consistent across measures and specifications.

Figure 5.1: Distribution of Estimated Priors (Precision)
Notes: The graph presents the values for the estimated precision of priors ($\tau_i$) and of the associated variance of priors ($1/\tau_i$), estimated in the first stage as described in the Estimation section.

5.8 Extensions

Allowing for Heteroskedastic Signals

So far, I have assumed that the information comes from a homoskedastic distribution, as $\sigma^2$ is the same for every period. The model allows for some variation of this.

We can allow a more general, yet parsimonious, time-varying structure:

\[ \sigma^2(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2, \tag{5.18} \]

for $t = 1, \dots, T$.

However, I must impose additional constraints to guarantee that the problem is well defined. The coefficients must be such that $\sigma^2(t) \ge 0$ for all $t$, to guarantee a non-negative variance. We must also have that our solution is well defined, which means that the term

\[ \frac{\sigma^2}{t-1+\tau_i\sigma^2} - \frac{\sigma^2}{t+\tau_i\sigma^2} \]

is non-increasing in $t$. If that were not the case, we could have non-monotonic stopping times, with more than one possible solution at each point, as well as an ill-defined likelihood (as we take the log of this term in Equation (5.12)).

The results with heteroskedastic signals under these restrictions are shown in Column (1) of Table 5.5. Even though we allow $\sigma^2(t)$ to be decreasing over time (as long as the differences of the ratios above are decreasing), the estimated $\sigma^2(t)$ is increasing in $t$, that is, $d\sigma^2(t)/dt \ge 0$. The coefficient $\alpha_1$ is positive and significant, indicating that the signals get noisier closer to the election.
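To make the quadratic variance path in (5.18) concrete, here is a minimal sketch that evaluates it at the point estimates reported in Column (1) of Table 5.5 and checks the non-negativity restriction. The function names are hypothetical, and the horizon T = 90 days is an assumption (it is chosen only because the text refers to $90\alpha_1$); it is not the estimation code.

```python
import math

# Point estimates from Column (1) of Table 5.5.
ALPHA0, ALPHA1, ALPHA2 = 0.007, 0.106, 8.565e-07


def signal_variance(t, a0=ALPHA0, a1=ALPHA1, a2=ALPHA2):
    """Quadratic signal variance sigma^2(t) = a0 + a1*t + a2*t^2, as in (5.18)."""
    return a0 + a1 * t + a2 * t ** 2


def is_valid_path(T, **coefs):
    """Check the restriction sigma^2(t) >= 0 for every period t = 1, ..., T."""
    return all(signal_variance(t, **coefs) >= 0 for t in range(1, T + 1))


T = 90  # assumed campaign horizon in days (hypothetical)
variances = [signal_variance(t) for t in range(1, T + 1)]

print("restriction satisfied:", is_valid_path(T))
print("variance at t = 1 and t = T:", variances[0], variances[-1])
print("standard deviation at t = T:", math.sqrt(variances[-1]))
```

With these values the path is increasing throughout, and the variance in the final period is dominated by the linear term $\alpha_1 t$.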
The results for the other parameters, though, seem to hold.

This indicates that issue-based knowledge is getting sparser as the election draws nearer. This is consistent with the model: voters with the lowest cost of processing are still willing to incorporate noisier information, as the gain in precision still compensates for their low cost. Those deciding on the last day need whatever small informational contribution helps them make the decision, while those with larger costs and those who are more extreme do not need it.

The magnitude is estimated to vary between reasonably precise information at the beginning of the campaign (with $t = 0$), with a point estimate close to 0, and the end of the campaign, where the standard deviation is $\hat{\sigma} = 3$ (or a variance of 9, since $90\alpha_1 \approx 9$). This is much noisier than the information received during most of the campaign, and it only affects the late deciders. Yet these late deciders, such as swing voters, are fundamental in many economic models ([68]), and the signals that ultimately swing them are the worst ones available. This motivates the consideration of improvements in information, as will be described in the counterfactual section.

5.8.1 Different Distributions for η

The model does not rely upon the assumption that $\eta$ is Normally distributed. Indeed, one could replace the Normal CDF in equation (D.6) by an arbitrary CDF $G(\cdot)$, yielding:

\[
P(t_i = t \mid z_i, x; \theta) =
\begin{cases}
1 - G\left( \ln\left( \frac{1}{\tau_i} - \frac{\sigma^2}{1+\tau_i\sigma^2} \right) - z_i'\beta \right), & \text{if } t = 0, \\
G\left( \ln\left( \frac{\sigma^2}{t-1+\tau_i\sigma^2} - \frac{\sigma^2}{t+\tau_i\sigma^2} \right) - z_i'\beta \right) - G\left( \ln\left( \frac{\sigma^2}{t+\tau_i\sigma^2} - \frac{\sigma^2}{t+1+\tau_i\sigma^2} \right) - z_i'\beta \right), & \text{if } 0 < t < T, \\
G\left( \ln\left( \frac{\sigma^2}{T-1+\tau_i\sigma^2} - \frac{\sigma^2}{T+\tau_i\sigma^2} \right) - z_i'\beta \right), & \text{if } t = T,
\end{cases}
\tag{5.19}
\]

with almost no changes in the proof. For robustness, I consider a distribution $G$ that is Exponential, with parameter $\lambda$. This is done to show that the signs and significance of the estimates of the cost function are similar to those in the baseline model. Identification follows as before, with the exception that $\eta$ is no longer mean zero; $\lambda$ is now identified from the variance of $\eta$.
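The three cases in (5.19) can be sketched in a few lines, which also makes it easy to see that the probabilities telescope to one for any CDF $G$. This is only an illustration under stated assumptions: all function names are hypothetical; the ratio $\sigma^2/(t+\tau_i\sigma^2)$ is read as the posterior variance after $t$ signals under Normal updating; the Normal case is taken to have mean zero and standard deviation $\sigma_\eta$; the Exponential case uses a rate parametrisation; and the index value $z_i'\beta = -5.46$ and the horizon $T = 5$ are illustrative numbers, not estimates from the chapter.

```python
import math


def posterior_variance(t, tau, sigma2):
    """sigma^2 / (t + tau * sigma^2): read here as the variance of beliefs after t signals."""
    return sigma2 / (t + tau * sigma2)


def gain(t, tau, sigma2):
    """Reduction in variance from the t-th signal (t >= 1); the term inside ln(.) in (5.19)."""
    return posterior_variance(t - 1, tau, sigma2) - posterior_variance(t, tau, sigma2)


def normal_cdf(x, sd=1.0):
    """CDF of a mean-zero Normal with standard deviation sd (assumed baseline case)."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def exponential_cdf(x, lam=1.0):
    """CDF of an Exponential; the rate parametrisation is an assumption."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


def stopping_probs(tau, sigma2, zbeta, T, G):
    """P(t_i = t) for t = 0, ..., T, following the three cases of (5.19)."""
    # thresholds[k] = ln(gain at signal k + 1) - z'beta
    thresholds = [math.log(gain(t, tau, sigma2)) - zbeta for t in range(1, T + 1)]
    probs = [1.0 - G(thresholds[0])]                           # t = 0
    for t in range(1, T):
        probs.append(G(thresholds[t - 1]) - G(thresholds[t]))  # 0 < t < T
    probs.append(G(thresholds[T - 1]))                         # t = T
    return probs


# Illustrative inputs only (hypothetical voter and horizon).
probs_normal = stopping_probs(2.35, 6.601, -5.46, 5, lambda x: normal_cdf(x, sd=2.517))
probs_exp = stopping_probs(2.35, 6.601, -5.46, 5, lambda x: exponential_cdf(x, lam=2.81))
print(sum(probs_normal), sum(probs_exp))
```

Because the gain term is decreasing in $t$, the thresholds are decreasing and every difference of CDF values is non-negative, which is the role played by the monotonicity restriction discussed above.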
Column (3) of Table 5.5 shows that the qualitative results still hold.

Table 5.5: Results of Extensions to the Structural Model

                                  (1)                       (2)                       (3)
                                  Heteroskedastic Signals   $a$ DALP                  $\eta \sim$ Exp($\lambda$)
$\sigma^2$                                                  5.692***                  6.283***
                                                            (0.084)                   (0.098)
$\alpha_0$                        0.007
                                  (0.083)
$\alpha_1$                        0.106***
                                  (0.002)
$\alpha_2$                        8.565E-07
                                  (1.032E-05)
$\sigma_\eta$                     5.146***                  2.504***
                                  (0.175)                   (0.289)
$\lambda$                                                                             2.810***
                                                                                      (0.235)
$\beta$ - Constant                -15.88                    -7.069                    -9.458***
                                  (12.029)                  (0.931)                   (0.580)
Age                               0.069***                  0.037***                  0.023***
                                  (0.021)                   (0.010)                   (0.008)
Education                         -0.050                    0.030                     0.037
                                  (0.092)                   (0.049)                   (0.032)
Gender (Female)                   -1.045*                   -0.471                    -0.117
                                  (0.585)                   (0.294)                   (0.243)
Language (Arabic)                 1.535*                    0.255                     0.770***
                                  (0.880)                   (0.474)                   (0.254)
Language (Russian)                2.621                     1.289                     1.540**
                                  (3.820)                   (1.568)                   (0.766)
Religiosity (observes a little)   -0.117                    0.129                     -0.341
                                  (0.665)                   (0.345)                   (0.215)
Religiosity (observes a lot)      -1.316**                  -0.682                    -0.217
                                  (0.942)                   (0.455)                   (0.396)
Religiosity (observes all of it)  0.00                      0.012                     0.589
                                  (1.634)                   (0.761)                   (0.642)
Rooms Per Household Member        -0.368                    -0.209                    0.120
                                  (0.407)                   (0.206)                   (0.091)
Knowledgeable                     0.765                     0.606                     0.412*
                                  (0.672)                   (0.326)                   (0.216)
N                                 819                       819                       819

Standard errors in parentheses, computed by Outer-Product Gradient Approximation.
* p < 0.10, ** p < 0.05, *** p < 0.01
Notes: The table presents results of the extensions to the baseline model, also using 150 draws for the state of nature, which is still drawn from a Normal distribution with mean 0 and precision $\tau = 2.23$ estimated in the first stage. Column (1) presents the results when heteroskedastic signals are allowed, as we define $\sigma^2(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2$. Column (2) presents the estimated parameters under the policy measure from the Democratic Accountability and Linkages Project (DALP). Column (3) allows $\eta$ to be Exponentially distributed, with parameter $\lambda$.
These indicate that the main results of the baseline model are robust to changes in the structure.

5.9 Model Fit and Specification Tests

In this section, I perform exercises related to the model fit.

To assess the model fit, I plot the timing of voting decisions predicted under the estimated parameters against the data, in Figure 5.2. I also plot the predicted voting decisions themselves (for simplicity, under the baseline specification), grouped by Left, Centre, and Right Wing parties (under the prior $x = 0$). We can see that the model fits both decisions well. It captures the key patterns of the timing of voting decisions (the large number of voters deciding at the beginning, followed by fewer voters deciding until the increase on the last day). Similarly, the predicted vote shares are close to the observed ones.

However, this model cannot capture two patterns which, although not essential, illustrate the limits of a Bayesian learning model. First, the model cannot explain the large number of voters deciding 2-3 days before the election, compared to a similar mass deciding between a month and a week before the election. This is because, in a Bayesian environment, the longer time frame for the latter should yield more voters deciding than the 2-3 day time frame does. I attribute this to cues and measurement errors in the answers to these questions.¹⁷ Furthermore, the model cannot capture why two parties on the right that are close to each other would have significantly different vote shares (given the ideologies reported by voters). In essence, in the model only ideologies matter (and not other dimensions, such as ethnicity, history and salience of certain parties).

5.10 Policy Implications - Pre-election Silence (Blackouts)

The results so far indicate that information is quite noisy and that voters are very heterogeneous in terms of costs and priors.
Altogether, this implies large dispersion in the timing of voting decisions, as well as dispersion in the votes themselves for seemingly identical individuals.

¹⁷ Although reliable, the answers in this time frame are more subject to misreporting (e.g. [73]).

Figure 5.2: Model Fit
(a) Predicted Timing Decisions
(b) Model Fit - Voting Decisions
Notes: The first graph shows the predicted stopping times under the estimated parameters. The second graph presents the model fit of the timing of voting decisions, as well as the decisions themselves. It shows the average expected vote share after simulating the model 100 times under the estimated parameters of the baseline specification. It aggregates the votes by parties (Left is Meretz and Labor, Centre is Kadima, Right are the remaining parties). Simulation is needed to reproduce the signals that voters receive.

One could then ask whether changes in policy could actually improve decision-making by voters. In the model, this would have to come from changing either the signal structure or the amount of information available to voters. Blackouts (or pre-election silence) constitute a widespread policy, as shown in [2]. This policy is a (partial) ban on campaigning or electoral information within a time frame of election day (for example, a day). During this period, it is often the case that candidates may not campaign, polls may not be released and sometimes candidates' names cannot be used in the media.

One motivation for such a policy is a concern for fairness: all voters should have access to the same information. This is a prevalent concern, for example, in countries with different time zones, where some information might be available to some voters but not to others at their time of decision.¹⁸ Another is that noisy information just before the election could lead to undesired changes in voting decisions.

Taking this type of policy to our model, consider a scenario where information accumulation is banned in the last period (i.e.
no citizen can buy $e_{i,T}$). For voters whose optimal stopping time is before $T$, there is no change to their utility or decisions. However, those who would have stopped at $T$ now have to stop at $T-1$. The difference in their (ex-ante) utility is given by Lemma 5.2.2: they lose what they would have gained, net of the cost $c_i$, by adding another signal. That is,

\[
E_i^{\text{blackout}}\left(u_{i,j,T-1}(y) \mid H_{i,T-1}\right) - E_i\left(u_{i,j,T-1}(y) \mid H_{i,T-1}\right) = -\left( \frac{\sigma^2}{T-1+\tau_i\sigma^2} - \frac{\sigma^2}{T+\tau_i\sigma^2} \right) + c_i < 0.
\]

The ex-ante expected change in welfare from this policy, a loss, is obtained by integrating this difference over the affected voters:

\[
\int_{i:\, t_i^* = T} \left[ -\left( \frac{\sigma^2}{T-1+\tau_i\sigma^2} - \frac{\sigma^2}{T+\tau_i\sigma^2} \right) + c_i \right] di. \tag{5.20}
\]

The blackout policy denies a subset of voters, who wanted to learn more before making a decision, the opportunity to do so. Noisy information, if it is known to be so, will only be accumulated by those for whom it gives a gain in precision. Hence, even noisy information can be valuable to some. Banning it can lead to less precise beliefs, and more often to voting decisions that are not optimal (relative to those made with the additional signal).

Given this, we can calculate the effect of extending the 1-day blackout policy to 1 week. To do so, we can plug our estimates into equation (5.20) under each policy and compare. Looking at relative welfare (i.e. dividing equation (5.20) by the utility of the whole population), I find that this change induces a 0.7% loss in welfare. This number is small because the effects fall only on the subgroup that would have decided between 7 days and 1 day before the election, and the welfare impact of the ban is marginal: those voters already had all the signals up to 7 days before the election. Nonetheless, if we look at the welfare loss for that subgroup of late deciders (rather than all voters), the loss is 2.2%.

¹⁸ See the article [43] for further details on this reasoning in the media, and [108] for an application in the context of primaries.

5.11 Conclusion

In this chapter, I presented a model of endogenous information acquisition and estimated it with Israeli data.
With the estimates, I showed the importance of timing in decision making. The model is able to capture several features of the data, including how different subgroups decide when and for whom to vote. Furthermore, it is able to separate whether that is due to the costs of information or to flat priors. We find that some subgroups, such as younger voters, have the smallest costs of information, while other subgroups, such as religious voters, do not learn because they have tight priors to begin with. I then used the model to study a widely used policy: pre-election blackouts. The theoretical and empirical results illustrate that a policy that might be deemed to promote fairness and equal amounts of information might actually lead to welfare losses and unequal beliefs. By banning information from voters who still wished to learn more, this generates welfare losses of around 0.7-2.2% in this empirical setting.

Addressing the issue raised in the Introduction, this model illustrates how it is possible that randomized controlled trials that find no effects of information on voters (e.g. [101]) can coexist with learning by voters. This is due to the timing dimension of voters’ decisions. If information in these experiments is introduced late in the campaign, such as a week before election day, most voters would have already decided. Hence, we should find an average effect of information on voting decisions that is close to zero. Nonetheless, the subgroup of voters still deciding may still be acquiring and using new information. The timing of decisions, therefore, matters both for campaigns and for our interpretation of their importance.
A Bayesian model of learning is able to match the two cases of positive impacts of information documented in that study: (i) when information is introduced months before election day, so that a sizeable set of still-undecided voters uses that information, with the effect decaying as additional signals arrive later on, and (ii) when campaigns target persuadable voters later in the campaign.

In future work, I would like to address some unexplored issues in the model. First, it is likely that voters receive information and learn from neighbours, social networks and friends (such as in [85] and [95]), leading to non-standard convergence of beliefs. Without data on these connections, it is hard to incorporate this into an empirical model, especially when trying to identify correlations of signals between groups of individuals. Second, the model takes a reduced form approach to the supply side of information. We do not model politicians’ incentives to supply information. The empirical results suggest that parties should have different messaging incentives at different points of the campaign, depending on the subset of voters still deciding. This is found empirically, as $\sigma^2$ seems to vary across groups of voters and across time. Although one can interpret the information structure available to voters as the outcome of some game between political parties that send messages out to voters, modelling it explicitly is a promising avenue for future research.

Bibliography

[1] The causal effect of media-driven political interest on political attitudes and behavior. Quarterly Journal of Political Science, 5, 2011. → page 122
[2] ACE. 2013. URL https://aceproject.org/ace-en/topics/me/mea/mec03b. → pages 121, 146
[3] D. Acemoglu. Politics and economics in weak and strong states. Journal of Monetary Economics, 52(7):1199–1226, 2005. → page 73
[4] D. Acemoglu, C. García-Jimeno, and J. A. Robinson. State capacity and economic development: A network approach. American Economic Review, 105:2364–2409, 2015.
→ pages 45, 73, 76, 77, 78
[5] C. H. Achen. Social psychology, demographic variables, and linear regression: Breaking the iron triangle in voting research. Political Behavior, 14(3):195–211, 1992. → pages 119, 124
[6] D. Aksoy. Elections and the timing of terrorist attacks. The Journal of Politics, 76(04):899–913, 2014. → pages 136, 233
[7] J. H. Aldrich. Why parties?: The origin and transformation of political parties in America. University of Chicago Press, 1995. → page 82
[8] E. Alemán and E. Calvo. Explaining policy ties in presidential congresses: A network analysis of bill initiation data. Political Studies, 61(2):356–377, 2013. → page 5
[9] S. N. Ali, J. K. Goeree, N. Kartik, and T. R. Palfrey. Information aggregation in standing and ad hoc committees. The American Economic Review, 98(2):181–186, 2008. → page 124
[10] W. D. Anderson, J. M. Box-Steffensmeier, and V. Sinclair-Chapman. The keys to legislative success in the us house of representatives. Legislative Studies Quarterly, 28(3):357–386, 2003. → pages 6, 15, 19
[11] L. Anselin. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, The Netherlands, 1988. → page 45
[12] S. Ansolabehere, J. M. Snyder, and C. Stewart III. The effects of party and preferences on congressional roll-call voting. Legislative Studies Quarterly, pages 533–572, 2001. → page 86
[13] A. Aradillas-Lopez and E. Tamer. The identification power of equilibrium in simple games. Journal of Business and Economic Statistics, 2008. → page 46
[14] A. Arian and M. Shamir. The elections in Israel 2006, volume 1. Transaction Publishers, 2008. → pages 132, 233
[15] A. Badev. Discrete games in endogenous networks: Theory and policy. mimeo: University of Pennsylvania, 2013. → page 5
[16] P. Ban, D. J. Moskowitz, and J. James M. Snyder. The changing relative power of party leaders in congress. mimeo, 2016. → page 81
[17] M. A. Bargsted and O. Kedar. Coalition-targeted duvergerian voting: how expectations affect voter choice under proportional representation.
American Journal of Political Science, 53(2):307–323, 2009. → page 132
[18] L. M. Bartels. Messages received: the political impact of media exposure. American Political Science Review, 87(02):267–285, 1993. → page 10
[19] L. M. Bartels. Messages received: the political impact of media exposure. American Political Science Review, 87(02):267–285, 1993. → page 124
[20] L. M. Bartels and S. Jackman. A generational model of political learning. Electoral Studies, 33:7–18, 2014. → page 122
[21] D. A. Bateman, J. Clinton, and J. S. Lapinski. A house divided? political conflict and polarization in the u.s. congress, 1877-2011. American Journal of Political Science, 61(3):698–714, 2017. → pages 81, 85
[22] P. F. Berelson, Bernard; Lazarsfeld and W. MacPhee. Voting - A Study of Opinion Formation in a Presidential Campaign. Chicago: University of Chicago Press, 1954. → page 119
[23] C. Berrebi and E. F. Klor. Are voters sensitive to terrorism? direct evidence from the israeli electorate. American Political Science Review, 102(03):279–301, 2008. → page 137
[24] T. Besley and T. Persson. The origins of state capacity: property rights, taxation and politics. American Economic Review, 99(4):1218–1244, 2009. → page 73
[25] L. E. Blume, W. A. Brock, S. N. Durlauf, and Y. M. Ioannides. Identification of social interactions. In A. B. Benhabib, J. and M. Jackson, editors, Handbook of Social Economics, volume 1B, pages 853–964. Elsevier, 2010. → page 42
[26] L. E. Blume, W. A. Brock, S. N. Durlauf, and R. Jayaraman. Linear social interactions models. Journal of Political Economy, 123:444–496, 2015. → pages 45, 46, 47, 48
[27] R. Blundell and J. M. Robin. Estimation in large and disaggregated demand systems: An estimator for conditionally linear systems. Journal of Applied Econometrics, 14:209–232, 1999. → page 64
[28] A. Bonica. Mapping the ideological market place. American Journal of Political Science, 58(2):367–386, 2014. → page 85
[29] L. Bowen.
Time of voting decision and use of political advertising: The slade gorton-brock adams senatorial campaign. Journalism & Mass Communication Quarterly, 71(3):665–675, 1994. → page 121
[30] Y. Bramoullé, H. Djebbari, and B. Fortin. Identification of peer effects through social networks. Journal of Econometrics, 150:41–55, 2009. → pages 45, 46, 48, 68
[31] K. A. Bratton and S. M. Rouse. Networks in the legislative arena: How group dynamics affect cosponsorship. Legislative Studies Quarterly, 36(3):423–460, 2011. → page 5
[32] E. Breza, A. G. Chandrasekhar, and A. Tahbaz-Salehi. Seeing the forest for the trees? an investigation of network knowledge. Working Paper, 2018. → page 43
[33] W. A. Brock and S. N. Durlauf. Discrete choice with social interaction. Review of Economic Studies, 68:235–260, 2001. → page 68
[34] B. Brox and J. Giammo. Late deciders in us presidential elections. American Review of Politics, 30:333–55, 2009. → pages 120, 121, 132
[35] B. C. Burden and T. M. Frisby. Preferences, partisanship, and whip activity in the us house of representatives. Legislative Studies Quarterly, 29(4):569–590, 2004. → page 83
[36] A. Cabrales, A. Calvó-Armengol, and Y. Zenou. Social interactions and spillovers. Games and Economic Behavior, 72(2):339–360, 2011. → pages 3, 5, 8, 12, 13, 14, 34, 38
[37] B. Caillaud and J. Tirole. Parties as political intermediaries. The Quarterly Journal of Economics, 117(4):1453–1489, 2002. → page 82
[38] A. Calvó-Armengol, E. Pattacchini, and Y. Zenou. Peer effects and social networks in education. Review of Economic Studies, 76:1239–1267, 2009. → page 45
[39] C. Camerer. Behavioral Game Theory. Princeton University Press, New York, 2003. → page 53
[40] A. C. Cameron, J. B. Gelbach, and D. L. Miller. Robust inference with multiway clustering. Journal of Business and Economic Statistics, 29:238–249, 2011. → page 70
[41] A. C. Cameron, J. B. Gelbach, and D. L. Miller. Robust inference with multiway clustering.
Journal of Business & Economic Statistics, 29(2):238–249, 2011. → page 177
[42] T. M. Carsey and G. C. Layman. Changing sides or changing minds? party identification and policy preferences in the american electorate. American Journal of Political Science, 50(2):464–477, 2006. → page 122
[43] CBC. 2015. URL http://www.cbc.ca/news/canada/british-columbia/canada-election-2015-bc-results-blackout-1.3278157. → page 147
[44] S. H. Chaffee and S. Y. Choe. Time of decision and media use during the ford-carter campaign. Public Opinion Quarterly, 44(1):53–69, 1980. → pages 121, 130, 132
[45] S. H. Chaffee and R. N. Rimal. Time of vote decision and openness to persuasion. Political persuasion and attitude change, pages 267–291, 1996. → page 133
[46] G. Charness and D. Levin. The origin of the winner’s curse: a laboratory study. American Economic Journal: Microeconomics, 1(1):207–236, 2009. → page 124
[47] C.-F. Chiang and B. Knight. Media bias and influence: Evidence from newspaper endorsements. The Review of Economic Studies, page rdq037, 2011. → page 121
[48] W. K. T. Cho and J. H. Fowler. Legislative success in a small world: Social network analysis and the dynamics of congressional legislation. The Journal of Politics, 72(01):124–135, 2010. → page 5
[49] J. Clinton, S. Jackman, and D. Rivers. The statistical analysis of roll call data. American Political Science Review, 98(2):355–370, 2004. → pages 85, 86
[50] L. Cohen and C. J. Malloy. Friends in high places. American Economic Journal: Economic Policy, 6(3):63–91, 2014. → page 6
[51] T. G. Conley. GMM estimation with cross sectional dependence. Journal of Econometrics, 92:1–45, 1999. → page 45
[52] G. W. Cox and M. D. McCubbins. Legislative Leviathan: Party Government in the House, volume 23. Univ of California Press, 1993. → page 82
[53] G. W. Cox and M. D. McCubbins. Setting the agenda: Responsible party government in the US House of Representatives. Cambridge University Press, 2005. → pages 81, 82, 105
[54] E. Dal Bó and M. Tervio.
Legislative responsiveness: An empirical study. mimeo, 2007. → page 81
[55] G. De Giorgi, M. Pellizzari, and S. Redaelli. Identification of social interactions through partially overlapping peer groups. American Economic Journal: Applied Economics, 2, 2010. → pages 45, 68
[56] E. B. De Mesquita. Strategic and nonpolicy voting: a coalitional analysis of israeli electoral reform. Comparative Politics, pages 63–80, 2000. → page 233
[57] A. De Paula. Econometric analysis of games with multiple equilibria. Annual Review of Economics, 5(1):107–131, 2013. → page 5
[58] A. Degan. Policy positions, information acquisition and turnout. The Scandinavian Journal of Economics, 108(4):669–682, 2006. → pages 121, 125
[59] M. H. DeGroot. Optimal statistical decisions, volume 82. John Wiley & Sons, 2005. → pages 127, 128, 229
[60] G. Deltas, H. Herrera, and M. K. Polborn. Learning and coordination in the presidential primary system. The Review of Economic Studies, 83(4):1544–1578, 2016. → pages 119, 121
[61] L. C. Dodd. Expanded roles of the house democratic whip system - 93rd and 94th congresses. In Congressional Studies - A Journal of the Congress, volume 7 (1), pages 27–56. US Capitol Historical Society, 200 Maryland AVE NE, Washington, DC 20515, 1979. → page 83
[62] S. N. Durlauf and Y. M. Ioannides. Social interactions. Annual Review of Economics, 2, 2010. → page 42
[63] S. N. Durlauf and H. Tanaka. Understanding regression versus variance tests for social interactions. Economic Inquiry, 46, 2008. → page 68
[64] H. Eraslan and X. Tang. Identification and estimation of large network games with private link information. Working Paper, 2017. → page 46
[65] C. L. Evans. Congressional whip count database. In College of William and Mary, mimeo (Online), 2012. → pages 83, 97
[66] C. L. Evans and C. E. Grandy. The whip system of congress. In Congress Reconsidered, volume 9. CQ Press Washington, DC, 2009. → page 83
[67] T. Feddersen and A. Sandroni. A theory of participation in elections.
The American Economic Review, 96(4):1271–1282, 2006. → page 125
[68] T. J. Feddersen and W. Pesendorfer. The swing voter’s curse. The American Economic Review, pages 408–424, 1996. → pages 119, 144
[69] M. P. Fiorina. The political parties have sorted. Hoover Institution Essay on Contemporary Politics, Series No. 3, 2017. → pages 4, 18, 32
[70] R. Fisman, N. Harmon, and E. Kamenica. Peer effects in legislative voting. Working Paper, Boston University, 2017. → page 6
[71] R. Forgette. Party caucuses and coordination: Assessing caucus activity and party effects. Legislative Studies Quarterly, 29(3):407–430, 2004. → page 82
[72] P. Fournier, R. Nadeau, A. Blais, E. Gidengil, and N. Nevitte. Validation of time-of-voting-decision recall. The Public Opinion Quarterly, 65(1):95–107, 2001. → page 133
[73] P. Fournier, R. Nadeau, A. Blais, E. Gidengil, and N. Nevitte. Time-of-voting decision and susceptibility to campaign effects. Electoral Studies, 23(4):661–681, 2004. → pages 121, 132, 146
[74] J. H. Fowler. Connecting the congress: A study of cosponsorship networks. Political Analysis, 14(4):456–487, 2006. → pages 5, 8, 17, 32
[75] N. Gennaioli and I. Rainer. The modern impact of precolonial centralization in africa. Journal of Economic Growth, 12(3):185–234, 2007. → page 73
[76] N. Gennaioli and H.-J. Voth. State capacity and military conflict. Review of Economic Studies, 82(4):1409–1448, 2015. → page 73
[77] M. Gentzkow. Polarization in 2016. Essay, Stanford University, 2017. → page 5
[78] M. Gentzkow and J. Shapiro. Congressional record for 104th-110th congresses: Text and phrase counts. ICPSR33501-v4, 2015. URL http://doi.org/10.3886/ICPSR33501.v4. → pages 5, 19, 23
[79] M. Gentzkow, J. M. Shapiro, and M. Taddy. Measuring polarization in high-dimensional data: Method and application to congressional speech. NBER Working Paper No. 22423, 2017. → page 85
[80] A. Gerber and D. P. Green. Rational learning and partisan attitudes. American journal of political science, pages 794–818, 1998.
→ pages 119, 124
[81] A. Goldfarb and M. Xiao. Who thinks about the competition? managerial ability and strategic entry in us local telephone markets. American Economic Review, 101:3130–3161, 2011. → page 46
[82] P. Goldsmith-Pinkham and G. Imbens. Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31(3):253–264, 2013. → page 5
[83] P. Goldsmith-Pinkham and G. Imbens. Rejoinder: Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31(3):279–281, 2013. → page 5
[84] P. Goldsmith-Pinkham and G. W. Imbens. Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31(3):253–264, 2013. → page 45
[85] B. Golub and M. O. Jackson. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2(1):112–149, 2010. → page 149
[86] J. D. Gopoian and S. Hadjiharalambous. Late-deciding voters in presidential elections. Political Behavior, 16(1):55–78, 1994. → pages 121, 129
[87] E. D. Gould and E. F. Klor. Does terrorism work? The Quarterly Journal of Economics, 125(4):1459–1510, 2010. → pages 132, 136, 137
[88] C. Gourieroux, A. Monfort, and A. Trognon. Pseudo maximum likelihood methods: Theory. Econometrica, 52(3):681–700, 1984. → page 104
[89] B. S. Graham. An econometric model of link formation with degree heterogeneity. Econometrica, 85:1033–1063, 2017. → page 50
[90] T. Groseclose and C. Stewart, III. The value of committee seats in the house, 1947-91. American Journal of Political Science, 42(2):453–474, 1998. → page 20
[91] S. Guarnaschelli, R. D. McKelvey, and T. R. Palfrey. An experimental study of jury decision rules. American Political Science Review, 94(02):407–423, 2000. → page 124
[92] J. J. Heckman and J. M. Snyder. Linear probability models of the demand for attributes with an empirical application to estimating the preferences of legislators. The RAND Journal of Economics, 28, 1997. → page 85
[93] S. I. M.
Hwang. A robust redesign of high school match. Working Paper, 2016. → page 46
[94] M. O. Jackson. A survey of models of network formation: Stability and efficiency. Group Formation in Economics; Networks, Clubs and Coalitions, ed. G Demange, M Wooders. Cambridge, UK: Cambridge Univ. Press, 2005. → page 5
[95] M. O. Jackson. Social and Economic Networks. Princeton University Press, Princeton, NJ, USA, 2008. ISBN 0691134405, 9780691134406. → pages 5, 149
[96] M. O. Jackson. Comment: Unraveling peers and peer effects (comments on goldsmith-pinkham and imbens’ “social networks and the identification of peer effects”). Journal of Business and Economic Statistics, 31(3):270–273, 2013. → page 5
[97] M. O. Jackson. The friendship paradox and systematic biases in perceptions and social norms. http://ssrn.com/abstract=2780003, 2016. → page 5
[98] M. O. Jackson and Y. Zenou. Games on networks. volume 4 of Handbook of Game Theory with Economic Applications, pages 95–163. Elsevier, 2015. → page 5
[99] J. A. Jenkins. Examining the robustness of ideological voting: evidence from the confederate house of representatives. American Journal of Political Science, pages 811–822, 2000. → page 86
[100] I. Johnsson and R. H. Moon. Estimation of peer effects in endogenous social networks: control function approach. Working Paper, 2016. → page 45
[101] J. L. Kalla and D. E. Broockman. The minimal persuasive effects of campaign contact in general elections: Evidence from 49 field experiments. American Political Science Review, pages 1–19, 2017. → pages 119, 121, 129, 149
[102] H. H. Kelejian and D. Robinson. A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science, 72:297–312, 1993. → page 68
[103] C. Kendall, T. Nannicini, and F. Trebbi. How do voters respond to information? evidence from a randomized campaign. The American Economic Review, 105(1):322–353, 2014. → pages 119, 124
[104] D. Kessler and K.
Krehbiel. Dynamics of cosponsorship. American Political Science Review, 90(03):555–566, 1996. → page 6
[105] A. Kibris. Funerals and elections: The effects of terrorism on voting behavior in Turkey. Journal of Conflict Resolution, pages 220–247, 2010. → page 137
[106] J. H. Kirkland. The relational determinants of legislative outcomes: Strong and weak ties between legislators. The Journal of Politics, 73(03):887–898, 2011. → page 6
[107] H. Kitschelt. Democratic Accountability and Linkages Project. Durham, NC: Duke University, 2013. → pages 139, 140
[108] B. Knight and N. Schiff. Momentum and social learning in presidential primaries. Journal of Political Economy, 118(6):1110–1150, 2010. → pages 119, 121, 147
[109] G. Koger. Position taking and cosponsorship in the US House. Legislative Studies Quarterly, 28(2):225–246, 2003. → page 5
[110] E. D. Kolaczyk. Statistical Analysis of Network Data. Springer Verlag, New York, 2009. → page 61
[111] K. Krehbiel. Where's the party? British Journal of Political Science, 23(2):235–266, 1993. → pages 84, 103
[112] K. Krehbiel. Paradoxes of parties in Congress. Legislative Studies Quarterly, pages 31–64, 1999. → page 84
[113] K. Krehbiel. Party discipline and measures of partisanship. American Journal of Political Science, pages 212–227, 2000. → pages 81, 84
[114] P. F. Lazarsfeld, B. Berelson, and H. Gaudet. The people's choice: how the voter makes up his mind in a presidential campaign. New York, Columbia University Press, 1944. → page 119
[115] D. Lazer. Networks in political science: Back to the future. PS: Political Science & Politics, 44(01):61–68, 2011. → page 3
[116] D. Lee, E. Moretti, and M. Butler. Do voters affect or elect policies? evidence from the U.S. House. Quarterly Journal of Economics, 119(3):807–860, 2004. → pages 81, 87
[117] S. D. Levitt. How do senators vote? disentangling the role of voter preferences, party affiliation, and senator ideology. The American Economic Review, 86(3):425–441, 1996. → page 82
[118] G. Loewenstein, T.
O'Donohue, and M. Rabin. Projection bias in predicting future utility. Quarterly Journal of Economics, 118:1209–1248, 2003. → page 44
[119] K. Madarász. Information projection: Model and applications. Review of Economic Studies, 79, 2012. → page 44
[120] C. F. Manski. Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3):531–542, 1993. → page 45
[121] C. F. Manski. Comment: Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31:3:273–275, 2013. → page 5
[122] D. R. Mayhew. Congress: The electoral connection. Yale University Press, 1974. → page 11
[123] D. R. Mayhew. Divided we govern: Party control, lawmaking and investigations, 1946-2002. Yale University Press, 2005. → page 5
[124] I. McAllister. Rational or capricious? late deciding voters in Australia, Britain and the United States. Do political campaigns matter, pages 22–40, 2002. → page 121
[125] M. McBride, G. Milante, and S. Skaperdas. Peace and war with endogenous state capacity. Journal of Conflict Resolution, 55(3):446–468, 2011. → page 73
[126] N. McCarty. Polarization, congressional dysfunction, and constitutional change symposium. Indiana Law Review, 50:223, 2016-2017. → pages 81, 85
[127] N. McCarty, K. T. Poole, and H. Rosenthal. The hunt for party discipline in Congress. American Political Science Review, 95(3):673–687, 2001. → page 85
[128] N. McCarty, K. Poole, and H. Rosenthal. Polarized America. MIT Press, 2006. → pages 4, 32
[129] N. McCarty, K. T. Poole, and H. Rosenthal. Polarized America: The Dance of Ideology and Unequal Riches. Cambridge: MIT Press, 2006. → pages 81, 82, 85
[130] K. M. McGraw. Contributions of the cognitive approach to political psychology. Political Psychology, 21(4):805–832, 2000. → page 124
[131] S. R. Meinke. Who whips? party government and the House extended whip networks. American Politics Research, 36(5):639–668, 2008. → page 98
[132] K. Menzel. Inference for games with many players.
Review of Economic Studies, 83:306–337, 2016. → page 45
[133] A. Mian, A. Sufi, and F. Trebbi. The political economy of the US mortgage default crisis. American Economic Review, 100(5), December 2010. → page 82
[134] A. Mian, A. Sufi, and F. Trebbi. The political economy of the US mortgage default crisis. The American Economic Review, 100(5):1967–1998, 2010. → page 39
[135] A. Mian, A. Sufi, and F. Trebbi. Resolving debt overhang: Political constraints in the aftermath of financial crises. American Economic Journal: Macroeconomics, 6(2):1–28, 04 2014. → page 85
[136] S. Michalopoulos and E. Papaioannou. Pre-colonial ethnic institutions and contemporary African development. Econometrica, 81(1):113–152, 2013. → page 73
[137] C. Migdalovitz. Israel: Background and Relations with the United States. Congressional Research Service, 2010. → page 234
[138] W. Minozzi and C. Volden. Who heeds the call of the party in Congress? The Journal of Politics, 75(3):787–802, 2013. → page 85
[139] D. J. Moskowitz, J. Rogowski, and J. M. Snyder Jr. Parsing party polarization. mimeo, 2017. → pages 81, 85, 86, 87
[140] S. Mullainathan and E. Washington. Sticking with your vote: Cognitive dissonance and political attitudes. American Economic Journal: Applied Economics, 1(1):86–111, 2009. → page 122
[141] S. Narayanan and P. Manchanda. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science, 28(3):424–441, 2009. → page 121
[142] T. P. Nokken. Dynamics of congressional loyalty: Party defection and roll-call behavior, 1947-97. Legislative Studies Quarterly, pages 417–444, 2000. → page 86
[143] P. Ortoleva and E. Snowberg. Overconfidence in political behavior. The American Economic Review, 105(2):504–535, 2015. → pages 122, 125
[144] T. R. Palfrey and K. T. Poole. The relationship between information, ideology, and voting behavior. American Journal of Political Science, pages 511–530, 1987. → page 130
[145] M. Penrose. Random Geometric Graphs.
Oxford University Press, New York, USA, 2003. → page 203
[146] T. Piketty and E. Saez. Income inequality in the United States 1913–1998. Quarterly Journal of Economics, 118(1):1–39, 2003. → page 85
[147] E. Plumb. Validation of voter recall: Time of electoral decision making. Political Behavior, 8(4):302–312, 1986. → page 133
[148] K. T. Poole and H. Rosenthal. The polarization of American politics. Journal of Politics, 46(4):1061–1079, 1984. → page 85
[149] K. T. Poole and H. Rosenthal. Congress: A Political-Economic History of Roll Call Voting. New York: Oxford University Press, 1997. → pages 85, 97, 98
[150] K. T. Poole and H. Rosenthal. D-NOMINATE after 10 years: A comparative update to Congress: A political-economic history of roll-call voting. Legislative Studies Quarterly, pages 5–29, 2001. → pages 82, 84, 97
[151] R. Powell. Monopolizing violence and consolidating power. Quarterly Journal of Economics, 128(2):807–859, 2013. → page 73
[152] R. B. Ripley. The party whip organizations in the United States House of Representatives. American Political Science Review, 58(3):561–576, 1964. → page 83
[153] D. W. Rohde. Parties and Leaders in the Postreform House. The University of Chicago Press, 1991. → page 107
[154] G. C. Routt. Interpersonal relationships and the legislative process. The Annals of the American Academy of Political and Social Science, 195(1):129–136, 1938. → page 3
[155] R. Schmitt-Beck and J. Partheymüller. Why voters decide late: A simultaneous test of old and new hypotheses at the 2005 and 2009 German federal elections. German Politics, 21(3):299–316, 2012. → page 133
[156] J. Schneider. One-minute speeches: Current House practices. CRS Report, RL30135, 2015. → page 19
[157] G. R. Shorack. Probability for Statistics. Springer, New York, 2000. → page 195
[158] J. M. Snyder Jr and T. Groseclose. Estimating party influence in congressional roll-call voting. American Journal of Political Science, pages 193–211, 2000. → pages 84, 85
[159] F. Sobbrio.
Citizen-editors' endogenous information acquisition and news accuracy. Journal of Public Economics, 113:43–53, 2014. → page 121
[160] K. Song. Econometric inference on large Bayesian games with heterogeneous beliefs. arXiv:1404.2015 [stat.AP], 2014. → page 45
[161] C. Stewart, III. The value of committee assignments in Congress since 1994. MIT Political Science Department Research Paper 2012-7. Available at SSRN: https://ssrn.com/abstract=2035632, (2):453–474, 2012. → pages 20, 21
[162] C. Stewart, III and J. Woon. Congressional committee assignments, 103rd to 114th Congresses, 1993–2017: House, August 12, 2016. Data Archive, 2016. → page 20
[163] The Los Angeles Times. 2005. URL http://articles.latimes.com/2005/nov/10/world/fg-izlabor10. → page 234
[164] S. M. Theriault. Party Polarization in Congress. New York: Cambridge University Press, 2008. → page 81
[165] K. E. Train. Discrete choice methods with simulation. Cambridge University Press, 2009. → page 138
[166] F. Trebbi and E. Weese. Insurgency and small wars: Estimation of unobserved coalition structures. Working Paper, 2016. → page 16
[167] L. Van Boven, G. Loewenstein, and D. Dunning. Mispredicting the endowment effect: Underestimation of owners' selling prices by buyer's agents. Journal of Economic Behavior and Organization, 51:351–365, 2003. → page 44
[168] Y. X. Wang and Y. J. Zhang. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 25(6):1336–1353, 2013. → page 20
[169] D. C. Whitney and S. B. Goldman. Media use and time of vote decision a study of the 1980 presidential election. Communication Research, 12(4):511–529, 1985. → pages 121, 130, 132
[170] R. K. Wilson and C. D. Young. Cosponsorship in the U.S. Congress. Legislative Studies Quarterly, 22(1):25–43, 1997. → page 6
[171] J. M. Wooldridge. Econometric analysis of cross section and panel data. MIT Press, 2010. → page 104
[172] H. Xu. Social interactions in large networks: A game theoretic approach.
Working Paper, 2015. → page 45
[173] X. Xu and L. Lee. Estimation of a binary choice game model with network links. Working Paper, 2015. → page 45
[174] C. Yang and L.-F. Lee. Social interactions under incomplete information with heterogeneous expectations. Journal of Econometrics, Forthcoming, 2016. → page 45
[175] Y. Zhang, A. Friend, A. L. Traud, M. A. Porter, J. H. Fowler, and P. J. Mucha. Community structure in congressional cosponsorship networks. Physica A: Statistical Mechanics and its Applications, 387(7):1705–1712, 2008. → page 5

Appendix A

Appendix to Chapter 2

A.1 Proofs

Lemma A.1.1.
\[ \sum_{j \neq i} s_i s_j m_{ij}(s) = s_i. \] (A.1)

Proof.
\[
\begin{aligned}
\sum_{j \neq i} s_i s_j m_{ij}(s)
&= \sum_{j \neq i,\, j \in P(i)} s_i s_j m_{ij}(s) + \sum_{j \neq i,\, j \notin P(i)} s_i s_j m_{ij}(s) \\
&= \sum_{j \neq i,\, j \in P(i)} s_i s_j \left( \frac{p(i)p(j)}{\sum_{k \in P(i),\, k \neq i} p(k) s_k} + \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) s_k} \right) \\
&\quad + \sum_{j \neq i,\, j \notin P(i)} s_i s_j \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) s_k} \\
&= s_i \left( p(i) \frac{\sum_{j \neq i,\, j \in P(i)} p(j) s_j}{\sum_{k \neq i,\, k \in P(i)} p(k) s_k} + (1-p(i)) \frac{\sum_{j \neq i,\, j \in P(i)} (1-p(j)) s_j}{\sum_{k \neq i} (1-p(k)) s_k} \right) \\
&\quad + s_i (1-p(i)) \frac{\sum_{j \neq i,\, j \notin P(i)} (1-p(j)) s_j}{\sum_{k \neq i} (1-p(k)) s_k} \\
&= s_i \left( p(i) + (1-p(i)) \frac{\sum_{j \neq i,\, j \in P(i)} (1-p(j)) s_j + \sum_{j \neq i,\, j \notin P(i)} (1-p(j)) s_j}{\sum_{k \neq i} (1-p(k)) s_k} \right) \\
&= s_i \left( p(i) + (1-p(i)) \frac{\sum_{j \neq i} (1-p(j)) s_j}{\sum_{k \neq i} (1-p(k)) s_k} \right) \\
&= s_i.
\end{aligned}
\]

Proposition 2.2.1: The limit equilibrium is defined by equations (2.22)-(2.24).

Proof of Proposition 2.2.1. Recall that we have from equations (2.17) and (2.16), from the First Order Conditions, that:
\[ c = \frac{\alpha_i}{x_i^*} + \frac{s_i^{*2}}{x_i^{*2}}, \] (A.2)
and
\[ \frac{s_i^*}{x_i^*} = \phi \sum_{j \neq i} s_j^* m_{ij}(s^*) x_j^*. \] (A.3)
We also use that:
\[ x_i^* = \alpha_i X_{P(i)} \] (A.4)
\[ s_i^* = \alpha_i S_{P(i)}, \] (A.5)
for some $X_{P(i)}, S_{P(i)}$, which comes from the fact that $s_i^*/x_i^*$ and $x_i^*/\alpha_i$ are the same for all agents within a party.
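Let $P(i) \in \{1,2\}$ be arbitrary.

Using (A.4) and (A.5) in (2.17) implies:
\[ c = \frac{\alpha_i}{x_i^*} + \frac{s_i^{*2}}{x_i^{*2}} = \frac{\alpha_i}{\alpha_i X_{P(i)}} + \frac{\alpha_i^2 S_{P(i)}^2}{\alpha_i^2 X_{P(i)}^2} = \frac{1}{X_{P(i)}} + \frac{S_{P(i)}^2}{X_{P(i)}^2}. \]
Multiplying both sides by $X_{P(i)}^2$ yields:
\[ c X_{P(i)}^2 = X_{P(i)} + S_{P(i)}^2, \] (A.6)
which is (2.24).

Let us now substitute (A.4) and (A.5) in (2.16):
\[
\begin{aligned}
\frac{\alpha_i S_{P(i)}}{\alpha_i X_{P(i)}} &= \phi \sum_{j \neq i} \alpha_j S_{P(j)} m_{ij}(s^*) \alpha_j X_{P(j)} \\
\frac{S_{P(i)}}{X_{P(i)}} &= \phi \sum_{j \neq i} \alpha_j^2 X_{P(j)} S_{P(j)} m_{ij}(s^*) \\
&= \phi \sum_{j \neq i} \alpha_j^2 X_{P(j)} S_{P(j)} \left( \frac{p(i)p(j)}{\sum_{k \in P(i),\, k \neq i} p(k) s_k^*} + \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) s_k^*} \right) I\{j \in P(i)\} \\
&\quad + \phi \sum_{j \neq i} \alpha_j^2 X_{P(j)} S_{P(j)} \left( \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) s_k^*} \right) I\{j \notin P(i)\}.
\end{aligned}
\]
Note that for the first two terms, $p(i) = p(j)$ because they are only summed when $j \in P(i)$. For the last, $p(i) \neq p(j)$ as it is summed when $j \notin P(i)$.

Rewriting the above with this implies:
\[
\begin{aligned}
\frac{S_{P(i)}}{X_{P(i)}} &= \phi \sum_{j \neq i} \alpha_j^2 X_{P(i)} S_{P(i)} \left( \frac{p(i)p(i)}{\sum_{k \in P(i),\, k \neq i} p(i) s_k^*} + \frac{(1-p(i))(1-p(i))}{\sum_{k \neq i} (1-p(k)) s_k^*} \right) I\{j \in P(i)\} \\
&\quad + \phi \sum_{j \neq i} \alpha_j^2 X_{P(j)} S_{P(j)} \left( \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) s_k^*} \right) I\{j \notin P(i)\}.
\end{aligned}
\]
Using that $s_k^* = \alpha_k S_{P(k)}$ leads to:
\[
\begin{aligned}
\frac{S_{P(i)}}{X_{P(i)}} &= \phi \sum_{j \neq i} \alpha_j^2 X_{P(i)} S_{P(i)} \left( \frac{p(i)^2}{p(i) \sum_{k \in P(i),\, k \neq i} \alpha_k S_{P(k)}} + \frac{(1-p(i))^2}{\sum_{k \neq i} (1-p(k)) \alpha_k S_{P(k)}} \right) I\{j \in P(i)\} \\
&\quad + \phi \sum_{j \neq i} \alpha_j^2 X_{P(j)} S_{P(j)} \left( \frac{(1-p(i))(1-p(j))}{\sum_{k \neq i} (1-p(k)) \alpha_k S_{P(k)}} \right) I\{j \notin P(i)\}.
\end{aligned}
\]
Let us focus on the case of $P(i) = 1$, as the other case is symmetric.
\[
\begin{aligned}
\frac{S_1}{X_1} &= \phi \sum_{j \neq i} \alpha_j^2 X_1 S_1 \left( \frac{p_1}{\sum_{k \in P(i),\, k \neq i} \alpha_k S_1} + \frac{(1-p_1)^2}{\sum_{k \neq i} (1-p(k)) \alpha_k S_{P(k)}} \right) I\{j \in P(i)\} \\
&\quad + \phi \sum_{j \neq i} \alpha_j^2 X_2 S_2 \left( \frac{(1-p_1)(1-p_2)}{\sum_{k \neq i} (1-p(k)) \alpha_k S_{P(k)}} \right) I\{j \notin P(i)\}.
\end{aligned}
\]
Finally, we use that:
\[
\begin{aligned}
\sum_{k \neq i} (1-p(k)) \alpha_k S_{P(k)} &= \sum_{k \neq i,\, k \in P(i)} (1-p(k)) \alpha_k S_{P(k)} + \sum_{k \neq i,\, k \notin P(i)} (1-p(k)) \alpha_k S_{P(k)} \\
&= \sum_{k \neq i,\, k \in P(i)} (1-p_1) \alpha_k S_1 + \sum_{k \neq i,\, k \notin P(i)} (1-p_2) \alpha_k S_2 \\
&= (1-p_1) S_1 \sum_{k \neq i,\, k \in P(i)} \alpha_k + (1-p_2) S_2 \sum_{k \neq i,\, k \notin P(i)} \alpha_k \\
&= (1-p_1) S_1 A_1 + (1-p_2) S_2 A_2.
\end{aligned}
\]
To finalize the calculations, we use the simplification above for the denominators of the second and third terms.

Note that, in the main expression, only $\alpha_j$ now depends on the summation index $j$ itself. We also note that the indicators of $j \in P(i)$ for the first two terms, and of $j \notin P(i)$ for the last term, can now be absorbed into the ranges of the sums.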
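These observations lead to the final equation:
\[
\begin{aligned}
\frac{S_1}{X_1}
&= \phi X_1 S_1 \sum_{j \neq i} \alpha_j^2 \left( \frac{p_1}{S_1 \sum_{k \in P(i),\, k \neq i} \alpha_k} + \frac{(1-p_1)^2}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) I\{j \in P(i)\} \\
&\quad + X_2 S_2 \phi \sum_{j \neq i} \alpha_j^2 \left( \frac{(1-p_1)(1-p_2)}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) I\{j \notin P(i)\} \\
&= \phi X_1 S_1 \sum_{j \neq i,\, j \in P(i)} \alpha_j^2 \left( \frac{p_1}{S_1 A_1} + \frac{(1-p_1)^2}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) \\
&\quad + X_2 S_2 \phi \sum_{j \neq i,\, j \notin P(i)} \alpha_j^2 \left( \frac{(1-p_1)(1-p_2)}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) \\
&= \phi X_1 S_1 B_1 \left( \frac{p_1}{S_1 A_1} + \frac{(1-p_1)^2}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) + X_2 S_2 \phi B_2 \left( \frac{(1-p_1)(1-p_2)}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) \\
&= \phi \left( \frac{X_1 B_1 p_1}{A_1} + \frac{X_1 S_1 B_1 (1-p_1)^2}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} + \frac{X_2 S_2 B_2 (1-p_1)(1-p_2)}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2} \right) \\
&= \phi \left( \frac{p_1 X_1 B_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 X_1 + (1-p_1)(1-p_2) B_2 X_2 S_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right).
\end{aligned}
\]

Proof of Proposition 2.2.2. Recall that an interior equilibrium is a solution to (2.22) to (2.24).

So, rewriting these:
\[ S_1 = X_1 \phi \left( \frac{p_1 B_1 X_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 X_1 + (1-p_1)(1-p_2) B_2 S_2 X_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right). \] (A.7)
\[ S_2 = X_2 \phi \left( \frac{p_2 B_2 X_2}{A_2} + \frac{(1-p_2)^2 B_2 S_2 X_2 + (1-p_1)(1-p_2) B_1 S_1 X_1}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right). \] (A.8)
\[ c X_1^2 = X_1 + S_1^2, \qquad c X_2^2 = X_2 + S_2^2. \] (A.9)
Substituting (A.7) into (A.9) leads to
\[ c X_1^2 = X_1 + X_1^2 \phi^2 \left( \frac{p_1 B_1 X_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 X_1 + (1-p_1)(1-p_2) B_2 S_2 X_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right)^2, \] (A.10)
or
\[ c X_1 = 1 + X_1 \phi^2 \left( \frac{p_1 B_1 X_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 X_1 + (1-p_1)(1-p_2) B_2 S_2 X_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right)^2. \] (A.11)
There is a similar expression for $S_2, X_2$. Note that the right hand side of (A.11) lies above the left hand side as we approach $X_1 = 0$ (same for $X_2$). To have an interior solution, we need the right hand side to fall at or below the left hand side for some positive $X_1$.

Suppose that the equilibrium (when it exists) is such that $X_1 \geq X_2$; the other case is analogous, reversing subscripts everywhere. Then the right hand side is less than what we get by replacing $X_2$ by $X_1$, and so we want
\[ c X_1 \geq 1 + X_1^3 \phi^2 \left( \frac{p_1 B_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 + (1-p_1)(1-p_2) B_2 S_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2} \right)^2 \] (A.12)
for some interior $X_1$. Rewriting,
\[ c X_1 \geq 1 + X_1^3 \phi^2 \left( \frac{p_1 B_1}{A_1} + \frac{(1-p_1)^2 B_1 + (1-p_1)(1-p_2) B_2 \frac{S_2}{S_1}}{(1-p_1) A_1 + (1-p_2) A_2 \frac{S_2}{S_1}} \right)^2. \] (A.13)
The right hand side is maximized either at $S_2/S_1 = 0$ or $S_2/S_1 = \infty$, and so it is sufficient to have
\[ c X_1 \geq 1 + X_1^3 \phi^2 \left( \frac{p_1 B_1}{A_1} + (1-p_1) \max\left[ \frac{B_1}{A_1}, \frac{B_2}{A_2} \right] \right)^2. \]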
(A.14)

Let
\[ D_1 = \frac{p_1 B_1}{A_1} + (1-p_1) \max\left[ \frac{B_1}{A_1}, \frac{B_2}{A_2} \right]. \] (A.15)
Then (A.14) can be rewritten as
\[ c X_1 \geq 1 + X_1^3 \phi^2 D_1^2 \] (A.16)
for some positive $X_1$. Note that
\[ D_1 \leq D = \max\left[ \frac{B_1}{A_1}, \frac{B_2}{A_2} \right]. \] (A.17)
So, it is sufficient to have
\[ c X_1 \geq 1 + X_1^3 \phi^2 D^2 \] (A.18)
for some positive $X_1$.

It is necessary and sufficient to check this inequality at the tangency point, that is, the point at which the slope of the right hand side equals $c$. This happens at $X_1 = \sqrt{\frac{c}{3 \phi^2 D^2}}$, and the corresponding sufficient condition becomes:
\[ c \left( \frac{c}{3 \phi^2 D^2} \right)^{1/2} \geq 1 + \left( \frac{c}{3 \phi^2 D^2} \right)^{3/2} \phi^2 D^2, \] (A.19)
or
\[ \frac{2 c^{3/2}}{3 \sqrt{3}} \geq \phi D. \] (A.20)
This is the claimed expression.

Proof of Proposition 2.2.3. Assume by way of contradiction that there is a finite first best $\{s_i^{FB}, x_i^{FB}\}$. Consider the allocation $\{s_i', x_i'\} = \{\lambda s_i^{FB}, \lambda x_i^{FB}\}$, with $\lambda > 1$. The latter increases all politicians' utility at a cubic rate (from equation (2.13)), while the costs increase quadratically. Hence, $\{s_i', x_i'\}$ is feasible and yields a higher utility to all agents, which is a contradiction.

A.1.1 Best Response Dynamics

Best response dynamics are described as follows. Consider starting at some vectors $s^0, x^0$. Then the best response dynamics are described by:
\[ s_i^t = x_i^{t-1} \phi \sum_{j \neq i} m_{ij}(s^{t-1}) s_j^{t-1} x_j^{t-1}, \] (A.21)
and
\[ x_i^t = \frac{\alpha_i}{c} + s_i^{t-1} \frac{\phi}{c} \sum_{j \neq i} m_{ij}(s^{t-1}) s_j^{t-1} x_j^{t-1}. \] (A.22)
It follows that if $s^0 = 0$, then $m_{ij}(s^{t-1}) = 0$ for all $ij$ (recall Footnote 7) and we get immediate convergence to $s_i^t = 0$, $x_i^t = \frac{\alpha_i}{c}$ for all $t$. Otherwise, $s^t, x^t$ will be positive for all $t$.

To see how these best response dynamics work for a special case, let us consider the situation in which there is some $S^0, X^0$ such that $s_i^0 = \alpha_i S^0$ and $x_i^0 = \alpha_i X^0$ (which has to eventually hold at any limit point).¹

In that case, working with the limiting or continuum case, in which the matching function is symmetric within a party, and presuming that $S_k^{t-1} > 0$ for each party (which happens after the first period if some $s_j^0 > 0$; otherwise the solution is already described above), we end up with the following dynamics.
For party $k$ (letting $k'$ denote the other party):
\[ S_k^t = X_k^{t-1} \phi \left( m_{kk}(S^{t-1}) B_k S_k^{t-1} X_k^{t-1} + m_{kk'}(S^{t-1}) B_{k'} S_{k'}^{t-1} X_{k'}^{t-1} \right), \] (A.23)
and
\[ X_k^t = \frac{1}{c} + S_k^{t-1} \frac{\phi}{c} \left( m_{kk}(S^{t-1}) B_k S_k^{t-1} X_k^{t-1} + m_{kk'}(S^{t-1}) B_{k'} S_{k'}^{t-1} X_{k'}^{t-1} \right), \] (A.24)
where, with $S_1, S_2$ evaluated at period $t-1$,
\[ m_{kk}(S^{t-1}) = \frac{p_k}{S_k A_k} + \frac{(1-p_k)^2}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2}, \] (A.25)
and
\[ m_{kk'}(S^{t-1}) = \frac{(1-p_1)(1-p_2)}{(1-p_1) S_1 A_1 + (1-p_2) S_2 A_2}. \] (A.26)

¹ This is also useful in determining the instability of equilibria.

A.2 Parametric Identification, with set identification of φ

Recall from equation (2.28) that $\alpha_i = e^{z_i' \beta_{P(i)}}$, and that we observe proxies for $(s_i^*, x_i^*)$. We also have that $m_{ij} = \frac{(1-p_1)(1-p_2)}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2}$ if $i, j$ are from opposing parties and $m_{ij} = \frac{p(i)}{A_{P(i)} S_{P(i)}} + \frac{(1-p(i))^2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2}$ if they are from the same party. Let us denote $m_{ij} = m_{12}$ if $i, j$ belong to different parties, $m_{ij} = m_{11}$ if they both belong to party 1, and $m_{ij} = m_{22}$ if they both belong to party 2.

We now proceed with identification of the parameters of the model. Applying (2.28) and (2.26) in (2.21), for an arbitrary politician $i$ from party $P(i)$ we obtain:
\[ \tilde{s}_i e^{\lambda_i} = e^{z_i' \beta_{P(i)}} S_{P(i)}, \] (A.27)
and
\[ \log(\tilde{s}_i) = \log(S_{P(i)}) + z_i' \beta_{P(i)} - \lambda_i. \] (A.28)
Since $E\lambda_i = E z_i \lambda_i = 0$, we now have elementary moment conditions (like an OLS) to estimate $\alpha$.² The moment conditions (A.27) for each party identify the respective party specific parameters. To see this more clearly, one can use the moment equations just described to get:
\[ E\left( \log(\tilde{s}_i) - \log(S_{P(i)}) - z_i' \beta_{P(i)} \right) = 0 \] (A.29)
\[ E z_i \left( \log(\tilde{s}_i) - \log(S_{P(i)}) - z_i' \beta_{P(i)} \right) = 0. \] (A.30)
As long as $E[1, z_i][1, z_i]'$ is invertible, $\beta_{P(i)}$ and $\log(S_{P(i)})$ are identified. $\beta_{P(i)}$ is identified off the variation in $z_i$ across members of the same party. $S_{P(i)}$ is identified from the average proxy for effort within a party (the constant in the regression within a party).³

Similarly for $\tilde{x}_i$:
\[ \tilde{x}_i e^{v_i} = e^{z_i' \beta_{P(i)}} X_{P(i)} \]
\[ \log(\tilde{x}_i) = \log(X_{P(i)}) + z_i' \beta_{P(i)} - v_i, \] (A.31)
which can be written as another moment condition in terms of the i.i.d.
mean zero random variable $v_i$. $X_{P(i)}$ can be similarly identified for each party.

² Identification with nonparametric $\alpha$ is proved in a following Appendix.
³ Implicitly, we require that $z_i$ do not include the constant, as that cannot be separately identified from $\log(S_{P(i)})$ without further assumptions. The average (log) socializing is party-specific.

Since we know $\beta_{P(i)}, S_{P(i)}, X_{P(i)}$ for both parties, $c$ is identified from equation (2.24), where it is the only unknown.

However, $p_1, p_2$ cannot be uniquely identified from the system above. It is clear that $p_1, p_2, \phi$ show up only in the same 2 equations: (2.22) and (2.23). To identify $\phi$, we pursue a set identification approach. To do so, let us first notice that equations (2.22) - (2.23) can be rewritten as⁴:
\[ S_1 = \phi X_1 \left( B_1 S_1 X_1 m_{11} + B_2 S_2 X_2 m_{12} \right) \] (A.32)
\[ S_2 = \phi X_2 \left( B_2 S_2 X_2 m_{22} + B_1 S_1 X_1 m_{12} \right). \] (A.33)
To proceed with set identification of $\phi$, we calculate all triples $m_{11}, m_{12}, m_{22}$ that are consistent with some $(p_1, p_2) \in [0,1] \times [0,1]$. Hence, any $\phi$ that satisfies the above equations for some $(p_1, p_2) \in [0,1] \times [0,1]$ belongs to the identified set.⁵

A few comments about $m_{ij}(s^*)$ are in order. First, even if $m_{ij}(s^*)$ were recovered uniquely, that would not guarantee a unique identification of $p_1, p_2$. There might be multiple $p_1, p_2$ that yield the same meeting probabilities (usually a continuum of them, governed by a hyperbolic function). Second, $m_{ij}(s^*)$ is not a parameter of interest for us, since it is a normalization on the $g_{ij}$ function that governs the probability of linking.

Having identified all other parameters of the model, we can now prove the identification of the parameters $\gamma, \zeta$. To do so, we use the data on bill passage.

⁴ This follows from an alternative rewriting of the first order conditions presented in the model.
Namely, we use that:
\[
\begin{aligned}
\frac{S_{P(i)}}{X_{P(i)}} &= \phi \sum_{j \neq i} \alpha_j^2 S_{P(j)} X_{P(j)} m_{ij}(s^*) \\
&= \phi \left( \sum_{j \neq i,\, j \in P(i)} \alpha_j^2 S_{P(j)} X_{P(j)} m_{ij}(s^*) + \sum_{j \notin P(i)} \alpha_j^2 S_{P(j)} X_{P(j)} m_{ij}(s^*) \right) \\
&= \phi \left( S_{P(i)} X_{P(i)} m_{ij}(s^*) I_{i,j \in P(i)} \sum_{j \neq i,\, j \in P(i)} \alpha_j^2 + S_{P(-i)} X_{P(-i)} m_{12}(s^*) \sum_{j \notin P(i)} \alpha_j^2 \right).
\end{aligned}
\]
⁵ In Appendix, we also provide a solution to point identify $p_1, p_2$, but that relies on additional moment conditions, coming from the second moments of the error terms of the proxy variables $(\tilde{s}_i, \tilde{x}_i)$. However, such a solution requires additional structure outside of the theoretical model.

Identifying the components of φ: γ and ζ.

We recall that $\phi = \frac{\zeta \gamma}{m}$, where $\gamma$ was the scale parameter of the distribution of $\varepsilon_i$ (the shock on bill approval, such that the same politician could have different bills passing or not), $m$ was the institutional threshold for approval of a bill (i.e. the minimum amount of support needed for approval), and $\zeta$ was the return to the politician from the voters of having the bill approved. We further assumed that $m = 1$.

If we want to identify the components $\gamma$ and $\zeta$, we can use the probability of bill approval equation (2.9):
\[ P(y_i = 1) = \frac{\gamma}{m} \sum_{j \neq i} g_{ij}(s^*) x_i^* x_j^*. \] (A.34)
This, in turn, can be rewritten using the First Order Condition for $s_i$ (equation (A.3)) as:
\[
\begin{aligned}
P(y_i = 1) &= \frac{\gamma}{m} \sum_{j \neq i} g_{ij}(s^*) x_i^* x_j^* \\
&= \frac{\gamma}{m} s_i^* x_i^* \sum_{j \neq i} s_j^* x_j^* m_{ij}(s^*) \\
&= \frac{\gamma}{m} s_i^* x_i^* \left( \frac{s_i^*}{\phi x_i^*} \right) \\
&= \frac{\gamma}{m \phi} s_i^{*2} \\
&= \frac{1}{\zeta} s_i^{*2},
\end{aligned}
\] (A.35)
where $P(y_i = 1)$ is the probability that a bill from politician $i$ is approved.

Since $s_i^* = \tilde{s}_i e^{\lambda_i}$, we can rewrite (A.35) as:
\[ P(y_i = 1) = \frac{1}{\zeta} \tilde{s}_i^2 e^{2 \lambda_i}. \] (A.36)
Taking logs implies that:
\[ \log P(y_i = 1) = \log\left( \frac{1}{\zeta} \right) + \log(\tilde{s}_i^2) + 2 \lambda_i. \] (A.37)
Since $E\lambda_i = 0$, taking expectations on both sides means the only unknown is $\zeta$, which is identified.

$\zeta$ must also satisfy the following restriction, due to $\gamma$ being the scale parameter of the Pareto distributed $\varepsilon_i$⁶:
\[ \frac{m \phi}{s_i^{*2}} \geq \gamma, \quad \forall i \]
\[ \zeta \geq \min_i \alpha_i^2 S_{P(i)}^2 \]
Since $\phi$ had been previously (set) identified, and $m = 1$, then $\gamma = \frac{\phi}{\zeta}$ is (set) identified.

This completes the identification of the model.
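The set-identification step for φ in equations (A.32)-(A.33) can be illustrated numerically. The sketch below is not the procedure used in the thesis; it is a minimal grid search under stated assumptions. The function names (`meeting_probs`, `phi_identified_set`), the grid size, the relative tolerance, and the example input values are all hypothetical. The grid stops just short of p = 1 so that the common denominator (1 − p1)S1A1 + (1 − p2)S2A2 never vanishes. For each (p1, p2) it computes the meeting terms, backs out the φ implied by each of the two equations, and keeps the pairs where the two implied values agree.

```python
import numpy as np


def meeting_probs(p1, p2, S1, S2, A1, A2):
    """Meeting terms m11, m22, m12 as written in Appendix A.2."""
    denom = (1 - p1) * S1 * A1 + (1 - p2) * S2 * A2
    m11 = p1 / (S1 * A1) + (1 - p1) ** 2 / denom
    m22 = p2 / (S2 * A2) + (1 - p2) ** 2 / denom
    m12 = (1 - p1) * (1 - p2) / denom
    return m11, m22, m12


def phi_identified_set(S1, S2, X1, X2, A1, A2, B1, B2,
                       grid_size=200, rel_tol=1e-3):
    """Grid-search sketch of the identified set for phi.

    Returns (phi_min, phi_max) over the grid points where the phi
    implied by (A.32) and the phi implied by (A.33) agree up to
    rel_tol, or None if no grid point qualifies.
    """
    # Assumption: evaluate p on [0, 1) to avoid a zero denominator at p1 = p2 = 1.
    ps = np.linspace(0.0, 1.0, grid_size, endpoint=False)
    p1, p2 = np.meshgrid(ps, ps, indexing="ij")
    m11, m22, m12 = meeting_probs(p1, p2, S1, S2, A1, A2)

    # phi implied by (A.32) and by (A.33), respectively.
    phi_1 = S1 / (X1 * (B1 * S1 * X1 * m11 + B2 * S2 * X2 * m12))
    phi_2 = S2 / (X2 * (B2 * S2 * X2 * m22 + B1 * S1 * X1 * m12))

    agree = np.abs(phi_1 - phi_2) <= rel_tol * np.maximum(phi_1, phi_2)
    if not agree.any():
        return None
    phi = 0.5 * (phi_1[agree] + phi_2[agree])
    return float(phi.min()), float(phi.max())


if __name__ == "__main__":
    # Illustrative, fully symmetric inputs (not estimates from the thesis).
    print(phi_identified_set(S1=1.0, S2=1.0, X1=2.0, X2=2.0,
                             A1=10.0, A2=10.0, B1=5.0, B2=5.0))
```

With asymmetric inputs the two implied values of φ typically agree only along a curve in (p1, p2) space, which is what produces an interval rather than a point. The tolerance trades off grid coarseness against how tightly the two equations are required to match.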
Finally, we note the importance of the measurement errors for identification of this model.

A.2.1 The need for exponential measurement errors even when α is parametric.

The exponential form for the measurement errors is very useful for two main reasons. First, we bypass the truncation issue (having to guarantee that $\tilde{s}_i \geq 0$ for any $\lambda$). Second, it is very tractable (as seen in equation (A.27)).

One could think that measurement errors would not have to show up on $s_i^*$ and/or $x_i^*$. However, measurement errors on $s_i^*, x_i^*$ are needed even with $\alpha$ being parametric. To see this, consider dividing equation (2.21) by (2.20):
\[ \frac{s_i^*}{x_i^*} = \frac{S_{P(i)}}{X_{P(i)}}. \] (A.38)
The $\alpha$'s cancel out, and the model is rejected as (A.38) does not hold in the data (without randomness). Hence, there must be an error term in $s_i^*$ and $x_i^*$.

⁶ This, in turn, implies that the support for $\varepsilon_i$ is $[\gamma, \infty)$.

A.3 Rewriting the Model in terms of Moment conditions over i

In this section, we provide the derivation for transforming the model from the equations in Proposition 2.2.1 to the moment equations described in section 2.4. We begin by considering the equations derived in the Identification section.

We first note that one could stack up equations (A.27) and (A.31) across both parties, to get:
\[ \log(\tilde{s}_i) = \log(S_1) + (\log(S_2) - \log(S_1)) I_{i \in P_2} + z_i' \beta_1 + z_i' I_{i \in P_2} (\beta_2 - \beta_1) - \lambda_i, \]
\[ \log(\tilde{x}_i) = \log(X_1) + (\log(X_2) - \log(X_1)) I_{i \in P_2} + z_i' \beta_1 + z_i' I_{i \in P_2} (\beta_2 - \beta_1) - v_i, \]
where $I_{i \in P_2}$ is an indicator of whether $i$ is in party 2. We have simply introduced the dummy variable for party 2 to stack up the equations. Just like in Ordinary Least Squares, the parameter in front of the indicator recovers the difference of that variable across parties.

Note that the above equations can be rewritten as:
\[ \log(\tilde{s}_i) = \tilde{z}_i \beta_s - \lambda_i, \] (A.39)
\[ \log(\tilde{x}_i) = \tilde{z}_i \beta_x - v_i, \] (A.40)
where $\tilde{z}_i = [1, I_{i \in P_2}, z_i', z_i' I_{i \in P_2}]$, $\beta_s = [\log(S_1), \log(S_2) - \log(S_1), \beta_1, \beta_2 - \beta_1]$, $\beta_x = [\log(X_1), \log(X_2) - \log(X_1), \beta_1, \beta_2 - \beta_1]$.

Recall that $E\lambda_i = E z_i \lambda_i = 0$, which is now rewritten together as $E \tilde{z}_i \lambda_i = 0$.
Similarly, $E v_i = E z_i v_i = 0$ is rewritten as $E \tilde{z}_i v_i = 0$.

We are now ready to rewrite the model in terms of moment conditions. Equations (A.39) and (A.40) are rewritten in terms of moments as:
\[ E \tilde{z}_i \left( \log(\tilde{s}_i) - \tilde{z}_i' \beta_s \right) = 0, \] (A.41)
\[ E \tilde{z}_i \left( \log(\tilde{x}_i) - \tilde{z}_i' \beta_x \right) = 0. \] (A.42)
Each of the above equations is $k$ by 1, where $k$ is the dimensionality of $\tilde{z}_i$. Note that the constant is included in $\tilde{z}_i$.

Focusing on (2.24), for all $i$ in Party 1, one way we can rewrite it is:
\[
\begin{aligned}
0 &= c - \frac{1}{X_1} - \frac{S_1^2}{X_1^2} \\
&= c - \frac{1}{X_1} - \frac{s_i^2}{x_i^2} \\
&= c - \frac{1}{X_1} - \frac{\tilde{s}_i^2 e^{2(\lambda_i - v_i)}}{\tilde{x}_i^2}.
\end{aligned}
\]
Simplifying and taking logs:
\[
\begin{aligned}
\log\left( c - \frac{1}{X_1} \right) &= 2 \log\left( e^{\lambda_i - v_i} \frac{\tilde{s}_i}{\tilde{x}_i} \right) \\
&= 2 \left( \log(\tilde{s}_i) - \log(\tilde{x}_i) + \lambda_i - v_i \right),
\end{aligned}
\]
where we have used equations (2.20) and (2.21) and rewritten it in terms of our proxies $(\tilde{s}_i, \tilde{x}_i)$.⁷

Since an analogous version holds for Party 2, one can rewrite the equation above across parties as:
\[ E\left( 2 (\log(\tilde{s}_i) - \log(\tilde{x}_i)) - \log\left( c - \frac{1}{X_1} \right) I_{i \in P_1} - \log\left( c - \frac{1}{X_2} \right) I_{i \in P_2} \right) = 0, \] (A.43)
where we have taken expectations on both sides over $\lambda_i, v_i$, across all agents. This can be rewritten as equation (2.32) by replacing $I_{i \in P_1} = 1 - I_{i \in P_2}$.

Finally, to rewrite equation (2.33), we simply take expectations over both sides of equation (A.37).

⁷ We can replace $X_1$ as well (as a separate equation), and get analogous equations. They will be linearly dependent, as we are already using (2.20) and (2.21) in the estimation, and the previous equation has been introduced.

A.4 Details on Estimation

We now provide further details on how the estimation procedure was implemented. This includes how we generate starting values for the numerical solution to the GMM optimizer, as well as how the Standard Errors were calculated.

A.4.1 OLS and plug-in Approach as Starting Values for Optimization

For the starting values for GMM optimization, we use a simple closed form estimate for the parameters of interest. This new estimator is the OLS estimator for a subset of the parameters, and a plug-in of those for the remaining parameters. Such an estimator is consistent, but inefficient.
Yet, it is a good starting value for the optimization procedure.

The inefficiency comes from an OLS approach to equations (2.36) and (2.37) neglecting that $\beta_{Dem}, \beta_{Rep}$ are the same across both equations. To see that this OLS estimator is available, note that (2.36) and (2.37) are the moment conditions associated with an OLS problem. The OLS estimator is then given by:
\[ \hat{\beta}_s = (\tilde{Z}' \tilde{Z})^{-1} \tilde{Z}' \log(\tilde{s}), \] (A.44)
\[ \hat{\beta}_x = (\tilde{Z}' \tilde{Z})^{-1} \tilde{Z}' \log(\tilde{x}), \] (A.45)
where $\tilde{Z}$ is an $N$ by $k$ matrix of $\tilde{z}_i'$, and $\log(\tilde{s})$ and $\log(\tilde{x})$ are the vectors of $\log(\tilde{s}_i)$ and $\log(\tilde{x}_i)$ respectively.

By our definition of $\beta_s, \beta_x$, such an estimator is consistent for $\log(S_1), \log(X_1), \log(S_2), \log(X_2)$ (from the first two parameters in each of $\beta_s, \beta_x$). Similarly, we recover consistent estimates for $\beta_1, \beta_2$ in each equation. One replaces them into either party's version of equation (2.24) to recover $c$ (which was not present in any of the other equations).

A.4.2 Computation of the Optimal Weighting Matrix and of Standard Errors

The Optimal Weighting Matrix and Computation of Estimates for the Standard Errors

To compute the standard errors for our GMM estimates, we use a two step procedure that is common in the literature. First, we estimate the model using an inefficient weighting matrix $W = I$, with $I$ the identity matrix, with starting values from the OLS and plug-in approach (described above). Given these inefficient estimates, we then compute the optimal weighting matrix for GMM.

The optimal weighting matrix for GMM is defined as $W = \Omega^{-1}$, where $\Omega = E\left( g(\tilde{s}_i, \tilde{x}_i, y_i, z_i, \theta) g(\tilde{s}_i, \tilde{x}_i, y_i, z_i, \theta)' \right)$. As is well known, the asymptotic variance matrix (of $\sqrt{n}$ times) our parameters of interest is then given by $(\Gamma' \Omega^{-1} \Gamma)^{-1}$, where $\Gamma = E \frac{\partial g(\tilde{s}_i, \tilde{x}_i, \theta)}{\partial \theta'}$.

We compute $\Gamma$ analytically, by taking derivatives of each moment equation with respect to each parameter. We then replace the expectation by its empirical counterpart (the mean across all politicians).

Finite Sample Corrections for the Standard Errors

In finite samples, both $\Omega$ and $(\Gamma' \Omega^{-1} \Gamma)$ can be close to singular.
Hence, we provide corrections that improve the finite sample performance.

For $\Omega$, we implement the correction used in [41]. This involves increasing the standard errors in $\Omega$ by adding a small perturbation to its eigenvalues. This perturbation is sufficient to remove singularity.

Such a procedure uses the spectral decomposition of $\Omega = D \Lambda D'$, where $\Lambda$ is a diagonal matrix of eigenvalues. We then add a small $\delta_\Omega > 0$ to the diagonal of $\hat{\Lambda}$, therefore increasing the eigenvalues of $\hat{\Omega}$. Since this procedure increases standard errors, the new standard errors are still valid for our parameters.

In practice, we pick $\delta_\Omega = 0.0001$, and use it on the eigenvalues that are smaller than $10^{-7}$. This is typically 1 or 2 of the eigenvalues of our estimated $\Omega$. This correction is used for both the calculation of the optimal weighting matrix, as well as for the standard errors of the parameters.

However, such a correction might still be insufficient to guarantee that our estimated variance matrix $\frac{1}{n} (\hat{\Gamma}' \hat{\Omega}^{-1} \hat{\Gamma})^{-1}$ has good finite sample performance, as $\Gamma$ is quite sparse.⁸ To solve this, we use a perturbed estimator that is still consistent.

The estimator we use adds small perturbations to $\Gamma$, in a similar fashion to the correction for $\Omega$. That is, we add a sequence $\delta_{n,\Gamma}$ to the eigenvalues of (the inverse of) our estimated variance matrix, where $\delta_{n,\Gamma} \to 0$ as $n \to \infty$ is a sequence of small perturbations. This estimator is still consistent for our variance matrix. In practice, we pick $\delta_{n,\Gamma} = 10^{-7}$ and replace the eigenvalues that are smaller than $10^{-7}$ by it.

Our results are robust to the choices of $\delta_{n,\Gamma}, \delta_\Omega$.

A.5 Computation of Comparative Statics

Our estimates of $S_{Dem}, S_{Rep}, X_{Dem}, X_{Rep}$ are useful for our comparative statics and for fitting our model. However, those estimates are not necessarily the values that solve the equilibrium system in Proposition 2.2.1.
They are estimates that solve the system in expected value (the moment conditions), as we only observe proxies for social and legislative effort.

⁸ The parameters $c$ and $\zeta$, for example, only appear in one moment condition each.

To calculate the values that are consistent with our model, we solve the system in Proposition 2.2.1 using those values as starting points, under the estimated values of $(c, \phi, \beta_{Dem}, \beta_{Rep})$. The solution consists of values $S_{Dem}^*, S_{Rep}^*, X_{Dem}^*, X_{Rep}^*$ that, under the estimated parameters, solve the original game.

For example, when we are interested in the changes in the probability of bill approval, we use $S_{P(i)}^*$, $\alpha_i$ and equation (2.33) to get our results in the following way:
\[ P(y_i = 1) = \frac{1}{\zeta} \left( \alpha_i S_{P(i)}^* \right)^2. \] (A.46)

A.6 Identification and Estimation using second moments of the proxies of $(s_i^*, x_i^*)$

In this section, we impose additional structure on the measurement errors of the proxies, denoted $\lambda_i, v_i$, such that we can recover point identification of $p_1, p_2, \phi$. In the following subsection, we then estimate this version of the model.

As before, let us introduce the measurement errors $\lambda_i, v_i$, such that we observe proxies of $(s_i^*, x_i^*)$ of the following form:
\[ \tilde{s}_i = s_i^* e^{-\lambda_i} \] (A.47)
\[ \tilde{x}_i = x_i^* e^{-v_i}. \] (A.48)
However, we now assume that:
\[ \lambda_i = \omega_i p(i) + u_i \] (A.49)
\[ v_i = \omega_i p(i) + \eta_i, \] (A.50)
with $u_i, \omega_i, \eta_i$ being i.i.d. across $i$ and we have the (standard) assumptions, for all $i$:
\[ E(u_i \mid z_i, \omega_i, p(i)) = 0, \quad E u_i^2 = \sigma_u^2 \]
\[ E(\eta_i \mid z_i, \omega_i, p(i)) = 0, \quad E \eta_i^2 = \sigma_\eta^2 \]
\[ E(\omega_i \mid z_i, p(i)) = 0, \quad E \omega_i^2 = \sigma_\omega^2. \]
These conditions imply the restrictions $E(\lambda_i \mid z_i) = E(v_i \mid z_i) = 0$ used in the main text of the chapter.

The structure allows the measurement errors to vary by individual and by party. An intuition for this is that, when partisanship increases, it is harder to observe the true social/legislative effort of a politician. This implies we get noisier proxies for $s_i^*, x_i^*$ with more partisanship.
When $p(i) = 0$, we only have classical measurement error in the proxies. With no partisanship, there are no differences in how well we observe socializing across politicians and parties.

The identification arguments presented in the previous approach still hold, up to the identification of $\phi$. Let us continue from there.

Under the assumptions stated above, note that:

$$E\lambda_i^2 = E(\omega_i p(i) + u_i)^2 = E\left(\omega_i^2 p(i)^2 + u_i^2 + 2 u_i \omega_i p(i)\right) = p(i)^2\sigma_\omega^2 + \sigma_u^2. \tag{A.51}$$

Similarly,

$$E v_i^2 = p(i)^2\sigma_\omega^2 + \sigma_\eta^2 \tag{A.52}$$
$$E(\lambda_i v_i) = p(i)^2\sigma_\omega^2. \tag{A.53}$$

We make the following normalization, such that all variances are now written in terms of the variance of $\omega$: $\sigma_\omega = 1$.⁹ In that case, from (A.53) we get that:

$$p(i)^2 = E(\lambda_i v_i). \tag{A.54}$$

Since we can take expectations over $i \in P_1$ or $i \in P_2$, we get identification of both $p_1$ and $p_2$. Plugging $p(i)$ into (A.51) identifies $\sigma_u^2$. Plugging $p(i)$ into (A.52) identifies $\sigma_\eta^2$.

Note that the moment conditions given by equations (A.51), (A.52), (A.53) (for each party) are simple to use.

⁹ $p_1, p_2$ are always multiplied by $\sigma_\omega$.

A.6.1 Estimation with Second Moment Conditions on the Proxies of $s^*_i, x^*_i$

To estimate this version of the model, we can retain all moment conditions presented in the main part of the text, while adding the moment equations (A.51), (A.52), (A.53), together with a moment equation derived for $\phi$. To derive the latter, equations (2.22) - (2.23) are rewritten together as:

$$E\left(\log(\tilde{s}_i) - \log(\tilde{x}_i) - \log(\phi) - \Psi_0 - \Psi_1 I_{i \in P_2}\right) = 0, \tag{A.55}$$

where

$$\Psi_0 = \log\left(\frac{p_1 B_1 X_1}{A_1} + \frac{(1-p_1)^2 B_1 S_1 X_1 + (1-p_1)(1-p_2) B_2 S_2 X_2}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2}\right),$$

and

$$\Psi_1 = \log\left(\frac{p_2 B_2 X_2}{A_2} + \frac{(1-p_2)^2 B_2 S_2 X_2 + (1-p_1)(1-p_2) B_1 S_1 X_1}{(1-p_1) A_1 S_1 + (1-p_2) A_2 S_2}\right) - \Psi_0.$$

Estimation then follows the routines described in the main part of the text, except we now estimate $\phi, p_1, p_2, \sigma_u^2, \sigma_\eta^2$.

A.6.2 Results under Restrictions on the Second Moments of the Proxies for $(s^*_i, x^*_i)$

These are presented in Table A.1 below.

Table A.1: Results, Specification 1, second moments of the $(s^*_i, x^*_i)$ proxy

                                         Congress
               105        106        107        108        109        110
c              0.328      0.324      0.360      0.366      0.352      0.334
               (0.0089)   (0.0083)   (0.0102)   (0.0114)   (0.0095)   (0.0085)
φ              0.0532     0.0572     0.0641     0.0616     0.0611     0.0627
               (0.00207)  (0.00239)  (0.00272)  (0.00382)  (0.00314)  (0.00260)
ζ              16.110     23.748     20.281     16.461     20.607     19.153
               (0.9207)   (1.900)    (1.370)    (1.045)    (1.325)    (1.060)
p_1            0.0708     0.0585     0.0684     0.106      0.0555     0.0920
               (0.0231)   (0.0247)   (0.0225)   (0.0222)   (0.0296)   (0.0179)
p_2            0.0804     0.0803     0.116      0.0984     0.113      0.0929
               (0.0203)   (0.0197)   (0.0171)   (0.0203)   (0.0206)   (0.0246)
σ²_u           0.228      0.213      0.207      0.204      0.215      0.219
               (0.00537)  (0.00766)  (0.00736)  (0.00617)  (0.00691)  (0.0140)
σ²_η           0.1814     0.162      0.192      0.251      0.208      0.168
               (0.00374)  (0.00564)  (0.00443)  (0.0263)   (0.0131)   (0.00732)
S_Dem          0.876      1.054      1.005      0.969      1.010      1.167
               (0.0329)   (0.0362)   (0.0355)   (0.0342)   (0.0313)   (0.0300)
S_Rep          0.851      0.983      0.936      0.847      0.885      1.005
               (0.038)    (0.0407)   (0.0503)   (0.0422)   (0.0475)   (0.0595)
X_Dem          3.420      3.5858     3.152      3.127      3.295      3.757
               (0.130)    (0.126)    (0.117)    (0.131)    (0.113)    (0.0951)
X_Rep          4.0617     4.442      4.170      3.814      3.904      4.138
               (0.180)    (0.187)    (0.220)    (0.193)    (0.211)    (0.265)
Ideology       -0.575     -0.492     -0.714     -0.771     -0.675     -0.465
               (0.0891)   (0.0825)   (0.0841)   (0.0928)   (0.0819)   (0.0699)
Tenure         0.00374    0.00598    0.00454    0.005245   0.00347    8.415e-05
               (0.00285)  (0.00242)  (0.00276)  (0.00311)  (0.00260)  (0.00220)
Grosewart      -0.0149    -0.0182    -0.0302    -0.0154    -0.0113    -0.00582
               (0.0095)   (0.0097)   (0.0094)   (0.0114)   (0.0096)   (0.0085)
Ideology×Rep   0.0734     -0.0255    -0.0713    0.0925     0.121      0.112
               (0.0763)   (0.0699)   (0.0876)   (0.0749)   (0.0786)   (0.0821)
Tenure×Rep     0.0110     0.00708    0.00487    0.00775    0.00176    -0.00019
               (0.00329)  (0.00271)  (0.00347)  (0.0030)   (0.0031)   (0.0037)
Grosewart×Rep  -0.0180    -0.0064    -0.0205    -0.0239    -0.0325    -0.0329
               (0.0103)   (0.0100)   (0.0115)   (0.0112)   (0.0104)   (0.0123)
N              437        434        436        438        434        437

Notes: Standard errors in parentheses. The table presents the results from the GMM estimation under second moment conditions on the proxies of $(s^*_i, x^*_i)$. The optimal weighting matrix is used, and standard errors are estimated as discussed in the Appendix. All other notes follow those in Table 2.2.

A.7 Identification with Nonparametric α

In this section, we prove that our model is identified even with a nonparametric $\alpha$. This emphasizes that the parametrization of $\alpha$ is not strictly needed theoretically, but is useful for data purposes. This is because a nonparametric $\alpha$ requires us to estimate $\alpha_i$ for each politician in each Congress, when we have only one set of observations per politician per term.

Let us maintain the same error structure as in the parametric approach in the main text, given in equation (2.26). Let $i_1$ be a normalizer ($\alpha_{i_1} = 1$). Without loss, let him/her be from party 1.

Then, we can rewrite (2.21) for $i_1$ as:

$$s^*_{i_1} = \alpha_{i_1} S_1$$
$$\log(\tilde{s}_{i_1}) + \lambda_{i_1} = \log(S_1).$$

Taking expectations on both sides yields:

$$E\log(\tilde{s}_{i_1}) = \log(S_1), \tag{A.56}$$

which is now identified. Note that $E\log(\tilde{s}_{i_1})$ is known from the data, as it is a moment of the observed proxy.

Now using (2.21) for any $i$ in party 1 leads to:

$$E\log(\tilde{s}_i) = \log(\alpha_i) + E\log(\tilde{s}_{i_1}), \tag{A.57}$$

and $\alpha_i$ is identified for any politician in party 1. The intuition is that, in equilibrium, all politicians in party 1 choose social efforts that are proportional to $S_1$. This means that once we normalize someone, we know everyone else's scale. It follows that $A_1 = \sum_{j \in P_1} \alpha_j$, $B_1 = \sum_{j \in P_1} \alpha_j^2$ are identified as well. 
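The normalization step for party 1 can be sketched numerically. The code below is only an illustration under stated assumptions: it treats the population moments $E\log(\tilde{s}_i)$ as known (which is exactly what is infeasible with one observation per politician per term), and the function names and politician identifiers are hypothetical. It recovers each $\alpha_i$ from (A.57) with the normalizer's $\alpha$ set to 1, and then forms $A_1$ and $B_1$.

```python
import math


def recover_alphas(mean_log_s, normalizer):
    """Recover alpha_i from E[log s~_i] using (A.57), with alpha_normalizer = 1.

    mean_log_s maps a (hypothetical) politician id in one party to E[log s~_i].
    """
    base = mean_log_s[normalizer]
    return {i: math.exp(value - base) for i, value in mean_log_s.items()}


def party_aggregates(alphas):
    """Return (A, B) = (sum of alpha_j, sum of alpha_j squared) over the party."""
    a_sum = sum(alphas.values())
    b_sum = sum(a * a for a in alphas.values())
    return a_sum, b_sum


# Illustrative values only: three party-1 members and S_1 = 2.
true_alphas = {"i1": 1.0, "i2": 1.5, "i3": 0.5}
mean_log_s = {i: math.log(a * 2.0) for i, a in true_alphas.items()}

alphas = recover_alphas(mean_log_s, normalizer="i1")
A1, B1 = party_aggregates(alphas)
```

In this toy example the recovered `alphas` coincide with `true_alphas`, since the common scale $S_1$ cancels once the normalizer is fixed.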
This is because we know the party affiliations of each Congress member.

For any politician in party 1, we must also have from (2.20) that:

$$\log(x^*_i) = \log(\tilde{x}_i) + v_i = \log(\alpha_i) + \log(X_1)$$
$$E\log(\tilde{x}_i) = \log(\alpha_i) + \log(X_1),$$

where we use that $v_i$ is mean zero. It follows that $X_1$ is identified, and $S_1/X_1$ is now identified.

Applying this to equation (2.24) of Party 1 leads to:

$$c = \frac{1}{X_1} + \left(\frac{S_1}{X_1}\right)^2, \tag{A.58}$$

implying that $c$ is identified (since $S_1, X_1$ have been identified).

Let us now use equations (2.21) and (2.20) for a party member $k$ from party 2. Dividing (2.21) by (2.20) yields:

$$s^*_k = x^*_k \frac{S_2}{X_2}$$
$$\log(\tilde{s}_k) + \lambda_k = \log(\tilde{x}_k) + v_k + \log\left(\frac{S_2}{X_2}\right)$$
$$E\log(\tilde{s}_k) = E\log(\tilde{x}_k) + \log\left(\frac{S_2}{X_2}\right). \tag{A.59}$$

Hence, $S_2/X_2$ is now identified. Using that in equation (2.24) of Party 2:

$$c - \left(\frac{S_2}{X_2}\right)^2 = \frac{1}{X_2}. \tag{A.60}$$

Since $c$ is already known, and so is $S_2/X_2$, we have that $X_2$ is identified. It follows that $S_2$ is identified.

Applying this to equation (2.21) implies that $\alpha_j$ is identified for all politicians in party 2. It follows that we now know $A_2, B_2$, which are functions of $\alpha_j$ for party 2.

(Set) identification of $\phi$ follows in the same way as in the parametric approach.

Appendix B

Appendix to Chapter 3

B.1 Construction of the Estimator for the Asymptotic Covariance Matrix

We now explain our proposal to estimate the asymptotic covariance matrix, given in equation (3.17), for the model with agents of simple type.

We first explain our proposal to estimate $\Lambda$ consistently for the case of $\beta_0 \neq 0$. Then, we later show how the estimator works even for the case of $\beta_0 = 0$. We first write

$$v_i = R_i(\varepsilon) + \eta_i, \tag{B.1}$$

where

$$R_i(\varepsilon) = \lambda_{ii}\varepsilon_i + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij}\varepsilon_j.$$

Define for $i, j \in N$,

$$e_{ij} = E[R_i(\varepsilon) R_j(\varepsilon) \mid \mathcal{F}] / \sigma_\varepsilon^2,$$

where $\sigma_\varepsilon^2$ denotes the variance of $\varepsilon_i$. It is not hard to see that for all $i \in N$,
It is not hard to see that for all i ∈ N,eii = λ 2ii +β 20 λ 2iin2P(i)∑j∈NP(i)λ 2i j,185and for i 6= j such that NP(i)∩NP( j) 6=∅, ei j = β0qε,i j, whereqε,i j =λ jiλiiλ j jnP( j)+λi jλiiλ j jnP(i)+β0λiiλ j jnP(i)nP( j)∑k∈NP(i)∩NP( j)λikλ jk.Thus, we write1n∗ ∑i∈N∗E[v2i |F ] = aεσ2ε +σ2η , and (B.2)1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗E[viv j|F ] = β0bεσ2ε ,where σ2η denotes the variance of ηi,aε =1n∗ ∑i∈N∗eii, and bε =1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗qε,i j.(Note that since not all agents in NP(i) are in N∗ for all i ∈ N∗, the set NP(i)∩N∗ does notnecessarily coincide with NP(i).) When β0 6= 0, the solution takes the following form:σ2ε =1n∗β0bε ∑i∈N∗ ∑j∈NP(i)∩N∗E[viv j|F ] and (B.3)σ2η =1n∗ ∑i∈N∗E[v2i |F ]−aεn∗β0bε ∑i∈N∗ ∑j∈NP(i)∩N∗E[viv j|F ].In other words, when β0 6= 0, i.e., when there is strategic interaction among the players,we can “identify” σ2ε and σ2η by using the variances and covariances of residuals vi’s. Theintuition is as follows. Since the source of cross-sectional dependence of vi’s is due to thepresence of εi’s, we can identify first σ2ε using covariance between vi and v j for linked pairsi, j, and then identify σ2η by subtracting from the variance of vi the contribution from εi.In order to obtain a consistent estimator of Λ which does not require that β0 6= 0, wederive its alternative expression. Let us first writeΛ= Λ1+Λ2, (B.4)186whereΛ1 =1n∗ ∑i∈N∗E[v2i |F ]ϕ˜iϕ˜ ′i , andΛ2 =1n∗ ∑i∈N∗ ∑j∈N∗−iE[viv j|F ]ϕ˜iϕ˜ ′j,where N∗−i = N∗\{i}. Using (B.1) and (B.3), we can rewriteΛ2 =1n∗ ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅ei jσ2ε ϕ˜iϕ˜′j=β0n∗ ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅qε,i jσ2ε ϕ˜iϕ˜′j=sεn∗ ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅qε,i jϕ˜iϕ˜ ′j,wheresε =∑i∈N∗∑ j∈NP(i)∩N∗ E[viv j|F ]∑i∈N∗∑ j∈NP(i)∩N∗ qε,i j.Now, it is clear that with this expression for Λ2, the definition of Λ is well definedregardless of whether β0 = 0 or β0 6= 0. 
We can then find the estimator of $\Lambda$, $\hat{\Lambda}$, by using the empirical analogues to the above, as shown in the main text.

B.2 Proofs of Theorems 3.2.1 - 3.3.1

In this section we provide the proofs of Theorems 3.2.1 - 3.3.1. Throughout the proofs, we use the notation $C_1$ and $C_2$ to represent constants which do not depend on $n$ or $n^*$. Without loss of generality, we also assume that $N^*$ is $\mathcal{F}$-measurable. This loses no generality because, due to Condition A of the sampling process in the chapter, the same proof goes through if we redefine $\mathcal{F}$ to be the $\sigma$-field generated by both $\mathcal{F}$ and $N^*$.

First, we make explicit the best response operator $\mathcal{M}$. Given $w^i_j = (w^i_{kj})_{k \in N}$, let us define

$$\mathcal{M}_i w^i_j = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} w^i_{kj} 1\{j \in N^i_I(k)\}.$$

Recall that player $i$'s payoff is affected by his $G_P$-neighbors' actions. Hence player $i$ perceives player $j$ as important to him, even if player $j$'s action does not directly influence the payoff of player $i$, if player $j$'s type is observed by and influences many of player $i$'s $G_P$-neighbors. The expression $\mathcal{M}_i w^i_j$ captures this perceived importance of player $j$ to player $i$ that comes through player $j$'s influence (as perceived by player $i$) on his $G_P$-neighbors.

Suppose that each agent $i$ has information set $\mathcal{I}_{i,k}$ for some $k \geq 0$. Then the best response operator $\mathcal{M}$ is given by the following relations:

$$w_{ii,1} = \gamma_0 + \beta_0 \mathcal{M}_i w^i_{i,1}, \tag{B.5}$$
$$w_{ii,\varepsilon} = 1 + \beta_0 \mathcal{M}_i w^i_{i,\varepsilon},$$
$$w_{ii,2} = \beta_0 \mathcal{M}_i w^i_{i,2},$$

and for all $j \in N_I(i)$,

$$w_{ij,1} = \beta_0 \mathcal{M}_i w^i_{j,1}, \tag{B.6}$$
$$w_{ij,\varepsilon} = \beta_0 \mathcal{M}_i w^i_{j,\varepsilon}, \text{ and}$$
$$w_{ij,2} = \begin{cases} \delta_0 r_{ij}/n_P(i) + \beta_0 \mathcal{M}_i w^i_{j,2}, & \text{if } j \in N_P(i), \\ \beta_0 \mathcal{M}_i w^i_{j,2}, & \text{if } j \in N_I(i) \setminus N_P(i), \end{cases}$$

where we write $w_{ij} = (w'_{ij,1}, w'_{ij,2}, w_{ij,\varepsilon})'$, with $w_{ij,1}$, $w_{ij,2}$ and $w_{ij,\varepsilon}$ being the weights given by player $i$ to player $j$'s type components $X_{j,1}$, $X_{j,2}$ and $\varepsilon_j$.

Proof of Theorem 3.2.1: From the optimization of agent $i$, we have

$$s^{BR}_i(\mathcal{I}_i) = X'_{i,1}\gamma_0 + \tilde{X}'_{i,2}\delta_0 + \beta_0 E[\tilde{Y}^B_i \mid \mathcal{I}_i] + \varepsilon_i + \eta_i, \tag{B.7}$$

where $\tilde{Y}^B_i$ denotes the weighted average (over $N_P(i)$) of $s_k(\mathcal{I}_k)$, where the weights are given by the beliefs of player $i$ and $r_{ik}$. 
Thus we writeE[Y˜ Bi |Ii] =1nP(i)∑k∈NP(i)rik ∑j∈NI(i)wi′k jTj1{ j ∈ NiI(k)}.Plugging in this in (B.7), we havesBRi (Ii) =(γ0+β0nP(i)∑k∈NP(i)rikwiki,11{i ∈ NiI(k)})′Xi,1+(1+β0nP(i)∑k∈NP(i)rikwiki,ε1{i ∈ NiI(k)})εi+An+Bn,whereAn =β0nP(i)∑k∈NP(i)rikwi′ki,2Xi,21{i ∈ NiI(k)}, andBn = β0 ∑j∈NI(i)1nP(i)∑k∈NP(i)rik(wi′k j,1X j,1+wik j,εε j)1{ j ∈ NiI(k)}+β0 ∑j∈NI(i)1nP(i)∑k∈NP(i)rikwi′k j,2X j,21{ j ∈ NiI(k)}+1nP(i)∑k∈NP(i)rikδ ′0Xk,2.By setting the coefficients of X j,1, ε j and X j,2 to be wij,1, wij,ε and wij,2, we obtain thatwii,1 = γ0+β0Miwii,1, (B.8)wii,ε = 1+β0Miwii,ε ,wii,2 = β0Miwii,2,and for all j ∈ NI(i),wi j,1 = β0Miwij,1, (B.9)wi j,ε = β0Miwij,ε , and (B.10)wi j,2 ={δ0ri j/nP(i)+β0Miwij,2, if j ∈ NP(i),β0Miwij,2, if j ∈ NI(i)\NP(i),(B.11)189whereMiwij =1nP(i)∑k∈NP(i)rikwik j1{ j ∈ N¯iI(k)}.Now, we apply the behavioral assumptions to this operator to obtain the following:Miwii =1nP(i)∑k∈NP(i)rikτ ikiwii1{i ∈ NP(k)}=1nP(i)∑k∈NP(i)rikτ ikiwii=1nP(i)∑k∈NP(i)1n¯P(k)wii = ciiwii.By plugging this into (B.8), we havewii,1 = γ0+β0wii,1cii, (B.12)wii,ε = 1+β0wii,εcii,andwii,2 = wii,2 · β0nP(i) ∑k∈NP(i)1{i ∈ N¯iI(k)}n¯P(k)= wii,2 · β0nP(i) ∑k∈NP(i)1{i ∈ NP(k)}n¯P(k)= β0wii,2cii.The last equation gives wii,2 = 0, because |β0cii|< 1, and the first two equations givewii,1 =γ01−β0cii (B.13)andwii,ε =11−β0cii . (B.14)Also, we turn toMwij:Miwij =1nP(i)∑k∈NP(i)wi j1{ j ∈ NP(k)}+ri jwij j1{ j ∈ NP(i)}nP(i), (B.15)190where the last term corresponds to the case j = k ∈ NP(i). Using the definition ci j, werewriteMiwij,1 = ci jwi j,1+wii,1ri j1{ j ∈ NP(i)}nP(i)= ci jwi j,1+γ01−β0cii1nP(i)ri j1{ j ∈ NP(i)}andMiwij,ε = ci jwi j,ε +wii,εri j1{ j ∈ NP(i)}nP(i)= ci jwi j,ε +11−β0cii1nP(i)ri j1{ j ∈ NP(i)}.We plug this into (B.9) to obtainwi j,1 =β0γ0ri j1{ j ∈ NP(i)}nP(i)(1−β0ci j)(1−β0cii) ,andwi j,ε =β0ri j1{ j ∈ NP(i)}nP(i)(1−β0ci j)(1−β0cii) .Finally, let us consider wi j,2. Note that from (B.15),Miwij,2 = ci jwi j,2,because wii,2 = 0. 
By plugging this into (B.9), we obtain thatwi j,2 ={δ0ri j/nP(i)+β0ci jwi j,2, if j ∈ NP(i),β0ci jwi j,2, if j ∈ N2P(i)\NP(i),={δ0ri j/nP(i)+β0ci jwi j,2, if j ∈ NP(i),0, if j ∈ N2P(i)\NP(i),where the last zero follows from the equality wi j,2 = β0ci jwi j,2 with |β0ci j|< 1. Therefore,we havewi j,2 =δ0ri j1{ j ∈ NP(i)}nP(i)(1−β0ci j) .From the form of a linear strategy for sBRi (Ii) with the weights as solved thus far, we obtainthe desired result.Proof of Theorem 3.2.2: Suppose each agent is first-order sophisticated (FS) type; i.e.,191each i ∈ N believes that each k 6= i is simple type and chooses strategies according to:sik(Ik,0) = ∑j∈N¯iI(k)T ′j wik j +ηk.Then i’s best response takes the formsBR,FSi (Ii,1) =X′i,1γ0+ X˜′i,2δ0+β0E[Y˜Bi |Ii,1]+ηi+ εi,where E[Y˜ Bi |Ii,1] is written as in the proof of Theorem 3.2.1, noting only that FS has a largerinformation set and the weights wik j now denote FS-type beliefs. Hence, we can express thebest response of FS types assBR,FSi (Ii,1) = ∑j∈N¯I(i)T ′j wi j +ηi,where, for each i∈N, and each j ∈ N¯I(i), the wi j’s are given according to (B.8) - (B.9) fromTheorem 3.2.1. Hence, the best responses of FS types are linear when utility is quadraticand FS types believe simple types play according to linear strategies.Since an agent i with FS-type believes that all other agents are of simple type, we havethat wik j = wk j, where wk j’s are the weights given to j when the agent k is of simple type.The latter weights are already found in Theorem 3.2.1. 
Therefore, using λi j = ri j/(1−β0ci j), together with the Theorem 3.2.1 weights, in equations (B.8) - (B.9), we obtain theFS weights as follows:wii,1 = γ0+β0nP(i)∑k∈NP(i)β0γ0rikλki1{i ∈ NP(k)}nP(k)(1−β0ckk) 1{i ∈ N¯iI(k)},wii,ε = 1+β0nP(i)∑k∈NP(i)β0rikλki1{i ∈ NP(k)}nP(k)(1−β0ckk) 1{i ∈ N¯iI(k)},wii,2 =β0nP(i)∑k∈NP(i)δ0rikλki1{i ∈ NP(k)}nP(k)1{i ∈ N¯iI(k)},192and for each j ∈ NP,2(i),wi j,1 =β0nP(i)∑k∈NP(i)wik j,11{ j ∈ NP(k)}=β0nP(i)(∑k∈NP(i)wik j,11{ j ∈ NP(k)}+wij j,11{ j = k})=β0nP(i)∑k∈NP(i)β0γ0rikλk j1{ j ∈ NP(k)}nP(k)(1−β0ckk) +β0γ01{ j ∈ NP(i)}nP(i)(1−β0c j j) .Analogously,wi j,ε =β01{ j ∈ NP(i)}nP(i)(1−β0c j j) +β0nP(i)∑k∈NP(i)β0rikλk j1{ j ∈ NP(k)}nP(k)(1−β0ckk) ,and as for wi j,2, if j ∈ NP(i)wi j,2 = δ0ri j/nP(i)+β0nP(i)∑k∈NP(i)rikδ0λk j1{ j ∈ NP(k)}nP(k){ j ∈ N¯iI(k)},and if j ∈ NI(i)\NP(i),wi j,2 =β0nP(i)∑k∈NP(i)rikδ0λk j1{ j ∈ NP(k)}nP(k){ j ∈ N¯iI(k)}.Next, definingλ¯i j =1nP(i)∑k∈NP(i)rikλk j1{ j ∈ NP(k)}nP(k), andλ˜i j =1nP(i)∑k∈NP(i)rikλk j1{ j ∈ NP(k)}nP(k)(1−β0ckk) ,we may write the weights aswii,1 = γ0+β 20 γ0λ˜ii,wii,ε = 1+β 20 λ˜ii,and wii,2 = β0δ0λ¯ii.193Lastly, for each j ∈ NP,2(i), we havewi j,1 =β0γ0λ j j1{ j ∈ NP(i)}nP(i)+β 20 γ0λ˜i j,wi j,ε =β0λ j j1{ j ∈ NP(i)}nP(i)+β 20 λ˜i j,andwi j,2 =β0δ0λ¯i j, j ∈ NP,2(i)\NP(i)δ0ri j/nP(i)+β0δ0λ¯i j, j ∈ NP(i).Substituting these weights back into the best response function for FS types, we obtain thedesired result. We introduce auxiliary lemmas which are used for proving Theorem 3.3.1.Lemma B.2.1. For any array of numbers {ai j}i, j∈N and a sequence {bi}i∈N of numbers,we have for any subsets A,B⊂ N and for any undirected graph G = (N,E),∑i∈B∑j∈N(i)∩Aai jb j =∑i∈A(∑j∈N(i)∩Ba ji)bi,where N(i) = {i ∈ N : i j ∈ E}.Proof: Since the graph G is undirected, i.e, 1{ j ∈ N(i)} = 1{i ∈ N( j)}, we write the lefthand side sum as∑i∈B∑j∈A1{ j ∈ N(i)}ai jb j = ∑j∈A∑i∈B1{i ∈ N( j)}ai jb j.Interchanging the index notation i and j gives the desired result. Lemma B.2.2. 
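Suppose that the conditions of Theorem 3.3.1 hold. Then,

$$\Lambda^{-1/2} \frac{1}{\sqrt{n^*}} \sum_{i \in N^*} \tilde{\varphi}_i v_i \to_d N(0, I_M).$$

Proof: Choose any vector $b \in \mathbb{R}^M$ such that $\|b\| = 1$ and let $\tilde{\varphi}_{i,b} = b'\tilde{\varphi}_i$. Define

$$a_i = \lambda_{ii}\tilde{\varphi}_{i,b} 1\{i \in N^*\} + \beta_0 \sum_{j \in N_P(i) \cap N^*} \frac{\tilde{\varphi}_{j,b}\lambda_{ji}\lambda_{jj}}{n_P(j)}.$$

Using Lemma B.2.1, we can write

$$\frac{1}{\sqrt{n^*}} \sum_{i \in N^*} \tilde{\varphi}_{i,b} v_i = \sum_{i \in N^\circ} \xi_i, \tag{B.16}$$

where we recall $N^\circ = \bigcup_{i \in N^*} N_P(i)$, and

$$\xi_i = \left(a_i \varepsilon_i + \tilde{\varphi}_{i,b}\eta_i 1\{i \in N^*\}\right)/\sqrt{n^*}.$$

By the Berry-Esseen Lemma for independent random variables (see, e.g., [157], p.259),

$$\sup_{t \in \mathbb{R}} \left| P\left\{ \sum_{i \in N^\circ} \frac{\xi_i}{\sigma_{\xi,i}} \leq t \,\Big|\, \mathcal{F} \right\} - \Phi(t) \right| \leq \frac{9 E\left[\sum_{i \in N^\circ} |\xi_i|^3 \mid \mathcal{F}\right]}{\left(\sum_{i \in N^\circ} \sigma_{\xi,i}^2\right)^{3/2}}, \tag{B.17}$$

where $\sigma_{\xi,i}^2 = Var(\xi_i \mid \mathcal{F})$. It suffices to show that the last bound vanishes in probability as $n^* \to \infty$. First, observe that

$$\sum_{i \in N^\circ} \sigma_{\xi,i}^2 = \frac{1}{n^*} \sum_{i \in N^\circ} \left(a_i^2 \sigma_\varepsilon^2 + \tilde{\varphi}_{i,b}^2 \sigma_\eta^2 1\{i \in N^*\}\right) \geq \frac{\sigma_\eta^2}{n^*} \sum_{i \in N^*} \tilde{\varphi}_{i,b}^2 = \sigma_\eta^2 > 0,$$

because $\frac{1}{n^*} \sum_{i \in N^*} \tilde{\varphi}_{i,b}^2 = 1$. Observe that

$$E\left[\sum_{i \in N^\circ} |\xi_i|^3 \,\Big|\, \mathcal{F}\right] \leq \frac{4 \max_{i \in N} E[|\varepsilon_i|^3 \mid \mathcal{F}]}{(n^*)^{3/2}} \sum_{i \in N^\circ} |\tilde{\varphi}_{i,b}|^3 |a_i|^3 + \frac{4 \max_{i \in N} E[|\eta_i|^3 \mid \mathcal{F}]}{(n^*)^{3/2}} \sum_{i \in N^\circ} |\tilde{\varphi}_{i,b}|^3 \tag{B.18}$$
$$\leq \frac{C_1 \max_{i \in N} E[|\varepsilon_i|^3 \mid \mathcal{F}]}{(n^*)^{3/2}} \sum_{i \in N^\circ} |a_i|^3 + \frac{C_1 n^\circ \max_{i \in N} E[|\eta_i|^3 \mid \mathcal{F}]}{(n^*)^{3/2}},$$

for some constant $C_1 > 0$, by Assumption 3.3.3. Now, we use the fact that for $i, j \in N^\circ$ with $i \neq j$, $0 < \lambda_{ij} \leq C/(1-\beta_0)$ (which is due to $0 \leq c_{ij} \leq 1$ for all $i, j \in N$ and $\beta_0 \in (-1,1)$, and by Assumption 3.3.4), that for $i = j$, $0 < \lambda_{ii} \leq 1/(1-\beta_0)$, and that

$$|\tilde{\varphi}_{i,b}| \leq C,$$

for some constant $C > 0$. With these, we bound the leading term as (for some constants $C_2, C_3 > 0$)

$$\frac{C_2}{n^*} \sum_{i \in N^\circ} |a_i|^3 \leq \frac{C}{(1-\beta_0)^3} \frac{1}{n^*} \sum_{i \in N^\circ} \left(\tilde{\varphi}_{i,b} 1\{i \in N^*\} + \beta_0 \sum_{j \in N_P(i) \cap N^*} \frac{\tilde{\varphi}_{j,b}\lambda_{ji}\lambda_{jj}}{n_P(j)}\right)^3 \leq \frac{C_3}{(1-\beta_0)^3} + \frac{C_3}{(1-\beta_0)^6} R_n^3,$$

where

$$R_n = \frac{1}{n^*} \sum_{i \in N^\circ} \sum_{j \in N_P(i) \cap N^*} \frac{r_{ji}}{n_P(j)}.$$

Using Lemma B.2.1, we rewrite

$$R_n = \frac{1}{n^*} \sum_{i \in N^*} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} r_{ij} < C,$$

for some constant $C > 0$. Hence we find that for some constant $C_1 > 0$,

$$\frac{1}{n^*} \sum_{i \in N^\circ} |a_i|^3 \leq \frac{C_1}{(1-\beta_0)^6}.$$

Therefore, for some constant $C_2 > 0$,

$$E\left[\sum_{i \in N^\circ} |\xi_i|^3 \,\Big|\, \mathcal{F}\right] \leq \frac{C_2}{\sqrt{n^*}(1-\beta_0)^6} \max_{i \in N^*} E[|\varepsilon_i|^3 \mid \mathcal{F}] + \frac{C_2 n^\circ}{(n^*)^{3/2}} \max_{i \in N^*} E[|\eta_i|^3 \mid \mathcal{F}].$$

Thus we conclude that the bound in (B.17) is $O_P\left((n^*)^{-1/2} + n^\circ (n^*)^{-3/2}\right)$. However, for some constant $C > 0$,

$$n^\circ \leq \sum_{i \in N^*} |\bar{N}_P(i)| \leq C n^*,$$

by Assumption 3.3.4. Hence we obtain the desired result.

Lemma B.2.3. Suppose that the conditions of Theorem 3.3.1 hold. 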
Then for some constantC1 > 0,E[||Sϕ˜v||2|F ]= O((n∗)−1), and E[||SZ∗v||2|F ]= O((n∗)−1),where Z∗i = ∑ j∈NP(i)∩N∗ Z j and SZ∗v =1n∗ ∑i∈N∗ Z∗i vi.196Proof: Note thatE[||Sϕ˜v||2|F ]≤ σ2ε(n∗)2 ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅|ei j|||ϕ˜i||||ϕ˜ j||+ 1(n∗)2 ∑i∈N∗(|eii|σ2ε +σ2η)||ϕ˜i||2.However, since ||ϕ˜i|| ≤C by Assumption 3.3.3, the leading term on the right hand side isbounded by for some constants C1,C2 > 0,C1σ2ε β0(n∗)2(1−β0)4 ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅|NP(i)∩NP( j)|nP(i)nP( j)+C1σ2εn∗(1−β0)3 ≤C2n∗,and the second term by Cσ2η/n∗ for some constant C > 0. Hence the first bound follows.Let us turn to the second bound. Observe that by Assumption 3.3.3, we have someC > 0 such that for all i ∈ N∗, ||Z∗i || ≤C. Following the same proof as before, we find thatE[||SZ∗v||2|F ] is bounded byC1σ2ε β0(n∗)2(1−β0)4 ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅|NP(i)∩NP( j)||NP(i)∩N∗||NP( j)∩N∗|nP(i)nP( j)+C1σ2εn∗(1−β0)3 .The leading term is bounded by (for some constants C1,C2,C3)C1σ2ε β0(n∗)2(1−β0)4 ∑i∈N∗ ∑j∈N∗−i:NP(i)∩NP( j)6=∅|NP(i)∩NP( j)|≤ C2σ2ε β0(n∗)2(1−β0)4 ∑i∈N∗|NP(i)∩N∗| ≤ C3σ2ε β0n∗(1−β0)4 .Thus we obtain the desired result. Lemma B.2.4. Suppose that the conditions of Theorem 3.3.1 hold. Then the followingholds.(i) 1n∗ ∑i∈N∗(v˜2i − v2i )ϕ˜iϕ˜ ′i = OP(1/√n∗).(ii) 1n∗ ∑i∈N∗∑ j∈NP(i)∩N∗(v˜iv˜ j− viv j)ϕ˜iϕ˜ ′j = OP(1/n∗).(iii) 1n∗ ∑i∈N∗(v2i −E[v2i |F ])ϕ˜iϕ˜ ′i = OP(1/√n∗).(iv) 1n∗ ∑i∈N∗∑ j∈NP(i)∩N∗(viv j−E[viv j|F ])ϕ˜iϕ˜ ′j = OP(1/√n∗).197Proof: (i) First, write v˜− v =−Z(ρ˜−ρ0), where ρ˜−ρ0 =[SZϕ˜S′Zϕ˜]−1SZϕ˜Sϕ˜v. Hence∥∥∥∥∥ 1n∗ ∑i∈N∗(v˜i− vi)2ϕ˜iϕ˜ ′i∥∥∥∥∥≤ C1n∗ ∑i∈N∗(v˜i− vi)2,for some constant C1 > 0. As for the last term, note that1n∗ ∑i∈N∗E[(v˜i− vi)2|F](B.19)=1n∗tr(S′Zϕ˜[SZϕ˜S′Zϕ˜]−1 SZZ [SZϕ˜S′Zϕ˜]−1 SZϕ˜Λ)= OP( 1n∗),by Lemma B.2.3. However, we need to deal with∣∣∣ 1n∗ ∑i∈N∗(v˜2i − v2i )∣∣∣≤√ 1n∗ ∑i∈N∗(v˜i− vi)2√1n∗ ∑i∈N∗(v˜i+ vi)2. (B.20)Note that1n∗ ∑i∈N∗(v˜i+ vi)2 ≤ 2n∗ ∑i∈N∗(v˜i− vi)2+ 8n∗ ∑i∈N∗v2i= OP(1n∗)+8n∗ ∑i∈N∗v2i ,by (B.19). 
As for the last term,1n∗ ∑i∈N∗E[v2i |F ]≤2n∗ ∑i∈N∗E[Ri(ε)2|F ]+ 2n∗ ∑i∈N∗E[η2i |F ].The last term is bounded by 2σ2η , and the first term on the right hand side is bounded by2σ2ε(1−β0)2 +2n∗ ∑i∈N∗E( β0nP(i)∑j∈NP(i)λi jλiiε j)2|F≤C.Combining this with (B.19) and (B.20), we obtain the desired result.198(ii) Let us first write1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗(v˜iv˜ j− viv j)=1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗(v˜i− vi)(v˜ j− v j)+1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗(v˜i− vi)v j+1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗vi(v˜ j− v j) = An,1+An,2+An,3, say.As for the leading term, by Cauchy-Schwarz inequality,|An,1|=√1n∗ ∑i∈N∗(v˜i− vi)2√√√√ 1n∗ ∑i∈N∗(∑j∈NP(i)∩N∗(v˜ j− v j))2.Note that1n∗ ∑i∈N∗E( ∑j∈NP(i)∩N∗(v˜ j− v j))2|F≤ 1n∗ ∑i∈N∗|NP(i)∩N∗| ∑j∈NP(i)∩N∗E[(v˜ j− v j)2 |F]=1n∗ ∑i∈N∗(∑j∈NP(i)∩N∗|NP( j)∩N∗|)E[(v˜i− vi)2 |F],where the inequality above uses Jensen’s inequality and the equality above uses LemmaB.2.1. Hence the last term is bounded bymaxi∈N∗ |NP(i)∩N∗|2n∗ ∑i∈N∗E[(v˜i− vi)2 |F]≤ OP(1n∗).by (B.19). Thus we conclude that|An,1|= OP(1n∗).199Now, let us turn to An,2. Observe thatAn,2 = − 1n∗ ∑i∈N∗Z′i ∑j∈NP(i)∩N∗v j(ρ˜−ρ0)= −(1n∗ ∑i∈N∗Z∗′i vi)(ρ˜−ρ0) =−SZ∗v(ρ˜−ρ0)using Lemma B.2.1. From the proof of (i), we obtain thatρ˜−ρ0 = OP(1√n∗).Hence combined with Lemma B.2.3, we have|An,2|= OP(1n∗).Since by Lemma B.2.1, An,2 = An,3, the proof of (ii) is complete.(iii) Note thatVar(1n∗ ∑i∈N∗R2i (ε)|F)≤ 2(n∗)2 ∑i∈N∗Var(λ 2iiε2i |F)+2(n∗)2 ∑i∈N∗Var(β0λiinP(i)∑j∈NP(i)λi jε j)2|F .The leading term is OP((n∗)−1). 
The last term is bounded by2(n∗)2 ∑i∈N∗β 20 λ2ii1nP(i)∑j∈NP(i)λ 2i jE[ε2j |F ] = OP((n∗)−1).Since vi = Ri(ε)+ηi and εi’s and ηi’s are independent, we obtain the desired rate.(iv) For simplicity of notation, defineVi j = (viv j−E[viv j|F ])ϕ˜iϕ˜ ′j.200Then we writeE( 1n∗ ∑i∈N∗ ∑j∈NP(i)∩N∗Vi j)2|F=1(n∗)2 ∑i1∈N∗ ∑j1∈NP(i)∩N∗∑i2∈N∗∑j2∈NP(i)∩N∗E [Vi1 j1Vi2 j2 |F ] .The last expection is zero, whenever (i2, j2) is away from (i1, j1) by more than two edges.Hence we can bound the last term by (using Assumption 3.3.4))C1n∗maxi∈NE[v2i |F ]≤C2n∗for some constants C1,C2 which do not depend on n. Lemma B.2.5. Suppose that the conditions of Theorem 3.3.1 hold. Then,Λˆ−Λ= OP(1√n∗).Proof: We writeΛˆ1−Λ1 = 1n∗ ∑i∈N∗(v˜2i −E[v2i |F ])ϕ˜iϕ˜ ′i andΛˆ2−Λ2 = sˆε − sεn∗ ∑i∈N∗ ∑j∈NP(i)∩N∗qi j,ε ϕ˜iϕ˜ ′j.By Assumption 3.3.2 and Lemma B.2.4(ii)(iv), we havesˆε − sε = OP(1/√n∗).The desired result follows by using this and applying Lemma B.2.4(i)(iii) to Λˆ1−Λ1. Lemma B.2.6. Suppose that the conditions of Theorem 3.3.1 hold. Then the followingholds.(i) 1n∗ ∑i∈N∗(vˆ2i − v2i )ϕ˜iϕ˜ ′i = OP(1/√n∗).(ii) 1n∗ ∑i∈N∗∑ j∈NP(i)∩N∗(vˆivˆ j− viv j)ϕ˜iϕ˜ ′j = OP(1/n∗).Proof: First, write vˆ− v =−Z(ρˆ−ρ0), whereρˆ−ρ0 =[SZϕ˜ Λˆ−1S′Zϕ˜]−1SZϕ˜ Λˆ−1Sϕ˜v. (B.21)201Following the same arguments as in the proof of Lemma B.2.4(i)(ii) and Lemma B.2.5, weobtain the desired result. Proof of Theorem 3.3.1: Let us consider the first statement. We write1√n∗Λˆϕ˜ ′vˆ =1√n∗Λˆϕ˜ ′(vˆ− v)+ 1√n∗Λˆϕ˜ ′v= − 1√n∗Λˆϕ˜ ′Z(ρˆ−ρ0)+ 1√n∗Λˆϕ˜ ′v =√n∗(I−P)Λˆ−1/2Sϕ˜v,using (B.21), whereP = Λˆ−1/2S′Zϕ˜[SZϕ˜ Λˆ−1S′Zϕ˜]−1SZϕ˜ Λˆ−1/2.Note that P is a projection matrix from RM onto the range space of Λˆ−1/2S′Zϕ˜ . Hence com-bining Lemmas B.2.2 and B.2.5. We obtain the desired result. The second result followsfrom Lemma B.2.2 and (B.21).Let us turn to the third statement. First, note that1√n∗ ∑i∈N∗ ∑j∈N˜(i)(vˆivˆ j− viv j) = OP(1/√n∗),by following precisely the same proof as that of Lemma B.2.4(ii). 
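(Recall that $\tilde{N}(i)$ is defined in Condition D in the main text.) Now, we let

$$\sigma^2 = Var\left(\frac{1}{\sqrt{n^*}} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} \eta_i \eta_j \,\Big|\, \mathcal{F}\right)$$

and write

$$\frac{1}{\sigma\sqrt{n^*}} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} v_i v_j = \frac{1}{\sqrt{n^*}} \sum_{i \in N^*} r_i,$$

where

$$r_i = \frac{1}{\sigma} \sum_{j \in \tilde{N}(i)} \eta_i \eta_j,$$

because $v_i = \eta_i$ under the null hypothesis. Note that $E[r_i \mid \mathcal{F}] = 0$. Let $G^*_P$ be a graph on $N^*$ such that $i$ and $j$ are adjacent if and only if $j \in \tilde{N}(i)$ or $i \in \tilde{N}(j)$. Then $\{r_i\}_{i \in N^*}$ has $G^*_P$ as a dependency graph conditional on $\mathcal{F}$. Now we show the following:

$$(n^*)^{-1/4}\sqrt{\mu_3^3} + (n^*)^{-1/2}\mu_4^2 \to_P 0, \tag{B.22}$$

where for $p \geq 1$,

$$\mu_p = \max_{i \in N^*} \left(E[|r_i|^p \mid \mathcal{F}]\right)^{1/p}.$$

Then by Theorem 2.3 of [145], we obtain that

$$\frac{1}{\sigma\sqrt{n^*}} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} v_i v_j \to_d N(0,1),$$

as $n^* \to \infty$. First, note that

$$\sigma^2 = E\left[\left(\frac{1}{\sqrt{n^*}} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} \eta_i \eta_j\right)^2 \,\Big|\, \mathcal{F}\right] = \frac{1}{n^*} \sum_{i_1 \in N^*} \sum_{j_1 \in \tilde{N}(i_1)} \sum_{i_2 \in N^*} \sum_{j_2 \in \tilde{N}(i_2)} E[\eta_{i_1}\eta_{j_1}\eta_{i_2}\eta_{j_2} \mid \mathcal{F}].$$

Note that in the quadruple sum, $i_1 \neq j_1$ and $i_2 \neq j_2$. There are only two ways the last conditional expectation is not zero: either $i_1 = i_2$ and $j_1 = j_2$, or $j_1 = i_2$ and $i_1 = j_2$, because the $\eta_i$'s are independent across $i$ and their conditional expectation given $\mathcal{F}$ is zero. Hence the last term is equal to

$$\frac{2\sigma_\eta^4}{n^*} \sum_{i \in N^*} |\tilde{N}(i)| = 2\sigma_\eta^4 \tilde{d}_{av}. \tag{B.23}$$

Hence for any $p \geq 2$,

$$\mu_p^p = \frac{1}{\sigma^p} \max_{i \in N^*} E\left[\Big|\sum_{j \in \tilde{N}(i)} \eta_i \eta_j\Big|^p \,\Big|\, \mathcal{F}\right] \leq \frac{\max_{i,j \in N^*} E[|\eta_i \eta_j|^p \mid \mathcal{F}]}{\sigma^p} \leq \frac{\max_{i,j \in N^*} E[|\eta_i \eta_j|^p \mid \mathcal{F}]}{2^p \sigma_\eta^{2p} \tilde{d}_{av}^p}.$$

Note that $\tilde{d}_{av} \geq 1$ because $\tilde{N}(i) \neq \emptyset$ for all $i \in N^*$. Thus (B.22) follows. Now, by Lemma B.2.4, and in light of the expression (B.23), it is not hard to see that

$$2\hat{S}^4(\beta_0) = \sigma^2 + o_P(1).$$

The desired result follows from this and the Bonferroni procedure.

Appendix C

Appendix to Chapter 4

C.1 Proofs

Proof of Lemma 4.3.1:

Consider first $k_t > k'_t$. Given the increasing cost of exerting influence, a whip exerts the minimum amount of influence necessary to ensure a vote for $k_t$, provided this amount is less than or equal to $y^{max}_p$. The minimum amount of influence is such that the member is indifferent, $u(k_t, \omega_{i,t} + y_{i,t}) = u(k'_t, \omega_{i,t} + y_{i,t})$ or $|\omega_{i,t} + y_{i,t} - k_t| = |\omega_{i,t} + y_{i,t} - k'_t|$. This equality is satisfied if and only if $\omega_{i,t} + y_{i,t} = MV_t = \frac{k_t + k'_t}{2}$. 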
If $\omega_{i,t} \geq MV_t$, the required influence is weakly negative (absent influence, the member votes for $k_t$) and so no influence is exerted. If $\omega_{i,t} < MV_t$, a positive amount of influence, $y_{i,t} = MV_t - \omega_{i,t} > 0$, is required, which increases linearly in $MV_t - \omega_{i,t}$. Therefore, a member is whipped if and only if their ideology is such that $MV_t - y^{max}_p \leq \omega_{i,t} < MV_t$. For $k_t < k'_t$, the argument is reversed: only members for which $MV_t < \omega_{i,t} \leq MV_t + y^{max}_p$ are whipped.

Proof of Lemma 4.3.2:

Consider the mass, $f(\theta)$, of members at some $\theta$, each of whom has an independent signal of $\hat{\eta}^1_t$ due to their independent ideological shocks. The average number of 'yes' reports from $N$ members at $\theta$ is given by

$$\lim_{N \to \infty} \frac{f(\theta)}{N} \sum_{i=1}^N I\left(u(x_t, \theta + \delta^1_{i,t} + \hat{\eta}^1_t) \geq u(q_t, \theta + \delta^1_{i,t} + \hat{\eta}^1_t)\right),$$

where $I()$ represents the indicator function. By the law of large numbers, as $N \to \infty$, this average converges to:

$$f(\theta) E\left[I\left(u(x_t, \theta + \delta^1_t + \hat{\eta}^1_t) \geq u(q_t, \theta + \delta^1_t + \hat{\eta}^1_t)\right)\right]$$
$$= f(\theta) \Pr\left(u(x_t, \theta + \delta^1_t + \hat{\eta}^1_t) \geq u(q_t, \theta + \delta^1_t + \hat{\eta}^1_t)\right)$$
$$= f(\theta) \Pr\left(\theta + \delta^1_t + \hat{\eta}^1_t \geq MV_t\right)$$
$$= f(\theta)\left(1 - G(MV_t - \theta - \hat{\eta}^1_t)\right).$$

Therefore, after observing the number of 'yes' reports for a given $\theta$, $\hat{\eta}^1_t$ is known with probability one.

Proof of Lemma 4.3.3:

Consider $x_t > q_t$. Let $G_{1+2}()$ denote the cdf of $\delta^1_{i,t} + \delta^2_{i,t}$ (with corresponding pdf, $g_{1+2}()$). For a given $MV_t$, the number of votes for $x_t$ from a given party's members is known with probability one due to independent idiosyncratic shocks and a continuum of members. To see this fact, consider the continuum of party $p$'s members located at each $\theta$, each with independent shocks, $\delta^1_{i,t}$ and $\delta^2_{i,t}$. With $N$ voters at $\theta$, the average number of votes from these members is given by

$$\lim_{N \to \infty} \frac{f(\theta)}{N} \sum_{i=1}^N I\left(\theta_i + \eta^1_t + \eta^2_t + \delta^1_{i,t} + \delta^2_{i,t} \geq MV_t \pm y^{max}_p\right),$$

where the sign with which $y^{max}_p$ enters depends upon the direction that party $p$ whips. 
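By the law of large numbers, as $N \to \infty$, this average converges to:

$$f(\theta) E\left[I\left(\theta + \eta^1_t + \eta^2_t + \delta^1_t + \delta^2_t \geq MV_t \pm y^{max}_p\right)\right] = f(\theta) \Pr\left(\theta + \eta^1_t + \eta^2_t + \delta^1_t + \delta^2_t \geq MV_t \pm y^{max}_p\right)$$
$$= f(\theta)\left(1 - G_{1+2}(MV_t - \eta^1_t - \eta^2_t \pm y^{max}_p - \theta)\right).$$

Denote the realized marginal voter after the aggregate shocks as $\widetilde{MV}_t = MV_t - \eta^1_t - \eta^2_t$. Then, the number of votes for $x_t$ from party $D$'s members is given by

$$Y_D(\widetilde{MV}_t) = N_D\left[\int_{-\infty}^{\infty} \left(1 - G_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_D)\right) f_D(\theta)\, d\theta\right].$$

The corresponding expression for party $R$ is

$$Y_R(\widetilde{MV}_t) = N_R\left[\int_{-\infty}^{\infty} \left(1 - G_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_R)\right) f_R(\theta)\, d\theta\right].$$

The total number of votes for $x_t$ is then given by $Y(\widetilde{MV}_t) \equiv Y_D(\widetilde{MV}_t) + Y_R(\widetilde{MV}_t)$.

$Y(\widetilde{MV}_t)$ is strictly decreasing in $x_t$. To see this, consider the votes from party $D$'s members, $Y_D(\widetilde{MV}_t)$:

$$\frac{\partial Y_D(\widetilde{MV}_t)}{\partial x_t} = \frac{1}{2} \frac{\partial}{\partial \widetilde{MV}_t} N_D\left[\int_{-\infty}^{\infty} \left(1 - G_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_D)\right) f_D(\theta)\, d\theta\right]$$
$$= -\frac{N_D}{2} \int_{-\infty}^{\infty} g_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_D) f_D(\theta)\, d\theta. \tag{C.1}$$

(C.1) is strictly less than zero given that ideological shocks are unbounded, independent of the (finite) amount or direction of whipping. The same is true of the derivative of $Y_R(\widetilde{MV}_t)$, ensuring $Y(\widetilde{MV}_t)$ strictly decreases in $x_t$ for $x_t > q_t$. For $x_t < q_t$, we have

$$Y_D(\widetilde{MV}_t) = N_D\left[\int_{-\infty}^{\infty} G_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_D) f_D(\theta)\, d\theta\right]$$

and

$$Y_R(\widetilde{MV}_t) = N_R\left[\int_{-\infty}^{\infty} G_{1+2}(\widetilde{MV}_t - \theta \pm y^{max}_R) f_R(\theta)\, d\theta\right],$$

so that $Y(\widetilde{MV}_t)$ increases in $x_t$. Since for $q_t < \theta_{m,p}$ we must have $x_t > q_t$ and for $q_t > \theta_{m,p}$ we must have $x_t < q_t$, we see that the number of votes for $x_t$ strictly decreases the closer it gets to the proposing party's ideal point.

Proof of Proposition 4.3.4:

For $q_t = \theta_{m,D}$, clearly $x^{count}_t = x^{no\ count}_t = \theta_{m,D}$ are the unique optimal alternative policies because party $D$ can do no better than its ideal point.

In the case of no whip count, and $q_t < \theta_{m,D}$ so that $x_t > q_t$, we can rewrite party $D$'s expected utility as

$$EU^{no\ count}_D(q_t, x_t) = \left(1 - \Phi\left(\frac{MV_t - \widehat{MV}_{R,R}}{\sigma}\right)\right)\left(u(x_t, \theta_{m,D}) - u(q_t, \theta_{m,D})\right) + u(q_t, \theta_{m,D}) - C_b.$$

The derivative with respect to $x_t$ is given by

$$\left(1 - \Phi\left(\frac{MV_t - \widehat{MV}_{R,R}}{\sigma}\right)\right) u_x(x_t, \theta_{m,D}) - \frac{1}{2\sigma} \phi\left(\frac{MV_t - \widehat{MV}_{R,R}}{\sigma}\right)\left(u(x_t, \theta_{m,D}) - u(q_t, \theta_{m,D})\right),$$

where $\phi()$ denotes the pdf of the standard normal distribution. 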
At xt = qt , the derivativeis strictly positive given qt < θm,D and the fact that MˆV R,R is finite. At xt = θm,D, it is strictlynegative given u(qt ,θm,D) < 0. Together these facts ensure an interior solution, which wenow show is unique. Any interior solution must satisfy the first-order condition,(1−Φ(MV no countt − MˆV R,Rσ))ux(xno countt ,θm,D)− 12σφ(MV no countt − MˆV R,Rσ)(u(xno countt ,θm,D)−u(qt ,θm,D))= 0 (C.2)Defining zno countt ≡ MVno countt −MˆV R,Rσ , we can re-write the first-order condition as:1−Φ(zno countt )φ(zno countt )=12σu(xno countt ,θm,D)−u(qt ,θm,D)ux(xno countt ,θm,D)(C.3)The left-hand side of (C.3) is the inverse hazard rate of a standard normal distribution207and so is strictly decreasing in zno countt (and therefore xno countt since xno countt strictly in-creases in zno countt ). The sign of the derivative of the right-hand side with respect to xno counttis given by ux(xno countt ,θm,D)2− uxx(xno countt ,θm,D)(u(xno countt ,θm,D)−u(qt ,θm,D)) whichis strictly positive because uxx(xno countt ,θm,D)< 0 and u(xno countt ,θm,D)> u(qt ,θm,D). Thus,the right-hand side is strictly increasing in xno countt . Together, these facts guarantee a uniquesolution, xno countt ∈ (qt ,θm,D).1In the case of a whip count and and qt < θm,D , we can rewrite the party’s expectedutility:EUcountD (qt ,xt)= Pr(η1t ≥ η1t )(Pr(xt wins|η1t ≥ η1t )(u(xt ,θm,D)−u(qt ,θm,D))+u(qt ,θm,D)−Cb)+Pr(η1t < η1t)u(qt ,θm,D)= Pr(η1t ≥ η1t ,xt wins)(u(xt ,θm,D)−u(qt ,θm,D))−Pr(η1t ≥ η1t )Cb+u(qt ,θm,D)=∫ ∞η1t(1−Φ(MVt − MˆV R,R−ησ2))1σ1φ(ησ1)dη (u(xt ,θm,D)−u(qt ,θm,D))−(1−Φ(η1tσ1))Cb+u(qt ,θm,D)Taking the derivative with respect to xt yields:21The second-order condition at xno countt is also easily checked, but must be satisfied given that marginalexpected utility is increasing at xt = qt , decreasing at xt = θm,D and the solution is unique.2The necessary conditions for applying the Leibniz Integral Rule with an infinite bound are satisfied. 
Specif-ically, the integrand and its partial derivative with respect to xt are both continuous functions of xt and η , andit is possible to find integrable functions of η that bound the integrand and it’s partial derivative with respect toxt .208dEUcountD (qt ,xt)dxt= −dη1tdxt1σ1φ(η1tσ1)(1−Φ(MVt − MˆV R,R−η1tσ2))(u(xt ,θm,D)−u(qt ,θm,D))− 12σ1σ2∫ ∞η1tφ(MVt − MˆV R,R−ησ2)φ(ησ1)dη (u(xt ,θm,D)−u(qt ,θm,D))+1σ1ux(xt ,θm,D)∫ ∞η1t(1−Φ(MVt − MˆV R,R−η)σ2)φ(ησ1)dη+1σ1dη1tdxtφ(η1tσ1)Cb=1σ1ux(xt ,θm,D)∫ ∞η1t(1−Φ(MVt − MˆV R,R−η)σ2)φ(ησ)dη− 12σ1σ2∫ ∞η1tφ(MVt − MˆV R,R−ησ2)φ(ησ1)dη (u(xt ,θm,D)−u(qt ,θm,D))(C.4)where the second equality uses the fact that η1tsatisfies(1−Φ(MVt − MˆV R,R−η1tσ2))(u(xt ,θm,D)−u(qt ,θm,D)) =Cb (C.5)Consider the limit as Cb→ 0. From (C.5), we can see that, provided xt is bounded awayfrom qt so that u(xt ,θm,D)−u(qt ,θm,D)> 0 (which we subsequently confirm), we must haveη1t→−∞ as Cb→ 0. But, as η1t →−∞, the party always continues to pursue the bill afterthe first aggregate shock. In this case, the optimal alternative policy is identical to the caseof no whip count. Formally,limη t t→−∞dEUcountD (qt ,xt)dxt=1σ1ux(xt ,θm,D)∫ ∞−∞(1−Φ(MVt − MˆV R,R−η)σ2)φ(ησ1)dη− 12σ1σ2∫ ∞−∞φ(MVt − MˆV R,R−ησ2)φ(ησ1)dη (u(xt ,θm,D)−u(qt ,θm,D))= ux(xt ,θm,D)(1−Φ(MVt − MˆV R,Rσ))− 12σφ(MVt − MˆV R,Rσ)(u(xt ,θm,D)−u(qt ,θm,D)) (C.6)where the equality follows from the fact that the convolution of two standard normaldistributions is a normal distribution with the sum of the variances and using σ2 = σ21 +209σ22 . Comparing (C.6) with (C.2), we can see immediately that, in the limit, the first-ordercondition for the whip and no whip cases are identical, and it therefore follows that xcountt isunique and interior as in the no whip case. This fact ensures that u(xt ,θm,D)−u(qt ,θm,D)> 0in the limit, confirming that we must have η1t→−∞ as Cb→ 0.We now show that xcountt is unique and interior for strictly positive Cb. 
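From (C.4), we see that $\frac{dEU_D^{\text{count}}(q_t,x_t)}{dx_t}$ is strictly positive at $x_t=q_t$ and strictly negative at $x_t=\theta_{m,D}$, ensuring an interior optimum, $x_t^{\text{count}}$, which must satisfy the first-order condition [fn. 3]

\[
\frac{\displaystyle\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta}{\displaystyle\frac{1}{2\sigma_2}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta}
=\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})} \tag{C.7}
\]

Footnote 3: These statements require $\underline{\eta}_{1t}<\infty$, which, by continuity, is true for $C_b$ sufficiently small given that $\underline{\eta}_{1t}\to-\infty$ as $C_b\to 0$.

As in the case of no whip count, the right-hand side of (C.7) strictly increases in $x_t^{\text{count}}$. It remains to show that, in the limit as $C_b\to 0$, the left-hand side of (C.7) strictly decreases in $x_t^{\text{count}}$, which, by continuity of the left-hand side in $C_b$, ensures there exists a strictly positive value of $C_b$, $\hat{C}_b>0$, such that for all $C_b<\hat{C}_b$, the left-hand side continues to strictly decrease. It then follows that $x_t^{\text{count}}$ is unique for all $C_b<\hat{C}_b$. The sign of the derivative of the left-hand side of (C.7) with respect to $x_t^{\text{count}}$ is determined by [fn. 4]

\[
\begin{aligned}
&-\frac{d\underline{\eta}_{1t}}{dx_t^{\text{count}}}\phi\left(\frac{\underline{\eta}_{1t}}{\sigma_1}\right)\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\right)\frac{1}{2\sigma_2}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&+\frac{d\underline{\eta}_{1t}}{dx_t^{\text{count}}}\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\phi\left(\frac{\underline{\eta}_{1t}}{\sigma_1}\right)\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&-\left(\frac{1}{2\sigma_2}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\right)^2\\
&-\frac{1}{4\sigma_2^2}\int_{\underline{\eta}_{1t}}^{\infty}\phi'\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta
\end{aligned}
\tag{C.8}
\]

Footnote 4: Again, the necessary conditions for applying the Leibniz Integral Rule with an infinite bound are satisfied.

By the implicit function theorem, $\frac{d\underline{\eta}_{1t}}{dx_t}$ must satisfy (from (C.5))

\[
-\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\frac{1}{\sigma_2}\left(\frac{1}{2}-\frac{d\underline{\eta}_{1t}}{dx_t^{\text{count}}}\right)\bigl(u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)
+\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\right)u_x(x_t^{\text{count}},\theta_{m,D})=0
\]

or

\[
\frac{d\underline{\eta}_{1t}}{dx_t^{\text{count}}}=\frac{1}{2}-\frac{\sigma_2\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\right)u_x(x_t^{\text{count}},\theta_{m,D})}{\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\underline{\eta}_{1t}}{\sigma_2}\right)\bigl(u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)} \tag{C.9}
\]

In the limit as $C_b\to 0$, $\underline{\eta}_{1t}\to-\infty$, in which case the second term of (C.9) approaches zero because $x_t^{\text{count}}$ is bounded away from $q_t$ and $\theta_{m,D}$, and the inverse hazard rate of a standard normal random variable approaches zero as its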
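argument approaches infinity. [fn. 5] The limit of (C.8) as $C_b\to 0$ is then determined by the limit of its last two terms because the first two terms approach zero. Defining $z_t^{\text{count}}\equiv\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}}{\sigma}$, this limit is given by

Footnote 5: $\lim_{x\to\infty}\frac{1-\Phi(x)}{\phi(x)}=\lim_{x\to\infty}\frac{-\phi(x)}{\phi'(x)}=\lim_{x\to\infty}\frac{-\phi(x)}{-x\phi(x)}=0$, where the first equality uses L'Hôpital's rule.

\[
\begin{aligned}
&\lim_{\underline{\eta}_{1t}\to-\infty}\Biggl[-\left(\frac{1}{2\sigma_2}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\right)^2\\
&\qquad-\frac{1}{4\sigma_2^2}\int_{\underline{\eta}_{1t}}^{\infty}\phi'\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\Biggr]\\
&=-\left(\frac{1}{2\sigma_2}\int_{-\infty}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\right)^2\\
&\qquad-\frac{1}{4\sigma_2^2}\int_{-\infty}^{\infty}\phi'\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\int_{-\infty}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&=-\left(\frac{1}{2\sigma}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)^2-\frac{1}{4\sigma^2}\phi'\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}}{\sigma}\right)\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)\\
&=-\left(\frac{1}{2\sigma}\phi(z_t^{\text{count}})\right)^2-\frac{1}{4\sigma^2}\phi'(z_t^{\text{count}})\bigl(1-\Phi(z_t^{\text{count}})\bigr)\\
&=-\left(\frac{1}{2\sigma}\phi(z_t^{\text{count}})\right)^2+\frac{1}{4\sigma^2}z_t^{\text{count}}\phi(z_t^{\text{count}})\bigl(1-\Phi(z_t^{\text{count}})\bigr)\\
&<-\left(\frac{1}{2\sigma}\phi(z_t^{\text{count}})\right)^2+\frac{1}{4\sigma^2}\phi(z_t^{\text{count}})^2=0
\end{aligned}
\]

where the second equality uses properties of the convolution of normal distributions, and the inequality follows from the fact that, for a standard normal random variable, $x(1-\Phi(x))<\phi(x)$.

For $q_t>\theta_{m,D}$ so that $x_t<q_t$, we assume party R whips against the bill (supports $q_t$). In case of no whip count, we can write party D's expected utility as

\[
EU_D^{\text{no count}}(q_t,x_t)=\Phi\left(\frac{MV_t-\widehat{MV}_{L,R}}{\sigma}\right)\bigl(u(x_t,\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)+u(q_t,\theta_{m,D})-C_b
\]

With a whip count, it is

\[
EU_D^{\text{count}}(q_t,x_t)=\int_{-\infty}^{\overline{\eta}_{1t}}\Phi\left(\frac{MV_t-\widehat{MV}_{L,R}-\eta}{\sigma_2}\right)\frac{1}{\sigma_1}\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\,\bigl(u(x_t,\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)-\Phi\left(\frac{\overline{\eta}_{1t}}{\sigma_1}\right)C_b+u(q_t,\theta_{m,D})
\]

Using these expressions, the optimal policy candidates, $x_t^{\text{count}}$ and $x_t^{\text{no count}}$, can be shown to be unique (provided $C_b$ is not too large) as in the previous case.

To prove Lemma 4.3.5, we first define and prove Lemma C.1.1.

Lemma C.1.1. Fix $C_b<\hat{C}_b$ such that the optimal alternative policies, $x_t^{\text{count}}$ and $x_t^{\text{no count}}$, are unique. Then, the alternative policies that satisfy the first-order conditions with and without a whip count, (C.7) and (C.3), are such that:

1. For $q_t\neq\theta_{m,D}$, the optimal alternative policy with a whip count, $x_t^{\text{count}}$, lies strictly closer to party D's ideal point, $\theta_{m,D}$, than that without, $x_t^{\text{no count}}$.
2. $MV_t^{\text{count}}(q_t)$ and $MV_t^{\text{no count}}(q_t)$ strictly increase in $q_t$ for $q_t<\theta_{m,D}$ and strictly decrease in $q_t$ for $q_t>\theta_{m,D}$.

Proof of Lemma C.1.1:

Part 1. Consider the case of $q_t<\theta_{m,D}$. We can write the first-order condition in the case of no whip count as an integration over the second aggregate shock (as in the case of the whip count):

\[
\int_{-\infty}^{\infty}\left[1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{no count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{no count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)d\eta=0
\]

Consider the left-hand side of this expression, evaluated instead at $x_t^{\text{count}}$:

\[
\begin{aligned}
&\int_{-\infty}^{\infty}\left[1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&=\int_{\underline{\eta}_{1t}}^{\infty}\left[1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&\quad+\int_{-\infty}^{\underline{\eta}_{1t}}\left[1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&=\int_{-\infty}^{\underline{\eta}_{1t}}\left[1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)d\eta
\end{aligned}
\tag{C.10}
\]

where the last equality follows from the fact that $x_t^{\text{count}}$ satisfies the first-order condition for the case of a whip count. Consider the sign of the integrand in (C.10):

\[
\begin{aligned}
&\left[1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)-\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\right]\phi\left(\frac{\eta}{\sigma_1}\right)\gtrless 0\\
&\iff\frac{1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)}{\frac{1}{2\sigma_2}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)}-\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}\gtrless 0
\end{aligned}
\]

The left-hand side of this inequality is a strictly increasing function of $\eta$, so that there is at most one value of $\eta$ at which the integrand is zero.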
As $\eta\to\infty$, the integrand approaches 1. Thus, to satisfy the first-order condition for the case of a whip count at $x_t^{\text{count}}$, the integrand evaluated at $\underline{\eta}_{1t}$ must be strictly negative so that the single zero-crossing is contained in $[\underline{\eta}_{1t},\infty)$ (otherwise the integrand is positive over the whole range and cannot integrate to zero). Thus, the integrand in (C.10) must be strictly negative over $(-\infty,\underline{\eta}_{1t}]$ so that the integral is strictly negative: the marginal expected utility for the case of no whip count must be negative when evaluated at the optimal alternative policy for the case of a whip count. But then we must have $x_t^{\text{no count}}<x_t^{\text{count}}$ to ensure that the first-order condition for the case of no whip count is satisfied (given that $x_t^{\text{no count}}$ is the unique optimum, for every $x_t<x_t^{\text{no count}}$, the marginal expected utility is positive). The case of $q_t>\theta_{m,D}$ can be shown similarly.

Part 2. Consider the case of $q_t<\theta_{m,D}$ when a whip count is conducted. $MV_t^{\text{count}}$ is determined implicitly by the first-order condition, (C.7). To keep the expressions readable, write $\Lambda(MV_t^{\text{count}})$ for the left-hand side of (C.7) and $R(x_t^{\text{count}})$ for its right-hand side:

\[
\Lambda(MV_t^{\text{count}})\equiv\frac{\displaystyle\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta}{\displaystyle\frac{1}{2\sigma_2}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta},
\qquad
R(x_t^{\text{count}})\equiv\frac{u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})}{u_x(x_t^{\text{count}},\theta_{m,D})}
\]

Taking the derivative of (C.7) with respect to $q_t$, we have

\[
\begin{aligned}
&\frac{\partial}{\partial q_t}\Bigl[\Lambda(MV_t^{\text{count}})-R(x_t^{\text{count}})\Bigr]=0\\
&\iff\frac{\partial\Lambda}{\partial MV_t^{\text{count}}}\frac{\partial MV_t^{\text{count}}}{\partial q_t}-\frac{\partial R}{\partial x_t^{\text{count}}}\frac{\partial x_t^{\text{count}}}{\partial q_t}=0\\
&\iff\frac{\partial\Lambda}{\partial MV_t^{\text{count}}}\frac{\partial MV_t^{\text{count}}}{\partial q_t}-\frac{\partial R}{\partial x_t^{\text{count}}}\left(2\frac{\partial MV_t^{\text{count}}}{\partial q_t}-1\right)=0\\
&\iff\frac{\partial MV_t^{\text{count}}}{\partial q_t}\left[\frac{\partial\Lambda}{\partial MV_t^{\text{count}}}-2\frac{\partial R}{\partial x_t^{\text{count}}}\right]=-\frac{\partial R}{\partial x_t^{\text{count}}}
\end{aligned}
\]

As shown in the proof of Proposition 4.3.4, the term in brackets on the left-hand side is strictly negative for $C_b<\hat{C}_b$.
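But, the term on the right-hand side is also strictly negative so that $\frac{\partial MV_t^{\text{count}}}{\partial q_t}>0$. Similarly, $\frac{\partial MV_t^{\text{no count}}}{\partial q_t}>0$. For $q_t>\theta_{m,D}$, we can similarly establish $\frac{\partial MV_t^{\text{count}}}{\partial q_t}<0$ and $\frac{\partial MV_t^{\text{no count}}}{\partial q_t}<0$.

Proof of Lemma 4.3.5:

$V_D^{\text{count}}(q_t)>V_D^{\text{no count}}(q_t)$ because, for $C_b$ sufficiently small, $\underline{\eta}_{1t}<\infty$ and $\overline{\eta}_{1t}>-\infty$ (see footnote 3) so that an alternative policy is pursued for a non-zero measure of the support of $\eta_{1t}$. Therefore, for the same alternative policy, party D's expected utility with a whip count must strictly exceed that without because over this support of $\eta_{1t}$, the cost, $C_b$, is avoided and the probability of the alternative passing is the same. If party D pursues a different alternative policy with a whip count (which it generally does), then it must be because it does even better.

Consider the case of $q_t<\theta_{m,D}$. We claim both value functions decrease with $q_t$, but the difference $V_D^{\text{count}}(q_t)-V_D^{\text{no count}}(q_t)$ increases. By the envelope theorem, the derivative of the value function for the case of no whip count with respect to $q_t$ is given by

\[
\begin{aligned}
\frac{\partial V_D^{\text{no count}}(q_t)}{\partial q_t}
&=-\left(1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)u_q(q_t,\theta_{m,D})-\frac{1}{2\sigma}\phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\bigl(u(x_t^{\text{no count}},\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)\\
&=-\left(1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)u_q(q_t,\theta_{m,D})-\left(1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)u_x(x_t^{\text{no count}},\theta_{m,D})\\
&=-\left(1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)\bigl(u_q(q_t,\theta_{m,D})+u_x(x_t^{\text{no count}},\theta_{m,D})\bigr)
\end{aligned}
\]

where the second equality follows from applying the first-order condition (C.2).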
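With unbounded aggregate shocks and $q_t,x_t^{\text{no count}}<\theta_{m,D}$, this derivative is strictly negative so that the value of pursuing an alternate policy strictly decreases with $q_t$.

In a similar manner, for the case of a whip count, we have

\[
\begin{aligned}
\frac{\partial V_D^{\text{count}}(q_t)}{\partial q_t}
&=-\frac{1}{2\sigma_2\sigma_1}\int_{\underline{\eta}_{1t}}^{\infty}\phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\,\bigl(u(x_t^{\text{count}},\theta_{m,D})-u(q_t,\theta_{m,D})\bigr)\\
&\quad-\frac{1}{\sigma_1}u_q(q_t,\theta_{m,D})\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&=-\frac{1}{\sigma_1}\bigl(u_q(q_t,\theta_{m,D})+u_x(x_t^{\text{count}},\theta_{m,D})\bigr)\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta
\end{aligned}
\]

which is also strictly negative, given $\underline{\eta}_{1t}<\infty$.

Finally, consider the marginal difference in the value functions:

\[
\begin{aligned}
\frac{\partial\bigl(V_D^{\text{count}}(q_t)-V_D^{\text{no count}}(q_t)\bigr)}{\partial q_t}
&=-\frac{1}{\sigma_1}\bigl(u_q(q_t,\theta_{m,D})+u_x(x_t^{\text{count}},\theta_{m,D})\bigr)\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&\quad+\bigl(u_q(q_t,\theta_{m,D})+u_x(x_t^{\text{no count}},\theta_{m,D})\bigr)\left(1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)\right)
\end{aligned}
\]

From the first part of Lemma C.1.1, $x_t^{\text{no count}}<x_t^{\text{count}}$, which ensures $u_x(x_t^{\text{no count}},\theta_{m,D})>u_x(x_t^{\text{count}},\theta_{m,D})$. Furthermore,

\[
\begin{aligned}
1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,R}}{\sigma}\right)
&>1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}}{\sigma}\right)\\
&=\frac{1}{\sigma_1}\int_{-\infty}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta\\
&>\frac{1}{\sigma_1}\int_{\underline{\eta}_{1t}}^{\infty}\left(1-\Phi\left(\frac{MV_t^{\text{count}}-\widehat{MV}_{R,R}-\eta}{\sigma_2}\right)\right)\phi\left(\frac{\eta}{\sigma_1}\right)d\eta>0
\end{aligned}
\]

given $\underline{\eta}_{1t}<\infty$. Therefore, the difference in expected utility strictly increases with $q_t$.

For $q_t>\theta_{m,D}$, we can establish that both value functions increase in $q_t$, but their difference decreases, in an identical manner.

Proof of Proposition 4.3.6:

Assume $C_b<\hat{C}_b$ so that, from Proposition 4.3.4, $x_t^{\text{count}}$ is unique. Consider $q_t<\theta_{m,D}$. We first show that as $q_t\to\theta_{m,D}$, $V_D^{\text{no count}}(q_t)\to-C_b$ and $V_D^{\text{count}}(q_t)\to 0$. The first follows from simple inspection of $EU_D^{\text{no count}}(q_t,x_t)$, noting that $x_t^{\text{no count}}$ must approach $\theta_{m,D}$ as $q_t\to\theta_{m,D}$ because it is contained in the interval $(q_t,\theta_{m,D})$ by Proposition 4.3.4. Similarly, inspecting $EU_D^{\text{count}}(q_t,x_t)$, we see that $V_D^{\text{count}}(q_t)\to-\left(1-\Phi\left(\frac{\underline{\eta}_{1t}}{\sigma_1}\right)\right)C_b$.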
But, as $q_t\to\theta_{m,D}$, we can see from (C.5) that $\underline{\eta}_{1t}$ must approach infinity such that $\Phi\left(\frac{\underline{\eta}_{1t}}{\sigma_1}\right)\to 1$.

Given these facts, strictly positive costs, and the result of Lemma 4.3.5 that both value functions strictly increase with $|q_t-\theta_{m,D}|$, there exists a status quo cutoff, $\overline{q}_l<\theta_{m,D}$, such that for all $q_t\in(\overline{q}_l,\theta_{m,D})$, no alternative policy is pursued. Specifically, $\overline{q}_l$ is given by the larger of the two policies, $q_1$ and $q_2$, which satisfy $V_D^{\text{no count}}(q_1)=0$ and $V_D^{\text{count}}(q_2)=C_w$, respectively.

For $q_t<\overline{q}_l$, there are two possibilities. If $q_1>q_2$, then set $\underline{q}_l=\overline{q}_l=q_1$ with $V_D^{\text{count}}(q_1)<C_w$ and $V_D^{\text{no count}}(q_1)=0$. In this case, for any $q_t<q_1$, an alternative policy is pursued without a whip count: by Lemma 4.3.5, over this range, $V_D^{\text{no count}}(q_t)>0$ so that an alternative policy without a whip count is preferred over not pursuing an alternative policy and, as $q_t$ decreases from $q_1$, $V_D^{\text{count}}(q_t)-V_D^{\text{no count}}(q_t)$ decreases so that not conducting a whip count remains more valuable than conducting one.

If $q_1<q_2$, then set $\overline{q}_l=q_2$ and define $\underline{q}_l<\overline{q}_l$ to be the policy for which $V_D^{\text{count}}(\underline{q}_l)-C_w=V_D^{\text{no count}}(\underline{q}_l)$. Such a point must exist because, by Lemma 4.3.5, as $q_t$ decreases from $\overline{q}_l$, $V_D^{\text{count}}(q_t)-V_D^{\text{no count}}(q_t)$ decreases and so must eventually approach zero. Thus, for $q_t$ sufficiently small, $V_D^{\text{count}}(q_t)-C_w<V_D^{\text{no count}}(q_t)$. With these cutoffs, for $q_t\in(-\infty,\underline{q}_l]$, an alternative policy is pursued without a whip count because $V_D^{\text{no count}}(q_t)>V_D^{\text{count}}(q_t)-C_w>0$ for all $q_t<\underline{q}_l$. For $q_t\in(\underline{q}_l,\overline{q}_l]$, an alternative policy is pursued with a whip count because $V_D^{\text{count}}(q_t)-C_w>0$ and, by Lemma 4.3.5, $V_D^{\text{count}}(q_t)-V_D^{\text{no count}}(q_t)$ increases with $q_t$ over this range so that $V_D^{\text{count}}(q_t)-C_w>V_D^{\text{no count}}(q_t)$.

Symmetric arguments establish cutoffs, $\underline{q}_r$ and $\overline{q}_r$, for the bill pursuit decisions over the range $q_t>\theta_{m,D}$.

C.2 The choices from Party "R"

The model in the main text focuses on the decision of party members and whips from party "D", since the analogous results hold for those from party "R".
The same is true of the identification results. However, for completeness and for the estimation procedure, it is useful to write out the inequalities used for "R", as they differ slightly from equations (4.1) and (4.5). In the associated likelihood functions, we simply substitute the appropriate probabilities of a yes vote at the whip count and roll call stages.

The difference in the equations comes from party "R" being on the right, while "D" is on the left. This implies that, for "R", those that vote "Nay" (and need to be whipped) are to the left of the marginal voter, while for "D", those that vote "Nay" to their policies are to the right of the marginal voter.

Hence, at the whip count, a politician from party "R" votes "Yea" if:

\[
\delta_{1,i,t}+\theta_i\ge MV_t-\eta_{1,t}. \tag{C.11}
\]

The probability of a yes at the whip count stage is given by:

\[
\begin{aligned}
P(Y^{wc}_{i,t}=1)&=P(\delta_{1,i,t}+\theta_i\ge MV_t-\eta_{1,t})\\
&=P(\delta_{1,i,t}\ge MV_t-\eta_{1,t}-\theta_i)\\
&=P(\delta_{1,i,t}\ge\gamma_{1,t}-\theta_i)\\
&=1-G(\gamma_{1,t}-\theta_i)\\
&=1-\Phi(\gamma_{1,t}-\theta_i).
\end{aligned}
\tag{C.12}
\]

At the roll call stage, they vote yes if:

\[
\delta_{1,i,t}+\delta_{2,i,t}+\eta_{1,t}+\eta_{2,t}+\theta_i\ge MV_t-y^{max}_R, \tag{C.13}
\]

where $y^{max}_R$ is allowed to be different from the "D" one in the main section.

Hence, the probability of a yes at the roll call stage is given by:

\[
\begin{aligned}
P(Y^{rc}_{i,t}=1)&=P(\delta_{1,i,t}+\delta_{2,i,t}+\theta_i\ge MV_t-\eta_{1,t}-\eta_{2,t}-y^{max}_R)\\
&=P(\delta_{1,i,t}+\delta_{2,i,t}\ge MV_t-\eta_{1,t}-\eta_{2,t}-\theta_i-y^{max}_R)\\
&=P(\delta_{1,i,t}+\delta_{2,i,t}\ge\gamma_{2,t}-\theta_i-y^{max}_R)\\
&=1-G_{1+2}(\gamma_{2,t}-\theta_i-y^{max}_R)\\
&=1-\Phi\left(\frac{\gamma_{2,t}-\theta_i-y^{max}_R}{\sqrt{2}}\right),
\end{aligned}
\tag{C.14}
\]

where the last line uses the parametric assumptions on $G_{1+2}$.

For the likelihood of this model for this party, we just substitute the above equations into (4.17).

C.2.1 Agenda Setting

We focus on illustrating the Republicans' problem in agenda setting in the case of no whip count, with status quos between the party medians.
In the case of no whip count, we can rewrite party R's expected utility as

\[
EU_R^{\text{no count}}(q_t,x_t)=\left(1-\Phi\left(\frac{MV_t-\widehat{MV}_{L,R}}{\sigma}\right)\right)\bigl(u(x_t,\theta_{m,R})-u(q_t,\theta_{m,R})\bigr)+u(q_t,\theta_{m,R})-C_b
\]

The derivative with respect to $x_t$ is given by

\[
\left(1-\Phi\left(\frac{MV_t-\widehat{MV}_{L,R}}{\sigma}\right)\right)u_x(x_t,\theta_{m,R})-\frac{1}{2\sigma}\phi\left(\frac{MV_t-\widehat{MV}_{L,R}}{\sigma}\right)\bigl(u(x_t,\theta_{m,R})-u(q_t,\theta_{m,R})\bigr)
\]

where $\phi(\cdot)$ denotes the pdf of the standard normal distribution. At $x_t=q_t$, the derivative is strictly positive given $q_t<\theta_{m,R}$ and the fact that $\widehat{MV}$ is finite. At $x_t=\theta_{m,R}$, it is strictly negative given $u(q_t,\theta_{m,R})<0$. Together these facts ensure an interior solution which is unique, following similar arguments to those for Party "D". Any interior solution satisfies the first-order condition,

\[
\frac{1-\Phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,L}}{\sigma}\right)}{\phi\left(\frac{MV_t^{\text{no count}}-\widehat{MV}_{R,L}}{\sigma}\right)}=\frac{1}{2\sigma}\,\frac{u(x_t^{\text{no count}},\theta_{m,R})-u(q_t,\theta_{m,R})}{u_x(x_t^{\text{no count}},\theta_{m,R})}. \tag{C.15}
\]

Appendix D

Appendix to Chapter 5

D.1 Additional Tables and Figures

[Figure D.1: four panels, (a) 2006, (b) 2009, (c) 2013, (d) 2015.]

Notes: The figures present the constructed distribution of votes in the sample, by main political party and by timing of decision. On the y-axis, I show the fraction of voters (from the total amount of votes) who decided for that party at that sub-period. The final vote share for a party is the sum of the bars across all periods. One can see that the late deciders, deciding in the last 2-3 days and in the last day, were fundamental to deciding elections in 2006, 2009 and 2015.

Table D.1: Summary statistics

Quantitative Variables
Variable                             Obs     Mean     Std. Dev.
Gender (Female = 1)                  1919    0.52     0.50
Age                                  1898    44.63    17.62
Education                            1897    14.00    3.24
Ideology                             1832    5.57     2.75
Rooms per Member of the Household    1853    1.368    .795
Politically Knowledgeable            1878    0.24     0.48

Qualitative Variables
Variable                             Frequency    Percentage of Total
Watches News on TV:
  Almost Never                       256          13.42
  Not so Often                       137          7.18
  Once A Week or So                  66           3.46
  2-3 times a week                   260          13.63
  At least once a day                789          41.35
  More than once a day               400          20.96
  Total                              1908         100
Language of the Interview:
  Hebrew                             1325         69.05
  Arabic                             289          15.06
  Russian                            305          15.89
  Total                              1919         100
Religious Observance:
  "Not at all"                       410          21.80
  "A little bit"                     890          47.32
  "A lot"                            372          19.78
  "I observe all of it"              209          11.11
  Total                              1919         100

Notes: The table presents summary statistics for the main variables that will be used. Education is measured in years of schooling. Ideology is measured as the self-reported value on the scale of Left (0) to Right (10). Rooms in the dwelling per household member will be used as a proxy for income. In the bottom half of the table, we see descriptive evidence on three variables: Language of the Interview (used as a proxy for ethnicity), exposure to the media (captured by how often one watches news on TV) and Religious Observance. Knowledgeable is defined by correctly knowing the minimum threshold for a party to join the Knesset (2% in 2006), and who the speaker of the Knesset was (R. Rivlin, in 2006). We can see that there is quite a lot of dispersion, as many individuals know both, and many know neither.

Table D.2: Distribution of Votes and Seats in the Knesset, 2006 Elections

Party/List                           Number of Votes    Number of Seats    % of Votes    % Votes in Sample
Kadima                               690,901            29                 22            21.59
Labor                                472,366            19                 15.1          14.52
Shas                                 299,054            12                 9.5           4.92
Likud                                281,996            12                 9             8.41
Israel Beitenu                       281,880            11                 9             12.3
Ichud Leumi/Mafdal                   224,083            9                  7.1           6.51
Torah and Shabbat Judaism (UTJ)      147,091            6                  4.7           3.49
Meretz                               118,302            5                  3.8           4.37

Number of Eligible Voters (Total)    5,014,622
Valid votes (Total)                  3,137,064
Qualifying threshold (2%)            62,742
Votes per seat                       24,619

Notes: The table presents the results of the 2006 Election to the Israeli Parliament (Knesset). Turnout was 63.55%. The first four columns are extracted from the official election results, available online. [fn. 1] The final column is extracted from our data, from the Panel Surveys conducted by the Israel National Election Studies, Tel Aviv University. They are the empirical shares, as calculated from the survey (post-elections) answer to "list voted in the last election". They are shown to be close to the true outcome. For further information on the data, see the Data section. I present in this table only the parties and lists that I use in the empirical section.
Small parties that did not cross the threshold are omitted, as are the lists United Arab List (4 seats), Hadash (3 seats), National Democratic Assembly (3 seats) and the Pensioners Party (7 seats), because I do not have measures for their policy vectors.

Table D.3: How Voters Are Deciding

(What will have the) Greatest Effect on your Voting (Pre-Elections Answers)
Issue                                                                  Frequency    Percent
Political process with the Palestinians                                81           13
Defence situation                                                      185          29
Socio-economic situation                                               197          31
Relationships between religious and nonreligious                       25           4
Corruption and the rule of law                                         41           6
The unity of the people                                                58           9
Other                                                                  11           2
Do not know                                                            22           3
More than one answer                                                   18           3
Total                                                                  638          100

Main Opinions on what the Elections were About (Post-Elections Answers)
Issue                                                                  Frequency    Percent
Political topic (disengagement, negotiations with the Palestinians)    257          18
The security topic                                                     280          20
The economic-social topic                                              687          49
Corruption                                                             126          9
Other                                                                  8            1
All the topics                                                         26           2
None                                                                   20           1
Total                                                                  1,404        100

Notes: The table presents the distribution of issues that voters considered important for them (in the first survey), and what they thought was significant in the elections (post-election). We can see that these mostly revolve around policy issues, such as the economy and security. Most voters consider that security and economic issues are the most important ones for their voting decision and expect that to be the case for others.
This motivates our theoretical approach using a state of nature which citizens wish to learn about.

Table D.4: Israeli Civilian Fatalities in Terrorist Attacks, 2003-2005

City                       Number of Israeli Civilian Fatalities
Afula                      3
Ashdod                     10
Baqah al-Gharbiyah         1
Beersheva                  16
Erez (Industrial Zone)     1
Gadish                     1
Hadera                     7
Haifa                      36
Jerusalem                  59
Kfar Sava                  1
Kfar Ya'bez                1
Kibbutz Beit Govrin        1
Kibbutz Eyal               1
Lahav                      2
Moshav Nehusha             1
Netanya                    9
Netiv Ha'asara             1
Petah Tikva                1
Rosh Ha'ayin               1
Sde Trumot                 1
Sderot                     5
Tel Aviv-Yafo              23

Notes: The table shows the summary statistics for the data used from the NGO B'Tselem, about the number of fatalities of Israeli civilians between January 2003 and December 2005 (just before the campaign begins). This data is then merged with the survey data used in the rest of the model by the voters' cities of residence.

Table D.5: Who stops earlier?

                                  Full Sample                                 Voters Deciding During the Campaign
                                  (1)            (2)           (3)            (4)            (5)           (6)
                                  Ordered Logit  Last Week     Last Day       Ordered Logit  Last Week     Last Day
(Ideology-5)                      0.0304         0.00763       0.00444        0.0179         0.00819       0.00542
                                  (0.0231)       (0.00562)     (0.00461)      (0.0298)       (0.00716)     (0.00769)
(Ideology-5)^2                    -0.0162**      -0.00313**    -0.00259**     -0.0125        -0.00198      -0.00308
                                  (0.00640)      (0.00154)     (0.00130)      (0.00861)      (0.00204)     (0.00215)
Education                         -0.0217        -0.00812*     -0.00720**     -0.0590**      -0.0152**     -0.0141**
                                  (0.0181)       (0.00462)     (0.00358)      (0.0230)       (0.00607)     (0.00592)
Age                               -0.0170***     -0.00408***   -0.00406***    -0.0213***     -0.00346***   -0.00496***
                                  (0.00377)      (0.000947)    (0.000803)     (0.00499)      (0.00114)     (0.00121)
Gender (Female)                   -0.119         -0.00449      -0.0143        0.265*         0.0744**      0.0153
                                  (0.113)        (0.0286)      (0.0240)       (0.149)        (0.0361)      (0.0380)
Religiosity: "A little bit"       -0.0493        -0.00495      -0.00666       0.0495         0.00324       -0.00629
                                  (0.134)        (0.0358)      (0.0294)       (0.190)        (0.0465)      (0.0461)
Religiosity: "A lot"              -0.241         -0.0488       -0.00522       0.327          0.0322        0.0567
                                  (0.181)        (0.0463)      (0.0392)       (0.245)        (0.0583)      (0.0608)
Religiosity: "All of it"          -1.123***      -0.204***     -0.119**       0.0879         0.0175        0.00909
                                  (0.301)        (0.0649)      (0.0556)       (0.398)        (0.0846)      (0.0968)
Language (Arabic)                 -0.928***      -0.160***     -0.124***      0.0892         0.0565        -0.0204
                                  (0.221)        (0.0496)      (0.0406)       (0.262)        (0.0614)      (0.0735)
Language (Russian)                -1.073***      -0.208***     -0.191***      -0.557***      -0.0739       -0.228***
                                  (0.171)        (0.0423)      (0.0281)       (0.205)        (0.0656)      (0.0524)
Rooms Per Household Member        0.0268         0.0209        -0.00279       0.0231         0.0241        -0.0149
                                  (0.0734)       (0.0183)      (0.0159)       (0.0987)       (0.0222)      (0.0257)
Knowledgeable                     -0.352***      -0.120***     -0.0583**      -0.392**       -0.127***     -0.0645
                                  (0.123)        (0.0319)      (0.0263)       (0.178)        (0.0456)      (0.0436)
N                                 1206           1206          1206           677            677           677
R2                                               0.080         0.077                         0.077         0.088

Robust standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Notes: The table presents Ordered Logit (Columns (1), (4)) and OLS results (remaining columns) on the characteristics of voters who decide at different points in time. For Columns (1) and (4), I present ordered logit regressions, where the outcome is "when the individual decided on whom to vote", as shown in Figure D.1a. The higher values refer to deciding later (i.e. last day is the highest group, with 2-3 days before being the second highest; and "knew all along" is the lowest ordered category). The other columns use the dependent variable Last Week, defined as 1 if the voter decided in the last week (and 0 if not), and Last Day (1 if the voter decided in the last day). Columns (4)-(6) condition on those who have acquired some information according to the model (i.e. did not answer "knew all along"), according to our extension. The variable Rooms per Household Member refers to rooms in the dwelling divided by members of the household, which is a proxy for income since income is not stated. The omitted categories for the categorical variables are: "not at all" (for Religious Observance), Hebrew (for Language) and Male (for Gender). The regressions also control for Media Exposure, as defined by how often they watch news on TV. These are not significant and are not shown for parsimony.

D.2 Proofs
Proof of Lemmas 5.2.1, 5.2.3: The expressions in both lemmas follow from [59], Section 11.

The comparative statics for 5.2.1 are immediate (with the last one being strict for $m_i>0$):

\[
\frac{\partial Var(x\mid H_{i,t})}{\partial m_i}=-\frac{\sigma^2}{(m_i+\tau\sigma^2)^2}<0 \tag{D.1}
\]
\[
\frac{\partial Var(x\mid H_{i,t})}{\partial\tau}=-\frac{(\sigma^2)^2}{(m_i+\tau\sigma^2)^2}<0 \tag{D.2}
\]
\[
\frac{\partial Var(x\mid H_{i,t})}{\partial\sigma^2}=\frac{m_i+\tau\sigma^2-\tau\sigma^2}{(m_i+\tau\sigma^2)^2}>0 \tag{D.3}
\]

The comparative statics for 5.2.3 have unclear sign, because they are all multiples of $\sum_{t=1}^{m_i}e_{i,t}$, which has an unclear sign.

Proof of Lemma 5.2.2:

I use that $E(x\mid H_{i,t})$ cannot be a function of $t$. Indeed, if that were the case, then the voter would know which direction the expectation would move at $t+1$, even at period $t$. However, since $e_{i,t+1}$ is unknown at $t$ and is i.i.d., from the expression in Lemma 5.2.3, that cannot be the case.

By the timing of the model, the first term is the state at $t$ (since voting will only occur at $T$), so only the term

\[
-Var_i[x\mid H_{i,t}]-c_iy_{i,t}
\]

matters in the decision of accumulation.

From [59], Section 12, Theorem 1 (on p. 285), we have that since the risk (of a sequential decision procedure with a Normal distribution process) does not depend on the values of the observations (as the variance does not, from 5.2.1), the optimal sequential decision procedure is given by a procedure in which a fixed number of decisions will be taken.

Hence, she will choose to acquire a signal at $t$ if and only if:

\[
0<-Var_i[x\mid H_{i,t}\cup\{e_{i,t+1}\}]-c_i-\bigl(-Var_i[x\mid H_{i,t}]\bigr) \tag{D.4}
\]
\[
c_i<Var_i[x\mid H_{i,t}]-Var_i[x\mid H_{i,t}\cup\{e_{i,t+1}\}] \tag{D.5}
\]

In the case of a normal distribution, with a quadratic loss function, this is given by the equation stated in our Lemma (see p. 260, replacing $r$ by $1/\sigma^2$).

Proof of Corollary: Note that, from Lemma 5.2.1, the value of the variance of the beliefs in $x$ is strictly decreasing in the number of signals. It follows that, if a voter stops acquiring information at some $t^*$, it will not be worth it again to acquire at some $t>t^*$.
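As a numerical illustration of condition (D.5), the following is a minimal sketch rather than the estimation code of the thesis. It assumes the posterior variance of Lemma 5.2.1 takes the form $\sigma^2/(m+\tau\sigma^2)$, which is the form consistent with the derivatives (D.1)-(D.3); the function names and parameter values are hypothetical.

```python
def posterior_variance(m, tau, sigma2):
    """Assumed form of Var(x | H) after m signals: sigma^2 / (m + tau * sigma^2).

    tau is the prior precision (prior variance 1/tau) and sigma2 is the
    variance of each signal. This form is inferred from (D.1)-(D.3).
    """
    return sigma2 / (m + tau * sigma2)


def marginal_gain(m, tau, sigma2):
    """Reduction in posterior variance from acquiring signal number m + 1."""
    return posterior_variance(m, tau, sigma2) - posterior_variance(m + 1, tau, sigma2)


def optimal_signals(cost, tau, sigma2, T):
    """Number of signals acquired under (D.5): keep acquiring while the
    cost is below the marginal variance reduction, up to T periods."""
    m = 0
    while m < T and cost < marginal_gain(m, tau, sigma2):
        m += 1
    return m


if __name__ == "__main__":
    # Hypothetical values: prior precision 1, signal variance 1, 10 periods.
    for c in (0.6, 0.1, 0.001):
        print(c, optimal_signals(c, tau=1.0, sigma2=1.0, T=10))
```

Because the marginal gain falls with every additional signal, the loop stops at the first period in which the cost exceeds the gain and never restarts, which is the monotonicity used in the proof.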
We drop the index $i$ in $\tau_i$ for simplicity.

From (D.5), a voter will acquire a signal at $0<t<T-1$, but not at $t+1$, if and only if:

\[
c_i<Var_i[x\mid H_{i,t}]-Var_i[x\mid H_{i,t}\cup\{e_{i,t+1}\}]
\]

and

\[
c_i>Var_i[x\mid H_{i,t+1}]-Var_i[x\mid H_{i,t+1}\cup\{e_{i,t+2}\}]
\]

Replacing the values from Lemma 5.2.1:

\[
c_i<\frac{\sigma^2}{t-1+\tau\sigma^2}-\frac{\sigma^2}{t+\tau\sigma^2}
\]

and

\[
c_i>\frac{\sigma^2}{t+\tau\sigma^2}-\frac{\sigma^2}{t+1+\tau\sigma^2},
\]

and hence, the stopping time would be $m=t$.

She will stop at $t=0$ if it is not worth acquiring a signal at $t=0$. That means the cost must be higher than the gain at 0 (the difference between the prior variance and the variance of beliefs with one signal), which leads to:

\[
c_i>\frac{1}{\tau}-\frac{\sigma^2}{1+\tau\sigma^2}.
\]

Similarly, the voter will acquire signals at every period (and hence, stop at $T$) if it is worth it to acquire signals at $T-1$. Since the gains of a signal are strictly decreasing over time, this would mean that signals are acquired at every period. For the citizen to acquire at $T-1$, it must be that, from (D.5):

\[
c_i<\frac{\sigma^2}{T-1+\tau\sigma^2}-\frac{\sigma^2}{T+\tau\sigma^2}.
\]

Proof of Lemma 5.2.4: From Lemma 5.2.2, we see that if $c_i$ increases, then the voter cannot choose more information, since the right-hand side is strictly decreasing in the number of signals. For the second part, I will focus on the case in which $b_i>0$. Let $q$ be such that $a_q=\max\{a_1,\dots,a_J\}$. Let $b_i$ be such that the agent chooses "In". Then as $b_i\to\infty$, it is clear that $-2b_i(a_q-a_{\tilde{p}})\to-\infty$. This, in turn, implies that the product over $k$ goes to 0, as $\Phi(-\infty)=0$, and hence the whole term converges to $-\kappa<0$, so the agent will always be out.

Note that, conditional on being "In", the acquisition problem does not depend upon $b_i$, due to (5.2.2).

Lemma D.2.1. We now find the probabilities of stopping at a period and voting for a given candidate (given an information history).
These are used in our Maximum Likelihood estimation and are based on the parametrization (5.10), together with the distribution of the information signals.

\[
P(t_i=t\mid In,z_i,x;\theta)=
\begin{cases}
1-\Phi\left(\frac{1}{\sigma_\eta}\left(\ln\left(\frac{1}{\tau_i}-\frac{\sigma^2}{1+\tau_i\sigma^2}\right)-z_i'\beta\right)\right) & \text{if } t=0\\[2ex]
\Phi\left(\frac{1}{\sigma_\eta}\left(\ln\left(\frac{\sigma^2}{t-1+\tau_i\sigma^2}-\frac{\sigma^2}{t+\tau_i\sigma^2}\right)-z_i'\beta\right)\right)-\Phi\left(\frac{1}{\sigma_\eta}\left(\ln\left(\frac{\sigma^2}{t+\tau_i\sigma^2}-\frac{\sigma^2}{t+1+\tau_i\sigma^2}\right)-z_i'\beta\right)\right) & \text{if } 0<t<T\\[2ex]
\Phi\left(\frac{1}{\sigma_\eta}\left(\ln\left(\frac{\sigma^2}{T-1+\tau_i\sigma^2}-\frac{\sigma^2}{T+\tau_i\sigma^2}\right)-z_i'\beta\right)\right) & \text{if } t=T,
\end{cases}
\tag{D.6}
\]

where $\Phi(\cdot)$ is the CDF of $\mathcal{N}(0,1)$, and:

\[
P(v_i=j\mid t_i=t,z_i,x;\theta)=
\begin{cases}
1, & \text{if } t=0,\ j=\arg\max_{k\in\{1,\dots,J\}}-(a_k-b_i-\mu_i)^2\\[1ex]
0, & \text{if } t=0,\ j\neq\arg\max_{k\in\{1,\dots,J\}}-(a_k-b_i-\mu_i)^2\\[1ex]
\prod_{k\neq j}\Phi\left(\left(\frac{1}{2}\bigl(a_k^2-a_j^2-2b_i(a_k-a_j)\bigr)-\frac{(tx+\tau_i\mu_i\sigma^2)(a_k-a_j)}{t+\tau_i\sigma^2}\right)\frac{t+\tau_i\sigma^2}{|a_k-a_j|\sqrt{t}\,\sigma}\right), & \text{if } t>0,\ j\neq J\\[1ex]
1-\sum_{j=1}^{J-1}\prod_{k\neq j}\Phi\left(\left(\frac{1}{2}\bigl(a_k^2-a_j^2-2b_i(a_k-a_j)\bigr)-\frac{(tx+\tau_i\mu_i\sigma^2)(a_k-a_j)}{t+\tau_i\sigma^2}\right)\frac{t+\tau_i\sigma^2}{|a_k-a_j|\sqrt{t}\,\sigma}\right), & \text{otherwise}
\end{cases}
\tag{D.7}
\]

In the second part of the Lemma above, we construct the solutions for the probability of choosing a party. In the simplest case, when no information is acquired and no randomness from signals arrives, the choice is deterministic given the ideology. For the other cases, it must be the case that the utility from one party is higher than from all the other ones. For that, we use an approximation. That is warranted for the maximum likelihood since (i) the model is identified without this, and (ii) the dependency between choices is a nuisance parameter compared to our main analysis.

Proof of Lemma D.2.1: An immediate substitution of (5.10) yields the expression above for $P(t_i^*=t\mid z_i,x;\theta)$. This determines the second term of the likelihood. Note that this is similar to a hazard model, with the continuation being that one has not stopped accumulating information before $t$. Since the variance term is monotonically decreasing in $t$, from (5.2.1), I am just comparing whether it is worthwhile to accumulate now and to stop in the following period.

For the first term of the likelihood, we use a quasi-likelihood approach.
We assume that there is independence across parties to vote for, $j$, such that the probability of a voter choosing $j$ (conditional on $t_i^*>0$) is given by:

\[
\begin{aligned}
P(v_i^*=j\mid t_i^*=t,t_i^*>0,x,z_i;\theta)
&=P\left(-(a_j-b_i-E_i[x\mid H_{i,t^*}])^2>\max_{k\neq j}-(a_k-b_i-E_i[x\mid H_{i,t^*}])^2\right)\\
&\approx\prod_{k\neq j}P\left(-(a_j-b_i-E_i[x\mid H_{i,t}])^2>-(a_k-b_i-E_i[x\mid H_{i,t}])^2\right)\\
&=\prod_{k\neq j}P\left(\frac{1}{2}\bigl(a_k^2-a_j^2-2b_i(a_k-a_j)\bigr)>(a_k-a_j)E_i[x\mid H_{i,t}]\right)
\end{aligned}
\]

where the second line uses that $\{a_k\}$ are fixed and the signals are i.i.d.

Using 5.2.3, we know that:

\[
E_i(x\mid H_{i,t^*})(a_k-a_j)=\frac{\left(\sum_{m=1}^{t^*}e_{i,m}+\tau_i\mu_i\sigma^2\right)(a_k-a_j)}{t^*+\tau_i\sigma^2}
\sim\mathcal{N}\left(\frac{(t^*x+\tau_i\mu_i\sigma^2)(a_k-a_j)}{t^*+\tau_i\sigma^2},\ \frac{(a_k-a_j)^2\,t^*\sigma^2}{(t^*+\tau_i\sigma^2)^2}\right)
\]

where I have used that $e_{i,m}\sim\mathcal{N}(x,\sigma^2)\ \forall m$, and it is i.i.d. over time. Since the probabilities must sum up to 1, I take $j=J$ as the final option, and the probability of voting for $J$ is given by one minus the others, when $t_i^*>0$.

It follows that:

\[
P(v_i^*=j\mid t_i^*=t,z_i,x;\theta)\approx
\begin{cases}
\prod_{k\neq j}\Phi\left(\left(\frac{1}{2}\bigl(a_k^2-a_j^2-2b_i(a_k-a_j)\bigr)-\frac{(tx+\tau_i\mu_i\sigma^2)(a_k-a_j)}{t+\tau_i\sigma^2}\right)\frac{t+\tau_i\sigma^2}{|a_k-a_j|\sqrt{t}\,\sigma}\right) & \text{if } t_i^*>0,\ j\neq J\\[1ex]
1 & \text{if } t_i^*=0 \text{ and } j=\arg\max_{k\in\{1,\dots,J\}}\\[1ex]
0 & \text{if } t_i^*=0 \text{ and } j\neq\arg\max_{k\in\{1,\dots,J\}}
\end{cases}
\tag{D.8}
\]

If $t_i^*=0$, then there is no uncertainty (as no signals are received), and hence the probability of choosing $j$ is given by 1 if $j$ is closest to the ideology, as she solves:

\[
\max_{a\in\{a_1,\dots,a_J\}}-(a-b_i-\mu_i)^2 \tag{D.9}
\]

Hence, (D.7) and (D.6) in (5.12) give us the (quasi) likelihood function.

D.3 Israeli Political System

The empirical section of this paper will focus on data from Israel, in particular data from 2006. I briefly introduce the Israeli political system, which is the environment for my empirical study. Israel is a multiparty parliamentary democracy, with one chamber (the Knesset). The Knesset has 120 seats that are distributed according to proportional representation, although subje