UBC Theses and Dissertations

Essays in applied econometrics Schwartz, Jacob 2018

Essays in Applied Econometrics

by

Jacob Schwartz

BSc, University of Victoria, 2009
MA, University of Victoria, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Economics)

The University of British Columbia
(Vancouver)

September 2018

© Jacob Schwartz, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Essays in Applied Econometrics, submitted by Jacob Schwartz in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics.

Examining Committee:
David Green, Economics, Co-Supervisor
Kevin Song, Economics, Co-Supervisor
Vadim Marmer, Economics, Supervisory Committee Member
Florian Hoffmann, Economics, Supervisory Committee Member
Thomas Davidoff, Sauder School of Business, University Examiner
Kevin Milligan, Economics, University Examiner

Abstract

Chapter 1 develops an empirical two-sided matching model with endogenous pre-investment. The model can be used to measure the impact of frictions in labour markets using a single cross-section of matched employer-employee data. The observed matching of workers to firms is the outcome of a discrete, two-sided matching process where firms with heterogeneous preferences over education sequentially choose workers according to an index correlated with worker preferences over firms. The distribution of education arises in equilibrium from a Bayesian game: workers, knowing the distribution of worker and firm types, invest in education prior to the matching process. I propose an inference procedure combining discrete choice methods with simulation. Counterfactual analysis using Canadian data shows that changes in matching frictions can lead to economically significant equilibrium changes in both inequality and the probability of investing in higher education.
These effects are more pronounced when worker and firm attributes are complements in the match surplus function.

In many economic settings, agents behave similarly because they share information with one another. Information-sharing relations among agents can be modeled as a network, and the strategic interactions among them as a game on a network. Chapter 2, coauthored with Kyungchul Song and Nathan Canen, develops a tractable empirical model of social interactions where each agent - without seeing the full information network - shares information with their neighbors and best responds to the other players based on simple beliefs about their strategies. We provide conditions on the information networks and beliefs of agents such that their best responses exhibit economically intuitive features and desirable external validity relative to equilibrium models of social interaction. Moreover, the setup admits asymptotic inference without requiring that the researcher observe all the players in the game, nor that they know precisely the sampling process.

Chapter 3 discusses how discrete distributions of unobserved heterogeneity can be identified using information on sample attrition. Although attrition is often seen as a source of selection problems, we argue that it can also be used to solve selection problems - even in the absence of covariates or panel data.

Lay Summary

The quality of information that people have can affect the decisions they make prior to entering a market. For example, education may be less valuable to workers in labour markets with poor information. Chapter 1 develops tools for measuring these frictions in markets where agents’ decisions affect one another’s hiring outcomes. I apply the methodology to study labour markets in Canada.

Chapter 2 develops an approach for studying how information-sharing agents affect one another in social networks. The methods exhibit good external validity and do not require strong assumptions about how the data were sampled.
We apply the methods to study public goods provision in Colombia.

Chapter 3 argues that information on whether or not agents are observed to leave a researcher’s data set can provide the researcher with useful information with which to learn about the agents’ unobserved attributes.

Preface

Chapter 1 of this thesis, “Schooling Choice, Labour Market Matching, and Wages”, is my original work. The empirical section of this chapter uses data from Statistics Canada’s Workplace Employee Survey (WES).

The second chapter, “Estimating Local Interactions Among Many Agents Who Observe Their Neighbors”, is an unpublished working paper that I co-authored with Nathan Canen and Kyungchul Song. The authors contributed equally to the project overall.

In Chapter 3, “Identification Using Attrition”, Section 3.2 up to Assumption 3.2.1 of Section 3.2.2 is drawn from a working paper I co-authored with Hugo Jales entitled “Type-Targeted Treatment Effects and Type Revelation”, of which I am the lead author and principal contributor. The remainder of Chapter 3 is my original work that is new for the thesis.

Any views expressed in this thesis are mine alone and do not reflect the views of Statistics Canada or the Government of Canada.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Introduction
1 Schooling Choice, Labour Market Matching, and Wages
  1.1 Introduction
    1.1.1 Background and Related Literature
  1.2 The Labour Market As a Two-Sided Matching Market
    1.2.1 Baseline model
    1.2.2 Frictional Matching Model with Worker Investments
    1.2.3 Some Implications of Frictional Matching Model
  1.3 Econometric Inference
    1.3.1 Two-Stage Inference Accommodating Cross-Sectional Dependence of Observed Matching
    1.3.2 First-Stage Estimation of θ
    1.3.3 Matching Probabilities
  1.4 Analysis of a Labour Matching Market in Canada
    1.4.1 Background
    1.4.2 Data
    1.4.3 Model Estimates and Counterfactuals
  1.5 Conclusion
2 Estimating Local Interactions Among Many Agents Who Observe Their Neighbors
  2.1 Introduction
  2.2 Strategic Interactions with Information Sharing
    2.2.1 A Model of Interactions with Information Sharing
    2.2.2 Predictions from Rationality
    2.2.3 Belief Projection and Best Linear Responses
    2.2.4 The External Validity of Network Externality
  2.3 Econometric Inference
    2.3.1 General Overview
    2.3.2 Asymptotic Theory
  2.4 A Monte Carlo Simulation Study
  2.5 Empirical Application: State Presence across Municipalities
    2.5.1 Motivation and Background
    2.5.2 Empirical Set-up
    2.5.3 Model Specification
    2.5.4 Results
  2.6 Conclusion
3 Identification Using Attrition
  3.1 Introduction
  3.2 Identification of the Type-Targeted Treatment Effect
    3.2.1 General Setup
    3.2.2 Identification Under One Period of Attrition
    3.2.3 Identification Using Multiple Periods of Attrition
  3.3 Estimation and Inference
    3.3.1 Small Simulation Exercise
  3.4 Using Attrition to Correct for Selection on Unobservables
  3.5 Conclusion
Conclusions
Bibliography
A Supporting Materials
  A.1 Appendix to Chapter 1
    A.1.1 Equilibrium of Investment and Matching Game
    A.1.2 Additional Mathematical Results
    A.1.3 A Monte Carlo Simulation Study
    A.1.4 Empirical Section
    A.1.5 Estimation
    A.1.6 Estimation Results
    A.1.7 Model Counterfactuals
  A.2 Appendix to Chapter 2

List of Tables

Table 2.1 The Characteristics of the Payoff Graphs
Table 2.2 The Degree Characteristics of the Graphs Used in the Simulation Study
Table 2.3 The Empirical Coverage Probability of the Confidence Intervals for β_0 at 95% Nominal Level
Table 2.4 The Average Length of Confidence Intervals for β_0 at 95% Nominal Level
Table 2.5 The Empirical Coverage Probability of Confidence Intervals for a′ρ_0 at 95% Nominal Level
Table 2.6 The Average Lengths of Confidence Intervals for a′ρ_0 at 95% Nominal Level
Table 2.7 State Presence and Networks Effects across Colombian Municipalities
Table 3.1 The Empirical Coverage Probability and Average Length of Confidence Intervals for Attrition Process Parameters at 95% Nominal Level
Table A.1 The Empirical Coverage Probability of Asymptotic Confidence Intervals for a′θ_0 at 95% Nominal Level When β_0 is Known
Table A.2 Average Length of Confidence Intervals for a′θ_0 at 95% Nominal Level When β_0 is Known
Table A.3 Matching Technology In Canadian Manufacturing and Finance Industries, 1995-2005
Table A.4 Estimation of Worker and Firm Preferences in Canadian Manufacturing Industry, 1999-2005
Table A.5 Estimation of Worker and Firm Preferences in Canadian Finance Industry, 1999-2005
Table A.6 Counterfactual Estimated Probabilities of Investing in High Education, Manufacturing Industry
Table A.7 Counterfactual Estimated Probabilities of Investing in High Education, Finance Industry
Table A.8 Counterfactual Estimated Probabilities of Investing in High Education, Manufacturing Industry
Table A.9 Counterfactual Estimated Probabilities of Investing in High Education, Finance Industry
Table A.10 Matching Technology Counterfactuals, Simulated Gini Coefficient, Manufacturing Industry
Table A.11 Matching Technology Counterfactuals, Gini Coefficient, Finance Industry

List of Figures

Figure 1.1 Education and Wage Inequality Under Specification 1
Figure 1.2 Education and Wage Inequality Under Specification 2
Figure 1.3 Supply of Highly Educated Workers and Education Wage Premia
Figure 2.1 Network Externality Comparison Between Equilibrium and Behavioral Models: Erdős-Rényi Graphs
Figure 2.2 Network Externality Comparison Between Equilibrium and Behavioral Models: Barabási-Albert Graphs
Figure 2.3 Degree Distribution of G_P
Figure 2.4 Average Network Externality from being a Department Capital
Figure A.1 Wage Inequality in Canada’s Workplace-Employee Survey: 99-50 Difference in Quantile of Log Hourly Wages
Figure A.2 Income Inequality in Canada: Gini coefficients, 1990-2015

Acknowledgments

This thesis could not have been written without the expert advice, patience, and encouragement of each of my main committee members: Kevin Song, David Green, and Vadim Marmer. My work also benefited greatly from conversations with Florian Hoffmann. I could not have asked for a better committee.
Special thanks are also due to Hiro Kasahara, Paul Schrimpf, Michael Peters, and Joris Pinkse.

I would also like to thank my family, friends, and colleagues: in particular Anujit Chakraborty, Hugo Jales, Jon Graves, Pierluca Pannella, Oscar Becerra, Ruoying Wang, Tom Cornwall, Nathan Canen, Denis Kojevnikov, Tom Thorn, and Alison Watt. Lastly, I thank David Giles and Judy Clarke for their support during my studies in Victoria, and for introducing me to the field of econometrics.

Introduction

This thesis is mainly concerned with developing methodology for inference in models with many simultaneously interacting agents. The first two chapters develop tools to help researchers better exploit the rich information contained in matched employer-employee data and network data: “Schooling Choice, Labour Market Matching, and Wages” and “Estimating Local Interactions Among Many Agents Who Observe Their Neighbors”. The third chapter, “Identification Using Attrition”, provides a general case for the usefulness of data on sample attrition for learning about forms of unobserved heterogeneity.

Understanding how and why workers and firms match with one another is important for understanding the sources of wage inequality. Studying matching in labour markets presents a particular challenge in settings where the decisions of individual job seekers affect one another’s hiring outcomes, thereby creating strong cross-sectional dependence. The first chapter of this thesis develops a methodology to address this challenge. The method can be used for studying the role that matching frictions play in shaping education and wage patterns using cross-sections of matched employer-employee data.
In this paper, the observed matching is the outcome of a two-sided matching process where firms with heterogeneous preference rankings over skill sequentially choose workers according to an index correlated with worker preference rankings over firms - the index being a simple and flexible way of modeling matching frictions. The distribution of worker skill in my model arises in equilibrium from a Bayesian pre-match investment game: workers, knowing the distribution of worker and firm types, invest in education prior to the matching process. In simulations of my model, I demonstrate how improvements in the matching technology can lead to an increase in college wage premia without leading to a significant increase in the supply of highly educated workers. A dynamic extension to the static structural framework that I develop in this chapter may thus offer a novel explanation for a longstanding puzzle in the empirical literature. I estimate my model using data from Canada’s Workplace Employee Survey for the years 1999-2005. My study of counterfactual distributions of education and wages reveals that changes in the matching technology can lead to economically significant changes in both wage inequality and the probability of investing in higher education, and that these effects are more pronounced when worker and firm attributes are complements in the match surplus function.

The increased availability of large network datasets has provided economists with detailed information to study social interactions. In practice, inference using such network data can present a serious challenge to the researcher, since the sampling scheme used to collect such data is often both non-random and not known to the researcher with precision.
In work with Nathan Canen and Kyungchul Song, “Estimating Local Interactions Among Many Agents Who Observe Their Neighbors”, we develop a linear model of interactions convenient for inference when the econometrician sees many (possibly non-random) samples of local interactions between agents. The inference procedure we propose exhibits asymptotic validity regardless of the sampling scheme used, provided it satisfies certain weak conditions. A key feature of our model is that correlation between the actions of neighbors can emerge partly due to information sharing between them when such information is unobserved by the econometrician. In our setup, we show how the presence of such information sharing can be detected using a test based on the cross-sectional correlation of the residuals.

Sample attrition is typically considered a nuisance that prevents inference on the underlying population of interest. Chapter 3 takes an alternative view, arguing that when the attrition decision depends on unobserved heterogeneity, attrition patterns over time can be used to identify the distribution of unobserved heterogeneity - even in the absence of covariates or panel data on the outcome of interest.

Chapter 1

Schooling Choice, Labour Market Matching, and Wages

1.1 Introduction

Since the 1980s, economists have attributed rising wage inequality to a number of sources. One possible source of such inequality is positive assortative matching between workers and firms - the tendency for the quality of workers and firms who match with one another to be positively correlated.¹ Understanding how and why certain workers and firms match with one another is thus a key to understanding this assortativity channel and, ultimately, a source of wage inequality. Unfortunately, studying matching in labour markets presents a serious challenge when the decisions of individual job seekers affect each other’s hiring outcomes. This chapter develops a methodology to address this challenge.
In particular, I show how cross-sections of matched employer-employee data can be used to study the role that a labour market matching technology plays in shaping the equilibrium distributions of education and wages. A key result is that - even in the absence of complementarities between worker and firm types in the match production function - the model can capture assortative matching between workers and firms.²

¹ Recent empirical papers examining the role of sorting on wage inequality include Card et al. [35], Barth et al. [13], and Kantenga and Law [75].

A general overview of the labour market in my model is as follows. Agents from one side of the market sequentially choose agents from the other side according to their preferences. Preference rankings of the choosers depend on a preference parameter, along with the capital of both types of agents. The order in which the choosers pick depends on the chooser’s capital and a matching technology parameter. Before matching, the agents who will be chosen are allowed to simultaneously decide their capital given the distribution of the choosers’ capital and the underlying parameters of the economy (including the frictions).

The structural approach I develop allows me to examine counterfactual distributions of education and wages under different matching technologies and preferences. Although the model of the labour market I develop in this chapter is fundamentally a static one, an extension to my approach may help explain a longstanding empirical puzzle - namely, how college wage premia can increase without an associated increase in the supply of highly educated workers. I show that this pattern is captured by simulating a special case of my model in which matching frictions become less severe over time.

This chapter contributes to the econometric literature concerned with inference in two-sided matching models. I propose a two-stage approach for inference on the agents’ preferences and the matching technology.
In the first stage, I fix the matching technology and construct confidence regions for the preference parameter by estimating the Bayesian game associated with the workers’ pre-match investment in education decision. I show that this problem can be cast in a discrete choice framework that is tractably estimable using maximum likelihood when the workers’ educational decision takes one of two values (college, or no college). In the second stage, I construct confidence intervals for the matching technology using a simulation-based inference approach. In the first stage, the presence of the matching function in workers’ expected utility function makes estimating workers’ equilibrium expectations highly non-trivial. Nevertheless, under reasonable assumptions, I show that workers’ equilibrium expectations can be written in a closed form suitable for estimation. The second-stage inference on the matching technology uses the following insight: once the matching process is specified, the finite sample distribution of the observed matching is known up to a parameter.³ I construct a test statistic that measures the distance between the observed joint distribution of worker education and matched firm capital and its simulated counterparts. A confidence interval for the matching technology can then be constructed by inverting the test.

² The value of a match between any type of worker, h, and any type of firm, k, can be represented using a positive, increasing function, f(h,k). We say that the types are complements in f when the marginal product of an h type is higher when matched with a higher k type (and vice versa).

A unique feature of such a model is that worker and firm types need not be complements in the match production function for sorting between workers and firms to occur.
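The claim that sorting does not require complementarities can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and is not the specification estimated in this chapter: worker types h and firm types k are drawn uniformly on [0, 1], every firm prefers higher-h workers, and firms pick in descending order of a hypothetical noisy index k + friction × noise, where `friction` is a stand-in for the matching technology parameter. The function names `simulate_matching` and `correlation` are likewise hypothetical.

```python
import random


def simulate_matching(n_pairs, friction, seed=0):
    """Sequentially match n_pairs firms to n_pairs workers.

    Illustrative assumptions (not the thesis's specification):
    - worker types h and firm types k are i.i.d. uniform on [0, 1];
    - firms pick in descending order of the index k + friction * noise;
    - every firm prefers higher-h workers, so it takes the best one left.
    """
    rng = random.Random(seed)
    # Workers sorted from best to worst: the i-th picker gets the i-th worker.
    worker_types = sorted((rng.random() for _ in range(n_pairs)), reverse=True)
    firm_types = [rng.random() for _ in range(n_pairs)]
    # Noisy picking index: friction = 0 means firms pick exactly in order of k.
    index = [k + friction * rng.gauss(0.0, 1.0) for k in firm_types]
    picking_order = sorted(range(n_pairs), key=lambda j: index[j], reverse=True)
    return [(worker_types[i], firm_types[j]) for i, j in enumerate(picking_order)]


def correlation(pairs):
    """Pearson correlation between matched worker and firm types."""
    n = len(pairs)
    mean_h = sum(h for h, _ in pairs) / n
    mean_k = sum(k for _, k in pairs) / n
    cov = sum((h - mean_h) * (k - mean_k) for h, k in pairs)
    var_h = sum((h - mean_h) ** 2 for h, _ in pairs)
    var_k = sum((k - mean_k) ** 2 for _, k in pairs)
    return cov / (var_h * var_k) ** 0.5


if __name__ == "__main__":
    for friction in (0.0, 0.5, 5.0):
        pairs = simulate_matching(n_pairs=500, friction=friction)
        print(f"friction={friction}: corr(h, k) = {correlation(pairs):.3f}")
```

No match production function appears anywhere in this sketch, so any positive correlation between matched types comes only from monotone preferences and the picking order. Under these assumptions, a larger `friction` should weaken that correlation.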
This chapter builds on the fundamental insights of Becker [15] and Gale and Shapley [56] to highlight the fact that, under certain circumstances, a meaningful notion of sorting can be captured in an empirical model with additive worker and firm effects. In his seminal 1973 paper on the marriage market, Becker argued the following: when the match production function is supermodular and utility is transferable between matched agents, high types can outbid low types for the best partners, leading to an equilibrium with positive assortative matching.⁴ The question of why positive assortative matching can occur in my model even when such complementarities are absent is answered in the same 1973 paper. Becker notes that sorting can arise in a non-transferable utility (NTU) framework when the payoffs of the agents on both sides of the market are monotonic in the other agent’s type. To explain why this is so, Becker invokes the logic of pairwise stability (Gale and Shapley, 1962).⁵ For example, consider an economy with four agents where a low-type firm is paired with a high-type worker and vice versa. Such a matching is unstable when high types are preferred, because both high types will agree to abandon their low-type partners for one another. In a special case of my model where preferences are indeed monotonic, sorting - and some inequality - may emerge. In this case, complementarities are not necessary for sorting but merely amplify the effects of sorting, since interactions between worker and firm types in the wage function lead to more wage dispersion than when such interactions are absent.

³ This idea of using a structural model to characterize the joint distribution of a discrete matching model that can then be used for inference on the model parameters builds from work-in-progress I am pursuing with Taehoon Kim, Kyungchul Song, and Yoon-Jae Whang. Although computationally intractable when the dimension of the parameter is large, this approach is attractive for inference on the matching technology parameter in the second stage of my approach.
⁴ When f is differentiable, (strict) supermodularity is equivalent to ∂²f(h,k)/∂h∂k > 0.
⁵ This insight comes to me by way of Chade et al. [37]’s excellent review of the search and matching literature.

I estimate my model using matched employer-employee data on the finance and manufacturing industries from Canada’s Workplace-Employee Survey (WES). I find that frictions in the matching technology rose in the middle of the sample period, a time corresponding with relatively stable wage inequality. My model counterfactuals imply that the matching technology frictions matter: for example, in the 2001 finance industry, eliminating frictions causes a roughly 8% increase in the equilibrium probability of high education in the complementarities case and only a 3% increase in the specification without.

The empirical findings highlight the importance of production complementarities to wage inequality. The model-predicted level of wage inequality is much higher (and more reasonable for Canada) in the case that complementarities are present - for example, the predicted Gini coefficient for the 2002 finance industry is 0.2542 for the case of complementarities, while it is only 0.1655 in the additive case. The effect of information frictions on wage inequality is complicated by the presence of two competing effects: when frictions are lowered, the number of workers investing in education rises (a supply effect), but assortative matching between workers and firms also increases (a sorting effect). The sorting effect tends to increase inequality while the supply effect tends to decrease it.
I argue that the role of the latter effect is relevant in the subgroups I study, where the equilibrium probability of investment in education is typically quite high. For example, among high-skilled workers in the manufacturing industry in 2005, the Gini is 0.2220 and the investment in education is 77%. This rises to 0.2507 (education investment 76%) when information frictions are highest and 0.2452 (education investment 85%) when frictions are lowest. Overall, however, the results suggest that changes in wage inequality over time are mostly driven by changes in exogenous worker and firm characteristics and preferences (including shifts in the underlying production technology) rather than changes in the matching technology.

The key features of the model are explored in Section 1.2.3. In addition to illustrating the model’s main implications concerning the relationship between complementarities, frictions, and sorting, the section also suggests that my model may be able to shed some light on other empirical puzzles. In particular, we see how the model predicts that a fall in information frictions can lead to a dramatic rise in the education wage premium through sorting while, at the same time, the same fall in information frictions leads to a much more modest increase in the supply of highly educated workers. Thus, changes in informational frictions may be a useful way to explain a puzzling empirical finding concerning the relationship between wage premia and educational attainment.⁶

1.1.1 Background and Related Literature

Since Abowd et al. [2] (AKM), the availability of matched employer-employee data has allowed researchers to study the role that unobserved worker and firm attributes play in driving wage variation over time.
In AKM, the correlation of worker and firm fixed effects from wage regressions is taken to capture a notion of sorting. Although popular for investigating the wage structure, a burgeoning literature has criticised the viability of AKM for detecting sorting on unobservables. In particular, the additive structure of AKM implies that wages are monotone in firm type - an implication that is difficult to reconcile with equilibrium models of sorting with and without frictions (Eeckhout and Kircher [52], Lopes de Melo [87]).⁷ For example, in Eeckhout and Kircher [52], a low-type worker can receive a lower wage at a high-type firm since the worker must implicitly compensate the high-type firm in equilibrium for forgoing the opportunity to fill a vacant job with a higher-type worker.⁸

⁶ See Card and Lemieux [33].
⁷ Gautier and Teulings [57] was an early empirical study that detected a concave relationship between wages and firm type.
⁸ There are other reasons wages may be non-monotonic in firm type. In Postel-Vinay and Robin [99], workers may be willing to accept lower wages at higher-type firms when they expect to receive higher wages in the future.

Search and matching models have emerged as the leading alternative to the AKM framework for studying sorting in labour markets.⁹ In this literature, the standard matching technology is one that converts aggregates of vacancies and unemployed workers into matches. Although treating matching at the aggregate level simplifies the analysis considerably, any strategic interdependence that may be present in the matching process is assumed away (Chade et al. [37]). In many labour markets, it is unrealistic to suppose that the decisions of individual workers do not affect the outcomes of other workers.¹⁰ One contribution of this chapter is to develop and estimate a model that takes such strategic interdependence in the matching process seriously. In the equilibrium of my model, the probability that a worker matches to a given firm typically depends on the decisions of all the other agents in the economy.

⁹ Hagedorn et al. [65], Bagger and Lentz [11], Lise et al. [85], and Lopes de Melo [87] all find evidence of positive sorting when an AKM approach finds negligible sorting.

Another key facet of search models is that workers direct their job search based on the wages that employers set for them. However, many recent studies of online job markets have found that it is relatively uncommon for positions to explicitly post wages.¹¹ ¹² This chapter differs from the traditional search literature by supposing that workers do not observe posted wages directly. Instead, workers know the underlying distributions of job characteristics and the matching process prior to simultaneously investing in education. In this sense, the worker’s decision to invest in education is the channel by which workers are able to direct their search.

This chapter is related to a literature studying the role that match function complementarities play in driving sorting patterns.
Since Becker [15]’s seminal studyof the marriage market, it has been known that in two-sided matching markets withnon-transferable utility (NTU), a sufficient condition for a stable matching to ex-hibit positive assortativity is that the agents’ payoff be monotonic in their partners’type.13 NTU arises naturally in my model from the assumption that wages for anymatched pair are determined exogenously (in fact, by imposing Nash bargaining).In my approach, sorting can arise in the absence of direct interactions betweenworker and firm types in the (post-match) surplus function in the special case thatworkers and firms have monotone preferences over one another.1410For example, a worker’s decision to get a master’s degree in finance will not only affect thelikelihood that he gets a job at an investment bank, but also the likelihood that his competitors getthe job.11For example, Marinescu and Wolthoff [91] study the role of job titles in directing the search ofworkers report that only 20% of job the advertisements CareerBuilder.com report a wage.12More generally, Cardoso et al. [36] is another recent paper that studies the impact of informationfrictions in search and matching models.13Becker remarks this without explicitly invoke a notion of pairwise stability, such as that in Galeand Shapley [56] (Chade et al. [37]).14 A desire to accommodate complementarities does not necessarily require us to abandon theAKM framework entirely. Bonhomme et al. [23] incorporate worker-firm complementarities into aframework resembling AKM, while relaxing the exogenous mobility assumption. While providing8This chapter contributes to the literature concerned with inference in two-sidedmatching models.15 A key feature of my setup is that the characteristics of agentson one side of the market are endogenous - in particular, arising in equilibrium froma pre-matching investment game. 
I show how, rather than making the empirical analysis intractable, accommodating such pre-matching investments provides the researcher useful information for inference.16

My framework supposes that each equilibrium gives rise to a single large matching between workers and firms.17 Under familiar assumptions (e.g., iid and separable private information), I follow similar arguments to Aguirregabiria and Mira [6] to prove that an equilibrium exists. My setup, however, also allows me to provide sufficient conditions for equilibrium uniqueness.

This chapter is also part of the literature concerned with estimation from cross-sectionally dependent observations. In my setup, the observed matching of workers to firms exhibits cross-sectional dependence of an unknown form due to the matching process. This means that asymptotic inference approaches that appeal to the law of large numbers and central limit theorems will not work. The approach I pursue builds on an idea from work-in-progress that demonstrates how inference in structural matching models is possible when knowledge of the matching process can be used to characterize the joint distribution of the observed matching. This chapter shows how such a simulation-based inference approach, cumbersome when the parameter space is high-dimensional or complex, is useful for estimating a subset of the parameters in structural models with cross-sectional dependence.18

14 (cont.) empirical support for the existence of complementarities, they also note that the additive specification does not appear to be a bad approximation in practice.

15 See Chiappori and Salanié [38] for a review of this literature. A seminal paper in this literature is Choo and Siow [39], which considers inference in a transferable utility setup with a continuum of agents.

16 A popular approach for estimating two-sided matching models builds on the notion that the observed matching is pairwise stable. For example, see Fox and Bajari [55], Echenique et al.
[51], and Menzel [93]. Requiring that the observed matching be pairwise stable may be unrealistic in the context of a frictional labour matching market of the sort that is the focus of this chapter.

17 This contrasts with cases in which the researcher sees many independent copies of games involving few players, such as those studied by Bresnahan and Reiss [26], Ciliberto and Tamer [40], Berry [17] and many others. See Xu [114], Song [107], and Menzel [94] for more papers discussing the estimation of large Bayesian games.

18 The simulation-based approach used in the second stage of the inference procedure is known

The rest of the chapter is organized as follows. Section 1.2 introduces the model of two-sided labour market matching with frictions. In the baseline model of Section 1.2.1, workers and firms with exogenous characteristics match with one another and split the match surplus according to a Nash bargaining rule. Section 1.2.2 extends the baseline model to allow for endogenous worker characteristics: after observing their type, workers simultaneously invest in education prior to entering the labour market. Section 1.3 outlines an approach for inference on the parameters of the structural model of Section 1.2.2. Section 1.4 applies the structural inference methodology to matched employer-employee data from Statistics Canada’s Workplace and Employee Survey to study labour markets in Canada. Section 1.5 concludes. Mathematical proofs, additional details of the empirical application, and a simulation study illustrating reasonable performance of the estimators are relegated to the Appendix of this chapter.

1.2 The Labour Market As a Two-Sided Matching Market

Our goal is to study the distribution of education and wages using separate cross-sections of matched employer-employee data.
The first subsection introduces the core elements of the model that will serve as the basis for the structural model in the second subsection.

1.2.1 Baseline model

Let $N_h = \{1, \dots, n_h\}$ be the set of workers and $N_f = \{1, \dots, n_f\}$ be the set of firms, where $n_h$ and $n_f$ denote the total number of workers and firms, respectively. Each worker seeks one job and each firm seeks to hire one worker.

The matching of workers to firms will be determined by the preference rankings of workers and firms. Workers value the capital of firms, $K = (K_j)_{j \in N_f}$, and firms value the human capital of workers, $H = (H_i)_{i \in N_h}$, where $K_j$ and $H_i$ are scalars. Any worker $i$ who is matched with firm $j$ receives wage $w_{ij} \geq 0$ while firm $j$ receives profit $\rho_{ji} \geq 0$, where both wages and profits may also depend on a parameter, $\theta \in \mathbb{R}^d$.19 Since our framework supposes that wages and profits are always non-negative for any worker and firm that could match, I will assume throughout the chapter that no agent will ever unilaterally dissolve a match to become unmatched. This requirement that any matching satisfy an individual rationality constraint is embodied in the following condition:20

Condition IR (Individual rationality of matches): For each $i \in N_h$ and $j \in N_f$, $w_{ij} \geq 0$ and $\rho_{ji} \geq 0$.

18 (cont.) as a Monte Carlo test. Monte Carlo tests have a history in econometrics dating back at least to the 1950s, as discussed by Dufour and Khalaf [48] in their overview of the technique.

Based on the values of $\rho_j = (\rho_{ji})_{i \in N_h}$, each firm $j$ can construct preference rankings over the workers. We suppose that if the firm is ever indifferent between two or more workers, then the firm picks preference rankings over these workers at random.
Next, we introduce a condition on the worker’s wage function that will be useful for interpreting the matching process (along with our notion of information frictions).

Condition H (Homogeneous worker preferences): For each $i \in N_h$, the wage of worker $i$ is increasing in the capital of their matched firm.

The assumption is tantamount to a notion of worker preference homogeneity, implying that all workers prefer higher capital firms. Supposing that workers accurately observe the capital of firms, this assumption implies that a matching algorithm in which the highest capital firm, $j_1$, chooses its preferred worker, $i_1 \in N_h$, the second highest capital firm, $j_2$, chooses its preferred worker $i_2 \in N_h \setminus \{i_1\}$, and so on, is an example of the serial dictatorship mechanism (Abdulkadiroğlu and Sönmez [1]) and would produce a stable matching.21

In order to build a model that accounts for the possibility of mismatches between workers and firms, we suppose there are information frictions in the market.

19 In this setup, $\theta$ represents the preferences of both workers and firms. As we will see, $w_{ij}$ and $\rho_{ji}$ depend on the output of worker $i$ at firm $j$, and the production function that gives rise to this output will depend on a part of $\theta$.

20 The current setup is tailored to settings where the researcher has at least one cross-section of matched employer-employee data and the agents who are unmatched are not of primary interest in the analysis. An interesting (and challenging) extension of the current framework would accommodate the possibility of unmatched agents, and hence unemployment.

21 See Section 2.2 of Roth and Sotomayor [102].

Specifically, workers do not directly observe realizations of the firm’s capital. Instead, each worker sees $v = (v_j)_{j \in N_f}$, where $v_j$ is a ‘noisy’ measure of firm $j$’s capital.
In particular, suppose that workers see

$$v_j = \beta K_j + \eta_j, \tag{1.1}$$

for each $j$, where $\beta \in B$, $B \subset \mathbb{R}$ is the parameter space of $\beta$, and $\eta_j$ is a random variable that is independent across $j$.22 The size of the variance of $\eta_j$ relative to the magnitude of $\beta$ represents the magnitude of information frictions in the matching process. It is clear that when $\beta$ is zero and the variance of $\eta_j$ is positive, this setup yields random matching between worker and firm characteristics, since variation across firm capital plays no role in determining the realizations of $v$. Furthermore, when $\beta \neq 0$ and $\mathrm{Var}(\eta_j) = 0$, it will be as if firm capital is observed by the worker, since $v_j$ is determined entirely by the firm’s capital. In the latter case, when $\beta > 0$ workers would favour firms with the largest realizations of $v$, while in the case that $\beta < 0$, workers would favour firms with the smallest realizations of $v$. However, even in the case that $\mathrm{Var}(\eta_j) > 0$, $v_j$ still conveys some useful information to the worker under certain circumstances. When $\beta > 0$, the $\eta_j$’s are iid, and workers see $v_{j_1} > v_{j_2}$, workers would still prefer matching with firm $j_1$ over firm $j_2$, since the distribution of $K_{j_1}$ stochastically dominates the distribution of $K_{j_2}$. The following condition specifies the matching process we will use throughout the chapter.

Condition SD (Matching process): The matching of workers to firms in the economy arises as follows. The highest $v$ firm, $j_1$, chooses its preferred worker, $i_1 \in N_h$, the second highest $v$ firm, $j_2$, chooses its preferred worker $i_2 \in N_h \setminus \{i_1\}$, and so on, until the lowest $v$ firm, $j_{n_f}$, chooses its preferred worker among those not chosen by any higher ranked firms.

One way of understanding this matching algorithm in economic terms is to

22 We make all independence assumptions explicit in Section 1.2.2 when the structural model is introduced (in particular, see Assumption 1.2.1).
Sections 1.2.2 and 1.3.2 also include further assumptions regarding the distribution of the model variables (Assumption 1.3.2) and the information that agents have (Assumption 1.3.1). In addition, although we do not discuss inference until Section 1.3, the reader may presume that the econometrician observes firm and human capital, wages, and the matching of workers to firms, but observes neither $\eta$ nor $v$.

consider the following thought experiment. Imagine a situation in which a group of job-seekers have assembled in a large room on the day of a job fair. Workers do not observe the true quality of any of the firms (represented by $K$), but they do see each firm’s value of $v$. When $\beta > 0$ and the $\eta_j$’s are iid, each worker is happiest to match with the highest $v$ firm, since the distribution of capital associated with the highest $v$ firm stochastically dominates the distribution of capital associated with any of the lower $v$ firms. A procedure in which the highest $v$ firm, $j_1$, chooses its preferred worker, $i_1 \in N_h$, the second highest $v$ firm, $j_2$, chooses its preferred worker $i_2 \in N_h \setminus \{i_1\}$, and so on, will draw no complaints from any of the participants at the job fair, at least until the uncertainty associated with $K$ is revealed. In this world, agents will typically have more regret (and hence a greater desire to rematch) when the frictions in $v$ are large. However, rematching is outside the scope of the model.

Next, we add some further structure to wages and profits. In particular, we will assume that the payoffs for any two matched agents follow a Nash bargaining structure. Let $\tau \in (0,1)$ be the bargaining weight. A worker $i$ who matches with a firm $j$ receives

$$w_{ij} = \tau f(H_i, K_j) + (1 - \tau) g(H_i), \tag{1.2}$$

while firm $j$ receives

$$\rho_{ji} = (1 - \tau)\bigl(f(H_i, K_j) - g(H_i)\bigr), \tag{1.3}$$

where $f$ is the worker-firm output function and $g(H_i)$ is an outside option function, both of which may depend on elements of $\theta$.
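The matching process of Condition SD together with the payoffs in (1.2) and (1.3) can be illustrated with a short simulation. The following is only a minimal sketch under assumptions that are not part of the text: the noise $\eta_j$ is drawn from a normal distribution with a hypothetical scale parameter `sigma`, the functions `f` and `g` are supplied by the user, and the function name `serial_dictatorship` is illustrative.

```python
import numpy as np


def serial_dictatorship(H, K, beta, sigma, tau, f, g, rng):
    """Simulate one matching under Condition SD (illustrative sketch).

    H, K : arrays of worker human capital and firm capital (equal length).
    Returns (match, wages), where match[i] is the firm matched to worker i.
    """
    n = len(H)
    # Noisy signal of firm capital, equation (1.1). Normal noise with
    # scale sigma is an assumption made for this sketch only.
    v = beta * K + sigma * rng.standard_normal(n)
    order = np.argsort(-v)  # firms ordered from highest to lowest v
    available = np.ones(n, dtype=bool)
    match = np.empty(n, dtype=int)
    for j in order:
        # Firm j's profit from each worker, equation (1.3).
        profit = (1 - tau) * (f(H, K[j]) - g(H))
        profit = np.where(available, profit, -np.inf)
        # Ties among the best remaining workers are broken at random.
        best = np.flatnonzero(profit == profit.max())
        i = rng.choice(best)
        match[i] = j
        available[i] = False
    # Nash-bargained wage of each worker at the matched firm, equation (1.2).
    wages = tau * f(H, K[match]) + (1 - tau) * g(H)
    return match, wages
```

Setting `sigma=0` with `beta > 0` reproduces the frictionless case in which firms pick in order of true capital, while `beta=0` makes the picking order independent of capital.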
In a subsequent section, we will allow worker covariates, $X_i$, to affect wages through the outside option function, $g$.23 The following condition requires $f$ to satisfy some intuitive properties with respect to the worker and firm capital variables.

Condition F (Production function): $f$ is increasing in human capital and firm capital.

Condition F merely requires that more capital leads to more output; it does not impose that the worker and firm attributes be complements in $f$. Section 1.2.2 goes into further detail about the role of $f$ in this model.

23 The $X_i$’s have support $\mathcal{X} \subset \mathbb{R}^d$, where $d$ is an integer greater than or equal to one.

1.2.2 Frictional Matching Model with Worker Investments

We now introduce a structural model where workers simultaneously invest in education prior to the serial dictatorship matching process as outlined in the previous section. A general overview of the matching process is as follows: i) workers, observing only their type, simultaneously choose a level of education, ii) $v$ is realized, iii) firms, seeing only the education of workers, match according to Condition SD.

Although firms select their preferred workers in the serial dictatorship phase after constructing preference rankings over the workers, firms are not considered strategic agents within the context of the investment game itself.

There are $n_h$ players indexed by $i \in N_h$. Each player chooses an education level, $h_i$, from the discrete set $\mathcal{H} \equiv \{1, \dots, J\}$ to maximize their expected payoff. Let $\lambda = (\theta', \beta)$, where $\beta$ is the matching frictions parameter and $\theta \in \mathbb{R}^d$ is a preference parameter. The payoff function of player $i$ comprises the wage less a cost of education,

$$u(h_i, h_{-i}, x_i, k, \eta, \varepsilon_i; \lambda) = \omega(h_i, h_{-i}, x_i, k, \eta; \lambda) - c(h_i, x_i, \varepsilon_i; \lambda), \tag{1.4}$$

where $h_{-i} \in \mathcal{H}_{-i}$ are the choices of the other agents,24 $x_i \in \mathcal{X}$ and $\varepsilon_i \in \mathbb{R}^J$ are the private information of worker $i$, and $k \in \mathbb{R}^{n_f}$ and $\eta \in \mathbb{R}^{n_f}$ are vectors of exogenous firm variables that are unobserved by the workers.
Although $\varepsilon_i$ and $x_i$ are private information of the worker, we will assume $x_i$ is observed by the econometrician in a subsequent section. The variable $\varepsilon_i$ represents the worker’s private cost associated with each of the $J$ education levels. In Section 1.3.2 we will supply explicit assumptions on worker and firm information that illustrate why, given the matching process, the components of the payoff function depend on the model’s underlying variables in the way stipulated by equation (1.4).

We now provide additional conditions that establish the existence of a Bayesian Nash equilibrium for our game (which we prove in Appendix A.1.1).

24 Since the set of pure strategies for each agent is $\mathcal{H}$, it follows that $\mathcal{H}_{-i} = \mathcal{H}^{n_h - 1}$ for each $i$, where $\mathcal{H}^{n_h - 1}$ denotes the $(n_h - 1)$-ary Cartesian power of $\mathcal{H}$.

Assumption 1.2.1. (a) The $K_j$’s and $\eta_j$’s are independent across $j$. The $X_i$’s and $\varepsilon_i$’s are independent across $i$. $X$, $K$, $\varepsilon$, and $\eta$ are independent. (b) The $\varepsilon_i$’s are continuously distributed.

Assumption 1.2.2. The cost function is separable in private information:

$$c(h_i, x_i, \varepsilon_i; \lambda) = c_0(h_i, x_i; \lambda) + \varepsilon_i' d(h_i),$$

where $d(h_i)$ is a $J$-dimensional vector with one in the $h_i$-th row and zero otherwise.

The assumptions of separability and independence are common in the structural literature.25 In Appendix A.1.1, we show that Assumptions 1.2.1 and 1.2.2 are sufficient for establishing the existence of the Bayesian Nash equilibrium for the game of this section. For now, we will provide some intuition into the worker’s education decision problem. First, we define the set of pure strategies as $\sigma = \{\sigma_i(x_i, \varepsilon_i) : i \in N_h\}$, where $\sigma_i$ is a function that maps from $\mathcal{X} \times \mathbb{R}^{J-1}$ into $\mathcal{H}$.
Assumption 1.2.2 says that we can write the expected utility of agent $i$ with covariates $x_i$, who chooses $h_i$ under beliefs $\sigma$, as

$$U_i(h_i, x_i, \sigma, \varepsilon_i) = \tilde{U}_i(h_i, x_i, \sigma) + \varepsilon_i' d(h_i), \tag{1.5}$$

where the first term in the expected utility is

$$\tilde{U}_i(h_i, x_i, \sigma) = \sum_{h_{-i} \in \mathcal{H}_{-i}} \tilde{u}_i(h_i, h_{-i}, x_i) P_{-i}(h_{-i} \mid \sigma), \tag{1.6}$$

and

$$\tilde{u}_i(h_i, h_{-i}, x_i) \equiv \tilde{\omega}_i(h_i, h_{-i}, x_i) - c_0(h_i, x_i), \tag{1.7}$$

where $\tilde{\omega}_i(h_i, h_{-i}, x_i)$ is given by

$$\tilde{\omega}_i(h_i, h_{-i}, x_i) = E[\omega(H_i, H_{-i}, X_i, K, \eta; \lambda) \mid H_i = h_i, H_{-i} = h_{-i}, X_i = x_i],$$

and the expectation is taken with respect to the distributions of $K$ and $\eta$. By Lemma (A.1.1) (on page 122) we can rewrite equation 1.6 as

$$\tilde{U}_i(h_i, x_i, \sigma) = \sum_{h_{-i} \in \mathcal{H}_{-i}} \tilde{u}_i(h_i, h_{-i}, x_i) \prod_{j \in N_h \setminus \{i\}} P_j(h_j \mid \sigma_j).$$

25 For example, see the discussions in Kasahara and Shimotsu [76] and Xu [113].

Throughout this chapter, we will consider the case in which the wages of workers are determined by Nash bargaining. As in equation 1.2, we will suppose that firm capital only enters the worker’s payoff through the production function. Denote $\mathcal{M}(i)$ as the identity of the firm that worker $i$ matches to as a result of the matching process, and $K_{\mathcal{M}(i)}$ as the level of capital associated with firm $\mathcal{M}(i)$. Under Nash bargaining wages where firm capital only enters the production function, we may write $\tilde{\omega}_i(h_i, h_{-i}, x_i)$ as

$$\tilde{\omega}_i(h_i, h_{-i}, x_i) = \tau \tilde{f}_i(h_i, h_{-i}) + (1 - \tau) g(h_i, x_i), \tag{1.8}$$

where

$$\tilde{f}_i(h_i, h_{-i}) = E[f(H_i, K_{\mathcal{M}(i)}) \mid H_i = h_i, H_{-i} = h_{-i}, X_i = x_i],$$

the expectation is taken with respect to the distributions of $K$ and $\eta$,26 and we have allowed the worker’s characteristics to enter the payoff function through the outside option function, $g$.27

Education affects the worker’s expected utility in a number of ways. The first two are obvious: since $f$ is increasing in $h_i$ by Condition F, the worker who invests in a higher level of education obtains a higher wage at any firm he matches to. The worker’s choice of education also affects his payoff through the outside option function, $g$. The novel channel in this setup is that $h_i$ also determines the expected quality of the firm that $i$ matches to.
Even though (as mentioned before) firms in this model are non-strategic agents, the functional form of the production function, $f$, plays a key role in determining whether or not firms with different levels of capital exhibit different preferences for workers of differing levels of education.

26 In Section 1.3.2, we will demonstrate the precise form that this expectation takes under particular lower-level assumptions.

27 Here, $\tau_i = \tau$ for each $i$. My framework can be extended to incorporate heterogeneity in worker bargaining positions. In the empirical results, however, I ignore this channel. Bagger and Lentz [11] emphasize the role that disentangling variation such as endogenous search intensity from matching variation (e.g., Postel-Vinay and Robin [99]) plays in understanding the causes of wage inequality.

To see how $f$ determines whether or not firms’ preferences are heterogeneous, consider the Nash bargaining preferences of a firm for any worker who chooses education level $h$:

$$\rho(k, h; \theta) = (1 - \tau)\bigl(f(h, k; \theta) - \tilde{g}(h; \theta)\bigr), \tag{1.9}$$

where $\tilde{g} = E g(h, X_i)$ and the expectation is taken with respect to the distribution of $X_i$.28 Suppose that $X_i$ is iid, $K$ takes two values $k_1$, $k_2$, and there are two levels of education, $h_1$, $h_2$, with $h_2 > h_1$. Let us denote the set of firms that prefer high education ($h_2$) as

$$M_2^+(\theta) = \{m \in \{1, 2\} : \rho(k_m, h_2; \theta) \geq \rho(k_m, h_1; \theta)\}.$$

If $f(h, k)$ is of the form $a(h) + b(k)$, where $a$ and $b$ are two functions that map the capital variables to the real numbers, then $M_2^+(\theta)$ will be either $\{1, 2\}$ or $\emptyset$. In this case we say that firms have homogeneous preferences, since both types of firms in the economy rank the two education levels in the same way. Alternatively, if $f(h, k)$ is of the form $a(h)b(k)$, then $M_2^+(\theta)$ will be either $\{1, 2\}$, $\emptyset$, or $\{2\}$. This is the case of heterogeneous firm preferences. In this latter case, where $f$ exhibits complementarities in worker and firm types, the set of firm types that prefer high to low education is more finely partitioned.
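The two cases can be checked by differencing the firm’s payoff (1.9) across the two education levels. The following worked step is a sketch that assumes $k_2 > k_1$ and that $a$ and $b$ are increasing; these assumptions are introduced here for illustration, and $\Delta(k)$ and $\Delta\tilde{g}$ are shorthand that does not appear in the text.

```latex
\begin{align*}
\Delta(k) &\equiv \rho(k, h_2; \theta) - \rho(k, h_1; \theta)
  = (1 - \tau)\bigl[f(h_2, k) - f(h_1, k) - \Delta\tilde{g}\bigr],
  \qquad \Delta\tilde{g} \equiv \tilde{g}(h_2) - \tilde{g}(h_1), \\
f = a(h) + b(k):\quad
\Delta(k) &= (1 - \tau)\bigl[a(h_2) - a(h_1) - \Delta\tilde{g}\bigr], \\
f = a(h)\,b(k):\quad
\Delta(k) &= (1 - \tau)\bigl[\bigl(a(h_2) - a(h_1)\bigr)\,b(k) - \Delta\tilde{g}\bigr].
\end{align*}
```

In the additive case $\Delta(k)$ does not depend on $k$, so its sign is the same for both firm types and $M_2^+(\theta)$ is $\{1, 2\}$ or $\emptyset$. In the multiplicative case $\Delta(k)$ is increasing in $k$ under the stated assumptions, so $\Delta(k_1) \geq 0$ implies $\Delta(k_2) \geq 0$; the set $\{2\}$ becomes possible while $\{1\}$ alone does not.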
Moreover, the presence or absence of complementarities will play a key role in determining the severity of wage inequality, as we will see in Section 1.4. More general than all these points, however, is the following fact about the model: as long as $k$ appears somewhere in $f$, $k$ does not have to interact directly with $h$ in $f$ for the information frictions represented by $\beta$ to matter in the worker’s investment decision.

1.2.3 Some Implications of Frictional Matching Model

In this section, we explore some key features of the model. We will suppose that the functional forms, underlying distributions, and firm preferences are such that firms always strictly prefer higher educated workers. In the following subsection, we will illustrate sorting without any direct interactions between worker and firm types in the production function.

28 Here, we implicitly assume that firms do not observe workers’ covariates and rank workers only in terms of their education. We make these assumptions concerning firm information explicit in a subsequent section.

Sorting Without Complementarities

In Figure 1.1 and Figure 1.2, we compare the equilibrium probability of investing in education and the equilibrium Gini coefficient for a range of the friction parameters under two specifications of the production function: Specification 1 allows direct interaction between worker and firm types, $f = \theta_1 h k$, while such interactions are absent in Specification 2, $f = \theta_1 (h + k)$.29 Each point on the plot is the average of 500 simulations of the endogenous variable from the equilibrium of the model. The outside option parameter is set to $\theta_2 = (-.75, .25, .5)$. There are 100 workers and firms. In Specification 1, the high value of $\theta_1$ is 3.5, and the low value of $\theta_1$ is 1.5. In Specification 2, the high value of $\theta_1$ is 1.8, and the low value of $\theta_1$ is 0.8. There are two levels of firm capital: $K = 1/2$ and $K = 1$.
The fraction of each type of firm is .5 in the economy.

A number of implications are straightforward: the equilibrium probability of investing in high education is higher when $\theta_1$ is higher and frictions are lower. When $\theta_1$ is higher, workers will be compensated more for higher levels of education. When $\beta$ is higher, the probability of matching to a higher type firm when they choose high education is higher.

The effect of increasing $\beta$ (lowering matching frictions) on both education and wage inequality is typically much more dramatic in Specification 1. A rise in $\beta$ (a lessening in matching frictions) increases sorting in both specifications, though the effect is more dramatic in the complementarities case: in Figure 1.1, the correlation between worker and firm types rises from 60% to 68% when $\theta_1$ is high, but from 64% to 81% when $\theta_1$ is lower; in Figure 1.2, the correlation between worker and firm types rises from 84% to 90% in the high $\theta_1$ case, whereas it rises from 77% to 87% in the low $\theta_1$ case. The overall level of inequality in Specification 1 is also higher, since whatever sorting is present is amplified to a greater extent when the types interact in the wage equation than when they do not.

29 The precise functional forms are the same as the ones used in Section A.1.3.

The high $\theta_1$ case in the right hand panel of Figure 1.1 also illustrates the role that two competing effects of changes in $\beta$ play on the level of wage inequality. When $\beta$ rises from 0 to 1, the level of inequality increases through the sorting channel. However, as $\beta$ continues to rise, the equilibrium probability of investing in education also continues to rise. As the fraction of highly educated workers surpasses 80%, the level of inequality begins to level off (at $\beta = 2$) and then begins to fall.
This phenomenon is also illustrated to a lesser degree in the high $\theta_1$ case of the right hand side panel of Figure 1.2.

Supply of Highly Educated Workers and Education Premia

In this section, we show how simulation of our static model can capture a puzzling phenomenon discussed in Card and Lemieux [33]. How can dramatic increases in the education wage premium lead to only modest increases in the supply of highly educated workers? The authors note that, over a roughly 30 year period beginning in the early 1970s, the college-high school wage gap rose considerably in the United States, Canada, and the United Kingdom, and that this rise occurred mostly for younger workers. They argue that an important source of this trend is a stagnation in the rate of educational attainment among workers born in the 1950s and thereafter.

In Figure 1.3, we show how this pattern can be driven entirely by changes in the matching technology over time. The wage premium is measured as the difference between the average wages of the workers with high education and the average wages of workers with low education. Each point on the plot represents the average of 500 simulations of the model. We use Specification 1, $f = \theta_1 h k$, under the same setup as before with only one difference: we choose the low value of $\theta_1$ to be 0.6 and the high $\theta_1$ to be 3.5. In the case that $\theta_1$ is very low, the effect of raising $\beta$ is to dramatically increase sorting without a large benefit to any particular worker.

1.3 Econometric Inference

In this section, we outline the general empirical strategy for performing inference on the underlying model parameters.
In Section 1.3, we describe how the main model can be used to characterize the observed distribution of the matching of

Figure 1.1: Education and Wage Inequality Under Specification 1. [Two panels plotted against $\beta$ from 0 to 5: Probability of High Education and Gini Coefficient, each for high $\theta_1$ and low $\theta_1$.]

Figure 1.2: Education and Wage Inequality Under Specification 2. [Two panels plotted against $\beta$ from 0 to 5: Probability of High Education and Gini Coefficient, each for high $\theta_1$ and low $\theta_1$.]

Figures 1.1 and 1.2 plot the equilibrium probability of high education investment and the Gini coefficient for a range of values of the matching frictions parameter, $\beta$. We consider two specifications for the production function: Specification 1 includes interactions between worker and firm types while Specification 2 does not. Lowering matching frictions (increasing $\beta$) increases the equilibrium level of education across specifications. A rise in $\beta$ impacts inequality through two competing effects: a sorting effect that increases inequality and a supply effect that lowers inequality. This can be seen most dramatically in Figure 1.1: as $\beta$ rises past a value of three, the fraction of highly educated workers rises more and more and inequality falls, dominating the effects of sorting on inequality.

Figure 1.3: Supply of Highly Educated Workers and Education Wage Premia. [Left panel: Probability of High Education against Matching Frictions, $\beta$, from 0 to 5. Right panel: Increase in Education Wage Premium, %, against the increase in $\beta$ from $\beta = 0$. Each panel shows high $\theta_1$ and low $\theta_1$.]

Figure 1.3 offers an explanation for an empirical puzzle discussed in Card and Lemieux [33]: why are increases in wage premia not associated with large increases in the supply of highly educated workers? We plot the equilibrium probability of high education investment and the returns to education for a range of values of the matching frictions parameter, $\beta$.
We consider Specification 1. In the case that $\theta_1$ is very low, the effect of increasing $\beta$ is to dramatically increase sorting while keeping the returns to education for any particular worker reasonably low.

workers to firms and hence the wages of all the workers in the economy. The goal is to then use these representations to construct confidence regions for the preference and matching technology parameters.

However, if the model is high dimensional, the Monte Carlo inference approach may be cumbersome to apply in practice. For this reason, we propose a two-stage inference approach that relies on the construction of a first-stage confidence interval for a subset of the model parameters. We demonstrate this approach in practice in Section 1.3.2 by estimating the Bayesian game from Section 1.2.2 for fixed values of $\beta$.

1.3.1 Two-Stage Inference Accommodating Cross-Sectional Dependence of Observed Matching

The econometrician observes a matching of workers to firms, $M = (M(i))_{i \in N_h}$, where for each $i \in N_h$, $M(i)$ takes values in the set of firms.30 The main challenge associated with inference is the fact that the distribution of $M$ exhibits cross-sectional dependence of a complicated form. The matching of workers to firms can be thought of as a discrete choice problem on the part of the firm, where the choice sets of firms are endogenously constrained by the choices of firms with higher $v$-indices, which depend on $\beta$, $\eta$ and $k$. Hence, the event that worker $i$ matches to firm $j$ cannot be considered independent from the event that a worker $i' \neq i$ matches to firm $j$. Also, the fact that firm preferences are heterogeneous means we cannot condition on the $v$-index and firm preferences in a way that removes the cross-sectional dependence, as was done by Agarwal and Diamond [5].

The econometrician observes the vector $M \in \mathbb{R}^{n_h}$, which represents a matching of workers to firms. Given the serial dictatorship matching process, the joint distribution of $M$ is known up to a parameter.
Let $\mathbf{K} = (\mathbf{K}(i))_{i \in N_h}$, where $\mathbf{K}(i) = K_{M(i)}$; i.e., the capital of the firm matched to by worker $i$.

Our model also implies that the finite sample distribution of wages, $(W(i))_{i \in N_h}$, is known up to a parameter. Under Nash bargaining (and a specification of the post-match wage function based on an equation such as 1.2), we have for each $i \in N_h$

$$W(i) = w(H_i, \mathbf{K}(i)).$$

We denote all the match-related observables as $\mathbf{Y} = (\mathbf{K}, M)$. $M$ is observed whenever the researcher has matched employer-employee data. $\mathbf{K}$ is observed when the researcher can use the matching data, $M$, and the firm capital data, $K$, to find the capital of the firm each worker in the sample is employed at. Using $\mathbf{Y}$ and worker observables $H$ and $X$, the econometrician wishes to infer $\lambda_0$.

30 Throughout this chapter, we will suppose that the matching is one-to-one between workers and firms. In practice, “firms” in this context can be viewed as positions at particular firms.

Inference on Parameters

Next, we consider a test statistic that matches the moments of the distribution of the match-related observables with their simulated counterparts. To simplify the exposition, we discuss the construction of a confidence interval for $\beta_0$ alone, i.e., supposing that we knew the true value of $\theta_0$. Denote $R + 1$ as the total number of simulations in the Monte Carlo inference procedure. Drawing $\eta_r$ from some continuous parametric distribution $F_\theta$,31 we simulate a version of the matching for each $\beta \in B$ and each $r = 1, \dots, R + 1$, which we write as $M_r(\beta) = \{M_r(i; \beta) : i \in N_h\}$. The simulated wages are then

$$W_r(i; \beta) = w(H_i, K_{M_r(i; \beta)}).$$

It is convenient to define

$$\mathbf{Y}_r(\beta) = \{Y_r(i; \beta) : i \in N_h\},$$
$$\mathbf{Y}^*_{R+1}(\beta) = \{Y_r(i; \beta) : i \in N_h, r = 1, \dots, R + 1\}, \text{ and}$$
$$\mathbf{Y}_{-r}(\beta) = \mathbf{Y}^*_{R+1}(\beta) \setminus \mathbf{Y}_r(\beta).$$

Next, we will propose a test statistic that depends on both the observed matching data, $\mathbf{Y}$, and the simulated matching data (along with simulated versions of this test statistic).
That is,

$$T(\beta) = \phi_n(\mathbf{Y}, \mathbf{Y}^*_R(\beta)), \text{ and}$$
$$T_r(\beta) = \phi_n(\mathbf{Y}_r(\beta), \mathbf{Y}_{-r}(\beta)).$$

An example of such a test statistic is one that compares the observed joint distribution of worker human capital and matched firm capital with simulated counterparts. For example, we may consider the test statistic

$$T(\beta) = \frac{1}{R} \sum_{r=1}^{R} \bigl\| \hat{P} - \hat{P}_r(\beta) \bigr\|,$$

where $\hat{P}$ is an $M \times J$ matrix32 whose $(m, j)$ element is the estimated probability that a worker of education level $h_j$ matches to a firm of capital level $m$, $\hat{P}_r(\beta)$ is defined similarly to $\hat{P}$, except we replace the observed matching with the $r$th simulated matching, $M_r(\beta)$, and $\|\cdot\|$ is a matrix norm.33

31 We will specify a particular parametric family that $F_\theta$ belongs to, along with additional assumptions, in a subsequent section.

Using our test statistic, we may compute a confidence region for $\beta$ as

$$C^{\beta}_{\alpha,R} = \{\beta \in B : T(\beta) \leq c_{\alpha,R}(\beta)\},$$

where the critical value is computed as the $(1 - \alpha)$-quantile of the empirical distribution of $\{T_r(\beta) : r = 1, \dots, R\}$:

$$c_{\alpha,R}(\beta) = \inf\Bigl\{c \in \mathbb{R} : \frac{1}{R} \sum_{r=1}^{R} 1\{T_r(\beta) \leq c\} \geq 1 - \alpha\Bigr\}.$$

Under Assumption 1.3.2, it can easily be shown that finite sample inference on $\beta_0$ satisfies $P\{\beta_0 \in C^{\beta}_{\alpha,R}\} \geq 1 - \alpha$ when the procedure outlined above involves the true parameter, $\theta_0$.

In practice, we do not know the true value of $\theta_0$. In situations in which the full parameter vector $\lambda_0$ is not very large, it may be feasible to construct a $(1 - \alpha)100\%$ confidence region for this parameter that exhibits finite sample validity. That is, we construct

$$C^{\lambda}_{\alpha,R} = \{\lambda \in \Lambda : T(\lambda) \leq c_{\alpha,R}(\lambda)\}, \tag{1.10}$$

where $T(\lambda)$ and $c_{\alpha,R}(\lambda)$ are defined analogously to $T(\beta)$ and $c_{\alpha,R}(\beta)$. In the case that $\Lambda$ is high-dimensional, the finite sample procedure outlined above may not be practical due to the unreasonable computational cost.
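The test-inversion steps above can be sketched in a few lines of code. This is only an illustrative sketch: the function names are hypothetical, `joint_freq` uses joint (rather than conditional) frequencies as one possible reading of $\hat{P}$, and `simulate_P` stands in for a user-supplied routine that simulates a matching at a given $\beta$ and returns its frequency matrix.

```python
import numpy as np


def joint_freq(h, k, n_h_levels, n_k_levels):
    """Frequency matrix with rows = firm capital level, columns = education level."""
    P = np.zeros((n_k_levels, n_h_levels))
    np.add.at(P, (k, h), 1.0)  # count each (capital level, education level) pair
    return P / len(h)


def mc_statistic(P_ref, P_sims):
    """Average Frobenius distance between P_ref and each simulated matrix."""
    return np.mean([np.linalg.norm(P_ref - P_r, ord="fro") for P_r in P_sims])


def critical_value(t_sim, alpha):
    """Smallest c such that the share of simulated statistics <= c is >= 1 - alpha."""
    t_sorted = np.sort(np.asarray(t_sim, dtype=float))
    R = len(t_sorted)
    # Small tolerance guards against floating-point error in (1 - alpha) * R.
    idx = int(np.ceil((1 - alpha) * R - 1e-12)) - 1
    return t_sorted[max(idx, 0)]


def confidence_region(beta_grid, P_obs, simulate_P, R, alpha, rng):
    """Collect the grid points beta whose statistic does not exceed the critical value."""
    region = []
    for beta in beta_grid:
        sims = [simulate_P(beta, rng) for _ in range(R + 1)]
        # Observed data against R simulations.
        t_obs = mc_statistic(P_obs, sims[:R])
        # Each of the first R simulations against the remaining R simulations.
        t_sim = [mc_statistic(sims[r], sims[:r] + sims[r + 1:]) for r in range(R)]
        if t_obs <= critical_value(t_sim, alpha):
            region.append(beta)
    return region
```

The grid search over `beta_grid` is a design choice of the sketch; the cost of one pass grows with both the grid size and $R$, which is the computational burden discussed in the text when the parameter space is large.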
In the following subsection, we explore a two-stage inference approach that admits inference on $\beta_0$ when the researcher is able to construct a first-stage confidence region for a subset of the parameters, $\theta_0$.

32 In this example, we are implicitly assuming that the distribution of $K$ is discrete and has $M$ support points. We will make this assumption explicit in a subsequent section.

33 See equation A.12 in the empirical section for more on constructing $\hat{P}$ and $\hat{P}_r(\beta)$. In this section, we also choose the matrix norm to be the Frobenius norm. That is, for an $m \times n$ matrix $A$, $\|A\| = \bigl(\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\bigr)^{1/2}$, where $a_{ij}$ denotes the $(i, j)$-element of $A$.

Plugging in a consistent estimator of $\theta_0$, $\hat{\theta}_n$, for the true value in the inference procedure outlined above will (in general) not lead to valid inference on $\beta_0$. This is because there is no reason to suspect that plugging in $\hat{\theta}_n$ for $\theta_0$ will make the distribution of the simulated matching, $M_r$, equal to the distribution of the observed matching, $M$. The fact that $M_r$ is not equal in distribution to $M$ in turn implies that $\mathbf{K}_r$ does not follow the same distribution as $\mathbf{K}$. The severe consequences of estimation error in $\hat{\theta}_n$ occur because the firm preferences are typically misspecified at all values of $\theta$ other than the true value, $\theta_0$. Moreover, this problem is not alleviated by conditioning on $H$, $K$, or exogenous variables. In the following section, we discuss a general two-stage inference approach for the case in which the econometrician can construct an (asymptotically) valid first-stage confidence interval for $\theta_0$. In Section 1.2.2, we extended our baseline economic model of Section 1.2 in a manner that admits the application of this two-stage inference approach to our setup.

Two-Stage Inference on $\beta$ using Test-Inversion Confidence Interval

Suppose that we wish to construct a $(1 - \alpha)$-level asymptotic confidence interval for $\beta_0$, and can construct a confidence interval for $\theta_0$.
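Let us denote the test statistic and its simulated counterpart from the previous section as follows, where the $\theta$ arguments make explicit the test statistic's dependence upon a given value of $\theta \in \Theta$:

$$T(\beta;\theta_0,\theta_1) = \phi_n\bigl(Y(\beta_0,\theta_0), Y^{*}_{R}(\beta,\theta_1)\bigr) \quad\text{and}\quad T_r(\beta;\tilde{\theta},\theta_1) = \phi_n\bigl(Y_r(\beta,\tilde{\theta}), Y_{-r}(\beta,\theta_1)\bigr).$$

Note that according to the notation we used in the last section we have $T(\beta;\theta_0,\theta_0) = T(\beta)$. Our inference on $\beta$ proceeds in two steps:

Step 1. Using the first-stage estimates $\hat{\theta}(\beta)$, we construct a confidence region for $\theta_0$, $\hat{C}_{\alpha/2}(\beta)$, with $(1-(\alpha/2))$ asymptotic coverage.

Step 2. Next, we construct a test statistic that does not involve $\theta$. Define

$$S(\beta) = \inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta)} T(\beta;\theta_0,\theta_1) \quad\text{and}\quad S^{*}_r(\beta) = \sup_{\tilde{\theta} \in \hat{C}_{\alpha/2}(\beta)}\ \inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta)} T_r(\beta;\tilde{\theta},\theta_1).$$

We now construct a confidence set for $\beta$ as

$$\hat{C}_{\alpha,R} = \{\beta \in B : S(\beta) \le c^{*}_{1-(\alpha/2),R}(\beta)\}, \tag{1.11}$$

where the critical value $c^{*}_{1-(\alpha/2),R}(\beta)$ is computed as the $(1-(\alpha/2))$-quantile of the empirical distribution of $\{S^{*}_r(\beta) : r = 1, \ldots, R\}$; that is,

$$c^{*}_{1-(\alpha/2),R}(\beta) = \inf\Bigl\{c \in \mathbb{R} : \frac{1}{R}\sum_{r=1}^{R} 1\{S^{*}_r(\beta) \le c\} \ge 1-(\alpha/2)\Bigr\}.$$

The following lemma establishes the asymptotic validity of the two-stage inference procedure.

Lemma 1.3.1. Suppose that the econometrician can construct $\hat{C}_{\alpha/2}(\beta_0)$ such that

$$\lim_{n\to\infty} P\bigl(\theta_0 \in \hat{C}_{\alpha/2}(\beta_0)\bigr) \ge 1-(\alpha/2).$$

Then

$$\lim_{n\to\infty} P\bigl(\beta_0 \in \hat{C}_{\alpha,R}\bigr) \ge 1-\alpha. \tag{1.12}$$

Proof. By the definition of $\hat{C}_{\alpha,R}$, $P(\beta_0 \in \hat{C}_{\alpha,R})$ is equal to

$$P\bigl(S(\beta_0) \le c^{*}_{1-(\alpha/2),R}(\beta_0)\bigr) = P\Bigl(\inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta_0)} T(\beta_0;\theta_0,\theta_1) \le c^{*}_{1-(\alpha/2),R}(\beta_0)\Bigr) \ge P\Bigl[\Bigl\{\inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta_0)} T(\beta_0;\theta_0,\theta_1) \le c^{*}_{1-(\alpha/2),R}(\beta_0)\Bigr\} \cap A_1\Bigr], \tag{1.13}$$

where $A_1 \equiv \{\theta_0 \in \hat{C}_{\alpha/2}(\beta_0)\}$.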
Then, the right-hand side of (1.13) is greater than or equal to

$$P\Bigl[\Bigl\{\sup_{\tilde{\theta} \in \hat{C}_{\alpha/2}(\beta_0)}\ \inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta_0)} T_r(\beta_0;\tilde{\theta},\theta_1) \le c^{*}_{1-(\alpha/2),R}(\beta_0)\Bigr\} \cap A_1\Bigr] \ge P\Bigl(\sup_{\tilde{\theta} \in \hat{C}_{\alpha/2}(\beta_0)}\ \inf_{\theta_1 \in \hat{C}_{\alpha/2}(\beta_0)} T_r(\beta_0;\tilde{\theta},\theta_1) \le c^{*}_{1-(\alpha/2),R}(\beta_0)\Bigr) - P(A_1^{c}),$$

where the last inequality uses $P(E \cap A_1) \ge P(E) - P(A_1^{c})$ for any event $E$. Now since

$$\lim_{n\to\infty} P\bigl(\theta_0 \notin \hat{C}_{\alpha/2}(\beta_0)\bigr) \le \alpha/2,$$

we have

$$\lim_{n\to\infty} P\bigl(\beta_0 \in \hat{C}_{\alpha,R}\bigr) \ge 1-\alpha.$$

In the following section, we describe how to construct $\hat{C}_{\alpha/2}(\beta)$ for each $\beta \in B$.³⁴

³⁴ In the empirical application of this chapter, we estimate $\theta_0$ by maximum likelihood and construct confidence regions using numerical derivatives of the likelihood function. A Monte Carlo simulation (reported in the Appendix) illustrates acceptable finite sample performance of this inference approach.

1.3.2 First-Stage Estimation of θ

In this section, we show how $\theta$ can be estimated for a particular fixed value of $\beta$. We will write an estimator of such an object as $\hat{\theta}(\beta)$. The main challenge associated with this problem is that of estimating the worker's expected utility from equation 1.7. The problem is difficult because the workers must somehow resolve uncertainty associated with the serial dictatorship matching process in order to compute the expected output under the equilibrium education choices. In spite of these complications, it turns out that, under reasonable assumptions, the parameters are tractably estimable using discrete choice methods with a fixed point constraint when there are only two education choices. We now provide and discuss these assumptions.

Assumption 1.3.1. (a) Firms observe (i) workers' education decisions, $H$, and (ii) the distribution of characteristics, $X$. (b) Workers observe (i) the distribution of firm capital, (ii) the distribution of $\eta$, (iii) the distribution of $X$, and (iv) the distribution of the number of firms preferring each education level $h_j \in \mathcal{H}$.

Under part (a) of Assumption 1.3.1, firms do not take workers' covariates into account when forming their preference rankings over workers.
Thus, workers with the same education level are equally desirable to any given firm. When worker $i$ considers the desirability of choosing education $h_j$, he need only consider the capital that a generic agent who chooses level $h_j$ expects to receive in the matching process. In many contexts, (a) will be reasonable for a host of variables that affect the worker's education decision (e.g., marital status, number of dependent children).³⁵ Part (b) says that workers know only the distribution of firm capital without knowing the precise realizations of capital. Assumption 1.3.1 (b) also stresses that the worker's knowledge of the distribution of capital is not sufficient for knowledge of the distribution of the number of firms that prefer each education class, which will turn out to be crucial for the results of this section.

³⁵ In some cases in which employers do see these worker characteristics, they are prohibited from discriminating based on them due to state or federal anti-discrimination laws.

Assumption 1.3.2. (a) $K$ is discrete with probability mass $q = (q_m)_{m=1}^{M}$, where $q_m = P(K = k_m)$ for $m = 1, \ldots, M$. (b) The $\eta_j$'s are iid $N(0,\sigma^2)$. (c) The $\varepsilon_i$'s follow the Type I extreme value distribution.

Part (a) says the distribution of firm capital has discrete support. In practice, we can let $M$ be as large as our application requires. In concert with (b) and the parametric structure for $v$ stipulated by equation 1.1, (a) allows us to express the unconditional distribution of $v_j$ as a mixture of normals, $G \equiv \sum_{m=1}^{M} q_m F_m$, where $F_m$ is $N(\beta k_m, \sigma^2)$.³⁶ Part (c) is an assumption on the worker's unobserved costs that allows us to estimate the model parameters using conventional discrete choice methods.

³⁶ In the simulations and empirical sections of the chapter we normalize $\sigma^2 = 1$ when we perform inference on the model parameters.

We wish to obtain a convenient representation of each worker's conditional expectation of the production function, for each education level that the worker can choose.
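The mixture-of-normals representation $G = \sum_m q_m F_m$ implied by Assumption 1.3.2 can be illustrated numerically. The following is a minimal sketch, assuming that equation 1.1 has the form $v_j = \beta K_j + \eta_j$ (which is what $F_m = N(\beta k_m, \sigma^2)$ suggests); the function names `mixture_cdf` and `draw_v` are hypothetical and are not taken from the thesis.

```python
import math
import random


def mixture_cdf(v, beta, k, q, sigma=1.0):
    """G(v) = sum_m q_m * Phi((v - beta * k_m) / sigma), a mixture of normal CDFs.

    k : support points of firm capital (k_1, ..., k_M).
    q : probability masses (q_1, ..., q_M), assumed to sum to one.
    """
    total = 0.0
    for k_m, q_m in zip(k, q):
        z = (v - beta * k_m) / (sigma * math.sqrt(2.0))
        total += q_m * 0.5 * (1.0 + math.erf(z))  # normal CDF via the error function
    return total


def draw_v(beta, k, q, sigma=1.0, rng=random):
    """Draw one index v = beta * K + eta, with K ~ q on k and eta ~ N(0, sigma^2)."""
    k_m = rng.choices(k, weights=q)[0]  # sample a capital level from its discrete distribution
    return beta * k_m + rng.gauss(0.0, sigma)
```

With $\beta = 0$ every component collapses to $N(0, \sigma^2)$, so the index carries no information about capital; this is the maximal-friction case discussed later in the chapter.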
Under the model of Section 1.2.2, the identity of the firm that worker $i$ matches with, $\mathcal{M}(i)$, depends on $K$, $H$, $\beta$, and $\theta$. Therefore, for each $i \in N_h$ and $h_j \in \mathcal{H}$, we wish to estimate

$$\tilde{f}_{ij} \equiv E\bigl[f(H_i, K_{\mathcal{M}(i)}) \mid H_i = h_j, X_i = x_i\bigr],$$

where the expectation is taken with respect to the distribution of $K$, $H_{-i}$ and $\eta$. Under Assumption 1.3.2 (a), we can express the expectation on the preceding line as

$$\tilde{f}_{ij} = f_j' \pi^{(i)}_j, \tag{1.14}$$

where $f_j = (f_{j1}, \ldots, f_{jM})'$ is an $M \times 1$ vector with $m$-th element $f_{jm} = f(h_j, k_m)$, and $\pi^{(i)}_j = (\pi^{(i)}_{1j}, \ldots, \pi^{(i)}_{Mj})'$ is an $M \times 1$ vector with $m$-th element

$$\pi^{(i)}_{mj} = \sum_{h_{-i} \in \mathcal{H}_{-i}} P\bigl(\mathcal{M}(i) = m \mid H_i = h_j, H_{-i} = h_{-i}, X_i = x_i\bigr)\, P(h_{-i} \mid x_i). \tag{1.15}$$

This is the probability that worker $i$ matches to a firm of capital level $k_m$ when he has chosen education level $h_j$.³⁷ Given that there are $M$ capital levels, $J$ education choices, and $n_h$ workers, the dimensionality of the problem appears daunting. However, under our assumptions the problem is simplified considerably, and we can show that for each $j$ and $m$, $\pi^{(i)}_{mj} = \pi_{mj}$, and hence $\tilde{f}_{ij} = \tilde{f}_j$.³⁸

Although it is unclear how to represent the $\pi_{mj}$'s analytically when the worker faces a choice between a large number of education levels, the problem becomes tractable when there are only two (i.e., $J = 2$). Proposition A.1.1 shows that under our informational assumptions, firms (and workers) cannot distinguish between workers with the same education level during the matching process. As a consequence, we find that a worker is only concerned with the number of other workers who picked one of the two education levels (and not which particular workers chose what).
The independence and identical distribution assumptions imply that the probability that $n_j$ workers picked education level $h_j$ can be represented using the binomial probability mass function. However, the number of workers choosing education level $h_j$ is unknown to workers, so they must take expectations. Thus, instead of having to sum over $n_h - 1$ indices associated with the actions of each of the other workers to compute the worker's expectation, we need only sum over one: the number of workers choosing a particular education level.

³⁷ Note that although these terms depend on $\theta$ and $\beta$, we will occasionally omit these from our notation for convenience.

³⁸ The argument for why this is the case is given in the proof of Proposition A.1.1.

We will also allow $\theta$ to enter the $\pi_{mj}$'s through the distribution of the number of firms that prefer high (or low) education. The following assumption is a natural way to specify this distribution. We use the notation $M^{+}_j(\theta)$ to denote the set of firm types that prefer education level $h_j$.³⁹

Assumption 1.3.3. In the model with $J = 2$, the number of firms, $n_{(j)}$, that prefer workers with education level $h_j$ follows the binomial distribution with success probability $\sum_{m \in M^{+}_j(\theta)} q_m$.

The explicit representations of the matching probabilities are given in Propositions A.1.2, A.1.3, and Lemma A.1.3. These results can be used to construct estimates of the $\pi_{mj}$'s, and hence the $\tilde{f}_j$'s, for fixed values of $\theta$ and $\beta$. Using a given functional form for the production function, we denote an estimate of the expected production function when the worker chooses education level $h_j$ as

$$\hat{f}_j(\theta,\beta) = f_j' \hat{\pi}_j(\theta,\beta),$$

where our notation emphasizes the dependence of the objects upon the parameter values. To construct the $\hat{\pi}_{mj}$'s we must estimate the terms of equation A.9.
$\hat{P}(n_j)$ is constructed as $B(n_j; n_h - 1, \hat{p}_j)$, where the latter denotes the binomial probability mass function with $\hat{p}_j = P(H_i = h_j)$.⁴⁰ Similarly, $\hat{P}(n_{(j)};\theta)$ is constructed as $B(n_j; n_h - 1, \hat{q}_j(\theta))$, where $\hat{q}_j(\theta) = \sum_{m \in \hat{M}^{+}_j(\theta)} \hat{q}_m$, with $\hat{q}_m = \hat{P}(K_j = m)$,

$$\hat{M}^{+}_j(\theta) = \bigl\{m \in \{1, \ldots, M\} : \hat{\rho}(k_m, h_j;\theta) \ge \hat{\rho}(k_m, h_{j'};\theta),\ j \ne j'\bigr\},$$

and $\hat{\rho}(k_m, h_j;\theta)$ is as in equation (1.9), except that we use $\hat{g}_j = \frac{1}{n}\sum_{i=1}^{n} g(h_j, X_i)$ in place of $\tilde{g}$.

³⁹ That is, $M^{+}_j(\theta) = \{m \in \{1, \ldots, M\} : \rho(k_m, h_j;\theta) \ge \rho(k_m, h_{j'};\theta),\ j \ne j'\}$. See also the discussion before Proposition A.1.2.

⁴⁰ In so doing, we pursue a two-step approach for estimating the choice probabilities, such as Bajari et al. [12]. See for example Kasahara and Shimotsu [77] for an alternative approach.

Lastly, the $P_{h_j, n_j, n_{(j)}}(m)$'s from equation A.9 (that is, the probability that a worker matches to a firm of type $m$ when they choose education level $h_j$, $n_j$ other workers choose $h_j$, and $n_{(j)}$ firms prefer $h_j$) can be simulated for fixed values of $\theta$ and $\beta$. Proposition A.1.2 and Proposition A.1.3 show how these can be represented using probabilities involving order statistics. Under Assumption 1.3.2 (b), we can construct the $\hat{P}_{h_j, n_j, n_{(j)}}(m)$'s by averaging functions of simulated draws of beta-distributed random variables (in particular, see Corollaries A.1.1 and A.1.2, which follow the order statistic result in Lemma A.1.3).

Once we have estimated $\hat{f}_j(\theta,\beta)$ for each education level, we may use the specification of the wage from equation 1.8 to write the expected wage as

$$\hat{\omega}_{ji}(\theta,\beta) = \tau \hat{f}_j(\theta,\beta) + (1-\tau)\, g(H_i, X_i;\theta). \tag{1.16}$$

When there are two choices ($J = 2$), the worker chooses high education ($h_j = 1$) if and only if

$$U^{*}_{1i} - U^{*}_{0i} > 0.$$

Under the assumption that the $\varepsilon_i$'s follow the extreme value distribution (Assumption 1.3.2), the probability that worker $i$ chooses high education can be written as

$$\hat{p}_i(\theta,\beta) = \frac{\exp\bigl(\hat{\omega}_{1i}(\theta,\beta) - \hat{\omega}_{0i}(\theta,\beta)\bigr)}{1 + \exp\bigl(\hat{\omega}_{1i}(\theta,\beta) - \hat{\omega}_{0i}(\theta,\beta)\bigr)}.$$

Since the covariates $\{X_i\}_{i=1}^{n}$ are iid, we can write the joint likelihood as the product of the marginal likelihoods. We can then define the estimator of $\theta$ (for a fixed value of $\beta$) as the minimizer of the negative of the standard logit log-likelihood function:⁴¹

$$\ln L_n(\theta,\beta) = -\sum_{i=1}^{n} \bigl(h_i \ln \hat{p}_i(\theta,\beta) + (1-h_i) \ln(1-\hat{p}_i(\theta,\beta))\bigr).$$

⁴¹ When $\beta$ is fixed, maximizing the likelihood by computing the $\hat{f}_j(\theta,\beta)$'s for each candidate value of $\theta$ can be slow. The following strategy can be used to estimate $\theta$ for fixed $\beta$ more quickly, provided that the support of $K$ is not too large. First, note that $\theta$ enters $\hat{f}_j(\theta,\beta)$ only through the set of firm types that prefer education level $h_j$, $\hat{M}^{+}_j(\theta)$. Given our assumptions on the production function and firm preferences, $\hat{M}^{+}_j(\theta)$ must take one of $M+1$ possible values. Therefore, for fixed $\beta$, we can avoid simulating $\hat{f}_j(\theta,\beta)$ for each candidate value of $\theta$ by pre-allocating the $\hat{q}_j(\theta)$'s and $P_{h_j, n_j, n_{(j)}}(m)$'s for each of the $M+1$ cases for $\hat{M}^{+}_j(\theta)$.
It then suffices to evaluate $\hat{M}^{+}_j(\theta)$, select the appropriate dimension of the array of terms, then assemble the terms according to equation A.9.

1.3.3 Matching Probabilities

In this section, we consider the role of frictions, or the magnitude of $\beta$ relative to the variance of $\eta$, in shaping matching patterns between workers and firms. Note that these frictions play no role in determining firm preferences, or which firm types prefer high education.⁴² Nevertheless, because the frictions do affect sorting patterns, they are of considerable importance to workers when they decide how much to invest in education.

In the following example, we will suppose that the set of firms that prefer education level $h_j$, $M^{+}_j$, contains at least two types of firms, $m$ and $\tilde{m}$, with $k_m \ne k_{\tilde{m}}$. Suppose we fix $N_j$, the number of workers who chose education level $h_j$, at some $n_j$, and we fix $N_{(j)}$, the number of firms who prefer highly-educated workers, at some $n_{(j)}$ such that $n_j + 1 < n_{(j)}$. In this situation, there are strictly more firms who prefer type $h_j$ workers than there are workers of this type. Let $\kappa = n_{(j)} - n_j + 1$, and denote $p_{m\kappa} \equiv P(v_m > v_{(\kappa)})$ for each $m$ in $M^{+}_j$. Proposition A.1.2 says that the difference in the probability of matching to a type $\tilde{m}$ versus a type $m$ firm at these values of $n_j$ and $n_{(j)}$ is given by

$$(p_{\tilde{m}\kappa} - p_{m\kappa})\, q^{+}_{\tilde{m}} / c_\kappa + p_{m\kappa}\,(q^{+}_{\tilde{m}} - q^{+}_m) / c_\kappa, \tag{1.17}$$

with

$$c_\kappa \equiv \sum_{m \in M^{+}_j} p_{m\kappa}\, q^{+}_m,$$

where $q^{+}_m = q_m / \sum_{m' \in M^{+}_j} q_{m'}$. Under Assumption 1.3.2, the case of $\beta = 0$ gives us that $p_{m\kappa} = p_{\tilde{m}\kappa}$, implying that the first term of equation 1.17 is zero. This means that when matching frictions are highest (i.e., when $\beta = 0$), the difference in the probability of matching to one type of firm that prefers $h_j$ over another is captured by the relative prevalence of those types of firms in the economy.

In the case that $\beta > 0$, under Assumption 1.3.2, $p_{\tilde{m}\kappa} - p_{m\kappa}$ becomes larger as $k_{\tilde{m}} - k_m$ becomes larger.
This means that higher capital firms have a better chance of matching with the high education workers when $\beta > 0$. On the other hand, in the case that $n_j + 1 > n_{(j)}$ (i.e., $h_j$ is demanded by fewer firms than there are workers of this type in the economy), the above probabilities are independent of firm capital and $\beta$, and once again depend solely on the relative prevalence of each type of firm.

⁴² We discuss the role of firm preferences on matching patterns at the end of Section [...].

1.4 Analysis of a Labour Matching Market in Canada

1.4.1 Background

In this section, we investigate the role of a labour matching technology in shaping education and wage patterns for the Canadian economy. Since the 1980s, many scholars studying the Canadian economy have focused on the rise of wage inequality, contrasting it to similar trends in the United States and Western Europe (Fortin et al. [54], Lemieux [83], Saez and Veall [104]). To explain this recent growth in wage inequality, some have emphasized forces on the demand side of the labour market, such as increasing wage premia for highly-skilled workers (Boudarbat et al. [24]), and a declining demand for jobs in the middle of the skill distribution (Green and Sand [63]). Another set of explanations emphasizes the role of institutional changes and government policy; namely, the effect of minimum wages and falling unionization rates (Fortin et al. [54], Lemieux [83]). One less-explored cause of this inequality may lie in the underlying process by which firms find workers to hire. This chapter emphasizes such a channel, focusing on the role of the labour market matching mechanism in recent wage patterns in Canada. As in the study of the German labour market in Card et al. [34], this channel considers the role that sorting patterns play in wage patterns, with a particular focus on the process by which firms find workers to hire.

1.4.2 Data

The matched employer-employee data we consider come from the Workplace and Employee Survey (WES) of Statistics Canada (Statscan).
WES is a longitudinal survey of Canadian firms and the workers they employ. WES allows researchers to study how the characteristics and outcomes of workers and firms are related. Thus, WES goes beyond other surveys that track only one side of the market, such as the Labour Force Survey (LFS) in the case of workers, or the Longitudinal Employment Analysis Program (LEAP) in the case of firms. Furthermore, the WES makes it possible to understand, in more detail than previous surveys, how firms adopted new technology and what the impacts of this adoption were (Statscan). The WES is especially rich in terms of information concerning worker-firm bargaining, outside options, and technology use. As the 2006 release only contains employer data, I only consider WES panels for the years 1999-2005.

WES only collects data on firms and workers in Canadian provinces whose information was obtainable from Statscan's Business Register. The target population of the study was all non-governmental firms aside from agricultural and religious organizations. Furthermore, the focus was only on firms that hired more than one worker (who was not the owner or the employer). A firm employee is defined as a person associated with that firm who is working or on paid leave in March of the survey year and who receives a T-4 slip from the Canada Revenue Agency.

The workplace component of WES was conducted from 1999 to 2006. The firms were followed throughout the course of the study. Every two years, a sample of firms that are new to the Business Register is added to the base sample. The employee component of WES was conducted from 1999 to 2005. In each surveyed firm that employs more than four workers, up to 24 workers are randomly sampled. All workers in firms with fewer than four workers are included in the sample. Workers are only followed for two years in the workplace survey. For this reason, every second year, workers are resampled from the firms.

WES data has been used by other researchers.
Dionne and Dostie [46] use WES data from 1999-2002 to study the impact of work arrangements on employee absenteeism. Dostie and Jayaraman [47] investigate the role of computer use in firm productivity gains. Pendakur and Woodcock [98] study the extent to which immigrant and minority access to high-paying jobs is determined by barriers to being hired at high-paying firms.

The WES (1999-2006) time frame coincides with a period of somewhat modest wage inequality (see A.1.7). In Figure A.1.7, we use the WES data to plot the difference between the 99th and 50th percentiles of total hourly wages for all workers in the WES sample. A comparable pattern can be seen in the shares of market income accruing to the top 1% of recipients (Veall [110]).

1.4.3 Model Estimates and Counterfactuals

In this section, we explore the evolution of the matching technology and preferences in two industries from the WES sample: Secondary Products Manufacturing (WES industry 4), and Finance and Insurance (WES industry). Table A.3 reports results for the matching technology for a subgroup of higher skilled workers, namely managers (WES occupation category 1) and professionals (WES occupation category 2), while Tables A.4 and A.5 report preference estimates for the same subgroup of workers for the manufacturing and finance industries, respectively. The estimates of preferences are reported at the minimum distance estimate of $\beta$. We consider the two specifications for the expected wage equation 1.8 that are found to behave reasonably in the simulation studies of Section A.1.3. Specification 1 uses the production function where worker and firm types are multiplicative, while the production function in Specification 2 is additive. The results in this section provide similar insights on parameter inference.
However, the results from A.1.7 illustrate the important consequences of production function interactions for wage inequality.

The results for the matching technology suggest a period of low frictions in 1999-2000, followed by an increase in frictions. Whereas the frictions remain high towards the end of the sample in the finance industry, the frictions fall in the manufacturing industry towards the end of the sample (2004-2005). Note that the standard errors are so small in general that we report confidence intervals for $\beta$ using the plug-in values of $\hat{\theta}$ (that is, we are not strictly taking the estimation error of $\hat{\theta}$ into account).

In the preference estimates for both industries, we typically estimate $\theta_2$, the coefficient on $female_i$ in the worker's outside option function, to be negative. The estimates of $\theta_3$ (marital status) and $\theta_4$ (number of dependent children) are less conclusive. In both industries, $\theta_1$ is found to be largest towards the end of the sample: 2004 in the case of the manufacturing industry, 2005 in the finance industry. The production technology appears more stable over time in the finance industry than it does in the manufacturing industry.

In section A.1.7, we use the structural model developed in this chapter along with the structural estimates of section to A.1.6 to generate statistics from two key counterfactual distributions of interest: wages and education. We consider the counterfactual implications of different (in-sample) estimated levels of matching frictions. For example, we can see what level of inequality would have prevailed in 1999 if the matching frictions had been as low as they were in 2005. Counterfactual education levels in the two specifications are generated in a similar way to the outcome variables generated in the Monte Carlo study in Section A.1.3, and the wage is generated using these values along with the simulated matchings (which involve iid draws of $\eta$).
Here, of course, we use the relevant covariates, firm capital data, and parameter estimates for each of the cells in the tables. Overall, the results highlight the important role that the matching technology and production complementarities play in educational decisions and wage patterns.

Tables A.6 and A.7 show results for counterfactual education levels. Production complementarities lead to higher investment in education in both cases, but appear to matter more in the finance industry. For instance, in 1999, switching to a multiplicative production function from an additive one increases the equilibrium investment in education by about 2% in the manufacturing industry but by about 5% in the finance industry. We also see the substantial role that both preferences and the matching technology play in the decision to obtain higher education. In the manufacturing sector in 1999 (a high $\beta$ year), switching to the matching technology from 2001 causes a fall in the equilibrium probability of attending college of roughly 8%. Overall, however, there is evidence that, taken together, changes to preferences (including the parameter in the production function) matter much more to the worker's college decision than changes in the matching frictions. For example, in the year 2001, the effect of switching to 1999's preference parameter is a fall in the probability of investing in education of almost 20%.

Tables A.10 and A.11 report counterfactual (weighted) Gini coefficients for the WES sample years along with two counterfactual levels: maximal frictions ($\beta = 0$) and very low frictions ($\beta = 5$). The Gini coefficients in the row $\hat{\beta}_{year}$ were simulated from the equilibrium of the model taking the exogenous variables and preference estimates from that year.

In Tables A.10 and A.11 we see that in both industries, inequality is typically much higher in the case with production complementarities (Specification 1).
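The text does not state the formula behind the weighted Gini coefficients in Tables A.10 and A.11. As a rough illustration only, the sketch below assumes the usual Lorenz-curve definition with sampling weights (one minus twice the area under the piecewise-linear Lorenz curve); the function name `weighted_gini` and its arguments are hypothetical.

```python
import numpy as np


def weighted_gini(x, w=None):
    """Gini coefficient of values x with optional sampling weights w.

    Assumes x is non-negative with a positive weighted total. Computed as
    1 - 2 * (area under the Lorenz curve), using the trapezoid rule.
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)

    order = np.argsort(x)          # sort observations from poorest to richest
    x, w = x[order], w[order]

    # Cumulative population share and cumulative income share, starting at (0, 0).
    pop = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    inc = np.concatenate([[0.0], np.cumsum(x * w) / np.sum(x * w)])

    area = np.sum((pop[1:] - pop[:-1]) * (inc[1:] + inc[:-1]) / 2.0)
    return float(1.0 - 2.0 * area)
```

Under this definition, replicating an observation and increasing its weight are equivalent, which is the property one would want when applying survey weights to simulated wages.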
In the finance industry in Specification 2, lowering matching frictions raises wage inequality in every year. In this case, the sorting effect raises inequality and dominates the inequality-lowering effects of a greater supply of highly educated workers. In other cases, however, the effect is ambiguous. In Specification 1 in the manufacturing industry, the level of inequality at the estimated value of the frictions is lower than at the counterfactual levels for most years (except 2002). For example, in 2005 the simulated Gini is 0.222 and the investment in education is 77%. This rises to 0.2507 (education investment 76%) when information frictions are highest and 0.2452 (education investment 85%) when frictions are lowest. The opposite is the case in Specification 1 in the finance industry, where the level of inequality at the estimated value of the frictions is higher than at the counterfactual levels for each year (except 1999).

1.5 Conclusion

This chapter presents an empirical strategy for studying wages and education in a labour market where the decisions of workers matter in the matching process. In particular, I perform inference on a labour market matching technology using matched employer-employee data. I demonstrate the feasibility of my approach in the case that the worker faces a choice between two education levels.

The methodology I develop in this chapter can be extended in a number of useful directions. Although the data used in this chapter did not include information on firms' profit, Bartolucci and Devicienti [14] have shown that such data is useful for investigating sorting. Another natural extension of the current setup is to consider the role that heterogeneity in the worker's bargaining strength plays in driving wage variation.

A number of intriguing extensions to the model would prove much more challenging. One limitation of the current approach is its reliance on cross-sectional variation alone for inference.
In effect, useful information concerning unemployment and job-to-job transitions by workers is unused in my framework.

This chapter has also demonstrated how the decision to invest in education, and wage inequality, are sensitive to the presence of a particular source of matching frictions in the economy. Although firm capital is exogenous in this chapter, studying the role of information frictions in capital accumulation in an extended framework could be a fruitful way to understand not only wage inequality, but also economic growth.

Chapter 2

Estimating Local Interactions Among Many Agents Who Observe Their Neighbors

2.1 Introduction

Interactions between agents (for example, through personal or business relations) generally lead to their actions being correlated. In fact, such correlated behaviors form the basis of identifying and estimating peer effects, neighborhood effects, or more generally social interactions in the literature. (See Blume et al. [20] and Durlauf and Ioannides [49] for a review of this literature.)

Empirical modeling becomes nontrivial when one takes seriously the fact that people are often connected directly or indirectly on a large complex network, observing some others' types, and that the econometrician observes only a small fraction of those on the network. Furthermore, strategic environments are highly heterogeneous across agents as each agent occupies a nearly "unique" position in the network. Information sharing potentially creates a complex form of cross-sectional dependence among the observed actions and yet the econometrician rarely has precise information about the actual network on which people observe other people.

The main contribution of this chapter is to develop a tractable empirical model of linear interactions among agents with the following three major features.
First, assuming a large game on a complex exogenous network, the empirical model allows the agents not to observe the full network, but to observe only part of the types of their neighbors.¹

Second, our model explains strategic interdependence among agents through correlated observed behaviors. In this model, the locality of cross-sectional dependence among the observed actions reflects the locality of strategic interdependence among the agents. Most importantly, unlike most incomplete information game models in the literature, our set-up allows for information sharing on unobservables, i.e., each agent is allowed to observe his neighbors' payoff relevant signals that are not observed by the econometrician.

Third, the econometrician does not need to observe the whole set of players in the game for inference. It suffices that he observes many (potentially) non-random samples of local interactions. The inference procedure that this chapter proposes is asymptotically valid independently of the actual sampling process, as long as the sampling process satisfies certain weak conditions. Accommodating a wide range of sampling processes is useful because random sampling is rarely used for the collection of network data, and a precise formulation of the actual sampling process is often difficult in practice.

A standard approach to model interactions among agents is to model them as a game and use equilibrium strategies from the game to obtain predictions and testable implications. Such an approach is cumbersome in our set-up. Since a particular realization of any agent's type affects all the other agents' actions in equilibrium through a chain of information sharing, each agent needs to form a "correct" belief about the entire information graph.
Apart from such an assumption being highly unrealistic, it also implies that predictions from an equilibrium that the econometrician uses to form testable implications generally involve all the players in the game, when it is often the case that only part of the players are observed in practice. Thus an empirical analysis which regards the players in the sample as coincident with the actual set of players in the game will suffer from a lack of external validity when the target "population" is the original large game involving many more players than those in the sample.

¹ For example, a recent paper by Breza et al. [27] documents that people in a social network have a substantial lack of knowledge of the network, and that the violation of this assumption may have significant implications for the predictions of the model.

Instead, this chapter adopts an approach of behavioral modeling, where it is assumed that each agent, not knowing fully the information sharing relations, optimizes according to his simple beliefs about other players' strategies. The crucial part of our behavioral assumption is a primitive form of belief projection which says that each agent, not knowing who his payoff neighbors observe, projects his own beliefs about other players onto his payoff neighbors. More specifically, if agent $i$ gives more weight to agent $j$ than to agent $k$, agent $i$ believes that each of his payoff neighbors does the same in comparing agents $j$ and $k$.

Belief projection in this chapter is a variant of inter-personal projection studied in behavioral economics. A related behavioral concept is the projection bias of Loewenstein et al. [86], which refers to the tendency of a person to project his own current taste onto his future taste. See also Van Boven et al. [109], who reported experimental results on the interpersonal projection of tastes onto other agents.
Since formation of belief is often tied to the information set the agent has, belief projection is closely related to information projection in Madarász [88], who focuses on the tendency of a person to project his information onto other agents' information. The main difference here is that our focus is to formulate the assumption in a way that is useful for inference using observational data on actions on a network.

We show that our primitive form of belief projection yields an explicit form of the best linear response which has intuitive features. For example, the best linear response is such that each agent $i$ gives more weight to those agents with a higher local centrality to him, where the local centrality of agent $j$ to agent $i$ is defined to be high if and only if a high fraction of agents from those whose actions affect agent $i$'s payoff have their payoffs affected by agent $j$'s action. Also, each agent's action responds to a change in his own type more sensitively when there are stronger strategic interactions, due to what we call the reflection effect. The reflection effect of player $i$ captures the way player $i$'s type affects his own action through his payoff neighbors whose payoffs are affected by player $i$'s types and actions.

Furthermore, the best linear response from the belief assumption provides a testable implication for information sharing on unobservables in data. The main idea is as follows. When the agents are strategically interdependent, the best linear response gives a linear reduced form for observed actions where the cross-sectional correlation of residuals indicates information sharing on unobservables. Hence, as this chapter shows, using cross-sectional correlation of residuals, one can test for the role of information sharing on unobservables.

One might wonder how close the predictions from our behavioral model are to the predictions from an equilibrium model.
To address this, we consider a simple linear interactions model as a complete information game where one can compute the equilibrium explicitly. The equilibrium strategies are given in a primitive form of a spatial autoregressive model. We compare the network externality from our behavioral model and that from the complete information game model using simulated graphs, one from Erdős-Rényi random graphs and the other from the scale-free random graph generation of Barabási-Albert. In both cases, the two models have similar predictions when the payoff externality parameter is less than or equal to 0.5. However, when it is close to one, the network externality becomes much higher in the equilibrium model than in the behavioral model. This is because strong local interactions induce global cross-sectional dependence in the equilibrium model due to extensive information transmission, whereas they do not in our behavioral model. Also, as the network size increases, the network externality from our behavioral model changes more stably than that from the equilibrium strategies of a complete information game.

We investigate the finite sample properties of asymptotic inference through Monte Carlo simulations using various payoff graphs. The results show reasonable performance of the inference procedures. In particular, the size and the power of the test for the strategic interaction parameter work well in finite samples. We also apply our method to the decisions of municipalities on state presence, revisiting the study by Acemoglu et al. [4]. We consider an incomplete information game model which permits information sharing. The fact that our best linear responses explicitly reveal the local dependence structure means that it is unnecessary to separately correct for spatial correlation following, for example, the procedure of Conley [41].

The literature on social interactions often looks for evidence of interactions through correlated behaviors.
For example, linear interactions models investigate the correlation between $Y_i$ and the average of outcomes over agent $i$’s neighbors. See, for example, Manski [90], De Giorgi et al. [45], Bramoullé et al. [25] and Blume et al. [21] for identification analysis in linear interactions models, and see Calvó-Armengol et al. [29] for an application in the study of peer effects. Goldsmith-Pinkham and Imbens [61] consider nonlinear interactions on a social network and discuss endogenous network formation. These models often assume that we observe many independent samples of such interactions, where each independent sample constitutes a game which contains the entire set of the players in the game.

In the context of a complete information game, linear interactions models on a large social network can generally be estimated without assuming independent samples. The outcome equations frequently take the form of spatial autoregressive models, which have been actively studied in the literature on spatial econometrics (Anselin [9]). A recent study by Johnsson and Moon [74] considers a model of linear interactions on a large social network which allows for endogenous network formation. Developing inference on a large game model with nonlinear interactions is more challenging. See Menzel [94], Xu [114], Song [107], Xu and Lee [115], and Yang and Lee [116] for large game models of nonlinear interactions. This large game approach is suitable when the data set does not have many independent samples of interactions. One of the major issues in the large game approach is that the econometrician often observes only part of the agents in the original game.²

Our approach to empirical modeling is also based on a large game model, and it is closer to the tradition of linear interactions models in the sense that it attempts to explain strategic interactions through correlated behaviors among neighbors.
In our set-up, the cross-sectional dependence of the observed actions is not merely a nuisance that complicates asymptotic inference; it provides the very piece of information that reveals the strategic interdependence among agents. Correlated behaviors also arise in equilibrium in models of complete information games or games with types that are either privately or commonly observable. (See Bramoullé et al. [25] and Blume et al. [21].) However, as emphasized before, such an approach can be cumbersome in our context of a large game, primarily because the testable implications from the model typically involve the entire set of players, when in many applications the econometrician observes only a small subset of the players in the large game. After finishing the first draft of the working paper that forms the basis of this chapter, we learned of a recent paper by Eraslan and Tang [53], who model the interactions as a Bayesian game on a large network with private link information. Like this chapter, they permit the agents not to observe the full network, and show identification of the model primitives adopting a Bayesian Nash equilibrium as a solution concept. One of the major differences between this chapter and their paper is that this chapter permits information sharing on unobservables, so that the actions of neighboring agents are potentially correlated even after controlling for observables.

² Song [107], Xu [114], Johnsson and Moon [74], Xu and Lee [115] and Yang and Lee [116] assume that all the players in the large game are observed. In contrast, Menzel [94] allows for observing i.i.d. samples from the many players, but assumes that each agent’s payoff involves all the other agents’ actions exchangeably.

A departure from the equilibrium approach in econometrics is not new in the literature. Aradillas-Lopez and Tamer [10] studied the implications of various rationality assumptions for identification of the parameters in a game.
Unlike their approach, our focus is on a large game where many agents interact with each other on a single complex network, and, instead of considering all the beliefs which rationalize observed choices, we consider a particular set of beliefs that satisfy a simple rule and yield an explicit form of best linear responses. (See also Goldfarb and Xiao [60] and Hwang [73] for empirical research adopting behavioral modeling for interacting agents.)

This chapter is organized as follows. In Section 2, we introduce an incomplete information game of interactions with information sharing. This section derives the crucial result of best linear responses under simple belief rules. In this section, we also discuss the issue of external validity of network externality by comparing two simple interactions models: a complete information game with equilibrium strategies and our behavioral model. Section 3 focuses on econometric inference. This section presents inference procedures, explains a situation where we can measure the role of information sharing on unobservables, and compares our approach with a standard linear-in-means model. Section 4 investigates the finite sample properties of our inference procedure through Monte Carlo simulations. Section 5 presents an empirical application on state capacity among municipalities. Section 6 concludes.

2.2 Strategic Interactions with Information Sharing

2.2.1 A Model of Interactions with Information Sharing

Strategic interactions among a large number of information-sharing agents can be modeled as an incomplete information game. Let $N$ be the set of a finite yet large number of players. Each player $i \in N$ is endowed with his type vector $(T_i, \eta_i)$, where $\eta_i$ is a private type and $T_i$ a sharable type. As we will elaborate later, information $\eta_i$ is kept private to player $i$ whereas $T_i$ is observed by his “neighbors”, which we define later.
Throughout this chapter, we set $T_i = (X_i', \varepsilon_i)'$, where $X_i$ is the vector of characteristics of player $i$ that are observed by the econometrician, and $\varepsilon_i$ is the unobserved characteristic of player $i$. Thus the model permits information sharing on unobservables $\varepsilon_i$. This feature is a significant departure from many existing incomplete information interactions models, which assume that variables the econometrician observes are public among the agents whereas variables the econometrician does not observe are kept private among them (see, e.g., Blume et al. [21]).

To capture the strategic interactions among players, let us introduce an undirected graph $G_P = (N, E_P)$, where $E_P$ denotes the set of edges $ij$, $i, j \in N$ with $i \neq j$, and each edge $ij \in E_P$ represents that the action of player $i$ affects player $j$’s payoff.³ We denote by $N_P(j)$ the $G_P$-neighborhood of player $j$, i.e., the collection of players whose actions affect the payoff of player $j$:

$$N_P(j) = \{ i \in N : ij \in E_P \},$$

and let $n_P(j) = |N_P(j)|$. We define $\bar N_P(i) = N_P(i) \cup \{i\}$ and let $\bar n_P(i) = |\bar N_P(i)|$.

³ A graph $G = (N, E)$ is undirected if $ij \in E$ whenever $ji \in E$ for all $i, j \in N$.

Player $i$ choosing action $y_i \in Y$ with the other players choosing $y_{-i} = (y_j)_{j \neq i}$ obtains payoff:

$$u_i(y_i, y_{-i}, T, \eta_i) = y_i \left( X_{i,1}'\gamma_0 + \tilde X_{i,2}'\delta_0 + \beta_0 \tilde y_i + \varepsilon_i + \eta_i \right) - \frac{1}{2} y_i^2,$$

where $T = (T_i)_{i \in N}$, $X_{i,1}$ and $X_{i,2}$ are subvectors of $X_i$,

$$\tilde X_{i,2} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} X_{k,2}, \quad \text{and} \quad \tilde y_i = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} y_k,$$

if $N_P(i) \neq \emptyset$, and $\tilde X_{i,2} = 0$ and $\tilde y_i = 0$ otherwise. The factor $r_{ik}$ measures the “relative weight” of individual $k$ in the network from the viewpoint of $i$. In this chapter, we consider two specifications.

$$\text{Specification A}: \ r_{ik} = 1, \ \text{for all } i, k \in N. \tag{2.1}$$
$$\text{Specification B}: \ r_{ik} = n_P(k)/n_P(i), \ \text{for all } i, k \in N.$$

The simple choice $r_{ik} = 1$ gives equal weight to every other agent, whereas the choice $r_{ik} = n_P(k)/n_P(i)$ gives more weight to those who have more edges with others relative to agent $i$.
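As a concrete illustration of the payoff and of the two weighting schemes in (2.1), the following is a minimal Python sketch, not part of the original analysis. It assumes scalar $X_{i,1}$ and $X_{i,2}$, represents $G_P$ as a dictionary mapping each player to the set $N_P(i)$, and uses hypothetical function names (`r_weight`, `neighbor_average`, `payoff`). Specification B is coded as it is printed in (2.1).

```python
def r_weight(graph, i, k, spec="A"):
    """Relative weight r_ik under Specification A or B in (2.1)."""
    if spec == "A":
        return 1.0
    # Specification B as printed in (2.1): n_P(k) / n_P(i).
    return len(graph[k]) / len(graph[i])


def neighbor_average(graph, i, values, spec="A"):
    """(1 / n_P(i)) * sum over k in N_P(i) of r_ik * values[k]; zero if N_P(i) is empty."""
    nbrs = graph[i]
    if not nbrs:
        return 0.0
    return sum(r_weight(graph, i, k, spec) * values[k] for k in nbrs) / len(nbrs)


def payoff(graph, i, y, x1, x2, eps, eta, gamma0, delta0, beta0, spec="A"):
    """Quadratic payoff u_i, assuming scalar X_{i,1} and X_{i,2} (an illustrative simplification)."""
    index = (x1[i] * gamma0
             + neighbor_average(graph, i, x2, spec) * delta0
             + beta0 * neighbor_average(graph, i, y, spec)
             + eps[i] + eta[i])
    return y[i] * index - 0.5 * y[i] ** 2


# Star graph: player 0 is linked to players 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
actions = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
zeros = {i: 0.0 for i in star}

y_tilde_A = neighbor_average(star, 0, actions, spec="A")
y_tilde_B = neighbor_average(star, 0, actions, spec="B")
u0 = payoff(star, 0, actions, zeros, zeros, zeros, zeros,
            gamma0=1.0, delta0=1.0, beta0=0.5, spec="A")
print(y_tilde_A, y_tilde_B, u0)
```

Because the payoff is a concave quadratic in $y_i$, the first-order condition sets $y_i$ equal to the (conditionally expected) term in parentheses, which is what makes linear strategies a natural class to consider.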
Thus the payoff depends on other players’ actions and types only through those of his $G_P$-neighbors. We call $G_P$ the payoff graph.

The parameter $\beta_0$ measures the payoff externality among agents. In the terminology of Manski (1993), $\delta_0$ captures the exogenous effect and $\beta_0$ the endogenous effect of social interactions. As for $\beta_0$, we make the following assumption:

Assumption 2.2.1. $0 \le |\beta_0| < 1$.

This assumption is often used in the literature to derive a characterization of a unique pure strategy equilibrium. (See, e.g., Bramoullé et al. [25] and Blume et al. [21] for its use.) When $\beta_0 > 0$, the game is called a game of strategic complements and, when $\beta_0 < 0$, it is called a game of strategic substitutes.

Let us introduce information sharing relations in the form of a directed graph (or a network) $G_I = (N, E_I)$ on $N$, so that each $ij$ in $E_I$ represents the edge from player $i$ to player $j$, where the presence of edge $ij$ joining players $i$ and $j$ indicates that $T_i$ is observed by player $j$. Hence the presence of an edge $ij$ between agents $i$ and $j$ represents information flow from $i$ to $j$. This chapter calls graph $G_I$ the information graph. For each $j \in N$, define

$$N_I(j) = \{ i \in N : ij \in E_I \},$$

that is, the set of $G_I$-neighbors observed by player $j$.⁴ Also write $\bar N_I(i) = N_I(i) \cup \{i\}$, i.e., the $G_I$-neighborhood of $i$ including $i$ himself. We define $n_I(i) = |N_I(i)|$.

In this chapter, we do not assume that each agent knows the whole information graph $G_I$ and the payoff graph $G_P$. To be precise about each agent’s information set, let us introduce some notation. For each $i \in N$, we set $\bar N_{P,1}(i) = \bar N_P(i)$ and $\bar N_{I,1}(i) = \bar N_I(i)$, and for $k \ge 2$, define recursively

$$\bar N_{P,k}(i) = \bigcup_{j \in \bar N_P(i)} \bar N_{P,k-1}(j), \quad \text{and} \quad \bar N_{I,k}(i) = \bigcup_{j \in \bar N_I(i)} \bar N_{I,k-1}(j).$$

Thus $\bar N_{P,k}(i)$ denotes the set of players which consists of player $i$ and those players who are connected to player $i$ through at most $k$ edges in $G_P$, and similarly with $\bar N_{I,k}(i)$. Also, define $N_{P,k}(i) = \bar N_{P,k}(i) \setminus \{i\}$ and $N_{I,k}(i) = \bar N_{I,k}(i) \setminus \{i\}$.
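The recursive definition of the $k$-step neighborhoods can be sketched as a breadth-first expansion. The snippet below is only an illustration under the assumption that a graph is stored as a dictionary from each player to the set of his neighbors; the function names are hypothetical.

```python
def closed_neighborhood_k(graph, i, k):
    """Player i together with every player reachable from i through at most k edges."""
    reached = {i}
    frontier = {i}
    for _ in range(k):
        # Expand one more edge outward, keeping only newly reached players.
        frontier = {j for u in frontier for j in graph[u]} - reached
        reached |= frontier
    return reached


def open_neighborhood_k(graph, i, k):
    """The same set with player i himself removed."""
    return closed_neighborhood_k(graph, i, k) - {i}


# Path graph 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
two_step = open_neighborhood_k(path, 0, 2)
print(sorted(two_step))
```

For the directed information graph, the same function applies once `graph[j]` is taken to hold the in-neighbors $N_I(j)$.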
For each $k \ge 1$, let $\mathcal{N}_{i,k-1}$ be the $\sigma$-field generated by $N_{P,k+1}(i)$, $N_I(i)$ and some additional information $\mathcal{C}_i$ which potentially causes correlation between types across different players. (We will explain $\mathcal{C}_i$ later.) That is, for $k \ge 1$,

$$\mathcal{N}_{i,k-1} = \sigma\left( N_{P,k+1}(i), N_{P,k}(i), \ldots, N_{P,2}(i), N_I(i) \right) \vee \mathcal{C}_i,$$

where $\vee$ between two $\sigma$-fields denotes the smallest $\sigma$-field among those which contain the two $\sigma$-fields. Define for each $k \ge 0$,

$$\mathcal{I}_{i,k} = \sigma\left( T_{\bar N_I(i)}, \eta_i \right) \vee \mathcal{N}_{i,k},$$

where $T_{\bar N_I(i)} = (T_j)_{j \in \bar N_I(i)}$. We use $\mathcal{I}_{i,k}$ to represent the information set of agent $i$. For example, when agent $i$ has $\mathcal{I}_{i,1}$ as his information set, it means that agent $i$ knows the set of agents whose types he observes (i.e., $N_I(i)$), the set of agents $j$ whose actions affect his payoff (i.e., $N_{P,1}(i)$), the set of agents whose actions affect the payoff of his $G_P$-neighbors $j$ (i.e., $N_{P,2}(i)$), the sharable types of his $G_I$-neighbors (i.e., $T_{\bar N_I(i)}$), and his own private signal $\eta_i$.

⁴ More precisely, the neighbors in $N_I(j)$ are called in-neighbors and $n_I(j) = |N_I(j)|$ the in-degree. Throughout this chapter, we simply use the terms neighbors and degrees, unless specified otherwise.

Throughout the chapter, it is not assumed that any agent $i$ knows $N_I(k)$ for any of his $G_P$-neighbors $k$. In other words, there might be some $G_P$-neighbor $k$ who observes other agents that agent $i$ does not observe, and agent $i$ does not know who such a $G_P$-neighbor $k$ is or who the other agents observed by player $k$ are.

Regarding the joint distribution of the profile of sharable types $T$, we make the following assumption:

Assumption 2.2.2. For each $i \in N$, $T_{N \setminus \bar N_I(i)}$ and $T_{\bar N_I(i)}$ are conditionally independent given $(G_P, N_I(i))$ and $\mathcal{C}$, where $\mathcal{C} = \vee_{i \in N} \mathcal{C}_i$.

This assumption allows the individual types to be correlated unconditionally. Each player $i$ has information $\mathcal{C}_i$ which can cause correlation between his type and other agents’ types.
For example, any two types $T_i$ and $T_j$ may contain a common signal which comes from a common observation by the two agents $i$ and $j$.⁵ Assumption 2.2.2 says that the sharable types of two non-neighbors in $G_I$ are independent conditional on all such pieces of information $\mathcal{C}_i$.

⁵ The signal $\mathcal{C}_i$ may contain information accumulated from the past when the information sharing takes place over time, such as information used at the stage of forming the information and payoff graphs $G_I$ and $G_P$. The supplemental note contains details about an extended model where people share information over time and shows how this fits the current set-up in the chapter.

The assumption permits the situation where the payoff network $G_P$ is exogenously formed, for example, as in a dyadic regression model with degree heterogeneity $a_i$ and errors $u_{ij}$ that are independent of the $\varepsilon_i$’s, $\eta_j$’s, $X_i$’s and $a_i$’s. (See, e.g., Graham [62].) In this case, if we set $\mathcal{C}_i = \sigma(X_i, a_i)$, Assumption 2.2.2 reduces to the requirement that for each $i \in N$, $\varepsilon_{N \setminus \bar N_I(i)}$ and $\varepsilon_{\bar N_I(i)}$ are conditionally independent given $(G_P, N_I(i), X, a)$, where $X = (X_i)_{i \in N}$ and $a = (a_i)_{i \in N}$.

2.2.2 Predictions from Rationality

Each player chooses a strategy that maximizes his expected payoff according to his beliefs. This provides predictions for their actions given their beliefs. For analytical tractability, we assume throughout the chapter that each agent having information set $\mathcal{I}_i = \mathcal{I}_{i,k}$ for some $k \ge 0$ chooses from a class of linear strategies:

$$s_i(\mathcal{I}_i) = \sum_{j \in \bar N_I(i)} w_{ij}' T_j + \eta_i,$$

where $T_i = [X_i', \varepsilon_i]'$ and $w_{ij}$ denotes a nonstochastic vector of nonnegative numbers. We call $w_{ij}$ the weight given to player $j$ by player $i$. The weight vector $w_{ij}$ summarizes the influence of player $j$ on player $i$’s decision making.

To characterize predictions from rationality, we introduce some notation. For $i, j, k \in N$, let $w^i_{kj}$ denote the weight that player $i$ believes that player $k$ gives to player $j$.
Then the strategy of player $k$ as believed by player $i$ is given as follows:

$$s^i_k(\mathcal{I}_k) = \sum_{j \in N^i_I(k)} T_j' w^i_{kj} + \eta_k,$$

where $N^i_I(k)$ denotes the set of players (including player $k$) whom player $i$ believes that player $k$ observes. Given player $i$’s strategy and his expected strategies of the other players $s^i_{-i} = (s^i_k)_{k \in N \setminus \{i\}}$, the (interim) expected payoff of player $i$ is defined as

$$U_i(s_i, s^i_{-i}; \mathcal{I}_i) = E\left[ u_i\left( s_i(\mathcal{I}_i), s^i_{-i}(\mathcal{I}_{-i}), T, \eta_i \right) \mid \mathcal{I}_i \right],$$

where $s^i_{-i}(\mathcal{I}_{-i}) = (s^i_k(\mathcal{I}_k))_{k \in N \setminus \{i\}}$, $\mathcal{I}_{-i} = \vee_{k \neq i} \mathcal{I}_k$ and $T = (T_i)_{i \in N}$. A best linear response $s^{BR}_i$ of player $i$ corresponding to the strategies $s^i_{-i}$ of the other players as expected by player $i$ is a linear strategy such that for any linear strategy $s_i$,

$$U_i(s^{BR}_i, s^i_{-i}; \mathcal{I}_i) \ge U_i(s_i, s^i_{-i}; \mathcal{I}_i), \quad \text{a.e.}$$

Under the assumptions of the model, the best linear responses can be shown to produce a map from beliefs to actions. To see this, first let

$$w^B = (w^1, \ldots, w^n)$$

be the belief profile of all the agents, where $w^i = (w^i_{kj})_{k,j \in N}$. Then the rationality of agents (i.e., their choosing a best linear response given their beliefs) gives the following relation:

$$w = \mathcal{M} w^B,$$

where $w = (w_{ij})_{i,j \in N}$ corresponds to best responses and $\mathcal{M}$ is the best response operator which assigns a strategy profile (in terms of weights $w_{ij}$) to a given belief profile $w^B$.

Given our set-up of quadratic payoffs and linear strategies, we can make the best response operator $\mathcal{M}$ explicit. To see this, given $w^i_j = (w^i_{kj})_{k \in N}$, let us define

$$M_i w^i_j = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} r_{ik} w^i_{kj} 1\{ j \in N^i_I(k) \}.$$

Recall that player $i$’s payoff is affected by his $G_P$-neighbors’ actions. Hence player $i$ perceives player $j$ as important to him, even if player $j$’s action does not directly influence the payoff of player $i$, whenever player $j$’s type is observed by and influences many of player $i$’s $G_P$-neighbors. The expression $M_i w^i_j$ captures this perceived importance of player $j$ to player $i$ that comes through player $j$’s influence (as perceived by player $i$) on his $G_P$-neighbors.

Suppose that each agent $i$ has information set $\mathcal{I}_{i,k}$ for some $k \ge 0$.
Then the best response operator $\mathcal{M}$ is given by the following relations:

$$w_{ii,1} = \gamma_0 + \beta_0 M_i w^i_{i,1}, \tag{2.2}$$
$$w_{ii,\varepsilon} = 1 + \beta_0 M_i w^i_{i,\varepsilon},$$
$$w_{ii,2} = \beta_0 M_i w^i_{i,2},$$

and for all $j \in N_I(i)$,

$$w_{ij,1} = \beta_0 M_i w^i_{j,1}, \tag{2.3}$$
$$w_{ij,\varepsilon} = \beta_0 M_i w^i_{j,\varepsilon}, \quad \text{and}$$
$$w_{ij,2} = \begin{cases} \delta_0 r_{ij}/n_P(i) + \beta_0 M_i w^i_{j,2}, & \text{if } j \in N_P(i), \\ \beta_0 M_i w^i_{j,2}, & \text{if } j \in N_I(i) \setminus N_P(i), \end{cases}$$

where we write $w_{ij} = (w_{ij,1}', w_{ij,2}', w_{ij,\varepsilon})'$, with $w_{ij,1}$, $w_{ij,2}$ and $w_{ij,\varepsilon}$ being the weights given by player $i$ to player $j$’s type components $X_{j,1}$, $X_{j,2}$ and $\varepsilon_j$.

In order to generate predictions, one needs to deal with the beliefs $w^B$. There are three approaches to modeling these beliefs. The first approach is an equilibrium approach, where the beliefs $w^B$ coincide with the actual weights implemented by the agents in equilibrium. The second approach uses rationalizability, where all the linear strategies that are rationalizable given some belief $w^B$ are under consideration. The third approach is a behavioral approach, where one considers a set of simple behavioral assumptions on the beliefs $w^B$ and focuses on the best linear responses corresponding to these beliefs.

Each of the three approaches has pros and cons. One of the main differences between the equilibrium approach and the behavioral approach is that the former requires the beliefs $w^i_{-i}$ to be “correct” for all players $i$ in equilibrium. However, since each player $i$ generally does not know whom each of his $G_P$-neighbors observes, a Bayesian player in a standard model with rational expectations would need to know the distribution of the entire information graph $G_I$ (or at least have a common prior on the information graph commonly agreed upon by all the players) to form a “correct” belief given his information.
Given a potentially complex form of $G_P$ (partially observed in data), and given that the econometrician rarely observes $G_I$ with precision, producing a testable implication from this equilibrium model appears far from a trivial task.

The rationalizability approach can be used to relax this rational expectations assumption by eliminating the requirement that the beliefs be correct. The approach considers all the predictions that are rationalizable given some beliefs. However, in our context, the best response operator $\mathcal{M}$ depends on unknown parameters in general, and hence the set of predictions from rationalizability can potentially be large and may fail to produce sharp predictions that would be useful in practice.

As we explain later in detail, this chapter takes the third approach. We adopt a set of simple behavioral assumptions on players’ beliefs which can be incorrect from the viewpoint of a person with full knowledge of the distribution of the information graph, yet useful as rule-of-thumb guidance for an agent in a complex decision-making environment such as the one in our model. As we shall see later, this approach can give a sharp prediction that is intuitive and analytically tractable.

2.2.3 Belief Projection and Best Linear Responses

In this chapter, we consider the following set of behavioral assumptions on the beliefs.

Condition BP (Belief Projection): (i) For each $i \in N$ and $k \in N_P(i)$,

(a) $w^i_{kk} = w_{ii}$,

(b) $w^i_{kj} = \tau^i_{kj} w_{ij}$ for all $j \in N_I(i) \cap N^i_I(k)$ for some positive number $\tau^i_{kj}$, where

$$\tau^i_{ki} = 1/(r_{ik} n_P(k)), \quad \text{and} \tag{2.4}$$
$$\tau^i_{kj} = 1/r_{ik}, \quad \text{for all } j \in N_I(i) \cap N^i_I(k), \text{ and}$$

(ii) $w^i_{kj} = 0$ for all $j \notin \bar N_P(k)$.

As mentioned before, each player $i$ does not know whom his $G_P$-neighbors observe, and Condition BP describes a simple rule of belief formation in this environment. The main premise of Condition BP is that each agent projects his own beliefs about himself and other players onto his $G_P$-neighbors.
Condition BP (i)(a) says that each player $i$ believes that the self-weight his $G_P$-neighbor $k$ gives to himself is the same as the self-weight of player $i$ himself. Condition BP (i)(b) says that player $i$’s belief about his $G_P$-neighbor $k$’s weight on player $j$ is formed in reference to his own weight on player $j$. This assumption says that each agent believes that his $G_P$-neighbors follow the same ranking of other agents as he does. The belief projection is taken as a rule of thumb for each agent $i$ who needs to form an expectation about his $G_P$-neighbors’ actions when he does not know whom his $G_P$-neighbors observe.

The specification of $\tau^i_{ki}$ in (2.4) reflects that player $i$ believes that player $k$ does not care much about player $i$’s type in choosing an action if player $k$ has many $G_P$-neighbors. The specification of $\tau^i_{kj}$ in (2.4) says that player $i$ believes that the weight each of his $G_P$-neighbors $k$ gives to a $G_P$-neighbor $j$ is $(1/r_{ik}) w_{ij}$. For example, if $r_{ik} = \bar n_P(k)/\bar n_P(i)$, we have

$$w^i_{kj} = \frac{\bar n_P(i)}{\bar n_P(k)} w_{ij}.$$

Therefore, player $i$ believes that when player $k$ has more $G_P$-neighbors than he does, player $k$ gives less weight to player $j$ than he does. Not knowing whom player $k$ observes, player $i$ employs this rule-of-thumb belief regarding player $k$’s weights given to other players.

Condition BP(ii) is concerned with player $i$’s belief about the players that his $G_P$-neighbors observe. A standard approach in an incomplete information game with Bayesian players assumes that the players agree on a common prior on the entire information graph $G_I$. From this, each agent $i$ derives his posterior on the $G_I$-neighbors of each of his $G_P$-neighbors. Instead, Condition BP(ii) states that player $i$ simply considers only those players in $N_P(k)$ when he deliberates on the players whose actions affect the payoff of player $k$.
This is because, while player $i$ knows player $k$’s $G_P$-neighborhood, he does not know player $k$’s $G_I$-neighborhood.

Let us distinguish between different environments with different information structures of the game.

Definition 2.2.1. (i) Each agent $i \in N$ having beliefs about the other players’ strategies as in Condition BP and having information set $\mathcal{I}_i = \mathcal{I}_{i,0}$ with $N_{P,2}(i) \subset N_I(i)$ is said to be of simple type.

(ii) Each agent $i \in N$ who believes that the other players are of simple type and has information set $\mathcal{I}_i = \mathcal{I}_{i,1}$ with $N_{P,3}(i) \subset N_I(i)$ is said to be of first order sophisticated type.⁶

⁶ One can also define a higher order sophisticated type, although this chapter does not fully elaborate on such a case. More specifically, for $k \ge 2$, each agent $i \in N$ who believes that the other players are of the $(k-1)$-th order sophisticated type and has information set $\mathcal{I}_i = \mathcal{I}_{i,k}$ with $N_{P,k+2}(i) \subset N_I(i)$ is said to be of the $k$-th order sophisticated type.

The difference between the simple type and a sophisticated type lies not only in the rationality type but also in the information set. A first order sophisticated type agent knows who the neighbors of the neighbors of his neighbors in $G_P$ (i.e., $N_{P,3}(i)$) are, whereas a simple type agent knows only who the neighbors of his neighbors in $G_P$ (i.e., $N_{P,2}(i)$) are.

Regarding the sophistication of agents, we make explicit the following basic assumption, which we maintain throughout the chapter.

Assumption 2.2.3. The game is populated by agents with the same order of sophistication.

Different levels of reasoning for agents of the game are assumed in level-$k$ models, which have received a great deal of attention as a behavioral model in the experimental literature. (See Chapter 5 of Camerer [30] for a review.) In these experiments, a simple type agent is often much simpler than those in our set-up, in that the agent chooses an action without considering any strategic interdependence.
In contrast, our simple type agent already considers strategic interdependence and forms a best linear response. On the other hand, the experimental literature on level-$k$ models allows the agents to be of different rationality types in the same game. In our set-up, which focuses on observational data, identification of the unknown proportion of each rationality type appears far from trivial. Hence in this chapter, we consider a game where all the agents have the same order of sophistication.

Our focus on linear strategies, in combination with the other assumptions, gives an explicit form of best linear responses. For the expression, let us introduce some notation: for each $i \in N$ and $j \in N_I(i)$,

$$c_{ij} \equiv \frac{1}{n_P(i)} \sum_{k \in N_P(i)} 1\{ j \in N_P(k) \}, \quad \text{if } i \neq j, \text{ and} \tag{2.5}$$
$$c_{ii} \equiv \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{1\{ i \in N_P(k) \}}{\bar n_P(k)} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{1}{\bar n_P(k)},$$

where the last equality follows because $G_P$ is undirected. Note that $c_{ij}$ is the proportion of player $i$’s $G_P$-neighbors whose payoffs are influenced by the type and action of player $j$. Hence $c_{ij}$ represents the local centrality of player $j$ to player $i$ in terms of player $j$’s influence on player $i$’s $G_P$-neighbors. On the other hand, $c_{ii}$ is the average of $1/n_P(k)$ among player $i$’s $G_P$-neighbors $k$ whose payoffs are affected by player $i$’s sharable type and action.

Using the explicit form of the best response operator $\mathcal{M}$ and Condition BP, we can derive the explicit form of best linear responses. The following theorem gives the form in the case where all the players are of simple type.

Theorem 2.2.1. Suppose that Assumptions 2.2.1 - 2.2.3 hold and all the players are of simple type. Suppose further that for each $i \in N$ and $k \neq i$, $E[\eta_k \mid \mathcal{I}_i] = 0$. Then each player $i$’s best linear response $s^{BR}_i$ takes the following form:

$$s^{BR}_i(\mathcal{I}_i) = \lambda_{ii} \left( \gamma_0' X_{i,1} + \varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \left( \gamma_0' X_{j,1} + \varepsilon_j \right) \right) + \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \delta_0' X_{j,2} + \eta_i,$$

where $\lambda_{ij} \equiv r_{ij}/(1 - \beta_0 c_{ij})$.

The result in Theorem 2.2.1 shows multiple intuitive features.
First, it shows that each player $i$’s best linear response does not depend on the types of payoff-irrelevant agents, that is, agents whose types player $i$ observes but whose actions do not affect player $i$’s payoff. Note that agents indirectly connected to agent $i$ in $G_P$ can still shape the player’s strategies through the local centralities $c_{ij}$. (Later, we also consider the case of sophisticated type, where the types of indirectly connected agents are permitted to influence agent $i$’s actions.)⁷ Furthermore, observe that for $j \in N_P(i)$,

$$\frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial x_{j,1}} = \frac{\beta_0 r_{ij}}{n_P(i)(1 - \beta_0 c_{ii})(1 - \beta_0 c_{ij})} \gamma_0 \quad \text{and} \tag{2.6}$$
$$\frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial x_{j,2}} = \frac{\delta_0 r_{ij}}{n_P(i)(1 - \beta_0 c_{ij})},$$

both of which measure the response of the action of agent $i$ to a change in the observed type of his $G_P$-neighbors. Hence, these quantities capture the network externality in the strategic interactions.

⁷ The local dependence of actions from best linear responses, regardless of what value $\beta_0$ takes in $(-1, 1)$, is in contrast with the complete information version of the game, where a high value of $\beta_0$ makes the dependence close to global.

First, note that the network externality for agent $i$ from a particular agent $j$ decreases in the neighborhood size $n_P(i)$ of agent $i$. More importantly, the network externality for each agent $i$ differs across $i$’s and across their $G_P$-neighbors $j$, depending on their “importance” to agent $i$ in the payoff graph. This is seen from the network externality (2.6) being an increasing function of agent $j$’s local centrality to agent $i$, i.e., $c_{ij}$, when the game is that of strategic complements (i.e., $\beta_0 > 0$). In other words, the larger the fraction of agent $i$’s $G_P$-neighbors whose payoff is affected by agent $j$’s action, the higher the network externality of agent $i$ from agent $j$’s type change becomes.
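To see how the local centralities in (2.5) translate into the externalities in (2.6), here is a small illustrative Python sketch. It is not the chapter's code: it assumes Specification A ($r_{ij} = 1$), scalar $x_{j,1}$, a payoff graph stored as a dictionary of neighbor sets, and hypothetical function names. For $c_{ii}$ it uses $\bar n_P(k) = n_P(k) + 1$, as printed in (2.5).

```python
def local_centrality(graph, i, j):
    """c_ij from (2.5); the i == j branch averages 1 / (n_P(k) + 1) over k in N_P(i)."""
    nbrs = graph[i]
    if not nbrs:
        return 0.0  # assumption: no neighbors means no local centrality
    if i == j:
        return sum(1.0 / (len(graph[k]) + 1) for k in nbrs) / len(nbrs)
    return sum(1 for k in nbrs if j in graph[k]) / len(nbrs)


def lam(graph, i, j, beta0):
    """lambda_ij = r_ij / (1 - beta0 * c_ij), with r_ij = 1 (Specification A)."""
    return 1.0 / (1.0 - beta0 * local_centrality(graph, i, j))


def externality_x1(graph, i, j, beta0, gamma0):
    """Derivative of s_i^BR with respect to x_{j,1} in (2.6), for j in N_P(i)."""
    return beta0 * gamma0 * lam(graph, i, i, beta0) * lam(graph, i, j, beta0) / len(graph[i])


# Triangle 0-1-2 with a pendant player 3 attached to player 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
beta0, gamma0 = 0.5, 1.0

c_01 = local_centrality(g, 0, 1)   # player 1 also affects player 2, a neighbor of 0
c_03 = local_centrality(g, 0, 3)   # player 3 affects no other neighbor of 0
from_1 = externality_x1(g, 0, 1, beta0, gamma0)
from_3 = externality_x1(g, 0, 3, beta0, gamma0)
print(c_01, c_03, from_1, from_3)
```

In this toy graph, player 1 is more locally central to player 0 than the pendant player 3 is, so with $\beta_0 > 0$ the sketch assigns player 1 the larger externality.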
Therefore, in our model, network externality is heterogeneous across agents, depending on the local features of the payoff graph around each agent.

It is interesting to note that the network externality for agent $i$ with respect to his own type $X_{i,1}$ has a factor $\lambda_{ii} \equiv r_{ii}/(1 - \beta_0 c_{ii}) = 1/(1 - \beta_0 c_{ii})$, which is increasing in $c_{ii}$ when $\beta_0 > 0$. We call

$$\frac{1}{1 - \beta_0 c_{ii}} - 1$$

the reflection effect, which captures the way player $i$’s type affects his own action through his $G_P$-neighbors whose payoffs are affected by player $i$’s types and actions. The reflection effect arises because each agent, in decision making, considers the fact that his type affects other $G_P$-neighbors’ decision making. When there is no payoff externality (i.e., $\beta_0 = 0$), the reflection effect is zero. However, when there are strong strategic interactions or when a majority of player $i$’s $G_P$-neighbors have a small $G_P$-neighborhood (i.e., small $\bar n_P(k)$ in the definition of $c_{ii}$ in (2.5)), the reflection effect is large. Note that for those agents whose $c_{ii}$ the econometrician observes, the reflection effect is easily recovered once one estimates the payoff externality $\beta_0$.

Now let us turn to the case where the game is played among first-order sophisticated players.⁸

⁸ For this result, we focus on Specification A (i.e., the $r_{ik}$’s are set to 1).

Theorem 2.2.2. Suppose that Assumptions 2.2.1 - 2.2.3 hold and that all the players are of first-order sophisticated type. Suppose further that for each $i \in N$ and $k \neq i$, $E[\eta_k \mid \mathcal{I}_i] = 0$. Then each player $i$’s best linear response $s^{BR.FS}_i$ takes the following form:

$$s^{BR.FS}_i(\mathcal{I}_{i,1}) = \gamma_0' X_{i,1} + \varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{jj} \left( \gamma_0' X_{j,1} + \varepsilon_j \right) + \beta_0^2 \sum_{j \in \bar N_{P,2}(i)} \tilde\lambda_{ij} \left( \gamma_0' X_{j,1} + \varepsilon_j \right) + \delta_0' \tilde X_{i,2} + \delta_0' \beta_0 \sum_{j \in \bar N_{P,2}(i)} \bar\lambda_{ij} X_{j,2} + \eta_i,$$

where

$$\bar\lambda_{ij} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{\lambda_{kj} 1\{ j \in N_P(k) \}}{n_P(k)}, \quad \text{and} \quad \tilde\lambda_{ij} = \frac{1}{n_P(i)} \sum_{k \in N_P(i)} \frac{\lambda_{kj} 1\{ j \in N_P(k) \}}{n_P(k)(1 - \beta_0 c_{kk})}.$$

Note that, compared to the game with agents of simple type, the game with agents of the first order sophisticated type predicts outcomes with broader network externality.
For example, in contrast to the case of simple type agents, the types of neighbors whose actions do not affect player $i$’s payoff can affect his best response. More specifically, note that for $j \in N_{P,2}(i) \setminus N_P(i)$,

$$\frac{\partial s^{BR.FS}_i(\mathcal{I}_i)}{\partial x_{j,1}} = \beta_0^2 \gamma_0 \tilde\lambda_{ij} \quad \text{and} \quad \frac{\partial s^{BR.FS}_i(\mathcal{I}_i)}{\partial x_{j,2}} = \beta_0 \delta_0 \bar\lambda_{ij}.$$

This externality from player $j$ on player $i$ is strong when the $c_{kj}$’s are large for many $k \in N_P(i)$, i.e., when player $j$ has a high local centrality to a large fraction of player $i$’s $G_P$-neighbors.⁹

⁹ Using the explicit form of the best response operator $\mathcal{M}$ and Condition BP, we can derive best linear responses in a game populated by agents of a higher order sophisticated type. As the sophistication of agents becomes of higher order, the network externality of each agent broadens to a wider set of agents. The derivation is easy but algebraically tedious, giving a more complex form of best linear responses. Hence, details are omitted in the chapter.

2.2.4 The External Validity of Network Externality

Through a simple model of linear interactions, we explore two issues of external validity. The first issue is about generalizing the results that come from a model with a smaller graph to the population with a larger graph. We see how sensitively the network externality changes as the network grows. If the sensitivity is not high, this supports the external validity of a model toward a larger graph. The second issue is about misspecification of behavioral assumptions. Here we set the benchmark (true) model to be a complete information model with equilibrium strategies, but assume that the econometrician adopts our behavioral model to make the analysis tractable. Then we explore how close the network externality from the behavioral model is to that from the true model of the complete information game. Both models assume the same payoff function and the same payoff graph. For simplicity, we remove the $X_i$’s and $\eta_i$’s.
The main focus here is on the stability of the prediction of the network externalities as we progressively move from a small payoff graph to a large payoff graph. Let $Y_i$ be the observed outcome of player $i$ as predicted from either of the two game models.

The complete information game model assumes that every agent observes all the types $\varepsilon_i$ of other agents. This model yields the following equilibrium equation:

$$Y_i = \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} Y_j + \varepsilon_i.$$

Then the reduced form for the $Y_i$’s can be written as

$$y = (I - \beta_0 A)^{-1} \varepsilon,$$

where $y = (Y_1, \ldots, Y_n)'$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$, and $A$ is the row-normalized adjacency matrix of the payoff graph $G_P$, i.e., the $(i,j)$-th entry of $A$ is $1/n_P(i)$ if $j \in N_P(i)$ and zero otherwise. Thus in the complete information equilibrium model, each $Y_i$ is a linear combination of all the $\varepsilon_i$’s. The model implies that when $\beta_0$ is close to one (i.e., the local interaction becomes strong), the equilibrium outcome can exhibit extensive cross-sectional dependence.

On the other hand, our behavioral model (with Specification A: $r_{ik} = 1$ in (2.1), and with the assumption that all the players are of simple type) predicts the outcomes in the following simple reduced form:

$$Y_i = \lambda_{ii} \left( \varepsilon_i + \frac{\beta_0}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \varepsilon_j \right),$$

which comes from Theorem 2.2.1 without the $X_i$’s and $\eta_i$’s. It is important to note that the two models have the same payoff with the same payoff externality parameter $\beta_0$. The only differences are the information set assumptions and the solution concepts of the game.

The parameter of interest is the average network externality:

$$\frac{1}{n} \sum_{i \in N} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\partial s^{BR}_i(\mathcal{I}_i)}{\partial \varepsilon_j} = \begin{cases} \dfrac{1}{n} \sum_{i \in N} \dfrac{1}{n_P(i)} \sum_{j \in N_P(i)} \left[ (I - \beta_0 A)^{-1} \right]_{ij}, & \text{from the equilibrium model}, \\ \dfrac{1}{n} \sum_{i \in N} \dfrac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij}, & \text{from the behavioral model}, \end{cases}$$

where $[(I - \beta_0 A)^{-1}]_{ij}$ denotes the $(i,j)$-th entry of the matrix $(I - \beta_0 A)^{-1}$.

Note that the network externalities depend only on $\beta_0$ and the payoff graph $G_P$. For the payoff graph $G_P$, we considered two different models for random graph generation.
The first kind is the Erdős-Rényi (ER) random graph with link probability equal to $5/n$. The second kind is the Barabási-Albert (BA) random graph, which begins with an Erdős-Rényi random graph of size 20 in which each link forms with equal probability $1/19$, and grows by adding new nodes, each of which forms two links with existing nodes with probability proportional to the degree of those nodes.

For each random graph model, we first generate a random graph of size 10,000, and then construct three subgraphs A, B, C such that network A is a subgraph of network B and network B is a subgraph of network C. We generate these subgraphs as follows. First, we take network A to be the subgraph that consists of agents within distance $k$ from agent $i = 1$. Then network B is constructed to consist of the neighbors of the agents in network A, and network C is constructed to consist of the neighbors of the agents in network B. For an ER random graph, we took $k = 3$ and for a BA random graph, we took $k = 2$. We repeated the process 50 times to construct an average behavior of network externality as we increase the network. Table 2.1 shows the average network sizes and degree characteristics as we move from Network A to B to C.

Table 2.1: The Characteristics of the Payoff Graphs

                 Erdős-Rényi                           Barabási-Albert
         Network A  Network B  Network C     Network A  Network B  Network C
n          162.0      766.4     3067.4         432.2     2080.1     4663.5
d_mx       10.72      12.50      14.14         76.62      98.82     113.66
d_av       2.043      2.296      3.186         1.437      1.902      2.233

Notes: This table gives average characteristics of the payoff graphs, $G_P$, used in the simulation study, where the average was over 50 simulations. d_av and d_mx denote the average and maximum degrees of the payoff graphs.

First, we would like to see how sensitive the predicted average network externality becomes as we move across three networks of increasing sizes. The results are in Figures 2.1 and 2.2.
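The equilibrium-model side of this comparison can be sketched in a few lines. The following is a minimal illustration, not the code used for the figures: it approximates $(I - \beta_0 A)^{-1}$ by the Neumann series $\sum_k (\beta_0 A)^k$, which converges for $|\beta_0| < 1$ because $A$ is row-normalized, and then averages the entries over $G_P$-neighbors. The function names (`er_graph`, `equilibrium_multiplier`, `avg_network_externality`) are hypothetical, and treating isolated agents as contributing zero is an assumption made here for simplicity.

```python
import random


def er_graph(n, p, rng):
    """Undirected Erdos-Renyi graph, returned as a list of neighbor sets."""
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def equilibrium_multiplier(nbrs, beta0, tol=1e-12, max_iter=10_000):
    """Approximate (I - beta0*A)^{-1} by the Neumann series sum_k (beta0*A)^k.

    A is the row-normalized adjacency matrix implied by `nbrs`.
    """
    n = len(nbrs)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            d = len(nbrs[i])
            if d == 0:
                # Assumption: an isolated agent has a zero row in A.
                nxt.append([0.0] * n)
            else:
                nxt.append(
                    [beta0 * sum(term[k][j] for k in nbrs[i]) / d for j in range(n)]
                )
        term = nxt  # term now holds (beta0*A)^(k+1)
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
        if max(abs(x) for row in term for x in row) < tol:
            break
    return total


def avg_network_externality(nbrs, beta0):
    """(1/n) sum_i (1/n_P(i)) sum_{j in N_P(i)} [(I - beta0*A)^{-1}]_{ij}."""
    m = equilibrium_multiplier(nbrs, beta0)
    per_agent = [
        sum(m[i][j] for j in nbrs[i]) / len(nbrs[i])
        for i in range(len(nbrs))
        if nbrs[i]  # isolated agents contribute zero (assumption)
    ]
    return sum(per_agent) / len(nbrs)


if __name__ == "__main__":
    rng = random.Random(0)
    graph = er_graph(60, 5 / 60, rng)  # small graph so that pure Python stays fast
    for beta0 in (0.1, 0.5, 0.9):
        print(beta0, round(avg_network_externality(graph, beta0), 4))
```

Each additional power of $\beta_0 A$ reaches one step further along the payoff graph, which is why the equilibrium externality becomes sensitive to the size and shape of the network when $\beta_0$ is close to one.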
Figure 2.1 captures the relation between $\beta_0$ and the average network externality for the case of ER graphs and Figure 2.2 captures that for the case of BA graphs. The left panel depicts the relation from the complete information equilibrium model and the right panel depicts the relation from the behavioral model.

As shown in Figures 2.1-2.2, the predicted network externality from the behavioral model is less sensitive to the change of the networks than that from the equilibrium model. In particular, this contrast is stark when $\beta_0$ is close to 1. The main reason behind this contrast is that in the case of the equilibrium model, stronger local strategic interactions induce extensive cross-sectional dependence. This extensiveness depends sensitively on the size and the shape of the network. On the other hand, the behavioral model limits the extent of the cross-sectional dependence even when $\beta_0$ is high. Hence the predicted network externality does not vary as much as in the equilibrium model as we change the network. The result illustrates that our behavioral model, which translates local strategic interactions into local stochastic dependence of observed actions, has better external validity than the complete information equilibrium model.

Figure 2.1: Network Externality Comparison Between Equilibrium and Behavioral Models: Erdős-Rényi Graphs
Notes: Each line gives the average network externality as a function of $\beta_0$, where the network is generated through an ER graph. The complete information game shows how the relationship between the network externality and $\beta_0$ changes as we expand the graph from a subgraph of agents within distance $k$ from agent 1. (Networks A, B, and C correspond to networks with $k = 3, 4, 5$, from a small graph to a large one.) The figures show that the average network externality from the behavioral model behaves more stably across different networks than that from the equilibrium model, in particular when $\beta_0$ (the local interaction parameter) is high.

Figure 2.2: Network Externality Comparison Between Equilibrium and Behavioral Models: Barabási-Albert Graphs
Notes: The figure is similar to the previous one except that the graph is now BA. The complete information game shows how the relation changes as we expand the graph from a subgraph of agents within distance $k$ from agent 1. Again, the behavioral model gives a prediction of the relation that tends to be more stable than that of the complete information game in this network generation.

Suppose that the econometrician believes the true model is an equilibrium model, but uses our behavioral model as a proxy for the equilibrium model. If these two models generate "similar" predictions, using our behavioral model as a proxy will not be a bad idea. The results in Figures 2.1 and 2.2 show that whether they do depends on the payoff externality $\beta_0$. Unless the parameter $\beta_0$ is very high (say, larger than or equal to 0.5), the equilibrium approach and the behavioral approach give similar network externality. However, the discrepancy widens when $\beta_0$ is high. Hence in this set-up, using our behavioral approach as a proxy for an equilibrium approach makes sense only when strategic interdependence is not too high.

The comparison here uses a set-up where the econometrician observes all the players in the game. However, it should be kept in mind that, as we shall see later when we propose inference, the behavioral model naturally accommodates the case where one observes only part of the players, whereas the complete information game model does not in general.
Hence when the local strategic interactions are not very strong, the behavioral model can be a good proxy for a complete information game model with predictions from an equilibrium when only part of the players are observed in the sample.

2.3 Econometric Inference

2.3.1 General Overview

Partial Observation of Interactions

A large network data set is often obtained through a non-random sampling process. (See e.g. Kolaczyk [80].) The main difficulty in practice is that the actual sampling process by which the network data are gathered is hard to formulate accurately. Our approach of empirical modeling can be useful in such a situation, where interactions are observed only partially through a certain non-random sampling scheme that is not precisely known. In this section, we make explicit the data requirements for the econometrician and propose inference procedures. We mainly focus on the game where all the players in the game are of simple type. Later, we discuss the situation with agents of first order sophisticated type.

Suppose that the original game of interactions consists of a large number of agents whose set we denote by $N$. Let the set of players $N$ be on a payoff graph $G_P$ and an information graph $G_I$, facing the strategic environment as described in the preceding section. Denote the best response as an observed dependent variable $Y_i$: for $i \in N$,

\[
Y_i = s_i^{BR}(I_i).
\]

Let us make the following additional assumption on this original large game. Let us first define

\[
\mathcal{F} = \sigma(X, G_P, G_I) \vee \mathcal{C},
\]

i.e., the $\sigma$-field generated by $X = (X_i)_{i \in N}$, $G_P$, $G_I$ and $\mathcal{C}$.

Assumption 2.3.1. (i) The $\varepsilon_i$'s and $\eta_i$'s are conditionally i.i.d. across $i$'s given $\mathcal{F}$.
(ii) $\{\varepsilon_i\}_{i=1}^n$ and $\{\eta_i\}_{i=1}^n$ are conditionally independent given $\mathcal{F}$.
(iii) For each $i \in N$, $E[\varepsilon_i | \mathcal{F}] = 0$ and $E[\eta_i | \mathcal{F}] = 0$.

The last condition (iii) excludes endogenous formation of $G_P$ or $G_I$, because the condition requires that the unobserved type components $\varepsilon_i$ and $\eta_i$ be conditionally mean independent of these graphs, given $X = (X_i)_{i \in N}$ and $\mathcal{C}$.
However, the condition does not exclude the possibility that $G_P$ and $G_I$ are formed based on $(X, \mathcal{C})$. Hence the formation of networks by agents using information in $X$ or $\mathcal{C}$ is permitted in the chapter.

The econometrician observes only a subset $N^* \subset N$ of agents and part of $G_P$ through a potentially stochastic sampling process of unknown form. We assume for simplicity that $n^* \equiv |N^*|$ is nonstochastic. This assumption is satisfied, for example, if one collects the data for agents with predetermined sample size $n^*$. We assume that though being a small fraction of $N$, the set $N^*$ is still a large set, justifying our asymptotic framework that sends $n^*$ to infinity. Most importantly, constituting only a small fraction of $N$, the observed sample $N^*$ of agents induces a payoff subgraph which one has no reason to view as "approximating" or "similar to" the original payoff graph $G_P$. Let us make precise the data requirements.

Condition A: The stochastic elements of the sampling process are conditionally independent of $\{(T_i', \eta_i)'\}_{i \in N}$ given $\mathcal{F}$.

Condition B: For each $i \in N^*$, the econometrician observes $N_P(i)$ and $(Y_i, X_i)$, and for each $j \in N_P(i)$, the econometrician observes $|N_P(i) \cap N_P(j)|$, $n_P(j)$ and $X_j$.

Condition C: Either of the following two conditions is satisfied:
(a) For $i, j \in N^*$ such that $i \neq j$, $N_P(i) \cap N_P(j) = \emptyset$.
(b) For each agent $i \in N^*$, and for any agent $j \in N^*$ such that $N_P(i) \cap N_P(j) \neq \emptyset$, the econometrician observes $Y_j$, $|N_P(j) \cap N_P(k)|$, $n_P(k)$ and $X_k$ for all $k \in N_P(j)$.

Before we discuss the conditions, it is worth noting that these conditions are trivially satisfied when we observe the full payoff graph $G_P$ and $N^* = N$. Condition A is satisfied, for example, if the sampling process is based on observed characteristics $X$ and some characteristics of the strategic environment that are commonly observed by all the players. This condition is violated if the sampling is based on the outcomes $Y_i$'s or unobserved payoff-relevant signals such as $\varepsilon_i$ or $\eta_i$.
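Condition C(a) as stated above can be enforced mechanically on a candidate sample. The sketch below is one hypothetical way to do so, not a procedure taken from the chapter: it scans candidates in a fixed order and keeps an agent only if its $G_P$-neighborhood is disjoint from the neighborhoods of all agents kept so far. The function name `select_disjoint_sample` and the data layout (a dict from agent to neighbor set) are assumptions, as is the choice to drop agents with no neighbors. The selection uses only the graph, not outcomes.

```python
def select_disjoint_sample(nbrs, candidates):
    """Greedily keep candidates whose G_P-neighborhoods are pairwise disjoint.

    nbrs: dict mapping each agent to the set of its G_P-neighbors
          (open neighborhoods, as in Condition C(a)).
    candidates: iterable of agents, scanned in the given order.
    """
    used = set()   # union of neighborhoods of the agents kept so far
    kept = []
    for i in candidates:
        neighborhood = nbrs.get(i, set())
        # Assumption: agents without neighbors are dropped, since 1/n_P(i)
        # is not defined for them.
        if neighborhood and used.isdisjoint(neighborhood):
            kept.append(i)
            used |= neighborhood
    return kept


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 - 4.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(select_disjoint_sample(path, range(5)))
```

The greedy scan depends on the order of `candidates`, so it does not in general return the largest admissible subsample; it only guarantees that the kept agents satisfy the disjointness requirement.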
Condition B essentially requires that in the data set, we observe $(Y_i, X_i)$ of many agents $i$, and for each $G_P$-neighbor $j$ of agent $i$, observe the number of the agents who are common $G_P$-neighbors of $i$ and $j$ and the size of the $G_P$-neighborhood of $j$ along with the observed characteristics $X_j$.¹⁰ As for a $G_P$-neighbor $j$ of agent $i \in N^*$, this condition does not require that agent $j$'s action $Y_j$ or the full set of his $G_P$-neighbors be observed. Condition C(a) is typically satisfied when the sample of agents $N^*$ is randomly selected from a much larger set of agents so that no two agents have overlapping $G_P$-neighbors in the sample.¹¹ In practice, for use in inference, one can take the set $N^*$ to include only those agents that satisfy Conditions A-C, as long as the resulting $N^*$ is still large and the selection is based only on $(X, G_P)$. One can simply use only those agents whose $G_P$-neighborhoods are not overlapping, as long as there are many such agents in the data.

¹⁰ Note that this condition is violated when the neighborhoods are top-coded in practice. For example, the maximum number of friends in the survey for a peer effects study can be set to be lower than the actual number of friends for many students. The impact of this top-coding upon the inference procedure is an interesting question on its own which deserves exploration in a separate paper.

¹¹ This random selection does not need to be a random sampling from the population of agents. Note that random sampling is extremely hard to implement in practice in this situation, because one needs to use an equal probability for selecting each agent into the collection $N^*$, but this equal probability is feasible only when one has at least the catalog of the entire population $N$.

Estimating Payoff Parameters and the Average Network Externality

In order to introduce inference procedures for $\beta_0$ and other payoff parameters, let us define for $i \in N$,

\[
Z_{i,1} = \lambda_{ii} X_{i,1} + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,1},
\quad \text{and} \quad
Z_{i,2} = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,2}.
\tag{2.7}
\]

(Note that $Z_{i,1}$ and $Z_{i,2}$ rely on $\beta_0$ although it is suppressed from notation for simplicity, as we do frequently below for other quantities.) By Theorem 2.2.1, we can write

\[
Y_i = Z_{i,1}' \gamma_0 + Z_{i,2}' \delta_0 + v_i,
\]

where

\[
v_i = \lambda_{ii} \varepsilon_i + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \varepsilon_j + \eta_i.
\]

Note that the observed actions $Y_i$ are cross-sectionally dependent (conditional on the $X_i$'s) due to information sharing on unobservables $\varepsilon_i$. However, since only the types of $G_P$-neighbors turn out to be relevant in the best linear response, the correlation between $Y_i$ and $Y_j$ is non-zero only when agents $i$ and $j$ are $G_P$-neighbors.

We define $Z_i = [Z_{i,1}', Z_{i,2}']' \in \mathbf{R}^{d_{x_1} + d_{x_2}}$ and $\rho_0 = [\gamma_0', \delta_0']' \in \mathbf{R}^{d_{x_1} + d_{x_2}}$, where $X_{i,1} \in \mathbf{R}^{d_{x_1}}$ and $X_{i,2} \in \mathbf{R}^{d_{x_2}}$, so that we can rewrite the linear model as

\[
Y_i = Z_i' \rho_0 + v_i.
\]

Suppose that $\varphi_i$ is an $M \times 1$ vector of instrumental variables (which potentially depend on $\beta_0$) with $M > d \equiv d_{x_1} + d_{x_2}$ such that for all $i \in N$,

\[
E[v_i \varphi_i] = 0.
\]

Note that the orthogonality condition above holds for any $\varphi_i$ as long as for each $i \in N$, $\varphi_i$ is $\mathcal{F}$-measurable, i.e., once $\mathcal{F}$ is realized, there is no extra randomness in $\varphi_i$. This is the case, for example, when $\varphi_i$ is a function of $X = (X_i)_{i \in N}$.
We also allow that each $\varphi_i$ depends on $\beta_0$.

While the asymptotic validity of our inference procedure admits a wide range of choices for the $\varphi_i$'s, one needs to choose them with care to obtain sharp inference on the payoff parameters. In particular, it is important to consider instrumental variables which involve the characteristics of $G_P$-neighbors to obtain sharp inference on the payoff externality parameter $\beta_0$. This is because the cross-sectional dependence of observations carries substantial information for estimating strategic interdependence among agents.

The moment function is nonlinear in the payoff externality $\beta_0$ and it is not easy to ensure that these moment conditions uniquely determine the true parameter vector even in the limit as $n^*$ goes to infinity.¹² In this chapter, we adopt a Bonferroni procedure in which we first obtain a confidence interval for $\beta_0$ and, using this, we perform inference on $\rho_0$. This approach works well even when $\beta_0$ is not consistently estimable.

¹² One might consider following the nonlinear iterated least squares approach of Blundell and Robin [22]. However, it is not clear in our context whether the parameter $\beta_0$ is consistently estimable across various payoff graph configurations as $n^*$ diverges to infinity. Thus, this chapter takes a Bonferroni approach.

We proceed first to estimate $\rho_0$ assuming knowledge of $\beta_0$. Define

\[
S_{\varphi\varphi} = \varphi'\varphi / n^*,
\]

and let

\[
\tilde{\varphi} = \varphi S_{\varphi\varphi}^{-1/2},
\]

where $\varphi$ is an $n^* \times M$ matrix whose $i$-th row is given by $\varphi_i'$, $i \in N^*$. Define

\[
\Lambda = \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N^*} E[v_i v_j | \mathcal{F}] \tilde{\varphi}_i \tilde{\varphi}_j',
\tag{2.8}
\]

and let $\hat{\Lambda}$ be a consistent estimator of $\Lambda$. (We will explain how we construct this estimator later.) Define

\[
S_{Z\tilde{\varphi}} = Z'\tilde{\varphi} / n^*, \quad \text{and} \quad S_{\tilde{\varphi} y} = \tilde{\varphi}' y / n^*,
\]

where $Z$ is an $n^* \times d$ matrix whose $i$-th row is given by $Z_i'$ and $y$ is an $n^* \times 1$ vector whose $i$-th entry is given by $Y_i$, $i \in N^*$. Since (from the fact that $G_P$ is undirected)

\[
c_{ij} = \frac{|N_P(i) \cap N_P(j)|}{n_P(i)},
\]

we can construct $Z_i$ for each $i \in N^*$ from the data satisfying Conditions A-C. Then we estimate

\[
\hat{\rho} = \left[ S_{Z\tilde{\varphi}} \hat{\Lambda}^{-1} S_{Z\tilde{\varphi}}' \right]^{-1} S_{Z\tilde{\varphi}} \hat{\Lambda}^{-1} S_{\tilde{\varphi} y}.
\tag{2.9}
\]

Using this estimator, we construct a vector of residuals $\hat{v} = [\hat{v}_i]_{i \in N^*}$, where

\[
\hat{v}_i = Y_i - Z_i' \hat{\rho}.
\tag{2.10}
\]

Finally, we form a test statistic as follows:

\[
T(\beta_0) = \frac{\hat{v}' \tilde{\varphi} \hat{\Lambda}^{-1} \tilde{\varphi}' \hat{v}}{n^*},
\tag{2.11}
\]

making it explicit that the test statistic depends on $\beta_0$. Later we show that

\[
T(\beta_0) \to_d \chi^2_{M-d}, \quad \text{as } n^* \to \infty,
\]

where $\chi^2_{M-d}$ denotes the $\chi^2$ distribution with $M - d$ degrees of freedom. Let $C^{\beta}_{1-(\alpha/2)}$ be the $(1 - (\alpha/2))100\%$ confidence set for $\beta_0$ defined as

\[
C^{\beta}_{1-(\alpha/2)} \equiv \{ \beta \in (-1, 1) : T(\beta) \leq c_{1-(\alpha/2)} \},
\]

where $T(\beta)$ is computed as $T(\beta_0)$ with $\beta_0$ replaced by $\beta$ and the critical value $c_{1-(\alpha/2)}$ is the $(1 - (\alpha/2))$-quantile of $\chi^2_{M-d}$.

Then we establish¹³ that under regularity conditions,

\[
\sqrt{n^*} \hat{V}^{-1/2} (\hat{\rho} - \rho_0) \to_d N(0, I),
\]

as $n^* \to \infty$, where

\[
\hat{V} = \left[ S_{Z\tilde{\varphi}} \hat{\Lambda}^{-1} S_{Z\tilde{\varphi}}' \right]^{-1}.
\]

¹³ The asymptotic theory proofs can be found in the working paper version of this chapter (Canen et al. [32]).

Using this estimator $\hat{\rho}$, we can construct a $(1 - \alpha)100\%$ confidence interval for $a'\rho_0$ for any non-zero vector $a$. For this, define

\[
\hat{\sigma}^2(a) = a' \hat{V} a.
\]

Let $c^{a}_{1-(\alpha/4)}$ be the $(1 - (\alpha/4))$-quantile of $N(0, 1)$. Define for a vector $a$ with the same dimension as $\rho$,

\[
C^{\rho}_{1-(\alpha/2)}(\beta_0, a) = \left[ a'\hat{\rho} - \frac{c^{a}_{1-(\alpha/4)} \hat{\sigma}(a)}{\sqrt{n^*}}, \; a'\hat{\rho} + \frac{c^{a}_{1-(\alpha/4)} \hat{\sigma}(a)}{\sqrt{n^*}} \right].
\]

Then the confidence set for $a'\rho$ is given by

\[
C^{\rho}_{1-\alpha}(a) = \bigcup_{\beta \in C^{\beta}_{1-(\alpha/2)}} C^{\rho}_{1-(\alpha/2)}(\beta, a).
\]

Notice that since $\beta$ runs in $(-1, 1)$ and the estimator $\hat{\rho}$ has an explicit form, the confidence interval is not computationally costly to construct in general.

Often the eventual parameter of interest is one that captures how strongly the agents' decisions are inter-dependent through the network. Here let us introduce parameters representing this sensitivity. Let $s_i^{BR}(I_i)$ be the best linear response of agent $i$ having information set $I_i$. Let us define the average network externality with respect to variable $X_{i,1,r}$ (where $X_{i,1,r}$ represents the $r$-th entry of $X_{i,1}$) to be

\[
\theta_1(\beta_0, \gamma_{0,r})
= \frac{1}{n^*} \sum_{i \in N^*} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\partial s_i^{BR}(I_i)}{\partial x_{j,1,r}}
= \frac{1}{n^*} \sum_{i \in N^*} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\beta_0 r_{ij}}{n_P(i)(1 - \beta_0 c_{ii})(1 - \beta_0 c_{ij})} \gamma_{0,r},
\]

where $\gamma_{0,r}$ denotes the $r$-th entry of $\gamma_0$. See (2.6).
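The two-step Bonferroni construction for $a'\rho_0$ described above (invert the test based on $T(\beta)$ over a grid, then take the union of the second-stage intervals) can be sketched as follows. This is a schematic illustration under stated assumptions: `test_stat` and `linear_estimate` are hypothetical callables standing in for $T(\beta)$ and for the pair $(a'\hat{\rho}, \hat{\sigma}(a)/\sqrt{n^*})$ evaluated at a given $\beta$, and the $\chi^2_{M-d}$ critical value is supplied by the caller because the Python standard library has no chi-square quantile function. The sketch reports the smallest interval containing the union.

```python
from math import inf
from statistics import NormalDist


def bonferroni_interval(beta_grid, test_stat, linear_estimate, chi2_crit, alpha=0.05):
    """Grid-based Bonferroni confidence interval for a'rho_0.

    beta_grid:       candidate values of beta in (-1, 1)
    test_stat:       beta -> T(beta)                        (hypothetical callable)
    linear_estimate: beta -> (a'rho_hat, standard error)    (hypothetical callable)
    chi2_crit:       (1 - alpha/2)-quantile of chi^2 with M - d degrees of freedom
    """
    # First stage: keep the grid points that the test does not reject.
    accepted = [b for b in beta_grid if test_stat(b) <= chi2_crit]

    # Second stage: (1 - alpha/2) normal intervals, hence the alpha/4 tail on each side.
    z = NormalDist().inv_cdf(1 - alpha / 4)
    lower, upper = inf, -inf
    for b in accepted:
        estimate, std_err = linear_estimate(b)
        lower = min(lower, estimate - z * std_err)
        upper = max(upper, estimate + z * std_err)

    # An empty first-stage set yields no interval.
    return accepted, ((lower, upper) if accepted else None)
```

A finer grid gives a closer approximation to the confidence set for $\beta_0$ at the cost of more evaluations of the closed-form estimator.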
Thus the confidence interval for $\theta_1(\beta_0, \gamma_{0,r})$ can be constructed from the confidence intervals for $\beta_0$ and $\gamma_{0,r}$ as follows:

\[
C^{\theta_1}_{1-\alpha} = \left\{ \theta_1(\beta, \gamma_r) : \beta \in C^{\beta}_{1-\alpha}, \text{ and } \gamma_r \in C^{\gamma_r}_{1-\alpha} \right\},
\]

where $C^{\gamma_r}_{1-\alpha}$ denotes the confidence interval for $\gamma_{0,r}$. We can similarly define the average network externality with respect to an entry of $X_{i,2}$ and construct a confidence interval for it. Details are omitted.

Downweighting Players with High Degree Centrality

When there are players who are linked to many other players in $G_P$, the graph $G_P$ tends to be denser, making it difficult to obtain good variance estimators that perform stably in finite samples. To remedy this situation, this chapter proposes a downweighting of those players with high degree centrality in $G_P$. More specifically, in choosing an instrument vector $\varphi_i$, we may consider the following:

\[
\varphi_i(X) = \frac{1}{\sqrt{\bar{n}_P(i)}} g_i(X),
\tag{2.12}
\]

where $g_i(X)$ is a function of $X$. This choice of $\varphi_i$ downweights players $i$ who have a large $G_P$-neighborhood. Thus we rely less on the variations of the characteristics of those players who have many neighbors in $G_P$.

Downweighting agents too heavily may hurt the power of the inference, because the actions of agents with high centrality contain information about the parameter of interest through the moment restrictions. On the other hand, downweighting them too lightly will hurt the finite sample stability of the inference due to the strong cross-sectional dependence they induce in the observations.
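As a small illustration of the downweighting in (2.12), the sketch below rescales a given set of instrument rows $g_i(X)$ by $1/\sqrt{\bar{n}_P(i)}$. It assumes that $\bar{n}_P(i)$ is the size of the closed neighborhood of $i$, that is $n_P(i) + 1$; this reading is an assumption made here, and the function name `downweighted_instruments` is hypothetical.

```python
from math import sqrt


def downweighted_instruments(g, nbrs):
    """Return phi_i = g_i / sqrt(nbar_P(i)) for each agent i.

    g:    list of instrument rows, g[i] being the list g_i(X)
    nbrs: list of neighbor sets, nbrs[i] being N_P(i)
    Assumption: nbar_P(i) = n_P(i) + 1 (agent i plus its G_P-neighbors).
    """
    phi = []
    for g_i, neighborhood in zip(g, nbrs):
        weight = 1.0 / sqrt(len(neighborhood) + 1)
        phi.append([weight * x for x in g_i])
    return phi


if __name__ == "__main__":
    g = [[2.0, 4.0], [3.0, 3.0]]
    nbrs = [{1, 2, 3}, {0}]  # agent 0 has three neighbors, agent 1 has one
    print(downweighted_instruments(g, nbrs))
```

Under this weighting an agent with three neighbors has its instruments halved, while an agent with a single neighbor is scaled by $1/\sqrt{2}$.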
Since a model with agents of higher order sophisticated type results in observations with more extensive cross-sectional dependence, the role of downweighting can be prominent in securing finite sample stability in such a model.

Comparison with Linear-in-Means Models

One of the most frequently used interaction models in the econometrics literature is a linear-in-means model specified as follows:

\[
Y_i = X_{i,1}' \gamma_0 + \bar{X}_{i,2}' \delta_0 + \beta_0 \mu_i^e(\bar{Y}_i) + v_i,
\tag{2.13}
\]

where $\mu_i^e(\bar{Y}_i)$ denotes player $i$'s expectation of $\bar{Y}_i$, and

\[
\bar{Y}_i = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} Y_j \quad \text{and} \quad \bar{X}_i = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} X_j.
\]

The literature assumes rational expectations by equating $\mu_i^e(\bar{Y}_i)$ to $E[\bar{Y}_i | I_i]$, and then proceeds to identification analysis of the parameters $\gamma_0$, $\delta_0$ and $\beta_0$. For actual inference, one needs to use an estimated version of $E[\bar{Y}_i | I_i]$. One standard way in the literature is to replace it by $\bar{Y}_i$ so that we have

\[
Y_i = X_{i,1}' \gamma_0 + \bar{X}_{i,2}' \delta_0 + \beta_0 \bar{Y}_i + \tilde{v}_i,
\]

where $\tilde{v}_i$ is an error term defined as $\tilde{v}_i = \beta_0 (E[\bar{Y}_i | I_i] - \bar{Y}_i) + v_i$. The complexity arises due to the presence of $\bar{Y}_i$, which is an endogenous variable that is involved in the error term $\tilde{v}_i$.¹⁴

¹⁴ A similar observation applies in the case of a complete information version of the model, where one directly uses $\bar{Y}_i$ in place of $\mu_i^e(\bar{Y}_i)$ in (2.13). Still, due to simultaneity of the equations, $\bar{Y}_i$ necessarily involves the error terms $v_i$ not only of agent $i$ but of other agents as well.

One of the frequently used approaches is to use instrumental variables. There are two types of instrumental variables. The first kind is a peers-of-peers type instrumental variable which is based on the observed characteristics of the neighbors of the neighbors. This strategy was proposed by Kelejian and Robinson [78], Bramoullé, Djebbari, and Fortin [25] and De Giorgi, Pellizzari, and Redaelli [45]. The second kind of instrumental variable is based on observed characteristics excluded from the group characteristics. (See Brock and Durlauf [28] and Durlauf and Tanaka [50].)
However, finding such an instrumental variable in practice is not always a straightforward task in empirical research.

Our approach of empirical modeling is different in several aspects. Our modeling uses behavioral assumptions instead of rational expectations, and produces a reduced form for observed actions $Y_i$ from using best linear responses. This reduced form gives a rich set of testable implications and makes explicit the source of cross-sectional dependence in relation to the payoff graph. Our approach permits any nontrivial functions of $\mathcal{F}$ as instrumental variables, at least for the validity of the inference. Furthermore, one does not need to observe many independent interactions for inference.

Estimation of Asymptotic Covariance Matrix

The inference requires the estimator $\hat{V}$. First, let us find the population version of $\hat{V}$. After some algebra, it is not hard to see that the population version (conditional on $\mathcal{F}$) of $\hat{V}$ is given by

\[
V = \left[ S_{Z\tilde{\varphi}} \Lambda^{-1} S_{Z\tilde{\varphi}}' \right]^{-1}.
\tag{2.14}
\]

For estimation, it suffices to estimate $\Lambda$ defined in (2.8). For this, we need to incorporate the cross-sectional dependence of the residuals $v_i$ properly. From the definition of $v_i$, it turns out that $v_i$ and $v_j$ can be correlated if $i$ and $j$ are connected indirectly through two edges in $G_P$. However, constructing an estimator of $\Lambda$ simply by imposing this dependence structure and replacing $v_i$ by $\hat{v}_i$ can result in a conservative estimator with unstable finite sample properties, especially when each player has many players connected through two edges. Instead, this chapter proposes an alternative estimator of $\Lambda$ as follows. This estimator is found to work well in our simulation studies.

We first explain our proposal to estimate $\Lambda$ consistently for the case of $\beta_0 \neq 0$. Then we later show how the estimator works even for the case of $\beta_0 = 0$. We first write

\[
v_i = R_i(\varepsilon) + \eta_i,
\tag{2.15}
\]

where

\[
R_i(\varepsilon) = \lambda_{ii} \varepsilon_i + \frac{\beta_0 \lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} \varepsilon_j.
\]

Define for $i, j \in N$,

\[
e_{ij} = E[R_i(\varepsilon) R_j(\varepsilon) | \mathcal{F}] / \sigma_\varepsilon^2,
\]

where $\sigma_\varepsilon^2$ denotes the variance of $\varepsilon_i$.
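It is not hard to see that for all $i \in N$,

\[
e_{ii} = \lambda_{ii}^2 + \frac{\beta_0^2 \lambda_{ii}^2}{n_P^2(i)} \sum_{j \in N_P(i)} \lambda_{ij}^2,
\]

and for $i \neq j$ such that $N_P(i) \cap N_P(j) \neq \emptyset$, $e_{ij} = \beta_0 q_{\varepsilon,ij}$, where

\[
q_{\varepsilon,ij} = \frac{\lambda_{ji} \lambda_{ii} \lambda_{jj}}{n_P(j)} + \frac{\lambda_{ij} \lambda_{ii} \lambda_{jj}}{n_P(i)} + \frac{\beta_0 \lambda_{ii} \lambda_{jj}}{n_P(i) n_P(j)} \sum_{k \in N_P(i) \cap N_P(j)} \lambda_{ik} \lambda_{jk}.
\]

Thus, we write

\[
\frac{1}{n^*} \sum_{i \in N^*} E[v_i^2 | \mathcal{F}] = a_\varepsilon \sigma_\varepsilon^2 + \sigma_\eta^2, \quad \text{and} \quad
\frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} E[v_i v_j | \mathcal{F}] = \beta_0 b_\varepsilon \sigma_\varepsilon^2,
\tag{2.16}
\]

where $\sigma_\eta^2$ denotes the variance of $\eta_i$,

\[
a_\varepsilon = \frac{1}{n^*} \sum_{i \in N^*} e_{ii}, \quad \text{and} \quad b_\varepsilon = \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} q_{\varepsilon,ij}.
\]

(Note that since not all agents in $N_P(i)$ are in $N^*$ for all $i \in N^*$, the set $N_P(i) \cap N^*$ does not necessarily coincide with $N_P(i)$.) When $\beta_0 \neq 0$, the solution takes the following form:

\[
\sigma_\varepsilon^2 = \frac{1}{n^* \beta_0 b_\varepsilon} \sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} E[v_i v_j | \mathcal{F}], \quad \text{and} \quad
\sigma_\eta^2 = \frac{1}{n^*} \sum_{i \in N^*} E[v_i^2 | \mathcal{F}] - \frac{a_\varepsilon}{n^* \beta_0 b_\varepsilon} \sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} E[v_i v_j | \mathcal{F}].
\tag{2.17}
\]

In other words, when $\beta_0 \neq 0$, i.e., when there is strategic interaction among the players, we can "identify" $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ by using the variances and covariances of the residuals $v_i$. The intuition is as follows. Since the cross-sectional dependence of the $v_i$'s is due to the presence of the $\varepsilon_i$'s, we can first identify $\sigma_\varepsilon^2$ using the covariance between $v_i$ and $v_j$ for linked pairs $i, j$, and then identify $\sigma_\eta^2$ by subtracting from the variance of $v_i$ the contribution from $\varepsilon_i$.

In order to obtain a consistent estimator of $\Lambda$ which does not require that $\beta_0 \neq 0$, we derive its alternative expression. Let us first write

\[
\Lambda = \Lambda_1 + \Lambda_2,
\tag{2.18}
\]

where

\[
\Lambda_1 = \frac{1}{n^*} \sum_{i \in N^*} E[v_i^2 | \mathcal{F}] \tilde{\varphi}_i \tilde{\varphi}_i', \quad \text{and} \quad
\Lambda_2 = \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i}} E[v_i v_j | \mathcal{F}] \tilde{\varphi}_i \tilde{\varphi}_j',
\]

where $N^*_{-i} = N^* \setminus \{i\}$. Using (2.15) and (2.17), we can rewrite

\[
\begin{aligned}
\Lambda_2 &= \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i} : N_P(i) \cap N_P(j) \neq \emptyset} e_{ij} \sigma_\varepsilon^2 \tilde{\varphi}_i \tilde{\varphi}_j' \\
&= \frac{\beta_0}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i} : N_P(i) \cap N_P(j) \neq \emptyset} q_{\varepsilon,ij} \sigma_\varepsilon^2 \tilde{\varphi}_i \tilde{\varphi}_j' \\
&= \frac{s_\varepsilon}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i} : N_P(i) \cap N_P(j) \neq \emptyset} q_{\varepsilon,ij} \tilde{\varphi}_i \tilde{\varphi}_j',
\end{aligned}
\]

where

\[
s_\varepsilon = \frac{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} E[v_i v_j | \mathcal{F}]}{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} q_{\varepsilon,ij}}.
\]

Now, it is clear that with this expression for $\Lambda_2$, $\Lambda$ is well defined regardless of whether $\beta_0 = 0$ or $\beta_0 \neq 0$.

Thus, to obtain an estimator $\hat{\Lambda}$ of $\Lambda$ (up to $\beta_0$), we first obtain a first-step estimator of $\rho$ as follows:

\[
\tilde{\rho} = \left[ S_{Z\tilde{\varphi}} S_{Z\tilde{\varphi}}' \right]^{-1} S_{Z\tilde{\varphi}} S_{\tilde{\varphi} y}.
\tag{2.19}
\]

Using this estimator, we construct a vector of residuals $\tilde{v} = [\tilde{v}_i]_{i \in N^*}$, where

\[
\tilde{v}_i = Y_i - Z_i' \tilde{\rho}.
\tag{2.20}
\]

Then we estimate¹⁵

\[
\hat{\Lambda} = \hat{\Lambda}_1 + \hat{\Lambda}_2,
\]

where

\[
\hat{\Lambda}_1 = \frac{1}{n^*} \sum_{i \in N^*} \tilde{v}_i^2 \tilde{\varphi}_i \tilde{\varphi}_i', \quad \text{and} \quad
\hat{\Lambda}_2 = \frac{\hat{s}_\varepsilon}{n^*} \sum_{i \in N^*} \sum_{j \in N^*_{-i} : N_P(i) \cap N_P(j) \neq \emptyset} q_{\varepsilon,ij} \tilde{\varphi}_i \tilde{\varphi}_j',
\]

and

\[
\hat{s}_\varepsilon = \frac{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} \tilde{v}_i \tilde{v}_j}{\sum_{i \in N^*} \sum_{j \in N_P(i) \cap N^*} q_{\varepsilon,ij}}.
\tag{2.21}
\]

(Note that the quantity $q_{\varepsilon,ij}$ can be evaluated once $\beta_0$ is fixed.) Using $\hat{\Lambda}$, we construct the estimator for the covariance matrix $V$, i.e.,

\[
\hat{V} = \left[ S_{Z\tilde{\varphi}} \hat{\Lambda}^{-1} S_{Z\tilde{\varphi}}' \right]^{-1}.
\tag{2.22}
\]

Later we provide conditions for the estimator to be consistent for $V$.¹⁶

¹⁵ Under Condition C(a) for sample $N^*$, we have $\Lambda_2 = 0$ because the second sum in the expression for $\Lambda_2$ is empty. Hence in this case, we can simply set $\hat{\Lambda}_2 = 0$.

¹⁶ In finite samples, $\hat{V}$ is not guaranteed to be positive definite. We can modify the estimator by using a spectral decomposition, similarly as in Cameron et al. [31]. More specifically, we first take a spectral decomposition $\hat{V} = \hat{B} \hat{A} \hat{B}'$, where $\hat{A}$ is a diagonal matrix of eigenvalues $\hat{a}_j$ of $\hat{V}$. We replace each $\hat{a}_j$ in $\hat{A}$ by the maximum between $\hat{a}_j$ and some small number $c > 0$ to construct $A^*$. Then the modified version $\tilde{V} \equiv \hat{B} A^* \hat{B}'$ is positive definite. Taking $c = 0.005$ seems to work well in the simulation studies.

Testing for Information Sharing on Unobservables

One may want to see how much empirical relevance there is for incorporating information sharing on unobservables. Observe that when $\beta_0 = 0$, the presence of information sharing on unobservables is not testable. When $\beta_0 = 0$, it follows that

\[
s_i^{BR}(I_i) = X_{i,1}' \gamma_0 + \tilde{X}_{i,2}' \delta_0 + v_i,
\]

where $v_i = \varepsilon_i + \eta_i$. In this case, it is not possible to distinguish between contributions from $\varepsilon_i$ and $\eta_i$.

Consider the following hypotheses:

\[
H_0 : \sigma_\varepsilon^2 = 0, \quad \text{and} \quad H_1 : \sigma_\varepsilon^2 > 0.
\]

The null hypothesis states that there is no information sharing on unobservables. Let $\hat{v}_i(\beta)$, $a_\varepsilon(\beta)$ and $b_\varepsilon(\beta)$ be the same as the previously defined $\hat{v}_i$, $a_\varepsilon$ and $b_\varepsilon$, only with $\beta_0$ replaced by a generic $\beta$. From here on we assume that $\beta_0 \neq 0$.

The main idea for testing the hypothesis is that $\sigma_\varepsilon^2 > 0$ implies cross-sectional dependence of the residuals $v_i$. We need to compute the sample version of the covariance between $v_i$ and $v_j$ for $G_P$-neighbors $i$ and $j$. However, Condition C alone does not guarantee that for each $i \in N^*$, we will be able to compute $\hat{v}_j$ for some $j \in N_P(i)$, because there may not exist such a $j$ for some $i \in N^*$ at all. Thus let us introduce an additional data requirement as follows:

Condition D: For each $i \in N^*$, the econometrician observes a nonempty subset $\tilde{N}(i) \subset N_P(i)$ (possibly a singleton) of agents where for each $j \in \tilde{N}(i)$, the econometrician observes $Y_j$, $|N_P(j) \cap N_P(k)|$, $n_P(k)$ and $X_k$ for all $k \in N_P(j)$.

Condition D is satisfied if there are many agents in the data set where each agent has at least one $G_P$-neighbor $j$ for which the econometrician observes the outcome $Y_j$, the number of their $G_P$-neighbors, the observed characteristics of their $G_P$-neighbors, and the number of the agents who are both their $G_P$-neighbors and the neighbors of their $G_P$-neighbors. While this data requirement can be restrictive in some cases where one obtains a partial observation of $G_P$, it is still weaker than the usual assumption that the econometrician observes $G_P$ fully together with $(Y_i, X_i')_{i \in N}$.

Now let us reformulate the null and the alternative hypotheses as follows:

\[
H_0 : \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} E[v_i v_j | \mathcal{F}] = 0, \quad \text{and}
\tag{2.23}
\]

\[
H_1 : \frac{1}{n^*} \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} E[v_i v_j | \mathcal{F}] \neq 0.
\tag{2.24}
\]

For testing, we propose the following method. Let $C^{\beta}_{1-(\alpha/2)}$ be the $(1 - (\alpha/2))$-level confidence interval for $\beta$.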
We consider the following test statistic:

\[
\widehat{IU} = \inf_{\beta \in C^{\beta}_{1-(\alpha/2)}} \frac{1}{2 \hat{S}^4(\beta) \, n^*} \left( \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} \hat{v}_i(\beta) \hat{v}_j(\beta) \right)^2,
\]

where

\[
\hat{S}^2(\beta) = \frac{\tilde{d}_{av}^{1/2}}{n^*} \sum_{i \in N^*} \hat{v}_i^2(\beta), \quad \text{and} \quad \tilde{d}_{av} = \frac{1}{n^*} \sum_{i \in N^*} |\tilde{N}(i)|.
\]

When the confidence set includes zero, the power of the test becomes asymptotically trivial, as expected from the previous remark that information sharing on unobservables is not testable when $\beta_0 = 0$.

As for the critical value, we take the $(1 - (\alpha/2))$-quantile of the $\chi^2$ distribution with one degree of freedom, which we denote by $c_{1-(\alpha/2)}$. Then the level $\alpha$-test based on the test statistic $\widehat{IU}$ rejects the null hypothesis if and only if $\widehat{IU} > c_{1-(\alpha/2)}$. We investigate the finite sample properties of this test in the supplemental note to this chapter.

2.3.2 Asymptotic Theory

In this section, we present the assumptions and formal results of asymptotic inference. We introduce some technical conditions.

Assumption 2.3.2. There exists $c > 0$ such that for all $n^* \geq 1$, $\lambda_{\min}(S_{\varphi\varphi}) \geq c$, $\lambda_{\min}(S_{Z\tilde{\varphi}} S_{Z\tilde{\varphi}}') \geq c$, $\lambda_{\min}(S_{Z\tilde{\varphi}} \Lambda^{-1} S_{Z\tilde{\varphi}}') \geq c$, $\lambda_{\min}(\Lambda) \geq c$, and

\[
\frac{1}{n^*} \sum_{i \in N^*} \frac{\lambda_{ii}}{n_P(i)} \sum_{j \in N_P(i) \cap N^*} \lambda_{ij} > c,
\]

where $\lambda_{\min}(A)$ for a symmetric matrix $A$ denotes the minimum eigenvalue of $A$.

Assumption 2.3.3. There exists a constant $C > 0$ such that for all $n^* \geq 1$,

\[
\max_{i \in N^\circ} \|X_i\| + \max_{i \in N^\circ} \|\tilde{\varphi}_i\| \leq C
\]

and $E[\varepsilon_i^4 | \mathcal{F}] + E[\eta_i^4 | \mathcal{F}] < C$, where $n^\circ = |N^\circ|$ and

\[
N^\circ = \bigcup_{i \in N^*} \bar{N}_P(i).
\]

Assumption 2.3.2 is used to ensure that the asymptotic distribution is nondegenerate. This regularity condition is reasonable, because an asymptotic scheme that gives a degenerate distribution would not be adequate for approximating a finite sample, nondegenerate distribution of an estimator. Assumption 2.3.3 can be weakened at the expense of complexity in the conditions and the proofs.

We introduce an assumption which requires the payoff graph to have a bounded degree over $i$ in the observed sample $N^*$.

Assumption 2.3.4. There exists $C > 0$ such that for all $n^* \geq 1$,

\[
\max_{i \in N^*} |N_P(i)| \leq C.
\]

We may relax the assumption to a weaker, yet more complex condition at the expense of longer proofs, but in our view, this relaxation does not give additional insights. When $N^*$ is large, one can remove very high-degree nodes to obtain a stable inference. As such removal is solely based on the payoff graph $G_P$, the removal does not lead to any violation of the conditions in the chapter.

The following theorem establishes the asymptotic validity of the inference based on the best linear responses in Theorem 2.2.1. The proof is found in the supplemental note to this chapter.

Theorem 2.3.1. Suppose that the conditions of Theorem 2.2.1 and Assumptions 2.3.1 - 2.3.4 hold. Then,

\[
T(\beta_0) \to_d \chi^2_{M-d}, \quad \text{and} \quad \hat{V}^{-1/2} \sqrt{n^*} (\hat{\rho} - \rho_0) \to_d N(0, I),
\]

as $n^* \to \infty$. Furthermore, under the null hypothesis in (2.23),

\[
\lim_{n^* \to \infty} P\left\{ \widehat{IU} > c_{1-\alpha/2} \right\} \leq \alpha.
\]

The asymptotic validity of inference is not affected if the researcher chooses the nonempty subset $\tilde{N}(i)$ in Condition D to be a singleton, say $\{j(i)\} \subset N_P(i)$, $j(i) \in N$, such that $Y_{j(i)}$, $|N_P(j(i)) \cap N_P(k)|$, $n_P(k)$ and $X_k$ for all $k \in N_P(j(i))$ are available in the data, as long as the choice is based not on the $Y_i$'s but on $X$ only.

Testing for Information Sharing on Unobservables

When $\beta_0 = 0$, it follows that

\[
s_i^{BR.FS}(I_i) = X_{i,1}' \gamma_0 + \tilde{X}_{i,2}' \delta_0 + v_i^{FS},
\]

where $v_i^{FS} = \varepsilon_i + \eta_i$. Therefore, we have $s_i^{BR}(I_{i,0}) = s_i^{BR.FS}(I_{i,1})$ and, just as in the case of a simple type model, it is not possible to distinguish between contributions from $\varepsilon_i$ and $\eta_i$. Thus let us assume that $\beta_0 \neq 0$. The presence of cross-sectional correlation of the residuals $v_i^{FS}$ serves as a testable implication of information sharing on unobservables.
As in the case of a model with agents of simple type, we need to strengthen Condition D as follows:

Condition D1: For each $i \in N^*$, the econometrician observes a nonempty subset $\tilde{N}(i) \subset N_P(i)$ (possibly a singleton) of agents where, for each $j \in \tilde{N}(i)$, the econometrician observes $Y_j$, $|N_P(j) \cap N_P(k)|$, $n_P(k)$ and $X_k$ for all $k \in N_{P,2}(j)$.

As before, we consider the following test statistic:

\[ \widehat{IU}^{FS} = \inf_{\beta \in C^{\beta}_{1-(\alpha/2)}} \frac{1}{2 (\hat{S}^{FS}(\beta))^4 \, n^*} \left( \sum_{i \in N^*} \sum_{j \in \tilde{N}(i)} \hat{v}_i^{FS}(\beta) \hat{v}_j^{FS}(\beta) \right)^2, \]

where

\[ (\hat{S}^{FS}(\beta))^2 = \frac{\tilde{d}_{av}^{1/2}}{n^*} \sum_{i \in N^*} \hat{v}_i^2(\beta). \]

As before, we reject the null hypothesis of no information sharing on unobservables if and only if $\widehat{IU}^{FS} > c_{1-(\alpha/2)}$, where $c_{1-(\alpha/2)}$ is the $(1 - \alpha/2)$-quantile of $\chi^2_1$.

2.4 A Monte Carlo Simulation Study

In this section, we investigate the finite sample properties of the asymptotic inference across various configurations of the payoff graph, $G_P$. The payoff graphs are generated according to two models of random graph formation, which we call Specifications 1 and 2. Specification 1 uses the Barabási-Albert model of preferential attachment, with $m$ representing the number of edges each new node forms with existing nodes. The number $m$ is chosen from $\{1, 2, 3\}$. Specification 2 is the Erdős-Rényi random graph with probability $p = \lambda/n$, where $\lambda$ is also chosen from $\{1, 2, 3\}$.^17 Table 2.2 reports degree characteristics of the payoff graphs used in the simulation study.

^17 Note that in Specification 1, the Barabási-Albert graph is generated with an Erdős-Rényi seed graph, where the number of nodes in the seed is set to equal the smallest integer above $5\sqrt{n}$. All graphs in the simulation study are undirected.

Table 2.2: The Degree Characteristics of the Graphs Used in the Simulation Study

                   Specification 1              Specification 2
n               m = 1    m = 2    m = 3     λ = 1    λ = 2    λ = 3
500    d_mx        17       21       30         5        8       11
       d_av    1.7600   3.2980   4.8340    0.9520   1.9360   2.9600
1000   d_mx        18       29       34         6        7        9
       d_av    1.8460   3.5240   5.2050    0.9960   1.9620   3.0020
5000   d_mx        32       78       70         7       10       11
       d_av    1.9308   3.7884   5.6466    0.9904   2.0032   3.0228

Notes: This table gives characteristics of the payoff graphs, $G_P$, used in the simulation study. d_av and d_mx denote the average and maximum degrees of the payoff graphs.

For the simulations, we also set the following: $\rho_0 = (\gamma_0', \delta_0')'$, with $\gamma_0 = (2, 4, 1)'$ and $\delta_0 = (3, 4)'$. We choose $a$ to be a vector of ones so that $a'\rho_0 = 14$. The variables $\varepsilon$ and $\eta$ are drawn i.i.d. from $N(0, 1)$. The first column of $X_{i,1}$ is a column of ones, while the remaining columns of $X_{i,1}$ are drawn independently from $N(1, 1)$. The columns of $X_{i,2}$ are drawn independently from $N(3, 1)$.

For instruments, we consider the following nonlinear transformations of $X_1$ and $X_2$:

\[ \varphi_i = [\tilde{Z}_{i,1}, X_{i,1}^2, X_{i,2}^2, X_{i,2}^3]', \]

where we define

\[ \tilde{Z}_{i,1} \equiv \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,1}. \]

We generate $Y_i$ from the best response function in Theorem 2.2.1. While the instruments $X_{i,1}^2$, $X_{i,2}^2$, $X_{i,2}^3$ capture the nonlinear impact of the $X_i$'s, the instrument $\tilde{Z}_{i,1}$ captures the cross-sectional dependence along the payoff graph. The use of this instrumental variable is crucial in obtaining sharp inference for $\beta_0$. Note that since we have already concentrated out $\rho$ in forming the moment conditions, we cannot use linear combinations of $X_{i,1}$ and $X_{i,2}$ as our instrumental variables. The nominal size in all the experiments is set at $\alpha = 0.05$.

Overall, the simulation results illustrate the good power and size properties of the asymptotic inference on $\beta_0$ and $a'\rho_0$.
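To make the graph design concrete, the following is a minimal sketch of how a Specification 2 payoff graph and its degree characteristics could be generated. It is only an illustration under stated assumptions, not the code used for the study: the function names `erdos_renyi` and `degree_summary`, the seed, and the choice of $n = 500$, $\lambda = 2$ are all hypothetical, and the Barabási-Albert design of Specification 1 is not reproduced here.

```python
import random


def erdos_renyi(n, lam, rng):
    """Undirected Erdos-Renyi graph on n nodes with link probability lam / n.

    Returns a list of neighbor sets, one per node (hypothetical representation).
    """
    p = lam / n
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each unordered pair {i, j} is linked independently with probability p.
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def degree_summary(neighbors):
    """Return (average degree, maximum degree) of the graph."""
    degrees = [len(nb) for nb in neighbors]
    return sum(degrees) / len(degrees), max(degrees)


# Illustrative values only: n = 500 and lambda = 2, with an arbitrary seed.
rng = random.Random(0)
graph = erdos_renyi(500, 2, rng)
d_av, d_mx = degree_summary(graph)
print(f"d_av = {d_av:.4f}, d_mx = {d_mx}")
```

Because each pair is linked with probability $\lambda/n$, the average degree of such a draw should be close to $\lambda$, which is the pattern visible in the Specification 2 columns of Table 2.2.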
As expected, the average length of the confidence intervals for both $\beta_0$ and $a'\rho_0$ becomes shorter as the sample size increases. We find that the confidence interval for $\beta_0$ exhibits empirical coverage close to the 95% nominal level, while the confidence interval for $a'\rho_0$ is somewhat conservative. This conservativeness is expected, given that the interval is constructed using a Bonferroni approach.

Table 2.3: The Empirical Coverage Probability of the Confidence Intervals for $\beta_0$ at 95% Nominal Level.

                      Specification 1              Specification 2
β0     n           m = 1    m = 2    m = 3     λ = 1    λ = 2    λ = 3
−0.5   500        0.9760   0.9715   0.9710    0.9740   0.9685   0.9700
       1000       0.9665   0.9720   0.9660    0.9785   0.9705   0.9755
       5000       0.9745   0.9705   0.9680    0.9715   0.9765   0.9690
−0.3   500        0.9730   0.9690   0.9690    0.9740   0.9690   0.9655
       1000       0.9665   0.9710   0.9660    0.9750   0.9700   0.9710
       5000       0.9700   0.9670   0.9645    0.9710   0.9755   0.9645
0      500        0.9610   0.9660   0.9715    0.9735   0.9670   0.9625
       1000       0.9640   0.9690   0.9690    0.9745   0.9665   0.9670
       5000       0.9685   0.9670   0.9655    0.9710   0.9745   0.9660
0.3    500        0.9640   0.9665   0.9725    0.9780   0.9720   0.9675
       1000       0.9670   0.9695   0.9655    0.9770   0.9700   0.9730
       5000       0.9705   0.9660   0.9635    0.9725   0.9755   0.9675
0.5    500        0.9725   0.9735   0.9735    0.9800   0.9715   0.9700
       1000       0.9690   0.9705   0.9660    0.9770   0.9770   0.9785
       5000       0.9720   0.9695   0.9650    0.9770   0.9755   0.9730

Notes: The table reports the empirical coverage probability of the asymptotic confidence interval for $\beta_0$. The simulated rejection probability at the true parameter is close to the nominal size of $\alpha = 0.05$. The simulation number is R = 2000.

Table 2.4: The Average Length of Confidence Intervals for $\beta_0$ at 95% Nominal Level.

                      Specification 1              Specification 2
β0     n           m = 1    m = 2    m = 3     λ = 1    λ = 2    λ = 3
−0.5   500        0.2735   0.3069   0.3334    0.3318   0.2933   0.2832
       1000       0.2264   0.2513   0.2749    0.2874   0.2489   0.2491
       5000       0.1771   0.1783   0.1971    0.2368   0.1917   0.1818
−0.3   500        0.2541   0.2971   0.3245    0.3189   0.2818   0.2703
       1000       0.2132   0.2445   0.2680    0.2810   0.2398   0.2412
       5000       0.1710   0.1739   0.1952    0.2387   0.1901   0.1791
0      500        0.2434   0.2881   0.3175    0.3043   0.2692   0.2606
       1000       0.2051   0.2377   0.2627    0.2730   0.2316   0.2332
       5000       0.1639   0.1715   0.1925    0.2373   0.1877   0.1746
0.3    500        0.2481   0.2891   0.3149    0.2982   0.2652   0.2629
       1000       0.2114   0.2377   0.2639    0.2673   0.2260   0.2314
       5000       0.1657   0.1729   0.1913    0.2325   0.1884   0.1710
0.5    500        0.2613   0.2974   0.3186    0.2977   0.2693   0.2691
       1000       0.2193   0.2429   0.2688    0.2620   0.2267   0.2350
       5000       0.1703   0.1770   0.1912    0.2253   0.1851   0.1687

Notes: The table reports the average length of the asymptotic confidence interval for $\beta_0$. The average lengths of the confidence intervals decrease with n. The simulation number is R = 2000.

2.5 Empirical Application: State Presence across Municipalities

2.5.1 Motivation and Background

State capacity (i.e., the capability of a country to provide public goods, basic services, and the rule of law) can be limited for various reasons. (See e.g. Besley and Persson [19] and Gennaioli and Voth [59]). A "weak state" may arise due to political corruption and clientelism, and result in spending inadequately on public goods (Acemoglu [3]), accommodating armed opponents of the government (Powell [100]), and war (McBride et al. [92]). Empirical evidence has shown how these weak states can persist from precolonial times. Higher state capacities seem related to the current level of prosperity at the ethnic and national levels (Gennaioli and Rainer [58] and Michalopoulos and Papaioannou [95]).

Our empirical application is based on a recent study by Acemoglu et al. [4], who investigate the local choices of state capacity in Colombia, using a model of a complete information game on an exogenously formed network.
In their set-up, municipalities choose a level of spending on public goods and state presence (as measured by either the number of state employees or state agencies). There is network externality in a municipality's choice because municipalities that are adjacent to each other can benefit from their neighbors' choices of public goods provisions, such as increased security, infrastructure and bureaucratic connections. Thus, a municipality's choice of state capacity can be thought of as a strategic decision on a geographic network.

Table 2.5: The Empirical Coverage Probability of Confidence Intervals for $a'\rho_0$ at 95% Nominal Level.

                      Specification 1              Specification 2
β0     n           m = 1    m = 2    m = 3     λ = 1    λ = 2    λ = 3
−0.5   500        0.9945   0.9955   0.9925    0.9955   0.9950   0.9910
       1000       0.9925   0.9930   0.9955    0.9955   0.9970   0.9910
       5000       0.9850   0.9875   0.9900    0.9915   0.9925   0.9930
−0.3   500        0.9935   0.9960   0.9940    0.9960   0.9950   0.9905
       1000       0.9905   0.9935   0.9955    0.9955   0.9960   0.9910
       5000       0.9795   0.9880   0.9895    0.9885   0.9940   0.9895
0      500        0.9935   0.9965   0.9945    0.9955   0.9925   0.9925
       1000       0.9925   0.9935   0.9945    0.9960   0.9970   0.9920
       5000       0.9785   0.9875   0.9895    0.9840   0.9910   0.9860
0.3    500        0.9950   0.9940   0.9935    0.9955   0.9915   0.9925
       1000       0.9940   0.9940   0.9940    0.9920   0.9965   0.9925
       5000       0.9790   0.9895   0.9925    0.9790   0.9860   0.9850
0.5    500        0.9940   0.9925   0.9935    0.9940   0.9915   0.9920
       1000       0.9945   0.9940   0.9945    0.9890   0.9930   0.9930
       5000       0.9835   0.9900   0.9910    0.9645   0.9850   0.9860

Notes: The true $a'\rho_0$ is equal to 14. The table reports the empirical coverage probability of the asymptotic confidence interval. The coverage probability for $a'\rho_0$ is generally conservative, which is expected from the use of the Bonferroni approach. Nevertheless, the length of the confidence interval is reasonably small. The simulation number, R, is 2000.

It is not obvious that public good provision in one municipality leads to higher spending on public goods in neighboring municipalities.
Some neighbors may free-ride and under-invest in state presence if they anticipate others will invest highly. Rent-seeking by municipal politicians would also limit the provision of public goods. On the other hand, economies of scale could yield complementarities in state presence across neighboring municipalities.^18

^18 Note that our analysis excludes the possibility that the provision of public goods by municipalities is related to the preferences of agents who may sort into regions according to their preferences for such public goods (as may be the case in models based on the insights of Tiebout [108]).

Table 2.6: The Average Lengths of Confidence Intervals for $a'\rho_0$ at 95% Nominal Level.

                      Specification 1              Specification 2
β0     n           m = 1    m = 2    m = 3     λ = 1    λ = 2    λ = 3
−0.5   500        1.5840   1.6752   1.7643    1.3326   1.4123   1.8075
       1000       1.3626   1.4522   1.4819    1.1218   1.1474   1.4152
       5000       0.9995   0.9979   1.0333    0.8658   0.7786   0.8503
−0.3   500        1.5337   1.6361   1.7225    1.4037   1.4196   1.7567
       1000       1.3263   1.4140   1.4466    1.2028   1.1714   1.3966
       5000       0.9896   0.9749   1.0145    0.9656   0.8244   0.8650
0      500        1.5068   1.6007   1.6761    1.5646   1.4632   1.6976
       1000       1.3060   1.3607   1.4031    1.3721   1.2259   1.3798
       5000       0.9840   0.9486   0.9873    1.1527   0.9154   0.8922
0.3    500        1.5516   1.6019   1.6501    1.8290   1.5653   1.6869
       1000       1.3416   1.3257   1.3754    1.6133   1.3204   1.3959
       5000       1.0066   0.9412   0.9690    1.3938   1.0199   0.9213
0.5    500        1.6553   1.6353   1.6567    2.1022   1.6772   1.7101
       1000       1.4069   1.3146   1.3731    1.8362   1.4272   1.4376
       5000       1.0552   0.9420   0.9542    1.5865   1.1029   0.9593

Notes: The true $a'\rho_0$ is equal to 14. The table reports the average lengths of the asymptotic confidence interval for $a'\rho_0$. The length of the confidence interval is reasonably small. The simulation number, R, is 2000.

In our study, we extend the model in Acemoglu et al. [4] to an incomplete information game where information may be shared across municipalities. In particular, we do not assume that all municipalities know and observe all characteristics and decisions of the others. It seems reasonable that the decisions made across the country may not be observed or well known by those municipalities that are geographically remote.

Figure 2.3: Degree Distribution of $G_P$
Notes: The figure presents the degree distribution of the graph $G_P$ used in the empirical specification. The average degree is 5.48, the maximum degree is 20, and the minimum degree is [...].

2.5.2 Empirical Set-up

Let $y_i$ denote the state capacity in municipality $i$ (as measured by the (log) number of public employees in municipality $i$) and $G_P$ denote the geographic network, where an edge is defined on two municipalities that are geographically adjacent.^19 We assume that $G_P$ is exogenously formed. The degree distribution of $G_P$ is shown in Figure 2.3. We study the optimal choice of $y_i$, where $y_i$ leads to a larger prosperity $p_i$. Prosperity in municipality $i$ is modeled as:

\[ p_i = \left( \beta \bar{y} + x_{1,i} \gamma + \eta_i + \varepsilon_i + \varsigma_i^D \right) y_i, \tag{2.25} \]

where $\varsigma_i^D$ is a district-specific dummy variable, $\varepsilon_i$ and $\eta_i$ are our sharable and non-sharable private information, and $\bar{y} = \frac{1}{n_P(i)} \sum_{j \in N_P(i)} y_j$. The term $x_{1,i}$ represents municipality characteristics. These include geographic characteristics, such as land quality, altitude, latitude and rainfall; and municipal characteristics, such as distance to highways, distance to royal roads and Colonial State Presence.^20

^19 This corresponds to the case of $\delta_1 = \delta_2 = 0$ in Acemoglu et al. [4].

^20 Note that, from our notation in Section 3, here we take $x_{2,i} = 0$. This is done for a closer correspondence to the specification in Acemoglu et al. [4].
Finally, note that $p_i$ is only a function of terms that are multiplied by $y_i$. This is a simplification from their specification. We do so because we will focus on the best response equation. The best response equation, derived from the first order condition to this problem, would not include any term that is not a function of $y_i$ itself.

The welfare of a municipality is given by

\[ u_i(y_i, y_{-i}, T, \eta_i) = p_i(y_i, \bar{y}, T, \eta_i) - \frac{1}{2} y_i^2, \tag{2.26} \]

where the first term is the prosperity $p_i$ and the second term refers to the cost of higher state presence.

We can rewrite the welfare of the municipality by substituting (2.25) into (2.26):

\[ u_i(y_i, y_{-i}, T, \eta_i) = \left( \beta \bar{y} + x_{1,i} \gamma + \eta_i + \varepsilon_i + \varsigma_i^D \right) y_i - \frac{1}{2} y_i^2, \tag{2.27} \]

which is our model from Section 3. We assume that municipalities (or the mayors in charge) wish to maximize welfare by choosing state presence, given their beliefs about the types of the other municipalities.

In our specification, we allow for incomplete information. This is reflected in the terms $\varepsilon_i$ and $\eta_i$, which will be present in the best response function. The municipality, when choosing state presence $y_i$, will be able to observe $\varepsilon_i$ of its neighbors and will use its beliefs over the types of the others to generate its best response. The best response will follow the results from Theorem 2.2.1.

2.5.3 Model Specification

We follow closely Table 3 in Acemoglu et al. [4] for the choice of specifications and variables. Throughout the specifications, we include longitude, latitude, surface area, elevation, annual rainfall, department fixed effects and a department capital dummy (all in $X_1$). We further consider the effect of the variables distance to current highways, land quality and presence of rivers in the municipality.

For the choice of instruments, we consider two separate types of instruments. The first is the sum of neighbor values (across $G_P$) of the historical variables (denoted as $C_i$).^21 The historical variables used are Total Crown Employees (also called Colonial State Officials), Distance to Royal Roads, Colonial State Agencies and Historical Population, as well as Colonial State Presence Index squared and Distance to Royal Roads squared. The latter two add additional power to the inference. We also use the variable $\tilde{Z}_i = n_P(i)^{-1} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,1}$ as part of the instrumental variable, which was shown to perform well in the Monte Carlo Simulations in Section 5. This variable captures cross-sectional dependence as a crucial source of variation for inference on the strategic interactions. We use downweighting of our instruments as explained in a preceding section.

^21 For this, we assume the exclusion restriction in Acemoglu et al. [4], namely that historical variables only affect prosperity in the same municipality. This means that although one's historical variables (Total Crown Employees, Distance to Royal Roads, Colonial State Agencies and Historical Population, as well as functions thereof) can affect the same municipality's prosperity, they can only affect those of the neighbors by impacting the choice of state capacity in the first, which then impacts the choice of the state capacity in the neighbors.

2.5.4 Results

The results across a range of specifications are presented in Table 2.7. In these results, we see that the effect is statistically different from 0 and stable across specifications. It indicates that there is complementarity in the provision of public goods and state presence ($\beta > 0$).

Let us compare our results to those in Acemoglu et al. [4]. There, the authors report the average marginal effects over their weighted graph. The (weighted) average degree is 0.0329, so our results can be compared approximately by considering $0.0329 \hat{\beta}$.

In general, our estimates have the same sign and significance as those of Acemoglu et al. [4]. Our estimates are in the range of [0.002, 0.013], after reweighting as mentioned before, somewhat comparable to theirs of [0.016, 0.022] (in the case of the outcome of the number of public employees, in Table 3). Hence, we find qualitatively similar effects, although of a smaller magnitude. Recall that our confidence set is built without assuming that $\beta_0$ is consistently estimable.

Table 2.7: State Presence and Networks Effects across Colombian Municipalities

Outcome: The Number of State Employees

                                             Baseline          Distance to Highway   Land Quality       Rivers
                                             (1)               (2)                   (3)                (4)
β0                                           [0.16, 0.31]      [0.15, 0.32]          [0.17, 0.39]       [0.09, 0.38]
dy_i/d(colonial state officials)             [−0.051, 0.001]   [−0.045, −0.001]      [−0.043, 0.000]    [−0.024, 0.003]
Average dy_i/d(colonial state agencies)      [−1.138, 3.760]   [−1.335, 2.742]       [−0.609, 3.388]    [−1.775, 1.987]
Average dy_i/d(distance to Royal Roads)      [−0.010, 0.009]   [−0.008, 0.010]       [−0.007, 0.015]    [−0.005, 0.012]
n                                            1018              1018                  1003               1003

Notes: Confidence sets for $\beta$ are presented in the table, obtained from inverting the test statistic $T(\beta)$ from Section 3, with confidence level of 95%. The critical values in the first row come from the asymptotic statistic. Downweighting is used. The average marginal effects for historical variables upon state capacity are also shown. This is computed from finding the confidence set for the appropriate $\gamma$ estimate. For Colonial State Agencies and Distance to Royal Roads, since they enter in quadratic form in $X_1$, we show the average marginal effect. All specifications include controls of latitude, longitude, surface area, elevation, rainfall, as well as Department and Department capital dummies. Instruments are constructed from the sum over $G_P$ neighbors of the values of the historical variables Total Crown Employees, Colonial State Agencies, Colonial State Agencies squared, population in 1843, distance to Royal Roads, distance to Royal Roads squared, together with the non-linear function $\tilde{Z}_i = n_P(i)^{-1} \sum_{j \in N_P(i)} \lambda_{ij} X_{j,1}$. Column (2) includes distance to current highway in $X_1$. Column (3) expands the specification of Column (2) by also including controls for land quality (share in each quality level). Column (4) controls for rivers in the municipality and land quality, in addition to those controls from Column (1). One can see that the results are very stable across specifications.

In Figure 2.4, we show the results of our estimated network externalities for the estimates from Table 2.7, for the importance of being a department capital. The average network externality is computed from

\[ \frac{1}{N} \sum_{i \in N} \frac{1}{n_P(i)} \sum_{j \in N_P(i)} \frac{\beta_0 \hat{\gamma}_{dc}}{n_P(i) (1 - \beta_0 c_{ii}) (1 - \beta_0 c_{ij})}, \]

where $\hat{\gamma}_{dc}$ is the estimated parameter of the $X_1$ variable department capital, and we vary $\beta_0$ within its confidence set. The parameter is defined in Section 3.1.2, and captures the average effect of a neighbour being a department capital.

The figure shows that there is a strong and increasing network externality from being a department capital over the range of the confidence set of $\beta$. This indicates that the effect of being a capital has spillovers on other municipalities: since $\beta > 0$ and department capitals are expected to have more state presence and resources, the returns from being a department capital increase with the strength of the complementarity.

Figure 2.4: Average Network Externality from being a Department Capital
Notes: The figure presents the average network externalities from being a department capital. We use the estimated results from Column (3) in Table 2.7. This captures the externality for a municipality from being a department capital, which involves higher state presence and centralization of resources. This effect is not only the direct effect, but it also quantifies a reflection effect: neighbors of department capitals also benefit from it. The grey shaded area represents the 95% confidence interval for $\beta_0$.

2.6 Conclusion

This chapter proposes a new approach to empirical modeling of interactions among many agents when the agents observe the types of their neighbors. The main challenge arises from the fact that the information sharing relations are typically connected among a large number of players whereas the econometrician observes only a fraction of those agents. Using a behavioral model of belief formation, this chapter produces an explicit form of best linear responses from which an asymptotic inference procedure for the payoff parameters is developed. As we showed in this chapter, this explicit form gives a reduced form for the observed actions, and exhibits various intuitive features. For example, the best linear responses show that network externality is heterogeneous across agents depending on the relations of their payoff neighbors.

The advantage of this chapter's approach is two-fold. First, the empirical modeling according to our approach accommodates a wide range of sampling processes. Such a feature is crucial because the econometrician rarely has precise knowledge about the actual sampling process through which data are generated. Second, the model can be used when only some of the players are observed from a large connected network of agents.

Chapter 3

Identification Using Attrition

3.1 Introduction

[Attrition is] perhaps the most potentially damaging and frequently-mentioned threat to the value of panel data.
- Moffit, Fitzgerald, and Gottschalk (1999)

Shortly after Heckman's seminal 1976 work on sample selection bias (Heckman [69]), the author diagnosed a related problem - that of selection bias due to non-random sample attrition.^1 Since then, a large literature has emerged proposing solutions to the problem (e.g., Hausman and Wise [66], Moffit et al.
[96]).

^1 See discussion in Moffit et al. [96].

This chapter discusses the value associated with the information contained in attrition patterns. The key insight is as follows: provided that agents attrit from a sample differently according to their unobserved attributes, the researcher can use this information to back out the distribution of unobserved heterogeneity. Overall, information on attrition can be used to address a variety of interesting identification problems - both in economics and beyond.

A key related work that recognizes the value of attrition for identification is Bellemare [16]. The author uses sample attrition to identify the probability that a migrant leaves his adopted country (termed "outmigration") when this decision is unobserved by the researcher. Using identification arguments from Lewbel [84], the author shows that the outmigration probability can be identified from sample attrition when the researcher observes a continuous variable that affects attrition solely through outmigration via a single index. The author stresses that, in contrast with the literature on mismeasured binary variables (e.g., Hausman et al. [68], Lewbel [84]), the key properties of attrition imply that misclassification is only possible when attrition occurs. One purpose of this chapter is to stress that attrition may be informative for unobserved heterogeneity in a more general sense.

This chapter is related to the literature on identifying nonlinear models with measurement error. One way to view the role of attrition in this chapter is as a type of (possibly repeated) measurement of the unobserved quantity. However, there are at least two important differences between attrition and repeated measurements as they are typically considered in the nonlinear measurement error literature. The first difference recognizes the strong possibility that an agent's attrition decisions will be correlated across time - even conditional on the unobserved heterogeneity itself. In Cunha et al.
[43], the distribution of unobservables is identified when repeated measurements of the unseen quantity satisfy an assumption of mutual conditional independence.^2,3 A second key difference between attrition and traditional measurements is the tendency for attrition to be irrevocable, which imparts a special structure to the data. A common data structure when the researcher sees whether or not an agent has attrited in a $T$-period sample is as follows: if the agent has attrited at any period $t$ with $t \le T$, the researcher sees a $t \times 1$ vector of ones with a zero in the last entry, and a $T \times 1$ vector of ones otherwise. Throughout this chapter, we will maintain the assumption that attrition is an absorbing state - in other words, once an agent has attrited from the researcher's sample, they are gone from that sample for good.^4

^2 This is Assumption 2 of Theorem 2 of Cunha et al. [43]. This is also equivalent to Assumption 2 of Hu and Schennach [72] (see Cunha et al. [44]).

^3 In fact, in the absence of additional information, Lemma 3.2.1 shows how such conditional independence can lead to non-identification. Examples are then provided where parametric assumptions and temporal dependence may restore identification.

^4 This is what Robins et al. [101] call a monotone missing data pattern.

Far from being a nuisance, dependence in the attrition process over time may be precisely what is informative for the distribution of unobserved heterogeneity under certain functional form restrictions - even when there are no covariates. In Section 3.2.3, we show how an analogue of the hazard function familiar from the econometrics of duration analysis arises as a natural byproduct of a particular specification of the attrition process. Thus, this chapter is related to a literature that shows how duration models can be used to identify unobserved heterogeneity.^5

Another relevant literature is that concerned with identification in nonlinear panel models with mismeasurement. Like Hausman et al.
[67] and Schennach [106], we consider repeated measurements of the mismeasured variable. Although we use repeated measurements of attrition as information on heterogeneity, the relationship is clearly not linear. Moreover, the models we discuss are not strictly of a panel nature, since the researcher does not need to observe the outcome variable for more than one period.

As the main goal of this chapter is to highlight the usefulness of attrition data for identification in as simple an exposition as possible, we restrict ourselves to discrete forms of unobserved heterogeneity and focus on the problem of identification in parametric models.^6 Moreover, to stress the usefulness of focusing on discrete unobserved heterogeneity, we center much of our discussion of identification on a new parameter known as the Type-Targeted Treatment Effect (TTTE).^7 In identifying the TTTE, we will suppose that the main outcome variables satisfy a non-differential error assumption.^8 In other words, attrition does not affect the outcome of interest conditional on the unobserved heterogeneity.

The remainder of this chapter is structured as follows. In Section 3.2 we introduce the notion of the TTTE and discuss identification. In Section 3.3 we propose a minimum distance inference procedure appropriate for our setup. A short Monte Carlo simulation shows the procedure performs acceptably in finite samples.
In Section 3.4 we show how - in conjunction with the preceding discussions on identification - attrition data can be used to correct for selection on unobservables.

^5 See, e.g., pages 88-93 in Heckman and Singer [70] for a discussion of nonparametric identification of duration models.

^6 Williams [111] shows how continuous forms of heterogeneity may not be identified in models with binary proxies.

^7 This parameter was first proposed in work-in-progress the author is pursuing with Hugo Borges Jales, titled "Type-Targeted Treatment Effects".

^8 This assumption has been employed widely in the measurement error literature (e.g., see Hu [71] Assumption 1 and associated references).

3.2 Identification of the Type-Targeted Treatment Effect

The Average Treatment Effect (ATE) is a parameter informative for the mean change in outcome induced by assigning someone to one treatment (or policy) state of interest compared to another. The ATE is widely used for evaluating treatments in many fields for its simplicity of implementation and ease of interpretation. Despite these advantages, a naïvely deployed ATE may lead the researcher astray. For instance, a researcher interested in the mean impact of a new drug versus the industry standard would ideally select a sample of individuals who actually have the disease the drug is designed to treat. Unfortunately, a researcher will likely have to face an imperfectly composed sample, due to the difficulty of accurately observing the health status of each patient when the decision to admit them into the study is made.

We suppose that an agent can have one of two possible types - either the agent is a 'target' type, $\theta_1$, or a 'mistarget', $\theta_0$. Let $\tilde{Y}_i = Y_{i1} - Y_{i0}$, where $Y_{i1}$ represents the person's potential outcome when they receive the treatment and $Y_{i0}$ represents their potential outcome without the treatment.
For example, in a study of the impact of financial assistance on graduate student outcomes, the type might capture whether the student is an 'academic type' (e.g., has a talent and passion for research), and $\tilde{Y}_i$ may represent a student's potential gain from receiving financial assistance in terms of the student's course grades. We denote the probability of targeted agents in the population as $p = P(\theta_i = \theta_1)$. By the law of iterated expectations, we can write the standard average treatment effect (ATE)^9 as

\[ a = a_1 p + a_0 (1 - p), \tag{3.1} \]

where the type-targeted and (mis-)targeted treatment effects are defined as

\[ a_1 \equiv E[\tilde{Y}_i \mid \theta_1], \quad \text{and} \quad a_0 \equiv E[\tilde{Y}_i \mid \theta_0], \]

where we write $E[\tilde{Y}_i \mid \theta] = E[\tilde{Y}_i \mid \theta_i = \theta]$ for each type for notational convenience.

^9 Conditioning on covariates is trivial, so we omit them for notational convenience. At this stage, the reader can implicitly assume that treatment assignment is independent of potential outcomes, or, as in a model with covariates, conditionally independent of potential outcomes. This simplification is done to highlight the main issues of unobserved heterogeneity and attrition.

The average treatment effect is a weighted average of the treatment effect on the targets, $a_1$, and the mistargets, $a_0$, where the weights are given by the proportion of individuals of each type in the sample. Since the effect of the policy on the targets is the true object of interest, we can think of $a_1$ as a 'Type-Targeted' Treatment Effect (TTTE). Provided that $p$ is strictly less than one, there will necessarily be some mistargets in the population. Depending on how potential outcomes are affected by type, the ATE and TTTE may differ substantially. For instance, if $p$ is low then $a$ may be very small even when $a_1$ is not.
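A small numerical sketch may help fix ideas about how equation (3.1) dilutes the effect on the targets. The values of $a_1$, $a_0$ and $p$ below are purely hypothetical, as is the function name `average_treatment_effect`.

```python
def average_treatment_effect(a1, a0, p):
    """Equation (3.1): the ATE as a type-weighted average, a = a1 * p + a0 * (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return a1 * p + a0 * (1.0 - p)


# Hypothetical effects: targets gain one unit, mistargets gain nothing.
a1, a0 = 1.0, 0.0
for p in (0.9, 0.5, 0.1):
    ate = average_treatment_effect(a1, a0, p)
    print(f"p = {p:.1f}: ATE = {ate:.2f}, TTTE = {a1:.2f}")
```

With these illustrative values the ATE shrinks in proportion to the share of targets, while the TTTE is unchanged.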
In such a case, a researcher who considers the ATE and ignores the possibility of mistargeting may erroneously conclude that financial assistance does not improve student outcomes, or - in the medical example - that a drug is not an improvement over the status quo for the subjects of interest.

Type mistargeting is also a relevant issue in economics more broadly. When examining how receiving disability insurance affects the labour supply decision (e.g., Maestas et al. [89]), it is natural to question what impact such policies would have on the subpopulation of agents who actually suffer from chronic disability. The question is not obvious, since disability status is typically only imperfectly observed. In welfare reform studies, policy changes such as workfare legislation may make it less desirable for noncompliers to receive welfare in both the treatment and control groups of the study.^10

^10 Economic theory can help us understand how people of different types respond to such policy variations. For example, in Besley and Coate [18] work requirement legislation can induce full separation by type; i.e., only the needy find it incentive compatible to continue collecting welfare payments after the policy change is introduced. Here the policy acts as a screening mechanism.

3.2.1 General Setup

Agents have one of two types, $\theta_1$ or $\theta_0$, where $\theta_1$ is the target type of individuals in the study and $\theta_0$ is the mistargeted type. Our objects of identification are the type-targeted treatment effect on the targets, $a_1$, the treatment effect on the mistargets, $a_0$, and the marginal type probability $p$. In this section, we will write the parameters of interest as functions of differences in the potential outcomes, to avoid discussing separate identification issues that have been well explored in the literature. In all that follows, we assume that the econometrician observes some outcomes, $Y_i$, a treatment, $D_i$, and a variable $\{L_{ij}\}_{j=0}^{J_i}$ for all agents, where $J_i \ge 1$ for all $i$.
$L_{ij}$ is a variable that takes value $l_0$ if agent $i$ fails the $j$th test and $l_1$ if the agent passes it. We can think of "tests" as institutional changes or policy interventions, the "score" being a binary behavioral response to the policies, such as attrition from the econometrician's sample. We also suppose that once an agent attrits from the sample they are gone for good; that is, $L_{ij} = l_0$ implies that $J_i = j$.

The broad intuition of our approach is as follows: once the outcomes of agents are observed, behavior of participants that is both type-specific and observable may, under certain assumptions, allow the econometrician to compute causal effects in a way that is targeted to the subpopulation of interest.

3.2.2 Identification Under One Period of Attrition

In this section, we consider what can be learned when the econometrician observes at most one test for each agent. That is, $J_i = 1$ for all $i$. By the law of iterated expectations, we know that the ATE conditional on passing one test can be written as:

\[ E[\tilde{Y}_i \mid L_i = l_1] = E[\tilde{Y}_i \mid \theta_1, l_1] P(\theta_1 \mid l_1) + E[\tilde{Y}_i \mid \theta_0, l_1] P(\theta_0 \mid l_1). \]

We now introduce an assumption that reduces the dimensionality of the identification problem.

Assumption 3.2.1. For each $i$, $\tilde{Y}_i$ is independent of $L_i$ conditional on $\theta_i$.

Assumption 3.2.1 implies that the difference in the agent's potential outcomes is conditionally independent of the previous test outcomes once the agent's true type is accounted for. Effectively, the only way the treatment effect varies once we condition on passing the test is through changes in the composition of each type.^11 The reasonableness of Assumption 3.2.1 should be carefully justified for each application. This condition fails if the probability that an individual of a particular type passes a test is conditionally correlated with her gain from treatment. For example, suppose that $L_i$ represents whether or not a student decides to drop out of a graduate program during the first year.
Then Assumption 3.2.1 is reasonable if the researcher believes that, conditional on the student's type, the difference in the student's first term grade-point average with and without financial assistance, $\tilde{Y}_i$, does not influence their decision to drop out of the program. Under Assumption 3.2.1, the previous equation simplifies to:

\[
E[\tilde{Y}_i \mid l_1] = a_1 \lambda p/\ell + (1 - \lambda p/\ell)\,a_0,
\]

where $\lambda = P(L_i = l_1 \mid \theta_i = \theta_1)$ and $\ell = P(L_i = l_1)$. Therefore, provided that $p > 0$ and $\lambda \neq \ell$, the difference in TTTE, $\Delta \equiv a_1 - a_0$, can be written as:

\[
\Delta = \frac{\left(E[\tilde{Y}_i \mid l_1] - E[\tilde{Y}_i]\right)\ell}{(\lambda - \ell)\,p}. \tag{3.2}
\]

Since $p$ and $\lambda$ are unknown, $\Delta$ is unidentified in the absence of further information. However, the sign of $\Delta$ may be identified if the researcher knows that targeted types are strictly more or strictly less likely to pass than the mistargeted agents; e.g., $\lambda - \ell > 0$.

In the education example, the parameter $a_1$ captures the expected difference in the student's first year grades with and without financial assistance for those students of innately high quality, while $a_0$ captures the same for the students of low quality.

¹¹ This is an exclusion restriction of the kind that is used in Section 4.1 of Bellemare [16].

Identification Using Covariates

Since $\lambda$ and $p$ are unknown, we now explore how these can be learned when attrition varies according to both the agent's unknown type and some observable characteristics. Consider the joint distribution of attrition and a variable $X_i$ with support $\mathcal{X} \subset \mathbb{R}^m$:

\[
P(L_i = l_1, X_i = x) = \sum_{\theta} P(l_1 \mid x, \theta)\,P(x \mid \theta)\,P(\theta).
\]

We will consider a parametric approach to identification. In particular, we will suppose that the researcher is willing to take a stand on the attrition process.

Assumption 3.2.2. a) $L_i = 1\{\gamma'\psi(X_i, \theta_i) + \eta_i > 0\}$, where $\eta_i$ is continuous with known cdf, $\Psi$, and where $\psi : \mathbb{R}^{m+1} \to \mathbb{R}^d$ is a known, non-stochastic function. b) $X_i$ is independent of $\theta_i$ for each $i$.

Denote $g(\tau, x) \equiv P_\tau(l_1 \mid x)$ for each $\tau \in \mathcal{T}$. Our model is identified if there exists no $\tau \neq \tau_0$ such that $P_\tau(l_1 \mid x) = P(l_1 \mid x)$ for all $x \in \mathcal{X}$.
Under Assumption 3.2.2, we can express the conditional probability of passing a test as

\[
g(\tau, x) = \Psi(\gamma'\psi(x, \theta_1))\,p + \Psi(\gamma'\psi(x, \theta_0))\,(1 - p),
\]

where $\tau = (\gamma', p)$. We can approach identification in our context by considering solutions to $k \geq d + 1$ non-linear equations of the following form

\[
g(\tau_0) - \pi_0 = 0, \tag{3.3}
\]

where $g(\tau_0) = (g(\tau_0, x_1), \ldots, g(\tau_0, x_k))'$ and $\pi_0 = (P(l_1 \mid x_1), \ldots, P(l_1 \mid x_k))'$. Sufficient conditions for identification of $\tau_0$ are that the parameter space, $\mathcal{T}$, is a compact subset of $\mathbb{R}^k$, $g(\tau)$ is continuous on $\mathcal{T}$, and the solution to equation 3.3 is unique. Demonstrating that equation 3.3 has a unique solution is not straightforward. Instead, we may argue that local identification should hold for many models of interest satisfying our functional form assumptions.¹² In our case, $\tau_0$ is locally identified if $g$ is continuously differentiable in $\tau$ over $\mathcal{T}$, and the Jacobian of $g(\tau_0)$ has rank $k$.¹³ If $p$ and $\gamma$ are identified, then we may also establish the identification of the difference in type targeted effects, $\Delta$, from equation 3.2, since

\[
\lambda = \sum_{x \in \mathcal{X}} g(\tau_0, x)\,P(x).
\]

¹² See Rothenberg [103], Sargan [105], and page 2127 of Newey and McFadden [97].

¹³ This is the standard condition for local identification in minimum distance models (see page 2128 of Newey and McFadden [97]).

In order to separately identify $a_1$ and $a_0$, we may extend Assumption 3.2.2 so that $X_i$ also satisfies an exclusion restriction.

Assumption 3.2.3. (i) $\tilde{Y}_i$ is independent of $L_i$ conditional on $\theta_i$ and (ii) $\tilde{Y}_i$ is independent of $X_i$ conditional on $\theta_i$.

Assumption 3.2.3 says that the change in potential outcomes is not influenced by $X_i$ once the type is taken into account.
Returning to the education example from before, suppose $X_i$ measures the student's experience in the private sector. Then, conditional on the student's ability, it may be reasonable to imagine that the student's private sector experience influences their likelihood of dropping out of graduate school in the first year without affecting their first term grades (with and without financial assistance). The usefulness of Assumption 3.2.3 for identifying the model can be seen by considering equations of the form

\[
E[\tilde{Y}_i \mid L_i = l_1, X_i = x] = E[\tilde{Y}_i \mid l_1, x, \theta_1]\,P(\theta_1 \mid l_1, x) + E[\tilde{Y}_i \mid l_1, x, \theta_0]\,\bigl(1 - P(\theta_1 \mid l_1, x)\bigr).
\]

Under Assumption 3.2.3 we have:

\[
E[\tilde{Y}_i \mid l_1, x, \theta_1] = a_1, \quad \text{and} \quad E[\tilde{Y}_i \mid l_1, x, \theta_0] = a_0.
\]

Therefore, using Bayes' Rule and Assumption 3.2.2 we have¹⁴

\[
E[\tilde{Y}_i \mid L_i = l_1, X_i = x] = a_1 \frac{P(l_1 \mid x, \theta_1)}{P(l_1 \mid x)}\,p + a_0 \left(1 - \frac{P(l_1 \mid x, \theta_1)}{P(l_1 \mid x)}\,p\right).
\]

¹⁴ That is,
\[
P(\theta_1 \mid l_1, x) = \frac{P(l_1 \mid x, \theta_1)\,P(\theta_1, x)}{P(l_1, x)} = \frac{P(l_1 \mid x, \theta_1)\,P(\theta_1)\,P(x)}{P(l_1, x)} = \frac{P(l_1 \mid x, \theta_1)\,P(\theta_1)}{P(l_1 \mid x)}.
\]

We wish to identify the following $d + 3$ parameters: $a_1$, $a_0$, $p$ and $\gamma$. For the purposes of this exposition, we use the identification of $\gamma$ and $p$ from the previous section. Denote

\[
y(x) \equiv E[\tilde{Y}_i \mid l_1, x], \quad \text{and} \quad c(x) \equiv \frac{P(l_1 \mid x, \theta_1)}{P(l_1 \mid x)}\,P(\theta_1).
\]

Then, provided $C^{-1}(x)$ exists, the TTTE, $\alpha = (a_1, a_0)'$, are identified as

\[
\alpha = C^{-1}(x)\,y,
\]

where $y = (y(x_1), y(x_2))'$, and $C(x)$ is a $2 \times 2$ matrix with row $r$ equal to $(c(x_r), 1 - c(x_r))$.

Example Local Identification Using Covariates

In this section, we provide some simple examples of specifications for the attrition process that admit local identification. Take $L_i = 1\{\tau_1 \theta_i X_i + \eta_i > 0\}$, where $\tau_1 \in \mathbb{R}$, $\eta_i$ is standard normal, $\theta_i$ takes values in $\{0, 1\}$, and $X_i$ takes at least two distinct values. Then we can express the conditional probability that $L_i = 1$ given $X_i = x$ as

\[
g(\tau, x) = \Psi(\tau_1 x)\,\tau_2 + \Psi(0)\,(1 - \tau_2),
\]

where $\Psi(\cdot)$ is the standard normal cdf and $\tau_2 \equiv P(\theta_i = 1) \in (0, 1)$. In this example, we may compute the Jacobian matrix, $J = \partial g(\tau)/\partial \tau'$, explicitly. The rank condition is satisfied when the determinant of $J$ is nonzero.
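As a numerical illustration of this rank condition, the following minimal sketch evaluates the Jacobian of $g(\tau, x) = \Psi(\tau_1 x)\tau_2 + \Psi(0)(1 - \tau_2)$ at two support points. The support points $x \in \{1, 2\}$, the evaluation point $(\tau_1, \tau_2) = (0.8, 0.5)$, and all function names are assumptions made here for illustration only; they are not taken from the thesis.

```python
import math


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def g(tau1, tau2, x):
    """Mixture probability that L = 1 given X = x in the example."""
    return norm_cdf(tau1 * x) * tau2 + norm_cdf(0.0) * (1.0 - tau2)


def jacobian(tau1, tau2, xs=(1.0, 2.0)):
    """Analytic Jacobian: one row per support point, columns (d/dtau1, d/dtau2)."""
    return [[x * norm_pdf(tau1 * x) * tau2, norm_cdf(tau1 * x) - norm_cdf(0.0)]
            for x in xs]


def numeric_jacobian(tau1, tau2, xs=(1.0, 2.0), h=1e-6):
    """Central-difference Jacobian, used only as a cross-check."""
    rows = []
    for x in xs:
        d1 = (g(tau1 + h, tau2, x) - g(tau1 - h, tau2, x)) / (2.0 * h)
        d2 = (g(tau1, tau2 + h, x) - g(tau1, tau2 - h, x)) / (2.0 * h)
        rows.append([d1, d2])
    return rows


def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


if __name__ == "__main__":
    # Hypothetical evaluation point; any tau1 != 0 with tau2 in (0, 1) can be tried.
    print("det J at (0.8, 0.5):", det2(jacobian(0.8, 0.5)))
    # At tau1 = 0 the second column is identically zero, so the rank condition fails.
    print("det J at (0.0, 0.5):", det2(jacobian(0.0, 0.5)))
```

The determinant is nonzero at the illustrative point, while at $\tau_1 = 0$ the covariate carries no information about type and the Jacobian is singular.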
Therefore, $\tau$ is locally identified for the current example provided that

\[
\frac{\Psi'(\tau_1)}{2\,\Psi'(2\tau_1)} \neq \frac{\Psi(\tau_1) - \Psi(0)}{\Psi(2\tau_1) - \Psi(0)}, \tag{3.4}
\]

noting that we have assumed that $X_i$ takes values in the set $\{1, 2\}$, without any loss of generality. Similar arguments can be used to establish the identification of $\tau_1$, $\tau_2$, and $p$ when the attrition process is given by $L_i = 1\{\tau_1 X_i + \tau_2 \theta_i + \eta_i > 0\}$ (as in 3.8), provided that $X_i$ has additional support points.

3.2.3 Identification Using Multiple Periods of Attrition

In this section, we investigate what can be learned from outcomes and attrition alone. That is, the researcher observes the outcome $Y_i$ for all agents and two rounds of attrition, $L_{i1}$ and $L_{i2}$, but no covariates $X_i$. We will show that, even in the absence of additional information, the researcher can back out heterogeneity using attrition alone, provided they are willing to make parametric assumptions about the attrition process.

We are interested in identifying $a_1$, $a_0$, and $p$. Consider the average treatment effect conditional on subsequent rounds of the attrition variable:

\[
E[\tilde{Y}_i \mid L_{i1} = l_1] = E[\tilde{Y}_i \mid \theta_1, L_{i1} = l_1]\,\frac{P(L_{i1} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1)} - E[\tilde{Y}_i \mid \theta_0, L_{i1} = l_1]\,\frac{P(L_{i1} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1)} + E[\tilde{Y}_i \mid \theta_0, L_{i1} = l_1],
\]

and

\[
E[\tilde{Y}_i \mid L_{i1} = l_1, L_{i2} = l_1] = E[\tilde{Y}_i \mid \theta_1, L_{i1} = l_1, L_{i2} = l_1]\,\frac{P(L_{i1} = l_1, L_{i2} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1, L_{i2} = l_1)} - E[\tilde{Y}_i \mid \theta_0, L_{i1} = l_1, L_{i2} = l_1]\,\frac{P(L_{i1} = l_1, L_{i2} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1, L_{i2} = l_1)} + E[\tilde{Y}_i \mid \theta_0, L_{i1} = l_1, L_{i2} = l_1].
\]

To identify the model, we extend the conditional independence of tests on outcomes from Assumption 3.2.1 to accommodate multiple periods of attrition.

Assumption 3.2.4. $\tilde{Y}_i$ is independent of $\{L_{ij}\}_{j=1}^{J}$ conditional on $\theta_i$, for all $J \geq 1$.

Under Assumption 3.2.4 we can rewrite the preceding equations as

\[
E[\tilde{Y}_i \mid L_{i1} = l_1] = (a_1 - a_0)\,\frac{P(L_{i1} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1)} + a_0,
\]

and

\[
E[\tilde{Y}_i \mid L_{i1} = l_1, L_{i2} = l_1] = (a_1 - a_0)\,\frac{P(L_{i1} = l_1, L_{i2} = l_1 \mid \theta_1)\,p}{P(L_{i1} = l_1, L_{i2} = l_1)} + a_0.
\]

Assumption 3.2.4 can also be illustrated using the education example we have been considering.
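The assumption holds provided that the outcome of interest (such as first-year course grades) does not induce attrition at any subsequent phase of the program once we have controlled for the student's underlying type.

Non-Identification Under Conditionally Independent Attrition

When the attrition process is unknown, we have too many parameters to identify. To reduce the dimensionality of the problem, one possibility is to assume that $L_{ij}$ is independent across $j$ given $\theta_i$. Regrettably, the following result shows that the model cannot be identified under such a strong assumption.

Lemma 3.2.1. Suppose that Assumption 3.2.4 holds. Then $(a_1, a_0, p, \lambda)$ are not identified when $L_{ij}$ are independent across $j$ given $\theta_i$.

Proof. Define $\ell_1 \equiv P(L_{i1} = l_1)$, $\ell_2 \equiv P(L_{i1} = l_1, L_{i2} = l_1)$, and $\ell_3 \equiv P(L_{i1} = l_1, L_{i2} = l_1, L_{i3} = l_1)$. Under the assumption that $L_{ij}$ is independent across $j$ given $\theta_i$, the system of equations that we consider to identify the model is

\[
\begin{aligned}
E[\tilde{Y}_i] &= a_1 p + (1 - p)\,a_0, \\
E[\tilde{Y}_i \mid L_{i1} = l_1] &= a_1 \lambda p/\ell_1 + a_0 (1 - \lambda p/\ell_1), \\
E[\tilde{Y}_i \mid L_{i1} = l_1, L_{i2} = l_1] &= a_1 \lambda^2 p/\ell_2 + a_0 (1 - \lambda^2 p/\ell_2), \text{ and} \\
E[\tilde{Y}_i \mid L_{i1} = l_1, L_{i2} = l_1, L_{i3} = l_1] &= a_1 \lambda^3 p/\ell_3 + a_0 (1 - \lambda^3 p/\ell_3).
\end{aligned}
\]

The third equality uses the assumption that $P(L_{i1} = l_1, L_{i2} = l_1 \mid \theta_i = \theta_1) = P(L_{i1} = l_1 \mid \theta_i = \theta_1)^2 \equiv \lambda^2$ (and similarly for the fourth equality). Recall that the Jacobian of this nonlinear system is defined as $J = [j_1, j_2, j_3, j_4]$, where for $k = 1, \ldots, 4$,

\[
j_k \equiv \left(\frac{\partial g_1}{\partial \theta_k}, \ldots, \frac{\partial g_4}{\partial \theta_k}\right)',
\]

and $\theta_k$ here denotes the $k$th element of the parameter vector $(a_1, a_0, p, \lambda)$. Denote $\Delta = a_1 - a_0$. In our case, since

\[
\begin{aligned}
j_1 &= \left(p,\ \frac{\lambda p}{\ell_1},\ \frac{\lambda^2 p}{\ell_2},\ \frac{\lambda^3 p}{\ell_3}\right)', \\
j_2 &= \left(1 - p,\ 1 - \frac{p\lambda}{\ell_1},\ 1 - \frac{\lambda^2 p}{\ell_2},\ 1 - \frac{\lambda^3 p}{\ell_3}\right)', \\
j_3 &= \left(\Delta,\ \frac{\Delta\lambda}{\ell_1},\ \frac{\Delta\lambda^2}{\ell_2},\ \frac{\Delta\lambda^3}{\ell_3}\right)', \\
j_4 &= \left(0,\ \frac{\Delta p}{\ell_1},\ \frac{2\Delta\lambda p}{\ell_2},\ \frac{3\Delta\lambda^2 p}{\ell_3}\right)',
\end{aligned}
\]

we have that $\det(J) = 0$.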
This follows from the fact that, for the choice of $c_1 = -1$, $c_2 = 0$, $c_3 = p/\Delta$, $c_4 = 0$, we have

\[
c_1 j_1 + c_2 j_2 + c_3 j_3 + c_4 j_4 = 0.
\]

Hence the model is not identified.

The following section illustrates the value of dependence in the attrition process for identification.

Example of Local Identification Under Two Periods of Conditionally Dependent, Monotone Attrition

In this section, we illustrate local identification of the unobserved heterogeneity when the researcher only has information on attrition, but is willing to make parametric assumptions on the attrition process. Define $\mathcal{L}_{j-1} \equiv \{L_{ik}\}_{k=0}^{j-1}$. Suppose, for example, that $L_{i0} = 1$ and the attrition process is given recursively for $j > 0$ as

\[
L_{ij} = 1\{\gamma_1 \theta_i\,h(\mathcal{L}_{j-1}) + \eta_j > 0\}, \tag{3.5}
\]

where $h(\cdot)$ is a known, non-stochastic function.

Define $h_{j-1} \equiv h(\mathcal{L}_{j-1})$. Then the system of nonlinear equations that we must solve to identify $p$ and $\gamma_1$ is:

\[
P(L_{i1} = 1) = \Psi(\gamma_1 h_0)\,p + \Psi(0)\,(1 - p), \quad \text{and} \quad P(L_{i2} = 1 \mid L_{i1} = 1) = \Psi(\gamma_1 h_1)\,p + \Psi(0)\,(1 - p).
\]

The following additional assumption on $h$ allows for convenient discussion of identification of the model with conditionally dependent attrition.

Assumption 3.2.5. $h$ is monotone in $j$.

An example of an $h$ that satisfies Assumption 3.2.5 is one where $h_{j-1} = \sum_{k=0}^{j-1} L_{ik}$ for each $j$, implying that $h_0 = 1$ and $h_1 = 2$ in the preceding equations. Then, $\gamma_1$ and $p$ are locally identified provided the Jacobian is nonsingular, as it will be under a condition that is identical to equation 3.4. On the other hand, one additional period of attrition is required to identify a specification such as:

\[
L_{ij} = 1\{\gamma_1 \theta_i + \gamma_2\,h(\mathcal{L}_{j-1}) + \eta_i > 0\}. \tag{3.6}
\]

We now take a moment to highlight the source of identification of the model parameters using this type of approach. Continuing with the example from equation 3.5, we can write the probability that an agent attrits in the $j$th period given that the agent has never previously attrited as:

\[
\nu_j \equiv 1 - \bigl(p\,\Psi(\gamma_1 h_{j-1}) + (1 - p)\,\Psi(0)\bigr). \tag{3.7}
\]

In our setup, the object $\nu_j$ fulfills a similar conceptual purpose to the hazard function that is familiar from the econometric literature on duration analysis that began in the 1970s (e.g., Cox [42], Lancaster [82]). For example, under a positive monotonic $h$, it is clear that the pseudo-hazard of 3.7, $\nu_j$, is increasing in $j$ when $\gamma_1 < 0$, and that $\nu_j$ is decreasing in $j$ when $\gamma_1 > 0$. Thus, the sign of $\gamma_1$ determines whether the attrition process exhibits positive or negative duration dependence. The parameters of the model also govern how the likelihood of attrition changes with $j$. Here, a key difference from the usual duration literature is that we did not model the duration process; we instead modeled the underlying economic phenomenon of interest (attrition). The pseudo-hazard in 3.7 simply arises as a byproduct of our attempt to identify the model using properties of an attrition process that we specify in reduced form, such as equation 3.5.¹⁵

3.3 Estimation and Inference

In this section, we propose a general inference strategy for the models we have considered. In all cases, the parameter of interest, $\tau_0 \in \mathbb{R}^k$, can be expressed as the solution to a system of non-linear equations of the form

\[
g(\tau_0) = \pi_0,
\]

where $g(\cdot)$ is a known, non-stochastic function.

For the problem of identifying the TTTE, we are interested in the parameters $\tau_0 = (a_1, a_0, \gamma_0', p_0)$, where $a_1$ and $a_0$ are the TTTE, $p_0$ represents the distribution of discrete, unobserved heterogeneity, and $\gamma_0$ represents the parameters of the attrition process.
As for $\pi_0$, the example in Section 3.2.3, which concerns identification without covariates,¹⁶ considers the moments $\pi_0 = (\pi_{01}, \pi_{02})'$, where

\[
\pi_{01} = \bigl(E[\tilde{Y}_i],\ E[\tilde{Y}_i \mid L_{i1} = l_1],\ E[\tilde{Y}_i \mid L_{i1} = l_1, L_{i2} = l_1]\bigr), \quad \text{and} \quad \pi_{02} = \bigl(P(L_{i1} = l_1),\ P(L_{i2} = l_1 \mid L_{i1} = l_1)\bigr).
\]

Letting $\hat{\pi}_n$ be the sample analogue estimator of $\pi_0$ and $A_n$ be a random $k \times k$ weight matrix, we can define a minimum distance estimator, $\hat{\tau}_n$, as the minimizer of the following criterion function over $\tau \in \mathcal{T}$, where $\mathcal{T} \subset \mathbb{R}^k$:

\[
Q_n(\tau) = \|A_n(\hat{\pi}_n - g(\tau))\|^2 / 2.
\]

Under the usual assumptions, the asymptotic covariance matrix of $\hat{\tau}_n$ has the familiar sandwich form

\[
\Sigma_0 = B_0^{-1}\,\Omega_0\,B_0^{-1},
\]

where $B_0 = \Gamma_0' A' A \Gamma_0$, $\Omega_0 = \Gamma_0' A' A V_0 A' A \Gamma_0$, and $\Gamma_0 \equiv \frac{\partial}{\partial \tau'} g(\tau_0)$. In the following section, we consider inference on $\tau_0$ when $\Sigma_0$ is estimated using $\hat{\Gamma}_n = \frac{\partial}{\partial \tau'} g(\hat{\tau}_n)$ and $V_0$ is estimated using a non-parametric bootstrap procedure.

¹⁵ Although this chapter has focused on monotone attrition patterns, one can consider relaxing Assumption 3.2.5. For example, we may take an attrition process such as one where $L_{ij} = 1\{\gamma_1 \theta_i (h_{j-1} - a)^2 + \eta_j > 0\}$ with $a$ unknown. Although the local identification argument in such a case is similar to before, the rank condition becomes harder to verify.

¹⁶ In the case where covariates are used to identify $\tau_0$, $\pi_0$ follows a very similar structure, except we use the moments of $X_i$ and the conditional moments of $\tilde{Y}_i$ given $X_i$.

3.3.1 Small Simulation Exercise

In this section, we assess the finite sample performance of the minimum distance inference approach outlined in the previous section. We consider the following model for the attrition equation:

\[
L_i = 1\{\gamma_1 X_{i1} + \gamma_2 X_{i2}\theta_i + \eta_i > 0\},
\]

where $\eta_i$ is standard normal, and $X_{i1}$ and $X_{i2}$ are both discrete uniform on $\mathcal{X} = \{1, 2, 3, 4\}$. $\theta_i$ takes the value 1.5 with probability $p$ and 0.5 with probability $1 - p$. $X_{i1}$, $X_{i2}$, and $\theta_i$ are all independent of one another and drawn independently across $i$ in each iteration of the simulations. Table 3.1 considers the size and power properties of the minimum distance estimation.
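The design above can be sketched in a few lines of code. The sketch below is not the thesis's implementation: it assumes an identity weight matrix, uses the pass frequency in each $(X_{i1}, X_{i2})$ cell as the moment vector, and evaluates the criterion at two parameter values rather than minimizing it. The value $\tau_0 = (-0.65, 0.8, 0.5)'$ and the starting value $(0, 0, 0.5)'$ are those reported for this exercise; the function names, the sample size, and the seed are hypothetical.

```python
import math
import random

SUPPORT = (1, 2, 3, 4)            # support of X_i1 and X_i2
THETA_HIGH, THETA_LOW = 1.5, 0.5  # the two values of theta_i


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pass_prob(tau, x1, x2):
    """Model-implied P(L = 1 | X1 = x1, X2 = x2), mixing over the two types."""
    g1, g2, p = tau
    return (p * norm_cdf(g1 * x1 + g2 * x2 * THETA_HIGH)
            + (1.0 - p) * norm_cdf(g1 * x1 + g2 * x2 * THETA_LOW))


def simulate(tau, n, seed=0):
    """Draw n observations (x1, x2, L) from the attrition equation."""
    rng = random.Random(seed)
    g1, g2, p = tau
    data = []
    for _ in range(n):
        x1 = rng.choice(SUPPORT)
        x2 = rng.choice(SUPPORT)
        theta = THETA_HIGH if rng.random() < p else THETA_LOW
        eta = rng.gauss(0.0, 1.0)
        data.append((x1, x2, int(g1 * x1 + g2 * x2 * theta + eta > 0)))
    return data


def cell_frequencies(data):
    """Sample analogue of P(L = 1 | x1, x2) for every observed cell."""
    counts = {}
    for x1, x2, passed in data:
        total, hits = counts.get((x1, x2), (0, 0))
        counts[(x1, x2)] = (total + 1, hits + passed)
    return {cell: hits / total for cell, (total, hits) in counts.items()}


def criterion(tau, freqs):
    """Q_n(tau) = ||pi_hat - g(tau)||^2 / 2 with an identity weight matrix."""
    return 0.5 * sum((freqs[cell] - pass_prob(tau, *cell)) ** 2 for cell in freqs)


if __name__ == "__main__":
    tau0 = (-0.65, 0.8, 0.5)
    freqs = cell_frequencies(simulate(tau0, n=20000, seed=1))
    print("Q_n at tau0:          ", criterion(tau0, freqs))
    print("Q_n at starting value:", criterion((0.0, 0.0, 0.5), freqs))
```

A full replication would minimize the criterion over $\tau$ and bootstrap $V_0$; the sketch only shows that the criterion is close to zero at the data-generating value and clearly larger away from it.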
We set the parameter values $\gamma_1 = -0.65$, $\gamma_2 = 0.8$, and $p = 0.5$. Although the procedure suffers from size distortion in very small samples, its performance improves drastically as the sample size increases. Moreover, the procedure does not appear particularly sensitive to the choice of starting value.

Table 3.1: The Empirical Coverage Probability and Average Length of Confidence Intervals for Attrition Process Parameters at 95% Nominal Level

            Estimation 1                        Estimation 2
            $\gamma_1$  $\gamma_2$  $p$         $\gamma_1$  $\gamma_2$  $p$
Empirical coverage probability
n = 1000    0.9100      0.8655      0.9195      0.9210      0.8575      0.8810
n = 2500    0.9420      0.9175      0.9385      0.9270      0.9025      0.9155
n = 5000    0.9425      0.9405      0.9440      0.9575      0.9350      0.9370
n = 10000   0.9470      0.9395      0.9510      0.9505      0.9460      0.9460
n = 15000   0.9520      0.9385      0.9380      0.9520      0.9430      0.9480
Average length
n = 1000    0.3302      0.5745      0.3822      0.3357      0.5687      0.3697
n = 2500    0.2051      0.3733      0.2230      0.2059      0.3708      0.2235
n = 5000    0.1442      0.2665      0.1540      0.1443      0.2659      0.1537
n = 10000   0.1014      0.1891      0.1078      0.1012      0.1890      0.1078
n = 15000   0.0828      0.1546      0.0875      0.0829      0.1545      0.0873

Notes: The first panel of the table reports the empirical coverage probability of the confidence interval for the attrition process parameters, $\tau = (\gamma_1, \gamma_2, p)'$, and the second panel reports its average length. We choose the value $\tau_0 = (-0.65, 0.8, 0.5)'$. The simulation number is $S = 2000$. We use $B = 1000$ in the bootstrap procedure. Estimation 1 uses a starting value of $(0, 0, 0.5)'$ for the minimum distance procedure while Estimation 2 uses a starting value of $(-0.3, 0.4, 0.3)'$. As $n$ increases, we find that the empirical coverage probabilities approach the nominal level and that confidence intervals shrink (as expected).

3.4 Using Attrition to Correct for Selection on Unobservables

The following discussion draws from the arguments of Section 3.2 to illustrate that, although attrition is often considered a type of selection problem, it can also be seen as a solution to selection problems under special circumstances.

We consider the incidental truncation model of Gronau [64], where a variable $W_i$ is only observed if another variable, $L_i$, takes on the value 1.¹⁷ We wish to study the distribution of $W_i$. In the spirit of Gronau's 1974 example, suppose that $W_i$ represents the worker's wage offer and $L_i$ represents whether or not the worker took the job. Here, $\varepsilon_i$ may represent the worker's abilities, $\theta_i$ may capture whether or not the worker is discouraged, and $\eta_i$ captures some additional income shocks. Suppose further, for concreteness, that the reduced form for these variables is given as follows:

\[
W_i = X_{i1}'\beta_1 + \beta_2\theta_i + \varepsilon_i, \quad \text{and} \quad L_i = 1\{Z_i'\gamma_1 + \gamma_2\theta_i + \eta_i > 0\}, \tag{3.8}
\]

where $Z_i = (X_{i1}, X_{i2})'$, $\varepsilon_i$ and $\eta_i$ are possibly correlated, but independent of $Z_i$, and $\theta_i$ is a discrete, unobserved variable, possibly correlated with $Z_i$. If $\beta_2 = \gamma_2 = 0$, then the model is the familiar Type II Tobit model described in Amemiya [8]. When $\beta_2$ and $\gamma_2$ are not zero, the usual Heckman correction approach may not lead to consistent estimation of the model.

¹⁷ See also Wooldridge [112].

In the case that $L_i$ depends on covariates as in 3.8, we can identify $\gamma_1$ and the distribution of $\theta_i$ using an assumption such as 3.2.2 and the arguments of Section 3.2.2, provided $Z_i$ has enough support points. Then it is a simple matter to show that:

\[
E[W_i \mid L_i = 1, Z_i] = X_{i1}'\beta_1 + \beta_2 + \rho\bigl(p\,\lambda(Z_i'\gamma_1) + (1 - p)\,\lambda(Z_i'\gamma_1 + \gamma_2)\bigr),
\]

where $\rho$ is the correlation between $\eta_i$ and $\varepsilon_i$ and $\lambda(\cdot)$ is the inverse Mills ratio.

We can solve the problem of selection when the attrition process takes the form of 3.2.3 in a similar way when the econometrician observes $L_{i2}$ in addition to $L_{i1}$. Note that none of the arguments in this section require panel data on $W_i$.

The following example demonstrates an economic application of the pure attrition identification strategy. Let $W_i$ represent a student's midterm exam grade, which is observed if the student remained enrolled in class and wrote the midterm exam, i.e., $L_{i1} = 1$.
Here, $\varepsilon_i$ may represent the student's underlying intellectual abilities, $\theta_i$ represents their love of the subject matter, and $\eta_i$ captures a general measure of their academic ambition. Let $L_{i2} = 1$ be the event that the student remained enrolled in the class and wrote the final exam. In this situation, if $L_{i2}$ and $L_{i1}$ are both influenced by $\theta_i$ and are not themselves conditionally independent given $\theta_i$, we may use functional form restrictions to back out the distribution of $\theta_i$. Then we follow arguments similar to those of the preceding discussion to make inference on the distribution of $W_i$.

3.5 Conclusion

This chapter makes a general case for the use of information on attrition to uncover unobserved heterogeneity and other parameters of interest. We show that attrition is special from an economic point of view in that modeling the attrition process can yield identification strategies that exploit duration dependence without any actual need to model the duration itself. The usefulness of information on sample attrition for identifying more complicated forms of heterogeneity, such as heterogeneity that varies over time, is left for future work.

Conclusions

In this thesis, I devise novel econometric approaches to extract information from two-sided matching data, network data, and data on sample attrition.

In Chapter 1, I develop an empirical model of matching with endogenous pre-investments. Although the chapter focuses on investment in education by workers prior to entering a labour market, the model can be applied to many other situations of interest. For example, the model can be used to investigate the role that frictions and preferences play in a worker's choice of job sector. The model can be applied to the economics of education, to see how pre-investments on the part of students impact their college placements.
The approach can also be used to study marriage markets in which agents make observable pre-investments prior to looking for a partner.

One special feature of the model of Chapter 1 is that it can capture assortative matching between workers and firms when the match production function does not exhibit complementarities between their types. In that chapter, we investigate the role of these complementarities by performing counterfactuals under different specifications of the production technology. One avenue for future research would consider how the insights from such counterfactuals change in a model that uses richer information on the agents, such as firm profits.

One obvious extension to the framework of Chapter 1 would consider whether a tractable model of matching with endogenous pre-investment is possible when we allow workers to choose between many different options.

Chapter 1 illustrates that preferences and frictions in two-sided matching markets can be studied when the researcher only observes a single cross-section of matched employer-employee data. One very challenging extension to the static setup considered here would use information on unemployment and job-to-job transitions to learn more about frictions in markets along with the preferences of the agents who participate in them.

One key insight of Chapter 1 is that capital and wages are affected in subtle ways by the presence of market frictions when the decisions of workers are endogenous. A useful extension that may yield further insights on wage inequality would consider the role of endogenous firm capital. Such an extended framework may also provide insights on economic growth.

In Chapter 2 we develop a model of social interactions with many economically attractive features that is also highly practical from the point of view of empirical modeling. In our approach, we suppose that agents optimize by projecting their own beliefs onto those of their neighbors.
This condition is well motivated by a literature in behavioural economics on inter-personal projection. The methods we develop illustrate the usefulness of behavioural approaches to economic modeling. An open question is whether a tractable model of interactions is possible under other assumptions about belief formation. A related avenue for future research is whether researchers can develop empirical approaches to test alternative models of belief formation.

In Chapter 3, we find that when an agent's attrition decision depends on their unobserved heterogeneity, we may consider using attrition patterns to learn the distribution of heterogeneity, even in the absence of covariates and panel data on the main outcome of interest. These approaches can be usefully applied to correct for selection on unobservables when the researcher believes that the attrition process satisfies a certain parametric structure. A useful extension of the approaches discussed in this chapter would allow for heterogeneity that may be time varying or continuous, and consider identification under minimal assumptions on the relationship between attrition and the unobserved heterogeneity.

Bibliography

[1] A. Abdulkadiroğlu and T. Sönmez. Random serial dictatorship and the core from random endowments in house allocation problems. Econometrica, 66(3):689–701, 1998.
[2] J. M. Abowd, F. Kramarz, and D. N. Margolis. High wage workers and high wage firms. Econometrica, 67(2):251–333, 1999.
[3] D. Acemoglu. Politics and economics in weak and strong states. Journal of Monetary Economics, 52(7):1199–1226, 2005.
[4] D. Acemoglu, C. García-Jimeno, and J. A. Robinson. State capacity and economic development: A network approach. American Economic Review, 105:2364–2409, 2015.
[5] N. Agarwal and W. Diamond. Identification and estimation in two-sided matching markets. 2014.
[6] V. Aguirregabiria and P. Mira.
Identification of games of incomplete information with multiple equilibria and unobserved heterogeneity. 2016.
[7] M. Ahsanullah, V. B. Nevzorov, and M. Shakil. An introduction to order statistics. Springer, 2013.
[8] T. Amemiya. Advanced econometrics. Harvard University Press, 1985.
[9] L. Anselin. Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, The Netherlands, 1988.
[10] A. Aradillas-Lopez and E. Tamer. The identification power of equilibrium in simple games. Journal of Business and Economic Statistics, 26:261–283, 2008.
[11] J. Bagger and R. Lentz. An empirical model of wage dispersion with sorting. Technical report, National Bureau of Economic Research, 2014.
[12] P. Bajari, H. Hong, and D. Nekipelov. Game theory and econometrics: A survey of some recent research. In Advances in Economics and Econometrics, 10th World Congress, volume 3, pages 3–52, 2013.
[13] E. Barth, A. Bryson, J. C. Davis, and R. Freeman. It's where you work: Increases in the dispersion of earnings across establishments and individuals in the united states. Journal of Labor Economics, 34(S2):S67–S97, 2016.
[14] C. Bartolucci and F. Devicienti. Better workers move to better firms: a simple test to identify sorting. 2012.
[15] G. S. Becker. A theory of marriage: Part i. The Journal of Political Economy, pages 813–846, 1973.
[16] C. Bellemare. A life-cycle model of outmigration and economic assimilation of immigrants in germany. European Economic Review, 51(3):553–576, 2007.
[17] S. T. Berry. Estimation of a model of entry in the airline industry. Econometrica: Journal of the Econometric Society, pages 889–917, 1992.
[18] T. Besley and S. Coate. Workfare versus welfare: Incentive arguments for work requirements in poverty-alleviation programs. The American Economic Review, 82(1):249–261, 1992.
[19] T. Besley and T. Persson. The origins of state capacity: property rights, taxation and politics. American Economic Review, 99(4):1218–1244, 2009.
[20] L. E. Blume, W. A. Brock, S. N. Durlauf, and Y. M. Ioannides. Identification of social interactions. In A. B. Benhabib, J. and M. Jackson, editors, Handbook of Social Economics, volume 1B, pages 853–964. Elsevier, 2010.
[21] L. E. Blume, W. A. Brock, S. N. Durlauf, and R. Jayaraman. Linear social interactions models. Journal of Political Economy, 123:444–496, 2015.
[22] R. Blundell and J. M. Robin. Estimation in large and disaggregated demand systems: An estimator for conditionally linear systems. Journal of Applied Econometrics, 14:209–232, 1999.
[23] S. Bonhomme, T. Lamadon, and E. Manresa. A distributional framework for matched employer employee data. Unpublished manuscript, University of Chicago, 2015.
[24] B. Boudarbat, T. Lemieux, and W. C. Riddell. The evolution of the returns to human capital in canada, 1980-2005. Canadian Public Policy, 36(1):63–89, 2010.
[25] Y. Bramoullé, H. Djebbari, and B. Fortin. Identification of peer effects through social networks. Journal of Econometrics, 150:41–55, 2009.
[26] T. F. Bresnahan and P. C. Reiss. Entry and competition in concentrated markets. Journal of Political Economy, 99(5):977–1009, 1991.
[27] E. Breza, A. G. Chandrasekhar, and A. Tahbaz-Salehi. Seeing the forest for the trees? an investigation of network knowledge. Working Paper, 2018.
[28] W. A. Brock and S. N. Durlauf. Discrete choice with social interaction. Review of Economic Studies, 68:235–260, 2001.
[29] A. Calvó-Armengol, E. Patacchini, and Y. Zenou. Peer effects and social networks in education. Review of Economic Studies, 76:1239–1267, 2009.
[30] C. Camerer. Behavioral Game Theory. Princeton University Press, New York, 2003.
[31] A. C. Cameron, J. B. Gelbach, and D. L. Miller. Robust inference with multiway clustering. Journal of Business and Economic Statistics, 29:238–249, 2011.
[32] N. Canen, J. Schwartz, and K. Song. Estimating local interactions among many agents who observe their neighbors. arXiv preprint arXiv:1704.02999, 2017.
[33] D. Card and T. Lemieux. Can falling supply explain the rising return to college for younger men? a cohort-based analysis. The Quarterly Journal of Economics, 116(2):705–746, 2001.
[34] D. Card, J. Heining, and P. Kline. Workplace heterogeneity and the rise of west german wage inequality. 2012.
[35] D. Card, J. Heining, and P. Kline. Workplace heterogeneity and the rise of west german wage inequality. The Quarterly Journal of Economics, 128(3):967–1015, 2013.
[36] A. R. Cardoso, A. Loviglio, and L. Piemontese. Information frictions and labor market outcomes. 2015.
[37] H. Chade, J. Eeckhout, and L. Smith. Sorting through search and matching models in economics. Journal of Economic Literature, 55(2):493–544, 2017.
[38] P.-A. Chiappori and B. Salanié. The econometrics of matching models. Journal of Economic Literature, 54(3):832–861, 2016.
[39] E. Choo and A. Siow. Who marries whom and why. Journal of Political Economy, 114(1):175–201, 2006.
[40] F. Ciliberto and E. Tamer. Market structure and multiple equilibria in airline markets. Econometrica, 77(6):1791–1828, 2009.
[41] T. G. Conley. GMM estimation with cross sectional dependence. Journal of Econometrics, 92:1–45, 1999.
[42] D. R. Cox. Regression models and life-tables. 34(2):187–220, 1972.
[43] F. Cunha, J. J. Heckman, and S. M. Schennach. Estimating the technology of cognitive and noncognitive skill formation. Econometrica, 78(3):883–931, 2010.
[44] F. Cunha, J. J. Heckman, and S. M. Schennach. Supplement to "estimating the technology of cognitive and noncognitive skill formation". Econometrica, 78(3):883–931, 2010.
[45] G. De Giorgi, M. Pellizzari, and S. Redaelli. Identification of social interactions through partially overlapping peer groups. American Economic Journal: Applied Economics, 2:241–275, 2010.
[46] G. Dionne and B. Dostie. New evidence on the determinants of absenteeism using linked employer-employee data. Industrial & Labor Relations Review, 61(1):108–120, 2007.
[47] B. Dostie and R. Jayaraman. Organizational redesign, information technologies and workplace productivity. 2008.
[48] J.-M. Dufour and L. Khalaf. Monte carlo test methods in econometrics. Companion to Theoretical Econometrics, Blackwell Companions to Contemporary Economics, Basil Blackwell, Oxford, UK, pages 494–519, 2001.
[49] S. N. Durlauf and Y. M. Ioannides. Social interactions. Annual Review of Economics, 2, 2010.
[50] S. N. Durlauf and H. Tanaka. Understanding regression versus variance tests for social interactions. Economic Inquiry, 46, 2008.
[51] F. Echenique, S. Lee, and M. Shum. Partial identification in two-sided matching models. In Structural Econometric Models, pages 117–139. Emerald Group Publishing Limited, 2013.
[52] J. Eeckhout and P. Kircher. Identifying sorting - in theory. The Review of Economic Studies, page rdq034, 2011.
[53] H. Eraslan and X. Tang. Identification and estimation of large network games with private link information. Working Paper, 2017.
[54] N. Fortin, D. A. Green, T. Lemieux, K. Milligan, and W. C. Riddell. Canadian inequality: Recent developments and policy options. Canadian Public Policy, 38(2):121–145, 2012.
[55] J. T. Fox and P. Bajari. Measuring the efficiency of an fcc spectrum auction. American Economic Journal: Microeconomics, 5(1):100–146, 2013.
[56] D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9–15, 1962.
[57] P. A. Gautier and C. N. Teulings. How large are search frictions? Journal of the European Economic Association, 4(6):1193–1225, 2006.
[58] N. Gennaioli and I. Rainer. The modern impact of precolonial centralization in africa. Journal of Economic Growth, 12(3):185–234, 2007.
[59] N. Gennaioli and H.-J. Voth. State capacity and military conflict. Review of Economic Studies, 82(4):1409–1448, 2015.
[60] A. Goldfarb and M. Xiao. Who thinks about the competition? managerial ability and strategic entry in us local telephone markets. American Economic Review, 101:3130–3161, 2011.
[61] P. Goldsmith-Pinkham and G. W. Imbens. Social networks and the identification of peer effects. Journal of Business and Economic Statistics, 31(3):253–264, 2013.
[62] B. S. Graham. An econometric model of link formation with degree heterogeneity. Econometrica, 85:1033–1063, 2017.
[63] D. A. Green and B. M. Sand. Has the canadian labour market polarized? Canadian Journal of Economics/Revue canadienne d'économique, 48(2):612–646, 2015.
[64] R. Gronau. Wage comparisons–a selectivity bias. Journal of Political Economy, 82(6):1119–1143, 1974.
[65] M. Hagedorn, T. H. Law, and I. Manovskii. Identifying equilibrium models of labor market sorting. Econometrica, 85(1):29–65, 2017.
[66] J. A. Hausman and D. A. Wise. Attrition bias in experimental and panel data: the gary income maintenance experiment. Econometrica: Journal of the Econometric Society, pages 455–473, 1979.
[67] J. A. Hausman, W. K. Newey, H. Ichimura, and J. L. Powell. Identification and estimation of polynomial errors-in-variables models. Journal of Econometrics, 50(3):273–295, 1991.
[68] J. A. Hausman, J. Abrevaya, and F. M. Scott-Morton. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics, 87(2):239–269, 1998.
[69] J. J. Heckman. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. In Annals of Economic and Social Measurement, Volume 5, number 4, pages 475–492. NBER, 1976.
[70] J. J. Heckman and B. Singer. Econometric duration analysis. Journal of Econometrics, 24(1-2):63–132, 1984.
[71] Y. Hu. Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution. Journal of Econometrics, 144(1):27–61, 2008.
[72] Y. Hu and S. M. Schennach. Instrumental variable treatment of nonclassical measurement error models. Econometrica, 76(1):195–216, 2008.
[73] S. I. M. Hwang. A robust redesign of high school match. Working Paper, 2016.
[74] I. Johnsson and R. H. Moon. Estimation of peer effects in endogenous social networks: control function approach. Working Paper, 2016.
[75] K. Kantenga and T. H. Law. Sorting and wage inequality. In 2016 Meeting Papers, number 660. Society for Economic Dynamics, 2016.
[76] H. Kasahara and K. Shimotsu. Pseudo-likelihood estimation and bootstrap inference for structural discrete markov decision models. Journal of Econometrics, 146(1):92–106, 2008.
[77] H. Kasahara and K. Shimotsu. Sequential estimation of structural models with a fixed point constraint. Econometrica, 80(5):2303–2319, 2012.
[78] H. H. Kelejian and D. Robinson. A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science, 72:297–312, 1993.
[79] R. Kellogg. Uniqueness in the schauder fixed point theorem. Proceedings of the American Mathematical Society, 60(1):207–210, 1976.
[80] E. D. Kolaczyk. Statistical Analysis of Network Data. Springer Verlag, New York, 2009.
[81] A. Konovalov and Z. Sándor. On price equilibrium with multi-product firms. Economic Theory, 44(2):271–292, 2010.
[82] T. Lancaster. Econometric methods for the duration of unemployment. Econometrica: Journal of the Econometric Society, pages 939–956, 1979.
[83] T. Lemieux. The changing nature of wage inequality. Journal of Population Economics, 21(1):21–48, 2008.
[84] A. Lewbel. Identification of the binary choice model with misclassification. Econometric Theory, 16(4):603–609, 2000.
[85] J. Lise, C. Meghir, and J.-M. Robin. Matching, sorting and wages. Review of Economic Dynamics, 19:63–87, 2016.
[86] G. Loewenstein, T. O'Donoghue, and M. Rabin. Projection bias in predicting future utility. Quarterly Journal of Economics, 118:1209–1248, 2003.
[87] R. Lopes de Melo. Firm wage differentials and labor market sorting: Reconciling theory and evidence. Unpublished Manuscript, 2013.
[88] K. Madarász. Information projection: Model and applications. Review of Economic Studies, 79, 2012.
[89] N. Maestas, K. J. Mullen, and A. Strand. Does disability insurance receipt discourage work? using examiner assignment to estimate causal effects of ssdi receipt. American Economic Review, 103(5):1797–1829, 2013.
[90] C. F. Manski. Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3):531–542, 1993.
[91] I. Marinescu and R. Wolthoff. Opening the black box of the matching function: The power of words. 2016.
[92] M. McBride, G. Milante, and S. Skaperdas. Peace and war with endogenous state capacity. Journal of Conflict Resolution, 55(3):446–468, 2011.
[93] K. Menzel. Large matching markets as two-sided demand systems. Econometrica, 83(3):897–941, 2015.
[94] K. Menzel. Inference for games with many players. Review of Economic Studies, 83:306–337, 2016.
[95] S. Michalopoulos and E. Papaioannou. Pre-colonial ethnic institutions and contemporary african development.
Econometrica, 81(1):113–152, 2013.→ page 82[96] R. Moffit, J. Fitzgerald, and P. Gottschalk. Sample attrition in panel data:The role of selection on observables. Annales d’Economie et de Statistique,pages 129–152, 1999. → page 91[97] W. K. Newey and D. McFadden. Large sample estimation and hypothesistesting. Handbook of econometrics, 4:2111–2245, 1994. → page 98118[98] K. Pendakur and S. Woodcock. Glass ceilings or glass doors? wagedisparity within and between firms. Journal of Business & EconomicStatistics, 28(1):181–189, 2010. → page 34[99] F. Postel-Vinay and J.-M. Robin. Equilibrium wage dispersion with workerand employer heterogeneity. Econometrica, 70(6):2295–2350, 2002. →pages 7, 16[100] R. Powell. Monopolizing violence and consolidating power. QuarterlyJournal of Economics, 128(2):807–859, 2013. → page 82[101] J. M. Robins, A. Rotnitzky, and L. P. Zhao. Analysis of semiparametricregression models for repeated outcomes in the presence of missing data.Journal of the american statistical association, 90(429):106–121, 1995. →page 92[102] A. E. Roth and M. A. O. Sotomayor. Two-sided matching: A study ingame-theoretic modeling and analysis. Number 18. Cambridge UniversityPress, 1992. → page 11[103] T. J. Rothenberg. Identification in parametric models. Econometrica:Journal of the Econometric Society, pages 577–591, 1971. → page 98[104] E. Saez and M. R. Veall. The evolution of high incomes in northernamerica: Lessons from canadian evidence. American Economic Review, 95(3):831–849, 2005. doi:10.1257/0002828054201404. URLhttp://www.aeaweb.org/articles.php?doi=10.1257/0002828054201404. →page 33[105] J. D. Sargan. Identification and lack of identification. Econometrica:Journal of the Econometric Society, pages 1605–1633, 1983. → page 98[106] S. M. Schennach. Estimation of nonlinear models with measurement error.Econometrica, 72(1):33–75, 2004. → page 93[107] K. Song. Econometric inference on large bayesian games withheterogeneous beliefs. 
arXiv:1404.2015 [stat.AP], 2014. → pages 9, 43
[108] C. M. Tiebout. A pure theory of local expenditures. Journal of Political Economy, 64(5):416–424, 1956. → page 84
[109] L. Van Boven, G. Loewenstein, and D. Dunning. Mispredicting the endowment effect: Underestimation of owners' selling prices by buyer's agents. Journal of Economic Behavior and Organization, 51:351–365, 2003. → page 41
[110] M. R. Veall. Top income shares in Canada: recent trends and policy implications. Canadian Journal of Economics/Revue canadienne d'économique, 45(4):1247–1272, 2012. → page 34
[111] B. Williams. Identification of a nonseparable model under endogeneity using binary proxies for unobserved heterogeneity. 2017. → page 93
[112] J. M. Wooldridge. Econometric analysis of cross section and panel data. MIT Press, 2010. → page 106
[113] H. Xu. Estimation of discrete games with correlated types. The Econometrics Journal, 17(3):241–270, 2014. → page 15
[114] H. Xu. Social interactions in large networks: A game theoretic approach. Working Paper, 2015. → pages 9, 43
[115] X. Xu and L. Lee. Estimation of a binary choice game model with network links. Working Paper, 2015. → page 43
[116] C. Yang and L.-F. Lee. Social interactions under incomplete information with heterogeneous expectations. Journal of Econometrics, Forthcoming, 2016. → page 43

Appendix A

Supporting Materials

A.1 Appendix to Chapter 1

A.1.1 Equilibrium of Investment and Matching Game

In this section, we characterize the equilibrium of the incomplete information game of Section 1.2.2. First, we introduce a representation of the worker's expected utility function that proves useful for establishing the existence of the Bayesian Nash equilibrium of the game as a fixed point of a best probability response operator. We begin by defining relevant terms. A profile of strategy functions (or decision rules) is
$$\sigma = \{\sigma_i(x_i,\varepsilon_i) : i \in N_h\},$$
where the functions $\sigma_i : \mathcal{X} \times \mathbb{R}^{J-1} \to \mathcal{H}$.
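The conditional probability that a worker with covariates $x_i$ chooses action $h_i$ can be written
$$P_i(h_i \mid x_i,\sigma_i) \equiv \int 1\{\sigma_i(x_i,\varepsilon_i) = h_i\}\,dF(\varepsilon_i).$$
Since the $X_i$'s are private information in this model, each agent $i$ must take expectations with respect to the distribution of $X_{-i}$. The following result shows that under the independence assumptions embodied by Assumption 1.2.1, the agent's expected utility has a very convenient form: it is only affected by the behaviour of the other agents through the choice probabilities.

Lemma A.1.1. In the model of Section 1.2.2 and under Assumptions 1.2.1 and 1.2.2, we can represent the first term in the expected utility of agent $i$ from equation 1.5 as
$$\tilde{U}_i(h_i,x_i,\sigma) = \sum_{h_{-i}\in\mathcal{H}_{-i}} \tilde{u}_i(h_i,h_{-i},x_i) \prod_{j\neq i} P_j(h_j \mid \sigma_j).$$

Proof. First, we write equation 1.5 as
$$\tilde{U}_i(h_i,x_i,\sigma) = \sum_{x_{-i}\in\mathcal{X}_{-i}} \sum_{h_{-i}\in\mathcal{H}_{-i}} \tilde{u}_i(h_i,h_{-i},x_i)\,P_{-i}(h_{-i} \mid x_{-i},\sigma)\,P(x_{-i}), \tag{A.1}$$
where $x_{-i} = (x_j)_{j\in N_h\setminus\{i\}}$ and we use the shorthand $P(x_{-i}) \equiv P(X_{-i} = x_{-i})$. Without loss of generality, let $i = 1$. Then we write $\tilde{U}_1(h_1,x_1,\sigma)$ as
$$\sum_{h_{-1}\in\mathcal{H}_{-1}} \sum_{x_2\in\mathcal{X}} \cdots \sum_{x_{n_h}\in\mathcal{X}} \tilde{u}_1(h_1,h_{-1},x_1)\,P_{-1}(h_{-1} \mid x_2,\ldots,x_{n_h},\sigma) \prod_{j=2}^{n_h} P_j(x_j), \tag{A.2}$$
where we used the independence of the $X_i$'s from Assumption 1.2.1. Next, since Assumption 1.2.1 says that the $X_i$'s and $\varepsilon_i$'s are independent, we know that the actions of the agents are independent and depend only on their personal values of $X_i$ and $\varepsilon_i$. Therefore,
$$P_{-1}(h_{-1} \mid x_2,\ldots,x_{n_h},\sigma) = \prod_{j=2}^{n_h} P_j(h_j \mid x_2,\ldots,x_{n_h},\sigma_j) = \prod_{j=2}^{n_h} P_j(h_j \mid x_j,\sigma_j). \tag{A.3}$$
Plugging (A.3) back into (A.2) yields that $\tilde{U}_1(h_1,x_1,\sigma)$ is equal to
$$\sum_{h_{-1}\in\mathcal{H}_{-1}} \tilde{u}_1(h_1,h_{-1},x_1) \sum_{x_2\in\mathcal{X}} \cdots \sum_{x_{n_h}\in\mathcal{X}} \prod_{j=2}^{n_h} P_j(h_j \mid x_j,\sigma_j) \prod_{j=2}^{n_h} P_j(x_j). \tag{A.4}$$
Grouping the sums in (A.4) and restoring the generic $i$ index gives
$$\sum_{h_{-i}\in\mathcal{H}_{-i}} \tilde{u}_i(h_i,h_{-i},x_i) \prod_{j\neq i} \sum_{x_j\in\mathcal{X}} P_j(h_j \mid X_j = x_j,\sigma_j)\,P_j(x_j).$$
Each inner sum over $x_j$ equals $P_j(h_j \mid \sigma_j)$ by the law of total probability, and hence we have the desired result.

We will show the existence of the equilibrium for our model. The solution concept for the game described in Section 1.2.2 is Bayesian Nash Equilibrium (BNE), which we now define.

Definition A.1.1.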
A Bayesian Nash Equilibrium (BNE) of the game described in Section 1.2.2 is a profile of decision rules $\sigma^*$ such that for any player $i$ and for any $(x_i,\varepsilon_i)$:
$$\sigma_i^*(x_i,\varepsilon_i) = \arg\max_{h_i\in\mathcal{H}} \{U_i(h_i,x_i,\varepsilon_i,\sigma^*)\}. \tag{A.5}$$
The notation and arguments in this section follow Aguirregabiria and Mira [6], but we include them here for completeness. Under Assumption 1.2.2, we write the expected utility of $i$ as
$$U_i(h_i,x_i,\varepsilon_i,\sigma) = \tilde{U}_i(h_i,x_i,\sigma) + \varepsilon_i' d(h_i).$$
By Lemma A.1.1 we can express the first term on the right hand side of the preceding equation as
$$\tilde{U}_i(h_i,x_i,\sigma) = \sum_{h_{-i}\in\mathcal{H}_{-i}} \tilde{u}_i(h_i,h_{-i},x_i) \prod_{j\in N_h\setminus\{i\}} P_j(h_j \mid \sigma_j).$$
Note that $\tilde{U}_i(h_i,x_i,\sigma)$ only depends on the choices of other agents through the choice probabilities of the other players that are induced by $\sigma$. We write the choice probabilities of the players other than $i$ as
$$P_{-i} \equiv \{P_j(h_j) : (j,h_j) \in N\setminus\{i\} \times \mathcal{H}\setminus\{1\}\}.$$
For any $P_{-i}$, we can define a best response probability function as:
$$\tilde{\Psi}_i(h_i \mid x_i,P_{-i}) \equiv \int 1\Big\{\arg\max_{h\in\mathcal{H}} \big\{\tilde{U}_i(h,x_i,\sigma) + \varepsilon_i' d(h)\big\} = h_i\Big\}\,dF(\varepsilon_i).$$
$\tilde{\Psi}_i$ tells us the probability that a particular action is optimal for $i$ with covariates $x_i$ when others choose according to probabilities $P_{-i}$.$^1$ Let
$$\Psi_i(h_i \mid P_{-i}) = \sum_{x_i\in\mathcal{X}} \tilde{\Psi}_i(h_i \mid x_i,P_{-i})\,P(x_i).$$
An equivalent statement of Definition A.1.1 is that the equilibrium probabilities, $P^* \equiv P(\sigma^*)$, satisfy the fixed point constraint $P^* = \Psi(P^*)$, where $\Psi$ is the best response probability mapping:
$$\Psi(P) = \{\Psi_i(h_i \mid P_{-i}) : (i,h_i) \in N \times \mathcal{H}\setminus\{1\}\}. \tag{A.6}$$

Lemma A.1.2. Under Assumption 1.2.1 and Assumption 1.2.2 the game described in Section 1.2.2 has a Bayesian Nash Equilibrium.

Proof. Since $\Psi(\cdot)$ maps from a compact convex set, $[0,1]^{n\times(J-1)}$, to itself and is continuously differentiable (by the continuity of the $\varepsilon_i$'s in Assumption 1.2.1), $\Psi(\cdot)$ has a fixed point by Brouwer's fixed point theorem.

We can appeal to a result of Kellogg [79] to show that the Bayesian equilibrium is unique under a mild condition on the derivatives of the best response probability mapping. Let $I_n$ be an identity matrix of size $n$.
Kellogg's result, as stated in Konovalov and Sándor [81], stipulates that the equilibrium is unique if the determinant of $\tilde{J}_n \equiv \partial\Psi(P)/\partial P' - I_n$ is nonzero and the mapping $\Psi$ has no fixed points on the boundary of $[0,1]^{n\times(J-1)}$. In our setup, the former condition holds provided that the best response of any agent is not excessively sensitive to a change in the probability of any other agent.$^2$

$^1$ Note that when the $\varepsilon_i$'s have the extreme value distribution (as in Assumption 1.3.2) then we have
$$\tilde{\Psi}_i(h_i \mid x_i,P_{-i}) = \frac{\exp(\tilde{U}_i(h_i,x_i))}{\sum_{j=1}^{J} \exp(\tilde{U}_i(h_j,x_i))}.$$

$^2$ Under our assumptions, $\tilde{J}_n$ is a matrix with $-1$'s on the diagonal and $\tilde{p} \equiv \partial\Psi_i/\partial p_j$ for all $i \neq j$ on the off diagonals. Using the fact that $\tilde{J}_n$ is a circulant matrix, we can express its determinant explicitly as $(\tilde{p}(n-1)-1)(-(1+\tilde{p}))^{n-1}$. A sufficient condition for $\det(\tilde{J}_n) \neq 0$ is thus $\tilde{p} < 1/(n-1)$.

A.1.2 Additional Mathematical Results

The remaining results of this section allow us to represent the matching probabilities from equation 1.15, and hence workers' expectations, in a convenient way. These representations can then be used to estimate $\theta(\beta)$ using maximum likelihood.

Proposition A.1.1. Suppose that $J = 2$ and that Assumptions 1.2.1, 1.2.2, 1.3.1, and 1.3.2 hold. Then for each $m = 1,\ldots,M$ and each $h_j \in \mathcal{H}$ the conditional probability of an arbitrary worker $i$ matching to a firm with capital class $m$ after choosing education level $h_j$ is
$$\pi_{mj} = \sum_{n_j=0}^{n_h-1} P(\mathcal{M}(i) = m \mid H_i = h_j, N_j = n_j)\,B(n_j; n_h-1, p_j),$$
where $N_j$ is the number of workers other than $i$ who picked education level $h_j$, $B(n_j; n_h-1, p_j)$ is the binomial p.m.f. and $p_j = P(H_i = h_j)$.

Proof. Part (a) of Assumption 1.3.1 says that firms do not consider the workers' covariates when ranking them in the matching process.
This means that for each $m = 1,\ldots,M$ we have that
$$P(\mathcal{M}(i) = m \mid H_i = h_j, H_{-i} = h_{-i}, X_i = x_i) = P(\mathcal{M}(i) = m \mid H_i = h_j, H_{-i} = h_{-i}).$$
Combining this with equation 1.15, we can write
$$\pi^{(i)}_{mj} = \sum_{h_{-i}\in\mathcal{H}_{-i}} P(\mathcal{M}(i) = m \mid H_i = h_j, H_{-i} = h_{-i})\,P(H_{-i} = h_{-i} \mid X_i = x_i).$$
Next, it is straightforward to see that$^3$
$$P(H_{-i} = h_{-i} \mid X_i = x_i) = \prod_{j\neq i}^{n_h} P_j(h_j). \tag{A.8}$$
Since the $\varepsilon_i$ are identically distributed by Assumption 1.2.1, for each $j$ and $m$ we have $\pi^{(i)}_{mj} = \pi_{mj}$.

When there are only two education levels, any $h_{-i} \in \mathcal{H}_{-i}$ can be represented as a total number of workers other than $i$ who picked education level $h_j$, $n_j$. From worker $i$'s point of view, $n_j$ is a particular realization of the random variable $N_j$ that takes values in the set $\{0,\ldots,n_h-1\}$. Since there are $n_h-1$ agents other than $i$ in the economy, the sum over $h_{-i} \in \mathcal{H}_{-i}$ amounts to a sum over the support of $N_j$. Now consider any $n_j$ in the support of $N_j$. The assumption that the $\varepsilon_i$'s are iid implies that the probability that exactly $n_j$ out of $n_h-1$ workers pick $h_j$ can be represented as
$$\frac{(n_h-1)!}{n_j!\,(n_h-1-n_j)!}\,p_j^{n_j}(1-p_j)^{n_h-1-n_j},$$
which is the binomial probability mass function, $B(n_j; n_h-1, p_j)$.

$^3$ This can be shown using the same arguments as those in Lemma A.1.1. The private information and independence of the $X_i$'s (Assumption 1.2.1) imply that the left hand side of (A.8) equals
$$\sum_{x_{-i}\in\mathcal{X}_{-i}} P(h_{-i} \mid x_{-i},x_i)\,P(x_{-i} \mid x_i) = \sum_{x_{-i}\in\mathcal{X}_{-i}} P(h_{-i} \mid x_{-i})\,P(x_{-i}). \tag{A.7}$$
Suppose without loss of generality that $i = 1$. It is convenient to rewrite the above as follows (using independence):
$$\sum_{x_2\in\mathcal{X}} \cdots \sum_{x_{n_h}\in\mathcal{X}} P(h_{-1} \mid x_2,\ldots,x_{n_h}) \prod_{j=2}^{n_h} P_j(x_j).$$
Next, since the $X_i$'s and $\varepsilon_i$'s are independent across $i$ and each $i$'s strategy function is only a function of $X_i$ and $\varepsilon_i$, we have
$$P(h_{-1} \mid x_{-1}) = \prod_{j\neq 1}^{n_h} P_j(h_j \mid x_{-1}) = \prod_{j\neq 1}^{n_h} P_j(h_j \mid x_j).$$
Combining these two results we write (A.7) as
$$\sum_{x_2\in\mathcal{X}} P_2(h_2 \mid x_2)P_2(x_2) \cdots \sum_{x_{n_h}\in\mathcal{X}} P_{n_h}(h_{n_h} \mid x_{n_h})P_{n_h}(x_{n_h}) = \prod_{j\neq 1}^{n_h} P_j(h_j).$$

Characterization of Matching Probabilities when J equals 2

When $J = 2$ we can partition the types of firms, $m = 1,\ldots,M$, into two sets: those who prefer $h_j \in \mathcal{H}$ and those who prefer $h_{j'}$ with $j' \neq j$.
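As a brief aside on Proposition A.1.1, the binomial mixture for $\pi_{mj}$ can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: the callable `cond_match_prob` and the toy conditional probability below are hypothetical stand-ins for $P(\mathcal{M}(i) = m \mid H_i = h_j, N_j = n_j)$, which in the thesis comes from the matching model itself.

```python
from math import comb


def binomial_pmf(k, n, p):
    """B(k; n, p): probability that exactly k of n independent workers pick h_j."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def matching_probability(cond_match_prob, n_h, p_j):
    """Binomial mixture from Proposition A.1.1.

    cond_match_prob is a hypothetical callable n_j -> P(M(i)=m | H_i=h_j, N_j=n_j).
    The sum runs over the support {0, ..., n_h - 1} of N_j.
    """
    others = n_h - 1  # workers other than i
    return sum(
        cond_match_prob(n_j) * binomial_pmf(n_j, others, p_j)
        for n_j in range(others + 1)
    )


def toy_cond_prob(n_j):
    # Assumption for illustration only: the chance of reaching firm class m
    # falls as more rivals choose the same education level.
    return 1.0 / (n_j + 1)


pi_mj = matching_probability(toy_cond_prob, n_h=4, p_j=0.5)
print(pi_mj)
```

If the conditional matching probability does not depend on $n_j$, the mixture collapses to that constant, because the binomial weights sum to one.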
It is convenient to introduce the following notation:
$$M_j^+(\theta) = \{m \in \{1,\ldots,M\} : \rho(k_m,h_j;\theta) \geq \rho(k_m,h_{j'};\theta),\ j \neq j'\}, \quad\text{and}\quad M_j^-(\theta) = \{1,\ldots,M\}\setminus M_j^+,$$
recalling that firm preferences are given in 1.9. The firm classes that prefer $h_j$ are pinned down by the functional form for firm preferences, $\rho$, along with the preference parameter, $\theta$, and the distribution of $X_i$. Furthermore let us denote
$$\pi_{mj} \equiv \sum_{n_{(j)}=0}^{n_f} \sum_{n_j=0}^{n_h-1} P_{h_j,n_j,n_{(j)}}(m)\,P(n_j)\,P(n_{(j)};\theta), \tag{A.9}$$
where
$$P_{h_j,n_j,n_{(j)}}(m) = P(\mathcal{M}(i) = m \mid h_i = h_j, N_j = n_j, N_{(j)} = n_{(j)}). \tag{A.10}$$
Note that this object depends on both $\beta$ and $\theta$ through the matching function. For each firm type $m = 1,\ldots,M$ let $F_m \equiv N(\beta k_m,\sigma^2)$, and define the following for each education choice $h_j$:
$$G_{j+} \equiv \sum_{m\in M_j^+} q_m F_m \quad\text{and}\quad G_{j-} \equiv \sum_{m\in M_j^-} q_m F_m.$$
Furthermore, define the posterior firm types as follows:
$$q_m^+ \equiv q_m \Big/ \sum_{m\in M_j^+} q_m \quad\text{and}\quad q_m^- \equiv q_m \Big/ \sum_{m\in M_j^-} q_m.$$
We also define $v(b_1,b_2;F)$ as the $b_1$-order statistic of $b_2$ random variables independently distributed according to cdf $F$. Propositions A.1.2 and A.1.3 are characterizations of the $P_{h_j,n_j,n_{(j)}}(m)$'s of the model in the case that $J = 2$ and $n_h = n_f = n$.

When considering these results, it is important to recall one core feature of the matching model as we outline it in Section 1.2: that there is no unemployment. Therefore, when reading the arguments, the reader should take for granted that each worker matches to some firm with probability one.

Proposition A.1.2. (Heterogeneous firm preferences). Denote $\bar{n}_j \equiv n_j + 1$ and suppose that $n_h = n_f = n$.
Then under the assumptions of Proposition A.1.1 we have the following for any $n_j$ such that $1 \leq n_j \leq n$ and $n_{(j)}$ such that $0 < n_{(j)} < n$:

i) For each $m \in M_j^+$,
$$P_{h_j,n_j,n_{(j)}}(m) = \begin{cases} q_m^+\,n_{(j)}/\bar{n}_j & \text{if } \bar{n}_j \geq n_{(j)}, \\[4pt] \dfrac{P(v_m > \hat{v})\,q_m^+}{\sum_{m\in M_j^+} P(v_m > \hat{v})\,q_m^+} & \text{if } \bar{n}_j < n_{(j)}, \end{cases}$$
where $\hat{v} \equiv v(a,b;F)$ with $a = n_{(j)} - \bar{n}_j$, $b = n_{(j)}$, and $F = G_{j+}$.

ii) For each $m \in M_j^-$,
$$P_{h_j,n_j,n_{(j)}}(m) = \begin{cases} \dfrac{P(v_m < \hat{v})\,q_m^-\,\big((\bar{n}_j - n_{(j)})/\bar{n}_j\big)}{\sum_{m\in M_j^-} P(v_m < \hat{v})\,q_m^-} & \text{if } \bar{n}_j > n_{(j)}, \\[4pt] 0 & \text{if } \bar{n}_j \leq n_{(j)}, \end{cases}$$
where $\hat{v} \equiv v(a,b;F)$ with $a = \bar{n}_j - n_{(j)} + 1$, $b = n - n_{(j)}$, and $F = G_{j-}$.

Proof. We begin by introducing some notation. We denote the event that a worker who chose education level $h_j$ matches to any firm of type $m \in M_j^+$ or $m \in M_j^-$ as $M_{ij}^+$ and $M_{ij}^-$ respectively.$^4$

$^4$ That is, $M_{ij}^+ \equiv \{M_i \in M_j^+\}$ and similarly $M_{ij}^- \equiv \{M_i \in M_j^-\}$.

First, we consider the probability that a worker who chose $h_j$ matches to any firm in the class $m \in M_j^+$. Consider the case that $\bar{n}_j \geq n_{(j)}$. In this case, there are at least as many workers who chose $h_j$ as firms who prefer $h_j$. Given that Condition IR implies that no worker or firm will ever unilaterally dissolve a match to become unmatched, the case of $\bar{n}_j \geq n_{(j)}$ implies that every firm in class $m$ that wants a worker with $h_j$ will hire one in the matching process. For each class of firm $m \in M_j^+$, the probability that a worker who chose $h_j$ matches to a firm in the set of firms that prefers $h_j$ and to the particular class $m \in M_j^+$ is given as follows when $\bar{n}_j \geq n_{(j)}$:
$$P_j(M_i = m, M_{ij}^+) = P_j(M_i = m \mid M_{ij}^+)\,P_j(M_{ij}^+) = q_m^+\,n_{(j)}/\bar{n}_j,$$
where the $j$-subscript on the probabilities denotes a probability conditional on the event $H_i = h_j$. $P_j(M_{ij}^+)$ is equal to $n_{(j)}/\bar{n}_j$ because workers with the same $h_j$ are indistinguishable to the firms that prefer them, so firms choose among these workers at random.
The probability of matching to a firm of type $m \in M_j^+$ given that the worker has already matched to some firm in $M_j^+$ is equal to the relative proportion of type $m$ firms in this category, $q_m^+$.

Next, we consider the case that $\bar{n}_j < n_{(j)}$. Since there are strictly more firms that prefer $h_j$ than workers who chose $h_j$, a worker who chose $h_j$ matches to a firm that prefers workers with $h_j$ with probability one; that is, $P_j(M_{ij}^+) = 1$.$^5$

$^5$ This follows from Condition IR and the following two facts: i) $h_j$ workers are scarce relative to the firms that prefer them; ii) firms that prefer $h_{j'}$ will never choose an $h_j$ worker in the matching process, since the condition $n_h = n_f = n$ and $J = 2$ implies that $h_{j'}$ workers are always available (i.e., when $n_h = n_f = n$, $n_{(j)} > \bar{n}_j$ implies that $n_{j'} > n_{(j')}$, since $n_{j'} = n - \bar{n}_j$ and $n_{(j')} = n - n_{(j)}$).

Although $P_j(M_{ij}^+) = 1$, only the firms with the $\bar{n}_j$ largest $v$-indices will be able to match with a worker who chose $h_j$. Thus, a firm in $M_j^+$ matches to a worker with $h_j$ if and only if its $v$ statistic exceeds the $\kappa = n_{(j)} - \bar{n}_j$ order statistic among all $n_{(j)}$ firms in $M_j^+$. Thus, by Assumptions 1.3.1 and 1.3.2, the probability that a worker who chose $h_j$ matches with a firm from class $m \in M_j^+$ conditional on matching to some firm in $M_j^+$ is
$$P(v(K) = v(k_m) \mid v(K) > \hat{v},\, m \in M_j^+),$$
which by Bayes' rule equals
$$\frac{P(v(K) > \hat{v} \mid v(K) = v(k_m),\, m \in M_j^+)\,P(v(K) = v(k_m) \mid m \in M_j^+)}{\sum_{m\in M_j^+} P(v(K) > \hat{v} \mid v(K) = v(k_m),\, m \in M_j^+)\,P(v(K) = v(k_m) \mid m \in M_j^+)}, \tag{A.11}$$
where $\hat{v} \equiv v(\kappa, n_{(j)}; G_{j+})$. Equation A.11 represents the relative proportion of type $m$ firms among threshold crossers among all firms that prefer $h_j$.

We next consider the probability of matching to each firm with $m \in M_j^-$. We consider first the case that $\bar{n}_j > n_{(j)}$.
The relevant probability is
$$P_j(M_i = m, M_{ij}^-) = P_j(M_i = m \mid M_{ij}^-)\,P_j(M_{ij}^-) = P_j(M_i = m \mid M_{ij}^-)\,(1 - n_{(j)}/\bar{n}_j).$$
As stated above, the case of $\bar{n}_j > n_{(j)}$ combined with our assumption that $n_h = n_f = n$ implies that $n_{(j')} > n_{j'}$, since $n_{j'} = n - \bar{n}_j$ and $n_{(j')} = n - n_{(j)}$. Therefore, by similar logic to before, firms who prefer $h_{j'}$ match to workers with $h_j$ if their $v$-index is lower than the $n_{(j')} - n_{j'} + 1 = \bar{n}_j - n_{(j)} + 1$ order statistic among those firms in $M_j^-$. Letting $\kappa \equiv \bar{n}_j - n_{(j)} + 1$, the probability of a worker who chose $h_j$ matching to a type $m \in M_j^-$ firm conditional on matching to some firm in $M_j^-$ is given as the proportion of type $m$ firms whose $v$ index falls below this threshold:
$$P_j(M_i = m \mid M_{ij}^-) = \frac{P(v_m < \hat{v})\,q_m^-}{\sum_{m\in M_j^-} P(v_m < \hat{v})\,q_m^-},$$
where $\hat{v} \equiv v(\kappa, n_{(j')}; G_{j-})$. Lastly, in the case that $\bar{n}_j \leq n_{(j)}$, $P(M_{ij}^-) = 0$. This completes the proof.

Next we define $G \equiv \sum_{m=1}^{M} F_m q_m$. Proposition A.1.3 characterizes the matching probabilities in the case that all firm types prefer one level of education; that is, in the case that firm preferences are homogeneous over worker education types. The arguments are abridged, since they are very similar to those used in the proof of Proposition A.1.2.

Proposition A.1.3. (Homogeneous firm preferences). Suppose that $n_h = n_f = n$. Then under the assumptions of Proposition A.1.1 we have the following for the cases that $n_{(j)} = n$ and $n_{(j)} = 0$.

1. If $n_{(j)} = n$, then $M_j^- = \emptyset$ and for each $m \in M_j^+ = M$ we have
$$P_{h_j,n_j,n_{(j)}}(m) = \begin{cases} q_m & \text{if } \bar{n}_j = n, \\[4pt] \dfrac{P(v_m > \hat{v})\,q_m}{\sum_{m\in M} P(v_m > \hat{v})\,q_m} & \text{if } \bar{n}_j < n, \end{cases}$$
where $\hat{v} \equiv v(a,b;G)$, with $a = n - \bar{n}_j$ and $b = n$.

2. If $n_{(j)} = 0$, then $M_j^+ = \emptyset$ and for each $m \in M_j^- = M$ we have
$$P_{h_j,n_j,n_{(j)}}(m) = \begin{cases} q_m & \text{if } \bar{n}_j = n, \\[4pt] \dfrac{P(v_m < \hat{v})\,q_m}{\sum_{m\in M} P(v_m < \hat{v})\,q_m} & \text{if } \bar{n}_j < n, \end{cases}$$
where $\hat{v} \equiv v(a,b;G)$, with $a = \bar{n}_j + 1$ and $b = n$.

Proof. When $n_{(j)} = n$ and $\bar{n}_j = n$ the probability of matching to firm $m$ is simply equal to the marginal probability of that firm type in the economy, $q_m$.
When $n_{(j)} = n$ and $\bar{n}_j < n$, using logic identical to that employed in the proof of Proposition A.1.2, we conclude that the probability of matching to a firm from class $m$ is equal to the proportion of type $m$ firms above the $n - \bar{n}_j$ order statistic of the $v$'s.

When $n_{(j)} = 0$, we must have $\bar{n}_j > n_{(j)} = 0$ (since at least one person is assumed to choose $h_j$). Since the top $n_{j'} = n - \bar{n}_j$ ranked firms in terms of $v$ receive a worker with their preferred education, $h_{j'}$, the probability of matching to a firm in class $m$ is equal to the proportion of type $m$ firms below the $\bar{n}_j + 1$ order statistic of the $v$'s.

The following result takes for granted the well-known fact that uniform order statistics follow the Beta distribution.$^6$

$^6$ For example, see Chapter 2 of Ahsanullah et al. [7].

Lemma A.1.3. Let: i) $\{X_i\}_{i=1}^{n}$ be iid random variables from a continuous distribution function $G$; ii) $Z$ be normally distributed with mean $\mu$ and variance $\sigma^2$; iii) $X_{(i)}$ be the $i$-th order statistic of $\{X_i\}_{i=1}^{n}$; iv) $U_{(i)}$ be the $i$-th order statistic of iid uniform random variables $\{U_i\}_{i=1}^{n}$. Then,
$$P(Z \geq X_{(i)}) = 1 - E\,\Phi\big((G^{-1}(U_{(i)}) - \mu)/\sigma\big),$$
where $\Phi(\cdot)$ is the standard normal cdf, and $E(\cdot)$ is taken over the distribution of $U_{(i)}$, which follows the Beta distribution with parameters $i$ and $n+1-i$.

Proof. Note that since the $X_i$'s are continuously distributed according to $G$, it follows from the probability integral transformation result that for each $i$
$$X_i =_d G^{-1}(U_i).$$
Also, since $G$ is monotone we have that for each $i$
$$X_{(i)} =_d G^{-1}(U_{(i)}).$$
The previous line implies that
$$P(Z \geq X_{(i)}) = P(Z \geq G^{-1}(U_{(i)})) = 1 - P(Z \leq G^{-1}(U_{(i)})) = 1 - E\,\Phi\big((G^{-1}(U_{(i)}) - \mu)/\sigma\big),$$
where $E(\cdot)$ is taken over the distribution of $U_{(i)}$. The last equality used the fact that $Z$ is normal with mean $\mu$ and variance $\sigma^2$.

The following results are direct applications of the previous results. They are useful for constructing the $\pi_{mj}$'s that are used in the structural estimation of this paper. Recall the definitions of $G$, $G_{j+}$, $G_{j-}$, and $v(b_1,b_2;F)$ from before.
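The representation in Lemma A.1.3 can be checked numerically. The sketch below is an illustration under stated assumptions, not the thesis code: it takes $G$ to be the standard normal cdf and uses arbitrary illustrative values of $i$, $n$, $\mu$ and $\sigma$, and it treats $Z$ as independent of the $X_i$'s. It compares a direct simulation of $P(Z \geq X_{(i)})$ with the average of $\Phi$ over Beta$(i, n+1-i)$ draws.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # plays the role of both G and Phi here


def direct_probability(i, n, mu, sigma, draws, rng):
    """Estimate P(Z >= X_(i)) by simulating the order statistic itself."""
    hits = 0
    for _ in range(draws):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))  # X_1..X_n ~ G
        z = rng.gauss(mu, sigma)                             # Z ~ N(mu, sigma^2)
        if z >= xs[i - 1]:                                   # i-th order statistic
            hits += 1
    return hits / draws


def beta_probability(i, n, mu, sigma, draws, rng):
    """Estimate 1 - E[Phi((G^{-1}(U_(i)) - mu) / sigma)], U_(i) ~ Beta(i, n+1-i)."""
    total = 0.0
    for _ in range(draws):
        u = rng.betavariate(i, n + 1 - i)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
        total += STD_NORMAL.cdf((STD_NORMAL.inv_cdf(u) - mu) / sigma)
    return 1.0 - total / draws


rng = random.Random(12345)
direct = direct_probability(i=2, n=5, mu=0.3, sigma=1.2, draws=20000, rng=rng)
via_beta = beta_probability(i=2, n=5, mu=0.3, sigma=1.2, draws=20000, rng=rng)
print(round(direct, 3), round(via_beta, 3))
```

The two estimates should agree up to simulation noise. The Beta-based version only needs one random draw per replication, which is why this representation is convenient when the probability must be evaluated many times.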
We introduce the following notation:
$$a(\kappa,n,m;G) \equiv E\,\Phi\big((G^{-1}(U_{(\kappa;n)}) - \beta k_m)/\sigma_m\big),$$
where $U_{(\kappa;n)}$ is the $\kappa$-order statistic of $n$ uniform random variables and $E(\cdot)$ is taken over the distribution of $U_{(\kappa;n)}$.

Corollary A.1.1. Suppose the conditions of Proposition A.1.2 hold and let $v_m$ be distributed according to $F_m$. Then, in the heterogeneous preferences case with $\bar{n}_j < n_{(j)}$,
1. For each $m \in M_j^+$, $P(v_m > v(\kappa, n_{(j)}; G_{j+})) = 1 - a(\kappa, n_{(j)}, m; G_{j+})$, where $\kappa = n_{(j)} - \bar{n}_j$.
2. For each $m \in M_j^-$, $P(v_m < v(\kappa, n_{(j')}; G_{j-})) = a(\kappa, n_{(j')}, m; G_{j-})$, where $\kappa = \bar{n}_j - n_{(j)} + 1$.

Corollary A.1.2. Suppose the conditions of Proposition A.1.3 hold and let $v_m$ be distributed according to $F_m$. Then, in the homogeneous preferences case with $\bar{n}_j < n$,
1. If $n_{(j)} = 0$, $P(v_m < v(\kappa, n; G)) = a(\kappa, n, m; G)$ for each $m \in M$, where $\kappa = \bar{n}_j + 1$.
2. If $n_{(j)} = n$, $P(v_m > v(\kappa, n; G)) = 1 - a(\kappa, n, m; G)$ for each $m \in M$, where $\kappa = n - \bar{n}_j$.

Proof. The proofs of Corollaries A.1.1 and A.1.2 follow directly from Lemma A.1.3.

A.1.3 A Monte Carlo Simulation Study

In this section, we investigate the finite sample size and power properties of the estimator of preferences, $\hat{\theta}_n(\beta)$, under a variety of parameters and functional form assumptions. The results in this section are for the case in which the matching technology, $\beta$, is known to the econometrician.

In this study, we choose the following general structure for the worker's expected utility function:
$$\tilde{U}_i = (f_i + g_i)/2 + d(H_i)\varepsilon_i,$$
where $\theta = (\theta_1,\theta_2')' \in \mathbb{R}^4$, $X_i \in \mathbb{R}^3$.
Let $d(H_i)$ be a $2\times 1$ vector with one in the $H_i$-th row, where $H_i \in \{1,2\}$. We suppose that $\varepsilon_i \in \mathbb{R}^2$ follows the extreme value distribution so that the best response probability function of each worker has the logit structure. We consider two functional forms for the production function $f_i$, which we call Specification 1 ($f_{i1}$) and Specification 2 ($f_{i2}$):
$$f_{i1} = \theta_1 H_i \cdot \pi_i(\theta,\beta)'k, \quad\text{and}\quad f_{i2} = \theta_1\big(H_i + \pi_i(\theta,\beta)'k\big).$$
$f_{i1}$ implies direct production complementarities between the worker and firm variables, whereas any complementarities in $f_{i2}$ are forced through the worker's expectation of firm capital, $\pi_i'k$. We also choose the following $g_i$, which ensures that the worker's outside option is positive$^7$:
$$g_i = \exp(H_i \cdot X_i'\theta_2).$$
$X_i = (X_{1i},X_{2i},X_{3i})'$ are drawn independently across $i$ and one another from $U[0,1]$. For each simulation sample, the $H_i$'s are generated as follows. First, we solve for the fixed point in the best response operator to obtain $P^*$.$^8$ Then we compute the best response at the simulated covariates
$$\Psi_i(H_i \mid X_i,P^*_{-i}) = \frac{\exp(\tilde{U}_i^*(H_i,X_i))}{\sum_{j=1}^{2} \exp(\tilde{U}_i^*(H_j,X_i))}.$$
Letting $\Psi_i^*(X_i) \equiv \Psi_i(H_i \mid X_i,P^*_{-i})$, we generate the simulated actions as
$$H_i = 1\{\Psi_{i1}^* > \omega_i\},$$
where the $\omega_i$'s are drawn iid from the uniform distribution on $[0,1]$.

$^7$ Note that setting $g_i = H_i \exp(X_i'\theta_2)$, that is, a functional form guaranteeing that $g_i$ is increasing in $H_i$, also yielded comparable performance in the simulation studies.

$^8$ In experiments with different starting values, iterating the best response operator yielded the same fixed point each time.

Table A.1: The Empirical Coverage Probability of Asymptotic Confidence Intervals for a′θ0 at 95% Nominal Level When β0 is Known.

                      Specification 1              Specification 2
β0      n           M = 2    M = 3    M = 5      M = 2    M = 3    M = 5
−1      n = 500     0.9480   0.9430   0.9470     0.9510   0.9410   0.9430
        n = 1000    0.9480   0.9560   0.9490     0.9450   0.9450   0.9270
        n = 2000    0.9380   0.9300   0.9250     0.9590   0.9300   0.9090
−0.5    n = 500     0.9470   0.9490   0.9490     0.9530   0.9500   0.9310
        n = 1000    0.9480   0.9370   0.9550     0.9480   0.9310   0.9300
        n = 2000    0.9430   0.9430   0.9450     0.9170   0.9180   0.8860
0       n = 500     0.9460   0.9470   0.9530     0.9380   0.9350   0.9380
        n = 1000    0.9490   0.9490   0.9530     0.9370   0.9480   0.9440
        n = 2000    0.9360   0.9510   0.9400     0.9580   0.9530   0.9490
0.5     n = 500     0.9520   0.9530   0.9600     0.9420   0.9310   0.9330
        n = 1000    0.9410   0.9500   0.9360     0.9460   0.9450   0.9440
        n = 2000    0.9430   0.9260   0.9230     0.9560   0.9500   0.9280
1       n = 500     0.9440   0.9430   0.9380     0.9330   0.9360   0.9270
        n = 1000    0.9260   0.9180   0.8930     0.9470   0.9340   0.9220
        n = 2000    0.9170   0.9050   0.8850     0.9200   0.9240   0.9190

Notes: The table reports the empirical coverage probability of the asymptotic confidence interval for θ0. The simulated rejection probability at the true parameter is close to the nominal size of α = 0.05. The simulation number is R = 1000.

Table A.2: Average Length of Confidence Intervals for a′θ0 at 95% Nominal Level When β0 is Known.

                      Specification 1              Specification 2
β0      n           M = 2    M = 3    M = 5      M = 2    M = 3    M = 5
−1      n = 500     15.5881  1.8333   1.8353     1.6223   1.4490   1.3557
        n = 1000    1.3573   1.2290   1.2920     1.1102   0.9449   0.9240
        n = 2000    0.9093   0.8762   0.8643     0.6812   0.6858   0.6385
−0.5    n = 500     2.3023   2.0498   2.0104     2.4145   1.9593   1.6319
        n = 1000    3.4657   1.508    1.3757     1.6552   1.2743   1.0797
        n = 2000    1.0376   1.0841   1.0300     1.0803   0.8720   0.7451
0       n = 500     2.2851   2.2847   2.2773     1.1456   1.1542   1.2197
        n = 1000    1.5994   1.5993   1.6397     0.7371   0.8457   0.7375
        n = 2000    3.1524   1.1593   1.1208     0.5205   0.5196   0.5206
0.5     n = 500     3.0442   2.9003   2.9723     2.4889   3.0612   3.1814
        n = 1000    3.0954   1.9949   2.1289     2.0466   2.0044   2.3200
        n = 2000    1.5633   1.4310   1.5487     0.9286   1.3152   1.2469
1       n = 500     6.0386   11.0447  4.7156     4.8434   4.7347   4.3571
        n = 1000    2.7670   3.1329   4.6450     2.8590   3.1888   3.2592
        n = 2000    1.6956   2.0135   2.2905     2.1166   2.5055   2.2695

Notes: This table reports the average length of the asymptotic confidence interval for θ0. The lengths of the confidence intervals generally decrease with n. The simulation number is R = 1000.

A.1.4 Empirical Section

Variables

We use variables EDC 1 through EDC 12 to construct the indicator variable for high educated. A number of definitions are possible.
The results record the value of $H_i = 1$ if the individual has completed both high school and a college degree.

We rely on firm size as our measure of firm productivity. There is specific evidence that firm size is a useful proxy for firm productivity in both manufacturing and non-manufacturing sectors in the Canadian context. Leung et al. (2008), using Canadian administrative data for the period 1984-1997, argue that firm size is positively correlated with measures of labour and total factor productivity, particularly within the manufacturing sector.

For the outside option function, we use the worker's marital status, the number of dependent children the worker has, and the worker's gender. We drop the few individuals who reported having six dependent children (the maximum allowable in the sample). The form of the outside option is as reported in the simulation study.

A.1.5 Estimation

The estimation proceeds in two steps, broadly as outlined in Section 1.3. We begin by discussing the estimation of $\theta_0$, then we discuss the Monte Carlo inference approach used to construct confidence intervals for $\beta$ in our samples of interest.

We set the parameter space for $\beta_0$ to be $B = [-1 : 0.075 : 3]'$. We normalize $\sigma = 1$ throughout. For each value of $\beta \in B$ we obtain $\hat{\theta}(\beta)$ using the employee-final weights provided for WES by Statistics Canada. Standard errors are constructed as the square roots of the diagonal elements of the inverse of the sample Hessian, constructed using numerical differentiation of the log-likelihood function via a five-point stencil approach.

Much of the computational difficulty associated with the estimation of $\theta$ involves the construction of the worker's expectations, i.e., the $\hat{\pi}_j$'s. For the number of support points for the capital variable, we set a value of $M = 5$, but the results are not very sensitive to similar values of $M$ (i.e., $M = 3$, $M = 7$).
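The standard-error construction mentioned above (inverse Hessian obtained by five-point-stencil differentiation) can be sketched as follows. This is only one possible reading of that description and not the thesis code: it applies the first-derivative five-point stencil twice, takes the negative Hessian of the log-likelihood as the information matrix, and uses a toy quadratic log-likelihood (`toy_loglik`, hypothetical) in place of the structural likelihood.

```python
from math import sqrt


def partial(f, theta, k, h):
    """Five-point-stencil estimate of the partial derivative of f in coordinate k."""
    def at(step):
        point = list(theta)
        point[k] += step * h
        return f(point)
    return (-at(2) + 8 * at(1) - 8 * at(-1) + at(-2)) / (12 * h)


def hessian(f, theta, h=1e-2):
    """Hessian obtained by applying the first-derivative stencil twice."""
    dim = len(theta)
    return [
        [partial(lambda p, b=b: partial(f, p, b, h), theta, a, h) for b in range(dim)]
        for a in range(dim)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(loglik, theta_hat, h=1e-2):
    """Square roots of the diagonal of the inverse negative Hessian at theta_hat."""
    info = [[-v for v in row] for row in hessian(loglik, theta_hat, h)]
    cov = invert(info)
    return [sqrt(cov[k][k]) for k in range(len(theta_hat))]


def toy_loglik(theta):
    # Hypothetical quadratic log-likelihood with information matrix [[4, 1], [1, 3]].
    a, b = theta
    return -0.5 * (4 * a * a + 2 * a * b + 3 * b * b)


se = standard_errors(toy_loglik, [0.2, -0.1])
print(se)
```

For the toy function the stencil is exact up to rounding, so the output can be compared with the analytic inverse of the information matrix. With a real likelihood the step size `h` would need tuning.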
The empirical distribution of firm size, $\hat{q}$, is constructed using the WES workplace weights after grouping the firms into categories based on log firm size. We also use fifty draws of random variables from the beta distribution for the construction of the threshold-crossing probabilities. We use the empirical distribution of high education as the equilibrium choice probabilities. For the distribution of $N_{(j)}$, the number of firms that prefer workers who chose education level $j$, we use the binomial probability mass function with probability equal to the expected fraction of firms who prefer education level $j$. In practice, we construct the matching probabilities using an interpolation of the support of $N_{(j)}$. A choice of forty support points is found to work well.

The test statistic of interest is based on matched observed characteristics in the data. In particular, we consider a test statistic that compares the observed joint distribution of worker human capital and the matched firm capital to the simulated counterpart. That is,
$$T(\beta) = \frac{1}{R} \sum_{r=1}^{R} \big\|\hat{P} - \hat{P}_r(\beta,\hat{\theta}_n(\beta))\big\|, \tag{A.12}$$
where $\hat{P}$ is an $M \times J$ matrix whose $(m,j)$ element is the probability that a worker of education level $j$ matches to a firm of capital level $m$ (i.e., $\hat{P}(\mathcal{M}(i) = m, h_i = j)$), $\hat{P}_r$ is similarly defined except that we use the simulated matching, $\mathcal{M}_{ir}(i;\beta,\hat{\theta}(\beta))$, in place of the observed matching, $\mathcal{M}(i)$, and $\|\cdot\|$ is the Frobenius norm. Note that when we construct $\hat{P}_r$ we use the $\hat{\theta}$ that were estimated using the employee weights. However, no set of weights provided by Statistics Canada is appropriate at the level of the match itself.
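The statistic in (A.12) can be illustrated with a small sketch. It is a simplified, unweighted version under stated assumptions: matches are given as hypothetical `(m, j)` pairs with zero-based firm class `m` and education level `j`, survey weights are ignored, and the simulated matchings are supplied directly rather than generated from the model.

```python
from math import sqrt


def joint_distribution(matches, n_firm_classes, n_edu_levels):
    """Empirical M x J matrix of P(M(i) = m, h_i = j) from (m, j) pairs."""
    table = [[0.0] * n_edu_levels for _ in range(n_firm_classes)]
    for m, j in matches:
        table[m][j] += 1.0 / len(matches)
    return table


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def match_distance_statistic(observed, simulated, n_firm_classes, n_edu_levels):
    """Average Frobenius distance between observed and simulated joint distributions."""
    p_hat = joint_distribution(observed, n_firm_classes, n_edu_levels)
    distances = [
        frobenius_distance(p_hat, joint_distribution(s, n_firm_classes, n_edu_levels))
        for s in simulated
    ]
    return sum(distances) / len(distances)


# Two workers, two firm classes, two education levels; R = 2 simulated matchings.
observed = [(0, 0), (1, 1)]
simulated = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
t_value = match_distance_statistic(observed, simulated, 2, 2)
print(t_value)
```

The first simulated matching reproduces the observed one and contributes a distance of zero, while the second reverses it, so the average reflects how far the simulated sorting pattern is from the observed pattern.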
We also define an estimator of β as the minimizer of T(β), which serves as a heuristic measure of β.

A.1.6 Estimation Results

Figure A.1: Wage Inequality in Canada's Workplace-Employee Survey: 99-50 Difference in Quantile of Log Hourly Wages
[Figure omitted. Horizontal axis: Year (1999-2005); vertical axis: $\hat{q}_{0.99} - \hat{q}_{0.50}$.]
The plot shows the difference in the weighted sample quantiles of log hourly wages (HR WAGET) in the WES sample years of 1999-2005.

Figure A.2: Income Inequality in Canada: Gini coefficients, 1990-2015
[Figure omitted. Horizontal axis: Year (1990-2015); vertical axis: Gini; series: Market income, Total income, After-Tax income.]
The figure plots Gini coefficients for total, adjusted market, and after-tax income for Canada. The period of 1999-2005, coinciding with WES, is a period of relatively stable income inequality in Canada. The source of the data is CANSIM Table 206-0033 from Income Statistics Division, Statistics Canada.

Table A.3: Matching Technology in Canadian Manufacturing and Finance Industries, 1999-2005

        Manufacturing                          Finance
Year    Specification 1    Specification 2     Specification 1    Specification 2
1999    1.475              1.775               0.950              0.875
        [0.500, 2.450]     [0.575, 2.450]      [0.200, 1.625]     [0.200, 1.625]
2000    1.475              1.475               1.625              1.550
        [0.200, 2.525]     [0.275, 2.600]      [0.950, 1.850]     [1.100, 1.925]
2001    -1.000             -0.175              0.575              0.575
        [-1.000, 1.325]    [-1.000, 1.400]     [-0.775, 1.700]    [-0.700, 1.700]
2002    -0.175             -0.100              0.725              0.650
        [-1.000, 1.775]    [-1.000, 1.625]     [-0.550, 2.150]    [-0.550, 2.225]
2003    0.725              0.875               0.875              0.800
        [-1.000, 2.375]    [-1.000, 2.600]     [0.025, 1.750]     [-0.025, 1.700]
2004    1.325              1.100               0.650              0.575
        [-0.325, 2.450]    [-0.325, 2.525]     [-0.550, 1.775]    [-0.550, 1.850]
2005    2.300              2.375               0.425              0.425
        [0.950, 2.975]     [0.950, 2.975]      [-0.550, 1.550]    [-0.550, 1.550]

This table reports minimum distance estimates of β using the test statistic in (A.12), along with 95% confidence intervals for β (in brackets), for the years 1999-2005 using WES data for managers and professionals in the Secondary Products Manufacturing sector and the Finance and Insurance industry. 
Specification 1 and Specification 2 refer to the cases in which worker and firm capital are multiplicative and additive, respectively. The results for β are similar across specifications. In the manufacturing industry, matching frictions were low at the start of the sample, rose in the middle, then fell again towards the end of the sample. In the finance industry, we find an increase in matching frictions from 1999 onwards. Weighted sample sizes for relevant years and industries are reported in Tables A.4 and A.5.

A.1.7 Model Counterfactuals

Table A.4: Estimation of Worker and Firm Preferences in Canadian Manufacturing Industry, 1999-2005

Specification 1
      1999       2000             2001       2002       2003       2004       2005
θ1    3.6061     1.9910           3.2229     0.0596     4.4090     5.4205     3.2859
      (0.0023)   (0.0012)         (0.0006)   (0.0016)   (0.0016)   (0.0010)   (0.0015)
θ2    -0.9760    0.1460           0.0240     -0.3201    -0.4391    0.7434     -29.7608
      (0.0269)   (0.0011)         (0.0002)   (0.0005)   (0.0012)   (0.0003)   (6.785·10^9)†
θ3    0.2742     -30.7611         0.2259     0.6203     0.5536     -1.3002    0.0877
      (0.0009)   (3.132·10^9)†    (0.0005)   (0.0003)   (0.0006)   (0.0058)   (0.0007)
θ4    -0.0889    0.5008           0.2017     0.1410     -0.0299    -0.3937    0.2015
      (0.0006)   (0.0004)         (0.0002)   (0.0001)   (0.0003)   (0.0014)   (0.0002)
bi    71361      72464            80738      76214      87026      153520     98879

Specification 2
      1999       2000       2001             2002       2003       2004       2005
θ1    1.7022     0.9582     0.2891           0.0257     2.1104     2.5843     1.8401
      (0.0004)   (0.0005)   (0.0004)         (0.0005)   (0.0005)   (0.0003)   (0.0006)
θ2    -0.9760    -5.5360    -0.1571          -0.3201    -0.4391    0.5897     -30.3271
      (0.0002)   (0.0110)   (0.0003)         (0.0072)   (0.0046)   (0.0015)   (0.0071)
θ3    0.2742     0.6741     0.6012           0.6203     0.5536     -0.1873    0.0877
      (0.0009)   (0.0004)   (0.0077)         (0.0003)   (0.0007)   (0.0007)   (0.0002)
θ4    -0.0889    -0.2500    0.1432           0.1410     -0.0299    0.1068     0.2015
      (0.0039)   (0.0002)   (2.423·10^9)†    (0.0004)   (0.0003)   (0.0003)   (0.0001)
bi    71361      72464      80738            76214      87026      153520     98879

This table reports preference parameter estimates and standard errors (in parentheses) for professionals and managers in the Canadian manufacturing industry for two specifications for the years 1999-2005 (WES sample frame). bi denotes the weighted sample size in year t. For each year, we report the value of $\hat{\theta}_n(\hat{\beta})$, where $\hat{\beta}$ is the value of the minimum distance estimate for that year. Specification 1 and Specification 2 refer to the cases in which worker and firm capital are multiplicative and additive, respectively. θ1 is the coefficient on worker and firm attributes in the production function. The remaining parameters are coefficients in the outside option function: θ2 on female_i, θ3 on marital status, and θ4 on the number of dependent children. All coefficients other than those marked with † are statistically significant at α = 0.01. Both specifications suggest modest increases in the production technology parameter, θ1, over time. Both specifications, particularly the additive one, suggest a negative coefficient on female_i in the worker's wage equation.

Table A.5: Estimation of Worker and Firm Preferences in Canadian Finance Industry, 1999-2005

Specification 1
      1999       2000       2001       2002       2003       2004       2005
θ1    2.8284     2.5676     3.2546     2.9339     2.4986     2.5479     4.5527
      (0.0014)   (0.0012)   (0.0009)   (0.0014)   (0.0013)   (0.0007)   (0.0016)
θ2    -1.3669    -1.2849    0.4988     -1.2993    -1.7819    -0.9356    -1.3559
      (0.0055)   (0.011)    (0.0003)   (0.0072)   (0.0046)   (0.0015)   (0.0071)
θ3    0.0842     0.4970     -1.1876    0.4788     -0.3668    -0.3690    0.6391
      (0.0007)   (0.0004)   (0.0053)   (0.0003)   (0.0007)   (0.0007)   (0.0002)
θ4    0.1541     0.0397     -0.3510    -0.2085    0.4309     0.4245     -0.0591
      (0.0003)   (0.0002)   (0.0014)   (0.0004)   (0.0003)   (0.0003)   (0.0001)
bi    193420     159320     191340     180890     188320     190750     185670

Specification 2
      1999       2000       2001       2002       2003       2004       2005
θ1    1.0408     1.0157     1.1637     1.1011     0.9652     0.9886     1.7575
      (0.0004)   (0.0005)   (0.0003)   (0.0005)   (0.0005)   (0.0003)   (0.0006)
θ2    0.5115     -1.2849    0.4988     -1.2993    -1.7819    -0.9356    -1.3559
      (0.0002)   (0.0110)   (0.0003)   (0.0072)   (0.0046)   (0.0015)   (0.0071)
θ3    -0.4881    0.4970     -1.1876    0.4788     -0.3668    -0.3690    0.6391
      (0.0009)   (0.0004)   (0.0053)   (0.0003)   (0.0007)   (0.0007)   (0.0002)
θ4    -1.0452    0.0397     -0.3510    -0.2085    0.4309     0.4245     -0.0591
      (0.0039)   (0.0002)   (0.0014)   (0.0004)   (0.0003)   (0.0003)   (0.0001)
bi    193420     159320     191340     180890     188320     190750     185670

This table reports preference parameter estimates and standard errors (in parentheses) for professionals and managers in the Canadian finance and insurance industry for two specifications for the years 1999-2005 (WES sample frame). bi denotes the weighted sample size in year t. For each year, we report the value of $\hat{\theta}_n(\hat{\beta})$, where $\hat{\beta}$ is the value of the minimum distance estimate for that year. Specification 1 and Specification 2 refer to the cases in which worker and firm capital are multiplicative and additive, respectively. θ1 is the coefficient on worker and firm attributes in the production function. The remaining parameters are coefficients in the outside option function: θ2 on female_i, θ3 on marital status, and θ4 on the number of dependent children. All results are statistically significant at α = 0.01. Both specifications suggest a relatively stable production technology, θ1, over time, with an increase in 2005. Both specifications suggest a negative coefficient on female_i in the worker's wage equation.

Table A.6: Counterfactual Estimated Probabilities of Investing in High Education, Manufacturing Industry

Specification 1
Year          1999     2000     2001     2002     2003     2004     2005
β̂^CF_1999     0.7443   0.7310   0.8605   0.6723   0.8205   0.8276   0.7622
β̂^CF_2000     0.7442   0.7310   0.8605   0.6723   0.8205   0.8275   0.7622
β̂^CF_2001     0.6656   0.7263   0.8596   0.6736   0.7494   0.7436   0.7049
β̂^CF_2002     0.6947   0.7281   0.8600   0.6732   0.7763   0.7758   0.7259
β̂^CF_2003     0.7237   0.7298   0.8603   0.6727   0.8026   0.8066   0.7470
β̂^CF_2004     0.7404   0.7308   0.8604   0.6724   0.8172   0.8235   0.7594
β̂^CF_2005     0.7624   0.7321   0.8607   0.6719   0.8358   0.8400   0.7762

Specification 2
Year          1999     2000     2001     2002     2003     2004     2005
β̂^CF_1999     0.7253   0.6761   0.7036   0.6724   0.8066   0.8283   0.7762
β̂^CF_2000     0.7224   0.6744   0.7042   0.6725   0.8040   0.8254   0.7738
β̂^CF_2001     0.7036   0.6635   0.7074   0.6728   0.7860   0.8055   0.7582
β̂^CF_2002     0.7045   0.6641   0.7072   0.6728   0.7868   0.8065   0.7590
β̂^CF_2003     0.7160   0.6707   0.7053   0.6726   0.7980   0.8187   0.7685
β̂^CF_2004     0.7185   0.6721   0.7049   0.6725   0.8003   0.8213   0.7705
β̂^CF_2005     0.7306   0.6793   0.7027   0.6723   0.8116   0.8336   0.7806

This table reports the model-simulated probability of investing in high education at the estimated parameter values for the secondary products manufacturing sector (WES industry 4) for two specifications. Specifications 1 and 2 are defined in Section A.1.6. The results demonstrate the importance of the matching technology and complementarities for education decisions. The equilibrium probability of education is typically higher in the complementarities case (Specification 1). In the manufacturing sector in 1999 (a high β year), switching to the matching technology from 2001 causes a fall in the equilibrium probability of attending college by roughly 8% in Specification 1, and 2.5% in Specification 2. The variation of preferences and of the distribution of worker characteristics over time is also significant. In 2001, if preferences and characteristics were as they were in 2002, the probability of investing in higher education would plummet from 86% to 67%. 
The effect is greatly attenuated in Specification 2, which assumes no production complementarities between worker and firm types.

Table A.7: Counterfactual Estimated Probabilities of Investing in High Education, Finance Industry

Specification 1
Year          1999     2000     2001     2002     2003     2004     2005
β̂^CF_1999     0.6903   0.6961   0.7163   0.6979   0.6833   0.6799   0.7889
β̂^CF_2000     0.7059   0.7097   0.7341   0.7146   0.6965   0.6932   0.8050
β̂^CF_2001     0.6807   0.6877   0.7052   0.6877   0.6751   0.6719   0.7787
β̂^CF_2002     0.6847   0.6911   0.7097   0.6918   0.6784   0.6752   0.7829
β̂^CF_2003     0.6885   0.6944   0.7142   0.6960   0.6817   0.6784   0.7870
β̂^CF_2004     0.6827   0.6894   0.7075   0.6898   0.6768   0.6735   0.7808
β̂^CF_2005     0.6767   0.6842   0.7005   0.6833   0.6717   0.6686   0.7744

Specification 2
Year          1999     2000     2001     2002     2003     2004     2005
β̂^CF_1999     0.6399   0.6619   0.6507   0.6415   0.6472   0.6431   0.7380
β̂^CF_2000     0.6449   0.6665   0.6565   0.6470   0.6514   0.6474   0.7442
β̂^CF_2001     0.6376   0.6598   0.6480   0.6390   0.6452   0.6411   0.7351
β̂^CF_2002     0.6382   0.6603   0.6487   0.6396   0.6457   0.6416   0.7358
β̂^CF_2003     0.6394   0.6614   0.6500   0.6409   0.6467   0.6426   0.7373
β̂^CF_2004     0.6376   0.6598   0.6480   0.6390   0.6452   0.6411   0.7351
β̂^CF_2005     0.6364   0.6587   0.6466   0.6376   0.6441   0.6401   0.7336

Specifications 1 and 2 are as defined in Section A.1.6. As in the manufacturing sector, the equilibrium probability of investing in education is higher in the case of complementarities (Specification 1). 
Changes in preferences, technology, and characteristics matter for education patterns: for example, in 1999, switching to 2005's preferences and exogenous characteristics leads to an increase of almost 10% in both specifications.

Table A.8: Counterfactual Estimated Probabilities of Investing in High Education, Manufacturing Industry

Specification 1
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.7443   0.7310   0.8596   0.6732   0.8026   0.8235   0.7762
β = 0     0.7007   0.7285   0.8600   0.6731   0.7817   0.6822   0.7302
β = 5     0.7954   0.7342   0.8610   0.6710   0.8629   0.8758   0.8036

Specification 2
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.7253   0.6744   0.7074   0.6728   0.7980   0.8213   0.7806
β = 0     0.7058   0.6647   0.7070   0.6727   0.7880   0.8078   0.7600
β = 5     0.7456   0.6888   0.6998   0.6721   0.8250   0.8483   0.7936

In this table we consider the effects of very low frictions (β = 5) and maximal frictions (β = 0) when the production function exhibits interactions between worker and firm characteristics (Specification 1) and when it does not (Specification 2). The table illustrates the importance of both production complementarities and matching frictions to educational attainment. In 2004, lowering β to 0 causes a fall in the probability of high education by 14% in Specification 1, but only by 2% in Specification 2. The overall level of investment in education is typically much lower in the case with high matching frictions (low β). 
β has a greater effect on the outcome in the complementarities case (Specification 1).

Table A.9: Counterfactual Estimated Probabilities of Investing in High Education, Finance Industry

Specification 1
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.6903   0.7097   0.7052   0.6918   0.6817   0.6735   0.7744
β = 0     0.6648   0.6739   0.6867   0.6707   0.6618   0.6588   0.7615
β = 5     0.7509   0.7500   0.7830   0.7618   0.7366   0.7335   0.8492

Specification 2
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.6399   0.6665   0.6480   0.6396   0.6467   0.6411   0.7336
β = 0     0.6330   0.6555   0.6425   0.6339   0.6412   0.6372   0.7292
β = 5     0.6614   0.6817   0.6755   0.6650   0.6658   0.6618   0.7645

In this table we consider the effects of very low frictions (β = 5) and maximal frictions (β = 0) for Specifications 1 and 2 in the finance industry. The implications are similar to those for the manufacturing industry in the previous table. A rise in β causes a much greater increase in the probability of high education in Specification 1: in 2001, a rise from the estimated β to β = 5 leads to an 8% increase in the equilibrium probability of high education under Specification 1, but only a 3% increase in Specification 2. The level of investment in education is lower in the case without complementarities (Specification 2).

Table A.10: Matching Technology Counterfactuals, Simulated Gini Coefficient, Manufacturing Industry

Specification 1
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.1916   0.4207   0.2112   0.4216   0.1646   0.2121   0.2220
β = 0     0.2138   0.4299   0.2149   0.4218   0.2013   0.2534   0.2507
β = 5     0.2112   0.4330   0.2280   0.4208   0.1951   0.2360   0.2452

Specification 2
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.1129   0.2203   0.3481   0.4206   0.1183   0.1268   0.1653
β = 0     0.1181   0.2238   0.3499   0.4207   0.1301   0.1285   0.1710
β = 5     0.1255   0.2311   0.3478   0.4201   0.1294   0.1251   0.1728

This table reports the model-simulated Gini coefficients under two specifications. Specifications 1 and 2 are as defined in Section A.1.6. 
The predicted level of wage inequality is typically much lower in Specification 2, where there are no production complementarities. In Specification 1, the level of inequality at the estimated value of the frictions is lower than at the counterfactual levels for most years (except 2002). For example, in 2005 the simulated Gini is 0.222 and the investment in education is 77%. This rises to 0.2507 (education investment 76%) when information frictions are highest and 0.2452 (education investment 85%) when frictions are lowest. Similar patterns can be seen in Specification 2.

Table A.11: Matching Technology Counterfactuals, Gini Coefficient, Finance Industry

Specification 1
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.2602   0.2929   0.2769   0.2542   0.3594   0.2976   0.2607
β = 0     0.2516   0.2814   0.2502   0.2332   0.3408   0.2813   0.2495
β = 5     0.2647   0.2899   0.2523   0.2380   0.3421   0.2856   0.2432

Specification 2
Year      1999     2000     2001     2002     2003     2004     2005
β̂_year    0.1998   0.2245   0.1756   0.1655   0.2801   0.2200   0.1815
β = 0     0.1984   0.2237   0.1714   0.1623   0.2759   0.2185   0.1793
β = 5     0.2075   0.2315   0.1803   0.1697   0.2827   0.2266   0.1837

This table reports the model-simulated Gini coefficients under two specifications. Specifications 1 and 2 are as defined in Section A.1.6. As in the manufacturing industry, the levels of inequality are typically much lower when there are no production complementarities (Specification 2). In Specification 1, the level of inequality at the estimated value of the frictions is higher than at the counterfactual levels for each year (except 1999). 
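In Specification 2, lowering matching frictions raises wage inequality in every year. In this case, when frictions are lowered, the effect of increased sorting is stronger than the inequality-lowering effect of the greater supply of highly educated workers.

A.2 Appendix to Chapter 2

Proof of Theorem 2.2.1: From the optimization of agent i, we have

$$s^{BR}_i(I_i) = X_{i,1}'\gamma_0 + \tilde{X}_{i,2}'\delta_0 + \beta_0 E[\tilde{Y}^B_i \mid I_i] + \varepsilon_i + \eta_i, \tag{A.13}$$

where $\tilde{Y}^B_i$ denotes the weighted average (over $N_P(i)$) of $s_k(I_k)$, where the weights are given by the beliefs of player i and $r_{ik}$. Thus we write

$$E[\tilde{Y}^B_i \mid I_i] = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} \sum_{j \in N_I(i)} w^{i\prime}_{kj} T_j 1\{j \in N^i_I(k)\}.$$

Plugging this into (A.13), we have

$$s^{BR}_i(I_i) = \Big(\gamma_0 + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} w^i_{ki,1} 1\{i \in N^i_I(k)\}\Big)' X_{i,1} + \Big(1 + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} w^i_{ki,\varepsilon} 1\{i \in N^i_I(k)\}\Big)\varepsilon_i + A_n + B_n,$$

where

$$A_n = \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} w^{i\prime}_{ki,2} X_{i,2} 1\{i \in N^i_I(k)\},$$

and

$$B_n = \beta_0 \sum_{j \in N_I(i)} \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik}\big(w^{i\prime}_{kj,1} X_{j,1} + w^i_{kj,\varepsilon}\varepsilon_j\big) 1\{j \in N^i_I(k)\} + \beta_0 \sum_{j \in N_I(i)} \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} w^{i\prime}_{kj,2} X_{j,2} 1\{j \in N^i_I(k)\} + \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik}\delta_0' X_{k,2}.$$

By setting the coefficients of $X_{j,1}$, $\varepsilon_j$ and $X_{j,2}$ to be $w_{ij,1}$, $w_{ij,\varepsilon}$ and $w_{ij,2}$, we obtain that

$$w_{ii,1} = \gamma_0 + \beta_0 M_i w^i_{i,1}, \tag{A.14}$$
$$w_{ii,\varepsilon} = 1 + \beta_0 M_i w^i_{i,\varepsilon},$$
$$w_{ii,2} = \beta_0 M_i w^i_{i,2},$$

and for all $j \in N_I(i)$,

$$w_{ij,1} = \beta_0 M_i w^i_{j,1}, \tag{A.15}$$
$$w_{ij,\varepsilon} = \beta_0 M_i w^i_{j,\varepsilon}, \text{ and} \tag{A.16}$$
$$w_{ij,2} = \begin{cases} \delta_0 r_{ij}/n_P(i) + \beta_0 M_i w^i_{j,2}, & \text{if } j \in N_P(i), \\ \beta_0 M_i w^i_{j,2}, & \text{if } j \in N_I(i) \setminus N_P(i), \end{cases} \tag{A.17}$$

where

$$M_i w^i_j = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik} w^i_{kj} 1\{j \in \bar{N}^i_I(k)\}.$$

Now, we apply the behavioral assumptions to this operator to obtain the following:

$$M_i w^i_i = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik}\tau^i_{ki} w_{ii} 1\{i \in N_P(k)\} = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} r_{ik}\tau^i_{ki} w_{ii} = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} \frac{1}{\bar{n}_P(k)} w_{ii} = c_{ii} w_{ii}.$$

By plugging this into (A.14), we have

$$w_{ii,1} = \gamma_0 + \beta_0 w_{ii,1} c_{ii}, \tag{A.18}$$
$$w_{ii,\varepsilon} = 1 + \beta_0 w_{ii,\varepsilon} c_{ii},$$

and

$$w_{ii,2} = w_{ii,2} \cdot \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{1\{i \in \bar{N}^i_I(k)\}}{\bar{n}_P(k)} = w_{ii,2} \cdot \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{1\{i \in N_P(k)\}}{\bar{n}_P(k)} = \beta_0 w_{ii,2} c_{ii}.$$

The last equation gives $w_{ii,2} = 0$, because $|\beta_0 c_{ii}| < 1$, and the first two equations give

$$w_{ii,1} = \frac{\gamma_0}{1 - \beta_0 c_{ii}} \tag{A.19}$$

and

$$w_{ii,\varepsilon} = \frac{1}{1 - \beta_0 c_{ii}}. \tag{A.20}$$

Also, we turn to $M_i w^i_j$:

$$M_i w^i_j = \frac{1}{n_P(i)}\sum_{k \in N_P(i)} w_{ij} 1\{j \in N_P(k)\} + \frac{r_{ij} w^i_{jj} 1\{j \in N_P(i)\}}{n_P(i)}, \tag{A.21}$$

where the last term corresponds to the case $j = k \in N_P(i)$. Using the definition of $c_{ij}$, we rewrite

$$M_i w^i_{j,1} = c_{ij} w_{ij,1} + \frac{w_{ii,1} r_{ij} 1\{j \in N_P(i)\}}{n_P(i)} = c_{ij} w_{ij,1} + \frac{\gamma_0}{1 - \beta_0 c_{ii}} \frac{1}{n_P(i)} r_{ij} 1\{j \in N_P(i)\}$$

and

$$M_i w^i_{j,\varepsilon} = c_{ij} w_{ij,\varepsilon} + \frac{w_{ii,\varepsilon} r_{ij} 1\{j \in N_P(i)\}}{n_P(i)} = c_{ij} w_{ij,\varepsilon} + \frac{1}{1 - \beta_0 c_{ii}} \frac{1}{n_P(i)} r_{ij} 1\{j \in N_P(i)\}.$$

We plug these into (A.15) and (A.16) and solve for the weights to obtain

$$w_{ij,1} = \frac{\beta_0 \gamma_0 r_{ij} 1\{j \in N_P(i)\}}{n_P(i)(1 - \beta_0 c_{ij})(1 - \beta_0 c_{ii})},$$

and

$$w_{ij,\varepsilon} = \frac{\beta_0 r_{ij} 1\{j \in N_P(i)\}}{n_P(i)(1 - \beta_0 c_{ij})(1 - \beta_0 c_{ii})}.$$

Finally, let us consider $w_{ij,2}$. Note that from (A.21),

$$M_i w^i_{j,2} = c_{ij} w_{ij,2},$$

because $w_{ii,2} = 0$. By plugging this into (A.17), we obtain that

$$w_{ij,2} = \begin{cases} \delta_0 r_{ij}/n_P(i) + \beta_0 c_{ij} w_{ij,2}, & \text{if } j \in N_P(i), \\ \beta_0 c_{ij} w_{ij,2}, & \text{if } j \in N^2_P(i) \setminus N_P(i), \end{cases} = \begin{cases} \delta_0 r_{ij}/n_P(i) + \beta_0 c_{ij} w_{ij,2}, & \text{if } j \in N_P(i), \\ 0, & \text{if } j \in N^2_P(i) \setminus N_P(i), \end{cases}$$

where the last zero follows from the equality $w_{ij,2} = \beta_0 c_{ij} w_{ij,2}$ with $|\beta_0 c_{ij}| < 1$. Therefore, we have

$$w_{ij,2} = \frac{\delta_0 r_{ij} 1\{j \in N_P(i)\}}{n_P(i)(1 - \beta_0 c_{ij})}.$$

From the form of a linear strategy for $s^{BR}_i(I_i)$ with the weights as solved thus far, we obtain the desired result.

Proof of Theorem 2.2.2:

Suppose each agent is of first-order sophisticated (FS) type; i.e., each $i \in N$ believes that each $k \neq i$ is of simple type and chooses strategies according to:

$$s^i_k = \sum_{j \in N_P(k)} T_j' w^i_{kj} + \eta_k.$$

The best responses of FS types are linear because the utility is quadratic in the player's own actions and they believe simple types play according to linear strategies, with i's best response taking the form

$$s^{BR.FS}_i(I_{i,1}) = X_{i,1}'\gamma_0 + X_{i,2}'\delta_0 + \beta_0\Big(\frac{1}{n_P(i)}\sum_{k \in N_P(i)} s^i_k\Big) + \eta_i + \varepsilon_i.$$

Since an agent i of FS type believes that all other agents are of simple type, the $w^i_{kj}$ equal the weights given by player k to player j according to the best response strategies in Theorem 2.2.1. 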
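Using $\lambda_{ij} = 1/(1 - \beta_0 c_{ij})$, together with the Theorem 2.2.1 weights, in equations (A.14) - (A.15), we obtain the FS weights as follows:

$$w_{ii,1} = \gamma_0 + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\beta_0 \gamma_0 \lambda_{ki} 1\{i \in N_P(k)\}}{n_P(k)(1 - \beta_0 c_{kk})},$$
$$w_{ii,\varepsilon} = 1 + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\beta_0 \lambda_{ki} 1\{i \in N_P(k)\}}{n_P(k)(1 - \beta_0 c_{kk})},$$
$$w_{ii,2} = \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\delta_0 \lambda_{ki} 1\{i \in N_P(k)\}}{n_P(k)},$$

and for each $j \in N_{P,2}(i)$,

$$w_{ij,1} = \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} w^i_{kj,1} 1\{j \in N_P(k)\} = \frac{\beta_0}{n_P(i)}\Big(\sum_{k \in N_P(i)} w^i_{kj,1} 1\{j \in N_P(k)\} + w^i_{jj,1} 1\{j = k\}\Big) = \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\beta_0 \gamma_0 \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)(1 - \beta_0 c_{kk})} + \frac{\beta_0 \gamma_0 1\{j \in N_P(i)\}}{n_P(i)(1 - \beta_0 c_{jj})}.$$

Analogously,

$$w_{ij,\varepsilon} = \frac{\beta_0 1\{j \in N_P(i)\}}{n_P(i)(1 - \beta_0 c_{jj})} + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\beta_0 \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)(1 - \beta_0 c_{kk})},$$

and as for $w_{ij,2}$, if $j \in N_P(i)$,

$$w_{ij,2} = \frac{\delta_0}{n_P(i)} + \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\delta_0 \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)},$$

and if $j \in N_{P,2}(i) \setminus N_P(i)$,

$$w_{ij,2} = \frac{\beta_0}{n_P(i)}\sum_{k \in N_P(i)} \frac{\delta_0 \lambda_{kj} 1\{j \in N_P(k)\}}{n_P(k)}.$$

Next, using the definitions of $\bar{\lambda}_{ij}$ and $\tilde{\lambda}_{ij}$ in the theorem, we write

$$w_{ii,1} = \gamma_0 + \beta_0^2 \gamma_0 \tilde{\lambda}_{ii}, \qquad w_{ii,\varepsilon} = 1 + \beta_0^2 \tilde{\lambda}_{ii},$$

and $w_{ii,2} = \beta_0 \delta_0 \bar{\lambda}_{ii}$. Lastly, for each $j \in N_{P,2}(i)$, we have

$$w_{ij,1} = \frac{\beta_0 \gamma_0 \lambda_{jj} 1\{j \in N_P(i)\}}{n_P(i)} + \beta_0^2 \gamma_0 \tilde{\lambda}_{ij},$$
$$w_{ij,\varepsilon} = \frac{\beta_0 \lambda_{jj} 1\{j \in N_P(i)\}}{n_P(i)} + \beta_0^2 \tilde{\lambda}_{ij},$$

and

$$w_{ij,2} = \begin{cases} \beta_0 \delta_0 \bar{\lambda}_{ij}, & j \in N_{P,2}(i) \setminus N_P(i), \\ \delta_0/n_P(i) + \beta_0 \delta_0 \bar{\lambda}_{ij}, & j \in N_P(i). \end{cases}$$

Substituting these weights back into the best response function for FS types, we obtain the desired result.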
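The fixed-point step in the proof of Theorem 2.2.1, solving (A.18) for the closed form (A.19) and the analogous step for the neighbour weight on $X_{j,1}$, can be checked numerically. The sketch below is only an illustration: the function names and the parameter values in the example are hypothetical, and the scalars stand in for the possibly vector-valued $\gamma_0$.

```python
def own_weights(gamma0, beta0, c_ii):
    """Closed-form own weights w_{ii,1} and w_{ii,eps} as in (A.19)-(A.20).
    Requires |beta0 * c_ii| < 1, as assumed in the proof."""
    if abs(beta0 * c_ii) >= 1:
        raise ValueError("need |beta0 * c_ii| < 1")
    denom = 1.0 - beta0 * c_ii
    return gamma0 / denom, 1.0 / denom


def neighbour_weight_1(gamma0, beta0, c_ii, c_ij, r_ij, n_p_i):
    """Closed-form weight w_{ij,1} for a payoff neighbour j of agent i."""
    return beta0 * gamma0 * r_ij / (n_p_i * (1.0 - beta0 * c_ij) * (1.0 - beta0 * c_ii))


def iterate_fixed_point(intercept, slope, start=0.0, steps=200):
    """Iterate w <- intercept + slope * w, which converges when |slope| < 1."""
    w = start
    for _ in range(steps):
        w = intercept + slope * w
    return w


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    gamma0, beta0, c_ii, c_ij, r_ij, n_p_i = 2.0, 0.5, 0.4, 0.3, 1.0, 4
    w_ii1, w_iieps = own_weights(gamma0, beta0, c_ii)
    print(w_ii1, iterate_fixed_point(gamma0, beta0 * c_ii))
    print(neighbour_weight_1(gamma0, beta0, c_ii, c_ij, r_ij, n_p_i))
```

The iteration is a contraction precisely because $|\beta_0 c_{ii}| < 1$, the same condition that the proof uses to conclude $w_{ii,2} = 0$.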

