UBC Theses and Dissertations


Essays in labor economics Jales, Hugo Borges 2015

Full Text

Essays in Labor Economics

by

Hugo Borges Jales

B.A. Economics, Universidade Federal de Minas Gerais, 2008
M.A. Economics, Fundacao Getulio Vargas, 2010

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

The Faculty of Graduate and Postdoctoral Studies
(Economics)

The University of British Columbia
(Vancouver)

December 2015

© Hugo Borges Jales 2015

Abstract

This thesis examines two topics in labor economics and policy evaluation. Chapter 1 provides an introduction.

Chapter 2 addresses the estimation of the effects of the minimum wage on labor market outcomes in developing countries. The main finding is that, even in the absence of policy variation, that is, when the same level of the minimum wage holds for all the workers in the data, it is still possible to recover the effects of this policy under particular assumptions of a dual economy model. Using this result, the effects of the minimum wage in Brazil from 2001 to 2009 are estimated. It is shown that the minimum wage has considerably increased average wages and reduced wage inequality. However, these effects are accompanied by higher unemployment and an increase in the size of the informal sector. Overall, the loss of tax revenues from the outflow of workers to the informal sector and unemployment more than offsets the increase in wages. Thus, this minimum wage policy contributes to a decrease in the labor tax revenues collected by the government.

Chapter 3 also considers estimation of the effects of the minimum wage on labor market outcomes in developing countries. However, this chapter explores the use of less restrictive assumptions regarding the joint distribution of sectors and wages. To ease the estimation of the model parameters, a parametric approach (maximum likelihood) is used. The results validate the conclusions obtained in the previous chapter.

Chapter 4 investigates the estimation of policy effects in partially randomized designs.
It is shown that when randomization is implemented in a stratified way, the usual tests of balance of characteristics between treatment and control groups can suffer from size distortions, lack of power, or both. A solution to this problem is proposed, and its performance is compared with the baseline estimators in a simulation. It is shown that the proposed test possesses the desirable characteristics of correct nominal size and consistency. Finally, to illustrate the use of these techniques, a stratified, randomized job training program is analyzed.

Preface

Chapter 4 is a manuscript co-authored with Sergio Firpo and Miguel Foguel. I have been actively participating in all stages of this project, including deriving the main results, conducting the data analysis and writing the manuscript. The research presented in Chapter 4 is covered by UBC Behavioural Research Ethics Board Certificate number H15-01827.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Estimating the Effects of the Minimum Wage in a Developing Country: A Density Discontinuity Design Approach
  2.1 Introduction
  2.2 Background
  2.3 Model
    2.3.1 The Meyer and Wise Approach
    2.3.2 Doyle’s Approach
    2.3.3 Minimum Wage Effects in a Dual Economy
    2.3.4 Discussion
    2.3.5 Model Analysis
  2.4 Identification
  2.5 Estimation
    2.5.1 Non-Parametric Estimation
  2.6 Testing
  2.7 Empirical Application: The Effect of the Minimum Wage in Brazil
    2.7.1 Data and Descriptive Statistics
    2.7.2 Main Results
    2.7.3 Tax Revenues and the Size of the Informal Sector
  2.8 Testing the Underlying Assumptions and Robustness Checks
  2.9 Conclusion
3 Measuring the Effects of the Minimum Wage on Employment, Formality and the Wage Distribution: A Structural Econometric Approach
  3.1 Introduction
  3.2 The Model
    3.2.1 Model Analysis
    3.2.2 Estimation
  3.3 Empirical Application: The Effects of the Minimum Wage in Brazil
    3.3.1 Descriptive Statistics
    3.3.2 Results
      3.3.2.1 Parametric Model under Independence between (Latent) Sector and Wage
      3.3.2.2 Relaxing the Independence Assumption
      3.3.2.3 Marginal Effects of the Minimum Wage
      3.3.2.4 Model Fit
      3.3.2.5 Decomposing the Differences in the Wage Distributions Across Sectors – The Role of the Minimum Wage
    3.3.3 Interpreting the Evolution of Formality over Time
  3.4 Conclusion
4 Estimation of Average Treatment Effects in Partially Randomized Designs
  4.1 Introduction
  4.2 Tests for Imbalances in Pre-Treatment Variables under Stratified Randomization Designs
    4.2.1 Basic Setup and Notation
    4.2.2 Separate Tests for Each Stratum
    4.2.3 Tests Based on Pooling All Strata
      4.2.3.1 Tests for Simple Difference in Means
      4.2.3.2 Tests for Weighted Difference in Means
    4.2.4 Regression-based Test Statistics
      4.2.4.1 The Long Regression
      4.2.4.2 Relation to Short Regression Test Statistics
  4.3 Estimation of ATE and ATT Using Information on Covariates Imbalance
    4.3.1 Estimating ATE After Testing for Imbalance at Each Stratum
    4.3.2 Estimating ATE After Testing for Imbalance Using Pooled Data Across Strata
    4.3.3 Empirical Likelihood Estimators for ATE and ATT
    4.3.4 Euclidean Distance Norm
  4.4 A Monte Carlo Study
    4.4.1 Tests of Covariate Imbalance
    4.4.2 Monte Carlo - Performance of Average Treatment Effect Estimation Procedures
  4.5 The PLANFOR
    4.5.1 Program Description
    4.5.2 Sample
    4.5.3 Testing for Pretreatment Imbalances in PLANFOR’s data
    4.5.4 Treatment Effect Results
  4.6 Conclusions
Bibliography
A Appendix to Chapter 2
  A.1 Identification
  A.2 Identification under Independence between Sector and Wages
  A.3 Estimation
    A.3.1 Local Linear Density Estimation
  A.4 Robustness
    A.4.1 Role of Covariates and Unobserved Heterogeneity
    A.4.2 Random Coefficients
    A.4.3 Lack of Continuity
    A.4.4 Aggregate Parameters
    A.4.5 Robust Estimates of the Effects of the Minimum Wage on the Size of the Informal Sector
  A.5 Tax Effects of the Minimum Wage Under Alternative Assumptions
B Appendix to Chapter 3
  B.1 Estimating Spillovers Under Parametric Assumptions
    B.1.1 Introduction
    B.1.2 Identification and Estimation
    B.1.3 Remarks and Discussion
C Appendix to Chapter 4
  C.1 Construction of Regression Based Test Statistics
    C.1.1 General Expression for Weighted Averages that Correct for Differential Treatment Probabilities
  C.2 Derivation of Test Statistics for Weighted Averages

List of Tables

2.1 Descriptive Statistics
2.2 Descriptive Statistics by Sector
2.3 Minimum and Sub-minimum Wage Conditional Probabilities
2.4 Formality and Informality Conditional Probabilities
2.5 Model Parameter Estimates by Year
2.6 Distributional Effects of the Minimum Wage
2.7 The Geographic Heterogeneity of Minimum Wage Effects
2.8 Minimum Wage Effects on Labor Tax Revenues
2.9 Formality vs. Wages - Linear Regression Estimates
2.10 Descriptive Statistics by Sector: The Role of the Minimum Wage
2.11 Placebo Tests: Discontinuity Estimates using Minimum Wage Values of Other Years
2.12 Robustness - McCrary’s Density Discontinuity Estimator
3.1 Descriptive Statistics
3.2 Parameter Estimates
3.3 Minimum Wage Effects Parameters
3.4 Transformed Parameter Estimates: General Model
3.5 Structural Parameter Estimates: General Model
3.6 Marginal Effects
3.7 Model Fit
4.1 Parameter Specification for the Monte Carlo Simulation
4.2 Treatment Status Parameters for the Monte Carlo Simulation
4.3 Performance of Disaggregated Tests under the Null Hypothesis
4.4 Performance of Aggregated Tests under the Null Hypothesis
4.5 Performance of Disaggregated Tests off the Null Hypothesis
4.6 Performance of Aggregated Tests off the Null Hypothesis
4.7 Performance of Disaggregated Tests off the Null Hypothesis
4.8 Performance of Aggregated Tests Off the Null Hypothesis
4.9 Parameters Specification for the Potential Outcomes in the Monte Carlo Simulation
4.10 Performance of ATE Estimators
4.11 Performance of ATE Estimators
4.12 Evolution of Sample Size by City and Treatment Status
4.13 Descriptive Statistics
4.14 Strata Level Tests of Balancement of Covariates
4.15 Aggregated Balancement Tests
4.16 Planfor – Estimates of the Average Treatment Effects on the Treated
A.1 Robust Estimates of the Effects of the Minimum Wage on the Size of the Formal Sector
A.2 Labor Tax effects under a “No Unemployment” assumption

List of Figures

2.1 A Worker’s Card
2.2 Doyle’s Model
2.3 Doyle’s Model: Identification
2.4 Dual-economy Model: Latent and Observed Densities
2.5 Dual-economy Model: Latent and Observed Densities
2.6 Dual-economy Model: Latent and Observed Conditional Probabilities
2.7 Dual Economy Model: Latent and Observed Conditional Probabilities under Independence
2.8 Empirical CDFs
2.9 Wage Densities
2.10 Nominal Wages and Minimum Wage Evolution
2.11 Real Wages and Minimum Wage Evolution
2.12 Kernel Density Estimates
2.13 Formality vs. Wages
2.14 Formality vs. Log-wages
2.15 Density of Wages by Sector above the Minimum Wage
2.16 Empirical CDF by Sector above the Minimum Wage
3.1 Empirical CDFs by Sector
3.2 Conditional Probability of the Sector Given the Wage
3.3 Formal Sector
3.4 Informal Sector
3.5 Latent Densities
3.6 Observed Densities
3.7 Model Fit
3.8 Model Fit
3.9 Model Fit
3.10 Conditional Probability of Sector Given the Wage
3.11 Observed and Predicted CDFs
3.12 Observed and Predicted CDFs
3.13 Observed and Predicted CDFs
3.14 Empirical CDFs Above the Minimum Wage by Sector
3.15 Decomposition of the Differences of the Densities Across Sectors
3.16 Time Trends in Formality by Wage Group – Threshold in Real Terms
3.17 Time Trends in Formality by Wage Group – Robustness of Threshold Specification

Acknowledgements

I would like to express my deepest gratitude to my advisor, Professor Thomas Lemieux, for his support, kindness, patience, and friendship. I am indebted to Professors Nicole Fortin, David Green, and Vadim Marmer for their guidance throughout all these years. I learned a lot from all of you.
I thank Professors Francesco Trebbi, Patrick Francois, and Craig Riddell for encouraging me to work on this topic when the idea was still in its infancy. I leave UBC bringing fond memories with me.

To my parents.

Chapter 1
Introduction

This thesis examines two topics in labor economics and economic policy evaluation. Chapters 2 and 3 look at the effects of the minimum wage in developing countries. Chapter 4 considers how to properly test for the similarity of characteristics between treatment and control groups in stratified randomized control trials.

Minimum wage policies are often set at the national level. Additionally, it is common that the policy will set the same level for every worker in the population. A unified wage level applied to every worker complicates the task of evaluating such a policy, as there is no “natural” control group. Furthermore, developing countries are often characterized by a large informal sector in which workers are not forced to comply with labor market standards. This sector may behave endogenously, for example, growing in response to changes in the minimum wage. Chapters 2 and 3 of this thesis describe a Dual Economy model wherein workers can work in either the formal or the informal sector, which describes how such an economy may react to a minimum wage policy. I show that, even when the policy is applied to every worker in the population, it is still possible to recover the effects of this policy. Chapter 2 ends by providing estimates of the effects of the minimum wage in Brazil for the 2001–2009 period. Those results are obtained under a somewhat restrictive assumption regarding the relationship between sectors and wages.
To address this limitation, Chapter 3 further investigates the plausibility of this assumption by strengthening the assumptions regarding the distribution of wages and then relaxing the assumptions about the relationship between sectors and wages.

Chapter 4 presents an integrated method for estimating typical treatment effect parameters in a stratified randomized design with potentially corrupted randomization. We first demonstrate how to properly test for imbalance in covariates with a stratified experimental design. Although testing for imbalance in covariates is the classic method of checking for improper randomization, the literature has not provided sufficient guidance in that matter for stratified designs. We then show how the information on covariate imbalance can be used to estimate the average treatment effect (ATE) and the average treatment effect on the treated (ATT). We present some Monte Carlo exercises comparing our method with previous ones. Finally, we apply the method to evaluate the impact of a job training program where randomization was based on stratification.

Chapter 2
Estimating the Effects of the Minimum Wage in a Developing Country: A Density Discontinuity Design Approach

2.1 Introduction

Despite its widespread use, controversy persists regarding the economic impact of the minimum wage. A simple one-sector competitive market model predicts that a minimum wage will generate unemployment when the minimum exceeds the market-clearing wage. However, if the employer has market power, then a minimum wage can lead to an increase in wages and employment. In an economy with a large informal sector, where some employers do not comply with the minimum wage policy, the minimum wage might not generate unemployment effects even in the absence of market power on the part of the employer.
This will hold as long as the workers can freely migrate from one sector to the other and the informal sector is sufficiently large to accommodate such movements.

These conflicting theoretical predictions provide a strong motivation for empirical studies on the effects of minimum wage policies. In this paper, I develop a Dual-economy model based on Meyer and Wise (1983) to assess the impacts of the minimum wage on (a) unemployment, (b) average wages, (c) wage inequality, (d) sector mobility, (e) the size of the informal sector and (f) labor tax revenues. I model the joint distribution of wages and sectors (latent and observed), as opposed to the marginal distribution of wages, as in Meyer and Wise (1983). A model for the joint distribution of sector and wages allows me to infer the size of the formal sector that would prevail in the absence of the minimum wage and compute the proportion of workers who move to the informal sector in response to the policy. I provide the conditions for identifying the Dual-economy model parameters and the latent joint distribution of sector and wages, that is, the distribution that would prevail in the absence of the minimum wage policy. My identification strategy relies on the discontinuity in wage density at the minimum wage and the differences in the response to the minimum wage between the formal and informal sectors.

This paper’s contributions to the literature are the following: (i) I document key empirical facts concerning the relationship between formal and informal wage densities that have been overlooked in previous research, namely, the similarity between these densities conditional on values above the minimum wage. (ii) I provide a novel identification strategy that combines a non-parametric density discontinuity design with a parametric model for the conditional distribution of sector given the wage.
In particular, I show that under reasonable conditions, the parameters that describe the effects of the minimum wage and the underlying latent joint distribution of sector and wages are identified using only cross-sectional data. (iii) I estimate a sector mobility parameter, the probability that a worker in the formal sector moves to the informal sector in response to the minimum wage. (iv) I demonstrate how to test some of the assumptions invoked to identify the parameters of the model. (v) I show that these assumptions hold in the empirical application. (vi) I estimate the effect of the minimum wage on the joint distribution of sector and wages and estimate the effect of the minimum wage on labor tax revenues. To the best of my knowledge, this is the first paper that attempts to identify both the latent share of the formal sector and the effects of the minimum wage on labor tax revenues.

The model is estimated using the years 2001 to 2009 from “Pesquisa Nacional por Amostra de Domicílios” (PNAD), a dataset comprising repeated cross sections of an annual household survey that is representative of the Brazilian population. I find that the probability of a formal worker switching to the informal sector as a result of the policy is small, approximately 10%. The combined effect of unemployment and transitions to the informal sector generated by the introduction of the minimum wage leads to an 11% decrease in the size of the formal sector relative to the counterfactual state defined by the absence of the minimum wage. This associated growth in the size of the informal sector as a result of the policy is 46%, an effect attributable to the fact that the latent formal sector is four times larger than the informal sector of the economy. Unemployment effects of the minimum wage are, as expected, highly correlated with the real value of the minimum wage.
Moreover, the minimum wage strongly affects average wages (promoting an increase of approximately 16%), wage inequality (an approximately -19% effect on the standard deviation of log wages and a -24% impact on the Gini Index), and labor tax revenues (-10%).

2.2 Background

In Brazil, all workers are required to carry a government document called a “Carteira de Trabalho”, or worker’s card (see Figure 2.1). This document, introduced in 1932, serves as proof of the worker’s legal employment status. If a worker is formally employed in the Brazilian labor market, then her contract is signed by the employer on a page of the worker’s card. This labor contract implies that the worker’s employment is in compliance with labor taxes and labor regulations such as the minimum wage. Formal employment gives the worker access to benefits that include unemployment insurance and severance payments.

Figure 2.1: A Worker’s Card

Not all labor contracts are signed by the employer and included in the worker’s card. When an employer and a worker agree to a labor contract but decide not to formally sign it and include it in the worker’s card, the worker’s employment is called informal. Reasons for the existence of informal contracts include the evasion of labor regulations, such as the payment of labor taxes, compliance with the minimum wage, job safety standards, and restrictions on hours worked per week.[1] Note that this definition of informality is tightly related to compliance with the minimum wage, but the two are not equivalent. A worker with a wage below the minimum wage level is surely an informal worker. However, a worker whose wage is above the minimum wage may be formal or informal depending on whether his contract is signed by the employer.[2] The proportion of private sector workers between the ages of 18 and 60 who are employed in the formal sector is 0.73. In other words, more than one quarter of private sector workers do not have a signed contract included in their worker’s card.

[1] Firms face a trade-off between the costs of complying with the regulation and the probability/magnitude of punishment. The firms’ decision to hire formal versus informal workers was investigated in Almeida and Carneiro (2012), and Mattos and Ogura (2009).

[2] As we will discuss in greater detail below, approximately 20% of the workers whose wages are above the minimum are informal workers.

The minimum wage in Brazil has been set at the federal level since 1984. In theory, all jobs are covered, meaning that the (same) minimum wage level should apply to every worker. In practice, coverage only extends to workers with a contract written on the worker’s card (Lemos, 2009). A unified minimum wage set at the federal level with full coverage complicates the task of finding an appropriate control group. This is because cross-border differences-in-differences analysis, such as that in Card and Krueger (1994), is ruled out as a practical option, as the same level prevails in all states. Another feature of the minimum wage changes in Brazil is that since 2005, they have been linked to inflation and GDP growth, which poses further challenges to the use of time-series variation to estimate the effects of the minimum wage. Under these conditions, it is more difficult to disentangle the effects of the minimum wage from other sources of changes in the wage distribution that are linked to changes in economic activity.

Despite these challenges, it is nevertheless possible to identify the effect of the minimum wage using only cross-sectional data on sector and wages. This paper describes a set of a priori restrictions, on the joint distribution of sector and wage and on the effects of the policy, that makes this identification possible.
This research design is well suited to the Brazilian case because, as noted above, the country almost entirely lacks variation that can be used to identify the effects of the minimum wage.[3]

[3] The assumptions made to address the question of interest will not be in terms of agents’ preferences, technology or equilibrium mechanisms; rather, they will be in terms of the relationship between latent and observed variables. In this sense, the identification is semi-structural. It is structural in the sense that it relies on assumptions concerning the effects of the policy and semi-structural in the sense that those assumptions can be satisfied by a wide set of different fully specified structural models.

The points of departure for this paper are the works of Meyer and Wise (1983) and Doyle (2006). These papers show how to identify the effects of the minimum wage on the distribution of wages. I extend their model to a two-sector, or dual-economy, setting with sector mobility. This allows wages in both sectors to be affected, but in different ways, by the minimum wage. It also allows me to capture the effects of the minimum wage on the size of the formal sector and other related outcomes, such as labor tax revenues.

The dual-economy extension I develop presents new challenges for identification. This is because the techniques presented in Doyle (2006) are not sufficient on their own to recover the sector-specific parameters of the model in this general version. The reason is that applying Doyle’s strategy to the aggregate economy only recovers a weighted average of the parameters, which will be uninformative for most of the outcomes of interest. Applying his method to each sector separately is not feasible, as workers have moved from one sector to the other as a result of the policy. Thus, one of the contributions of this paper is demonstrating how to identify the effects of the minimum wage in this dual-economy setting.

In the next sections, I briefly describe the models of Meyer and Wise (1983) and Doyle (2006) to highlight the similarities and differences between their papers and the approach followed here.

2.3 Model

The effect of the minimum wage on a worker’s wage is the difference between her wage under the policy and the wage she would receive in its absence. The fundamental problem of causal evaluation is that this difference is conceptually well defined but never observed in the data. This is true because we can at most observe the wages for each worker in one of the two possible states of the world. However, it is nevertheless helpful to consider these objects. Thus, let worker $i$ be characterized by an observed wage $W_i(1)$ and a corresponding latent wage $W_i(0)$, which is defined as the wage that the worker would receive in the absence of the minimum wage. I will denote the minimum wage level by $m$. I will denote by $F_0(w)$ ($f_0(w)$) the CDF (pdf) of latent wages. Similarly, denote by $F(w)$ ($f(w)$) the CDF (pdf) of observed wages. To keep the model as simple as possible, assume that these workers come from a population with similar observable characteristics, and hence, we do not need to control for these characteristics. In the absence of the minimum wage, every worker $i$ in this population obtains a draw $W_i(0)$ from the distribution $F_0$, which I will refer to as the underlying wage distribution or the distribution of “market” wages. Although workers are intrinsically homogeneous ex ante, meaning that they draw their wages from the same distribution, they will have different wages ex post.

In the presence of the minimum wage policy, the worker will receive a draw $W_i(1)$ from the distribution $F$, which I will refer to as the distribution of observed wages. The most flexible way to model the effects of the minimum wage on the wage distribution is to assume that each worker can potentially be affected by the policy.
If we consider wages in terms of a discrete variable, the effects of the minimum wage on the distribution of wages can be completely characterized by a matrix of transitions that govern the probability that a worker at any point $w$ of the latent wage distribution will end up at any point $w'$ in the observed wage distribution. That is, a completely general (and, by construction, correctly specified) model for the effects of the minimum wage on the wage distribution is a transition matrix in which every entry is given by $\Pr[W(1) = w' \mid W(0) = w]$.

This transition matrix is highly informative regarding the effects of the minimum wage on the labor market. However, the generality of the model means that finding an adequate source of variation may be infeasible in the absence of exogenous policy variation. To make the problem of identifying the effects of the minimum wage tractable, I follow Meyer and Wise by imposing a set of a priori restrictions on the transition matrix.[4] As we will see, restricting the class of models will allow us to identify the effects of the policy without relying on exogenous policy variation or time-series data. I will now motivate and describe such restrictions.

It is natural to imagine that a minimum wage set below the lowest value of wages in the latent economy would generate a small effect, if any. Implicit in this is the principle that the minimum wage should primarily affect those workers whose wages would fall below the minimum wage level. The minimum wage should typically present stronger effects on wage distributions such that a larger proportion of workers have market wages below the minimum wage. This type of reasoning is key to the empirical work in Meyer and Wise (1983), Card (1992) and Lee (1999).

Thus, a natural starting point to analyze this problem is to presume that the effect of the policy is such that wages below the minimum wage increase up to the level at which they comply with the policy, that is, up to the point at which wages are equal to $m$.
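The transition-matrix view can be made concrete on a small wage grid. The sketch below is only an illustration of the starting-point scenario just described (every latent wage below $m$ is raised to $m$, and wages at or above $m$ are unchanged); the grid, the latent probabilities and the function name `censoring_transition_matrix` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def censoring_transition_matrix(grid, m):
    """Return T with T[i, j] = Pr[W(1) = grid[j] | W(0) = grid[i]].

    Illustrative scenario only: latent wages below m move to m with
    probability one, and all other wages stay where they are.
    """
    grid = np.asarray(grid, dtype=float)
    matches = np.where(grid == m)[0]
    if matches.size == 0:
        raise ValueError("the minimum wage m must be a point of the grid")
    idx_m = int(matches[0])
    T = np.zeros((grid.size, grid.size))
    for i, w in enumerate(grid):
        j = idx_m if w < m else i   # below m: jump to m; otherwise: stay put
        T[i, j] = 1.0
    return T

# Hypothetical five-point wage grid and latent wage distribution.
grid = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
latent_pmf = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

T = censoring_transition_matrix(grid, m=3.0)
observed_pmf = latent_pmf @ T   # distribution of observed wages W(1)
print(observed_pmf)
```

Each row of `T` is a conditional distribution, so it sums to one; richer scenarios (non-employment, non-compliance) would fill the rows below $m$ differently and would need an extra state for workers without a wage.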
Under this scenario and in the absence of spillover effects, it is simple to compute the implied effect of an exogenous change in the minimum wage.[5] If the sole reason for low wages in the economy is the lack of bargaining power on the part of workers, then presuming that all wages below the minimum wage will be raised to comply with the policy can be a reasonable guess for the policy effect. However, because employers are not forced to hire or retain workers, an upward shift in labor costs may force employers to reduce labor demand. As a result, some of the workers who would have been employed in the absence of the policy will end up without work as a consequence of it. In a simple competitive model, in which all wages are equal to the marginal product of labor, this event will have probability one for all workers for whom the market wage is below $m$. Finally, it is also realistic to entertain the possibility that some workers will not comply with the policy, potentially by moving to an uncovered sector. This uncovered sector could be one in which the policy does not apply or one in which it should, in principle, be applied but enforcement is, in fact, absent.

[4] For example, the absence of spillovers is a restriction that states that for all $w > m$, the matrix is diagonal, that is, $\Pr[W(1) = w \mid W(0) = w'] = 1$ if $w = w'$ and zero otherwise.

Under certain assumptions regarding the economy, we can ensure that only one of the events described above will prevail. In such cases, the identification problem becomes trivial and the counterfactual effects of exogenous changes in the minimum wage level are simple to compute. However, the restrictiveness of the assumptions needed for these events to prevail with such certainty also argues against the credibility of these types of estimates. In general, the effects of the minimum wage on the lower tail of the wage distribution should be interpreted as a combination of the probabilities of “truncation”, non-employment, and non-compliance.
Meyer and Wise provide a formal way of assessing the effects of the minimum wage by identifying the likelihood of those events given data on wages observed in the presence of a particular level of the minimum wage, $m$. In the next section, I will briefly describe how this can be achieved.

[5] Estimates of this type were used in Welch and Cunningham (1978) and Card (1992).

2.3.1 The Meyer and Wise Approach

This section describes the assumptions and estimation strategy used by Meyer and Wise (1983). Assume that the econometrician observes a random sample of observed wages $\{W_i(1)\}$ of size $N$ from a population of interest.[6] Let the following hold:

Assumption MW1. The latent wage is log-normally distributed. That is, $\log(W(0)) \sim N(\mu, \sigma^2)$.

Assumption MW2. There are no spillovers from the minimum wage. This means that $W(1) = W(0)$ when $W(0) > m$.

Assumption MW3. If $W(0) < m$, then with probability $\pi_m$, the worker receives the minimum wage ($W(1) = m$). With probability $\pi_d$, the worker’s wage is the same as the latent wage ($W(1) = W(0)$, non-compliance). With the complementary probability $\pi_u = 1 - \pi_m - \pi_d$, the worker becomes unemployed ($W(1) = \cdot$).[7]

[6] Note that this is a non-standard policy evaluation problem in which all individuals are treated ($W_i = W_i(1)$), meaning that the (same) minimum wage level holds for everyone in the population. The absence of a control group forces the use of a model to identify the effects of the policy.

The probabilities $(\pi_m, \pi_d, \pi_u)$ represent the likelihood of receiving the minimum wage, non-compliance and unemployment. These parameters arise so naturally in the context of the minimum wage that it is occasionally difficult to recognize how they restrict the model in any way. They seem to resemble a list of all possible outcomes. This is not the case, however.
The restrictions imposed by defining these probabilities are as follows: (i) $\Pr[W(1) > m \mid W(0) < m] = 0$, that is, no worker whose market wage is below the minimum wage will receive a wage greater than the minimum wage when the policy is introduced; (ii) the probabilities $\pi_d$, $\pi_m$ and $\pi_u$ are not a function of the worker’s latent wage, such as, for example, a function of how far the latent wage is from the minimum wage; and (iii) workers who do not comply with the policy retain the same wage, that is, $W_i(1) = W_i(0)$.[8][9]

The goal of the exercise is to recover the parameters of the underlying latent distribution of market wages $(\mu, \sigma)$ and the parameters $(\pi_d, \pi_m, \pi_u)$ that govern how the minimum wage affects the economy. The key contribution of Meyer and Wise is to show that those parameters are identified using data on observed wages ($W_i(1)$). Perhaps surprisingly, one need not have any variation in the policy to recover its effects. To see how this is achieved, define the log-likelihood of observing $W_i(1) = w$ as:

\[
\log L(W_i = w \mid \mu, \sigma, \pi) = 1\{w < m\} \log\frac{\pi_d f_0(w)}{c} + 1\{w = m\} \log\frac{\pi_m F_0(m)}{c} + 1\{w > m\} \log\frac{f_0(w)}{c},
\]

where $1\{A\}$ is the indicator function of the event $A$ and $c \equiv 1 - \pi_u F_0(m)$ is a rescaling factor that ensures that the observed density of wages integrates to one. The parameter $c$ can be interpreted as the ratio of employment after the introduction of the policy to employment before it. Meyer and Wise use maximum likelihood to estimate the parameters of the model. An intuitive way to think about the identification is to recognize that the model allows us to use the information on wages above the minimum wage level to predict the characteristics of the wage distribution in the absence of the policy.[10]

[7] Strictly speaking, the appropriate expression should be “non-employment”. I will refer to this effect as the “unemployment” effect of the minimum wage.
Throughout the paper I will use non-employment and unemployment interchangeably, given that the model cannot distinguish these effects.

[8] At first glance, restriction (i) may not appear problematic, as it is difficult to imagine why someone would comply with the policy by increasing a worker’s wage to a value greater than $m$. This is not impossible, however. An example of a model that is excluded by this assumption is that of Teulings (2000).

[9] Restriction (ii) can be relaxed in certain ways, for example, by making the probabilities $(\pi_d, \pi_m, \pi_u)$ low-order polynomials of the worker’s latent wage. Restriction (iii) can be relaxed without affecting the identification strategy by making the worker draw from the distribution of market wages conditional on values below the minimum wage. Changes to the average wages of those who do not comply with the policy can also be incorporated. However, this change requires some modifications in the identification strategy.

[10] A closer inspection of the likelihood function shows that Meyer and Wise’s approach nests “standard measurement”, truncation and censoring of the wages below the threshold $m$. If $\pi_u$ is equal to one, the likelihood function of the Meyer and Wise model is the same as the likelihood of a truncation model. If $\pi_m$ is equal to one, the likelihood of the model is the same as the likelihood of a Tobit model (censoring). If the probability of non-compliance $\pi_d$ is equal to one, the likelihood becomes the standard likelihood of a normal distribution, with no censoring or truncation. Moreover, the probabilities of “measurement”, truncation and censoring have a direct economic interpretation as different responses of the economy to the minimum wage policy.

2.3.2 Doyle’s Approach

A limitation of Meyer and Wise’s (1983) approach is that it relies on a parametric assumption concerning the latent wage distribution.[11] The contribution of Doyle (2006) is to show that this is not actually necessary for identification if one is willing to assume only continuity in the distribution of latent wages. The key idea behind this strategy is that the continuity of the distribution of latent wages implies that the ratio of the density of observed wages just below and just above the minimum wage identifies $\pi_d$, the likelihood of non-compliance with the policy. Given the similarity between this identification strategy and RD designs, Doyle (2006) termed it a density discontinuity design. In this section, I discuss the identification of the minimum wage effects under the model proposed by Doyle (2006). In the following discussion, I will maintain assumptions MW2 (no spillovers) and MW3 (minimum wage effects) from Meyer and Wise (1983). Again, assume that the econometrician only observes wages in the presence of the policy; that is, a random sample of size $N$ from the distribution of $\{W_i(1)\}$ is available.

[11] The sensitivity of the estimates with respect to the parametric assumptions was studied in Dickens, Machin and Manning (1994).

Assumption D1. The density of latent wages is continuous at $m$. That is, $\lim_{w \to m^+} f_0(w) = \lim_{w \to m^-} f_0(w)$.

As discussed in Doyle (2006), this assumption exploits the fact that the distribution of worker productivity is likely to be smooth, but the observed density of wages has a jump around the minimum wage. This jump can provide exactly the information necessary to trace back the effects of the policy on the outcomes of interest. Under assumptions MW2 and MW3, there is a relationship between the latent and observed distribution of wages. This relationship is given by:

\[
f(w) =
\begin{cases}
\dfrac{\pi_d f_0(w)}{c} & \text{if } w < m \\[1ex]
\dfrac{\pi_m F_0(m)}{c} & \text{if } w = m \\[1ex]
\dfrac{f_0(w)}{c} & \text{if } w > m,
\end{cases}
\tag{2.1}
\]

where $c = 1 - \pi_u F_0(m)$, as before. Figure 2.2 provides a graphical example of the relationship between the observed and the latent densities.

[Figure 2.2: Doyle’s Model]

Taking the ratio of the density of observed wages just below and above the minimum wage, that is, considering the lateral limits of the density at $m$, we have:

\[
\frac{\lim_{w \to m^-} f(w)}{\lim_{w \to m^+} f(w)} = \frac{\lim_{w \to m^-} \pi_d f_0(w)/c}{\lim_{w \to m^+} f_0(w)/c} = \pi_d,
\]

where the last equality is obtained using assumption D1. Figure 2.3 graphically depicts the mechanics of the identification of the non-compliance probability. To recover the remaining parameters, note that by integrating the density of observed wages up to the minimum wage, we have $\Pr[W(1) < m] = \pi_d F_0(m)/c$. Then, we have

\[
\frac{\Pr[W(1) = m]}{\Pr[W(1) < m]} = \frac{\pi_m}{\pi_d}.
\]

Because the left-hand side of this equation is identified from the data and the right-hand side is a function of only one unknown, this implies that $\pi_m$ is identified. This also implies that $\pi_u = 1 - \pi_d - \pi_m$ is identified. To recover the latent density of wages, one needs to recover $F_0(m)$. This is the case because the relationship between the observed and latent densities can be inverted once we know the rescaling factor $c$. To see this, note:

\[
f_0(w) =
\begin{cases}
\dfrac{f(w) \cdot c}{\pi_d} & \text{if } w < m \\[1ex]
f(w) \cdot c & \text{if } w \geq m.
\end{cases}
\tag{2.2}
\]

[Figure 2.3: Doyle’s Model: Identification]

One way to identify $F_0(m)$ is to use the fact that:

\[
F_0(m) = \frac{\Pr[W(1) < m]}{\Pr[W(1) < m] + \pi_d \Pr[W(1) > m]},
\]

which follows from:

\[
\frac{\Pr[W(1) < m]}{\Pr[W(1) < m] + \pi_d \Pr[W(1) > m]} = \frac{\pi_d F_0(m)/c}{\pi_d F_0(m)/c + \pi_d (1 - F_0(m))/c} = \frac{F_0(m)}{F_0(m) + 1 - F_0(m)} = F_0(m).
\]

This implies that the latent distribution of wages can be recovered under assumptions D1, MW2, and MW3. Note that the discontinuity in the observed distribution around the minimum wage identifies the probability of non-compliance with the policy $\pi_d$.[12] This in turn allows us to recover $\pi_m$, $F_0(m)$ and, consequently, the entire latent wage distribution.[13]
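The identification steps of this section can be traced numerically. The following is a minimal sketch rather than the author's code: it computes the population quantities implied by equation (2.1) for hypothetical parameter values, and then inverts them with the formulas above. The log-normal latent distribution is assumed here only to generate numbers (Doyle's argument needs continuity at $m$, not a parametric family), and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def observed_moments(mu, sigma, m, pi_d, pi_m):
    """Population quantities implied by (2.1) for a log-normal latent wage.

    The log-normal choice and all parameter values are assumptions made
    for illustration only.
    """
    pi_u = 1.0 - pi_d - pi_m
    z = (math.log(m) - mu) / sigma
    F0_m = normal_cdf(z)                                   # latent CDF at m
    f0_m = math.exp(-0.5 * z * z) / (m * sigma * math.sqrt(2.0 * math.pi))
    c = 1.0 - pi_u * F0_m                                  # rescaling factor
    return {
        "p_below": pi_d * F0_m / c,     # Pr[W(1) < m]
        "p_at": pi_m * F0_m / c,        # Pr[W(1) = m]
        "p_above": (1.0 - F0_m) / c,    # Pr[W(1) > m]
        "f_left": pi_d * f0_m / c,      # limit of f(w) as w approaches m from below
        "f_right": f0_m / c,            # limit of f(w) as w approaches m from above
        "F0_m": F0_m,
    }

def doyle_identify(p_below, p_at, p_above, f_left, f_right):
    """Recover (pi_d, pi_m, pi_u, F0(m), c) from observable quantities."""
    pi_d = f_left / f_right                      # density ratio at m
    pi_m = pi_d * p_at / p_below                 # from Pr[W=m]/Pr[W<m] = pi_m/pi_d
    pi_u = 1.0 - pi_d - pi_m
    F0_m = p_below / (p_below + pi_d * p_above)  # identity for F0(m)
    c = 1.0 - pi_u * F0_m
    return pi_d, pi_m, pi_u, F0_m, c

obs = observed_moments(mu=0.0, sigma=0.5, m=0.8, pi_d=0.3, pi_m=0.5)
pi_d, pi_m, pi_u, F0_m, c = doyle_identify(
    obs["p_below"], obs["p_at"], obs["p_above"], obs["f_left"], obs["f_right"]
)
print(pi_d, pi_m, pi_u, F0_m, c)
```

In an application the five inputs of `doyle_identify` would be replaced by sample analogues: the three sample shares and estimates of the one-sided density limits at $m$.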
2.3.3 Minimum Wage Effects in a Dual Economy

The Brazilian economy, similar to those of many other developing countries, is characterized by a large informal sector. In Brazil, an informal worker is defined as a worker whose worker’s card does not include a signed labor contract. Informality is thought to arise in developing countries as a result of restrictive and costly labor laws. Note that once the worker’s card is signed, the collection of labor taxes should follow and compliance with minimum wage and other labor standards has to be assured. A natural question that arises in this context is the following: What is the role of the minimum wage in an economy with such a large informal sector? Note that this is not a trivial question: A large fraction of contracts outside the “umbrella” of the labor laws may be a consequence of the minimum wage, meaning that many workers (intentionally or not) have moved to the informal sector as a consequence of the minimum wage policy. However, in principle, it could also be the case that the observed proportion of workers in the informal sector is completely unrelated to the level of the minimum wage. Informality may instead depend on labor taxes and other forms of labor regulation (hours worked, job safety standards and so forth) that have to be met regardless of where the worker is located in the wage distribution. These two explanations have markedly different policy implications but are in principle equally plausible explanations for the observed size of the informal sector. One of the goals of this paper is to assess the relative importance of these explanations.

[12] Each step after the identification of $\pi_d$ from the limit of the ratio of densities relies on the assumption that this probability is not a function of the wage. This feature contrasts with the parametric model of Meyer and Wise. By restricting the set of latent wage distributions, more flexibility can be introduced in the functional form of the relationship between the latent wage $W(0)$ and the model parameter $\pi_d$. This is the case because the shape of the latent wage distribution can be recovered in the parametric setting using the information above the minimum wage. This allows us to identify not only a probability of non-compliance but also a function $\pi_d(w)$ that maps wages to non-compliance probabilities. This function need not be constant with respect to latent wages.

[13] Doyle’s model can be identified under the assumption that $\pi_d$ is a low-order polynomial of the latent wage. However, in this case, identification can only be achieved by using derivatives of the wage density at the minimum wage level.

To do so, I generalize the models of Meyer and Wise (1983) and Doyle (2006) to the case of a dual economy. I model the joint distribution of wages and sectors (latent and observed), as opposed to the marginal distribution of wages. This allows me to infer the size of the formal sector that would prevail in the absence of the minimum wage and compute the proportion of workers who move to the informal sector in response to the policy.

Let worker $i$ be characterized by a pair of wage ($W_i(1)$) and sector ($S_i(1)$), where the sector indicator is equal to one if she is employed in the formal sector and zero otherwise. Compliance with minimum wage legislation is perfect in the formal sector but not in the informal sector. This effectively means that the workers in the formal sector are not allowed to have wages below the minimum wage once the policy is introduced. If they remain employed in the presence of the policy, they must either move to an informal contract or comply with the policy by receiving a wage equal to $m$. In addition, for each worker, define a pair $(W_i(0), S_i(0))$ that denotes the counterfactual (or latent) wage and sector in the absence of the minimum wage.
Finally, define $F_0(w)$ ($f_0(w)$) as the CDF (pdf) of $W(0)$ and $F(w)$ ($f(w)$) as the CDF (pdf) of observed wages ($W_i(1)$ or, in short notation, $W_i$). I will assume that the latent wage and sector distribution have the following characteristics:

Assumption 1 (Continuity). The density of latent wages and its first derivative are continuous at $m$. That is, $\lim_{w \to m^+} f_0(w) = \lim_{w \to m^-} f_0(w)$, and $\lim_{w \to m^+} f_0'(w) = \lim_{w \to m^-} f_0'(w)$.

Because this is a model of the joint distribution of sector and wages, we need to define another object, $\Pr[S(0) = 1 \mid W(0) = w]$:

Assumption 2 (Conditional probability of (latent) sector given the wage). The conditional distribution of latent sector given the latent wage belongs to a parametric family $\{\Lambda(w, \beta) : \beta \in B \subset \mathbb{R}^k\}$. That is, $\Pr[S(0) = 1 \mid W(0) = w] = \Lambda(w, \beta_0)$ for some $\beta_0 \in B$. Moreover, $\Pr[\Lambda(W(0), \beta_0) \neq \Lambda(W(0), \beta') \mid W(0) > m] > 0$ for all $\beta' \neq \beta_0$.

With the conditional probability of latent sector (given the wage) and the marginal distribution of latent wages, we have completely specified the joint distribution of these variables.[14] The restrictive part of this assumption is that the conditional probability of the latent sector given latent wages can be described by a parametric model. The first part of the above assumption states that there is a parameter $\beta_0$ for which the probability of the latent sector given the latent wage $w$ is exactly equal to $\Lambda(w, \beta_0)$. The second part of the assumption ensures that there is only one parameter for which this condition holds. Both assumptions are standard in models with binary outcomes. For concreteness, assume that the parametric model is a logit.[15][16] A parametric model is needed because, as will become clear in the identification section, this model induces censoring in the probabilities of working in the formal sector for wages below the minimum wage. This forces us to rely on extrapolation using values above the minimum to identify the share of formal workers for low wages. The need for extrapolation excludes methods based on a local approximation of the conditional mean function as an option.

[14] This joint distribution could come, for example, from a Roy-type model of sector choice, in which workers would choose the sector that yields the highest utility. Another model would be one in which workers are assigned to firms that, based on labor taxes and probability of punishment, decide whether they will employ formal or informal workers.

Assumption 3 (No spillovers). Workers whose latent wages would be above the minimum wage are not affected by the policy. That is, $W(1) = W(0)$ and $S(1) = S(0)$ when $W(0) > m$.

This assumption is potentially strong. In the non-parametric framework, this assumption is also untestable. However, it is straightforward to see that the model is still identified under any known and invertible spillover function. Moreover, bounds can be computed for the parameters when positive spillovers are assumed to exist and the researcher has no prior information on their size.[17] Furthermore, spillovers can also be identified and estimated if one is willing to assume a parametric family for the latent wage distribution.

To complete the model we need to define the minimum wage effects in the lower tail of the wage distribution. As discussed by Meyer and Wise, workers in sectors operating in competitive markets whose wages would be below the minimum might become unemployed as a result of the minimum wage. If there is some bargaining involved in the wage determination, or if the employers hold market power, some workers will “cluster” at the minimum as a result of the policy. Finally, because compliance with the minimum is imperfect in some markets, workers might migrate from the formal to the informal sector to avoid unemployment. In terms of the model, this leads to the following assumption:

Assumption 4 (Minimum wage effects).
For wages below the minimum wage ($W(0) < m$), we have the following: If $S(0) = 0$, then $S(1) = S(0)$. Moreover, with probability $\pi_d^{(0)}$, the wage continues to be observed ($W(1) = W(0)$). With the complementary probability $\pi_m^{(0)} = 1 - \pi_d^{(0)}$, the worker earns the minimum wage ($W(1) = m$).[18] If $S(0) = 1$, then with probability $\pi_d^{(1)}$, the wage continues to be observed ($W(1) = W(0)$), meaning that the worker successfully transits from the formal sector to the informal sector.[19] In this case, the observed sector will be $S(1) = 0$, being different from the latent sector. With probability $\pi_m^{(1)}$, the worker earns the minimum wage ($W(1) = m$, $S(1) = 1$). With the complementary probability ($\pi_u^{(1)} = 1 - \pi_d^{(1)} - \pi_m^{(1)}$), the worker becomes unemployed ($W(1) = \cdot$, $S(1) = \cdot$).[20]

[15] The logistic functional form is assumed only for clarity in the exposition. All identification results are preserved if the logistic functional form is replaced by another parametric form, such as a probit.

[16] Importantly, one can make the model flexible by adding higher-order polynomials of wages (squares and cubes) as regressors in the logit to better adjust the curve. As long as the degree $k$ of the polynomial is fixed with respect to the sample size, that is, the model remains parametric, the identification results will hold.

[17] See Appendix A.4 for further discussion of this issue.

[18] The reason for allowing $\pi_m^{(0)}$ to be greater than zero, that is, to allow workers in the informal sector to cluster at the minimum wage, is for the model to account for the empirical fact that they seem to do so. The informal sector wage distribution presents a spike similar to the formal sector distribution at the minimum wage. The economic logic behind this regularity is under debate. One hypothesis is that the minimum wage acts as a signal to the agents of a fair price for unskilled labor, which might affect the way workers in the informal sector bargain with their employers. This feature seems to be related to the “self-enforcing” nature of the minimum wage.

2.3.4 Discussion

Our goal is to recover the unknown parameters $\pi \equiv (\pi_d^{(1)}, \pi_m^{(1)}, \pi_u^{(1)}, \pi_d^{(0)}, \pi_m^{(0)})'$ and the joint distribution of latent sector and wages, that is, the joint density that would prevail in the absence of the minimum wage. By comparing this distribution with the observed distribution, we can evaluate the impact of the minimum wage on expected wages, wage inequality, employment and other labor market outcomes. By defining the latent sector and the sector-specific parameters, a broader range of implications of the minimum wage becomes assessable, such as changes in tax revenues and movements between sectors. In Sections 2.3.5 and 2.7.3, I will discuss in detail how the minimum wage affects these outcomes.

The assumptions used in this model are similar to the assumptions used previously in this literature. I maintain all assumptions from Doyle (or Meyer and Wise, if one prefers a parametric specification for latent wages) and generalize their approach to address sector-specific responses. Assumption 4, the assumption that defines the sector-specific effects of the minimum wage, implies the assumptions used by Meyer and Wise (1983) and Doyle (2006) concerning the marginal distribution of wages. That is, the marginal distribution of wages (which is obtained after integrating out the sector-specific wage distributions) will resemble the density of wages that appears in Meyer and Wise (1983) and Doyle (2006).

Despite these similarities, there are numerous advantages of using a model for the joint distribution of sector and wages. This is especially true for developing countries, where the informal sector plays an important role in the economy. This model accommodates a variety of responses of the economy to the minimum wage policy. The model allows for the standard unemployment effect.
The model allows the minimum wage to have a “supporting” effect on the lower tail of the wage distribution in such a way that the policy can affect average wages and wage inequality. The model allows wages in the informal sector to be affected by the introduction of the minimum wage, an effect captured by the parameter $\pi_m^{(0)}$. This model also allows workers to move to the informal sector as a response to the minimum wage, an event captured by the parameter $\pi_d^{(1)}$. Combined, these unemployment and sector mobility effects allow the minimum wage to affect the relative size of the formal sector in the economy, which in turn can affect labor tax revenues.

[19] The assumption that the wage remains exactly the same when the worker moves to the informal sector, that is, $W(1) = W(0)$, substantially simplifies the exposition. The same results hold when this assumption is replaced with one in which the worker draws a new wage from $f_0(w \mid S(0) = 1, W(0) < m)$.

[20] To ease the exposition, I have assumed that $\pi_m^{(1)}$ and $\pi_u^{(1)}$ do not vary as a function of the latent wage. In the case in which they vary with the latent wage, the parameter recovered by assuming that they are constants is the expectation of $\pi_m^{(1)}$ and $\pi_u^{(1)}$ over the distribution of wages below the minimum. This result holds only as long as $\pi_d^{(1)}$ remains constant as a function of the wage.

A two-sector model helps to interpret the parameters identified in the previous work of Meyer and Wise (1983) and Doyle (2006). Meyer and Wise discuss the possible reasons that one would observe a non-zero density of wages below the minimum, such as uncovered jobs and non-compliance in covered sectors. Ultimately, however, Meyer and Wise’s model identifies the aggregate likelihood of non-compliance ($\pi_d$).
This parameter is the proportion of workers who, following the introduction of the minimum wage, do not ultimately respond to the policy. An application of the law of iterated expectations shows that the parameter estimated in their model is a weighted average of the sector-specific parameters, with weights given by the latent shares of the sectors in the economy. The parameter $\pi_d$ does not identify whether workers earn sub-minimum wages because they would already be working in non-covered sectors regardless of the policy or because they moved there as a response to the policy. These two different stories are implied by different values of the sector-specific parameters. However, they can imply the exact same value for $\pi_d$. Moreover, any combination of the two is equally compatible with the data when one estimates only the aggregate or “unconditional” parameter $\pi_d$. Thus, the sector mobility parameter $\pi_d^{(1)}$ and the latent size of the uncovered sector $\Pr[S(0) = 0]$ are more economically meaningful than the aggregate parameters.

2.3.5 Model Analysis

In this section, I show that this model can capture a wide range of potential effects of the minimum wage policy. To do so, I discuss the model’s implications for some objects of interest, such as the sector-specific wage densities and the conditional probability of formality given the wage. Given assumptions 2 to 4 above, there is a relationship between the latent and observed unconditional wage distributions. It is given by:

\[
f(w) =
\begin{cases}
\dfrac{\pi_d(w) f_0(w)}{c} & \text{if } w < m \\[1ex]
\dfrac{\int^{m} \pi_m(w) f_0(w)\, dw}{c} & \text{if } w = m \\[1ex]
\dfrac{f_0(w)}{c} & \text{if } w > m,
\end{cases}
\tag{2.3}
\]

where $c \equiv 1 - \int^{m} \pi_u^{(1)} \Lambda(w) f_0(w)\, dw$ is a rescaling factor that ensures that the observed density integrates to one. This parameter can be interpreted as the ratio of employment in the presence of the policy to that in the absence of the policy.
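The role of the rescaling factor $c$ in equation (2.3) can be checked numerically. The sketch below is an illustration under stated assumptions, not the thesis's code: it uses a log-normal latent density, a logit $\Lambda(w)$ with hypothetical coefficients `b0` and `b1`, and the aggregate probabilities built as sector-weighted mixtures, $\pi_d(w) = \Lambda(w)\pi_d^{(1)} + (1-\Lambda(w))\pi_d^{(0)}$ and $\pi_m(w) = \Lambda(w)\pi_m^{(1)} + (1-\Lambda(w))\pi_m^{(0)}$. It verifies that the mass below $m$, the spike at $m$ and the mass above $m$ add up to one.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def lognormal_pdf(w, mu, sigma):
    z = (np.log(w) - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (w * sigma * np.sqrt(2.0 * np.pi))

def logit(w, b0, b1):
    """Hypothetical latent probability of formality given the wage."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * w)))

def observed_masses(m, mu, sigma, b0, b1, pi1_d, pi1_m, pi0_d, n_grid=20001):
    """Masses of the observed wage distribution implied by (2.3).

    pi1_* are formal-sector parameters and pi0_d is the informal-sector
    non-compliance probability; all values are illustrative assumptions.
    """
    pi1_u = 1.0 - pi1_d - pi1_m          # formal-sector unemployment probability
    pi0_m = 1.0 - pi0_d                  # informal-sector clustering probability
    w = np.linspace(1e-8, m, n_grid)     # grid over latent wages below m
    f0 = lognormal_pdf(w, mu, sigma)
    lam = logit(w, b0, b1)
    pi_d_w = lam * pi1_d + (1.0 - lam) * pi0_d
    pi_m_w = lam * pi1_m + (1.0 - lam) * pi0_m
    F0_m = trapezoid(f0, w)                       # numerical latent CDF at m
    c = 1.0 - trapezoid(pi1_u * lam * f0, w)      # rescaling factor
    below = trapezoid(pi_d_w * f0, w) / c         # Pr[W(1) < m]
    at_m = trapezoid(pi_m_w * f0, w) / c          # Pr[W(1) = m]
    above = (1.0 - F0_m) / c                      # Pr[W(1) > m]
    return below, at_m, above, c

below, at_m, above, c = observed_masses(
    m=0.8, mu=0.0, sigma=0.5, b0=-1.0, b1=2.0,
    pi1_d=0.3, pi1_m=0.4, pi0_d=0.8,
)
print(below, at_m, above, c, below + at_m + above)
```

Setting the formal-sector unemployment probability to zero (for example `pi1_d=0.5, pi1_m=0.5`) gives `c = 1`, which matches the interpretation of $c$ as an employment ratio.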
Regarding the relationship between the sector-specific parameters and the aggregate ones, we have:

\[
\begin{aligned}
\pi_d(w) &\equiv \Lambda(w)\pi_d^{(1)} + (1 - \Lambda(w))\pi_d^{(0)} \\
\pi_m(w) &\equiv \Lambda(w)\pi_m^{(1)} + (1 - \Lambda(w))\pi_m^{(0)} \\
\pi_u(w) &\equiv \Lambda(w)\pi_u^{(1)}.
\end{aligned}
\]

[Figure 2.4: Dual-economy Model: Latent and Observed Densities]

The parameters $\pi_d(w)$, $\pi_m(w)$ and $\pi_u(w)$ are weighted averages of the sector-specific parameters with weights given by the relative shares of each sector in the latent distribution. They describe the unconditional probability of non-compliance, “clustering” at the minimum wage level and unemployment at a given value of the wage. These are the parameters estimated in the previous approach employed by Meyer and Wise (1983) and Doyle (2006).[21] Examining the sector-specific wage density, one can see that for the formal sector, we have:

\[
f(w \mid S(1) = 1) =
\begin{cases}
0 & \text{if } w < m \\[1ex]
\dfrac{\pi_m^{(1)} F_0(m \mid S(0) = 1)}{c^{(1)}} & \text{if } w = m \\[1ex]
\dfrac{f_0(w \mid S(0) = 1)}{c^{(1)}} & \text{if } w > m.
\end{cases}
\tag{2.4}
\]

For the informal sector, we have:

\[
f(w \mid S(1) = 0) =
\begin{cases}
\dfrac{\pi_d(w) f_0(w \mid S(0) = 0)}{(1 - \Lambda(w))\, c^{(0)}} & \text{if } w < m \\[1ex]
\dfrac{\pi_m^{(0)} F_0(m \mid S(0) = 0)}{c^{(0)}} & \text{if } w = m \\[1ex]
\dfrac{f_0(w \mid S(0) = 0)}{c^{(0)}} & \text{if } w > m,
\end{cases}
\tag{2.5}
\]

where $c^{(1)} \equiv 1 - F_0(m \mid S(0) = 1)(1 - \pi_m^{(1)})$ and $c^{(0)} \equiv 1 + \pi_d^{(1)} \int^{m} \frac{\Lambda(w)}{1 - \Lambda(w)} f_0(w \mid S(0) = 0)\, dw$ ensure that both densities integrate to one. They have the interpretation of the ratio of employment observed in the sector to that in the absence of the policy.

[21] Note that, here, they are allowed to be functions of $w$ as long as the latent sizes of the sectors differ across wages and the model parameters differ across sectors.

[Figure 2.5: Dual-economy Model: Latent and Observed Densities]

Figures 2.2, 2.4 and 2.5 display the relationship between the latent and observed densities for the aggregate wage distribution, for the formal sector, and for the informal sector, respectively. The dual-economy model preserves the same relationship between the latent and observed unconditional wage densities as in Meyer and Wise’s model. However, the dual-economy model presents heterogeneity in the responses to the minimum wage across sectors.
The formal sector wage density below the minimum wage vanishes, whereas in the informal sector, the density grows according to the inflow of workers from the formal sector. As a result, the density in the informal sector below the minimum wage can, for some values of the model parameters, present a discontinuity at the minimum wage with the “inverse” shape relative to that observed in the aggregate wage distribution.

Regarding the conditional probability of working in the formal sector as a function of the wage, we have:

\[
\Pr[S(1) = 1 \mid W(1) = w] =
\begin{cases}
0 & \text{if } w < m \\[1ex]
\dfrac{\pi_m^{(1)} \int^{m} \Lambda(w) f_0(w)\, dw}{\int^{m} \pi_m(w) f_0(w)\, dw} & \text{if } w = m \\[1ex]
\Lambda(w) & \text{if } w > m.
\end{cases}
\]

[Figure 2.6: Dual-economy Model: Latent and Observed Conditional Probabilities]

Figure 2.6 graphically displays the relationship between the latent and the observed probabilities of formality with respect to the wage. The model offers a sharp prediction concerning the effect of the minimum wage on the conditional probability of the sector given the wage. It states that for values above the minimum wage, this probability is equal to the latent probability ($\Pr[S(0) = 1 \mid W(0) > m] = \Pr[S(1) = 1 \mid W(1) > m]$). It states that the probability of working in the formal sector given the wage will be zero for values below the minimum wage. At the minimum wage level, it should be a particular constant, $\pi_m^{(1)} \int^{m} \Lambda(w) f_0(w)\, dw \big/ \int^{m} \pi_m(w) f_0(w)\, dw$, which is likely different from this function’s left and right limits. This result follows from the fact that workers are not able to maintain wages below $m$ in the formal sector and the assumption of the absence of spillovers above the minimum wage level.

It is helpful to understand the implications of the model using limiting cases for the parameter values. For example, if $\pi_d^{(1)}$ tends to zero, there is no mobility between sectors.
In this case, the magnitude of the unemployment effect will be given by $(1 - \pi_m^{(1)}) F_0(m \mid S(0) = 1) \Pr[S(0) = 1]$, which means that unemployment will be higher when workers’ probability of clustering at the minimum wage is lower, when the mass of workers for whom the minimum wage “bites” is larger, and when the size of the formal sector is larger. At the other extreme, when $\pi_d^{(1)}$ tends to one, all affected workers in the formal sector manage to find a job in the informal sector, which implies no unemployment effects from the minimum wage. The effects of the minimum wage on average wages are maximized in the case where $\pi_m^{(1)}$ and $\pi_m^{(0)}$ are equal to one. In terms of market structures that could generate these values, $\pi_d^{(1)}$ tends to one if the economy can be described by a simple two-sector model with imperfect compliance with the minimum wage and costless sector mobility. The parameter $\pi_m^{(1)}$ tends to be higher if the economy primarily consists of employers with monopsonistic power in the labor market, and $\pi_u^{(1)}$ tends to be higher if the labor market operates close to perfect competition and mobility to the informal sector is limited.

2.4 Identification

It is not possible to directly use the techniques developed in Doyle (2006) in each sector separately, as I have introduced movements between them. To identify the model, a different approach must be used. Below, I state the main identification results of this paper, which concern the identification of (a) the latent joint distribution of sector and wages, that is, the distribution that would prevail in the absence of the minimum wage; (b) the vector of parameters $\pi$ that governs how the minimum wage affects the economy; (c) the effects of the minimum wage on functionals of the distribution of sector and wages; and (d) the effects of the minimum wage on labor tax revenues. In what follows, assume that the econometrician observes a random sample of the pair $\{(W_i(1), S_i(1))\}$ of size $N$ from a population of interest.
I also assume the follow-ing easily verifiable technical conditions: the minimum wage m is set at a point with non-zerodensity, that is, f0(m)> 0, and Λ′(m;β ) 6= 0.Lemma 2.4.1 (Identification of sector-specific parameters). Under Assumptions 1, 2, 3, and 4,pi is identified. Proof: See Appendix 1.Lemma 2.4.2 (Identification of latent distributions). Under Assumptions 1, 2, 3 and 4, the latentjoint distribution of sector and wages is identified. Proof: See Appendix 1.Corollary 2.4.3 (Identification of the minimum wage treatment effects). Under Assumptions1, 2, 3 and 4, the effects of the minimum wage on functionals of the joint distribution of sectorand wages are identified. Examples of such functionals are the effects of the minimum wage onaverage wages, on the standard deviation of wages, on quantiles of the wage distribution, on thesize of the formal and informal sectors and on the average wages conditional on sectors.Corollary 2.4.4 (Identification of the minimum wage effects on labor tax revenues). UnderAssumptions 1, 2, 3, and 4 and assuming no tax revenues from the informal sector, the effectsof the minimum wage on labor tax revenues are identified. Identification of the effects of theminimum wage on labor tax revenues holds as long as the effects can be written as a functionalof the latent and observed wage distributions and the model parameters pi .2222See Section 2.7.3 for further discussion of this issue.20The key points that permit the identification are as follows: The shape of the relationshipbetween sector and wages for values above the minimum wage is preserved in the presence ofthe policy. 
This allows us to obtain estimates of the latent share of the formal sector for values below the minimum wage level by extrapolating the curve we observe in the upper part of the wage distribution.[23] The identification of the latent wage density builds on the approach in Doyle (2006) in the sense that the probability of non-compliance with the policy is identified using the ratio of the density of wages above and below the minimum wage level. To complete the identification, the sector-specific parameters are identified using the derivative of the wage density.[24] This is feasible because the relationships obtained from the derivative of the density equate the number of equations and the number of unknowns, thereby allowing me to obtain a closed-form solution for the structural sector-specific parameters.

2.5 Estimation

In this section, I discuss how to estimate the model parameters and latent distributions using non-parametric kernel methods. In Chapter 3, I show how maximum likelihood estimation can be performed once a parametric family is assumed for the distribution of latent wages. The non-parametric estimation strategy used here relies on local linear density estimators. The discussion here will closely follow that of McCrary (2008) in the context of testing for manipulation of the running variable in RD designs.

2.5.1 Non-Parametric Estimation

As in Doyle (2006), the model can also be estimated without assuming that the latent wage distribution belongs to a known parametric family. A crucial step in obtaining non-parametric estimates of the objects of interest, such as the model parameters and the counterfactual distributions, involves the estimation of a ratio of one-sided limits of the density at the minimum wage. The estimation of these quantities can be performed using non-parametric methods.
Note that because the density is discontinuous around the minimum wage, only observations below the minimum are informative for $\lim_{w \to m^-} f(w)$ (and similarly for the density above the minimum). This implies that the estimators of these quantities will behave as if the minimum wage were a boundary point of the density, which has implications in terms of bias and variance.

[23] Note here the importance of the parametric assumption. Given the model, the relationship between latent sector and wages can only be observed for values above $m$. If this function is specified non-parametrically, the latent share of formal workers for values below $m$ would essentially be unidentified. However, by relying on the parametric functional form, I can extrapolate the relationship observed above $m$ to predict the latent share of workers that would prevail below the minimum wage in the absence of the policy. This is achieved by estimating the parameters of the function $\Lambda(w)$ using wages above $m$ and then using the estimated parameters for the prediction for all wages, both above and below $m$.

[24] If latent sector and wages are independent, one need not resort to the derivative of the wage density at $m$. In this case, identification of the sector-specific parameters can be achieved by examining the distribution of wages given the sector. See Appendix 2 for a detailed discussion of this issue.

Therefore, it is advisable to use methods ensuring that the performance of the density estimator is satisfactory at points that are close to the support boundaries. I use local linear density estimators, which have the same order of bias at the boundary as at interior points of the distribution. This estimator builds on the idea of local linear conditional mean estimators. It begins by dividing the support of the density into a set of bins. Then, a “response variable” is defined as the bin counts of these disjoint intervals. After this process, one is left with a vector containing the “independent variable,” the bin centers, and a corresponding “dependent variable,” the bin counts. Finally, standard local polynomial smoothing estimates are applied to these constructed variables.[25]

In Appendix 3, I formally describe how to perform the density estimation non-parametrically. For the remaining terms that need to be estimated, I will use the plug-in approach and replace the unknown objects in the identification section with their consistent estimators. Thus:

$$\hat d(m) = \frac{\hat f(m^-)}{\hat f(m^+)},$$

where $\hat f(m^-)$ is the estimator of the density just below the minimum wage value using the local linear density estimator. In addition, for the estimator of $\pi_d'(m)$, we can define:

$$\hat d'(m) = \left( \frac{\hat f'(m^-)}{\hat f'(m^+)} - \frac{\hat f(m^-)}{\hat f(m^+)} \right) \cdot \frac{\hat f'(m^+)}{\hat f(m^+)}.$$

To complete the process of recovering the structural parameters $\pi$, one requires estimates of $\Lambda(m)$ and $\Lambda'(m)$. These objects are the latent share of the formal sector and the change in it at $m$. For that purpose, one needs to estimate $\beta$. Given Assumptions 2 and 3, the estimator can be defined as:

$$\hat\beta \equiv \arg\min_{\beta} \sum_{i=1}^{N} \left( S_i - \Lambda(W_i;\beta) \right)^2 \mathbb{1}\{W_i > m\}.$$

Then, given the estimate of $\beta$, we can plug it into the function $\Lambda(\cdot)$ to obtain estimates of $\Lambda(m)$ and $\Lambda'(m)$. They will be given by $\hat\Lambda(m) = \Lambda(m;\hat\beta)$ and $\hat\Lambda'(m) = \Lambda'(m;\hat\beta)$.
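The binning and smoothing steps, and the plug-in ratio $\hat d(m)$, can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis (Appendix 3 gives the formal description): the function names, the triangular kernel, the bin width, the bandwidth and the simulated wages are all assumptions made for the example. In the toy data, workers with latent wages below $m$ keep their job with probability `keep_prob`, so the ratio of the one-sided density limits should be close to that value.

```python
import math
import random


def one_sided_density(wages, m, side, bandwidth, bin_width):
    """Local linear density estimate at the boundary point m.

    side=-1 estimates the left limit f(m-), side=+1 the right limit f(m+).
    Observations exactly at m (the spike) are left out of the bin counts
    but still count toward the sample size n.
    """
    n = len(wages)
    n_bins = int(math.ceil(bandwidth / bin_width))
    counts = [0] * n_bins
    for w in wages:
        if w == m:
            continue
        k = math.floor((w - m) / bin_width)  # k < 0 below m, k >= 0 above m
        j = -k - 1 if side < 0 else k        # bin position counted away from m
        if (k < 0) == (side < 0) and 0 <= j < n_bins:
            counts[j] += 1

    # Weighted least squares of bin heights on (bin centre - m);
    # the fitted intercept is the density estimate at m.
    s0 = s1 = s2 = t0 = t1 = 0.0
    sign = -1.0 if side < 0 else 1.0
    for j, count in enumerate(counts):
        x = sign * (j + 0.5) * bin_width            # bin centre minus m
        y = count / (n * bin_width)                 # bin height on density scale
        k_w = max(0.0, 1.0 - abs(x) / bandwidth)    # triangular kernel weight
        s0 += k_w
        s1 += k_w * x
        s2 += k_w * x * x
        t0 += k_w * y
        t1 += k_w * x * y
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def simulate_wages(n_latent, m, keep_prob, seed):
    """Toy data: latent wages uniform on [0, 2m]; below m a worker stays
    employed at the latent wage with probability keep_prob (hypothetical)."""
    rng = random.Random(seed)
    wages = []
    for _ in range(n_latent):
        w = rng.uniform(0.0, 2.0 * m)
        if w >= m or rng.random() < keep_prob:
            wages.append(w)
    return wages


wages = simulate_wages(50000, 1.0, 0.3, seed=0)
f_minus = one_sided_density(wages, 1.0, -1, bandwidth=0.5, bin_width=0.02)
f_plus = one_sided_density(wages, 1.0, +1, bandwidth=0.5, bin_width=0.02)
d_hat = f_minus / f_plus  # plug-in estimate of the density ratio at m
print(round(d_hat, 3))
```

Only bins on one side of $m$ enter each fit, which is what makes the minimum wage behave like a boundary point for the estimator.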
Using the estimate $\hat\Lambda(m)$ of the latent share of the formal sector, we can define the plug-in estimator for the parameters $\pi_d^{(1)}$ and $\pi_d^{(0)}$:

$$\hat\pi_d^{(0)} = \hat d(m) - \frac{\hat\Lambda(m)}{\hat\Lambda'(m)} \cdot \hat\pi_d'(m)$$

$$\hat\pi_d^{(1)} = \left[ \hat d(m) - (1-\hat\Lambda(m)) \cdot \hat\pi_d^{(0)} \right] \cdot \hat\Lambda(m)^{-1}.$$

[25] See McCrary (2008) for a detailed discussion of this issue.

To complete the estimation, we first need to estimate $c$ before we recover the latent wage density:

$$\hat c = \left[ \int^{m} \frac{\hat f(u)}{\hat d(u)}\,du + 1 - \hat F(m) \right]^{-1}.$$

Then, the estimates of the latent wage distribution can be defined as:

$$\hat f_0(w) = \begin{cases} \dfrac{\hat f(w)\,\hat c}{\hat d(w)} & \text{if } w < m \\ \hat f(w)\,\hat c & \text{if } w \geq m. \end{cases}$$

The consistency of $\hat\pi$, $\hat\beta$ and, consequently, $\hat\Lambda(w)$ and $\hat f_0(w)$ follows directly from the identification equations and the consistency of $\hat f(w)$ and $\hat f'(w)$. Closed-form expressions for the asymptotic variances can be derived. However, I will rely on resampling methods to estimate them in the empirical application.

2.6 Testing

This research design allows us to perform partial tests of the validity of the model's assumptions. In this section, I describe how these tests can be performed and their limitations.

Assumption 1, the continuity of the latent wage distribution, can be verified by visual inspection of the histogram and the kernel density estimates using different values for the bandwidth. Formally, this condition can be tested by performing a placebo test, that is, by checking whether there are differences between the left and right limits of the density estimates at wage points other than the minimum wage.

Assumption 2 can be tested by comparing the fit of the parametric model with non-parametric smoothing estimates. If $\Lambda(w;\beta)$ is correctly specified, for the true value of the parameters $\beta_0$, we have:

$$\int_{m}^{\infty} \left( \Pr[S(0)=1 \mid W(0)=u] - \Lambda(u;\beta_0) \right)^2 f_0(u)\,du = 0.$$

While this equation is in terms of latent variables, we can restate it using observables by relying on Assumption 3. Thus, we have:

$$I \equiv \int_{m}^{\infty} \left( \Pr[S(1)=1 \mid W(1)=u] - \Lambda(u;\beta_0) \right)^2 \mathbb{1}\{u > m\}\, f(u)\,du = 0,$$

where $\beta_0 \equiv \arg\min_{\beta} E\left[ (S_i(1) - \Lambda(W_i(1);\beta))^2\, \mathbb{1}\{W_i(1) > m\} \right]$. This condition is in terms of quantities we can observe. Correctness of the specification of the model for $\Lambda(W(0),\beta)$ implies that $I = 0$. This is an integrated mean squared error type of condition that can be used for specification testing (see Pagan and Ullah (1999)). The idea behind it is to compare the fit of a parametric model with the fit of a non-parametric model. This type of comparison can be used to identify the proper functional form for the sector-wage relationship. This is relevant because part of the identification relies on extrapolating this conditional mean function to values below the minimum wage.[26] It should be noted, however, that this is, at best, a partial test of the assumption. There are some deviations from the null that this test does not have the power to detect. To make this point clear, observe the following condition:

$$\int_{-\infty}^{\infty} \left( \Pr[S(0)=1 \mid W(0)=u] - \Lambda(u;\beta_0) \right)^2 f_0(u)\,du = 0.$$

This condition is equivalent to the correctness of the specification of the parametric model for the conditional probability of the latent sector given the wage. The crucial difference between this condition and that used in the test above is that it can detect when the model is incorrectly specified for values below the minimum wage. Unfortunately, it is not possible to create a feasible version of this condition, as once we move from latent to observed wages, all information on the conditional probability of latent sector given the wages is lost for values below the minimum. In sum, it is conceivable that the parametric functional form holds for values above the minimum wage but fails to hold for values below it. This part of the assumption remains untestable.

It is also possible to test Assumption 4. In Assumption 4, the probabilities that capture the effects of the minimum wage are defined.
A restriction imposed by that assumption is that the probabilities of non-compliance ($\pi_d^{(1)}$ and $\pi_d^{(0)}$) are invariant across workers with different latent wages in the same sector.[27] This is a restrictive assumption, as workers whose latent wage is close to the minimum wage level could be more likely to comply with the policy than workers whose latent wage is far from the minimum. To see why this assumption is testable, one must first examine the second derivative of the observed wage density:

$$f''(w) = \begin{cases} \dfrac{\pi_d''(w) f_0(w)}{c} + \dfrac{2\pi_d'(w) f_0'(w)}{c} + \dfrac{\pi_d(w) f_0''(w)}{c} & \text{if } w < m \\ \dfrac{f_0''(w)}{c} & \text{if } w > m. \end{cases} \qquad (2.6)$$

If the continuity assumption on the latent wage distribution is strengthened up to the second derivative, that is, if $\lim_{w \to m^+} f_0''(w) = \lim_{w \to m^-} f_0''(w)$, then we have:

$$\lim_{\varepsilon \to 0^+} \left( c f''(m+\varepsilon) - \frac{c f''(m-\varepsilon) - \pi_d''(m) f_0(m) - 2\pi_d'(m) f_0'(m)}{\pi_d(m)} \right) = 0.$$

Intuitively, we can test Assumption 4 because by examining the second derivative, we have added another equation while the number of parameters remained the same. This provides us with the overidentification condition that allows us to test the model.[28][29][30]

[26] There are also parametric versions of these tests. For example, testing Assumption 2 in a parametric setting can be achieved by increasing the order of the polynomial of the wage and testing the restriction that the higher-order terms are equal to zero. In the simplest case in which one has a linear logit of the sector given the wage, the correctness of the specification can be tested by estimating a model in which the square of the wage is added as a regressor and assessing whether the coefficient associated with the squared term is different from zero.

[27] One can see that in aggregate, the likelihood of non-compliance $\pi_d(w)$ will be a function of latent wages due to changes in the composition of each sector as we move along different wages.

2.7 Empirical Application: The Effect of the Minimum Wage in Brazil

For my empirical application, I consider a stronger version of Assumption 2:

Assumption 5. 
Independence: $\Pr[S(0)=1 \mid W(0)=w] = \Lambda \quad \forall\, w.$

This assumption implies that latent sector and wages are independent. Figure 2.7 displays the relationship between the latent and observed conditional probabilities of formality with respect to the wages under this assumption. This assumption is testable. Below, I provide evidence that it is not violated in the context of the Brazilian labor market.

Independence greatly simplifies the identification and estimation, as can be seen in Appendix 2. Independence (and the absence of spillovers) allows me to identify the latent share of the formal sector by examining the observed share of the formal sector for wages that are above the minimum wage level. Moreover, it implies that the aggregate minimum wage probabilities ($\pi_d(w)$, $\pi_m(w)$, $\pi_u(w)$) do not vary across wages even if the parameters differ across sectors. This is because the latent share of each sector becomes constant with respect to wages. This allows me to identify the model parameters by simply examining the discontinuity in the aggregate wage distribution at $m$ and the sector-specific wage distributions, so I will not need to rely on estimating the first derivative of the wage distribution at the boundary point, $m$.

[28] This is easier to see in the case in which one assumes a linear probability model for $\Pr[S(0)=1 \mid W(0)]$. In this scenario, it is possible to find a closed-form solution for the model parameters using either the first or the second derivative of the wage density. These different ways of identifying the parameters must yield the same result if the model is correctly specified. However, they do not coincide if the model is misspecified, that is, when the probabilities of non-compliance are functions of latent wages.

[29] If one is willing to impose further smoothness conditions on the latent wage distribution, it is possible to identify the model by imposing flexible conditions on the relationship between the parameters and the wages.
For example, if one believes that ($\pi_d^{(1)}$, $\pi_d^{(0)}$) is appropriately described by a quadratic (cubic) function, then one needs to go up to the third (fourth) derivative of the wage density to estimate the model parameters.

[30] This condition is easier to test in the parametric version of the model. To do so, one simply needs to estimate a version of the model in which the probabilities of non-compliance $\pi_d^{(1)}$ and $\pi_d^{(0)}$ are allowed to be a low-order polynomial of the latent wages and compute a likelihood ratio test that uses the baseline version of the model as a comparison. A rejection of the null indicates that the more general version is a better description of the economy, that is, the probabilities of non-compliance are indeed functions of latent wages.

Figure 2.7: Dual Economy Model: Latent and Observed Conditional Probabilities under Independence

Table 2.1: Descriptive Statistics

In Appendix 2, I describe how to identify the model under this condition. The estimation strategy I use follows the same method as in the general form of the model. That is, I estimate the density of wages at the boundary using local linear density estimators and use a plug-in method for the remaining objects. Namely, once I estimate the lateral limits of the density of wages at $m$, I complete the estimation by replacing the objects in the identifying equations with their respective sample counterparts. In the next sections, I describe the data and discuss the results obtained when estimating this model for the Brazilian labor market.

2.7.1 Data and Descriptive Statistics

To evaluate the effects of the minimum wage on labor market outcomes, I used data for the period from 2001 to 2009 from the PNAD dataset.
These data have been collected by the IBGE (a Portuguese acronym for “Brazilian Institute of Geography and Statistics”) since 1967 and contain information on income, education, labor force participation, migration, health and other socioeconomic characteristics of the Brazilian population. Workers who do not report wages, those who work in the public sector and workers who are older than 60 years of age or younger than 18 years of age were removed from the sample. The PNAD dataset includes information on the worker's labor contract status, which was used to define formality.

The variable of interest, the wage, is measured at the monthly level, which is the most natural unit in the Brazilian institutional context. A feature of the Brazilian labor market is that wages are typically specified at the monthly level, the same unit of measure as the minimum wage. The labor contract also establishes the number of hours of work per day (typically 6 or 8 hours).[31] I will treat the wage reported in the survey as the contracted wage, so no adjustment for hours needs to be performed. As a result, wages below the minimum wage are not, in principle, a result of a “division bias”.

[31] At the end of the month the worker will receive a payment “pro rata” based on the actual number of days he or she worked. This payment will present some small variation across months due to reasons such as holidays, absences, overtime pay and the like.

The empirical strategy will also assume that the wage is measured without error. This is unquestionably a strong assumption. The observed wage distribution presents heaping at round numbers. I will show that the estimates of the parameters of the model are fairly robust to the presence of heaping by using different values of the bandwidth in the density estimation.

As mentioned above, all workers in Brazil carry an official document called “Carteira de Trabalho” (worker's card). This document is signed by the employer in the formal act of hiring. The lack of a formal signed labor contract means that the employer is not forced to collect labor taxes or to comply with the minimum wage and other types of regulation. The Brazilian economy is known to be characterized by a large informal sector. Tables 2.1, 2.2 and 2.3 illustrate this fact and describe the main features of the data.[32]

[32] All estimates are computed using survey weights.

Figure 2.8 displays the empirical CDFs of the formal and informal sectors. A few interesting facts can be noted: The empirical cumulative distribution of wages seems to have a spike at the minimum wage level in both sectors, and virtually no worker in the formal sector receives wages below $m$. The same pattern appears in Figure 2.9, where I display the estimates of the wage density of the formal and informal sectors. Thus, informality is closely related to sub-minimum wages. However, these concepts are not equivalent, as a sizable fraction of informal workers earn wages above the minimum wage level.

Table 2.2 shows that workers in the informal sector earn on average approximately 35% less than workers in the formal sector. In addition, in terms of the observable characteristics, workers in the informal sector are more likely to be male, nonwhite, less educated and young. Considering the likelihood of earning minimum and sub-minimum wages, Table 2.3 shows the heterogeneity of these probabilities across population subgroups. For example, white workers are 40% less likely to earn the minimum wage than are nonwhite workers. Workers with less than 5 years of education have an approximately 20% likelihood of earning the minimum wage, whereas the corresponding likelihood is only 5% for workers with more than 12 years of education. Regarding the geographic variation, workers in the South Region have a 6% probability of earning the minimum wage, whereas workers in the Northeast have an approximately 24% probability of earning the minimum wage. A similar heterogeneity pattern appears when we consider the probability of earning sub-minimum wages.

Table 2.4 shows that formality presents considerable heterogeneity across observable characteristics. It shows that the probability of formality is close to zero for workers with wages below the minimum wage, as predicted by the dual-economy model. It also shows that the probability of working in the formal sector is lower for low-education groups, for nonwhite workers, and in the North and Northeast regions.

Table 2.2: Descriptive Statistics by Sector

Table 2.3: Minimum and Sub-minimum Wage Conditional Probabilities

Table 2.4: Formality and Informality Conditional Probabilities

Figure 2.8: Empirical CDFs

Figure 2.9: Wage Densities
Note: Local linear density estimates using Silverman's rule of thumb bandwidth.

Figure 2.10: Nominal Wages and Minimum Wage Evolution

Figure 2.11: Real Wages and Minimum Wage Evolution

The history of the minimum wage in Brazil began during the Getulio Vargas government, on May 1st, 1940. Initially, the minimum wage varied across regions to accommodate differences in price levels across the country. Subsequently, in 1984, regional minimum wages were unified into a single wage at the national level. The Constitution of 1988 prohibited the use of the minimum wage as a reference for wage bargaining for other categories of workers and contracts. The aim of this prohibition was to reduce the over-indexation of the economy, which was thought to be fueling inflation. The periodicity of changes in the minimum wage has been annual since the economy stabilized in 1994 (Lemos, 2009).
Figures 2.10 and 2.11 depict the evolution of average wages, the minimum wage, and different quantiles of the wage distribution over the last decade.[33]

Regarding Figures 2.10 and 2.11, the challenge of relying on time-series variation to identify the effects of the minimum wage becomes clear, as there is nearly as much evidence in favor of the minimum wage affecting the 20th percentile as there is of it affecting the 80th percentile.[34] The correlation between minimum wage changes and changes in such high percentiles of the wage distribution is likely a reflection of the pro-cyclical nature of changes to the minimum wage. Given this, effects of the minimum wage on other objects such as average wages or lower quantiles that are based on time-series variation should also be interpreted with caution.

[33] Real wages displayed in Figure 2.11 were computed using the IPCA, a Portuguese acronym for “Nation-wide consumer price index”. IPCA is the consumer price index used by the Central Bank in its inflation target system.

[34] A similar point was made by Lee (2008) when analyzing U.S. data.

2.7.2 Main Results

In this section, I will discuss the results obtained after estimating the model for the Brazilian labor market. The model is estimated (separately) for the years 2001 to 2009. As discussed in the estimation section, all objects in the model can be estimated by replacing the population object with its sample analog. The only exception to this is the density of wages at the boundary. To estimate this object, I use a local linear kernel estimator with a normal kernel and a bandwidth equal to eight times Silverman's rule of thumb, which has been shown to be mean squared error optimal in Monte Carlo simulations.[35] In the robustness section, I show that the estimates are not sensitive to this choice by using McCrary's automatic bandwidth selection rule.[36]

Figure 2.12 shows a plot of the observed density of wages and its latent counterpart.
We can see that, as a consequence of sizable unemployment effects, the observed density above the minimum wage is higher than the latent density. Due to both truncation at the minimum and unemployment, the observed density below the minimum wage is smaller than the latent density. The estimates of the model parameters used to construct this latent density are shown in Table 2.5. Note also that the latent wage distribution peaks at a point close to the minimum wage. This feature is probably explained by the minimum wage being set at a point closer to the median wage in Brazil than in developed countries.

In examining the point estimates and standard errors in Table 2.5, we see sizable estimates of the unemployment effects of the minimum wage. This result is comparable to the estimates of $\pi_u$ obtained in other applications of this approach. Doyle, for example, found that approximately 60% of young workers who would earn below the minimum became unemployed. A possible explanation for this regularity is that reduced-form and panel data approaches, such as the work of Card and Krueger (1994), estimate the effects of marginal changes in the minimum wage, whereas I estimate the effect of the minimum wage relative to a counterfactual scenario defined by its absence.[37] High unemployment probabilities can generate a small marginal effect of the minimum wage, depending on the size of the density around the minimum wage and the magnitude of the change in the minimum wage.[38]

[35] A close inspection of the identification equations shows that the standard error of the structural parameters will tend to be very small, since the parameter estimates are constructed from a combination of simple sample proportions (and the ratio of the density at the boundary). The variance of the sample proportion (whether it is the proportion of workers that earn the minimum wage or the proportion of formal workers above the minimum wage) will be bounded by $0.25N^{-1}$. Thus, the small standard errors reflect the high precision of the estimates of these probabilities in large samples. These standard errors are correct only if all assumptions of the model are valid. Thus, it is important to evaluate the robustness of the results to deviations from the model's baseline assumptions.

[36] Note that the optimal bandwidth to estimate the ratio of the left and right limits of the density at the boundary can be considerably larger than the optimal bandwidth to estimate the left or the right limit of the density.

[37] In Chapter 3 I discuss the predictions of this Dual-economy model for marginal changes in the minimum wage.

Figure 2.12: Kernel Density Estimates (Unconditional Wage Distribution)

Table 2.5: Model Parameter Estimates by Year

The evidence from Table 2.5 also suggests that sector mobility is limited. The estimates of the sector-mobility parameter ($\pi_d^{(1)}$) are approximately 10%, with a maximum of 22%. I discuss the implications of this result in greater detail in Section 2.7.3.

As seen in Figures 2.10 and 2.11, the period of 2001 to 2009 is characterized by an increase in the nominal and real value of the minimum wage. We should expect the estimates of the mass of affected workers, $F_0(m)$, to reflect this feature of the data. In Table 2.5, we observe a close relationship between the minimum wage level and estimates of $F_0(m)$. The correlation coefficient with the nominal value of the minimum wage is approximately 0.90.[39]

The estimate of $\pi_m^{(0)}$ is larger than the estimate of $\pi_m^{(1)}$. This is explained by two features of the data: First, the minimum wage “bite”, that is, the proportion of workers that earn the minimum wage, is similar across the two sectors.
However, it is slightly larger in the informal sector (see Table 2.3). If there were no movements between sectors and no unemployment, the difference in the estimates of $\pi_m^{(1)}$ and $\pi_m^{(0)}$ would be proportional to this difference in the minimum wage bite across these sectors. However, the observed formal sector distribution presents a rescaling factor that increases the density at every point, due to the movements of workers away from this sector. The informal sector presents a rescaling factor that reduces the density at every point, due to the movements of workers into that sector. Thus, even if the observed bites were exactly the same, the model would rationalize this fact with a larger estimate of $\pi_m$ for the informal than for the formal sector. Thus, the combination of different rescaling factors and a larger bite for the informal sector indicates that $\pi_m$ must be substantially larger for the informal sector than it is for the formal sector.

[38] In Chapter 3 I estimate the effects of marginal changes in the minimum wage based on the parameter estimates of the Dual-Economy model.

[39] The results also suggest a high correlation over time between unemployment estimates ($\pi_u$) and the minimum wage level. However, a causal interpretation of this relationship is difficult to establish.

Table 2.6 shows how the minimum wage affects the shape of the (log-) wage distribution. Here, I compute the effects of the minimum wage on the usual measures of wage inequality, such as the standard deviation of log wages and the Gini coefficient. The estimates show that the minimum wage has a positive impact on average wages (conditional on employment). The maximum difference is 0.39 log points in 2007, and the minimum is 0.18 in 2002. The minimum wage also reduces wage inequality, as measured by differences in quantiles, the standard deviation, or the Gini coefficient. These estimates indicate the trade-off faced by policy makers when choosing the minimum wage level.
On the one hand, there is a gain in terms of reducing wage inequality and increasing average wages. On the other hand, workers tend to have more difficulty finding jobs.

The Brazilian economy is characterized by considerable geographic variation in the size of the formal sector, as shown in Table 2.4. The size of the formal sector in the Southeast region is approximately 0.77, whereas in the Northeast region the size of the formal sector is approximately 0.66. Table 2.7 shows the model parameter estimates separately for the Southeast and the Northeast region. In the Northeast region the minimum wage “bites” at a much higher point in the wage distribution when compared to the Southeast. The latent size of the formal sector in the Southeast is 0.80. In the Northeast region, the latent size is 0.76. These regions also differ in their responses to the (same) minimum wage policy. In the Southeast we observe a high probability of unemployment (0.65). We also observe a low estimate of the sector mobility parameter (0.04). In the Northeast, we observe a lower probability of unemployment (0.33), a higher probability of non-compliance (0.33), and a higher probability of moving to the informal sector (0.26).[40]

[40] The region where the latent size of the formal sector is higher also presented a higher likelihood of sector mobility. This result may also suggest that formal and informal sectors operate in most cases in distinct labor markets, in the sense that they are located in different geographic regions or different industries. This could be one explanation for the small estimates of the likelihood of sector mobility found in the aggregate economy.

Regarding the informal sector parameters, $\pi_d^{(0)}$ and $\pi_m^{(0)}$, I do not reject the null hypothesis that the coefficients are the same across regions. This suggests that the differences we observe in the joint distribution of sector and wages across these regions come from differences in their latent distributions and differences in the formal sector's response to the minimum wage. A decomposition exercise based on the estimates from Table 2.7 shows that approximately 65% of the differences in the observed size of the formal sector between the Northeast and the Southeast are a result of the minimum wage. The remaining 35% of the differences in the size of the formal sector across these regions are due to other economic factors that cause the Southeast to have a larger formal sector beyond their differences in the minimum wage effects. This exercise indicates that the minimum wage affects a substantially larger proportion of workers in the Northeast economy, thereby inducing a larger inflow of workers to the informal sector in that region.

Table 2.6: Distributional Effects of the Minimum Wage

Table 2.7: The Geographic Heterogeneity of Minimum Wage Effects

2.7.3 Tax Revenues and the Size of the Informal Sector

A comparison of Tables 2.1 and 2.5 shows that the minimum wage reduces the share of the formal sector in the economy. This occurs through two different but related channels: First, the minimum wage reduces the size of the formal sector as long as the unemployment effects are greater than zero, as has been found in Brazil. Second, the minimum wage increases the size of the informal sector through sector movements that are driven by the policy itself. These latter effects were shown to be relatively small in this application. Overall, the share of the formal sector in the Brazilian economy is reduced by approximately 10% as a result of the minimum wage policy.[41]

For this reason, the minimum wage indirectly affects the government budget.[42] The minimum wage affects the shape of the wage distribution, the relative size of the formal sector and the likelihood of employment.
Each of these effects has the potential to alter tax revenues.

The goal of this section is to derive an estimate of these effects. I consider the effects on revenues from the INSS tax, which is the Brazilian labor tax. The INSS is collected to fund the social insurance system in Brazil, and the rate is 20% for companies included in the regular system of taxation and 12% for small companies that opt for the “simplified” system. To estimate the effects, I will rely on the following assumption:

Assumption 6. No Tax Revenues in the Informal Sector

Let $T(1)$ represent the tax revenues in the formal sector under the imposition of a minimum wage and $T(0)$ in its absence.[43] By definition, we have:

$$T(1) \equiv \sum_{i=1}^{N} \tau(W_i(1))\, W_i(1)\, S_i(1)$$

$$T(0) \equiv \sum_{i=1}^{N} \tau(W_i(0))\, W_i(0)\, S_i(0).$$

The object of interest is the ratio between these two quantities. After some algebra, we have:

$$R \equiv \frac{T(1)}{T(0)} = \frac{\Pr[S(1)=1]}{\Pr[S(0)=1]} \cdot c \cdot \frac{E[\tau(W(1))\, W(1) \mid S(1)=1]}{E[\tau(W(0))\, W(0) \mid S(0)=1]}.$$

This expression is further simplified in the Brazilian case, where labor taxes are a constant fraction of wages. In this case, $R$ is given by:

$$R \equiv \frac{T(1)}{T(0)} = \frac{\Pr[S(1)=1]}{\Pr[S(0)=1]} \cdot c \cdot \frac{E[W(1) \mid S(1)=1]}{E[W(0) \mid S(0)=1]}. \tag{2.7}$$

[41] My estimates imply that the mass of workers at and below the minimum wage level is inconsistent with absence of disemployment effects under smooth non-compliance probabilities and a continuous latent distribution of wages. The “missing” mass of workers at or below the minimum wage level is attributed in the model to unemployment effects of the policy. Similarly, high sector-mobility probabilities ($\pi_d^{(1)}$) are inconsistent with my estimate of the latent share of the formal sector and the density of low wages in the informal sector. That is, we do not observe enough small wages in the informal sector to justify larger sector mobility parameter estimates.

[42] Here, I use the term “indirectly” because the minimum wage affects the government’s budget through the spending channel.
This is due to the indexation of pensions to the minimum wage.

[43] Note that I abuse notation here and use $N$ to refer to the size of the population, not the size of the sample.

Table 2.8: Minimum Wage Effects on Labor Tax Revenues

Thus, the effects on tax revenues can be decomposed into three components: compression of the formal sector, reduction in the size of the workforce through unemployment effects, and change in expected wages in the formal sector.[44] This equation shows that the tax effect of the minimum wage will depend on the relative magnitude of these effects.[45]

I compute the tax effects of the minimum wage using a plug-in approach for the components of Equation 2.7 based on the model parameter estimates from Table 2.5. Table 2.8 displays the estimated effects. The minimum wage policy seems to generate sizable unemployment effects and to reduce the size of the formal sector. These effects are large enough to more than offset the increase in expected wages. Therefore, the minimum wage reduces the mass of wages in the formal sector, with a corresponding decline in labor tax revenues. The estimated declines range from 2% in 2001 to 15% in 2009.[46]

2.8 Testing the Underlying Assumptions and Robustness Checks

This research design allows me to indirectly test some of the model assumptions. First, I will indirectly test Assumption 5, the independence between latent sector and wage. This assumption is testable in different ways. One way to test it is to consider the proportion of workers in each sector as a function of wages. If the assumption holds, this proportion should not vary

[44] The expression for $R$, the effect of the minimum wage on labor tax revenues, relies exclusively on Assumption 6. It does not rely on the particular assumptions I used for the dual-economy model.

[45] Note that the parameter $R$ also answers a related question: Is the mass of wages, the sum of the wages of all workers in the formal sector, higher under the minimum wage or in its absence?
Because the tax rate $\tau$ is a constant function of the wages, the effects on tax revenues are proportional to the effects on the mass of wages.

[46] As a sensitivity test, fixing all other parameters, the unemployment effect of the minimum wage in 2009 needs to be 39% smaller than my estimates for the minimum wage to have no effect on labor taxes. Similarly, the minimum wage effect on average wages needs to be underestimated by at least 15% for the minimum wage to have no impact on labor tax revenues. This evidence suggests that the model needs to be severely misspecified for the estimates of the direction of the effect to be wrong.

Figure 2.13: Formality vs. Wages
Note: Conditional probability estimates based on a local-constant estimator using an Epanechnikov kernel and the standard “rule of thumb” bandwidth.

with the wage for wage values that are above the minimum.[47] A naive regression of formality on wages should mechanically detect a negative relationship because no worker in the formal sector can earn below the minimum wage. However, after restricting our attention to wage values well above the minimum, the relationship should disappear. Another related way to test the assumption is to examine the estimated wage densities, restricting the analysis to wage values above the minimum. If the model is correct, differences in wage densities for values above the minimum between sectors will only be due to rescaling and movements between sectors. Thus, by conditioning on values above the minimum, rescaling and sector movements should have no effect, and the densities should be approximately the same.

[47] See Figure 2.7.

Figure 2.14: Formality vs. Log-wages
Note: Conditional probability estimates based on a local-constant estimator using an Epanechnikov kernel. Bandwidth: 0.03.

Figures 2.13, 2.14, 2.15, and 2.16 provide visual evidence of the accuracy of this assumption within the Brazilian context.
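The local-constant estimator named in the notes to Figures 2.13 and 2.14 can be sketched as follows. This is only a minimal illustration, not the code used for the figures: the function names, the toy data and the bandwidth value are hypothetical, and the estimator is written as a plain kernel-weighted average of the formality indicator.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_constant(x0, wages, formal, bandwidth):
    """Kernel-weighted share of formal workers among wages close to x0."""
    num = 0.0
    den = 0.0
    for w, s in zip(wages, formal):
        k = epanechnikov((w - x0) / bandwidth)
        num += k * s
        den += k
    # No observation within one bandwidth of x0: the estimate is undefined.
    return num / den if den > 0.0 else math.nan

# Hypothetical toy data: log-wages above the minimum and formality indicators.
toy_wages = [6.0, 6.2, 6.4, 6.6, 6.8, 7.0]
toy_formal = [1, 1, 0, 1, 1, 0]
curve = [local_constant(x, toy_wages, toy_formal, bandwidth=0.5)
         for x in (6.2, 6.5, 6.8)]
```

Under independence between latent sector and wage, a curve of this kind evaluated on wages above the minimum should be roughly flat.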
Above the minimum wage level, the proportion of workers in the formal sector of the economy does not seem to vary systematically with the wage. Figure 2.14 shows that this is also true when we inspect the relationship between formality and log-wages.[48] This evidence supports the assumption that the underlying latent density of wages should be the same between sectors. The kernel density plots in Figure 2.15 and the empirical CDF estimates in Figure 2.16 across formal and informal sectors point in the same direction: Workers in the formal and informal sectors apparently draw from similar distributions for wages above the minimum wage. This suggests that the differences between the overall distributions of wages occur as a result of the different ways in which the sectors respond to the minimum wage. Note, however, that the assumption required for identification is that the entire wage distribution be the same across sectors. As discussed in the testing section, the presence of the minimum wage prevents me from testing this condition for values below $m$. Thus, it is still possible that the latent wage distributions are indeed equal conditional on wages above the minimum wage, while this is not the case for values below it. This last part of the identifying assumption is untestable. The evidence that the wage distributions are similar for values above the minimum wage seems to indicate that they may also be so for values below $m$ in the absence of the policy. However, this conclusion is subject to debate.

Table 2.9 shows the estimates of the elasticity of formality with respect to the wage based on a linear probability model, using different restrictions on the sample. The relationship between sector distribution and wages becomes substantially weaker after one conditions the regression to only consider wages above the minimum.
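The sample-restriction exercise can be illustrated with a bare-bones linear probability model. The sketch below is an assumption-laden toy, not the specification behind Table 2.9: it computes only the OLS slope of the formality indicator on the (log) wage, the function name `lpm_slope` and its `floor` argument are hypothetical, and the data are invented so that formality is unrelated to the wage above the minimum.

```python
def lpm_slope(wages, formal, floor=None):
    """OLS slope of formality on wage, optionally keeping only wages > floor."""
    pairs = [(w, s) for w, s in zip(wages, formal) if floor is None or w > floor]
    n = len(pairs)
    mean_w = sum(w for w, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    cov = sum((w - mean_w) * (s - mean_s) for w, s in pairs)
    var = sum((w - mean_w) ** 2 for w, _ in pairs)
    return cov / var

# Hypothetical toy data with the minimum wage at 5.3 (in logs):
# nobody below the minimum is formal; above it, formality is unrelated to the wage.
toy_wages = [5.0, 5.1, 5.5, 5.7, 5.9, 6.1]
toy_formal = [0, 0, 1, 0, 0, 1]

slope_full = lpm_slope(toy_wages, toy_formal)              # full sample
slope_above = lpm_slope(toy_wages, toy_formal, floor=5.3)  # wages above the minimum only
```

In this toy sample the full-sample slope is driven entirely by the absence of formal workers below the minimum, and it vanishes once the sample is restricted to wages above it.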
When the sample is restricted to higher wage values, several of the estimated coefficients are not statistically different from zero.

These results allow us to reinterpret the observed differences in workers’ demographic characteristics across sectors, as shown in Table 2.2. The higher proportion of nonwhite and less educated workers (and other such characteristics) in the informal sector seems not to be due to structural differences between sectors beyond the way in which they respond to the minimum wage. It seems to be mainly a consequence of the fact that these workers have a higher probability of having a latent wage lower than the minimum, which makes it more likely for a worker in the informal sector to have these characteristics. This can be observed by considering the differences in observable characteristics of the workers between sectors while conditioning on values above the minimum wage. Table 2.10 shows a sizable decrease in several estimates of the differences in worker characteristics across sectors after conditioning on wages above the minimum.

[48] Under Assumption 5 there should be no relationship between formality and any function of the wage above the minimum wage level.

Figure 2.15: Density of Wages by Sector above the Minimum Wage
Note: Kernel density estimates with boundary correction at the minimum wage. Bandwidth 1.8 times Silverman’s “rule of thumb”.

Figure 2.16: Empirical CDF by Sector above the Minimum Wage

Table 2.9: Formality vs. Wages - Linear Regression Estimates

Table 2.10: Descriptive Statistics by Sector: The Role of the Minimum Wage

Another maintained assumption of the model is that the latent wage density is continuous around the minimum. If the wage density is continuous, then our estimates should not reveal any effect for values other than the minimum wage. Table 2.11 reports the estimates of $\pi_d$ for several values, all of which are different from the actual value of the minimum wage in the respective year.
If the continuity assumption holds, the estimate of $\pi_d$ should not be statistically different from one.

As expected, the estimates fluctuate around one. However, we reject the null of no gap for most years. Discontinuities in the latent wage density could arise, for example, from “heaping” at round numbers. I discuss in Appendix A.4 the consequences of estimating the model incorrectly assuming continuity for the latent wage distribution.[49],[50]

[49] I show that the estimate of the latent size of the formal sector does not rely on the continuity assumption.

[50] The probabilities of non-compliance and “clustering” at the minimum wage will be underestimated if the latent wage distribution is discontinuous at the minimum wage level. The ratio between the true structural parameters and the (probability limit of the) estimators will be given by the magnitude of the discontinuity in the latent density at the minimum wage level. For example, adjusting the estimates for a discontinuity of 0.926 in the latent density

Table 2.11: Placebo Tests: Discontinuity Estimates using Minimum Wage Values of Other Years

As a robustness check, I investigate the sensitivity of my estimates to the choice of bandwidth and the presence of spillovers. A key parameter of the model, $\pi_d$, is identified by the ratio of the wage density above and below the minimum wage. In the baseline specification, the estimation was performed using local linear density estimators with the bandwidth equal to eight times Silverman’s rule of thumb. Table 2.12 shows the parameter estimates when $\pi_d$ is estimated using the automatic bandwidth selection procedure proposed by McCrary (2008). Comparing the results displayed in Table 2.5 and Table 2.12, we note that the point estimates are different. The qualitative implications, however, remain similar.

In Appendix A.4 I discuss the identification of the effect of the minimum wage on the size of the formal sector under the presence of (limited) spillovers.
Identification of the latent size of the formal sector can be achieved by assuming that spillovers vanish at a point higher up in the wage distribution. My spillover-robust estimates of the impact of the minimum wage on the size of the formal sector are approximately 14%. Thus, these estimates are higher than the baseline estimates from Table 2.5 that are obtained under the assumption of absence of spillovers. This suggests that the -10% effect from the baseline estimate underestimates the true effect of the minimum wage on the size of the formal sector if Assumption 3 is violated.

[50, continued] increases the estimate of $\pi_d$ for the year 2001 from 0.202 to 0.218. Similarly, $\pi_m$ increases from 0.256 to 0.276. Thus, the qualitative implications of the estimates presented here will remain valid if the latent wage distribution presents moderate-size discontinuities.

Table 2.12: Robustness - McCrary’s Density Discontinuity Estimator

2.9 Conclusion

This paper develops a dual-economy model to analyze the effects of the minimum wage in a country with a large informal sector. I discuss the conditions under which the effects of the policy are identified using only cross-sectional data on wages and sector (defined by formality status) when the same level of the policy is applied to all workers. I show that the discontinuity of the wage density at the minimum wage level identifies the probability of non-compliance with the policy, and the latent relationship between sector and wages can be recovered using data on wages and sector above the minimum wage. I then show that the latent joint distribution of sector and wages can be identified based solely on data on sector and wages.
This result allows me to estimate the impact of the policy on a broad range of labor market outcomes such as expected wages, unemployment, wage inequality, the size of the formal sector and labor tax revenues.

The main results are that the minimum wage significantly alters the shape of the lower part of the wage distribution and thereby reduces wage inequality. My estimates show that expected wages increase by approximately 16% and the Gini coefficient decreases by approximately 24%. However, the minimum wage policy generates sizable unemployment effects and a reduction in the size of the formal sector of the economy. My estimates imply a decrease of approximately 10% in the size of the formal sector. This result is due to both unemployment effects on the formal sector and movements of workers from the formal sector to the informal sector as a consequence of the policy. My estimates also indicate that the latent size of the formal sector is approximately four times larger than the informal sector. Thus, small movements from the formal to the informal sector still induce a sizable change in the relative size of the informal sector. My estimates show that the minimum wage increases the size of the informal sector by approximately 46%. Together, these effects imply a reduction of approximately 10% in the tax revenues collected by the government to support the social welfare system.

The research design based on the sharp contrast in the effects of the minimum wage between workers on each side of the minimum wage value allows for indirect tests of the underlying identification assumptions of the model. The graphical and statistical evidence supports the maintained assumptions. The robustness checks performed produced similar results to those of the baseline estimator.

There are, however, several limitations of this strategy. A fully structural model of workers’ and firms’ behavior is not specified.
Thus, this approach does not recover deep parameters of the economy such as the elasticity of labor demand.[51] An extended version of the dual-economy model presented in this paper that fully incorporates optimizing behavior on the workers’ side, such as a Roy model of sector choice, is the object of ongoing research.

It could also be enlightening for future work to further investigate the heterogeneity of the impacts of the minimum wage across population sub-groups. This extension is the subject of ongoing research.

[51] As long as the underlying structure of the economy implies the assumptions of the dual-economy model used in this paper, the estimates of the effects of the minimum wage should be similar.

Chapter 3

Measuring the Effects of the Minimum Wage on Employment, Formality and the Wage Distribution: A Structural Econometric Approach

3.1 Introduction

The second chapter of this thesis discussed the challenges faced by researchers attempting to estimate the effects of a minimum wage when a uniform minimum wage is set at the national level. It also proposed a new approach, based on previous work by Meyer and Wise (1983) and Doyle (2006), that addresses this identification problem. Using this empirical strategy, the effects of the minimum wage in Brazil were estimated using cross-sectional data for the 2001–2009 period. To avoid estimating the derivative of the wage density at a boundary point, an independence assumption was used. Relying on the assumption of independence between sector and wage to estimate the effects of the policy is potentially a substantial disadvantage of this approach. Naturally, Chapter 2 devoted considerable attention to empirically verifying the validity of this assumption. The results seem to indicate that this assumption is not violated in the context of the Brazilian economy.
Nevertheless, it is useful to investigate how to perform the estimation without assuming independence.

This chapter examines the effects of the minimum wage using a parametric assumption (log-normality) concerning the shape of the latent wage distribution. This assumption allows us to use maximum likelihood to estimate the parameters of the dual-economy model described in Chapter 2. By imposing stricter assumptions on the shape of the latent wage distribution, it is possible to allow for greater flexibility in the joint distribution of sectors and wages.[1],[2]

[1] An advantage of imposing a parametric functional form assumption on the latent wage distribution is that it also allows for the identification of spillover effects of the minimum wage. Although this approach will not be followed in the empirical section of this chapter, an overview is included in the appendix.

[2] As will become clear in the estimation section, by imposing a parametric functional form for the latent wage distribution, all wage data become informative of the model parameters. This feature sharply contrasts with the non-parametric estimation strategy followed in Chapter 2, wherein only the wages around the minimum wage can be used to identify the probability of non-compliance.

The assumption that sector and wage are independent can easily be relaxed in the parametric framework. The independence assumption essentially forces the wage distributions in both the formal and informal sectors to be identical in the absence of a minimum wage policy. Thus, in this chapter, I investigate the extent to which the results obtained under independence hold when the model allows for a more flexible relationship between sectors and wages.[3]

The results show that the independence assumption provides a good approximation of the Brazilian labor market, although the estimates based on the general form of the model statistically reject the null hypothesis of independence. The magnitude of the estimates of the dependence parameter is small.
The results also show that this model can capture the most prominent features of the joint distribution of sectors and wages using a relatively small number of parameters.

A parametric model makes it possible to analyze policy-relevant issues such as the impact of marginal changes in the minimum wage on labor market outcomes. It also allows me to investigate the contribution of the minimum wage to the variation over time in the level of formality. Thus, I conclude the empirical section by providing estimates of the marginal effects of the minimum wage and reduced-form evidence of the effects of the minimum wage on the size of the formal sector based on the evolution of formality over time for different wage groups.

This chapter is organized as follows: Section 3.2 discusses the model, its predictions and the estimation strategy.[4] Section 3.3 presents the results obtained when this model is estimated using representative cross-sectional data on the Brazilian population at the national level for the 2001–2009 period. Section 3.4 concludes and highlights some topics for further research.

3.2 The Model

A worker $i$ is characterized by a wage $W_i(1)$ and a sector $S_i(1)$, which is equal to one if the worker is employed in the formal sector and zero otherwise. Compliance with minimum wage legislation is perfect in the formal sector but not in the informal sector. This effectively means that workers in the formal sector are not allowed to receive wages that are below the minimum wage after the policy is introduced. If they remain employed in the presence of the policy, they must either move to an informal contract or comply with the policy by obtaining a wage equal

[3] The identification strategy employed in Chapter 2 assumes continuity in the shape of the latent wage density. Estimation is performed using local linear non-parametric density estimators to recover the likelihood of non-compliance.
These approaches to identification and estimation are natural generalizations of Doyle’s (2006) work to the dual-economy case. Chapter 3, in contrast, relies on a parametric functional form for the shape of the latent wage distribution. A functional form assumption concerning the shape of the latent density makes it possible to construct maximum likelihood estimators of the parameters. The parametric approach used in this chapter is analogous to that of Meyer and Wise (1983) but applied to the dual-economy context.

[4] For the sake of completeness, I will restate all model assumptions. For a detailed discussion of the rationale for these assumptions and the institutional background that justifies this approach, the reader is directed to Sections 2.1, 2.2 and 2.3 of Chapter 2.

to $m$. In addition, for each worker, define a pair $(W_i(0), S_i(0))$ denoting the counterfactual (or latent) wage and sector in the absence of the minimum wage. Finally, define $F_0(w)$ ($f_0(w)$) as the c.d.f. (p.d.f.) of $W(0)$ and $F(w)$ ($f(w)$) as the c.d.f. (p.d.f.) of observed wages ($W_i(1)$ or, using shorter notation, $W_i$). I will assume that the latent wage and sector distributions have the following characteristics:

Assumption 7 (Log Normality). The latent distribution of wages has a known parametric functional form with unknown parameters $\theta$. Specifically, I will work with a log-normal latent wage distribution in this application.

This is a model of the joint distribution of sectors and wages; thus, we need to define another object, the conditional probability of the latent sector given the wage ($\Pr[S(0)=1 \mid W(0)=w]$):

Assumption 8 (Conditional probability of the (latent) sector given the wage). The conditional distribution of the latent sector, given the latent wage, belongs to a parametric family $\{\Lambda(w,\beta) : \beta \in B \subset \mathbb{R}^k\}$.
That is, $\Pr[S(0)=1 \mid W(0)=w] = \Lambda(w,\beta_0)$ for some $\beta_0 \in B$. Moreover, $\Pr[\Lambda(W(0),\beta_0) \neq \Lambda(W(0),\beta') \mid W(0) > m] > 0$ for all $\beta' \neq \beta_0$.

As explained in Chapter 2, the conditional probability of the latent sector given the wage and the marginal distribution of latent wages together specify the joint distribution of these variables.[5] The restrictive part of this assumption is that the conditional probability of the latent sector given latent wages can be described by a parametric model. The first part of the assumption above states that there is a parameter $\beta_0$ for which the probability of the latent sector given the latent wage $w$ is exactly equal to $\Lambda(w,\beta_0)$. The second part of the assumption ensures that there is only one parameter for which this condition holds. Both assumptions are standard in binary outcome models. For concreteness, assume that the parametric model is logit.[6],[7],[8]

As long as the function $\Lambda(\cdot)$ is not constant for all wages, the latent wage distribution in the formal sector will differ from the latent distribution in the informal sector (see Figures 2.6 and 2.7). Thus, the main contribution of this chapter is to allow a greater degree of flexibility in the shape of the conditional probabilities of the latent sector as functions of wages at the cost of a stronger assumption regarding the shape of the latent (marginal) distribution of wages.

[5] This joint distribution could come, for example, from a Roy-type model of sector choice in which workers choose the sector that yields the highest utility. Another possibility is a model in which workers are assigned to firms that decide whether they will employ workers formally or informally based on labor taxes and the probability of punishment.

[6] The logistic functional form merely provides clarity in the exposition.
All identification results are preserved if the logistic functional form is replaced by another parametric form such as probit.

[7] One can make the model flexible by adding higher-order polynomials of wages (squares and cubes) as regressors in the logit model to adjust the curve. As long as the degree $k$ of the polynomial is fixed with respect to the sample size, that is, the model remains parametric, the identification results will hold.

[8] A parametric model is needed, as will become clear in the identification section, because for wages below the minimum wage, this model induces censoring in the probability of working in the formal sector. This forces us to rely on extrapolation using values above the minimum to identify the share of formal workers earning low wages. The need for extrapolation excludes non-parametric methods as an option.

Assumption 9 (No spillovers). Workers whose latent wages would be above the minimum wage are not affected by the policy. That is, $W(1) = W(0)$ and $S(1) = S(0)$ when $W(0) > m$.

This assumption is potentially strong. As discussed in Chapter 2, it is possible to point identify only a subset of model parameters under the assumption of independence between sectors and wages, namely the effects of the minimum wage on the size of the formal sector.[9] In contrast, by assuming a parametric functional form for the latent wage distribution, all model parameters can, in principle, be identified.[10] Although this exercise will not be performed in the empirical application, the appendix to this chapter describes how one can identify and estimate the spillover effects of the minimum wage in this model by assuming a parametric functional form for the latent wage distribution.

Assumption 10 (Minimum wage effects). For wages below the minimum wage ($W(0) < m$), we have the following: If $S(0) = 0$, then $S(1) = S(0)$. Additionally, with probability $\pi_d^{(0)}$, the wage ($W(1) = W(0)$) continues to be observed.
With the complementary probability $\pi_m^{(0)} = 1 - \pi_d^{(0)}$, the worker earns the minimum wage ($W(1) = m$). If $S(0) = 1$, then with probability $\pi_d^{(1)}$, the wage ($W(1) = W(0)$) continues to be observed, meaning that the worker successfully transitions from the formal to the informal sector.[11] In this case, the observed sector will be $S(1) = 0$, which differs from the latent sector. With probability $\pi_m^{(1)}$, the worker earns the minimum wage ($W(1) = m$, $S(1) = 1$). With the complementary probability ($\pi_u^{(1)} = 1 - \pi_d^{(1)} - \pi_m^{(1)}$), the worker becomes unemployed ($W(1) = \cdot$, $S(1) = \cdot$).[12]

Our goal is to recover the unknown parameters $\pi \equiv (\pi_d^{(1)}, \pi_m^{(1)}, \pi_u^{(1)}, \pi_d^{(0)}, \pi_m^{(0)})$ and the joint distribution of latent sectors and wages, that is, the joint density that would prevail in the absence of the minimum wage. By comparing this distribution with the observed one, we can evaluate the impact of the minimum wage on expected wages, wage inequality, employment and other labor market outcomes. In Section 2.3.5, I describe the implications of this model for some objects of interest, such as employment, the size of the formal sector and the shape of the sector-specific wage distributions. In the next section, I discuss the implications of this model for marginal changes in the minimum wage, given the parameters.

[9] If sector and wage are independent, this parameter can be identified under minimal assumptions concerning the point in the wage distribution at which spillovers from the minimum wage policy should vanish.

[10] In the parametric version of the model, a parametric spillover function can operate throughout the wage distribution.

[11] The assumption that the wage remains exactly the same when the worker moves to the informal sector, that is ($W(1) = W(0)$), substantially simplifies the exposition.
The same results hold when this assumption is replaced with one in which the worker draws a new wage from $f_0(w \mid S(0) = 1, W(0) < m)$.

[12] To ease the exposition, I assume that $\pi_m^{(1)}$ and $\pi_u^{(1)}$ do not vary as a function of the latent wage. In the case in which they vary over latent wages, the parameter recovered by assuming that they are constant is the expectation of the distribution of $\pi_m^{(1)}$ and $\pi_u^{(1)}$ over the distribution of wages below the minimum. Importantly, this result holds only as long as $\pi_d^{(1)}$ remains constant as a function of the wage. See the appendix on robustness for further discussion of this issue.

3.2.1 Model Analysis

It is straightforward to derive the effects of the minimum wage on certain outcomes of interest as a function of the objects from the latent state and the parameters of the model.[13] The equations below describe the effects of the minimum wage on expected wages, employment and the relative size of the formal sector.

$$\frac{\partial E[W(1)]}{\partial m} = \Pr[W(1) = m] + \frac{\pi_u(m)\, f_0(m)}{c}\,\bigl(E[W(1)] - m\bigr) \tag{3.1}$$

$$\frac{\partial E[W(1) \mid S = 1]}{\partial m} = \Pr[W(1) = m \mid S = 1] + \frac{(1 - \pi_m^{(1)})\, f_0(m \mid S(0) = 1)}{c^{(1)}}\,\bigl(E[W(1) \mid S(1) = 1] - m\bigr) \tag{3.2}$$

$$\frac{\partial E[W(1) \mid S = 0]}{\partial m} = \Pr[W(1) = m \mid S = 0] - \frac{\pi_d^{(1)}}{c^{(0)}}\,\frac{\Lambda(m)}{1 - \Lambda(m)}\, f_0(m \mid S(0) = 0)\,\bigl(E[W(1) \mid S(1) = 0] - m\bigr) \tag{3.3}$$

$$\frac{\partial c}{\partial m} = - f_0(m)\, \pi_u(m) \tag{3.4}$$

$$\frac{\partial c^{(1)}}{\partial m} = - f_0(m \mid S(0) = 1)\,(1 - \pi_m^{(1)}) \tag{3.5}$$

$$\frac{\partial c^{(0)}}{\partial m} = f_0(m \mid S(0) = 0)\, \frac{\Lambda(m)}{1 - \Lambda(m)}\, \pi_d^{(1)} \tag{3.6}$$

$$\frac{\partial \log\bigl(\Pr[S(1) = 1] / \Pr[S(1) = 0]\bigr)}{\partial m} = - \frac{f_0(m \mid S(0) = 1)\,(1 - \pi_m^{(1)})}{c^{(1)}} - \frac{f_0(m \mid S(0) = 0)\, \frac{\Lambda(m)}{1 - \Lambda(m)}\, \pi_d^{(1)}}{c^{(0)}}. \tag{3.7}$$

The effect of the minimum wage on the average wages of the employed has two key components. The first is the “bite”, that is, the proportion of workers who receive the minimum wage. This result is similar to that reported in Autor et al. (2010). The second component concerns unemployment effects. As long as the minimum wage is smaller than the expectation of the observed wages, unemployment will increase the perceived effect of the minimum wage on average wages for those who remain employed.
This mechanical effect is due to the removal of certain observations at the left tail of the distribution, which contributes to increasing the average.

This model predicts heterogeneous effects of the minimum wage across sectors. The marginal effects of the minimum wage conditional on the sector are a function of the “bite” and the coefficients that govern the movements into or out of that sector in response to the policy. The term multiplying $(1 - \pi_m^{(1)})$ in Equation 3.2 measures the effect that workers moving out of formality have on expected wages in the sector. The term multiplying $\pi_d^{(1)}$ in Equation 3.3 captures the effect that the entry of workers from the formal sector has on expected wages in the informal sector.[14]

[13] A key implicit assumption in this exercise is that the model parameters are stable in the neighborhood of the minimum wage. This assumption is not necessary to recover the latent wage distribution. It is, however, required to compute the marginal effect of the minimum wage. This is a non-trivial requirement. Even if all model assumptions are valid at two different minimum wage levels, the two levels may imply substantially different parameter values.

[14] Assumptions 3 and 4 exclude general equilibrium effects. The entry of workers into the formal sector is

In contrast to the formal sector, the minimum wage has an ambiguous impact on average wages in the informal sector. The minimum wage policy induces an inflow of low-wage workers from the formal sector to the informal sector, as captured by the parameter $\pi_d^{(1)}$.
This mechanism, depending on the size of the model parameters, can be sufficient to induce an overall reduction of average wages in that sector.[15] In terms of the size of each sector, as long as $\pi_d^{(1)}$ or $\pi_u^{(1)}$ is greater than zero, the minimum wage will reduce the total number of workers employed in the formal sector, and as long as $\pi_d^{(1)}$ is greater than zero, the opposite will be true for the informal sector.[16]

3.2.2 Estimation

Given the parametric form of the model, the estimation can be performed using maximum likelihood.[17] Let $\Theta \equiv (\theta, \beta, \pi)$ be the entire vector of model parameters, that is, those governing the latent distribution of wages, the conditional probability of sector given wages and minimum wage effects. Define the likelihood of observing a pair $(w, s)$ given the minimum wage level $m$ and model parameters $\Theta$ as:

$$L(W(1) = w, S(1) = s \mid \Theta) = \Pr[S(1) = s \mid W(1) = w; \Theta]\, f(w \mid \Theta).$$

Let

$$\psi(\Theta) \equiv \frac{\pi_m^{(1)} \int^{m} f_0(w \mid \theta)\, \Lambda(w \mid \beta)\, dw}{\int^{m} \pi_m(w)\, f_0(w \mid \theta)\, dw}.$$

For the first term appearing in the log-likelihood, we have:

$$\log\bigl(\Pr[S(1) = s \mid W = w; \Theta]\bigr) = 1\{w = m\}\bigl(1\{s = 1\} \log \psi(\Theta) + 1\{s = 0\} \log(1 - \psi(\Theta))\bigr) + 1\{w > m\}\bigl(1\{s = 1\} \log \Lambda(w \mid \beta) + 1\{s = 0\} \log(1 - \Lambda(w \mid \beta))\bigr).$$

[14, continued] assumed not to change the wages of workers in the informal sector. Additionally, workers who move are assumed to retain the same wage that they previously received in the formal sector. This assumption can, in principle, be relaxed in the parametric version of the model.

[15] A panel-data-based identification would typically identify the effect of the minimum wage on workers who were classified as informal sector workers at the beginning of the analyzed period, thus identifying $E[W(1) - W(0) \mid S(0) = 0]$, or in all periods, thus identifying $E[W(1) - W(0) \mid S(0) = 0, S(1) = 0]$. A repeated cross-section would, by construction, also compute the effect of the inflow of workers coming from the formal sector, thus identifying $E[W(1) \mid S(1) = 0] - E[W(0) \mid S(0) = 0]$.
Thus, the sector-mobility mechanism leads to a subtle difference in theobject estimated using repeated cross-sections versus panel-data variation regardless of the validity of the identifica-tion strategy.16Note that c is the ratio of employment observed under the minimum wage level m and the employment under thecounterfactual scenario defined as the absence of the minimum wage. Thus, ∂c∂m measures the effect of the minimumwage on the ratio of observed employment and the counterfactual employment under the absence of the minimumwage. To obtain the marginal effect of the minimum wage on the observed level of employment, one just needs todivide ∂c∂m by c. A similar argument holds for c(1) and c(0).17The identification of the parameters is guaranteed by the results in Chapter 2.58For the second term in the log-likelihood, we have:log( f (w)) =1I{w < m} log(pid(w) f0(w|θ))+1I{w = m} log(∫ mpim(u) f0(u|θ)du)+1I{w > m} log f0(w|θ)− logc(Θ).Given that log(L(W (1) = w,S(1) = s|Θ) = logPr[S(1) = s|W (1) = w;Θ]+ log f (W (1) = w|Θ),and we can define the maximum likelihood estimator of the model parameters as:Θ̂= argmaxΘ1NN∑ilogL(wi,si|Θ). (3.8)The numerical optimization of the likelihood function can be simplified using a three-stepprocedure. First, estimate the mean and variance of the latent wage distribution by consideringonly the values above the minimum wage:θ̂ = argmaxθ1NN∑i1I{wi > m} log f (wi|wi > m;θ).Then, estimate the conditional probability of the latent sector given the wages while also usingvalues above the minimum wage:β̂ = argmaxβ1NN∑i1I{wi > m} logPr[si|wi;β ].Finally, maximize the likelihood function over the subset of parameters that remains to be esti-mated pi using the full sample:̂= argmaxpi1NN∑ilogL(wi,si|θ̂ , β̂ ,pi).This procedure yields consistent estimates because the density of wages for values abovethe minimum is merely a function of a subset (θ ) of the parameter vector (Θ). 
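As an illustration only, the first two steps of the procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the estimates in this chapter: it assumes a normal latent distribution for log wages, imposes the independence restriction (an intercept-only logit, for which the maximizer has a closed form), and replaces a proper optimizer with a simple compass search. The function names and the simulated sample are hypothetical. The third step, which requires the full likelihood, is omitted.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the truncation term


def truncated_loglik(mu, log_sigma, mean_w, mean_w2, m):
    """Average log-density of N(mu, sigma^2) truncated to (m, inf).

    mean_w and mean_w2 are the sample means of w and w^2 over wages above m;
    they are sufficient statistics, so each evaluation is O(1).
    """
    sigma = math.exp(log_sigma)
    survival = max(1.0 - STD_NORMAL.cdf((m - mu) / sigma), 1e-300)
    mean_sq_dev = mean_w2 - 2.0 * mu * mean_w + mu * mu
    return (-log_sigma - 0.5 * math.log(2.0 * math.pi)
            - mean_sq_dev / (2.0 * sigma * sigma) - math.log(survival))


def fit_latent_wage_distribution(log_wages, m, tol=1e-6):
    """Step 1: estimate (mu, log sigma) using only wages above the minimum."""
    above = [w for w in log_wages if w > m]
    mean_w = sum(above) / len(above)
    mean_w2 = sum(w * w for w in above) / len(above)
    # Start from the moments of the truncated sample.
    mu, log_sigma = mean_w, 0.5 * math.log(mean_w2 - mean_w * mean_w)
    best = truncated_loglik(mu, log_sigma, mean_w, mean_w2, m)
    step = 0.5
    while step > tol:  # compass search: try +/- step along each coordinate
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            value = truncated_loglik(mu + d_mu, log_sigma + d_ls, mean_w, mean_w2, m)
            if value > best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, value
                improved = True
        if not improved:
            step /= 2.0  # shrink the step once no direction improves
    return mu, log_sigma


def fit_sector_logit_intercept(log_wages, sectors, m):
    """Step 2 under independence: the intercept-only logit MLE is the
    log-odds of formality among workers with wages above the minimum."""
    above = [s for w, s in zip(log_wages, sectors) if w > m]
    share = sum(above) / len(above)
    return math.log(share / (1.0 - share))


def simulate_sample(n, mu, sigma, formal_share, seed=0):
    """Hypothetical latent sample: normal log wages, independent sectors."""
    rng = random.Random(seed)
    wages = [rng.gauss(mu, sigma) for _ in range(n)]
    sectors = [1 if rng.random() < formal_share else 0 for _ in range(n)]
    return wages, sectors
```

Because both steps discard observations at or below the minimum wage, the simulated sample does not need to reproduce the distortions that the policy creates in that range. In practice one would replace the compass search with a library optimizer and obtain standard errors by bootstrap, as is done for the tables in this chapter.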
The same holds for the conditional probability of sector given the wage for values above the minimum wage; it is a function only of $\beta$. In this case, estimation is simple: In the first step, one merely needs to estimate a Tobit regression of wages on a constant for values above the minimum. Then, in the second step, one needs to estimate a logit regression of sector on wages, using only values above the minimum wage, as before. Only in the last step is the entire likelihood function numerically optimized to recover $\pi$. Efficiency can then be improved by using these estimates as initial values for the maximum likelihood estimator $\hat{\Theta}$.

Table 3.1: Descriptive Statistics

Parameter          2001   2002   2003   2004   2005   2006   2007   2008   2009
E[W]               5.92   5.98   6.08   6.15   6.25   6.33   6.41   6.50   6.58
Sd[W]              0.73   0.73   0.71   0.70   0.67   0.65   0.63   0.63   0.60
Skewness[W]        0.37   0.31   0.22   0.19   0.19   0.19   0.10  -0.03   0.03
Kurtosis[W]        4.28   4.52   4.66   4.76   4.75   4.82   5.41   5.87   5.51
Pr[W = m]          0.09   0.12   0.13   0.12   0.16   0.16   0.14   0.15   0.15
Pr[W < m]          0.06   0.07   0.08   0.08   0.08   0.09   0.08   0.09   0.08
q80[W]             6.41   6.54   6.58   6.68   6.68   6.80   6.91   6.91   7.00
q20[W]             5.30   5.38   5.48   5.56   5.70   5.86   5.94   6.03   6.14
E[W|S = 1]         6.07   6.13   6.23   6.31   6.39   6.47   6.54   6.62   6.69
E[W|S = 0]         5.57   5.63   5.71   5.77   5.88   5.98   6.06   6.14   6.22
Sd[W|S = 1]        0.66   0.65   0.62   0.61   0.58   0.56   0.54   0.54   0.51
Sd[W|S = 0]        0.77   0.78   0.77   0.75   0.74   0.72   0.73   0.72   0.69
q80[W|S = 1]       6.55   6.62   6.68   6.75   6.80   6.91   6.91   7.09   7.09
q20[W|S = 1]       5.52   5.58   5.70   5.77   5.86   5.99   6.02   6.12   6.21
q80[W|S = 0]       6.11   6.21   6.21   6.25   6.40   6.43   6.55   6.68   6.68
q20[W|S = 0]       5.19   5.19   5.30   5.30   5.30   5.52   5.63   5.70   5.70
Pr[S = 1]          0.70   0.70   0.71   0.71   0.72   0.72   0.74   0.74   0.76
Pr[S = 1|W < m]    0.08   0.08   0.09   0.04   0.03   0.04   0.04   0.05   0.04
Pr[S = 1|W = m]    0.58   0.53   0.63   0.63   0.62   0.68   0.72   0.73   0.77
Pr[S = 1|W > m]    0.76   0.78   0.79   0.79   0.81   0.81   0.82   0.82   0.84
m                  5.19   5.30   5.48   5.56   5.70   5.86   5.94   6.03   6.14

Note: Monthly wages in (log) R$.

Figure 3.1: Empirical CDFs by Sector

3.3 Empirical Application: The Effects of the Minimum Wage in Brazil

3.3.1 Descriptive Statistics

To evaluate the impact of the minimum wage in Brazil, I use the PNAD household survey.[18] These data, which are representative of the Brazilian population, are collected yearly by the IBGE, a Brazilian statistical agency. As in the previous chapter, workers who do not report wages, workers who work in the public sector and workers who are older than 60 years of age or younger than 18 were removed from the sample. Additionally, workers who report monthly wages above R$5000 were removed from the sample, which excludes the upper 1.15% of the wage data.

[18] PNAD is an acronym for the Portuguese name of the survey, which can be translated as “Nationwide Household Sample Survey”. For further details regarding the data, the reader is directed to Section 2.7.1.

Table 3.1 and Figure 3.1 present some empirical facts concerning the joint distribution of sectors and (log) wages for Brazilian data from the 2001–2009 period. Regarding Table 3.1, we observe that expected wages are higher during periods of higher minimum wage levels. Wage inequality, as measured by the standard deviation of log-wages, is lower during periods of higher minimum wage levels. The observed log-wage distribution is asymmetric and presents higher kurtosis than what would be implied by normality. Wages in the formal sector are, on average, higher than wages in the informal sector. The informal sector comprises a large share of the aggregate economy, approximately 28% based on these data. The probability that a wage is equal to the minimum wage (Pr[W = m]) ranges from 9 to 16%. The proportion of workers who receive wages below the minimum wage (Pr[W < m]) ranges from approximately 6 to 9%. The informal sector wage distribution is stochastically dominated by that of the formal sector. The informal sector wage distribution presents higher inequality, as measured by the standard deviation of log-wages, relative to the formal sector. The probability of working in the formal sector as a function of the wage is discontinuous. It is approximately 80% for values above the minimum wage, 65% at the minimum wage and close to zero (3 to 9%) below it.

Table 3.2: Parameter Estimates

Parameter    2001          2002          2003          2004          2005          2006          2007          2008          2009
Latent wage distribution:
µ            5.7143***     5.7881***     5.8261***     5.9001***     5.9879***     5.9872***     6.0369***     6.2021***     6.2042***
             (0.0084)      (0.0121)      (0.0098)      (0.0090)      (0.0095)      (0.0099)      (0.0069)      (0.0099)      (0.0066)
log(σ)       -0.1183***    -0.1191***    -0.1273***    -0.1518***    -0.1654***    -0.1757***    -0.1872***    -0.2275***    -0.2359***
             (0.0056)      (0.0075)      (0.0060)      (0.0061)      (0.0060)      (0.0057)      (0.0052)      (0.0064)      (0.0046)
Conditional probability of sector given the wage:
β0           1.2149***     1.2683***     1.3365***     1.3728***     1.4859***     1.5059***     1.5785***     1.5971***     1.7139***
             (0.0116)      (0.0128)      (0.0117)      (0.0137)      (0.0135)      (0.0135)      (0.0119)      (0.0129)      (0.0104)

Note: Bootstrapped standard errors (computed using 100 replications) are given in parentheses.

3.3.2 Results

In this section, I will discuss the estimates of the model parameters for the Brazilian labor market. The estimation was performed using maximum likelihood with initial values provided by the three-step quasi-maximum likelihood estimator described in the previous section. I will first describe the results obtained under the independence assumption, which are the parametric (maximum likelihood) analogs of the non-parametric density discontinuity results obtained in Chapter 2.
Then, I proceed to the general form of the model, where the joint distribution of latent sectors and wages is allowed to have a greater degree of complexity.[19]

[19] In practice, the independence assumption is imposed by constraining $\beta_1$, the slope coefficient in the logit equation for the conditional probability of the latent sector given the wage, to be zero. In the general form of the model, $\beta_1$ is unrestricted.

3.3.2.1 Parametric Model under Independence between (Latent) Sector and Wage

In examining the point estimates and standard errors in Table 3.3, we observe sizable estimates of the unemployment effects of the minimum wage. The results also indicate that the minimum wage affects wage setting in both the formal and informal sectors. The evidence from Table 3.3 suggests that sector mobility is limited. The estimates of the sector-mobility parameter ($\pi_d^{(1)}$) average approximately 6%, with a maximum estimate of 15%. The estimates of the proportion of affected workers present an upward trend over the analyzed period, which is expected because the period is characterized by an increase in the (nominal and real) value of the minimum wage. The correlation between $\Pr[W(0) < m]$ and the minimum wage level ($m$) is approximately 0.9.

Table 3.3: Minimum Wage Effects Parameters

Parameter            2001       2002       2003       2004       2005       2006       2007       2008       2009
$\pi_d$              0.18***    0.19***    0.18***    0.18***    0.15***    0.14***    0.12***    0.15***    0.11***
                     (0.01)     (0.01)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
$\pi_m$              0.22***    0.31***    0.25***    0.23***    0.29***    0.23***    0.18***    0.23***    0.19***
                     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.00)
$\pi_u$              0.60***    0.51***    0.58***    0.59***    0.55***    0.64***    0.70***    0.62***    0.71***
                     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.00)
$F(m)$               0.28***    0.29***    0.35***    0.35***    0.37***    0.44***    0.45***    0.41***    0.47***
                     (0.00)     (0.01)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.01)     (0.00)
$\pi_d^{(1)}$        0.05***    0.15***    0.08***    0.08***    0.10***    0.04***    0.00***    0.05***    0.00
                     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.00)
$\pi_m^{(1)}$        0.17***    0.20***    0.19***    0.18***    0.22***    0.19***    0.15***    0.20***    0.17***
                     (0.00)     (0.01)     (0.01)     (0.00)     (0.01)     (0.00)     (0.00)     (0.01)     (0.00)
$\pi_u^{(1)}$        0.78***    0.65***    0.73***    0.74***    0.68***    0.78***    0.85***    0.75***    0.83***
                     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.00)
$\pi_d^{(0)}$        0.59***    0.32***    0.55***    0.57***    0.37***    0.59***    0.69***    0.62***    0.71***
                     (0.01)     (0.04)     (0.02)     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)
$\pi_m^{(0)}$        0.41***    0.68***    0.45***    0.43***    0.63***    0.41***    0.31***    0.38***    0.29***
                     (0.01)     (0.04)     (0.02)     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)
$\Pr[S(0) = 1]$      0.77***    0.78***    0.79***    0.80***    0.81***    0.82***    0.83***    0.83***    0.85***
                     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)

Note: Bootstrapped standard errors (computed using 100 replications) are given in parentheses.

The latent size of the formal sector is approximately 80% of the economy (taking the year 2004 as an example). Compared with the observed share of 71% in that year (Table 3.1), this implies that the minimum wage reduces the size of the formal sector by approximately 11%. The same algebra shows that the size of the informal sector increased by approximately 45%, from approximately 20% to 29% of the economy. This larger effect in the relative size of the informal sector is explained by the fact that this sector is approximately four times smaller than the formal sector in the absence of the policy. These results are similar to those obtained when the model is estimated non-parametrically under the independence assumption in Chapter 2. This evidence suggests that normality is a good approximation for the data generating process.

3.3.2.2 Relaxing the Independence Assumption

Table 3.4 displays the parameter estimates based on the general form of the model. Based on these estimates, I reject the null hypothesis of independence between latent sectors and wages. This can be seen from the statistical significance of the estimate of $\beta_1$.
However, the slope of the relationship between sectors and wages is small.

Table 3.4: Transformed Parameter Estimates: General Model

Parameter    2001          2002          2003          2004          2005          2006          2007          2008          2009
Latent wage distribution:
µ            5.7432***     5.8388***     5.8713***     5.9415***     6.0661***     6.0063***     6.0827***     6.2378***     6.2509***
             (0.0095)      (0.0098)      (0.0089)      (0.0081)      (0.0120)      (0.0104)      (0.0087)      (0.0111)      (0.0095)
log(σ)       -0.1344***    -0.1459***    -0.1493***    -0.1724***    -0.2038***    -0.1848***    -0.2078***    -0.2447***    -0.2579***
             (0.0071)      (0.0076)      (0.0067)      (0.0055)      (0.0077)      (0.0061)      (0.0060)      (0.0066)      (0.0063)
Conditional probability of sector given the wage:
β0           0.9066***     1.0692***     1.0441***     1.0554***     1.3348***     1.2543***     1.3652***     1.4237***     1.4659***
             (0.0265)      (0.0229)      (0.0356)      (0.0360)      (0.0382)      (0.0280)      (0.0288)      (0.0413)      (0.0326)
β1           0.0006***     0.0004***     0.0005***     0.0005***     0.0002***     0.0003***     0.0003***     0.0002***     0.0003***
             (0.0000)      (0.0000)      (0.0000)      (0.0000)      (0.0000)      (0.0000)      (0.0000)      (0.0000)      (0.0000)

Note: Bootstrapped standard errors (computed using 100 replications) are given in parentheses.

Table 3.5: Structural Parameter Estimates: General Model

Parameter            2001       2002       2003       2004       2005       2006       2007       2008       2009
$\pi_d$              0.20***    0.22***    0.20***    0.21***    0.19***    0.14***    0.13***    0.16***    0.12***
                     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.00)     (0.01)     (0.00)
$\pi_m$              0.24***    0.36***    0.28***    0.25***    0.37***    0.24***    0.20***    0.25***    0.21***
                     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.01)
$\pi_u$              0.56***    0.42***    0.52***    0.54***    0.44***    0.62***    0.67***    0.58***    0.67***
                     (0.01)     (0.02)     (0.01)     (0.01)     (0.02)     (0.01)     (0.01)     (0.01)     (0.01)
$F(m)$               0.27***    0.27***    0.33***    0.33***    0.33***    0.43***    0.43***    0.40***    0.44***
                     (0.00)     (0.00)     (0.00)     (0.00)     (0.01)     (0.01)     (0.00)     (0.01)     (0.01)
$\pi_d^{(1)}$        0.04***    0.21***    0.09***    0.08***    0.19***    0.01       0.00***    0.06***    0.00
                     (0.01)     (0.02)     (0.01)     (0.01)     (0.02)     (0.01)     (0.00)     (0.01)     (0.00)
$\pi_m^{(1)}$        0.19***    0.23***    0.22***    0.20***    0.26***    0.20***    0.17***    0.22***    0.19***
                     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.00)     (0.01)     (0.01)
$\pi_u^{(1)}$        0.77***    0.56***    0.69***    0.71***    0.55***    0.79***    0.83***    0.72***    0.81***
                     (0.02)     (0.02)     (0.02)     (0.01)     (0.02)     (0.01)     (0.00)     (0.01)     (0.01)
$\pi_d^{(0)}$        0.62***    0.24***    0.56***    0.59***    0.21***    0.63***    0.69***    0.61***    0.71***
                     (0.01)     (0.03)     (0.02)     (0.02)     (0.03)     (0.01)     (0.01)     (0.01)     (0.01)
$\pi_m^{(0)}$        0.38***    0.76***    0.44***    0.41***    0.79***    0.37***    0.31***    0.39***    0.29***
                     (0.01)     (0.03)     (0.02)     (0.02)     (0.03)     (0.01)     (0.01)     (0.01)     (0.01)
$\Pr[S(0) = 1]$      0.76***    0.78***    0.78***    0.78***    0.81***    0.80***    0.82***    0.83***    0.84***
                     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)

Note: Bootstrapped standard errors (computed using 100 replications) are given in parentheses.

Figure 3.2: Conditional Probability of the Sector Given the Wage
Note: Observed conditional probabilities based on a non-parametric local constant (Nadaraya-Watson) estimator using a Gaussian kernel (bandwidth = R$30). Year 2004.

Comparing the structural parameter estimates from the restricted and general forms of the model (presented in Table 3.3 and Table 3.5, respectively), we can see that the point estimates are similar across these specifications. The qualitative implications of both models are similar.[20] Figure 3.2 displays the observed and latent conditional probability of formality with respect to wage. We can graphically observe the small elasticity of (latent) formality with respect to the wage around the minimum wage level from the apparent horizontal shape of this curve. This flatness contrasts with the steep slope of the observed conditional probability of the sector given the wage, which discontinuously jumps at the minimum wage level.

Figures 3.3 and 3.4 display the observed and latent densities for the formal and informal sectors based on the (general form of the) model parameter estimates for the year 2004. The latent wage distribution tends to be above the observed distribution for the formal sector for values above the minimum wage.
This is a consequence of workers moving away from the formal sector (into either unemployment or informal employment). The sector-mobility channel increases the measured density above the minimum wage due to a rescaling effect. The informal sector, as predicted by the model, behaves in the opposite way: The observed density tends to be below the latent density for values above the minimum wage. This result is due to the inflow of workers from the formal sector, which induces a rescaling of the density and reduces its values above the minimum wage. Figure 3.5 shows the estimates of the latent wage distributions for the formal and informal sectors. We can see that the informal sector wage distribution tends to have higher density for low wages relative to the formal sector. This follows from the positive estimated slope coefficient on the relationship between latent sectors and wages ($\beta_1$).

[20] This result should not be surprising, as the general form of the model generalizes an assumption that appears to be approximately true in the data based on the tests performed in Chapter 2.

Figure 3.3: Formal Sector
Note: Density estimates using a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.4: Informal Sector
Note: Density estimates using a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.5: Latent Densities
Note: Year 2004.

Figure 3.6: Observed Densities
Note: Density estimates using a Gaussian kernel (bandwidth = R$30). Year 2004.

Table 3.6: Marginal Effects

Parameter, expression
                 2001        2002        2003        2004        2005        2006        2007        2008        2009
Average Wage:
Aggregate, $\partial E[W(1)] / \partial m$
                 13.84***    10.35***    12.14***    12.25***    9.89***     13.31***    13.67***    11.78***    13.20***
                 (0.36)      (0.38)      (0.29)      (0.26)      (0.35)      (0.25)      (0.12)      (0.26)      (0.14)
Formal Sector, $\partial E[W(1) \mid S(1)=1] / \partial m$
                 21.33***    16.74***    20.29***    20.15***    16.94***    23.54***    23.36***    20.47***    22.72***
                 (0.49)      (0.54)      (0.45)      (0.40)      (0.58)      (0.47)      (0.22)      (0.41)      (0.23)
Informal Sector, $\partial E[W(1) \mid S(1)=0] / \partial m$
                 1.71***     -0.29       1.65***     1.49***     0.85***     3.41***     2.95***     2.06***     3.01***
                 (0.26)      (0.31)      (0.22)      (0.20)      (0.24)      (0.16)      (0.00)      (0.14)      (0.00)
Employment:
Aggregate, $\partial c / \partial m$
                 -0.02***    -0.02***    -0.02***    -0.02***    -0.01***    -0.02***    -0.02***    -0.01***    -0.01***
                 (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
Formal Sector, $\partial c^{(1)} / \partial m$
                 -0.03***    -0.03***    -0.03***    -0.03***    -0.02***    -0.02***    -0.02***    -0.02***    -0.02***
                 (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
Informal Sector, $\partial c^{(0)} / \partial m$
                 0.00***     0.02***     0.01***     0.01***     0.02***     0.00        0.00***     0.00***     0.00
                 (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
Relative size (∆%), $\partial \log(\Pr[S(1)=1] / \Pr[S(1)=0]) / \partial m$
                 -0.12***    -0.20***    -0.14***    -0.14***    -0.21***    -0.10***    -0.10***    -0.13***    -0.10***
                 (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)

Note: Marginal effect estimates multiplied by a typical change (R$20.00) in the minimum wage. Bootstrapped standard errors (computed using 100 replications) are given in parentheses.

Figure 3.6 shows kernel density estimates of the wage distributions in the formal and informal sectors for the year 2004. We observe substantial differences between the observed wage distributions in the formal and informal sectors. The formal sector wage distribution presents almost no density below the minimum wage level, whereas the informal sector exhibits considerable mass in that range. Above the minimum wage level, the formal sector density tends to be higher than the informal sector density.
By comparing these observed differences between the wage distributions of the formal and informal sectors with the latent distributions that we observe in Figure 3.5, we conclude that most of the observed differences in the wage distributions of the formal and informal sectors are due to different responses to the minimum wage policy across sectors. I will discuss this issue in detail in Section 3.3.2.5.

3.3.2.3 Marginal Effects of the Minimum Wage

The structural parameters of the model characterize the economy in the complete absence of the minimum wage. Arguably, the effects of changes in the minimum wage level are parameters of greater policy relevance. Thus, in this section I compute the effects of changes in the minimum wage level. To do so, I use a plug-in approach for the terms in Equations 3.1 to 3.7.

Table 3.6 displays the estimated effects of changes in the minimum wage level implied by the estimates of the structural parameters. I computed the effects of the minimum wage on average wages, on employment, and on the relative size of the formal sector. To place the numbers in perspective, I multiply the marginal effects obtained by the typical change in the real value of the minimum wage observed in the analyzed period (R$20, or an increase of approximately 6%).[21]

[21] Table 3.6 reports the marginal effects of the minimum wage multiplied by a factor of 20. These estimates provide an approximation of the effect of a discrete change in the minimum wage.

The estimates show that the minimum wage increases wages for the aggregate economy. The estimated effect is approximately R$12, or 60% of the change in the minimum wage. This effect is driven by the increase in the wage of low-wage workers and a decrease in the proportion of low-wage workers due to unemployment effects. The estimated effects on average wages show a larger effect in the formal sector, of approximately R$20. That is, the effect on average wages in the formal sector is approximately one-to-one.[22] This result is a consequence of full compliance with the minimum wage policy in the formal sector. The probability that a worker “clusters” at the minimum wage is higher in the aggregate economy than it is in the formal sector. Thus, the force that drives a larger minimum wage effect on the formal sector versus the aggregate economy is the substantial rescaling effect from the lost density at low wages.

[22] This result is not due to increases in wages higher up in the wage distribution. Rather, this effect is due to a “rescaling” spillover induced by the unemployment effect of the minimum wage. By reducing the density of low wages through unemployment, the minimum wage increases the average wage in the formal sector. See Section 3.2.1 for a detailed discussion of this issue.

The results indicate a small positive effect of the minimum wage on wages in the informal sector. My estimates suggest an effect of approximately R$2, or 10% of the change in the minimum wage. The estimates of the probability of compliance with the policy ($\pi_m^{(0)}$) suggest that a substantial fraction of workers in the informal sector receive higher wages after a change in the minimum wage. The small estimated effects of the minimum wage on average wages in the informal sector result from the inflow of low-wage workers to the informal sector. This channel decreases the perceived effect of the policy in the informal sector.

Averaging the estimates across the period considered, the results suggest an approximately 1.45% decrease in employment following a typical (and exogenous) change in the minimum wage level.[23] This result highlights that high unemployment probabilities, like the estimates obtained in Chapter 2, in Meyer and Wise (1983), and in Doyle (2006), do not necessarily mean that marginal changes in the minimum wage would generate sizable employment effects. As Equation 3.4 shows, the density of wages at the minimum wage level, or, in other words, the proportion of affected workers at the margin, is an important determinant of the marginal effect of the minimum wage on the employment level.

[23] This should be interpreted as the change in the ratio of employment under the minimum wage versus in the absence of the minimum wage following the change in the minimum wage level. This is a somewhat uninteresting counterfactual. To obtain the percentage change in employment when compared to the initial minimum wage level, one needs to divide the estimated coefficient by $c$ (see footnote 16). Given the parameter estimates, this amounts to multiplying the coefficient by approximately 1.25.

The decrease in the size of the formal sector is larger, approximately 1.9%. The estimates show that the informal sector experiences a 0.6% increase in size (this effect is not statistically different from zero for years 2007 and 2009). In terms of the relative size of the formal sector, the estimates suggest a decrease of approximately 13%. To place this number in perspective, the size of the formal sector in 2004 is 0.71. This estimate suggests that an exogenous increase (of R$20.00) in the minimum wage level would induce the formal sector to decrease to 0.67, that is, to decrease by 4 percentage points. This effect takes into account the outflow of workers from the formal sector, the inflow of workers to the informal sector, and the decrease in the size of the formal sector due to unemployment.

3.3.2.4 Model Fit

Table 3.7 presents a comparison between certain moments of the data and those implied by the model parameters. In examining this table, we can see that this model, even under the independence assumption, can capture most of the features of the joint distribution of sectors and wages described previously. It predicts the discontinuous shape of $\Pr[S = 1 \mid w]$ observed in the data.
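To make the link between the structural parameters and the predicted moments concrete, the following sketch recomputes a few model-implied probabilities from the rounded 2004 estimates in Table 3.3. It is a rough illustration under assumptions that are mine rather than formulas quoted from the text: latent sector and wage are independent, the probabilities $\pi_d$, $\pi_m$ and $\pi_u$ are constant below the minimum wage, no formal worker remains below the minimum wage, and employment relative to the no-minimum-wage scenario is $c = 1 - \pi_u F(m)$. The function and variable names are hypothetical.

```python
def implied_moments(F_m, pi_d, pi_m, pi_u, pi_m_formal, latent_formal_share):
    """Model-implied moments under independence (assumed closed forms).

    F_m: latent probability of a wage below the minimum wage.
    pi_d, pi_m, pi_u: aggregate probabilities that a worker with a latent
        wage below m keeps that wage, moves to m, or becomes unemployed.
    pi_m_formal: probability that a formal worker below m moves to m.
    latent_formal_share: size of the formal sector without the policy.
    """
    c = 1.0 - pi_u * F_m  # employment relative to the no-minimum-wage scenario
    formal_at_m = latent_formal_share * pi_m_formal * F_m  # formal workers bunched at m
    formal_above = latent_formal_share * (1.0 - F_m)       # formal workers above m
    return {
        "c": c,
        "Pr[W=m]": pi_m * F_m / c,
        "Pr[W<m]": pi_d * F_m / c,
        "Pr[S=1]": (formal_at_m + formal_above) / c,
        "Pr[S=1|W=m]": formal_at_m / (pi_m * F_m),
    }


# Rounded 2004 estimates from Table 3.3 (independence specification).
moments = implied_moments(F_m=0.35, pi_d=0.18, pi_m=0.23, pi_u=0.59,
                          pi_m_formal=0.18, latent_formal_share=0.80)
for name, value in moments.items():
    print(f"{name}: {value:.3f}")
```

With these rounded inputs the sketch gives roughly 0.10 for Pr[W = m], 0.08 for Pr[W < m], 0.72 for Pr[S = 1] and 0.63 for Pr[S = 1 | W = m], close to the 2004 predicted values reported in Table 3.7; small gaps reflect the two-decimal rounding of the inputs. The implied value of $1/c$ is about 1.26, in line with the factor of approximately 1.25 mentioned in footnote 23.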
It fits the probabilities of observing wages at and below the minimum wage level and explains the differences observed between the formal and informal sector distributions. Interestingly, the model can match higher moments of the wage distribution, such as skewness and kurtosis. This need not be the case in general, especially if the parametric family for the wage distribution is misspecified.

Figure 3.10 shows the observed and predicted conditional probabilities of formality given the wage. Figures 3.7, 3.8 and 3.9 display the predicted and observed densities of the aggregate wage distribution and the formal and informal sector distributions. Similarly, Figures 3.11, 3.12 and 3.13 graphically display the fit of the model for the cumulative distribution functions of the unconditional and conditional wage distributions, all using the general form of the model. By examining these figures, we again see that the model matches most of the prominent features of the data, except the “heaping” observed at round numbers. It is nevertheless interesting to note the resemblance between the predicted and observed curves in the empirical cumulative distribution for the formal and informal sectors, particularly at and below the minimum wage level.

Figure 3.7: Model Fit, Aggregate (Unconditional) Wage Distribution
Note: Density estimates based on a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.8: Model Fit, Formal Sector
Note: Density estimates based on a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.9: Model Fit, Informal Sector
Note: Density estimates based on a Gaussian kernel (bandwidth = R$30).
Year 2004.

Table 3.7: Model Fit

Each cell reports Observed / Predicted.

Parameter          2001              2002              2003              2004              2005              2006              2007              2008              2009
Under independence between sector and wages
E[W]               502.03 / 520.27   531.82 / 550.67   574.89 / 594.00   611.17 / 628.34   659.54 / 679.50   705.07 / 719.42   754.67 / 773.14   815.75 / 835.69   869.99 / 887.85
Sd[W]              516.65 / 467.29   543.06 / 492.57   551.01 / 505.53   567.00 / 517.37   580.69 / 536.01   590.97 / 536.76   605.91 / 555.54   630.09 / 583.56   634.83 / 587.97
Skewness[W]        3.59 / 3.21       3.46 / 3.25       3.29 / 3.12       3.21 / 3.00       3.07 / 3.00       2.97 / 2.92       2.81 / 2.78       2.64 / 2.65       2.57 / 2.63
Kurtosis[W]        19.82 / 17.51     18.23 / 17.49     16.89 / 16.21     16.15 / 15.22     14.96 / 14.75     14.16 / 14.22     12.89 / 13.13     11.68 / 11.96     11.30 / 11.90
Pr[W = m]          0.09 / 0.08       0.12 / 0.10       0.13 / 0.11       0.12 / 0.10       0.16 / 0.14       0.16 / 0.14       0.14 / 0.12       0.15 / 0.13       0.15 / 0.13
Pr[W < m]          0.06 / 0.06       0.07 / 0.06       0.08 / 0.08       0.08 / 0.08       0.08 / 0.07       0.09 / 0.08       0.08 / 0.08       0.09 / 0.08       0.08 / 0.08
q20[W]             200.00 / 244.38   218.00 / 267.23   240.00 / 301.61   260.00 / 321.69   300.00 / 374.15   350.00 / 422.09   380.00 / 452.21   415.00 / 492.37   465.00 / 554.14
E[W|S = 1]         560.76 / 543.09   595.14 / 580.68   641.40 / 620.25   683.67 / 657.67   729.06 / 701.47   776.67 / 734.96   823.05 / 788.92   890.53 / 847.00   942.11 / 888.37
E[W|S = 0]         362.96 / 417.61   386.62 / 409.96   413.02 / 440.01   433.39 / 463.42   480.05 / 470.23   518.65 / 499.67   560.71 / 548.77   603.82 / 589.94   638.17 / 623.98
Sd[W|S = 1]        546.56 / 473.62   572.75 / 503.43   580.69 / 513.23   595.35 / 524.03   601.61 / 546.17   608.70 / 543.60   623.10 / 558.98   645.77 / 587.09   648.77 / 590.30
Sd[W|S = 0]        405.13 / 446.34   434.38 / 461.56   429.99 / 476.44   442.91 / 487.68   478.16 / 503.43   495.77 / 508.82   506.31 / 531.30   528.75 / 560.20   524.49 / 570.52
q20[W|S = 1]       250.00 / 264.06   264.00 / 287.63   300.00 / 331.73   320.00 / 361.85   351.00 / 404.42   400.00 / 462.25   412.00 / 492.37   455.00 / 542.57   500.00 / 595.02
q20[W|S = 0]       180.00 / 175.50   180.00 / 185.62   200.00 / 191.16   200.00 / 201.20   200.00 / 243.00   250.00 / 251.41   280.00 / 261.45   300.00 / 291.57   300.00 / 319.06
Pr[S = 1]          0.70 / 0.71       0.70 / 0.70       0.71 / 0.71       0.71 / 0.72       0.72 / 0.73       0.72 / 0.73       0.74 / 0.75       0.74 / 0.75       0.76 / 0.77
Pr[S = 1|W < m]    0.08 / 0.00       0.08 / 0.00       0.09 / 0.00       0.04 / 0.00       0.03 / 0.00       0.04 / 0.00       0.04 / 0.00       0.05 / 0.00       0.04 / 0.00
Pr[S = 1|W = m]    0.58 / 0.58       0.53 / 0.51       0.63 / 0.62       0.63 / 0.62       0.62 / 0.60       0.68 / 0.67       0.72 / 0.70       0.73 / 0.73       0.77 / 0.76
Pr[S = 1|W > m]    0.76 / 0.77       0.78 / 0.78       0.79 / 0.79       0.79 / 0.80       0.81 / 0.81       0.81 / 0.82       0.82 / 0.83       0.82 / 0.83       0.84 / 0.84
General Model
E[W]               502.03 / 520.12   531.82 / 550.86   574.89 / 594.30   611.17 / 628.66   659.54 / 680.72   705.07 / 719.61   754.67 / 774.09   815.75 / 836.67   869.99 / 889.00
Sd[W]              516.65 / 464.06   543.06 / 487.55   551.01 / 502.67   567.00 / 514.75   580.69 / 531.65   590.97 / 535.81   605.91 / 554.02   630.09 / 582.30   634.83 / 586.75
Skewness[W]        3.59 / 3.19       3.46 / 3.23       3.29 / 3.09       3.21 / 2.98       3.07 / 2.97       2.97 / 2.91       2.81 / 2.76       2.64 / 2.64       2.57 / 2.61
Kurtosis[W]        19.82 / 17.41     18.23 / 17.35     16.89 / 16.05     16.15 / 15.07     14.96 / 14.54     14.16 / 14.16     12.89 / 13.01     11.68 / 11.88     11.30 / 11.78
Pr[W = m]          0.09 / 0.08       0.12 / 0.11       0.13 / 0.11       0.12 / 0.10       0.16 / 0.14       0.16 / 0.14       0.14 / 0.12       0.15 / 0.13       0.15 / 0.13
Pr[W < m]          0.06 / 0.06       0.07 / 0.07       0.08 / 0.08       0.08 / 0.08       0.08 / 0.08       0.09 / 0.08       0.08 / 0.08       0.09 / 0.08       0.08 / 0.08
q20[W]             200.00 / 244.38   218.00 / 267.23   240.00 / 301.61   260.00 / 321.69   300.00 / 374.15   350.00 / 422.09   380.00 / 452.21   415.00 / 492.37   465.00 / 554.14
E[W|S = 1]         560.76 / 567.30   595.14 / 601.66   641.40 / 643.80   683.67 / 682.01   729.06 / 720.95   776.67 / 749.83   823.05 / 802.98   890.53 / 860.68   942.11 / 904.45
E[W|S = 0]         362.96 / 357.95   386.62 / 360.72   413.02 / 384.52   433.39 / 404.88   480.05 / 425.45   518.65 / 461.25   560.71 / 510.32   603.82 / 549.57   638.17 / 573.80
Sd[W|S = 1]        546.56 / 498.12   572.75 / 520.83   580.69 / 534.73   595.35 / 545.62   601.61 / 556.42   608.70 / 559.09   623.10 / 571.16   645.77 / 597.99   648.77 / 602.24
Sd[W|S = 0]        405.13 / 335.61   434.38 / 375.07   429.99 / 373.40   442.91 / 380.81   478.16 / 438.38   495.77 / 434.73   506.31 / 463.43   528.75 / 498.04   524.49 / 490.97
q20[W|S = 1]       250.00 / 264.06   264.00 / 297.83   300.00 / 341.77   320.00 / 361.85   351.00 / 414.51   400.00 / 462.25   412.00 / 502.41   455.00 / 552.61   500.00 / 605.24
q20[W|S = 0]       180.00 / 175.50   180.00 / 185.62   200.00 / 191.16   200.00 / 201.20   200.00 / 243.00   250.00 / 251.41   280.00 / 261.45   300.00 / 291.57   300.00 / 308.84
Pr[S = 1]          0.70 / 0.71       0.70 / 0.70       0.71 / 0.71       0.71 / 0.71       0.72 / 0.72       0.72 / 0.73       0.74 / 0.75       0.74 / 0.75       0.76 / 0.77
Pr[S = 1|W < m]    0.08 / 0.00       0.08 / 0.00       0.09 / 0.00       0.04 / 0.00       0.03 / 0.00       0.04 / 0.00       0.04 / 0.00       0.05 / 0.00       0.04 / 0.00
Pr[S = 1|W = m]    0.58 / 0.57       0.53 / 0.48       0.63 / 0.60       0.63 / 0.60       0.62 / 0.56       0.68 / 0.67       0.72 / 0.70       0.73 / 0.72       0.77 / 0.75
Pr[S = 1|W > m]    0.76 / 0.77       0.78 / 0.78       0.79 / 0.79       0.79 / 0.80       0.81 / 0.82       0.81 / 0.82       0.82 / 0.83       0.82 / 0.83       0.84 / 0.85

Note: Numerical integration performed using the quadrature technique.

Figure 3.10: Conditional Probability of Sector Given the Wage
Note: Observed conditional probabilities based on a non-parametric local constant (Nadaraya-Watson) estimator using a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.11: Observed and Predicted CDFs, Aggregate (Unconditional) Wage Distribution
Note: Year 2004.

Figure 3.12: Observed and Predicted CDFs, Formal Sector
Note: Year 2004.

Figure 3.13: Observed and Predicted CDFs, Informal Sector
Note: Year 2004.

Figure 3.14: Empirical CDFs Above the Minimum Wage by Sector

3.3.2.5 Decomposing the Differences in the Wage Distributions Across Sectors: The Role of the Minimum Wage

Figure 3.14 displays the empirical CDFs for the formal and informal sectors for wages above the minimum wage. These estimates are, by construction, invariant to the minimum wage effects on the formal and informal sectors for values below the minimum wage. Note that we observe a substantially smaller difference in the CDFs across sectors in Figure 3.14 than in Figure 3.1. This exercise suggests that the differences across sectors observed in the upper part of the wage distribution are also a consequence of the effects of the minimum wage in the bottom part of the wage distribution.

The estimates of the model parameters allow us to understand the differences between the wage distributions in the formal and informal sectors.
Let

\[
\Delta_1 \equiv f(w \mid S(1)=1) - f(w \mid S(1)=0),
\]

that is, $\Delta_1$ is defined as the observed difference in the density of wages between the formal and informal sectors. Let:

\[
\Delta_m \equiv \bigl[f(w \mid S(1)=1) - f_0(w \mid S(0)=1)\bigr] - \bigl[f(w \mid S(1)=0) - f_0(w \mid S(0)=0)\bigr],
\]

that is, $\Delta_m$ is defined as the difference in the effects of the minimum wage between the formal and informal sectors. Define:

\[
\Delta_0 \equiv f_0(w \mid S(0)=1) - f_0(w \mid S(0)=0),
\]

that is, $\Delta_0$ is defined as the difference in the latent wage densities between the formal and informal sectors. From the definitions, we have $\Delta_1 = \Delta_m + \Delta_0$. Given the estimates of the model parameters, it is possible to compute $\Delta_1$, $\Delta_m$ and $\Delta_0$. By comparing these estimates, we can infer the extent to which the differences in the wage densities between the formal and informal sectors are due to the minimum wage versus differences that would be present regardless of the minimum wage policy. This decomposition can be performed at every point of the wage distribution.

Figure 3.15: Decomposition of the Differences of the Densities Across Sectors, Formal minus Informal
Note: Density estimates based on a Gaussian kernel (bandwidth = R$30). Year 2004.

Figure 3.15 displays the differences in the density of wages between the formal and informal sectors. Given that the observed wage distribution in the formal sector stochastically dominates that in the informal sector, we observe a negative difference between the formal and informal wage densities for low wages and a positive difference for high wages. We observe a similar pattern for the latent density as well. Figure 3.15 also displays the differences between the effects of the minimum wage at each point of the wage distribution. The estimates of the differences in the minimum wage effect tend to closely follow the differences in the observed wage density across sectors.
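For completeness, the identity $\Delta_1 = \Delta_m + \Delta_0$ used in this decomposition can be checked by adding the two definitions and cancelling the latent densities. The share notation $\Delta_m(w)/\Delta_1(w)$ in the last line is an assumption introduced here for illustration; it is one natural way to express the fraction of the observed gap attributed to the policy at a given wage, and is defined only where $\Delta_1(w) \neq 0$.

```latex
\begin{align*}
\Delta_m + \Delta_0
  &= \bigl[f(w \mid S(1)=1) - f_0(w \mid S(0)=1)\bigr]
   - \bigl[f(w \mid S(1)=0) - f_0(w \mid S(0)=0)\bigr] \\
  &\quad + f_0(w \mid S(0)=1) - f_0(w \mid S(0)=0) \\
  &= f(w \mid S(1)=1) - f(w \mid S(1)=0) \\
  &= \Delta_1, \\[4pt]
\text{share of the gap at } w \text{ due to the minimum wage}
  &= \frac{\Delta_m(w)}{\Delta_1(w)}, \qquad \Delta_1(w) \neq 0 .
\end{align*}
```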
If we decompose the differences in the observed wage distributions between the formal and informal sectors into differences in latent wage distributions and differences in minimum wage effects, my estimates suggest a larger role for the latter. For example, at the 5th percentile of the wage distribution, the minimum wage policy accounts for 82% of the differences in the wage density across sectors.

A decomposition of the difference in quantiles across sectors can be performed in a similar way. The minimum wage accounts for 93% of the difference in the 10th percentile of the wage distribution across sectors. My estimates imply that 72% of the difference in the median of the wage distribution between the formal and the informal sector is due to the minimum wage. At higher percentiles, such as the 90th percentile, the minimum wage still accounts for 68% of the difference observed between the formal and informal sectors. Note that the minimum wage accounts for a substantial part of the differences in the wage distribution across sectors for both low and high wages, although the model does not allow wages above the minimum wage to be affected by the policy.

This exercise highlights that the minimum wage policy amplifies differences in the wage distributions across the formal and informal sectors, and that these differences reach quantiles considerably far from the minimum wage level even under the assumption of no spillovers from the policy. The reason for the substantial role of the minimum wage in explaining the differences in the formal and informal sector wage densities above the minimum wage is that the wage densities are rescaled in opposite directions across sectors because of the inflow/outflow of workers induced by the minimum wage policy.

Figure 3.16: Time Trends in Formality by Wage Group – Threshold in Real Terms

3.3.3 Interpreting the Evolution of Formality over Time

The model parameter estimates suggest that the minimum wage had a significant impact on reducing the size of the formal sector.
However, the size of the formal sector increased during the period from 2001 to 2009, a period characterized by a sizable increase in the minimum wage. This empirical regularity seems to contradict the implications of the model.

The changes over time in the level of formality are a combination of the causal effect of the minimum wage on the size of the formal sector and changes in the underlying latent size of the formal sector. Thus, it is important to consider (i) the fact that minimum wage changes in Brazil are highly correlated with positive shocks to the economy, and (ii) the pro-cyclical nature of job-finding rates in the formal sector (Bosch et al., 2007). If the economic shocks that partially determine the changes in the minimum wage level are positive and lead to an increase in the size of the formal sector, it is conceivable that minimum wage changes may follow an increase in the size of the formal sector, even though the causal effect of the minimum wage goes in the opposite direction.

A useful feature of this problem is that we can gather indirect evidence of these effects. To begin, let \(\Delta \Pr[S = 1] \equiv \Pr[S = 1 \mid T = 1] - \Pr[S = 1 \mid T = 0]\) be the change in the size of the formal sector between two years.²⁴ The change in the size of the formal sector can be decomposed as:
\[
\begin{aligned}
\Delta \Pr[S = 1] ={}& \Delta \Pr[S = 1 \mid W < k]\,\Pr[W < k \mid T = 1] + \Pr[S = 1 \mid W < k, T = 0]\,\Delta \Pr[W < k] \\
&+ \Delta \Pr[S = 1 \mid W > k]\,\Pr[W > k \mid T = 1] + \Pr[S = 1 \mid W > k, T = 0]\,\Delta \Pr[W > k],
\end{aligned}
\]
where \(k > m\) is a particular wage level. This expression follows from the law of iterated expectations and from adding and subtracting the appropriate terms. The interesting feature of this equation is that it allows us to consider the determinants of changes in the size of the formal sector and how the minimum wage is related to them.
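A small numerical sketch may help to see how the four terms of this decomposition add up. The function name, argument names and all numbers below are hypothetical and chosen only for illustration; they are not estimates from the data. The sketch also assumes that no probability mass sits exactly at W = k, so that the high-wage share is one minus the low-wage share.

```python
def formality_change_decomposition(s_low, s_high, share_low):
    """Split the change in Pr[S=1] between T=0 and T=1 into four terms.

    s_low[t]     : Pr[S=1 | W<k, T=t], formal share among low-wage workers
    s_high[t]    : Pr[S=1 | W>k, T=t], formal share among high-wage workers
    share_low[t] : Pr[W<k | T=t]; Pr[W>k | T=t] is taken as 1 - share_low[t]
                   (assumption: no mass exactly at W = k)
    """
    share_high = [1.0 - q for q in share_low]
    terms = {
        # Change in formality within the low-wage group.
        "formality_low": (s_low[1] - s_low[0]) * share_low[1],
        # Change in the size of the low-wage group.
        "composition_low": s_low[0] * (share_low[1] - share_low[0]),
        # Change in formality within the high-wage group.
        "formality_high": (s_high[1] - s_high[0]) * share_high[1],
        # Change in the size of the high-wage group.
        "composition_high": s_high[0] * (share_high[1] - share_high[0]),
    }

    def formal_share(t):
        # Law of iterated expectations over the two wage groups.
        return s_low[t] * share_low[t] + s_high[t] * share_high[t]

    return terms, formal_share(1) - formal_share(0)


# Hypothetical numbers, not estimates from the thesis.
terms, total = formality_change_decomposition(
    s_low=[0.40, 0.30], s_high=[0.80, 0.85], share_low=[0.50, 0.35]
)
assert abs(sum(terms.values()) - total) < 1e-12
```

In this made-up example formality falls within the low-wage group, yet the aggregate formal share rises because the high-wage terms are larger, which is the pattern the decomposition is meant to make visible.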
The total variation in the size of the formal sector is composed of the changes in the size of the formal sector in the bottom part of the wage distribution (\(\Delta \Pr[S = 1 \mid W < k]\)), the changes in the size of the formal sector in the upper part of the wage distribution (\(\Delta \Pr[S = 1 \mid W > k]\)), and the changes in the probabilities of high and low wages (\(\Delta \Pr[W > k]\) and \(\Delta \Pr[W < k]\)). According to the dual-economy model, the minimum wage will not affect the size of the formal sector in the upper part of the wage distribution. It will, however, reduce the size of the formal sector in the bottom part (\(\Delta \Pr[S = 1 \mid W < k]\)) and reduce the size of the bottom part itself (\(\Delta \Pr[W < k]\)).

It is interesting to consider how positive shocks would affect the joint distribution of sector and wage. It is natural to assume that positive shocks that are correlated with the changes in the minimum wage would also increase the latent size of the formal sector. If these shocks are also correlated with wage growth, then they will also affect the relative size of each group (\(\Delta \Pr[W > k]\) and \(\Delta \Pr[W < k]\)).

²⁴ To simplify notation, I will denote the observed wage by W and the observed sector by S, dropping the potential outcome indexing. Indexing is not relevant to this discussion because I will only be addressing the observed sector and wage. In the following discussion, I will assume that in the second period (T = 1), the minimum wage level is higher than in the first period (T = 0).

Figure 3.17: Time Trends in Formality by Wage Group – Robustness of Threshold Specification

In considering this decomposition, it is easy to see why an increase in the size of the formal sector can occur after an increase in the minimum wage. For that to be the case, it is necessary that shocks affecting the latent size of the formal sector and the wage distribution be sufficiently large to compensate for the causal effect of the minimum wage change. However, because we observe data on wages and sectors, we can attempt to disentangle these different channels.
This is the case because the minimum wage effect should only operate through changes in the bottom part of the wage distribution, whereas the correlated shocks should operate through both the wage distribution and the conditional probability of formality given (high and low) wages.

Figure 3.16 shows the evolution over time of the share of workers in the formal sector divided into three groups: high wages, low wages and aggregate (all workers). I define low-wage workers in this exercise using a threshold of R$470/month measured in real terms with respect to 2009.²⁵ The upward trend in the minimum wage over the period from 2001 to 2009 was followed by a decrease in the share of workers in the formal sector in the bottom part of the wage distribution. This is where the minimum wage effect on the size of the formal sector should operate. However, consistent with the hypothesis that the changes in the minimum wage are correlated with positive shocks to the economy, the share of workers in the formal sector in the upper part of the wage distribution increased during the same period. This effect, combined with the increase in the proportion of workers who belong to the latter group (which is also consistent with the hypothesis of positive correlated shocks), is sufficiently large to induce the changes in the minimum wage level to be statistically correlated with increases in the size of the formal sector.²⁶

²⁵ The choice of k is not relevant to the qualitative conclusions of this exercise provided that k > m. To demonstrate this, in the next graph, I will use a different threshold.

Figure 3.17 shows the same graph using a nominal wage threshold (R$600), that is, a fixed value in terms of Brazilian R$ for all years. This fixed nominal threshold introduces inflation as one of the correlated effects. This is the case because, as time passes, the entire wage distribution shifts to the right, leaving a progressively smaller group of workers with wages below a particular value.
This effect is easy to capture under the decomposition above. Inflation should operate through the \(\Delta \Pr[W > k]\) channel. Figure 3.17 shows that, as expected, the share of workers in the formal sector with low wages decreased over the period when the minimum wage was increasing. The share of workers in the formal sector from the high-wage group increased, and the proportion of workers who belong to the high-wage group also increased (through a combination of the minimum wage, inflation and correlated shock effects).

This analysis shows that although, in the aggregate, an increase in the size of the formal sector is observed during the period, the data provide sufficient evidence to indicate that this increase was due to changes in the latent size of the formal sector following positive economic shocks that were correlated with, but not causally related to, the minimum wage changes. The indirect evidence of this effect is the increase in the size of the formal sector in the upper part of the wage distribution (and the increase in the proportion of high-wage workers). The (causal) minimum wage effect of reducing the size of the formal sector can be seen in the decrease in the size of the formal sector among the set of low-wage workers, which is the set of workers on whom the minimum wage should have an effect.²⁷

3.4 Conclusion

This chapter evaluates the effects of the minimum wage in the Brazilian economy using a dual economy model under a parametric assumption regarding the shape of the latent wage distribution. The parametric assumption allows me to estimate the model using maximum likelihood and to eliminate the assumption of independence between latent sectors and wages used in Chapter 2.

The results obtained using this strategy reinforce the conclusions of the second chapter of this thesis. It appears that, for the Brazilian economy, the independence assumption provides a good approximation of the data-generating process.
This is indicated by the small magnitude of the slope coefficient of the conditional probability of (latent) formality with respect to the wage. The empirical exercise shows that the minimum wage policy generates sizable unemployment effects and increases the size of the informal sector.

²⁶ Note that this analysis shows that one can only claim that changes in the minimum wage caused an increase in the size of the formal sector in the period if one is willing to assume that the minimum wage increased the size of the formal sector in the upper part of the wage distribution.
²⁷ As mentioned in Chapter 2, changes in the minimum wage in Brazil are partially based on inflation and GDP growth. This feature complicates interpreting changes in outcomes after changes in the minimum wage level as causal. The fact that the latent size of the formal sector was growing during the period featuring increases in the minimum wage level is indirect evidence of the endogeneity of changes in the minimum wage.

A decomposition exercise based on the estimates of the model parameters suggests that unemployment effects and the inflow of low-wage workers to the informal sector are responsible for most of the observed differences in the wage distribution across sectors, even at points far from the minimum wage. Indirect evidence of such a sector mobility channel is in line with the model's prediction, that is, the evolution of the size of the formal sector across low- and high-wage groups replicates the patterns predicted by the model. Another interesting feature of the parametric model is that, using a small number of parameters, it can approximate most of the stylized facts concerning the joint distribution of sectors and wages, including higher moments such as skewness and kurtosis.
In future work, it would be interesting to evaluate the heterogeneity of minimum wage effects across observable worker and firm characteristics and the contribution of the minimum wage to changes in between- and within-group wage inequality.

Chapter 4
Estimation of Average Treatment Effects in Partially Randomized Designs

4.1 Introduction

Empirical evidence in economics is increasingly based on clean research designs, which allow researchers to obtain valid estimates of causal effects of interventions (treatment) on outcome variables of interest. The two most prominent clean research designs in economics are randomized control trials (RCT) and regression discontinuity designs (RDD). In both cases individuals are randomly assigned to treated and control groups either completely (RCT) or around a cutoff point of a continuous variable (RDD). One of the main benefits of using randomization is that, in principle, it ensures that the two groups are similar both in terms of observable and unobservable characteristics, allowing credible causal inference.

Currently, a large fraction of RCTs in economics are small-scale interventions that resemble the clinical trials that are used to test the efficacy of newly developed drugs.¹ An important aspect of small-scale interventions is that they allow researchers to become involved with the whole randomization process. Such involvement implies that they have to make choices on, for example, which exact randomization design will be used.

A very popular randomization design is stratified randomization.² Stratified randomization or blocking, originally proposed by Fisher (1935), is the standard way to avoid imbalance on pre-treatment variables and to increase estimation precision. In a randomized research design using stratification, units are randomly assigned to treatment and control groups within each stratum.
If units are stratified or grouped based on values of pre-treatment variables that are correlated with the outcome of interest, then there will be a precision improvement in the estimation of treatment effects over the completely randomized design. In fact, as shown by Imbens, King, McKenzie and Ridder (2009) and Imai, King and Nall (2009), if each stratum is formed by a single pair of treated and control units, a design called pairwise randomization, maximal precision is obtained.

¹ This is a relatively recent trend in the profession, as in the past most randomized experiments used in the social sciences were large-scale ones typically run by governmental agencies. See Levitt and List (2009) for a historical overview of field experiments in economics.
² To give an idea of how widespread the use of stratified randomization is in development economics nowadays, only five out of the eighteen recent studies summarized in Table 1 of Bruhn and McKenzie (2009) do not use some kind of stratification method.

In many instances, empirical researchers do not choose to work with stratified RCTs to reduce bias or increase precision. In fact, in many situations it may be simply infeasible or undesirable to design the study in a way that each stratum is formed by single pairs of treated and control units, or even to design the experiment guaranteeing equal treatment proportions across strata.³ Consider, for example, the case in which each stratum is a different venue for the same program,⁴ or a different program itself, as in the case of meta-analyses that treat each study as a given stratum.

In RDD a similar situation occurs when there are several cutoffs in the running variable, the continuous variable that is used to determine treatment status. For each one of these cutoff points, treated and control groups will lie on different sides of those points (sharp design) and researchers use information coming from all those 'experiments'.
The proportion of treated units within a window at each one of the cutoff points will most likely differ across cutoff points.

Hence it is not unlikely that empirical researchers will have to deal with random designs exhibiting unequal treatment probabilities across strata/cutoff points. However, they will not find much guidance in the current literature. In fact, there are just a few articles in the econometric and applied literature that discuss the different implications for inference arising from different randomization procedures. Some examples are Bruhn and McKenzie (2009) and Duflo, Glennerster, and Kremer (2008). The statistical and medical literatures are richer in this sense, and examples of articles on the topic are Altman (1985), Permutt (1990), Therneau (1993), Kernan, Viscoli, Makuch, Brass, and Horwitz (1999), Aickin (2001), Hansen and Bowers (2008), Imbens, King, McKenzie and Ridder (2009), and Imai, King and Nall (2009).

Most of the existing literature, recently surveyed by Imbens (2011), is concerned with the precision of average treatment effect estimates and with testing the null hypothesis of zero effects, taking for granted that the randomization was properly performed. However, little attention has been devoted to the discussion of (i) how to test for systematic differences between treatment and control groups in the distribution of baseline covariates in stratified randomization and (ii) what to do if differences are detected. An exception is Hansen and Bowers (2008), who explicitly deal with testing imbalance in a stratified design with differential treatment probabilities but do not provide guidance on how to proceed if the null hypothesis of perfect randomization is rejected.

³ For example, there might exist institutional restrictions preventing the proportion of treated units from being the same for all strata, or there might be differential attrition that leads treated units to leave the experiment more often for some strata than others.
The researcher may also purposely use different proportions of treated units across strata. That is the case when we want to overrepresent some subpopulation (characterized by its stratum) among the treated units.
⁴ Examples of this sort abound in randomized studies on education, in which randomization is typically performed either within school classes or within schools. See, for example, Krueger and Whitmore (2001). In the empirical section of this chapter we analyze a job training program where randomization of trainees and controls occurred within each class.

Testing for balance in pretreatment covariates is not unanimously endorsed in the evaluation literature. In fact, there are some studies that advocate against testing for equality of the distribution of covariates (e.g. Senn, 1994). The key reason is that, if randomization was properly implemented, any difference in distributions would be spurious and simply reflect sampling variation. However, the random assignment may have been corrupted and, if that is the case, selection bias may be a consequence. Hence it seems important to obtain an assessment, constructed from available data, of how clean the randomization protocol was.

A first methodological question we address in this study is how to test the null hypothesis that randomization was properly performed across all strata in ex-ante stratified randomized designs with differential treatment probabilities across strata. The current dominant testing procedure in the literature, as reported by Bruhn and McKenzie (2009), is to test for imbalances at each stratum independently. It is well known that ignoring the fact that one has multiple tests is problematic, simply because the probability of rejecting at least one true null hypothesis increases with the number of univariate tests and can become very close to one (Romano, Shaikh and Wolf, 2010).
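To give a sense of how quickly this probability grows, the sketch below evaluates it for independent tests that each have size 0.05. Independence across tests is an assumption of the sketch, and the function name is hypothetical.

```python
def prob_at_least_one_false_rejection(num_tests, size=0.05):
    """Chance of rejecting at least one true null among independent tests,
    each performed with the given size."""
    return 1.0 - (1.0 - size) ** num_tests


# Family-wise rejection probability for a few hypothetical numbers of strata.
rates = {L: prob_at_least_one_false_rejection(L) for L in (1, 10, 50, 100)}
for L, rate in rates.items():
    print(f"L = {L:3d}: {rate:.3f}")
```

With 100 strata the probability is about 0.994, so at least one spurious rejection is close to certain even when every stratum is balanced.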
But even if a correction for performing separate tests is used, other issues arising from the small sample size typically found in each stratum may be important. First, test statistics that are optimal asymptotically are expected to behave poorly and, second, alternative exact tests, such as Fisher's Exact Test, will have low power.

There are some alternative testing procedures that overcome the small sample size problem by using all strata together, as discussed, for example, in Hansen and Bowers (2008). Those procedures apply to covariates the same reasoning used to test the null hypothesis of zero treatment effects. However, these tests will have low power, as they are designed to verify whether weighted sums of covariances between treatment and covariates are zero. Belonging to that class is what we may call the 'naive' test, corresponding to a test of the overall difference in means between the treatment and the control groups. One can construct an unbiased test statistic for the hypothesis that a weighted sum of covariances is zero using a regression approach. For that purpose, one should use a t-statistic (or an F-statistic for a vector of covariates) to test whether the coefficient of the treatment assignment dummy is zero in a regression of the covariate on the assignment dummy and strata dummies. That procedure is advocated, among others, by Duflo et al. (2008).

However, testing that a weighted sum of imbalance measures is zero does not say much about the null hypothesis that randomization was well performed in each and every stratum. As a simple example, consider two strata with the same treatment probabilities and the same population size. In the first stratum, there is a positive imbalance in a given baseline covariate, and in the second one a negative imbalance equal in absolute value to the first.
They would cancel out in a simple sum, and no test of the null hypothesis that this sum is zero would have power to detect the imbalance occurring at each stratum.

Our first contribution is to show that a regression-based test statistic can be constructed to test the null hypothesis that randomization was properly performed across all strata in ex-ante stratified randomized designs with differential treatment probabilities across strata. Our test is a simple generalization of the Chow test (Chow, 1960), which is a test for equality of the coefficients from two linear regressions on different data sets. Our generalization is to consider not only two data sets but L, the number of strata. The test statistic will have a known limiting distribution, as it is an F-statistic constructed using pooled data from all strata. Finally, that testing procedure will be consistent and will have the correct nominal size.

A nice feature of using regression-based methods is that they provide another way to understand the problem of simply comparing differences in means. The simple difference in means of a covariate is equivalent to regressing that covariate on the treatment dummy. Therefore any bias that arises from that procedure is an omitted variable bias. Only after we control for the level at which the randomization took place, that is, the stratum, can we ensure independence between the treatment and the error term (Duflo et al., 2008). It thus becomes clear that if treatment status is correlated with stratum and if stratum is correlated with the covariate, we should add strata dummies to the regression as 'regressors'.
What was not clear in the literature is that, although that method guarantees independence between treatment and the error term, it may not produce a powerful test for overall imbalance of pre-treatment covariates, where by overall we mean that all strata are considered simultaneously.

We acknowledge that the ultimate goal of an evaluation study is certainly not testing for imbalance in covariates but estimating the Average Treatment Effect (ATE) or a modified version of that parameter, such as the Average Treatment Effect on the Treated (ATT). However, if there is imbalance in covariates, then there is much reason to believe that ATE estimates will be biased. Therefore, our second contribution is to develop an integrated method that uses the realized values of the test statistics constructed to detect imbalances at each stratum as constraints in the maximization problem of an empirical likelihood (EL) estimator for ATE. Importantly, the sum of those test statistics over all strata is the F-statistic used to test the null hypothesis of no imbalance in covariates at each and every stratum. Thus, if one rejects the null hypothesis, there will be at least one stratum for which the constraint is binding.

There are some nice aspects of using an empirical likelihood estimator in our case. First, we show that if none of the constraints are binding, that is, if there is no sizeable imbalance in covariates in any stratum, the estimator for ATE is a weighted sum of simple differences in means, using strata sample proportions as weights. Second, if the restriction is binding, we find optimal weights for each stratum that are inversely proportional to the value of the stratum-level test statistics used to detect differences in means of baseline covariates at each stratum.
Finally, the asymptotic distribution of our estimators is obtained straightforwardly from known results on the asymptotic distribution of EL estimators.

The integration of testing for imbalance in covariates and estimation of ATE may be seen as a useful contribution of our method to empirical work relying on randomization designs. According to Bruhn and McKenzie (2009), when faced with strata that are imbalanced in covariates, researchers opt for one of two strategies: they either drop the "problematic" strata or they re-randomize units in each stratum until balance is achieved in every stratum.

There are obvious problems with those two alternatives. First, strata deletion is implemented in the hope of decreasing bias. However, if there is heterogeneity of treatment effects across strata, such a procedure may in fact increase bias, as the resulting estimand will not be defined over some subpopulations. Thus, compounded with the loss in precision due to discarding observations, dropping strata has the potential to increase both bias and variance.

The second alternative for dealing with covariate imbalance, re-randomization, is not always feasible and also presents some problems. First, consider the feasibility issue. There are cases in which units need to be publicly informed about the re-randomization and to follow the process, and in many cases this cannot be accomplished. Another potential problem with re-randomization is that, as extensively discussed by Bruhn and McKenzie (2009), it directly affects statistical inference, an issue that is typically ignored in practice. Ignoring the fact that re-randomization was performed leads to hypothesis tests with incorrect size.

We therefore propose a testing method for covariate imbalance that is embedded in an estimation procedure for treatment effects. Statistical inference for our estimation method is also provided in this study, which is organized as follows.
We first present some notation and discuss how to optimally test for imbalance in covariates under a stratified randomized design. We then present our estimation method, which incorporates our testing procedure for covariate imbalance to estimate ATE and ATT. We finally evaluate our "two-step" method through Monte Carlo exercises and by applying it to the evaluation of a Brazilian job training program, whose design for a specific cohort of program applicants followed the stratified randomization protocol.

4.2 Tests for Imbalances in Pre-Treatment Variables under Stratified Randomization Designs

In this section we discuss two groups of existing tests used for detecting imbalances in pre-treatment variables. The first group of tests uses information from each stratum independently, whereas the second group is based on joint information coming from all strata. For both groups we investigate how appropriate those tests are for the null hypothesis that the randomization protocol was properly performed at each and every stratum. We then show how they can be translated into tests on vectors of coefficients in linear regressions and present the test statistic that we will use within this regression framework, allowing us to invoke well-established results on the optimal properties of that test statistic.

4.2.1 Basic Setup and Notation

Let X be a vector of K pretreatment covariates and T a binary variable that equals one if treated and zero if control. Let S be a vector that indexes L different strata. If the randomization protocol within each stratum \(s = 1, \ldots, L\) was properly performed, then for each and every s, \(\gamma_s = 0\),⁵ where
\[ \gamma_s = \mathrm{Cov}[X, T \mid S = s]. \tag{4.1} \]
We then write the relevant null hypothesis to be tested as
\[ H_0 : \gamma_s = 0 \quad \forall s, \; s = 1, \ldots, L. \tag{4.2} \]
Testing \(H_0\) is equivalent to testing \(E(X \mid T = 1, S = s) = E(X \mid T = 0, S = s)\) at each and every s, as \(E(X \mid T = 1, S = s) - E(X \mid T = 0, S = s) = \gamma_s / V[T \mid S = s]\).
Sometimes we will rewrite \(H_0\) using a vector representation
\[ \gamma = [\gamma_1, \gamma_2, \ldots, \gamma_L]^{\top} \tag{4.3} \]
so \(H_0 : \gamma = 0\) or, equivalently, \(H_0 : \gamma_1 = \gamma_2 = \ldots = \gamma_L = 0\).

4.2.2 Separate Tests for Each Stratum

We can do separate tests of \(\gamma_s = 0\) for each \(s = 1, \ldots, L\). However, note that this is not equivalent to testing whether \(H_0\) is true, as the latter is a joint hypothesis test and the former are separate individual hypothesis tests. Therefore, all typical difficulties arising from the difference between separate and joint tests show up here. We list some of those difficulties below.

A first difficulty is that the probability of rejecting at least one true null hypothesis out of L increases with L and can become very close to one even if \(H_0\) is true. Put another way, if one does not take into account that multiple tests are being performed, the consequence will be a large probability that some of the individual true hypotheses will get rejected by chance. Romano, Shaikh and Wolf (2010) provide a simple numerical example that can be adapted to our setting. Suppose that we want to test \(H_0\) when we have 100 strata and that \(H_0\) is true. If we perform 100 separate tests that each stratum is balanced (\(\gamma_s = 0\) for each \(s = 1, \ldots, L\)), each with size exactly equal to 0.05, one expects that five true hypotheses will be rejected. Further, if all test statistics are mutually independent, then the probability that at least one true null hypothesis will be rejected is given by \(1 - 0.95^{100} = 0.994\).

In principle, however, accounting for the multiplicity of individual tests can be achieved by controlling for the probability of one or more inappropriate rejections. Thus, although relevant, the problem raised by performing separate tests could be overcome.

A second difficulty is that within each stratum usual test statistics, such as t and F statistics, that are based on large sample properties will behave poorly. This happens because each stratum tends to be small and asymptotic approximations will be misleading.
Of course, this claim depends on stratum size and can be ignored if one has a large number of observations for each stratum. If, however, usual tests based on F or t statistics are invalid due to the small sample size, one can circumvent that problem by using permutation (exact) tests.

⁵ Obviously, randomization implies independence, not only mean independence. For simplicity and because most of the literature focuses on means, we center our analysis on expectations only.

Nonparametric permutation tests, or Fisher's tests, can be used to test for differences in distributions when the sample size is small. Those tests have a long tradition in the treatment effects literature, as they go back to the early analysis of randomized experiments by Fisher (1925) and Neyman (1923), and are an alternative procedure to check for imbalances between treated and control samples in settings where asymptotic approximations are known to be unreliable.⁶ For each s, the distribution of the test statistic under \(\gamma_s = 0\) is obtained by calculating all possible values of the test statistic under rearrangements of the labels (treated vs. non-treated) on the observed data. If labels are exchangeable under \(\gamma_s = 0\), then these permutation tests yield exact significance levels.

A third difficulty could arise if exact tests were used. The problem has the same cause as the bias of tests based on t and F statistics: strata are typically small. The consequence here is not bias in testing \(\gamma_s = 0\) at each s, but low power against alternatives of improper randomization within stratum s.
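The exact test described in this subsection can be sketched for a single stratum and a scalar covariate as follows. The function name and the data are hypothetical, and the absolute difference in means is used as the test statistic, which is one common choice rather than the only one.

```python
from itertools import combinations


def exact_permutation_pvalue(x, treated):
    """Two-sided exact p-value for a difference in means within one stratum.

    x       : covariate values of the units in the stratum
    treated : 0/1 treatment labels, same length as x
    Enumerates every relabeling that keeps the number of treated units fixed.
    """
    n, n_treated = len(x), sum(treated)
    if n_treated == 0 or n_treated == n:
        raise ValueError("need both treated and control units")

    def mean_difference(treated_idx):
        treated_sum = sum(x[i] for i in treated_idx)
        control_sum = sum(x) - treated_sum
        return treated_sum / n_treated - control_sum / (n - n_treated)

    observed = abs(mean_difference([i for i in range(n) if treated[i] == 1]))
    draws = [abs(mean_difference(idx)) for idx in combinations(range(n), n_treated)]
    # Small tolerance so that ties are not lost to floating-point rounding.
    return sum(d >= observed - 1e-12 for d in draws) / len(draws)


# Hypothetical stratum with six units and the most extreme possible imbalance.
p_value = exact_permutation_pvalue([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

In this six-unit stratum there are only 20 possible relabelings, so the two-sided p-value cannot fall below 2/20 = 0.1 even under the most extreme assignment, and a test of size 0.05 can never reject.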
Thus one may fail to reject the null that \(\gamma_s = 0\) too often when it is false, thereby contributing to an increase in the bias of the final ATE estimates.

4.2.3 Tests Based on Pooling All Strata

4.2.3.1 Tests for Simple Difference in Means

In order to check whether randomization was properly performed, researchers usually test whether the means of pretreatment covariates are the same for treated and control units. Nevertheless, overall mean differences in pre-treatment variables might exist if treatment was randomly assigned within strata and treatment probabilities differ across strata.

Test statistics constructed for testing \(E(X \mid T = 1) = E(X \mid T = 0)\) may take large values even if \(H_0\) is true. Simple algebra exploiting the covariance decomposition and the fact that T is binary reveals that pooling X across strata for both groups yields⁷
\[ E(X \mid T = 1) - E(X \mid T = 0) = \frac{E(\gamma_S)}{V[T]} + \frac{\mathrm{Cov}[E(X \mid S), E(T \mid S)]}{V[T]}. \tag{4.4} \]
Under \(H_0\) the first term of the sum in Equation (4.4) is zero, but not necessarily its second term. Two separate sufficient conditions for \(E(X \mid T = 1) = E(X \mid T = 0)\) under \(H_0\) are that for all \(s = 1, \ldots, L\): (i) \(p_s = p\), where \(p_s \equiv \Pr[T = 1 \mid S = s]\) and \(p \equiv \Pr[T = 1]\), that is, the treatment probability is the same in every stratum; and (ii) \(E(X \mid S = s) = E(X)\), that is, pretreatment covariates are (mean) independent of S. Although there are situations in which the covariate distribution does not vary across strata, condition (ii) is rarely observed in stratified randomization, as treated and control units are typically grouped in strata according to their similarity on X's.
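Equation (4.4) can be verified numerically for a population with a finite number of strata. The sketch below computes both sides from stratum-level quantities; the function and argument names are hypothetical, the numbers are made up, and X is taken to be a scalar.

```python
def pooled_mean_difference(phi, p, control_mean, within_diff):
    """Both sides of Equation (4.4) for a population with discrete strata.

    phi[s]          : Pr[S = s]
    p[s]            : Pr[T = 1 | S = s]
    control_mean[s] : E(X | T = 0, S = s)
    within_diff[s]  : E(X | T = 1, S = s) - E(X | T = 0, S = s)
    """
    strata = range(len(phi))
    p_bar = sum(phi[s] * p[s] for s in strata)   # Pr[T = 1]
    var_t = p_bar * (1.0 - p_bar)                # V[T] for binary T
    treated_mean = [control_mean[s] + within_diff[s] for s in strata]

    # Left-hand side: E(X | T = 1) - E(X | T = 0), obtained via Bayes' rule.
    lhs = (sum(phi[s] * p[s] * treated_mean[s] for s in strata) / p_bar
           - sum(phi[s] * (1 - p[s]) * control_mean[s] for s in strata) / (1 - p_bar))

    # Right-hand side: E(gamma_S) / V[T] + Cov[E(X|S), E(T|S)] / V[T].
    gamma = [within_diff[s] * p[s] * (1 - p[s]) for s in strata]
    mean_x = [control_mean[s] + p[s] * within_diff[s] for s in strata]  # E(X | S = s)
    e_gamma = sum(phi[s] * gamma[s] for s in strata)
    cov = (sum(phi[s] * mean_x[s] * p[s] for s in strata)
           - sum(phi[s] * mean_x[s] for s in strata) * p_bar)
    return lhs, e_gamma / var_t + cov / var_t


# Hypothetical population in which H0 holds (no within-stratum imbalance),
# but treatment probabilities and covariate means both differ across strata.
lhs, rhs = pooled_mean_difference(
    phi=[0.5, 0.5], p=[0.2, 0.8], control_mean=[0.0, 1.0], within_diff=[0.0, 0.0]
)
assert abs(lhs - rhs) < 1e-12
```

Here every stratum is perfectly balanced, yet the pooled difference in means equals 0.6, all of it coming from the covariance term.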
However, as discussed previously, condition (i) may not necessarily hold in several important situations, implying that $E(X \mid T=1) \neq E(X \mid T=0)$ even when $H_0$ is true.

[6] See also Alan Agresti's (1992) survey on exact inference.

[7] Note that because $T$ is binary, its conditional and unconditional variances are, respectively, $V[T \mid S] = \Pr[T=1 \mid S]\,(1 - \Pr[T=1 \mid S])$ and $V[T] = \Pr[T=1]\,(1 - \Pr[T=1])$.

4.2.3.2 Tests for Weighted Difference in Means

A naive test of imbalance using pooled data will be biased for $H_0$ if conditions (i) and (ii) are not satisfied. However, if condition (i) is not satisfied, that is, if treatment probabilities differ across strata, one can use a reweighted approach to test for imbalance using pooled data. For example, tests for $E(\omega(S,T)X \mid T=1) = E(\omega(S,T)X \mid T=0)$ may not be biased for $H_0$ even if $p_s \neq p$ for some $s$, provided that $\omega(S,T)$ is chosen properly. A sufficient rule for choosing an $\omega(S,T)$ that produces unbiased test statistics is[8]

\[ \frac{\omega(S,1)}{\omega(S,0)} = \left(\frac{1-p_S}{p_S}\right) \cdot \left(\frac{p}{1-p}\right). \] (4.5)

As examples of $\omega$'s satisfying the condition established by Equation (4.5), consider the following three weighting functions:

\[ \omega_{FE}(S,T) = T\,\frac{(1-p_S)\,p}{E(p_S(1-p_S))} + (1-T)\,\frac{p_S\,(1-p)}{E(p_S(1-p_S))} \] (4.6)

\[ \omega_{ATE}(S,T) = T\,\frac{p}{p_S} + (1-T)\left(\frac{1-p}{1-p_S}\right) \] (4.7)

\[ \omega_{ATT}(S,T) = T + (1-T)\left(\frac{1-p}{p}\right)\left(\frac{p_S}{1-p_S}\right). \] (4.8)

Note that even though tests for $E(\omega(S,T)X \mid T=1) = E(\omega(S,T)X \mid T=0)$ are not biased for $H_0$, they will have low power, as positive covariances may cancel with negative covariances in the weighted expectation $E\!\left(\frac{p_S\,\omega(S,1)}{p\,V[T \mid S]}\,\gamma_S\right)$ of $\gamma_S$. In that case, we may not be able to detect deviations from $H_0$ when it is clearly false.

[8] See the appendix for a simple proof of the sufficiency claim.

4.2.4 Regression-based Test Statistics

4.2.4.1 The Long Regression

Consider the following regression model:[9]

\[ X_i = \sum_{s=1}^{L} D_{s,i}\,(\beta_s + \delta_s T_i) + U_i \] (4.9)

[9] For ease of exposition we consider the case of a scalar regression, i.e., $X$ is a scalar.
The argument can easily be generalized to multivariate regressions, in which $X$ is a vector of dimension $K$.

In Equation (4.9), $U_i$ is a projection error term and $D_{s,i} = 1\{S_i = s\}$, $s = 1, \dots, L$. Ordinary Least Squares (OLS) estimates of $\delta_s$ are

\[ \hat\delta_s = \hat\gamma_s \left(\hat p_s - \hat p_s^2\right)^{-1}, \] (4.10)

where $\hat\gamma_s$, the estimator of the conditional covariance between $X$ and $T$ given $S = s$, is

\[ \hat\gamma_s = \frac{\sum_{i=1}^{N_s} D_{s,i} X_i (T_i - \hat p_s)}{N_s}, \] (4.11)

and $\hat p_s$, the proportion of treated units in stratum $s$, is

\[ \hat p_s = N_s^{-1} \sum_{i=1}^{N_s} D_{s,i} T_i. \] (4.12)

The quantity $\hat p_s - \hat p_s^2$ is an estimator of the conditional variance of $T$ given $S = s$, and $N_s = \sum_{i=1}^{N_s} D_{s,i}$ is the size of stratum $s$, so summing over all strata we have

\[ \sum_{s=1}^{L} \sum_{i=1}^{N_s} D_{s,i} = \sum_{s=1}^{L} N_s = N \sum_{s=1}^{L} \hat\phi_s = N \] (4.13)

where

\[ \hat\phi_s = N^{-1} N_s. \] (4.14)

Finally, we define the estimator of the overall probability of receiving the treatment by $\hat p$, which is simply the sample proportion of treated units across all strata:

\[ \hat p = N^{-1} \sum_{s=1}^{L} \sum_{i=1}^{N_s} D_{s,i} T_i = \sum_{s=1}^{L} \hat p_s \hat\phi_s. \] (4.15)

Given that an observation $i$ can never be in more than one stratum, we have $D_{s,i} D_{q,i} = 0$ for all $i$ and $s \neq q$. This simplifies calculations, and one can easily check that an F-statistic used to test $H_0$, obtained as a regression by-product, can be written as a simple average of F-statistics. Its formula is given by

\[ W^f = N\,\frac{\sum_{s=1}^{L} \hat\phi_s \left(\hat p_s - \hat p_s^2\right)^{-1} \hat\gamma_s^2}{\hat\sigma_X^2} = L\,\mathcal{F} \xrightarrow{D,\,H_0} \chi^2_L \] (4.16)

where $\hat\sigma_X^2$ is some consistent estimator of the conditional variance of $X$ given $S$ and $T$.[10] The test statistic $W^f$ is a Wald version of an asymptotically equivalent likelihood-ratio test statistic, which is known to be consistent for $H_0$.

[10] We assume homoskedasticity for simplicity, that is, $\mathrm{Var}[U \mid T, S] = \sigma_X^2$.

Equation (4.9) is the most general model relating $X$ to $T$ at each and every $s$.
This implies that test statistics based either on separate tests for each $\gamma_s$ or on (weighted) expectations of $\gamma_S$ can be represented either using the model for a subpopulation or as sub-models of Equation (4.9). For example, we know that an F-statistic to test $\gamma_s = 0$ for a given $s$ is

\[ \mathcal{F}_s = N\,\frac{\hat\phi_s \left(\hat p_s - \hat p_s^2\right)^{-1} \hat\gamma_s^2}{\hat\sigma_X^2} \] (4.17)

and that corresponds to the case in which we drop all observations in strata $l$, for all $l = 1, \dots, L$ with $l \neq s$. Thus,

\[ W^f = \sum_{s=1}^{L} \mathcal{F}_s \;\Rightarrow\; \mathcal{F} = L^{-1} \sum_{s=1}^{L} \mathcal{F}_s \] (4.18)

We have already discussed that each $\mathcal{F}_s$ may not be reliable for testing whether $\gamma_s = 0$, as the normal distribution may be a poor approximation to its distribution when $N_s$ is small. However, when we sum up these statistics we expect to have better approximations (as long as $L$ increases at a lower rate than $N$). We investigate this in a Monte Carlo exercise.

More important here is the fact that if $W^f$ is greater than a given critical value obtained from a $\chi^2_L$ distribution, then at least one $\mathcal{F}_s$ will also be greater than a critical value obtained from a $\chi^2_1$ distribution. This provides the motivation for implementing our suggested procedure for computation of the ATE using weights for each stratum $s$ that are inversely proportional to $\mathcal{F}_s$.

4.2.4.2 Relation to Short Regression Test Statistics

The non-inclusion of interaction terms of treatment and strata dummies in Equation (4.9) leads to shorter regressions that are useful to produce tests for averages of the covariance terms. The regression that easily generated a test statistic for testing $H_0$: $\gamma = 0$ can also be used to test $c^\top \gamma = 0$ for some vector of positive weights $c = [c_1, \dots, c_L]^\top$. For example, a short OLS regression that imposes $\delta_s = \delta$ for all $s$,

\[ X_i = \sum_{s=1}^{L} D_{s,i}\,(\beta_s + \delta T_i) + U_i, \] (4.19)

yields $\hat\delta$ as an estimate for $\delta$:

\[ \hat\delta = \frac{\sum_{s=1}^{L} \sum_{i=1}^{N_s} D_{s,i} X_i (T_i - \hat p_s)}{\sum_{s=1}^{L} N_s \left(\hat p_s - \hat p_s^2\right)} = \sum_{s=1}^{L} \left( \frac{\hat\phi_s}{\sum_{q=1}^{L} \hat\phi_q \left(\hat p_q - \hat p_q^2\right)} \right) \hat\gamma_s. \] (4.20)

Thus, using the Law of Large Numbers we have that

\[ \hat\delta \xrightarrow{P} \delta = E\!\left( \frac{\gamma_S}{E(V[T \mid S])} \right). \]
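(4.21)

Simple algebra shows that in this case,

\[ \delta = E(\omega_{FE}(S,T)X \mid T=1) - E(\omega_{FE}(S,T)X \mid T=0) \] (4.22)

where

\[ \omega_{FE}(S,T) = T\,\frac{(1-p_S)\,p}{E(p_S(1-p_S))} + (1-T)\,\frac{p_S\,(1-p)}{E(p_S(1-p_S))}, \] (4.23)

whose estimator for a given $(s,t)$ is

\[ \hat\omega_{FE}(s,t) = \left( \sum_{q=1}^{L} \hat\phi_q \left(\hat p_q - \hat p_q^2\right) \right)^{-1} \left( t\,(1-\hat p_s)\,\hat p + (1-t)\,\hat p_s\,(1-\hat p) \right). \] (4.24)

Finally, note that we label the weighting function $\omega_{FE}(S,T)$ to emphasize that this short regression is equivalent to a regression of $X$ on $T$ controlling for stratum fixed effects (FE).

We can construct a test statistic using $\omega_{FE}(S,T)$ that is suitable to test $\delta = 0$. First, we note that in general a test statistic for $E\!\left(\frac{\omega(S,1)}{p\,(1-p_S)}\,\gamma_S\right)$ will have the following structure:[11]

\[ W^f_\omega = N\,\frac{\left( \sum_{s=1}^{L} \hat\phi_s\,\frac{\hat\omega(s,1)}{1-\hat p_s}\,\hat\gamma_s \right)^2}{\hat\sigma_X^2 \sum_{s=1}^{L} \hat\phi_s \left( \frac{\hat\omega(s,1)}{1-\hat p_s} \right)^2 \left(\hat p_s - \hat p_s^2\right)}. \] (4.25)

In the specific case of fixed effects, we have

\[ W^{f,FE} = N\,\frac{\left( \sum_{s=1}^{L} \hat\phi_s \hat\gamma_s \right)^2}{\hat\sigma_X^2 \sum_{s=1}^{L} \hat\phi_s \left(\hat p_s - \hat p_s^2\right)} \xrightarrow{D,\,H_0} \chi^2_1. \] (4.26)

Now, consider running a shorter regression of $X$ on $T$, allowing for an intercept, that is, fix $\beta_s = \beta$ and $\delta_s = \delta$ for all $s$. In order to correct for the fact that treatment probabilities differ across strata, one needs to run a weighted version of that short regression. For example, consider using $\omega_{ATE}$ as weights, where[12]

\[ \omega_{ATE} = T\,\frac{p}{p_s} + (1-T)\left(\frac{1-p}{1-p_s}\right); \] (4.27)

then the estimated coefficient $\delta$ of such a regression using feasible weights

\[ \hat\omega_{ATE}(s,t) = t\,\frac{\hat p}{\hat p_s} + (1-t)\left(\frac{1-\hat p}{1-\hat p_s}\right) \] (4.28)

is

\[ \hat\delta_\omega = \sum_{s=1}^{L} \frac{\sum_{i=1}^{N_s} D_{s,i} X_i (T_i - \hat p_s)}{N \left(\hat p_s - \hat p_s^2\right)} = \sum_{s=1}^{L} \frac{\hat\phi_s \hat\gamma_s}{\hat p_s - \hat p_s^2} \xrightarrow{P} E\!\left( \frac{\gamma_S}{V[T \mid S]} \right). \] (4.29)

Finally,

\[ E\!\left( \frac{\gamma_S}{V[T \mid S]} \right) = E(\omega_{ATE}(S,T)X \mid T=1) - E(\omega_{ATE}(S,T)X \mid T=0). \] (4.30)

[11] See the appendix.

[12] In the next section we discuss the convenience of labeling that weighting scheme 'ATE'.

Using $\omega_{ATE}(S,T)$, we can construct the same test statistic that Hansen and Bowers (2008) used, which is suitable to test $E(\omega_{ATE}(S,T)X \mid T=1) = E(\omega_{ATE}(S,T)X \mid T=0)$. Let us call it $W^{f,ATE}$, where

\[ W^{f,ATE} = N\,\frac{\left( \sum_{s=1}^{L} \hat\phi_s \hat\gamma_s \left(\hat p_s - \hat p_s^2\right)^{-1} \right)^2}{\hat\sigma_X^2 \sum_{s=1}^{L} \hat\phi_s \left(\hat p_s - \hat p_s^2\right)^{-1}} \xrightarrow{D,\,H_0} \chi^2_1. \]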
(4.31)

Note the subtle but important difference between $W^f$ and both $W^{f,FE}$ and $W^{f,ATE}$. While $W^f$ is a weighted sum of squared $\hat\gamma_s$, both $W^{f,FE}$ and $W^{f,ATE}$ are squared weighted sums of $\hat\gamma_s$. That difference in order (squared weighted sum versus weighted sum of squares) is what reduces the power of tests based on $W^{f,FE}$ and $W^{f,ATE}$, as positive covariances may cancel with negative covariances in the weighted sums and we may not be able to detect deviations from $H_0$ when it is clearly false. In an extreme case, the stratum-level test statistics could reject the null in every stratum, but in different directions across strata, in such a way that the differences offset each other and the aggregated difference is still zero. Clearly, that would not be a well-conducted randomization, but tests based on $W^{f,FE}$ and $W^{f,ATE}$ will not detect it.

Therefore, commonly used testing procedures for imbalances in pretreatment covariates that use pooled data, such as the one proposed by Hansen and Bowers (2008) or the one that adds strata dummies to a regression, will have asymptotically correct nominal size but are designed to test a different null hypothesis. They test whether linear combinations of the conditional covariances between treatment status $T$ and pretreatment covariates $X$ given stratum $s$ are zero. They are not suited to test whether all conditional covariances are zero, the hypothesis whose non-rejection would support the properness of randomization.

4.3 Estimation of ATE and ATT Using Information on Covariates Imbalance

We are ultimately interested in estimates of the ATE or the ATT. Once data are collected from a randomized experiment, it should be an easy task to estimate the ATE parameter as a simple difference in means of the response variable $Y$ between treated and control units. Given that we
Given that we94have L experiments, we compute AT E for each one of those L using:ÂT Es = N−1sNs∑i=1Ds,iTiYip̂s−N−1sNs∑i=1Ds,i (1−Ti)Yi1− p̂s (4.32)Estimates of AT E and AT T in the case of stratified randomization are obtained throughA˜T E =L∑s=1φ̂sÂT Es (4.33)A˜T T =L∑s=1p̂sφ̂sÂT Esp̂. (4.34)Under H0, one expects that ÂT EsP,H0→ AT Es for each and every s as there will be no selectionbias at each stratum, that is, a simple difference in means of Y between treated and controls ateach s will be consistent for the true AT Es. In that case, A˜T EP,H0→ AT E and A˜T T P,H0→ AT T .Selection bias would be a problem if some of the pretreatment covariates that have a directeffect on Y were unbalanced after randomization for some strata. In that case we would neverbe able to tell whether any effect that we find was caused by the treatment itself or by thoseunbalanced covariates. Therefore, our estimates of ÂT Es would not be consistent for some s.If randomization was not properly implemented we have to account for that potential se-lection. There are many ways to control for selection on pretreatment covariates in estimationof AT E. Imbens (2004) provides a list of methods that deal with that selection problem, likematching, inverse probability weighting, subclassification, covariance-adjustment and imputa-tion methods. Those are valid methods under the hypothesis that there are no other unobservedcovariates that could have affected selection.The problem of choosing all covariates that may have influenced the selection into treatmentis not one with an easy solution. In fact, some authors, as Pearl (2009), have argued that sta-tistical methods used for modeling selection may not even be appropriate. Therefore, once weacknowledge that randomization was corrupted we should try to investigate the reasons why ithad failed first in order to understand what forces could have led treatment and control groupsto become systematically different in baseline covariates. 
A simple application of methods that are consistent under "selection on observables" may be insufficient to control for all the systematic components of selection that are not observed. Furthermore, even if we are able to sort out what those components are, it is not clear what functional form to use either for the conditional probability of receiving treatment or for the conditional expectation of $Y$ given those covariates and treatment.

We advocate not explicitly modeling selection on unbalanced covariates or the structural relationship between $Y$, treatment and covariates, for several reasons. First, there is the problem of selection also being based on unobservables. Second, the inclusion of unbalanced covariates as controls is determined from tests that, as discussed before, are either biased or powerless. Third, modeling selection or conditional expectations given a vector of covariates at each stratum may also be problematic, as in that case we will have a relatively small number of observations. Fourth, given that randomization was used to guarantee a clean design and the use of simple procedures, it sounds contradictory that under randomization one has to end up using 'matching' methods: it would be desirable to keep our estimator of $ATE_s$ as simple as a difference in sample means. Finally, we do not have any a priori knowledge of functional forms, as discussed in the previous paragraph.

We now discuss how to estimate the ATE and the ATT if our test statistics detect some imbalance in covariates, without explicitly modeling how those unbalanced covariates affect the bias. We take as given the experimental estimators of $ATE_s$, $\widehat{ATE}_s$, as expressed in Equation (4.32).

4.3.1 Estimating ATE After Testing for Imbalance at Each Stratum

There are two common ways to use strata-level test results on covariate balance in ATE and ATT estimation. One option is re-randomization.
Another option is to drop the problematic strata, that is, to drop all strata for which one rejected the hypothesis of perfect covariate balance within the stratum.

Re-randomization is not always feasible, for example when units (subjects) need to be publicly informed about the randomization process. Another problem with re-randomization is that the sampling distribution of the final estimator will depend on the test results of the previous samples in a very complex way. Most of the empirical literature doing re-randomization apparently ignores this problem, as reported by Bruhn and McKenzie (2008).

We call the second method Dropping Unbalanced Strata (DUS). In that case, one simply ignores all strata in which strata-level tests detected imbalance in covariates. By doing that, one expects two effects on the bias of the final estimator.

First, bias would be reduced. If $Y$ is affected by $X$ and $X$ is unbalanced across treatment and control groups, differences in expected $Y$ across groups could have been caused by the imbalance in $X$. If we drop all strata in which imbalance occurs, we get rid of that bias.

Second, bias could increase. Let us illustrate this point with a very simple example. Suppose that we have two strata, A and B. Stratum A is perfectly balanced while B is not. Let $\phi_A = \Pr[S = A]$. We can write

\[ ATE = ATE_B + \phi_A\,(ATE_A - ATE_B) \neq ATE_A. \] (4.35)

By artificially making $\phi_A = 1$, that is, by dropping stratum B, we will have that $ATE = ATE_A$. That would in general be true for any level of $\phi_A$ only if $ATE_A = ATE_B$. Thus, if there is heterogeneity in the treatment effect across strata, dropping strata that have imbalanced covariates will in general produce some bias.
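
A short numerical reading of Equation (4.35) may help. The stratum share and the two stratum-level effects below are hypothetical values chosen only to make the arithmetic visible.

```python
def overall_ate(phi_a, ate_a, ate_b):
    """Equation (4.35): ATE as a mixture of the two stratum-level effects."""
    return ate_b + phi_a * (ate_a - ate_b)


# Hypothetical values: stratum A holds 60% of the population.
phi_a, ate_a, ate_b = 0.6, 2.0, 5.0

true_ate = overall_ate(phi_a, ate_a, ate_b)   # target parameter
dus_ate = overall_ate(1.0, ate_a, ate_b)      # dropping B sets phi_A = 1
dus_bias = dus_ate - true_ate                 # equals (1 - phi_A) * (ATE_A - ATE_B)
```

Here dropping stratum B moves the estimand from about 3.2 to 2.0, so the bias from heterogeneity alone is about -1.2 even if stratum A is perfectly balanced.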
If one has a large number of strata and if the probability of rejecting $\gamma_s = 0$ for some $s$ is independent of $ATE_s$, then this bias tends to be small.

Finally, DUS will increase the variance of the final estimator, by the usual argument that the variance of a sample mean is a decreasing function of the sample size.

4.3.2 Estimating ATE After Testing for Imbalance Using Pooled Data Across Strata

We have two types of tests that use all strata simultaneously. The first type uses test statistics that are based on squared weighted sums of $\hat\gamma_s$. The question we pose is how to proceed if a high value of that type of test statistic is obtained. In principle, one would conclude that the entire randomization protocol was corrupted and that the experiment should be abandoned. On the other hand, if no evidence of imbalance in covariates is found, one would use $\widetilde{ATE}$ and $\widetilde{ATT}$ as estimators for the ATE and the ATT respectively. However, remember that in this case deviations from the null may not be detected. Thus, using that type of test may result in biased estimates of the ATE and the ATT.

The second type of test statistic is based on the long regression approach. It yields a consistent test for $H_0$. If one rejects the null, however, what should be done? In the rest of this section we provide a simple method that uses information on relative imbalances in each stratum, allowing the researcher to still use the data collected and not abandon the experimental results.

4.3.3 Empirical Likelihood Estimators for ATE and ATT

We have seen how to properly test for imbalances in pretreatment covariates. That test, unlike the first group of tests, is based on a weighted sum of squared $\hat\gamma_s$. We now propose new estimators for the ATE and the ATT that explicitly take into account the imbalance problem detected by a proper test for $H_0$.

Suppose we commit ourselves to a given probability of rejection of the null hypothesis when the null is true, that is, we fix $\alpha$, the size of the test. Say we fix $\alpha = \alpha^*$.
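Then, we

\[ \text{reject } H_0 \text{ if } W^f = N\,\frac{\sum_{s=1}^{L} \hat\gamma_s^2\,\hat\phi_s \left(\hat p_s - \hat p_s^2\right)^{-1}}{\hat\sigma_X^2} > c_{L,1-\alpha^*} \] (4.36)

where $c_{L,1-\alpha^*}$ is the $(1-\alpha^*)$th quantile of a random variable distributed as chi-square with $L$ degrees of freedom ($\chi^2_L$).

Define $\hat\lambda_s$ as $N$ times the squared conditional correlation between $X$ and $T$ given $S$:[13]

\[ \hat\lambda_s \equiv N\,\frac{\hat\gamma_s^2 \left(\hat p_s - \hat p_s^2\right)^{-1}}{\hat\sigma_X^2}, \] (4.37)

thus we

\[ \text{reject } H_0 \text{ if } \sum_{s=1}^{L} \hat\lambda_s \hat\phi_s > c_{L,1-\alpha^*}. \] (4.38)

[13] Note that $\hat\lambda_s = \mathcal{F}_s / \hat\phi_s$.

We do not derive an explicit formula for the bias of estimators of the ATE or the ATT that might arise from failures in the randomization protocol. Such formulae would require modelling the relationship between $Y$, $X$ and $S$. Therefore, although we do not impose any restriction on that relationship, we can gauge the sensitivity of the bias incurred in the estimation of the ATE and the ATT by measuring how far $W^f$ is from $c_{L,1-\alpha^*}$.

Our Empirical Likelihood (EL) estimators for the ATE and the ATT are:

\[ \widehat{ATE} = \arg\max_{\Delta} \mathcal{L}(\Delta) \] (4.39)

where

\[ \mathcal{L}(\Delta) = \max_{d_1, \dots, d_L} \sum_{s=1}^{L} N_s \log(d_s) \quad \text{subject to (i), (ii) and (iii)} \] (4.40)

where (i): $\sum_{s=1}^{L} d_s = 1$, $d_s \geq 0$, for $s = 1, \dots, L$; (ii): $\sum_{s=1}^{L} d_s \hat\lambda_s \leq c_{L,1-\alpha^*}$; and (iii): $\Delta = \sum_{s=1}^{L} d_s\,\widehat{ATE}_s$.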
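
The rejection rule in Equations (4.36) to (4.38), and its contrast with the fixed-effects statistic of Equation (4.26), can be illustrated with a small sketch. Everything below is an assumption made for the example: the function name, the toy data (two strata with exactly offsetting imbalances), and the choice of $\hat\sigma_X^2$ as the residual variance of $X$ around stratum-by-treatment cell means. With $L = 2$ the critical value has the closed form $-2\log\alpha^*$, because a chi-square variable with two degrees of freedom has survival function $e^{-x/2}$.

```python
import math


def balance_statistics(data):
    """data: list of (stratum, treated, x) tuples, treated in {0, 1}.

    Returns W_f (long regression), W_f_FE (fixed effects) and the
    per-stratum lambda_s of Equation (4.37).
    """
    n = len(data)
    # Assumed variance estimator: residual variance of X around the
    # stratum-by-treatment cell means (long-regression residuals).
    cells = {}
    for s, t, x in data:
        cells.setdefault((s, t), []).append(x)
    rss = sum((x - sum(v) / len(v)) ** 2 for v in cells.values() for x in v)
    sigma2 = rss / n

    sum_sq = 0.0     # sum_s phi_s * gamma_s^2 / (p_s - p_s^2)
    sum_gamma = 0.0  # sum_s phi_s * gamma_s
    sum_var = 0.0    # sum_s phi_s * (p_s - p_s^2)
    lambdas = {}
    for s in sorted({row[0] for row in data}):
        rows = [(t, x) for q, t, x in data if q == s]
        n_s = len(rows)
        p_s = sum(t for t, _ in rows) / n_s
        gamma_s = sum(x * (t - p_s) for t, x in rows) / n_s
        var_t = p_s - p_s ** 2
        phi_s = n_s / n
        sum_sq += phi_s * gamma_s ** 2 / var_t
        sum_gamma += phi_s * gamma_s
        sum_var += phi_s * var_t
        lambdas[s] = n * gamma_s ** 2 / (var_t * sigma2)
    w_f = n * sum_sq / sigma2
    w_f_fe = n * sum_gamma ** 2 / (sigma2 * sum_var)
    return w_f, w_f_fe, lambdas


def make_stratum(label, treated_x, control_x):
    return [(label, 1, x) for x in treated_x] + [(label, 0, x) for x in control_x]


# Treated units have high X in stratum 1 and low X in stratum 2.
data = (make_stratum(1, [5, 6, 7, 6], [1, 2, 3, 2])
        + make_stratum(2, [1, 2, 3, 2], [5, 6, 7, 6]))
w_f, w_f_fe, lambdas = balance_statistics(data)

alpha_star = 0.05
crit_chi2_2 = -2.0 * math.log(alpha_star)  # exact 95% quantile of chi-square(2)
reject_long = w_f > crit_chi2_2
```

In this constructed case the long-regression statistic is far above the critical value while the fixed-effects statistic is zero, which is the cancellation of offsetting covariances that the pooled tests cannot detect.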
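For the ATT we have:

\[ \widehat{ATT} = \arg\max_{\Delta} \mathcal{L}_T(\Delta) \] (4.41)

where

\[ \mathcal{L}_T(\Delta) = \max_{d_1, \dots, d_L} \sum_{s=1}^{L} N_s \log(d_s) \quad \text{subject to (i), (ii) and (iii)'} \] (4.42)

where (iii)': $\Delta = \sum_{s=1}^{L} \frac{\hat p_s}{\hat p}\,d_s\,\widehat{ATE}_s$.

Following Qin and Lawless (1994) and Kitamura (2006), we know that for a fixed $\Delta$ the solution to $\max_{d_1, \dots, d_L} \sum_{s=1}^{L} N_s \log(d_s)$ subject to (i), (ii) and (iii) can be written as the solution to the following Lagrangean

\[ [\phi^*, \eta_1^*, \eta_2^*, \eta_3^*]^\top = \arg\max_{d_1, \dots, d_L, \eta_1, \eta_2, \eta_3} \sum_{s=1}^{L} N_s \log(d_s) + \eta_1 \left(1 - \sum_{s=1}^{L} d_s\right) - \eta_2 \sum_{s=1}^{L} d_s \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) - \eta_3 \sum_{s=1}^{L} d_s \left(\widehat{ATE}_s - \Delta\right), \] (4.43)

whose first order conditions yield a non-linear system of equations

\[ \frac{N_s}{\phi_s^*} - \eta_1^* - \eta_2^* \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) - \eta_3^* \left(\widehat{ATE}_s - \Delta\right) = 0, \quad \text{for } s = 1, \dots, L \] (4.44)

together with $\sum_{s=1}^{L} \phi_s^* = 1$, $\sum_{s=1}^{L} \phi_s^* \hat\lambda_s = c_{L,1-\alpha^*}$ and $\sum_{s=1}^{L} \phi_s^*\,\widehat{ATE}_s = \Delta$.

Therefore, we have

\[ \eta_1^*(\Delta) = N \] (4.45)

and

\[ \phi_s^*(\Delta) = \hat\phi_s \left( 1 + N^{-1} \eta_2^*(\Delta) \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) + N^{-1} \eta_3^*(\Delta) \left(\widehat{ATE}_s - \Delta\right) \right)^{-1} \] (4.46)

\[ \begin{bmatrix} \eta_2^*(\Delta) \\ \eta_3^*(\Delta) \end{bmatrix} = \arg\operatorname{zero}_{\eta_2^*, \eta_3^*} \begin{bmatrix} \sum_{s=1}^{L} \hat\phi_s \left( 1 + \eta_2^* \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) + \eta_3^* \left(\frac{\widehat{ATE}_s - \Delta}{N}\right) \right)^{-1} \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) \\ \sum_{s=1}^{L} \hat\phi_s \left( 1 + \eta_2^* \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) + \eta_3^* \left(\frac{\widehat{ATE}_s - \Delta}{N}\right) \right)^{-1} \left(\frac{\widehat{ATE}_s - \Delta}{N}\right) \end{bmatrix} \]

leading to the following equivalent way to express our Empirical Likelihood estimators:

\[ \widehat{ATE} = \arg\min_{\Delta} \sum_{s=1}^{L} \hat\phi_s \log\left( 1 + N^{-1} \eta_2^*(\Delta) \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) + N^{-1} \eta_3^*(\Delta) \left(\widehat{ATE}_s - \Delta\right) \right). \] (4.47)

By analogy we obtain

\[ \widehat{ATT} = \arg\min_{\Delta} \sum_{s=1}^{L} \hat\phi_s \log\left( 1 + N^{-1} \eta_{2,T}^*(\Delta) \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) + N^{-1} \eta_{3,T}^*(\Delta) \left(\frac{\hat p_s}{\hat p}\,\widehat{ATE}_s - \Delta\right) \right) \] (4.48)

where

\[ \begin{bmatrix} \eta_{2,T}^*(\Delta) \\ \eta_{3,T}^*(\Delta) \end{bmatrix} = \arg\operatorname{zero}_{\eta_2^*, \eta_3^*} \begin{bmatrix} \sum_{s=1}^{L} \hat\phi_s \left( 1 + \eta_2^* \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) + \eta_3^* \left(\frac{\frac{\hat p_s}{\hat p}\,\widehat{ATE}_s - \Delta}{N}\right) \right)^{-1} \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) \\ \sum_{s=1}^{L} \hat\phi_s \left( 1 + \eta_2^* \left(\frac{\hat\lambda_s - c_{L,1-\alpha^*}}{N}\right) + \eta_3^* \left(\frac{\frac{\hat p_s}{\hat p}\,\widehat{ATE}_s - \Delta}{N}\right) \right)^{-1} \left(\frac{\frac{\hat p_s}{\hat p}\,\widehat{ATE}_s - \Delta}{N}\right) \end{bmatrix}. \] (4.49)

The optimal weights for the strata in this case are

\[ \phi_{s,T}^*(\Delta) = \hat\phi_s \left( 1 + N^{-1} \eta_{2,T}^*(\Delta) \left(\hat\lambda_s - c_{L,1-\alpha^*}\right) + N^{-1} \eta_{3,T}^*(\Delta) \left(\frac{\hat p_s}{\hat p}\,\widehat{ATE}_s - \Delta\right) \right)^{-1}. \] (4.50)

There are some advantages to using an Empirical Likelihood estimator in our case. First, note that if we were only looking for weights that respect restriction (i), then it is clear that the optimal weights would be the original ones, $\hat\phi_s$ (after zeroing $\eta_2$ and $\eta_3$).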
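
To make the weight formula (4.46) concrete, the sketch below solves for the stratum weights under restrictions (i) and (ii) only. It rests on an assumption that should be stated loudly: restriction (iii) merely defines $\Delta$, so at the maximizing $\Delta$ its multiplier is taken to be zero ($\eta_3 = 0$) and the estimate is the reweighted sum of the $\widehat{ATE}_s$. The function name, the bisection solver and all input values are illustrative, not the authors' implementation.

```python
import math


def el_weights(phi, lam, crit, iterations=200):
    """Weights maximizing sum_s N_s log(d_s) subject to
    sum_s d_s = 1 and sum_s d_s * lam_s <= crit.

    Uses d_s = phi_s / (1 + t * (lam_s - crit)) with t = eta_2 / N,
    i.e. Equation (4.46) with eta_3 set to zero (an assumption).
    """
    w_f = sum(p * l for p, l in zip(phi, lam))
    if w_f <= crit:
        return list(phi)  # restriction (ii) is not binding
    if min(lam) >= crit:
        raise ValueError("no feasible weights: every lambda_s is above crit")

    def g(t):
        # First-order condition in t; strictly decreasing on the search interval.
        return sum(p * (l - crit) / (1 + t * (l - crit)) for p, l in zip(phi, lam))

    lo, hi = 0.0, 1.0 / (crit - min(lam))  # weights stay positive for t < hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return [p / (1 + t * (l - crit)) for p, l in zip(phi, lam)]


# Hypothetical inputs: two equally sized strata, the second one unbalanced.
phi_hat = [0.5, 0.5]
lambda_hat = [2.0, 12.0]
ate_hat_s = [1.0, 3.0]
crit = -2.0 * math.log(0.05)  # exact 95% quantile of chi-square(2)

weights = el_weights(phi_hat, lambda_hat, crit)
ate_el = sum(d * a for d, a in zip(weights, ate_hat_s))
ate_unweighted = sum(p * a for p, a in zip(phi_hat, ate_hat_s))
```

In this example the unbalanced stratum is downweighted just enough for restriction (ii) to hold with equality, and the resulting estimate moves toward the effect estimated in the balanced stratum.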
If we add restriction (ii), then the optimal weights would be either $\phi_s^* = \hat\phi_s$ when restriction (ii) is not binding, that is, when $W^f \leq c_{L,1-\alpha^*}$, or a decreasing function of $\hat\lambda_s - c_{L,1-\alpha^*}$: the larger the correlation between $X$ and $T$ in a stratum $s$, the smaller the weight associated with it.

The second advantage is that we can use known results on the asymptotic distribution of EL estimators. For example, we know that EL estimators achieve the semiparametric efficiency bounds (see Kitamura, 2006) and therefore, if we had no estimation problem in the first step, where we estimate $ATE_s$ and $\lambda_s$, we would achieve efficiency. Although we cannot say anything about overall efficiency in this setting, we can claim that the proposed EL-type estimators of the ATE and the ATT are minimum variance estimators if we ignore the uncertainty arising from the estimation of $ATE_s$ and $\lambda_s$.

Another interesting result is that an EL estimator of the ATE that imposes restriction (ii) when that restriction is binding cannot have a larger variance (ignoring first-step issues) than an EL estimator that does not take restriction (ii) into account when it is binding. See, for example, section 5 of Qin and Lawless (1994); the reasoning is the same as for a GMM estimator, which will have smaller variance than another one that does not include all relevant moment restrictions. Finally, following Kitamura (2006), bootstrap testing and inference for the ATE and the ATT are valid, and we rely on resampling methods in our next applied section to construct confidence intervals for the ATE and the ATT.

4.3.4 Euclidean Distance Norm

As noted by Kitamura (2006), the Empirical Likelihood is just one of many contrast functions that can be invoked. A particularly interesting alternative choice is the Euclidean Distance divergence function. In this case one minimizes the squared distance from the strata "natural" weights ($p_s$).
This can be done simply by replacing $\log(d_s)$ in Equation (4.40) with $\frac{1}{2}(d_s - p_s)^2$. It is possible to derive a closed-form solution for the parameters in this case. Also, it can be shown that the resulting weights for the strata are a decreasing function of the F statistic in each stratum, as one should intuitively expect.[14]

The estimator for the ATE in this case becomes:

\[ \widehat{ATE} = \arg\max_{\Delta} \mathcal{L}(\Delta) \] (4.51)

where

\[ \mathcal{L}(\Delta) = \max_{d_1, \dots, d_L} \sum_{s=1}^{L} N_s\,(d_s - p_s)^2 \quad \text{subject to (i), (ii) and (iii)} \] (4.52)

where (i): $\sum_{s=1}^{L} d_s = 1$, $d_s \geq 0$, for $s = 1, \dots, L$; (ii): $\sum_{s=1}^{L} d_s \hat\lambda_s \leq c_{L,1-\alpha^*}$; and (iii): $\Delta = \sum_{s=1}^{L} d_s\,\widehat{ATE}_s$.

[14] See Kitamura (2006) for more details.

Table 4.1: Parameter Specification for the Monte Carlo Simulation

Coefficient   Equation for X2   Equation for X3   Equation for ρ
β0            15                0                 0
β1            2                 2                 0.1
β2            -                 0.1               2
β3            -                 -                 1.5
σ             4                 8                 4.5

4.4 A Monte Carlo Study

4.4.1 Tests of Covariate Imbalance

In this section we report the results of Monte Carlo (MC) exercises that verify the performance of the tests of pretreatment imbalances in finite samples. We work with three distinct samples, which differ in the number of individuals and strata. The first sample has 1000 individuals divided in 100 strata, the second 2000 individuals in 100 strata, and the last 2000 individuals in 10 strata. From now on we will use the convention that the $\varepsilon$ are random variables uniformly distributed on the $[0,1]$ interval and the $u$ are standard normal random variables. Throughout, the number of replications in the MC exercises is 1000.

The vector $X$ of observed pretreatment covariates consists of three variables, one of which is binary ($X_1$), one has discrete support ($X_2$), and the last one is continuous ($X_3$). More specifically, they are constructed by the following equations:[15]

\[ X_{2i} = h(\beta_0 + \beta_1 X_{1i} + \sigma \varepsilon_{2i}), \]

\[ X_{3i} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \sigma \varepsilon_{3i}, \]

where $X_{1i}$ is set to be Bernoulli distributed with probability of success equal to 0.5.

The relation between the distribution of pretreatment covariates and the strata is set to be:

\[ \rho_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \sigma u_i. \]
(4.53)

The individuals are divided across the strata according to the percentiles of the distribution of $\rho$. To simplify, this partition is homogeneous in the number of individuals, so each stratum has the same size (for instance, the first sample has 10 individuals in each stratum). The specific choice of values of the parameters in the previous three equations is presented in Table 4.1.[16]

To analyze the performance of the tests, we consider three different scenarios for the distribution of the treatment status. The first corresponds to the case under the null hypothesis of properness of the randomization protocol, the second is the case off the null hypothesis but restricting the deviations from the null to have the same sign in all strata, and the third refers to the case where the deviations from the null have different signs across the strata. Equation (4.54) determines the treatment status, where each scenario is characterized by the appropriate choice of the parameters, as shown in Table 4.2:

\[ T_i \equiv 1\{\delta_0^w + \eta^w S_i + \delta_1^w X_{1i} + \delta_2^w X_{2i} + \delta_3^w X_{3i} \leq \varepsilon_{4i}\}, \] (4.54)

where the function $1\{\cdot\}$ takes the value one if its argument is true and zero otherwise, and the superscript $w$ is used to allow for different parameters in different strata. In principle, these parameters can be specified for each specific stratum, but for simplicity we set $w_i \equiv 1\{S_i \text{ is even}\}$, which implies that the strata with odd and even labels are associated with different parameters.

[15] For notational simplicity, we drop the superscript on the parameters, although they are specific to each equation. $h(x)$ is a function that for every $x$ returns the highest integer that is smaller than $x$. This ensures that $X_{2i}$ is discrete instead of continuous.

[16] As should be clear from the discussion in the theoretical section, our results are extremely robust to changes in these parameters' values.

Table 4.2: Treatment Status Parameters for the Monte Carlo Simulation

Tables 4.3 and 4.4 present the performance of the tests under the null hypothesis. Table 4.3 contains the results of the tests that are implemented at the stratum level (separate tests), whereas Table 4.4 shows the results when all strata are pooled together (aggregate tests). For the separate tests we calculate at each replication the proportion of all strata for which the null hypothesis was rejected at the 5% level and compute the mean across all replications; for the aggregate tests we compute the size of the test, which is the proportion of all simulations for which the null was rejected. The separate tests we conduct are the usual t-test and two exact tests, the Fisher and the Wilcoxon tests. Apart from our proposed test, the set of aggregate tests comprises the naive test, which simply tests the equality of the means of the pretreatment covariates between treated and control individuals, and the fixed effect test.[17] The aggregate tests are implemented using the regression approach presented in Section 4.2.4. In all tables we present in this section, panels A, B, and C display the results for the first, second and third samples respectively.

Under $H_0$ the tests at the stratum level show what was expected: the proportion of rejections of the null was close to the nominal size for all tests.[18] This is valid for all sample sizes and numbers of strata (Table 4.3). Looking at the aggregate tests under the null (Table 4.4), all weighted tests, including ours, display rejection rates that are close to the nominal size. This is not the case for the naive test, which shows quite large size distortions. The bad performance of the naive test is not surprising since, as discussed in Section 4.2.3.1, overall differences in means in pretreatment variables may show up even under correct randomization within strata.
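
The stratum-level exact tests mentioned above can be illustrated with a minimal permutation test for a difference in means. This is a generic sketch, not the Fisher or Wilcoxon implementation used in the simulations; the function name and the ten-unit example are assumptions chosen to match the stratum size of the first sample.

```python
from itertools import combinations


def exact_permutation_pvalue(x, t):
    """Two-sided exact p-value for the absolute difference in means of x
    between units with t == 1 and units with t == 0, obtained by
    enumerating every relabeling that keeps the number of treated fixed."""
    n, n_treated = len(x), sum(t)
    total = sum(x)

    def abs_diff(treated_idx):
        s1 = sum(x[i] for i in treated_idx)
        return abs(s1 / n_treated - (total - s1) / (n - n_treated))

    observed = abs_diff([i for i in range(n) if t[i] == 1])
    count = extreme = 0
    for idx in combinations(range(n), n_treated):
        count += 1
        if abs_diff(idx) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / count


# One hypothetical stratum of 10 units with the most extreme imbalance possible.
x_stratum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
t_stratum = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_value = exact_permutation_pvalue(x_stratum, t_stratum)

# A balanced labeling of the same covariate values, for contrast.
t_balanced = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
p_balanced = exact_permutation_pvalue(x_stratum, t_balanced)
```

With five treated units out of ten there are only 252 relabelings, so the smallest attainable two-sided p-value is 2/252; this discreteness is one reason why exact tests have limited power when strata are small.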
Given that the naive test does not perform well under the null, we do not present it in the tables containing the results for the scenarios off the null.

Tables 4.5, 4.6, 4.7, and 4.8 present the results for the two scenarios that deviate from the null hypothesis. The first pair of tables contains the results for the case where deviations have the same sign in all strata, while the second pair refers to the case in which the deviations have different signs across the strata. Tests are conducted separately for each pretreatment covariate and for all of them together (joint test).

Looking at the separate tests in Tables 4.5 and 4.7, we see that the proportion of rejections tends to be slightly above the nominal size when strata are small (panel A). Naturally, when the strata become larger (panels B and C), the stratum-level tests show an increase in the proportion of rejections of the null. These results indicate that separate tests are capable of detecting deviations from the null, but with relatively low power. This becomes more evident when these results are compared with those from aggregate tests, which show a much larger proportion of rejections.

Table 4.6 displays the results for the aggregate tests when the deviation from the null has the same sign across strata. The coverage rate of our test is smaller than that of the fixed effect test, both for individual variables and for the joint test. Since the computation of our test involves the estimation of many more parameters, its coverage rate approaches those of the other tests as the sample or stratum size increases.

The context in which our test shows one of its main strengths is displayed in Table 4.8. As can be seen, the performance of the other test is rather poor, with very low coverage rates for individual and joint tests, and for all sample or strata sizes. Our test, however, behaves quite well, with higher proportions of rejections in all cases. To the extent that the patterns of correlation between treatment status and individual characteristics are unknown in stratified randomized trials, this relative gain in power offered by our test seems quite desirable.

[17] The ATE and ATT reweighted tests are also included in the simulation. To avoid unnecessarily increasing the number of outcomes to discuss, we chose not to report them. The results are, as expected, similar to those obtained for the fixed effect test. They are available from the authors upon request.

[18] The Fisher test results were a little smaller than the nominal value, a result that has been detected by Liddell (1976).

Table 4.3: Performance of Disaggregated Tests under the Null Hypothesis

Table 4.4: Performance of Aggregated Tests under the Null Hypothesis

Table 4.5: Performance of Disaggregated Tests off the Null Hypothesis
(a) Deviations with the same sign across strata

Table 4.6: Performance of Aggregated Tests off the Null Hypothesis
(a) Deviations with the same sign across strata

Table 4.7: Performance of Disaggregated Tests off the Null Hypothesis
(a) Deviations with different signs across strata

4.4.2 Monte Carlo - Performance of Average Treatment Effect Estimation Procedures

In this section we evaluate the performance of the most common procedures to estimate the average treatment effect. To do that, we simply add two potential outcome equations to the specification of the previous section. The equations that describe the DGPs are as follows, and the parameters that characterize the DGP are shown in Table 4.9.

\[ Y_i^1 = \beta_0^w + \eta S_i + \delta^w Z_i + \beta_1^w X_{1i} + \beta_2^w X_{2i} + \beta_3^w X_{3i} + \sigma u_{1i} \] (4.55)

\[ Y_i^0 = \beta_0^w + \eta S_i + \delta^w Z_i + \beta_1^w X_{1i} + \beta_2^w X_{2i} + \beta_3^w X_{3i} + \sigma u_{0i} \] (4.56)

We considered the following procedures for estimating the Average Treatment Effect. The first is the naive estimator, which consists of a simple difference of means between the treatment and the control groups.
The second is the fixed-effect estimator, which we call the unrestricted estimator to emphasize that it does not use the information about the properness of randomization in each stratum. We also included the fixed-effect estimator after dropping the strata where the null hypothesis was rejected. We call this estimator DUS, an acronym for "Dropping Unbalanced Strata". Finally, we also evaluate the performance of the Empirical Likelihood and CR Euclidean Distance estimators. Tables 4.10 and 4.11 show the results using $X_3$ as the tested covariate.[19]

The main result is that the Empirical Likelihood and the CR Euclidean Distance estimators showed a smaller MSE than the other ones. Notice that these results did not change after we reduced the size of the parameters of the potential outcomes equations to half of those in the first specification. The DUS estimator usually presents a higher variance than its competitors, which is expected since it discards the information in all unbalanced strata. At the other extreme, the unrestricted fixed-effect estimator presents higher bias, since it includes the information from strata where the randomization was not well performed, so it is more likely that the estimates of the ATE in those strata will be biased. Our procedures, by using all the information in the data but reducing its weight as a function of the measure of properness of the randomization, allow us to keep the bias under control without increasing the variance too much.

[19] We also obtained results for $X_1$, which are qualitatively similar to the ones presented here.
They are not shown but are available from the authors upon request.

Table 4.8: Performance of Aggregated Tests off the Null Hypothesis (a) Deviations with different signs across strata

Table 4.9: Parameters Specification for the Potential Outcomes in the Monte Carlo Simulation

Table 4.10: Performance of ATE Estimators, Case I

Table 4.11: Performance of ATE Estimators, Case II

4.5 The PLANFOR

4.5.1 Program Description

Until the mid 1990s, publicly sponsored training programs in Brazil were traditionally provided in a centralized fashion and were mostly targeted at (semi-)skilled workers. Around that time, however, the federal government decided to implement profound changes in the public training system. The Plano Nacional de Formacao Profissional (PLANFOR), which was launched in 1995, was conceived to be the cornerstone of this new system.

The general objective of PLANFOR was to gradually mobilize and articulate the entire network of professional education in the country. To achieve that goal, the program intended to integrate all training actions developed by private and public institutions, non-governmental organizations (NGOs) and other related institutions into a national training policy. Its specific goals were to develop training and re-training activities that could increase the participation rate, the employability and the capacity to generate labor earnings of workers with low levels of schooling, the unemployed and the socially excluded.

Although the general conceptualization and management of PLANFOR were under the responsibility of the federal government, the implementation of the program was highly decentralized and involved two broad types of partnerships. The first was with State authorities, which defined the training sub-programs to be implemented and sub-contracted vocational proprietary schools and local community colleges to provide the courses. The second type of partnership was made directly with NGOs and other institutions (e.g.
foundations, universities and labour unions) that could provide training services either on a national or local basis. One of the consequences of this decentralized system was that the program was actually implemented by a myriad of individual training providers.

PLANFOR’s target population prioritized the more vulnerable segments of the population, such as the poor (those with per capita income in the lower third of the income distribution), the low-educated (those with less than primary education), head-of-family women, youth searching for their first job, seasonal workers, and those with special needs. Though these eligibility criteria had to be followed in general, the program allowed some flexibility for States and other partners to define their own target populations.

In general, the courses provided classroom training in the basic skills necessary for a variety of occupations, such as waiters, hairdressers, administrative jobs, receptionists, sellers, sewers and electricians. The program operated on a continuous basis throughout the year and, in general, courses would last between 30 and 60 days, with around 60 hours of classes per month. There were no established minimum and maximum class sizes, so the number of students in each class varied a great deal across courses and individual providers. Also, to the extent that distinct occupations attract different types of individuals, the composition of students tended to be quite different across courses and classes.

4.5.2 Sample

In 1998/99, the Brazilian Ministry of Labor financed an experimental evaluation of the effectiveness of PLANFOR on the labour earnings and employment of program participants.²⁰ The experiment was carried out in two different metropolitan areas of the country, namely Rio de Janeiro and Fortaleza.
The former is a major city situated in the more developed Southeast region of the country, while the latter is a medium-sized city located in the less developed Northeast region.

As mentioned above, PLANFOR offered a large variety of distinct training courses across the country, and this was no different for the two cities in our sample. Given this heterogeneity, the evaluators decided to implement the randomization of individuals in and out of the program at the class level. More specifically, for each provider and for each class of the distinct courses being offered, individuals previously enrolled in each class were randomly selected to participate in their chosen course/class.²¹ Since there was excess demand for all courses/classes in all providers, all classes could be filled up and those enrollees who were selected out became potential control group individuals. The sample just after randomization contained 2321 treatments and 2195 controls in Rio de Janeiro (4516 in total) and 1528 treatments and 1845 controls in Fortaleza (3373 in total).

All courses in both cities took place between September and December 1998. Participants and nonparticipants were interviewed in both cities using the same questionnaire. The questionnaire included questions about the demographic characteristics of the individuals as well as retrospective questions about their labour market history in the twelve months preceding that of the program’s course. Some treated individuals were interviewed at the place where they attended the course, but others had to be interviewed at home after the end of the course. Interviews with controls were always done at home. In this process some individuals from both the treatment and control groups ended up not being interviewed.²² This implied a reduction in the initial sample size to 1323 treatments and 1293 controls in Rio de Janeiro (2616) and 1311 treatments and 1490 controls in Fortaleza (2801).
Hence, attrition represented a loss of around 43% of the planned sample for treatments and 41% for controls in Rio de Janeiro, with these figures falling to 14% and 19% in Fortaleza, respectively.

The motivation of control group individuals to participate in the program was an important concern of the evaluators. In particular, they were skeptical about comparing actual treatments with controls that did not seem sufficiently motivated to attend a training course. In order to detect a lack of interest in participating in the program, a question was included at the beginning of the questionnaire about whether controls would attend a training course were they actually offered it. All controls that responded ‘no’ to this question did not answer the rest of the questionnaire, so there is no available information for this set of individuals. Interestingly, this produced a sizable reduction of 132 control group observations in Rio de Janeiro, but just 2 in Fortaleza.

²⁰ The evaluation was jointly conducted by two external institutions, namely the Instituto de Pesquisa Economica Aplicada (IPEA) and the Centro de Desenvolvimento e Planejamento Regional (CEDEPLAR).

²¹ The list of courses is long. Important examples are courses for computer operators, secretaries, receptionists, salespersons, waiters, and hairdressers.

²² The main reasons are change of address, the individual not being present at the time of the interview, and difficulties in reaching the individual’s household.

A follow-up questionnaire (also identical in the two cities) was fielded in November 1999, and interviewed individuals responded to retrospective questions about their labor market history going back to September 1998. As is natural in this kind of situation, there was sample attrition between the baseline and the follow-up interviews. In Rio de Janeiro, attrition led to a loss of 228 individuals, of which 126 were from the treatment group and 102 from the control group.
This represents a relative reduction of around 9% of the initial sample, with respective losses of 10% and 9% for the treatment and control samples in Rio de Janeiro. The corresponding figures for Fortaleza are: 382 (14%), 147 (11%), and 235 (16%), respectively. Altogether, combining the first and the follow-up interviews, the sample for which there is complete information amounts to 1197 treatments and 1059 controls in Rio de Janeiro (2256), and 1164 treatments and 1253 controls in Fortaleza (2417). Tables 4.12 and 4.13 summarize the evolution of sample sizes over time in the two cities and describe the main features of the data.

4.5.3 Testing for Pretreatment Imbalances in PLANFOR’s data

In this section we discuss potential threats to the integrity of PLANFOR’s experiment, namely differences in the distribution of covariates between the treated and control groups. Table 4.14 summarizes the main results of imbalance tests at the strata level.

As can be seen, for a large group of classes the null hypothesis is not rejected for any of the covariates. It should be noted that even if randomization was properly performed, we should expect to observe a rejection rate of the null hypothesis equal to the size of the test. The small number of rejections suggests that they are due to type I error. In fact, we cannot reject the hypothesis that the proportion of rejections is equal to the size of the test. However, given the lack of power of these tests, the absence of rejection of the null should not be taken as definitive evidence in favor of the randomization.

Moving on to aggregate procedures, Table 4.15 shows the results of the test of difference in means of each pre-treatment variable. It contains the results of our proposed test ($W_f$), the usual naive test and the fixed-effect test that accounts for stratification.
The last two lines in the table refer to the joint test of significance of the treatment status on all covariates simultaneously. Three of the five p-values fall below the .01 level, which gives a definitive rejection of the null hypothesis. This result is confirmed by the p-value of the joint test. Notice that the fixed-effect test, which has correct nominal size, failed to detect it. As can be seen by comparing the naive and the fixed-effect estimators, stratification is responsible for part, but not all, of the differences observed between treated and controls.

Table 4.12: Evolution of Sample Size by City and Treatment Status

Table 4.13: Descriptive Statistics

Table 4.14: Strata Level Tests of Balancement of Covariates

Table 4.15: Aggregated Balancement Tests

4.5.4 Treatment Effect Results

In the last section, we found that randomization was not properly carried out, although several tests failed to detect it. In this section we present estimates of the Average Treatment Effect on the Treated (ATT). Table 4.16 displays the estimated effects on wages and employment, the latter being defined as the proportion of months in which the individuals were employed. The baseline strata treatment effect estimates ($\widehat{ATE}_s$) used in the DUS, Euclidean Distance and Empirical Likelihood estimators are constructed using the simple procedure of a regression of the dependent variable on the treatment indicator in each stratum. This procedure is equivalent to the difference between the average outcomes of the treatment and control groups, and it is an unbiased estimator of the treatment effect under the null hypothesis of a proper randomization. To avoid an attenuation bias generated by the possibility of lock-in effects, we did not use the information about employment until three months after the conclusion of the course.

We report the ATT estimated with several different techniques.
First, we report the naive ATT, i.e. the ATT calculated by the simple mean difference in outcomes between treated and control individuals. Given our previous findings on covariate imbalance, it is unlikely that the naive estimator is consistent for the ATT. Then, we report the ATT using the fixed-effects estimator. As we reject the hypothesis of balance in covariates between treated and controls, we also report the ATT calculated after dropping the strata for which we reject the null hypothesis. We also report the Empirical Likelihood (EL) and the Euclidean Distance (ED) estimators proposed in sections 3.3 and 3.4.

By looking at Table 4.16, it seems that the program did not have any statistically significant effect on the wage and employment outcomes. Though our proposed estimators implement a correction for the covariate imbalance that was previously detected, they did not unveil any statistically significant impact of the program on the outcomes of interest.

Table 4.16: Planfor – Estimates of the Average Treatment Effects on the Treated

4.6 Conclusions

The use of small-scale randomized control trials to assess the effects of various types of social interventions has increased a great deal in the last two decades. Either for econometric or institutional reasons, the randomization process often follows a stratified design in which individuals are randomly selected into the treatment and control groups at the stratum level (e.g. communities, schools, classes, etc.). Though it is expected that the pre-treatment characteristics of treated and controls are balanced in this type of design, this is not guaranteed and should be carefully tested.
If there is imbalance in baseline characteristics between the experimental groups, there is much reason to believe that estimates of treatment effect parameters will be biased.

The available literature has provided some guidance on testing the null hypothesis of balance of characteristics between treated and controls in stratified randomized trials. Tests are proposed either at the stratum or the aggregate level. When stratum size is small, a situation that often happens with small-scale programs, the usual asymptotic tests (e.g., the t-test) are expected to behave poorly and the exact tests (e.g., the Fisher Exact Test) tend to have low power. The usual aggregate tests (e.g. the test of differences in means between treated and controls in baseline characteristics) are unbiased but may also display low power.

To overcome these problems, we proposed an F-test that can detect a larger spectrum of deviations from the null than the usual aggregated tests. We showed theoretically and in a Monte Carlo exercise that, with heterogeneous violations across the strata, the performance of our test is significantly better than that of tests taken either at the strata or at the aggregated level.

Since the ultimate goal in an evaluation study is not testing for imbalance in covariates, we developed an integrated method that uses the realized values of our tests for covariate imbalance to estimate two of the most used treatment effect parameters, the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT).

We applied our “two-step” method to the evaluation of a job training program whose design for a selected cohort of applicants followed a stratified design. We showed that the stratification structure accounts for much, but not all, of the imbalances present in the pretreatment distribution of treated and controls.
We also showed that even after accounting for the covariate imbalance through our integrated estimator, the program has no statistically significant effect on the wage or employment of the treated.

Bibliography

[1] Alan Agresti. A survey of exact inference for contingency tables. Statistical Science, pages 131–153, 1992.
[2] Mikel Aickin. Randomization, balance, and the validity and efficiency of design-adaptive allocation methods. Journal of Statistical Planning and Inference, 94(1):97–119, 2001.
[3] Rita Almeida and Pedro Carneiro. Enforcement of Labor Regulation and Informality. American Economic Journal: Applied Economics, 4(3):64–89, 2012.
[4] Douglas G Altman. Comparability of randomised groups. The Statistician, pages 125–136, 1985.
[5] Glauco Arbix. A queda recente da desigualdade no brasil. Nueva Sociedad, 212, 2007.
[6] David H Autor, Alan Manning, and Christopher L Smith. The contribution of the minimum wage to us wage inequality over three decades: a reassessment. NBER Working Paper Series, (16533), 2010.
[7] Ricardo Paes de Barros. A efetividade do salário mínimo em comparação à do programa bolsa família como instrumento de redução da pobreza e da desigualdade. Brasília: IPEA, 2007.
[8] Ricardo Paes de Barros and Rosane Silva Pinto de Mendonça. Os determinantes da desigualdade no brasil. 1995.
[9] Howard S Bloom. The core analytics of randomized experiments for social research, 2006.
[10] Colin R Blyth. On simpson’s paradox and the sure-thing principle. Journal of the American Statistical Association, 67(338):364–366, 1972.
[11] Mariano Bosch, Edwin Goni, and William F Maloney. The determinants of rising informality in Brazil: Evidence from gross worker flows. World Bank Policy Research Working Paper Series, Vol, 2007.
[12] Miriam Bruhn and David McKenzie. In pursuit of balance: Randomization in practice in development field experiments. World Bank Policy Research Working Paper Series, Vol, 2008.
[13] Gary Burtless. The case for randomized field trials in economic and policy research. The Journal of Economic Perspectives, pages 63–84, 1995.
[14] David Card and Alan B Krueger. Minimum wages and employment: A case study of the fast food industry in new jersey and pennsylvania. Technical report, National Bureau of Economic Research, 1993.
[15] Gregory C Chow. Tests of equality between sets of coefficients in two linear regressions. Econometrica: Journal of the Econometric Society, pages 591–605, 1960.
[16] Richard Crump, V Joseph Hotz, Guido Imbens, and Oscar Mitnik. Moving the goalposts: Addressing limited overlap in the estimation of average treatment effects by changing the estimand, 2006.
[17] Richard Dickens, Stephen Machin, and Alan Manning. The effects of minimum wages on employment: Theory and evidence from britain. Journal of Labor Economics, 17(1):1–22, 1999.
[18] Joseph J Doyle Jr. Employment effects of a minimum wage: A density discontinuity design revisited. Working Paper, 2007.
[19] Esther Duflo, Rachel Glennerster, and Michael Kremer. Using randomization in development economics research: A toolkit. Handbook of development economics, 4:3895–3962, 2007.
[20] Francisco HG Ferreira et al. Os determinantes da desigualdade de renda no Brasil: luta de classes ou heterogeneidade educacional? Pontifícia Universidade Católica de Rio de Janeiro, Departamento de Economía, 2000.
[21] Sergio Firpo, Nicole Fortin, and Thomas Lemieux. Occupational tasks and changes in the wage structure. Working Paper, 2009.
[22] Sergio Firpo, Nicole M. Fortin, and Thomas Lemieux. Unconditional quantile regressions. Econometrica, 77(3):953–973, 2009.
[23] Sergio Firpo and Maurício Cortez Reis. O salário mínimo e a queda recente da desigualdade no brasil. Desigualdade de renda no Brasil: uma análise da queda recente, 2, 2007.
[24] Ronald A Fisher. On the interpretation of χ² from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, pages 87–94, 1922.
[25] Ronald A Fisher. The design of experiments. Oliver and Boyd, Edinburgh, 1935.
[26] Albert Fishlow. Brazilian size distribution of income. The American Economic Review, pages 391–402, 1972.
[27] Miguel Nathan Foguel and João Pedro Azevedo. Uma decomposição da desigualdade de rendimentos do trabalho no brasil: 1984-2005. 2006.
[28] Nicole Fortin, Thomas Lemieux, and Sergio Firpo. Decomposition methods in economics. Handbook of labor economics, 4:1–102, 2011.
[29] Ben B Hansen and Jake Bowers. Covariate balance in simple, stratified and clustered comparative studies. Statistical Science, pages 219–236, 2008.
[30] Rodolfo Hoffmann. Transferências de renda e a redução da desigualdade no brasil e cinco regiões entre 1997 e 2004. Revista Econômica, 8(1), 2006.
[31] Kosuke Imai, Gary King, Clayton Nall, et al. The essential role of pair matching in cluster-randomized experiments, with application to the mexican universal health insurance evaluation. Statistical Science, 24(1):29–53, 2009.
[32] Guido Imbens, Gary King, David McKenzie, and Geert Ridder. On the benefits of stratification in randomized experiments, 2008.
[33] Guido W Imbens. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and statistics, 86(1):4–29, 2004.
[34] Chinhui Juhn, Kevin M Murphy, and Brooks Pierce. Wage inequality and the rise in returns to skill. Journal of political Economy, pages 410–442, 1993.
[35] Walter N Kernan, Catherine M Viscoli, Robert W Makuch, Lawrence M Brass, and Ralph I Horwitz. Stratified randomization for clinical trials. Journal of clinical epidemiology, 52(1):19–26, 1999.
[36] Yuichi Kitamura. Empirical likelihood methods in econometrics: Theory and practice. 2006.
[37] Alan B Krueger and Diane M Whitmore. The effect of attending a small class in the early grades on college-test taking and middle school test results: Evidence from project star. The Economic Journal, 111(468):1–28, 2001.
[38] Carlos Geraldo Langoni. Distribuição da renda e desenvolvimento econômico do brasil: uma reafirmação. 1973.
[39] David S Lee. Wage inequality in the united states during the 1980s: Rising dispersion or falling minimum wage? Quarterly Journal of Economics, pages 977–1023, 1999.
[40] Thomas Lemieux. Minimum wages and the joint distribution employment and wages. Department of Economics, University of British Columbia Working Paper, 2011.
[41] Sara Lemos. Minimum wage effects in a developing country. Labour Economics, 16(2):224–237, 2009.
[42] Steven D Levitt and John A List. Field experiments in economics: the past, the present, and the future. European Economic Review, 53(1):1–18, 2009.
[43] Douglas Liddell. Practical tests of 2×2 contingency tables. The Statistician, pages 295–304, 1976.
[44] A Russell Localio, Jesse A Berlin, Thomas R Ten Have, and Stephen E Kimmel. Adjustments for center in multicenter studies: an overview. Annals of internal medicine, 135(2):112–123, 2001.
[45] José AF Machado and José Mata. Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of applied Econometrics, 20(4):445–465, 2005.
[46] Enlinson Mattos and Laudo M Ogura. Skill differentiation between formal and informal employment. Journal of Economic Studies, 36(5):461–480, October 2009.
[47] Justin McCrary. Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698–714, 2008.
[48] Blaise Melly. Decomposition of differences in distribution using quantile regression. Labour Economics, 12(4):577–590, 2005.
[49] Blaise Melly. Estimation of counterfactual distributions using quantile regression. Review of Labor Economics, 68(4):543–572, 2006.
[50] Robert H Meyer and David Wise. Discontinuous distributions and missing persons: The minimum wage and unemployed youth. Econometrica, 51(6):1677–98, 1983.
[51] Robert H Meyer and David A Wise. The effects of the minimum wage on the employment and earnings of youth. Journal of Labor Economics, pages 66–100, 1983.
[52] David Neumark, Wendy Cunningham, and Lucas Siga. The effects of the minimum wage in brazil on the distribution of family incomes: 1996–2001. Journal of Development Economics, 80(1):136–159, 2006.
[53] Adrian Pagan and Aman Ullah. Nonparametric econometrics. Cambridge university press, 1999.
[54] Thomas Permutt. Testing for imbalance of covariates in controlled experiments. Statistics in medicine, 9(12):1455–1462, 1990.
[55] Jin Qin and Jerry Lawless. Empirical likelihood and general estimating equations. The Annals of Statistics, pages 300–325, 1994.
[56] Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Hypothesis testing in econometrics. Annual Review of Economics, 2(1):75–104, September 2010.
[57] João Saboia. O salário mínimo e seu potencial para a melhoria da distribuição de renda no brasil. Desigualdade de renda no Brasil: uma análise da queda recente, Brasília: IPEA, 2007.
[58] Stephen Senn. Testing for baseline balance in clinical trials. Statistics in medicine, 13(17):1715–1726, 1994.
[59] Coen N Teulings. Aggregation bias in elasticities of substitution and the minimum wage paradox. International Economic Review, 41(2):359–398, 2000.
[60] Terry M Therneau. How many stratification factors are too many to use in a randomization plan? Controlled Clinical Trials, 14(2):98–108, 1993.
[61] Finis Welch and James Cunningham. Effects of minimum wages on the level and age composition of youth employment.
The Review of Economics and Statistics, pages 140–145, 1978.

Appendix A

Appendix to Chapter 2

A.1 Identification

Proof of lemmas 1 and 2:

In the following exposition, I will assume that (i) the latent density is different from zero at the minimum wage and (ii) $\Lambda'(m) \neq 0$.

Given Assumptions 3 and 4, the relationship between the observed density and the latent one can be written as:

$$f(w) = \begin{cases} \dfrac{\pi_d(w) f_0(w)}{c} & \text{if } w < m \\[4pt] \displaystyle\int^{m} \dfrac{\pi_m(w) f_0(w)}{c}\, dw & \text{if } w = m \\[4pt] \dfrac{f_0(w)}{c} & \text{if } w > m. \end{cases} \qquad \text{(A.1)}$$

Given Assumptions 2, 3 and 4, the latent share of the formal sector $\Lambda(w(0))$ is identified using the information above the minimum wage. This is true because $\Pr[S(0) = 1 \mid W(0) = w] = \Pr[S(1) = 1 \mid W(1) = w]$ when $w > m$. Then, we have:

$$\beta_0 = \arg\min_{\beta} \int_m^{\infty} \left(\Pr[S(1) = 1 \mid W(1) = u] - \Lambda(u; \beta)\right)^2 f(u \mid W(1) > m)\, du.$$

Furthermore, we have that $\Lambda(w; \beta_0) = \Pr[S(0) = 1 \mid W(0) = w]$ for all $w$.¹ Given Assumptions 1, 3 and 4, we have:

$$\pi_d(m) = \lim_{\varepsilon \to 0^+} \frac{f(m - \varepsilon)}{f(m + \varepsilon)}.$$

Moreover, regarding the derivative of the wage density, we have:

$$f'(w) = \begin{cases} \dfrac{\pi_d'(w) f_0(w)}{c} + \dfrac{\pi_d(w) f_0'(w)}{c} & \text{if } w < m \\[4pt] \dfrac{f_0'(w)}{c} & \text{if } w > m. \end{cases} \qquad \text{(A.2)}$$

¹ Note the importance of "all $w$" in this sentence. This means that once we recover $\beta_0$, we can forecast $\Pr[S(0) = 1 \mid W(0) = w]$ for values of $w$ that are below the minimum wage level. It should be clear here why non-parametric estimation of the conditional probability of sector given the wage is not an option. By assuming a parametric form, I can use the parameters to predict the latent probability of sector given the wage for values at which, in the data, this probability is equal to zero due to the minimum wage policy.

Then, it can be shown that:

$$\pi_d'(m) = \lim_{\varepsilon \to 0^+} \left(\frac{f'(m - \varepsilon)}{f'(m + \varepsilon)} - \frac{f(m - \varepsilon)}{f(m + \varepsilon)}\right) \cdot \frac{f'(m + \varepsilon)}{f(m + \varepsilon)}.$$

Because the RHS of this equation contains only objects of the observed wage distribution, this implies that $\pi_d'(m)$ is identified.
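The expression for $\pi_d'(m)$ can be checked directly from (A.1) and (A.2). What follows is only a sketch, under the added assumption that $\pi_d$ and $f_0$ are continuously differentiable in a neighbourhood of $m$. The one-sided limits $f(m^{\pm})$ and $f'(m^{\pm})$ are shorthand introduced here for $\lim_{\varepsilon \to 0^+} f(m \pm \varepsilon)$ and $\lim_{\varepsilon \to 0^+} f'(m \pm \varepsilon)$; they are not notation from the main text.

$$
\begin{aligned}
f(m^-) &= \frac{\pi_d(m) f_0(m)}{c}, & f(m^+) &= \frac{f_0(m)}{c}, \\
f'(m^-) &= \frac{\pi_d'(m) f_0(m) + \pi_d(m) f_0'(m)}{c}, & f'(m^+) &= \frac{f_0'(m)}{c},
\end{aligned}
$$

so that, using $f(m^-)/f(m^+) = \pi_d(m)$,

$$
\left(\frac{f'(m^-)}{f'(m^+)} - \frac{f(m^-)}{f(m^+)}\right) \frac{f'(m^+)}{f(m^+)}
= \frac{f'(m^-) - \pi_d(m) f'(m^+)}{f(m^+)}
= \frac{\pi_d'(m) f_0(m) / c}{f_0(m) / c}
= \pi_d'(m).
$$

The scaling constant $c$ and the latent derivative $f_0'(m)$ cancel, which is why only features of the observed wage distribution remain on the right-hand side.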
Given that the function $\Lambda(m)$ is identified, we have:

$$\pi_d^{(0)} = \pi_d(m) - \frac{\Lambda(m)}{\Lambda'(m)} \cdot \pi_d'(m)$$

$$\pi_d^{(1)} = \left[\pi_d(m) - (1 - \Lambda(m)) \cdot \pi_d^{(0)}\right] \cdot \Lambda(m)^{-1}.$$

This can be shown using the equation below and its derivative with respect to the wage:

$$\pi_d(w) = \pi_d^{(1)} \Lambda(w) + \pi_d^{(0)} (1 - \Lambda(w)).$$

Given that all terms of the equation above are identified, we have that the function $\pi_d(w)$ is identified. Inverting the relationship between the observed and latent wage densities, we have:

$$f_0(w) = \begin{cases} \dfrac{f(w) \cdot c}{\pi_d(w)} & \text{if } w < m \\[4pt] f(w) \cdot c & \text{if } w \geq m. \end{cases} \qquad \text{(A.3)}$$

Which implies:

$$c = \left[\int^{m} \frac{f(w)}{\pi_d(w)}\, dw + 1 - F(m)\right]^{-1}.$$

Because the function $\pi_d(w)$ is already identified and $F(m)$ is simply the fraction of workers in the observed wage distribution who earn less than or equal to the minimum wage, $c$ is identified. This implies the identification of the entire latent wage distribution $f_0(w)$. Using the latent wage density and the function $\Lambda(w)$ allows the identification of the latent densities of the formal and informal sectors and, finally, the remaining parameters $\pi_m^{(1)}$ and $\pi_u^{(1)}$.

$$f(W(0) = w \mid S(0) = 1) = \frac{\Pr[S(0) = 1 \mid W(0) = w] \cdot f_0(w)}{\Pr[S(0) = 1]} = \frac{\Lambda(w) \cdot f_0(w)}{\int \Lambda(u) f_0(u)\, du}$$

$$f(W(0) = w \mid S(0) = 0) = \frac{\Pr[S(0) = 0 \mid W(0) = w] \cdot f_0(w)}{\Pr[S(0) = 0]} = \frac{(1 - \Lambda(w)) \cdot f_0(w)}{\int (1 - \Lambda(u)) f_0(u)\, du}$$

$$\pi_m^{(1)} = \frac{\Pr[W(1) = m \mid S(1) = 1]}{1 - \Pr[W(1) = m \mid S(1) = 1]} \cdot \frac{1 - F_0(m \mid S(0) = 1)}{F_0(m \mid S(0) = 1)}$$

$$\pi_u^{(1)} = 1 - \pi_d^{(1)} - \pi_m^{(1)}$$

$$\pi_m^{(0)} = 1 - \pi_d^{(0)}.$$

q.e.d.

It is important to note that the identification result holds if one assumes that $\pi_m^{(1)}$ and $\pi_u^{(1)}$ are non-specified functions of the latent wage, as long as $\pi_d^{(1)}$ remains constant. In this scenario, the parameters recovered above are expectations, $E(\pi_m^{(1)})$ and $E(\pi_u^{(1)})$, over the distribution of workers whose latent wages are below the minimum wage. Formally, the parameters identified are $\pi_m^{(1)} = \Pr[W(1) = m \mid S(0) = 1, W(0) < m]$ and $\pi_u^{(1)} = \Pr[W(1) = . \mid S(0) = 1, W(0) < m]$. Under the maintained assumptions, this probability is the same for all workers regardless of their latent wage.
In the case in which workers are heterogeneous in the probability of becoming unemployed, or receiving the minimum wage, with respect to their latent wages, the model recovers the natural extension of this parameter in the presence of such heterogeneity. That is, it recovers the average effect for the population of affected workers. Interestingly, this does not imply that the latent wage distributions obtained under the assumption of constant probabilities will be inconsistent. The assumption of constant probabilities is maintained only to simplify the exposition.²

Further, it should be stressed that this proof does not require the wage distribution to peak above the minimum wage. In fact, one can identify the effects of the minimum wage regardless of where in the latent wage distribution the minimum wage happens to be set, as long as the density of wages is greater than zero at the minimum wage, $\pi_d^{(1)}$ and $\pi_d^{(0)}$ are constants, and at least one of them is greater than zero.

Proof of Corollary 4.3:

The identification of treatment effect parameters follows directly from the identification of the joint distribution of observed and latent sector and wages from i.i.d. data on $\{(W_i(1), S_i(1))\}$.

² This will hold as long as the part regarding the probability of non-compliance is correctly specified with respect to latent wages, for example, if it is constant. See more on this in Appendix 3.

A.2 Identification under Independence between Sector and Wages

In this section, I discuss the identification given the independence between (latent) sector and wages, that is, $\Pr[S(0) = 1 \mid W(0) = w] = \Lambda \ \forall\, w$. I maintain Assumptions 1 (continuity), 3 (no spillovers) and 4 (minimum wage effects). Given those assumptions, the aggregate wage density will be given by:

$$f(w) = \begin{cases} \dfrac{\pi_d f_0(w)}{c} & \text{if } w < m \\[4pt] \dfrac{\pi_m F_0(m)}{c} & \text{if } w = m \\[4pt] \dfrac{f_0(w)}{c} & \text{if } w > m. \end{cases} \qquad \text{(A.4)}$$

This is exactly the one-sector version of this model, as proposed by Doyle (2006).
This means that at least the aggregate parameters $\pi_d$, $\pi_m$ and $\pi_u$ are identified as:

$$\pi_d = \lim_{\varepsilon \to 0^+} \frac{f(m - \varepsilon)}{f(m + \varepsilon)}.$$

To identify $\pi_m$, one simply needs to verify that:

$$\pi_m = \pi_d \cdot \frac{\Pr[W(1) = m]}{\Pr[W(1) < m]}.$$

Given $\pi_d$, $F_0(m)$ can be identified by:³

$$F_0(m) = \frac{\Pr[W(1) < m]}{\pi_d \Pr[W(1) > m] + \Pr[W(1) < m]}.$$

The relationship between the aggregate data parameters $\pi_d$ and $\pi_m$ and the sector-specific model parameters can be derived as:

$$\pi_d = \Lambda \pi_d^{(1)} + (1 - \Lambda) \pi_d^{(0)}$$

$$\pi_m = \Lambda \pi_m^{(1)} + (1 - \Lambda) \pi_m^{(0)}$$

$$\pi_u = \Lambda \pi_u^{(1)}$$

$$\pi_d^{(1)} + \pi_m^{(1)} + \pi_u^{(1)} = 1$$

$$\pi_d^{(0)} + \pi_m^{(0)} = 1.$$

Having recovered the aggregate parameters, the goal is to solve for the sector-specific parameters. To do so, one first needs to identify $\Lambda$. Note:

$$\Lambda \equiv \Pr[S(0) = 1] = \Pr[S(0) = 1 \mid W(0) > m] = \Pr[S(1) = 1 \mid W(1) > m],$$

where the first equality holds because of the independence between latent sector and wages, and the second holds due to the lack of spillovers on sector probabilities. Interestingly, the identification of the latent size of the formal sector does not rely on anything but independence, the lack of spillovers, and the assumption that $\Pr[W(1) > m \mid W(0) < m] = 0$.⁴ This means that we can correctly identify the size of the formal sector even if we mis-specify the continuity of the latent distribution of wages or the way in which the minimum wage affects the lower tail of the wage distribution. Note that, given the aggregate data parameters and $\Lambda$, this is a system of five equations and five unknowns. Unfortunately, the system is rank deficient, and hence an additional equation needs to be added to recover the sector-specific parameters.

³ See the section on the identification of Doyle’s model.
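The aggregate formulas and the rank deficiency noted above can be illustrated with a small numerical sketch. This is not the estimation code used in the thesis: the function names `aggregate_parameters`, `sector_system` and `matrix_rank` are hypothetical, the one-sided density ratio at $m$ is taken as a given input rather than estimated, and the unknowns of the linear system are ordered as $(\pi_d^{(1)}, \pi_d^{(0)}, \pi_m^{(1)}, \pi_m^{(0)}, \pi_u^{(1)})$ purely for illustration.

```python
from fractions import Fraction


def aggregate_parameters(density_ratio, p_below, p_at, p_above):
    """Plug-in versions of the aggregate identification formulas.

    density_ratio: assumed estimate of lim f(m - eps) / f(m + eps), i.e. pi_d.
    p_below, p_at, p_above: observed shares with W(1) < m, W(1) = m, W(1) > m.
    """
    pi_d = density_ratio
    pi_m = pi_d * p_at / p_below
    pi_u = 1.0 - pi_d - pi_m  # the three aggregate probabilities sum to one
    f0_m = p_below / (pi_d * p_above + p_below)  # latent CDF at the minimum wage
    return pi_d, pi_m, pi_u, f0_m


def sector_system(lam):
    """Coefficient matrix of the five equations linking aggregate and
    sector-specific parameters, for a given latent formal share lam.
    Columns: (pi_d1, pi_d0, pi_m1, pi_m0, pi_u1)."""
    lam = Fraction(lam)
    return [
        [lam, 1 - lam, 0, 0, 0],  # pi_d = lam*pi_d1 + (1-lam)*pi_d0
        [0, 0, lam, 1 - lam, 0],  # pi_m = lam*pi_m1 + (1-lam)*pi_m0
        [0, 0, 0, 0, lam],        # pi_u = lam*pi_u1
        [1, 0, 1, 0, 1],          # pi_d1 + pi_m1 + pi_u1 = 1
        [0, 1, 0, 1, 0],          # pi_d0 + pi_m0 = 1
    ]


def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination on Fractions."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(len(mat)):
            if r != rank and mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    # Illustrative values only: F0(m) = 0.3, pi_d = 0.4, pi_m = 0.5, so c = 0.97.
    c = 0.4 * 0.3 + 0.5 * 0.3 + 0.7
    print(aggregate_parameters(0.4, 0.12 / c, 0.15 / c, 0.70 / c))
    print(matrix_rank(sector_system(Fraction(3, 5))))
```

In this sketch the rank falls short of five because $\Lambda$ times the fourth row plus $(1 - \Lambda)$ times the fifth row equals the sum of the first three rows, so one of the five equations carries no additional information about the unknowns.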
Relying on the identification of $\Lambda$, $\pi_u^{(1)}$ is identified by:

$$\pi_u^{(1)} = \frac{\pi_u}{\Lambda} = \frac{1 - \pi_d - \pi_m}{\Lambda}.$$

To recover $\pi_m^{(1)}$, it is necessary to consider the density of the formal sector:

$$f(w \mid S(1) = 1) = \begin{cases} 0 & \text{if } w < m \\[4pt] \dfrac{\pi_m^{(1)} F_0(m)}{c^{(1)}} & \text{if } w = m \\[4pt] \dfrac{f_0(w)}{c^{(1)}} & \text{if } w > m, \end{cases} \qquad \text{(A.5)}$$

where $c^{(1)} = 1 - F_0(m)(1 - \pi_m^{(1)})$ is a scaling factor such that the density integrates to one. The key feature of the formal sector that allows for the identification of $\pi_m^{(1)}$ is that, because the density is zero below the minimum wage, the scaling factor in the denominator is a function of only one unknown parameter (note that $F_0(m)$ is already identified). Finally, using:

$$\Pr[W(1) = m \mid S(1) = 1] = \frac{\pi_m^{(1)} F_0(m)}{c^{(1)}},$$

it is possible to show that:

$$\pi_m^{(1)} = \frac{\Pr[W(1) = m \mid S(1) = 1]}{1 - \Pr[W(1) = m \mid S(1) = 1]} \cdot \frac{1 - F_0(m)}{F_0(m)}.$$

The RHS of this equation consists only of quantities that are already identified. Given that $\pi_m^{(1)}$ is identified based on the expression above, we can now return to the system and recover all the other parameters:

$$\pi_m^{(0)} = \frac{\pi_m - \Lambda \pi_m^{(1)}}{1 - \Lambda}.$$

Thus:

$$\pi_d^{(0)} = 1 - \pi_m^{(0)}.$$

Finally:

$$\pi_d^{(1)} = 1 - \pi_m^{(1)} - \pi_u^{(1)}.$$

⁴ $\Pr[W(1) > m \mid W(0) < m] = 0$ is implied by Assumption 4. When Assumption 4 does not hold, the identification strategy described above will be valid if $\Pr[W(1) > m \mid W(0) < m] = 0$. An example of this situation is when the probability of non-compliance is a function of the worker’s latent wage. This would invalidate Assumption 4 while preserving the condition $\Pr[W(1) > m \mid W(0) < m] = 0$.

The latent wage density can be recovered in the same way as in the baseline model, that is, by inverting the relationship and using the fact that $c$ and $\pi_d$ were already identified:

$$f_0(w) = \begin{cases} \dfrac{f(w) \cdot c}{\pi_d} & \text{if } w < m \\[4pt] f(w) \cdot c & \text{if } w > m. \end{cases} \qquad \text{(A.6)}$$

This implies that we have identified the latent distribution of wages $f_0(w)$, the latent size of the formal sector $\Lambda$ and the parameters $\pi$ that govern how the minimum wage affects the economy.

Note that estimation in this context is considerably easier than in the baseline model.
This is the case because it is not necessary to estimate the derivative of the density of wages at $m$ to solve for the sector-specific parameters. All objects in the identifying equations, except for the one-sided limits of the density of wages at $m$, can be estimated by replacing the population object with its respective sample counterpart. I used this plug-in method to estimate the parameters of the model in the empirical application.

A.3 Estimation

A.3.1 Local Linear Density Estimation

In this section, I describe the local linear approach to density estimation. A standard approach to non-parametrically estimate densities at boundary points is to use a local linear density estimator. This estimator builds on the idea of local linear conditional mean estimators. It begins by dividing the support of the density into a set of bins. Thereafter, a "response variable" is defined as the bin counts of these disjoint intervals. After this process, one is left with a vector containing the "independent variable," which is the bin center, and a corresponding "dependent variable," the bin counts. Finally, standard local polynomial smoothing estimates are applied to these constructed variables. The discussion and notation here will follow the approach advocated by McCrary (2008) in the context of testing for manipulation in RD designs. To begin, define $g(w_i)$ as the discretized version of the wage support for a bin size equal to $b$:

$$g(w) = \begin{cases} \left\lfloor \dfrac{w - m}{b} \right\rfloor b + \dfrac{b}{2} + m & \text{if } w \neq m \\ m & \text{if } w = m, \end{cases}$$

where $\lfloor a \rfloor$ is the greatest integer in $a$ [5]. Clearly, it holds that $g(w) \in \chi \equiv \{\ldots, m - 5\tfrac{b}{2}, m - 3\tfrac{b}{2}, m - \tfrac{b}{2}, m, m + \tfrac{b}{2}, m + 3\tfrac{b}{2}, m + 5\tfrac{b}{2}, \ldots\}$. I will call the $j$th element of this set $X_j$ [6]. Define the normalized cell size for the $j$th bin as $Y_j = \frac{1}{Nb} \sum_{i=1}^{N} \mathbb{1}\{g(W_i) = X_j\}$.

[5] As discussed by McCrary, the greatest integer in $a$ is the unique integer $Q$ such that $Q \le a < Q + 1$ (round to the left). In software, this is known as the "floor" function.

Let $K(\cdot)$
be a symmetric kernel function satisfying the usual properties and let $h$ be a bandwidth satisfying the conditions $h \to 0$, $nh \to \infty$, $(nh)^{1/2} h^2 \to 0$ and $b/h \to 0$. Then, the local linear estimators of the density and its derivative are defined, for $w \neq m$, as:

$$\begin{bmatrix} \hat f(w) \\ \hat f'(w) \end{bmatrix} = \arg\min_{(a_0, a_1)'} \sum_{j=1}^{J} \left(Y_j - a_0 - a_1 (X_j - w)\right)^2 K\!\left(\frac{X_j - w}{h}\right) \left(\mathbb{1}\{X_j > m\}\mathbb{1}\{w > m\} + \mathbb{1}\{X_j < m\}\mathbb{1}\{w < m\}\right).$$

A.4 Robustness

This section demonstrates how generalizable the inferences based on the model are when the model is misspecified or when we fail to obtain data on other determinants of the joint distribution of sector and wages. I claim that (i) the model is still correctly specified if the unobserved heterogeneity affects either the model parameters or the latent wage distribution, but not both, (ii) the model correctly identifies the desired features of the data when the model parameters are allowed to vary across latent wages and individuals under some restrictions on the unobserved heterogeneity, (iii) the odds ratio of clustering at the minimum wage ($\pi_m$) versus non-compliance ($\pi_d$), and the latent share of the formal sector, are correctly identified even when unemployment effects cannot be, and (iv) the aggregate parameters $\pi_d$, $\pi_m$ and $\pi_u$ are correctly identified even when Assumption 2 does not hold or when unemployment effects are also present in the informal sector.

To show (i), I reformulate the model and allow its parameters or distributions to be functions of potentially unobservable worker characteristics. I show that under some conditions, the assumptions I require for the baseline model to hold will still be valid. To show (ii), I reformulate the model under a random coefficients framework. I show that under reasonable conditions for the heterogeneity of the parameters across individuals, the estimands based on the baseline model identify the expectation of the distribution of parameters over the set of workers affected by the minimum wage.
To show (iii), I prove that a lack of continuity implies the inconsistency of some, but not all, of the parameters of interest in the model. To show (iv), I recall that identification of Doyle's aggregate parameters does not rely on all four assumptions that I use to identify the baseline model.

These results reveal an important feature of the baseline model. It is feasible to infer the direction in which the parameter estimates will move when some of the model's assumptions are violated. Moreover, as the identification is achieved using "separable" pieces (a model for the conditional distribution of sector given the wages, continuity of the latent wage distribution to identify $\pi_d$, and so forth), some of the results will still hold when the model is partially misspecified. Taken together, these features should increase the credibility of the results when there are some concerns with the correctness of the model specification. Some pieces of information based on this approach can be useful even in the worst case scenario in which the model is guaranteed to be inconsistent for some parameters.

[6] As discussed in McCrary (2008), the endpoints $X_1$ and $X_J$ may always be chosen arbitrarily small (large) such that all points in the support of the distribution of wages are inside one of the bins.

A.4.1 Role of Covariates and Unobserved Heterogeneity

By exploiting the different effects of the minimum wage across sectors and the discontinuity of the density of wages around the minimum, one can estimate how the economy responds to this policy. This approach has some similarities to the quasi-experimental Regression Discontinuity Designs.
Because one of the main advantages of Regression Discontinuity Designs is that they provide a way to avoid most of the endogeneity concerns associated with using observational data to infer causality, it is useful to discuss the extent to which these advantages are also present in this method.

Assume that there is a random variable $Z$ (say, for example, age) that is known to affect individual labor market conditions. One example is when workers with different values of $Z$ draw from different latent wage distributions. Another way that $Z$ can affect a worker's labor market conditions is through the model parameters. For example, after the introduction of the minimum wage, younger workers might be more likely to move into the informal sector than older workers, which, in the model, would be represented by a higher $\pi_d^{(1)}$. In these cases, is it necessary to estimate the model conditional on $Z$ for the inferences to be valid?

In the following discussion, I will always assume continuity of the $Z$-specific latent wage distribution, an absence of spillovers and a covariate-specific version of the assumption that describes the minimum wage effects. I will also assume the following:

Assumption 11. Conditional probability of latent sector given the wage:

$$\Pr[S(0) = 1 \mid W(0) = w] \equiv \int \Pr[S(0) = 1 \mid W(0) = w, Z = z]\, f(z \mid w)\, dz = \Lambda(w; \beta).$$

This assumption simply means that whatever the model for the conditional probability of the sector given the wage and $Z$ is, this model can be aggregated to an unconditional one with parameters $\beta$ [7]. Two sufficient conditions for the inferences based on the unconditional wage distribution to be valid in the presence of covariates are the following:

[7] In general, this model will be more complex than the covariate-specific one. A simple, sufficient, but clearly not necessary, condition to guarantee that such a model will exist is when the assumption is strengthened to $\Pr[S(0) = 1 \mid W(0) = w, Z = z] = \Lambda(w, \beta)$, that is, covariates only enter the conditional probability of sector given the wage through their effects on wages.

Case 1:

Assumption 12. Equality of parameters

$$\pi(z) = \pi \quad \forall z.$$

When the effect of $Z$ occurs through changes in the latent joint distribution of sector and wages but not through differential responses to the minimum wage, then $Z$ can be safely ignored when making inferences with regard to the unconditional distribution. The reason for this result is simple. The assumptions above imply that all assumptions of the model for the aggregated data remain valid.

Case 2:

Assumption 13. Equality of latent distributions

$$W(0) \mid Z \sim F.$$

This assumption means that $\Pr[W(0) < w \mid Z = z] = \Pr[W(0) < w \mid Z = z']$ for all $(z, z')$ and all $w$. By restricting the latent wage distribution to be the same for all values of $z$, inference based on the unconditional distribution ignoring the covariate will be valid when parameters are allowed to vary over $Z$. The parameters $\pi$ recovered from the aggregate data will be weighted averages of the covariate-specific ones, with correct weights to reflect the share of each group of values of $Z$ in the population. These, of course, are much stronger conditions than those in Case 1, as the role of covariates is severely limited when they are only allowed to determine wages through the differences in minimum wage effects.

When both the latent wage distribution and the parameters are allowed to vary over $Z$, the estimate of $\pi_d$ can be interpreted as a local effect, as it recovers the likelihood of non-compliance for those with latent wages around the minimum wage.
Preliminary results from simulations show that an unreasonably large degree of heterogeneity in both the latent distributions and the model parameters is necessary for the inference based on the unconditional distribution to show sizable distortions.

The relevance of these results is quite small if the wage determinants are observable, as the model can be easily estimated conditional on these variables. If the estimation is performed while conditioning on the covariates, one need not be concerned with the cases above, meaning that the model parameters and latent joint distributions can be different for different values of $Z$. However, the situation differs when not all wage determinants are observable. Failure to observe wage determinants is a major source of bias in inferences based on regression models. In this design, this is not the case, as long as the model parameters remain constant over the distribution of the variables that are ignored, which seems to be a much easier condition to satisfy than the zero correlation usually assumed in regression models. In this sense, this research design shares most of the characteristics of Regression Discontinuity Designs, overcoming the difficulties in assessing causal effects based on observational data due to endogeneity concerns. The reason for this is that the identification does not rely on variation in the minimum wage to assess the policy's impact. Instead, identification relies on the sharp contrast between the effect of the minimum wage across individuals whose wages would fall on each side of it. Thus, concerns with omitted variable biases should be much more limited.

A.4.2 Random Coefficients

In the model, a worker is characterized by a pair $(W_i, S_i)$ of observed sector and wage, a vector $(W_i(0), S_i(0), \zeta_i)$, and a vector $\pi(\zeta) \equiv (\pi_d^{(1)}(\zeta), \pi_m^{(1)}(\zeta), \pi_u^{(1)}(\zeta), \pi_d^{(0)}(\zeta), \pi_m^{(0)}(\zeta))$, which is now $\zeta$-specific. One way to interpret this is that we are treating $\zeta$ as the worker's unobserved type.
For now, I will not assume anything regarding the relationship between the worker's type and his latent wages. Of course, it still holds that $\pi_m^{(0)}(\zeta) + \pi_d^{(0)}(\zeta) = 1$ for all $\zeta$, and similarly for the formal sector parameters. This means that, in addition to the worker's latent sector and wages, he receives a draw for the model parameters. Here, I also allow this draw to be a function of the latent wage. Thus, for example, workers with higher latent wages can have a higher probability of receiving the minimum wage versus becoming unemployed. This extension captures the idea that (i) minimum wage effects might vary across dimensions of workers' characteristics that are unobservable to the researcher and (ii) minimum wage effects can, and likely will, vary across workers with respect to the distance of their latent productivities from the minimum wage level. The rest of the model remains the same, meaning that I will retain continuity and the absence of spillovers. I will also assume independence between latent sector and wages for simplicity in the rest of this discussion.

This extension adds a great degree of flexibility to the model. It relaxes Assumption 4 in two ways. It allows different workers with similar wages to have different minimum wage response probabilities in an unknown and unspecified way. It also allows workers in the formal sector to have different probabilities of becoming unemployed ($\pi_u^{(1)}$) versus truncating at the minimum wage ($\pi_m^{(1)}$) for different values of the latent wage. Importantly, this can be achieved without relying on any specified functional form; that is, it is not assumed that these probabilities vary over latent wages in any parametric, continuous or known way.

To analyze the model, we now need to define some new objects.
Let:

$$E(\pi_m^{(1)}(\zeta) \mid w) = \int \pi_m^{(1)}(u)\, f_{\zeta \mid w}(u)\, du.$$

This expression defines the "average probability of truncation at the minimum wage for a formal sector worker with latent wage equal to $w$" as the integral of this probability for each worker's unobserved type weighted by the proportion of each type for that wage value. We can analogously define similar objects for the other probabilities.

Now, under this new set of assumptions, the relationship between latent and observed densities will be given by:

$$f(w) = \begin{cases} \dfrac{E(\pi_d(\zeta) \mid W(0) = w)\, f_0(w)}{c} & \text{if } w < m \\ \dfrac{\int_{-\infty}^{m} E(\pi_m(\zeta) \mid W(0) = u)\, f_0(u)\, du}{c} & \text{if } w = m \\ \dfrac{f_0(w)}{c} & \text{if } w > m. \end{cases}$$

Now, let us consider the behavior of the estimands defined for the baseline model when they are used under this more general version:

$$\pi_d \equiv \lim_{\varepsilon \to 0} \frac{f(m - \varepsilon)}{f(m + \varepsilon)}.$$

It is easy to see that:

$$\lim_{\varepsilon \to 0} \frac{f(m - \varepsilon)}{f(m + \varepsilon)} = \Lambda\, E(\pi_d^{(1)}(\zeta) \mid W(0) = m, S(0) = 1) + (1 - \Lambda)\, E(\pi_d^{(0)}(\zeta) \mid W(0) = m, S(0) = 0).$$

Now, it is also easy to see that this estimand will converge to the number that we need if $E(\pi_d^{(1)}(\zeta) \mid W(0) = w, S(0) = 1) = E(\pi_d^{(1)}(\zeta) \mid W(0) = w', S(0) = 1)$ and $E(\pi_d^{(0)}(\zeta) \mid W(0) = w, S(0) = 0) = E(\pi_d^{(0)}(\zeta) \mid W(0) = w', S(0) = 0)$. This means that the only restriction on the relationship between the types and latent wages is that the expectation of the non-compliance probabilities (taken with respect to the type distribution) is not a function of the wage [8].

Assuming that this condition holds, we have that our baseline estimand $\lim_{\varepsilon \to 0} f(m - \varepsilon)/f(m + \varepsilon)$ identifies the expected value of $\pi_d$ over the population of affected individuals. That is:

$$\lim_{\varepsilon \to 0} \frac{f(m - \varepsilon)}{f(m + \varepsilon)} = \Pr[W(1) = W(0) \mid W(0) < m].$$

Regarding the estimand of $\pi_m$:

$$\pi_m \equiv \pi_d \frac{\Pr[W_i = m]}{\Pr[W_i < m]}.$$

It can be shown that:

$$\pi_d \frac{\Pr[W_i = m]}{\Pr[W_i < m]} = \frac{\int_{-\infty}^{m} E(\pi_m(\zeta) \mid W(0) = u)\, f_0(u)\, du}{\int_{-\infty}^{m} f_0(u)\, du} = \Pr[W(1) = m \mid W(0) < m],$$

which means that $\pi_m$ converges to the expectation of the parameter over the population of affected workers.
The intuition for this result is that the estimand of $\pi_m$ comes from the point mass at the minimum wage level, which is obtained by integrating the probability of "clustering" at the minimum wage level for all workers whose latent wages are below the minimum wage level. Thus, irrespective of what functional form exists between the latent wage and the probability of receiving the minimum wage, this form simply reveals itself in the data in the form of the proportion of workers at the minimum wage level. The mass of wages at the minimum wage level has already "integrated out" the unobserved heterogeneity. This allows us to consider estimating $\Pr[W(1) = m \mid W(0) < m]$ without completely describing the shape of $\pi_m^{(1)}(\zeta)$ as a function of $W(0)$. The term $\Pr[W(1) = m \mid W(0) < m, S(0) = 1]$ coincides with the parameter $\pi_m^{(1)}$ as defined in the baseline model when $\pi_m^{(1)}$ is not a function of the latent wage. When $\pi_m^{(1)}$ is indeed a function of the latent wage (through unobserved types, for example), we can bypass the task of modeling this function and directly identify the aggregate component $\Pr[W(1) = m \mid W(0) < m, S(0) = 1]$. Similar calculations show that the same is the case with respect to the estimand of unemployment:

$$1 - \pi_d - \pi_m = \Pr[W(1) = \,.\, \mid W(0) < m],$$

where $W(1) = \,.$ denotes a missing wage, that is, unemployment.

[8] This does not mean that the model is unidentified if this condition fails to hold. It means that in this case, we would need to rely on the derivatives of the wage density to identify the slope of the relationship between expected minimum wage probabilities and latent wages. This can be achieved in the same way as discussed in the testing section.

Finally, it can be shown that the estimates for the implied treatment effects and latent densities will also converge to the correct values.
This is a somewhat remarkable result, as it can be tempting to say that the way in which one understands the relationship between the heterogeneity in parameters and latent wages will necessarily determine the estimated latent densities that, in turn, will drive the results for the treatment effects. This result shows that this intuition is incorrect. As long as the part that concerns the likelihood of non-compliance ($\pi_d^{(0)}$, $\pi_d^{(1)}$) is reasonably specified, which can be achieved in a very flexible way by utilizing higher order derivatives of the wage density, all the results will hold. This result will hold even if unemployment or truncation at the minimum wage happens to have an unknown pattern that varies across individuals, unobservable characteristics, or latent wages [9].

[9] Of course, the marginal effects estimates will break down, as one needs to know not only the average probability of truncating at the minimum wage and unemployment but, more importantly, also the marginal probability to identify the effect of changes in the minimum wage level. The marginal probabilities will only be recovered if they either coincide with the average probability, as in the baseline version of the model, or if they have a known or estimable functional form.

A.4.3 Lack of Continuity

In the following discussion, I will assume independence between latent sector and wages. Now suppose that $\pi_d$ is not identified. This could be the case for two reasons. The first case is when the latent wage distribution is not continuous. In this case, the estimate of $\pi_d$ actually identifies $\pi_d \kappa$, where $\kappa$ is the (unknown) size of the discontinuity of the latent wage density around the minimum wage. It is clear that as long as $\kappa = 1$, the estimate of $\pi_d$ will be consistent. The second case is when spillovers are misspecified. For example, if one incorrectly assumes that spillovers are absent, when in fact they are present and reduce the density of wages just above the minimum wage [10], $\pi_d$ is misspecified because the density of wages observed just above the minimum wage is not the correct quantity to scale the density below to measure the extent of non-compliance.

However, by using this identification strategy, one can compute bounds for the extent of non-compliance. For example, if spillovers are assumed to be weakly positive, meaning that workers above the minimum do not suffer wage cuts following the policy, then $\pi_d$ estimated when ignoring spillovers represents an upper bound of the likelihood of non-compliance. This will also provide an upper bound for $\pi_m$ and a corresponding lower bound for $\pi_u$. Importantly, this means that the sizable unemployment effects found in the application cannot be explained by having a misspecified model for spillovers, as unemployment effects will necessarily be magnified in the presence of spillovers.

Interestingly, some features of the minimum wage effects can still be correctly identified in this scenario. It is straightforward to see that the odds ratio of truncation versus non-compliance will still be correctly identified regardless of the lack of continuity in the latent wage distribution or misspecification of spillovers. Moreover, those quantities will be meaningful even if the correct specification for the minimum wage effects would need a more flexible form for the parameters (by making them vary across individuals or latent wages, for example). In general, the statistic $\frac{\Pr[W(1) = m]}{\Pr[W(1) < m]}$ will always identify $\frac{\Pr[W(1) = m \mid W(0) < m]}{\Pr[W(1) < m \mid W(0) < m]}$, which is the ratio of the expected likelihood of truncating at the minimum wage versus not complying with the minimum wage for those directly affected by the policy [11].
Interestingly, note that the same does not hold for the ratio of unemployment to either non-compliance or truncation.

It should also be stressed that under independence between sector and wages, the latent share of the formal sector, which is perhaps one of the most relevant parameters of the model, is still identified regardless of misspecifying how the minimum wage affects the lower part of the wage distribution or a lack of continuity in the latent wage distribution [12].

[10] This will be the case if one assumes that spillovers are weakly positive.

[11] The conditions needed for this result are weak, namely, the lack of point mass at the minimum wage level in the latent wage distribution and a "no-crossing condition" that rules out workers higher up in the wage distribution receiving the minimum wage or less in the presence of the policy.

[12] The only additional assumption needed for identifying this parameter is a lack of spillovers on sector probabilities. See the section on identification of the restricted version of the model.

A.4.4 Aggregate Parameters

Doyle (2006) and Meyer and Wise (1983) define what I call "aggregated data" probabilities $\pi_d$, $\pi_m$ and $\pi_u$. I call them aggregated because they are a weighted average of the corresponding sector-specific likelihoods of non-compliance, truncating and becoming unemployed. Because their goal is to compute aggregate data parameters, they do not need to have a correctly specified form for the conditional probability of the sector given the wage.

The identification of the simplified version of the model here uses Doyle's estimate as a first step. Then, the weights of the sector-specific probabilities are estimated, and finally, one can solve for the sector-specific parameters. This is a worthwhile exercise because, as I have shown above, a broader set of counterfactual analyses, such as those concerning labor taxes and the size of the formal sector, can be performed with sector-specific parameters.
Moreover, $\pi_d^{(1)}$ is a parameter with more economic meaning than $\pi_d$ itself.

Importantly, misspecification of either sector-specific assumptions or the form of the joint distribution of sector and wages has different consequences for the aggregate parameters when compared to the sector-specific parameters. Two cases can illustrate this: if either (a) unemployment effects on the informal sector are present, which is ruled out by Assumption 4, or (b) the model for the conditional probability of the sector given the wages is incorrectly specified, then the sector-specific parameter estimators will be inconsistent. However, the aggregate ones will not be. It is straightforward to see this because neither Doyle nor Meyer and Wise use this information.

The results concerning the misspecification of the minimum wage effects, the joint distribution of latent sector and wages, spillovers or unknown heterogeneity of parameters point in the same direction. They show that the quantities obtained by the identification of the baseline model remain informative when some of the model's assumptions are incorrectly specified.

A.4.5 Robust Estimates of the Effects of the Minimum Wage on the Size of the Informal Sector

This paper develops a model that allows one to estimate the effects of the minimum wage on a broad range of policy-relevant outcomes. Importantly, the model captures a channel through which workers move from the formal sector to the informal sector in response to the minimum wage policy. The effects of the minimum wage on the size of the informal sector have important policy implications. This parameter is key to understanding the effects of the minimum wage on the government budget, for example.

Under the assumptions of the model, this parameter, the effect of the minimum wage on the size of the informal sector, can be consistently estimated. This section discusses the extent to which those estimates are robust to deviations from these assumptions, in particular the absence of spillovers.
This will be achieved using Assumption 5, the independence between latent sector and wages. The object of interest is $\frac{\Pr[S(1) = 1]}{\Pr[S(0) = 1]}$, that is, the ratio of the size of the formal sector in the presence of the minimum wage versus its size in the absence of the minimum wage. The numerator of this fraction can be directly estimated from the data. The counterfactual object is the latent size of the formal sector. Under independence between latent sector and wages, we have:

$$\Pr[S(0) = 1] = \Pr[S(0) = 1 \mid W(0) > m] = \Pr[S(1) = 1 \mid W(1) > m].$$

This expression uses the size of the formal sector above the minimum wage as the estimate of the latent size of the formal sector in the absence of the policy. Interestingly, this result does not rely on the continuity of the latent wage distribution or the correctness of the specification of the minimum wage effects on the bottom part of the wage distribution. It relies on independence, the lack of spillovers, and $\Pr[W(1) > m \mid W(0) < m] = 0$. To evaluate the robustness of this estimate to departures from the absence of spillovers, one simply needs to specify a limit at which the spillovers should vanish. In the most extreme version of this assumption, the effects of the minimum wage vanish at the minimum wage level. However, one can specify that the minimum wage effects vanish at twice, or in general, $k$ times the minimum wage level. This leads to the following identification equation:

$$\Pr[S(0) = 1] = \Pr[S(0) = 1 \mid W(0) > km] = \Pr[S(1) = 1 \mid W(1) > km],$$

where $k$ is a number greater than or equal to one. The first equality follows from independence between sector and wages, whereas the second follows from the absence of spillovers at points higher than $km$ [13]. Table A.1 reports the effects of the minimum wage on the size of the formal sector based on different assumptions concerning where spillovers should vanish. The baseline estimates are approximately 10%. The estimates robust to spillovers find an effect of around 12 to 16%.
The point estimates are significantly different. The qualitative conclusions, however, remain similar. The minimum wage has a sizable impact on the size of the formal sector. This section shows that those effects should be further magnified if spillovers are indeed present. These results are based on the minimal assumptions of independence and lack of spillovers in the upper part of the wage distribution. They are robust to limited spillovers, a lack of continuity in the latent wage distribution and misspecification of the minimum wage effects on the lower part of the wage distribution.

[13] It is interesting to note that one can also add spillovers on wages above this threshold. The only restriction that needs to be imposed for this identification to be effective is the absence of spillovers on sector probabilities. That is, workers do not move across sectors in response to the minimum wage if their latent wages are above $km$, and $\Pr[W(1) > km \mid W(0) < km] = 0$.

Table A.1: Robust Estimates of the Effects of the Minimum Wage on the Size of the Formal Sector

Year    W > 2m              W > 3m
2001    0.867*** (0.002)    0.853*** (0.003)
2002    0.853*** (0.003)    0.844*** (0.004)
2003    0.865*** (0.003)    0.849*** (0.004)
2004    0.847*** (0.003)    0.845*** (0.003)
2005    0.860*** (0.003)    0.866*** (0.004)
2006    0.858*** (0.003)    0.850*** (0.004)
2007    0.879*** (0.002)    0.872*** (0.004)
2008    0.873*** (0.002)    0.873*** (0.003)
2009    0.883*** (0.002)    0.878*** (0.003)

A.5 Tax Effects of the Minimum Wage Under Alternative Assumptions

To provide an idea of the importance of the unemployment effects on the matter at hand, I will also compute the effects of the minimum wage on taxes based on a different model. In this version, I will force the unemployment effects to be equal to zero. By doing so, I no longer need to assume the continuity of the latent wage distribution. Formally, the model operates as follows. I will retain Assumptions 6 (independence) and 3 (no spillovers). Assumption 4 will be modified to force $\pi_u = 0$:

Assumption 14. No Unemployment Effects

Under the minimum wage, a fraction $\pi_d$ of workers will earn the same wage as in the latent wage distribution. The remaining fraction will earn the minimum wage. These fractions can be sector-specific as in the baseline model.

Table A.2: Labor Tax effects under a "No Unemployment" assumption

Note that there is no Assumption 1 (continuity) in this case. Under these assumptions, the observed wage density will relate to the latent density by the following equation:

$$f(w) = \begin{cases} \pi_d f_0(w) & \text{if } w < m \\ (1 - \pi_d) F_0(m) & \text{if } w = m \\ f_0(w) & \text{if } w > m, \end{cases}$$

where $f_0(w)$ is the latent wage distribution based on this different set of assumptions. In this case, we only need to estimate $\pi_d$. One way to do so is by recognizing that in this case:

$$\pi_d = \frac{\Pr[W < m]}{\Pr[W < m] + \Pr[W = m]}.$$

Therefore, a consistent estimator can be constructed by plugging in the maximum likelihood estimators of the respective quantities. With an estimate of $\pi_d$, the latent wage density can be easily estimated by properly reweighting the observed wage density. Then, the tax effects of the minimum wage can be computed under the "no unemployment" assumption, given by:

$$R \equiv \frac{T(1)}{T(0)} = \frac{\Pr[S(1) = 1]}{\Pr[S(0) = 1]} \cdot \frac{E(\tau(W(1)) W(1) \mid S(1) = 1)}{E(\tau(W(0)) W(0) \mid S(0) = 1)}.$$

This is exactly the same expression as before without the unemployment component $c$. Importantly, the expected wages under the latent distribution also change, as the estimate of the latent distribution is different under this different set of assumptions.
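The plug-in estimator of $\pi_d$ and the reweighting step under the "no unemployment" assumption can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `no_unemployment_reweight` is hypothetical, and the weighting scheme (weight $1/\pi_d$ below $m$, zero at $m$, one above $m$) is my reading of "properly reweighting," obtained by inverting the displayed relation between $f$ and $f_0$.

```python
from typing import List, Sequence, Tuple


def no_unemployment_reweight(wages: Sequence[float],
                             m: float) -> Tuple[float, List[float], float]:
    """Estimate pi_d with pi_u forced to zero and reweight toward f_0.

    Returns (pi_d, weights, latent_mean), where latent_mean is the
    weighted mean wage under the implied latent distribution.
    """
    n_below = sum(w < m for w in wages)
    n_at = sum(w == m for w in wages)
    if n_below == 0:
        raise ValueError("no observations below the minimum wage")

    # pi_d = Pr[W < m] / (Pr[W < m] + Pr[W = m])
    pi_d = n_below / (n_below + n_at)

    # Invert f(w) = pi_d * f0(w) below m and f(w) = f0(w) above m.
    # The mass point at m is latent mass from below m, already restored
    # by the 1 / pi_d weights, so it gets weight zero (assumption).
    weights = [1.0 / pi_d if w < m else (0.0 if w == m else 1.0)
               for w in wages]

    total = sum(weights)
    latent_mean = sum(wt * w for wt, w in zip(weights, wages)) / total
    return pi_d, weights, latent_mean
```

Under the model the weights sum to the sample size in expectation, so dividing by their total simply normalizes the reweighted distribution.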
Table A.2 reports the estimates of $R$ for the years from 2001 to 2009. The estimates under the assumption of no unemployment indicate that the minimum wage has a sufficiently strong effect on average wages to compensate for the reduction in the share of the formal sector due to sector transition. Moreover, note that, for the same data, the implied effect of the minimum wage on the average wages of those employed is, as expected, smaller when one assumes the absence of unemployment effects.

Appendix B

Appendix to Chapter 3

B.1 Estimating Spillovers Under Parametric Assumptions

B.1.1 Introduction

Chapter 3 discusses the estimation of the effects of the minimum wage in a Dual Economy characterized by a large informal sector. The model allows for the minimum wage to affect the joint distribution of sector and wages in a variety of ways. It allows for the standard employment effects. It allows the minimum wage to affect the shape of the lower part of the wage distribution. It also allows the minimum wage to induce endogenous movements from the formal to the informal sector. Despite all these features, the model is restrictive in the way it treats workers in the upper part of the wage distribution, since it assumes away spillover effects of the minimum wage. This appendix shows that spillovers can in principle be incorporated in the parametric form of the model. In the following discussion I will keep Assumptions 7, 8 and 10. Assumption 9 will be replaced by Assumption 15, as described below.

Assumption 15. Spillover Effects (1):

$$W(0) > m \implies \begin{bmatrix} W(1) \\ S(1) \end{bmatrix} = \begin{bmatrix} \nu(W(0); \zeta, \kappa) \\ \Lambda(\nu(W(0); \zeta, \kappa), \beta) \end{bmatrix}$$

Assumption 16. Spillover Effects (2):

$$W(0) > m \implies \begin{bmatrix} W(1) \\ S(1) \end{bmatrix} = \begin{bmatrix} \nu(W(0); \zeta, \kappa) \\ S(0) \end{bmatrix}$$

where $\nu : [m, \infty) \to [m, \infty)$ is a spillover function known up to the parameters $\zeta$ and $\kappa$.
An example of this kind of function is a Location Scale Shift:

$$w = \nu(W(0)) \equiv \begin{cases} \zeta + \kappa W(0) & \text{with probability } p \\ W(0) & \text{with probability } (1 - p). \end{cases}$$

B.1.2 Identification and Estimation

If an indicator of spillover were available, identifying the effects of the minimum wage on the upper part of the wage distribution would be a trivial task. Nevertheless, it is helpful to remember how this task would be performed in this scenario. Suppose the econometrician observes a random variable $p_i$, on top of $(w_i, s_i)$, where $p_i$ is an indicator function that is equal to 1 when $W(0) > m$ and $W(1) \neq W(0)$. That is, $p_i$ indicates whether the individual "spilled" up in the wage distribution. Let $N_1$ ($N_0$) be the number of individuals for which $p_i = 1$ ($p_i = 0$). Then it is straightforward to construct an estimator of $(\kappa, \zeta)$ using only the information of the upper part of the wage distribution:

$$\hat\kappa = \left( \frac{N_0 \sum_i \mathbb{1}\{w_i > m\}\, p_i\, (w_i - \bar w_{p_i = 1})^2}{N_1 \sum_i \mathbb{1}\{w_i > m\}\, (1 - p_i)\, (w_i - \bar w_{p_i = 0})^2} \right)^{1/2}$$

$$\hat\zeta = \frac{\sum_i \mathbb{1}\{w_i > m\}\, w_i p_i}{N_1} - \hat\kappa\, \frac{\sum_i \mathbb{1}\{w_i > m\}\, w_i (1 - p_i)}{N_0}$$

Also, an estimator of the probability of spillovers would be given by:

$$\hat p = \frac{\sum_{i=1}^{N} p_i}{N}$$

These estimators are consistent for the parameters of the spillover function. This can be seen from the fact that they estimate the difference in location and scale of the distribution of workers that spilled up when compared with the ones that were not affected by the policy.

Unfortunately, $p_i$ is not observable. For each worker observed in the upper part of the wage distribution it is not possible to tell if his wage in the latent wage distribution would be the same as the observed one or if he was affected by the spillovers. Because of that, a more intricate approach must be taken. However, by relying on the parametric assumptions, the problem becomes tractable. Let us begin by writing the log likelihood of the parameters, given i.i.d. data on the vector $(w_i, s_i, p_i)$, $w_i > 0\ \forall i$. To derive the formulas for the likelihood I will assume latent wages are normally distributed [1].
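The joint distribution $f(w_i, s_i, p_i \mid \Theta)$ can be decomposed into the following components:
\[ f(w_i, s_i, p_i \mid \Theta) = \Pr(s_i \mid w_i, p_i; \Theta)\, f(w_i \mid p_i; \Theta)\, j(p_i \mid \Theta) \]
where $\Theta \equiv (\theta', \beta', \pi_d^{(1)}, \pi_m^{(1)}, \pi_u^{(1)}, \pi_d^{(0)}, \pi_m^{(0)}, \zeta, \kappa, p)$. $\Pr(s_i \mid w_i, p_i; \Theta)$ is the conditional distribution of the observed sector given the wage. The function $f(w_i \mid p_i; \Theta)$ is the distribution of observed wages ($W(1)$) given $p_i$ and, lastly, $j(p_i \mid \Theta)$ is the marginal distribution of $p_i$, which can be modeled as a Bernoulli random variable since $p_i$ is binary. Similar expressions hold for parameterizations of latent wages other than the normal.

\[
\begin{aligned}
L(\Theta \mid w_i, s_i, p_i) &= \prod_{i=1}^{N} f(w_i, s_i, p_i \mid \Theta) \\
\log L(\Theta \mid w_i, s_i, p_i) &= \sum_{i=1}^{N} \log f(w_i, s_i, p_i \mid \Theta) \\
&= \sum_{i=1}^{N} \log \left[ \Pr(s_i \mid w_i, p_i; \Theta)\, f(w_i \mid p_i; \Theta)\, j(p_i \mid \Theta) \right] \\
&= \sum_{i=1}^{N} \log \Pr(s_i \mid w_i, p_i; \Theta) + \sum_{i=1}^{N} \log f(w_i \mid p_i; \Theta) + \sum_{i=1}^{N} \log j(p_i \mid \Theta)
\end{aligned}
\]

Finally, let:
\[ A \equiv \sum_{i=1}^{N} \log \Pr(s_i \mid w_i, p_i; \Theta), \qquad B \equiv \sum_{i=1}^{N} \log f(w_i \mid p_i; \Theta), \qquad C \equiv \sum_{i=1}^{N} \log j(p_i \mid \Theta) \]
so that
\[ \log L(\Theta \mid w_i, s_i, p_i) = A + B + C. \]

It follows from the assumptions above (using Assumption 15, so that the expression becomes simpler; a similar expression can be derived when Assumption 16 is used instead) that:
\[ A = \sum_{i:\, w_i > m} \left\{ s_i \log \Lambda(w_i; \beta) + (1 - s_i) \log\left(1 - \Lambda(w_i; \beta)\right) \right\}. \]
If the latent distribution is assumed to be normal:
\[
\begin{aligned}
B \equiv{} & \sum_{i:\, w_i < m} \log \left[ \frac{1}{c(\Theta)} \left( \pi_d^{(1)} \Lambda(w_i; \beta) + \pi_d^{(0)} \left(1 - \Lambda(w_i; \beta)\right) \right) \frac{1}{\sigma} \phi\!\left( \frac{w_i - \mu}{\sigma} \right) \right] \\
& + \sum_{i:\, w_i = m} \log \left[ \frac{1}{c(\Theta)} \int \left( \pi_m^{(1)} \Lambda(w; \beta) + \pi_m^{(0)} \left(1 - \Lambda(w; \beta)\right) \right) \frac{1}{\sigma} \phi\!\left( \frac{w - \mu}{\sigma} \right) dw \right] \\
& + \sum_{i:\, w_i > m} \log \left[ \frac{1}{c(\Theta)} \left( (1 - p_i) \frac{1}{\sigma} \phi\!\left( \frac{w_i - \mu}{\sigma} \right) + p_i \frac{1}{\kappa\sigma} \phi\!\left( \frac{w_i - (\mu + \delta)}{\kappa\sigma} \right) \right) \right]
\end{aligned}
\]
Here $\delta$ is the shift in the mean induced by the spillover; under the location-scale shift above, $\mu + \delta = \zeta + \kappa\mu$. The rescaling factor $c(\Theta)$ ensures that the observed density integrates to one:
\[ c(\Theta) = 1 - \int_{-\infty}^{m} \left\{ \left(1 - \pi_d^{(1)} - \pi_m^{(1)}\right) \Lambda(w; \beta) \right\} \frac{1}{\sigma} \phi\!\left( \frac{w - \mu}{\sigma} \right) dw \]
Finally, we have that:
\[ C = \sum_{i:\, w_i > m} \left\{ p_i \log(p) + (1 - p_i) \log(1 - p) \right\} \]

Now we have a function that maps from the observed distribution of the data onto the likelihood of the parameters. If $p_i$ were available, this function could be easily evaluated for each value of the parameters, and a maximum likelihood estimator could be obtained as the maximizer of this function.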
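To make the evaluation step concrete, the sketch below codes only the upper-tail term of B together with C, for the case in which $p_i$ is observed and latent wages are normal. The function names and the argument `c` (standing in for an already computed $c(\Theta)$) are illustrative assumptions; the term A and the lower-tail parts of B are omitted.

```python
import math

import numpy as np


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def upper_tail_complete_loglik(w, p_ind, m, mu, sigma, delta, kappa, p, c=1.0):
    """Upper-tail part of B plus C, treating the indicator p_ind as observed.

    c is a placeholder for the rescaling factor c(Theta), assumed to be
    computed elsewhere; the default of 1.0 means no rescaling.
    """
    w = np.asarray(w, dtype=float)
    p_ind = np.asarray(p_ind, dtype=float)
    upper = w > m
    wu, pu = w[upper], p_ind[upper]

    log_f0 = normal_logpdf(wu, mu, sigma)                   # unaffected regime
    log_f1 = normal_logpdf(wu, mu + delta, kappa * sigma)   # spilled-up regime

    # With a binary indicator, the log of the mixture term selects one regime
    b_upper = np.sum((1.0 - pu) * log_f0 + pu * log_f1) - wu.size * math.log(c)
    c_term = np.sum(pu * math.log(p) + (1.0 - pu) * math.log(1.0 - p))
    return float(b_upper + c_term)
```

A maximum likelihood routine would add the remaining terms and pass the negative of the total to a numerical optimizer.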
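Note that $p_i$ can be thought of as a source of unobserved heterogeneity that has to be dealt with when maximizing the likelihood function. Fortunately, a few methods are available to address this issue. One well-known way to solve this problem is to use the EM algorithm. The algorithm starts by augmenting the data with the latent variable that governs the regime switching. Starting from a guess $\Theta^0$ for the parameter values, the algorithm can be defined as:

E-step: Given the current value of the parameter, $\Theta^k$, compute the conditional distribution of the latent index $p_i$ given the data:
\[ p_i(w_i, s_i, \Theta^k) \equiv E(p_i \mid w_i, s_i; \Theta^k) = \Pr(p_i = 1 \mid w_i, s_i, \Theta^k) = \frac{p^k \frac{1}{\kappa^k \sigma^k} \phi\!\left( \frac{w_i - (\mu^k + \delta^k)}{\kappa^k \sigma^k} \right)}{p^k \frac{1}{\kappa^k \sigma^k} \phi\!\left( \frac{w_i - (\mu^k + \delta^k)}{\kappa^k \sigma^k} \right) + (1 - p^k) \frac{1}{\sigma^k} \phi\!\left( \frac{w_i - \mu^k}{\sigma^k} \right)} \]
Then, compute the expectation of the log-likelihood with respect to the distribution of the latent index. Thus the E-step results in:
\[ Q(\Theta \mid \Theta^k) = E_P\left\{ \log L(\Theta \mid w_i, s_i, p_i) \right\}, \]
where the subscript indicates the random variable over which the expectation is taken. So, we have that:
\[ Q(\Theta \mid \Theta^k) = \sum_{i=1}^{N} \left\{ p_i(w_i, s_i, \Theta^k) \log f(\Theta \mid w_i, s_i, p_i = 1) + \left(1 - p_i(w_i, s_i, \Theta^k)\right) \log f(\Theta \mid w_i, s_i, p_i = 0) \right\}. \]

M-step: The next value for the parameter, $\Theta^{k+1}$, is given by:
\[ \Theta^{k+1} = \arg\max_{\Theta} Q(\Theta \mid \Theta^k). \]
(In the simplest case, in which the problem consists only of a mixture of two normal random variables, that is, the only effect of the minimum wage is to generate spillovers, the maximum likelihood estimates of the mean and variance of the two regimes are given by weighted averages of the corresponding random variables, with weights given by the probability of that particular realization belonging to each state.)

To complete the definition of the estimator, stop the iterations when the Euclidean norm of the difference between successive parameter vectors is smaller than a specified tolerance.

B.1.3 Remarks and Discussion

The model above provides a broad picture of the effects of the minimum wage throughout the economy. It allows the minimum wage to have an impact on the sector probabilities, on unemployment, and on the wage distribution, through the truncation effect on the left tail and through spillovers on the upper tail. The particular choice for the spillover function can be extended to allow, for example, smaller chances of spillovers as a function of the difference between the latent wage and the minimum wage. Similarly, the assumption that the probabilities $\pi_d^{(1)}$ and the other parameters describing the minimum wage effects on the lower tail of the wage distribution are constant can be easily extended to a parametric form of the probabilities as functions of the latent wage. These assumptions were maintained during the discussion to simplify the exposition.

Appendix C

Appendix to Chapter 4

C.1 Construction of Regression Based Test Statistics

We begin with
\[ X_i = \sum_{s=1}^{L} D_{s,i} \left( \beta_s + \delta_s T_i \right) + U_i \tag{C.1} \]
and stack all $N$ observations:
\begin{align}
X &= \sum_{s=1}^{L} D_s \beta_s + \sum_{s=1}^{L} Z_s \delta_s + U \tag{C.2} \\
&= D\beta + Z\delta + U \tag{C.3}
\end{align}
where $D = [D_1, \ldots, D_L]$, $D_s = [D_{s,1}, \ldots, D_{s,N}]^\top$, $Z = [Z_1, \ldots, Z_L]$, $Z_s = [D_{s,1} T_1, \ldots, D_{s,N} T_N]^\top$, $X = [X_1, \ldots, X_N]^\top$, $U = [U_1, \ldots, U_N]^\top$, $\beta = [\beta_1, \ldots, \beta_L]^\top$, and $\delta = [\delta_1, \ldots, \delta_L]^\top$. Then
\[ \hat{\delta} = [\hat{\delta}_1, \hat{\delta}_2, \ldots, \hat{\delta}_L]^\top = \left( Z^\top M_D Z \right)^{-1} Z^\top M_D X \tag{C.4} \]
where $M_D = I_N - D (D^\top D)^{-1} D^\top$. Because $D_{s,i} D_{q,i} = 0$ for all $i$ and $s \neq q$, we have that $Z^\top M_D Z = \mathrm{diag}\left[ N_s (\hat{p}_s - \hat{p}_s^2) \right]_{s=1}^{L}$ and $Z^\top M_D X = \left[ \sum_{i=1}^{N} D_{1,i} X_i (T_i - \hat{p}_1), \ldots, \sum_{i=1}^{N} D_{L,i} X_i (T_i - \hat{p}_L) \right]^\top$. We know that for each $s$, $\hat{\delta}_s = (\hat{p}_s - \hat{p}_s^2)^{-1} \hat{\gamma}_s$; thus, stacking all $L$ terms, we obtain $\hat{\gamma}$:
\begin{align}
\hat{\gamma} &= [\hat{\gamma}_1, \hat{\gamma}_2, \ldots, \hat{\gamma}_L]^\top \tag{C.5} \\
&= (D^\top D)^{-1} Z^\top M_D X \tag{C.6} \\
&= (D^\top D)^{-1} Z^\top M_D Z \hat{\delta} \tag{C.7}
\end{align}
where $D^\top D = \mathrm{diag}[N_s]_{s=1}^{L}$. Now, assuming homoskedasticity ($\mathrm{Var}[U \mid D, T] = \sigma_X^2$) and under the usual requirements for a Central Limit Theorem to hold, we have
\begin{align}
(D^\top D)^{1/2} (\hat{\delta} - \delta) &= (D^\top D)^{1/2} \left( Z^\top M_D Z \right)^{-1} Z^\top M_D U \tag{C.8} \\
&= \left[ \frac{\sum_{i=1}^{N} D_{s,i} U_i (T_i - \hat{p}_s)}{\sqrt{N_s}\, (\hat{p}_s - \hat{p}_s^2)} \right]_{s=1}^{L} \tag{C.9} \\
&\xrightarrow{D} N\!\left( 0, \sigma_X^2 \left( \mathrm{diag}\left[ (p_s - p_s^2) \right]_{s=1}^{L} \right)^{-1} \right). \tag{C.10}
\end{align}
Therefore, under the null hypothesis $H_0$: $\gamma = 0$ (or $\delta = 0$), we have
\begin{align}
& (D^\top D)^{1/2} (\hat{\gamma} - \gamma) \tag{C.11} \\
&= (D^\top D)^{-1/2} Z^\top M_D Z (D^\top D)^{-1/2} (D^\top D)^{1/2} \hat{\delta} \tag{C.12} \\
&= \mathrm{diag}\left[ (\hat{p}_s - \hat{p}_s^2) \right]_{s=1}^{L} (D^\top D)^{1/2} \hat{\delta} \tag{C.13} \\
&\xrightarrow{D} N\!\left( 0, \sigma_X^2\, \mathrm{diag}\left[ (p_s - p_s^2) \right]_{s=1}^{L} \right). \tag{C.14}
\end{align}
An unfeasible Wald test statistic can be constructed as:
\begin{align}
W^u &= \frac{\hat{\gamma}^\top (D^\top D)^{1/2} \left( \mathrm{diag}\left[ (p_s - p_s^2) \right]_{s=1}^{L} \right)^{-1} (D^\top D)^{1/2} \hat{\gamma}}{\sigma_X^2} \tag{C.15} \\
&= \frac{\sum_{s=1}^{L} \hat{\gamma}_s^2 N_s (p_s - p_s^2)^{-1}}{\sigma_X^2} \tag{C.16}
\end{align}
and a feasible version can be written as:
\[ W^f = \frac{\sum_{s=1}^{L} \hat{\gamma}_s^2 N_s (\hat{p}_s - \hat{p}_s^2)^{-1}}{\hat{\sigma}_X^2} \xrightarrow{D,\,H_0} \chi^2_L, \tag{C.17} \]
where $\hat{\sigma}_X^2$ is some consistent estimator of $\sigma_X^2$, for example $S^2$ from the long regression. Note that because of the diagonal structure of the asymptotic variance-covariance matrix of $N^{1/2} (\hat{\gamma} - \gamma)$, an F-statistic for the single null $\gamma_s = 0$ is obtained as
\[ F_s = \frac{\hat{\gamma}_s^2 N_s (\hat{p}_s - \hat{p}_s^2)^{-1}}{\hat{\sigma}_X^2} \xrightarrow{D,\,H_0} \chi^2_1. \tag{C.18} \]

C.1.1 General Expression for Weighted Averages that Correct for Differential Treatment Probabilities

For any weighting scheme $\omega(S,T)$ satisfying Equation (4.5) there will exist a function $\rho(S,T)$ such that:
\[ E(\omega(S,T) X \mid T = 1) - E(\omega(S,T) X \mid T = 0) = E\left( \gamma_S\, E(\rho(S,T) \mid S) \right). \tag{C.19} \]
Finally, one can see that if there is a function $\rho(S,T)$ satisfying Equation (C.19), then if $H_0$ is true we will have that $E\left( \gamma_S\, E(\rho(S,T) \mid S) \right) = 0$.

We can verify that Equation (4.5) implies Equation (C.19) by applying the Law of Iterated Expectations and using the fact that, for any integrable function $g(S)$, we have $E(g(S) \mid T = 1) = E(T g(S))/p$ and $E(g(S) \mid T = 0) = E((1-T) g(S))/(1-p)$. That is seen below:
\begin{align}
& E(\omega(S,T) X \mid T = 1) - E(\omega(S,T) X \mid T = 0) \notag \\
&= E\left( \frac{p_S}{p}\, \omega(S,1)\, E(X \mid S, T = 1) \right) - E\left( \frac{1 - p_S}{1 - p}\, \omega(S,0)\, E(X \mid S, T = 0) \right) \tag{C.20}
\end{align}
Thus, if Equation (4.5) holds,
\[ E(\omega(S,T) X \mid T = 1) - E(\omega(S,T) X \mid T = 0) = E\left( \frac{p_S}{p}\, \omega(S,1)\, \frac{\gamma_S}{V[T \mid S]} \right). \tag{C.21} \]

C.2 Derivation of Test Statistics for Weighted Averages

A test statistic for $\delta = 0$, where $\delta = E\left( \frac{\omega(S,1)}{p (1 - p_S)} \gamma_S \right)$ is a general weighted average of the $\gamma_s$, can be constructed using $\hat{\delta} \equiv \frac{1}{N} \sum_{s=1}^{L} N_s \frac{\hat{\omega}(s,1)}{\hat{p} (1 - \hat{p}_s)} \hat{\gamma}_s = \hat{c}^\top \hat{\gamma}$, where the weights are $\hat{c} = [\hat{c}_1, \ldots, \hat{c}_L]^\top$, with $\hat{c}_s = \frac{N_s\, \hat{\omega}(s,1)}{N \hat{p} (1 - \hat{p}_s)} = \hat{\phi}_s \hat{c}^*_s$ and $\hat{c}^*_s = \frac{\hat{\omega}(s,1)}{\hat{p} (1 - \hat{p}_s)}$.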
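The population probability that an individual $i$ will be in stratum $s$ is $\phi_s$, thus $\sum_{s=1}^{L} \phi_s = 1$, and its sample version is $\hat{\phi}_s = N^{-1} N_s$. Population weights are $c = [c_1, \ldots, c_L]^\top = [\phi_1 c^*_1, \ldots, \phi_L c^*_L]^\top$, where $\hat{c}^*_s \xrightarrow{P} c^*_s$. We write
\begin{align}
N^{1/2} (\hat{\delta} - \delta) &= N^{1/2} \left( \hat{c}^\top \hat{\gamma} - c^\top \gamma \right) \tag{C.22} \\
&= N^{1/2} \left( \hat{c}^\top (\hat{\gamma} - \gamma) + (\hat{c} - c)^\top \gamma \right) \tag{C.23}
\end{align}
but under the null $H_0$: $\gamma = 0$, we have $\delta = 0$. Thus
\begin{align}
N^{1/2} (\hat{\delta} - \delta) = N^{1/2} \hat{\delta} &= N^{1/2} \hat{c}^\top \hat{\gamma} \tag{C.24} \\
&= N^{1/2} \hat{c}^\top (D^\top D)^{-1/2} (D^\top D)^{1/2} \hat{\gamma} \tag{C.25} \\
&= N^{-1/2} \hat{c}^{*\top} (D^\top D)^{1/2} (D^\top D)^{1/2} \hat{\gamma} \xrightarrow{D,\,H_0} N\!\left( 0, \sigma_X^2\, E\left[ c^{*2}_S (p_S - p_S^2) \right] \right). \tag{C.26}
\end{align}
Therefore,
\begin{align}
W^f_\omega &= N\, \frac{\left( \sum_{s=1}^{L} \hat{\phi}_s \frac{\hat{\omega}(s,1)}{\hat{p} (1 - \hat{p}_s)} \hat{\gamma}_s \right)^2}{\hat{\sigma}_X^2 \sum_{s=1}^{L} \hat{\phi}_s \left( \frac{\hat{\omega}(s,1)}{\hat{p} (1 - \hat{p}_s)} \right)^2 (\hat{p}_s - \hat{p}_s^2)} \tag{C.27} \\
&= N\, \frac{\left( \sum_{s=1}^{L} \hat{\phi}_s \frac{\hat{\omega}(s,1)}{(1 - \hat{p}_s)} \hat{\gamma}_s \right)^2}{\hat{\sigma}_X^2 \sum_{s=1}^{L} \hat{\phi}_s \left( \frac{\hat{\omega}(s,1)}{(1 - \hat{p}_s)} \right)^2 (\hat{p}_s - \hat{p}_s^2)}. \tag{C.28}
\end{align}
Equivalently, writing $\hat{\theta}_C$ for the estimated weighted average $\hat{c}^\top \hat{\gamma}$,
\begin{align}
W^f_\omega &= \frac{N^2 \hat{\theta}_C^2}{\hat{\sigma}_X^2 \sum_{s=1}^{L} N_s \hat{c}^{*2}_s (\hat{p}_s - \hat{p}_s^2)} \tag{C.29} \\
&= \frac{\left( \sum_{s=1}^{L} N_s \hat{c}^*_s \hat{\gamma}_s \right)^2}{\hat{\sigma}_X^2 \sum_{s=1}^{L} N_s \hat{c}^{*2}_s (\hat{p}_s - \hat{p}_s^2)} \tag{C.30} \\
&= \frac{\left( \sum_{s=1}^{L} \hat{c}_s \hat{\gamma}_s \right)^2}{\hat{\sigma}_X^2 \sum_{s=1}^{L} N_s^{-1} \hat{c}_s^2 (\hat{p}_s - \hat{p}_s^2)} \xrightarrow{D,\,H_0} \chi^2_1. \tag{C.31}
\end{align}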
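A minimal numerical sketch of the feasible statistics in (C.17), (C.18) and (C.31), assuming homoskedastic errors and that every stratum contains both treated and control units. The function name, the dictionary keys, and the user-supplied weight vector `c` (which stands in for $\hat{c}$) are assumptions of this example; $\hat{\sigma}^2_X$ is taken here as the residual variance of the long regression, computed from within-cell deviations.

```python
import numpy as np


def stratified_wald_stats(x, t, s, c=None):
    """Feasible test statistics built from stratum-level covariances.

    x : pre-treatment variable, t : binary treatment, s : stratum labels,
    c : optional weights (one per stratum, ordered as np.unique(s)).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    s = np.asarray(s)
    strata = np.unique(s)
    n_strata = strata.size
    if x.size <= 2 * n_strata:
        raise ValueError("not enough observations for the long regression")

    n_s = np.empty(n_strata)
    p_s = np.empty(n_strata)
    gamma = np.empty(n_strata)
    rss = 0.0
    for k, label in enumerate(strata):
        xs, ts = x[s == label], t[s == label]
        n_s[k] = xs.size
        p_s[k] = ts.mean()
        if not 0.0 < p_s[k] < 1.0:
            raise ValueError("each stratum needs treated and control units")
        # gamma_hat_s: within-stratum average of X_i * (T_i - p_hat_s)
        gamma[k] = np.mean(xs * (ts - p_s[k]))
        # Long-regression residuals are deviations from (stratum, arm) means
        for arm in (0.0, 1.0):
            cell = xs[ts == arm]
            rss += np.sum((cell - cell.mean()) ** 2)

    sigma2 = rss / (x.size - 2 * n_strata)   # S^2 with 2L fitted parameters
    v = p_s * (1.0 - p_s)                    # p_hat_s - p_hat_s^2
    single = gamma ** 2 * n_s / v / sigma2   # one statistic per stratum
    out = {"gamma": gamma, "sigma2": sigma2,
           "single": single, "joint": float(single.sum())}
    if c is not None:
        c = np.asarray(c, dtype=float)
        out["weighted"] = float(np.sum(c * gamma) ** 2
                                / (sigma2 * np.sum(c ** 2 * v / n_s)))
    return out
```

Under the null, the `joint` value would be compared with a chi-squared distribution with L degrees of freedom, and the `single` and `weighted` values with one degree of freedom.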
