Instrumental Variables Selection: A Comparison between Regularization and Post-Regularization Methods

by

Chiara Di Gravio

B.Sc. Statistics, Finance and Insurance, Sapienza University of Rome, 2009
M.Sc. in Actuarial and Financial Science, Sapienza University of Rome, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)

The University of British Columbia (Vancouver)

August 2015

© Chiara Di Gravio, 2015

Abstract

Instrumental variables are commonly used in statistics, econometrics, and epidemiology to obtain consistent parameter estimates in regression models when some of the predictors are correlated with the error term. However, the properties of these estimators are sensitive to the choice of valid instruments. Since in many applications valid instruments come as part of a larger set that also includes weak and possibly irrelevant instruments, the researcher needs to select a smaller subset of variables that are relevant and strongly correlated with the predictors in the model.

This thesis reviews part of the instrumental variables literature, examines the problems caused by having many potential instruments, and uses different variable selection methods to identify the relevant instruments. Specifically, the performance of different techniques is compared by looking at the number of relevant variables correctly detected, and at the root mean square error of the estimated regression coefficients. Simulation studies are conducted to evaluate the performance of the described methods.

Preface

This work was prepared under the supervision of Professor Gabriela Cohen Freue, who identified the research question. Based on that, I compared already existing techniques with new approaches, and implemented the simulation studies reported in this thesis.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Instrumental Variables: Two Stage Least Squares and Instruments Selection
  2.1 Introduction
  2.2 Endogeneity and Instrumental Variables
  2.3 Two Stage Least Squares (2SLS)
  2.4 Instrumental Variables Problems
    2.4.1 The Hausman Test for Endogeneity
    2.4.2 Weak Instruments: Definition and Finite Sample Bias
  2.5 Choosing the Relevant Instruments: Instrumental Variables Selection
3 First Stage: Instrumental Variables Selection
  3.1 Introduction
  3.2 Brief Review of Regularization Methods: Lasso and Elastic Net
  3.3 Post-Regularization Methods
    3.3.1 The Post-Lasso Estimator
    3.3.2 The Post-EN/L2 Estimator
  3.4 Adaptive Lasso
  3.5 Supervised Principal Components Analysis
  3.6 Using Variables Selection in Instrumental Variables Regression
4 Simulation Study
  4.1 Introduction
  4.2 The Case of n > p
    4.2.1 Simulation Results
  4.3 The Case of n < p
    4.3.1 Simulation Results
  4.4 The Choice of the Penalty Parameter λ
5 Conclusions
Bibliography

List of Tables

Table 4.1 Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(ε, v) = 0.3. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.2 Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(ε, v) = 0.6. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.3 Average number of variables selected by each method. Cor(ε, v) = 0.3. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.4 Average number of variables selected by each method. Cor(ε, v) = 0.6. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.5 Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(ε, v) = 0.3. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.6 Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(ε, v) = 0.6. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.7 Average number of variables selected by each method. Cor(ε, v) = 0.3. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.
Table 4.8 Average number of variables selected by each method. Cor(ε, v) = 0.6. For the regularization and post-regularization methods, the penalty parameter λ is chosen so that the cross-validation error is within one standard error of the minimum.

List of Figures

Figure 2.1 Mean estimates of β (dot) for different levels of ρ_xz². The dotted line represents the true value of β. The bars show the standard error of the estimates.
Figure 3.1 Coefficient paths of the regularization methods for a real data example. The ratio of the sum of the absolute current estimates over the sum of the absolute OLS estimates is on the x-axis: as we move towards zero, more coefficients are excluded from the model. The standardized coefficients, computed by multiplying the original coefficients by the norm of the predictors, are on the y-axis.
Figure 4.1 Trend of the second-stage coefficient's RMSE for different values of λ. The data are generated based on Example 4, Cor(ε, v) = 0.6, and the first-stage F-statistic is set to 40 (medium-strength instruments). A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the RMSE plotted on the y-axis is computed as the mean of the 500 replications.
The black vertical line indicates the value of log(λ) at which the first-stage cross-validation error is within one standard deviation of its minimum.
Figure 4.2 Trend of the second-stage coefficient's RMSE for different values of λ. The data are generated based on Example 8, Cor(ε, v) = 0.6, and the first-stage F-statistic is set to 40 (medium-strength instruments). A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the RMSE plotted on the y-axis is computed as the mean of the 500 replications. The black vertical line indicates the value of log(λ) at which the first-stage cross-validation error is within one standard deviation of its minimum.
Figure 4.3 Trend of the average number of variables selected for different values of λ. The data are generated based on Example 4, Cor(ε, v) = 0.6, the first-stage F-statistic is set to 40 (medium-strength instruments), and the Lasso is used to select the relevant instruments. A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the number of variables selected, the true positives, and the false positives plotted on the y-axis are computed as the mean of the 500 replications. The black vertical line indicates the value of log(λ) at which the first-stage cross-validation error is within one standard deviation of its minimum.
Figure 4.4 Trend of the average number of variables selected for different values of λ. The data are generated based on Example 8, Cor(ε, v) = 0.6, the first-stage F-statistic is set to 40 (medium-strength instruments), and the Lasso is used to select the relevant instruments. A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the number of variables selected, the true positives, and the false positives plotted on the y-axis are computed as the mean of the 500 replications. The black vertical line indicates the value of log(λ) at which the first-stage cross-validation error is within one standard deviation of its minimum.

Acknowledgments

First, I would like to thank my supervisor, Professor Gabriela Cohen Freue, for her support and guidance. Her constant encouragement, valuable comments, and precious advice helped me not only to improve this thesis, but also to become a better researcher. Thanks Gaby, for being extremely patient and for dedicating so much time to teaching me how to write my computer code properly.

My time at UBC has been wonderful thanks to the people I met. In these past two years I have encountered incredible people in the Department of Statistics, and I would like to thank them for every conversation and every laugh we had in and outside the Department. You definitely made these past two years an unforgettable journey.

Most of all, I would like to thank my parents. They are the most amazing people I know; they have provided me with so many opportunities, and they have always encouraged me to follow my dreams. They are the ones who made all of this work possible.

Chapter 1

Introduction

The recent development of genomic technology has allowed researchers to carry out studies aimed at understanding the molecular complexity of disease processes, and at developing tools that help in the treatment, detection, and prevention of specific diseases. Molecular biomarkers, such as genes or proteins, are measurable molecular indicators of the presence or progress of a particular disease, or of the effect of a given treatment. Because molecular biomarkers can provide an early indication of disease, the number of biomarker discovery studies has been increasing over the years.
However, since most of these studies involve many variables of which only a small subset is relevant, new statistical methods "tailored for the analysis of -omics data" have been developed in the last decade.

The study of biomarkers can be carried out using either univariate or multivariate statistical methods [14]. Univariate methods identify significant biomarkers by considering each biomarker one at a time (e.g., Student's t-test and non-parametric tests). Multivariate methods identify relevant biomarkers by considering the relationships among the candidate molecules. Among univariate methods, ordinary least squares (OLS) regression might be used to discover which biomarkers are related to a particular disease. Specifically, assuming a linear relation between a potential biomarker and the presence of the disease studied, a feasible model is:

    y_i = x_i β + ε_i,  i = 1, ..., n,  (1.1)

where y is a continuous variable indicating the state of the disease of interest and x is a measurement of the considered biomarker. For instance, in studying Type 2 diabetes, it is possible to build a model where insulin resistance is the response variable and potential molecular biomarkers are the predictors [11].

However, in this setting the use of OLS regression presents several problems that need to be addressed. First, some of the potential biomarkers included as predictors could have been measured with error; for instance, errors related to the machinery used, or to the collection and storage of the samples. Second, some of the potential biomarkers might not be observed, so predictors correlated with the ones considered may be incorrectly excluded from the model (omitted variables). In both cases, measurement error and omitted variables, the OLS estimates are biased and inconsistent.
Instrumental variables estimators are commonly used in statistics, econometrics, and epidemiology to address this problem.

Instrumental variables estimators consistently estimate the regression coefficients of the predictors by using new variables, not included in the model, called instruments. To be considered an instrument, a variable Z needs to be such that changes in Z are associated with changes in the predictor X, but do not lead directly to changes in the response y; in other words, Z is an instrument if it is associated with the predictor X, but not with the error term u. This relationship can be represented by the path diagram Z → X → y, with the error u affecting both X and y but not Z.

Permutt and Hebel [13] measured the effect of maternal smoking on their children's birth weight, using an encouragement to stop smoking as the instrumental variable. Moreover, Mendelian randomization studies can be seen as an application of instrumental variables; in fact, Mendelian randomization uses genotype as an instrument for the exposure of interest. As pointed out by Didelez and Sheehan [5], the genotype can be considered an instrument since it affects the disease status only indirectly, it is assigned randomly (given the parents' genes), and it does not depend on any possible confounding. Furthermore, in proteomic biomarker discovery studies, proteins are usually measured with error due to the technology used to quantify them. Since protein levels are regulated in part by gene expression, genes might also be used as instrumental variables when looking for molecular biomarkers.

While instrumental variables estimators provide consistent parameter estimates, their properties are sensitive to the choice of valid instruments.
Since in many applications valid instruments come as part of a larger set that also includes weak and possibly irrelevant instruments, the researcher needs to select a smaller subset of variables that are relevant and strongly correlated with the predictors in the model. This thesis examines the problems caused by having many potential instruments, and uses different selection methods to identify the relevant ones. In particular, it compares the performance of the different techniques in terms of the number of instruments selected and the quality of the estimated model coefficients.

The thesis is organized as follows. In Chapter 2, we present background on instrumental variables and define a classical instrumental variables estimator: the two stage least squares (2SLS). We highlight the problem of having to choose valid instruments from a larger set, and we consider instrumental variables selection as a feasible solution. In Chapter 3, we discuss a wide variety of selection methods; specifically, we review the Lasso, the elastic net, the adaptive Lasso, the post-Lasso, and supervised principal component analysis (SPCA). We also propose a methodology called post-EN/L2. Finally, in Chapter 4 we report a series of simulation studies designed to compare the different selection methods in terms of selecting the right subset of instruments and reducing the bias of the model's coefficients.

Chapter 2

Instrumental Variables: Two Stage Least Squares and Instruments Selection

2.1 Introduction

Chapter 1 briefly introduces the consequences of fitting an OLS regression when the predictors in the model are measured with error or when potential confounding variables are not considered. In both cases, the OLS estimates are biased and inconsistent. Instrumental variables estimators can solve the inconsistency problem by using variables not included in the model, called instruments, to estimate the coefficients in the regression model.
This chapter presents in detail a classical instrumental variables estimator, and illustrates some of the issues that arise when the researcher needs to select a smaller subset from numerous instruments. Specifically, Section 2.2 introduces the concepts of endogeneity and instrumental variables. Section 2.3 describes the two stage least squares (2SLS) as a method to consistently estimate the coefficients in the model. Section 2.4 explains the problems caused by weak and irrelevant instruments. Finally, Section 2.5 suggests instrumental variables selection as a method to improve the estimates of the model's coefficients when the set of available variables includes relevant, weak, and possibly irrelevant instruments.

2.2 Endogeneity and Instrumental Variables

Consider the model:

    y_i = x*_i β + u_i,  i = 1, ..., n,  E[u] = 0,  Var(u) = σ_u²,  (2.1)

where y_i is the response variable and x*_i is a single predictor. Under certain regularity conditions, the classical OLS regression produces unbiased and consistent estimates. One important assumption is that the predictors in the model are uncorrelated with the error term: E[x*_i u_i] = 0. When this condition is not met, the OLS estimate is biased and inconsistent, and the predictor correlated with the error term is called endogenous.

Endogeneity originates in several contexts: when the predictors are measured with error (errors in variables), when they are correlated with unobserved predictors (omitted variables), or when they simultaneously affect and are affected by the response variable (simultaneity). As the consequences of endogeneity are the same regardless of its cause, this section focuses only on the endogeneity caused by measurement error.

Suppose that, while (2.1) represents the true model, the predictor x*_i cannot be observed. Specifically, the best the researcher can do is to observe a noisy measure of x*_i:

    x_i = x*_i + v_i,  i = 1, ..., n,  (2.2)

where v is the measurement error with variance σ_v².
Then, the model in (2.1) becomes:

    y_i = x*_i β + u_i = (x_i − v_i)β + u_i = x_i β + (u_i − v_i β) = x_i β + ε_i.  (2.3)

If v_i and u_i are assumed to be independent, then the predictor x_i is endogenous, as it is correlated with the error term ε_i:

    E[x_i ε_i] = E[(x*_i + v_i)(u_i − v_i β)] = −β σ_v².  (2.4)

It can be easily proved that the OLS estimator β̂_OLS = Σ_i x_i y_i / Σ_i x_i², resulting from regressing y_i on x_i, is biased:

    bias(β̂_OLS) = E[β̂_OLS] − β = E[Σ_i x_i ε_i / Σ_i x_i²] ≈ Cov(X, ε) / σ_x²,  (2.5)

and inconsistent:

    plim β̂_OLS = plim((1/n) Σ_i x_i y_i) / plim((1/n) Σ_i x_i²) = β / (1 + σ_v²/σ_x*²) ≠ β.  (2.6)

To consistently estimate the regression coefficient of the endogenous predictor, it is possible to use new variables, not included in the model, called instruments. A set of variables can be considered as potential instruments if they satisfy two important conditions: first, they need to be correlated with the endogenous predictor X; second, they have to be uncorrelated with the error term ε. The first condition requires some sort of association between the instruments and the endogenous predictor. The second condition excludes the possibility that the instruments are predictors for y; in fact, if both X and Z were predictors for y, and y were regressed on X only, then Z would be absorbed into the error term ε, and E[Zᵀε] ≠ 0. If an instrument fails to satisfy the first condition, it is said to be irrelevant; if an instrument satisfies the first condition but is only weakly correlated with the endogenous variable, it is defined as weak. If an instrument satisfies both conditions, then the coefficient of the endogenous predictor can be consistently estimated using a classical instrumental variables estimator called two stage least squares (2SLS).
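The attenuation result in (2.6) is easy to check numerically. The following sketch is our own illustration, not part of the thesis; all parameter values (n, β, the error variances) are assumptions chosen for the demonstration. It regresses y on the noisy measurement x = x* + v and compares the OLS slope with the attenuated value β/(1 + σ_v²/σ_x*²):

```python
import numpy as np

# Illustrative simulation of the attenuation bias in (2.6).
rng = np.random.default_rng(0)
n, beta = 200_000, 2.0
sigma_xstar, sigma_v = 1.0, 1.0              # sd of x* and of the measurement error

x_star = rng.normal(0.0, sigma_xstar, n)     # true, unobserved predictor
v = rng.normal(0.0, sigma_v, n)              # measurement error
u = rng.normal(0.0, 1.0, n)                  # model error, independent of x* and v
x = x_star + v                               # observed, noisy predictor
y = beta * x_star + u

beta_ols = np.sum(x * y) / np.sum(x * x)     # OLS slope (data centered, no intercept)
attenuated = beta / (1.0 + sigma_v**2 / sigma_xstar**2)

print(beta_ols, attenuated)                  # beta_ols is near 1.0, far below beta = 2.0
```

With equal variances for x* and v, the slope is attenuated by a factor of two, which the simulation reproduces closely at this sample size.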
In what follows, we consider one endogenous predictor X and p instruments; however, the results obtained in the next sections can easily be extended to the case where multiple endogenous predictors are included in the model, and to the case where both endogenous and non-endogenous predictors are considered. To ease the notation, we omit the subscript i and write all formulas in matrix notation.

2.3 Two Stage Least Squares (2SLS)

Given the model in (2.3) and a set of valid instruments Z, it is possible to estimate the predictor's coefficient β with a procedure called two stage least squares (2SLS). Formally, assuming that the predictor X and the instruments Z are linearly related, we can extend the model (2.3) to:

    y = Xβ + ε,  (2.7)
    X = ZΠ + w,  (2.8)

where y (n×1) is the response variable, X (n×1) is the endogenous predictor, Z (n×p) is the matrix of p instruments, Π is the vector of coefficients associated with the instruments, E[Xᵀε] ≠ 0, E[Zᵀw] = E[Zᵀε] = 0, Var(ε) = σ_ε², Var(w) = σ_w², and Cov(ε, w) = σ_wε measures the level of endogeneity. Without loss of generality, it is possible to assume that the endogenous variable, the instruments, and the response variable are centered.

The idea behind 2SLS is to first use the model in (2.8) to separate the endogenous predictor X into two components: the non-problematic component ZΠ (Z is chosen to be independent of ε), and the problematic one, w (w and ε are correlated). Then, 2SLS estimates β in (2.7) by using ZΠ only. However, since Π is not known a priori, the non-problematic component cannot be computed exactly and needs to be estimated with an OLS regression; hence the name "two stage least squares". In other words, to estimate β, first X is regressed on Z and the predicted X̂ is computed as X̂ = ZΠ̂; then y is regressed on X̂, and the coefficient β is estimated as:

    β̂_2SLS = (X̂ᵀX̂)⁻¹ X̂ᵀy = (XᵀP_Z X)⁻¹ XᵀP_Z y,  (2.9)

where P_Z = Z(ZᵀZ)⁻¹Zᵀ.
Using the coefficient's estimate in (2.9) and replacing X with the expression in (2.8), the approximate bias of the 2SLS estimator is [9]:

    E[β̂_2SLS] − β ≈ σ_wε p / (ΠᵀZᵀZΠ + p σ_w²) = (σ_wε / σ_w²) (ΠᵀZᵀZΠ / (p σ_w²) + 1)⁻¹.  (2.10)

Thus, β̂_2SLS is still biased. However, it can be proved that β̂_2SLS is consistent and asymptotically normal. Since Σ̂_xz = (1/n) XᵀZ → Σ_xz (by the law of large numbers),

    plim β̂_2SLS = plim[(XᵀP_Z X)⁻¹ XᵀP_Z y]
                 = plim[(XᵀP_Z X)⁻¹ XᵀP_Z (Xβ + ε)]
                 = β + plim{[(XᵀZ/n)(ZᵀZ/n)⁻¹(ZᵀX/n)]⁻¹ (XᵀZ/n)(ZᵀZ/n)⁻¹(Zᵀε/n)}
                 = β + plim([Σ̂_xz Σ̂_zz⁻¹ Σ̂_zx]⁻¹ Σ̂_xz Σ̂_zz⁻¹ Σ̂_zε)
                 = β + [Σ_xz Σ_zz⁻¹ Σ_zx]⁻¹ Σ_xz Σ_zz⁻¹ Σ_zε
                 = β.  (2.11)

Moreover,

    β̂_2SLS →d N(β, ω²/n),  (2.12)

where the asymptotic variance is given by ω² = σ_ε² [Σ_xz Σ_zz⁻¹ Σ_zx]⁻¹.

The asymptotic variance of β̂_2SLS plays a primary role in understanding the precision of the 2SLS estimate. For simplicity, considering only one instrument, ω² can be rewritten as:

    ω² = Var(β̂_2SLS) = σ_ε² [Σ_xz Σ_zz⁻¹ Σ_zx]⁻¹ = σ_ε² / (σ_x² ρ_xz²),  (2.13)

where ρ_xz² is the squared correlation between the endogenous variable X and the instrument Z. It is then clear that the precision of β̂_2SLS increases when the instrument included in the model is strongly correlated with the endogenous variable. Similarly, if only one instrument is considered, the approximate bias of the 2SLS estimate in (2.10) can be rewritten as:

    E[β̂_2SLS] − β ≈ (σ_wε / σ_w²) (π²σ_z²/σ_w² + 1)⁻¹ = (σ_wε / σ_w²) [(ρ_xz² σ_x²/σ_w²) + 1]⁻¹.  (2.14)

Hence, the estimate's bias decreases when the instruments are strongly associated with the predictor. Therefore, in order to obtain a more precise and less biased estimate, the researcher needs to carefully identify which of the available instruments are highly correlated with the endogenous predictor. Figure 2.1 shows how the variability and the bias of the 2SLS estimate decrease when the correlation between the endogenous predictor X and the instrumental variable Z increases.
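The two-stage procedure and the equivalent projection form in (2.9) can be sketched in a few lines. The following simulation is our own illustration under assumed parameter values (n, p, Π, the error covariance), not code from the thesis:

```python
import numpy as np

# Illustrative 2SLS with one endogenous predictor and p instruments.
rng = np.random.default_rng(1)
n, p, beta = 5000, 3, 2.0

Z = rng.normal(size=(n, p))                      # exogenous instruments
Pi = np.array([1.0, 0.8, 0.5])                   # first-stage coefficients (assumed)
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
w, eps = errs[:, 0], errs[:, 1]                  # correlated errors => endogeneity
X = Z @ Pi + w                                   # first-stage model (2.8)
y = beta * X + eps                               # structural model (2.7)

# Stage 1: regress X on Z; Stage 2: regress y on the fitted values X_hat.
Pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ Pi_hat
beta_2sls = (X_hat @ y) / (X_hat @ X_hat)

# Equivalent projection form (2.9): (X' P_Z X)^(-1) X' P_Z y.
P_Z_y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_proj = (X @ P_Z_y) / (X @ X_hat)

beta_ols = (X @ y) / (X @ X)                     # naive OLS, biased upward here
print(beta_2sls, beta_proj, beta_ols)            # 2SLS near 2.0; OLS visibly above it
```

The two computational routes agree up to rounding, and the simulation makes the contrast with the inconsistent OLS estimate visible at a moderate sample size.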
In this case, we considered one endogenous predictor and one instrument Z, and generated samples of size n = 100; then, for each level of association, we simulated N = 200 datasets and estimated β_2SLS for each dataset.

Figure 2.1: Mean estimates of β (dot) for different levels of ρ_xz². The dotted line represents the true value of β. The bars show the standard error of the estimates.

2.4 Instrumental Variables Problems

Section 2.3 briefly introduces the problem of having to choose the optimal instrument (or set of instruments). First, it is essential to choose a variable Z that is not endogenous (i.e., Z has to be uncorrelated with the error term ε); second, the variable Z needs to be correlated with the endogenous predictor X; finally, the correlation between X and Z needs to be strong. Moreover, it is important to keep in mind that the instrumental variables estimator is less precise than OLS.

To facilitate the choice of whether it is worth using the instrumental variables estimator, and of which instruments to include in the model, the instrumental variables literature provides tests that are useful for understanding whether a specific variable is endogenous, and whether potential instruments are only weakly correlated with the endogenous predictor. This section discusses one of the most common tests used to check for endogeneity, provides a formal definition of weak instruments, and outlines some of the issues that originate from including weak instruments in the model.

2.4.1 The Hausman Test for Endogeneity

Even though the instrumental variables estimator removes the inconsistency of OLS, it causes a loss of efficiency that needs to be taken into account if the researcher suspects that the predictor in the model is not endogenous. The Hausman test for endogeneity, introduced by J.A. Hausman in [10], is essentially a test of whether removing the inconsistency of the OLS estimator is worth this loss of efficiency. Formally, given the model discussed in Section 2.3, the test is based on the following hypotheses:

    H₀: Cov(X, ε) = 0
    Hₐ: Cov(X, ε) ≠ 0,

and it involves fitting the model using both the instrumental variables and the OLS regressions, and comparing a weighted square of the difference between the two estimators. Under the null hypothesis, both the OLS and the instrumental variables estimators are consistent; hence, if the sample size is large, the difference between them converges to zero. That is:

    d = (β̂_2SLS − β̂_OLS)² / [Var(β̂_2SLS) − Var(β̂_OLS)] → 0,  (2.15)

where it can be proved that d is approximately χ²-distributed. Therefore, if d is significantly different from zero, the use of instrumental variables is justified; otherwise, the predictor X might not be endogenous, and it is not worth losing the estimate's precision by using 2SLS. While this test is extremely simple to carry out, it assumes that the available sample size is large; therefore, it might work poorly in small-sample problems.

2.4.2 Weak Instruments: Definition and Finite Sample Bias

Given the model discussed in Section 2.3, once the researcher establishes the presence of endogeneity, he/she needs to identify one or more instruments to include. Selecting a set of relevant instruments that satisfies the two conditions listed in Section 2.2 is not a simple task.
In practice, since testing for the absence of correlation between Z and ε presents numerous difficulties, most instruments are included solely based on what the researcher believes to be appropriate [15]; however, for every instrument considered in the model it is possible to easily quantify its strength.

A practical approach to measuring the strength of a set of instruments looks at the first-stage F-statistic testing the joint significance of the coefficients of the instruments. Specifically, given the model in (2.8) and regardless of the available sample size, if the first-stage F-statistic is less than 10, the instruments are classified as weak [15]. The definition of weakness provided in [15] should be viewed as a simple rule of thumb, and it is just one among many available. While many authors have argued against the use of the first-stage F-statistic as a measure of instrument weakness, this rule is still widely used in many applications; consequently, we adopt it in this thesis as a benchmark for identifying weak instruments.

It is essential to understand the consequences of including weak instruments in a 2SLS model. Section 2.3 shows how using instruments that are only weakly correlated with the endogenous variable decreases the precision of the estimate; however, additional problems might affect the results. In [4], Bound et al. identify two main issues caused by the use of weak instruments: the inconsistency of β̂_2SLS, and the so-called finite sample bias. To understand the inconsistency problem, it is fundamental to remember that the lack of correlation between the instruments in the model and the error term ε cannot be formally tested.
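The first-stage F-statistic and the F < 10 rule of thumb described above can be computed directly. The sketch below is our own illustration with assumed parameter values; the instruments are deliberately given small coefficients so that the rule flags them as weak:

```python
import numpy as np

# First-stage F-statistic for the joint significance of the instruments,
# i.e. testing Pi = 0 in X = Z Pi + w. Setup is illustrative only.
rng = np.random.default_rng(2)
n, p = 500, 5
Z = rng.normal(size=(n, p))
Pi = np.full(p, 0.05)                            # deliberately weak instruments
X = Z @ Pi + rng.normal(size=n)

Zc, Xc = Z - Z.mean(axis=0), X - X.mean()        # work with centered data
Pi_hat, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)
rss1 = np.sum((Xc - Zc @ Pi_hat) ** 2)           # unrestricted residual SS
rss0 = np.sum(Xc ** 2)                           # restricted (Pi = 0) residual SS
F = ((rss0 - rss1) / p) / (rss1 / (n - p - 1))

print(F)                                         # well below 10: "weak" by the rule
```

Rerunning the sketch with larger first-stage coefficients pushes F above 10, which is how the simulation studies later in the thesis frame "medium-strength" instruments.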
If we consider an instrumental variables estimator, intuition might suggest that, when the correlation between ε and the instruments is much lower than the correlation between ε and the endogenous variable X, the bias of the 2SLS estimator should be lower than the bias of OLS; however, this intuition can be seriously wrong when the instruments in the model are only weakly correlated with the endogenous predictor. More formally, comparing the probability limit:

    plim β̂_2SLS = β + plim(X̂ᵀε/n) / plim(X̂ᵀX/n),  (2.16)

with:

    plim β̂_OLS = β + plim(Xᵀε/n) / plim(XᵀX/n),  (2.17)

clearly shows how a high covariance between X and ε could translate into a modest inconsistency for β̂_OLS if the variance of X is sufficiently large, while a small covariance between X̂ and ε could translate into a large inconsistency for β̂_2SLS if the covariance between X̂ and X is very small (i.e., if the instruments are weak).

The finite sample bias problem can be easily understood by rewriting the bias in (2.10) as:

    bias(β̂_2SLS) = E[β̂_2SLS] − β ≈ (σ_wε / σ_w²) (1/(F + 1)),  (2.18)

where F = Π̂ᵀZᵀZΠ̂ / (p σ_w²) is the value of the first-stage F-statistic. In this case, in fact, it can be noted that if σ_x² = σ_w² and the F-statistic is equal to zero (i.e., the instruments used do not provide any information about the endogenous predictor), then:

    bias(β̂_2SLS) = E[β̂_2SLS] − β ≈ σ_wε/σ_w² = σ_wε/σ_x² ≈ E[β̂_OLS] − β = bias(β̂_OLS).  (2.19)

Thus, when the F-statistic approaches 0 (i.e., the instruments included provide little information about the endogenous predictor), the bias of the 2SLS coefficient increases and approaches the bias of the OLS coefficient.

The use of instruments that are only weakly correlated with the endogenous predictor is not the only source of finite sample bias. The bias can also arise when there are "too many" instruments in the model compared to the sample size n.
For instance, given the model summarized in (2.7) and (2.8), if n valid instruments are included, then in the first-stage regression the number of parameters equals the number of data points. Consequently, a perfect fit is obtained and the predicted values X̂ equal the actual values of the endogenous variable X. In this scenario, the second-stage regression coincides with the standard OLS regression; hence, the estimated coefficient of the endogenous variable is biased and inconsistent. In other words, as the number of instruments approaches the sample size, the 2SLS estimator tends towards the OLS estimator.

A feasible solution that can be used to estimate the coefficient β, and to reduce the finite sample bias, is to select a smaller set of instruments that are relevant and strongly correlated with the endogenous predictor. As different selection methods can be used, it is essential to understand which is the most appropriate, based on the estimated bias of the endogenous predictor's coefficient and on the ability of the method to select the "right" set of instruments.

2.5 Choosing the Relevant Instruments: Instrumental Variables Selection

Section 2.4.2 suggests instrumental variables selection as a possible way to solve the problems related to having to choose the right variables among a large set of instruments that might be relevant, weak, or possibly irrelevant. This section aims to further explore the reasons behind instrumental variables selection, to describe briefly how the selection can be done, and to explain how selecting variables might reduce the finite sample bias.

While introducing the 2SLS estimator, one of our assumptions was the availability of a set of relevant instruments Z; however, we would like to extend the estimator discussed in Section 2.3 by taking into account both relevant and possibly irrelevant instruments.
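Before formalizing the selection framework, note that the perfect-fit argument of Section 2.4.2 (with p = n instruments the first stage reproduces X exactly, so 2SLS collapses to OLS) can be checked numerically. The sketch below is our own illustration with assumed values:

```python
import numpy as np

# Numerical check: with as many instruments as observations, the first-stage
# fit is perfect and 2SLS coincides with OLS. Setup is illustrative only.
rng = np.random.default_rng(3)
n = 50
Z = rng.normal(size=(n, n))          # p = n instruments
X = rng.normal(size=n)
y = 2.0 * X + rng.normal(size=n)

Pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ Pi_hat                   # perfect fit: X_hat equals X up to rounding

beta_2sls = (X_hat @ y) / (X_hat @ X_hat)
beta_ols = (X @ y) / (X @ X)
print(abs(beta_2sls - beta_ols))     # essentially zero: the estimators coincide
```

This is the degenerate extreme; the simulations in Chapter 4 explore the intermediate regime, where trimming the instrument set trades off first-stage fit against finite sample bias.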
In order to reach our objective we still adopt the framework presented in (2.7) and (2.8), and we define the relevance of each instrument based on the vector of coefficients \(\Pi\); specifically:

• The instrument i is relevant if \(\pi_i \neq 0\) (i.e., there is a linear association between the instrument i and the endogenous predictor X).

• The instrument i is irrelevant if \(\pi_i = 0\) (i.e., there is no linear association between the instrument i and the endogenous predictor X).

Since the elements of the vector \(\Pi\) are not known in advance, the 2SLS procedure estimates them by regressing the endogenous predictor X on the set of instruments Z; however, as in this new framework we would like some of the \(\hat{\pi}_i\) to be exactly zero, we need to resort to a variables selection method to be able to keep in the model only the relevant instruments and to discard the rest.

As introduced by Belloni et al. in [3], it is possible to use different selection methods in order to choose a set of relevant instruments and to optimally predict \(\hat{X}\). In this thesis, we look at selection methods that work well both in the case where many instruments are relevant and in the case where most of the instruments are irrelevant (i.e., the sparse case); in particular, we focus on three main categories of selection methods: regularization (Lasso, elastic net, adaptive Lasso), post-regularization (post-Lasso, post-EN/L2), and principal components (supervised principal component analysis).

Before introducing each method in detail, it is important to highlight that while most of the regularization methods shrink the estimated coefficients so that the prediction error on out-of-sample data is minimized, the aim for our problem is to select only the relevant instruments such that the estimated endogenous predictor \(\hat{X}\) is as free as possible from noise and the coefficient \(\beta\) of the endogenous variable can be estimated consistently.

Chapter 3

First Stage: Instrumental Variables Selection

3.1 Introduction

As shown in Chapter 2, the properties
of the 2SLS are sensitive to the choice of valid instruments; consequently, when weak instruments are considered, or when too many instruments are included in the model, the traditional 2SLS might lead to inconsistent and highly biased estimates. Moreover, as briefly introduced in the previous chapter, there could be situations where the number of potential instruments p is greater than the sample size n; in these settings, 2SLS breaks down and new methods need to be employed in order to estimate the model's parameters.

For instance, suppose we are carrying out a proteomics biomarker study and we are considering genes as instruments. For each protein, we have a great number of potential instruments, among which we need to search for the most relevant ones. In addition, the large number of instruments is often accompanied by a small sample (i.e., n < p). In this setting, a possible solution consists in selecting a smaller set of instruments in the first stage, and using only the instruments selected to find an estimate of the endogenous predictor \(\hat{X}\) that will be passed into the second stage.

This chapter introduces different methods that can be used to select a set of instruments. Section 3.2 briefly reviews both the Lasso and the elastic net methods. Sections 3.3 and 3.4 describe some post-regularization methods and the adaptive Lasso, and introduce a possible modification of already existing techniques. Finally, Section 3.5 discusses supervised principal component analysis as an additional method to select the relevant instruments and predict the endogenous variable in the first stage.

3.2 Brief Review of Regularization Methods: Lasso and Elastic Net

Given a data set with n observations and p predictors, the regression of the endogenous predictor X on a set of instruments Z can be written as:

\[ X = Z\Pi + w, \qquad w \sim N(0, \sigma_w^2). \tag{3.1} \]

Without loss of generality, it is possible to assume that the covariates \(Z_j\) (j = 1, ..., p) are standardized (i.e.,
all the covariates have mean 0 and variance 1) and that the response variable X has mean 0. Then, according to [16], for any fixed non-negative penalty \(\lambda\), the Lasso criterion is defined as:

\[ L(\lambda, \Pi) = |X - Z\Pi|_2^2 + \lambda|\Pi|_1, \tag{3.2} \]

where \(|\cdot|_1\) indicates the \(l_1\)-norm, \(|a|_1 = \sum_i |a_i|\), and \(|\cdot|_2\) refers to the \(l_2\)-norm, \(|a|_2^2 = \sum_i a_i^2\). Given (3.2), the Lasso coefficients are estimated as:

\[ \hat{\Pi}_{lasso} = \arg\min_{\Pi} L(\lambda, \Pi) = \arg\min_{\Pi} \left\{ \sum_{i=1}^n \Big( x_i - \sum_{j=1}^p z_{ij}\pi_j \Big)^2 + \lambda \sum_{j=1}^p |\pi_j| \right\}, \tag{3.3} \]

where the amount of shrinkage is controlled by the penalty parameter \(\lambda\). Thus, the larger the value of \(\lambda\), the greater the amount of shrinkage. When \(\lambda = 0\), no shrinkage is applied and the Lasso simply becomes an OLS estimator. The idea behind the Lasso is to select a subset of relevant variables by imposing an \(l_1\) penalty on the regression coefficients. Specifically, the Lasso simultaneously selects a subset of variables and shrinks the chosen coefficients toward zero (compared with the OLS solution). Since the minimization problem in (3.3) is a convex minimization procedure, any minimum found by an algorithm is guaranteed to be the global, overall minimum. Consequently, the Lasso estimates can be computed using fast algorithms (e.g., LARS, cyclical coordinate descent) that work well even when p is greater than n.

Even though the Lasso procedure is able to select a model that is easy to interpret and accurate in terms of prediction error, it has a few drawbacks. First, when p > n, the Lasso selects at most n variables; second, when there is a group of highly correlated variables, it tends to randomly select only one variable from that group and to ignore the rest. The above limitations are addressed by Zou and Hastie, who introduce the elastic net procedure [18].
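The group-selection limitation of the Lasso, and the remedy offered by the elastic net, can be seen in a small experiment. This is a hedged sketch (not thesis code): five nearly identical instruments drive x, plus ten pure-noise instruments; scikit-learn's `l1_ratio` plays the role of the mixing parameter, and the penalty levels are illustrative only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(2)
n = 100
g = rng.normal(size=n)
grouped = [g + 0.01 * rng.normal(size=n) for _ in range(5)]   # highly correlated group
noise = [rng.normal(size=n) for _ in range(10)]               # irrelevant instruments
Z = np.column_stack(grouped + noise)
x = Z[:, :5].sum(axis=1) * 0.6 + rng.normal(size=n)           # the group drives x

# Count how many of the 5 grouped instruments each method keeps.
n_lasso = int(np.count_nonzero(Lasso(alpha=0.5).fit(Z, x).coef_[:5]))
n_enet = int(np.count_nonzero(ElasticNet(alpha=0.5, l1_ratio=0.3).fit(Z, x).coef_[:5]))
```

Typically the pure \(l_1\) penalty keeps only one (or a few) representatives of the group, while the elastic net, thanks to its \(l_2\) component, keeps the group together.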
Specifically, for any fixed non-negative \(\lambda_1\) and \(\lambda_2\), the elastic net criterion is defined as:

\[ L(\lambda_1, \lambda_2, \Pi) = |X - Z\Pi|_2^2 + \lambda_1|\Pi|_1 + \lambda_2|\Pi|_2^2. \tag{3.4} \]

Given (3.4), the elastic net coefficients are estimated as:

\[ \hat{\Pi}_{EN} = \arg\min_{\Pi} L(\lambda_1, \lambda_2, \Pi) = \arg\min_{\Pi} \left\{ \sum_{i=1}^n \Big( x_i - \sum_{j=1}^p z_{ij}\pi_j \Big)^2 + \lambda_1 \sum_{j=1}^p |\pi_j| + \lambda_2 \sum_{j=1}^p \pi_j^2 \right\}, \tag{3.5} \]

which is still a convex minimization problem (i.e., as in the Lasso case, the minimum is guaranteed to be a global one). As in the Lasso case, the elastic net simultaneously does automatic variables selection and continuous shrinkage; however, unlike the Lasso, both an \(l_1\) penalty and an \(l_2\) penalty are considered. In particular, the \(l_1\) part in (3.4) does automatic variables selection, whereas the \(l_2\) part facilitates grouped selection. Consequently, the elastic net can select groups of correlated features when those groups are not known in advance.

In [18], Zou and Hastie describe the elastic net procedure as a regularized least squares problem; specifically, given \(\alpha = \lambda_2/(\lambda_1 + \lambda_2)\), finding \(\hat{\Pi}_{EN}\) in (3.5) is equivalent to finding the coefficients that minimize the following:

\[ \hat{\Pi}_{EN} = \arg\min_{\Pi} \sum_{i=1}^n \Big( x_i - \sum_{j=1}^p z_{ij}\pi_j \Big)^2 \quad \text{subject to} \quad (1-\alpha)|\Pi|_1 + \alpha|\Pi|_2^2 \le t, \tag{3.6} \]

where t is a constant that controls the amount of shrinkage and \(\alpha\) is the elastic net parameter: when \(\alpha = 0\) the elastic net reduces to a Lasso problem, and when \(\alpha = 1\) the elastic net is equivalent to ridge regression. As in [18], only the cases where \(\alpha \in [0, 1)\) are considered. Figure 3.1 shows, for the prostate cancer data discussed by Tibshirani in [16], the Lasso and the elastic net estimates as a function of the proportion of shrinkage. Specifically, since each colored line represents the path of a single coefficient shrinking towards zero, it is possible to see how each coefficient is selected in both the considered methods.
The comparison between the two panels in Figure 3.1 clearly shows how the elastic net is sensitive to the presence of a group of correlated variables, and how it tends to stabilize the solution path.

[Figure 3.1: two panels (LASSO and EN) of standardized coefficient paths against |beta|/max|beta| for the prostate cancer data, variables x1–x8.]

Figure 3.1: Regularization methods coefficients path for a real data example. The ratio of the sum of the absolute current estimates over the sum of the absolute OLS estimates is on the x-axis: as we move towards zero, more coefficients are excluded from the model. The standardized coefficients, computed by multiplying the original coefficients by the norm of the predictors, are on the y-axis.

3.3 Post-Regularization Methods

Although regularization methods allow the researcher to select only the relevant variables and to reduce the estimates' variance, the coefficients estimated by these models tend to be biased. The shrinkage in the regularization methods, in fact, causes the estimates of non-zero coefficients to be biased towards zero (i.e., the positive coefficients will be underestimated, while the negative coefficients will be overestimated) and inconsistent [8].

To solve these issues, different modifications of regularization methods have been suggested. One of these, introduced in 2004, is a hybrid LARS/OLS procedure where the bias of the model's coefficients is reduced by fitting a least squares to the model selected by LARS [6]. Based on the LARS/OLS procedure, Belloni et al. proposed the post-Lasso estimator as a feasible method to select only the relevant variables and to reduce the non-zero coefficients' bias [2].

In the next sections we discuss in detail the post-Lasso procedure, and we propose a modification of post-Lasso that might be helpful in all the situations where the number of relevant predictors is greater than the available sample size.

3.3.1 The Post-Lasso Estimator

Similar to the LARS/OLS procedure, Belloni et al.
introduce the post-Lasso, which fits an OLS to the model selected using the \(l_1\) penalty [2]. Formally, if \(\hat{T} = \{ j \in \{1, 2, ..., p\} : |\hat{\pi}_j| > 0 \}\) indicates the model chosen by the Lasso procedure, then the post-Lasso coefficients \(\tilde{\Pi}_{post}\) can be defined as:

\[ \tilde{\Pi}_{post} = \arg\min_{\Pi} \sum_{i=1}^n \Big( x_i - \sum_{j \in \hat{T}} z_{ij}\pi_j \Big)^2. \tag{3.7} \]

Belloni et al. prove that, when p > n, the post-Lasso estimator can perform at least as well as the Lasso in terms of the rate of convergence, and it has a smaller coefficients' bias. In addition, they show that the Lasso and the post-Lasso perform similarly when the absolute value of the non-zero coefficients is small (less than 0.5), and that the post-Lasso has a better performance when the coefficients of the relevant predictors are greater (in absolute value) than 0.5.

3.3.2 The Post-EN/L2 Estimator

As already discussed in Section 3.1, there might be scenarios where, given the set of all p variables, the number of relevant predictors s is greater than the sample size n. In these cases, any selection method based on an \(l_1\) penalty will be able to select at most n variables and will miss some relevant features. A possible solution that can be adopted when s > n is to select the relevant predictors using an elastic net penalty, and to reduce the endogenous predictor's coefficient bias using only the set of variables selected.

However, unlike the post-Lasso, it might be impossible to apply an OLS to the chosen model if more than n variables are selected. To be able to simultaneously attenuate the coefficients' bias and select more than n variables, we propose a new estimator called post-EN/L2. The idea of the post-EN/L2 is to fit a ridge regression using only the variables selected by the elastic net. Formally, if \(\hat{T}_{EN} = \{ j \in \{1, 2, ..., p\} : |\hat{\pi}_j| > 0 \}\) indicates the model chosen by the elastic net, then the post-EN/L2 coefficients \(\tilde{\Pi}_{post\text{-}EN/L2}\) can be defined as:

\[ \tilde{\Pi}_{post\text{-}EN/L2} = \arg\min_{\Pi} \left\{ \sum_{i=1}^n \Big( x_i - \sum_{j \in \hat{T}_{EN}} z_{ij}\pi_j \Big)^2 + \lambda_2 \sum_{j \in \hat{T}_{EN}} \pi_j^2 \right\}. \tag{3.8} \]

Fitting a ridge regression using only the selected variables allows us to consider more than n predictors, to take into account the presence of groups of correlated variables, and to separate the two effects of any regularization method (i.e., variable selection and coefficient shrinkage). In addition, by using a two-stage regularization procedure we now consider two penalty parameters: a larger one, on the first pass, used to select a small number of variables, and a smaller penalty, on the second pass, used to shrink the non-zero coefficients.

3.4 Adaptive Lasso

As already introduced in Section 3.3, Zou in [17] proves that the Lasso estimates are consistent only under specific circumstances, and he proposes a new estimator called the adaptive Lasso. The proposed method introduces weights to penalize the coefficients differently in the \(l_1\) penalty and to guarantee the consistency of the Lasso estimates. Formally, given the model in (3.1), \(\Pi\) can be estimated as:

\[ \hat{\Pi}_{ada.lasso} = \arg\min_{\Pi} \left\{ \sum_{i=1}^n \Big( x_i - \sum_{j=1}^p z_{ij}\pi_j \Big)^2 + \lambda \sum_{j=1}^p w_j|\pi_j| \right\}, \tag{3.9} \]

where \(w = (w_1, ..., w_p)\) is a vector of data-dependent weights. In particular, given the coefficients estimated by OLS, \(\hat{\Pi}_{OLS}\), the vector w can be estimated by:

\[ \hat{w} = 1/|\hat{\Pi}_{OLS}|^{\gamma}, \tag{3.10} \]

where \(\gamma > 0\). Since the optimization problem in (3.9) is convex, once the minimum is found, it is guaranteed to be the global minimum.

Based on the weights in (3.10), Zou proves that, when n > p, the adaptive Lasso provides consistent estimates under every circumstance, identifies the right subset of relevant variables, and achieves the optimal estimation rate. As in the Lasso case, both \(\gamma\) and the penalty parameter \(\lambda\) are chosen based on cross-validation; specifically, for a given \(\gamma\), Zou uses cross-validation paired with LARS to search exclusively for \(\lambda\). Moreover, while it is also possible to replace \(\hat{\Pi}_{OLS}\) with any other consistent estimate, Zou in [17] suggests to use \(\hat{\Pi}_{OLS}\) unless the researcher is concerned about the effects of multicollinearity. In that case, the use of \(\hat{\Pi}_{ridge}\) (i.e.,
In that case, the use of Pˆridge (i.e.,the estimate resulting from fitting ridge regression) is advised.3.5 Supervised Principal Components AnalysisAn alternative method that can be used to select the relevant covariates and topredict the response variable X from a set of p predictors (Z1, ..., Zp) measured oneach of the n sample, is the supervised principal components (SPCA). SPCA is afour steps methodology, introduced by Bair et al. in [1], that allows the researcherto predict the outcome of interest and to identity relevant predictors.Formally, given Z = (Z1, ...,Zp) an n⇥ p matrix of variables and X an n⇥ 1vector of outcome measurements, and assuming that (without loss of generality)all the variables in the model are centered, SPCA first estimates the standard re-gression coefficient for each predictor as:si =ZTi XqZTi Zi. (3.11)Afterwards, by using cross-validation of the likelihood ratio test it estimates athreshold q , and it forms a reduced data matrix Zq considering only the collection28of indices, Cq , such that |si| > q . Next, SPCA computes the first principal compo-nent (or the first few) of the reduced data matrix, uq ,1. Finally, it fits a univariateregression model with response X and predictor uq ,1:Xspc,q = g0 + g1uq ,1 (3.12)Since uq ,1 is a left singular vector of Zq , it has mean zero and norm one; hence,gˆ1 = uTq ,1X, gˆ0 = x¯ and:Xˆ spc,q = X + gˆ1uq ,1. (3.13)Consequently, the model in (3.12) is a restricted linear model built using all theselected predictors that can be used to make reliable predictions of X .By performing principal components analysis on only a smaller set of vari-ables, SPCA tries to consider only those predictors with the strongest estimatedcorrelation with the outcome of interest; thus, it tends to eliminate the “excessivenoise”. 
Moreover, the use of PCA allows the researcher to find a more compact representation of the data by reducing a set of p (more or less correlated) variables into k (where k < p) uncorrelated linear combinations of the original variables, called principal components, that contain most of the relevant information in the data [14].

Having derived the predictor \(u_{\theta,1}\), Bair et al. also introduce the concept of importance score to assess the contribution of each individual covariate. Specifically, they define the importance score as the correlation between each variable and the supervised principal component predictor:

\[ imp_i = \operatorname{Cor}(Z_i, u_{\theta,1}). \tag{3.14} \]

Based on (3.14), one can understand which are the most relevant predictors (i.e., the features with the highest importance scores), discard the irrelevant variables, and prove that the SPCA coefficients' estimates are consistent. Although SPCA allows the researcher to select a smaller set of features, it is essential to highlight that the ability of cross-validation to select the correct threshold has not been established theoretically in [1].

3.6 Using Variables Selection in Instrumental Variables Regression

Every selection method introduced in this chapter allows the researcher to select a smaller subset of relevant predictors and to predict the outcome variable X based on that subset. In this thesis, we use these variables selection techniques primarily to choose a smaller set of valid instruments from a collection of variables that might also include weak and possibly irrelevant instruments. Given the framework in (2.7) and (2.8), we employ a two-step procedure. Specifically, first, we use a variable selection method (regularization, post-regularization, or SPCA) to select a set of relevant variables and to calculate \(\hat{X}\) as free as possible from noise; then, we regress y on \(\hat{X}\) in order to estimate the coefficient \(\beta\) of the endogenous predictor.
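The two-step procedure just described can be sketched end to end. This is a hedged illustration, not the thesis implementation: the Lasso stands in for any of the selection methods of this chapter, and the data-generating values are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p, beta = 200, 30, 1.0
Z = rng.normal(size=(n, p))
e, w = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T
x = Z[:, :3] @ np.array([3.0, 1.5, 2.0]) + w   # three relevant instruments
y = beta * x + e                               # structural equation

# Step 1: select instruments, then build x_hat from the selected columns only.
support = np.flatnonzero(Lasso(alpha=0.1).fit(Z, x).coef_)
Zs = Z[:, support]
x_hat = Zs @ np.linalg.lstsq(Zs, x, rcond=None)[0]

# Step 2: regress y on x_hat to estimate the endogenous predictor's coefficient.
beta_hat = float((x_hat @ y) / (x_hat @ x))
```

Because the noise instruments are discarded before \(\hat{X}\) is formed, the second-stage slope stays close to the true \(\beta = 1\) despite the endogeneity.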
Ideally, we would like to30chose a method such that all the relevant instruments are detected correctly:P [( j : pˆ j 6= 0) = ( j : p j 6= 0)]! 1, (3.15)and the estimated coefficient bias of the endogenous predictor tends to zero.In the next chapter we compare the performance of SPCA to the one of regu-larization and post-regularization methods using different simulation settings thattake into account both the cases when n > p and when n < p.31Chapter 4Simulation Study4.1 IntroductionWe carried out multiple simulation studies in order to understand which selectionmethod minimizes the estimate’s bias and selects the right set of instruments. Inthis chapter, we explain in detail the settings used, we report the results obtained,and we compare all the methods employed.In our simulation study, we consider one endogenous variable, n observations,and p instruments (where p can be either less or greater than the sample size).Specifically, the simulation is based on the following model:y = Xb + eX = ZP+w. (4.1)0B@eiwi1CA⇠ N0B@0,0B@1 sewswe s2w1CA1CA32b = 1 is the parameter of interest, Z is the n⇥ p matrix of instruments generateddifferently in different examples, but such that in all considered cases E[ZTw] =E[ZT e] = 0. The endogenous predictor X and the outcome variable y are generatedhierarchically according to (4.1). For the other parameters, a wide variety of differ-ent settings are used. Specifically, we consider two main scenarios (the case wheren > p, and the one where n < p), two different levels of endogeneity (low andmedium), and three different values of s2w which are chosen to benchmark threedifferent strengths of instruments. 
The two endogeneity levels are fixed by setting Cor(e, w) to either 0.3 (low) or 0.6 (medium), whereas the instrument strength is introduced in the model by computing \(\sigma_w^2\) as:

\[ \sigma_w^2 = \frac{n\,\Pi^T \Sigma_z \Pi}{F\,\Pi^T \Pi}, \tag{4.2} \]

where \(\Sigma_z\) is the variance-covariance matrix of Z, and F is the first stage F-statistic, set to 10 (weak instruments), 40 (medium strength instruments), or 160 (strong instruments).

For each combination of the simulation parameter values, we simulate N = 500 datasets. For each dataset, we estimate the coefficient \(\beta\) of the endogenous variable using all the methods listed in Chapter 3, and we compare their root mean square error (RMSE):

\[ RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^N \big(\hat{\beta}_i - \beta_i\big)^2}, \tag{4.3} \]

as well as the average number of variables selected in the first stage, and the average number of relevant instruments correctly identified by the chosen method (i.e., true positives). All the reported results have been rounded to two significant digits.

4.2 The Case of n > p

In the scenario where n > p, we consider four examples: the first three are taken from the original elastic net paper [18] and partially changed in order to add endogeneity, whereas the last one is taken from Belloni et al.'s simulation study in [3]. Each example is characterized by a different n, p, \(\Pi\), and Z. The first two examples represent cases in which the Lasso and the elastic net are supposed to perform similarly, the third example creates a grouped-variables situation, and the last example depicts a scenario where most of the potential instruments are irrelevant. The details of the four examples are reported below:

Example 1. The data are simulated from the true model (4.1). The sample size is n = 20, while the number of instruments is p = 8. Among the 8 instruments only s = 3 are relevant:

\[ \Pi = (3, 1.5, 0, 0, 2, 0, 0, 0). \]

The matrix of instruments Z is generated as \(N(0, \Sigma)\) with \(\Sigma_{ij} = 0.5^{|i-j|}\).

Example 2.
It is the same as Example 1 except that all the instruments considered are relevant, though no single instrument has a strong influence on the endogenous predictor X: \(\pi_i = 0.85\) (i = 1, 2, ..., 8).

Example 3. The data are simulated from the model (4.1) where the number of observations is n = 50 and the number of instruments is p = 40. Among the 40 instruments only s = 15 are relevant:

\[ \Pi = (\underbrace{3, ..., 3}_{15}, \underbrace{0, ..., 0}_{25}). \]

Example 3 considers three equally important groups of five instruments each, plus another 25 irrelevant variables. The matrix of potential instruments is generated as:

\(z_i = W_1 + v_i\) with \(W_1 \sim N(0,1)\), i = 1, ..., 5
\(z_i = W_2 + v_i\) with \(W_2 \sim N(0,1)\), i = 6, ..., 10
\(z_i = W_3 + v_i\) with \(W_3 \sim N(0,1)\), i = 11, ..., 15
\(z_i \sim N(0,1)\), i = 16, ..., 40,

where the \(v_i\) (i = 1, ..., 15) are independent identically distributed N(0, 0.01).

Example 4. The data are simulated from the model (4.1) where the number of observations is n = 500 and the number of potential instruments is p = 100. Among the 100 instruments, only s = 5 are relevant:

\[ \Pi = (\underbrace{1, ..., 1}_{5}, \underbrace{0, ..., 0}_{95}). \]

The matrix of instruments Z is generated as \(N(0, \Sigma)\) with \(\Sigma_{ij} = 0.5^{|i-j|}\).

4.2.1 Simulation Results

The comparison between selection methods is based on the RMSE of the endogenous predictor's coefficient and the average number of instruments selected by each method. Tables 4.1, 4.2, 4.3 and 4.4 report the results obtained.

Tables 4.1 and 4.2 show that, regardless of the endogeneity level and the simulation design, the post-Lasso minimizes the coefficient's RMSE. On the other hand, the Lasso and the elastic net lead to the highest coefficient's RMSE; surprisingly, both methods perform worse than the 2SLS. As previously observed by Belloni et al., the results show that the selection of a subset of instruments comes at the price of an excessive shrinkage of the coefficients in the first stage [3]. Fitting a ridge regression (post-EN/L2) or a least squares (post-Lasso) only on the selected variables helps to reduce the coefficient's RMSE.
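The refitting step just mentioned can be sketched concretely. This is a hedged illustration (not the simulation code): the support is chosen by a Lasso or elastic net fit, then the prediction is re-estimated on the selected columns only; penalty levels are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(5)
n, p = 100, 20
Z = rng.normal(size=(n, p))
x = Z[:, :3] @ np.array([3.0, 1.5, 2.0]) + rng.normal(size=n)

# post-Lasso: OLS restricted to the Lasso support.
support = np.flatnonzero(Lasso(alpha=0.1).fit(Z, x).coef_)
x_hat_post_lasso = LinearRegression().fit(Z[:, support], x).predict(Z[:, support])

# post-EN/L2: ridge restricted to the elastic net support.
support_en = np.flatnonzero(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Z, x).coef_)
x_hat_post_en = Ridge(alpha=1.0).fit(Z[:, support_en], x).predict(Z[:, support_en])
```

The second pass removes (post-Lasso) or greatly reduces (post-EN/L2) the shrinkage bias on the retained coefficients while keeping the selection made in the first pass.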
While every regularization method is outperformed by the post-Lasso, it is essential to highlight that SPCA leads to a coefficient's RMSE close to the one obtained using the post-Lasso.

Tables 4.3 and 4.4 show the number of variables selected by each method. As some of the methods used are based on the same selection, the comparison reported only inspects the differences between the Lasso, the elastic net, the adaptive Lasso, and the SPCA. Before drawing any conclusion, it is essential to remark that, in every example, when the number of selected variables is higher than the number of relevant instruments s, the model selects all the right coefficients (hence, the number of true positives is equal to s). When the number of selected variables is lower than the number of relevant instruments, the model does not select any of the irrelevant coefficients (thus, the number of false positives is equal to zero). The Lasso, the elastic net, and the adaptive Lasso tend to choose the same number of variables, whereas the SPCA selects fewer variables and tends to miss some of the true instruments.

Weak Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.05   0.03         0.36    0.32   0.10         0.06             0.06
Example 2   0.06   0.04         1.17    1.19   0.08         0.20             0.07
Example 3   0.02   0.01         0.20    0.17   0.03         0.03             0.02
Example 4   0.02   0.01         0.97    0.79   0.02         0.27             0.01

Medium Strength Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.05   0.04         0.15    0.18   0.10         0.04             0.06
Example 2   0.06   0.04         0.22    0.23   0.08         0.06             0.07
Example 3   0.02   0.01         0.10    0.14   0.03         0.01             0.02
Example 4   0.02   0.01         0.70    0.70   0.02         0.07             0.01

Strong Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.05   0.04         0.07    0.13   0.08         0.04             0.06
Example 2   0.04   0.05         0.08    0.10   0.09         0.05             0.07
Example 3   0.02   0.01         0.14    0.10   0.09         0.01             0.02
Example 4   0.02   0.01         0.26    0.27   0.02         0.03             0.01

Table 4.1: Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(e, w) = 0.3.
For the regularization and the post-regularization methods, the penalty parameter \(\lambda\) has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.06   0.04         0.37    0.37   0.10         0.06             0.06
Example 2   0.04   0.05         1.36    1.18   0.09         0.20             0.07
Example 3   0.02   0.01         0.20    0.18   0.03         0.03             0.02
Example 4   0.04   0.02         0.84    0.96   0.03         0.26             0.01

Medium Strength Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.05   0.04         0.07    0.08   0.12         0.04             0.06
Example 2   0.04   0.05         0.08    0.08   0.09         0.06             0.07
Example 3   0.02   0.01         0.13    0.06   0.03         0.01             0.02
Example 4   0.04   0.02         0.25    0.26   0.02         0.06             0.01

Strong Instruments
            2SLS   post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 1   0.05   0.04         0.15    0.18   0.10         0.04             0.06
Example 2   0.04   0.04         0.24    0.24   0.07         0.05             0.07
Example 3   0.02   0.01         0.20    0.14   0.03         0.01             0.02
Example 4   0.04   0.01         0.71    0.72   0.02         0.02             0.01

Table 4.2: Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(e, w) = 0.6. For the regularization and the post-regularization methods, the penalty parameter \(\lambda\) has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.75    3.83    3.96             3.11
Example 2   20    8     8    4.20    4.30    5.11             5.17
Example 3   50    40    15   21.60   20.92   19.06            10.43
Example 4   500   100   5    1.66    1.73    2.60             16.71

Medium Strength Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.89    4       3.62             3.14
Example 2   20    8     8    6.89    7       7.15             5.96
Example 3   50    40    15   21.41   21.15   16.88            10.00
Example 4   500   100   5    4.80    4.81    5.1              8.97

Strong Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.84    4.01    3.19             3.15
Example 2   20    8     8    7.86    7.93    7.94             6.16
Example 3   50    40    15   20.92   21.13   15.18            10.42
Example 4   500   100   5    5.04    5.07    5.52             5.29

Table 4.3: Average number of variables selected by each method. Cor(e, w) = 0.3.
For the regularization and the post-regularization methods, the penalty parameter \(\lambda\) has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.73    3.80    3.91             3.10
Example 2   20    8     8    4.16    4.29    5.13             5.21
Example 3   50    40    15   21.50   20.92   19.02            10.36
Example 4   500   100   5    1.68    3.08    2.6              16.41

Medium Strength Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.88    3.96    3.62             3.12
Example 2   20    8     8    6.91    6.98    7.16             5.96
Example 3   50    40    15   21.34   21.10   16.72            10.48
Example 4   500   100   5    4.80    4.81    4.98             8.98

Strong Instruments
            n     p     s    EN      Lasso   Adaptive Lasso   SPCA
Example 1   20    8     3    3.84    4.01    3.17             3.15
Example 2   20    8     8    7.85    7.94    7.95             6.18
Example 3   50    40    15   20.86   21.13   15.18            10.56
Example 4   500   100   5    5.04    5.07    5.70             5.28

Table 4.4: Average number of variables selected by each method. Cor(e, w) = 0.6. For the regularization and the post-regularization methods, the penalty parameter \(\lambda\) has been chosen such that the cross-validation error is within one standard error of the minimum.

4.3 The Case of n < p

In biomarker discovery studies where genes can be considered as potential instruments, the number of potential instrumental variables p is usually higher than the sample size n. In these settings, we would like to identify all the relevant instruments in order to have a complete picture of the problem investigated; consequently, it becomes essential to employ methodologies that are able to select more than n variables.

In this part of the simulation study, we look at the performance of the instrumental variables selection methods when p > n.
Specifically, given a set of p potential instruments, four examples are considered: in the first one, the number of relevant instruments s is greater than the sample size n; in the second, the number of relevant instruments is less than the sample size n; in the third, all the considered instruments are relevant but none has a strong influence on the endogenous predictor; finally, the last example depicts a sparse scenario where most of the potential instruments are actually irrelevant. The details of the four examples are reported below:

Example 5. The data are simulated from the true model (4.1) where the number of observations is n = 40, the number of predictors is p = 80, and the number of relevant variables is s = 50:

\[ \Pi = (\underbrace{2, ..., 2}_{25}, \underbrace{0, ..., 0}_{20}, \underbrace{2, ..., 2}_{25}, \underbrace{0, ..., 0}_{10}). \]

The matrix of instruments is generated as \(Z \sim N(0, \Sigma)\) with \(\Sigma_{ij} = 0.5^{|i-j|}\).

Example 6. It is the same as Example 5 except that the number of relevant variables, s = 20, is smaller than the sample size n = 40:

\[ \Pi = (\underbrace{2, ..., 2}_{15}, \underbrace{0, ..., 0}_{30}, \underbrace{2, ..., 2}_{5}, \underbrace{0, ..., 0}_{30}). \]

Example 7. It is the same as Example 5 except that all the instruments included are relevant, though no single instrument has a strong influence on the endogenous predictor X: \(\pi_j = 0.85\) (j = 1, 2, ..., 80).

Example 8. The data are simulated from the model (4.1) where the number of observations is n = 50 and the number of instruments is p = 1000. Among the 1000 instruments, s = 100 are relevant:

\[ \Pi = (\underbrace{1, ..., 1}_{100}, \underbrace{0, ..., 0}_{900}). \]

The matrix of instruments Z is generated as \(N(0, \Sigma)\) with \(\Sigma_{ij} = 0.5^{|i-j|}\).

4.3.1 Simulation Results

Tables 4.5 and 4.6 report the coefficient's RMSE for the examples considered. Both tables show results similar to the ones obtained when n > p. For instance, we can see that the post-Lasso outperforms all the other regularization methods, and that the post-EN/L2 reduces the RMSE of the endogenous predictor's coefficient by diminishing the first stage coefficients' bias.
Unlike the case where n > p, the SPCA has the same performance as the post-Lasso in all the considered settings. The behavior of the SPCA agrees with what was observed in the previous section, and it can be considered an indicator that SPCA might perform better when both n and p increase.

Tables 4.7 and 4.8 show the number of variables selected by each method. As in the case where n > p, the comparison reported examines only the methods that are based on different selection mechanisms. Both tables clearly show the Lasso's and the adaptive Lasso's inability to select more instruments than the available sample size. On the other hand, the elastic net tends to select more instruments, with a higher number of true positives detected.

As a genomic dataset resembles a sparse situation, it is interesting to observe the results obtained in Example 8. In this scenario, the regularization methods fail to select a complete subset of relevant instruments: the maximum number of variables selected is 70.5 out of 100 non-zero coefficients, among which only 58 are correctly detected. The performance of SPCA is not as good either; in fact, while SPCA selects all the relevant instruments, it adds, on average, another 94 false positives. Regardless of the strength of the instruments included in the model, the level of endogeneity considered, and the number of relevant instruments selected, both the post-Lasso and SPCA lead to the same value of the RMSE of the endogenous predictor's coefficient.
It is interesting to observe that the first principal component retains most of the relevant information from the 194 variables; as a result, the estimated endogenous predictor \(\hat{X}\) is close to that obtained from fitting an OLS regression with only the five variables selected by the Lasso.

Weak Instruments
            post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 5   0.01         0.27    0.10   0.04         0.03             0.01
Example 6   0.01         0.16    0.34   0.04         0.04             0.01
Example 7   0.01         0.48    0.10   0.06         0.11             0.01
Example 8   0.01         0.75    1.08   0.10         0.75             0.01

Medium Strength Instruments
            post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 5   0.01         0.22    0.10   0.04         0.03             0.01
Example 6   0.01         0.08    0.27   0.04         0.02             0.01
Example 7   0.01         0.41    0.05   0.10         0.10             0.01
Example 8   0.01         1.11    1.51   0.11         0.64             0.01

Strong Instruments
            post-Lasso   Lasso   EN     Post EN/L2   Adaptive Lasso   SPCA
Example 5   0.01         0.20    0.09   0.04         0.03             0.01
Example 6   0.01         0.05    0.27   0.04         0.02             0.01
Example 7   0.01         0.41    0.04   0.11         0.09             0.01
Example 8   0.01         0.75    1.03   0.13         0.61             0.01

Table 4.5: Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(e, w) = 0.3.
For the regularization and the post-regularization methods, the penalty parameter λ has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            post-Lasso  Lasso  EN    Post EN/L2  Adaptive Lasso  SPCA
Example 5   0.01        0.26   0.11  0.04        0.03            0.01
Example 6   0.01        0.17   0.34  0.04        0.04            0.01
Example 7   0.01        0.49   0.11  0.06        0.11            0.01
Example 8   0.01        0.92   1.05  0.09        0.69            0.01

Medium Strength Instruments
            post-Lasso  Lasso  EN    Post EN/L2  Adaptive Lasso  SPCA
Example 5   0.01        0.24   0.10  0.04        0.03            0.01
Example 6   0.01        0.08   0.27  0.04        0.02            0.01
Example 7   0.01        0.41   0.06  0.09        0.08            0.01
Example 8   0.01        1.23   1.15  0.11        0.64            0.01

Strong Instruments
            post-Lasso  Lasso  EN    Post EN/L2  Adaptive Lasso  SPCA
Example 5   0.01        0.21   0.09  0.04        0.02            0.01
Example 6   0.01        0.05   0.27  0.04        0.02            0.01
Example 7   0.01        0.39   0.04  0.11        0.08            0.01
Example 8   0.01        0.76   1.18  0.14        0.58            0.01

Table 4.6: Coefficient RMSE for the simulated examples and for different instrument strengths. Cor(e,v) = 0.6. For the regularization and the post-regularization methods, the penalty parameter λ has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   53.59  26.69  25.66           27.28
Example 6   40  80    20   24.96  23.20  20.44           19.92
Example 7   40  80    80   63.90  22.66  22.68           28.41
Example 8   50  1000  100  70.50  4.80   8.24            191.77

Medium Strength Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   55.79  27.62  25.06           27.42
Example 6   40  80    20   26.56  26.74  21.48           20.23
Example 7   40  80    80   74.96  25.11  23.22           29.23
Example 8   50  1000  100  56.50  4.78   9.64            196.99

Strong Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   56.12  28.05  25.16           27.46
Example 6   40  80    20   25.92  28.14  20.28           10.02
Example 7   40  80    80   76.56  25.60  21.82           29.73
Example 8   50  1000  100  70.50  4.80   9.82            195.52

Table 4.7: Average number of variables selected by each method. Cor(e,v) = 0.3.
For the regularization and the post-regularization methods, the penalty parameter λ has been chosen such that the cross-validation error is within one standard error of the minimum.

Weak Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   53.68  26.82  25.90           27.45
Example 6   40  80    20   24.95  23.25  20.50           19.92
Example 7   40  80    80   63.84  22.59  22.84           28.38
Example 8   50  1000  100  70.50  4.94   8.04            192.21

Medium Strength Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   55.81  27.71  24.32           27.45
Example 6   40  80    20   26.64  26.74  21.52           19.97
Example 7   40  80    80   74.97  25.20  22.90           29.58
Example 8   50  1000  100  56.60  4.69   8.56            198.09

Strong Instruments
            n   p     s    EN     Lasso  Adaptive Lasso  SPCA
Example 5   40  80    50   56.27  28.08  25.02           28.46
Example 6   40  80    20   25.99  28.24  20.32           20.07
Example 7   40  80    80   76.56  25.60  21.92           29.77
Example 8   50  1000  100  70.50  4.94   9.94            191.98

Table 4.8: Average number of variables selected by each method. Cor(e,v) = 0.6. For the regularization and the post-regularization methods, the penalty parameter λ has been chosen such that the cross-validation error is within one standard error of the minimum.

4.4 The Choice of the Penalty Parameter λ

As introduced in Section 3.3, the penalty parameter λ used in regularization and post-regularization methods can be chosen based on different criteria. In this section, we consider a grid of possible λs, and we look at the performance of both regularization and post-regularization methods in order to illustrate the importance of choosing the appropriate penalty.

The most commonly used penalties are the ones introduced by Hastie et al. in [8]. To compute the optimal λ, they fix a grid of λs; then, for each λ in the grid, they carry out a k-fold cross-validation: they divide the available data into k subsamples, train the model on k−1 subsamples, and test it on the kth subsample.
They repeat this procedure k times, and the k results from the different subsamples are combined to produce a single estimate of the model's prediction error. Using the aforementioned procedure, they distinguish between two different penalties:

• λmin: the value of λ that gives the minimum mean cross-validated error;

• λ1se: the value of λ that gives the most regularized model such that the cross-validation error is within one standard error of its minimum.

Since the error at each value of λ is the average error over the k folds, this estimate is characterized by uncertainty. Consequently, they advise taking a more conservative approach and using λ1se as the penalty parameter: it gives a slightly simpler model than the best one, and it takes into account the uncertainty in the k-fold cross-validation estimate.

Another possible choice of λ is introduced by Belloni et al. in [2]. In particular, the penalty parameter adopted by Belloni et al. is not computed from the cross-validation error of the first stage; rather, it is estimated from the available data alone, without considering the out-of-sample prediction error. Moreover, they justify the choice of a data-driven penalty by proving that, if the penalty is not chosen using the data, then it tends to be quite conservative (i.e., too large) when the predictors in the model are highly correlated [2].
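The one-standard-error rule described above can be sketched with scikit-learn, whose LassoCV estimator stores the per-fold cross-validation error for every candidate penalty in its mse_path_ attribute. The data-generating step below is a toy illustration of our own, not the thesis's simulation design:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Toy first-stage data: 5 relevant instruments out of 30 candidates.
rng = np.random.default_rng(0)
n, p = 100, 30
Z = rng.normal(size=(n, p))
x = Z[:, :5].sum(axis=1) + rng.normal(size=n)

k = 10
cv = LassoCV(cv=k).fit(Z, x)

# mse_path_ has shape (n_alphas, n_folds); average the error over the k folds.
mean_mse = cv.mse_path_.mean(axis=1)
se_mse = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(k)

i_min = int(np.argmin(mean_mse))             # index of lambda_min
threshold = mean_mse[i_min] + se_mse[i_min]

# lambda_1se: the largest (most regularizing) penalty whose mean CV error
# lies within one standard error of the minimum.
lambda_1se = cv.alphas_[mean_mse <= threshold].max()
```

By construction λ1se ≥ λmin, so it yields a model at least as sparse as the one chosen by the minimum-error rule.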
Since multiple choices exist for the penalty parameter, we set a grid of possible values and look at how the RMSE of the endogenous predictor's coefficient and the average number of variables selected vary over the grid.

Given the model in (4.1), since the variable selection is carried out in the first stage, the choice of λ refers to the regression of X on Z; however, as the main focus of the study is the estimate of β, we can check whether the λ chosen in the first stage is the one that minimizes the coefficient's RMSE in the second stage. Figures 4.1 and 4.2 show the trend of the coefficient's RMSE for Examples 4 and 8 over a grid of possible λs. We use a vertical black line to indicate the value of log(λ1se) that gives the most regularized model such that the cross-validation error in the first stage is within one standard error of its minimum. When n > p, we only plot the trend of the coefficient's RMSE for the Lasso, as it is quite similar to the one generated by fitting the elastic net. In both examples, it can be observed that the classical regularization methods lead to a higher coefficient's RMSE as the shrinkage increases. Moreover, it is essential to highlight that the coefficient's RMSE in the second stage is not minimized at the value of log(λ1se) computed from the cross-validation error. However, the difference between the absolute minimum and the coefficient's RMSE estimated using log(λ1se) is minimal, and in all the cases considered the coefficient's RMSE is minimized when the post-Lasso is fitted. Consequently, we can think of log(λ1se) as a good choice for the penalty parameter even though it was not tuned based on the second stage.

Figures 4.3 and 4.4 show the trend of the average number of variables selected, true positives, and false positives for Examples 4 and 8, over a grid of possible λs.
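The quantities traced in these selection-path figures (number of variables selected, true positives, and false positives along a grid of penalties) can be sketched with scikit-learn's lasso_path, which returns the Lasso coefficients over a decreasing grid of penalties. The setup below is a toy illustration with names of our own choosing, not the thesis's simulation design:

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Toy first stage: s relevant instruments among p candidates, n observations.
rng = np.random.default_rng(1)
n, p, s = 80, 40, 5
Z = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 1.0                          # true first-stage coefficients
x = Z @ beta + rng.normal(size=n)

# Coefficient estimates along a decreasing grid of 50 penalty values.
alphas, coefs, _ = lasso_path(Z, x, n_alphas=50)

true_support = beta != 0
selected = coefs != 0                   # shape (p, n_alphas)
tp = (selected & true_support[:, None]).sum(axis=0)   # true positives per penalty
fp = (selected & ~true_support[:, None]).sum(axis=0)  # false positives per penalty
```

Plotting tp, fp, and tp + fp against log(alphas) reproduces the kind of curves shown in Figures 4.3 and 4.4: at the largest penalty nothing is selected, and as the penalty shrinks the relevant instruments enter first, followed by false positives.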
In both examples it can be observed that, for smaller values of the penalty parameter, the majority of the variables included in the model are false positives. As the value of the penalty increases (i.e., as fewer variables are selected), the number of false positives drops to zero while the true positives are retained in the model. At log(λ1se), when n > p the model chooses on average all the relevant variables (5 out of 5) and does not include any false positives. On the other hand, when n < p, at log(λ1se) the model selects on average five variables out of the 100 that are set to be relevant; in particular, among the five variables chosen, only two are selected correctly. However, regardless of the value of the penalty parameter λ, the model never performs well in terms of instrument selection. Nevertheless, since log(λ1se) is close to the value of the penalty that minimizes the coefficient's RMSE, it still represents a good choice for the penalty.

Figure 4.1: Trend of the second-stage coefficient's RMSE for different values of λ. The data is generated based on Example 4, Cor(e,v) = 0.6, and the first-stage F-statistic is set to 40 (medium strength instruments). A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the RMSE plotted on the y-axis is computed as the mean over the 500 replications. The black vertical line indicates the value of log(λ) such that the cross-validation error in the first stage is within one standard error of its minimum.

Figure 4.2: Trend of the second-stage coefficient's RMSE for different values of λ. The data is generated based on Example 8, Cor(e,v) = 0.6, and the first-stage F-statistic is set to 40 (medium strength instruments). A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the RMSE plotted on the y-axis is computed as the mean over the 500 replications.
The black vertical line indicates the value of log(λ) such that the cross-validation error in the first stage is within one standard error of its minimum.

Figure 4.3: Trend of the average number of variables selected for different values of λ. The data is generated based on Example 4, Cor(e,v) = 0.6, the first-stage F-statistic is set to 40 (medium strength instruments), and the Lasso is used to select the relevant instruments. A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the number of variables selected, the true positives, and the false positives plotted on the y-axis are computed as means over the 500 replications. The black vertical line indicates the value of log(λ) such that the cross-validation error in the first stage is within one standard error of its minimum.

Figure 4.4: Trend of the average number of variables selected for different values of λ. The data is generated based on Example 8, Cor(e,v) = 0.6, the first-stage F-statistic is set to 40 (medium strength instruments), and the Lasso is used to select the relevant instruments. A grid of λs is generated, and for each value in the grid N = 500 datasets are simulated. For each λ, the number of variables selected, the true positives, and the false positives plotted on the y-axis are computed as means over the 500 replications. The black vertical line indicates the value of log(λ) such that the cross-validation error in the first stage is within one standard error of its minimum.

Chapter 5

Conclusions

In this thesis we employed different variable selection methods in the context of instrumental variables selection. Looking at scenarios where the relevant instruments are part of a bigger set that also contains weak and possibly irrelevant instrumental variables, we applied each method with the objective of selecting only the relevant instruments.
We reviewed some existing methods and introduced new approaches that might help the researcher in settings where the number of available instruments is greater than the sample size.

Based on the results obtained from different simulation studies, we concluded that, regardless of the instruments' strength, the available sample size, and the number of instruments, post-regularization methods and supervised principal components outperform regularization methods by reducing the bias of the first-stage coefficients. When the sample size and the number of instruments increase, the post-Lasso and supervised principal components analysis lead to the same coefficient RMSE even though they select a different number of instruments.

Bibliography

[1] E. Bair, T. Hastie, D. Paul, and R. Tibshirani. Prediction by supervised principal components. Journal of the American Statistical Association, 101(473), March 2006.

[2] A. Belloni and V. Chernozhukov. Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2):521–547, 2013.

[3] A. Belloni, V. Chernozhukov, and C. Hansen. Lasso methods for Gaussian instrumental variables models, 2011.

[4] J. Bound, D.A. Jaeger, and R.M. Baker. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous variable is weak. Journal of the American Statistical Association, 90(430):443–450, 1995.

[5] V. Didelez and N. Sheehan. Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research, 16:309–330, 2007.

[6] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.

[7] E. Pinzon Garcia. Essays on Instrumental Variables. PhD thesis, University of Wisconsin-Madison, 2012.

[8] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Series in Statistics, second edition, 2008.

[9] J. Hausman and J. Hahn. Note on bias in estimators for simultaneous equation models. Economics Letters, pages 237–241, 2002.

[10] J.A. Hausman. Specification tests in econometrics. Econometrica, 46(6):1251–1271, 1978.

[11] J.B. Meigs. Multiple biomarker prediction of type 2 diabetes. Diabetes Care, 32(7):1346–1348, 2009.

[12] N. Meinshausen. Relaxed lasso. Computational Statistics and Data Analysis, pages 374–393, 2007.

[13] T. Permutt and J.R. Hebel. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics, 45(2):619–622, 1989.

[14] E. Robotti, M. Manfredi, and E. Marengo. Biomarkers discovery through multivariate statistical methods: A review of recently developed methods and applications in proteomics. Journal of Proteomics and Bioinformatics, 2014.

[15] D. Staiger and J.H. Stock. Instrumental variables regression with weak instruments. Econometrica, 65(3):557–586, 1997.

[16] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58:267–288, 1996.

[17] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101:1418–1429, 2006.

[18] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(Part 2):301–320, 2005.