COMPARISON OF THE APPROACHES TO ASSESSING STATISTICAL INTERACTIONS: AN APPLICATION TO RISK FACTORS FOR ADOLESCENT PROBLEM BEHAVIOUR

by

GORDANA RAJLIC

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in The Faculty of Graduate Studies (Measurement, Evaluation, and Research Methodology)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2014

© Gordana Rajlic, 2014

Abstract

The purpose of the current project was to utilize and compare several approaches to assessing interactions among continuous variables. The approaches used in the project were: (a) multiple regression, (b) the unconstrained mean-centered approach (Marsh, Wen, & Hau, 2004), (c) the orthogonalizing approach (Little, Bovaird, & Widaman, 2006), and (d) the latent moderated structural equations approach (LMS; Klein & Moosbrugger, 2000). The last three approaches use the latent variable modeling framework, and they address some of the limitations of multiple regression related to the assumption that the predictors are measured without error. All selected approaches were applied to a problem from the psychology domain concerned with adolescent problem behaviour. Specifically, the interactions between certain risk factors relevant for adolescent delinquency (i.e., low self-control, family risk, and neighbourhood risk) were assessed. The International Youth Survey data, collected from 3114 students in grades 7 to 9 in the city of Toronto, were used in the study. The results obtained by the different approaches were compared, and their consistency was examined in terms of the existence, direction, and strength of the relations of interest (specifically, the statistical significance, sign, and magnitude of the obtained coefficients were examined, as well as the magnitude of the standard errors and the model fit indices). According to the results of the comparison, there was considerable consistency in the results of the different approaches.
However, some differences were also noted. These differences are important because they may affect researchers’ conclusions about the substantive problems of interest. The current study provides a number of highlights that may be of interest to researchers focused on methodological as well as applied aspects of assessing interactions.

Preface

This dissertation is original, unpublished, independent work by the author, G. Rajlic. The statistical analyses were conducted on the Statistics Canada International Youth Survey data, obtained from the Abacus (research data collection of the British Columbia Research Libraries' Data Services).

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1. Introduction
1.1 Rationale
1.2 Study Purpose
1.2.1 Approaches Utilized in the Study
1.2.2 Interactions between Risk Factors for Adolescent Delinquency
2. Literature Review and Background
2.1 Assessing Interactions in Multiple Regression
2.1.1 Challenges in Assessing Interactions in Multiple Regression
2.2 Assessing Interactions in SEM with Latent Variables
2.2.1 Constrained Approach
2.2.2 Mean-centered Approach
2.2.3 Orthogonalizing Approach
2.2.4 LMS
2.2.5 Challenges in Assessing Interactions in SEM
2.3 Research Addressing Interactions between Risk Factors for Delinquency
2.3.1 Low Self-control and Neighbourhood Risk
2.3.2 Low Self-control and Family Factors
2.3.3 Family and Neighbourhood Factors
3. Method
3.1 Data
3.2 Measures
3.2.1 Low Self-control (LSC)
3.2.2 Neighbourhood Risk (NR)
3.2.3 Family Risk (FR)
3.2.4 Delinquency (DELQ)
3.3 Procedure and Data Analysis
3.3.1 Multiple Regression
3.3.2 Mean-centered Approach
3.3.3 Orthogonalizing Approach
3.3.4 LMS
3.3.5 Missing Data
3.3.6 Data Screening
4. Results
4.1 Measurement Model – Confirmatory Factor Analysis
4.2 Interaction between Low Self-control and Neighbourhood Risk
4.3 Interaction between Low Self-control and Family Risk
4.4 Interaction between Neighbourhood Risk and Family Risk
5. Summary and Discussion
5.1 Conclusions and Recommendations
5.2 Substantive Problems Addressed in the Study
References
Appendices
Appendix A – Constructs and Measures Utilized in the Study
Appendix B – Syntax for the Utilized SEM Approaches
Appendix C – Supplementary Analyses

List of Tables

Table 1. Descriptive Statistics for the Variables in Regression Analyses
Table 2. Correlations among the Predictors in Regression Analysis
Table 3. Tests of Univariate Normality of the Indicators Used in the SEM Approaches
Table 4. Interaction between Low Self-control and Neighbourhood Risk – The Coefficients Obtained in the Three Approaches
Table 5. Interaction between Low Self-control and Family Risk – The Coefficients Obtained in the Three Approaches
Table 6. Interaction between Neighbourhood Risk and Family Risk – The Coefficients Obtained in the Three Approaches

List of Figures

Figure 1. Distribution of the Variables in Regression Analysis
Figure 2. Distribution of the Residuals in Regression Analyses – Histograms and Q-Q Plots
Figure 3. Bivariate Relations between the Variables used in Regression Analysis
Figure 4. Scatterplots of the Residuals Plotted against the Predicted Values
Figure 5. Distributions of the Indicators of the Latent Variables (LSC, NR, FR) – Q-Q Plots
Figure 6. Confirmatory Factor Analysis – Path Diagram

Acknowledgments

I would like to express my deep appreciation to all who facilitated the realization of this thesis. I would like to thank the MERM professors for the excellent courses that the MERM program offers. It was a pleasure to take those courses. A special appreciation goes to my research committee: my supervisor Dr. Nand Kishor, Dr. Bruno Zumbo, and Dr. Amery Wu. This study was supported by SSHRC funding; I am grateful for and humbled by their support. On a personal note, I thank my friends and former coworkers for their support and encouragement to pursue my interests in research. Thank you Heather, Rob, and Grant. And above all, I owe special thanks to my family.
To my husband, I am indebted forever: without your support this thesis would not be, and I dedicate it to you.

Dedication

To Predrag and Ella Majkic

1. Introduction

1.1 Rationale

When researchers assess relations between predictors and an outcome, they may also be interested in interactions among the predictors. An interaction between two predictors in relation to the outcome exists if the relation between one predictor and the outcome depends on the level of the other predictor. In terms of causal relations, the effect of one predictor on the outcome differs across values of the other predictor. Consequently, the joint effect of the predictors on the outcome differs from the sum of the effects of the individual predictors. In psychology and the behavioural sciences, the study of linear relations and additive effects is predominant; however, interactive relations have been proposed in various theoretical frameworks applicable to human behaviour. Interactive relations in the social sciences remain underexplored, and methodological difficulties are among the reasons for this lack of exploration.

Traditionally, researchers assessing interactions among continuous variables have used multiple regression analysis, which has certain limitations. The main limitation of the multiple regression approach is that the predictors are treated as measured without error, which is not the case in practice. While measurement error is problematic for any variable in regression analysis, it is particularly problematic for the product term that represents the interaction between variables (Aiken & West, 1991). To address this limitation of regression, several structural equation modeling (SEM) approaches for assessing interactions among latent variables have been developed.
Because they are relatively recent, these approaches have not often been used in applied research. That is, while simulation studies that compare various approaches are available, studies that apply them to real data in different fields are rare. Translating knowledge about the available latent variable SEM approaches into specific research fields may stimulate greater interest in, and greater exploration of, interactive relations among the factors relevant for human behaviour.

1.2 Study Purpose

The purpose of the current project was to present, utilize, and compare the results of several approaches to assessing statistical interactions among continuous variables. The approaches selected and used in the project were: (a) multiple regression, (b) the unconstrained mean-centered approach (Marsh, Wen, & Hau, 2004), (c) the orthogonalizing approach (Little, Bovaird, & Widaman, 2006), and (d) the latent moderated structural equations approach (LMS; Klein & Moosbrugger, 2000). The approaches were employed in assessing specific problems that involve interactions, from the field of adolescent antisocial behaviour. Consistency of the results obtained by the different approaches was assessed in relation to (a) the statistical significance of the obtained coefficients (coefficients of interaction terms as well as coefficients of first-order terms), (b) the sign of the obtained coefficients, (c) the magnitude of the obtained coefficients, (d) the magnitude of the standard errors, and (e) the model fit indices for the latent variable models. In other words, consistency of the results was examined in regard to several aspects relevant to researchers, such as agreement of the results with the theoretical proposition (i.e., whether there is an interaction between the relevant variables), and the direction and strength of the relevant relations.

Utilizing different techniques for assessing interactions and comparing their results are beneficial for multiple reasons.
The SEM approaches have so far been used predominantly in simulation studies, and more information about their performance in various applied contexts is needed. Real data often encompass a wide range of conditions that are not all covered in simulation studies; that is, some of the issues arising in applied research are often absent from simulation studies. Hence, examining the techniques in a practical context (in testing theoretical hypotheses with real data) complements simulation studies and is valuable for assessing the generalizability of the results obtained in such studies. Further, comparing the results of the latent variable SEM approaches with those of regression analysis, as proposed in the current study, provides more information about the effects of measurement error on assessing interactive relations. In relation to the specific problem from the field of adolescent delinquency addressed in this study, a comparison of the results obtained in the SEM approaches and in regression analysis helps in evaluating the findings of previous studies that addressed similar problems but exclusively used methods that do not account for measurement error.

1.2.1 Approaches Utilized in the Study

The first approach to assessing interactions utilized in this study was multiple regression, which has been the most widely used technique for assessing interactions in nonexperimental studies. Specifically, the multiple regression procedure presented by Aiken and West (1991) was employed. Then, three approaches to assessing latent interactions in the SEM context were selected and utilized in the current project. In the last couple of decades, several nonlinear SEM approaches have been developed in order to overcome various challenges inherent in assessing interactions.
Different approaches have different strengths and drawbacks, and there is no agreement about the optimal one; the field is still developing, and evaluation of the available approaches is ongoing. The approaches chosen for the current project are among the most researched SEM approaches to assessing latent interactions, with promising findings in regard to their performance. All selected SEM approaches are easy for applied researchers to use and can be employed in widely available SEM software (LISREL or Mplus).

The first two selected SEM techniques are based on the Kenny-Judd model (Kenny & Judd, 1984) and use product terms of manifest variables to define the latent interaction, which is represented by the product of latent variables. These techniques are the unconstrained mean-centered approach (Marsh et al., 2004) and the orthogonalizing approach (Little et al., 2006). In contrast to the traditional methods that use product indicators, known as the ‘constrained approach’ (Algina & Moulder, 2001; Jaccard & Wan, 1995; Jöreskog & Yang, 1996; Hayduk, 1987; Kenny & Judd, 1984; Wall & Amemiya, 2001), these two approaches are characterized as ‘unconstrained’. While the traditional constrained approach used complicated nonlinear constraints to define the relations between the product indicators and the latent interaction factor, most of these constraints are omitted in the unconstrained approaches. Hence, these techniques are simpler and easier for researchers to use. The two selected unconstrained approaches differ in the way in which the correlations between the latent predictors and the latent interaction are minimized (mean centering of indicators vs. residual centering). The two approaches also differ in the way the product terms (indicators of the latent interaction) are formed.

The last SEM approach utilized in the study is LMS (Klein & Moosbrugger, 2000).
LMS does not use products of observed variables as indicators of latent variables; rather, it directly models the nonlinear multivariate distribution of the measured indicators. This approach has been referred to as the ‘distribution analytic approach’. All approaches utilized in the study are presented in nontechnical terms, and a detailed account of their implementation is provided.

1.2.2 Interactions between Risk Factors for Adolescent Delinquency

The selected techniques for assessing statistical interactions were applied to specific problems from the field concerned with adolescent delinquent behaviour. General psychological and developmental theories, such as ecological systems theory (Bronfenbrenner, 1979), bioecological theory (Bronfenbrenner & Ceci, 1994), the person-environment development interplay model (Rutter & Rutter, 1993), diathesis-stress models, and differential susceptibility theory (Belsky & Pluess, 2009), provide a framework for studying interactions among relevant factors in various fields, including the field concerned with delinquent behaviour. In this specific field, interactive relations have been proposed (Gottfredson & Hirschi, 1990; Granic & Patterson, 2006; Wikström & Sampson, 2003), and some of them have been assessed in research studies. For example, several studies assessed the interaction between impulsivity/low self-control and neighbourhood context (Barker, Trentacosta, & Salekin, 2011; Jones & Lynam, 2009; Lynam et al., 2000; Meier, Slutske, Arndt, & Cadoret, 2008; Vazsonyi, Cleveland, & Wiebe, 2006; Zimmerman, 2010). Other studies addressed the interaction between certain parenting characteristics and neighbourhood characteristics in relation to adolescent delinquency, such as Beyers, Bates, Pettit, & Dodge (2003); Lahey, Van Hulle, D’Onofrio, Rodgers, & Waldman (2008); and Roche, Ensminger, & Cherlin (2007). The studies addressing both problems used methods that do not account for measurement error, and they yielded mixed results.
Building on the previous research in the field, the current project further explored interactive relations among the factors relevant for adolescent delinquency. Specifically, the interactions among three characteristics were examined: low self-control, which is the most studied individual characteristic related to delinquency; neighbourhood risk (explored in the current project in regard to the presence of crime and disorder, and neighbourhood social efficacy); and family risk (related to relationships with parents, parental involvement, and parental supervision and monitoring). Hence, the specific questions addressed in the project were: (a) whether there is an interaction between low self-control and neighbourhood risk in relation to delinquent behaviour, (b) whether there is an interaction between low self-control and family risk in relation to delinquency, and (c) whether there is an interaction between neighbourhood risk and family risk in relation to delinquency. Two lines of previous research, i.e., research addressing the interaction between impulsivity/low self-control and neighbourhood characteristics, and research exploring interactions between family and neighbourhood characteristics, were brought together in the current study; moreover, previous research was extended by adding a less researched problem, the interaction between low self-control and family characteristics in relation to delinquency. By utilizing multiple techniques, the current study may shed more light on previous findings in the field, which have been mixed. The mixed findings may, in part, be a consequence of the method used (not accounting for measurement error), which may result in biased parameter estimates (underestimation or overestimation of the relevant parameters).

2. Literature Review and Background

The first part of this chapter is concerned with approaches to assessing statistical interactions among continuous variables. The selected approaches are presented, and the challenges that they face in assessing interactive relations are described. In the last part of the chapter, previous research on interactions among the risk factors for delinquency selected for this project (low self-control, neighbourhood risk, family risk) is reviewed.

2.1 Assessing Interactions in Multiple Regression

Traditionally, multiple regression has been the recommended method for assessing interactions among continuous variables. In multiple regression analysis that assesses interactions among predictors, an interaction is represented by a product term of the predictor variables (Aiken & West, 1991; Cohen, Cohen, West, & Aiken, 2003). That is, in the case of two predictors, the product of the scores on those two predictors carries the interaction between the predictors. The case of two predictors and the interaction between them is represented by Equation 1:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + e \tag{1}$$

where Y is the outcome variable, b0 is the intercept, X1 and X2 are the predictors, X1X2 is the product term that represents the interaction between the two predictors, b1 and b2 are the regression coefficients representing the relationship between the individual predictors and the outcome, b3 is the regression coefficient for the interaction between the two predictors, and e is the error term.

It is recommended that the predictor variables X1 and X2 be centered (by subtracting the mean of the variable’s scores from each individual score on that variable) and that the product term be formed from these centered predictors, in order to reduce nonessential multicollinearity and facilitate interpretation of the results (Aiken & West, 1991; Cohen et al., 2003).
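The model in Equation 1, with centered predictors, can be illustrated with a minimal sketch. The sketch assumes hypothetical, error-free scores generated from known coefficients; the function names (`center`, `solve_linear_system`, `fit_interaction_model`) and all data values are illustrative assumptions and are not taken from the study, which relied on standard statistical software.

```python
from statistics import mean


def center(values):
    """Subtract the mean from every score (mean-centering)."""
    m = mean(values)
    return [v - m for v in values]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_interaction_model(x1, x2, y):
    """Return [b0, b1, b2, b3] of Equation 1, using centered predictors."""
    x1c, x2c = center(x1), center(x2)
    # Design matrix columns: intercept, X1, X2, and the product term X1*X2.
    rows = [[1.0, a, b, a * b] for a, b in zip(x1c, x2c)]
    k = 4
    # Normal equations of ordinary least squares: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical scores; the outcome is built without error from known coefficients.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
x1c, x2c = center(x1), center(x2)
y = [1.0 + 0.5 * a + 0.3 * b + 0.2 * a * b for a, b in zip(x1c, x2c)]
coefficients = fit_interaction_model(x1, x2, y)
```

Because the hypothetical outcome contains no error, the estimated coefficients should reproduce the generating values; with real data, the estimates would of course carry sampling and measurement error.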
In the product term X1X2 that represents the interaction, the interaction is the part that is independent of X1 and X2, i.e., the part from which X1 and X2 are partialed out. In regression analysis without an interaction term, the regression of Y on each of the predictors is constant across the range of the other predictor; for example, the regression of Y on X1 is constant across all values of X2. That is, the value of the regression coefficient for X1 (i.e., b1) is independent of the value of X2; b1 is the same at all values of X2. On the other hand, in regression analysis that assesses interaction, a significant regression coefficient of the product term (b3) indicates that the regression of Y on X1 depends on the value of X2, and that the regression of Y on X2 depends on the value of X1. In that case, the regression coefficient b1 represents a conditional relation between X1 and Y, namely the slope of Y on X1 when X2 equals zero (the mean of X2 when the predictors are centered); likewise, b2 represents a conditional relation between X2 and Y when X1 equals zero.

To explore interactive relations further (when the regression coefficient of the product term is significant), regression lines that represent the regression of Y on one predictor at specific values of the other predictor are plotted. Such regression lines are named ‘simple regression lines’ (Aiken & West, 1991). The equation for the regression line of Y on X1 at different values of X2 is:

$$\hat{Y} = (b_1 + b_3 X_2) X_1 + (b_2 X_2 + b_0) \tag{2}$$

The value (b1 + b3X2) in Equation 2 is the simple slope, that is, the slope of the regression of Y on X1 at a certain value of X2. Simple regression lines are usually plotted at the mean of X2, at a low value of X2 (commonly one standard deviation below the mean of X2), and at a high value of X2 (commonly one standard deviation above the mean of X2). In the case of centered predictors, those values of X2 are 0, -1 SD, and +1 SD. Other meaningful values of X2 can be chosen for plotting interactions.
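The simple regression lines of Equation 2 can be computed directly from the estimated coefficients. The following minimal sketch assumes hypothetical coefficient values and a hypothetical standard deviation of the centered moderator X2; the function name `simple_regression_line` is likewise an illustrative assumption.

```python
def simple_regression_line(b0, b1, b2, b3, x2_value):
    """Return (intercept, slope) of the regression of Y on X1 at a given X2.

    Follows Equation 2: Y-hat = (b1 + b3*X2) * X1 + (b2*X2 + b0).
    """
    slope = b1 + b3 * x2_value
    intercept = b2 * x2_value + b0
    return intercept, slope


# Hypothetical coefficients and SD of the centered moderator X2.
b0, b1, b2, b3 = 1.0, 0.5, 0.3, 0.2
sd_x2 = 2.0

# Simple regression lines at one SD below the mean, the mean, and one SD above.
lines = {
    label: simple_regression_line(b0, b1, b2, b3, value)
    for label, value in [("-1 SD", -sd_x2), ("mean", 0.0), ("+1 SD", sd_x2)]
}
for label, (intercept, slope) in lines.items():
    print(f"X2 at {label}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With a positive b3, as assumed here, the slope of Y on X1 becomes steeper as X2 increases, which is the pattern that a plot of the three lines would display.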
Finally, in post hoc probing of the interactive effects, the statistical significance of the simple slopes at certain values of X2 is assessed.

The assumptions underlying ordinary least squares estimation also apply to regression that includes interactions among the predictors. These assumptions include normality of the residuals, homoscedasticity, residuals uncorrelated with the predictors, and residuals uncorrelated across cases. Further, correct specification of the form of the relationships between the independent variables and the dependent variable, correct specification of the independent variables in the model, and no measurement error in the independent variables are assumed. Violation of these assumptions may lead to biased estimates of the regression coefficients and/or biased standard errors of the regression coefficients (Cohen et al., 2003).

2.1.1 Challenges in Assessing Interactions in Multiple Regression

In multiple regression, the main challenge is related to the reliability of measurement. Regression analysis assumes error-free predictors, i.e., observed variables are treated as measured without error, which is not the case in practice. In the social and behavioural sciences, some amount of error is commonly involved in measurement. In regression analysis, measurement error in the predictor variables results in biased parameter estimates (Cohen et al., 2003). Specifically, in regression with one predictor, the estimated regression coefficient is attenuated (closer to zero than the population value) because the observed variance of the predictor variable is inflated by measurement error. In regression with multiple predictors, the situation is more complicated: the extent and direction of the bias vary, as the bias additionally depends on the intercorrelations among the predictors.
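For the single-predictor case described above, the attenuation can be illustrated with a small simulation. This is a minimal sketch under classical measurement error assumptions (random error that is uncorrelated with the true scores and with the outcome); the sample size, true slope, reliability value, and random seed are hypothetical choices made for illustration only.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on a single predictor x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2014)  # fixed seed so that the sketch is reproducible
n = 5000
true_slope = 2.0
reliability = 0.5  # assumed ratio of true-score variance to observed variance

true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Error SD chosen so that var(true) / var(observed) equals the reliability.
error_sd = ((1.0 - reliability) / reliability) ** 0.5
observed_x = [t + rng.gauss(0.0, error_sd) for t in true_x]
y = [true_slope * t + rng.gauss(0.0, 1.0) for t in true_x]

slope_true_scores = ols_slope(true_x, y)    # close to the true slope
slope_observed = ols_slope(observed_x, y)   # attenuated toward zero
```

Under these assumptions, the slope estimated from the error-laden predictor is expected to be close to the true slope multiplied by the reliability of the predictor.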
In addition to biased regression coefficients, measurement error may lead to biased standard errors of the regression coefficients, and to incorrect significance tests and confidence intervals.

In regression analysis that includes the product term representing an interaction, the issues related to measurement error are even more problematic. Unless the predictors are perfectly reliable, the reliability of the product term is lower than the reliability of each of the predictors (i.e., it equals the product of the reliabilities of the individual predictors, if the predictors are centered and uncorrelated; Aiken & West, 1991). The size and direction of the bias in the regression coefficient of the product term vary, depending on multiple factors, such as the reliabilities of the individual predictors, the correlation between them, and the correlation between the predictors and the product term. The direction of the bias in the interaction coefficient, in the presence of measurement error, is certain in the case when the predictors are centered and bivariate normal (i.e., the correlation between the product term and the predictors is zero): in this case, the regression coefficient for the product term is underestimated compared to the true value in the population (Aiken & West, 1991).

Unreliability of measures is related to decreased power to detect effects. Due to the lower reliability of the product term, the power to detect the interaction effect is lower than the power to detect the first-order effects, if the predictors are measured with error (Aiken & West, 1991). Aiken and West demonstrated a drastic decrease in the power of the test for interaction as the reliability of the predictors drops, and a corresponding drastic increase in the sample size required to achieve satisfactory power (p. 164).
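As a worked illustration of the product-term reliability stated above, assume two centered, uncorrelated predictors with hypothetical reliabilities of 0.80 and 0.70 (values chosen for illustration only, not taken from the study):

```latex
\rho_{X_1 X_2,\, X_1 X_2} = \rho_{X_1 X_1} \, \rho_{X_2 X_2} = 0.80 \times 0.70 = 0.56
```

Under these assumptions, the product term is less reliable than either of the predictors from which it is formed.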
Jaccard and Wan (1995) also demonstrated the effects of measurement error on estimates of interaction coefficients in multiple regression: the estimates were biased and the power of statistical tests of significance was reduced, and these effects were more pronounced at higher levels of unreliability. In summary, the problems related to reliability are aggravated in multiple regression analysis that includes interactive terms, compared to regression that is not concerned with interactions. Multiple regression has not been recommended unless the reliability of the predictors is high.

Another issue that poses a challenge to assessing interactive relationships in multiple regression is multicollinearity. Multicollinearity means that the predictor variables in the analysis are highly correlated. For example, when one predictor is highly correlated with the set of other predictors, it contributes little unique information to the prediction, and the estimate of its regression coefficient becomes unstable and unreliable (Cohen et al., 2003). In other words, the estimate of the coefficient can change substantially from sample to sample (in magnitude and even direction); further, the regression coefficient has a large standard error and a wide confidence interval, and is of little practical use. In regression that includes the product term, which represents the interaction, the multicollinearity problem is exacerbated because the product term is highly correlated with the individual predictors from which it is composed. This collinearity, inherent in regression that includes a product term, is named nonessential multicollinearity, and it is a result of the nonzero means of the individual predictors that form the product term (Aiken & West, 1991). Nonessential multicollinearity is due to the scaling of the variables and can be reduced by rescaling the variables.
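The dependence of this correlation on the scaling of the predictors can be illustrated with a minimal sketch. It assumes hypothetical, independent, normally distributed predictors with nonzero means, and compares the correlation between one predictor and the product term before and after the predictor means are removed; the sample size, means, and random seed are illustrative assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def center(values):
    """Subtract the mean from every score."""
    m = sum(values) / len(values)
    return [v - m for v in values]


rng = random.Random(1991)  # fixed seed so that the sketch is reproducible
n = 2000
# Hypothetical independent, normally distributed predictors with nonzero means.
x1 = [rng.gauss(5.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(5.0, 1.0) for _ in range(n)]

raw_product = [a * b for a, b in zip(x1, x2)]
x1c, x2c = center(x1), center(x2)
centered_product = [a * b for a, b in zip(x1c, x2c)]

r_raw = pearson_r(x1, raw_product)             # substantial: nonzero means
r_centered = pearson_r(x1c, centered_product)  # near zero after rescaling
```

In this hypothetical setting, the raw predictor is strongly correlated with the raw product term, whereas the correlation is close to zero once the predictor means are removed.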
A strategy that has been widely used to reduce nonessential multicollinearity is centering the predictor variables by subtracting the mean of the scores from the individual scores. When the predictors are centered (and they are bivariate normal), the correlation between the predictors and the product term is zero (Aiken & West, 1991). Any correlation between the predictors and the product term that remains after centering (i.e., after nonessential multicollinearity is removed) is due to nonnormality of the predictors. This type of collinearity is known as essential multicollinearity, and it cannot be reduced by centering the variables (Cohen et al., 2003, present some strategies for dealing with essential multicollinearity). Another procedure suggested for reducing nonessential multicollinearity is residual centering, in which the predictors and the product term are fully orthogonalized, i.e., any correlation among them is removed (Lance, 1988; Little et al., 2006). Residual centering is a two-step regression method in which the product term is first regressed on the individual predictors from which it is composed; the residuals from this regression are then used to represent the interaction.

In relation to the difficulties in assessing interactions in multiple regression described above, specifically in regard to the reliability issue, Aiken and West recommend using latent variable structural models as one of the solutions. SEM with latent variables accounts for measurement error, and some of the problems related to reliability in regression analysis are alleviated in this way.

2.2 Assessing Interactions in SEM with Latent Variables

The majority of the approaches developed for assessing interactions among latent variables can be classified as ‘product indicator’ approaches, which are based on the Kenny-Judd model (Kenny & Judd, 1984). In the product indicator approach to assessing interactions among latent variables, the interaction is represented as a product of the individual latent variables.
An equation for a basic model with two exogenous latent variables and the interaction between them is:

$$\eta = \alpha + \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_1\xi_2 + \zeta \qquad (3)$$

where η is the latent criterion, α is the intercept, ξ1 and ξ2 are the latent predictor variables, ξ1ξ2 is the product term representing the interaction between ξ1 and ξ2, γ1 and γ2 are the coefficients representing the linear effects of ξ1 and ξ2, γ3 is the interactive effect (of ξ1ξ2), and ζ is a disturbance term. As in regression analysis, it is assumed that E(ζ) = 0, and that ζ is normally distributed and uncorrelated with the ξs. It is also assumed that ζ is homoscedastic (its variance is constant across the cases) and not autocorrelated (not correlated across different cases; Bollen, 1989).

In relation to the indicators of the latent predictors, the measurement equations describing the relationships between the indicators and the latent variables (when each latent predictor is measured by three indicators) are:

$$\begin{aligned}
x_1 &= \tau_1 + \lambda_1\xi_1 + \delta_1 \\
x_2 &= \tau_2 + \lambda_2\xi_1 + \delta_2 \\
x_3 &= \tau_3 + \lambda_3\xi_1 + \delta_3 \\
x_4 &= \tau_4 + \lambda_4\xi_2 + \delta_4 \\
x_5 &= \tau_5 + \lambda_5\xi_2 + \delta_5 \\
x_6 &= \tau_6 + \lambda_6\xi_2 + \delta_6
\end{aligned} \qquad (4)$$

where the τs are intercepts, the λs are factor loadings, ξ1 and ξ2 are the latent predictors, and the δs are measurement errors. In the case of centered variables, the intercept (τ) is not included in the measurement equations. The assumptions of the measurement model are that E(δ) = 0 and that the δs are normally distributed, independent of each other, and independent of ξ and ζ. Additionally, it is assumed that the δs are homoscedastic (constant variance across the cases) and not autocorrelated (not correlated across different cases). Multivariate normality of the latent variables and the indicators is required for the ML method of estimation (Bollen, 1989).

In relation to the latent interaction, common to all product indicator approaches is that products of the manifest variables (i.e., products of the indicators of the latent variables) are used to define the latent interaction term (ξ1ξ2). However, the way the manifest variables are used to define the latent product term differs across approaches.
Hence, the measurement model is presented separately for each approach, in the section dedicated to that approach.

Further, while the techniques based on the Kenny-Judd model use product terms (i.e., products of observed variables) as indicators of the latent interaction, other techniques have been developed that do not use products of observed variables. Such approaches are LMS and quasi-maximum likelihood (QML; Klein & Muthén, 2007), which have been referred to as ‘distribution analytic approaches’. Instead of product terms, they use the original indicators of the latent variables to estimate the interaction effect; that is, they model the nonlinear multivariate distribution of the indicators. In structural models that include interactions, a consequence of the interaction (nonlinear) effect is that the latent criterion and its indicators are nonnormal (Jöreskog & Yang, 1996; Klein & Moosbrugger, 2000). LMS and QML take this specific type of nonnormality, caused by nonlinear relationships, into account.

Other approaches to assessing latent interactions have also been developed, such as Ping’s (1996) two-step method, Bollen and Paxton’s (1998) two-stage least squares (TSLS) approach, Wall and Amemiya’s (2001) two-step method of moments, Mooijaart and Bentler’s (2010) method of third-order moments, and Bayesian approaches (Arminger & Muthén, 1998; Lee, 2007). There is a need for more studies that evaluate the performance of some of these methods (the performance of the Bayesian approaches, for example). The techniques utilized in the current project have been researched the most, and the findings about their performance were promising. In the following sections of this chapter, the selected approaches to assessing latent interactions based on the Kenny-Judd model are presented first, followed by a distribution analytic approach (LMS).
2.2.1 Constrained Approach

Kenny and Judd (1984) first proposed the procedure for assessing interactions among latent variables in which the interactive effect is represented as the product of the individual latent variables, i.e., ξ1ξ2 in Equation 3. Kenny and Judd used products of manifest variables as indicators of the latent product term ξ1ξ2. Specifically, they used all possible products of the manifest variables as indicators of the latent interaction term. In relation to the model in Equation 3, when each of the two latent predictors is measured by three indicators, nine product terms are used as indicators of the latent interactive term ξ1ξ2 (x1x4, x1x5, x1x6, x2x4, x2x5, x2x6, x3x4, x3x5, x3x6). These product terms can be expressed in terms of the measurement equations of the individual indicators (Equation 4). In other words, the product terms are expressed as functions of the latent variables and the errors, by taking the appropriate products. For example, for one of the indicators of the latent interaction (x1x4):

$$x_1x_4 = \lambda_1\lambda_4\xi_1\xi_2 + \lambda_1\xi_1\delta_4 + \lambda_4\xi_2\delta_1 + \delta_1\delta_4 \qquad (5)$$

Equation 5 is obtained by multiplying the measurement equations for x1 and x4 (Equation 4). As the manifest variables were centered, Kenny and Judd did not include the intercepts (τ) in the measurement equations. Further, the variances of the product terms can be expressed as functions of the factor loadings, the variances of the latent variables, and the error variances, i.e.:

$$\mathrm{Var}(x_1x_4) = \lambda_1^2\lambda_4^2\,\mathrm{Var}(\xi_1\xi_2) + \lambda_1^2\,\mathrm{Var}(\xi_1\delta_4) + \lambda_4^2\,\mathrm{Var}(\xi_2\delta_1) + \mathrm{Var}(\delta_1\delta_4) \qquad (6)$$

Under the assumption that the latent variables and the errors are normally distributed, and by implementing various constraints, all parameters of the model are estimated in this approach.
The authors originally employed generalized least squares (GLS) estimation, as products of latent variables are never normally distributed, and the multivariate normality assumption is violated in models that include product terms. The traditional constrained approach was further developed by Algina and Moulder (2001), Hayduk (1987), Jaccard and Wan (1995), Jöreskog and Yang (1996), Wall and Amemiya (2001), and others. In the later developments of the constrained approach, ML estimation was preferred to the GLS estimation used by Kenny and Judd. In summary, in the constrained approach, complex constraints are used in order to estimate the model. Mistakes were easy to make, and the approach was, in general, difficult for researchers to use. Hence, it was rarely used in practice. A simplification of the model has been proposed in the unconstrained approaches.

2.2.2 Mean-centered Approach

While the traditional constrained approaches used complicated nonlinear constraints to define the relations between the product indicators and the latent interaction factor, Marsh et al. (2004) omitted most of these constraints in their unconstrained mean-centered approach. The Marsh et al. approach was based on the Algina and Moulder (2001) model, in which all indicators of the predictors are mean-centered and the mean structure is a part of the model. In the Marsh et al. unconstrained approach, centered manifest variables are used to form the product terms that are then used as indicators of the latent interaction; however, the nonlinear constraints present in the previous approaches are omitted. For example, in the unconstrained approach, the product indicator x1x4 (in Equation 5) is expressed as

$$x_1x_4 = \lambda_7\xi_1\xi_2 + \delta_7 \qquad (7)$$

where λ7 and δ7 are freely estimated, instead of being constrained to functions of the existing first-order (linear) parameters, as in Equation 5.
This brought a significant simplification in the specification of the model, making it easier for applied researchers to use, compared to the constrained approach. The only constraint in the Marsh et al. approach is that the mean of the latent interaction equals the covariance between the two latent predictors that form the product term representing the interaction, i.e., E(ξ1ξ2) = Cov(ξ1, ξ2) = Φ21. All details about the model specification are provided in the context of the practical problem addressed in this project (Method section).

Further, in the Marsh et al. (2004) approach, the specification of the latent product term was simplified in that not all possible product indicators are used, as originally proposed by Kenny and Judd. Instead, the matched-pair strategy is used. In the matched-pair strategy, in relation to the model with two latent predictors, each measured by three indicators, the first indicator of ξ1 is matched with the first indicator of ξ2 (x1x4), the second indicator of ξ1 is matched with the second indicator of ξ2 (x2x5), and the third indicator of ξ1 is matched with the third indicator of ξ2 (x3x6). Hence, three product terms are formed (x1x4, x2x5, x3x6), and they become the indicators of the latent interaction ξ1ξ2.

Marsh et al. (2004) proposed that their model had advantages when the distributional assumptions were violated. The nonlinear constraints in all constrained approaches are based on the assumption that the individual latent predictors are normally distributed, and the use of these methods is problematic when the normality assumption is violated; in the Marsh et al. method, however, these constraints are omitted. Hence, the authors proposed that their approach might be more appropriate when the latent variables are not normally distributed.
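The data preparation step of the matched-pair strategy can be sketched as follows. This is a minimal illustration, assuming the indicators of each latent predictor are stored as the columns of a NumPy array; the function name and the array layout are hypothetical and not taken from the thesis or from Marsh et al. (2004). The sketch covers only the construction of the product indicators, not the estimation of the latent model.

```python
import numpy as np

def matched_pair_products(x_block, z_block):
    """Form matched-pair product indicators from mean-centered indicators.

    x_block, z_block: arrays of shape (n, k) holding the k indicators of the
    first and second latent predictor (hypothetical layout). Column j of the
    result is the product of the centered j-th indicators (x1x4, x2x5, x3x6
    when k = 3).
    """
    x_block = np.asarray(x_block, dtype=float)
    z_block = np.asarray(z_block, dtype=float)
    if x_block.shape != z_block.shape:
        raise ValueError("matched pairs require the same number of indicators")
    # Mean-center each indicator before forming the products.
    x_c = x_block - x_block.mean(axis=0)
    z_c = z_block - z_block.mean(axis=0)
    return x_c * z_c

# Small usage example with simulated scores (arbitrary values).
rng = np.random.default_rng(4)
x_demo = rng.normal(size=(200, 3)) + 2.0
z_demo = 0.5 * x_demo + rng.normal(size=(200, 3))
products = matched_pair_products(x_demo, z_demo)
print(products.shape, products.mean(axis=0))
```

Note that the mean of each product indicator equals the sample covariance of the corresponding pair of centered indicators, so it is generally not zero when the indicators are correlated. This mirrors the role of the mean structure and of the constraint E(ξ1ξ2) = Cov(ξ1, ξ2) in the mean-centered approach.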
In their simulation study, the authors compared the unconstrained approach to the traditional constrained approach under different distributional conditions, and demonstrated that the unconstrained approach was robust to nonnormality of the factors and their indicators: the bias in estimating the interaction effect was smaller than the bias in the traditional constrained approach. Further, Cham, West, Ma, and Aiken (2013) found that even in conditions of more extreme nonnormality, the unconstrained approach with ML estimation yielded unbiased interaction effect estimates and acceptable Type I error rates in larger samples (N > 500). Under the conditions in which the normality assumptions were met, the unconstrained approach performed similarly to the traditional constrained approach (Marsh et al., 2004).

2.2.3 Orthogonalizing Approach

Residual centering, as an alternative to mean centering, is a way of dealing with multicollinearity in multiple regression. Unlike mean centering, residual centering fully orthogonalizes the predictors and the product term of the predictors, i.e., any correlation among them is removed (Lance, 1988; Little et al., 2006). Residual centering is a two-step method in which the product term is regressed on the individual predictors from which it is composed, and the residuals from this regression then represent the interaction effect. Little et al. (2006) extended the residual centering approach to latent variable interactions, and in that context the authors named it the orthogonalizing approach. Consistent with the Kenny-Judd approach, Little et al. created indicators for the latent interaction by forming the product terms of all indicators of the latent variables. For example, in relation to the model with two latent predictors, each measured by three indicators, all possible product terms of the indicators of the latent variables ξ1 and ξ2 are formed (i.e., x1x4, x1x5, x1x6, x2x4, x2x5, x2x6, x3x4, x3x5, x3x6).
Then, each of the product terms is regressed onto all of the first-order indicators of the involved latent variables. For example, the product indicator x1x4 is regressed on x1, x2, x3, x4, x5, and x6 (the indicators of ξ1 and ξ2), i.e., x1x4 = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6 + e, where e is the residual. The residuals of this regression are saved and then used as one indicator of the latent interaction ξ1ξ2. The same procedure is repeated for each of the product terms (nine product terms in total), and the residuals are saved as indicators of the latent interaction. In other words, these orthogonalized product terms (in this case nine of them) are used as indicators of a single latent interaction (ξ1ξ2). Since each of the indicators is used to form more than one product term (e.g., x1 is used to form x1x4, x1x5, and x1x6), these product terms are correlated. The latent interaction term, on the other hand, is not allowed to correlate with the individual latent variables that form the interaction term, as the indicators of the latent interaction were orthogonalized in relation to the indicators of the individual latent variables. The details of the specification of the measurement model following this approach are further described in the context of the specific problem addressed in this project (Method section).

In summary, the Little et al. (2006) orthogonalizing approach, in the same way as the Marsh et al. mean-centered unconstrained approach, omits the complex nonlinear constraints used in the traditional constrained approach. In dealing with multicollinearity issues, Little et al. used residual centering as an alternative to the Marsh et al. mean-centering strategy. In relation to the product terms that serve as indicators of the latent interaction, the orthogonalizing approach uses all indicators of the latent variables to form the product terms, while Marsh et al. use the matched-pair strategy.
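The two-step residual centering procedure described above can be sketched in a few lines. The code below is an illustrative sketch only: the function name and the array layout (indicators as columns) are assumptions, ordinary least squares is carried out with NumPy's `numpy.linalg.lstsq`, and the simulated scores in the usage example are arbitrary. It covers the creation of the orthogonalized product indicators, not the subsequent SEM estimation.

```python
import numpy as np
from itertools import product

def orthogonalized_products(x_block, z_block):
    """Residual-centered product indicators (all possible cross-products).

    x_block, z_block: arrays of shape (n, 3) with the indicators of the two
    latent predictors (hypothetical layout). Returns an (n, 9) array whose
    columns are the residuals of each product term regressed on all six
    first-order indicators plus an intercept.
    """
    x_block = np.asarray(x_block, dtype=float)
    z_block = np.asarray(z_block, dtype=float)
    first_order = np.column_stack([x_block, z_block])
    design = np.column_stack([np.ones(len(first_order)), first_order])
    residuals = []
    for i, j in product(range(x_block.shape[1]), range(z_block.shape[1])):
        prod_term = x_block[:, i] * z_block[:, j]
        # Step 1: regress the product term on the first-order indicators.
        coef, *_ = np.linalg.lstsq(design, prod_term, rcond=None)
        # Step 2: keep the residuals as the orthogonalized product indicator.
        residuals.append(prod_term - design @ coef)
    return np.column_stack(residuals)

# Usage example with simulated, uncentered scores (arbitrary values).
rng = np.random.default_rng(5)
x_demo = rng.normal(loc=3.0, size=(500, 3))
z_demo = 0.4 * x_demo + rng.normal(loc=1.0, size=(500, 3))
ortho = orthogonalized_products(x_demo, z_demo)

# Each residualized product is uncorrelated with every first-order indicator.
first = np.column_stack([x_demo, z_demo])
max_cov = np.abs((first - first.mean(axis=0)).T @ ortho).max() / len(first)
print(ortho.shape, max_cov < 1e-8)
```

Because the residuals of a least squares regression are orthogonal to the regressors by construction, the resulting indicators are uncorrelated with all six first-order indicators, while they generally remain correlated with each other.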
In the orthogonalizing approach the mean structure is not needed, unlike in the mean-centered approach (Marsh et al., 2007). According to the results of a comparison of several methods (Little et al., 2006), the orthogonalizing approach produced results that were fully consistent with the results from OLS regression; however, the magnitudes of the regression coefficients were larger, and the significance values were smaller. In comparison to other SEM approaches (mean-centered and LMS), the bias in the estimates of the first-order effects was similar (specifically, negligible), whereas the bias in the estimates of the interaction effect was similar in the orthogonalizing and mean-centered approaches (the bias in the LMS approach was the smallest). The bias in the estimates of the interaction effect was still relatively small, i.e., the parameter estimates from the mean-centered and orthogonalizing approaches differed from the LMS estimates only in the second decimal place.

2.2.4 LMS

The LMS approach (Klein & Moosbrugger, 2000) utilizes an ML estimation method that takes into account the distributional properties of the model with a latent interaction. In models that include interactions, as a consequence of the nonlinear relations, the latent criterion and its indicators are not normally distributed (Jöreskog & Yang, 1996; Klein & Moosbrugger, 2000). Hence, the joint indicator vector (x, y) is not multivariate normally distributed, which represents a violation of the assumptions of conventional ML estimation methods. In LMS, the distribution of the joint indicator vector (x, y) is represented as a finite mixture of multivariate normal distributions (Klein & Moosbrugger, 2000). Klein and Moosbrugger provide a technical description of the method. The results of simulation studies were supportive of the LMS method: it provided efficient parameter estimators and unbiased standard errors (Kelava et al., 2011; Klein & Moosbrugger, 2000; Schermelleh-Engel, Klein, & Moosbrugger, 1998).
Even though the LMS approach does not assume that the indicators of the latent criterion are normally distributed, in regard to the individual predictors it is based on the same assumptions as the traditional product indicator approaches, i.e., the assumption that the latent predictors and their indicators are multivariate normally distributed. Klein and Moosbrugger (2000), however, provided some evidence about the robustness of the method to violation of the assumption of normality of the predictors. Further, Cham et al. (2013) found that if the violation of normality was not severe, the LMS approach yielded the most efficient estimates of the latent interaction effect (in comparison with the three other approaches examined); however, when the predictors were highly nonnormal, LMS yielded biased estimates of the interaction effect. LMS is implemented in the Mplus software (Muthén & Muthén, 1998-2010).

2.2.5 Challenges in Assessing Interactions in SEM

One of the problems that SEM methods face is related to the multivariate normality requirement. Multivariate normality is an assumption of the most widely used method of estimation in SEM (the maximum likelihood estimator). However, even when two latent predictors are normally distributed, their product term does not have a normal distribution (Cham et al., 2013). As the correlation between the predictors increases, the nonnormality of the product term increases (Ma, 2012). Hence, in SEM that involves latent product terms (i.e., the methods based on the Kenny-Judd approach), the multivariate normality assumption is violated. The violation of the distributional assumptions of ML may result in incorrect estimates (the standard errors and χ² statistics can be biased; Bollen, 1989).
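The nonnormality of a product of two normal variables, and its dependence on the correlation between them, can be illustrated by simulation. The sketch below is not drawn from the cited studies; the function name, the correlations of 0 and .50, and the sample size are assumptions chosen for illustration. It reports the skewness and kurtosis of the product of two standard normal variables (a normal distribution has skewness 0 and kurtosis 3).

```python
import numpy as np

def product_shape(rho, n=400_000, seed=3):
    """Skewness and kurtosis of the product of two standard normal variables
    with correlation rho (illustrative helper, hypothetical name)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    xi = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    p = xi[:, 0] * xi[:, 1]
    z = (p - p.mean()) / p.std()
    # Third and fourth standardized moments of the product term.
    return np.mean(z**3), np.mean(z**4)

skew_0, kurt_0 = product_shape(0.0)
skew_5, kurt_5 = product_shape(0.5)
print(f"rho = 0.0: skewness {skew_0:.2f}, kurtosis {kurt_0:.2f}")
print(f"rho = 0.5: skewness {skew_5:.2f}, kurtosis {kurt_5:.2f}")
```

In this setup the product is symmetric but strongly leptokurtic when the predictors are uncorrelated, and it becomes markedly skewed as well once the predictors are correlated.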
SEM methods that involve product terms for assessing latent interactions address the multivariate nonnormality issue in different ways, such as by using a different method of estimation instead of ML, by utilizing a correction for nonnormality (i.e., the Satorra-Bentler correction), or by assessing or demonstrating the robustness of ML to violation of the multivariate normality assumption. Additionally, different SEM methods that do not involve the creation of latent product terms have been developed more recently, such as LMS and QML. These methods take into account the nonnormality of the latent criterion and its indicators caused by nonlinear relationships.

In relation to the first direction, i.e., using a different method of estimation instead of ML, alternative methods of estimation are available for circumstances in which the assumptions of ML are violated. For example, Kenny and Judd (1984) originally employed the GLS estimation method. However, with this method, the standard errors of the estimated coefficients were unknown, and confidence intervals for the population values could not be estimated. Jöreskog and Yang (1996) recommended the weighted least squares method based on the augmented moment matrix (WLSA); however, the method requires very large samples. Overall, estimation methods other than ML are available; however, they have a different set of limitations and do not provide all the benefits that ML does (Bollen, 1989; Brown, 2006).

Another option, i.e., utilizing a correction for nonnormality, has been proposed to address the issue of nonnormality. However, this correction was not supported in the literature in situations in which the nonnormality was caused by the introduction of product terms. The correction did not result in improved standard errors in Moulder and Algina (2002), and no support for its use was found in Cham et al. (2013).
In relation to the third direction (assessing or demonstrating the robustness of ML to violation of the multivariate normality assumption), it has been suggested that under certain conditions, the use of ML estimation can be justified in the presence of nonnormal product terms. For example, Jaccard and Wan (1995) demonstrated that ML estimation was robust to violations of normality caused by the introduction of the product terms, given normally distributed manifest indicators and individual latent variables. It was superior to the other method the authors assessed (WLS in the form of asymptotically distribution-free estimation, ADF). Kenny and Judd (1984), while originally using the GLS estimator, noted that the ML parameter estimates were very similar to the GLS estimates. Yang (1998) found that ML performed similarly to the two other approaches he assessed (WLS and WLSA). Further, Moulder and Algina (2002) demonstrated an adequate performance of ML estimation (as implemented in the Algina-Moulder model). Overall, in situations in which multivariate nonnormality is caused by the introduction of nonnormal product terms, there is growing evidence that ML may be an acceptable method of estimation. The issue of multivariate nonnormality that arises from nonnormally distributed observed variables (in models that include latent interactions) is a distinct issue, addressed in Cham et al. (2013); Coenders, Batista-Foguet, and Saris (2008); Klein and Moosbrugger (2000); Klein and Muthén (2007); Marsh et al. (2004); and Wall and Amemiya (2001).

The other issue that SEM with latent interaction terms faces (as does multiple regression) is the issue of multicollinearity. Multicollinearity is even more problematic in SEM than in regression analysis, because when measurement errors are accounted for, the correlations between the predictors and the latent interaction terms increase (Dimitruk, Schermelleh-Engel, Kelava, & Moosbrugger, 2007).
Different SEM methods for assessing interactions use different strategies to deal with multicollinearity. As previously described, the Marsh et al. (2004) approach uses centered indicators, whereas the orthogonalizing approach of Little et al. (2006) uses residual centering. In regard to LMS and QML, the issue of nonessential multicollinearity resulting from introducing product terms into the model is not relevant (as there are no indicator product terms).

2.3 Research Addressing Interactions between Risk Factors for Delinquency

2.3.1 Low Self-control and Neighbourhood Risk

Most of the research on interactions between factors relevant for adolescent delinquency has been conducted in regard to the interaction between impulsivity and neighbourhood context. Several studies have specifically addressed this problem. Most of these studies represented an attempt to blend two lines of theorizing about the causes of crime (i.e., those emphasizing individual traits and those emphasizing contextual variables) by assessing their interactive effects.

Lynam et al. (2000) explored the interaction between impulsivity and neighbourhood context (neighbourhood disadvantage) in the offending of inner-city boys from the Pittsburgh Youth Study (N = 425). In two studies, the authors found significant interactions, indicating that the relation between impulsivity and delinquency was stronger in disadvantaged neighbourhoods. On the other hand, in a sample from the National Longitudinal Study of Adolescent Health (N = 19842), Vazsonyi et al. (2006) did not find a significant interaction between impulsivity and neighbourhood disadvantage. Further, Meier et al.
(2008) found a significant interaction between impulsivity and neighbourhood risk in a large sample of 85,000 Iowa students: the effect of impulsivity on delinquency was greater for adolescents living in higher risk neighbourhoods (i.e., low collective efficacy) than for adolescents living in lower risk neighbourhoods, consistent with the findings of Lynam et al. Additionally, in a sample from the Project on Human Development in Chicago Neighborhoods (N = 1191), Zimmerman (2010) also found a significant interaction between impulsivity and neighbourhood context; however, the direction of the interaction was opposite to the one suggested in the previous research. That is, the relationship between impulsivity and offending was amplified in lower risk neighbourhoods (neighbourhoods with higher levels of socioeconomic status and collective efficacy, and lower levels of criminogenic behaviour settings and moral/legal cynicism). Finally, Barker et al. (2011) and Neumann, Barker, Koot, and Maughan (2010) examined the sample of youth from the Edinburgh Study of Youth Transitions and Crime (n = 4957), and their results did not support the interaction between impulsivity and neighbourhood context (i.e., neighbourhood disadvantage and social control) in predicting antisocial behaviour. Some support for an interactive effect was found for girls, in relation to positive outcomes and parental knowledge.

Overall, half of the above studies were supportive of the interactive effects, while the other half were not. There was great heterogeneity in the definitions of the constructs, as well as in the measures that were used in the studies. These conceptual and methodological differences between the studies must be taken into account in considering the findings. Further, in all of the studies described above the predictors were treated as measured without error, and the latent variable framework was not utilized.

2.3.2
Low Self-control and Family Factors

There has been an abundance of research on the importance of various family factors for adolescent delinquency; however, possible interactions between family factors and individual characteristics of adolescents have rarely been addressed. Only one recent study addressed the issue in an adolescent population. Chen and Jacobson (2013) found a significant interaction between family factors (family warmth and parental knowledge) and impulsivity in relation to adolescent delinquency, in a sample of 3350 adolescents. Specifically, the negative relations between family warmth and delinquency, and between parental knowledge and delinquency, were stronger for adolescents with high levels of impulsivity compared to those with below-average levels of impulsivity.

Research concerned with interactions between family and individual characteristics has predominantly been conducted in relation to childhood, such as the studies that addressed the interaction between parenting characteristics and child temperament in relation to externalizing behaviour. For example, Bates, Pettit, Dodge, and Ridge (1998); Lengua, Wolchik, Sandler, and West (2000); and Leve, Kim, and Pears (2005) provided some evidence of interactions between child temperamental characteristics (such as impulsivity) and the family environment (discipline at home) in relation to externalizing behaviour. Further, in regard to fetal alcohol syndrome, it has been suggested that a stable home environment protected the child from an antisocial outcome (Streissguth, Barr, Kogan, & Bookstein, 1996). Further research on interactions between family factors and adolescents' characteristics in relation to delinquency and other problem behaviour is needed.
2.3.3 Family and Neighbourhood Factors

In terms of interactions between family and neighbourhood characteristics in relation to adolescent delinquency, several research studies have emerged recently that provided some evidence supportive of interactive relations between family and neighbourhood risk factors. In a sample of 440 early adolescents, Beyers et al. (2003) reported that the negative relation between parental monitoring and externalizing behaviour was stronger in neighbourhoods with more residential instability. Such an interactive relation, however, was not found in respect to other aspects of the neighbourhoods (structural disadvantage, concentrated affluence). Further, in a study of 800 ethnic minority youth, Roche et al. (2007) found that the association between uninvolved and permissive parenting and delinquency in boys was stronger in higher risk neighbourhoods (it was the strongest in socially disorganized, high-crime neighbourhoods). Such a relation was not confirmed in girls, and further differences in interactive relations, in terms of gender and ethnicity, were found in regard to other outcomes addressed in the study (such as school-related problem behaviour). Addressing wider issues of parental knowledge, peer pressure, and delinquency, Lahey et al. (2008) found that active parental limit setting in early adolescence predicted late adolescent delinquency only among youth living in higher risk neighbourhoods (high social disorganization and crime). The authors did not confirm the interaction between parental monitoring and neighbourhood context reported previously (Beyers et al., 2003).

Finally, using different methods, i.e., behavioural genetic methods, in relation to adolescent aggression, Cleveland (2003) found that shared environmental influences were significant in disadvantaged neighbourhoods (as opposed to adequate neighbourhoods), suggesting that the importance of family processes was increased in the situation of neighbourhood disadvantage.
The problem of interaction between family and neighbourhood characteristics needs further research.

3. Method

3.1 Data

This project utilized the data from the International Youth Survey (IYS), which was conducted by Statistics Canada in 2006. The survey was a part of the second round of an international study that examined behaviour and misbehaviour of students in grades 7 to 9, in about 30 countries, mainly in Europe (the International Self-report Delinquency Study; Enzmann et al., 2010; Junger-Tas et al., 1994). The IYS was administered in the city of Toronto to a sample of 3290 students in grades 7 to 9. Most of the students were in grade 7 (37%); 35% of the students were in grade 8, and 28% were in grade 9. Forty-seven percent of the sample were boys, while 53% were girls. The IYS data are part of the University of British Columbia Abacus research databases (the research data collection of the British Columbia Research Libraries' Data Services).

3.2 Measures

The items administered in the IYS were used as measures of the concepts addressed in the study (low self-control, neighbourhood risk, family risk). The items utilized in the IYS were either taken from existing measures (Grasmick, Tittle, Bursik, & Arneklev, 1993; Sampson & Raudenbush, 1999) or adapted from survey items used in other large-scale youth studies. All measures were based on youth self-report; that is, the concepts represent youth perceptions (of their self-control, of neighbourhood risk, and of family risk).

In the regression analysis, the IYS measures were used as direct measures of the concepts (defined as observed variables), while in the SEM approaches the same items were utilized as indicators of the latent variables representing the concepts in the study. Essentially, the same items composed the predictors in the regression analysis and the indicators of the latent variables in the SEM approaches.
The number of indicators for each latent variable in this project was limited to three (Appendix A). As the purpose of the project was assessing interactions, which involves creating products of indicators, the use of a large number of indicators for each construct would result in a model with a very large number of variables.

3.2.1 Low Self-control (LSC)

The nine items administered in the IYS, from the Grasmick et al. (1993) Self-control scale, were used to define the LSC construct in the current project (the LSC predictor in the regression analysis, and the LSC latent variable in the SEM approaches). Grasmick et al.’s scale measures the LSC construct as conceptualized in Gottfredson and Hirschi’s general theory of crime (Gottfredson & Hirschi, 1990), in which LSC was conceptualized as a unidimensional latent trait, represented by six components: impulsivity (the tendency to choose actions that offer immediate gratification), risk seeking behaviours, volatile temper (low tolerance for frustration), preference for simple rather than complex tasks, self-centered orientation, and preference for physical rather than mental activities. Three of these components, assessed in the IYS, were chosen as indicators of the latent variable LSC in this project: impulsivity, risk seeking, and volatile temper (LSC1, LSC2, and LSC3, respectively). The indicators of the latent variable were formed by summing the scores on the relevant questionnaire items (each indicator was a sum of three relevant items). For example, impulsivity consisted of three items that tap into lack of planning and premeditation, such as ‘I act on the spur of the moment without stopping to think’, and ‘I do whatever brings me pleasure here and now, even at the cost of some distant goal’. Stimulation seeking consisted of three items that capture the tendency to seek and approach novel and exciting experiences despite the risks associated with them, such as ‘I like to test myself every now and then by doing something a little risky’.
Volatile temper was assessed by three items, such as ‘I lose my temper pretty easily’ and ‘When I’m really angry, other people better stay away from me’. All items were measured on a 4-point Likert-type scale; higher scores indicated a higher level of risk (higher impulsivity, higher risk seeking, and more volatile temper). Some of the original items were recoded to reflect this direction. In regression analysis, the measure of LSC was an index formed by summing all relevant items (impulsivity, risk seeking, and volatile temper items). A higher score on the scale indicated higher risk in terms of self-control (lower self-control). The reliability of the scale was Cronbach α = .80.

3.2.2 Neighbourhood Risk (NR)

In the literature concerned with the relationship between neighbourhood characteristics and delinquency, the characteristics suggested as relevant were structural characteristics of neighbourhoods (such as socio-economic status, residential instability, and family structure) and social processes in the neighbourhood (such as crime in the neighbourhood, lack of ties and relations among neighbours, etc.). In this project, the focus was on social processes in the neighbourhood, usually defined as residents’ perceptions of how their communities function. Neighbourhood risk (NR) was defined as a unidimensional construct, and in the SEM analyses it was represented by one latent variable measured by nine IYS items, which were combined into three indicators. The indicators were formed by summing the scores on the relevant questionnaire items. The first indicator, neighbourhood crime (NR1), consisted of three 4-point Likert items, such as ‘There is lot of crime in my neighbourhood’. The second indicator, low social efficacy (NR2), referred to cohesion and informal control in the neighbourhood (i.e., the ties and communication among neighbours, willingness of residents to intervene on behalf of the neighbourhood, etc.; Sampson, Raudenbush, & Earls, 1997).
Three IYS items, adapted from the Sampson and Raudenbush (1999) scale, comprised this indicator, including ‘My neighbours notice when I am misbehaving and let me know’. The third indicator (NR3), physical signs of neighbourhood disorder (Wandersman & Nation, 1998), consisted of two items, including ‘There is lot of graffiti in my neighbourhood’. For each of the three indicators of neighbourhood risk, higher scores indicated higher risk (more crime in the neighbourhood, lower social efficacy, and more physical disorder). Some of the original items were recoded to reflect this direction. The same nine items that formed the three indicators of NR in the latent variable models were utilized as the measure of NR in regression analysis (all items were summed). Higher scores on the resulting scale indicated higher risk. The reliability of the scale was Cronbach α = .77.

3.2.3 Family Risk (FR)

The construct of family risk in this project relates to the quality of family functioning, specifically the parent/caregiver-child aspect of family functioning. In the SEM approaches, the construct was measured by three indicators, formed by summing the scores on the relevant questionnaire items. The first indicator, joint family activities (FR1), was the frequency of engagement in shared activities (going out to movies, going for a walk or hike, attending a sport event, having a meal together, etc.). It consisted of two items, measured on a 5-point scale. The second indicator (FR2) was related to youth getting along with parents/caregivers, and it consisted of two items measured on a 3-point scale. The third indicator (FR3) was parental monitoring, i.e., parental knowledge about the youth’s associates and whereabouts; it comprised two IYS items, such as ‘Do your parents (or the adults you live with) usually know who you are with when you go out’. For all of the FR indicators, higher scores indicated a higher level of risk, i.e., poor parental supervision and involvement, and poor relationships.
Similar items were used in other major youth studies as indicators of family risk for delinquency. In multiple regression analysis, a composite index was formed by summing all relevant family items; this index represented the manifest variable ‘family risk’. Higher scores indicated higher family risk. The reliability of the scale was Cronbach α = .53; the scale was further examined in confirmatory factor analysis.

3.2.4 Delinquency (DELQ)

Although in the SEM approaches the predictors were conceptualized as latent variables measured by multiple indicators, the outcome in this project, delinquency (DELQ), was represented by a single indicator, which was a count of the types of delinquent activities the youth had engaged in. The IYS included items that measured various types of delinquent behaviour that youth had engaged in, such as stealing from a store, breaking into a building, carrying a weapon, threatening someone to get money, beating someone up, and selling drugs. The number of these activities reflected the variety of delinquent engagement, and was taken as the measure of delinquency in this project. The reliability of the delinquency scale was Cronbach α = .71.

Delinquency as a latent variable has not been studied sufficiently: there is a lack of both theory and empirical evidence regarding the structure of the construct. The factor structure of delinquency is a project in itself and outside the scope of this study; hence, delinquency in this project was measured by a single indicator. That is, in both SEM and regression analysis, the dependent variable was conceptualized as an observed variable measured without error [3].

[3] Theoretically, it is error in the predictors, not in the outcome variable, that is related to bias in the estimates in multiple regression.

Further, the dependent variable in the current project, DELQ, represents a summary of historical engagement in delinquency.
That is, DELQ was measured at the same time as the predictors, and included self-report on current and past behaviour. Research utilizing data from survey-based cross-sectional projects commonly uses such a dependent variable. Further, the focus of this project was on linearity/nonlinearity of the relations among the constructs, and not on proving causality or the direction of the relations (which has been the focus of many studies concerned with the same phenomena); hence, this definition of the dependent variable was utilized.

3.3 Procedure and Data Analysis

The four described techniques (multiple regression, unconstrained mean-centered approach, orthogonalizing approach, and LMS) were utilized to assess the three interaction questions (the interaction between LSC and NR, between LSC and FR, and between NR and FR). In other words, each of the three interactions was assessed by applying each of the four techniques, and the results obtained by the different techniques were compared for each of the interaction problems.

3.3.1 Multiple Regression

In order to address the three questions specified in this project, three regression analyses were performed: the first analysis addressed the interaction between LSC and NR, the second involved the interaction between LSC and FR, and the third involved the interaction between NR and FR, in relation to adolescent delinquent behaviour. The same procedure (as described in the rest of this section) was followed in each of the three regression analyses. First, all predictor variables were centered in order to reduce nonessential multicollinearity and to ease interpretation of the results. The criterion variable was not centered. Using the centered predictors, the product term of the predictors was formed in each analysis; this product term represented the interaction between the predictors.
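The centering and product-term step described above can be sketched in a few lines. This is an illustrative Python sketch on synthetic data, not the SPSS procedure used in the project; the function name `fit_interaction_regression`, the simulated variables, and all numeric values are assumptions made for the example.

```python
import numpy as np

def fit_interaction_regression(x1, x2, y):
    """OLS of y on centered x1, centered x2, and their product.

    Returns the coefficients [b0, b1, b2, b3] and the design matrix.
    """
    x1c = x1 - x1.mean()            # center the predictors only
    x2c = x2 - x2.mean()
    product = x1c * x2c             # interaction term built from centered predictors
    design = np.column_stack([np.ones_like(x1c), x1c, x2c, product])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs, design

# Synthetic illustration (hypothetical values, not the IYS data)
rng = np.random.default_rng(0)
n = 500
lsc = rng.normal(20, 5, n)
nr = 0.3 * lsc + rng.normal(10, 4, n)
delq = (0.5 + 0.10 * (lsc - 20) + 0.05 * (nr - 16)
        + 0.02 * (lsc - 20) * (nr - 16) + rng.normal(0, 1, n))

coefs, design = fit_interaction_regression(lsc, nr, delq)

# Correlation of a predictor with the product term, before and after centering
raw_r = np.corrcoef(lsc, lsc * nr)[0, 1]
centered_r = np.corrcoef(design[:, 1], design[:, 3])[0, 1]
```

In this sketch the correlation between a predictor and the product term drops sharply once the predictors are centered, which is the nonessential multicollinearity that centering is meant to remove; the coefficient of the product term itself does not depend on centering.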
The three regression equations were estimated:

\[ \hat{Y} = b_0 + b_1\,\mathrm{LSC} + b_2\,\mathrm{NR} + b_3\,(\mathrm{LSC} \times \mathrm{NR}) \tag{9} \]

\[ \hat{Y} = b_0 + b_1\,\mathrm{LSC} + b_2\,\mathrm{FR} + b_3\,(\mathrm{LSC} \times \mathrm{FR}) \tag{10} \]

\[ \hat{Y} = b_0 + b_1\,\mathrm{FR} + b_2\,\mathrm{NR} + b_3\,(\mathrm{FR} \times \mathrm{NR}) \tag{11} \]

Before the analyses were conducted, the assumptions underlying multiple regression analysis were checked. All analyses were conducted using the Statistical Package for Social Sciences (SPSS) [4].

[4] Multiple software packages were chosen for the current project (SPSS, LISREL, and Mplus), even though all analyses could have been performed in one program (Mplus is capable of accommodating analyses in all approaches). Demonstrating how the approaches were implemented in different software and providing syntax for each analysis was considered potentially more beneficial to applied researchers.

3.3.2 Mean-centered Approach

In applying the unconstrained approach to the three specific questions set in the study, three separate analyses were conducted, following the same procedure. The indicators of the latent predictors were centered first. Products of the centered indicators (which served as the indicators of the latent interaction) were then formed by using the matched-pairs strategy. For example, in relation to the first interaction addressed in the project, i.e., the interaction between LSC and NR, where LSC and NR were ξ1 and ξ2, the first indicator of ξ1 was matched with the first indicator of ξ2 (LSC1xNR1), the second indicator of ξ1 was matched with the second indicator of ξ2 (LSC2xNR2), and the third indicator of ξ1 was matched with the third indicator of ξ2 (LSC3xNR3). That is, the three product terms (LSC1xNR1, LSC2xNR2, and LSC3xNR3) served as indicators of the latent interaction in the analyses. The variance-covariance matrix of the indicators of the latent predictors and the latent interaction was used as input in data analysis, and the analyses were performed in LISREL 9.10 (Jöreskog & Sörbom, 2013).

The same model specification, as outlined below, was used to assess each of the three interactions addressed in the study.
The equation for the structural part of the model corresponds to the general SEM equation (Equation 3) with η = y (as the outcome was measured by one indicator):

\[ y = \alpha + \gamma_1 \xi_1 + \gamma_2 \xi_2 + \gamma_3 \xi_1 \xi_2 + \zeta \tag{8} \]

Specifications of the measurement model, latent mean vector (κ), and covariance matrix of the latent predictors (Φ) were:

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_1 x_4 \\ x_2 x_5 \\ x_3 x_6 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
\lambda_2 & 0 & 0 \\
\lambda_3 & 0 & 0 \\
0 & 1 & 0 \\
0 & \lambda_5 & 0 \\
0 & \lambda_6 & 0 \\
0 & 0 & 1 \\
0 & 0 & \lambda_8 \\
0 & 0 & \lambda_9
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_1 \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \\ \delta_7 \\ \delta_8 \\ \delta_9 \end{pmatrix}
\tag{9}
\]

\[
y = \eta, \qquad
\kappa = \begin{pmatrix} 0 \\ 0 \\ \phi_{21} \end{pmatrix}, \qquad
\Phi = \begin{pmatrix} \phi_{11} & & \\ \phi_{21} & \phi_{22} & \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix}
\tag{10}
\]

In order to provide the scale for the latent variables, λ1, λ4, and λ7 were set to 1. The mean of the latent interaction was set to φ21. φ31 and φ32 were freely estimated because the indicators were not normally distributed. Further, following Marsh et al. (2004), the covariances between the δs were set to 0, i.e., theta delta (Θδ) was a diagonal matrix:

\[ \Theta_\delta = \mathrm{diag}(\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9) \tag{11} \]

LISREL syntax for the analyses performed according to the unconstrained approach is included in Appendix B. The abbreviations used in the syntax files are further defined in Appendix A.

3.3.3 Orthogonalizing Approach

In the orthogonalizing approach, for each of the three addressed interactions, all possible products of the indicators of the latent variables were formed first. For example, in relation to the interaction of LSC and NR, the products of the three indicators of LSC (LSC1, LSC2, and LSC3) and the three indicators of NR (NR1, NR2, and NR3) were formed first, resulting in nine product terms (LSC1xNR1, LSC1xNR2, LSC1xNR3, LSC2xNR1, LSC2xNR2, LSC2xNR3, LSC3xNR1, LSC3xNR2, LSC3xNR3). The indicators of the latent variables were not centered before the product terms were formed. Rather, each of the product terms was regressed on all indicators of the involved latent variables. For example, the product indicator LSC1xNR1 was regressed on LSC1, LSC2, LSC3, NR1, NR2, and NR3 (the indicators of LSC and NR), i.e.:

\[ \mathrm{LSC1} \times \mathrm{NR1} = b_0 + b_1\,\mathrm{LSC1} + b_2\,\mathrm{LSC2} + b_3\,\mathrm{LSC3} + b_4\,\mathrm{NR1} + b_5\,\mathrm{NR2} + b_6\,\mathrm{NR3} + e \tag{12} \]

The analysis was performed in SPSS, and the residuals of this regression were saved.
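The residual-centering step in Equation 12 can be illustrated with a short sketch. This is a Python illustration on synthetic data rather than the SPSS procedure used in the project; the helper `residual_center` and the simulated indicator values are hypothetical.

```python
import numpy as np

def residual_center(product, indicators):
    """Residuals of `product` regressed on an intercept plus all `indicators`."""
    design = np.column_stack([np.ones(len(product)), indicators])
    coefs, *_ = np.linalg.lstsq(design, product, rcond=None)
    return product - design @ coefs

# Synthetic stand-ins for the six first-order indicators (not the IYS data)
rng = np.random.default_rng(1)
n = 400
lsc = rng.normal(6, 2, size=(n, 3))   # hypothetical LSC1..LSC3
nr = rng.normal(5, 2, size=(n, 3))    # hypothetical NR1..NR3
first_order = np.column_stack([lsc, nr])

# All 3 x 3 = 9 products of uncentered indicators, each residual-centered
residuals = np.column_stack([
    residual_center(lsc[:, i] * nr[:, j], first_order)
    for i in range(3) for j in range(3)
])

# Covariances between the 6 first-order indicators and the 9 residuals
full_cov = np.cov(np.column_stack([first_order, residuals]), rowvar=False)
cross_cov = full_cov[:6, 6:]
```

Because ordinary least squares residuals are uncorrelated with every regressor, each of the 6 x 9 cross-covariances in `cross_cov` is zero up to floating-point error.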
The residuals were then used as one indicator of the latent interaction (LSCxNR). The same procedure was repeated for each of the nine product terms. The covariance matrix of the indicators of the latent variables and the residuals that served as indicators of the latent interaction was used as input for the LISREL analysis (LISREL 9.10; Jöreskog & Sörbom, 2013). The covariances between the indicators of the latent variables and the nine indicators of the latent interaction (54 covariances in total) were 0 in the covariance matrix (a consequence of the residual centering). The LISREL specification of the model was:

\[
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\
\mathrm{res}\,x_1 x_4 \\ \mathrm{res}\,x_1 x_5 \\ \mathrm{res}\,x_1 x_6 \\
\mathrm{res}\,x_2 x_4 \\ \mathrm{res}\,x_2 x_5 \\ \mathrm{res}\,x_2 x_6 \\
\mathrm{res}\,x_3 x_4 \\ \mathrm{res}\,x_3 x_5 \\ \mathrm{res}\,x_3 x_6
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
\lambda_2 & 0 & 0 \\
\lambda_3 & 0 & 0 \\
0 & 1 & 0 \\
0 & \lambda_5 & 0 \\
0 & \lambda_6 & 0 \\
0 & 0 & 1 \\
0 & 0 & \lambda_8 \\
0 & 0 & \lambda_9 \\
0 & 0 & \lambda_{10} \\
0 & 0 & \lambda_{11} \\
0 & 0 & \lambda_{12} \\
0 & 0 & \lambda_{13} \\
0 & 0 & \lambda_{14} \\
0 & 0 & \lambda_{15}
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_1 \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_{15} \end{pmatrix}
\tag{13}
\]

\[
y = \eta, \qquad
\Phi = \begin{pmatrix} \phi_{11} & & \\ \phi_{21} & \phi_{22} & \\ 0 & 0 & \phi_{33} \end{pmatrix}
\tag{14}
\]

To define the metric for the latent variables, factor loadings λ1, λ4, and λ7 were set to 1. The covariances between the latent interaction term and the individual latent variables that form the interaction term (φ31 and φ32) were set to 0, as the indicators of the latent interaction were orthogonalized in relation to the indicators of the individual latent variables. Additionally, correlated errors among certain indicators of the latent interaction were specified, as their error components contained common subcomponents. For example, the error of resLSC1xNR1 covaried with the errors of resLSC1xNR2 and resLSC1xNR3, as well as with the errors of resLSC2xNR1 and resLSC3xNR1 (because indicator LSC1 was used to form LSC1xNR1, LSC1xNR2, and LSC1xNR3, while indicator NR1 was used to form LSC1xNR1, LSC2xNR1, and LSC3xNR1). Following the same logic, the rest of the error covariances were specified and estimated (Appendix B). The error covariances between residuals of the product terms without common components were fixed to 0 (for example, the covariance between resLSC1xNR1 and resLSC2xNR2, or between resLSC1xNR1 and resLSC3xNR3, etc.).
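The rule for which error covariances are estimated and which are fixed to 0 can be made explicit by enumeration. The following Python sketch only illustrates the rule stated above (two residual indicators have correlated errors when their product terms share a first-order indicator); it is not the LISREL syntax from Appendix B, and the variable names are chosen for the example.

```python
from itertools import combinations, product

lsc = ["LSC1", "LSC2", "LSC3"]
nr = ["NR1", "NR2", "NR3"]
terms = list(product(lsc, nr))          # the nine (LSC, NR) indicator pairs

def label(term):
    """Name of the residual-centered product indicator, e.g. resLSC1xNR1."""
    return f"res{term[0]}x{term[1]}"

free, fixed = [], []
for a, b in combinations(terms, 2):
    # Errors covary when the two products share an LSC or an NR indicator
    shares_component = a[0] == b[0] or a[1] == b[1]
    (free if shares_component else fixed).append((label(a), label(b)))
```

Under this rule, the 36 possible pairs of residual indicators split evenly: 18 error covariances are estimated and 18 are fixed to 0.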
In the orthogonalizing approach the mean structure is not estimated (Marsh et al., 2007). The structural part of the model was the same as in the unconstrained mean-centered approach (Equation 8). A full specification of the model is provided in the LISREL syntax file (Appendix B); the abbreviations are defined in Appendix A.

3.3.4 LMS

In the LMS method, the raw data of the observed variables were utilized and no products of observed variables were formed. The joint indicator vector (x, y) = (x1, x2, x3, x4, x5, x6, y) was analyzed; for example, in relation to the interaction between LSC and NR, the joint indicator vector was (LSC1, LSC2, LSC3, NR1, NR2, NR3, DELQ). In the case of one interaction effect, the structural equation is equivalent to that in the product indicator approaches [5]. The structural equation estimated in the three analyses (LSCxNR, LSCxFR, or NRxFR) was:

\[
y = \alpha + \begin{pmatrix} \gamma_1 & \gamma_2 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+ \begin{pmatrix} \xi_1 & \xi_2 \end{pmatrix} \begin{pmatrix} 0 & \omega_{12} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} + \zeta
\tag{15}
\]

where ω12 is the coefficient for the latent interaction in Klein and Moosbrugger’s notation and equals γ3 in Equation 3; i.e., ω12 represents the coefficient for the latent interaction (LSCxNR, LSCxFR, or NRxFR, in the three analyses).

[5] In more complex models that include quadratic terms, a parameter matrix Ω is needed.

The estimated measurement model in each of the analyses was:

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix}
=
\begin{pmatrix} \tau_1 \\ \tau_2 \\ \tau_3 \\ \tau_4 \\ \tau_5 \\ \tau_6 \end{pmatrix}
+
\begin{pmatrix}
1 & 0 \\
\lambda_2 & 0 \\
\lambda_3 & 0 \\
0 & 1 \\
0 & \lambda_5 \\
0 & \lambda_6
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \end{pmatrix},
\qquad y = \eta
\tag{16}
\]

The LMS method does not involve forming product indicators, and the parameters associated with product indicators were not estimated. In regard to the latent interaction, only the ω12 parameter was estimated, as the latent interaction does not have mean and variance parameters or parameters for covariances with the other latent predictors. The analyses were performed in Mplus (Muthén & Muthén, 1998-2010) and the interaction terms were specified by using the XWITH command. The syntax used in the analysis is included in Appendix B.
3.3.5 Missing Data

For the type of analyses performed in the study (which require forming product terms), it was important that all cases have complete data on all variables; one way of accomplishing this is listwise deletion of missing data (i.e., cases with missing data on any of the relevant variables are removed from the analyses). The amount of missing data on the individual items was small (the percentage of missing cases ranged from .4% to 1.7% on all except one item [6]); however, listwise deletion would result in the loss of 7% of the cases (231 cases).

[6] One item had significantly more missing responses (11%) due to ‘not stated’ and ‘not applicable’ responses being combined into the same category in the available IYS database. Based on the pattern of the missing data on the other items, it is likely that most of the missing data on this item were ‘not applicable’ rather than ‘not stated’ responses; hence they were treated in this way (as described further).

To decrease the number of cases that would be lost in listwise deletion, an item-level imputation of missing data was performed first, where possible, in the following way: if multiple items were utilized for each of the indicators (for example, three items for the LSC1 indicator), the missing value on one item was replaced by the average of the other items from that group (i.e., the other two items of the LSC1 indicator [7]). If the subject was missing a response on two out of three items, the replacement of missing values was not performed; in other words, the procedure was performed only if 75% or more of the information on that group of items was available. Further, if only two items formed an indicator (e.g., NR3), replacement of missing values in this way was not performed. Additionally, in relation to the ‘not applicable’ responses (relevant for two FR items), the imputation was performed either in the direction of lower risk, or by replacement with the score on the available applicable item [8].
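The item-level replacement rule for three-item indicators can be written as a small function. This is a minimal Python sketch of the rule as described above, not the procedure actually run on the IYS data; the function name, the 1 to 4 scale bounds, and the round-half-up tie-breaking are assumptions made for the example.

```python
def impute_item_group(items, scale_min=1, scale_max=4):
    """Fill one missing item in a three-item group with the rounded mean of the others.

    `items` is a list of responses, with None marking a missing value.
    Groups with more than one missing item, or with fewer than three items,
    are returned unchanged.
    """
    missing = [i for i, value in enumerate(items) if value is None]
    if len(items) != 3 or len(missing) != 1:
        return list(items)
    observed = [value for value in items if value is not None]
    mean = sum(observed) / len(observed)
    # Round to the closest scale value; rounding ties upward is an assumption
    value = int(mean + 0.5)
    value = max(scale_min, min(scale_max, value))
    filled = list(items)
    filled[missing[0]] = value
    return filled

example_one_missing = impute_item_group([2, None, 4])
example_two_missing = impute_item_group([None, None, 3])
example_two_item_indicator = impute_item_group([None, 2])
```

Cases that the function leaves unchanged still contain missing values and would be removed by the subsequent listwise deletion.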
In regard to the dependent variable (delinquency), no data imputation was performed. Overall, only a small amount of missing data was imputed (only where appropriate, as described), but the loss of cases in the listwise deletion that was performed afterward was decreased. After the listwise deletion was performed, 176 cases (5.3% of the original sample) were lost.

[7] Rounded to the closest scale value.

[8] If a youth does not go out at night, the item about parental supervision referring to going out was not applicable; hence, the score was imputed in the direction of low risk (as poor supervision was not a risk factor in such cases). If the relationship with one parent/caregiver was coded but not with the other (i.e., single-parent families), the available information was taken as representative of the ‘relationship with parents/caregivers’ indicator.

An analysis was performed to check that the assumption of data missing completely at random (MCAR) was satisfied, and that listwise deletion was an appropriate method for handling missing data. The logistic regression performed confirmed that the probability of missing data on the dependent variable (DELQ) was unrelated to the value of DELQ or to the values of any of the predictors in the study. When the MCAR assumption is satisfied, listwise deletion produces consistent parameter estimates, standard errors, and test statistics (Brown, 2006). Loss of efficiency, which is usually a concern, is not a concern in this sample, as only a small proportion of the sample was lost. The final number of cases in the sample, after listwise deletion, was N = 3114.

3.3.6 Data Screening

In order to assess whether the assumptions underlying multiple regression and SEM analyses were met, data screening was conducted. Descriptive statistics for the variables used in regression analysis are presented in Table 1, and a graphical depiction of the univariate distributions is included in Figure 1.

Table 1.
Descriptive Statistics for the Variables in Regression Analyses

                                                    Skewness                Kurtosis
Variables   N      Min      Max     Mean   SD      Statistic  Std. Error   Statistic  Std. Error
LSC         3114   -10.60   16.40   .000   5.59    .237       .044         -.441      .088
NR          3114   -6.69    17.31   .000   4.31    .988       .044         .785       .088
FR          3114   -3.40    11.60   .000   2.72    .901       .044         .520       .088
DELQ        3114   .00      10.00   .725   1.37    2.772      .044         9.390      .088

Note: LSC – Low self-control, NR – Neighbourhood risk, FR – Family risk, DELQ – Delinquency. The predictors in regression analyses were centered.

Figure 1. Distribution of the Variables in Regression Analysis

Examination of the graphical data and the values of the skewness and kurtosis statistics (skewness = 2.77, kurtosis = 9.39) indicated that the distribution of the dependent variable (DELQ) deviated from the normal distribution (a leptokurtic, positively skewed distribution). This result reflected the fact that delinquency, i.e., the variety of delinquent acts, is a phenomenon that is not normally distributed in the general population. Smaller deviations from normality were present in the distributions of NR and FR.

Screening for outliers in the predictors suggested some potential univariate outliers (Figure 1). The greatest number of outlying cases was on the FR predictor (values for 24 cases were in the range of 3.1 to 4.3 z-scores). Further examination of the outlying cases suggested that all of them were valid but extreme cases. The data were then examined for multivariate outliers (with the interaction term excluded): in each of the three regression analyses there were multivariate outliers present, as indicated by the Mahalanobis distance values (assessed against the χ² value for df corresponding to the number of predictors, at p < .001; Tabachnick & Fidell, 2013). The greatest number of cases with significant Mahalanobis values was found in the analysis involving NR and FR (23 cases).
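The multivariate outlier screen described above can be sketched as follows. This is an illustrative Python sketch on synthetic data, not the SPSS procedure used in the project; the function names and the planted extreme case are hypothetical. With two predictors (interaction term excluded), df = 2, and for df = 2 the χ² critical value has the closed form −2 ln(α), which the sketch uses in place of a statistical table.

```python
import math
import numpy as np

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def chi2_critical_df2(alpha):
    """Upper-tail chi-square critical value for df = 2 (closed form for df = 2 only)."""
    return -2.0 * math.log(alpha)

# Synthetic illustration (not the IYS data)
rng = np.random.default_rng(2)
predictors = rng.normal(size=(1000, 2))   # two hypothetical centered predictors
predictors[0] = [6.0, -6.0]               # planted extreme case

d2 = mahalanobis_sq(predictors)
cutoff = chi2_critical_df2(0.001)         # approximately 13.82
flagged = np.flatnonzero(d2 > cutoff)     # row indices of flagged cases
```

Flagged cases are candidates for closer inspection (for example, with Cook’s distance and leverage values) rather than automatic removal.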
Further examination of the multivariate outliers suggested that they did not exert high influence on the analyses (based on examination of Cook’s distance values and leverage values [9]). Additionally, I performed the analyses with the multivariate outliers excluded: the values of the regression coefficients did not change substantially compared to the coefficients obtained when the same cases were included. Hence, I decided to keep all cases in the analyses.

[9] The cut-off values used for influence were Cook’s D > 1 (Stevens, 1984) and leverage values > .20 (Huber, 1981).

In relation to the specific assumptions of multiple regression, normality of the residuals was assessed first, by examining the graphical data (histograms and Q-Q plots, Figure 2) as well as the values of skewness and kurtosis. This examination revealed that the distribution of the residuals deviated from the normal distribution in each of the analyses. In the methodological literature, it has been shown that in large samples the violation of the normality-of-residuals assumption does not lead to biased coefficient estimates, or to serious problems with the significance tests or confidence intervals (Cohen et al., 2003). The problems are more likely in small samples. As the sample size in the present study was large, further remedies for the nonnormality problem were not applied.

Figure 2. Distribution of the Residuals in Regression Analyses – Histograms and Q-Q Plots

Next, bivariate scatterplots for the variables in each of the analyses were examined (Figure 3); they did not suggest obvious curvilinear relationships between the variables. In relation to the homoscedasticity assumption, the scatterplots of the residuals plotted against the predicted values (Figure 4) indicated some heteroscedasticity in the three analyses.
Specifically, even though the variability of the residuals seemed to be similar, the graphs revealed that at lower levels of the predicted scores the residuals were more in the positive range, indicating underprediction, while at higher predicted scores the residuals were in the negative range, indicating overprediction of the dependent variable. Violation of the homoscedasticity assumption is problematic if the degree of heteroscedasticity is large, which does not seem to be the case in these data. If the degree of heteroscedasticity is not large, the significance tests and confidence intervals are close to the correct values (Cohen et al., 2003).

Figure 3. Bivariate Relations between the Variables Used in Regression Analysis

Further, the Durbin-Watson test (test statistic = 1.93, 1.92, and 1.97 for the three analyses) suggested that the residuals were independent of one another. In relation to multicollinearity, correlations among the variables used in the regression analyses are provided in Table 2. Centering the predictor variables decreased the correlations among the predictor variables and the product terms formed from them. Overall, multicollinearity was not of concern in this sample, as indicated by VIF and tolerance values (VIF range 1.03 to 1.15; tolerance range .87 to .96). Finally, the assumption of uncorrelated residuals and predictors was not violated (all rs = 0).

Figure 4. Scatterplots of the Residuals Plotted against the Predicted Values

Table 2. Correlations among the Predictors in Regression Analysis

          LSC    NR     FR     LSCxNR   LSCxFR   NRxFR
LSC       1      .31    .28    .12      .04      .06
NR        .31    1      .29    .27      .06      .19
FR        .28    .28    1      .06      .18      .26
LSCxNR    .12    .27    .06    1        .32      .33
LSCxFR    .04    .06    .18    .32      1        .37
NRxFR     .06    .19    .26    .33      .37      1

Note: LSC – Low self-control; NR – Neighbourhood risk; FR – Family risk; LSCxNR, LSCxFR, and NRxFR – interaction terms.
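The VIF and tolerance values mentioned above can be obtained directly from the correlation matrix of the predictors: the VIF of each predictor is the corresponding diagonal element of the inverse correlation matrix, and tolerance is its reciprocal. The following Python sketch illustrates this on synthetic data; the function name and the simulated predictors are assumptions, and the values it produces are unrelated to those reported for the IYS data.

```python
import numpy as np

def vif_and_tolerance(predictors):
    """VIF per column (diagonal of the inverse correlation matrix) and tolerance."""
    corr = np.corrcoef(predictors, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return vif, 1.0 / vif

# Synthetic illustration with two correlated predictors
rng = np.random.default_rng(4)
x1 = rng.normal(size=1000)
x2 = 0.3 * x1 + rng.normal(size=1000)
vif, tolerance = vif_and_tolerance(np.column_stack([x1, x2]))

# With exactly two predictors, VIF reduces to 1 / (1 - r^2)
r = np.corrcoef(x1, x2)[0, 1]
```

In a regression with an interaction term, the predictor matrix passed to the function would hold the two centered predictors and their product.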
In regard to SEM and the assumption of multivariate normality, the inclusion of product terms in the analyses automatically results in violation of the multivariate normality assumption (the product of two variables is not normally distributed, even if the individual variables are; Cham et al., 2013). It has been shown that such a violation has less serious consequences for the analysis if multivariate normality of the individual indicators and latent predictors is not violated. Hence, I examined the normality of the individual indicators first. The statistical tests of univariate normality for the indicators in the three analyses were performed in PRELIS, and the results are presented in Table 3, whereas the Q-Q plots of the indicators are included in Figure 5. The skewness and kurtosis values were statistically significant for the majority of the indicators; the deviation from normality, however, was not extreme. As the individual indicators were not normally distributed, the assumption of multivariate normality of the indicators was not met in the data. Relative multivariate kurtosis for the three analyses (LSC and NR, LSC and FR, and NR and FR) was 1.42, 1.08, and 1.25, respectively, indicating moderate deviation from normality.

Table 3. Tests of Univariate Normality of the Indicators Used in the SEM Approaches

Variables   Skewness   z-score   Kurtosis   z-score   χ² (skewness and kurtosis)   p-value
LSC1        0.275      6.17      -0.501     -5.71     70.68                        <.001
LSC2        0.235      5.30      -0.895     -10.20    132.16                       <.001
LSC3        0.329      7.31      -0.755     -8.60     127.46                       <.001
NR1         1.286      23.06     0.809      9.22      616.94                       <.001
NR2         0.070      1.60      -0.082     -.938     3.437                        .179
NR3         1.466      25.17     1.854      21.14     1080.04                      <.001
FR1         0.856      17.08     0.095      1.08      292.72                       <.001
FR2         1.415      24.59     1.190      13.57     788.62                       <.001
FR3         1.252      22.64     0.987      11.25     638.93                       <.001

Note: LSC – Low self-control; NR – Neighbourhood risk; FR – Family risk.
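For readers unfamiliar with the relative multivariate kurtosis index mentioned above, the following Python sketch shows one common definition: Mardia’s multivariate kurtosis coefficient divided by p(p + 2), its expected value under multivariate normality, so that values near 1 indicate little excess kurtosis. It is an assumption that this matches the exact computation PRELIS performs; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def relative_multivariate_kurtosis(data):
    """Mardia's multivariate kurtosis divided by p(p + 2)."""
    n, p = data.shape
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / n   # divisor n, as in Mardia's coefficient
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
    b2p = np.mean(d2 ** 2)            # mean of squared Mahalanobis distances, squared again
    return b2p / (p * (p + 2))

# Synthetic illustration with six indicators, as in each SEM analysis
rng = np.random.default_rng(5)
normal_data = rng.normal(size=(5000, 6))
skewed_data = rng.exponential(size=(5000, 6))

rmk_normal = relative_multivariate_kurtosis(normal_data)   # close to 1
rmk_skewed = relative_multivariate_kurtosis(skewed_data)   # clearly above 1
```

Under this definition, the reported values of 1.42, 1.08, and 1.25 would correspond to multivariate kurtosis that is 42%, 8%, and 25% above the value expected under normality.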
The violation of the distributional assumptions of ML is problematic, as it may result in incorrect estimates (biased standard errors and χ² statistics; Bollen, 1989). Overall, in situations when the observed variables’ deviations from normality were not extreme, it has been shown that ML estimation was adequate (Curran, West, & Finch, 1996). Similar results were obtained in relation to the nonnormality of the indicators when the latent interaction was present, in the product indicator approaches (Coenders et al., 2008; Marsh et al., 2004; Wall & Amemiya, 2001) as well as in the LMS approach (Klein & Moosbrugger, 2000; Klein & Muthén, 2007). In Cham et al. (2013), even in conditions of more extreme nonnormality (such as symmetric and leptokurtic distributions with kurtosis ≈ 11 and ≈ 31, and the skewed, moderately leptokurtic distribution with skewness ≈ 2 and kurtosis ≈ 6), ML estimation [10] yielded unbiased latent interaction estimates with acceptable actual Type I error rates when the number of subjects was large (N > 500). Overall, the problems with ML estimation related to nonnormal indicators were, in general, more likely in small samples. Since in this project the sample was large, and the deviation from normality was not extreme, I decided to proceed with applying the selected approaches based on ML.

[10] ML estimation utilized in the unconstrained mean-centered approach and the GAPI approach.

Figure 5. Distributions of the Indicators of the Latent Variables (LSC, NR, FR) – Q-Q Plots

4. Results

4.1 Measurement Model – Confirmatory Factor Analysis

Before utilizing the three SEM approaches (unconstrained mean-centered approach, orthogonalizing approach, and LMS), confirmatory factor analysis (CFA) was performed in order to assess the relations between the three latent variables (LSC, NR, and FR) and their indicators. The obtained path diagram (completely standardized solution) is presented in Figure 6.
The analysis suggested an adequate model fit, as indicated by the goodness-of-fit indices (RMSEA = .059, 90% CI [.053, .065]; CFI = .96; GFI = .98; Brown, 2006). The obtained factor loadings were statistically significant and of substantial size (standardized factor loadings ranged from .36 to .86). An examination of the residual matrix also suggested that the model fit the data well: most of the residuals between the observed and reproduced correlations were close to zero (only two out of 36 residuals had an absolute value greater than 0.1). Hence, the analyses in the selected SEM approaches were performed as intended, utilizing the original indicators.

Figure 6. Confirmatory Factor Analysis – Path Diagram

4.2 Interaction between Low Self-control and Neighbourhood Risk

Regarding the interaction between LSC and NR, the results obtained by the selected approaches are reported in Table 4. The reported coefficients of the interaction term and of the first-order terms are standardized coefficients. Specifically, the coefficients reported are ‘appropriate standardized coefficients’, as described in Aiken and West (1991) in the context of regression analysis, and in Wen, Marsh, and Hau (2010) in the context of SEM approaches. When interaction terms are included in the model, the regular standardized coefficients [11] are not scale-invariant, i.e., the coefficients differ with different scales of the indicators (e.g., centered and noncentered indicators). Therefore, ‘appropriate standardized coefficients’, which were proposed to be scale-invariant, were calculated following the procedure described in Aiken and West (1991), Friedrich (1982), and Wen et al. (2010). The standard errors and test statistic values reported were obtained from the outputs of the utilized statistical software, as recommended in Wen et al. (2010) [12].

[11] Reported in statistical software outputs.
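For the regression case, the calculation of ‘appropriate standardized coefficients’ can be sketched as follows. The sketch reflects one reading of the Friedrich (1982) procedure: the predictors and the outcome are converted to z-scores, the product term is formed from the z-scored predictors without being standardized again, and the unstandardized coefficients of that regression are reported. The Python function names and the synthetic data are assumptions made for illustration; this is not the SPSS syntax used in the project.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

def appropriate_standardized(x1, x2, y):
    """Regress z(y) on z(x1), z(x2), and z(x1)*z(x2); return the three slopes."""
    z1, z2, zy = zscore(x1), zscore(x2), zscore(y)
    design = np.column_stack([np.ones_like(z1), z1, z2, z1 * z2])
    return ols(design, zy)[1:]        # drop the intercept

def rescaled_from_raw(x1, x2, y):
    """Same quantities, obtained by rescaling coefficients from centered raw scores."""
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    design = np.column_stack([np.ones_like(x1c), x1c, x2c, x1c * x2c])
    b = ols(design, y)
    s1, s2, sy = x1.std(ddof=1), x2.std(ddof=1), y.std(ddof=1)
    return np.array([b[1] * s1 / sy, b[2] * s2 / sy, b[3] * s1 * s2 / sy])

# Synthetic illustration (not the IYS data)
rng = np.random.default_rng(3)
x1 = rng.normal(10, 3, 300)
x2 = 0.4 * x1 + rng.normal(5, 2, 300)
y = 1 + 0.3 * x1 + 0.2 * x2 + 0.05 * x1 * x2 + rng.normal(0, 2, 300)

beta = appropriate_standardized(x1, x2, y)
beta_rescaled_x1 = appropriate_standardized(10 * x1 + 3, x2, y)
```

In the sketch, `beta` is unchanged when a predictor is linearly rescaled, which is the scale invariance that motivates reporting these coefficients instead of the regular standardized output.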
[12] The same procedure in terms of reported standardized coefficients, standard errors, and test statistic values was followed in regard to all three analyzed problems.

Three approaches (multiple regression, mean-centered approach, and orthogonalizing approach) yielded consistent results in terms of the statistical significance of the coefficient of the interaction term LSCxNR. That is, a statistically significant interaction was found in all three approaches (all p < .001). The statistical significance test values in the three approaches ranged from 6.89 to 11.11. Further, there was consistency in the direction of the interaction, i.e., the interaction coefficient had a positive sign in all approaches. Similarly, there was agreement among the approaches in terms of the statistical significance and the sign of the first-order coefficients (i.e., the coefficients of LSC and of NR, in the presence of the interaction term LSCxNR; Table 4). In relation to the magnitude of the coefficients, the standardized regression coefficient of the interaction term ranged from .16 to .19. In regard to the magnitude of the coefficients of the first-order terms (LSC and NR), greater differences among the approaches were recorded (Table 4). The standard errors for the interaction coefficient and the first-order coefficients were similar across the three approaches.

In terms of evaluating model fit, the goodness-of-fit indices in the mean-centered and orthogonalizing approaches were similar: RMSEA = .04, CFI = .97, GFI = .98 in the mean-centered approach, and RMSEA = .05, CFI = .97, GFI = .98 in the orthogonalizing approach, indicating that the model fit the data well. The variance in the dependent variable accounted for by the model was R² = .32 in the mean-centered approach and R² = .33 in the orthogonalizing approach. In multiple regression, R² was smaller than in the approaches that account for measurement error, i.e., R² = .23, F(3, 3110) = 333.72, p < .001.
In regard to the LMS approach, problems were encountered in the analysis conducted in the Mplus software. The analysis did not terminate normally and did not converge to a meaningful solution[13]. In order to exclude the possibility that the specification of the dependent variable (measured by one indicator without error, i.e., conceptualized as an observed variable) led to the problem, further analyses were performed on a model in which the dependent variable was measured by two indicators, as well as on a model in which it was measured by three indicators (syntax provided in Appendix B). This, however, did not solve the problem. Further changes to the model, such as a change of the method for scaling the latent variables, did not make a difference. The reasons for the problems in the analysis will be explored in the discussion section.

[13] The program reported an error and indicated that the model estimation did not terminate normally. Even though not encountered in this study, the problems of non-convergence and of not obtaining a ‘fully proper solution’ (a solution that does not contain negative variances or negative standard errors, etc.) are also present in other SEM software (i.e., LISREL).

Table 4: Interaction between Low Self-control and Neighbourhood Risk – The Coefficients Obtained in the Three Approaches

                  Coefficient of LSCxNR     Coefficient of LSC        Coefficient of NR
Approach          β3     SE     Stat.test   β1     SE     Stat.test   β2     SE     Stat.test
Regression        .16*   .014   11.11       .33*   .016   20.34       .18*   .017   10.57
Mean-centered     .17*   .011    6.89       .39*   .021   16.70       .14*   .017    5.26
Orthogonalizing   .19*   .009    9.41       .40*   .021   17.21       .23*   .013   11.31

Note: LSC – Low self-control; NR – Neighbourhood risk; β3 – coefficient of the interaction term LSCxNR; β1 – coefficient of the first-order term LSC; β2 – coefficient of the first-order term NR; SE – standard error; Stat.test – statistical significance test value. * p < .05
4.3 Interaction between Low Self-control and Family Risk

In regard to the second substantive problem, i.e., the interaction between LSC and FR, the results of the utilized approaches are presented in Table 5. The ‘appropriate standardized coefficients’ were calculated and reported (as described in the previous section, i.e., the results section pertaining to the first assessed problem, the LSCxNR interaction). Three approaches (multiple regression, mean-centered approach, and orthogonalizing approach)[14] found a significant interaction between LSC and FR (the statistical significance test values for the interaction coefficient ranged from 3.61 to 7.51). The interaction coefficient had the same sign in all approaches. The magnitude of the standardized interaction coefficient across the approaches ranged from .12 to .19. The magnitude of the first-order coefficient of LSC was similar across the approaches (Table 5); however, a substantial difference was found in relation to the first-order coefficient of FR (it ranged from .17 to .33). In terms of the statistical significance and the sign of the LSC and FR coefficients, there was agreement among the approaches. The standard errors were somewhat smaller in regression analysis than in the two SEM approaches, for all coefficients, and especially for the coefficient of the first-order term FR.

The model was a good fit to the data according to the mean-centered and orthogonalizing approaches: RMSEA = .04, CFI = .95, GFI = .98 for the mean-centered approach, and RMSEA = .04, CFI = .97, GFI = .98 for the orthogonalizing approach. The variance in the dependent variable accounted for by the model in the mean-centered and orthogonalizing approaches was R² = .33 and R² = .35, respectively. In multiple regression, the model accounted for 21% of the variance in the dependent variable; R² = .21, F(3,3110) = 284.1, p < .001.

[14] The analysis based on the LMS approach did not complete successfully.
Table 5: Interaction between Low Self-control and Family Risk – The Coefficients Obtained in the Three Approaches

                  Coefficient of LSCxFR     Coefficient of LSC        Coefficient of FR
Approach          β3     SE     Stat.test   β1     SE     Stat.test   β2     SE     Stat.test
Regression        .12*   .015    7.51       .36*   .017   21.63       .17*   .017   10.01
Mean-centered     .12*   .033    3.61       .35*   .030   10.78       .25*   .067    5.51
Orthogonalizing   .19*   .029    6.66       .31*   .027   10.51       .33*   .054    8.92

Note: LSC – Low self-control; FR – Family risk; β3 – coefficient of the interaction term LSCxFR; β1 – coefficient of the first-order term LSC; β2 – coefficient of the first-order term FR; SE – standard error; Stat.test – statistical significance test value. * p < .05

4.4 Interaction between Neighbourhood Risk and Family Risk

The results regarding the third substantive problem, pertaining to the interaction of NR and FR, are presented in Table 6. Two approaches (multiple regression and the orthogonalizing approach) suggested a statistically significant interaction between NR and FR, while the mean-centered approach did not find a significant interaction (marginal significance, p = .08)[15]. The magnitude of the interaction coefficient was similar across the approaches, and the sign of the interaction coefficient was the same. In regard to the magnitude of the first-order coefficients, there was a substantial difference in the coefficient for FR across the approaches, i.e., a standardized regression coefficient of .20 was obtained in regression, and .40 in the orthogonalizing approach. There was agreement among the approaches in terms of the statistical significance and the sign of the NR and FR coefficients. The standard errors were similar across the approaches, except in the case of the FR coefficient.

[15] The analysis based on the LMS approach did not complete successfully.

According to the two SEM approaches, the model was a good fit to the data: RMSEA = .06, CFI = .93, GFI = .97 for the mean-centered approach; and RMSEA = .04, CFI = .96, GFI = .98 for the orthogonalizing approach.
The variance in the dependent variable accounted for by the model was R² = .28 and R² = .29, respectively. In multiple regression, the model accounted for 15% of the variance in the dependent variable; R² = .15, F(3,3110) = 188.33, p < .001.

Table 6: Interaction between Neighbourhood Risk and Family Risk – The Coefficients Obtained in the Three Approaches

                  Coefficient of NRxFR      Coefficient of NR         Coefficient of FR
Approach          β3     SE     Stat.test   β1     SE     Stat.test   β2     SE     Stat.test
Regression        .06*   .015    4.01       .26*   .017   14.98       .20*   .018   11.12
Mean-centered     .08    .028    1.83       .21*   .016    8.27       .35*   .056    8.34
Orthogonalizing   .08*   .015    3.29       .23*   .015    9.42       .40*   .048   11.35

Note: NR – Neighbourhood risk; FR – Family risk; β3 – coefficient of the interaction term NRxFR; β1 – coefficient of the first-order term NR; β2 – coefficient of the first-order term FR; SE – standard error; Stat.test – statistical significance test value. * p < .05

5. Summary and Discussion

Several approaches to assessing interactions among variables have been developed recently in order to address various challenges inherent to assessing interactions. To provide more evidence about the performance of different approaches in an applied context, some of the approaches were utilized and their results compared in the current project. The traditionally used OLS regression and three approaches based on the latent variables framework (mean-centered, orthogonalizing, and LMS approach) were compared. The approaches were utilized in assessing specific interactions that involve risk factors for adolescent delinquency (the interaction between low self-control and neighbourhood risk, the interaction between low self-control and family risk, and the interaction between family risk and neighbourhood risk), in order to test relevant theoretical propositions. There was considerable consistency in the results obtained with the different approaches. However, some differences were also noted.
Such differences may affect researchers’ conclusions in regard to the substantive problem of interest.

Only the results of the OLS regression, mean-centered approach, and orthogonalizing approach are discussed, as parameter estimates were not obtained in the LMS approach. The analyses based on the LMS approach, implemented following the literature in which this approach was described and utilized (Kelava, Moosbrugger, Dimitruk, & Schermelleh-Engel, 2008; Kelava et al., 2011; Klein & Moosbrugger, 2000; Muthén & Muthén, 1998-2010), did not converge. There may be various reasons for convergence problems in the Mplus software, as outlined by Muthén and Muthén (1998-2010). In the current study, the indicators of the latent variables were measured on different scales, a condition that has been suggested to lead to convergence problems. Further, the variance in the dependent variable in this study (variety of delinquent behavior) was particularly small[16]. According to Muthén and Muthén, models with random effects[17] that have small variances are among the models that are most likely to have convergence problems. Further, a specific nonnormal distribution of the dependent variable may have contributed to the problem. Possible remedies, which require more technical changes to the analysis (or assistance of the analyst), were not pursued further in the current study[18].

The other two SEM approaches (mean-centered and orthogonalizing approach) converged, and the results suggested an adequate model-data fit in regard to all three substantive problems addressed in the study. There was overall consistency among the multiple regression, mean-centered, and orthogonalizing approaches in answering the main question about the specific problems addressed in the study, i.e., whether there is an interaction between the relevant variables.
A statistically significant interaction was found in all three approaches in regard to two of the assessed problems (i.e., the interaction between low self-control and neighbourhood risk, and the interaction between low self-control and family risk), supporting the theories that propose the existence of such interactions. In relation to the third substantive problem (i.e., the interaction between neighbourhood risk and family risk), the interaction coefficient was significant in the regression and orthogonalizing approaches, whereas in the mean-centered approach the coefficient was marginally significant. Even though the magnitude of the interaction coefficient obtained in the mean-centered approach was consistent with the magnitude obtained in the other two approaches, it did not reach statistical significance in this instance[19]. The mean-centered approach was consistently more conservative in terms of testing the statistical significance of the coefficients in the study (that is, it yielded the smallest values of the statistical significance tests for both the interaction and the first-order coefficients). This trend resulted in a different conclusion in regard to the interaction between neighbourhood risk and family risk, for which the test values were closer to the cutoff point for statistical significance in all approaches.

[16] The current study concerns a population of adolescents in school (school population).

[17] Models with variables defined by using the ON options of the MODEL command in conjunction with the | symbol, as is the case in models involving interactions among continuous latent variables.

[18] The current project was conducted from the perspective of an applied researcher interested in utilizing the approaches as presented in the literature, in assessing substantive problems of interest. The modifications to the model that were attempted were described in the results section; more technical modifications remain to be explored further, as well as alternative models for count variables.
In terms of the direction of the interactive relations, there was agreement among the approaches in all three assessed problems, as the same sign of the interaction coefficients was obtained. Further, there was agreement among the approaches in regard to the statistical significance and the sign of the first-order coefficients (the presence of the first-order effects and their direction) in all three problems assessed. Similarity was also observed in relation to the standard errors: in most of the analyses, their size was comparable across the approaches for the coefficients of both the interaction terms and the first-order terms. In the few instances in which a greater difference in standard errors was obtained, the errors from regression analysis were smaller than those from the other two approaches. In general, smaller standard errors are of concern[20], as they may lead to Type I error inflation, i.e., to an increased probability of concluding that an effect exists when in fact there is no such effect. Inflation of Type I error is a documented shortcoming of multiple regression with variables that contain measurement error (Brunner & Austin, 2009; Shear & Zumbo, 2013). Conversely, in the context of SEM approaches, it has been suggested that the correction for measurement error may increase standard errors (and may decrease statistical power; Ledgerwood & Shrout, 2011; Cham et al., 2013).

[19] p = .08

[20] That is, if the errors are underestimated, i.e., smaller than they should be (which can be assessed in simulation studies).

In regard to the magnitude of the obtained coefficients, the coefficients of the interaction terms were of comparable size across the three approaches, whereas there was greater variability in the obtained coefficients of the first-order terms. In some analyses, the magnitude of the first-order coefficients differed substantially across the approaches (e.g.,
standardized regression coefficient for FR in the NRxFR analysis was .40 in the orthogonalizing approach and .20 in regression analysis). In general, the magnitude of both the interaction and the first-order coefficients is relevant to researchers because these coefficients are used in the post hoc analyses that should be performed in the presence of a significant interaction. Specifically, the coefficients are used in the calculation of simple regression lines that represent the relation between one predictor and the outcome at different values of the other predictor (i.e., conditional relations)[21]. Hence, as the obtained magnitude of the coefficients is the basis for conclusions about the strength of the conditional relations, differences in their magnitude across approaches (as obtained in the current study) are problematic: such differences would lead researchers to different answers about the relations of interest, depending on the approach they used. Therefore, the obtained differences need further discussion.

[21] Both the interaction and the first-order term coefficients are needed for the calculation of simple regression lines.

In order to compare the relations among the variables in a model, or across models, researchers use standardized regression coefficients. Such coefficients facilitate interpretation of the obtained results. In research utilizing SEM, standardized coefficients are commonly reported (e.g., the completely standardized solution). However, in analyses that contain interaction terms, the magnitude of the standardized regression coefficients depends on the scaling of the variables. That is, different standardized coefficients will be obtained from analyses conducted with differently scaled variables (e.g., centered vs. non-centered variables), as demonstrated in Aiken and West (1991).
Since the scales of the measured variables differed across the approaches utilized in the current study, the ‘appropriate standardized coefficients’ were calculated in order to address this issue, following the procedure described in Aiken and West (1991) and Friedrich (1982) in regard to regression analysis, and in Wen et al. (2010) in regard to SEM approaches. The ‘appropriate standardized coefficients’ were proposed to be scale invariant; hence, the different scales of the indicators across the approaches should not have a bearing on the differences in the coefficients obtained in the current project.

Some of the differences in the coefficients between the SEM approaches utilized in this study were due to the use of different strategies for forming the indicators of the latent interaction. That is, coefficients of the same size were not expected across the approaches, because not exactly the same indicators were used to define the latent interaction in the mean-centered and orthogonalizing approaches. In accordance with the original proposals by the authors of the two approaches (Marsh et al., 2004; Little et al., 2006), the ‘matched-pairs strategy’ (which utilizes only some of the indicators) was used in the mean-centered approach, whereas the ‘all possible product terms’ strategy (i.e., all indicators) was used in the orthogonalizing approach[22]. It has been noted in the literature (Wen et al., 2010) that the use of different sets of indicators across different approaches causes small differences in the obtained coefficients, yet some of the differences obtained in this study are substantial. The differences in the magnitude of the standardized coefficients affect conclusions and interpretation of the relations of interest.
As an illustration, in supplementary analyses, the simple regression lines were plotted based on the obtained standardized coefficients (at values 0, -1, and 1 of one predictor[23]) for the substantive problem in which the difference in one of the obtained first-order coefficients was considerable (i.e., the LSCxFR problem). The graphs are presented in Appendix C. Even though the overall shape of the graphs is similar (reflecting the presence of an interaction between the variables in all approaches), the graphs clearly demonstrate a difference in the slopes of the lines across the approaches (different ‘steepness’ of the lines). In other words, these graphs reflect the fact that, based on the obtained results, the researchers’ conclusions in regard to the presence and direction of the interaction in this specific problem would be the same across the approaches; however, the conclusions about more subtle issues, i.e., the strength of the conditional relations, would differ across the approaches.

[22] In some of the studies that compared different approaches to assessing interaction, the authors chose the same strategy for defining the latent interaction (matched-pair or all product terms strategy) in the different approaches. In this project, the two approaches were implemented as originally proposed by their authors (Marsh et al., 2004; Little et al., 2006), as this is how they would likely be used by applied researchers in their work.

Previous simulation studies, in the conditions assessed in those studies, found overall similarity in the size of the regression coefficients across the leading SEM approaches[24] (Little et al., 2006; Marsh et al., 2004; Kelava et al., 2008). Some recent research, however, highlighted differences among the approaches (Cham et al., 2013; Lin, Wen, Marsh, & Lin, 2010). For example, Lin et al. showed that the mean-centered and orthogonalizing approaches produced the same first-order and interaction effects only when all indicators of the predictor variables were normally distributed.
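The point about conditional relations in the LSCxFR problem can be made concrete with a short sketch that computes simple slopes from the standardized coefficients reported in Table 5. Treating LSC as the moderator, and evaluating it at -1, 0, and 1, is an assumption made here for illustration; the plots in Appendix C may have been constructed differently, and the function names are hypothetical.

```python
# Standardized coefficients transcribed from Table 5 (LSCxFR problem).
TABLE_5 = {
    "Regression":      {"lsc": 0.36, "fr": 0.17, "lsc_x_fr": 0.12},
    "Mean-centered":   {"lsc": 0.35, "fr": 0.25, "lsc_x_fr": 0.12},
    "Orthogonalizing": {"lsc": 0.31, "fr": 0.33, "lsc_x_fr": 0.19},
}

def simple_slope(first_order, interaction, moderator_value):
    """Slope of one predictor at a fixed value of the other: b_first + b_interaction * value."""
    return first_order + interaction * moderator_value

def fr_slopes(coefs, lsc_values=(-1.0, 0.0, 1.0)):
    """Conditional slopes of FR at 1 SD below the mean, the mean, and 1 SD above the mean of LSC."""
    return [round(simple_slope(coefs["fr"], coefs["lsc_x_fr"], v), 2) for v in lsc_values]

slopes_by_approach = {name: fr_slopes(coefs) for name, coefs in TABLE_5.items()}

for name, slopes in slopes_by_approach.items():
    print(f"{name:<16} FR slope at LSC = -1, 0, +1: {slopes}")
```

Under these assumptions, the conditional slope of FR at high LSC is .29 in regression, .37 in the mean-centered approach, and .52 in the orthogonalizing approach, which is the difference in ‘steepness’ described above.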
The differences among the SEM approaches in regard to the magnitude of the coefficients are of high relevance to an applied researcher, and they need further research attention.

[23] I.e., the mean of the predictor, 1 SD below the mean, and 1 SD above the mean.

[24] Including SEM approaches not assessed in this study; studies that examined the same approaches that were utilized in this study are rare.

In terms of OLS regression, bias in regression coefficients when predictors are measured with error is a limitation of this approach that has been addressed extensively in the literature (Cohen et al., 2003). Measurement error in predictors commonly leads to attenuation of the coefficients. However, the direction and size of the bias depend on many factors, and they are not easy to predict. In regression that includes interaction terms, bias in the coefficients for both first-order and interaction terms is expected. In the current study, the interaction coefficients from OLS regression analysis were the smallest in two out of three analyses. However, they were of comparable size to those from the SEM approaches. In regard to the size of the first-order coefficients, about half of the first-order coefficients were the smallest in the regression approach; in some cases the difference was substantial (the greatest differences were recorded in relation to the variable with the lowest reliability). Still, unequivocal evidence of attenuation (or inflation) of the coefficients in regression analysis, compared to those in the SEM approaches, was not found in this study. Substantially smaller estimates of R² (i.e., the variance in the dependent variable explained by the model) were recorded in the regression approach compared to the other two approaches, in all three assessed problems, pointing to bias in the R² estimates.
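The attenuation mechanism referred to above can be sketched with a small simulation. This is a deliberately simple, hypothetical setup (uncorrelated normal predictors, independent measurement errors, a reliability of .70 for both predictors, and arbitrary population coefficients), not a re-analysis of the IYS data; with correlated predictors or correlated errors the direction of the bias is less predictable, as noted in the text.

```python
import numpy as np

def fit_ols(y, x, z):
    """OLS with an interaction term; returns slopes (x, z, x*z) and R-squared."""
    design = np.column_stack([np.ones(len(y)), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r_squared = 1.0 - residuals.var() / y.var()
    return coef[1:], r_squared

def add_measurement_error(true_scores, reliability, rng):
    """Add random error so that var(true) / var(observed) is about `reliability`."""
    error_sd = np.sqrt(true_scores.var() * (1.0 - reliability) / reliability)
    return true_scores + rng.normal(0.0, error_sd, len(true_scores))

rng = np.random.default_rng(2014)
n = 50_000

# Error-free ("true") predictors and an outcome with a known interaction effect.
x_true = rng.normal(size=n)
z_true = rng.normal(size=n)
y = 0.4 * x_true + 0.3 * z_true + 0.2 * x_true * z_true + rng.normal(size=n)

# Observed predictors contaminated by measurement error (assumed reliability .70).
x_obs = add_measurement_error(x_true, 0.70, rng)
z_obs = add_measurement_error(z_true, 0.70, rng)

coef_true, r2_true = fit_ols(y, x_true, z_true)
coef_obs, r2_obs = fit_ols(y, x_obs, z_obs)

print("error-free predictors:", np.round(coef_true, 3), "R2 =", round(r2_true, 3))
print("fallible predictors:  ", np.round(coef_obs, 3), "R2 =", round(r2_obs, 3))
```

In this setup the first-order coefficients shrink roughly in proportion to the reliability of the predictor, the interaction coefficient shrinks roughly in proportion to the product of the two reliabilities, and R² is smaller when the fallible predictors are used.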
The results obtained in the current study should be considered in the context of the limitations of the project in which the data were collected (IYS). Overall, the limitations are related to the measures utilized in IYS and to the limited evidence about the psychometric properties of those measures. Further, all IYS measures are based on participants’ self-report, which likely led to some method effects. The data about the predictors and the outcome were collected at the same time. The characteristics of the utilized sample and the generalizability of the findings are also an issue. Such limitations, however, have less impact on the conclusions about the methodological issues addressed in the study.

5.1 Conclusions and Recommendations

The current study yielded a number of highlights and suggestions that are of relevance to researchers focused on methodological issues as well as to researchers concerned with applied aspects of assessing interactions. Based on the examination of the results of the chosen approaches, the current study found considerable consistency among the three approaches (regression, mean-centered, and orthogonalizing approach) in answering the main research question, i.e., whether there is an interaction between the selected variables. Also, there was agreement among the approaches about the direction of the interactive relations, as well as about the presence and direction of the first-order effects. In regard to more detailed analyses and the more subtle issue of the strength of conditional relations, the current study indicated that the differences among the approaches may be more substantial. In the context of the existing research on assessing interactions, these results suggest a need for further investigation of the differences in the magnitude of the coefficients across approaches, especially in conditions commonly encountered in applied research (when violations of the assumptions are present).
The coefficients of the first-order terms, in particular, deserve more research attention (as the main focus in the literature is often on the interaction coefficient). Both types of coefficients, i.e., the interaction-term and the first-order-term coefficients, are of high relevance to applied researchers, whose decisions in regard to the substantive problems of interest depend on the magnitude of these coefficients. In relation to SEM approaches, it would be beneficial if all ‘product indicator’ SEM approaches adopted the same strategy for forming the indicators of the latent interaction (as argued in Marsh et al., 2007). This would facilitate researchers’ interpretation of the results and the comparison of research findings based on different approaches. Further, the issue of ‘scale invariance’ in regard to the standardized regression coefficients in SEM approaches to assessing interaction is of high relevance, and it requires broader communication to applied researchers[25]. Which standardized coefficients should be reported in research (i.e., what the ‘appropriate standardized coefficients’ in SEM approaches are) is an important question, as the specific coefficients that researchers report are the basis for their conclusions in regard to the substantive problems (e.g., the strength of the conditional relations of interest). Additionally, the comparability of results among studies is affected by this issue. Even though highly important, the problem of comparability across studies and/or approaches concerned with assessing interactions has been somewhat neglected in the literature. Addressing this issue further would be beneficial for the field.

In relation to the specific individual approaches, based on the overall experience in their utilization and on the obtained results, several points should be noted. The mean-centered approach was relatively easy to utilize. It was more conservative[26] in terms of statistical significance testing, compared to the other two approaches.
Although in most cases this difference may not be of practical relevance, occasionally it could result in a different decision in testing statistical significance among these approaches. The mean-centered approach has received considerable support in the literature so far.

In relation to the regression approach, there was no strong, unambiguous evidence of attenuation (or inflation) of the interaction coefficients and the first-order coefficients in this study. However, the shortcomings of this approach related to bias in the coefficients and standard errors when predictors are measured with error have been well documented in the methodological literature.

[25] Commonly, standardized coefficients from the outputs of statistical software (which are not scale invariant) have been reported in research.

[26] Alternatively, it could be said that the other two approaches are too liberal. The wording does not imply that any of the approaches is ‘right’; such conclusions cannot be made based on the current study.

In regard to the orthogonalizing approach, complexity of application has been suggested in the literature as a challenge related to this approach. Based on the experience in this study, the orthogonalizing approach was somewhat more complex to employ than the other two approaches, as it required the greatest number of product terms, the specification of correlated measurement errors, and a two-step calculation procedure. However, with general knowledge of SEM and of the utilized software, the differences among the approaches in this regard were not substantial. Of concern in relation to this approach are recent suggestions about the bias present when variables are non-normal (Lin et al., 2010), which need further investigation.
Finally, in regard to the LMS approach, which has been described in the literature as one of the most promising approaches to assessing latent interactions, the analysis did not converge (likely due to complexities inherent to the utilized data), and the results of this approach were not examined alongside those of the other three approaches. Another distribution analytic approach, QLM, may be considered as an alternative to LMS for data with certain characteristics. For example, QLM has been found to be more robust to violations of normality of the indicators and more suitable for more complex models in which the computational burden is increased (Dimitruk et al., 2007; Kelava et al., 2011)[27].

[27] In regard to the QLM approach, the software needed for the analyses is not widely/commercially available, which may be problematic (an attempt to obtain the software for this project was not successful).

In conclusion, there are several approaches currently available to applied researchers who want to assess interactions among variables within their substantive research problems. An important question for researchers is whether the choice of a specific approach matters, i.e., whether it has a substantial influence on the answers in regard to the problems of interest. According to the results of the current study, in regard to the assessed approaches, the choice of technique matters in some aspects relevant to applied researchers. Hence, further examination of the approaches that would lead to an agreement about the optimal one (or the optimal one in certain conditions) is needed. In addition to the approaches assessed in this study, others are available, and some recent developments have been characterized as promising (e.g., Mooijaart & Bentler, 2010; Lin et al., 2010). For example, the double-mean-centering approach of Lin et al. was suggested to combine the strengths and address the limitations of the mean-centered and orthogonalizing approaches, but it has not been further evaluated yet.
Overall, there have been important developments and improvements in assessing interactions recently. The development of approaches based on SEM introduced the general benefits of SEM to this area, resulting in significant improvements in the field. However, new developments also introduced new challenges, and applied researchers interested in addressing problems that involve interactions still face considerable difficulties. The current study pointed to some of the challenges, such as those of a practical nature (possible software nonconvergence for data with certain characteristics), or more substantive challenges related to the comparability of the coefficients reported in research (e.g., coefficients from commercial statistical software outputs versus coefficients standardized in the recommended way; or coefficients for the latent interaction based on different sets of indicators[28]). In regard to the latter challenge (i.e., the coefficients reported in research), the choice of coefficients influences the conclusions and practical decisions in regard to the substantive problems; additionally, it affects the comparability of results (i.e., the strength of relations) across studies. Ensuring comparability across studies and/or approaches is a goal of high importance in the field.

[28] The first problem applies to product indicator approaches in general, whereas the second issue is related to the type of product indicator approach utilized (i.e., which method of forming the indicators of the latent interaction is utilized).

The current study is based on real data, and in contrast to simulation studies, the truth (i.e., the population parameters) is unknown. That is, researchers in simulation studies evaluate the obtained results against predetermined criteria, whereas that is not the case in real data studies.
The latter typically encompass a more complex, unique set of conditions, and from such conditions they may bring new highlights and insights in regard to the relevant methodological topics. In the case of the current study, the practical consequences and practical importance of certain methodological issues emerged as important highlights; these are, on the other hand, somewhat understated in typical simulation studies. Therefore, real data studies may be a valuable addition and supplement to simulation studies.

5.2 Substantive Problems Addressed in the Study

In relation to the substantive problems that concern interactions between risk factors for adolescent delinquency (i.e., the interaction between low self-control and neighbourhood risk, the interaction between low self-control and family risk, and the interaction between family risk and neighbourhood risk), based on the results of the utilized approaches to assessing interactions, the current study suggests the existence of interactions between the variables of interest. Such findings are relevant for the research and theories in the field. Further results, i.e., post hoc analyses of the chosen problems, and a discussion of the relevant issues in the context of previous research and literature, will be presented in another report.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park: Sage Publications.
Algina, J., & Moulder, B. C. (2001). A note on estimating the Jöreskog–Yang model for latent variable interaction using LISREL 8.3. Structural Equation Modeling, 8, 40–52.
Arminger, G., & Muthén, B. (1998). A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis–Hastings algorithm. Psychometrika, 63(3), 271–300.
Barker, E. D., Trentacosta, C. J., & Salekin, R. T. (2011). Are impulsive adolescents differentially influenced by the good and bad of neighborhood and family?
Journal of Abnormal Psychology, 120(4), 981–986. doi:10.1037/a0022878
Bates, J., Pettit, G., Dodge, K., & Ridge, B. (1998). Interaction of temperamental resistance to control and restrictive parenting in the development of externalizing behavior. Developmental Psychology, 34(5), 982–995.
Belsky, J., & Pluess, M. (2009). Beyond diathesis-stress: Differential susceptibility to environmental influences. Psychological Bulletin, 135(6), 885–908.
Beyers, J. M., Bates, J. E., Pettit, G. S., & Dodge, K. A. (2003). Neighborhood structure, parenting processes, and the development of youths' externalizing behaviors: A multilevel analysis. American Journal of Community Psychology, 31(1–2), 35–53. doi:10.1023/A:1023018502759
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Paxton, P. (1998). Interactions of latent variables in structural equation models. Structural Equation Modeling, 5, 267–293.
Bronfenbrenner, U. (1979). The ecology of human development: Experiments by nature and design. Cambridge, MA: Harvard University Press.
Bronfenbrenner, U., & Ceci, S. J. (1994). Nature-nurture reconceptualized in developmental perspective: A bioecological model. Psychological Review, 101, 568–586.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford Press.
Brunner, L. J., & Austin, P. C. (2009). Inflation of Type I error rate in multiple regression when independent variables are measured with error. The Canadian Journal of Statistics, 37(1), 33–46. doi:10.1002/cjs.10004
Cham, H., West, S. G., Ma, Y., & Aiken, L. S. (2013). Estimating latent variable interactions with nonnormal observed data: A comparison of four approaches. Multivariate Behavioral Research, 47(6), 840–876. doi:10.1080/00273171.2012.732901
Chen, P., & Jacobson, K. C. (2013). Impulsivity moderates promotive environmental influences on adolescent delinquency: A comparison across family, school, and neighborhood contexts.
Journal of Abnormal Child Psychology, 41(7), 1133–1143. doi:10.1007/s10802-013-9754-8
Cleveland, H. (2003). Disadvantaged neighborhoods and adolescent aggression: Behavioral genetic evidence of contextual effects. Journal of Research on Adolescence, 13(2), 211–238. doi:10.1111/1532-7795.1302004
Coenders, G., Batista-Foguet, J. M., & Saris, W. E. (2008). Simple, efficient and distribution-free approach to interaction effects in complex structural equation models. Quality & Quantity, 42, 369–396.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioural sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Dimitruk, P., Schermelleh-Engel, K., Kelava, A., & Moosbrugger, H. (2007). Challenges in nonlinear structural equation modeling. Methodology, 3, 100–114.
Enzmann, D., Marshall, I. H., Killias, M., Junger-Tas, J., Steketee, M., & Gruszczynska, B. (2010). Self-reported youth delinquency in Europe and beyond: First results of the Second International Self-Report Delinquency Study in the context of police and victimization data. European Journal of Criminology, 7(2), 159–183.
Friedrich, R. J. (1982). In defense of multiplicative terms in multiple regression equations. American Journal of Political Science, 26, 797–833.
Gottfredson, M. R., & Hirschi, T. (1990). A general theory of crime. Stanford: Stanford University Press.
Granic, I., & Patterson, G. R. (2006). Toward a comprehensive model of antisocial development: A dynamic systems approach. Psychological Review, 113, 101–131.
Grasmick, H. G., Tittle, C. R., Bursik, R. J., & Arneklev, B. J. (1993). Testing the core empirical implications of Gottfredson and Hirschi’s general theory of crime. Journal of Research in Crime & Delinquency, 30, 5–29.
Hayduk, L. A.
(1987). Structural equation modeling with LISREL. Baltimore: Johns Hopkins Press.
Huber, P. J. (1981). Robust statistics. New York: John Wiley and Sons.
Jaccard, J., & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348–357.
Jones, S., & Lynam, D. R. (2009). In the eye of the impulsive beholder: The independent and interactive influences of impulsivity and perceived informal social control on offending behavior. Criminal Justice and Behavior, 36, 307–321.
Jöreskog, K. G., & Sörbom, D. (2013). LISREL 9.1 [computer software]. Lincolnwood, IL: Scientific Software International.
Jöreskog, K. G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 57–88). Mahwah, NJ: Lawrence Erlbaum Associates.
Junger-Tas, J., Terlouw, G. J., & Klein, M. W. (Eds.). (1994). Delinquent behaviour among young people in the western world: First results of the international self-report delinquency study. Amsterdam: Kugler.
Kelava, A., Moosbrugger, H., Dimitruk, P., & Schermelleh-Engel, K. (2008). Multicollinearity and missing constraints: A comparison of three approaches for the analysis of latent nonlinear effects. Methodology, 4, 51–66.
Kelava, A., Werner, C. S., Schermelleh-Engel, K., Moosbrugger, H., Zapf, D., Ma, Y., ... West, S. G. (2011). Advanced nonlinear latent variable modeling: Distribution analytic LMS and QML estimators of interaction and quadratic effects. Structural Equation Modeling: A Multidisciplinary Journal, 18(3), 465–491.
Kenny, D., & Judd, C. M. (1984). Estimating the nonlinear and interaction effects of latent variables. Psychological Bulletin, 96, 201–210.
Klein, A., & Moosbrugger, H. (2000).
Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474.
Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647–674.
Lahey, B. B., Van Hulle, C. A., D’Onofrio, B. M., Rodgers, J. L., & Waldman, I. D. (2008). Is parental knowledge of the adolescent offspring’s whereabouts and peer associations spuriously associated with offspring delinquency? Journal of Abnormal Child Psychology, 36, 807–823. doi:10.1007/s10802-008-9214-z
Lance, C. E. (1988). Residual centering, exploratory and confirmatory moderator analysis, and decomposition of effects in path models containing interactions. Applied Psychological Measurement, 12, 163–175.
Ledgerwood, A., & Shrout, P. E. (2011). The trade-off between accuracy and precision in latent variable models of mediation processes. Journal of Personality and Social Psychology, 101, 1174–1188.
Lee, S. Y. (2007). Structural equation modelling: A Bayesian approach. New York: Wiley.
Lengua, L. J., Wolchik, S. A., Sandler, I. N., & West, S. G. (2000). The additive and interactive effects of parenting and temperament in predicting problems of children of divorce. Journal of Clinical Child Psychology, 29(2), 232–244. doi:10.1207/S15374424jccp2902_9
Leve, L. D., Kim, H. K., & Pears, K. C. (2005). Childhood temperament and family environment as predictors of internalizing and externalizing trajectories from ages 5 to 17. Journal of Abnormal Child Psychology, 33(5), 505–520. doi:10.1007/s10802-005-6734-7
Lin, G.-C., Wen, Z., Marsh, H. W., & Lin, H.-S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17, 374–391.
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006).
On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13, 497–519.
Lynam, D. R., Caspi, A., Moffitt, T. E., Wikström, P., Loeber, R., & Novak, S. (2000). The interaction between impulsivity and neighborhood context on offending: The effects of impulsivity are stronger in poorer neighborhoods. Journal of Abnormal Psychology, 109(4), 563–574. doi:10.1037/0021-843X.109.4.563
Marsh, H. W., Wen, Z., & Hau, K.-T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Marsh, H. W., Wen, Z., Hau, K.-T., Little, T. D., Bovaird, J. A., & Widaman, K. F. (2007). Unconstrained structural equation models of latent interactions: Contrasting residual- and mean-centered approaches. Structural Equation Modeling, 14, 570–580.
Meier, M. H., Slutske, W. S., Arndt, S., & Cadoret, R. J. (2008). Impulsive and callous traits are more strongly associated with delinquent behavior in higher risk neighborhoods among boys and girls. Journal of Abnormal Psychology, 117(2), 377–385. doi:10.1037/0021-843X.117.2.377
Mooijaart & Bentler (2010). An alternative approach for nonlinear latent variable models. Structural Equation Modeling, 17, 357–373.
Moulder, B. C., & Algina, J. (2002). Comparison of methods for estimating and testing latent variable interactions. Structural Equation Modeling, 9, 1–19.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Neumann, A., Barker, E. D., Koot, H. M., & Maughan, B. (2010). The role of contextual risk, impulsivity, and parental knowledge in the development of adolescent antisocial behavior. Journal of Abnormal Psychology, 119, 534–545. doi:10.1037/a0019860
Ping, R. A. (1996). Latent variable interaction and quadratic effect estimation: A two step technique using structural equation analysis.
Psychological Bulletin, 119, 166–175.
Roche, K. M., Ensminger, M. E., & Cherlin, A. J. (2007). Variations in parenting and adolescent outcomes among African American and Latino families living in low-income, urban areas. Journal of Family Issues, 28(7), 882–909. doi:10.1177/0192513X07299617
Rutter, M., & Rutter, M. (1993). Developing minds: Challenge and continuity across the lifespan. London: Penguin.
Sampson, R. J., & Raudenbush, S. W. (1999). Systematic social observation of public spaces: A new look at disorder in urban neighborhoods. American Journal of Sociology, 105, 603–651.
Sampson, R. J., Raudenbush, S. W., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective efficacy. Science, 277(5328), 918–924. doi:10.1126/science.277.5328.
Schermelleh-Engel, K., Klein, A., & Moosbrugger, H. (1998). Estimating nonlinear effects using a latent moderated structural equations approach. In R. E. Schumacker & G. A. Marcoulides (Eds.), Interaction and nonlinear effects in structural equation modeling (pp. 203–238). Mahwah, NJ: Lawrence Erlbaum Associates.
Shear, B. R., & Zumbo, B. D. (2013). False positives in multiple regression: Unanticipated consequences of measurement error in the predictor variables. Educational and Psychological Measurement, 73, 733–756. doi:10.1177/0013164413487738
Stevens, J. P. (1984). Outliers and influential data points in regression analysis. Psychological Bulletin, 95, 334–344.
Streissguth, A. P., Bookstein, F. L., Barr, H. M., Sampson, P. D., O'Malley, K., & Young, J. (2004). Risk factors for adverse life outcomes in fetal alcohol syndrome and fetal alcohol effects. Journal of Developmental and Behavioral Pediatrics, 25(4), 228–238. doi:10.1097/00004703-200408000-00002
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Allyn and Bacon.
Vazsonyi, A. T., Cleveland, H., & Wiebe, R. P. (2006).
Does the effect of impulsivity on delinquency vary by level of neighborhood disadvantage? Criminal Justice and Behavior, 33(4), 511–541. doi:10.1177/0093854806287318
Wall, M. M., & Amemiya, Y. (2001). Generalized appended product indicator procedure for nonlinear structural equation analysis. Journal of Educational and Behavioral Statistics, 26, 1–29.
Wandersman, A., & Nation, M. (1998). Urban neighborhoods and mental health. American Psychologist, 53, 647–656.
Wen, Z., Marsh, H. W., & Hau, K.-T. (2010). Structural equation models of latent interactions: An appropriate standardized solution and its scale-free properties. Structural Equation Modeling, 17, 1–22.
Wikström, P. O., & Sampson, R. J. (2003). Social mechanisms of community influences in crime and pathways in criminality. In B. B. Lahey, T. E. Moffitt, & A. Caspi (Eds.), The causes of conduct disorder and serious juvenile delinquency. New York: Guilford Press.
Zimmerman, G. M. (2010). Impulsivity, offending, and the neighborhood: Investigating the person–context nexus. Journal of Quantitative Criminology, 26, 301–332.

Appendices

Appendix A – Constructs and Measures Utilized in the Study

Variables*                 Indicators of Latent Variables
Low self-control (LSC)     LSC1 – Impulsivity (3 IYS items)
                           LSC2 – Risk seeking (3 IYS items)
                           LSC3 – Volatile temper (3 IYS items)
Neighbourhood risk (NR)    NR1 – Neighbourhood crime (3 IYS items)
                           NR2 – Social efficacy (control and cohesion) (3 items)
                           NR3 – Physical disorder in neighbourhood (2 IYS items)
Family risk (FR)           FR1 – Joint activities (2 IYS items)
                           FR2 – Relationship with parents/caregivers (2 IYS items)
                           FR3 – Parental knowledge and monitoring (2 IYS items)
Delinquency (DELQ)         Engagement in various delinquent activities (13 IYS items)

*Conceptualized as latent variables in SEM approaches and as observed variables in multiple regression

Appendix B – Syntax for the Utilized SEM Approaches

1.
LISREL Syntax for Mean-centered Approach: Interaction between Low Self-control (LSC) and Neighbourhood Risk (NR)

DA NO=3114 NI=10 MA=CM
LA
DELQ LSC1 LSC2 LSC3 NR1 NR2 NR3 LSC1xNR1 LSC2xNR2 LSC3xNR3
MO NX=9 NY=1 NK=3 NE=1 LX=FU,FI LY=FU,FI PH=SY,FI TD=SY,FI GA=FU,FI KA=FR PS=FR TE=FI TY=FR
LK
LSC NR LSCxNR
LE
DELQ
FR LX 2 1 LX 3 1 LX 5 2 LX 6 2 LX 8 3 LX 9 3
!Scale
VA 1 LX 1 1 LX 4 2 LX 7 3
VA 1 LY 1 1
FR PH 1 1 PH 2 2 PH 3 3
FR PH 2 1 PH 3 1 PH 3 2
!TD is diagonal
FR TD 1 1 TD 2 2 TD 3 3 TD 4 4 TD 5 5 TD 6 6 TD 7 7 TD 8 8 TD 9 9
FR GA 1 1 GA 1 2 GA 1 3
!The latent means are 0
FI KA 1 KA 2
VA 0 KA 1 KA 2
!Constrain KA3
CO KA 3 = PH 2 1
PD
OU ML AD=OFF ND=3 SC RS

2. LISREL Syntax for Residual Centering Approach: Interaction between Low Self-control (LSC) and Neighbourhood Risk (NR)

DA NO=3114 NI=16 MA=CM
LA
DELQ LSC1 LSC2 LSC3 NR1 NR2 NR3
rLSC1xN1 rLSC1xN2 rLSC1xN3 rLSC2xN1 rLSC2xN2 rLSC2xN3 rLSC3xN1 rLSC3xN2 rLSC3xN3
MO NX=15 NY=1 NK=3 NE=1 LX=FU,FI LY=FU,FI PH=SY,FI TD=SY,FI GA=FU,FI PS=SY,FR TE=SY,FI
LK
LSC NR LSCxNR
LE
DELQ
FR LX 2 1 LX 3 1 LX 5 2 LX 6 2 LX 8 3 LX 9 3 LX 10 3 LX 11 3 LX 12 3 LX 13 3 LX 14 3 LX 15 3
!Scale
VA 1 LX 1 1 LX 4 2 LX 7 3
VA 1 LY 1 1
FR TD 1 1 TD 2 2 TD 3 3 TD 4 4 TD 5 5 TD 6 6 TD 7 7 TD 8 8 TD 9 9 TD 10 10 TD 11 11 TD 12 12 TD 13 13 TD 14 14 TD 15 15
FR TD 7 8 TD 7 9 TD 8 9 TD 7 10 TD 7 13 TD 8 11 TD 8 14 TD 9 12 TD 9 15 TD 10 11 TD 10 12 TD 11 12 TD 10 13 TD 13 14 TD 13 15 TD 14 15
FR PH 1 1 PH 2 2 PH 3 3
FR PH 2 1
FR GA 1 1 GA 1 2 GA 1 3
PD
OU ML AD=OFF ND=3 SC RS

3. Mplus Syntax for LMS Approach: Interaction between Low Self-control (LSC) and Neighbourhood Risk (NR)

A. Dependent variable (delinquency) measured by one indicator:

VARIABLE: NAMES ARE LSC1 LSC2 LSC3 NR1 NR2 NR3 DELQ;
ANALYSIS: TYPE = RANDOM;
ALGORITHM = INTEGRATION;
ITERATIONS = 1000;
MODEL: LSC BY LSC1 LSC2 LSC3;
NR BY NR1 NR2 NR3;
DL BY DELQ;
LSCxNR | LSC XWITH NR;
DELQ@0;
DL ON LSC NR LSCxNR;
OUTPUT: tech1; tech8;

B.
Dependent variable measured by three indicators:

VARIABLE: NAMES ARE LSC1 LSC2 LSC3 NR1 NR2 NR3 DELQ1 DELQ2 DELQ3;
ANALYSIS: TYPE = RANDOM;
ALGORITHM = INTEGRATION;
ITERATIONS = 1000;
MODEL: LSC BY LSC1 LSC2 LSC3;
NR BY NR1 NR2 NR3;
DL BY DELQ1 DELQ2 DELQ3;
LSCxNR | LSC XWITH NR;
DL ON LSC NR LSCxNR;
OUTPUT: tech1; tech8;

Appendix C – Supplementary Analyses
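As an illustrative aside to the Appendix B syntax (it is not part of the original analyses), the product indicators named in the two LISREL setups could be prepared roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the data are simulated rather than taken from the IYS, matched-pair products of mean-centered indicators are assumed for the mean-centered approach (three indicators, as in LSC1xNR1 to LSC3xNR3), and all nine cross-products, each residualized on every first-order indicator, are assumed for the residual centering approach (as in rLSC1xN1 to rLSC3xN3).

```python
import numpy as np


def mean_centered_products(x, z):
    """Matched-pair products of mean-centered indicators (hypothetical helper).

    Column j of the result is (x_j - mean(x_j)) * (z_j - mean(z_j)).
    """
    xc = x - x.mean(axis=0)
    zc = z - z.mean(axis=0)
    return xc * zc


def residual_centered_products(x, z):
    """All cross-products, each residualized on all first-order indicators.

    Assumed recipe: regress every raw product x_i * z_j on an intercept plus
    all indicators of both constructs and keep the residuals.
    """
    n = x.shape[0]
    first_order = np.column_stack([np.ones(n), x, z])
    products = np.column_stack(
        [x[:, i] * z[:, j] for i in range(x.shape[1]) for j in range(z.shape[1])]
    )
    # Ordinary least squares for all product columns at once.
    coef, *_ = np.linalg.lstsq(first_order, products, rcond=None)
    return products - first_order @ coef


# Simulated stand-ins for the three LSC and three NR indicators (not IYS data).
rng = np.random.default_rng(2014)
lsc = rng.normal(size=(500, 3))
nr = 0.3 * lsc + rng.normal(size=(500, 3))

mc = mean_centered_products(lsc, nr)      # 3 columns, one per matched pair
rc = residual_centered_products(lsc, nr)  # 9 columns, ordered 1x1, 1x2, ..., 3x3
```

The mean of a product of two centered variables equals their covariance (computed with divisor n), so the mean-centered product indicators generally do not have a zero mean; this is in line with the constraint CO KA 3 = PH 2 1 in the first LISREL setup. The residual-centered indicators, by construction, are uncorrelated with every first-order indicator in the sample.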
Item Metadata
Title | Comparison of the approaches to assessing statistical interactions : an application to risk factors for adolescent problem behaviour |
Creator |
Rajlic, Gordana |
Publisher | University of British Columbia |
Date Issued | 2014 |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2014-07-25 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0167566 |
URI | http://hdl.handle.net/2429/48497 |
Degree |
Master of Arts - MA |
Program |
Measurement, Evaluation and Research Methodology |
Affiliation |
Education, Faculty of Educational and Counselling Psychology, and Special Education (ECPS), Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2014-09 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |