Type I Error Rates and Power of Robust Chi-Square Difference Tests in Investigations of Measurement Invariance

by

Jordan Brace

B.Sc. (Hons.), Memorial University of Newfoundland, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Psychology)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August, 2015

© Jordan Brace, 2015

Abstract

A Monte Carlo simulation study was conducted to investigate Type I error rates and power of several corrections for non-normality to the normal theory chi-square difference test in the context of evaluating measurement invariance via Structural Equation Modeling (SEM). Studied statistics include: 1) the uncorrected difference test, DML; 2) Satorra's (2000) original computationally intensive correction, DS0; 3) Satorra and Bentler's (2001) simplified correction, DSB1; 4) Satorra and Bentler's (2010) strictly positive correction, DSB10; and 5) a hybrid procedure, DSBH (Asparouhov & Muthén, 2010), which is equal to DSB1 when DSB1 is positive, and to DSB10 when DSB1 is negative. Multiple-group data were generated from confirmatory factor analytic models invariant on some but not all parameters. A series of six nested invariance models was fit to each generated dataset. Population parameter values had little influence on the relative performance of the scaled statistics, while the level of invariance being tested did. DS0 was found to over-reject in many Type I error conditions, and it is suspected that high observed rejection rates in power conditions are due to a general positive bias. DSB1 generally performed well in Type I error conditions, but severely under-rejected in power conditions. DSB10 performed reasonably well and consistently in both Type I error and power conditions. We recommend that researchers use the strictly positive corrected difference test, DSB10, to evaluate measurement invariance when data are not normally distributed.
Preface

All work presented was conducted in the Structural Equation Modelling lab at the University of British Columbia, Point Grey campus. At the time of writing, this study is under review for publication in Psychological Methods. I was primarily responsible for data generation, analysis, programming, and manuscript preparation. This project is ultimately an extension of a project conducted by my advisor, Dr. Victoria Savalei, and her previous student, Jenny Chuang, who contributed the majority of the experimental design. Dr. Savalei also contributed to the revision of the manuscript, and contributed the majority of the discussion of Satorra's (2000) scaled difference test in section 1.3.2. Dr. Savalei served as a mediator in communication between myself and Dr. Yves Rosseel, developer of the structural equation modelling software package lavaan (Rosseel, 2012), which has received many bug reports and updates by virtue of this project. All analyzed data were generated via computer simulation; thus, ethical approval was not necessary before undertaking this project.
Table of Contents

Abstract
Preface
Table of Contents
List of Tables
Acknowledgements
Chapter 1: Introduction
  1.1 Structural Equation Modelling
    1.1.2 ML Estimation
    1.1.3 Assumptions of ML Estimation and Robust Corrections
    1.1.4 Nested Models and Chi-Square Difference Tests
    1.1.5 Robust Chi-Square Difference Tests
  1.2 Measurement Invariance
    1.2.2 Evaluating Measurement Invariance Via Multiple Group SEM
    1.2.3 Invariance Models
  1.3 Current Study
    1.3.2 Definitions of Studied Statistics
Chapter 2: Methodology
  2.1 Overview
  2.2 Study Design
  2.3 Analysis
  2.4 Computation of Chi-Square Difference Tests
Chapter 3: Results
  3.1 Convergence and Computational Failures
  3.2 Rejection Rates for Overall Tests of Model Fit
  3.3 Type I Error Rates for Difference Tests
    3.3.1 Weak Invariance (D1)
    3.3.2 Strong Invariance (D2)
    3.3.3 Strict Invariance (D3)
  3.4 Power of Difference Tests
    3.4.1 Beyond Strict Invariance (D4)
    3.4.2 Full Mean and Covariance Structure Invariance
Chapter 4: Discussion
  4.1 Summary of Current Study
  4.2 Performance of Normal Theory Chi-Square Difference Test
  4.3 Performance of Scaled Chi-Square Difference Tests
    4.3.1 Type I Error Rates
    4.3.2 Power with A Properly Specified Baseline
    4.3.3 Power with A Misspecified Baseline
  4.4 Satorra and Bentler's Scaled Test of Model Fit With Multiple Group Data
  4.5 Recommendations, Limitations, and Future Research
References
Appendix A: Generating Data From A Contaminated Normal Distribution
Appendix B: Sample Runs of D, DS0, DSB1, and DSB10
Appendix C: Tables
  Primary Summary Tables
  Supplementary Tables
    Rejection Rates For Tests of Model Fit
    Rejection Rates For Difference Tests

List of Tables

Table 1: Parameter Values Used in Data Generation
Table 2: Rejection Rates for the Overall Tests of Model Fit in Selected Conditions with Even Sample Sizes When Data Are Not Normally Distributed and When 2 2B
Table 3: Type I Error Rates for Difference Tests When Data Are Normally Distributed, Averaged Across Loading, FCOV, and Sample Evenness Conditions
Table 4: Type I Error Rates for the Weak Invariance Difference Test (the Test of Equality of Factor Loadings, df = 6) When 2 2B and Sample Sizes Are Even Across Groups
Table 5: Type I Error Rates for the Strong Invariance Difference Test (the Test of Equality of Indicator Intercepts, df = 6) When 2 2B and Sample Sizes Are Even Across Groups
Table 6: Type I Error Rates for the Strict Invariance Difference Test (the Test of Equality of Residual Variances, df = 8) When 2 2B and Sample Sizes Are Even Across Groups
Table 7: Power of Difference Tests When Data Are Normally Distributed, Averaged Across Loading, FCOV, and Sample Evenness Conditions
Table 8: Power for the Beyond Strict Invariance Difference Test When 2 2B and Sample Sizes Are Even Across Groups
Table 9: Power for the Full Mean and Covariance Structure Invariance Difference Test When 2 2B and Sample Sizes Are Even Across Groups
Table 10: Rejection Rates for D5,SB1 When Interpreting Negative Chi-Squares as Model Retentions Versus Model Rejections When .5, 2 2B, and Sample Size Is Even Between Groups
Table 11: Rejection Rates for Tests of Model Fit for VM0,0 Conditions When Sample Sizes Are Even
Table 12: Rejection Rates for Tests of Model Fit for VM0,0 Conditions When Sample Sizes Are Uneven
Table 13: Rejection Rates for Tests of Model Fit for VM2,7 Conditions When Sample Sizes Are Even
Table 14: Rejection Rates for Tests of Model Fit for VM2,7 Conditions When Sample Sizes Are Uneven
Table 15: Rejection Rates for Tests of Model Fit for VM2,15 Conditions When Sample Sizes Are Even
Table 16: Rejection Rates for Tests of Model Fit for VM2,15 Conditions When Sample Sizes Are Uneven
Table 17: Rejection Rates for Tests of Model Fit for VM0,0;2,7 Conditions When Sample Sizes Are Even
Table 18: Rejection Rates for Tests of Model Fit for VM0,0;2,7 Conditions When Sample Sizes Are Uneven
Table 19: Rejection Rates for Tests of Model Fit for VM2,7;2,15 Conditions When Sample Sizes Are Even
Table 20: Rejection Rates for Tests of Model Fit for VM2,7;2,15 Conditions When Sample Sizes Are Uneven
Table 21: Rejection Rates for Tests of Model Fit for CN2,10 Conditions When Sample Sizes Are Even
Table 22: Rejection Rates for Tests of Model Fit for CN2,10 Conditions When Sample Sizes Are Uneven
Table 23: Rejection Rates for Tests of Model Fit for CN0,0;2,10 Conditions When Sample Sizes Are Even
Table 24: Rejection Rates for Tests of Model Fit for CN0,0;2,10 Conditions When Sample Sizes Are Uneven
Table 25: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM0,0
Table 26: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM0,0
Table 27: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM0,0
Table 28: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM0,0
Table 29: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM0,0
Table 30: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,7
Table 31: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,7
Table 32: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,7
Table 33: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,7
Table 34: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,7
Table 35: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,15
Table 36: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,15
Table 37: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,15
Table 38: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,15
Table 39: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,15
Table 40: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM0,0;2,7
Table 41: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM0,0;2,7
Table 42: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM0,0;2,7
Table 43: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM0,0;2,7
Table 44: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM0,0;2,7
Table 45: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,7;2,15
Table 46: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,7;2,15
Table 47: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,7;2,15
Table 48: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,7;2,15
Table 49: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,7;2,15
Table 50: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are CN2,10
Table 51: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are CN2,10
Table 52: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are CN2,10
Table 53: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are CN2,10
Table 54: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are CN2,10
Table 55: Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are CN0,0;2,10
Table 56: Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are CN0,0;2,10
Table 57: Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are CN0,0;2,10
Table 58: Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are CN0,0;2,10
Table 59: Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are CN0,0;2,10

Acknowledgements

I owe my
tremendous thanks to Dr. Victoria Savalei for her mentorship and knowledge of the fields of structural equation modelling and psychometrics. I would also like to thank Dr. Jeremy Biesanz and Dr. Rachel Fouladi for serving on my committee and contributing to my statistical education. I would like to thank Jenny Chuang and Cathy Zhang for their mentorship during the early weeks of my education in quantitative methods, and for their continued support to this day. Further, I would like to thank Dr. Yves Rosseel for multiple updates to the statistical software lavaan to meet the needs of this project. Finally, I would like to thank my family and friends for their continued love and support.

Chapter 1: Introduction

1.1 Structural Equation Modelling

Structural Equation Modeling (SEM) is a statistical technique popular in the social and behavioral sciences for modelling large, multivariate datasets. SEM is the union of two traditions in multivariate behavioral research: path analysis and factor analysis. Path analysis is an extension of multiple regression that permits the specification of models with multiple outcome variables, whereas multiple regression allows only a single outcome. The path analytic framework allows variables to function as both predictors and outcomes, permitting the specification of complex causal chains. Factor analysis is a technique that allows researchers to construct unobservable "latent variables" from observable variables hypothesized to correlate with unobservable psychological constructs of interest. Unobservable constructs of frequent interest to behavioral researchers include intelligence and personality traits. As a union of path analysis and factor analysis, SEM is thus a technique that permits complex causal chain modelling of both observable and latent variables.
SEMs contain a measurement model, which defines latent variables in terms of observable "indicators," and a structural model, which proposes a path model that can involve both observable and latent variables (Kline, 2010). In Confirmatory Factor Analysis (CFA; Jöreskog, 1970), the measurement model defines a set of latent variables in terms of observed variables, while the structural model identifies hypothesized correlations (or lack thereof) between these latent variables. In a typical CFA, the following general covariance structure model is fit to data (Brown, 2006):

\Sigma(\theta) = \Lambda \Phi \Lambda' + \Psi. \quad (1)

In (1), \Lambda is an m \times n matrix of factor loadings, where m is the number of observed variables of interest and n is the number of latent variables proposed by the researcher; \Phi is an n \times n latent covariance matrix; \Psi is an m \times m diagonal matrix of residual variances; and \Sigma(\theta) is the m \times m model-implied covariance matrix, an estimate of the population covariance matrix, \Sigma, given the proposed model and the sample covariance matrix, S. For a CFA with two latent variables and four observed variables defining each, the matrices in (1) would be defined as:

\Lambda = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & 0 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \\ 0 & \lambda_{72} \\ 0 & \lambda_{82} \end{pmatrix}, \quad
\Phi = \begin{pmatrix} \phi_1^2 & \phi_{12} \\ \phi_{12} & \phi_2^2 \end{pmatrix}, \quad
\Psi = \mathrm{diag}(\psi_1^2, \psi_2^2, \ldots, \psi_8^2),

where all \lambda_{ij} are factor loadings, which describe the relationship between observed variable i and latent variable j in the measurement model; \phi_j^2 are the variances of the latent variables; \phi_{12} is the covariance between the two latent variables; and \psi_i^2 is the residual variance in observed variable i not attributable to the latent variable it defines.
By equation (1), the overall covariance structure specified by this hypothetical model is (showing the lower triangle of the symmetric matrix):

\Sigma(\theta) =
\begin{pmatrix}
\lambda_{11}^2\phi_1^2+\psi_1^2 & & & & & & & \\
\lambda_{21}\lambda_{11}\phi_1^2 & \lambda_{21}^2\phi_1^2+\psi_2^2 & & & & & & \\
\lambda_{31}\lambda_{11}\phi_1^2 & \lambda_{31}\lambda_{21}\phi_1^2 & \lambda_{31}^2\phi_1^2+\psi_3^2 & & & & & \\
\lambda_{41}\lambda_{11}\phi_1^2 & \lambda_{41}\lambda_{21}\phi_1^2 & \lambda_{41}\lambda_{31}\phi_1^2 & \lambda_{41}^2\phi_1^2+\psi_4^2 & & & & \\
\lambda_{52}\lambda_{11}\phi_{12} & \lambda_{52}\lambda_{21}\phi_{12} & \lambda_{52}\lambda_{31}\phi_{12} & \lambda_{52}\lambda_{41}\phi_{12} & \lambda_{52}^2\phi_2^2+\psi_5^2 & & & \\
\lambda_{62}\lambda_{11}\phi_{12} & \lambda_{62}\lambda_{21}\phi_{12} & \lambda_{62}\lambda_{31}\phi_{12} & \lambda_{62}\lambda_{41}\phi_{12} & \lambda_{62}\lambda_{52}\phi_2^2 & \lambda_{62}^2\phi_2^2+\psi_6^2 & & \\
\lambda_{72}\lambda_{11}\phi_{12} & \lambda_{72}\lambda_{21}\phi_{12} & \lambda_{72}\lambda_{31}\phi_{12} & \lambda_{72}\lambda_{41}\phi_{12} & \lambda_{72}\lambda_{52}\phi_2^2 & \lambda_{72}\lambda_{62}\phi_2^2 & \lambda_{72}^2\phi_2^2+\psi_7^2 & \\
\lambda_{82}\lambda_{11}\phi_{12} & \lambda_{82}\lambda_{21}\phi_{12} & \lambda_{82}\lambda_{31}\phi_{12} & \lambda_{82}\lambda_{41}\phi_{12} & \lambda_{82}\lambda_{52}\phi_2^2 & \lambda_{82}\lambda_{62}\phi_2^2 & \lambda_{82}\lambda_{72}\phi_2^2 & \lambda_{82}^2\phi_2^2+\psi_8^2
\end{pmatrix},

where diagonal elements are structural equations representing model-implied variances of observed variables, and off-diagonal elements are structural equations representing model-implied covariances. Thus, the model-implied variance of the first observed variable, y_1, is equal to the product of the estimated variance of the first latent variable, f_1, and the squared estimated relationship between y_1 and f_1, plus its estimated residual variance. The estimated covariance between y_1 and y_5 is equal to the product of the estimated relationship between y_1 and f_1, the estimated latent covariance, and the estimated relationship between y_5 and f_2.

This model makes several assumptions about the population. For one, the variance of any particular indicator is only directly influenced by one latent construct. Any influence of f_2 on y_1, for example, is fully mediated by the influence of f_2 on f_1. Modifications to the model that would allow direct influence on an indicator by multiple factors are called cross-loadings, and involve replacing fixed-0 elements of \Lambda with additional estimated \lambda_{ij}. Furthermore, this model assumes that all residual variances are orthogonal, meaning the covariance between any particular pair of indicators is fully explained by the model. For example, the covariance between y_5 and y_1 is fully accounted for by the relationship between y_5 and f_2, the relationship between y_1 and f_1, and the covariance between f_2 and f_1.
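To make this structure concrete, the model-implied covariance matrix can be computed numerically. The following is a minimal sketch with made-up parameter values (these are illustrative only, not values from the thesis's simulation design):

```python
# Sketch: Sigma(theta) = Lambda Phi Lambda' + Psi for two factors and eight
# indicators. All parameter values below are hypothetical illustrations.
import numpy as np

Lambda = np.zeros((8, 2))
Lambda[:4, 0] = [0.8, 0.7, 0.6, 0.5]    # assumed loadings on factor 1
Lambda[4:, 1] = [0.8, 0.7, 0.6, 0.5]    # assumed loadings on factor 2
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])             # assumed factor variances/covariance
Psi = np.diag([0.5] * 8)                 # assumed residual variances
Sigma = Lambda @ Phi @ Lambda.T + Psi

# Implied variance of y1: lambda_11^2 * phi_1^2 + psi_1^2 = 0.64 + 0.5
print(round(float(Sigma[0, 0]), 2))   # → 1.14
# Implied covariance of y1 and y5: lambda_11 * phi_12 * lambda_52 = 0.8*0.4*0.8
print(round(float(Sigma[4, 0]), 3))   # → 0.256
```

The two printed values match the corresponding structural equations in the matrix above, which is a quick way to check that a hand-derived covariance structure is consistent with the matrix form in (1).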
Model modifications allowing the relationship between any particular pair of indicators to be influenced by common factors external to the model are called residual correlations, and involve replacing fixed-0 elements of \Psi with additional estimated \psi_{ii'}.

SEM estimation seeks to find a set of values for \theta, the vector containing all free elements of \Lambda, \Phi, and \Psi, for which the observed covariance matrix of one's data, S, would be most likely, given the model-implied estimate of the population covariance matrix, \Sigma(\theta). It does this by iteratively solving the system of equations S_{ij} = \sigma_{ij}(\theta) for every i, j \in \{1, \ldots, m\} to minimize deviations between observed and estimated covariances. The output of a CFA includes parameter estimates, standard errors for those estimates, and an overall test of model fit, T, which, assuming normally distributed data and a correctly specified model, follows a central chi-square distribution (Steiger, Shapiro, & Browne, 1985) with degrees of freedom df = m(m+1)/2 - q, where q is the length of \theta. A non-significant chi-square test of fit allows the researcher to retain the null hypothesis, H_0: \Sigma = \Sigma(\theta), and thus infer that their data are not unlikely given the proposed model.

1.1.2 ML Estimation

The most popular estimation method in SEM is normal theory maximum likelihood (ML) (Anderson & Gerbing, 1984). The ML fit function is given by

F_{ML}(S, \Sigma(\theta)) = \mathrm{tr}\{S\Sigma^{-1}(\theta)\} - \ln|S\Sigma^{-1}(\theta)| - m. \quad (2)

The ML procedure seeks to iteratively identify a solution for \theta such that F_{ML} is as close to 0 as possible, minimizing the discrepancy between S and \Sigma(\theta). In SEM software, a threshold is set such that iteration will stop if the difference in successive values of F_{ML} is sufficiently small. At this point, the model is said to have converged, and the final value of F_{ML} is denoted \hat{F}_{ML}, the minimum of the fit function.
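A minimal numeric sketch of the fit function in (2), using a hypothetical two-factor model with assumed parameter values, confirms the key property that the discrepancy is zero when S equals \Sigma(\theta) and positive otherwise:

```python
# Sketch of the ML fit function in equation (2); the model and all parameter
# values are hypothetical illustrations, not the thesis's simulation design.
import numpy as np

def f_ml(S, Sigma):
    """F_ML = tr(S Sigma^{-1}) - ln|S Sigma^{-1}| - m."""
    m = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    sign, logdet = np.linalg.slogdet(A)
    return np.trace(A) - logdet - m

# Build Sigma(theta) = Lambda Phi Lambda' + Psi for a 2-factor, 8-indicator CFA
Lambda = np.zeros((8, 2))
Lambda[:4, 0] = 0.7                       # assumed loadings on factor 1
Lambda[4:, 1] = 0.7                       # assumed loadings on factor 2
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])  # assumed latent covariance matrix
Psi = np.diag([0.51] * 8)                 # assumed residual variances
Sigma = Lambda @ Phi @ Lambda.T + Psi

# F_ML is 0 when the "sample" matrix equals Sigma(theta), and positive when
# the two matrices disagree.
S = Sigma + np.diag([0.1] * 8)            # perturbed "sample" matrix
print(abs(f_ml(Sigma, Sigma)) < 1e-10, f_ml(S, Sigma) > 0)
```

In real estimation, of course, \theta (not S) is adjusted until F_{ML} reaches its minimum; this sketch only evaluates the discrepancy for fixed inputs.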
Larger values of \hat{F}_{ML} indicate greater discrepancies between the sample covariance matrix, S, and the model-implied covariance matrix, \Sigma(\theta) (Amemiya & Anderson, 1990). The magnitude of this discrepancy reflects the likelihood of observing S if the null hypothesis, H_0: \Sigma = \Sigma(\theta), is true, where \Sigma is the population covariance matrix.

The overall test of model fit is given by T = (N - 1)\hat{F}_{ML}, where N is the total sample size. This statistic is often referred to as the ML chi-square, as it follows a chi-square distribution over repeated samples when data are normally distributed and the correct model is fit to data. T is tested against a chi-square distribution with df = m(m+1)/2 - q degrees of freedom. A statistically significant T reflects a \hat{F}_{ML} sufficiently large to permit the inference that S is unlikely if the null hypothesis is true, and thus that the alternative hypothesis, H_1: \Sigma \neq \Sigma(\theta), is likely to be true.

1.1.3 Assumptions of ML Estimation and Robust Corrections

Maximum likelihood estimation is said to be conditionally asymptotically efficient (Savalei, 2014), meaning parameter estimates, the ML chi-square, and associated estimates of sampling variability (i.e., standard errors) converge to what is true in the population as sample size increases, so long as certain assumptions are met. ML estimation makes two strong assumptions about one's proposed model and data: 1) that the proposed model is true in the population, and 2) that the data are multivariate normal. Violations of the first assumption will lead to inaccurate estimates of model parameters and an inflated estimate of the population chi-square. The ML chi-square test of model fit allows researchers to evaluate whether this assumption has been violated by testing the null hypothesis, H_0: \Sigma = \Sigma(\theta). Retention of this hypothesis allows researchers to conclude that their data are not unlikely given the proposed model, and thus to reject the notion that the first assumption has been violated.
The ability of researchers to accurately assess whether the first assumption of ML estimation has been met depends on their ability to also meet the second assumption of multivariate normality. When data are not multivariate normal, parameter estimates will still be accurate under a correctly specified model (Shapiro, 1985), but standard error estimates will be incorrect (generally too small), which results in a positively biased ML chi-square (Savalei, 2014) and thus in inflated Type I error rates, which can lead to the erroneous rejection of correct models, or erroneous model modifications. Given that violation of either assumption of ML estimation results in an inflated test statistic, under conditions of non-normality one cannot determine whether rejection of the chi-square test of fit is caused by an improperly specified model, multivariate non-normality, or some combination of the two.

Some researchers may read the above and infer that, because violation of assumptions generally results in a positively biased test statistic, rejection of the ML chi-square test of fit is ambiguous in what it reflects, but retention might reflect exceptionally good fit. However, in the case where data are platykurtic (negatively kurtotic, or less "peaked" than the normal distribution), standard error estimates will be positively biased, resulting in a negatively biased chi-square test of model fit and a Type I error rate below the nominal level. In practice, researchers will rarely, if ever, have multivariate normal data (Micceri, 1989), and thus inferences based on the ML chi-square are almost always suspect.

A variety of statistical methods that do not assume multivariate normality of data have been proposed. The first such method was the Asymptotic Distribution Free (ADF) estimator of Browne (1984), whose test statistic is asymptotically chi-square distributed regardless of the distribution of the data.
Instead of assuming normally distributed data, this method's computationally intensive estimation involves fourth-order sample moments, meaning kurtoses of the data are involved in the estimation of parameters, standard errors, and the test of model fit. The computational complexity of this method, however, has been shown to lead to inaccurate parameter estimates and standard errors unless samples are very large, with empirically determined sample size recommendations ranging from 1000 (Curran, West, & Finch, 1996; Muthén & Kaplan, 1985; Satorra & Bentler, 1994) to 5000 (Hu, Bentler, & Kano, 1992; Yuan & Bentler, 1999).

Because ML parameter estimates are asymptotically accurate under non-normality, it has been proposed that a means of circumventing the requirement of large samples for robust methods might be to use a normal theory estimation method, but with robust corrections made to the chi-square test of model fit (Satorra & Bentler, 1988; 1994). Satorra and Bentler proposed a correction to T_{ML} of the form

T_{SB} = T / c, \quad (3)

where c is a function of multivariate kurtoses, the matrix of residual covariances, S - \Sigma(\hat{\theta}), and the model degrees of freedom. The scaling correction, c, increases as a function of multivariate kurtosis, proportionally to the positive bias in T associated with increasing multivariate kurtosis, resulting in an asymptotically accurate T_{SB}. It should be noted that T_{SB} is not central chi-square distributed, but does follow a distribution with a mean equal to the model degrees of freedom. Monte Carlo research has found that T_{SB} provides a more accurate estimate of the population chi-square than the uncorrected ML or ADF estimators when data are not normally distributed (Curran, West, & Finch, 1996; Chou, Bentler, & Satorra, 1991; Fouladi, 2000; Hu, Bentler, & Kano, 1992; Satorra & Bentler, 1988), and is also less sensitive to model complexity and over-identification (Fouladi, 2000; Hu, Bentler, & Kano, 1992).
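The scaling in (3) is a simple division once c has been estimated. The sketch below illustrates it with made-up values for T, c, and the model degrees of freedom; the chi-square survival function used here is exact for even df, via the Poisson-sum identity:

```python
# Illustration of the scaling correction in equation (3) with made-up numbers:
# an inflated normal-theory statistic T is divided by the estimated scaling
# factor c, and the scaled statistic is referred to a chi-square distribution.
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variate with even df (Poisson-sum identity)."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

T, c, df = 54.0, 1.35, 30   # assumed: ML chi-square, scaling factor, model df
T_sb = T / c                # scaled statistic, equation (3)
print(round(T_sb, 1), round(chi2_sf(T_sb, df), 3))
```

With these made-up inputs the inflated T would be significant at the .05 level, while the scaled T_{SB} = 40.0 on 30 df would not; when c = 1 (normal data, correct model), the scaled and unscaled statistics coincide.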
Further, when the model is correct in the population and data are multivariate normal, c is asymptotically 1, with T_{SB} and T_{ML} producing similar estimates even in small samples (Curran, West, & Finch, 1996). This statistic is available in a variety of statistical software packages, and is generally the default robust estimator.

1.1.4 Nested Models and Chi-Square Difference Tests

It is often the case that psychological researchers wish not simply to evaluate a single model, but rather to compare a set of models. For example, a psychometrician may wish to test whether the parallel model, the tau-equivalent model, and the congeneric model fit their data equally well. When the set of all possible solutions to \Sigma(\theta) for a particular model, M_A, is a subset of that for another model, M_B, M_A is said to be nested within M_B (Bentler & Bonett, 1980). Thus, with respect to the classical test theory models, the parallel model is nested within the tau-equivalent model, and both are nested within the congeneric model. Sets of nested models are typically obtained by placing constraints on parameter estimates, such as by fixing values or introducing equality constraints. The parallel model is nested within the tau-equivalent model because residual variances are free to vary independently under tau equivalence, while they are constrained to equality under the parallel model. The set of possible solutions to \Sigma(\theta) for the tau-equivalent model includes all cases where \psi_1^2 = \psi_2^2 = \psi_3^2, as well as all cases where the residual variances are not all equal. The parallel model, however, only contains the former. Thus, the set of all possible solutions for the parallel model is a subset of that for the tau-equivalent model.

Following ML estimation, the difference in fit between nested models M_A and M_B can be evaluated using the ML chi-square difference test:

D_{ML} = T_A - T_B, \quad (4)

where T_A is the chi-square test of fit associated with the more constrained model, M_A, and T_B is the chi-square for the less constrained model, M_B.
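The difference test in (4) can be carried out with nothing more than the two fit statistics and their degreeses of freedom; df_D is the difference in model degrees of freedom. The sketch below uses made-up values:

```python
# Illustration of the ML chi-square difference test in equation (4):
# D_ML = T_A - T_B, referred to a chi-square with df_D = df_A - df_B.
# All numbers are made up; chi2_sf is exact for even df.
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variate with even df (Poisson-sum identity)."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

T_A, df_A = 61.3, 36   # assumed fit of the more constrained model M_A
T_B, df_B = 52.1, 30   # assumed fit of the less constrained model M_B
D_ml = T_A - T_B       # 9.2
df_D = df_A - df_B     # 6
p = chi2_sf(D_ml, df_D)
print(round(D_ml, 1), df_D, round(p, 3))
```

With these made-up inputs the difference is not significant, so the six extra constraints in M_A would be retained. Note that only the difference of the two statistics, not either statistic alone, is referred to the chi-square with df_D degrees of freedom.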
When data are normally distributed and both models are correct, D_{ML} asymptotically follows a chi-square distribution with degrees of freedom df_D = df_A - df_B, where df_A is the number of degrees of freedom associated with M_A, and df_B is the number of degrees of freedom associated with M_B (Steiger et al., 1985). If D_{ML} is found not to be statistically significant when tested against df_D degrees of freedom, researchers may infer that the additional model constraints imposed by M_A, above and beyond the structure imposed by M_B, are true in the population.

1.1.5 Robust Chi-Square Difference Tests

Unsurprisingly, as is the case with the ML chi-square test of fit, when data are not normally distributed, the normal theory chi-square difference test, D_{ML}, provides a positively biased estimate of the underlying distribution, resulting in over-rejection of the null hypothesis that the nested model, M_A, fits as well as the less restrictive model, M_B (Satorra, 2000; Satorra & Bentler, 2001). This is somewhat intuitive, as the difference between two biased statistics would only be unbiased if the magnitudes of the biases associated with each model cancelled each other out. This would never be expected, given that model complexity, or the number of parameters estimated, has also been shown to predict positive bias in the ML chi-square (Muthén & Kaplan, 1992).

One might intuit that when comparing models under multivariate non-normality, one should examine the difference in Satorra-Bentler scaled tests of model fit, such that D_{SB} = T_{SB,A} - T_{SB,B} (Byrne & Campbell, 1999). However, because the sampling distribution of the Satorra-Bentler scaled chi-square is not itself a chi-square distribution, the sampling distribution of D_{SB} will not have a mean of df_D. Thus, this is an inappropriate correction for non-normality to the chi-square difference test, and its inadequacy has been demonstrated empirically.
Satorra and Bentler (2001) demonstrated Type I error rates for this correction above 70% for multiple-group sample sizes ranging from 220 to 1700, while also frequently returning negative values. Unlike the Satorra-Bentler scaled chi-square, the ADF robust chi-square is asymptotically chi-square distributed under conditions of non-normality, and thus the difference in ADF chi-squares, $D_{ADF} = T_{ADF,A} - T_{ADF,B}$, will asymptotically follow a chi-square distribution with dfD degrees of freedom. However, as one might expect, DADF requires a large sample size (greater than 500) in order to return an accurate estimate of the underlying distribution (Satorra & Bentler, 2001). As was the case with developing a robust test of model fit, in order to develop a chi-square difference test that is asymptotically chi-square distributed under conditions of non-normality, while also performing acceptably in smaller samples, scaling corrections to the chi-square difference test have been developed. A series of scaled chi-square difference tests have been proposed by Albert Satorra and Peter Bentler of the form

$D_{Scaled} = D_{ML} / c_D$, (5)

where $c_D$ is the scaling correction, which increases as a function of increasing multivariate kurtosis, constraints imposed on the more restricted model, and the number of degrees of freedom associated with each model. The three statistics proposed by Satorra and Bentler differ only in their computation of $c_D$. The first such corrected statistic, DS0, was suggested by Satorra (2000). This correction has seen little use, as it is computationally difficult, requiring statistical information that is not easily obtainable from popular SEM software. A simplified correction, DSB1, which is currently the most popular correction, was developed by Satorra and Bentler (2001). This correction has received criticism on the grounds that it sometimes returns negative values, which are uninterpretable.
In response, Satorra and Bentler (2010) developed another, more computationally intensive calculation, DSB10, which is always positive but is more cumbersome to implement, as it requires an additional model run. Finally, Asparouhov and Muthén (2010) have suggested a hybrid procedure whereby DSB1 is used by default, and DSB10 is computed only in the case that DSB1 is negative; we will refer to this hybrid test as DSBH. These corrections are defined in detail in section 1.3.2.

1.2.1 Measurement Invariance

One of the most popular applications of chi-square difference testing is comparing measurement invariance models. Measurement invariance is the property of a psychometric instrument indicating that it performs the same way when applied in a variety of contexts, so long as these contexts are unrelated to the construct being measured (Millsap, 2011). In psychological measurement, content validity, concurrent validity, discriminant validity, and construct validity seek to answer the broad question: "does my instrument actually measure what it is supposed to?" Evaluation of measurement invariance extends this investigation to the more nuanced: "does my instrument always measure what it is supposed to, and in the same way?" The most common context in which measurement invariance is examined is between groups. Between-group invariance refers to the equivalence of instrument behaviour when applied to members of different populations, such as different races, genders, SES levels, or levels of management. Alternatively, researchers may examine invariance of a measure longitudinally, and test whether an instrument behaves equivalently within a population across a set of time points, such as before and after treatment. Past empirical research has demonstrated that the application of non-invariant instruments can lead to the detection of spurious differences as well as the attenuation of real effects (Steinmetz, 2013).
Further, a lack of measurement invariance can bias selection criteria towards certain population groups, which can lead to sampling error when creating groups for a study based on a median split, as well as have severe legal consequences in organizational settings if, for example, an instrument favors hiring members of a certain race (Millsap & Kwok, 2004). Thus, any inferences based on applications of psychometric instruments that have not been evaluated for measurement invariance across the contexts in which they are applied should be called into question. Much of the motivation for studies of measurement invariance in the psychological and methodological literature can be traced back to a 1975 article by Golembiewski, Billingsley, and Yeager. In their paper, the authors outline three types of change that occur in psychological research:

1) Alpha change: "Variation in the level of some existential state, given a constantly calibrated measuring instrument related to a constant conceptual domain" (Golembiewski et al., 1975).

2) Beta change: "Variation in the level of some existential state, complicated by the fact that some intervals of the measurement continuum associated with a constant conceptual domain have been recalibrated" (Golembiewski et al., 1975).

3) Gamma change: "A redefinition or reconceptualization of some domain, a major change in the perspective or frame of reference within which phenomena are perceived and classified, in what is taken to be relevant in some slice of reality" (Golembiewski et al., 1975).

Alpha change is, thus, the type of change psychological researchers hope to observe. A researcher evaluating an experimental clinical intervention would hope to observe decreases in BDI scores that are motivated by true decreases in depression, and not by changes in what the scale is actually measuring, or by participant re-conceptualization of an unchanged state of depression as less severe. According to Golembiewski et al.
(1975), Beta change occurs when respondents recalibrate their perspectives of latent constructs given a clearer perception of reality. For instance, participants may adjust their interpretation of value-loaded terms such as "always", "extremely", "often", "rarely", or "never". Likewise, Beta change has occurred when participants adjust their interpretations of what it means to be at a particular value on a particular Likert scale. Chan (1998) describes Beta change as a change in the scale of measurement, while the construct of interest is unchanged. To illustrate with an example of physical measurement, if the length of one board is three units, and the length of another board is one unit, one might assume the first board is much longer than the second. However, one would be incorrect in this conclusion if the first board is three feet in length, and the second is one yard. The original BDI's indicator of pessimism includes as response options: "I feel that things are hopeless and that things cannot improve", and "I feel that I won't ever get over my troubles" (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). A conceptual shift from "my troubles are truly insurmountable" to "my troubles technically could be overcome, but it will never happen" would result in a changed response to this item that reflects not a true change in depression, but rather a change in how participants interact with the BDI. Gamma change refers to a fundamental change in what is being measured by a scale (Golembiewski et al., 1975). If the difference between alpha change and beta change is the difference between inches and centimetres, the difference between alpha change and gamma change is the difference between inches and litres. An example of Gamma change might be good participant effects (Nichols & Maner, 2008), wherein participants actively decide to respond to a measure at post-test in a manner that supports what they believe to be the research hypothesis.
Thus, in the case of evaluating a depression intervention, the BDI would change from a measure of depression to a measure of how depressed individuals believe non-depressed individuals respond to the BDI. Evaluations of measurement invariance, therefore, seek to determine whether Alpha, Beta, or Gamma change is being assessed by one's instrument. If Beta and Gamma change can be ruled out, then differences in observed means can be said to be due to true Alpha change. To illustrate, cultural differences in individualism and collectivism are believed to play a role in cultural differentiation within a number of psychological domains (Hui & Triandis, 1986; Markus & Kitayama, 1991). However, it has also been documented that cross-cultural differences in response style exist between members of individualist and collectivist cultures, such that members of individualist cultures are more likely to endorse extreme responses to Likert items, while individuals from collectivist cultures are more inclined towards the midpoint (Chen & Stevenson, 1995). Thus, when evaluating differences between individualistic and collectivist cultures, one cannot be certain whether observed differences are on the construct of interest (i.e., alpha change) or primarily motivated by differences in response styles (i.e., beta or gamma change), unless one evaluates measurement invariance (Chen & West, 2008; Heine, Lehman, Peng, & Greenholtz, 2002).

1.2.2 Evaluating Measurement Invariance Via Multiple Group SEM

Measurement invariance can be assessed within an SEM framework via Multiple-Group SEM (MGSEM), which involves fitting the same model to data collected from the populations being compared, and progressively introducing between-group equality constraints on parameter estimates. Whether these constraints result in a significant loss of fit can be assessed via chi-square difference tests.
In the context of MGSEM analysis, the ML fit function is given by

$F_{ML} = \sum_{g=1}^{G} \frac{N_g}{N} \{ tr(S_g \Sigma_g(\theta)^{-1}) - \ln|S_g \Sigma_g(\theta)^{-1}| + (\bar{x}_g - \mu_g(\theta))' \Sigma_g(\theta)^{-1} (\bar{x}_g - \mu_g(\theta)) - m \}$, (6)

where $N_g$, $\bar{x}_g$, and $S_g$ are the sample size, vector of sample means, and the sample covariance matrix, respectively, in group g, $g \in \{1,...,G\}$, where G is the total number of groups; N is the total sample size; $\mu_g(\theta)$ and $\Sigma_g(\theta)$ are the hypothesized mean and covariance structure in group g; $\theta$ is the vector of model parameters of length q; and m is the number of observed variables in the model. As is the case with single-group model estimation, when data are normally distributed and a correct model is fit to the data, T asymptotically follows a chi-square distribution with degrees of freedom $df = G[m(m+1)/2 + m] - q$. A non-significant T allows the researcher to infer that the observed data are not unlikely given the proposed model, and thus, the model should not be rejected. MGSEM differs substantively from typical single-group SEM in that, in addition to variances and covariances, structure is imposed on the means of observed variables. The mean and covariance structure model is given by the following equations:

$\mu_g(\theta) = \nu_g + \Lambda_g \alpha_g$, and $\Sigma_g(\theta) = \Lambda_g \Phi_g \Lambda_g' + \Theta_g$, (7)

where $\Lambda_g$, $\Phi_g$, and $\Theta_g$ are the matrices of factor loadings, factor variances and covariances, and observed variable residual variances in each group, while $\nu_g$ and $\alpha_g$ are the observed variables' intercepts and the factor means in each group.

1.2.3 Invariance Models

Measurement invariance is typically tested in a sequential manner wherein between-group equality constraints are progressively added to a baseline model (Byrne, 1989; Millsap & Everson, 1991), and change in fit is assessed via chi-square difference tests.
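Before describing the invariance models, the fit function in equation (6) can be illustrated arithmetically. The following Python sketch evaluates a toy two-group case with two variables and diagonal covariance matrices, so that the trace, log-determinant, and inverse reduce to element-wise operations; all input numbers are invented:

```python
import math

# Toy evaluation of the multiple-group ML fit function (6), with 2 variables per
# group and diagonal matrices so that tr(), ln|.|, and inverses are element-wise.
def f_group(S_diag, Sigma_diag, xbar, mu):
    m = len(S_diag)
    ratio = [s / sig for s, sig in zip(S_diag, Sigma_diag)]
    tr_term = sum(ratio)                                   # tr(S * Sigma^-1)
    logdet_term = sum(math.log(r) for r in ratio)          # ln|S * Sigma^-1|
    mean_term = sum((x - u) ** 2 / sig                     # (xbar - mu)' Sigma^-1 (xbar - mu)
                    for x, u, sig in zip(xbar, mu, Sigma_diag))
    return tr_term - logdet_term + mean_term - m

# Two groups of 110 observations each (the small equal-n condition of the study):
N1 = N2 = 110
F1 = f_group([1.0, 1.2], [1.0, 1.0], [0.1, 0.0], [0.0, 0.0])
F2 = f_group([0.9, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
F_ML = (N1 * F1 + N2 * F2) / (N1 + N2)                     # sample-size-weighted sum
print(round(F_ML, 4))  # 0.0165
```

Each group's contribution is non-negative and is zero only when the model reproduces that group's sample moments exactly, which is why the weighted sum serves as a discrepancy measure.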
The baseline model is called the configural invariance model (Horn, McArdle, & Mason, 1983; Horn & McArdle, 1992), which fits the same CFA model to each group (i.e., the dimensions and the pattern of fixed zeros are the same for $\Lambda_g$, $\Phi_g$, and $\Theta_g$ for all $g \in \{1,...,G\}$), with no between-group equality constraints placed on parameter estimates. The ML chi-square for this model run is equal to the sum of the chi-squares associated with each single-group model run, and is tested against twice the degrees of freedom for a single-group model run. If this model is found to fit the data, additional constraints exploring invariance of parameter estimates across groups can then be investigated. While further constraints on the baseline model can be imposed in any order and should ideally be theory-driven (Bentler, 2006), there is an established sequence of steps that is commonly followed (Vandenberg & Lance, 2000). In each step, the previous constraints are retained and additional constraints are implemented. The test of configural invariance is typically followed by the test of weak factorial invariance (Meredith, 1993), which introduces between-group constraints on factor loadings, i.e., $\Lambda_g = \Lambda$ for all g. Retention of this model has been interpreted by some researchers to conclude that the latent construct has the same meaning and scale in all groups (Van de Schoot, Lugtig, & Hox, 2012; Millsap & Hartog, 1998; Golembiewski et al., 1975). In both the configural and the weak factorial invariance models, $\alpha_g = 0$ in all groups due to lack of identification, and the mean structure is saturated. The test of strong invariance (Meredith, 1993) introduces additional constraints on indicator intercepts, i.e., $\nu_g = \nu$ for all g. With the introduction of intercept constraints, latent mean differences across groups become identified and estimable. Latent means for one group are set to zero for identification (e.g., $\alpha_1 = 0$) and estimated in the other groups.
Retention of this model has been interpreted by some researchers to suggest that participants in all groups attribute the same meaning to the individual levels of the underlying items (Van de Schoot, Lugtig, & Hox, 2012; Millsap & Hartog, 1998; Golembiewski et al., 1975). Strong invariance is considered sufficient to permit comparison of observed scores for participants from different groups (Millsap & Kwok, 2004; Steinmetz, 2013). This follows directly because, when strong invariance holds, $\mu_g(\theta) = \nu + \Lambda \alpha_g$; therefore, any between-group differences in observed means can only be motivated by between-group differences in latent means. Thus, retention of the strong invariance model allows researchers to rule out the possibility of Golembiewski et al.'s (1975) beta and gamma change. The test of strict invariance (Meredith, 1993) adds between-group constraints on residual variances, i.e., $\Theta_g = \Theta$ for all g. The test of "beyond strict" factorial invariance introduces between-group constraints on factor variances and factor covariances, i.e., $\Phi_g = \Phi$ for all g. At this stage of invariance testing, the covariance structures are fully constrained to equality between groups, while latent means are still permitted to vary. The simultaneous invariance of residual and latent variances reflects invariant item reliabilities between groups (Vandenberg & Lance, 2000). The test of invariance of latent covariances is generally considered to be of little substantive importance (Vandenberg & Lance, 2000), although it has been interpreted as a secondary test of between-group differences in conceptualization of the construct of interest (Schmitt, Pulakos, & Lieblein, 1984), supplementary to the tests of configural and weak invariance (Horn & McArdle, 1992).
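The claim that, under strong invariance, between-group differences in model-implied means can arise only from latent mean differences is easy to verify numerically. In this Python sketch, the loadings, intercepts, and latent means are invented toy values (one factor, three indicators):

```python
# Under strong invariance, mu_g = nu + Lambda * alpha_g, so any difference in
# model-implied means between groups is Lambda * (alpha_2 - alpha_1).
nu = [0.5, 0.3, 0.8]       # intercepts (equal across groups under strong invariance)
Lam = [1.0, 0.7, 0.5]      # factor loadings (equal across groups under weak invariance)
alpha1, alpha2 = 0.0, 0.4  # latent means (group 1 fixed to 0 for identification)

mu1 = [n + l * alpha1 for n, l in zip(nu, Lam)]
mu2 = [n + l * alpha2 for n, l in zip(nu, Lam)]
diff = [b - a for a, b in zip(mu1, mu2)]
print([round(d, 3) for d in diff])  # [0.4, 0.28, 0.2] -- each entry is loading * (alpha2 - alpha1)
```

Because the intercepts cancel, the observed mean difference on each indicator is exactly its loading times the latent mean difference; if intercepts differed across groups, the cancellation would fail and observed differences would confound the two sources.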
The final test in the most common invariance series (Vandenberg & Lance, 2000) is the test of full mean and covariance structure invariance, which introduces between-group equality constraints on latent means, and implies the same population mean vectors and covariance matrices in all groups. Constraining latent means to equality is equivalent to constraining latent means in both groups to 0 (i.e., $\alpha_g = 0$ for all g), since the location of latent variables is not identified. Rejection of this hypothesis has been interpreted by researchers to imply real between-group differences on the latent construct of interest (Riordan & Vandenberg, 1994). It is likely that true score differences are rarely evaluated by a test of latent mean invariance, as latent mean differences can be approximated by observed mean differences when strong invariance holds (Steinmetz, 2013).

1.3 Current Study

Despite extensive application of corrected chi-square difference tests, little empirical investigation of their relative performance has been conducted. Limited, small-scale simulations were conducted within the original articles that proposed the corrected difference tests. Satorra (2000) demonstrates that DS0 has a nominal Type I error rate in moderate to large samples when comparing multiple-group regression models fit to non-normal data, while the uncorrected DML consistently over-rejects at all sample sizes. Using the same simulation design as Satorra (2000), Satorra and Bentler (2001) demonstrated that DSB1 performs well relative to DML. Additionally, the authors demonstrate the performance of the ad-hoc robust difference test, $D_{SB} = T_{SB,A} - T_{SB,B}$, which drastically over-rejects relative to DML, and frequently returns a negative chi-square difference. Satorra and Bentler (2010) included, as a proof of concept, a demonstration of a case in which DSB1 is negative but DSB10 is positive, but no Monte Carlo investigation of DSB10's performance relative to other statistics.
The only simulation study to compare the relative performance of multiple corrected difference tests under a wide variety of conditions (Chuang, Savalei, & Falk, 2015) compared the performance of DML, DSB1, DSB10, and DSBH in the context of single-group confirmatory factor analytic (CFA) models with constraints. Only Type I error rates were studied. This study found the corrected tests to perform well, and similarly in some conditions, with DSB1 under-rejecting in some scenarios, and DSB10 over-rejecting in others. The performance of the hybrid test DSBH was found to be indistinguishable from that of DSB1, because DSB1 was negative in a negligible proportion of replications. This good performance of DSB1 relative to what is often observed in practice is likely due to the fact that only correct models were fit to data. The current study expands on Chuang, Savalei, and Falk (2015) in at least three important ways. First, it investigates the performance of the corrected difference tests in the context of evaluating measurement invariance using MGSEM, one of the most common applications of nested-model comparison (Byrne & Shavelson, 1987; Marsh, 1993; Nesselroade & Thompson, 1995). Second, by generating data from a population that is invariant on some, but not all, parameters, the current study evaluates Type I error rates as well as power by examining difference tests for progressively more constrained models. Third, the current study includes in its investigation the original Satorra (2000) statistic, and thus compares the performance of all known mean-corrected difference tests: DS0, DSB1, DSB10, and DSBH.

1.3.2 Definitions of Studied Statistics

All studied scaled difference tests, DS0, DSB1, and DSB10, are computed using (5) and dfD, differing only in terms of the computation of $c_D$.
For Satorra's (2000) original correction, let $\sigma$ be an $m^* \times 1$ vector containing the elements of the model-implied mean vector, $\mu_g(\theta)$, and the non-redundant elements of the model-implied covariance matrix, $\Sigma_g(\theta)$, $g \in \{1,...,G\}$. Let the baseline model be $M_B: \sigma = \sigma(\theta)$, meaning $\sigma$ is a function of the q parameters in the $q \times 1$ vector $\theta$. Let the nested model, MA, be identical to MB, except that it imposes a set of p equality constraints given by the function $a(\theta) = a_0$, where $a_0$ is a $p \times 1$ vector of constants. The null hypothesis of the chi-square difference test is, thus, $H_0: a(\theta) = a_0$. Let $\Delta = \partial\sigma(\theta)/\partial\theta'$ be the $m^* \times q$ matrix of model derivatives for MB, and $A = \partial a(\theta)/\partial\theta'$ be the $p \times q$ matrix (of rank p) of derivatives of the constraints. The scaling correction for the chi-square difference test proposed by Satorra (2000) is

$c_{D,S0} = \frac{tr\{U\Gamma\}}{df_A - df_B}$, (8)

where $\Gamma$ is the asymptotic covariance matrix of fourth-order moments, and $U = V\Delta P^{-1}A'(AP^{-1}A')^{-1}AP^{-1}\Delta'V$, where V is the ML weight matrix, and $P = \Delta'V\Delta$. In the case of multiple groups, both $\Gamma$ and V are block-diagonal (see Satorra, 2000, for details). Due to its computational complexity, Satorra's (2000) correction has received little application. Satorra and Bentler (2001) noted that the matrices V, A, and $\Gamma$ can be estimated using parameter estimates from the MB and MA runs. Satorra and Bentler (2001) proposed a simplified correction such that

$c_{D,SB1} = \frac{df_A c_A - df_B c_B}{df_A - df_B}$, (9)

where $c_A$ and $c_B$ are the Satorra-Bentler scaling corrections associated with MA and MB, respectively. Because the numerator of this computation involves a subtraction, and $df_A c_A$ is not necessarily greater than $df_B c_B$, this scaling correction can sometimes be negative, resulting in a negative chi-square difference, which is uninterpretable.
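A numeric sketch of the simplified correction (9), with invented scaling factors, showing both a well-behaved case and the negative case just described:

```python
def c_sb1(df_A, c_A, df_B, c_B):
    """Simplified Satorra-Bentler (2001) scaling correction, equation (9)."""
    return (df_A * c_A - df_B * c_B) / (df_A - df_B)

# Illustrative scaling factors (in practice, c_A and c_B come from the robust runs):
cD = c_sb1(44, 1.30, 38, 1.25)
print(round(cD, 4))  # 1.6167 -- a well-behaved positive correction

# When c_B is sufficiently larger than c_A, the numerator turns negative,
# producing a negative (uninterpretable) scaled chi-square difference:
print(c_sb1(44, 1.10, 38, 1.35) < 0)  # True
```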
In response to the issue of negative chi-square differences associated with $c_{D,SB1}$, Satorra and Bentler (2010) later proposed a strictly positive estimate of $c_D$, such that

$c_{D,SB10} = \frac{df_A c_A - df_B c^*}{df_A - df_B}$, (10)

where $c^*$ is the Satorra-Bentler scaling correction associated with an additional model run, M*. This additional run involves fitting the less constrained model, MB, using the final parameter estimates of the MA run as starting values, and with iterations set to 0. It follows that $T^*_{ML} = T_{A,ML}$, and, because $T^*_{SB}$ is based on fewer df than $T_{A,SB}$, $T^*_{SB} \geq T_{A,SB}$, and thus $c^* \leq c_A$. Because $df_A > df_B$, it follows that $df_A c_A \geq df_A c^* > df_B c^*$, and $c_{D,SB10} > 0$, thus ensuring a positive value of DSB10. Because all three scaling corrections are sample estimates of the same population value in (5), all scaled difference tests are asymptotically equivalent and will provide increasingly similar rejection rates in large samples. However, their behavior at realistic sample sizes may differ, and is, thus, the subject of the current study.

Chapter 2: Methodology

2.1 Overview

Using Monte Carlo simulation, the relative performances of DS0, DSB1, DSB10, and DSBH were compared to the uncorrected DML in the context of testing measurement invariance. Datasets were generated from multiple-group population CFA models, and rejection rates for a number of invariance models were evaluated. Non-normal data were generated from population models showing strict invariance (equal loadings, intercepts, and residual variances between groups), permitting the examination of Type I error (when the more restrictive analysis model MA was less or equally restrictive as the population model), power (when MA was more restrictive than the population model, while MB was still consistent with the population model), and rejection rates when both models were misspecified.
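Before turning to the design, the remaining two corrections being compared can also be sketched numerically: the strictly positive correction (10) and the hybrid rule DSBH. All values below are invented, and c_star stands in for the scaling factor from the extra M* run:

```python
def c_sb10(df_A, c_A, df_B, c_star):
    """Strictly positive Satorra-Bentler (2010) correction, equation (10);
    c_star is the scaling factor from the additional M* run."""
    return (df_A * c_A - df_B * c_star) / (df_A - df_B)

def d_hybrid(D_ML, cD_sb1, cD_sb10):
    """Asparouhov & Muthen (2010) hybrid: use DSB1 unless it is negative."""
    d1 = D_ML / cD_sb1
    return d1 if d1 > 0 else D_ML / cD_sb10

# Because c_star <= c_A and df_A > df_B, the numerator df_A*c_A - df_B*c_star
# cannot go negative, even when the SB1 numerator would:
cD10 = c_sb10(44, 1.10, 38, 1.05)
print(cD10 > 0)  # True

# Hybrid rule falling back to DSB10 when the SB1 correction is negative:
print(round(d_hybrid(13.3, -0.48, cD10), 3))  # 9.388
```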
2.2 Study Design

The population model was always a 2-group, 2-factor CFA model with 4 indicators per factor, and no cross-loadings or correlated residuals. Loading size and factor covariance matrices varied across conditions. Variances of the observed variables were always set to 1 in Group 1, but varied in Group 2 in some conditions. Means of observed variables differed across the two groups. In the notation of equation (7), the data generating model had the following mean and covariance structure, $g \in \{1,2\}$:

$\mu_g = \nu + \Lambda \alpha_g$, and $\Sigma_g = \Lambda \Phi_g \Lambda' + \Theta$. (11)

As the previous equation shows, population factor loadings, residual variances, and intercepts were specified as equal across groups. Latent means were always different between the two groups, but their values did not vary across study conditions. Factor covariance matrices varied across groups as well as across study conditions: Group 2 either had a different factor covariance than Group 1 ($\Phi_2 = \Phi_{2A}$), or it had a different factor variance for one of the factors than Group 1 ($\Phi_2 = \Phi_{2B}$). For simplified discussion, the latent covariance matrix condition will be referred to as the FCOV condition. The matrix of factor loadings was always equal between groups, but varied across study conditions, with an average value of either .5 or .7. The exact specification of all matrices in (11) for all conditions is shown in Table 1. Seven different distributional conditions were included; they are summarized in Table 1. The seven conditions included one normally distributed data condition, four conditions where data in at least one group were non-normal and were generated using the procedure of Vale and Maurelli (1983) (hereafter referred to as VM), and two conditions that involved data generated using a contaminated normal distribution. The VM conditions involve the specification of univariate skew and kurtosis for each variable; univariate skew was always set to 2, and univariate kurtosis was either 7 or 15 on average (but varied across indicators; see Table 1).
These distributions will be denoted VM2,7 (moderate non-normality) and VM2,15 (extreme non-normality). Three VM conditions had the same distribution in both groups (normal, VM2,7, VM2,15), and two VM conditions specified different distributions across groups (normal in group 1 and moderately non-normal in group 2, denoted VM0,0;2,7, and moderately non-normal in group 1 and extremely non-normal in group 2, denoted VM2,7;2,15). When data are generated from a contaminated normal distribution, data are sampled from two normal distributions, one with a variance 10 times that of the other. Twenty percent of data points were sampled from the wider distribution. This procedure is conducted for each dependent variable, resulting in a dataset with univariate skew of 0 and a homogeneous univariate kurtosis of 4.96. Two of the seven data distribution conditions made use of contaminated normal distributions: one in which data were generated from a contaminated normal in both groups, and one in which data were normally distributed in group 1 (see Table 1). These distributions are denoted CN2,10 and CN0,0;2,10, respectively. Normal data and VM data were generated in the R package lavaan (version 0.5.17). Contaminated normal data were generated using our own R code (see Appendix A). Sample size conditions varied along two dimensions: the total number of observations in the two groups, and whether the groups were equal in size. Small, medium, and large total sample sizes were set to 220, 440, and 1760 observations, respectively. In the equal sample size conditions, the number of observations per group was 110, 220, and 880, respectively. In the unequal sample size conditions, Group 2 had 6 observations for every 5 observations in Group 1. Sample size details are included in Table 1.
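The excess kurtosis of 4.96 reported for the contaminated normal conditions follows directly from the mixture moments. A Python check, assuming a zero-mean mixture of 80% N(0, 1) and 20% N(0, 10) as described above:

```python
# Moment check for the contaminated normal: 80% from N(0, 1) and 20% from a normal
# with 10 times the variance. For a zero-mean normal mixture, E[X^2] and E[X^4] are
# weighted averages of the component moments (E[X^4] = 3 * var^2 per component).
w_wide = 0.20
var_narrow, var_wide = 1.0, 10.0

var = (1 - w_wide) * var_narrow + w_wide * var_wide               # 2.8
ex4 = 3 * ((1 - w_wide) * var_narrow**2 + w_wide * var_wide**2)   # 62.4
excess_kurtosis = ex4 / var**2 - 3
print(round(excess_kurtosis, 2))  # 4.96, matching the value reported above
```

Skew is 0 by symmetry, since both components are centered at zero.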
In total, conditions varied along the following five dimensions: 1) average factor loading in the population, 2) FCOV condition, 3) total sample size, 4) sample size ratio between groups, and 5) data distribution. These conditions were fully crossed in a 2 x 2 x 3 x 2 x 7 factorial design. One thousand data sets were generated in each of the 168 conditions.

2.3 Analysis

Data were analysed in the R package lavaan, version 0.5.17. Sample analysis code is included in Appendix B. Latent variables were always identified by fixing a single factor loading to 1. Six multiple-group models were fit to each generated dataset: configural invariance (M0, df=38), weak factorial invariance (M1, df=44), strong factorial invariance (M2, df=50), strict factorial invariance (M3, df=58), "beyond strict" factorial invariance (M4, df=61), and full mean and covariance structure invariance (M5, df=63). Models M0 through M3 are consistent with the data generating process described by equation (11). Thus, when these models are fit to data, model rejection rates represent Type I error rates, and similarly the rejection rates of difference tests between any pair of these four models also represent Type I error rates. Type I error rates falling within the 3.75-6.25% range proposed by Serlin (2000) are considered acceptable. Model M4 is not consistent with the data generating process, and thus model rejection rates for M4 represent power. Similarly, chi-square difference tests between M4 and the preceding model M3 also represent power to reject the incorrect constraint that $\Phi_1 = \Phi_2$. When the average factor loading was 0.5, the population RMSEA value for M4 was 0.012 in both FCOV conditions. When the average factor loading was 0.7, the RMSEA values were 0.016 for both FCOV conditions. Model M5 is also inconsistent with the data generation model described in equation (11). When the average factor loading was 0.5, the population RMSEA values for M5 were 0.028 when $\Phi_2 = \Phi_{2A}$, and 0.026 when $\Phi_2 = \Phi_{2B}$.
When the average loading was 0.7, the RMSEA values were 0.033 and 0.031 for the two FCOV conditions, respectively. While the overall model rejection rates for M5 represent power, rejection rates of the chi-square difference test between M5 and M4 do not, strictly speaking, represent power, because the baseline model M4 is also misspecified, thus violating the assumptions of the chi-square difference test (Bentler, 2006). However, given that M5 is more misspecified than M4, the approximate noncentrality difference between these two models is not zero (Steiger et al., 1985), and the behavior of the chi-square difference test can be taken to reflect approximate power to detect this additional misspecification. The overall chi-square test of fit for model $M_i$, $i \in \{0,...,5\}$, will be denoted by $T_i$, with an additional subscript to indicate the type of test (ML or SB). A difference test between two adjacent models will be denoted $D_i = T_i - T_{i-1}$, $i \in \{1,...,5\}$, with an additional subscript indicating which scaling correction (ML, S0, SB1, SB10, SBH) is being discussed. The degrees of freedom for the five examined chi-square differences are df=6, 6, 8, 3, and 2 for D1 through D5, respectively.

2.4 Computation of Chi-Square Difference Tests

Normal theory ML difference tests, Di,ML, are computed as the difference of two ML chi-squares and do not require special implementation. The original robust difference test as defined in Satorra (2000), Di,S0, is included as a function in the R package lavaan (Rosseel, 2012). The relevant matrices necessary for the computation of this test (e.g., $\Gamma$, V) can be evaluated at the model estimates for either the more or the less restricted model (Satorra & Bentler, 2001), creating two potential versions of this test. Lavaan's implementation uses the estimates from the less restricted model.
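The model degrees of freedom listed in section 2.3 (38, 44, 50, 58, 61, 63), and the difference-test degrees of freedom (6, 6, 8, 3, 2), can be reproduced by parameter counting. In this Python sketch, the per-step counts of free parameters are our own reconstruction of the study's specification (under M0, each group has 6 free loadings, 3 factor variances/covariances, 8 residual variances, and 8 intercepts):

```python
# Degrees-of-freedom bookkeeping for the six invariance models, for a 2-group,
# 2-factor, 8-indicator CFA with one loading per factor fixed to 1.
G, m = 2, 8
moments = G * (m * (m + 1) // 2 + m)    # sample covariances + means: 88

free = {}
free["M0"] = G * (6 + 3 + 8 + 8)        # free loadings, factor var/cov, residuals, intercepts
free["M1"] = free["M0"] - 6             # loadings equated across groups
free["M2"] = free["M1"] - 8 + 2         # intercepts equated; 2 latent means freed in group 2
free["M3"] = free["M2"] - 8             # residual variances equated
free["M4"] = free["M3"] - 3             # factor variances/covariance equated
free["M5"] = free["M4"] - 2             # latent means constrained to zero

df = {model: moments - q for model, q in free.items()}
print(df)  # {'M0': 38, 'M1': 44, 'M2': 50, 'M3': 58, 'M4': 61, 'M5': 63}
```

The successive differences in df between adjacent models, 6, 6, 8, 3, and 2, are exactly the degrees of freedom of D1 through D5.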
The robust difference test popularized by Satorra and Bentler (2001), Di,SB1, is computed from lavaan's output for the ML and the robust runs using the equations described in section 1.3.2. The "strictly positive" robust difference test, Di,SB10 (Satorra & Bentler, 2010), requires an additional model run, M*, described in section 1.3.2. Technical difficulties arise when computing D2,SB10, Satorra and Bentler's (2010) test of strong invariance. The additional model run, M*, fits M1 using the final estimates of M2 as starting values. Because the latent means are not identified under M1 (i.e., they are set to zero for both groups), while they are identified for M2 (i.e., freely estimated in one group), starting value specifications for latent means are incorrectly ignored by lavaan, resulting in a model run such that $T^*_{ML} \neq T_{A,ML}$, which is not expected. As a correction, latent means are forced into the intercepts by setting the intercept starting values for each group to the model-implied mean estimates, $\nu_i + \Lambda_i \alpha_i$, $i \in \{1,2\}$, where $\Lambda_i$ is the matrix of loading estimates from the M2 run, $\alpha_i$ is the vector of latent mean estimates, and $\nu_i$ is the vector of intercepts. Because the latent means for group 1 are always set to 0, $\nu_1 + \Lambda_1 \alpha_1 = \nu_1$. This correction produces an M* such that $T^*_{ML} = T_{A,ML}$, as expected. Similar issues arise when testing D2,S0. Because the mean structure is identified for M2 but not M1, M2 is not technically nested within M1. To circumvent this issue, the mean structure of M1 must be manually freed. This results in all intercepts being estimated as the correct indicator means, and latent means estimated as 0 in both groups. This modification does not impact parameter estimates or chi-square estimates. To properly nest M2 within the new M1, as in the case of DSB10, latent means must be forced into the indicator intercepts. This was done by placing non-linear constraints on the intercept estimates in group 2. Alternative parameterizations are possible and are included in Appendix B.
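The workaround of absorbing latent means into the intercept starting values can be illustrated with toy estimates (one factor, four indicators; the numbers are invented, not taken from actual M2 runs):

```python
# Computing intercept starting values nu + Lambda * alpha, which absorb the latent
# mean estimates so that the M* run reproduces the M2 model-implied means.
nu = [0.1, 0.2, 0.0, 0.3]       # intercept estimates from the M2 run
Lam = [1.0, 0.8, 0.6, 0.9]      # loading estimates (first loading fixed to 1)
alpha = 0.5                     # latent mean estimate (0 in group 1)

start_vals = [round(n + l * alpha, 10) for n, l in zip(nu, Lam)]
print(start_vals)  # [0.6, 0.6, 0.3, 0.75]

# In group 1, alpha = 0, so the starting values reduce to the intercepts themselves:
assert [n + l * 0 for n, l in zip(nu, Lam)] == nu
```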
This starting-value bug has been reported to the developer of lavaan, and a fix should be implemented in future versions. Finally, the hybrid test Di,SBH is computed as Di,SB1 when Di,SB1 is positive, and is otherwise set to Di,SB10. Sample R code for the application of all studied difference statistics is included in Appendix B.

Chapter 3: Results

3.1 Convergence and Computational Failures

Model convergence failures were generally uncommon when data were generated using the VM procedure. Convergence failures were most frequent when testing configural invariance (M0) in conditions with small sample sizes and average factor loadings of 0.5. In these conditions, M0 generally failed to converge for 1-3% of datasets, while M1-5 failed in 0-1% of datasets. Convergence failures in conditions with higher sample sizes and/or factor loadings were rare; rejection rates for most other conditions are based on all 1000 samples. When data were generated from a contaminated normal distribution, convergence failures were much more common, occurring in as many as 20% of M0 runs, and 5-8% of M1-5 runs, in conditions with small samples and average loadings of 0.5. As was the case with VM data, convergence failures were much less frequent in higher sample size and loading conditions, occurring in less than 1% of medium sample size conditions, and in no large sample conditions. Non-convergent runs were excluded from rejection rate computations. The Satorra-Bentler scaled test of model fit cannot be computed when TML is not available, or when standard errors cannot be computed. Thus, the failure rate for TSB is greater than or equal to that of TML. TSB was not available for 2-4% of M0 runs, and 1-2% of M1-5 runs, when sample sizes were small and data were generated via the VM procedure. When data were generated from a contaminated normal distribution, TSB was not available for 15-30% of M0 runs, and 9-12% of M1-5 runs, in small sample conditions.
TSB was also not available for 1-2% of contaminated normal runs when sample sizes were medium, and was always available in large sample size conditions. The uncorrected chi-square difference test, DML, is not computable whenever a required TML is not available. Failure rates for Di,ML are generally very close to those of Ti,ML, where i is the numeric subscript associated with the more constrained model. Scaled difference tests, Di,S0, Di,SB1, and Di,SB10, generally have failure rates close to those of Ti,SB.

3.2 Rejection Rates for Overall Tests of Model Fit

While the focus of this study is on the performance of corrected difference tests, overall tests of model fit were also examined to confirm that the types of non-normal data generated indeed degraded the performance of Ti,ML, and that the scaled test statistic Ti,SB produced rejection rates closer to nominal. The performance of difference tests may depend on the performance of the overall model tests of fit. Rejection rates for Ti,ML were inflated in all non-normal data conditions relative to normally distributed data, particularly in contaminated normal conditions, where they often approached 99%. Rejection rates for Ti,SB were much closer to the acceptable 3.75-6.25% range. Type I error rates for T0,SB, T1,SB, and T2,SB were between 6-16% in small samples, increasing with non-normality and average factor loadings. Type I error rates for T3,SB were generally higher than those for lower invariance levels, with rejection rates typically exceeding 20% in small samples. Little variation was seen in Type I error rates as a function of latent non-invariance condition. Rejection rates improve with sample size, ranging from 7-12% in medium sample sizes, and 4-9% in large sample sizes. Mean power of T4,SB across non-normality conditions was 17.3% in small samples, and 27.5% in large samples. Mean power of T5,SB was 32.1% in small samples, and 93.7% in large samples, across normality conditions.
See Table 2 for detailed rejection rates for a representative subsample of conditions. Tables 11-24 in Appendix C contain complete rejection rates for all conditions.

3.3 Type I Error Rates for Difference Tests

While variations in population parameter values such as factor loadings, FCOV conditions, sample evenness, and degree of non-normality did impact power and Type I error rates for corrected chi-square difference tests, these effects were generally small, and never affected conclusions as to which corrected difference test performs best. The only dimension along which the best performing statistic varied was the level of invariance being tested. Because all corrected difference tests are asymptotically equivalent, this variation was most pronounced in small samples, with between-statistic differences in performance converging towards zero as sample size increased. For this reason, the discussion of results primarily focuses on small sample sizes. Corrected statistics generally performed similarly when data were normally distributed, and were consistently outperformed by the uncorrected DML. This pattern was consistent across all levels of invariance. Table 3 gives average Type I error rates across all conditions when the data are normal. Tables 25-27 give complete Type I error rates for all conditions when data are normal. Negative values of DSB1 were found to be extremely uncommon when evaluating D1-3. This finding is consistent with the results of Chuang, Savalei, and Falk (2015), who also examined Type I error rates of DSB1. For this reason, the performance of D1-3,SBH is not discussed, as its rejection rates were indistinguishable from those of D1-3,SB1.

3.3.1 Weak Invariance (D1)

The test of weak invariance is a test of the constraints on the factor loadings across groups; because the marker approach was used to identify the latent factors, this test is against 6 degrees of freedom.
The loadings were equal across groups in the generated data, and the rejection rates of this test represent Type I error rates. Table 4 contains Type I error rates for D1,ML, D1,S0, D1,SB1, and D1,SB10 when sample sizes are even and Φ2 = Φ2B. Results for uneven sample sizes and the second FCOV condition (Φ2 = Φ2A) are similar. In small samples, when the average loading is .5, D1,SB1 consistently rejects within the acceptable 3.75%-6.25% range. In contrast, D1,S0 consistently rejects at a rate in excess of 10%, while D1,SB10 rejects within a range of 8-11%. Rejection rates generally decrease with increasing factor loadings. In moderate samples, D1,SB1 is, again, more consistently within the acceptable range than D1,S0 and D1,SB10. In large samples, all corrected statistics perform similarly, and are generally within or close to the 3.75%-6.25% range. Complete Type I error rates for all conditions are included in Appendix C.

3.3.2 Strong Invariance (D2)

The test of strong invariance is a test of the additional constraints on indicator intercepts across groups. This test estimates eight fewer indicator intercepts than a model without these constraints. Additionally, the intercept constraints allow the latent means to become estimable, using up two degrees of freedom. Thus, this test is against 6 degrees of freedom. The intercepts were equal across groups in the generated data, and the rejection rates of this test represent Type I error rates. Table 5 contains Type I error rates for D2,ML, D2,S0, D2,SB1, and D2,SB10 when sample sizes are even and Φ2 = Φ2B. In small samples, when the average loading is .5, DS0 and DSB10 generally perform within the acceptable range, with performance improving with increasing factor loadings. Conversely, DSB1 consistently performs outside the acceptable 3.75-6.25% range, with performance diminishing with increasing loadings. In medium and large sample sizes, DS0 and DSB10 consistently perform within the acceptable range, while DSB1 does not.
While the other statistics were found to be generally uninfluenced by the sample evenness condition, D2,S0, as can be seen in Appendix C, has a tendency to under-reject when sample sizes are uneven, frequently rejecting in the 2-3% range. No improvement in D2,S0 rejection rates is seen as a function of increasing sample size. Complete Type I error rates for all conditions are included in Appendix C.

3.3.3 Strict Invariance (D3)

The test of strict invariance is a test of the additional constraints on the residual variances across groups; this test is against 8 degrees of freedom. The residual variances were equal across groups in the generated data, and the rejection rates of this test represent Type I error rates. Table 6 contains Type I error rates for D3,ML, D3,S0, D3,SB1, and D3,SB10 when sample sizes are even and Φ2 = Φ2B. No corrected statistic was found to perform consistently within an acceptable range when testing between-group equivalence of residual variances. While DSB10 does perform the best relative to DS0 and DSB1, between-statistic differences are minimal and inconsistent. Additionally, while performance was uninfluenced by the non-normal data generation algorithm when testing D1 and D2, over-rejection tends to occur only for VM data, while corrected statistics tend to under-reject when models are fit to CN data. Rejection rates associated with VM data tend to converge towards the 3.75%-6.25% range with increasing sample size, while the under-rejection associated with CN data does not. Complete Type I error rates for all conditions are included in Appendix C.

3.4 Power of Difference Tests

Consistent with the Type I error conditions, conclusions regarding which corrected statistics perform best were unrelated to population parameter values. When used to compare model fits with normal data, all corrected statistics performed similarly, rejecting at a slightly higher rate than DML.
The greater power of the corrected tests here is an artifact of the slightly more inflated Type I error rates that they also exhibit (see Table 3). See Table 7 for a summary of power across all conditions for normal data. Tables 28 and 29 in Appendix C contain complete rejection rates for power conditions when data are normal. Negative values of DSB1 were found to occur with any frequency only for D5,SB1, making DSBH indistinguishable from DSB1 for all other levels of invariance. For this reason, the hybrid statistic DSBH is only addressed when discussing D5. Power for the uncorrected test DML is included in the results for completeness; however, it is important to remember that this statistic fails to maintain Type I error rates with non-normal data, and thus its superior performance in terms of power is not meaningful.

3.4.1 Beyond Strict Invariance (D4)

The test of beyond strict invariance is a test of the additional constraints on the latent variances and covariances across groups; this test is against 3 degrees of freedom. The latent variance-covariance matrices were not equal across groups in the generated data, and the rejection rates of this test represent power to detect misspecification. Table 8 shows the power of D4,ML, D4,S0, D4,SB1, and D4,SB10 to reject the null hypothesis that Φ1 = Φ2 when Φ2 = Φ2B. No systematic differences in rejection rates between the two FCOV conditions are observed. When sample sizes are small and the average loading is .5, DSB1 is extremely underpowered, detecting the misspecification in 3-12% of datasets, depending on the degree of non-normality. DSB10 outperforms DSB1, detecting the misspecification in 9-20% of datasets. DS0 is the most powerful statistic in this condition, with rejection rates ranging from 11-27% depending on the non-normality condition. As can be seen in Table 8, rejection rates increase with increasing factor loadings. In medium samples, when the average loading is .5, DSB1 rejects 7-20% of the time, DSB10 17-31% of the time, and DS0 20-34% of the time.
In large samples, DSB1 rejects 51-82% of the time, DSB10 56-89% of the time, and DS0 60-90% of the time. Complete rejection rates for all conditions are included in Appendix C.

3.4.2 Full Mean and Covariance Structure Invariance (D5)

The test of full mean and covariance structure invariance is a test of the additional constraints on the latent means across groups; this test is against 2 degrees of freedom. Because this difference test compares two misspecified models, rejection rates do not technically represent power. However, given that M5 was always more misspecified than M4, these rejection rates can be interpreted as the ability of each statistic to detect the additional misspecification. Table 9 contains rejection rates for D5,ML, D5,S0, D5,SB1, and D5,SB10 when sample sizes are even and Φ2 = Φ2B. When sample sizes are small, DSB1 rejects 81-93% of the time, DSB10 78-82% of the time, and DS0 77-81% of the time. In medium samples, DSB1 rejects 98% of the time, DSB10 98% of the time, and DS0 98-99% of the time. In large samples, all three statistics reject 100% of the time. The test of full mean and covariance structure invariance was the only difference test where rejection rates for DSB1 and DSBH differed substantially, indicating a non-trivial frequency of negative values of DSB1. Because no empirically supported recommendations exist, separate rejection rates were calculated treating negative values of DSB1 as model retentions and as model rejections. Interpreting negative chi-square differences as model retentions leads to an underpowered DSB1 relative to the other corrected statistics, while interpreting them as rejections leads to rejection rates similar to those of DSBH and the other scaled statistics. These results suggest that negative values of DSB1 should be treated as model rejections. See Table 10 for an illustration of this differential performance. Complete rejection rates for all conditions are included in Appendix C.
Chapter 4: Discussion

4.1 Summary of Current Study

The goal of the current study was to investigate the relative performance of four robust corrections to the chi-square difference test in the context of evaluating measurement invariance via multiple-group SEM. Statistics evaluated included Satorra's (2000) original computationally intensive correction, DS0; Satorra and Bentler's (2001) simplified correction, DSB1; Satorra and Bentler's (2010) strictly positive correction, DSB10; and the hybrid statistic, DSBH (Asparouhov & Muthén, 2010; Chuang et al., 2015). Prior evaluation of these statistics has been limited; it includes small-scale simulations accompanying the initial proposals of DS0 and DSB1 (Satorra, 2000; Satorra & Bentler, 2001) and an empirical comparison of DSB1 and DSB10 in the context of single-group model constraints (Chuang et al., 2015). The present study is the first to 1) evaluate the performance of robust chi-square difference tests in the context of testing measurement invariance via multiple-group SEM with non-normal data, 2) examine the relative performance of DS0, DSB1, and DSB10, and 3) examine power as well as Type I error rates of robust chi-square difference tests.

4.2 Performance of the Normal Theory Chi-Square Difference Test

The normal theory ML chi-square test of model fit has been studied extensively, and is known to show inflated rejection rates with non-normal data (Chou et al., 1991; Bentler & Yuan, 1999; Satorra & Bentler, 1994). As can be seen in Tables 2, 11, and 12, this result was replicated in this study. Tables 4 and 6 demonstrate that, as expected, the ML chi-square difference test, DML, also exhibits inflated Type I error rates when data are not normally distributed. As seen in Table 5, an exception to this trend was D2,ML, the test of strong invariance, which tests that indicator intercepts are equal between groups (Meredith, 1993). Here, the uncorrected DML was found to be largely insensitive to non-normality.
This finding is reasonable given that the constraints are on the mean structure, and, by the central limit theorem (Johnson & Wichern, 1982), traditional tests on group means are asymptotically robust to non-normality. As seen in Table 5, this robustness is most pronounced for VM data, and breaks down somewhat for CN data.

4.3 Performance of Scaled Chi-Square Difference Tests

Variation with respect to the best-performing corrected statistic in each condition was largely unrelated to the value of the average factor loading, the type of group differences in the factor covariance matrices (the FCOV conditions), and whether the sample sizes were even across groups (the sample evenness conditions), with the exception of DS0 when evaluating strong invariance. Statistics did, however, differ in their relative performance as a function of which sets of parameters were being constrained to equality. As would be expected from their asymptotic equivalence, differences in the relative performance of the corrected statistics were most pronounced in small samples. When sample sizes were large (NTotal = 1760), all corrections performed similarly and well, exhibiting high power and Type I error rates near 5%. For this reason, the discussion of performance differences below primarily focuses on small sample sizes.

4.3.1 Type I Error Rates

When evaluating weak invariance (i.e., the equality of factor loadings between groups) with non-normal data, Type I error rates of DS0 and DSB10 generally exceeded 10% in small samples. DSB1, however, performs within the acceptable 3.75%-6.25% range at all studied sample sizes. When evaluating strong invariance (i.e., additional equality constraints on the indicator intercepts), DSB1 was generally found to have a Type I error rate near 10% in small sample sizes, while DS0 and DSB10 performed within the acceptable range when sample sizes were even, with DS0 frequently under-rejecting when samples were uneven.
When evaluating strict invariance (i.e., additional equality constraints on residual variances), all three corrected statistics generally performed poorly and inconsistently, with DS0 performing the worst, and with DSB1 and DSB10 performing similarly in all conditions. Unlike for the tests of weak and strong invariance, the type of non-normal data did appear to affect rejection rates for the test of strict invariance: for data generated by the Vale and Maurelli (1983) procedure, all statistics over-rejected in small samples and improved with increasing sample size; for contaminated normal data, all statistics generally under-rejected at all sample sizes. When data were normal in one group and sampled from a contaminated normal distribution in the second group, rejection rates were much closer to the acceptable range for all corrected statistics at all sample sizes.

4.3.2 Power with a Correctly Specified Baseline

Power to detect a lack of beyond strict invariance (i.e., additional constraints on elements of the latent variance-covariance matrices) was generally low for all considered statistics, with rejection rates generally ranging from 5%-30% across statistics in small samples. However, the population RMSEA for the test of beyond strict invariance was very small, ranging from 0.012 to 0.016 across conditions. An RMSEA difference of less than 0.01 has been proposed as a threshold for model retention for researchers who wish to evaluate invariance via change in fit indices (Chen, 2007). Additionally, the power of DML to detect this misspecification when data are normal was only about 20%, suggesting that this misspecification is very slight. Researchers who wish to reliably detect non-invariance of this nature and magnitude are encouraged to ensure large samples. In general, DS0 was found to be more powerful than DSB10, which in turn was found to be more powerful than DSB1.
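As an aside on the RMSEA values quoted in this subsection, the sample RMSEA point estimate can be computed directly from a chi-square statistic. The sketch below uses the common single-group convention; multiple-group software conventions differ (e.g., by a group-count factor), so this is an illustration rather than the exact formula used in the study.

```python
def sample_rmsea(chi_sq, df, n):
    """Sample RMSEA point estimate from a chi-square test statistic.

    Single-group convention: sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    The max(..., 0) truncation keeps the estimate at zero when the
    statistic falls below its degrees of freedom."""
    return (max(chi_sq - df, 0.0) / (df * (n - 1))) ** 0.5
```

For example, a chi-square of 105 on 54 df with n = 880 gives an RMSEA of about 0.033, while any chi-square at or below its df gives exactly 0.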
DS0 generally detected the lack of beyond strict invariance in 25-30% of replications, DSB10 in 15-20%, and DSB1 in 5-10%. Rejection rates were consistent across FCOV conditions. Given that DS0 was found to over-reject to a greater degree than DSB1 and DSB10 when testing the equality of covariance structure parameters (i.e., factor loadings and residual variances), it is plausible that DS0 is biased towards rejection, and that its power advantage is an artifact of its inflated Type I error rates. However, the rejection rate of DS0 relative to the other corrections is much greater in power conditions than in Type I error conditions: DS0 rejects as much as 30% more often than DSB1 and 15% more often than DSB10 when testing beyond strict invariance (power), while rejecting only 10% more often than DSB1 and 5% more often than DSB10 when testing weak invariance (Type I error), and rejecting similarly to the other corrections when testing strict invariance.

4.3.3 Power with a Misspecified Baseline

The test of full mean and covariance structure invariance (i.e., additional constraints placed on the latent means) is not a true power condition, as the baseline model is the misspecified test of beyond strict invariance, thus violating the assumptions of the chi-square difference test. Given, however, that full mean and covariance structure invariance is more misspecified than the baseline model, rejection rates can be interpreted as the power of the difference tests to detect that the more restricted model shows greater misspecification than the baseline. Additionally, given that, in applied settings, hypothesized models will always involve some degree of misspecification, this condition might be more representative of what applied researchers will encounter in practice. When samples were small, all statistics were found to reject at a rate around 65-70%, with DSB1 generally being the most powerful statistic. Negative values of DSB1 were very infrequent for D1-4.
When testing full mean and covariance structure invariance, however, negative values of DSB1 became much more frequent, generally occurring in at least 15% of replications, and in as many as 90%, when data were not normally distributed. This increased frequency of negative values permits interpretable differences between DSBH and the two possible interpretations of DSB1: treating negative values as rejections or treating them as retentions. Rejection rates for DSB1 more closely resembled those of DSBH and the other corrected difference tests when negative values were interpreted as model rejections. Rejection rates for D5,SB1 were also generally higher than those for the other corrected difference tests when data were not normally distributed. When negative values are excluded from the rejection rate computation, DSB1 continues to reject at a rate greater than DS0 and DSB10, suggesting that the higher power of DSB1 is not merely an artifact of uniformly interpreting negative values as rejections. For applied researchers, this set of findings means that negative values of DSB1 are indicative of an incorrect baseline model.

4.4 Satorra and Bentler's Scaled Test of Model Fit With Multiple Group Data

While our interest was in the difference tests, very little prior research exists examining the performance of the overall Satorra-Bentler scaled chi-square tests of model fit in the multiple-group context, and we included these data in our results (see Table 2). It is interesting to note that Type I error rates for the Satorra-Bentler scaled chi-square are generally inflated for all levels of invariance, with rejection rates near 10% for the tests of configural, weak, and strong invariance, and 20% for the test of strict invariance. Surprisingly, over-rejection of tests of model fit does not necessarily lead to over-rejection in difference tests.
Given the pattern of rejection rates in Table 2, it appears plausible that rejection rates of scaled difference tests are related to rejection rates of the scaled test of model fit, in that difference tests comparing models with similar rejection rates have rejection rates near 5%, while difference tests involving models with dissimilar rejection rates do not. As presented in Table 2, rejection rates for the configural, weak, and strong invariance model fits are all roughly 10% in small sample conditions with average loadings of .5, and Tables 4 and 5 demonstrate adequately performing difference tests for the weak and strong invariance comparisons. The strict invariance model fit, however, rejects near 20% when data are generated via the VM procedure, and Table 6 demonstrates a tendency of all corrected statistics to over-reject. The beyond strict invariance model fit also rejects near 20%, and Table 8 demonstrates low power of difference tests to detect misspecification, including rejection rates near 5% for DSB1. When data are generated from a contaminated normal distribution, rejection rates for scaled tests of model fit differ substantially from those for VM data. As presented in Table 2, when data in both groups are sampled from a contaminated normal distribution, T3,SB rejects at a rate less than 10%, and, as can be seen in Table 6, difference tests tend to under-reject. For CN0,0;2,10 conditions, T2,SB and T3,SB have similar rejection rates, and difference tests reject at a rate closer to 5%. Given this pattern of results, it appears plausible that the inconsistent and erratic performance of tests of model fit and difference tests when evaluating the equality of residual variances is associated with the data generation method. It would be in the best interest of psychometricians to conduct more empirical comparisons of non-normal data generation techniques in the context of SEM simulations.
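The contaminated normal mechanism referred to above can be sketched generically as a two-component scale mixture. The mixing proportion and variance inflation below are placeholders for illustration, not the conditions used in the study.

```python
import numpy as np

def contaminated_normal(n, p=0.1, scale=3.0, rng=None):
    """Scale-mixture sampler: with probability 1-p draw from N(0, 1),
    with probability p draw from N(0, scale**2). The values p=0.1 and
    scale=3.0 are illustrative placeholders, not the study's settings."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.standard_normal(n)
    x[rng.random(n) < p] *= scale  # inflate the contaminated draws
    return x

x = contaminated_normal(200_000)
# Mixture variance is (1 - p) * 1 + p * scale**2 = 1.8 here; the
# distribution is symmetric but heavy-tailed (excess kurtosis > 0),
# which is what breaks normal theory test statistics.
```

Because the contamination inflates only the tails while leaving skewness at zero, such data can look innocuous in a histogram yet still produce the severe over-rejection of TML seen in Table 2.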
4.5 Recommendations, Limitations, and Future Research

Overall, this research finds that DS0 may have a positive bias, and is sensitive to sample evenness when testing equality constraints on indicator intercepts, while DSB1 and DSB10 both perform very similarly, with each having its own strengths and weaknesses. In general, DSB10 appears to be the best performing statistic, given its power edge over DSB1. It may, however, be more appropriate to consider alternating corrected statistics when testing a psychometric instrument for measurement invariance. Specifically, DSB1 could be used to test weak invariance, while DSB10 could be used to test strong invariance. However, further research considering a greater variety of populations should be conducted to confirm that this pattern is stable across a variety of models, with varying numbers of latent variables and indicators, before such a recommendation could be made authoritatively. Given that the majority of recent applications of invariance testing (Davidov, Schmidt, & Schwartz, 2008; Kavussanu & Boardley, 2009; Nye, Roberts, Saucier, & Zhou, 2008; Rusticus, Hubley, & Zumbo, 2008; Yoo, 2002) seek to retain the strong invariance model, as no degree of invariance beyond strong is necessary to compare observed scores between groups, issues associated with tests of strict invariance and beyond are not of tremendous importance. Simultaneous invariance of residual and latent variances is used to evaluate the invariance of an instrument's reliability between groups (Vandenberg & Lance, 2000), while invariance of latent covariances and latent means is of little substantive value (Vandenberg & Lance, 2000). In order to better understand the performance of robust statistics when evaluating the substantively meaningful configural, weak, and strong invariance levels, further research should focus on evaluating power to detect misspecifications at these levels.
As previously discussed, the constraint of equal latent variances and covariances was a very small misspecification, and yet this misspecification seemingly gives rise to frequent negative values of DSB1 when evaluating full mean and covariance structure invariance. Because the test of full mean and covariance structure invariance was only examined in the context of power conditions with misspecified baselines, the consequences of this misspecification for the performance of DS0 and DSB10, relative to when baseline models are properly specified, are not known. Given that, in practice, hypothesized models will never be absolutely correct (Edwards, 2013), applied invariance testing sequences will always involve some degree of baseline misspecification. In order to make better recommendations to applied researchers, further methodological research should more thoroughly evaluate the performance of difference tests when the baseline model is misspecified to an acceptably small degree. Additionally, it may be worthwhile to evaluate the frequency of negative values of DSB1 in this context, and to develop a more nuanced interpretation of negative values. Further research is also necessary to evaluate whether the order in which equality constraints are introduced impacts the performance of scaled test statistics. While equality of residual variances is the most frequent constraint to follow the tests of loading and intercept invariance (Vandenberg & Lance, 2000), the order of constraints following intercept equality is inconsistent in the literature. Some researchers have recommended preceding the test of invariant residual variances with independent tests of equal latent covariances and latent variances (Marsh, 1994; Steenkamp & Baumgartner, 1998; Taris, Bok, & Meijir, 1998).
Others have recommended conducting tests of equivalent latent covariance matrices following the test of strong invariance, and subsequently adding the uniqueness constraints (Rock, Werts, & Flaugher, 1978; Schaie & Hertzog, 1985). Whether this ordering might be responsible for the erratic performance of the test of strict invariance has not been empirically evaluated.

References

Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18(3), 1453-1463.

Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.

Asparouhov, T., & Muthén, B. (2010). Computing the strictly positive Satorra-Bentler chi-square test in Mplus. Mplus Web Notes: No. 12. January 24, 2012.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Bentler, P. M. (2006). EQS 6 structural equations program manual. Encino, CA: Multivariate Software, Inc.

Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Bentler, P. M., & Yuan, K. H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34(2), 181-197.

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: The Guilford Press.

Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62-83.

Byrne, B. M. (1989). Multigroup comparisons and the assumption of equivalent construct validity across groups: Methodological and substantive issues. Multivariate Behavioral Research, 24(4), 503-523.

Byrne, B. M., & Campbell, T. L.
(1999). Cross-cultural comparisons and the presumption of equivalent measurement and theoretical structure: A look beneath the surface. Journal of Cross-Cultural Psychology, 30, 555-574.

Byrne, B. M., & Shavelson, R. J. (1987). Adolescent self-concept: Testing the assumption of equivalent structure across gender. American Educational Research Journal, 24, 365-385.

Chan, D. (1998). The conceptualization and analysis of change over time: An integrative approach incorporating longitudinal mean and covariance structures analysis (LMACS) and multiple indicator latent growth modeling (MLGM). Organizational Research Methods, 1, 421-483.

Chen, C., Lee, S., & Stevenson, H. W. (1995). Response style and cross-cultural comparisons of rating scales among East Asian and North American students. Psychological Science, 6(3), 170-175.

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14(3), 464-504.

Chen, F. F., & West, S. G. (2008). Measuring individualism and collectivism: The importance of considering differential components, reference groups, and measurement invariance. Journal of Research in Personality, 42(2), 259-294.

Chou, C. P., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347-357.

Chuang, J., Savalei, V., & Falk, C. F. (2015). Investigation of Type I error rates of three versions of robust chi-square difference tests. Structural Equation Modeling. Advance online publication. doi:10.1080/10705511.2014.938713

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.

Davidov, E., Schmidt, P., & Schwartz, S. H. (2008).
Bringing values back in: The adequacy of the European Social Survey to measure values in 20 countries. Public Opinion Quarterly, 72(3), 420-445.
Edwards, M. C. (2013). Purple unicorns, true models, and other things I've never seen. Measurement: Interdisciplinary Research and Perspectives, 11(3), 107-111.
Golembiewski, R. T., Billingsley, K., & Yeager, S. (1975). Measuring change and persistence in human affairs: Types of change generated by OD designs. Journal of Applied Behavioral Science, 12(2), 133-157.
Heine, S. J., Lehman, D. R., Peng, K., & Greenholtz, J. (2002). What's wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology, 82(6), 903-918.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research, 18(3-4), 117-144.
Horn, J. L., McArdle, J. J., & Mason, R. (1983). When is invariance not invariant: A practical scientist's look at the ethereal concept of factor invariance. The Southern Psychologist, 1, 179-188.
Hu, L.-T., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351-362.
Hui, C. H., & Triandis, H. C. (1986). Individualism-collectivism: A study of cross-cultural researchers. Journal of Cross-Cultural Psychology, 17(2), 225-248.
Johnson, R. A., & Wichern, D. W. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice-Hall.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2, Pt. 1), 183-202.
Kavussanu, M., & Boardley, I. D. (2009). The Prosocial and Antisocial Behavior in Sport Scale. Journal of Sport & Exercise Psychology, 31(1), 97-117.
Kline, R. B. (2010). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford Press.
Markus, H. R., & Kitayama, S. (1991).
Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224-253.
Marsh, H. W. (1993). The multidimensional structure of academic self-concept: Invariance over gender and age. American Educational Research Journal, 30, 841-860.
Marsh, H. W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted approach. Structural Equation Modeling, 1, 5-34.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525-543.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156-166.
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.
Millsap, R. E., & Everson, H. T. (1991). Confirmatory measurement model comparisons using latent means. Multivariate Behavioral Research, 26, 479-497.
Millsap, R. E., & Hartog, S. B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574-584.
Millsap, R. E., & Kwok, O. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9(1), 93-115.
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Nesselroade, J. R., & Thompson, W. W. (1995). Selection and related threats to group comparisons: An example comparing factorial structures of higher and lower ability groups of adult twins. Psychological Bulletin, 117, 271-284.
Nichols, A. L., & Maner, J. K. (2008).
The good-subject effect: Investigating participant demand characteristics. Journal of General Psychology, 135(2), 151-165.
Nye, C. D., Roberts, B. W., Saucier, G., & Zhou, X. (2008). Testing the measurement equivalence of personality adjective items across cultures. Journal of Research in Personality, 42(6), 1524-1536.
Riordan, C. R., & Vandenberg, R. J. (1994). A central question in cross-cultural research: Do employees of different cultures interpret work-related measures in an equivalent manner? Journal of Management, 20, 643-671.
Rock, D. A., Werts, C. E., & Flaugher, R. L. (1978). The use of analysis of covariance structures for comparing the psychometric properties of multiple variables across populations. Multivariate Behavioral Research, 13, 403-418.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36.
Rusticus, S. A., Hubley, A. M., & Zumbo, B. D. (2008). Measurement invariance of the Appearance Schemas Inventory-Revised and the Body Image Quality of Life Inventory across age and gender. Assessment, 15(1), 60-71.
Satorra, A. (2000). Scaled and adjusted restricted tests in multisample analysis of moment structures. In D. D. H. Heijmans, D. S. G. Pollock, & A. Satorra (Eds.), Innovations in multivariate statistical analysis: A Festschrift for Heinz Neudecker (pp. 233-247). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. ASA Proceedings of the Business and Economic Section, 52, 308-313.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis: Applications for developmental research (pp. 399-419). Thousand Oaks, CA: Sage.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test for moment structure analysis.
Psychometrika, 66, 507-514.
Satorra, A., & Bentler, P. M. (2010). Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika, 75, 243-248.
Savalei, V. (2014). Understanding robust corrections in structural equation modeling. Structural Equation Modeling, 21(1), 149-160.
Schaie, K. W., & Hertzog, C. (1985). Measurement in the psychology of adulthood and aging. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (2nd ed., pp. 61-92). New York: Van Nostrand Reinhold.
Schmitt, N., Pulakos, E. D., & Lieblein, A. (1984). Comparison of three techniques to assess group-level beta and gamma change. Applied Psychological Measurement, 8, 249-260.
Serlin, R. C. (2000). Testing for robustness in Monte Carlo studies. Psychological Methods, 5, 230-240.
Shapiro, A. (1985). Asymptotic equivalence of minimum discrepancy function estimators to GLS estimators. South African Statistical Journal, 17, 33-81.
Steenkamp, J.-B. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78-90.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253-264.
Steinmetz, H. (2013). Analyzing observed composite differences across groups: Is partial measurement invariance enough? Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 9(1), 1-12.
Taris, T. W., Bok, I. A., & Meijer, Z. Y. (1998). Assessing stability and change of psychometric properties of multi-item concepts across different situations: A general approach. Journal of Psychology, 132, 301-316.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
Van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9, 486-492.
Vandenberg, R.
J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69.
Yoo, B. (2002). Cross-group comparisons: A cautionary note. Psychology & Marketing, 19(4), 357-368.

Appendix A
Generating Data From a Contaminated Normal Distribution

## Code to get sigma
k <- 3                 # Number of factors
p <- 15                # Number of indicators
lambda <- rep(0.7, p)  # Vector of factor loadings
phi <- c(0, 0.3)       # Vector of factor correlations

L <- matrix(0, p, k)
L[1:5, 1] <- lambda[1:5]
L[6:10, 2] <- lambda[6:10]
L[11:15, 3] <- lambda[11:15]
F0 <- matrix(phi[1], k, k)
diag(F0) <- 1
tmp_mat <- L %*% F0 %*% t(L)
P <- matrix(0, p, p)
diag(P) <- 1 - diag(tmp_mat)
# Sigma
sigma <- tmp_mat + P

## Function to generate from contaminated normal
## n     - sample size
## p     - proportion contamination
## k     - scaling factor for contaminated sigma
## sigma - sigma of non-contaminated
## mu    - means of latent distrib (usually just 0)
## Will automatically scale mixture so that all factors
## have unit variance.

## Other inputs for rcontnorm function
p <- 0.2
k <- 10
mu <- rep(0, 15)
n <- 10000
nreps <- 1000  # Number of datasets to be generated

library(MASS)
rcontnorm <- function(n, p, k, sigma, mu) {
  p1 <- 1 - p
  p2 <- p
  nf <- length(mu)
  R <- diag(nf)
  D12inv <- diag(nf)
  varf <- as.matrix(p1 * sigma + p2 * k * sigma)
  diag(D12inv) <- 1 / sqrt(diag(varf))
  sigrescaled <- D12inv %*% sigma %*% D12inv
  n1 <- round(n * p1)  # rounded so the subsample sizes are integers
  n2 <- n - n1
  X1 <- mvrnorm(n1, mu, sigrescaled)
  X2 <- mvrnorm(n2, mu, k * sigrescaled)
  X <- rbind(X1, X2)
  return(X)
}

Appendix B
Sample Runs of D, DS0, DSB1, and DSB10

library(lavaan)

### initial model runs
ana.model <- "
  # single group configural model
  f1 =~ y1 + y2 + y3 + y4
  f2 =~ y5 + y6 + y7 + y8
"

### modified models used in the test of strong invariance
ana.model1 <- "
  f1 =~ c(a1,a1)*y1 + c(a2,a2)*y2 + c(a3,a3)*y3 + c(a4,a4)*y4  # equality constraints added manually
  f2 =~ c(a5,a5)*y5 + c(a6,a6)*y6 + c(a7,a7)*y7 + c(a8,a8)*y8
  y1 ~ c(b1, b12)*1  # manually freeing intercepts
  y2 ~ c(b2, b22)*1
  y3 ~ c(b3, b32)*1
  y4 ~ c(b4, b42)*1
  y5 ~ c(b5, b52)*1
  y6 ~ c(b6, b62)*1
  y7 ~ c(b7, b72)*1
  y8 ~ c(b8, b82)*1
  f1 ~ c(0,0)*1  # manually fixing latent means to 0
  f2 ~ c(0,0)*1
"

ana.model2 <- "
  f1 =~ c(a1,a1)*y1 + c(a2,a2)*y2 + c(a3,a3)*y3 + c(a4,a4)*y4  # equality constraints added manually
  f2 =~ c(a5,a5)*y5 + c(a6,a6)*y6 + c(a7,a7)*y7 + c(a8,a8)*y8
  y1 ~ c(b1, b12)*1  # manually freeing intercepts
  y2 ~ c(b2, b22)*1
  y3 ~ c(b3, b32)*1
  y4 ~ c(b4, b42)*1
  y5 ~ c(b5, b52)*1
  y6 ~ c(b6, b62)*1
  y7 ~ c(b7, b72)*1
  y8 ~ c(b8, b82)*1
  f1 ~ c(0,0)*1  # manually fixing latent means to 0
  f2 ~ c(0,0)*1
  b22 == b2 + a2*(b12 - b1)  # nonlinear constraints
  b32 == b3 + a3*(b12 - b1)
  b42 == b4 + a4*(b12 - b1)
  b62 == b6 + a6*(b52 - b5)
  b72 == b7 + a7*(b52 - b5)
  b82 == b8 + a8*(b52 - b5)
"

# grouping variable
group <- "group"

# model runs
run0 <- cfa(ana.model, data=data, group=group, estimator="mlm", mimic="EQS")  # configural
run1 <- cfa(ana.model, data=data, group=group, estimator="mlm", mimic="EQS",
            group.equal=c("loadings"))  # weak (loadings constrained)
run2<-cfa(ana.model, data=data, group=group, estimator="mlm",
          mimic="EQS", group.equal=c("loadings","intercepts"))  # strong (intercepts constrained)
run3 <- cfa(ana.model, data=data, group=group, estimator="mlm", mimic="EQS",
            group.equal=c("loadings","intercepts","residuals"))  # strict (residuals constrained)
run4 <- cfa(ana.model, data=data, group=group, estimator="mlm", mimic="EQS",
            group.equal=c("loadings","intercepts","residuals",
                          "lv.variances","lv.covariances"))  # beyond strict (phi matrix constrained)
run5 <- cfa(ana.model, data=data, group=group, estimator="mlm", mimic="EQS",
            group.equal=c("loadings","intercepts","residuals",
                          "lv.variances","lv.covariances","means"))  # full mean and covariance structure (latent means constrained)

# modified model runs for strong difference tests
run1X <- cfa(ana.model1, data=data, group=group, estimator="mlm", mimic="EQS")
run2X <- cfa(ana.model2, data=data, group=group, estimator="mlm", mimic="EQS")

### difference tests
## D
inspect(run1, "fit.measures")[c(2)] - inspect(run0, "fit.measures")[c(2)]  # weak
inspect(run2, "fit.measures")[c(2)] - inspect(run1, "fit.measures")[c(2)]  # strong
inspect(run3, "fit.measures")[c(2)] - inspect(run2, "fit.measures")[c(2)]  # strict
inspect(run4, "fit.measures")[c(2)] - inspect(run3, "fit.measures")[c(2)]  # beyond
inspect(run5, "fit.measures")[c(2)] - inspect(run4, "fit.measures")[c(2)]  # full

## DS0
# for lavaan 0.5-17 and below
lavTestLRT(run0, run1, run2, run3, run4, run5, SB.classic=F)  # the run1/run2 difference test is incorrect
# for lavaan 0.5-18 and above
lavTestLRT(run0, run1, run2, run3, run4, run5, method="satorra.2000")  # the run1/run2 difference test is incorrect
# to test strong invariance
lavTestLRT(run1X, run2X, method="satorra.2000")

## DSB1
lavTestLRT(run0, run1, run2, run3, run4, run5)

## DSB10
# available in lavaan 0.5-18 and above
lavTestLRT(run0, run1, run2, run3, run4, run5, method="satorra.bentler.2010")  # the run1/run2 difference test is incorrect
# to test strong invariance
lavTestLRT(run1X, run2X, method="satorra.bentler.2010")

Appendix C
Tables

Table 1
Parameter Values Used in Data Generation

Factor Loadings (invariant across groups; all cross-loadings are zero)
  λ = .5: loadings (0.4, 0.45, 0.55, 0.6) for indicators 1-4 on Factor 1 and indicators 5-8 on Factor 2
  λ = .7: loadings (0.6, 0.65, 0.75, 0.8) for indicators 1-4 on Factor 1 and indicators 5-8 on Factor 2

Factor Covariance Matrices (factorial non-invariance)
  Condition A: Φ1 = [1 .3; .3 1];  Φ2A = [1 0; 0 1]
  Condition B: Φ1 = [1 .3; .3 1];  Φ2B = [1 .3; .3 1.5]

Residual Variances (invariant across groups)
  λ = .5: (0.84, 0.7975, 0.6975, 0.64, 0.84, 0.7975, 0.6975, 0.64)
  λ = .7: (0.64, 0.5775, 0.4375, 0.36, 0.64, 0.5775, 0.4375, 0.36)

Intercepts (invariant across groups)
  τ1 = τ2 = (3, 3.5, 4, 4.5, 3, 3.5, 4, 4.5)

Latent Means
  κ1 = (0, 0);  κ2 = (0.2, 0.5)

Sample Size Ratio
  Equal (1:1);  Unequal (5:6)

Group Sizes
  Small (NTotal = 220);  Medium (NTotal = 440);  Large (NTotal = 1760)

Distributions (Group 1 / Group 2)
  Normal / Normal
  VM2,7 / VM2,7
  VM2,15 / VM2,15
  Normal / VM2,7
  VM2,7 / VM2,15
  CN2,10 / CN2,10
  Normal / CN2,10

Note. Subscripts for VM distributions indicate univariate skew and average kurtosis, respectively. When average kurtosis is 7, specified kurtoses for the 8 variables were 5.95, 6.25, 6.55, 6.85, 7.15, 7.45, 7.75, and 8.05. When average kurtosis is 15, specified univariate kurtoses were 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, and 18.5.
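The residual variances in Table 1 follow directly from the loadings: each indicator's population variance is fixed at 1, so each residual variance is 1 − λj². A minimal R sketch (not part of the original simulation code) verifying the λ = .5 and λ = .7 entries:

```r
# Residual variances implied by unit indicator variances: theta_j = 1 - lambda_j^2
lam5 <- c(0.4, 0.45, 0.55, 0.6)  # loadings in the lambda = .5 condition
lam7 <- c(0.6, 0.65, 0.75, 0.8)  # loadings in the lambda = .7 condition

theta5 <- 1 - lam5^2  # 0.84, 0.7975, 0.6975, 0.64 (matches Table 1)
theta7 <- 1 - lam7^2  # 0.64, 0.5775, 0.4375, 0.36 (matches Table 1)
```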
Table 2
Rejection Rates for the Overall Tests of Model Fit in Selected Conditions with Even Sample Sizes When Data Are Not Normally Distributed and When Φ2 = Φ2B

                  λ = .5                     λ = .7
             N=110       N=880         N=110       N=880
             ML    SB    ML    SB      ML    SB    ML    SB
VM
  Configural 18.0  11.3  21.3  5.4     37.1  12.5  53.1  7.5
  Weak       28.3  10.0  35.3  6.1     63.2  12.5  81.5  7.1
  Strong     28.7  11.6  35.5  6.2     62.9  13.4  79.1  7.4
  Strict     87.2  21.9  94.4  13.7    93.4  22.4  98.3  10.5
  Beyond     91.4  21.7  99.0  25.3    96.9  22.6  99.6  28.1
  Full       95.9  33.8  100   80.6    98.9  36.3  100   89.1
CN
  Configural 98.8  8.8   99.5  6.1     99.9  11.8  99.5  5.1
  Weak       99.6  9.6   99.7  5.6     100   11.8  99.8  5.5
  Strong     99.5  11.4  99.7  5.8     100   12.9  99.5  6.2
  Strict     99.7  7.7   99.9  4.0     100   8.8   99.9  4.3
  Beyond     99.9  7.3   100   11.2    100   10.4  100   18.8
  Full       99.9  15.8  100   73.9    100   20.3  100   93.4

Note. Bold values indicate Type I error rates falling outside of the acceptable 3.75%-6.25% range (the tests of Beyond and Full invariance are not Type I error rates but power). Ns are the sample size in each group. VM indicates VM2,15 conditions. CN indicates CN2,10 conditions.

Table 3
Type I Error Rates for Difference Tests When Data Are Normally Distributed, Averaged Across Loading, FCOV, and Sample Evenness Conditions

              D     DS0   DSB1A DSB1B DSB10 DSBH
Small
  Weak        6.03  7.31  6.77  6.79  6.82  6.76
  Strong      5.00  4.17  8.19  8.19  5.39  8.20
  Strict      5.46  7.23  6.83  6.83  6.72  6.83
Med
  Weak        5.46  6.11  6.00  6.00  5.90  6.00
  Strong      5.19  4.41  6.95  6.95  5.51  6.95
  Strict      5.43  6.31  6.09  6.09  6.18  6.09
Large
  Weak        4.70  4.83  4.80  4.80  4.81  4.80
  Strong      4.94  4.18  5.41  5.41  5.08  5.41
  Strict      4.93  5.21  5.19  5.19  5.14  5.19

Note. Bold values indicate rejection rates falling outside of the acceptable 3.75%-6.25% range.
Table 4
Type I Error Rates for the Weak Invariance Difference Test (the Test of Equality of Factor Loadings, df = 6) When Φ2 = Φ2B and Sample Sizes Are Even Across Groups

                   λ = .5                      λ = .7
              D     DS0   DSB1  DSB10    D     DS0   DSB1  DSB10
N=110
  Normal      7.1   8.6   8.2   8.6      4.2   5.0   4.7   4.9
  VM2,7       28.7  15.5  6.0   10.4     46.2  13.1  5.5   9.8
  VM2,15      31.1  15.0  5.4   10.3     65.6  16.5  5.7   11.1
  VM0,0;2,7   16.3  12.2  8.4   10.6     24.1  9.9   4.9   6.8
  VM2,7;2,15  26.7  14.3  5.8   10.4     55.6  13.6  4.3   8.3
  CN2,10      67.5  14.9  6.5   10.9     61.0  11.4  4.2   8.4
  CN0,0;2,10  34.0  15.1  9.6   13.4     31.0  10.4  5.2   8.1
N=220
  Normal      6.2   6.6   6.4   6.1      5.8   6.5   6.5   6.4
  VM2,7       24.9  8.4   4.4   6.0      49.0  8.4   4.5   6.2
  VM2,15      32.2  11.0  4.4   8.9      68.3  11.6  3.6   6.9
  VM0,0;2,7   14.3  7.9   5.4   6.6      27.6  8.1   5.5   7.1
  VM2,7;2,15  27.5  8.3   3.6   6.7      59.3  12.0  5.3   7.9
  CN2,10      60.6  9.4   5.0   7.5      59.3  9.1   5.6   7.6
  CN0,0;2,10  28.9  7.8   6.0   7.6      27.4  6.7   4.7   6.1
N=880
  Normal      5.3   5.5   5.4   5.5      5.6   6.2   6.2   6.2
  VM2,7       23.5  5.5   4.8   5.0      49.3  4.7   3.8   4.2
  VM2,15      37.0  6.6   4.2   6.0      73.8  6.2   3.5   4.7
  VM0,0;2,7   15.1  6.2   5.3   5.9      24.9  3.5   3.2   3.7
  VM2,7;2,15  30.0  4.8   3.0   4.7      66.4  6.0   4.2   5.6
  CN2,10      60.2  6.4   5.1   5.9      56.0  7.4   6.6   7.2
  CN0,0;2,10  33.2  6.1   5.3   5.8      29.4  5.8   5.6   6.2

Note. Bold values fall outside of the acceptable 3.75%-6.25% range. Ns are the sample size in each group.
Table 5
Type I Error Rates for the Strong Invariance Difference Test (the Test of Equality of Indicator Intercepts, df = 6) When Φ2 = Φ2B and Sample Sizes Are Even Across Groups

                   λ = .5                      λ = .7
              D     DS0   DSB1  DSB10    D     DS0   DSB1  DSB10
N=110
  Normal      5.1   5.6   7.9   5.7      4.9   5.2   7.9   5.3
  VM2,7       5.5   5.2   8.4   6.2      6.5   6.0   10.4  7.0
  VM2,15      6.7   5.4   8.0   6.3      5.8   5.4   10.0  5.7
  VM0,0;2,7   4.5   4.7   7.5   5.4      1.8   4.4   8.6   5.4
  VM2,7;2,15  5.7   5.0   8.4   5.5      7.1   5.8   10.0  6.4
  CN2,10      9.4   5.3   14.4  8.4      5.7   4.8   12.6  5.0
  CN0,0;2,10  7.6   5.6   11.0  6.7      5.6   5.0   11.1  5.1
N=220
  Normal      4.6   4.9   6.0   5.0      5.4   5.8   7.3   5.9
  VM2,7       5.5   5.1   6.5   5.5      4.3   4.2   6.8   4.6
  VM2,15      3.9   3.6   5.1   3.8      8.1   6.0   8.3   6.6
  VM0,0;2,7   2.7   3.6   5.9   4.4      2.2   4.7   6.1   4.9
  VM2,7;2,15  6.0   5.1   6.4   5.8      7.2   5.4   8.0   6.3
  CN2,10      8.0   4.7   8.2   5.2      7.3   5.3   9.9   6.0
  CN0,0;2,10  6.0   4.8   7.8   5.0      5.2   4.5   7.6   5.0
N=880
  Normal      5.4   5.4   5.8   5.5      6.0   6.2   6.5   6.2
  VM2,7       5.2   4.7   5.2   5.0      4.8   4.3   4.6   4.4
  VM2,15      7.4   5.8   6.3   6.2      6.3   4.6   4.5   4.8
  VM0,0;2,7   4.1   6.1   6.4   6.1      2.7   4.6   4.9   4.7
  VM2,7;2,15  6.0   5.1   5.4   5.3      7.7   5.5   5.8   5.6
  CN2,10      7.9   5.3   6.2   5.6      7.4   5.3   6.2   5.3
  CN0,0;2,10  8.0   6.2   7.0   6.0      4.7   4.3   4.5   4.0

Note. Bold values fall outside of the acceptable 3.75%-6.25% range. Ns are the sample size in each group.
Table 6
Type I Error Rates for the Strict Invariance Difference Test (the Test of Equality of Residual Variances, df = 8) When Φ2 = Φ2B and Sample Sizes Are Even Across Groups

                   λ = .5                      λ = .7
              D     DS0   DSB1  DSB10    D     DS0   DSB1  DSB10
N=110
  Normal      5.4   6.7   6.6   6.4      5.4   6.9   6.0   6.1
  VM2,7       84.5  8.6   7.9   7.3      83.8  11.0  7.8   8.4
  VM2,15      95.3  10.0  8.8   7.8      94.9  12.9  7.8   7.3
  VM0,0;2,7   57.7  14.1  13.4  13.5     61.5  15.2  14.6  13.7
  VM2,7;2,15  92.7  12.8  12.1  10.8     91.1  12.7  8.4   8.9
  CN2,10      68.8  4.3   1.6   2.9      67.6  4.0   1.4   2.9
  CN0,0;2,10  41.1  6.9   8.5   5.9      41.2  6.5   9.6   6.1
N=220
  Normal      6.3   7.4   7.3   7.4      4.9   6.0   5.7   5.6
  VM2,7       84.7  6.1   5.6   5.6      85.2  7.9   6.6   6.3
  VM2,15      95.8  7.6   6.8   6.4      95.6  7.7   5.8   4.9
  VM0,0;2,7   62.0  8.4   8.0   7.9      62.3  8.9   8.5   8.6
  VM2,7;2,15  94.6  7.9   7.2   7.2      93.4  7.7   6.5   6.8
  CN2,10      67.4  2.0   1.6   1.7      68.0  2.3   1.6   1.9
  CN0,0;2,10  38.6  4.3   5.7   3.8      41.0  2.5   4.4   2.5
N=880
  Normal      6.5   6.8   6.6   6.6      4.6   5.1   5.1   5.1
  VM2,7       88.3  6.1   5.8   5.9      87.6  5.3   4.6   4.8
  VM2,15      98.3  5.6   5.2   5.3      97.7  5.5   4.6   4.6
  VM0,0;2,7   65.7  6.3   6.2   6.3      63.1  5.7   5.9   5.7
  VM2,7;2,15  96.0  5.0   4.7   4.8      94.1  6.6   6.2   6.2
  CN2,10      67.4  2.2   1.5   1.9      65.2  1.6   1.5   1.5
  CN0,0;2,10  41.1  2.8   3.0   2.6      39.3  2.0   2.0   2.0

Note. Bold values fall outside of the acceptable 3.75%-6.25% range. Ns are the sample size in each group.

Table 7
Power of Difference Tests When Data Are Normally Distributed, Averaged Across Loading, FCOV, and Sample Evenness Conditions

                       D     DS0   DSB1A DSB1B DSB10 DSBH
Small
  Beyond (Φ2 = Φ2A)    26.1  28.6  28.5  28.5  28.2  28.5
  Beyond (Φ2 = Φ2B)    25.3  28.0  29.3  29.3  27.5  29.3
  Full                 77.7  78.3  83.0  83.2  78.7  83.2
Med
  Beyond (Φ2 = Φ2A)    46.5  47.5  47.4  47.4  47.1  47.4
  Beyond (Φ2 = Φ2B)    46.7  48.4  50.2  50.2  48.3  50.2
  Full                 96.9  97.0  97.6  97.6  97.1  97.6
Large
  Beyond (Φ2 = Φ2A)    95.0  94.9  95.0  95.0  94.9  95.0
  Beyond (Φ2 = Φ2B)    95.9  96.2  96.3  96.3  96.2  96.3
  Full                 100   100   100   100   100   100

Note. "Beyond" refers to the test of beyond strict invariance (df = 3), which places between-group equality constraints on latent variances and covariances.
Φ2 = Φ2A refers to power conditions where groups differ on the latent covariance in the population. Φ2 = Φ2B refers to power conditions where groups differ on the latent variance of the second factor in the population. "Full" refers to the test of full mean and covariance structure invariance, which places between-group equality constraints on latent means.

Table 8
Power for the Beyond Strict Invariance Difference Test When Φ2 = Φ2B and Sample Sizes Are Even Across Groups

                   λ = .5                      λ = .7
              D     DS0   DSB1  DSB10    D     DS0   DSB1  DSB10
N=110
  Normal      19.1  22.3  20.7  20.8     31.7  35.6  33.7  34.1
  VM2,7       47.9  26.5  8.3   19.1     54.2  24.9  11.6  19.5
  VM2,15      58.7  34.4  4.8   20.7     64.0  26.8  5.4   16.9
  VM0,0;2,7   33.7  21.9  7.4   15.0     44.9  21.3  9.4   16.0
  VM2,7;2,15  54.3  30.3  4.2   18.6     56.6  22.4  2.6   11.2
  CN2,10      56.2  14.2  6.1   9.0      55.4  10.8  6.1   9.1
  CN0,0;2,10  35.8  9.4   4.5   5.6      47.7  13.4  5.7   10.8
N=220
  Normal      35.2  36.7  36.4  36.5     61.1  62.4  61.3  61.6
  VM2,7       53.0  29.4  11.7  22.3     69.4  33.6  19.7  29.1
  VM2,15      60.8  29.6  6.9   19.0     71.1  28.3  11.9  22.5
  VM0,0;2,7   46.8  28.1  10.4  21.7     60.8  35.3  15.9  30.7
  VM2,7;2,15  56.5  29.9  4.9   19.3     64.4  23.4  6.6   16.5
  CN2,10      61.3  15.1  9.4   11.6     71.5  19.9  16.1  17.7
  CN0,0;2,10  49.2  15.0  6.4   13.0     63.8  22.5  7.5   19.5
N=880
  Normal      93.6  94.0  93.6  93.7     99.9  99.9  99.9  99.9
  VM2,7       86.5  61.7  54.0  58.6     95.3  75.2  73.4  74.6
  VM2,15      84.3  50.2  34.7  45.2     91.2  59.8  51.5  55.4
  VM0,0;2,7   85.2  71.5  60.8  69.7     96.9  89.6  81.4  88.5
  VM2,7;2,15  84.4  53.0  36.7  48.0     93.6  64.3  50.5  62.1
  CN2,10      89.1  41.2  37.7  39.0     98.3  71.6  70.7  70.5
  CN0,0;2,10  89.9  58.7  40.4  57.4     99.0  89.1  78.6  88.6

Note. Ns are the sample size in each group.
Table 9
Power for the Full Mean and Covariance Structure Invariance Difference Test When Φ2 = Φ2B and Sample Sizes Are Even Across Groups

                   λ = .5                      λ = .7
              D     DS0   DSB1  DSB10    D     DS0   DSB1  DSB10
N=110
  Normal      66.4  66.5  72.8  67.8     80.2  80.5  84.6  81.0
  VM2,7       67.5  66.5  79.6  68.3     80.5  80.1  90.6  81.3
  VM2,15      73.1  69.6  84.9  73.7     81.7  80.7  92.9  82.3
  VM0,0;2,7   68.2  67.9  70.3  68.9     80.4  80.2  80.9  81.1
  VM2,7;2,15  68.8  66.7  82.9  69.8     77.2  76.5  89.7  78.2
  CN2,10      69.8  63.4  87.4  71.3     79.5  78.1  90.9  80.5
  CN0,0;2,10  69.5  67.0  82.4  71.2     81.4  80.6  90.0  82.0
N=220
  Normal      93.8  94.0  95.2  94.0     97.9  97.9  98.6  98.1
  VM2,7       95.7  95.0  96.8  95.8     97.9  97.8  98.9  97.9
  VM2,15      95.2  94.8  97.4  95.3     98.7  98.4  99.5  98.7
  VM0,0;2,7   94.7  94.5  94.7  95.0     98.7  98.6  98.1  98.8
  VM2,7;2,15  93.0  92.6  96.0  93.2     98.1  97.9  99.3  98.1
  CN2,10      93.5  91.8  97.6  93.4     98.3  98.2  99.3  98.3
  CN0,0;2,10  95.6  94.9  98.0  95.8     98.4  98.5  99.3  98.4

Note. Ns are the sample size in each group. When N=880, power to detect the misspecification was 100% for all statistics in all conditions. Negative values of DSB1 are treated as model rejections.

Table 10
Rejection Rates for D5,SB1 When Interpreting Negative Chi-Squares as Model Retentions Versus Model Rejections When λ = .5, Φ2 = Φ2B, and Sample Size Is Even Between Groups

              DSB1A  DSB1B  DSBH
N=110
  Normal      72.7   72.8   72.8
  VM2,7       73.5   79.6   79.5
  VM2,15      65.6   84.9   84.6
  VM0,0;2,7   69.7   70.3   70.3
  VM2,7;2,15  67.5   82.9   82.9
  CN2,10      45.0   87.4   86.6
  CN0,0;2,10  65.6   82.4   82.2
N=220
  Normal      95.2   95.2   95.2
  VM2,7       93.8   96.8   96.8
  VM2,15      84.6   97.4   97.4
  VM0,0;2,7   94.6   94.7   94.7
  VM2,7;2,15  84.9   96.0   96.0
  CN2,10      63.6   97.6   97.6
  CN0,0;2,10  90.7   98.0   98.0
N=880
  Normal      100.0  100.0  100.0
  VM2,7       99.9   100.0  100.0
  VM2,15      91.0   100.0  100.0
  VM0,0;2,7   100.0  100.0  100.0
  VM2,7;2,15  96.9   100.0  100.0
  CN2,10      79.0   100.0  100.0
  CN0,0;2,10  99.6   100.0  100.0

Note. DSB1A is the rejection rate of DSB1 when negative chi-squares are interpreted as model retentions, and DSB1B is the rejection rate of DSB1 when negative chi-squares are interpreted as model rejections.
Ns are the sample size in each group.

Table 11
Rejection Rates for Tests of Model Fit for VM0,0 Conditions When Sample Sizes Are Even

                 N=110         N=220         N=880
Model            ML     SB     ML     SB     ML     SB
La05, Φ2 = Φ2A
  1              0.073  0.092  0.047  0.052  0.042  0.046
  2              0.076  0.093  0.053  0.066  0.042  0.046
  3              0.077  0.106  0.055  0.064  0.047  0.048
  4              0.079  0.118  0.058  0.076  0.044  0.050
  5              0.109  0.142  0.119  0.137  0.316  0.317
  6              0.293  0.365  0.546  0.587  1.000  1.000
La05, Φ2 = Φ2B
  1              0.062  0.078  0.067  0.075  0.050  0.053
  2              0.061  0.086  0.071  0.084  0.047  0.050
  3              0.055  0.089  0.070  0.083  0.048  0.048
  4              0.060  0.091  0.080  0.094  0.049  0.052
  5              0.077  0.133  0.130  0.149  0.378  0.384
  6              0.239  0.313  0.471  0.523  1.000  1.000
La07, Φ2 = Φ2A
  1              0.074  0.099  0.051  0.059  0.046  0.045
  2              0.071  0.095  0.051  0.061  0.044  0.049
  3              0.078  0.093  0.051  0.063  0.048  0.049
  4              0.081  0.110  0.057  0.067  0.049  0.051
  5              0.143  0.190  0.166  0.198  0.700  0.701
  6              0.407  0.516  0.772  0.817  1.000  1.000
La07, Φ2 = Φ2B
  1              0.074  0.095  0.047  0.053  0.051  0.054
  2              0.074  0.087  0.057  0.065  0.050  0.052
  3              0.067  0.094  0.055  0.065  0.041  0.041
  4              0.076  0.098  0.055  0.066  0.043  0.044
  5              0.116  0.158  0.178  0.204  0.712  0.715
  6              0.313  0.437  0.683  0.736  1.000  1.000

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 12
Rejection Rates for Tests of Model Fit for VM0,0 Conditions When Sample Sizes Are Uneven

                 N1=100,N2=120  N1=200,N2=240  N1=800,N2=960
Model            ML     SB      ML     SB      ML     SB
La05, Φ2 = Φ2A
  1              0.058  0.074   0.065  0.073   0.051  0.053
  2              0.059  0.072   0.061  0.066   0.044  0.042
  3              0.053  0.085   0.059  0.069   0.040  0.043
  4              0.059  0.077   0.063  0.071   0.042  0.044
  5              0.093  0.114   0.115  0.147   0.332  0.343
  6              0.263  0.360   0.550  0.603   1.000  1.000
La05, Φ2 = Φ2B
  1              0.073  0.082   0.046  0.054   0.051  0.049
  2              0.061  0.087   0.044  0.055   0.047  0.048
  3              0.062  0.091   0.044  0.058   0.043  0.047
  4              0.067  0.091   0.050  0.067   0.040  0.041
  5              0.087  0.129   0.116  0.138   0.354  0.366
  6              0.231  0.316   0.454  0.509   0.996  0.997
La07, Φ2 = Φ2A
  1              0.074  0.090   0.058  0.070   0.055  0.054
  2              0.085  0.107   0.057  0.068   0.052  0.057
  3              0.087  0.107   0.048  0.065   0.061  0.064
  4              0.078  0.106   0.056  0.062   0.053  0.059
  5              0.136  0.173   0.155  0.181   0.700  0.709
  6              0.393  0.494   0.761  0.808   1.000  1.000
La07, Φ2 = Φ2B
  1              0.068  0.087   0.051  0.057   0.045  0.046
  2              0.074  0.091   0.053  0.061   0.042  0.043
  3              0.076  0.097   0.055  0.062   0.043  0.044
  4              0.078  0.104   0.058  0.073   0.035  0.037
  5              0.117  0.178   0.178  0.211   0.691  0.702
  6              0.336  0.448   0.688  0.737   1.000  1.000

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 13
Rejection Rates for Tests of Model Fit for VM2,7 Conditions When Sample Sizes Are Even

                 N=110         N=220         N=880
Model            ML     SB     ML     SB     ML     SB
La05, Φ2 = Φ2A
  1              0.141  0.091  0.147  0.070  0.146  0.061
  2              0.234  0.104  0.220  0.072  0.211  0.066
  3              0.218  0.112  0.208  0.077  0.209  0.065
  4              0.675  0.146  0.719  0.135  0.710  0.106
  5              0.781  0.178  0.798  0.160  0.920  0.251
  6              0.890  0.329  0.962  0.414  1.000  0.983
La05, Φ2 = Φ2B
  1              0.191  0.093  0.213  0.087  0.201  0.048
  2              0.283  0.102  0.277  0.073  0.293  0.056
  3              0.262  0.112  0.266  0.075  0.273  0.053
  4              0.707  0.175  0.727  0.114  0.772  0.107
  5              0.801  0.179  0.819  0.154  0.926  0.263
  6              0.884  0.311  0.946  0.364  1.000  0.927
La07, Φ2 = Φ2A
  1              0.292  0.108  0.281  0.078  0.296  0.066
  2              0.463  0.101  0.487  0.070  0.508  0.053
  3              0.459  0.121  0.466  0.077  0.495  0.058
  4              0.794  0.180  0.829  0.128  0.861  0.094
  5              0.875  0.202  0.920  0.184  0.997  0.392
  6              0.949  0.407  0.992  0.556  1.000  1.000
La07, Φ2 = Φ2B
  1              0.376  0.100  0.394  0.074  0.396  0.061
  2              0.562  0.105  0.576  0.075  0.588  0.054
  3              0.534  0.114  0.553  0.088  0.548  0.062
  4              0.845  0.178  0.864  0.125  0.888  0.096
  5              0.909  0.203  0.928  0.180  0.987  0.382
  6              0.959  0.357  0.991  0.462  1.000  0.977

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 14
Rejection Rates for Tests of Model Fit for VM2,7 Conditions When Sample Sizes Are Uneven

                 N1=100,N2=120  N1=200,N2=240  N1=800,N2=960
Model            ML     SB      ML     SB      ML     SB
La05, Φ2 = Φ2A
  1              0.154  0.105   0.120  0.051   0.138  0.055
  2              0.222  0.111   0.208  0.061   0.219  0.054
  3              0.221  0.118   0.206  0.066   0.212  0.062
  4              0.696  0.175   0.685  0.126   0.730  0.088
  5              0.791  0.201   0.786  0.163   0.918  0.213
  6              0.886  0.373   0.958  0.408   1.000  0.976
La05, Φ2 = Φ2B
  1              0.205  0.098   0.187  0.065   0.196  0.057
  2              0.306  0.094   0.276  0.060   0.262  0.055
  3              0.279  0.110   0.272  0.073   0.264  0.055
  4              0.706  0.184   0.746  0.124   0.765  0.083
  5              0.788  0.218   0.850  0.167   0.923  0.259
  6              0.869  0.338   0.942  0.388   1.000  0.922
La07, Φ2 = Φ2A
  1              0.286  0.106   0.282  0.072   0.280  0.041
  2              0.472  0.117   0.472  0.068   0.484  0.053
  3              0.468  0.131   0.460  0.082   0.482  0.067
  4              0.814  0.181   0.820  0.140   0.854  0.103
  5              0.881  0.192   0.912  0.191   0.992  0.390
  6              0.963  0.438   0.995  0.528   1.000  1.000
La07, Φ2 = Φ2B
  1              0.375  0.119   0.383  0.082   0.405  0.059
  2              0.568  0.114   0.576  0.101   0.617  0.056
  3              0.539  0.129   0.541  0.102   0.584  0.067
  4              0.847  0.161   0.873  0.129   0.907  0.086
  5              0.893  0.203   0.938  0.176   0.991  0.422
  6              0.943  0.356   0.992  0.464   1.000  0.981

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 15
Rejection Rates for Tests of Model Fit for VM2,15 Conditions When Sample Sizes Are Even

                 N=110         N=220         N=880
Model            ML     SB     ML     SB     ML     SB
La05, Φ2 = Φ2A
  1              0.126  0.105  0.142  0.076  0.149  0.057
  2              0.214  0.107  0.239  0.068  0.266  0.062
  3              0.216  0.108  0.216  0.066  0.261  0.064
  4              0.839  0.227  0.899  0.160  0.925  0.129
  5              0.896  0.235  0.949  0.181  0.985  0.234
  6              0.951  0.367  0.992  0.387  1.000  0.897
La05, Φ2 = Φ2B
  1              0.180  0.113  0.171  0.072  0.213  0.054
  2              0.283  0.100  0.279  0.069  0.353  0.061
  3              0.287  0.116  0.273  0.070  0.355  0.062
  4              0.872  0.219  0.896  0.159  0.944  0.137
  5              0.914  0.217  0.947  0.178  0.990  0.253
  6              0.959  0.338  0.982  0.356  1.000  0.806
La07, Φ2 = Φ2A
  1              0.332  0.132  0.338  0.077  0.405  0.061
  2              0.589  0.137  0.631  0.103  0.733  0.082
  3              0.598  0.168  0.626  0.115  0.728  0.078
  4              0.948  0.259  0.960  0.175  0.976  0.124
  5              0.976  0.256  0.986  0.214  0.999  0.293
  6              0.993  0.435  1.000  0.487  1.000  0.971
La07, Φ2 = Φ2B
  1              0.371  0.125  0.459  0.092  0.531  0.075
  2              0.632  0.125  0.735  0.082  0.815  0.071
  3              0.629  0.134  0.710  0.094  0.791  0.074
  4              0.934  0.224  0.976  0.154  0.983  0.105
  5              0.969  0.226  0.988  0.197  0.996  0.281
  6              0.989  0.363  1.000  0.381  1.000  0.891

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 16
Rejection Rates for Tests of Model Fit for VM2,15 Conditions When Sample Sizes Are Uneven

                 N1=100,N2=120  N1=200,N2=240  N1=800,N2=960
Model            ML     SB      ML     SB      ML     SB
La05, Φ2 = Φ2A
  1              0.144  0.098   0.165  0.079   0.166  0.064
  2              0.226  0.110   0.249  0.084   0.279  0.063
  3              0.224  0.108   0.247  0.091   0.275  0.069
  4              0.862  0.225   0.916  0.178   0.941  0.125
  5              0.918  0.218   0.948  0.186   0.989  0.223
  6              0.954  0.337   0.990  0.380   1.000  0.883
La05, Φ2 = Φ2B
  1              0.174  0.118   0.182  0.072   0.203  0.059
  2              0.277  0.115   0.312  0.070   0.343  0.051
  3              0.278  0.118   0.305  0.082   0.332  0.062
  4              0.858  0.207   0.906  0.171   0.945  0.132
  5              0.905  0.204   0.949  0.196   0.987  0.234
  6              0.942  0.316   0.985  0.358   1.000  0.802
La07, Φ2 = Φ2A
  1              0.332  0.138   0.362  0.100   0.434  0.067
  2              0.626  0.178   0.662  0.095   0.740  0.081
  3              0.612  0.185   0.633  0.093   0.726  0.080
  4              0.949  0.259   0.963  0.186   0.975  0.132
  5              0.968  0.255   0.982  0.209   0.999  0.308
  6              0.988  0.447   1.000  0.465   1.000  0.969
La07, Φ2 = Φ2B
  1              0.436  0.144   0.471  0.091   0.564  0.073
  2              0.670  0.139   0.729  0.100   0.825  0.089
  3              0.663  0.146   0.717  0.106   0.796  0.085
  4              0.958  0.232   0.967  0.162   0.984  0.130
  5              0.974  0.242   0.987  0.204   0.999  0.313
  6              0.993  0.367   1.000  0.394   1.000  0.870

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 17
Rejection Rates for Tests of Model Fit for VM0,0;2,7 Conditions When Sample Sizes Are Even

                 N=110         N=220         N=880
Model            ML     SB     ML     SB     ML     SB
La05, Φ2 = Φ2A
  1              0.088  0.084  0.082  0.068  0.096  0.072
  2              0.135  0.086  0.107  0.078  0.112  0.064
  3              0.110  0.094  0.086  0.070  0.115  0.079
  4              0.351  0.153  0.371  0.102  0.382  0.081
  5              0.436  0.162  0.480  0.144  0.758  0.281
  6              0.605  0.312  0.845  0.458  1.000  0.999
La05, Φ2 = Φ2B
  1              0.117  0.089  0.113  0.066  0.122  0.061
  2              0.151  0.095  0.150  0.078  0.162  0.057
  3              0.142  0.116  0.134  0.073  0.138  0.066
  4              0.388  0.166  0.405  0.100  0.444  0.086
  5              0.478  0.167  0.546  0.130  0.781  0.274
  6              0.634  0.298  0.818  0.401  1.000  0.980
La07, Φ2 = Φ2A
  1              0.137  0.106  0.129  0.088  0.111  0.054
  2              0.230  0.116  0.217  0.078  0.202  0.061
  3              0.210  0.123  0.177  0.082  0.170  0.057
  4              0.458  0.161  0.444  0.107  0.466  0.083
  5              0.585  0.212  0.642  0.188  0.943  0.528
  6              0.800  0.445  0.971  0.661  1.000  1.000
La07, Φ2 = Φ2B
  1              0.221  0.104  0.202  0.072  0.187  0.048
  2              0.293  0.104  0.300  0.078  0.284  0.039
  3              0.246  0.104  0.264  0.085  0.247  0.043
  4              0.519  0.172  0.530  0.125  0.554  0.066
  5              0.619  0.189  0.692  0.174  0.932  0.462
  6              0.789  0.360  0.933  0.509  1.000  0.999

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits, respectively.
Table 18
Rejection Rates for Tests of Model Fit for VM0,0;2,7 Conditions When Sample Sizes Are Uneven

Model     N1=100,N2=120    N1=200,N2=240    N1=800,N2=960
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.082  0.076     0.089  0.062     0.076  0.053
  2       0.115  0.088     0.103  0.057     0.103  0.046
  3       0.106  0.088     0.092  0.062     0.090  0.050
  4       0.319  0.126     0.324  0.089     0.345  0.071
  5       0.394  0.150     0.459  0.143     0.722  0.269
  6       0.582  0.310     0.838  0.488     1.000  0.998
La05 2 2B
  1       0.121  0.088     0.098  0.061     0.134  0.067
  2       0.158  0.103     0.132  0.063     0.154  0.059
  3       0.134  0.096     0.110  0.058     0.139  0.059
  4       0.377  0.135     0.384  0.098     0.398  0.089
  5       0.457  0.150     0.500  0.128     0.755  0.305
  6       0.626  0.278     0.799  0.399     1.000  0.985
La07 2 2A
  1       0.128  0.098     0.113  0.070     0.096  0.050
  2       0.195  0.107     0.192  0.073     0.159  0.041
  3       0.171  0.115     0.162  0.079     0.142  0.040
  4       0.432  0.151     0.433  0.094     0.412  0.065
  5       0.558  0.200     0.635  0.187     0.930  0.486
  6       0.806  0.466     0.943  0.663     1.000  1.000
La07 2 2B
  1       0.229  0.112     0.227  0.086     0.195  0.061
  2       0.316  0.128     0.292  0.077     0.303  0.059
  3       0.271  0.124     0.271  0.077     0.263  0.058
  4       0.511  0.155     0.530  0.106     0.553  0.078
  5       0.603  0.192     0.696  0.167     0.943  0.489
  6       0.753  0.357     0.925  0.527     1.000  1.000

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 19
Rejection Rates for Tests of Model Fit for VM2,7;2,15 Conditions When Sample Sizes Are Even

Model      N=110            N=220            N=880
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.131  0.081     0.141  0.072     0.141  0.046
  2       0.225  0.098     0.250  0.065     0.249  0.052
  3       0.204  0.090     0.244  0.080     0.236  0.056
  4       0.790  0.212     0.857  0.137     0.892  0.124
  5       0.852  0.211     0.899  0.164     0.967  0.244
  6       0.929  0.362     0.987  0.403     1.000  0.944
La05 2 2B
  1       0.164  0.106     0.198  0.063     0.183  0.048
  2       0.255  0.102     0.284  0.068     0.291  0.053
  3       0.252  0.110     0.277  0.074     0.284  0.056
  4       0.809  0.211     0.866  0.141     0.882  0.104
  5       0.876  0.216     0.909  0.145     0.967  0.223
  6       0.919  0.334     0.982  0.384     1.000  0.867
La07 2 2A
  1       0.303  0.114     0.345  0.089     0.352  0.077
  2       0.535  0.110     0.585  0.091     0.622  0.086
  3       0.519  0.114     0.571  0.098     0.596  0.087
  4       0.911  0.197     0.931  0.146     0.949  0.114
  5       0.949  0.200     0.968  0.198     0.998  0.347
  6       0.987  0.388     1.000  0.509     1.000  0.992
La07 2 2B
  1       0.395  0.139     0.405  0.090     0.485  0.059
  2       0.633  0.135     0.650  0.096     0.733  0.070
  3       0.617  0.154     0.638  0.097     0.709  0.075
  4       0.920  0.223     0.939  0.145     0.966  0.132
  5       0.948  0.206     0.969  0.161     0.998  0.318
  6       0.975  0.346     0.997  0.385     1.000  0.943

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 20
Rejection Rates for Tests of Model Fit for VM2,7;2,15 Conditions When Sample Sizes Are Uneven

Model     N1=100,N2=120    N1=200,N2=240    N1=800,N2=960
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.123  0.096     0.166  0.076     0.161  0.060
  2       0.224  0.106     0.264  0.082     0.246  0.065
  3       0.227  0.107     0.257  0.078     0.251  0.065
  4       0.780  0.186     0.825  0.150     0.866  0.133
  5       0.864  0.194     0.880  0.168     0.971  0.234
  6       0.930  0.352     0.977  0.380     1.000  0.958
La05 2 2B
  1       0.193  0.098     0.187  0.074     0.219  0.050
  2       0.290  0.096     0.293  0.060     0.317  0.054
  3       0.278  0.103     0.275  0.081     0.294  0.054
  4       0.824  0.187     0.848  0.161     0.869  0.112
  5       0.886  0.197     0.906  0.158     0.960  0.207
  6       0.945  0.324     0.975  0.341     1.000  0.851
La07 2 2A
  1       0.318  0.124     0.330  0.074     0.338  0.061
  2       0.538  0.128     0.581  0.090     0.633  0.065
  3       0.537  0.144     0.578  0.098     0.623  0.063
  4       0.910  0.190     0.923  0.140     0.940  0.125
  5       0.953  0.214     0.962  0.184     0.999  0.343
  6       0.984  0.434     0.997  0.515     1.000  0.993
La07 2 2B
  1       0.389  0.126     0.445  0.079     0.482  0.051
  2       0.593  0.116     0.663  0.089     0.724  0.054
  3       0.572  0.137     0.655  0.092     0.694  0.062
  4       0.918  0.189     0.951  0.142     0.967  0.088
  5       0.955  0.208     0.972  0.185     0.999  0.305
  6       0.980  0.350     0.998  0.430     1.000  0.940

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 21
Rejection Rates for Tests of Model Fit for CN2,10 Conditions When Sample Sizes Are Even

Model      N=110            N=220            N=880
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.993  0.060     0.995  0.071     0.990  0.061
  2       0.998  0.100     0.999  0.070     0.996  0.060
  3       0.998  0.114     0.998  0.079     0.996  0.064
  4       1.000  0.086     0.999  0.051     0.998  0.045
  5       1.000  0.090     1.000  0.065     0.999  0.104
  6       1.000  0.181     1.000  0.204     1.000  0.822
La05 2 2B
  1       0.988  0.088     0.996  0.074     0.995  0.061
  2       0.996  0.096     0.998  0.087     0.997  0.056
  3       0.995  0.114     0.997  0.098     0.997  0.058
  4       0.997  0.077     0.998  0.061     0.999  0.040
  5       0.999  0.073     1.000  0.080     1.000  0.112
  6       0.999  0.158     1.000  0.189     1.000  0.739
La07 2 2A
  1       0.996  0.119     0.990  0.075     0.994  0.052
  2       0.996  0.113     1.000  0.076     0.998  0.062
  3       0.995  0.128     0.998  0.086     0.997  0.066
  4       0.999  0.091     1.000  0.058     0.998  0.041
  5       0.999  0.104     1.000  0.080     1.000  0.176
  6       0.999  0.228     1.000  0.306     1.000  0.975
La07 2 2B
  1       0.999  0.118     0.995  0.089     0.995  0.051
  2       1.000  0.118     0.994  0.081     0.998  0.055
  3       1.000  0.129     0.996  0.097     0.995  0.062
  4       1.000  0.088     1.000  0.077     0.999  0.043
  5       1.000  0.104     0.999  0.116     1.000  0.188
  6       1.000  0.203     0.999  0.301     1.000  0.934

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 22
Rejection Rates for Tests of Model Fit for CN2,10 Conditions When Sample Sizes Are Uneven

Model     N1=100,N2=120    N1=200,N2=240    N1=800,N2=960
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.984  0.075     0.996  0.071     0.993  0.048
  2       0.999  0.101     0.997  0.078     0.997  0.054
  3       0.999  0.125     0.996  0.094     0.996  0.060
  4       1.000  0.100     0.998  0.063     0.999  0.035
  5       1.000  0.103     0.998  0.073     1.000  0.073
  6       1.000  0.194     0.999  0.216     1.000  0.813
La05 2 2B
  1       0.995  0.094     0.994  0.069     0.995  0.055
  2       0.998  0.110     0.996  0.075     0.998  0.054
  3       0.997  0.135     0.998  0.089     0.995  0.056
  4       1.000  0.092     0.998  0.061     1.000  0.034
  5       1.000  0.087     0.999  0.068     1.000  0.106
  6       1.000  0.158     1.000  0.187     1.000  0.734
La07 2 2A
  1       0.993  0.119     0.991  0.085     0.997  0.047
  2       0.996  0.116     0.997  0.076     0.999  0.053
  3       0.995  0.132     0.996  0.086     0.997  0.057
  4       1.000  0.095     0.999  0.062     0.999  0.044
  5       0.999  0.104     1.000  0.082     1.000  0.185
  6       1.000  0.240     1.000  0.321     1.000  0.976
La07 2 2B
  1       0.991  0.123     0.992  0.078     0.997  0.051
  2       0.995  0.113     0.995  0.074     1.000  0.053
  3       0.994  0.130     0.994  0.087     0.997  0.058
  4       0.998  0.092     0.998  0.056     1.000  0.043
  5       1.000  0.112     0.999  0.093     1.000  0.201
  6       1.000  0.217     1.000  0.271     1.000  0.926

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 23
Rejection Rates for Tests of Model Fit for CN0,0;2,10 Conditions When Sample Sizes Are Even

Model      N=110            N=220            N=880
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.827  0.098     0.814  0.087     0.831  0.082
  2       0.903  0.120     0.867  0.100     0.883  0.091
  3       0.896  0.133     0.847  0.102     0.871  0.092
  4       0.930  0.140     0.897  0.100     0.904  0.077
  5       0.949  0.141     0.944  0.114     0.977  0.171
  6       0.982  0.296     0.992  0.330     1.000  0.950
La05 2 2B
  1       0.832  0.105     0.828  0.086     0.821  0.071
  2       0.880  0.128     0.875  0.085     0.862  0.067
  3       0.874  0.139     0.856  0.091     0.844  0.070
  4       0.907  0.133     0.905  0.078     0.903  0.053
  5       0.935  0.114     0.941  0.078     0.983  0.165
  6       0.965  0.220     0.996  0.268     1.000  0.921
La07 2 2A
  1       0.862  0.112     0.847  0.093     0.824  0.065
  2       0.902  0.109     0.874  0.097     0.872  0.062
  3       0.883  0.124     0.857  0.097     0.848  0.066
  4       0.932  0.114     0.896  0.077     0.919  0.047
  5       0.954  0.137     0.954  0.130     0.995  0.345
  6       0.982  0.336     0.999  0.479     1.000  0.999
La07 2 2B
  1       0.847  0.124     0.843  0.109     0.821  0.073
  2       0.882  0.119     0.875  0.101     0.855  0.076
  3       0.859  0.124     0.850  0.110     0.840  0.079
  4       0.916  0.106     0.906  0.095     0.889  0.064
  5       0.941  0.108     0.955  0.127     0.995  0.316
  6       0.978  0.269     0.999  0.384     1.000  0.993

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 24
Rejection Rates for Tests of Model Fit for CN0,0;2,10 Conditions When Sample Sizes Are Uneven

Model     N1=100,N2=120    N1=200,N2=240    N1=800,N2=960
            ML     SB        ML     SB        ML     SB

La05 2 2A
  1       0.803  0.065     0.802  0.087     0.827  0.056
  2       0.864  0.108     0.856  0.090     0.875  0.062
  3       0.862  0.122     0.845  0.100     0.845  0.065
  4       0.887  0.126     0.893  0.095     0.884  0.057
  5       0.921  0.130     0.938  0.112     0.971  0.184
  6       0.967  0.251     0.991  0.336     1.000  0.957
La05 2 2B
  1       0.817  0.101     0.806  0.087     0.828  0.078
  2       0.874  0.106     0.845  0.089     0.853  0.076
  3       0.856  0.105     0.829  0.092     0.831  0.083
  4       0.909  0.116     0.879  0.081     0.886  0.066
  5       0.942  0.101     0.924  0.091     0.980  0.173
  6       0.975  0.205     0.985  0.274     1.000  0.936
La07 2 2A
  1       0.848  0.121     0.835  0.081     0.839  0.059
  2       0.885  0.127     0.871  0.076     0.870  0.067
  3       0.874  0.130     0.860  0.083     0.841  0.068
  4       0.917  0.138     0.907  0.073     0.896  0.055
  5       0.944  0.154     0.955  0.132     0.989  0.344
  6       0.991  0.351     0.997  0.482     1.000  0.999
La07 2 2B
  1       0.857  0.123     0.825  0.096     0.831  0.071
  2       0.873  0.120     0.854  0.090     0.858  0.074
  3       0.868  0.135     0.837  0.088     0.842  0.076
  4       0.916  0.143     0.887  0.091     0.883  0.058
  5       0.944  0.145     0.939  0.118     0.993  0.315
  6       0.977  0.296     0.992  0.383     1.000  0.995

Note. Models 1 through 6 are the overall tests of model fit for the configural, weak, strong, strict, beyond strict, and full mean and covariance structure invariance model fits respectively.
Table 25
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM0,0

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.075  0.081  0.078  0.077  0.077
  220      0.057  0.061  0.065  0.060  0.065
  880      0.044  0.044  0.044  0.044  0.044
  100;120  0.063  0.075  0.072  0.065  0.073
  200;240  0.060  0.066  0.065  0.066  0.065
  800;960  0.039  0.036  0.036  0.036  0.036
La05 2 2B
  110      0.071  0.086  0.082  0.086  0.081
  220      0.062  0.066  0.064  0.061  0.064
  880      0.053  0.055  0.054  0.055  0.054
  100;120  0.058  0.080  0.075  0.070  0.073
  200;240  0.042  0.051  0.050  0.047  0.050
  800;960  0.043  0.045  0.045  0.047  0.045
La07 2 2A
  110      0.056  0.075  0.060  0.066  0.060
  220      0.062  0.069  0.067  0.068  0.067
  880      0.050  0.049  0.049  0.048  0.049
  100;120  0.057  0.061  0.063  0.058  0.063
  200;240  0.044  0.049  0.045  0.046  0.045
  800;960  0.048  0.048  0.048  0.048  0.048
La07 2 2B
  110      0.042  0.050  0.047  0.049  0.047
  220      0.058  0.065  0.065  0.064  0.065
  880      0.056  0.062  0.062  0.062  0.062
  100;120  0.060  0.077  0.066  0.074  0.066
  200;240  0.052  0.062  0.059  0.060  0.059
  800;960  0.043  0.047  0.046  0.045  0.046

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 26
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM0,0

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.051  0.049  0.077  0.050  0.077
  220      0.053  0.055  0.073  0.058  0.073
  880      0.035  0.037  0.047  0.037  0.047
  100;120  0.053  0.030  0.090  0.056  0.090
  200;240  0.050  0.037  0.069  0.054  0.069
  800;960  0.039  0.030  0.043  0.041  0.043
La05 2 2B
  110      0.051  0.056  0.079  0.057  0.079
  220      0.046  0.049  0.060  0.050  0.060
  880      0.054  0.054  0.058  0.055  0.058
  100;120  0.050  0.031  0.092  0.056  0.092
  200;240  0.048  0.033  0.068  0.052  0.068
  800;960  0.050  0.032  0.053  0.052  0.053
La07 2 2A
  110      0.045  0.051  0.088  0.053  0.088
  220      0.045  0.045  0.059  0.047  0.059
  880      0.058  0.058  0.060  0.058  0.060
  100;120  0.038  0.030  0.068  0.041  0.068
  200;240  0.059  0.036  0.081  0.061  0.081
  800;960  0.047  0.030  0.049  0.047  0.049
La07 2 2B
  110      0.049  0.052  0.079  0.053  0.079
  220      0.054  0.058  0.073  0.059  0.073
  880      0.060  0.062  0.065  0.062  0.065
  100;120  0.062  0.033  0.082  0.064  0.082
  200;240  0.060  0.040  0.073  0.060  0.073
  800;960  0.052  0.031  0.058  0.054  0.058

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 27
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM0,0

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.057  0.078  0.073  0.075  0.073
  220      0.053  0.065  0.061  0.063  0.061
  880      0.046  0.052  0.052  0.051  0.052
  100;120  0.054  0.074  0.073  0.072  0.073
  200;240  0.047  0.053  0.054  0.052  0.054
  800;960  0.050  0.050  0.052  0.050  0.052
La05 2 2B
  110      0.054  0.067  0.066  0.064  0.066
  220      0.063  0.074  0.073  0.074  0.073
  880      0.065  0.068  0.066  0.066  0.066
  100;120  0.045  0.070  0.065  0.062  0.065
  200;240  0.061  0.072  0.067  0.070  0.067
  800;960  0.047  0.047  0.046  0.047  0.046
La07 2 2A
  110      0.066  0.077  0.073  0.073  0.073
  220      0.048  0.054  0.051  0.053  0.051
  880      0.044  0.046  0.046  0.046  0.046
  100;120  0.054  0.072  0.072  0.069  0.072
  200;240  0.052  0.059  0.055  0.058  0.055
  800;960  0.046  0.050  0.049  0.048  0.049
La07 2 2B
  110      0.054  0.069  0.060  0.061  0.060
  220      0.049  0.060  0.057  0.056  0.057
  880      0.046  0.051  0.051  0.051  0.051
  100;120  0.052  0.071  0.064  0.061  0.064
  200;240  0.061  0.068  0.069  0.068  0.069
  800;960  0.050  0.053  0.053  0.052  0.053

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 28
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM0,0

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.196  0.222  0.219  0.214  0.219
  220      0.341  0.346  0.341  0.345  0.341
  880      0.896  0.894  0.894  0.893  0.894
  100;120  0.212  0.227  0.226  0.225  0.226
  200;240  0.330  0.354  0.351  0.349  0.351
  800;960  0.908  0.908  0.911  0.909  0.911
La05 2 2B
  110      0.191  0.223  0.207  0.208  0.207
  220      0.352  0.367  0.364  0.365  0.364
  880      0.936  0.940  0.936  0.937  0.936
  100;120  0.194  0.216  0.252  0.218  0.252
  200;240  0.342  0.359  0.396  0.363  0.396
  800;960  0.901  0.907  0.915  0.911  0.915
La07 2 2A
  110      0.329  0.355  0.349  0.347  0.349
  220      0.596  0.604  0.604  0.595  0.604
  880      0.998  0.998  0.998  0.998  0.998
  100;120  0.308  0.338  0.346  0.341  0.346
  200;240  0.591  0.594  0.599  0.594  0.599
  800;960  0.997  0.997  0.997  0.997  0.997
La07 2 2B
  110      0.317  0.356  0.337  0.341  0.337
  220      0.611  0.624  0.613  0.616  0.613
  880      0.999  0.999  0.999  0.999  0.999
  100;120  0.310  0.323  0.377  0.333  0.377
  200;240  0.564  0.585  0.634  0.588  0.634
  800;960  1.000  1.000  1.000  1.000  1.000

Table 29
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM0,0

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.749  0.763  0.813  0.760  0.813
  220      0.956  0.956  0.962  0.957  0.962
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.764  0.771  0.823  0.771  0.823
  200;240  0.973  0.974  0.979  0.974  0.979
  800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
  110      0.664  0.665  0.728  0.678  0.728
  220      0.938  0.940  0.952  0.940  0.952
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.689  0.698  0.765  0.706  0.765
  200;240  0.935  0.936  0.947  0.939  0.947
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
  110      0.866  0.866  0.904  0.870  0.904
  220      0.996  0.996  0.997  0.997  0.997
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.874  0.879  0.908  0.880  0.908
  200;240  0.993  0.993  0.995  0.993  0.995
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
  110      0.802  0.805  0.846  0.810  0.846
  220      0.979  0.979  0.986  0.981  0.986
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.805  0.813  0.871  0.819  0.871
  200;240  0.984  0.984  0.989  0.984  0.989
  800;960  1.000  1.000  1.000  1.000  1.000

Table 30
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.244  0.137  0.052  0.094  0.053
  220      0.240  0.093  0.044  0.074  0.044
  880      0.257  0.061  0.051  0.059  0.051
  100;120  0.244  0.132  0.055  0.102  0.057
  200;240  0.250  0.099  0.046  0.082  0.046
  800;960  0.222  0.061  0.046  0.058  0.046
La05 2 2B
  110      0.287  0.155  0.060  0.104  0.061
  220      0.249  0.084  0.044  0.060  0.044
  880      0.235  0.055  0.048  0.050  0.048
  100;120  0.278  0.135  0.051  0.091  0.053
  200;240  0.254  0.106  0.047  0.077  0.047
  800;960  0.229  0.049  0.036  0.045  0.036
La07 2 2A
  110      0.443  0.124  0.043  0.074  0.043
  220      0.474  0.085  0.041  0.052  0.041
  880      0.485  0.057  0.048  0.051  0.048
  100;120  0.483  0.133  0.056  0.084  0.056
  200;240  0.478  0.092  0.043  0.068  0.043
  800;960  0.476  0.053  0.046  0.049  0.046
La07 2 2B
  110      0.462  0.131  0.055  0.098  0.055
  220      0.490  0.084  0.045  0.062  0.045
  880      0.493  0.047  0.038  0.042  0.038
  100;120  0.493  0.126  0.044  0.075  0.044
  200;240  0.480  0.086  0.048  0.069  0.048
  800;960  0.540  0.060  0.042  0.047  0.042

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 31
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.061  0.050  0.077  0.058  0.077
  220      0.060  0.049  0.066  0.051  0.066
  880      0.070  0.055  0.061  0.058  0.061
  100;120  0.058  0.028  0.076  0.057  0.076
  200;240  0.062  0.033  0.062  0.055  0.062
  800;960  0.061  0.025  0.050  0.050  0.050
La05 2 2B
  110      0.055  0.052  0.084  0.062  0.084
  220      0.055  0.051  0.065  0.055  0.065
  880      0.052  0.047  0.052  0.050  0.052
  100;120  0.050  0.025  0.078  0.054  0.078
  200;240  0.068  0.038  0.083  0.070  0.083
  800;960  0.030  0.022  0.036  0.032  0.036
La07 2 2A
  110      0.079  0.063  0.105  0.069  0.105
  220      0.066  0.055  0.069  0.054  0.069
  880      0.073  0.053  0.056  0.053  0.056
  100;120  0.070  0.036  0.094  0.060  0.094
  200;240  0.066  0.026  0.065  0.051  0.065
  800;960  0.070  0.027  0.054  0.049  0.054
La07 2 2B
  110      0.065  0.060  0.104  0.070  0.104
  220      0.043  0.042  0.068  0.046  0.068
  880      0.048  0.043  0.046  0.044  0.046
  100;120  0.052  0.025  0.083  0.059  0.083
  200;240  0.052  0.030  0.067  0.054  0.067
  800;960  0.069  0.028  0.070  0.063  0.070

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 32
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.844  0.084  0.081  0.077  0.081
  220      0.866  0.083  0.080  0.080  0.080
  880      0.872  0.055  0.053  0.053  0.053
  100;120  0.829  0.089  0.091  0.088  0.091
  200;240  0.844  0.066  0.058  0.055  0.058
  800;960  0.875  0.053  0.051  0.049  0.051
La05 2 2B
  110      0.845  0.086  0.079  0.073  0.079
  220      0.847  0.061  0.056  0.056  0.056
  880      0.883  0.061  0.058  0.059  0.058
  100;120  0.825  0.107  0.094  0.085  0.094
  200;240  0.870  0.070  0.068  0.068  0.068
  800;960  0.869  0.054  0.051  0.052  0.051
La07 2 2A
  110      0.822  0.105  0.083  0.078  0.083
  220      0.850  0.070  0.057  0.060  0.057
  880      0.874  0.057  0.057  0.055  0.057
  100;120  0.838  0.101  0.078  0.071  0.078
  200;240  0.819  0.074  0.068  0.062  0.068
  800;960  0.871  0.069  0.065  0.065  0.065
La07 2 2B
  110      0.838  0.110  0.078  0.084  0.078
  220      0.852  0.079  0.066  0.063  0.066
  880      0.876  0.053  0.046  0.048  0.046
  100;120  0.824  0.114  0.089  0.097  0.089
  200;240  0.859  0.065  0.052  0.055  0.052
  800;960  0.882  0.061  0.054  0.057  0.054

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 33
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.502  0.268  0.070  0.161  0.068
  220      0.569  0.237  0.081  0.172  0.081
  880      0.933  0.630  0.588  0.620  0.588
  100;120  0.520  0.290  0.082  0.198  0.082
  200;240  0.601  0.260  0.124  0.205  0.124
  800;960  0.919  0.605  0.564  0.589  0.564
La05 2 2B
  110      0.479  0.265  0.083  0.191  0.082
  220      0.530  0.294  0.117  0.223  0.117
  880      0.865  0.617  0.540  0.586  0.540
  100;120  0.500  0.296  0.101  0.215  0.101
  200;240  0.531  0.287  0.150  0.229  0.150
  800;960  0.853  0.614  0.568  0.591  0.568
La07 2 2A
  110      0.603  0.246  0.094  0.181  0.094
  220      0.758  0.311  0.191  0.275  0.191
  880      0.998  0.890  0.874  0.891  0.874
  100;120  0.598  0.224  0.099  0.166  0.099
  200;240  0.768  0.309  0.194  0.275  0.194
  800;960  0.996  0.910  0.897  0.904  0.897
La07 2 2B
  110      0.542  0.249  0.116  0.195  0.116
  220      0.694  0.336  0.197  0.291  0.197
  880      0.953  0.752  0.734  0.746  0.734
  100;120  0.541  0.221  0.125  0.192  0.124
  200;240  0.681  0.311  0.245  0.294  0.245
  800;960  0.956  0.763  0.773  0.764  0.773

Table 34
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.759  0.744  0.868  0.766  0.868
  220      0.979  0.976  0.990  0.980  0.990
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.753  0.748  0.844  0.761  0.844
  200;240  0.972  0.972  0.982  0.972  0.982
  800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
  110      0.675  0.665  0.796  0.683  0.795
  220      0.957  0.950  0.968  0.958  0.968
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.699  0.687  0.829  0.714  0.828
  200;240  0.938  0.938  0.965  0.942  0.965
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
  110      0.857  0.849  0.941  0.863  0.941
  220      0.995  0.992  0.998  0.996  0.998
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.884  0.880  0.946  0.887  0.945
  200;240  0.995  0.994  0.999  0.993  0.999
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
  110      0.805  0.801  0.906  0.813  0.906
  220      0.979  0.978  0.989  0.979  0.989
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.795  0.786  0.896  0.807  0.895
  200;240  0.990  0.991  0.999  0.991  0.999
  800;960  1.000  1.000  1.000  1.000  1.000

Table 35
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.279  0.145  0.049  0.106  0.049
  220      0.294  0.098  0.038  0.068  0.038
  880      0.334  0.079  0.050  0.065  0.050
  100;120  0.291  0.170  0.055  0.120  0.055
  200;240  0.309  0.118  0.045  0.083  0.045
  800;960  0.299  0.057  0.034  0.048  0.034
La05 2 2B
  110      0.311  0.150  0.054  0.103  0.055
  220      0.322  0.110  0.044  0.089  0.044
  880      0.370  0.066  0.042  0.060  0.042
  100;120  0.310  0.173  0.058  0.110  0.060
  200;240  0.335  0.123  0.053  0.093  0.053
  800;960  0.389  0.060  0.043  0.055  0.043
La07 2 2A
  110      0.634  0.174  0.058  0.115  0.058
  220      0.674  0.126  0.055  0.092  0.055
  880      0.726  0.069  0.047  0.060  0.047
  100;120  0.660  0.192  0.060  0.116  0.060
  200;240  0.707  0.121  0.042  0.079  0.042
  800;960  0.721  0.069  0.039  0.054  0.039
La07 2 2B
  110      0.656  0.165  0.057  0.111  0.057
  220      0.683  0.116  0.036  0.069  0.036
  880      0.738  0.062  0.035  0.047  0.035
  100;120  0.653  0.173  0.054  0.110  0.054
  200;240  0.700  0.105  0.040  0.075  0.040
  800;960  0.750  0.078  0.043  0.070  0.043

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 36
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.065  0.049  0.069  0.056  0.068
  220      0.070  0.051  0.061  0.056  0.061
  880      0.069  0.048  0.049  0.054  0.049
  100;120  0.071  0.033  0.088  0.060  0.088
  200;240  0.073  0.042  0.069  0.061  0.069
  800;960  0.074  0.028  0.050  0.050  0.050
La05 2 2B
  110      0.067  0.054  0.080  0.063  0.080
  220      0.039  0.036  0.051  0.038  0.051
  880      0.074  0.058  0.063  0.062  0.063
  100;120  0.053  0.030  0.091  0.060  0.091
  200;240  0.053  0.026  0.057  0.049  0.057
  800;960  0.055  0.021  0.047  0.045  0.047
La07 2 2A
  110      0.092  0.066  0.108  0.072  0.108
  220      0.085  0.053  0.070  0.062  0.070
  880      0.095  0.053  0.061  0.058  0.061
  100;120  0.079  0.033  0.093  0.064  0.093
  200;240  0.089  0.041  0.071  0.066  0.071
  800;960  0.086  0.025  0.050  0.048  0.050
La07 2 2B
  110      0.058  0.054  0.100  0.057  0.100
  220      0.081  0.060  0.083  0.066  0.083
  880      0.063  0.046  0.045  0.048  0.045
  100;120  0.046  0.015  0.063  0.038  0.063
  200;240  0.057  0.026  0.058  0.045  0.058
  800;960  0.077  0.031  0.057  0.055  0.057

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 37
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.959  0.101  0.111  0.086  0.111
  220      0.970  0.070  0.070  0.063  0.070
  880      0.977  0.054  0.051  0.051  0.051
  100;120  0.961  0.107  0.097  0.081  0.097
  200;240  0.972  0.081  0.073  0.067  0.073
  800;960  0.984  0.045  0.041  0.040  0.041
La05 2 2B
  110      0.953  0.100  0.088  0.078  0.088
  220      0.958  0.076  0.068  0.064  0.068
  880      0.983  0.056  0.052  0.053  0.052
  100;120  0.960  0.103  0.093  0.081  0.093
  200;240  0.964  0.091  0.072  0.072  0.072
  800;960  0.980  0.063  0.057  0.056  0.057
La07 2 2A
  110      0.955  0.149  0.092  0.091  0.091
  220      0.964  0.102  0.074  0.070  0.074
  880      0.984  0.059  0.056  0.054  0.056
  100;120  0.966  0.135  0.088  0.080  0.088
  200;240  0.971  0.089  0.068  0.064  0.068
  800;960  0.974  0.062  0.060  0.061  0.060
La07 2 2B
  110      0.949  0.129  0.078  0.073  0.078
  220      0.956  0.077  0.058  0.049  0.058
  880      0.977  0.055  0.046  0.046  0.046
  100;120  0.952  0.133  0.094  0.081  0.094
  200;240  0.960  0.093  0.064  0.060  0.064
  800;960  0.972  0.050  0.041  0.045  0.041

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 38
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.633  0.356  0.042  0.209  0.041
  220      0.713  0.313  0.058  0.193  0.058
  880      0.962  0.515  0.408  0.472  0.408
  100;120  0.651  0.386  0.059  0.221  0.058
  200;240  0.679  0.280  0.064  0.179  0.064
  800;960  0.959  0.481  0.378  0.439  0.378
La05 2 2B
  110      0.587  0.344  0.049  0.207  0.048
  220      0.608  0.296  0.069  0.190  0.069
  880      0.843  0.502  0.347  0.452  0.347
  100;120  0.590  0.355  0.069  0.211  0.067
  200;240  0.641  0.328  0.095  0.226  0.094
  800;960  0.843  0.496  0.401  0.455  0.401
La07 2 2A
  110      0.693  0.248  0.043  0.143  0.042
  220      0.821  0.272  0.088  0.204  0.088
  880      0.999  0.708  0.648  0.695  0.648
  100;120  0.702  0.267  0.056  0.152  0.056
  200;240  0.825  0.254  0.087  0.174  0.087
  800;960  0.997  0.727  0.664  0.722  0.664
La07 2 2B
  110      0.640  0.268  0.054  0.169  0.054
  220      0.711  0.283  0.119  0.225  0.119
  880      0.912  0.598  0.515  0.554  0.515
  100;120  0.636  0.272  0.081  0.189  0.081
  200;240  0.723  0.291  0.142  0.226  0.142
  800;960  0.919  0.594  0.571  0.583  0.571

Table 39
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.737  0.722  0.879  0.747  0.870
  220      0.972  0.968  0.989  0.973  0.989
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.754  0.747  0.866  0.766  0.857
  200;240  0.975  0.973  0.991  0.976  0.991
  800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
  110      0.731  0.696  0.849  0.737  0.846
  220      0.952  0.948  0.974  0.953  0.974
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.680  0.664  0.825  0.696  0.824
  200;240  0.949  0.947  0.972  0.952  0.972
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
  110      0.878  0.867  0.964  0.885  0.955
  220      0.996  0.998  1.000  0.997  1.000
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.880  0.856  0.962  0.882  0.948
  200;240  0.995  0.994  0.999  0.995  0.999
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
  110      0.817  0.807  0.929  0.823  0.922
  220      0.987  0.984  0.995  0.987  0.995
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.804  0.808  0.922  0.816  0.915
  200;240  0.983  0.978  0.997  0.983  0.997
  800;960  1.000  1.000  1.000  1.000  1.000

Table 40
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM0,0;2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.141  0.091  0.065  0.076  0.068
  220      0.143  0.086  0.054  0.064  0.054
  880      0.126  0.063  0.052  0.055  0.052
  100;120  0.153  0.107  0.072  0.093  0.074
  200;240  0.138  0.080  0.062  0.077  0.062
  800;960  0.116  0.046  0.043  0.046  0.043
La05 2 2B
  110      0.163  0.122  0.085  0.106  0.084
  220      0.143  0.079  0.054  0.066  0.054
  880      0.151  0.062  0.053  0.059  0.053
  100;120  0.114  0.085  0.059  0.070  0.060
  200;240  0.120  0.076  0.052  0.064  0.052
  800;960  0.122  0.053  0.050  0.050  0.050
La07 2 2A
  110      0.256  0.104  0.056  0.069  0.056
  220      0.264  0.074  0.053  0.063  0.053
  880      0.293  0.057  0.045  0.050  0.045
  100;120  0.239  0.111  0.081  0.092  0.081
  200;240  0.253  0.080  0.059  0.065  0.059
  800;960  0.217  0.048  0.043  0.045  0.043
La07 2 2B
  110      0.241  0.099  0.049  0.068  0.049
  220      0.276  0.081  0.055  0.071  0.055
  880      0.249  0.035  0.032  0.037  0.032
  100;120  0.243  0.095  0.081  0.084  0.081
  200;240  0.232  0.071  0.058  0.065  0.058
  800;960  0.291  0.063  0.052  0.057  0.052

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 41
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM0,0;2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.040  0.057  0.092  0.065  0.092
  220      0.029  0.039  0.056  0.043  0.056
  880      0.043  0.062  0.068  0.064  0.068
  100;120  0.031  0.020  0.061  0.039  0.061
  200;240  0.032  0.026  0.063  0.049  0.063
  800;960  0.039  0.030  0.056  0.052  0.056
La05 2 2B
  110      0.045  0.047  0.075  0.054  0.075
  220      0.027  0.036  0.059  0.044  0.059
  880      0.041  0.061  0.064  0.061  0.064
  100;120  0.040  0.028  0.084  0.057  0.084
  200;240  0.036  0.026  0.060  0.053  0.060
  800;960  0.028  0.022  0.050  0.045  0.050
La07 2 2A
  110      0.031  0.046  0.074  0.050  0.074
  220      0.021  0.039  0.053  0.039  0.053
  880      0.025  0.049  0.051  0.049  0.051
  100;120  0.032  0.028  0.088  0.056  0.088
  200;240  0.023  0.020  0.063  0.039  0.063
  800;960  0.032  0.031  0.059  0.058  0.059
La07 2 2B
  110      0.018  0.044  0.086  0.054  0.086
  220      0.022  0.047  0.061  0.049  0.061
  880      0.027  0.046  0.049  0.047  0.049
  100;120  0.028  0.022  0.084  0.052  0.084
  200;240  0.027  0.027  0.072  0.053  0.072
  800;960  0.036  0.037  0.060  0.060  0.060

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 42
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM0,0;2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.550  0.127  0.122  0.119  0.122
  220      0.606  0.099  0.097  0.095  0.097
  880      0.610  0.056  0.055  0.054  0.055
  100;120  0.522  0.121  0.111  0.113  0.111
  200;240  0.587  0.086  0.080  0.084  0.080
  800;960  0.611  0.056  0.058  0.057  0.058
La05 2 2B
  110      0.577  0.141  0.134  0.135  0.134
  220      0.620  0.084  0.080  0.079  0.080
  880      0.657  0.063  0.062  0.063  0.062
  100;120  0.566  0.129  0.109  0.117  0.109
  200;240  0.608  0.094  0.082  0.091  0.082
  800;960  0.588  0.055  0.055  0.055  0.055
La07 2 2A
  110      0.591  0.120  0.110  0.107  0.110
  220      0.617  0.097  0.088  0.089  0.088
  880      0.624  0.057  0.056  0.056  0.056
  100;120  0.579  0.133  0.107  0.117  0.107
  200;240  0.581  0.084  0.067  0.070  0.067
  800;960  0.611  0.062  0.062  0.063  0.062
La07 2 2B
  110      0.615  0.152  0.146  0.137  0.146
  220      0.623  0.089  0.085  0.086  0.085
  880      0.631  0.057  0.059  0.057  0.059
  100;120  0.542  0.134  0.113  0.116  0.113
  200;240  0.582  0.104  0.095  0.099  0.095
  800;960  0.609  0.057  0.056  0.057  0.056

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 43
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM0,0;2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.334  0.205  0.138  0.151  0.138
  220      0.437  0.251  0.192  0.220  0.192
  880      0.940  0.776  0.760  0.771  0.760
  100;120  0.322  0.208  0.137  0.175  0.136
  200;240  0.439  0.266  0.206  0.239  0.206
  800;960  0.925  0.798  0.787  0.795  0.787
La05 2 2B
  110      0.337  0.219  0.074  0.150  0.074
  220      0.468  0.281  0.104  0.217  0.104
  880      0.852  0.715  0.608  0.697  0.608
  100;120  0.338  0.205  0.069  0.154  0.069
  200;240  0.417  0.259  0.119  0.220  0.119
  800;960  0.875  0.744  0.682  0.739  0.682
La07 2 2A
  110      0.495  0.287  0.249  0.257  0.249
  220      0.688  0.377  0.338  0.367  0.338
  880      0.998  0.963  0.955  0.960  0.955
  100;120  0.451  0.261  0.209  0.224  0.209
  200;240  0.700  0.406  0.366  0.393  0.366
  800;960  0.999  0.970  0.970  0.970  0.970
La07 2 2B
  110      0.449  0.213  0.094  0.160  0.094
  220      0.608  0.353  0.159  0.307  0.159
  880      0.969  0.896  0.814  0.885  0.814
  100;120  0.400  0.195  0.089  0.170  0.088
  200;240  0.594  0.313  0.179  0.301  0.179
  800;960  0.986  0.925  0.889  0.923  0.889

Table 44
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM0,0;2,7

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.732  0.739  0.759  0.745  0.759
  220      0.976  0.976  0.975  0.978  0.975
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.759  0.757  0.786  0.764  0.786
  200;240  0.973  0.973  0.976  0.975  0.976
  800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
  110      0.682  0.679  0.703  0.689  0.703
  220      0.947  0.945  0.947  0.950  0.947
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.663  0.665  0.682  0.673  0.682
  200;240  0.949  0.949  0.947  0.950  0.947
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
  110      0.879  0.879  0.881  0.881  0.881
  220      0.997  0.997  0.997  0.998  0.997
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.881  0.878  0.887  0.885  0.887
  200;240  0.998  0.995  0.993  0.996  0.993
  800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
  110      0.804  0.802  0.809  0.811  0.809
  220      0.987  0.986  0.981  0.988  0.981
  880      1.000  1.000  1.000  1.000  1.000
  100;120  0.780  0.778  0.784  0.784  0.784
  200;240  0.992  0.991  0.988  0.992  0.988
  800;960  1.000  1.000  1.000  1.000  1.000

Table 45
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are VM2,7;2,15

N            D      DS0    DSB1   DSB10  DSBH

La05 2 2A
  110      0.261  0.146  0.064  0.103  0.066
  220      0.303  0.112  0.047  0.081  0.047
  880      0.277  0.059  0.040  0.053  0.040
  100;120  0.251  0.145  0.053  0.095  0.054
  200;240  0.276  0.091  0.047  0.068  0.047
  800;960  0.292  0.062  0.048  0.056  0.048
La05 2 2B
  110      0.267  0.143  0.058  0.104  0.060
  220      0.275  0.083  0.036  0.067  0.036
  880      0.300  0.048  0.030  0.047  0.030
  100;120  0.302  0.151  0.050  0.099  0.049
  200;240  0.293  0.098  0.046  0.071  0.046
  800;960  0.294  0.052  0.038  0.048  0.038
La07 2 2A
  110      0.560  0.152  0.043  0.080  0.043
  220      0.563  0.101  0.043  0.067  0.043
  880      0.614  0.073  0.044  0.059  0.044
  100;120  0.520  0.144  0.048  0.086  0.048
  200;240  0.597  0.105  0.045  0.071  0.045
  800;960  0.618  0.073  0.053  0.064  0.053
La07 2 2B
  110      0.556  0.136  0.043  0.083  0.043
  220      0.593  0.120  0.053  0.079  0.053
  880      0.664  0.060  0.042  0.056  0.042
  100;120  0.537  0.128  0.051  0.087  0.051
  200;240  0.617  0.088  0.038  0.058  0.038
  800;960  0.625  0.063  0.045  0.055  0.045

Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 46
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are VM2,7;2,15

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.067  0.051  0.076  0.063  0.076
220      0.067  0.052  0.067  0.055  0.067
880      0.061  0.045  0.050  0.047  0.050
100;120  0.063  0.032  0.093  0.055  0.093
200;240  0.066  0.027  0.064  0.051  0.064
800;960  0.083  0.031  0.061  0.059  0.061
La05 2 2B
110      0.057  0.050  0.084  0.055  0.084
220      0.060  0.051  0.064  0.058  0.064
880      0.060  0.051  0.054  0.053  0.054
100;120  0.055  0.033  0.076  0.053  0.076
200;240  0.058  0.035  0.063  0.055  0.063
800;960  0.059  0.027  0.053  0.053  0.053
La07 2 2A
110      0.069  0.049  0.092  0.054  0.092
220      0.064  0.046  0.060  0.048  0.060
880      0.079  0.052  0.058  0.056  0.058
100;120  0.076  0.030  0.084  0.053  0.084
200;240  0.085  0.030  0.072  0.062  0.072
800;960  0.084  0.024  0.053  0.051  0.053
La07 2 2B
110      0.071  0.058  0.100  0.064  0.100
220      0.072  0.054  0.080  0.063  0.080
880      0.077  0.055  0.058  0.056  0.058
100;120  0.077  0.028  0.110  0.073  0.110
200;240  0.070  0.025  0.074  0.058  0.074
800;960  0.075  0.030  0.056  0.053  0.056
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 47
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are VM2,7;2,15

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.924  0.109  0.116  0.097  0.116
220      0.942  0.076  0.070  0.066  0.070
880      0.954  0.054  0.046  0.048  0.046
100;120  0.915  0.098  0.084  0.082  0.084
200;240  0.935  0.077  0.070  0.068  0.070
800;960  0.955  0.063  0.062  0.058  0.062
La05 2 2B
110      0.927  0.128  0.121  0.108  0.121
220      0.946  0.079  0.072  0.072  0.072
880      0.960  0.050  0.047  0.048  0.047
100;120  0.914  0.113  0.099  0.093  0.099
200;240  0.940  0.083  0.077  0.080  0.077
800;960  0.959  0.054  0.046  0.049  0.046
La07 2 2A
110      0.925  0.134  0.094  0.094  0.094
220      0.942  0.086  0.067  0.065  0.067
880      0.958  0.067  0.062  0.063  0.062
100;120  0.932  0.128  0.096  0.092  0.096
200;240  0.942  0.072  0.057  0.059  0.057
800;960  0.938  0.076  0.070  0.072  0.070
La07 2 2B
110      0.911  0.127  0.084  0.089  0.084
220      0.934  0.077  0.065  0.068  0.065
880      0.941  0.066  0.062  0.062  0.062
100;120  0.911  0.127  0.092  0.092  0.092
200;240  0.934  0.089  0.071  0.073  0.071
800;960  0.947  0.054  0.050  0.049  0.050
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 48
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are VM2,7;2,15

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.572  0.314  0.052  0.184  0.052
220      0.619  0.256  0.082  0.170  0.082
880      0.948  0.575  0.498  0.554  0.498
100;120  0.576  0.330  0.064  0.210  0.064
200;240  0.589  0.265  0.060  0.162  0.060
800;960  0.961  0.551  0.489  0.539  0.489
La05 2 2B
110      0.543  0.303  0.043  0.186  0.042
220      0.565  0.299  0.049  0.193  0.049
880      0.844  0.530  0.367  0.480  0.367
100;120  0.527  0.318  0.050  0.201  0.050
200;240  0.536  0.258  0.069  0.195  0.069
800;960  0.833  0.519  0.377  0.473  0.377
La07 2 2A
110      0.644  0.249  0.058  0.162  0.058
220      0.789  0.293  0.138  0.236  0.138
880      1.000  0.819  0.783  0.816  0.783
100;120  0.645  0.221  0.066  0.151  0.066
200;240  0.805  0.265  0.129  0.220  0.129
800;960  0.998  0.810  0.780  0.807  0.780
La07 2 2B
110      0.566  0.224  0.026  0.112  0.026
220      0.644  0.234  0.066  0.165  0.066
880      0.936  0.643  0.505  0.621  0.505
100;120  0.547  0.249  0.059  0.174  0.059
200;240  0.661  0.250  0.097  0.215  0.097
800;960  0.935  0.649  0.562  0.631  0.562

Table 49
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are VM2,7;2,15

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.760  0.754  0.873  0.768  0.873
220      0.961  0.955  0.979  0.962  0.979
880      1.000  1.000  1.000  1.000  1.000
100;120  0.738  0.734  0.866  0.751  0.864
200;240  0.963  0.953  0.977  0.963  0.977
800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
110      0.688  0.667  0.829  0.698  0.829
220      0.930  0.926  0.960  0.932  0.960
880      1.000  1.000  1.000  1.000  1.000
100;120  0.672  0.650  0.822  0.682  0.819
200;240  0.932  0.921  0.962  0.933  0.962
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
110      0.877  0.860  0.947  0.882  0.946
220      0.987  0.986  0.999  0.988  0.999
880      1.000  1.000  1.000  1.000  1.000
100;120  0.872  0.862  0.952  0.873  0.951
200;240  0.989  0.987  0.997  0.990  0.997
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
110      0.772  0.765  0.897  0.782  0.895
220      0.981  0.979  0.993  0.981  0.993
880      1.000  1.000  1.000  1.000  1.000
100;120  0.804  0.789  0.916  0.812  0.916
200;240  0.983  0.982  0.994  0.987  0.994
800;960  1.000  1.000  1.000  1.000  1.000

Table 50
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are CN2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.678  0.153  0.071  0.126  0.076
220      0.616  0.104  0.060  0.082  0.058
880      0.580  0.058  0.052  0.055  0.052
100;120  0.683  0.191  0.074  0.143  0.090
200;240  0.638  0.100  0.048  0.083  0.049
800;960  0.601  0.058  0.054  0.056  0.054
La05 2 2B
110      0.675  0.149  0.068  0.109  0.072
220      0.606  0.094  0.050  0.075  0.052
880      0.602  0.064  0.051  0.059  0.051
100;120  0.660  0.158  0.070  0.123  0.083
200;240  0.600  0.086  0.040  0.074  0.039
800;960  0.572  0.052  0.039  0.049  0.039
La07 2 2A
110      0.591  0.107  0.033  0.075  0.033
220      0.573  0.071  0.041  0.060  0.041
880      0.585  0.058  0.050  0.055  0.050
100;120  0.628  0.111  0.032  0.072  0.032
200;240  0.572  0.078  0.045  0.061  0.045
800;960  0.566  0.059  0.050  0.054  0.050
La07 2 2B
110      0.610  0.114  0.042  0.084  0.042
220      0.593  0.091  0.056  0.076  0.056
880      0.560  0.074  0.066  0.072  0.066
100;120  0.614  0.112  0.052  0.087  0.052
200;240  0.614  0.080  0.041  0.069  0.041
800;960  0.578  0.058  0.047  0.050  0.047
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 51
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are CN2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.110  0.076  0.151  0.080  0.151
220      0.081  0.046  0.087  0.057  0.086
880      0.112  0.069  0.078  0.070  0.078
100;120  0.093  0.036  0.143  0.064  0.141
200;240  0.100  0.044  0.104  0.081  0.103
800;960  0.080  0.030  0.050  0.043  0.050
La05 2 2B
110      0.094  0.053  0.145  0.084  0.143
220      0.080  0.047  0.082  0.052  0.082
880      0.079  0.053  0.062  0.056  0.062
100;120  0.095  0.054  0.147  0.074  0.148
200;240  0.095  0.036  0.107  0.068  0.107
800;960  0.064  0.029  0.050  0.043  0.050
La07 2 2A
110      0.078  0.061  0.149  0.065  0.149
220      0.063  0.050  0.087  0.054  0.087
880      0.068  0.043  0.051  0.043  0.051
100;120  0.057  0.025  0.121  0.045  0.121
200;240  0.070  0.027  0.086  0.056  0.086
800;960  0.068  0.027  0.050  0.044  0.050
La07 2 2B
110      0.057  0.048  0.126  0.050  0.126
220      0.073  0.053  0.099  0.060  0.099
880      0.074  0.053  0.062  0.053  0.062
100;120  0.064  0.030  0.126  0.053  0.126
200;240  0.070  0.036  0.092  0.062  0.092
800;960  0.054  0.023  0.049  0.044  0.049
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 52
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are CN2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.648  0.036  0.011  0.024  0.011
220      0.660  0.020  0.010  0.018  0.010
880      0.666  0.022  0.020  0.021  0.020
100;120  0.632  0.045  0.024  0.030  0.023
200;240  0.676  0.026  0.017  0.023  0.017
800;960  0.674  0.012  0.007  0.010  0.007
La05 2 2B
110      0.688  0.043  0.016  0.029  0.016
220      0.674  0.020  0.016  0.017  0.016
880      0.674  0.022  0.015  0.019  0.015
100;120  0.666  0.024  0.011  0.016  0.011
200;240  0.667  0.017  0.008  0.014  0.008
800;960  0.669  0.016  0.017  0.017  0.017
La07 2 2A
110      0.643  0.034  0.012  0.026  0.012
220      0.668  0.027  0.019  0.022  0.019
880      0.640  0.015  0.014  0.014  0.014
100;120  0.636  0.034  0.009  0.021  0.009
200;240  0.666  0.026  0.013  0.020  0.013
800;960  0.661  0.016  0.012  0.015  0.012
La07 2 2B
110      0.676  0.040  0.014  0.029  0.014
220      0.680  0.023  0.016  0.019  0.016
880      0.652  0.016  0.015  0.015  0.015
100;120  0.672  0.024  0.013  0.019  0.013
200;240  0.672  0.015  0.009  0.014  0.009
800;960  0.658  0.023  0.018  0.021  0.018
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 53
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are CN2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.564  0.123  0.049  0.092  0.047
220      0.615  0.133  0.078  0.098  0.078
880      0.876  0.366  0.344  0.354  0.344
100;120  0.561  0.126  0.075  0.090  0.073
200;240  0.617  0.139  0.084  0.118  0.084
800;960  0.864  0.321  0.302  0.311  0.302
La05 2 2B
110      0.562  0.142  0.063  0.090  0.061
220      0.613  0.151  0.094  0.116  0.094
880      0.891  0.412  0.377  0.390  0.377
100;120  0.541  0.132  0.096  0.096  0.096
200;240  0.616  0.136  0.115  0.117  0.115
800;960  0.892  0.411  0.420  0.398  0.420
La07 2 2A
110      0.579  0.131  0.064  0.099  0.064
220      0.748  0.209  0.154  0.184  0.154
880      0.969  0.678  0.670  0.669  0.670
100;120  0.590  0.152  0.087  0.116  0.087
200;240  0.687  0.178  0.152  0.169  0.152
800;960  0.978  0.695  0.693  0.689  0.693
La07 2 2B
110      0.554  0.108  0.061  0.091  0.061
220      0.715  0.199  0.161  0.177  0.161
880      0.983  0.716  0.707  0.705  0.707
100;120  0.582  0.120  0.100  0.112  0.100
200;240  0.716  0.189  0.201  0.185  0.201
800;960  0.973  0.684  0.705  0.685  0.705

Table 54
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are CN2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.726  0.648  0.905  0.749  0.887
220      0.970  0.962  0.992  0.973  0.992
880      1.000  1.000  1.000  1.000  1.000
100;120  0.693  0.621  0.892  0.710  0.875
200;240  0.953  0.938  0.983  0.956  0.983
800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
110      0.698  0.634  0.874  0.713  0.866
220      0.935  0.918  0.976  0.934  0.976
880      1.000  1.000  1.000  1.000  1.000
100;120  0.652  0.576  0.854  0.669  0.850
200;240  0.937  0.911  0.976  0.941  0.976
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
110      0.875  0.863  0.968  0.877  0.967
220      0.995  0.993  0.998  0.995  0.998
880      1.000  1.000  1.000  1.000  1.000
100;120  0.889  0.879  0.956  0.894  0.956
200;240  0.994  0.992  0.998  0.995  0.998
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
110      0.795  0.781  0.909  0.805  0.909
220      0.983  0.982  0.993  0.983  0.993
880      1.000  1.000  1.000  1.000  1.000
100;120  0.790  0.784  0.914  0.802  0.914
200;240  0.975  0.973  0.988  0.976  0.988
800;960  1.000  1.000  1.000  1.000  1.000

Table 55
Type I Error Rates of Chi-Square Difference Tests for the Test of Weak Invariance When Data Are CN0,0;2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.443  0.153  0.086  0.139  0.090
220      0.377  0.106  0.075  0.099  0.076
880      0.344  0.068  0.059  0.065  0.059
100;120  0.362  0.138  0.086  0.119  0.087
200;240  0.318  0.072  0.060  0.072  0.059
800;960  0.305  0.079  0.068  0.073  0.068
La05 2 2B
110      0.340  0.151  0.097  0.134  0.094
220      0.289  0.078  0.061  0.076  0.061
880      0.332  0.061  0.053  0.058  0.053
100;120  0.337  0.132  0.097  0.136  0.100
200;240  0.285  0.076  0.065  0.083  0.066
800;960  0.261  0.058  0.057  0.058  0.057
La07 2 2A
110      0.360  0.092  0.046  0.071  0.046
220      0.343  0.074  0.053  0.066  0.053
880      0.332  0.059  0.047  0.053  0.047
100;120  0.318  0.117  0.067  0.090  0.067
200;240  0.296  0.073  0.053  0.062  0.053
800;960  0.322  0.058  0.055  0.058  0.055
La07 2 2B
110      0.310  0.104  0.052  0.081  0.052
220      0.274  0.067  0.047  0.061  0.047
880      0.294  0.058  0.056  0.062  0.056
100;120  0.257  0.089  0.060  0.081  0.060
200;240  0.250  0.071  0.052  0.062  0.052
800;960  0.279  0.058  0.052  0.058  0.052
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 56
Type I Error Rates of Chi-Square Difference Tests for the Test of Strong Invariance When Data Are CN0,0;2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.083  0.059  0.110  0.070  0.108
220      0.060  0.049  0.069  0.052  0.068
880      0.074  0.062  0.068  0.062  0.068
100;120  0.074  0.033  0.092  0.054  0.092
200;240  0.075  0.037  0.086  0.063  0.086
800;960  0.085  0.032  0.067  0.062  0.067
La05 2 2B
110      0.076  0.056  0.111  0.067  0.111
220      0.060  0.048  0.078  0.050  0.078
880      0.080  0.062  0.070  0.060  0.070
100;120  0.056  0.024  0.099  0.054  0.099
200;240  0.075  0.024  0.077  0.058  0.077
800;960  0.057  0.023  0.047  0.042  0.047
La07 2 2A
110      0.054  0.046  0.087  0.051  0.087
220      0.052  0.045  0.062  0.045  0.062
880      0.070  0.058  0.063  0.059  0.063
100;120  0.063  0.022  0.100  0.057  0.100
200;240  0.070  0.034  0.084  0.058  0.084
800;960  0.061  0.032  0.059  0.054  0.059
La07 2 2B
110      0.056  0.050  0.111  0.051  0.111
220      0.052  0.045  0.076  0.050  0.076
880      0.047  0.042  0.045  0.040  0.045
100;120  0.060  0.030  0.104  0.059  0.104
200;240  0.059  0.026  0.076  0.049  0.076
800;960  0.060  0.034  0.057  0.055  0.057
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 57
Type I Error Rates of Chi-Square Difference Tests for the Test of Strict Invariance When Data Are CN0,0;2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.407  0.064  0.084  0.049  0.083
220      0.402  0.059  0.071  0.054  0.071
880      0.389  0.020  0.019  0.019  0.019
100;120  0.392  0.066  0.076  0.057  0.077
200;240  0.385  0.047  0.060  0.040  0.060
800;960  0.386  0.030  0.029  0.028  0.029
La05 2 2B
110      0.411  0.069  0.085  0.059  0.084
220      0.386  0.043  0.057  0.038  0.057
880      0.411  0.028  0.030  0.026  0.030
100;120  0.383  0.083  0.075  0.064  0.075
200;240  0.387  0.057  0.060  0.055  0.060
800;960  0.390  0.023  0.024  0.021  0.024
La07 2 2A
110      0.384  0.041  0.067  0.035  0.067
220      0.375  0.046  0.068  0.041  0.068
880      0.381  0.019  0.029  0.019  0.029
100;120  0.367  0.066  0.081  0.054  0.081
200;240  0.376  0.038  0.043  0.035  0.043
800;960  0.367  0.032  0.037  0.032  0.037
La07 2 2B
110      0.412  0.065  0.096  0.061  0.096
220      0.410  0.025  0.044  0.025  0.044
880      0.393  0.020  0.020  0.020  0.020
100;120  0.397  0.072  0.085  0.058  0.085
200;240  0.379  0.048  0.056  0.044  0.056
800;960  0.358  0.026  0.029  0.026  0.029
Note. Type I error rates highlighted in green exceed the acceptable upper limit of 6.25%. Type I error rates highlighted in red fall below the acceptable lower limit of 3.75%.
Table 58
Power of Chi-Square Difference Tests for the Test of Beyond Strict Invariance When Data Are CN0,0;2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.417  0.170  0.163  0.143  0.152
220      0.475  0.156  0.160  0.137  0.159
880      0.873  0.537  0.518  0.526  0.518
100;120  0.373  0.124  0.134  0.101  0.130
200;240  0.478  0.176  0.182  0.165  0.182
800;960  0.871  0.593  0.591  0.592  0.591
La05 2 2B
110      0.358  0.094  0.047  0.056  0.046
220      0.492  0.150  0.064  0.130  0.064
880      0.899  0.587  0.404  0.574  0.404
100;120  0.390  0.101  0.053  0.072  0.051
200;240  0.510  0.158  0.068  0.151  0.068
800;960  0.894  0.620  0.497  0.621  0.497
La07 2 2A
110      0.455  0.189  0.182  0.174  0.181
220      0.654  0.288  0.257  0.271  0.257
880      0.990  0.903  0.883  0.900  0.883
100;120  0.420  0.155  0.156  0.146  0.156
200;240  0.648  0.281  0.267  0.283  0.267
800;960  0.982  0.891  0.887  0.893  0.887
La07 2 2B
110      0.477  0.134  0.058  0.108  0.057
220      0.638  0.225  0.075  0.195  0.075
880      0.990  0.891  0.786  0.886  0.786
100;120  0.422  0.115  0.053  0.108  0.053
200;240  0.627  0.237  0.124  0.243  0.124
800;960  0.982  0.873  0.818  0.874  0.818

Table 59
Rejection Rates of Chi-Square Difference Tests for the Test of Full Mean and Covariance Structure Invariance When Data Are CN0,0;2,10

N        D      DS0    DSB1   DSB10  DSBH
La05 2 2A
110      0.743  0.721  0.863  0.765  0.862
220      0.964  0.962  0.983  0.968  0.983
880      1.000  1.000  1.000  1.000  1.000
100;120  0.726  0.698  0.851  0.741  0.849
200;240  0.975  0.976  0.990  0.979  0.990
800;960  1.000  1.000  1.000  1.000  1.000
La05 2 2B
110      0.695  0.670  0.824  0.712  0.822
220      0.956  0.949  0.980  0.958  0.980
880      1.000  1.000  1.000  1.000  1.000
100;120  0.622  0.600  0.772  0.643  0.772
200;240  0.937  0.934  0.964  0.938  0.964
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2A
110      0.876  0.876  0.938  0.882  0.938
220      0.997  0.996  0.999  0.997  0.999
880      1.000  1.000  1.000  1.000  1.000
100;120  0.868  0.868  0.933  0.874  0.933
200;240  0.989  0.988  0.995  0.990  0.995
800;960  1.000  1.000  1.000  1.000  1.000
La07 2 2B
110      0.814  0.806  0.900  0.820  0.900
220      0.984  0.985  0.993  0.984  0.993
880      1.000  1.000  1.000  1.000  1.000
100;120  0.792  0.798  0.894  0.806  0.894
200;240  0.981  0.983  0.990  0.983  0.990
800;960  1.000  1.000  1.000  1.000  1.000
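As a reference for reading these tables, the DSB1 statistic compared above is computed from the unscaled chi-square values, degrees of freedom, and scaling correction factors of two nested fitted models, following Satorra and Bentler (2001). The sketch below is a minimal illustration of that published formula, not the simulation code used in this thesis; the function name is mine. It also flags the case that motivates DSB10: the pooled scaling factor can be negative in small samples, making the statistic unusable.

```python
def scaled_chi_square_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference test, DSB1.

    t0, df0, c0 -- unscaled ML chi-square, degrees of freedom, and scaling
                   correction factor of the more restricted (nested) model.
    t1, df1, c1 -- the same quantities for the less restricted model.
    Returns (scaled difference statistic, degrees of freedom of the test).
    """
    df_diff = df0 - df1
    # Pooled scaling factor for the difference test: a df-weighted
    # combination of the two models' correction factors.
    c_diff = (df0 * c0 - df1 * c1) / df_diff
    if c_diff <= 0:
        # In small samples c_diff (and hence the statistic) can be
        # negative; the strictly positive correction DSB10
        # (Satorra & Bentler, 2010) was developed to avoid this.
        raise ValueError("Negative scaling factor: DSB1 is undefined here")
    return (t0 - t1) / df_diff * 0 + (t0 - t1) / c_diff, df_diff
```

For example, with a restricted model at T = 100, df = 50, c = 1.2 and a less restricted model at T = 80, df = 45, c = 1.1, the pooled scaling factor is (50·1.2 − 45·1.1)/5 = 2.1, and the scaled difference is 20/2.1 ≈ 9.52 on 5 degrees of freedom.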