Two-stage maximum likelihood approach for item-level missing data in regression

UBC Theses and Dissertations

Two-stage maximum likelihood approach for item-level missing data in regression Chen, Lihan

Abstract

Psychologists often use scales composed of multiple items to measure underlying constructs, such as well-being, depression, and personality traits. Missing data often occurs at the item level. For example, participants may skip items on a questionnaire for various reasons. If variables in the dataset can account for the missingness, the data is missing at random (MAR). Modern missing data approaches can deal with MAR missing data effectively, but existing analytical approaches cannot accommodate item-level missing data. A very common practice in psychology is to average all available items to produce scale means when there is missing data. This approach, called available-case maximum likelihood (ACML), may produce biased results in addition to incorrect standard errors. Another approach is scale-level full information maximum likelihood (SL-FIML), which treats the whole scale as missing if even one item is missing. SL-FIML is inefficient and prone to bias. A new analytical approach, called the two-stage maximum likelihood approach (TSML), was recently developed as an alternative (Savalei & Rhemtulla, 2017b). The original work showed that the method outperformed ACML and SL-FIML in structural equation models with parcels. The current simulation study examined the performance of ACML, SL-FIML, and TSML in the context of bivariate regression. It was shown that when item loadings or item means are unequal within the composite, ACML and SL-FIML produced biased estimates of regression coefficients under MAR. Outside of convergence issues when the sample size is small and the number of variables is large, TSML performed well in all simulated conditions, showing little bias, high efficiency, and good coverage. Additionally, the current study investigated how changing the strength of the MAR mechanism may lead to drastically different conclusions in simulation studies. A preliminary definition of MAR strength is provided in order to demonstrate its impact. Recommendations are made for future simulation studies on missing data.
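
The two scale-scoring rules contrasted in the abstract can be illustrated with a minimal sketch. The data, function names, and variable names below are hypothetical and do not come from the thesis. The sketch shows only how the composite scores are formed before analysis; it does not implement maximum likelihood estimation or TSML.

```python
import math

# Hypothetical responses: 4 participants x 3 items of one scale.
# math.nan marks an item the participant skipped.
items = [
    [2.0, 4.0, 6.0],
    [2.0, 4.0, math.nan],
    [1.0, math.nan, 5.0],
    [3.0, 5.0, 7.0],
]

def available_case_mean(row):
    """Average whichever items were answered (the composite used by ACML)."""
    answered = [v for v in row if not math.isnan(v)]
    return sum(answered) / len(answered) if answered else math.nan

def scale_level_mean(row):
    """Treat the whole scale as missing if any item is missing (as in SL-FIML)."""
    if any(math.isnan(v) for v in row):
        return math.nan
    return sum(row) / len(row)

ac = [available_case_mean(r) for r in items]
sl = [scale_level_mean(r) for r in items]

print(ac)  # available-item composites
print(sl)  # scale-level composites; incomplete rows become nan
```

In this toy data, the first two participants gave identical answers to the items they both completed, yet the second participant's available-item mean is 3.0 rather than 4.0 because the skipped item is the one with the highest scores. This is one way to see why unequal item means can distort available-item composites, while the scale-level rule discards the partial information from the two incomplete rows entirely.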

Item Metadata

Title	Two-stage maximum likelihood approach for item-level missing data in regression
Creator	Chen, Lihan
Publisher	University of British Columbia
Date Issued	2017
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2017-08-18
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0354497
URI	http://hdl.handle.net/2429/62724
Degree	Master of Arts - MA
Program	Psychology
Affiliation	Arts, Faculty of; Psychology, Department of
Degree Grantor	University of British Columbia
Graduation Date	2017-09
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
