UBC Theses and Dissertations

Applying modern machine learning to the number of latent variables problem in principal components analysis and principal axis factoring

Draper, Zakary Andrew


The related questions of how many components to retain in principal components analysis and how many factors to retain in principal axis factoring have been the subject of many studies over the past hundred years. Retaining too many or too few components or factors may lead to the development of constructs based on erroneous findings. There are many component and factor retention rules; however, because the validity of these rules often depends on the characteristics of the data being tested, no single rule is valid for all datasets. This paper presents a new approach to component and factor retention: using machine learning to incorporate information from several previously developed retention rules (including parallel analysis, the minimum average partial test, and others) into a single classification function. Four classifiers were trained to predict the number of components or factors in simulated datasets. Three of these classifiers provided the highest overall accuracy of the rules tested and were unbiased in their predictions across 129,600 samples. The best classifier showed an absolute increase in accuracy of 10.9% compared to the most accurate traditional retention rule. These results suggest that use of machine learning classification could substantially improve confidence in exploratory factor analysis findings.
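
The general idea described in the abstract, feeding the estimates of several retention rules into one classifier trained on simulated data with a known number of factors, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's actual method: the simple-structure simulation design, the choice of only two rules (the eigenvalue-greater-than-one rule and a basic parallel analysis), the k-nearest-neighbours classifier standing in for the four unnamed classifiers, and all function names and parameter values.

```python
import numpy as np


def simulate(n_obs, n_vars, n_factors, rng, loading=0.7):
    # Hypothetical simple-structure model: each variable loads on one factor.
    loadings = np.zeros((n_vars, n_factors))
    loadings[np.arange(n_vars), np.arange(n_vars) % n_factors] = loading
    factors = rng.standard_normal((n_obs, n_factors))
    noise = rng.standard_normal((n_obs, n_vars)) * np.sqrt(1.0 - loading**2)
    return factors @ loadings.T + noise


def eigenvalues(data):
    # Eigenvalues of the correlation matrix, largest first.
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


def kaiser_rule(ev):
    # Count eigenvalues greater than one.
    return int(np.sum(ev > 1.0))


def parallel_analysis(ev, n_obs, rng, n_sims=20):
    # Count leading eigenvalues that exceed the 95th percentile of
    # eigenvalues obtained from random normal data of the same shape.
    n_vars = len(ev)
    random_ev = np.array(
        [eigenvalues(rng.standard_normal((n_obs, n_vars))) for _ in range(n_sims)]
    )
    threshold = np.percentile(random_ev, 95, axis=0)
    above = ev > threshold
    return n_vars if above.all() else int(np.argmin(above))


def rule_features(data, rng):
    # One feature per retention rule; a fuller version would add more rules.
    ev = eigenvalues(data)
    return [kaiser_rule(ev), parallel_analysis(ev, data.shape[0], rng)]


def make_dataset(n_per_class, rng, n_obs=300, n_vars=12, max_factors=3):
    # Simulated samples labelled with the true number of factors.
    features, labels = [], []
    for k in range(1, max_factors + 1):
        for _ in range(n_per_class):
            features.append(rule_features(simulate(n_obs, n_vars, k, rng), rng))
            labels.append(k)
    return np.array(features, dtype=float), np.array(labels)


def knn_predict(train_X, train_y, x, k=5):
    # Majority vote among the k nearest training samples (stand-in classifier).
    dist = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dist)[:k]]
    return int(np.bincount(nearest).argmax())


rng = np.random.default_rng(0)
train_X, train_y = make_dataset(20, rng)
test_X, test_y = make_dataset(10, rng)
pred = np.array([knn_predict(train_X, train_y, x) for x in test_X])
accuracy = float(np.mean(pred == test_y))
print(f"held-out accuracy: {accuracy:.2f}")
```

The simulated conditions here are deliberately easy (strong loadings, few factors), so the sketch says nothing about the accuracy figures reported in the abstract; it only shows how rule outputs can serve as classifier inputs.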

Attribution-NonCommercial-NoDerivatives 4.0 International