ASSESSING UNIDIMENSIONALITY OF PSYCHOLOGICAL SCALES: USING INDIVIDUAL AND INTEGRATIVE CRITERIA FROM FACTOR ANALYSIS

by

SUZANNE LYNN SLOCUM

M.A., Boston College (1999)
B.A. (Honors Economics), Albion College (1995)

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Measurement, Evaluation and Research Methodology Program

THE UNIVERSITY OF BRITISH COLUMBIA

May 2005

© Suzanne Lynn Slocum, 2005

Abstract

Whenever one uses a composite scale score from item responses, one is tacitly assuming that the scale is dominantly unidimensional. Investigating the unidimensionality of item response data is an essential component of construct validity. Yet there is no universally accepted technique or set of rules to determine the number of factors to retain when assessing the dimensionality of item response data. Typically, factor analysis is used with the eigenvalues-greater-than-one rule, the ratio of first-to-second eigenvalues, parallel analysis (PA), the root-mean-square-error-of-approximation (RMSEA), or hypothesis testing approaches involving chi-square tests from Maximum Likelihood (ML) or Generalized Least Squares (GLS) estimation. The purpose of this study was to investigate, via a computer-simulated design, how these various procedures perform individually and in combination when assessing the unidimensionality of item response data. Conditions such as sample size, magnitude of communality, distribution of item responses, proportion of communality on the second factor, and the number of items with non-zero loadings on the second factor were varied. Results indicate that no single decision-making method identified unidimensionality under all of the conditions manipulated.
All individual decision-making methods failed to detect unidimensionality when sample size was small, magnitude of communality was low, and item distributions were skewed. In addition, combination methods performed better than any one individual decision-making rule under certain sets of conditions. A set of guidelines and a new statistical methodology are provided for researchers. A future program of research is also illustrated.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter I: Introduction to the Problem
    Background
    Problem Under Investigation
    Application of the Decision-Making Rules and Indices
    Objectives and Purpose of the Study

Chapter II: Introduction to the Factor Analytic Model
    Dimensionality
        Strict and Essential Unidimensionality
    Factor Analysis Procedures and Concepts
        Factor Analytic Model
        Communality
        Assumptions of the Common Factor Model
        Correlated and Uncorrelated Factor Models
        Eigenvalues and Eigenvectors

Chapter III: Review of the Relevant Literature
    Methods for Determining the Number of Factors to Retain
        Definition of Decision-Making Rules and Indices
    Item Response Theory
        Investigating the Dimensionality of Test Item Data: The Role of IRT
        Simulation Studies
    Investigating the Recovery of Population Factor Structure
        Theoretical Development of the Sources of Error in Factor Analysis
        Overdetermination
        Simulation Studies
    Previous Research Findings on Decision-Making Rules and Indices
    Research Questions
        Research Question One
        Research Question Two
        Research Question Three
    Rationale and General Predictions of This Study
    Meaningful and Appropriate Number of Factors: Findings and Recommendations
        Appropriate Number of Factors to Retain in This Dissertation
    Significance of the Dissertation
        Scope of the Study
    Current Research Practices

Chapter IV: An Investigation of the Current Research Practices of Assessing Dimensionality via Exploratory Factor Analysis
    Introduction to the Problem
    Purpose of the Investigation
    Previous Surveys
    Methods
        Criteria Recorded
    Results
        Results from Recorded Criteria
        Summary
    Conclusion

Chapter V: Computer Simulation Methodology
    Strict Unidimensionality
        Conditions for the Magnitude of Communality
        Conditions for the Sample Size
        Conditions for Distributions
    Essential Unidimensionality
        Conditions for the Proportion of Communality on Secondary Factor
        Conditions for the Number of Test Items with Non-Zero Loadings on the Second Factor
    Factors Held Constant
    Dependent Variables: Decision-Making Rules and Indices
    Procedures
    Data Analyses
        Research Questions
        Criteria for Best and Optimal Performance
    Software

Chapter VI: Results
    Data Check
    Research Question 1A
        Strict Unidimensionality
        Essential Unidimensionality
    Research Question 1B
        Strict Unidimensionality
        Essential Unidimensionality
    Research Questions 2A and 2B
        Strict Unidimensionality
        Combinations for Strict Unidimensionality
        Essential Unidimensionality
        Combinations for Essential Unidimensionality
    Research Question 3A and 3B
        Strict Unidimensionality
        Essential Unidimensionality
    Summary for Strict Unidimensionality
    Summary for Essential Unidimensionality

Chapter VII: Discussion
    General Predictions of the Study
    Contributions
        Guidelines
    Limitations and Future Program of Research

References

Appendix A: Details of Criteria Reported in Journals

List of Tables

Table 1. Decision-Making Rules and Indices with the Students in My Classroom Scale
Table 3. Summary of Criteria Reported in Journals
Table 4. Example of the Relationship between Communality and Factor Loadings
Table 5. Example of Population Pattern Matrix: Essential Unidimensional
Table 6. Strict Unidimensional Results of the Mean Accuracy Rates for Rules and Indices
Table 7. Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 0.0 and Communality of 0.20)
Table 8. Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 0.00 and Communality of 0.90)
Table 9. Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 2.50 and Communality of 0.20)
Table 10. Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 2.50 and Communality of 0.90)
Table 11. Binary Logistic Regression Results for Main Effects: Strict Unidimensionality
Table 12. Binary Logistic Regression Results for Main Effects: Essential Unidimensionality
Table 13. List of New Combination Rules for Strict Unidimensionality
Table 14. List of New Combination Rules for Essential Unidimensionality
Table 15. Strict Unidimensional Results of the Mean Accuracy Rates for the Four Main Groups
Table 16. Strict Unidimensional Results of the Mean Accuracy Rates for the New Rules
Table 17. Essential Unidimensional Results of the Mean Accuracy Rates for the Four Main Groups
Table 18. Essential Unidimensional Results of the Mean Accuracy Rates for the New Rules
Table 19. Recommended Combination Rules for Strict Unidimensionality
Table 20. Recommended Individual Rules for Strict Unidimensionality
Table 21. Recommended Combination Rules for Essential Unidimensionality
Table 22. Recommended Individual Rules for Essential Unidimensionality

List of Figures

Figure 1. Students in My Classroom Scale
Figure 2. Distribution of the Total Score for the Students in My Classroom Scale
Figure 3. Combined Scree Plots for the Continuous PA, Ordinal PA, and Student Data
Figure 4. Flowchart of Simulation Methodology
Figure 5. Chi-square Statistic for Strict Unidimensionality: Effect of Sample Size
Figure 6. Chi-square Statistic for Strict Unidimensionality: Effect of Skewness
Figure 7. Chi-square Statistic for Strict Unidimensionality: Effect of the Magnitude of Communality
Figure 8. Eigenvalue Rules for Strict Unidimensionality: Effect of Sample Size
Figure 9. Eigenvalue Rules for Strict Unidimensionality: Effect of Skewness
Figure 10. Eigenvalue Rules for Strict Unidimensionality: Effect of the Magnitude of Communality
Figure 11. Parallel Analysis for Strict Unidimensionality: Effect of Sample Size
Figure 12. Parallel Analysis for Strict Unidimensionality: Effect of Skewness
Figure 13. Parallel Analysis for Strict Unidimensionality: Effect of the Magnitude of Communality
Figure 14. RMSEA for Strict Unidimensionality: Effect of Sample Size
Figure 15. RMSEA for Strict Unidimensionality: Effect of Skewness
Figure 16. RMSEA for Strict Unidimensionality: Effect of the Magnitude of Communality
Figure 17. Chi-square Statistic for Essential Unidimensionality: Effect of Sample Size
Figure 18. Chi-square Statistic for Essential Unidimensionality: Effect of Skewness
Figure 19. Chi-square Statistic for Essential Unidimensionality: Effect of the Magnitude of Communality
Figure 20. Chi-square Statistic for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor
Figure 21. Chi-square Statistic for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor
Figure 22. Eigenvalue Rules for Essential Unidimensionality: Effect of Sample Size
Figure 23. Eigenvalue Rules for Essential Unidimensionality: Effect of Skewness
Figure 24. Eigenvalue Rules for Essential Unidimensionality: Effect of the Magnitude of Communality
Figure 25. Eigenvalue Rules for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor
Figure 26. Eigenvalue Rules for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor
Figure 27. Parallel Analysis for Essential Unidimensionality: Effect of Sample Size
Figure 28. Parallel Analysis for Essential Unidimensionality: Effect of Skewness
Figure 29. Parallel Analysis for Essential Unidimensionality: Effect of the Magnitude of Communality
Figure 30. Parallel Analysis for Essential Unidimensionality: Effect of the Proportion of Communality on Second Factor
Figure 31. Parallel Analysis for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor
Figure 32. RMSEA for Essential Unidimensionality: Effect of Sample Size
Figure 33. RMSEA for Essential Unidimensionality: Effect of Skewness
Figure 34. RMSEA for Essential Unidimensionality: Effect of the Magnitude of Communality
Figure 35. RMSEA for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor
Figure 36. RMSEA for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor

Acknowledgements

I would like to acknowledge and thank my academic and dissertation supervisor, Dr. Bruno D. Zumbo. His immeasurable support with my research and professional growth is greatly valued and appreciated. I would also like to acknowledge my supervisory committee members, Dr. Bruno D. Zumbo, Dr. Anita Hubley, and Dr. Susan James, for being incredibly thorough and flexible. I thank each of you from the bottom of my heart. Finally, I would like to thank my parents, Frederick and Jill Slocum. My parents have been extraordinarily supportive and giving throughout my education, and I don't know what I would have done without them.
Chapter I

Introduction to the Problem

The purpose of this chapter is to provide a general introduction to the research problem investigated in this dissertation. In doing so, the current chapter presents the background to the research questions and highlights the purpose and objectives of the study. In addition, the research problem is set in context through an example.

Background

Millions of people are tested every year. Inferences made from the results of these tests are used to make high-stakes decisions in education and social-welfare systems, as well as for diagnosis and treatment in health care systems. The test scores themselves are also used for research purposes. The inferences made from these test scores are highly dependent upon the accuracy and validity of the interpretation of the test results. According to the Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1999), validity refers to "the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (p. 9). As Zumbo and Hubley (1996) state, a test score is not necessarily directly linked to the construct that is being measured, but should instead be considered one of various indicators of the construct. Furthermore, Messick (1975) and Guion (1977) claim that in order to adequately assess the meaningfulness of an inference, one should have some kind of confirmation of what the test score itself actually exhibits. Therefore, in order to properly assess the appropriateness, meaningfulness, and usefulness of the inferences made from test scores, one needs to take construct validity into consideration. Construct validity seeks agreement between a theoretical concept and a specific measurement (e.g., a test).
Messick (1998) has claimed that one major concern for the validity of inferences is the unanticipated or negative consequences of test score interpretation, which can often be traced to construct under-representation or construct-irrelevant variance. If a measurement instrument excludes essential facets of a construct, or if the instrument includes facets that may be irrelevant, the inferences made from the test scores could involve a variety of inappropriate consequences (e.g., social, political, or health). For example, if an instrument intended to measure depression actually included secondary facets of anxiety, the inferences made from the test scores could lead to inappropriate diagnosis and treatment. A patient may be prescribed a medication that treats only depression, whereas a more appropriate treatment may be a different medication or therapy.

One way, among many, to prevent inappropriate consequences from test score interpretation is to focus on the theoretical dimensions of the construct a test is intended to measure. Investigating the dimensionality (i.e., the structure of a specific phenomenon) of tests is an essential component of construct validity. In fact, construct validity has often been referred to as factorial validity (Thompson & Daniel, 1996). Furthermore, when the items of a test are summed to one total score, there is a tacit assumption that the test is measuring one dimension and that this dimension is measured on a continuum. If, for example, a measure is actually multidimensional yet is being treated as unidimensional, interpretation of the test scores can be invalid and entail harmful consequences, as suggested in the previous example of measuring depression.

Problem Under Investigation

Investigating the dimensionality of test data is one of the most commonly encountered activities in day-to-day research. The structure (i.e., dimensionality) of item response data is composed of a certain number of factors.
The number of factors retained in a factor analysis not only explains the structure of the underlying phenomena, but also accounts for the relationships between the test items. The decision of how many factors to retain is critical, and it tends to be one of the most common problems facing researchers who assess the dimensionality of measures (Crawford, 1975; Fabrigar, Wegener, MacCallum, & Strahan, 1999). When assessing dimensionality, the goal is to obtain factor solutions that are reliable and mirror the population factor structure. Several researchers have noted that this decision can result in an erroneous number of factors (e.g., Crawford, 1975; Fabrigar et al., 1999; Russell, 2002). Moreover, over- and under-extracting the number of factors can lead to inappropriate inferences and decisions (e.g., high-stakes measurement decisions or inappropriate interventions). Yet there is no universally accepted technique or set of rules to determine the number of factors to retain when assessing the dimensionality of item response data.

Previous research has shown that many of the decision-making rules and indices that are commonly used to identify dimensionality do not work efficiently under certain conditions (e.g., small sample sizes), over- and under-extract the number of factors, and have a limited range of accuracy (Gorsuch, 1983; Fabrigar et al., 1999; Russell, 2002). Most importantly, however, observations made by Hattie (1985) and Lord (1980) still hold today. Specifically, Hattie noted that there has been no empirical study that has examined the efficiency of combinations of these rules, indices, and methods. Lord declared that there is a need for an index or rule to define unidimensionality, in particular.
Furthermore, it has been recommended that researchers apply multiple criteria when deciding on the appropriate number of factors to retain (Gessaroli & De Champlain, 1996; Fabrigar et al., 1999; Davison & Sireci, 2000). In fact, as demonstrated in Chapter IV, researchers are actually utilizing multiple criteria, but the rationale behind the combinations is unclear and the selected combinations do not necessarily prove to be effective.

Although there are numerous rules and indices that researchers can utilize in order to determine the dimensionality of item response data, there are a select few that have been suggested in the literature. These rules have been noted as being commonly used and, under certain conditions, successful in determining dimensionality (Hattie, 1984; Fabrigar et al., 1999; Russell, 2002). Such rules and indices include (1) the chi-square statistic from Maximum Likelihood (ML) factor analysis, (2) the chi-square statistic from Generalized Least Squares (GLS) factor analysis, (3) the eigenvalues-greater-than-one rule, (4) the ratio-of-first-to-second-eigenvalues-greater-than-three rule, (5) the ratio-of-first-to-second-eigenvalues-greater-than-four rule, (6) parallel analysis (PA) using continuous data, (7) parallel analysis (PA) using ordinal or rating scale (Likert) data, (8) the Root Mean Square Error of Approximation (RMSEA) index from ML estimation, and (9) the Root Mean Square Error of Approximation (RMSEA) index from GLS estimation. These nine decision-making rules and indices will be described in Chapter III.

Although there are inherent differences between decision-making rules and decision-making methods when determining the dimensionality of measures, this study uses the terms interchangeably. There are numerous methods, procedures, approaches, statistics, rules, and indices that are used by researchers in order to assess the dimensionality of measures.
For example, the eigenvalues-greater-than-one rule entails a mathematical foundation that actually makes this rule an index (Kaiser, 1960). Parallel analysis is often considered a procedure, whereas the RMSEA is an index. The chi-square test produces a statistic, and a scree plot is used as an approach (Fabrigar et al., 1999). In summary, while there are theoretical and practical differences among these terms, in order to establish a common vocabulary, 'decision-making rules and indices' will be used throughout the dissertation to refer to the various techniques that are used to determine dimensionality.

As mentioned previously, the validity of inferences based on test scores is grounded upon the dimensional structure of a test. The scoring of item response data rests on an implicit, but grounding, principle: when test items are summed to one total scale score, there is a tacit assumption that the test is unidimensional (i.e., has one factor). Because these total scale scores are often used for interpretation in the measurement field, it is especially important to focus on this assumption. If items are summed to a total scale score, inferences are made with regard to the one construct (i.e., factor). Lord's (1980) claim that there is a need for an index or rule to define unidimensionality is critical to the measurement process of assessing the dimensionality of tests. Because there is no universal and sound index or methodology to assess unidimensional measures, one may question the validity of the inferences made from total scale scores. This dissertation focused on developing a (universal) methodology in order to assess unidimensionality. With such a methodology, researchers could have more assurance that the inferences made from test scores are valid. Multidimensional measures are also used in the assessment field.
However, in order to develop a baseline methodology for assessing the structure of tests, there is a need to start at the unidimensional level. Consequently, a methodology can be generalized and enhanced (i.e., built upon) to assess multidimensional measures with confidence and accuracy. Unidimensional measures, in particular, could have a variety of different factor structures, ranging from structures with one and only one factor to measures with one dominant factor that include secondary, minor dimensions.

As mentioned above, this dissertation focuses on nine decision-making rules and indices that are commonly used to assess unidimensional measures. An example is provided below in order to put the use of these nine individual decision-making rules and indices in context.

Application of the Decision-Making Rules and Indices

Students in My Classroom (Battistich, Solomon, Kim, Watson, & Schaps, 1995) is a 14-item scale that measures the degree to which students feel their classmates are supportive (i.e., measures the students' sense of the school as a community). A total scale score is computed by summing the 14 items, and each item is rated on a 5-point scale. Items 5, 9, and 10 are reverse coded. The scale can be seen in Figure 1 below.

Figure 1
Students in My Classroom Scale

Directions: For the following sayings, think about yourself and people your age when you answer. For each sentence, circle the number that describes how true it is for you.

Response scale: 1 = Disagree a lot; 2 = Disagree a little; 3 = Don't agree or disagree; 4 = Agree a little; 5 = Agree a lot

Items (each rated 1 2 3 4 5):
1. ...willing to go out of their way to help...
2. ...care about my work...
3. ...like a family...
4. ...really care about each other...
5. ...put others down...
6. ...help each other learn...
7. ...help each other...
8. ...get along together very well...
9. ...just look out for themselves...
10. ...mean to each other...
11. ...trouble with my school work...
12. ...treat each other with respect...
13. ...work together to solve problems...
14. ...everyone in the class feels good...

Note: Due to copyright restrictions, the complete scale is not reproduced. Instead, key words are provided to help the reader understand the context of the questions.

The scale was administered to 450 students in 28 different rural and urban schools in British Columbia, Canada. The students' ages ranged from 8 to 14 years, with an average age of 11.3 years and a median age of 11 years. The sample consisted of 227 boys (50.4%) and 223 girls (49.6%). The scale's internal consistency (i.e., Cronbach's alpha) for these data was 0.78. The distributions of the items were slightly skewed (i.e., looking across the items, the skewness indices ranged from -0.352 to 0.357). The total scale score ranged from 22 to 70, with an average of 43.4 and a median of 44.0. The distribution of the total scale score can be seen in Figure 2 below.

Figure 2
Distribution of the Total Score for the Students in My Classroom Scale
[Histogram; horizontal axis: Total Scale Score]

The nine decision-making rules and indices were applied to these data in order to determine the number of factors to retain. Previous research has shown the Students in My Classroom Scale to be a unidimensional measure (Roberts, Horn & Battistich, 1995). When conducting a principal components analysis (PCA) on the current data set, the first eigenvalue (6.25) accounted for 44.67% of the total variance, the second eigenvalue (1.52) accounted for 10.863% of the total variance, and the third eigenvalue (0.96) accounted for 6.85% of the total variance.
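As a rough check on these figures, the share of variance attributed to each principal component of a correlation matrix is its eigenvalue divided by the number of items, and the three eigenvalue-based rules named earlier in this chapter reduce to simple comparisons. The sketch below assumes the three-decimal eigenvalues given in this example (6.254, 1.521, and 0.958); the function and variable names are illustrative only and are not taken from the software used in this dissertation.

```python
# Eigenvalues reported for the Students in My Classroom data (three decimals).
# Only the first three are reported; the remaining eigenvalues are smaller
# than the third, so they cannot change the count of eigenvalues above one.
N_ITEMS = 14
reported_eigenvalues = [6.254, 1.521, 0.958]


def percent_variance(eigenvalue, n_items):
    """Percentage of total variance for one component of a correlation matrix."""
    return 100.0 * eigenvalue / n_items


def count_eigenvalues_above_one(eigenvalues):
    """Eigenvalues-greater-than-one rule: number of factors to retain."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def first_to_second_ratio(eigenvalues):
    """Ratio of the first to the second eigenvalue."""
    return eigenvalues[0] / eigenvalues[1]


percents = [round(percent_variance(ev, N_ITEMS), 2) for ev in reported_eigenvalues]
n_above_one = count_eigenvalues_above_one(reported_eigenvalues)
ratio = first_to_second_ratio(reported_eigenvalues)

print("Percent of variance:", percents)
print("Eigenvalues above one:", n_above_one)
print("First-to-second ratio:", round(ratio, 2))
print("Ratio > 3:", ratio > 3, "| Ratio > 4:", ratio > 4)
```

Run as written, this gives about 44.67%, 10.86%, and 6.84% of the variance, two eigenvalues above one, and a ratio of about 4.11. The small differences from the percentages quoted in the text presumably reflect rounding of the eigenvalues.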
The results of applying all nine decision-making rules and indices indicate that two methods identified the Students in My Classroom Scale as unidimensional: the ratio of first-to-second-eigenvalues-greater-than-three rule and the ratio of first-to-second-eigenvalues-greater-than-four rule. First, as stated above, the third eigenvalue is 0.958, the second eigenvalue is 1.52, and the first is 6.25. According to the eigenvalues-greater-than-one rule, two factors would be retained. In addition, PA for continuous data and PA for ordinal data generated random data with the first two eigenvalues less than the first two eigenvalues of the real data (i.e., the student data), and the third eigenvalue from both the PA continuous and PA ordinal data was greater than the third eigenvalue of the student data. Therefore, if using PA for continuous and PA for ordinal data as approaches for determining the number of factors to retain, two factors would be retained. The scree plot for the student data, random continuous PA data, and random ordinal PA data can be seen in Figure 3 below. As shown, the number of leading eigenvalues from the real data (i.e., student data) that exceed the corresponding eigenvalues from the random data indicates the number of factors to retain, which was two factors in this context.

Figure 3
Combined Scree Plots for the Continuous PA, Ordinal PA, and Student Data
[Scree plot; series: Eigenvalues for Ordinal PA Data, Eigenvalues for Continuous PA Data, Eigenvalues for Student Data]

Furthermore, the ML and GLS chi-square tests were statistically significant, which indicates that a unidimensional model does not fit the data. Finally, the values for the ML RMSEA (0.11) and GLS RMSEA (0.09) were above 0.05, which indicates that a unidimensional model is not a good fit (Fabrigar et al., 1999). Therefore, a unidimensional model did not fit the data according to both RMSEA indices.
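The parallel analysis comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this dissertation: it draws normally distributed random data, uses the mean random eigenvalue as the reference (a high percentile is a common alternative), and counts the leading eigenvalues of the real data that exceed their random counterparts. The function name, its parameters, and the simulated one-factor data set are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_reps=100, seed=0):
    """Count leading eigenvalues of the data's correlation matrix that exceed
    the mean eigenvalues obtained from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n_persons, n_items = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from repeated random data sets of the same size.
    random_eigs = np.empty((n_reps, n_items))
    for rep in range(n_reps):
        noise = rng.standard_normal((n_persons, n_items))
        noise_corr = np.corrcoef(noise, rowvar=False)
        random_eigs[rep] = np.sort(np.linalg.eigvalsh(noise_corr))[::-1]
    reference = random_eigs.mean(axis=0)

    # Retain factors until an observed eigenvalue no longer exceeds the reference.
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors, observed, reference


# Hypothetical demonstration: 450 persons, 14 items, one common factor,
# every item loading 0.7 on that factor (illustrative values only).
demo_rng = np.random.default_rng(1)
n_persons, n_items, loading = 450, 14, 0.7
factor = demo_rng.standard_normal((n_persons, 1))
unique = demo_rng.standard_normal((n_persons, n_items))
items = loading * factor + np.sqrt(1.0 - loading**2) * unique

n_factors, observed, reference = parallel_analysis(items)
print("Factors retained:", n_factors)
```

Using the mean rather than a percentile keeps the sketch short; a stricter reference would retain the same or fewer factors.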
On the other hand, the ratio of the first-to-second eigenvalues was 4.11, which is greater than both three and four. Therefore, both the ratio of first-to-second-eigenvalues-greater-than-three rule and the ratio of first-to-second-eigenvalues-greater-than-four rule identified this scale as unidimensional. Table 1 below shows the results from applying the nine decision-making rules and indices to this sample data. It is important not only to recognize the difference in the performance of these nine decision-making rules and indices, but also to acknowledge the inconclusive findings with regard to the number of factors to retain. Which one of the nine decision-making rules is most reliable? In other words, which one of the rules should be used in determining the number of factors to retain? It is examples like the Students in My Classroom Scale that provided motivation for the investigations that were conducted in this dissertation.

Table 1
Decision-Making Rules and Indices with the Students in My Classroom Scale

Rule                        Decision Criteria                                Decision of Unidimensionality
Chi-square ML               Chi-square = 429.97, df = 77, p = .0001          no
Chi-square GLS              Chi-square = 290.11, df = 77, p = .0001          no
Eigenvalue > 1 Rule         λ1 = 6.254, λ2 = 1.521                           no
Eigenvalue Ratio > 3 Rule   6.254 / 1.521 = 4.11                             yes
Eigenvalue Ratio > 4 Rule   6.254 / 1.521 = 4.11                             yes
PA Continuous Method        λ (random) = 1.209, λ (real data) = 6.254        no
PA Likert Method            λ (random) = 1.207, λ (real data) = 6.254        no
RMSEA ML                    0.11                                             no
RMSEA GLS                   0.09                                             no

Objectives and Purpose of the Study

The purpose of this dissertation is to extend previous research by investigating how the nine decision-making rules and indices perform individually and in combination under varying conditions (e.g., different sample sizes or magnitude of communality) when assessing the underlying unidimensionality of item response data that are often found in psychological, educational, and health measurements (e.g., subjective well-being, depression, motivation).
The overall objective of the present study is to provide guidelines to assist the social and behavioral science researcher in the decision-making process of retaining factors in an assessment of unidimensionality. With an eye toward addressing the research goals, the factor analytic model is reviewed in Chapter II, which includes a theoretical and statistical perspective on dimensionality. Relevant psychometric research literature on the assessment of dimensionality is reviewed in Chapter III. Given that a focus of this dissertation is informing day-to-day research practice, a short note summarizing how researchers currently conduct day-to-day factor analytic investigations of the dimensionality of test items is presented in Chapter IV. Chapters III and IV informed the methodology presented in Chapter V. Results are described in Chapter VI, and the discussion is outlined in Chapter VII.

Chapter II

Introduction to the Factor Analytic Model

The purpose of this chapter is to describe the well-known common factor model. The remainder of this dissertation requires an understanding of the statistical and theoretical assumptions and implications of factor analytic procedures. Thus, at this point, it is appropriate to provide a brief introduction to the statistical foundation of factor analysis. The chapter begins by defining exploratory factor analysis (EFA) and its purpose. Dimensionality is then defined by integrating several perspectives, and strict and essential unidimensionality are defined. Lastly, the factor model is presented. Various researchers and statisticians have documented the factor analytic model and methods. This chapter closely follows the work of Comrey and Lee (1973; 1992), Gorsuch (1983), and Tabachnick and Fidell (2001). The equations presented are taken from Gorsuch (1983) and Comrey and Lee (1973; 1992).
Principal Components Analysis and Factor Analysis

It is important to note that Principal Components Analysis (PCA) and Factor Analysis (FA) are two distinct methods, but are often categorized together or as the same process (i.e., PCA has been shown to be a kind of FA) (Fabrigar et al., 1999). Traditionally, the purpose of FA and PCA was to account for variance, which both of them do. FA, however, is often used to reproduce the population covariance matrix, something PCA alone cannot do. PCA attempts to account for all the variance of each variable, and therefore it does not differentiate between common and unique variance (i.e., it is assumed that all the variance is relevant). FA includes an error component in the model (i.e., DATA = MODEL + ERROR), and thus the error that is not accounted for is separated. Additionally, PCA does not produce latent constructs, but rather generates components, which are defined as linear composites of the original measured variables. Therefore, it is not conceptually correct to equate factors with components.

PCA provides a complete set of eigenvalues for the correlation matrix (eigenvalues will be defined below). As is common practice, these eigenvalues will be used in this dissertation in order to determine the number of factors to retain (e.g., ratio of first-to-second eigenvalues). Likewise, FA will be used to generate the chi-square statistic, df, and p values from ML and GLS estimation methods in order to determine the number of factors to retain (e.g., ML chi-square decision-making method, GLS RMSEA). PCA and FA are mathematically and statistically different. As seen in the equations below, FA is represented in Equation 10, whereas PCA is represented in Equation 11 (Equation 10 is on page 24 and Equation 11 is on page 26).
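To make the role of PCA concrete, the eigenvalues it supplies come from an eigendecomposition of the item correlation matrix, and because PCA analyzes all of the variance of each variable they sum to the number of variables. The sketch below uses a small, made-up correlation matrix (not data from this study); the variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical 4 x 4 correlation matrix for four items (illustrative values).
R = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
])

# eigh is intended for symmetric matrices and returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# All of the variance is analyzed: the eigenvalues sum to the number of items.
total = eigenvalues.sum()

# Each column of `eigenvectors` holds the weights of one component, that is,
# a linear composite of the standardized items.
first_component_weights = eigenvectors[:, 0]

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Sum of eigenvalues:", round(float(total), 3))
print("First-to-second ratio:", round(float(eigenvalues[0] / eigenvalues[1]), 2))
```

A common factor analysis would instead separate common from unique variance before extraction, which is why factors and components should not be equated.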
Factor analysis (FA), in particular, is a statistical procedure applied to a set of observed variables (e.g., test items) in order to determine which observed variables form subsets or clusters that ultimately combine into factors or dimensions. These factors are considered to mirror the underlying phenomenon that creates the correlation among the observed variables (Tabachnick & Fidell, 2001). Exploratory factor analysis (EFA), specifically, is used to gain theoretical insight into a given set of test items by identifying the underlying latent dimension(s). The primary purpose of EFA is to explain the relationships among the observed variables (i.e., test items) when little to no prior information on the data structure is available. Thus, EFA is thought of as a tool for generating theories, and it has often been used as an initial assessment tool. In common test development practice, however, an EFA is also conducted to validate instruments when developing or revising a scale. For example, researchers may use an EFA to determine the dimensionality of an instrument and subsequently use this information to construct composite scores for either hypothesis testing or making inferences.

There are fundamental measurement theories and practices that are vital to making such inferences. It is assumed that the sampling process and the development of test items were conducted with as little error as possible. The process of sampling has been discussed in the literature, as has the question of which statistical procedures are most appropriate for particular sampling designs (Korn & Graubard, 1999). Additionally, the development of test items can introduce error such as culturally biased language, grammatical errors, and, more importantly, theory that is not related to the set of test items.
These kinds of errors are often reflected in the statistical output, but sampling and item-writing errors can confuse the interpretation of variance accounted for and other statistical results. Most of these issues can be screened out if a researcher conducts the initial stages of research appropriately.

Dimensionality

One of the most important assumptions of measurement theory is that the test items (i.e., observed variables) of an instrument all measure just one thing. In order to make psychological sense when ordering persons on an attribute, describing individual differences, or grouping people by ability, the latent variable underlying a composite score must be unidimensional (Hattie, 1985). One important goal of assessing unidimensionality, in particular, is to summarize the patterns of correlations among the observed variables (Tabachnick & Fidell, 2001). This is often done by reducing the variables to the smallest possible number of dimensions that account for the underlying phenomenon. The underlying phenomenon is considered to be the reason why the observed variables are correlated in the first place, and it can reflect one or more dimensions.

Dimensionality refers to the structure of a specific phenomenon (Pett, Lackey, & Sullivan, 2003). Unidimensionality, in particular, refers to one dominant latent variable or phenomenon. Composite scale scores are often used to make inferences in the social and behavioral sciences, and unidimensionality is assumed when these composite scores are used. Several statistical procedures provide a structural analysis of a selected set of observed variables (e.g., factor analysis or multidimensional scaling). Ideally, these procedures yield an appropriate number of dimensions to justify the use of composite scores and to explain the pattern of correlations among observed variables.
Dimensions (i.e., latent variables) are understood to be constructed variables that come prior to the observed variables (i.e., test items). That is, it is assumed that if two test items are correlated, they have something unobserved in common (i.e., the latent dimension). Despite the significance of the theoretical and procedural aspects of assessing the dimensionality of measures via FA, there is no generally accepted index of the unidimensionality of a set of test items.

Strict and Essential Unidimensionality

Psycho-educational and health measures often use composite scale scores in order to make inferences. Although some of these measures meet strict unidimensionality, most are actually essentially unidimensional. According to Humphreys (1952, 1962), in order to measure any psychological latent variable of interest (e.g., subjective well-being), the inclusion of numerous minor latent variables is not merely desirable but unavoidable. A measure with one dominant latent variable accompanied by such secondary minor latent variables is often said to show 'essential unidimensionality'. Strict unidimensionality, on the other hand, is defined as one dominant latent variable with no secondary minor dimensions.

Both essential and strict unidimensionality were investigated in this dissertation. Strict unidimensionality was investigated by manipulating the magnitude of the communalities ($h^2$). Essential unidimensionality was also examined by varying the magnitude of the communalities ($h^2$), but in addition the proportion of communality on the second factor and the number of test items with non-zero loadings on the second factor were manipulated.
As shown in Chapter V, the methodology of this dissertation (informally) provided an operational definition of essential unidimensionality: manipulation of the magnitude of communality, the proportion of communality on the second factor, and the number of test items with non-zero loadings on the second factor.

Factor Analysis Procedures and Concepts

FA aims to summarize the interrelationships among the test items in a concise but accurate manner (i.e., reducing the number of variables to a few factors or dimensions). Specifically, FA generates several linear combinations of observed variables, and each of these combinations represents a factor. These factors explain the system of correlations in the observed correlation matrix, and ultimately they can be used to reproduce the observed correlation matrix (Tabachnick & Fidell, 2001). In order to employ FA appropriately, a set of essential steps must be carried out: selecting and measuring (i.e., administering to a set of individuals) a set of observed variables, generating a correlation matrix, extracting a set of factors from this matrix, determining the number of factors, rotating the factors (this is optional), and interpreting the results. Various other statistical techniques need to be considered in order to conduct these steps, but the interpretation of the results is an essential part of FA. Interpreting and labeling factors depends on the theoretical foundation of the observed variables that form a particular factor as well as on the purpose of the test, and construct validity plays a large role here. In general, a factor is more interpretable when the set of observed variables correlates highly with that factor and does not correlate with the other factors (Tabachnick & Fidell, 2001). This is often referred to as simple structure.
Factor Analytic Model

FA starts with a matrix of correlations among the observed variables. Ultimately, a matrix of factor loadings is produced, and the loadings are interpreted as the correlations between the observed variables and a specific hypothetical factor (i.e., dimension). Factor loadings therefore reflect quantitative relationships between the observed variables and the dimensions. In other words, the loadings express the degree of generalizability between the observed variable and the factor: the farther the loadings are from zero, the more one can generalize from the factor to the variable (Gorsuch, 1983). Several arbitrary cut-off values are presented in the psychometric literature as a boundary for whether a loading is worth generalizing from. For example, loadings that are greater than 0.30 (or even 0.40 in some contexts) are considered significant (Crawford & Hamburger, 1967).

With this introduction of factor loadings, it is appropriate to discuss how these loadings are actually generated. The common factor model is presented here. By comparison, a full component model is based on a perfect calculation of the variables from components. In other words, the full component model does not include error terms; it is assumed that this model would lead to perfect reproduction of the variables, and any observed error is a reflection of the model itself (Gorsuch, 1983). Observed data are seldom obtained in a perfect manner in empirical research. Therefore, the full component model is not used in this context. The common factor model, on the other hand, consists of two different components: common factors and unique factors. Common factors are the factors that contribute to two or more observed variables (footnote 1).
The unique factor represents the non-common factor variance for each observed variable, and this variance needs to be accounted for in order to complete the prediction of that variable (Gorsuch, 1983). The unique factor, however, does not include error of the model. It includes random error of measurement such as biases, unknown sampling error, or administration or test discrepancies. It has been recommended that large samples be used for FA so that the sampling error can be ignored (Gorsuch, 1983; Comrey & Lee, 1973; 1992).

Footnote 1: The psychometric literature presents various cut-off points for the number of observed variables needed to define a factor. This cut-off point of two or more is taken from Gorsuch (1983).

As defined by Comrey and Lee (1992) and Gorsuch (1983), the traditional common factor model is given by the following equation:

$$x_{ik} = w_{i1}F_{1k} + \cdots + w_{if}F_{fk} + w_{iu}U_{ik} + w_{ie}E_{ik} \tag{1}$$

where:
$x_{ik}$ = a standard score for person k on observed variable i
$w_{i1}$ = a factor loading for observed variable i on common factor 1
$F_{1k}$ = a standard score for person k on common factor 1
$w_{if}$ = a factor loading for observed variable i on common factor f
$F_{fk}$ = a standard score for person k on common factor f
$w_{iu}$ = a factor loading for observed variable i on unique factor u
$U_{ik}$ = a standard score for person k on unique factor i
$w_{ie}$ = a factor loading for observed variable i on error factor e
$E_{ik}$ = a standard score for person k on error factor e

The x, F, U, and E scores in Equation 1 are all standard scores that have a mean (M) of zero and a standard deviation ($\sigma$) of 1.00 (Comrey & Lee, 1992). Each w value is a factor loading and falls between -1.00 and 1.00. The x score on the left-hand side of the equation is obtained empirically from the data collection process (i.e., it is the participant's response to the test item). The U differs for each variable because each variable's unique factor affects only that variable.
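As a rough numerical illustration of Equation 1, the sketch below generates standard scores from a common factor model with two uncorrelated common factors. The loading values, the sample size, and the choice to merge the unique and error terms into a single unique term with weight $\sqrt{1 - h^2}$ are all assumptions made for the example and are not taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_people = 100_000                      # large sample, so sampling error is small
loadings = np.array([                   # hypothetical loadings w_i1, w_i2 (assumed values)
    [0.8, 0.0],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.1, 0.7],
])
n_vars, n_factors = loadings.shape

# With uncorrelated factors, the communality is the sum of squared loadings.
communality = (loadings ** 2).sum(axis=1)
# Assumption: unique and error parts are merged into one term per variable.
unique_weight = np.sqrt(1.0 - communality)

F = rng.standard_normal((n_people, n_factors))   # common factor scores
U = rng.standard_normal((n_people, n_vars))      # one unique factor per variable

# Equation 1 for every person and variable at once.
X = F @ loadings.T + U * unique_weight

sample_var = X.var(axis=0)
# Correlations between observed variables and common factors.
sample_loadings = np.corrcoef(np.hstack([X, F]), rowvar=False)[:n_vars, n_vars:]

print(np.round(sample_var, 2))
print(np.round(sample_loadings, 2))
```

Under these assumptions each simulated variable has variance close to 1, and its correlation with each common factor is close to the loading used to generate it, which is the interpretation of loadings given in the text.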
Equation 1 can then be represented in matrix form for all possible values of i and k (i.e., observed variables and people) simultaneously (footnote 2):

$$Z_{ki} = F_{kf}P_{fi} + U_{ki}D_{ii} \tag{2}$$

where:
$Z_{ki}$ = a standard score data matrix of k individuals and i variables
$F_{kf}$ = a standardized factor score matrix for the same k individuals with f factors
$P_{fi}$ = a factor by variable weight matrix
$U_{ki}$ = a matrix of unique factor scores with one unique factor for each variable for the k individuals
$D_{ii}$ = a diagonal matrix giving each unique factor weight for reproducing the variable with which the weight is associated

Footnote 2: For a detailed description of the matrix algebra that is specific to factor analysis and to Equation 2, please see Comrey and Lee (1992) or Gorsuch (1983).

In summary, U and D are the unique factor scores and weights, whereas F and P are the common factor scores and weights. It is important to state that the unique factor score and weight represent the part of the variable that is not predictable from the set of other observed variables and that is assumed to be uncorrelated with the common factors and the other unique factors. The unique factors increase the amount of variance of the variables (Gorsuch, 1983). It is therefore also essential to present additional components of the factor analysis process to further explain the common factor model.

Communality

The communality of a variable is the proportion of its variance that is accounted for by the common factors only. More specifically, the communality is the sum of squared loadings for a variable across factors, and it can be considered the squared multiple correlation of the variable as predicted from the factors (Tabachnick & Fidell, 2001). The following equation shows the communality for variable i, where $w_{ij}$ is the loading of variable i on common factor j and $r_{jk}$ is the correlation between common factors j and k (here j and k both index factors):

$$h_i^2 = \sum_{j} w_{ij}^2 + \sum_{j}\sum_{k} w_{ij}w_{ik}r_{jk} \quad (\text{where } j \neq k) \tag{3}$$

Given Equation 3, the variable's uniqueness, $u^2$, is defined as the proportion of variance not including the variance from the common factors.
From one perspective, the uniqueness is equal to the sum of squares of the loadings on the specific and error factors (Comrey & Lee, 1973). It can also be obtained as follows:

$$u^2 = 1 - h^2 \tag{4}$$

To further divide $u^2$, it is important to consider nonrandom variance. The reliability coefficient, $r_{xx}$, is a measure of the variable's variance that is nonrandom and an estimate of the true score (footnote 3) variance, $r_{xt}^2$ (Gorsuch, 1983). It is important to note here that a communality cannot be greater than the true score variance. A specific factor's proportion of variance, $s^2$, is the difference between the proportion of nonerror variance, $r_{xt}^2$, and the variance accounted for by the common factors, $h^2$ (footnote 4):

$$s^2 = r_{xt}^2 - h^2 \tag{5}$$

The error contribution, excluding the specific factor, is calculated as follows:

$$e^2 = 1 - r_{xt}^2 \tag{6}$$

With these equations in mind, the following hold true:

$$u^2 = s^2 + e^2 \tag{7}$$

$$1 = h^2 + s^2 + e^2 \tag{8}$$

Footnote 3: A true score is a hypothetical score that represents an assessment result that is entirely free of error. Sometimes a true score is thought of as the average score of an infinite series of assessments with the same or exactly equivalent instruments, but with no practice effect or change in the person being assessed across the series of assessments.

Footnote 4: A specific factor is a composite of all of an observed variable's reliable variance that does not overlap with a factor in the analysis.

Finally, the proportion of total variance contributed by common factors is obtained as follows:

$$\text{Proportion of total variance} = \frac{\sum_{i} h_i^2}{v} \quad (v \text{ is the number of variables}) \tag{9}$$

Assumptions of the Common Factor Model

According to Gorsuch (1983), the following is assumed to be true for a given population of individuals:
• The observed variables are estimated from the common factors by multiplying each factor by the appropriate weight and summing across factors. Both unique and common factors are included in the model.
• The common factors correlate zero with unique factors.
• The unique factors correlate zero with each other.
• All factors have a mean of zero and a standard deviation of 1.0.

Correlated and Uncorrelated Factor Models

There are two forms of common factor models: the more general case, in which the common factors are correlated with each other, and the uncorrelated case. In the former, the correlation between factors can range up to an absolute value of 1.0 (Gorsuch, 1983). There is an assumption when conducting EFA that factors will be rotated and selected to exhibit meaningful dimensions. Highly overlapping factors are not of great theoretical interest (Gorsuch, 1983). If factors are "too" correlated, researchers are more likely to run the analyses again or to select a factor model with one fewer factor. Gorsuch (1983) presents a challenge for many researchers: "If a procedure has factors that correlate too high, then it should be rejected. If a procedure has factors that correlate too low, then it should also be rejected". The definition of what is too high or too low is extremely subjective and differs depending on the theoretical foundation of the factors. In general, the correlations among the variables themselves should indicate the level of correlation between the factors: certain variables often cluster together (i.e., correlated variables) and therefore will "load" onto a specific factor.

As mentioned previously, FA starts with a matrix of correlations among the observed variables, and ultimately a matrix of factor loadings is generated, which can be interpreted as the correlations between the observed variables and the specific hypothetical factor.
The common factor correlation matrix for a population is as follows:

$$R_{vv} = P_{vf}R_{ff}P'_{fv} + U_{vv} \tag{10}$$

Here $R_{ff}$ is the matrix of correlations between the factors, $P_{vf}$ is the factor by variable weight matrix, $P'_{fv}$ is the transposed factor by variable weight matrix, and $U_{vv}$ is a diagonal matrix of the squared unique factor loadings (footnote 5).

The second form of the common factor model is that which includes orthogonal rotation methods (i.e., methods for uncorrelated factors). These models involve simplified equations that are easier to compute. Uncorrelated models assume that the factors are independent of each other. The factor structure is equal to the factor pattern in these models (Gorsuch, 1983). Orthogonal models indicate that the factor matrix, when multiplied by its transpose, gives a diagonal matrix. In knowing this, Gorsuch (1983, p. 55) notes: "Because the factor scores are assumed to be standardized and uncorrelated, dividing the resulting diagonal matrix by the reciprocal of the number of individuals gives an identity matrix" (footnote 6). Therefore, the factor models with uncorrelated factors drop the $R_{ff}$ terms, because these are now identity matrices. Other than that, the conclusions and foundations are the same for the correlated and uncorrelated factor models.

Footnote 5: A transpose of a matrix is obtained by interchanging the rows and columns of an original matrix (Comrey & Lee, 1973).

Footnote 6: An Identity matrix is a matrix of correlations between factor scores containing elements 0 and 1, depending on whether the correlation is that of a factor score with itself or with another factor. Multiplying a matrix by an Identity matrix leaves it unchanged (Comrey & Lee, 1973).

Eigenvalues and Eigenvectors

Within the FA procedure, the maximum amount of variance that can possibly be extracted by a given number of factors is removed. This procedure involves the characteristic root and vector analysis of correlations (Gorsuch, 1983).
The goal for each factor is to account for the maximum possible amount of variance in the variables being factored during extraction. The first factor from the correlation matrix consists of the combination of all the variables that produces the highest squared correlations between the variables and the factor, because the squared correlation is a measure of the variance accounted for. The sum of the squares of the first column of the factor structure is thereby maximized. The second factor is extracted so that it is uncorrelated with the first. This factor maximizes the amount of variance extracted from the residual matrix, which was generated after the first factor was removed. This process continues with each additional factor until as much variance as possible has been accounted for.

A factor solution is also known as a solution of characteristic roots (eigenvalues) and vectors (eigenvectors). Basically, the principal factors are considered rescaled characteristic vectors (Gorsuch, 1983). The characteristic root, or eigenvalue, is equal to the sum of squared loadings on a principal factor. Therefore, the eigenvalue is a direct index of how much variance is accounted for by each factor. The eigenvalues are all positive and nonzero, but only if there are as many factors as there are variables (Gorsuch, 1983).

Matrix algebra is an important component of FA, and it provides an efficient method for manipulating numbers (footnote 7). Matrix algebra plays a particularly important role in calculating eigenvalues. In order to understand how eigenvalues are generated, it is important to review some of the basics of matrix algebra. A matrix (A) with only one row or only one column is known as a vector. A matrix (A) with one row and one column is known as a scalar. The transpose of a matrix (A) is written as (A'). In this case, each row of the original matrix A becomes a column of A', and each column of the original matrix A becomes a row of A'.
An identity matrix is the matrix that results from multiplying a matrix by its inverse; its diagonal elements are equal to 1.00 and its off-diagonal elements are zero. Finally, the trace is equal to the sum of the eigenvalues, which is also equal to the sum of the communalities used in the diagonal of the correlation matrix.

With that being said, one of the goals within FA is to obtain a complete set of eigenvectors and eigenvalues for the correlation matrix (i.e., what is also referred to as PCA). Thus, the equation becomes:

$$R_{vv} = A_{vv}S_{vv}A'_{vv} \tag{11}$$

where A and S contain, respectively, the eigenvectors and eigenvalues of R. The S in Equation 11 is a diagonal matrix with the eigenvalues on the diagonal, and A contains the vectors. R will generally have as many eigenvalues and vectors as there are variables (Gorsuch, 1983). If the square root of each eigenvalue is taken and these are used to form two diagonal matrices, the following results:

$$S_{vv} = S_{vv}^{1/2}S_{vv}^{1/2} \tag{12}$$

After substitution into Equation 11:

$$R_{vv} = A_{vv}S_{vv}^{1/2}S_{vv}^{1/2}A'_{vv} \tag{13}$$

By taking the square root of each eigenvalue, we now have as many factors as there are variables.

Footnote 7: To become more familiar with the basics of matrix algebra, see Tabachnick and Fidell (2001).

The relationship between factor loadings, magnitude of communality, and eigenvalues is shown in Table 2 below. These elements are computed using the basic equations of factor analysis shown above. Columns one and two provide the factor loadings for factors one and two, column three gives the magnitude of communality, columns four and five give the proportion of communality distributed between factors one and two, and columns six and seven give the squared factor loadings for factors one and two.
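Equations 10 to 13 can be checked numerically. The sketch below is an illustration only: it builds a population correlation matrix from a hypothetical loading matrix under the uncorrelated factor model (Equation 10 with $R_{ff}$ set to an identity matrix), then decomposes it into eigenvectors and eigenvalues. The loading values and variable names are assumptions for the example, and `numpy.linalg.eigh` is used because the correlation matrix is symmetric.

```python
import numpy as np

# Hypothetical variable-by-factor loadings (assumed values, two uncorrelated factors).
P = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.2, 0.7],
    [0.1, 0.8],
])

# Diagonal matrix of squared unique loadings: 1 - communality for each variable.
U = np.diag(1.0 - (P ** 2).sum(axis=1))

# Equation 10 with R_ff = I (uncorrelated factors).
R = P @ P.T + U

# Eigenvalues and eigenvectors of R, reordered from largest to smallest.
eigenvalues, A = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, A = eigenvalues[order], A[:, order]
S = np.diag(eigenvalues)

# Equation 11: R = A S A'
R_from_eig = A @ S @ A.T

# Equations 12 and 13: rescale the eigenvectors by the square roots of the eigenvalues.
principal_loadings = A @ np.sqrt(S)
R_from_loadings = principal_loadings @ principal_loadings.T

print(np.round(eigenvalues, 3))
print(np.allclose(R, R_from_eig), np.allclose(R, R_from_loadings))
```

In this sketch the eigenvalues sum to the trace of R (the number of variables, since the diagonal holds ones), and the sum of squared entries in each column of the rescaled eigenvector matrix equals the corresponding eigenvalue, in line with the definition of an eigenvalue as a sum of squared loadings.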
Table 2
Example of the Calculation of Eigenvalues

Factor 1    Factor 2    Communality    Proportion of h2    Proportion of h2    Squared Factor 1    Squared Factor 2
Loadings    Loadings    (h2)           for Factor 1        for Factor 2        Loadings            Loadings
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.759       0.190       0.90           0.80                0.20                0.576               0.036
0.285       0.664       0.90           0.30                0.70                0.081               0.441
0.285       0.664       0.90           0.30                0.70                0.081               0.441
0.285       0.664       0.90           0.30                0.70                0.081               0.441
0.285       0.664       0.90           0.30                0.70                0.081               0.441
0.285       0.664       0.90           0.30                0.70                0.081               0.441
Eigenvalues (λ) =                                                              5.589               2.529
Total Variance Explained = 57.90

Remember that the communality of a variable is the proportion of its variance that can be accounted for by the common factors. In addition, recall that a factor loading is the correlation between an observed variable and a particular factor. The proportions of communality for factor one and factor two were arbitrarily chosen for each observed variable in Table 2. The factor loadings can be calculated by the following equation, where $\%h^2$ is the proportion of communality assigned to the factor:

$$\text{Loading} = \%h^2 \times \sqrt{h^2} \tag{14}$$

As stated previously, an eigenvalue is defined as the sum of squared factor loadings. As seen in Table 2, the squared factor loadings in column six sum to the eigenvalue for factor one, and the squared factor loadings in column seven sum to the eigenvalue for factor two. The total amount of variance accounted for, as a percentage, is calculated as follows, where v is the number of test items:

$$\text{Total Variance} = 100 \times \frac{\lambda_1 + \lambda_2}{v} \tag{15}$$

Therefore, a measure with 14 test items, one dominant factor (i.e., factor one), one minor factor (i.e., factor two), and a magnitude of communality of 0.90 generates a first eigenvalue of 5.59 and a second eigenvalue of 2.53. The total amount of variance accounted for is 57.90%.
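The arithmetic behind Table 2 can be retraced in a few lines. The sketch below assumes that each loading is the proportion of communality multiplied by the square root of the communality, which is the reading of Equation 14 that reproduces the squared loadings and eigenvalues in the table, and that the percentage of total variance is 100 times the sum of the two eigenvalues divided by the number of items. Both readings are assumptions, and the variable names are illustrative.

```python
import numpy as np

h2 = 0.90   # magnitude of communality used in Table 2

# Proportion of communality on factors one and two: nine items at (0.80, 0.20)
# and five items at (0.30, 0.70), as listed in Table 2.
proportion = np.array([[0.80, 0.20]] * 9 + [[0.30, 0.70]] * 5)
n_items = proportion.shape[0]

# Assumed reading of Equation 14: loading = proportion * sqrt(h2).
loadings = proportion * np.sqrt(h2)
squared_loadings = loadings ** 2

# Each eigenvalue is the column sum of squared loadings.
eigenvalues = squared_loadings.sum(axis=0)

# Assumed reading of Equation 15: percentage of total variance.
total_variance = 100 * eigenvalues.sum() / n_items

print(np.round(loadings[[0, 9]], 3))     # one row from each block of items
print(np.round(eigenvalues, 3))          # eigenvalues for factors one and two
print(round(total_variance, 2))
```

Under these assumptions the eigenvalues come out as 5.589 and 2.529, matching Table 2, and the percentage of variance comes out at about 57.99, close to the 57.90 reported in the table.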
Chapter III
Review of the Relevant Literature

The purpose of this chapter is to introduce the relevant literature that encompasses the proposed problem of this dissertation. The chapter begins by defining the most prominent methods that researchers use to determine the number of factors to retain. The chapter is organized according to three types of relevant literature: (1) literature that reviews theory and simulation studies examining item response theory (IRT) and its counterparts, (2) literature that presents results of simulation studies that aim to recover population factor structure, and (3) literature that reports various research findings in regard to retaining an appropriate number of factors. Research questions pertaining to the dissertation are introduced. The chapter also introduces literature that assists in defining what is meant by an "appropriate" number of factors.

Methods for Determining the Number of Factors to Retain

Many different rules and indices have been used to determine the correct number of factors to retain when assessing dimensionality (Zwick & Velicer, 1982; Hattie, 1984). After factors have been identified for a model (i.e., extracted from a correlation matrix), the number of factors to retain needs to be determined by the researcher, who applies one or more of these rules and indices. The decision is based on the goal of explaining as much variance as possible while maintaining a parsimonious model (i.e., retaining the smallest number of meaningful factors). This decision of how many factors to retain is essential. Under- and over-factoring have been noted to have a large impact on the interpretation of the results (Fava & Velicer, 1992; 1996; Wood, Tataryn & Gorsuch, 1996). Extracting too few factors often results in distorted solutions with substantial error (Fava & Velicer, 1992; Wood et al., 1996).
On the other hand, extracting too many factors often distorts the variance, deflates factor loadings, and can lead to an overall degradation of a true factor (Fava & Velicer). Furthermore, over-extraction can create false factors at the expense of real ones, especially when varimax rotation is used (Wood et al.). There is agreement in the literature that over-extraction is a less serious error than under-extraction when one or two extra factors are extracted (Fava & Velicer; Wood et al.; Gorsuch, 2003). Nevertheless, over-extraction should be avoided (Comrey & Lee, 1992). In practice, adding too many factors to a model may induce a researcher to develop constructs with little theoretical value or to develop unnecessarily complex theories.

In light of these research findings, it is important to acknowledge which rules and indices are more successful. The theoretical foundation of several of these methods dates back to the early 1900s (Spearman, 1904). Several rules and indices are known to identify the number of factors more accurately than others. The rules and indices that are widely used in practice and have exhibited strong empirical support are defined below, and research findings on these particular rules and indices are presented later in the chapter.

Definition of Decision-Making Rules and Indices

The chi-square tests from Maximum Likelihood (ML) and Generalized Least Squares (GLS) estimation methods are used by researchers to determine the number of factors to retain (Gessaroli & De Champlain, 1996; Fabrigar et al., 1999). ML factor extraction estimates population values for factor loadings by calculating the loadings that maximize the probability of sampling the observed correlation matrix from the population. This method also maximizes the canonical correlations between the variables and the factors, and it is scale free.
ML estimation is based on the assumption that the common factor model holds exactly in the population and that the measured variables follow a multivariate normal distribution in the population (MacCallum et al., 1999). The GLS extraction method, on the other hand, seeks to minimize the squared differences between the observed and reproduced correlation matrices, with weights applied to the variables. Variables that are not as strongly related to the other variables are considered to be trivial in the solution (i.e., they have smaller weights) (Tabachnick & Fidell, 2001).

Being a formal hypothesis test, Bartlett's (1950) chi-square statistic has a more structured statistical foundation than the other rules and indices, and thus provides more capabilities for statistical inference, such as significance testing (Fabrigar et al., 1999). This statistic is used to test the null hypothesis that the model holds in the population with a given number of factors. The chi-square test is applied to the residual correlation matrix after a given number of factors have been extracted. The test should be applied to consecutive numbers of factors (beginning with zero) until a non-significant test statistic is obtained. A non-significant chi-square test indicates that meaningful factors have been extracted.

The eigenvalues-greater-than-one rule (Kaiser, 1960), also known as the Kaiser-Guttman (K-G) criterion, is another prominent method. It is one of the most widely used criteria for deciding on the number of factors and is the default option in most statistical software packages (Thompson & Daniel, 1996). This criterion requires the computation of eigenvalues from the unreduced correlation matrix. As mentioned in the previous chapter, eigenvalues represent variance. Because the variance that each observed variable contributes to a principal components extraction is one, a component with an eigenvalue less than one is considered meaningless.
In addition, when a latent root falls below one, the internal consistency of the factor scores approaches zero (Hoyle & Duvall, 2004). Given this information, the number of eigenvalues (for a correlation matrix) greater than one is then used as the number of factors to retain.

The ratio-of-first-to-second-eigenvalues is yet another eigenvalue criterion. If the ratio of the first-to-second-eigenvalues is greater than three, or four in some contexts, then a single factor solution (or unidimensional model) is assumed (Gorsuch, 1983). That is, the first factor accounts for a sufficiently large proportion of the variance among the observed variables or items.

Another widely used approach for determining the number of factors, which is also based on eigenvalues, is known as the "scree test" (Cattell, 1966). The scree plot test is a graphical method in which the eigenvalues of the correlation matrix of the test items are plotted against the order of the factors. The factors form a line sloping downward in descending order, in which the most dominant factors are the first and highest eigenvalues on the scree plot. To construct a scree plot, the factors are arranged along the horizontal axis (abscissa) with eigenvalues (percentage of variance accounted for by a factor) along the vertical axis (ordinate). By mathematical construction, the eigenvalue is highest for the first factor and equal or decreasing for the next few factors until reaching very small values for the remaining factors. The goal is to look for the point where the slope of the line changes or where there is a large drop in the eigenvalues. The number of factors prior to this drop represents the number of factors to extract.

Parallel analysis (PA), which is considered a systematic evaluation of the eigenvalues of the observed correlation matrix, is also used to determine the number of factors to retain. Horn (1965) introduced this method based on the generation of random variables for estimating components.
This work was based on Guttman, Kaiser and Dickman (Zwick & Velicer, 1986). PA is a method for deciding on the number of factors in principal components and principal factor analysis (Horn; Humphreys & Ilgen, 1969; Zwick & Velicer). PA requires the generation of random data with the same (i.e., parallel) number of items and individuals as the original data matrix. The random data should mirror the original data: if the original data are ordinal, then the random data should be of the same rank, and the same applies if the original data are continuous (Thompson & Daniel, 1996). This approach is based on the comparison of eigenvalues obtained from the sample data with the eigenvalues one would expect to obtain from completely random data (i.e., items for which there are no common factors). The number of factors to extract is indicated by the point at which the eigenvalues for the actual data drop below the eigenvalues for the random data. This procedure is built on the assumption that a factor from the original data should explain more variance (i.e., have a larger eigenvalue) than the corresponding factor from random data. The number of eigenvalues from the original data that are larger than the corresponding eigenvalues from the random data is taken as the maximum number of factors to extract.

The Root Mean Square Error of Approximation (RMSEA) index (Steiger & Lind, 1980) is yet another index used to determine the number of factors to retain. The RMSEA is considered an estimate of the discrepancy between the proposed model and the data, per degree of freedom. It is referred to as a goodness-of-fit index and takes into account the error of approximation in the population. Browne and Cudeck (1993) provide guidelines for the interpretation of the RMSEA: values less than 0.05 indicate a close fit, values ranging from 0.05 to 0.08 indicate a fair fit, values ranging from 0.08 to 0.10 indicate a mediocre fit, and values greater than 0.10 represent a poor fit.
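The three eigenvalue-based rules described above (the K-G criterion, the ratio of first-to-second eigenvalues, and parallel analysis) can be sketched in code as follows. This is a minimal illustration and not the procedure used in this study: the function names are hypothetical, the random comparison data are drawn from a standard normal distribution (which would not mirror ordinal item responses), and the mean random eigenvalue is used as the comparison value.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]


def kaiser_guttman(eigs):
    """Number of eigenvalues greater than one."""
    return int(np.sum(eigs > 1.0))


def first_to_second_ratio(eigs):
    """Ratio of the first to the second eigenvalue."""
    return float(eigs[0] / eigs[1])


def parallel_analysis(data, n_reps=100, seed=0):
    """Count leading observed eigenvalues that exceed the mean eigenvalues
    of random normal data with the same number of people and items."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = correlation_eigenvalues(data)
    random_eigs = np.empty((n_reps, p))
    for rep in range(n_reps):
        random_eigs[rep] = correlation_eigenvalues(rng.standard_normal((n, p)))
    threshold = random_eigs.mean(axis=0)
    exceeds = observed > threshold
    # Stop at the first eigenvalue that falls below its random counterpart.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold


# Illustrative data: 500 people, 8 items, one common factor with loadings of 0.7.
rng = np.random.default_rng(1)
factor = rng.standard_normal((500, 1))
items = 0.7 * factor + np.sqrt(1 - 0.7 ** 2) * rng.standard_normal((500, 8))

eigs = correlation_eigenvalues(items)
n_pa, _, _ = parallel_analysis(items)
print(kaiser_guttman(eigs), round(first_to_second_ratio(eigs), 2), n_pa)
```

For data generated from a single strong factor, as in this example, all three rules should point to one factor. A percentile of the random eigenvalues is sometimes used in place of the mean as the comparison value.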
The research findings on these rules and indices are often differentiated based on the item type of the data (i.e., dichotomous, ordinal, continuous) or the dimensionality of measures (i.e., multidimensional, unidimensional). This dissertation, in particular, investigates the application of these rules and indices for unidimensional measures with Likert-type data. For that reason, it is important to introduce the literature on item response theory (IRT) before presenting research findings on these particular rules and indices.

Item Response Theory

The assumption of unidimensionality must be met in order to use specific item response theory (IRT) models (De Champlain & Gessaroli, 1998). In summary, IRT consists of a family of models that have been demonstrated to be useful in the design, construction, and evaluation of educational and psychological tests (Hambleton, Swaminathan, & Rogers, 1991). Because IRT researchers and practitioners need unidimensional measures in order to utilize IRT models, they are particularly interested in the process of assessing dimensionality. It is assumed, when using IRT, that the probability of a correct response on a given item (i.e., observed variable) depends on a single underlying dimension or attribute, which is interpreted as the latent variable or proficiency measured by the given set of test items (i.e., test). The assumption of unidimensionality is often violated due to the existence of secondary minor dimensions, or due to cognitive, personality, or test-taking factors that accompany the single trait being measured (Hambleton et al., 1991). As long as a dominant trait exists, the unidimensional assumption can be met, and an IRT model may be applied. In IRT, the probability of correctly answering an item depends on the item discrimination (a_i), item difficulty (b_i), a lower asymptote, also known as the item guessing parameter (c_i), and the underlying latent trait being measured by the test (θ).
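For reference, one common way of writing the three-parameter logistic model that combines these quantities is sketched below. The notation is assumed here for illustration and may differ from that used elsewhere in this dissertation: $P_i(\theta)$ is the probability of a correct response to item $i$, and $a_i$, $b_i$, $c_i$, and $\theta$ denote discrimination, difficulty, lower asymptote, and latent trait, respectively. Some presentations also include a scaling constant $D \approx 1.7$ in the exponent.

```latex
P_i(\theta) = c_i + (1 - c_i)\,
  \frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}}
```

Setting $c_i = 0$ removes the guessing parameter and yields the two-parameter logistic model.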
Although these item parameters do not necessarily seem relevant when assessing psychological latent traits, it is important to review the literature on these models.[9] IRT models and factor analytic models (e.g., nonlinear factor analysis) have been shown to be mathematically equivalent (De Champlain & Gessaroli, 1998). More importantly, the process of determining dimensionality is a requirement for IRT, and thus the IRT literature focuses on this process in depth. Because IRT models were initially used to assess dichotomous item response data, much of the IRT literature examines this kind of item response data.

[8] For example, a mathematics achievement test will also measure verbal ability, an accompanying secondary minor dimension.

[9] Item parameters such as difficulty level or guessing estimates do not necessarily fit the context of psychological constructs such as depression or anxiety. However, these parameters can be applied in a psychological context by looking at them from a different perspective. For example, the endorsement proportion (i.e., difficulty level) of an item can also be referred to as "the proportion of examinees in a group of interest who endorse an item according to the scoring key".

Investigating the Dimensionality of Test Item Data: The Role of IRT

There is confusion in the psychometric literature concerning the proper methods for assessing unidimensionality in a set of test items (Hambleton & Rovinelli, 1986). Hattie (1984) reported that there are 87 indices in the psychometric literature for assessing the unidimensionality of a set of test items. Unfortunately, many of these indices are ad hoc and seem to lack practical evidence and rationale (Tate, 2003). In addition, many of the indices that Hattie proposed for assessing unidimensionality do not match the definitions of unidimensionality found in the psychometric literature (Hambleton & Rovinelli).
For example, McDonald (1982) concluded that the principle of local independence is the foundation of unidimensionality. He defined a set of test items as unidimensional if the residual covariation between items is zero. Hattie did not provide such a discussion when reviewing the various indices. Local independence is defined as follows: "Local independence means that when abilities influencing test performance are held constant, examinees' responses to any pair of items are statistically independent. In other words, after taking examinees' abilities into account, no relationship exists between examinees' responses to different items. Simply put, this means that the abilities specified in the model are the only factors influencing examinees' responses to test items" (Hambleton et al., 1991, p. 10).[10] In addition, for binary items only, McDonald recommended the use of nonlinear factor analysis because the relationship between items and the relationship between items and the underlying traits are known to be nonlinear. Likewise, Hambleton and Rovinelli stated that one of the fundamental assumptions of IRT is that these relationships are nonlinear.

[10] In other words, when items are correlated, they have some underlying trait in common. When these traits are partialled out or held constant, the items or observed variables become uncorrelated. This is also a grounding principle in factor analysis. That is why local independence can also be referred to as conditional independence (Hambleton et al., 1991).

Simulation Studies

Based on McDonald's recommendation of nonlinear FA, Hambleton and Rovinelli (1986) conducted a simulation study to compare nonlinear and linear factor analysis by generating five artificial dichotomous data sets. The intercorrelation among factors, number of factors, the number of items defining a factor, and IRT parameter values were varied in these data sets. Tetrachoric and phi correlations were used.
Scree plots and PA were used to determine the number of factors to retain. They found that linear factor analysis of binary items overestimated the number of underlying dimensions in all instances. These results suggested the need for extreme caution in using scree plots and PA as methods for determining the number of factors when using linear factor analysis.

Upon identifying 87 different indices for detecting unidimensionality of a set of test items, Hattie (1984) conducted a simulation study that investigated the adequacy of various indices as decision criteria for assessing unidimensionality for binary items. Tetrachoric correlation matrices were used. The number of factors, IRT parameter values, and the intercorrelation among factors were varied for a total of 36 different models. The results of the simulation indicated that the eigenvalues-greater-than-one rule generally overestimated the number of factors in cases where there was only one factor, and underestimated it in cases where there was more than one factor. He also found, in this same study, that the chi-square test of linear factor analysis was inefficient and claimed it to be inappropriate for determining unidimensionality.

De Ayala and Hertzog (1991) also conducted a simulation study, which compared two commonly used methods to determine the number of factors: linear factor analysis (i.e., confirmatory factor analysis and exploratory factor analysis) and multidimensional scaling (MDS). Unlike factor analysis, which is based on a linear model, MDS assumes a monotonic relationship between item performance and ability, just as IRT assumes a nonlinear relationship. Seven data sets with both dichotomous and polytomous item response formats were generated, which differed with respect to dimensionality, intercorrelation among factors, and the number of items defining a dimension. Factor analyses of matrices of tetrachoric and phi correlation coefficients were performed.
Additionally, Pearson product-moment (PPM) and polychoric correlation coefficients were obtained on the polytomous data and factor analyzed. The number of factors to retain was determined initially by the scree plots for the EFA analyses. The final decision on dimensionality was made after considering several criteria, including percent of common variance accounted for by a factor, magnitude of loadings, and number of residuals greater than 0.05. There were numerous findings, but one in particular worth noting is that no one method for determining the number of factors led to the correct solution for all data sets (i.e., for all sets of conditions). In addition, it was found that the number of items used to define a dimension did not appear to affect the determination of the number of dimensions for the EFA solutions. Also, for data sets with highly correlated factors, the scree plot identified a unidimensional model, but follow-up analyses indicated a two-dimensional structure. And for unidimensional data, the scree plot correctly identified a one-factor model, whereas the follow-up analyses incorrectly identified a two-factor structure. Because of these inconsistencies, De Ayala and Hertzog (1991) suggest that it may not be possible to combine multiple methods.

De Champlain and Gessaroli (1998) conducted a simulation study to investigate Type I error rates for nonlinear factor analysis (NLFA) fit statistics, including the approximate chi-square statistic. Dichotomous unidimensional item response vectors, which varied according to test length and sample size, were generated according to a three-parameter logistic IRT function. To investigate the effects of sample size and test length on the Type I error rates, separate logit-linear analyses were performed for the chi-square test. Acceptances and rejections of a unidimensional model (i.e., the null hypothesis) were assessed.
It was discovered that the approximate chi-square statistic was relatively unaffected by the sample size, test lengths, item parameter structures, and latent trait correlation levels simulated.

Tate (2003) compared both confirmatory and exploratory parametric and nonparametric methods for assessing the dimensionality of large-scale tests with dichotomous item response formats using a three-parameter logistic IRT function. Of the numerous findings from this simulation study, only those relevant to the current study are discussed here. For example, EFA provided borderline results when the decision on the number of factors to retain was based on a model fit index. Parametric procedures identified an incorrect factor structure when the item difficulty was extreme, and the performance of nonparametric methods depended on whether the factor structure was unidimensional or multidimensional. Several of the factor analytic and IRT models correctly recovered the underlying true structure when the assumed model was correct and when the model parameters were not extreme.

Although the assessment of unidimensionality is presented in the IRT literature, the process of determining the number of factors to retain is also examined in simulation studies that aim to recover population factor structure. While their findings are similar to those in the IRT literature, the research studies that investigated the recovery of population factor structure used a different framework. For example, these studies utilized a different set of conditions (i.e., item parameters were not investigated).

Investigating the Recovery of Population Factor Structure

Theoretical Development of the Sources of Error in Factor Analysis

In practice, a factor analytic model is not an exact depiction of real-world phenomena. The theoretical and conceptual framework for factor analysis was based, for the most part, on Spearman (1904). Spearman did not actually present a formal FA model.
Instead, he sought to show how a latent variable alone accounted for the correlations among various measures (MacCallum, 2004).[11] During the following years, Spearman and others continued to conduct research on FA theory and methods. The distinction between theory and model was unclear at this point in the literature. Factor analysis became recognized as exploratory in nature, with the primary purpose of discovering underlying structure (i.e., the model) in data. Thurstone (1947) introduced the FA model as an approximation, rather than a true representation of real phenomena. He and others proclaimed that the model may be incorrect not only due to various sources of error, but also due to nonlinearity, secondary minor factors, and violation of distributional assumptions (MacCallum). This approximation theory became established and was built upon. The likelihood ratio test of model fit (Jöreskog, 1969) and the notions of sampling and model error (Tucker & Lewis, 1973) were introduced.

[11] Spearman (1904) presented the findings on "g" theory: General Intelligence.

MacCallum and Tucker (1991) further investigated sources of error in factor analysis. A distinction between "model error" and "sampling error" was presented. Model error, which arises from lack of fit of the model to the population, and sampling error, which arises from lack of exact correspondence between the sample and population, were described as acting together in any given sample to produce lack of model fit and error in parameter estimates. Sampling error introduces inaccuracy in parameter estimates, whereas model error introduces lack of fit of the model in the population and sample. The common factor model assumes that the model holds in the population, and therefore model error is often eliminated from the investigation (of determining the number of factors to retain) if the appropriate number of factors has been selected (i.e., if the model fits within the sample).
This framework of error sheds some light on the issue of sample size in fitting a common factor model to a sample. That is, MacCallum, Widaman, Zhang, and Hong (1999) and Mundfrom, Shaw, and Lu Ke (2005) elaborated on MacCallum and Tucker's (1991) framework by investigating the influence of sample size in factor analysis. It was found that as sample size increases, the intercorrelations of the unique factors with each other and with the common factors tend to approach their population values of zero.[12] This, in fact, reduces the sampling error and improves the accuracy of the parameter estimates (MacCallum et al., 1999). The unique factor loadings also play a role. If communalities of measured variables are high, the unique factor loadings will be low, which, in turn, reduces the impact of sampling error. Therefore, sample size is irrelevant if communalities are high. On the other hand, if communalities of observed variables are low (i.e., unique factor loadings are high), then the role of sample size becomes important (MacCallum et al.). Moreover, the number of variables (i.e., test items) is not necessarily an appropriate index for determining an appropriate sample size. However, with more variables, the influence of the magnitude of communality seems to diminish (Mundfrom et al., 2005). In smaller samples, the intercorrelations of the unique factors with each other and with the common factors deviate further from zero, introducing the burden of sampling error. Therefore, as unique factor weights become larger (i.e., communalities become lower), this source of sampling error is more heavily influenced by sample size, causing an inadequate recovery of population factors (MacCallum et al.). Finally, it seems that the development of an absolute minimum necessary sample size is not feasible.

[12] Refer to the common factor model presented in Chapter II to review the theoretical foundation of unique factors.
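The sample-size argument above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `mean_abs_unique_corr`, the choice of ten unique factors, normally distributed scores, and the two sample sizes are hypothetical and are not taken from the studies cited. It generates unique factors that are uncorrelated in the population and shows that their sample intercorrelations drift further from zero when the sample is small.

```python
import numpy as np


def mean_abs_unique_corr(n, p=10, seed=0):
    """Mean absolute sample correlation among p unique factors.

    The unique factors are independent in the population, so any nonzero
    sample correlation reflects sampling error alone.
    """
    rng = np.random.default_rng(seed)
    unique = rng.standard_normal((n, p))
    corr = np.corrcoef(unique, rowvar=False)
    off_diagonal = corr[np.triu_indices(p, k=1)]
    return float(np.mean(np.abs(off_diagonal)))


small_sample = mean_abs_unique_corr(n=50)
large_sample = mean_abs_unique_corr(n=5000)
print(f"N = 50:   {small_sample:.3f}")
print(f"N = 5000: {large_sample:.3f}")
```

The deviations shrink roughly in proportion to one over the square root of the sample size, which is consistent with the claim that sampling error in the unique-factor intercorrelations is greatest in small samples.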
The factor analytic literature is full of various and competing recommendations pertaining to the minimum sample size (N) and the minimum ratio of sample size (N) to the number of variables (p) necessary to obtain adequate and stable factor solutions that closely approximate the population factors (Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005). Due to the large number of conditions and variations that one needs to take into consideration when investigating such issues, it is nearly impossible to establish one minimum sample size for all conditions. However, Mundfrom et al. (2005) found that with a variable-to-factor ratio of at least seven, and even with low communalities, the minimum necessary sample size was never greater than 180, and in most cases less than 150. On the other hand, with a variable-to-factor ratio of three, the number of factors between three and six, and low communalities, the minimum necessary sample size was at least 1200, which is a large jump from 150. Then again, Browne and Cudeck (1993) found that the factor solutions resulting from the use of large samples showed greater stability, and that as the ratio of the number of variables to factors increased, factors in the population were recovered more successfully.

Overdetermination

An additional facet to consider when determining the quality of a factor analytic solution is the degree of overdetermination, which is defined as the degree to which each factor is clearly represented by a sufficient number of variables (MacCallum et al., 1999). This is often assessed by evaluating the ratio of the number of variables to the number of factors, commonly denoted as the p:r ratio (as Mundfrom et al. assessed, above). It is important to note that overdetermination is not purely defined by this ratio.
Highly overdetermined factors are those that reveal high loadings on a considerable number of variables and show good simple structure (i.e., factors that are marked by high loadings for some variables and low loadings for others). Weakly overdetermined factors tend to display poor simple structure without the high loadings (MacCallum et al., 1999). For all factors to be highly overdetermined, the number of variables should be at least several times the number of factors. Comrey and Lee (1992) recommend five times as many variables as factors. However, three or four times as many variables as factors have also been recommended in the psychometric literature (Gorsuch, 1983).

MacCallum et al. (1999) stated that the impact of sampling error on factor analytic solutions can be reduced when factors are highly overdetermined. Moreover, when factors are highly overdetermined, sample size may have less impact on the quality of results. Likewise, having highly overdetermined factors might be especially useful when communalities are low, because the impact of sample size is greatest in that situation. Thus, the influence of the p:r ratio on the factor analytic solution decreases as the magnitude of communalities increases.

Simulation Studies

Hogarty, Hines, Kromrey, Ferron, and Mumford (2005) investigated the relationship between sample size and the quality of factor solutions obtained from EFA. They sought to assess the minimum N and N:p ratio needed to attain a good recovery of population factors across a variety of communality levels, degrees of overdetermination, and numbers of factors. This simulation study's design was guided by the framework of MacCallum et al. (1999). Results showed that there was no minimum level of N or N:p ratio that achieved good factor recovery across all conditions explored, as mentioned previously. In addition, as MacCallum et al. (1999) found, higher levels of communality were associated with better factor recovery.
Results also showed that the apparent quality of a sample factor solution depended, in large part, on which index was used to determine whether recovery was 'good'.

MacCallum, Widaman, Preacher, and Hong (2001) conducted a simulation study to investigate the effects of sample size, magnitude of communality, and overdetermination on recovery of population factors under conditions in which the common factor model did not hold in the population.[13] There were 18 population correlation matrices generated; the matrices varied in sample size, the number of factors, and magnitude of communality, and the number of measured variables was held constant at 20. Communalities took high, wide, or low values. The degree of model fit in the population was also controlled by simulating minor factors. Sample correlation matrices were generated from each of the 18 population correlation matrices. Each sample correlation matrix was analyzed using MLFA. It was hypothesized that the lack of fit of the model in a population would not affect the population recovery, regardless of model error and sample size. The RMSEA was used as an index of model fit. MacCallum et al. (2001) found that the effects of sample size, communality level, and overdetermination on the recovery of population factors were identical regardless of the existence of model error, as hypothesized. However, retaining an inappropriate number of factors, especially too few factors, can cause major distortions of factor loading patterns. Underfactoring is known to be a major source of model error (Fava & Velicer, 1992; Wood et al., 1996). The findings from this study regarding the lack of effect of model error on factor recovery did not generalize to situations involving underfactoring.

[13] For full details on how the population correlation matrices were generated, see Tucker et al. (1969).

The MacCallum et al.
findings pertain to the influences of a variety of minor sources of model error, but not a major source of model error, such as failure to retain a major common factor.

Preacher and MacCallum (2002) further investigated model error by conducting a simulation study of EFA. The sample size, number of factors, number of observed variables, and levels of model fit were varied, and sample matrices were analyzed using the principal factors method. Measures of sample-population congruence and bias were obtained. The model fit was measured in terms of the population root mean squared residual (RMSR). The RMSR index generates an estimate of the average degree of incongruity between equivalent elements of the population correlation matrix and the correlation matrix implied by the factor model. Even though it was found that the lack of fit of a factor model in a population does not seriously influence factor recovery (MacCallum et al., 2001; Preacher and MacCallum, 2003), one of the primary findings of this study concerned overdetermination. The results suggest that the number of factors, rather than the number of variables, is what drove overdetermination. Therefore, it appears that researchers may be more likely to improve factor recovery by reducing the number of factors rather than by adding indicators, but these findings were generated by holding communalities constant.

As mentioned in Chapter II, a communality for an observed variable is the variance accounted for by the factor(s). The communality for a test item is the sum of squared loadings (SSL) for that variable across factors; for unrotated factors, the SSL for a factor is equal to its eigenvalue. High communalities have been known to offset the negative effects of small sample sizes (Preacher & MacCallum, 2002). According to Cliff and Pennell (2004), higher communalities mean not only greater stability for the loadings of a specific test, but also lead to
stronger factors, which means that the stability of all the loadings is improved. It is important to note here that retaining too few factors will negatively impact (i.e., inflate or deflate) the communalities (Preacher & MacCallum, 2002), which can ultimately affect the stability of the entire test.

Whereas communalities were found to be the most important determinant of factor recovery according to MacCallum et al. (1999), Preacher and MacCallum (2002) found sample size to have the largest effect on factor recovery. Regardless of the differences found in these simulation studies, it is vital to consider the association of sample size, communality levels, overdetermination, and the major influences on model error when assessing dimensionality via FA.

The question of how many factors to retain would only be moderately problematic if taking too few factors simply meant that a few dimensions were left behind in the correlation matrix or if taking too many meant that some factors were left uninterpretable. However, the importance of deciding on the number of factors is a much larger issue, since the estimation of loadings on one factor cannot be accomplished independently of the estimation of loadings on the other factors (Cliff & Hamburger, 1967). This is due to the communality for a test item being the sum of squared loadings (SSL) for a variable across factors. Communalities are used in determining the number of factors to retain because the SSL for a factor is equal to the eigenvalue, and eigenvalues are the foundation for many of the indices and rules used in retaining factors. There have been numerous simulation studies conducted to assess the adequacy of FA procedures and criteria, as presented thus far.
Despite these studies, however, no empirical study has examined the combination of rules, indices, and methods that are used to determine the number of factors to retain, although researchers have been calling for one for over 20 years (Hattie, 1985). It has been suggested that researchers should rely on multiple criteria or rules when deciding on the appropriate number of factors to retain (Hattie; Gessaroli & De Champlain, 1996; Fabrigar et al., 1999; Preacher & MacCallum, 2003). Accordingly, in order to determine which of these methods should be combined, it is necessary to review the previous research findings on the individual rules and indices.

Previous Research Findings on Decision-Making Rules and Indices

Although the chi-square test is useful as an upper bound for determining the number of factors and as a statistical check on the adequacy of the sample size (Gorsuch, 1983, 1997b), it has, unfortunately, been criticized as being too strict or unrealistic. The true objective of factor analysis is to obtain a parsimonious factor solution that approximates the real world. A null hypothesis of a perfect fit of unidimensionality is not necessarily realistic or of real interest to a practitioner or researcher (Fabrigar et al., 1999). Likewise, it has been noted that the limitations of hypothesis testing (e.g., the impractical null hypothesis) are a concern to many researchers (Wilkinson & the APA Task Force, 1999). Moreover, this test has been found to reject observed factor structures that actually represent or recover the population factor structure adequately (MacCallum, 1990). Most importantly, the chi-square statistic is highly influenced by sample size (Fabrigar et al.). When the sample size is extremely large, even the most trivial differences between the model and the data can reach statistical significance. Hence, a large sample size can lead to potential over-extraction (Finch & West, 1997).
In situations when the sample size is small, even large differences are not detected, which may lead to underfactoring (Fabrigar et al.).

There are several commonly used FA estimation methods that produce the chi-square statistic. The significance tests, confidence intervals, and fit statistics that are obtained from Maximum Likelihood (ML) estimation have been found to be quite useful in practice, and if the assumptions of the method are met, researchers are often assured that the estimates from ML have minimum sampling variance (Briggs & MacCallum, 2003). However, Briggs and MacCallum also found that OLS outperformed ML in the recovery of weak factor structures when there was a moderate amount of error. For practitioners who favor ML, they recommend applying OLS alongside ML in EFA to compare solutions. On the other hand, it was stated that the chi-square statistics based on unweighted least squares (ULS) and generalized least squares (GLS) are quite similar, and that the chi-square statistic from GLS estimation is a useful and practical tool for assessing dimensionality, especially for binary items (De Champlain & Gessaroli, 1998).

There have been numerous investigations conducted on the role of eigenvalues in the process of determining the number of factors to retain. The eigenvalues-greater-than-one rule and the criteria for the ratio of first-to-second eigenvalues have been found to overestimate the number of factors in a data set (Cattell & Jaspers, 1967; Linn, 1968; Cattell & Vogelmann, 1977; Gorsuch, 1983; Zwick & Velicer, 1986). The eigenvalues-greater-than-one rule was found to be most effective when sample size is large, when there are at least forty observed variables (i.e., items), and when the number of factors is between the number of variables divided by five and the number of variables divided by three (Gorsuch, 1983, 1997b). Despite this, the rule has been noted as being problematic.
Guttman (1954) demonstrated this rule's validity for determining the number of factors in a population matrix, but this rule is most often applied to a matrix based on data from a sample. Because of the errors of measurement that exist with such data, factors emerge that are not meaningful (Cattell, 1966). Hence, this rule, when applied to sample data, virtually always leads to over-extraction (Lee & Comrey, 1979; Zwick & Velicer). Furthermore, this rule was stated as being inappropriate for item analysis (Gorsuch, 1997a), and it has been noted that the rule is not necessarily a reliable test (Cliff, 1988). Fabrigar et al. (1999) claimed that this rule has not been shown to work well in any published study. Despite the well-documented poor performance of the eigenvalues-greater-than-one rule, it is still widely used in practice, as noted in Chapter IV. This adherence to a faulty rule by researchers and practitioners may be due to its ease and availability (Hoyle & Duvall, 2004).

The scree test, which also utilizes eigenvalues, has been examined in numerous studies. The scree test has been found to be more successful when sample sizes are large, communality values are high, and each factor has several observed variables with high loadings (Cattell & Vogelmann, 1977). When factor loadings are low (which is typical in the social and behavioral sciences), the accuracy of the scree test drops significantly (Gorsuch, 1983). The factor loadings reflect the correlation between each observed variable and factor. The scree test is subjective in that there is no clear, objectively defined criterion for what represents an extensive drop in the size of the eigenvalue, and, in many cases, the eigenvalues produce a configuration that does not show a clear drop. Interestingly, according to Fabrigar et al., the scree test was used as the sole criterion for deciding on the number of factors in 26% of the published articles they reviewed.
Although this method has been criticized for its subjectivity (Fabrigar et al., 1999; Hoyle & Duvall, 2004), the scree test has also been suggested as an aid in deciding on the number of factors (Russell, 2002).

Parallel analysis (PA) is one method that has been highly recommended in the literature. This test has been found to function well under various conditions (Humphreys & Montanelli, 1975), and to be the most accurate method in determining the number of factors (Zwick & Velicer, 1986; Glorfeld, 1995; Velicer, Eaton, & Fava, 2000; Gorsuch, 2003). Crawford and Koopman (1973) found that sample size and different factoring methods (i.e., principal components) did have significant effects on Horn's test (i.e., PA). In addition, PA has been found to over-extract (Zwick & Velicer) and under-extract (Turner, 1988) in certain studies. Although parallel analysis is not found in the major statistical software programs, Thompson and Daniel (1996) include a condensed computer program for conducting parallel analysis in SPSS. Furthermore, Cota, Longman, Holden, Fekken, and Xinaris (1993) give tables of eigenvalues from random data that can be used, but these eigenvalues can only be used for limited sample sizes. Overall, this method has been highly recommended for making decisions regarding the number of factors to retain (Fabrigar et al., 1999; Thompson & Daniel).

Finally, according to Browne and Cudeck (1992), the RMSEA provides a promising approach for assessing the fit of models. The RMSEA, however, has yet to be extensively tested within the context of determining the number of factors (Fabrigar et al., 1999). Furthermore, according to Byrne (1998), the RMSEA cut points are unreliable and should be used with caution.

As mentioned in Chapter I, it has been recommended that practitioners use multiple decision criteria when determining the number of factors to retain.
As found in Chapter IV, researchers are actually using multiple methods, but the rationale behind using certain combinations is not available. When using ML, it has been recommended that the RMSEA be utilized in combination with the scree test and parallel analysis (Fabrigar et al., 1999; Browne & Cudeck, 1992). Further to this, Preacher and MacCallum recommend combining the scree test and parallel analysis.

Two noteworthy exceptions in these simulation studies are the scree plot and deciding on the number of factors based on values of factor loadings. As seen in Chapter IV, both methods are used in day-to-day factor analytic research practices. However, both of these methods are known to be subjective approaches, and hence cannot be used in simulation studies. Furthermore, using the factor loading approach to determine the number of factors to retain has not been described as a rule in the psychometric literature, such as Lord (1980), Fabrigar et al. (1999), or Russell (2000). These publications have outlined the prominent decision-making rules and indices that are used to decide on the number of factors to retain.

Research Questions

From the assessment of the literature above, I formulated three research questions. Each research question has two subcomponents involving both strict and essential unidimensionality, as defined in Chapter II. Each research question is described below. These research questions are used throughout the remainder of the dissertation to help orient the reader in the methodology, results, and discussion chapters.

Research Question One

How do the nine decision-making rules and indices, individually, perform for both strict and essential unidimensional measures?
As suggested by the psychometric literature reviewed above, in investigating the performance of the individual rules and indices, I examined the effects of (a) sample size, (b) magnitude of the communality (i.e., how strongly each item indicates a factor), and (c) skewness of the item responses for strict unidimensionality. In addition, for the essential unidimensionality cases, (d) the proportion of communality on the second factor, and (e) the number of items with non-zero loadings on the second factor were also studied. In addressing research question one, I first describe the overall performance of each decision-making rule, and then investigate the main effects of the various simulation conditions (e.g., sample size) on the decision-making rules. These two steps will be referred to as Research Questions 1A and 1B, respectively.

Research Question Two

Is there one decision-making rule that performs best overall? If not, which ones perform optimally in the various sample sizes, magnitudes of the communalities, and distributions of the item responses?

That is, did any one of the nine decision-making rules or indices perform best, in terms of detecting unidimensionality, in all sets of conditions explored, such as skewness, magnitude of communality, and sample size (i.e., Research Question 2A)? Secondly, if there was no one rule or index that performed best in all conditions explored, I wished to investigate which decision-making rule(s) performed optimally under specific conditions (i.e., Research Question 2B). For example, I wanted to examine which rule(s) performed best when the distribution of test items was skewed, magnitude of communality was low, and sample size was large. In addressing Research Question 2B, a list of possible combinations was constructed.

Research Question Three

Is there one combined decision-making rule that performs best?
If not, which combination(s) perform optimally in the various sample sizes, magnitudes of the communalities, and distributions of the item responses?

I investigated whether a combination of the nine individual decision-making rules or indices provided a methodology that performed better than any one individual rule or index (i.e., Research Question 3A). In investigating this question, the question of whether any one combination performed best in all conditions was also answered. Secondly, if no one combination worked best in all conditions, I planned to investigate which combinations performed optimally in specific conditions. For example, I wished to examine which combination(s) of decision-making rules and indices performed best when the distribution of test items was skewed, magnitude of communality was high, and sample size was low.

Rationale and General Predictions of this Study

Based on the previous research findings and simulation studies, several rationales are provided for the selection of the variables (i.e., rules and conditions) that were used in this dissertation's computer simulation. Furthermore, based on the literature, several hypotheses were made with regard to the outcomes of the simulation. The purpose of this simulation study was to extend previous research by investigating the performance of nine individual decision-making rules and indices in determining the number of factors to retain in FA, as well as to examine the possibility of using combinations.
As mentioned previously, the nine decision-making rules and indices selected for this study included (1) the chi-square statistic from Maximum Likelihood (ML) factor analysis, (2) the chi-square statistic from Generalized Least Squares (GLS) factor analysis, (3) the eigenvalues-greater-than-one rule, (4) the ratio-of-first-to-second-eigenvalues greater than three, (5) the ratio-of-first-to-second-eigenvalues greater than four, (6) parallel analysis (PA) using continuous data, (7) parallel analysis (PA) using Likert (i.e., ordinal or rating scale) data, (8) the Root Mean Square Error of Approximation (RMSEA) index from ML, and (9) the Root Mean Square Error of Approximation (RMSEA) index from GLS. The selection of these nine decision-making rules and indices was due, in part, to previous research findings.[14] These findings, as outlined in this chapter, provided a foundation of evidence and rationale for selecting these rules and indices for the current simulation study. For example, the chi-square statistic was selected for its capability of making statistical inferences (i.e., statistical testing of a null hypothesis). The limitations of this statistic were evident (as seen previously in this chapter). However, a formal evaluation of this statistic's performance when applied to specific conditions of strong or varied secondary minor dimensions (i.e., essential unidimensionality) had not been directly performed.[15] The chi-square statistic from ML estimation was commonly used and was found to be useful (Briggs & MacCallum, 2003). The chi-square statistic from GLS estimation was recommended for assessing dimensionality (De Champlain & Gessaroli, 1998).

[14] The selection of the decision-making rules and indices was also due to the findings of Chapter IV.

Furthermore, although the eigenvalues-greater-than-one rule has been highly criticized
for its performance in retaining an appropriate number of factors, it is still widely used by researchers. Therefore, the eigenvalues-greater-than-one rule was selected in order to determine when it is most appropriate to employ (e.g., is this rule accurate when sample sizes are small and magnitude of communality is high?). The ratio of first-to-second eigenvalues-greater-than-three and the ratio of first-to-second eigenvalues-greater-than-four rules remain largely untested in an empirical sense. For that reason, these ratios were selected in order to assess their performance in determining unidimensionality. In addition, PA has been found to be one of the most accurate methods for determining the appropriate number of factors to retain (Fabrigar et al., 1999). Thus, this method was selected for this dissertation not only to test this finding, but also to assess how this method performs under extreme conditions (e.g., when secondary minor dimensions are strong). Furthermore, based on an examination of the previous literature, PA with Likert data and PA with continuous data had not been directly or empirically compared using the same data. Finally, the RMSEA fit statistic was noted as being a promising approach, but it needed more testing (Fabrigar et al., 1999). For that reason, the RMSEA was selected in order to assess its performance in determining unidimensionality. Furthermore, as mentioned above, both the ML and GLS estimation methods for FA were selected for the simulation design. Since the chi-square statistic was generated for both of these estimation methods, and because the RMSEA utilizes the chi-square statistic in its calculation, the RMSEA for both ML and GLS was computed and tested.

[15] The investigation of how the chi-square statistic performs directly in the context of essential unidimensionality (i.e., in the context of strong versus weak secondary minor dimensions) has not been conducted.
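As an illustration only, the eigenvalue-ratio rules and the RMSEA point estimate discussed above can be sketched in a few lines of Python. The function names, the 4-item correlation matrix, and the chi-square, degrees of freedom, and sample size values are hypothetical and are not taken from the simulation. The RMSEA formula uses the common n - 1 convention in the denominator, which is an assumption here because software packages differ on this point.

```python
import math

import numpy as np


def first_to_second_ratio(corr):
    """Ratio of the largest to the second-largest eigenvalue of a correlation matrix."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # sort in descending order
    return eigvals[0] / eigvals[1]


def rmsea(chi_square, df, n):
    """RMSEA point estimate from a model chi-square (assumed n - 1 convention)."""
    # Negative (chi_square - df) values are truncated at zero before the square root.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical 4-item correlation matrix with every inter-item correlation equal to .5.
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)

ratio = first_to_second_ratio(corr)
print("first-to-second eigenvalue ratio:", round(ratio, 2))
print("ratio > 3:", ratio > 3, "| ratio > 4:", ratio > 4)

# Hypothetical one-factor fit: chi-square of 30 on 20 degrees of freedom, n = 201.
print("RMSEA:", round(rmsea(chi_square=30.0, df=20, n=201), 3))
```

Because the RMSEA is computed directly from the chi-square statistic, a separate value results from each estimation method (ML or GLS) that supplies the chi-square, which is why both versions appear among the nine rules and indices.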
Hypotheses

Based on the previous research findings above, the hypotheses for the current simulation study included the following: (1) the chi-square statistic will be sensitive to sample size, (2) overall, the eigenvalues-greater-than-one rule will perform poorly, except when sample size is large, (3) overall, PA for both continuous and Likert data will perform better than the other rules and indices, except when sample sizes are small, and (4) the RMSEA for ML and GLS FA will provide inconsistent results.

In order to determine whether or not a rule or index performed accurately, it was necessary to define what is meant by the correct number of factors to retain. It would be difficult to judge whether or not these predictions (i.e., hypotheses) were accurate unless the definition of the "appropriate" number of factors to retain had been established. The psychometric and measurement literature has attempted to do so in a variety of ways.

Meaningful and Appropriate Number of Factors: Findings and Recommendations

What does it mean to have an "appropriate", "correct", or "meaningful" number of factors? This question is largely unasked and unresolved, yet many researchers and psychometric specialists use these terms interchangeably in the measurement literature. This section provides a review of how the various meanings of the "appropriate" number of factors are categorized, and explicitly states how the correct number of factors is defined in this study.

Many researchers in the psychometric literature neglect to state exactly what they mean by the "correct" number of factors. There are several different theoretical definitions or broad classes that shape what is meant by an "appropriate" number of factors in the literature. These classes somewhat overlap, but are, in fact, distinct.
It is useful in discussion to separate these definitions in order to provide an understanding of the rationale behind determining the number of factors and to judge whether or not the indices or rules of thumb utilized have been successful.

When determining what is an "appropriate" number of factors to retain, it is important to note what is implied by the notion of a factor. From an explanatory view, a factor is that which influences a variable under investigation (Hakstian & Muller, 1973). With that in mind, determining the number of appropriate factors would mean differentiating the factors that have a large influence on the n variables sampled from a domain of interest from those factors that have minimal influence. The problems with this approach are (1) defining how large this influence needs to be and (2) determining which method would establish this influence most accurately. In doing so, several different pieces of literature need to be reviewed.

Preacher and MacCallum (2003) use two main classes to label what is meant by an "appropriate" number of factors. The first is using the eigenvalues of the reduced sample correlation matrix, which utilizes communalities in the diagonal rather than the number one. This method is used only if iterative or non-iterative principal factor techniques are employed (Preacher & MacCallum, 2003). Under this category fall several rules of thumb and indices. The first is to retain the number of factors with eigenvalues of the unreduced sample correlation matrix that are greater than one (i.e., K-G rule, eigenvalues-greater-than-one rule). The theoretical grounds of this rule are based on whether the common factor model holds exactly in the population; if so, then the number of eigenvalues of the unreduced population correlation matrix that are greater than one is said to be a lower bound for the number of factors. Unfortunately, the application of this rule does not match its theoretical foundation.
The actual mathematical proof of this rule (Guttman, 1954) applies to a population correlation matrix, which, in practice, is often not available (Preacher & MacCallum, 2003). Application of this rule to the sample correlation matrix under an imperfect model represents conditions not explained by the theoretical basis of the rule. In addition, this rule's theory is based on eigenvalues of the unreduced correlation matrix rather than the reduced correlation matrix, whereas in practice, more often than not, the reduced correlation matrix is used. Furthermore, it has been noted that researchers often use this rule as the actual number of factors to retain rather than a lower bound (Gorsuch, 1983).

The scree plot (e.g., subjective scree test) and parallel analysis also fall under the category of using a reduced sample correlation matrix. The scree test rule can be applied to both unreduced and reduced correlation matrices (Gorsuch, 1983). As defined previously in this chapter, the scree plot test identifies the number of eigenvalues that precede the last large drop on a scree plot, and this is the number of factors retained. The parallel analysis procedure involves comparing the scree plot of a reduced correlation matrix with one derived from a random set of data; the number of eigenvalues that fall above the intersection of these two scree plots is taken as the "correct" number of factors to retain (Horn, 1965).

The second broad class in deciding on the appropriate number of factors to retain according to Preacher and MacCallum (2003) requires the use of the ML estimation method. There is a wide array of fit statistics and measures in factor analysis (FA) and structural equation modeling (SEM) that involve this method: for example, the root mean square error of approximation (RMSEA) and likelihood-ratio statistics. These statistics have a more formal statistical foundation, and therefore a rationale for their use is evident.
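The eigenvalue-based rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the simulation: it works with the unreduced correlation matrix (ones on the diagonal) rather than the reduced matrix, it uses the mean eigenvalue across normally distributed random data sets as the reference curve, and the function names, the number of random data sets, and the generated one-factor data (8 items, loadings of .7, 300 respondents) are all hypothetical.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the unreduced correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_guttman(data):
    """Number of eigenvalues greater than one (K-G rule)."""
    return int(np.sum(sorted_eigenvalues(data) > 1.0))


def parallel_analysis(data, n_random=100, seed=0):
    """Count leading observed eigenvalues that exceed the mean random-data eigenvalue."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    random_eigs = np.empty((n_random, p))
    for i in range(n_random):
        # Random data with the same number of respondents and items as the real data.
        random_eigs[i] = sorted_eigenvalues(rng.standard_normal((n, p)))
    reference = random_eigs.mean(axis=0)
    observed = sorted_eigenvalues(data)
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:  # the two eigenvalue curves have crossed
            break
        n_factors += 1
    return n_factors


# Hypothetical one-factor data: 8 items, loadings of .7, 300 respondents.
rng = np.random.default_rng(42)
factor = rng.standard_normal((300, 1))
unique = rng.standard_normal((300, 8))
items = 0.7 * factor + np.sqrt(1 - 0.7 ** 2) * unique

print("K-G rule:", kaiser_guttman(items))
print("Parallel analysis:", parallel_analysis(items))
```

Stopping at the first observed eigenvalue that does not exceed its random-data counterpart mirrors the idea of counting the eigenvalues that lie above the intersection of the two scree plots. A reduced-matrix version would replace the unit diagonal with communality estimates before extracting eigenvalues.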
Although Preacher and MacCallum (2003) provide two main classes of rationale for the "appropriate" number of factors to retain, Hakstian and Muller (1973) define these broad classes differently, but utilize very similar rules to define their classes. Hakstian and Muller (1973) provide four different rationales to assist in justifying an appropriate number of factors: (1) algebraic, (2) psychometric, (3) statistical, and (4) psychological. The algebraic class was outlined by Guttman's (1954) weaker lower bounds. This was then translated into a rule of thumb concerned with the number of principal components to retain. Kaiser (1960) then provided a psychometric view by pointing out that a principal component should have a latent root exceeding "unity" (i.e., 1.0),[16] thereby adding a procedural justification to Guttman's weaker lower bound proof. The rule then became widely known as the Kaiser-Guttman rule.

[16] According to Kaiser (1960), in order for a principal component to have a positive alpha internal consistency, the associated latent root must exceed unity.

Factor analysis has long been renowned for its inferential statistical strength and interpretation. According to Hakstian and Muller's (1973) statistical framework, "the correct number of factors for a set of data is that number yielding a reproduced dispersion or correlation matrix in which the entries are simultaneously within normal sampling error of the observed coefficients" (p.464). The tests of significance for the common factor models provide a statistical foundation on which to base decisions, and therefore allow for valid inferences.[17]

Hakstian and Muller's (1973) fourth class is the notion of psychological criteria. Cattell (1958) established rules for how much common-factor variance should be accounted for before factoring is completed.
As a result, Cattell's (1962) scree plot test and Horn's (1965) parallel analysis were used to determine the number of "appropriate" factors with psychological constructs. These, among other methods, rest on the idea that the number of factors to retain involves extracting components until the percentage of variance accounted for exceeds some arbitrary value, for example 80%. Although the percentage of variance has been a key aspect of determining the appropriate number of factors, Hattie (1985) reviews the classes from a slightly different perspective.

[17] Tests of significance for the common factor model and the theoretical implications of these tests have been discussed by Thurstone (1938), Anderson and Rubin (1956), Joreskog (1967), Gorsuch (1983), and Comrey and Lee (1992).

Hattie (1985) provided a set of broad categories of indices or rules of thumb, which are used to identify an "appropriate" number of factors. However, Hattie evaluated rules and indices that are specifically used for unidimensionality. He grouped over 30 indices and rules of thumb into five classes: methods based on (1) answer patterns, (2) reliability, (3) principal components, (4) factor analysis, and (5) latent traits. For the first category, the best unidimensional test is a function of the amount by which a set of item responses deviates from the ideal scale pattern. This function takes into consideration factors such as difficulty level, guessing parameters, number of correct answers, order of items, and correlation of items (Hattie).

The second class of indices defined by Hattie (1985) is that characterized by reliability. Coefficient alpha (i.e., KR-20) may be one of the most widely used indices in assessing unidimensionality, as well as reliability. Cronbach (1951) stated that "alpha estimates the proportion of test variance due to all common factors among the items" (p.320).
It was also stated by Cronbach that this was true only when the inter-item correlation matrix was of unit rank; otherwise alpha was an underestimate of the common factor variance. According to Hattie (1985), "the rank of a matrix is the number of positive, nonzero characteristic roots of the correlation matrix" (p. 144). According to Lumsden (1957), unit rank "means that the matrix fits the Spearman case of the common factor model" (p.89).

Hattie's (1985) third class covers indices based on principal components. The first principal component accounts for a maximum percentage of total variance, and this variance is used as an index of unidimensionality. In other words, the larger the amount of variance accounted for by the first component, the more likely the set of items is unidimensional. How high this percentage needs to be is arbitrary, and cut-off scores have been given without rationale in the psychometric literature (Hattie).

The fourth class of definitions outlined by Hattie (1985) includes indices based on FA. It is important to note that FA differs from components analysis in that FA estimates uniqueness for each item. The chi-square test is often used to decide on an appropriate number of factors. According to Hattie (1985), the chi-square statistic from maximum likelihood is a reasonable measure of the nearness of the residual covariance matrix to a diagonal covariance matrix. Chi-square difference tests are also used when comparing models. McDonald (1982) has suggested the use of the residual covariance matrix to judge the extent of misfit of the model to the data. He also claimed that the residuals may be more important than the significance tests because many of the significance test statistics are extremely sensitive to sample sizes, and therefore, hypotheses according to the significance tests could result in false restricted decisions. Communalities obtained from FA are also used to assist in the decision-making of retaining factors.
According to Green, Lissitz, and Mulaik (1977), when there is a single factor among a set of item responses, the item loadings on this factor are equal to the square roots of the relevant communalities. Communalities are often underestimated (Hattie, 1985), and the use of these in making decisions about retaining factors depends on knowledge of the correct dimensionality, which is not always known.

The last category Hattie (1985) reviewed was that of latent trait models. The theory of latent traits is centered on the idea that responses to items can be accounted for by identifying characteristics of examinees, which are called latent traits. A latent trait can be interpreted as a magnitude that the items measure in common, which explains all the mutual statistical dependencies among the items (Hattie, 1985). The critical assumption with latent trait modeling (e.g., IRT), however, is that of local independence, which was defined previously in this chapter. According to Hattie (1985), the principle of local independence requires that any two items be uncorrelated when the latent trait magnitude is fixed and does not require that items be uncorrelated when the latent trait level varies over groups. Latent trait models are estimated using a mix of difficulty, discrimination, and guessing.

Appropriate Number of Factors to Retain in This Dissertation

The "appropriate", "correct", or "meaningful" number of factors to retain is a decision that involves a great deal of judgment by the researcher. There are many perspectives or broad classes a researcher needs to consider, as noted above. There are many indices that can be used, but the decision also involves a great deal of knowledge about the conceptual framework of the latent traits being measured as well as the observed variables (i.e., items). Unfortunately, the conceptual framework cannot be tested in this study because it is a computer-simulated investigation.
The accuracy with which the data matrix can be reproduced from the factor pattern and scores is a function of the number of factors extracted. This accuracy will first depend on the statistical estimation method selected for assessing dimensionality (e.g., ML, PCA, etc.). Adding factors will increase accuracy, but the decision of how many factors to retain rests on the goal of identifying a limited number of underlying latent variables (i.e., factors) responsible for the observed variances and covariances (i.e., maintaining a parsimonious model). Perfect reproduction of the original variables and their correlations can always be obtained by extracting enough factors (Gorsuch, 1983, p. 143). For the purposes of this study, therefore, what is meant by the "correct" number of factors must be defined in order to judge whether or not the indices or rules of thumb utilized have been successful.

For the simulation study reported in this dissertation, the correct or appropriate number of factors was defined as the number that yields zero residual correlations after fitting a factor analysis model to an item response matrix. This definition was based on the statistical grounds of the fundamental definition of factor analysis. McDonald (1982), Hakstian and Muller (1973), and Gorsuch (1983) provided the theoretical and statistical grounds for this definition of what is considered to be the correct number of factors to retain.

Significance of the Dissertation

The key strength of this research is that it addresses gaps (i.e., discrepancies) in the literature as to which rules or combinations of rules and indices are superior when retaining factors. This study is unique in that it addressed some of these issues, thereby filling that gap. The overall objective was to provide guidelines to assist researchers in the decision-making process of retaining factors.
The novel contributions of this research were to provide (1) a comprehensive direct comparison of a variety of decision-making rules and indices, which allowed us to determine whether there was, in fact, one superior method, and (2) an investigation of whether multiple rules and indices (e.g., eigenvalues-greater-than-one rule and ML RMSEA) aid in the decision-making process of retaining factors. This strategy was suggested in the literature as early as 20 years ago (Hattie, 1984), but had never been investigated to date. Ultimately, the goal was to develop a new statistical methodology that involved multiple criteria when determining the number of factors to retain.

Scope of the Study

The scope of this study limited the selection of rules and indices. Because this study focused on unidimensional measures, on the decision-making rules and indices that are used in day-to-day research, on Likert-type item formats, and on the availability of statistical software, many other possible methods were not discussed or investigated in this study. Furthermore, assessment methods specifically appropriate for tests containing dichotomous items were not considered (i.e., tetrachoric correlation matrices). Finally, the conditions of interest were limited to those found in psychological assessment, ignoring issues and conditions that go along with large-scale assessment (i.e., IRT item parameters).

Current Research Practices

The current day-to-day factor analytic practices and procedures that researchers use were considered vital information for determining the methodology of this dissertation. The methodology of this simulation needed to be guided by up-to-date research practices. What procedures do researchers use? What kind of rationale, if any, is provided for selecting the methods they used? Is there consistency in the methods used across researchers? Do the research practices match what was recommended in the measurement literature?
In addressing these questions, the next chapter is a systematic review of the current practices and procedures that researchers are using when conducting factor analyses.

Chapter IV

An Investigation of the Current Research Practices of Assessing Dimensionality via Exploratory Factor Analysis

Introduction to the Problem

Exploratory factor analysis (EFA) is one of the most widely used statistical procedures in behavioral and social science research (Ford et al., 1986; Fabrigar et al., 1999). Despite the common use of and extensive literature on EFA, there is still debate regarding the success and appropriateness of its methods and procedures. For example, EFA has been found to be unreliable in extracting factors and providing meaningful interpretations (Fabrigar et al.). In addition, EFA has been criticized for its limited capability in contributing to theory development (Gould, 1981). There has also been opposition to the way EFA is applied (e.g., Comrey, 1978; Gorsuch, 1983). For example, there are several methodological issues that a researcher must consider in order to implement EFA effectively, such as the design of a study, selection of the sample, and factor rotation. Yet, the methods and practices that are used by researchers when applying EFA vary and lack rationale (Fabrigar et al.). Russell (2002), on the other hand, found that researchers who employed EFA to analyze their data often had unambiguous and accurate predictions regardless of the factor structure of the measures. For instance, of the 137 EFA analyses reported in the Personality and Social Psychology Bulletin (PSPB) during the years 1996, 1998, and 2000, 54 (39%) included analyses where the researchers had clear predictions regarding the factor structure of the measures at hand.

Previous research findings that are specific to EFA practices were illustrated in Chapter III of this dissertation.
One apparent question that arises from reviewing these previous research findings is whether researchers actually utilize the findings and apply the recommendations made in the literature. Fabrigar et al. (1999) and Russell (2002) addressed this question by conducting surveys to investigate the use of EFA by researchers.

Purpose of the Investigation

Fabrigar et al. (1999) and Russell (2002) provided examinations of the methods and practices that are commonly used by researchers when conducting an EFA. Although these two studies surveyed the use of EFA by researchers, these investigations did not intend to inform educational or health research, both of which commonly utilize EFA. Their studies examined psychological research only. In addition, the objectives of these two studies were to inform practice, not necessarily simulation research. Their studies explored a variety of EFA procedures (e.g., study design, appropriateness of EFA, number of measures per factor, model-fitting procedures, factor rotation). This dissertation is concerned with one EFA procedure in particular, namely the technique of determining the number of factors. This technique encompasses certain facets that are not necessarily pertinent to other EFA procedures (e.g., magnitude of communalities, distribution of test items, etc.).

The purpose of this review was to illustrate the day-to-day research practices in ascertaining the dimensionality of psycho-educational tests and health measures via EFA. In view of that, the goal of this dissertation was to investigate and test decision-making rules and indices that are currently used by researchers when assessing the unidimensionality of psycho-educational and health measures via EFA. The EFA decision-making rules and indices and the values of the research conditions (e.g., test length and sample size) that were used in the simulation of this dissertation should reflect current EFA practices.
Therefore, the objective of this review was to inform the simulation design of this dissertation.[18] Accordingly, the results of the simulation were used (1) to determine if the procedures researchers are currently using are successful in determining unidimensionality, and (2) to make recommendations that can be directed specifically at the procedures currently being used in practice. To survey how EFA was currently being conducted in psycho-educational and health research, I conducted a review of articles in six psycho-educational and health journals for the time period of 1999 through 2004.

[18] The decision-making rules and various conditions, such as sample size, that were selected for the simulation design of this dissertation were based on previous research findings, as represented in Chapter III, and the values of the various conditions were based on current research practices described in this chapter.

Previous Surveys

As mentioned previously, there are two relevant studies that assessed current EFA practices. First, Fabrigar and colleagues (1999) systematically examined applications of EFA used in psychological research in order to determine if researchers were using procedures that had been shown to produce misleading results. They investigated research articles that were published from 1991 through 1995 in the Journal of Personality and Social Psychology (JPSP) and the Journal of Applied Psychology (JAP). Fabrigar et al. selected these journals based on their reputation for rigorous work, and also because they represented two different areas in psychology: social and industrial-organizational psychology. Their survey entailed examining every article published in these journals from 1991 through 1995 to determine if articles used EFA procedures.
If a study met this requirement, then several criteria were recorded: the ratio of measured variables to factors, the average reliability of measured variables, sample size, common factors versus principal components, model-fitting procedures, the method for determining the number of factors, and the rotation procedures.

Russell (2002) examined several different criteria. Russell (2002) investigated the use of factor analysis (i.e., EFA, PCA, and confirmatory factor analysis, CFA) in order to review and discuss how researchers are using certain FA techniques in psychological research. The Personality and Social Psychology Bulletin (PSPB) was examined for the years 1996, 1998, and 2000. The following criteria were recorded for EFA procedures: the factor extraction procedures, sample size, the number of measured variables, the method for determining the number of factors, factor rotation, and the process of creating factor scores.

Methods

In order to explore the uses of EFA in psycho-educational and health research, I conducted a review of articles that were published from 1999 to 2004. Six relevant journals were selected: Journal of Educational Psychology (JEP), Educational and Psychological Measurement (EPM), British Journal of Educational Psychology (BJEP), Health Psychology (HP), British Journal of Health Psychology (BJHP), Psychological Assessment (PA), and European Journal of Psychological Assessment (EJPA). These journals were selected to reflect EFA practices that are conducted in education, health, and psychological assessment research. These journals are known to publish articles that reference psychometric and statistical analyses, especially with regard to test development and measurement issues in the psycho-educational and health disciplines.
In my review, I examined every article published in these journals during the specified time period to determine whether it reported use of an exploratory factor analysis, searching with PsycINFO and EBSCO Academic Search Premier, a multi-disciplinary research database. If an article in one of these journals addressed an EFA, the manuscript was examined for specific criteria, and the study was coded according to those criteria.

Criteria Recorded

The articles were reviewed in order to record the following: (1) reported sample size, (2) the number of test items (i.e., observed variables), (3) distribution of test items, (4) the number of factors, (5) analyses conducted (i.e., estimation procedure, rotation procedure, and software used), (6) method for determining the number of factors, (7) number of scale points, and (8) magnitude of communality.

It is important to note that there were several differences between the current survey and the studies conducted by Fabrigar et al. (1999) and Russell (2002). For example, the current study recorded criteria that the two previous surveys did not capture. First, neither of these surveys recorded or discussed the distribution of test items. Second, the surveys did not record the number of scale points used on the psychological instruments investigated in the articles. In addition, the time periods of these surveys varied: the current survey reviewed articles from 1999 to 2004, Fabrigar et al. examined articles from 1991 to 1995, and Russell investigated articles in the years 1996, 1998, and 2000. Furthermore, both Fabrigar et al. and Russell drew on different journals and different disciplines than the current survey. Therefore, the differences in time periods, journals, and disciplines should be taken into consideration when the findings from these surveys are reviewed. The criteria recorded for the current survey can be seen in Table 3 below.
Appendix B provides the details on these findings.

Rationale for Criteria Recorded

Several factors influenced which criteria were recorded for this survey. The first was that this search aimed to provide an illustration of day-to-day research practices in assessing unidimensionality. A series of steps and variables are critical to the process of assessing unidimensionality via EFA. For example, sample size and the number of test items per factor are related to the process of determining the number of factors, as described in Chapter III. Thus, sample size, number of test items, and the number of factors were recorded for the current survey. The two previous surveys also influenced which criteria to record. Fabrigar et al. (1999) and Russell (2002) captured a wide range of EFA practices and variables when conducting their surveys of journals. I utilized several of the criteria they reported.

Table 3
Summary of Criteria Reported in Journals

Recorded Variables | Summary
Sample Size | The sample sizes ranged from 95 to 9520. The average sample size is 677, and the median is 433.
Number of Test Items | The number of test items ranged from 6 to 167 with an average of 38 and a median of 24.
Distribution of Test Items | There was an intention to collect reported data on distribution of test items. One study reported the distribution to be "non-normal", but no skewness statistics were reported. The rest of the articles did not report if the distribution of items was explored.
Factor Structure | The number of factors ranged from one to nine, with an average of four and a median of three.
Test Items to Number of Factors Ratio (p:r) | The ratio of the number of test items to the number of factors (p:r) ranged from 3 to 105 with an average of 13 and a median of nine.
Analysis: Software | The reporting of which software programs were utilized to conduct an EFA was lacking: SPSS was indicated in three studies.
Analysis: Estimation method | Principal Components Analysis (PCA) was reported in 31 studies, Principal Axis was reported in 14 studies, and Maximum Likelihood (ML) was reported in 10 studies. The rest of the articles failed to report estimation.
Method for Deciding on the Number of Factors to Retain | The scree plot was reported in 30 studies; eigenvalues > 1 rule was reported in 32 studies; factor loadings > .25 in one study, loadings > .30 in 18 studies, loadings > .40 in 12 studies, loadings > .45 in one study, and loadings > .50 in one study; chi-square statistic was reported in 10 studies; RMSEA was reported in nine studies; PA was reported in eight studies; review of magnitude of communalities was reported in two studies. 47 studies utilized multiple criteria.
Number of Scale Points | The number of scale points ranged from dichotomous to nine scale points with an average (and median) of five. There were seven studies that utilized mixed response formats.

Results

Several aspects of the findings are worthy of discussion. The overall findings are discussed, and results are also presented separately for (1) education, (2) health, and (3) psychological assessment research. Where applicable, the findings from the current survey are compared to Fabrigar et al. (1999) or Russell (2002).

Articles in JEP, EPM, BJEP, HP, BJHP, PA, and EJPA collectively reported a total of 65 EFA analyses from 1999 to 2004. In the educational research (JEP, EPM, BJEP), 18 studies utilized EFA. In the health research (HP, BJHP), 15 studies utilized EFA. In the psychological assessment research (PA, EJPA), 32 studies utilized EFA. Psychological assessment research thus appeared to utilize EFA more than education and health research.

Results from Recorded Criteria

The first criterion to discuss is the sample size. As Table 3 indicates, the sample sizes in the current study ranged from 95 to 9520, with an average sample size of 668 cases.
There were 45 articles (69%) that reported using a sample size of 300 or greater. Russell (2002) found that 39% of the analyses that were surveyed utilized sample sizes of 100 or fewer, and 23% involved sample sizes of approximately 100 to 200. Fabrigar et al. (1999) found that approximately 33% used sample sizes of less than 200, and less than a third of the studies used sample sizes between 200 and 400. For the educational journals, sample size ranged from 95 to 1049, with an average sample size of 424. For the health journals, sample size ranged from 191 to 2864, with an average sample size of 697. And for the psychological journals, sample size ranged from 174 to 1230, with an average sample size of 802. Overall, larger sample sizes were being used in the articles examined in the current survey compared to the articles surveyed by Fabrigar et al. and Russell.

None of the articles in the current study reported magnitude of communality. Russell (2002) stated that very few researchers reported communality information. Fabrigar et al. (1999) found no reporting of the magnitude of communalities in the articles assessed, but the average communality associated with 18 data sets was calculated by examining the squared multiple correlations taken from the correlation matrices. The average magnitude of communality ranged from 0.12 to 0.65 for these data sets, with the median of these averages being 0.425.

An important topic to take into consideration when conducting an EFA involves the number of items that are utilized in an EFA relative to the number of factors extracted (i.e., p:r ratio). This ratio is associated with identification (i.e., having a sufficient number of observed variables that load on each factor in order to be interpretable), and identification was not discussed in the articles currently surveyed. The number of test items in the current study ranged from 6 to 167 with an average of 38.
The number of factors extracted ranged from one to nine with an average of four dimensions. Comrey and Lee (1992) recommended that there should be five times as many items as factors in order for a factor to be adequately represented. However, Russell (2002) stated that three items per factor are required for a model to be identified. Fabrigar et al. (1999) stated that four or more items per factor are recommended for a factor solution to be interpretable. Fabrigar et al. reported that the majority of articles were based on ratios of at least 4:1 (i.e., most were overdetermined), and approximately 1 in 5 of the articles entailed ratios less than 4:1. Russell found that 25% of the EFA analyses included three or more measures per factor. In the current survey it was found that 100% of the analyses in these journals included a p:r ratio of three or more. The current study found that the p:r ratio ranged from 3 to 105, with an average of 13 and a median of nine. When examining unidimensionality alone, the average p:r ratio was 32 and the median was 30. For the education and health journals, the average p:r ratio was nine, and for the psychological research, the average p:r ratio was 13. Thus, the psychological research utilized higher p:r ratios compared to the education and health journals.

As seen in Table 3, the distribution of test items was consistently underreported. One study in EJPA mentioned a "non-normal" distribution, but it reported neither skewness indicators nor whether the distributions were assessed for skewness. The distribution of items is known to influence the robustness of specific analyses (e.g., Maximum Likelihood) and the statistics associated with such analyses (e.g., chi-square statistic). Therefore, it is surprising that these articles neglected to report a variable so crucial to the process of assessing dimensionality. The distribution of items was not a criterion assessed by Russell (2002) or Fabrigar et al. (1999).
Various factor analytic procedures were recorded. As seen in Table 3, Principal Components Analysis (PCA) was reported as being used to extract factors in 31 studies, Principal Axis was reported in 14 studies, and Maximum Likelihood (ML) was reported in 10 studies. The rest of the articles (n=10) failed to report estimation or factor extraction procedures. Fabrigar et al. (1999) stated that approximately half of the articles reported use of PCA, 20% of the analyses reported using some form of an EFA, and 25% of the articles did not report which methods were used. PCA was the most commonly used method for factor extraction for the education, health, and psychological assessment journals.

The method for determining the number of factors to retain was also a criterion assessed in this survey. Only two articles failed to report which method(s) were used to determine the number of factors to retain. Overall, 33 studies utilized one of the factor loading criteria (i.e., factor loadings greater than 0.25, 0.30, 0.40, 0.45, or 0.50),19 30 studies utilized the eigenvalues-greater-than-one rule, 24 studies employed the scree-plot test, eight studies applied parallel analysis, 10 studies used the chi-square statistic, nine studies used the RMSEA, and two studies reported using the magnitude of communality (i.e., items with communalities that were "high" were used to define a factor). Educational journals utilized the eigenvalues-greater-than-one rule most often (14 out of the 18 articles), whereas the health and psychological journals used one of the factor loading criteria most often (health journals: nine out of the 15 articles; psychological: 19 out of the 32 articles). Fabrigar et al. (1999) reported that in approximately 40% of the articles authors failed to report how they arrived at the decision as to the number of factors to include.
Russell (2002) stated that 55% of the EFA articles did not indicate the criteria used to determine the number of factors to retain. Of the remaining analyses, however, Russell found that 52% reported using the eigenvalues-greater-than-one rule, and 30% of the analyses used the scree plot. Likewise, Fabrigar et al. found the eigenvalues-greater-than-one rule to be the most popular when examining both journals, followed by the scree plot. The current study, however, found the factor loadings to be the most widely used criterion for determining the number of factors to retain.

19 As mentioned in Chapter III, using the values of factor loadings is not considered a formal approach to determine the number of factors in the psychometric literature (Hattie, 1985; Lord, 1980; Fabrigar et al., 1999; Russell, 2002).

Although it may appear that there has been a shift in research practice (in that factor loadings are currently the most commonly used rule for deciding on the number of factors to retain), it is difficult to compare this finding to the survey results from Fabrigar et al. and Russell. Both Fabrigar et al. and Russell were silent on whether researchers used factor loadings to decide on the number of factors to retain (i.e., these surveys did not report the use of factor loadings). Researchers may have been using factor loadings as an approach to decide on the number of factors to retain, but Fabrigar et al. and Russell did not report that information. Alternatively, the researchers themselves may not have reported the use of factor loadings in their articles even though they were using this approach. Because of the lack of information regarding the use of factor loadings in the surveys from Fabrigar et al. and Russell, it is difficult to know whether there was actually a change in practice. As noted in Chapter III, this approach is not considered a rule in the psychometric literature.

As found with Fabrigar et al. (1999), a large number of the articles in the current survey reported using multiple methods (47 of the 65), which included some combination of the eigenvalues-greater-than-one rule, scree plot, parallel analysis, factor loading criteria, chi-square statistic, and RMSEA index. The practice of using multiple methods has been suggested in the literature, but which methods to actually combine has not been empirically tested. Fabrigar et al. reported that approximately 20% of the analyses employed multiple methods in determining the number of factors to retain.

Finally, the last criterion to discuss is the number of scale points. In the current survey, the number of scale points ranged from dichotomous (two points) to nine scale points with an average (and median) of five. There were seven studies that utilized mixed response formats in the educational and psychological assessment journals. The average number of scale points used in studies for the educational journals was five with a median of four, and for the health journals the average was four with a median of five. The average number of scale points for measures found in psychological assessment journals was five with a median of four. As mentioned previously, Fabrigar et al. (1999) and Russell (2002) did not record the number of scale points in their surveys.

Summary

Russell (2002) and Fabrigar et al. (1999) proposed that researchers may be making poor decisions when employing and reporting results from EFA. The analyses from the current survey reveal that many of the errors that Fabrigar et al. and Russell highlighted are also being made by researchers in the psycho-educational and health fields. This survey shows that the reporting of EFA in psycho-educational and health research is often of poor quality.
For example, despite the fact that communalities have been found to be important to the interpretability of factors (as seen in Chapter III), they were not reported in the articles examined in this survey, as found with Fabrigar et al. and Russell. In addition, the distribution of items has been shown to inflate or deflate the statistics associated with specific procedures (e.g., chi-square statistic of ML). Again, researchers failed to integrate such an analysis or discussion in their articles. Furthermore, statistical software also exercises an effect on the way analyses are performed. As Fabrigar et al. claimed, many researchers today do not show a complete understanding of the capabilities of the software programs (e.g., syntax for conducting certain techniques, such as parallel analysis) or the deficiencies of these programs (e.g., statistical techniques that are default, but inappropriate for certain studies). In the current survey, most of the researchers failed to indicate the statistical packages that were used for the analyses (i.e., three of the 65 studies reported using SPSS). Moreover, 37% of the articles that were examined failed to report the factor extraction method. Fabrigar et al. found that 25% of the authors neglected to report such techniques.

Although poor practices were found, the current survey did uncover several proficient practices that deserve recognition. First, researchers used larger sample sizes than the studies investigated by Fabrigar et al. (1999) and Russell (2002). In the current study, the average sample size was 668, and only 31% of the analyses utilized a sample size less than 300. On the other hand, Fabrigar et al. found that 67% used sample sizes less than 400 and Russell reported that 62% used sample sizes less than 200.
In addition, it was found that 100% of the analyses reviewed in the current survey included a p:r ratio of three or more, whereas Russell's findings indicated that 67% generated a p:r ratio of three or less. Moreover, only two out of the 65 articles that were currently surveyed failed to report a technique to determine the number of factors to retain. Fabrigar et al. found that 40% neglected to report this technique, and Russell reported that 55% did not report a method to determine the number of factors to include. Finally, 72% of the articles currently surveyed indicated use of multiple methods in determining the number of factors to retain, whereas Fabrigar et al. reported 20% using multiple methods. As mentioned in Chapter III, multiple methods are recommended, and researchers are currently employing multiple methods. However, the rationale and success of these combinations were not discussed in the articles currently surveyed. It is important to note that the Fabrigar et al. and Russell surveys examined different journals, disciplines, and time periods.

Conclusion

The purpose of this survey was to examine the current research practices in ascertaining the dimensionality of psycho-educational tests and health measures via EFA in order to inform the simulation design of this dissertation. The results of this dissertation were used to determine if the procedures researchers were currently using are successful in determining unidimensionality. The results of this survey were used to inform the simulation methodology of this dissertation, as seen in Chapter V.

Chapter V
Computer Simulation Methodology

The purpose of this chapter is to introduce the methodology of the computer simulation and the analyses that were used to investigate the nine decision-making rules and indices and combinations.
In doing so, the methodology is organized according to two sub-studies: (1) investigation of strict unidimensional measures, and (2) investigation of essential unidimensional measures. The independent variables (e.g., sample size) and dependent variables (e.g., RMSEA) were selected based on the literature review presented in Chapter III, and the values of the independent variables were based on current research practices, as presented in Chapter IV. The independent variables for both strict and essential unidimensionality are introduced, as well as the dependent variables. The computer simulated design for the population data and FA procedures are discussed. The chapter subsequently introduces the analyses, which are organized according to the three research questions that were presented in Chapter III.

Strict Unidimensionality

Strict unidimensionality is defined as one dominant latent variable with no secondary minor dimensions. In other words, when the observed variables (i.e., test items) generate factor loadings that are high for only one factor, a strict unidimensional model is reflected. In order to reflect this notion of strict unidimensionality, a statistical model that was based on the magnitude of communality was applied. Recall that a communality is the proportion of variance that is accounted for by the common factors. More specifically, the communality is the sum of squared factor loadings for a variable across factors (Tabachnick & Fidell, 2001). Therefore, a factor model cannot have high loadings and low communalities simultaneously. If the factor loadings are high, then the communalities will also be high. The factor loadings for the strict unidimensional models were generated via magnitude of communalities.
Based on the calculation of a communality estimate, the factor loadings were generated by the following equation:

$$\text{Factor loading} = (\%\,h^2) \times \sqrt{h^2} \qquad (16)$$

As shown in Equation 16, a certain proportion of the communality (%h²) is distributed among the factors for each observed variable. Because there is only one factor for strict unidimensional models, 100% of the communality is allocated to factor one. The factor loadings were then entered into the simulation design for factor analytic procedures. In order to investigate strict unidimensional measures, three different conditions were varied: magnitude of communality, distribution of test items, and sample size. There were two levels of communalities, two skewness indices for the distribution of test items, and three different sample sizes.

Conditions for the Magnitude of Communality

It has been found in several studies that factor analytic solutions are influenced by the magnitude of communality (Preacher & MacCallum, 2002; MacCallum et al., 1999, 2001; Fabrigar et al., 1999). Preacher and MacCallum (2002) selected high values for the communality estimates, whereas MacCallum et al. (1999; 2001) selected a wide range for the magnitude of communalities. In order to reflect the methodology of both these simulations, the magnitudes of communality selected for this simulation spanned a wide range and included a high communality (h² = 0.20 and 0.90).

Conditions for the Sample Size

The data consisted of three different sample sizes: 100, 450, and 800. As seen in Chapter IV, researchers in the psycho-educational and health fields are currently using relatively large sample sizes. Sixty-nine percent of the studies investigated in Chapter IV employed sample sizes of 300 or greater, with an average sample size of 668. In order to reflect current practice, the findings from Chapter IV's survey were utilized. The survey's minimum sample size of 95 was rounded to 100. The median value was 433, which was rounded to 450.
The difference was then taken between 100 and 450, which was 350. This difference was added to 450 to get 800. The distribution of the various sample sizes found in Chapter IV was highly skewed. Therefore, the difference between the minimum value and the median was used to get the maximum sample size. The smaller sample size of 100 was used to determine if smaller sample sizes can in fact recover the number of factors in a population. Although Preacher and MacCallum (2002) and Schonemann (1981) utilized sample sizes as small as n=10 and n=30, a sample size of 100 is considered small in the psycho-educational context, as shown in Chapter IV.

Conditions for Distributions

The distribution of the test items was simulated so that both symmetric and skewed conditions were included. Many psychological variables (e.g., intelligence) have bell-shaped (unimodal and symmetric) distributions (Glenberg, 1988). However, distributions studied in social and behavioral sciences are often strongly skewed (Aron & Aron, 2002). Therefore, both skewed and symmetric distributions were included. The skewness indicators and threshold values for a five-point Likert scale were based on DiStefano's (2002) methodology. Continuous data were initially generated. The first skewness indicator of zero was chosen to represent a normal distribution of item responses. The area under the curve corresponded to approximately 5%, 21%, 48%, 21%, and 5% of the responses for ordered categories one through five. A second skewness indicator of 2.50 was produced to represent extreme responses to items. For the nonnormal item distributions, the percentage of responses (i.e., thresholds) in each of the five categories was approximately 75, 15, 5, 3, and 2 for categories one through five.

Essential Unidimensionality

Essential unidimensionality is defined as one dominant latent variable with secondary minor latent variable(s).
From an operational standpoint, essential unidimensionality is defined via manipulation of the magnitude of communality, proportion of communality on the second factor, and the number of test items with non-zero loadings on the second factor. Secondary minor factors were included in the design of the simulation by specifying a certain proportion of a communality to the major factor and a certain proportion to the secondary minor factor. In addition, the number of test items that loaded on the second factor also constituted a secondary minor factor. In order to investigate essential unidimensional measures, five different conditions were varied: magnitude of communality, distribution of test items, sample size, proportion of communality on the second factor, and the number of test items with non-zero loadings on the second factor. The first three conditions were introduced above, within the strict unidimensional methodology. In review, there were two magnitudes of communality (h² = 0.20 and 0.90), two distributions of test items (skewness = 0.00 and 2.50), and three different sample sizes (n = 100, 450, 800). In addition, three proportions of communality on the second factor and two different numbers of test items with non-zero loadings on the second factor were manipulated for essential unidimensional models.

Conditions for the Proportion of Communality on Secondary Factor

The proportion of the magnitude of communality for each observed variable (i.e., test item) is related to the factor loadings. Again, recall that the communality is the sum of squared loadings for a variable across factors. Even though a communality may be high, it is important to decipher which factor is accounting for more of that variable's variance. By varying the percentage of variance for the factors for each observed variable, this study indirectly investigated different factor loadings. Each factor was assigned a certain proportion of the magnitude of communality levels.
As shown in Equation 16, a factor loading is calculated by multiplying that proportion by the square root of the communality. Therefore, factor loadings are higher when the proportion is higher, and vice versa. It is important to note that this relationship is actually reversed outside of a simulation context: when factor loadings are high, the communalities will also be high. In the current study, we worked backwards by starting with the communalities, which in turn generated the factor loadings. The second factor was assigned a percentage of the communality, and by default, the first factor received the difference between that percentage and 100%. In order to generate secondary dimensions of varying strengths, the proportion of communalities on the secondary minor factor varied as follows: 0.05, 0.30, and 0.50. Any value greater than 50% would indicate another dominant factor, and the current study investigated unidimensional measures only. The first factor received the higher proportions of the communality, except for the cases when factor one and the secondary minor factor received equal proportions of the communalities (i.e., 0.50).

Example of the Generation of Factor Loadings

As seen in Table 4, the factor loadings (columns one and two), magnitude of communality (column three), and proportion of communality distributed among factor one and factor two (columns four and five) are all interconnected. Table 4 shows that when communalities and the proportion of communality allocated to a factor are high, the factor loadings will also be high. For example, when 30% of the 0.20 communality is allocated to factor two, 70% of the 0.20 is then distributed to factor one. As shown in Table 4, the factor loading of factor one (0.31) is higher than the factor loading of factor two (0.13).
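The loadings quoted in this example can be checked directly from the rule stated for Equation 16 (proportion of the communality times the square root of the communality). The following is a minimal sketch, not the author's original program; the function names are hypothetical and chosen only for illustration.

```python
import math


def factor_loading(proportion, communality):
    """Equation 16 as described in the text: (proportion of h^2) * sqrt(h^2)."""
    return proportion * math.sqrt(communality)


def loadings_for_item(communality, proportion_on_second):
    """Return (factor 1 loading, factor 2 loading) for one item.

    Factor one receives whatever share of the communality is not
    assigned to the secondary minor factor.
    """
    proportion_on_first = 1.0 - proportion_on_second
    return (
        factor_loading(proportion_on_first, communality),
        factor_loading(proportion_on_second, communality),
    )


if __name__ == "__main__":
    # Communality levels and second-factor proportions named in the text,
    # plus 0.00 for the strict unidimensional case.
    for proportion_on_second in (0.50, 0.30, 0.05, 0.00):
        for h2 in (0.20, 0.90):
            f1, f2 = loadings_for_item(h2, proportion_on_second)
            print(f"h2={h2:.2f} prop2={proportion_on_second:.2f} "
                  f"F1={f1:.2f} F2={f2:.2f}")
```

Rounded to two decimals, the printed values can be compared against the loadings reported in Table 4.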
In addition, if the same proportions of communality are used (0.30 for factor two and 0.70 for factor one), but a 0.90 communality is used instead of 0.20, the factor loadings increase for both factors, but again, factor one has a higher factor loading (0.66) than factor two (0.28).

Table 4
Example of the Relationship between Communality and Factor Loadings

Factor 1    Factor 2    Communality    Proportion of h²    Proportion of h²
Loadings    Loadings    (h²)           for Factor 1        for Factor 2
0.22        0.22        0.20           0.50                0.50
0.47        0.47        0.90           0.50                0.50
0.31        0.13        0.20           0.70                0.30
0.66        0.28        0.90           0.70                0.30
0.42        0.02        0.20           0.95                0.05
0.90        0.05        0.90           0.95                0.05
0.45        0.00        0.20           1.00                0.00
0.95        0.00        0.90           1.00                0.00

Conditions for the Number of Test Items with Non-Zero Loadings on the Second Factor

How many items need to load onto a factor in order for that factor to be considered a meaningful latent construct is a question that remains unanswered. Fabrigar et al. (1999) claimed that mirroring the population factor structure is not necessarily a function of the sample size or the number of observed variables (i.e., test length), but rather is influenced by the magnitude of communality as well as overdetermination. Overdetermination refers to the degree to which each factor is clearly represented by a sufficient number of variables and is often assessed by the ratio of the number of variables to the number of factors (p:r). Highly overdetermined factors are those that exhibit high factor loadings on at least three or four variables (MacCallum et al., 1999). The more items that load on the second factor, the higher the chances for a second factor to become overdetermined, depending on whether the factor loadings are high (which is the result of the communality and proportion of communality on the second factor, as seen in Table 4). The number of items with non-zero loadings on the second factor will vary as follows: three and six.
These items (which will have factor loadings) will represent the varying magnitudes of secondary minor dimensions. Because the number of test items is 14, any number of items greater than six with non-zero loadings would represent another dominant factor, and the current study examined unidimensional measures only.

Factors Held Constant

Several experimental factors were held constant (i.e., not investigated in this study): the correlation between the first and second factor, the number of test items per scale, and the number of scale points. Hattie (1984) found that the intercorrelation between factors turned out to be important when determining unidimensionality. Several rules and indices distinguished between one and two factors when the intercorrelation was 0.10, but not when the intercorrelation was 0.50. Although this was a unique finding, in order to keep the factor structure of this simulation study at a manageable level, the intercorrelation between factors one and two was held constant at a value of zero. Examining the intercorrelations between factors is part of future research initiatives.

The test length was also held constant in the current simulation study. As seen in Chapter IV, common test lengths in psycho-educational and health fields ranged from 6 to 167 with an average of 38 and a median of 24. When assessing unidimensional measures in particular, Chapter IV showed that the average test length was 18 items with a median of 11. Therefore, the test length for the current simulation study was selected as 14, approximately halfway between 18 and 11. However, the significant matter is not necessarily the test length per se, but rather the magnitude of communality, as mentioned above. Thus, the test length is held constant.

The number of scale points was also held constant. As seen in Chapter IV, the average item format included a 5-point Likert response format. Therefore, the current study utilized a 5-point scale.
Additional numbers of scale points will be investigated in future research.

Dependent Variables: Decision-making Rules and Indices

There are numerous decision-making rules and indices that are used to determine the number of factors to retain when assessing dimensionality, as presented in Chapter III. The simultaneous use of multiple decision-making rules has been recommended (Boyd & Gorsuch, 2003; Fabrigar et al., 1999; Thompson & Daniel, 1996) and practiced (see Chapter IV). Therefore, the dependent variables for this simulation study included optimal combinations (i.e., simultaneous use) as well as the individual application of various decision-making rules and indices. The rules and indices included the following: (1) the chi-square statistic from Maximum Likelihood (ML) factor analysis, (2) the chi-square statistic from Generalized Least Squares (GLS) factor analysis, (3) the eigenvalues-greater-than-one rule, (4) the ratio-of-first-to-second-eigenvalues-greater-than-three rule, (5) the ratio-of-first-to-second-eigenvalues-greater-than-four rule, (6) PA using continuous data, (7) PA using Likert (i.e., ordinal or rating scale) data, (8) the Root Mean Square Error of Approximation (RMSEA) index from ML, and (9) the Root Mean Square Error of Approximation (RMSEA) index from GLS.

The selection of these rules was based on their availability in widely used statistical packages, computational restrictions, and previous simulation and theoretical studies. Because this dissertation is aimed at day-to-day researchers, the design of the simulation reflected what these researchers would practice, including the software they utilize. Common software packages that conduct EFA (SPSS and SAS) are capable of producing the nine decision-making rules and indices selected for this study. Computational restrictions refer to rules such as the scree plot, which relies on visual inspection and so cannot be included in a simulation study.
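The three eigenvalue-based rules listed above reduce to simple comparisons on the eigenvalues of a correlation matrix. The following Python sketch is illustrative only: the study itself used SPSS, all function names are hypothetical, and the parallel analysis comparison (observed eigenvalues against the mean eigenvalues of uncorrelated random normal data) is one common reading of PA that is assumed here rather than taken from the text.

```python
import numpy as np

def sorted_eigenvalues(corr):
    """Eigenvalues of a symmetric correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def eigenvalue_rules(corr):
    """Return True for each rule that selects a one-factor model."""
    eig = sorted_eigenvalues(corr)
    ratio = eig[0] / eig[1]
    return {
        "eigenvalues_gt_1": int(np.sum(eig > 1.0)) == 1,
        "ratio_gt_3": bool(ratio > 3.0),
        "ratio_gt_4": bool(ratio > 4.0),
    }

def parallel_analysis_one_factor(corr, n_obs, n_random=20, seed=0):
    """Assumed PA reading: only the first observed eigenvalue exceeds
    the mean eigenvalue obtained from uncorrelated random normal data."""
    rng = np.random.default_rng(seed)
    p = corr.shape[0]
    random_eigs = np.zeros((n_random, p))
    for i in range(n_random):
        data = rng.standard_normal((n_obs, p))
        random_eigs[i] = sorted_eigenvalues(np.corrcoef(data, rowvar=False))
    reference = random_eigs.mean(axis=0)
    observed = sorted_eigenvalues(corr)
    return bool(observed[0] > reference[0] and observed[1] <= reference[1])

# Hypothetical one-factor population matrix: 14 items, each with h2 = 0.90,
# so every off-diagonal correlation equals 0.90.
items = 14
corr = np.full((items, items), 0.90)
np.fill_diagonal(corr, 1.0)
rules = eigenvalue_rules(corr)
pa_result = parallel_analysis_one_factor(corr, n_obs=100)
```

The chi-square and RMSEA indices are omitted from this sketch because they require fitting the factor model by ML or GLS, which depends on the estimation software.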
Procedures

Generation of Item Response Data

For the present simulation study, population covariance matrices were created under a variety of conditions for both strict and essential unidimensional measures, as illustrated above. The population data were produced based on factor loadings, which varied according to the strict and essential unidimensional cell structure. For each condition investigated, a population covariance matrix was computed from the fundamental equation of factor analysis (see Equation 10, Chapter II). For each population matrix, 100 samples were generated with a specific sample size. The process of going from the correlation matrix to the data is, in essence, a process that mimics doing a factor analysis backwards. This is a well-known and widely used method. That is, one starts with uncorrelated, normally distributed data that are transformed to a specified correlation pattern (while maintaining the same means, variances, and distributional shape) by applying the weights from a PCA (Cholesky decomposition) of the correlation matrix. The resultant data are normally distributed with specified means, variances, and correlation matrix values. Continuous item responses were then transformed into Likert responses, in the population, for both symmetric and skewed distributions using the approach by DiStefano (2002), as described above. This process was replicated to create the comparative data for PA Likert and PA continuous data, with all the factor loadings being 0.00, resulting in a correlation matrix with all the off-diagonal elements equal to zero.

Example of Population Pattern Matrix for Essential Unidimensional Model

Table 5 gives an example of a population pattern matrix consisting of 14 observed variables (i.e., items) for an essential unidimensional cell design with a magnitude of communality (h2) of 0.90, a proportion of communality on the second factor of 0.05, and six items loading on the second factor.
As shown below, although there are six items loading on the second factor and a high communality is present, the factor loadings are quite low for the secondary dimension. That is, only 5% of the communality is distributed to the second factor.

Table 5
Example of Population Pattern Matrix: Essential Unidimensional (h2 = 0.90, proportion of h2 on the second factor = 0.05, number of items loading on second factor = 6)

Test Items    Factor 1 Loadings    Factor 2 Loadings
1             0.948683             0.00
2             0.948683             0.00
3             0.948683             0.00
4             0.948683             0.00
5             0.948683             0.00
6             0.948683             0.00
7             0.948683             0.00
8             0.901249             0.00
9             0.901249             0.047434
10            0.901249             0.047434
11            0.901249             0.047434
12            0.901249             0.047434
13            0.901249             0.047434
14            0.901249             0.047434

FA and PCA Analyses

FA and PCA analyses were conducted for each replication, and the relevant output was saved using the SPSS 13.0 Output Management System (OMS). The following procedures were performed on each replication: (1) FA using ML estimation, (2) FA using GLS estimation, (3) PCA, (4) PCA on the random (PA) Likert data, and (5) PCA on the random (PA) continuous data.

Merging of Data Files

Data files were merged in order to conduct analyses. The chi-square statistic, df, and p values from each replication of FA using ML estimation were merged into one ML data file. The chi-square statistic, df, and p values from each replication of FA using GLS estimation were merged into one GLS data file. The first and second eigenvalues and percent of variance accounted for from each replication of PCA were merged into one PCA data file. The first and second eigenvalues and percent of variance accounted for from each replication of PCA for Likert PA were merged into one PA Likert data file. The first and second eigenvalues and percent of variance accounted for from each replication of PCA for continuous PA were merged into one PA continuous data file.
These five data files were then merged into one overall data file for the strict unidimensional cell structure and one overall data file for the essential unidimensional cell structure. Analyses were run on these two data files.

In summary, the simulation study for strict unidimensionality consisted of a 2 (magnitude of communality) x 2 (skewness of items) x 3 (sample size) completely crossed factor design, resulting in 12 different cells or sets of conditions. The simulation for essential unidimensionality consisted of a 2 (magnitude of communality) x 2 (skewness of items) x 3 (sample size) x 3 (proportion of communality on second factor) x 2 (number of test items with non-zero loadings on the second factor) completely crossed factor design, resulting in 72 different cells or sets of conditions. The factor model was determined by the conditions of the cell. There were 100 replications per cell. The syntax for the FA and PCA was generated using the Output Management System in SPSS 13.0. Factor analysis of matrices was performed by ML and GLS estimation methods. Exploratory factor analysis (EFA) was conducted on all Pearson product-moment (PPM) correlation matrices via SPSS 13.0. PPM correlation matrices were selected in order to reflect current research practices.21 The overall methodology of the computer simulation can be seen in Figure 4.

21 This dissertation is grounded upon current research practices, and the design of the simulation reflects what these researchers would practice, including the software they utilize. Standard statistical software packages (e.g., SPSS and SAS) do not include polychoric or tetrachoric correlation matrices. The default matrix in standard software packages is the PPM. Therefore, PPM correlations were used in this simulation study to reflect current practices.
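The data generation procedure described under Procedures (population loadings, implied correlation matrix, Cholesky transformation of uncorrelated normal data, conversion to a 5-point scale) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the SPSS syntax used in the study: the loading formula (square root of h2 multiplied by the proportion of communality) is inferred from the values printed in Tables 4 and 5, the population matrix assumes uncorrelated factors with unique variances filling the diagonal to 1, the Likert cut points are hypothetical symmetric values rather than the DiStefano (2002) thresholds, and all function names are illustrative. Item 8 in Table 5 as printed does not follow this pattern and is not reproduced.

```python
import numpy as np

def pattern_matrix(h2, prop2, n_items=14, n_second=6):
    """Two-factor loadings following the pattern inferred from Tables 4 and 5."""
    root = np.sqrt(h2)
    lam = np.zeros((n_items, 2))
    lam[:, 0] = root                          # items loading on factor 1 only
    start = n_items - n_second                # last n_second items share factor 2
    lam[start:, 0] = root * (1.0 - prop2)
    lam[start:, 1] = root * prop2
    return lam

def population_correlation(lam):
    """Correlation matrix implied by uncorrelated factors (an assumption)."""
    corr = lam @ lam.T
    np.fill_diagonal(corr, 1.0)               # unique variance completes the diagonal
    return corr

def simulate_sample(corr, n, rng):
    """Impose the correlation pattern on uncorrelated standard normal data."""
    chol = np.linalg.cholesky(corr)           # lower-triangular factor
    z = rng.standard_normal((n, corr.shape[0]))
    return z @ chol.T

def to_likert(x, cuts):
    """Map continuous scores to categories 1..len(cuts)+1."""
    return np.digitize(x, cuts) + 1

rng = np.random.default_rng(2005)
lam = pattern_matrix(h2=0.90, prop2=0.05)
corr = population_correlation(lam)
continuous = simulate_sample(corr, n=100, rng=rng)
# Hypothetical symmetric cut points for a 5-point scale.
likert = to_likert(continuous, cuts=[-1.5, -0.5, 0.5, 1.5])
```

Setting all loadings to zero in `pattern_matrix` (for example, `h2=0.0`) yields an identity correlation matrix, which corresponds to the random comparison data described for parallel analysis.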
Figure 4
Flowchart of Simulation Methodology
[Flowchart steps, in order: Generation of Item Response Data; Syntax for FA Analyses; Generation of Output Files; Merge Data Files of Simulation Results; Analysis of Simulated Data; Determination of whether Rules and Combinations were Successful; Development of Guidelines]

Data Analyses

As illustrated above, there were two overall data files that were used for analyses: one for strict unidimensionality and one for essential unidimensionality. Simulation results were analyzed and rules and indices were investigated according to the three research questions that were outlined in Chapter III. The following subsections describe how the three research questions shaped the methodology of the data analyses.

Research Questions

Research Question 1A and 1B

Research Question 1A investigated the performance of the nine individual decision-making rules and indices for both strict and essential unidimensional measures. In order to analyze the performance, an accuracy index was computed for each of the nine decision-making rules and indices, for each case and each replication. This index coded each of the nine decision-making rules and indices as either accurate (i.e., correctly identified a unidimensional model) or inaccurate (i.e., did not select a unidimensional model) for each case. An overall accuracy rate or proportion (i.e., how often a decision-making rule or index was correct in identifying unidimensionality) ranging from 0.00 to 1.00 was generated for each of the nine decision-making rules and indices. Descriptive statistics in terms of cell means of the accuracy index were generated in order to describe performance according to the various conditions. For example, an accuracy rate for each of the decision-making rules and indices was computed for a strict unidimensional measure with a skewness of 2.50, sample size of 100, and a magnitude of communality of 0.20.
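The accuracy index described above is a 0/1 code per replication whose cell mean is the accuracy rate. A minimal Python sketch is shown below; the record layout and names are hypothetical, and the illustrative counts (95 correct out of 100) mirror the chi-square ML example the text reports for the cell with skewness 0.00, sample size 100, and communality 0.20.

```python
from collections import defaultdict

def cell_accuracy(records):
    """Mean of the 0/1 accuracy index per (cell, rule).

    records: iterable of (cell, rule, correct) tuples, where correct is
    True when the rule identified a unidimensional model in that replication.
    """
    totals = defaultdict(lambda: [0, 0])      # (cell, rule) -> [hits, replications]
    for cell, rule, correct in records:
        totals[(cell, rule)][0] += int(correct)
        totals[(cell, rule)][1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}

# Hypothetical cell label: (skewness, sample size, communality).
cell = (0.00, 100, 0.20)
records = [(cell, "chi_square_ml", True)] * 95 + [(cell, "chi_square_ml", False)] * 5
rates = cell_accuracy(records)
```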
Research Question 1B consisted of investigating the main effects of the independent variables (e.g., sample size) on the performance of the individual rules and indices. Such variables (i.e., conditions) included sample size, levels of skewness, and the magnitude of communality for both strict and essential unidimensional measures. The proportion of communality on the second factor and the number of items with non-zero loadings on the second factor were also examined for essential unidimensional measures. The main effects were explored via bar graphs (see Figures 5 to 36) and examined by use of binary logistic regression (see Tables 11 and 12) for both strict and essential unidimensionality. The bar charts provide the overall mean values of the accuracy rates for the individual decision-making rules and indices. The Wald chi-square test results from the binary logistic regression are provided in Tables 11 and 12.22 A binary logistic regression was run separately for each of the dependent variables (i.e., nine individual decision-making rules and indices). The independent variables consisted of the corresponding conditions for strict and essential unidimensionality.

22 The Wald test is applied to test whether a set of predictors contributes to the prediction of an outcome by examining the statistical significance of the regression coefficients. Interpretation and formulas for this test can be found in Cohen, Cohen, West, and Aiken (2003).

Research Question 2A and 2B

Research Question 2A investigated whether any one of the nine decision-making rules or indices performed best, in terms of detecting unidimensionality, in all combinations of conditions explored. Research Question 2B examined the optimal performances of the decision-making rule(s) under specific conditions. This entailed, for example, determining
which decision-making rule(s) performed best (i.e., optimally) when the distribution of test items was skewed, communalities were low, and sample size was large. In answering Research Question 2B, a list of possible combinations was constructed. In other words, if the decision-making rules and indices were considered optimal for certain conditions, they were added to the list of possible combinations.

Several criteria needed to be determined in order to approach Research Questions 2A and 2B. For example, the definitions of "best" and "optimal" needed to be identified. According to the psychometric and measurement literature, there are two distinct but interrelated scoring interpretations that could be used.23 The first is a norm-referenced interpretation. Norm-referenced scoring indicates a performance relative to a specific reference population (APA, AERA, NCME, 1999). Therefore, best or optimal performance would be defined by comparing the performance of the decision-making rules and indices from this study to that of some kind of norm group. The validity of norm-referenced interpretations depends, in part, on the appropriateness of the reference group to which the performance scores are compared (APA Standards, 1999). Because the data were generated via a simulated design, the selection and definition of an appropriate norm group was not applicable. In addition, since this study is not investigating the performance of individuals, but rather the performance of rules and indices, a norm-referenced interpretation was not fitting. A normative type of interpretation was also considered. For example, best and optimal could be determined by comparing the performance of one index or rule to that of all the other rules and indices.
This method, again, was not appropriate because all of the individual decision-making rules and indices potentially could have had low success rates (i.e., low accuracy). In other words, under this type of interpretation, rules and indices would be defined as being the best and optimal compared to the other rules and indices, even though those particular rules and indices were performing poorly themselves.

23 According to APA, AERA, NCME (1999) standards, both criterion-referenced and norm-referenced scales may be developed and used for the same test scores. In addition, norm-referenced measures that indicate performance relative to some population might, over time, come to support criterion-referenced interpretations. On the other hand, interpretation of performance at different ordered proficiency levels (criterion-referenced) could be reported for different groups, and if these groups were compared to the overall population, a norm-referenced interpretation exists. Therefore, these two interpretations are interrelated and often overlap.

A better option for defining best and optimal in this context was to use a criterion-referenced interpretation. This type of interpretation makes no direct reference to other decision-making rules or indices. Instead, rules and indices are evaluated for their proportion of correct responses. In other words, how often a decision-making rule or index was correct in identifying unidimensionality was used to define best or optimal. An accuracy rate or proportion of correct responses was calculated for each of the decision-making rules and indices (Research Question 1A). A minimum criterion or cut-off value for this accuracy rate was selected in order to define best and optimal.

Criteria for Best and Optimal Performance

Currently, there are no objectively defined or accepted rules that state how well a decision-making index needs to perform in order to be considered best or optimal (i.e., successful).
At the very minimum, a decision-making rule or index should be accurate more than 50% of the time. In other words, a researcher would want to utilize a decision-making rule that works successfully (i.e., identifies unidimensionality) more often than it does not. Although 50% is a fairly low standard to meet, the accuracy rates (i.e., the proportion of correct identification of unidimensionality) of the nine decision-making rules and indices were assessed to determine if the 50% criterion was exceeded. A decision-making rule or index was considered best if it exceeded the 50% criterion in all cell conditions of the design (Research Question 2A). A decision-making rule or index was considered optimal if it exceeded the 50% criterion for specific conditions of the cell design (i.e., Research Question 2B).

However, in order to raise the standards for the application of combinations, a second criterion was developed. Drawing on Cohen's (1988) statistical power criterion, the individual decision-making rules and indices were required to meet an 80% criterion in order to be considered for a combination. As mentioned previously, a list of possible combinations was constructed from the results of Research Question 2B. In other words, once individual rules and indices met the first criterion (i.e., exceeded 50%), they were assessed in order to determine if the 80% criterion was met. If the 80% criterion was met, the rule or index was included in a combination. Cohen (1988) developed a statistical rule of thumb for the behavioral sciences, which states that a minimum power of 80% is recommended for a research study.24 The statistical power of a test is the long-term probability of rejecting a false null hypothesis (H0) given a specified alpha criterion (α), sample size (N), and measure of effect size.
Although this dissertation is not utilizing statistical power per se, the same 80% criterion was used to determine whether decision-making rules and indices were optimal for combinations. The 80% criterion raised the standards for what is considered a successful decision-making rule or index. In order for a rule to be considered successful, the rule should be performing significantly better than 50%.

24 See Cohen (1988) for further detail of the statistical rule of thumb for power analysis.

Research Question 3A and 3B

Research Question 3A investigated whether a combination of the individual decision-making rules or indices provided a methodology that performed better than any one individual rule or index. In addition, the question of whether any one combination performed best in all conditions was also explored. Research Question 3B focused on examining combinations for optimal performance in specific sets of conditions. This entailed, for example, determining which combination(s) performed optimally when the distribution of test items was skewed, communalities were low, and sample size was large.

The methodology was borrowed from Research Questions 2A and 2B. Combinations were formed based on whether the individual decision-making rules or indices met the 80% criterion. In order to test these combinations, overall accuracy rates for the combinations were generated and then assessed. The accuracy rates for the combinations were the proportion of correct identification of unidimensionality for various sets of conditions. In order for a combination to detect unidimensionality correctly, at least one of the decision-making rules and indices in the combination had to identify unidimensionality accurately. The accuracy rates for the combinations ranged from 0.00 to 1.00. Cell means of the accuracy rates for the combinations were generated. These cell means were used to determine if best and optimal performance was met.
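The combination logic described above (a combination is correct in a replication when at least one member rule is correct) and the two cut-offs can be sketched in Python as follows. The function names and the example decisions are hypothetical, and treating "met the 80% criterion" as a rate of at least 0.80 is an assumption, since the text does not state whether the boundary value itself qualifies.

```python
def combination_accuracy(decisions):
    """Accuracy rate of a combination of rules within one cell.

    decisions: dict mapping rule name -> list of booleans, one per
    replication, True when that rule identified unidimensionality.
    A replication counts as correct when at least one rule is correct.
    """
    per_rule = list(decisions.values())
    n = len(per_rule[0])
    hits = sum(any(rule[i] for rule in per_rule) for i in range(n))
    return hits / n

def criterion_level(rate):
    """Classify an accuracy rate against the 50% and 80% criteria.

    Assumption: the 80% criterion is met at rate >= 0.80, while the
    50% criterion must be strictly exceeded, as the text words it.
    """
    if rate >= 0.80:
        return "meets 80% criterion"
    if rate > 0.50:
        return "exceeds 50% criterion"
    return "fails 50% criterion"

# Hypothetical decisions for two rules over four replications.
example = {
    "rule_a": [True, True, False, False],
    "rule_b": [False, True, True, False],
}
combo_rate = combination_accuracy(example)
level = criterion_level(combo_rate)
```

Because a combination succeeds whenever any member succeeds, its accuracy rate can never fall below that of its best member in the same cell.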
Following the methodology of Research Question 2B, best or optimal performance of the combinations was determined by utilizing the 80% criterion. Keeping the standards high (i.e., significantly greater than the 50% criterion), a combination was considered best if it met the 80% criterion for all conditions. Likewise, a combination was considered optimal if it met the 80% criterion for specific conditions. For example, a combination for a strict unidimensional measure with a skewness of 2.50, sample size of 100, and a magnitude of communality of 0.20 was considered optimal if it met the 80% criterion for these specific conditions. In addition, all the combinations were assessed to determine if any combination performed better than any one individual rule or index. This was done by comparing the accuracy rates of the combinations to the accuracy rates of the individual rules and indices.

Software

Because the objective of this dissertation was to reflect and inform day-to-day research practice, all nine of the decision-making rules and indices that were selected for this dissertation are statistical techniques that are available in widely used statistical software packages (e.g., SPSS, SAS, SYSTAT). The FA techniques used in this study do not require specialized software. The rules and combinations could be used in most widely used statistical packages and produce the same results regardless of the software package employed. SPSS, in particular, is a statistical software package that is widely used by psychometricians and measurement specialists around the world and is the software used in this dissertation.

Chapter VI
Results

The results of the simulation study are presented in this chapter. The chapter is organized according to the two sub-studies, the strict unidimensional and essential unidimensional investigations. The three main research questions initiated different levels of analyses, as introduced in Chapter V.
The results correspond to these different levels (i.e., research questions). The first level of results describes the performances of the nine individual decision-making rules and indices for both the strict and essential unidimensional models. The main effects of the simulation design conditions are also presented in this level. The second level of results identifies the best and optimal performances of the nine individual rules and indices for both strict and essential unidimensionality. A list of optimal combinations was developed from this level of results. The final level of results detects the best and optimal performances of the combinations of the nine decision-making rules and indices. A summary of the results for both strict and essential unidimensional models is provided.

Data Check

Several procedures were conducted in order to inspect whether the data contained certain types of error (i.e., missing values or variables). This examination was essential before the analyses were conducted in order to determine whether the simulations needed to be run again. To begin, frequencies were run for each of the dependent variables and independent variables for all simulated design cells. The ML and GLS chi-square statistics, degrees of freedom (df), and p-values were missing for two cases in total. One missing case was found in an essential unidimensional data set, and one missing case was found in a strict unidimensional data set. This was due to the ML and GLS factor analysis estimation methods not being able to converge. Therefore, the simulation code was re-run with a larger number of iterations (i.e., 1000 instead of 25). Afterwards, these data sets contained no missing cases. No missing variables were found in any of the data sets.

In addition to examining data sets for missing variables, values, and cases, several other inspections took place. Variables were assessed to make sure that the range of values was appropriate.
For example, the chi-square values needed to be positive. The p-values for both ML and GLS chi-square tests needed to be between 0.00 and 1.00. Also, the df for both ML and GLS estimation methods needed to be positive and of the same value for each of the 100 cases in each data file. All values were in the proper ranges. In addition, each independent variable was coded into each cell (via simulation design) in order to conduct the analyses. Therefore, each cell was inspected to make sure that the independent variables were coded correctly so that the final step of analyses (i.e., determining if the simulation design conditions impact the decision-making of unidimensionality) could draw accurate conclusions. Each independent variable was coded correctly in each cell.

Research Question 1A
Performance of Individual Decision-Making Rules and Indices

Research Question 1A focused on examining the performance of the individual rules and indices. As described in the methodology section, an overall accuracy rate or proportion of correct responses was generated for each of the nine decision-making rules and indices. Descriptive statistics in terms of cell means of the accuracy rates were generated in order to describe performance. These cell means can be found in Table 6 for strict unidimensionality and Tables 7 through 10 for essential unidimensionality. Research Question 1A, in essence, is an overall description and summary of the analyses conducted on the individual rules and indices.

The results from Research Question 1A are extensive. The nine decision-making rules and indices were sorted into four groups for ease of interpretation: (1) chi-square ML and chi-square GLS, (2) the eigenvalues-greater-than-one rule, the ratio-of-first-to-second-eigenvalues-greater-than-three rule, and the ratio-of-first-to-second-eigenvalues-greater-than-four rule, (3) PA Likert and PA continuous, and (4) RMSEA ML and RMSEA GLS.
Research Question 1B tests the statistical significance of the various conditions (e.g., sample size). Research Questions 2A and 2B, on the other hand, reduce the volume of results in that performance is assessed according to a criterion.

Strict Unidimensionality

There were three conditions for strict unidimensional measures: distribution of items (i.e., skewness), sample size, and magnitude of communality. Table 6 is organized according to these conditions. For example, the accuracy rate for the chi-square ML decision-making index for a strict unidimensional measure with a skewness index of 0.00, sample size of 100, and a 0.20 magnitude of communality is 0.95 (i.e., 95 of the 100 replications in this cell correctly identified one factor), as seen in the first row and second column of Table 6.25

Table 6
Strict Unidimensional Results of the Mean Accuracy Rates for Rules and Indices

Skewness                                              Magnitude of h2
Index      Decision-Making Rule                       .20      .90

Sample Size 100
0.00       Chi-square ML                              0.95     0.59
           Chi-square GLS                             0.97     0.84
           Eigenvalues>1                              0.00     1.00
           Eigenvalues Ratio>3                        0.10     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       0.94     1.00
           Parallel Analysis of Ordinal Data          0.90     1.00
           RMSEA of ML                                0.00     0.00
           RMSEA of GLS                               0.00     0.00
2.50       Chi-square ML                              0.35     0.00
           Chi-square GLS                             0.69     0.01
           Eigenvalues>1                              0.00     1.00
           Eigenvalues Ratio>3                        0.00     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       0.34     1.00
           Parallel Analysis of Ordinal Data          0.28     1.00
           RMSEA of ML                                0.00     0.00
           RMSEA of GLS                               0.00     0.00

Sample Size 450
0.00       Chi-square ML                              0.93     0.66
           Chi-square GLS                             0.94     0.75
           Eigenvalues>1                              0.03     1.00
           Eigenvalues Ratio>3                        0.69     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       1.00     1.00
           Parallel Analysis of Ordinal Data          1.00     1.00
           RMSEA of ML                                0.75     0.36

25 In this dissertation, I am using 'rate' and 'proportion' interchangeably.
Table 6 (continued)

Skewness                                              Magnitude of h2
Index      Decision-Making Rule                       .20      .90

Sample Size 450 (continued)
0.00       RMSEA of GLS                               0.77     0.42
2.50       Chi-square ML                              0.28     0.00
           Chi-square GLS                             0.30     0.00
           Eigenvalues>1                              0.00     1.00
           Eigenvalues Ratio>3                        0.00     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       0.77     1.00
           Parallel Analysis of Ordinal Data          0.74     1.00
           RMSEA of ML                                0.11     0.00
           RMSEA of GLS                               0.10     0.00

Sample Size 800
0.00       Chi-square ML                              0.96     0.69
           Chi-square GLS                             0.96     0.78
           Eigenvalues>1                              0.52     1.00
           Eigenvalues Ratio>3                        0.96     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       1.00     1.00
           Parallel Analysis of Ordinal Data          1.00     1.00
           RMSEA of ML                                1.00     1.00
           RMSEA of GLS                               1.00     1.00
2.50       Chi-square ML                              0.25     0.00
           Chi-square GLS                             0.28     0.00
           Eigenvalues>1                              0.00     1.00
           Eigenvalues Ratio>3                        0.00     1.00
           Eigenvalues Ratio>4                        0.00     1.00
           Parallel Analysis of Continuous Data       0.95     1.00
           Parallel Analysis of Ordinal Data          0.97     1.00
           RMSEA of ML                                0.98     0.03
           RMSEA of GLS                               0.97     0.03

Overall, both continuous and Likert PA methods provided the strongest and most consistent performances for strict unidimensionality. The cell means for both PA methods were the highest among the nine decision-making methods for all conditions except in two instances. The first was when sample size was 100, skewness was 0.00, and magnitude of communality was 0.20: both the chi-square ML and chi-square GLS were slightly higher, but all four methods performed well (chi-square ML = 0.95, chi-square GLS = 0.97, PA continuous = 0.94, PA Likert = 0.90). The second instance was when sample size was 100, skewness was 2.50, and magnitude of communality was 0.20: the chi-square GLS was considerably higher (chi-square ML = 0.35, chi-square GLS = 0.69, PA continuous = 0.34, PA Likert = 0.28). This was the only condition where the PA methods performed poorly for strict unidimensionality.

The ML and GLS chi-square tests also provided strong performances, but not as consistently as the PA methods.
Both chi-square methods provided strong accuracy rates for non-skewed distributions with a magnitude of communality of 0.20 (i.e., at least 0.93 for all sample sizes). Both chi-square tests performed poorly for skewed distributions, with one exception: for a sample size of 100, skewed distributions, and a magnitude of communality of 0.20, the chi-square GLS generated the highest accuracy rate among the nine rules (0.69). There were several conditions where the chi-square GLS outperformed the chi-square ML. The first instance was when sample size was 100, skewness was 0.00, and magnitude of communality was 0.90: chi-square ML = 0.59, chi-square GLS = 0.84. The second was when sample size was 100, skewness was 2.50, and magnitude of communality was 0.20: chi-square ML = 0.35, chi-square GLS = 0.69.

The performance of the three eigenvalue rules was mixed depending on the magnitude of communality. To review, the three eigenvalue rules included (a) retaining the number of factors with eigenvalues greater than one, (b) selecting a unidimensional model when the ratio of first-to-second eigenvalues is greater than three, and (c) selecting a unidimensional model when the ratio of first-to-second eigenvalues is greater than four. For a magnitude of communality of 0.20, all three eigenvalue rules performed poorly for all sample sizes and skewness levels, except for the ratio-of-first-to-second-eigenvalues-greater-than-three rule, which generated a strong accuracy rate for the non-skewed distribution and a sample size of 800 (0.96). When the communality level was 0.90, however, all three eigenvalue rules generated 100% accuracy rates for both non-skewed and skewed distributions and for all three sample sizes.

The ML and GLS RMSEA indices provided inconsistent results. Both RMSEA indices generated 0.00 accuracy rates for a sample size of 100, regardless of skewness or magnitude of communality.
For a sample size of 450, both RMSEA indices performed moderately for non-skewed distributions (RMSEA ML = 0.75, RMSEA GLS = 0.77 with 0.20 magnitude of communality; RMSEA ML = 0.36, RMSEA GLS = 0.42 with 0.90 magnitude of communality) and performed poorly for skewed distributions, regardless of magnitude of communality (RMSEA indices generated accuracy rates ranging from 0.00 to 0.11). When sample size increased to 800, both RMSEA indices performed well, except when distributions were skewed and magnitude of communality was 0.90 (RMSEA ML = 0.03, RMSEA GLS = 0.03).

It is important to note that when conditions included skewed distributions, a communality of 0.20, and sample sizes of 100 and 450, all nine of the decision-making rules and indices generated relatively low accuracy rates.

Essential Unidimensionality

There were five conditions for essential unidimensional measures: distribution of items (i.e., skewness), sample size, magnitude of communality, proportion of communality on the second factor, and the number of items with non-zero loadings on the second factor. Tables 7 through 10 are organized according to these conditions. For example, in Table 7 it can be seen that for an essential unidimensional model with a skewness of 0.00, 0.20 magnitude of communality, sample size of 100, proportion of communality on the second factor of 0.05, and three items loading on the second factor, the chi-square ML accuracy rate was 0.95, as seen in the first row.
Table 7
Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 0.00 and Communality of 0.20)

Column headings .05, .30, and .50 give the proportion of h² on the 2nd factor, shown separately for 3 and 6 items with non-zero loadings on the 2nd factor.

                                          3 items on 2nd factor    6 items on 2nd factor
Decision-Making Rule                       .05    .30    .50        .05    .30    .50
Sample Size 100
  Chi-square ML                           0.95   0.98   0.98       0.95   0.91   0.93
  Chi-square GLS                          0.98   0.98   0.98       0.99   0.99   0.98
  Eigenvalues > 1                         0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 3                   0.07   0.04   0.02       0.07   0.00   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    0.92   0.86   0.84       0.85   0.76   0.57
  Parallel Analysis of Ordinal Data       0.89   0.86   0.85       0.83   0.83   0.59
  RMSEA of ML                             0.00   0.00   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.00   0.00   0.00       0.00   0.00   0.00
Sample Size 450
  Chi-square ML                           0.90   0.95   0.92       0.99   0.91   0.85
  Chi-square GLS                          0.93   0.98   0.95       0.98   0.93   0.93
  Eigenvalues > 1                         0.01   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 3                   0.56   0.24   0.01       0.62   0.00   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    1.00   0.99   0.96       1.00   0.99   0.77
  Parallel Analysis of Ordinal Data       1.00   0.98   0.98       1.00   0.99   0.74
  RMSEA of ML                             0.72   0.73   0.72       0.76   0.62   0.55
  RMSEA of GLS                            0.78   0.80   0.74       0.84   0.71   0.66
Sample Size 800
  Chi-square ML                           0.96   0.97   0.90       0.98   0.93   0.66
  Chi-square GLS                          0.95   0.97   0.90       0.97   0.94   0.73
  Eigenvalues > 1                         0.44   0.04   0.00       0.40   0.00   0.00
  Eigenvalues Ratio > 3                   0.93   0.39   0.04       0.88   0.03   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    1.00   1.00   0.96       1.00   1.00   0.66
  Parallel Analysis of Ordinal Data       1.00   1.00   0.97       1.00   1.00   0.60
  RMSEA of ML                             1.00   1.00   1.00       1.00   1.00   1.00
  RMSEA of GLS                            1.00   1.00   1.00       1.00   1.00   1.00

Table 8
Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 0.00 and Communality of 0.90)

                                          3 items on 2nd factor    6 items on 2nd factor
Decision-Making Rule                       .05    .30    .50        .05    .30    .50
Sample Size 100
  Chi-square ML                           0.72   0.76   0.44       0.81   0.51   0.02
  Chi-square GLS                          0.90   0.93   0.86       0.96   0.87   0.78
  Eigenvalues > 1                         1.00   0.98   0.09       1.00   0.87   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   1.00       1.00   1.00   1.00
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   1.00   0.95
  Parallel Analysis of Continuous Data    1.00   1.00   0.98       1.00   1.00   0.45
  Parallel Analysis of Ordinal Data       1.00   1.00   0.97       1.00   1.00   0.44
  RMSEA of ML                             0.00   0.00   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.00   0.00   0.00       0.00   0.00   0.00
Sample Size 450
  Chi-square ML                           0.71   0.35   0.00       0.80   0.00   0.00
  Chi-square GLS                          0.76   0.46   0.08       0.84   0.15   0.00
  Eigenvalues > 1                         1.00   1.00   0.00       1.00   1.00   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   1.00       1.00   1.00   1.00
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   1.00   1.00
  Parallel Analysis of Continuous Data    1.00   1.00   0.87       1.00   1.00   0.01
  Parallel Analysis of Ordinal Data       1.00   1.00   0.89       1.00   1.00   0.00
  RMSEA of ML                             0.33   0.11   0.00       0.48   0.00   0.00
  RMSEA of GLS                            0.40   0.16   0.00       0.61   0.03   0.00
Sample Size 800
  Chi-square ML                           0.76   0.18   0.00       0.79   0.00   0.00
  Chi-square GLS                          0.78   0.25   0.00       0.80   0.00   0.00
  Eigenvalues > 1                         1.00   1.00   0.00       1.00   1.00   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   1.00       1.00   1.00   1.00
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   1.00   1.00
  Parallel Analysis of Continuous Data    1.00   1.00   0.76       1.00   1.00   0.00
  Parallel Analysis of Ordinal Data       1.00   1.00   0.75       1.00   1.00   0.00
  RMSEA of ML                             1.00   0.97   0.00       1.00   0.02   0.00
  RMSEA of GLS                            1.00   0.99   0.20       1.00   0.57   0.00

Table 9
Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 2.50 and Communality of 0.20)

                                          3 items on 2nd factor    6 items on 2nd factor
Decision-Making Rule                       .05    .30    .50        .05    .30    .50
Sample Size 100
  Chi-square ML                           0.26   0.46   0.54       0.30   0.53   0.62
  Chi-square GLS                          0.66   0.77   0.80       0.67   0.79   0.89
  Eigenvalues > 1                         0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 3                   0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    0.30   0.41   0.29       0.32   0.32   0.27
  Parallel Analysis of Ordinal Data       0.32   0.40   0.31       0.32   0.30   0.25
  RMSEA of ML                             0.00   0.00   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.00   0.00   0.00       0.00   0.00   0.00
Sample Size 450
  Chi-square ML                           0.31   0.32   0.37       0.35   0.44   0.31
  Chi-square GLS                          0.42   0.40   0.44       0.37   0.52   0.44
  Eigenvalues > 1                         0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 3                   0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    0.78   0.70   0.71       0.76   0.69   0.32
  Parallel Analysis of Ordinal Data       0.83   0.69   0.67       0.76   0.67   0.39
  RMSEA of ML                             0.05   0.12   0.09       0.07   0.20   0.16
  RMSEA of GLS                            0.08   0.14   0.16       0.13   0.21   0.17
Sample Size 800
  Chi-square ML                           0.27   0.22   0.41       0.28   0.37   0.33
  Chi-square GLS                          0.39   0.28   0.44       0.33   0.40   0.44
  Eigenvalues > 1                         0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 3                   0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues Ratio > 4                   0.00   0.00   0.00       0.00   0.00   0.00
  Parallel Analysis of Continuous Data    0.90   0.84   0.76       0.97   0.83   0.50
  Parallel Analysis of Ordinal Data       0.92   0.89   0.78       0.98   0.82   0.51
  RMSEA of ML                             0.98   0.99   1.00       0.99   0.99   0.99
  RMSEA of GLS                            0.98   1.00   1.00       1.00   1.00   0.99

Table 10
Essential Unidimensional Results of the Mean Accuracy Rates for Rules and Indices (Skewness of 2.50 and Communality of 0.90)

                                          3 items on 2nd factor    6 items on 2nd factor
Decision-Making Rule                       .05    .30    .50        .05    .30    .50
Sample Size 100
  Chi-square ML                           0.00   0.00   0.00       0.00   0.00   0.00
  Chi-square GLS                          0.00   0.01   0.01       0.01   0.00   0.00
  Eigenvalues > 1                         0.99   0.60   0.03       1.00   0.26   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   1.00       1.00   1.00   0.98
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   0.99   0.69
  Parallel Analysis of Continuous Data    1.00   1.00   0.85       1.00   0.94   0.19
  Parallel Analysis of Ordinal Data       1.00   1.00   0.88       1.00   0.94   0.19
  RMSEA of ML                             0.00   0.00   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.00   0.00   0.00       0.00   0.00   0.00
Sample Size 450
  Chi-square ML                           0.00   0.00   0.00       0.00   0.00   0.00
  Chi-square GLS                          0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues > 1                         1.00   1.00   0.00       1.00   0.68   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   0.00       1.00   1.00   1.00
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   1.00   0.93
  Parallel Analysis of Continuous Data    1.00   1.00   0.54       1.00   0.99   0.00
  Parallel Analysis of Ordinal Data       1.00   1.00   0.54       1.00   0.99   0.00
  RMSEA of ML                             0.00   0.00   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.00   0.00   0.00       0.00   0.00   0.00
Sample Size 800
  Chi-square ML                           0.00   0.00   0.00       0.00   0.00   0.00
  Chi-square GLS                          0.00   0.00   0.00       0.00   0.00   0.00
  Eigenvalues > 1                         1.00   1.00   0.00       1.00   0.67   0.00
  Eigenvalues Ratio > 3                   1.00   1.00   1.00       1.00   1.00   1.00
  Eigenvalues Ratio > 4                   1.00   1.00   1.00       1.00   1.00   0.98
  Parallel Analysis of Continuous Data    1.00   1.00   0.24       1.00   0.99   0.00
  Parallel Analysis of Ordinal Data       1.00   1.00   0.25       1.00   0.99   0.00
  RMSEA of ML                             0.02   0.01   0.00       0.00   0.00   0.00
  RMSEA of GLS                            0.03   0.01   0.00       0.03   0.00   0.00

Overall, as seen in the strict unidimensional results, both continuous and Likert PA methods provided the strongest and most consistent performances for essential unidimensionality. The cell means for both PA methods were generally the highest among the nine decision-making methods for all conditions, except under several sets of conditions. As seen in Table 7, the first exception was with a sample size of 100, skewness of 0.00, and magnitude of communality of 0.20: both the chi-square ML and chi-square GLS were slightly higher, but all four methods performed well, regardless of the proportion of communality on the second factor or the number of test items loading on the second factor. As seen in Table 9, the second exception was with sample size of 100, skewness of 2.50, and magnitude of communality of 0.20: the chi-square GLS was considerably higher for all proportions of communality on the second factor and for both numbers of items loading on the second factor, whereas both PA methods generated fairly low accuracy rates for these conditions (ranging from 0.25 to 0.41). This was the only combination of conditions where the PA methods performed poorly for essential unidimensionality, except when the proportion of communality on the second factor was 0.50. This condition, in particular, seemed to generate low accuracy rates for both PA methods.

The ML and GLS chi-square tests provided strong performance rates for several of the non-skewed conditions. As shown in Table 7, both the ML and GLS chi-square tests provided the highest accuracy rates among the nine decision-making methods for non-skewed distributions when sample size was 100 and magnitude of communality was 0.20.
Although the ML and GLS chi-square tests performed quite similarly under most conditions, the chi-square GLS test outperformed the ML chi-square test for several conditions: (1) non-skewed distributions and magnitude of communality of 0.90 (see Table 8), and (2) skewed distributions and magnitude of communality of 0.20 (see Table 9). Both the ML and GLS chi-square tests generally performed poorly with skewed distributions, especially when the magnitude of communality was 0.90. Overall, both chi-square methods were sensitive to a proportion of communality on the second factor of 0.50. However, the chi-square GLS generated the highest accuracy rates when sample size was 100, distributions were skewed, magnitude of communality was 0.20, and proportion of communality on the second factor was 0.50, for both three and six items loading on the second factor.

The performance of the three eigenvalue rules was mixed depending on the magnitude of communality. As seen with strict unidimensionality, for a magnitude of communality of 0.20, all three eigenvalue rules performed poorly for all sample sizes, skewness levels, proportions of communality on the second factor, and numbers of items loading on the second factor (see Tables 7 and 9). When magnitude of communality was 0.90, however, all three eigenvalue rules generated high accuracy rates for both non-skewed and skewed distributions, all three sample sizes, all proportions of communality on the second factor, and both numbers of items loading on the second factor (see Tables 8 and 10). The exception was the eigenvalues-greater-than-one rule, which performed poorly in all cases when the proportion of communality on the second factor was 0.50. Although the eigenvalues-greater-than-one rule was extremely sensitive to a proportion of communality on the second factor of 0.50, the other two eigenvalue rules were not.
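To make the three eigenvalue rules concrete, the following is a minimal Python sketch that applies them to a list of eigenvalues from an item correlation matrix. The function name `eigenvalue_rule_decisions` and the example eigenvalues are hypothetical. The sketch also assumes that the eigenvalues-greater-than-one rule signals unidimensionality when exactly one eigenvalue exceeds 1.0, which is one reading of the rule rather than a confirmed detail of the study's software.

```python
def eigenvalue_rule_decisions(eigenvalues):
    """Apply the three eigenvalue rules to a list of eigenvalues.

    Returns a dict mapping each rule to True when that rule would
    select a unidimensional (one-factor) model.
    """
    ordered = sorted(eigenvalues, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two eigenvalues")
    first, second = ordered[0], ordered[1]
    if second <= 0:
        raise ValueError("second eigenvalue must be positive to form the ratio")
    ratio = first / second
    # Assumption: the greater-than-one rule indicates one factor when
    # exactly one eigenvalue exceeds 1.0.
    n_retained = sum(1 for value in ordered if value > 1.0)
    return {
        "eigenvalues_gt_1": n_retained == 1,
        "ratio_gt_3": ratio > 3.0,
        "ratio_gt_4": ratio > 4.0,
    }


# Made-up eigenvalues for a 10-item scale (each list sums to 10).
dominant = [6.2, 0.9, 0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2, 0.2]
weaker = [3.9, 1.1, 0.9, 0.8, 0.7, 0.7, 0.6, 0.5, 0.4, 0.4]

print(eigenvalue_rule_decisions(dominant))
print(eigenvalue_rule_decisions(weaker))
```

In the second example the rules disagree: two eigenvalues exceed one and the ratio falls between three and four, so only the ratio-greater-than-three rule selects one factor. This is the kind of divergence among the rules that the accuracy tables summarize.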
The ratio-of-first-to-second-eigenvalues-greater-than-three rule and the ratio-of-first-to-second-eigenvalues-greater-than-four rule generated the highest accuracy rates among the nine decision-making rules when distributions were skewed, magnitude of communality was 0.90, and the proportion of communality on the second factor was 0.50 (regardless of sample size and the number of items loading on the second factor). This robustness to the proportion of communality on the second factor being 0.50 can be seen in Tables 9 and 10.

The ML and GLS RMSEA indices provided inconsistent results. For the most part, both RMSEA indices performed poorly with small sample sizes and improved as sample size increased. When sample size increased to 800, both RMSEA indices performed well, except when distributions were skewed and magnitude of communality was 0.90. Both RMSEA indices were sensitive to a proportion of communality on the second factor of 0.50, except when distributions were skewed, magnitude of communality was 0.20, and sample size was 800. The RMSEA indices generated the highest accuracy rates when sample size was 800, distributions were skewed, magnitude of communality was 0.20, and proportion of communality on the second factor was 0.50, for both three and six items loading on the second factor.

It is important to note that when conditions included skewed distributions, magnitude of communality of 0.20, and sample sizes of 100 and 450, all nine of the decision-making rules and indices generated considerably low accuracy rates. These combined conditions seem to have a large influence on the decision-making process of retaining factors: these were the same conditions that influenced the robustness of the nine decision-making rules and indices for strict unidimensionality.
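The RMSEA-based decisions discussed above depend on a chi-square statistic, its degrees of freedom, and the sample size. The following is a minimal Python sketch of the arithmetic, assuming the common point estimate sqrt(max(χ² − df, 0) / (df(N − 1))) and a cutoff of 0.05. The function names and the input values are hypothetical, and some programs divide by N rather than N − 1, so this may not match the study's software exactly.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square.

    Uses sqrt(max(chi_square - df, 0) / (df * (n - 1))); this form is
    an assumption of the sketch.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def rmsea_selects_one_factor(chi_square, df, n, cutoff=0.05):
    """True when the one-factor model's RMSEA is at or below the cutoff."""
    return rmsea(chi_square, df, n) <= cutoff


# Hypothetical one-factor fit: chi-square of 45.5 on 35 degrees of freedom.
for n in (100, 450, 800):
    value = rmsea(45.5, 35, n)
    print(n, round(value, 3), rmsea_selects_one_factor(45.5, 35, n))
```

Holding the chi-square fixed across sample sizes is only an illustration of how N enters the index. In real data the chi-square itself changes with N, so the sketch should not be read as an explanation of the accuracy rates reported here.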
The next section illustrates the main effects of each condition (i.e., independent variable) on the decision-making rules and indices in detecting both strict and essential unidimensional measures.

Research Question 1B

Effects of the Independent Variables on Strict and Essential Unidimensional Measures

Research Question 1B investigated the main effects of the independent variables on the performance of the individual rules and indices. These variables (i.e., conditions) included sample size (three levels), skewness (two levels), and the magnitude of communality (two levels) for both strict and essential unidimensional measures. The proportion of communality on the second factor (three levels) and the number of items with non-zero loadings on the second factor (two levels) were also examined for essential unidimensional measures. The main effects were explored via bar graphs (see Figures 5 to 16 for strict unidimensionality and Figures 17 to 36 for essential unidimensionality) and through binary logistic regression (see Table 11 for strict unidimensionality and Table 12 for essential unidimensionality). The binary logistic regression results are included because of the concern that the figures alone may hide some important information; that is, graphs can sometimes be deceptive, so the statistical model was used to verify the findings in the graphs.

For ease of interpreting these results, certain decision-making rules and indices were grouped together in bar charts: (1) the ML and GLS chi-square tests, (2) the PA continuous and Likert methods, (3) the three eigenvalue rules, and (4) the ML and GLS RMSEA indices. The bar charts provide the mean values of the accuracy rates for the individual decision-making rules and indices. Strict and essential unidimensional measures are discussed separately. Additionally, as seen in Table 11 for strict unidimensionality, there were three binary logistic regressions that were unable to converge.
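The link between a binary logistic regression, its Wald statistic, and a failure to converge can be sketched for the simplest case of one two-level predictor. In that case the maximum likelihood slope is the log odds ratio of the 2 × 2 table of decisions, and its estimated variance is the sum of the reciprocal cell counts. The Python sketch below uses a hypothetical function name and made-up counts; it is not the procedure used by the study's statistical software, which is not described in this section.

```python
import math


def wald_chi_square(hits_a, misses_a, hits_b, misses_b):
    """Wald chi-square (1 df) for the log odds ratio of a 2 x 2 table.

    Returns (statistic, p_value). Raises ValueError when any cell is
    zero, because the log odds ratio and its standard error are then
    undefined and an iterative fit would not settle on finite estimates.
    """
    cells = (hits_a, misses_a, hits_b, misses_b)
    if any(count <= 0 for count in cells):
        raise ValueError("a cell has no variability; the estimate does not exist")
    log_odds_ratio = math.log((hits_a * misses_b) / (misses_a * hits_b))
    variance = sum(1.0 / count for count in cells)
    statistic = log_odds_ratio ** 2 / variance
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up counts: condition A gives 90 correct and 10 incorrect decisions,
# condition B gives 60 correct and 40 incorrect decisions.
statistic, p_value = wald_chi_square(90, 10, 60, 40)
print(round(statistic, 2), p_value)
```

When a rule makes the same decision in every replication of a condition (an accuracy rate of exactly 0.00 or 1.00), one of the cell counts is zero and the function raises an error instead of returning a statistic.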
The ratio-of-first-to-second-eigenvalues-greater-than-four rule, PA continuous, and PA Likert consisted of cells that lacked variability, and hence, models were unable to converge.

Strict Unidimensionality

[Figure 5. Chi-square Statistic for Strict Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on chi-square ML and chi-square GLS.]

[Figure 6. Chi-square Statistic for Strict Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; series: decisions based on chi-square ML and chi-square GLS.]

[Figure 7. Chi-square Statistic for Strict Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; series: decisions based on chi-square ML and chi-square GLS.]

[Figure 8. Eigenvalue Rules for Strict Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on whether the 2nd eigenvalue is greater than 1, on the ratio of eigenvalues (1/2) greater than 3, and on the ratio of eigenvalues (1/2) greater than 4.]

[Figure 9. Eigenvalue Rules for Strict Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same three series as Figure 8.]

[Figure 10. Eigenvalue Rules for Strict Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same three series as Figure 8.]

[Figure 11. Parallel Analysis for Strict Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; panel title: Parallel Analysis for Likert and Continuous Data.]

[Figure 12. Parallel Analysis for Strict Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; panel title: Parallel Analysis for Likert and Continuous Data.]

[Figure 13. Parallel Analysis for Strict Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; panel title: Parallel Analysis for Likert and Continuous Data.]

[Figure 14. RMSEA for Strict Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on RMSEA ML less than or equal to .05 and RMSEA GLS less than or equal to .05.]

[Figure 15. RMSEA for Strict Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same two series as Figure 14.]

[Figure 16. RMSEA for Strict Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same two series as Figure 14.]

Table 11
Binary Logistic Regression Results for Main Effects: Strict Unidimensionality

Condition / Rules & Indices       Wald        df     Sig.
Sample Size
  Chi-square ML                   0.096       2      0.953
  Chi-square GLS                  35.057      2      0.0001
  Eigenvalues > 1                 32.958      2      0.0001
  Eigenvalues Ratio > 3           93.267      2      0.0001
  Eigenvalues Ratio > 4*          N/A         N/A    N/A
  PA Continuous*                  N/A         N/A    N/A
  PA Likert*                      N/A         N/A    N/A
  RMSEA ML                        149.986     2      0.0001
  RMSEA GLS                       137.555     2      0.0001
Skewness
  Chi-square ML                   263.046     1      0.0001
  Chi-square GLS                  259.277     1      0.0001
  Eigenvalues > 1                 0.0001      1      0.980
  Eigenvalues Ratio > 3           0.0001      1      0.986
  Eigenvalues Ratio > 4*          N/A         N/A    N/A
  PA Continuous*                  N/A         N/A    N/A
  PA Likert*                      N/A         N/A    N/A
  RMSEA ML                        165.178     1      0.0001
  RMSEA GLS                       165.373     1      0.0001
Magnitude of Communality
  Chi-square ML                   114.575     1      0.0001
  Chi-square GLS                  114.541     1      0.0001
  Eigenvalues > 1                 0.001       1      0.980
  Eigenvalues Ratio > 3           0.001       1      0.981
  Eigenvalues Ratio > 4*          N/A         N/A    N/A
  PA Continuous*                  N/A         N/A    N/A
  PA Likert*                      N/A         N/A    N/A
  RMSEA ML                        147.200     1      0.0001
  RMSEA GLS                       137.671     1      0.0001

Note: * Solution could not converge because certain cells lacked variability.

As shown in Figures 5 through 16 and Table 11, there were several conditions that seemed to influence the mean accuracy rates of the decision-making rules and indices. First, the ML and GLS chi-square accuracy rates were assessed. For ML chi-square, there were statistically significant main effects of skewness and magnitude of communality.
For GLS chi-square, there were statistically significant main effects of sample size, skewness, and magnitude of communality. For sample size, there were statistically significant effects between n = 100 and n = 450: χ²(1, 800) = 28.6, p = 0.0001; and between n = 100 and n = 800: χ²(1, 800) = 25.6, p = 0.0001.

The three eigenvalue rules were assessed next. For the eigenvalues-greater-than-one rule, there was a statistically significant main effect of sample size. The statistically significant effect was found between n = 450 and n = 800: χ²(1, 800) = 33.0, p = 0.0001. For the ratio-of-first-to-second-eigenvalues-greater-than-three rule, there was a statistically significant main effect of sample size. Statistically significant effects were found between n = 100 and n = 450: χ²(1, 800) = 56.9, p = 0.0001; between n = 450 and n = 800: χ²(1, 800) = 18.4, p = 0.0001; and between n = 100 and n = 800: χ²(1, 800) = 77.8, p = 0.0001. The model did not converge for the ratio-of-first-to-second-eigenvalues-greater-than-four rule due to lack of variability. However, the bar chart in Figure 10 shows that the magnitude of communality had a significant effect (i.e., when communality was 0.20, the rule failed to detect unidimensionality, and when communality was 0.90, the rule identified unidimensionality).

The models for both PA continuous and PA Likert did not converge. As seen in Figures 11, 12, and 13, sample size, skewness, and magnitude of communality seem to have little to no effect. There was a lack of variability in the mean values of the accuracy rates in the various cells.

The ML and GLS RMSEA accuracy rates were examined next. For ML RMSEA, there were statistically significant main effects of sample size, skewness, and magnitude of communality. For sample size, there was a statistically significant effect between n = 450 and n = 800: χ²(1, 800) = 150.0, p = 0.0001.
Likewise, for GLS RMSEA, there were statistically significant main effects of sample size, skewness, and magnitude of communality. For sample size, there was a statistically significant effect between n = 450 and n = 800: χ²(1, 800) = 137.6, p = 0.0001.

In summary, sample size had a main effect on five of the nine individual decision-making rules and indices. Skewness had a main effect on four of the nine individual decision-making rules and indices. Magnitude of communality had a main effect on five of the nine individual decision-making rules and indices.

Essential Unidimensionality

[Figure 17. Chi-square Statistic for Essential Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on chi-square ML and chi-square GLS.]

[Figure 18. Chi-square Statistic for Essential Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same two series as Figure 17.]

[Figure 19. Chi-square Statistic for Essential Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same two series as Figure 17.]

[Figure 20. Chi-square Statistic for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor. Bar chart; x-axis: p_comm; same two series as Figure 17.]

[Figure 21. Chi-square Statistic for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor. Bar chart; x-axis: items; same two series as Figure 17.]

[Figure 22. Eigenvalue Rules for Essential Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on whether the 2nd eigenvalue is greater than 1, on the ratio of eigenvalues (1/2) greater than 3, and on the ratio of eigenvalues (1/2) greater than 4.]

[Figure 23. Eigenvalue Rules for Essential Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same three series as Figure 22.]

[Figure 24. Eigenvalue Rules for Essential Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same three series as Figure 22.]

[Figure 25. Eigenvalue Rules for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor. Bar chart; x-axis: p_comm; same three series as Figure 22.]

[Figure 26. Eigenvalue Rules for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor. Bar chart; x-axis: items; same three series as Figure 22.]

[Figure 27. Parallel Analysis for Essential Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on PA continuous and PA Likert.]

[Figure 28. Parallel Analysis for Essential Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same two series as Figure 27.]

[Figure 29. Parallel Analysis for Essential Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same two series as Figure 27.]

[Figure 30. Parallel Analysis for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor. Bar chart; x-axis: p_comm; same two series as Figure 27.]

[Figure 31. Parallel Analysis for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor. Bar chart; x-axis: items; same two series as Figure 27.]

[Figure 32. RMSEA for Essential Unidimensionality: Effect of Sample Size. Bar chart; x-axis: N; series: decisions based on RMSEA ML less than or equal to .05 and RMSEA GLS less than or equal to .05.]

[Figure 33. RMSEA for Essential Unidimensionality: Effect of Skewness. Bar chart; x-axis: skew; same two series as Figure 32.]

[Figure 34. RMSEA for Essential Unidimensionality: Effect of the Magnitude of Communality. Bar chart; x-axis: comm; same two series as Figure 32.]

[Figure 35. RMSEA for Essential Unidimensionality: Effect of the Proportion of Communality on the Second Factor. Bar chart; x-axis: p_comm; same two series as Figure 32.]

[Figure 36. RMSEA for Essential Unidimensionality: Effect of the Number of Items Loading on the Second Factor. Bar chart; x-axis: items; same two series as Figure 32.]

Table 12
Binary Logistic Regression Results for Main Effects: Essential Unidimensionality

Condition / Rules & Indices       Wald        df     Sig.
Sample Size
  Chi-square ML                   130.302     2      0.0001
  Chi-square GLS                  620.378     2      0.0001
  Eigenvalues > 1                 167.621     2      0.0001
  Eigenvalues Ratio > 3           208.961     2      0.0001
  Eigenvalues Ratio > 4           35.79       2      0.0001
  PA Continuous                   100.710     2      0.0001
  PA Likert                       100.978     2      0.0001
  RMSEA ML                        794.834     2      0.0001
  RMSEA GLS                       828.725     2      0.0001
Skewness
  Chi-square ML                   1313.297    1      0.0001
  Chi-square GLS                  1230.784    1      0.0001
  Eigenvalues > 1                 169.496     1      0.0001
  Eigenvalues Ratio > 3           0.002       1      0.969
  Eigenvalues Ratio > 4           23.612      1      0.0001
  PA Continuous                   438.992     1      0.0001
  PA Likert                       417.295     1      0.0001
  RMSEA ML                        663.607     1      0.0001
  RMSEA GLS                       789.515     1      0.0001
Magnitude of Communality
  Chi-square ML                   1286.453    1      0.0001
  Chi-square GLS                  1235.097    1      0.0001
  Eigenvalues > 1                 442.592     1      0.0001
  Eigenvalues Ratio > 3           0.004       1      0.949
  Eigenvalues Ratio > 4           0.004       1      0.947
  PA Continuous                   45.224      1      0.0001
  PA Likert                       41.163      1      0.0001
  RMSEA ML                        1028.937    1      0.0001
  RMSEA GLS                       1056.663    1      0.0001
Proportion of h² on 2nd Factor
  Chi-square ML                   228.456     2      0.0001
  Chi-square GLS                  167.606     2      0.0001
  Eigenvalues > 1                 577.495     2      0.0001
  Eigenvalues Ratio > 3           322.315     2      0.0001
  Eigenvalues Ratio > 4           16.201      2      0.0001
  PA Continuous                   1159.275    2      0.0001
  PA Likert                       1160.767    2      0.0001
  RMSEA ML                        243.086     2      0.0001
  RMSEA GLS                       254.282     2      0.0001
# of Items on 2nd Factor
  Chi-square ML                   11.737      1      0.001
  Chi-square GLS                  3.895       1      0.048
  Eigenvalues > 1                 78.382      1      0.0001
  Eigenvalues Ratio > 3           32.501      1      0.0001
  Eigenvalues Ratio > 4           0.001       1      0.974
  PA Continuous                   286.900     1      0.0001
  PA Likert                       305.975     1      0.0001
  RMSEA ML                        23.960      1      0.0001
  RMSEA GLS                       6.420       1      0.011

As shown in Figures 17 through 36 and Table 12, there were several conditions that seemed to influence the mean accuracy rates of the decision-making rules and indices. First, the ML and GLS chi-square accuracy rates were assessed. For ML chi-square, there were statistically significant main effects of sample size, skewness, magnitude of communality, proportion of communality on the second factor, and the number of items loading on the second factor. For sample size, there were statistically significant effects between n = 100 and n = 450: χ²(1, 4800) = 76.8, p = 0.0001; between n = 100 and n = 800: χ²(1, 4800) = 113.2, p = 0.0001; and between n = 450 and n = 800: χ²(1, 4800) = 3.88, p = 0.049. For proportion of communality on the second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 82.1, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 223.4, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 39.9, p = 0.0001.

Similarly, for GLS chi-square, there were statistically significant main effects of sample size, skewness, magnitude of communality, proportion of communality on the second factor, and the number of items loading on the second factor. For sample size, there were statistically significant effects between n = 100 and n = 450: χ²(1, 4800) = 411.3, p = 0.0001; between n = 100 and n = 800: χ²(1, 4800) = 553.8, p = 0.0001; and between n = 450 and n = 800: χ²(1, 4800) = 17.8, p = 0.0001. For proportion of communality on the second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 76.0, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 160.3, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 17.6, p = 0.0001.

The three eigenvalue rules were assessed next.
For the eigenvalues-greater-than-one rule, there were statistically significant main effects of all conditions. For sample size, there were statistically significant effects between n=100 and n=450: χ²(1, 4800) = 57.6, p = 0.0001; between n=100 and n=800: χ²(1, 4800) = 167.5, p = 0.0001; and between n=450 and n=800: χ²(1, 4800) = 52.7, p = 0.0001. For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 104.0, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 487.5, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 486.7, p = 0.0001.

For the ratio-of-first-to-second-eigenvalues-greater-than-three rule, there were statistically significant main effects of sample size, proportion of communality on second factor, and the number of items loading on the second factor. For sample size, there were statistically significant effects between n=100 and n=450: χ²(1, 4800) = 111.1, p = 0.0001; between n=100 and n=800: χ²(1, 4800) = 207.8, p = 0.0001; and between n=450 and n=800: χ²(1, 4800) = 48.2, p = 0.0001. For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 227.4, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 180.4, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 40.6, p = 0.0001.

The ratio-of-first-to-second-eigenvalues-greater-than-four rule had statistically significant main effects of sample size, skewness, and proportion of communality on second factor. For sample size, there were statistically significant effects between n=100 and n=450: χ²(1, 4800) = 20.0, p = 0.0001; and between n=100 and n=800: χ²(1, 4800) = 19.3, p = 0.0001. For proportion of communality of second factor, there was a statistically significant effect between 0.30 and 0.50: χ²(1, 4800) = 16.2, p = 0.0001.

The models for both PA continuous and PA Likert were examined.
For PA continuous, there were statistically significant main effects of sample size, skewness, magnitude of communality, proportion of communality on second factor, and the number of items loading on the second factor. For sample size, there were statistically significant effects between n=100 and n=450: χ²(1, 4800) = 61.0, p = 0.0001; and between n=100 and n=800: χ²(1, 4800) = 84.0, p = 0.049. For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 6.0, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 819.0, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 754.3, p = 0.0001.

PA Likert had similar results. There were statistically significant main effects of all conditions. For sample size, there were statistically significant effects between n=100 and n=450: χ²(1, 4800) = 60.8, p = 0.0001; and between n=100 and n=800: χ²(1, 4800) = 84.6, p = 0.049. For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 6.3, p = 0.012; between 0.05 and 0.50: χ²(1, 4800) = 818.4, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 753.1, p = 0.0001.

When ML RMSEA was examined, there were statistically significant main effects of all conditions. For sample size, there was a statistically significant effect between n=450 and n=800: χ²(1, 4800) = 794.8, p = 0.0001. For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 90.6, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 242.7, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 52.7, p = 0.0001.

Likewise, there were statistically significant main effects of all conditions for GLS RMSEA. For sample size, there was a statistically significant effect between n=450 and n=800: χ²(1, 4800) = 828.7, p = 0.0001.
For proportion of communality of second factor, there were statistically significant effects between 0.05 and 0.30: χ²(1, 4800) = 56.5, p = 0.0001; between 0.05 and 0.50: χ²(1, 4800) = 251.5, p = 0.0001; and between 0.30 and 0.50: χ²(1, 4800) = 95.0, p = 0.0001.

In summary, sample size had a main effect on all nine of the individual decision-making rules and indices. Skewness had a main effect on eight of the nine individual decision-making rules and indices. Magnitude of communality had a main effect on seven of the nine. Proportion of communality on the second factor had a main effect on all nine, and the number of items loading on the second factor had a main effect on eight of the nine (in Table 12, the exception is the eigenvalues-ratio-greater-than-four rule).

Research Questions 2A and 2B
Best and Optimal Decision-Making Rules or Indices

Research Question 2A investigated whether any one of the nine decision-making rules or indices performed best, in terms of detecting unidimensionality, in all combinations of conditions explored. Research Question 2B investigated the optimal performance of the decision-making rules under specific conditions. These results are presented for strict and essential unidimensionality separately.

As mentioned in the methodology section, two criteria were developed: 50% and 80%. For a decision-making rule or index to be considered best, it had to exceed the 50% criterion in all conditions explored. For a decision-making rule or index to be considered optimal, it had to exceed the 50% criterion for a specific set of conditions. After rules and indices were assessed for optimal performance, they were then examined against the second criterion of 80% in order to be considered for a combination. In accomplishing Research Question 2B, a list of possible combinations was constructed.
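The two-tier screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_rule`, the condition labels, and the accuracy values are hypothetical, and the comparison operators (strictly greater than 50% for "exceed", at least 80% for "meet") are an assumed reading of the criteria.

```python
# Hypothetical accuracy rates for ONE rule across four design conditions.
# Labels and values are illustrative only; they are not taken from the thesis tables.
accuracy = {
    ("n=100", "skew=0.00", "h2=0.20"): 0.95,
    ("n=100", "skew=0.00", "h2=0.90"): 0.59,
    ("n=100", "skew=2.50", "h2=0.20"): 0.31,
    ("n=100", "skew=2.50", "h2=0.90"): 0.02,
}


def classify_rule(accuracy_by_condition, first=0.50, second=0.80):
    """Apply the 50% and 80% criteria to one rule's accuracy rates.

    Assumption: 'exceed' is read as > first, 'meet' as >= second.
    """
    # Conditions in which the rule is "optimal" (exceeds the first criterion).
    optimal = [c for c, a in accuracy_by_condition.items() if a > first]
    # Among those, conditions in which the rule qualifies for a combination.
    candidates = [c for c in optimal if accuracy_by_condition[c] >= second]
    # "Best" requires exceeding the first criterion in every condition.
    best = len(optimal) == len(accuracy_by_condition)
    return {"best": best, "optimal_in": optimal, "combination_candidate_in": candidates}


result = classify_rule(accuracy)
print(result)
```

With these made-up values the rule is optimal in two conditions, a combination candidate in one, and not best overall.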
Strict Unidimensionality

As shown in Table 6, there were no decision-making rules or indices that exceeded the first criterion of 50% for all sets of conditions explored in the simulation. Therefore, there was no superior or best decision-making rule or index for all conditions of strict unidimensionality. The next step was to investigate which decision-making rules and indices performed optimally under specific conditions (i.e., Research Question 2B).

Optimal Performance

Sample size of 100: When distributions were non-skewed and magnitude of communality was 0.20, four rules and indices performed optimally (i.e., exceeded the 50% criterion): chi-square ML (0.95), chi-square GLS (0.97), PA continuous (0.94), and PA Likert (0.90). When distributions were non-skewed and magnitude of communality was 0.90, seven rules and indices performed optimally: chi-square ML (0.59), chi-square GLS (0.84), eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00). When distributions were skewed and magnitude of communality was 0.20, only one index performed optimally: chi-square GLS (0.69). When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

Sample size of 450: When distributions were non-skewed and magnitude of communality was 0.20, seven rules and indices performed optimally: chi-square ML (0.93), chi-square GLS (0.94), eigenvalues-ratio-greater-than-three rule (0.69), PA continuous (1.00), PA Likert (1.00), RMSEA ML (0.75), and RMSEA GLS (0.77).
When distributions were non-skewed and magnitude of communality was 0.90, seven rules and indices performed optimally: chi-square ML (0.66), chi-square GLS (0.75), eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00). When distributions were skewed and magnitude of communality was 0.20, two methods performed optimally: PA continuous (0.77) and PA Likert (0.74). When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

Sample size of 800: When distributions were non-skewed and magnitude of communality was 0.20, eight rules and indices performed optimally: chi-square ML (0.96), chi-square GLS (0.96), eigenvalues-greater-than-one rule (0.52), eigenvalues-ratio-greater-than-three rule (0.96), PA continuous (1.00), PA Likert (1.00), RMSEA ML (1.00), and RMSEA GLS (1.00). When distributions were non-skewed and magnitude of communality was 0.90, all nine rules and indices performed optimally: chi-square ML (0.69), chi-square GLS (0.78), eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), PA Likert (1.00), RMSEA ML (1.00), and RMSEA GLS (1.00). When distributions were skewed and magnitude of communality was 0.20, four methods performed optimally: PA continuous (0.95), PA Likert (0.97), RMSEA ML (0.98), and RMSEA GLS (0.97).
When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

In summary, there was one main set of conditions that posed a problem for the performance of the rules and indices. When sample size was 100, distributions were skewed, and magnitude of communality was 0.20, the only rule that exceeded the 50% criterion was the chi-square GLS (0.69).

Combinations for Strict Unidimensionality

After decision-making rules and indices were examined for optimal performance, as illustrated above, a second criterion of 80% was employed in order to determine whether rules and indices were fit for a combination. The decision-making rules and indices were selected for a combination if they met the 80% criterion for a particular set of simulated design conditions (i.e., independent variables). These combinations and corresponding design conditions can be seen in Table 13. The following list of new combination rules was used for Research Questions 3A and 3B.

It is important to note that there were two sets of conditions in which no rules or indices met the 80% criterion. The first was when sample size was 100, distributions were skewed, and the magnitude of communality was 0.20: chi-square GLS generated the highest accuracy rate (0.69), whereas the rest of the rules performed considerably more poorly. There was no combination formulated for this set of conditions. The second was when sample size was 450, distributions were skewed, and the magnitude of communality was 0.20: both PA continuous (0.77) and PA Likert (0.74) exceeded the 50% criterion, but not the 80% criterion, and the rest of the rules performed poorly in this context. For this set of conditions only, the 50% criterion was used to form a combination (i.e., Rule 4 in Table 13).
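The selection step can be sketched with the accuracy rates reported above for two of the skewed, low-communality conditions. This is a minimal sketch, not the procedure actually coded for the study: the function name `select_for_combination` and the dictionary layout are assumptions, "met the 80% criterion" is read as `>=`, and only the rules the text reports as exceeding 50% are listed (the remaining rates are in Table 6 and are not reproduced here).

```python
# Accuracy rates quoted in the text for strict unidimensionality,
# skewed distributions, magnitude of communality 0.20.
reported = {
    "n=800": {
        "PA Continuous": 0.95,
        "PA Likert": 0.97,
        "RMSEA ML": 0.98,
        "RMSEA GLS": 0.97,
    },
    "n=450": {
        "PA Continuous": 0.77,
        "PA Likert": 0.74,
    },
}


def select_for_combination(accuracy_by_rule, criterion=0.80):
    """Return the rules whose accuracy meets the criterion (assumed: >=)."""
    return [rule for rule, acc in accuracy_by_rule.items() if acc >= criterion]


# n=800: all four reported rules qualify at 80%.
members_n800 = select_for_combination(reported["n=800"])

# n=450: nothing qualifies at 80%, so the text falls back to 50% for this case only.
members_n450_strict = select_for_combination(reported["n=450"])
members_n450_fallback = select_for_combination(reported["n=450"], criterion=0.50)

print(members_n800)
print(members_n450_strict, members_n450_fallback)
```

Under these assumptions the n=800 selection contains the two PA methods and the two RMSEA indices, and the n=450 fallback contains the two PA methods.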
Table 13
List of New Combination Rules for Strict Unidimensionality

Rule Number   Simulation-Design Conditions    Combined Rules and Indices
Rule 1        n=100, h²=0.90, skew=0.00       Chi-square GLS, PA Continuous, PA Likert, Eigenvalues>1, Eigenvalues Ratio>4*
Rule 2        n=100, h²=0.20, skew=0.00;      Chi-square ML, Chi-square GLS, PA Continuous, PA Likert
              n=450, h²=0.20, skew=0.00
Rule 3        n=100, h²=0.90, skew=2.50;      PA Continuous, PA Likert, Eigenvalues>1, Eigenvalues Ratio>4*
              n=450, h²=0.90, skew=0.00;
              n=450, h²=0.90, skew=2.50;
              n=800, h²=0.90, skew=2.50
Rule 4        n=450, h²=0.20, skew=2.50       PA Continuous, PA Likert
Rule 5        n=800, h²=0.90, skew=0.00       PA Continuous, PA Likert, Eigenvalues>1, Eigenvalues Ratio>4, RMSEA ML, RMSEA GLS*
Rule 6        n=800, h²=0.20, skew=0.00       Chi-square ML, Chi-square GLS, PA Continuous, PA Likert, Eigenvalues Ratio>3, RMSEA ML, RMSEA GLS
Rule 7        n=800, h²=0.20, skew=2.50       PA Continuous, PA Likert, RMSEA ML, RMSEA GLS

* Note: The ratio-of-first-to-second-eigenvalues-greater-than-three rule also met the 80% criterion, but it was not included in the combination because whenever the ratio-greater-than-four rule is satisfied, the ratio-greater-than-three rule is satisfied as well. This rule was excluded because of redundancy.

It is important to note that when applying one of the combination rules, a scale is declared unidimensional if at least one of the rules in the combination detects unidimensionality. For example, when applying Rule 7 above, if any of the four elements of the combination identifies unidimensionality, then the scale is declared a unidimensional measure. These combinations were formed using a conservative approach. In other words, a decision-making rule or index needed to meet an eighty percent criterion in order to be part of a combination rule.
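The "any member detects unidimensionality" logic can be illustrated as follows. This is a sketch under stated assumptions: the helper names `combination_detects` and `accuracy` are hypothetical, and the four replication outcomes are invented purely to show the mechanics; they are not simulation results from this study.

```python
# Members of Rule 7 as listed in Table 13.
RULE_7 = ("PA Continuous", "PA Likert", "RMSEA ML", "RMSEA GLS")

# Invented outcomes for four replications of truly unidimensional data.
# True means the method declared the scale unidimensional.
replications = [
    {"PA Continuous": True,  "PA Likert": True,  "RMSEA ML": True,  "RMSEA GLS": True},
    {"PA Continuous": False, "PA Likert": False, "RMSEA ML": True,  "RMSEA GLS": False},
    {"PA Continuous": False, "PA Likert": True,  "RMSEA ML": False, "RMSEA GLS": False},
    {"PA Continuous": False, "PA Likert": False, "RMSEA ML": False, "RMSEA GLS": False},
]


def combination_detects(decisions, members):
    """A combination declares unidimensionality if any member does."""
    return any(decisions[m] for m in members)


def accuracy(flags):
    """Proportion of replications in which unidimensionality was detected."""
    flags = list(flags)
    return sum(flags) / len(flags)


individual = {m: accuracy(r[m] for r in replications) for m in RULE_7}
combined = accuracy(combination_detects(r, RULE_7) for r in replications)

print(individual)
print(combined)
```

Because every replication here is unidimensional, accuracy is the same as sensitivity, and an any-of combination cannot score lower than its best member on such data. The sketch says nothing about how a combination behaves on multidimensional data.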
Utilizing an eighty percent criterion indicates that a rule or index needs to have a high accuracy rate in order to be included in a combination rule. A somewhat liberal approach was used when applying the combination rules; that is, only one of the decision-making rules in a combination had to detect unidimensionality in order for a measure to be identified as unidimensional. In this dissertation, only unidimensional data were simulated (i.e., strict and essential unidimensionality); therefore, applying the combination rules maximizes the accuracy (or sensitivity) of detecting unidimensionality. In short, the combination rules provide a better 'mouse-trap' for identifying unidimensionality in comparison to using an individual rule or index. In the psychometric literature and the present study, no one individual decision-making rule or index has been found to accurately detect unidimensionality under varying sets of conditions of sample size, communalities, and skewness. The expectation is that combination rules will be more accurate than any individual rule under varying sets of conditions of sample size, communalities, and skewness; that is, the combination rules should have broader applicability than any individual rule or index.

Essential Unidimensionality

As shown in Tables 7 to 10, there were no decision-making rules or indices that exceeded the first criterion of 50% for all sets of conditions explored. Therefore, there was no superior or best decision-making rule or index for all conditions of essential unidimensionality explored. The next step involved investigating which decision-making rules and indices performed optimally under specific conditions (Research Question 2B).
Although essential unidimensionality involved two additional conditions (i.e., the proportion of communality on the second factor and the number of items with non-zero loadings on the second factor), neither of these conditions was used to differentiate rules and indices as being optimal. On average, accuracy rates for the nine individual decision-making methods exhibited similar patterns of performance for both of these conditions.²⁶ For that reason, the results (i.e., accuracy rates) that were analyzed and reported below are taken from the section labeled as 30% of the communality loading on the second factor and three items loading on the second factor in Tables 7 through 10. These values represent a medium-sized secondary minor dimension.²⁷

²⁶ When the percentage of the communality loading on the second factor was 50%, several rules and indices exhibited poor performance. This level of the condition exhibits an extremely strong secondary minor dimension; it is on the border of being a second major factor. Hence, several of these rules and indices fail to detect unidimensionality under this value of the condition. For that reason, the 50% level of the condition was not chosen in order to examine whether rules and indices were optimal for combinations in detecting unidimensionality. Research Question 1A described the performance of the rules and indices under this value of the condition.

²⁷ One of the goals of this dissertation is to examine the decision-making rules under an essentially unidimensional model. These values characterize essential unidimensionality well. The values of 30% of the communality on the second factor and three items loading on the second factor do not represent a weak secondary dimension (0.05), which would, to some extent, mirror strict unidimensionality, nor do these values represent an extremely strong secondary dimension (0.50 and six items), which would reflect a multidimensional, or two-factor, model.

Optimal Performance

Sample size of 100: When distributions were non-skewed and magnitude of communality was 0.20, four rules and indices performed optimally (i.e., exceeded the 50% criterion): chi-square ML (0.98), chi-square GLS (0.98), PA continuous (0.86), and PA Likert (0.86). When distributions were non-skewed and magnitude of communality was 0.90, seven rules and indices performed optimally: chi-square ML (0.76), chi-square GLS (0.93), eigenvalues-greater-than-one rule (0.98), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00). When distributions were skewed and magnitude of communality was 0.20, only one index performed optimally: chi-square GLS (0.77). When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (0.60), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

Sample size of 450: When distributions were non-skewed and magnitude of communality was 0.20, six rules and indices performed optimally: chi-square ML (0.95), chi-square GLS (0.98), PA continuous (0.99), PA Likert (0.98), RMSEA ML (0.73), and RMSEA GLS (0.80). When distributions were non-skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00). When distributions were skewed and magnitude of communality was 0.20, two methods performed optimally: PA continuous (0.70) and PA Likert (0.69).
When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

Sample size of 800: When distributions were non-skewed and magnitude of communality was 0.20, six rules and indices performed optimally: chi-square ML (0.97), chi-square GLS (0.97), PA continuous (1.00), PA Likert (1.00), RMSEA ML (1.00), and RMSEA GLS (1.00). When distributions were non-skewed and magnitude of communality was 0.90, seven rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), PA Likert (1.00), RMSEA ML (0.97), and RMSEA GLS (0.99). When distributions were skewed and magnitude of communality was 0.20, four methods performed optimally: PA continuous (0.84), PA Likert (0.89), RMSEA ML (0.99), and RMSEA GLS (1.00). When distributions were skewed and magnitude of communality was 0.90, five rules and indices performed optimally: eigenvalues-greater-than-one rule (1.00), eigenvalues-ratio-greater-than-three rule (1.00), eigenvalues-ratio-greater-than-four rule (1.00), PA continuous (1.00), and PA Likert (1.00).

In summary, there was one main set of conditions that posed a problem for the performance of the rules and indices. When sample size was 100, distributions were skewed, and magnitude of communality was 0.20, the only rule that exceeded the 50% criterion was the chi-square GLS (0.77). This was also found for strict unidimensionality.

Combinations for Essential Unidimensionality

After decision-making rules and indices were examined for optimal performance, a second criterion of 80% was employed in order to determine whether rules and indices were fit for a combination.
The decision-making rules and indices were selected for a combination if they met the 80% criterion for a particular set of simulated design conditions. These combinations and corresponding design conditions can be seen in Table 14. The list of new combination rules found in Table 14 was used for Research Questions 3A and 3B.

There were two sets of conditions in which no rules or indices met the 80% criterion. The first was when sample size was 100, distributions were skewed, and the magnitude of communality was 0.20: chi-square GLS generated the highest accuracy rate (0.77), whereas the rest of the rules performed poorly. There was no combination formulated for this set of conditions. The second was when sample size was 450, distributions were skewed, and the magnitude of communality was 0.20: both PA continuous (0.77) and PA Likert (0.69) exceeded the 50% criterion, but not the 80% criterion, and the rest of the rules performed poorly in this context. There was no combination formulated for this set of conditions either.

Table 14
List of New Combination Rules for Essential Unidimensionality

Rule Number   Simulation-Design Conditions    Combined Rules and Indices
Rule 1        n=100, h²=0.90, skew=2.50;      PA Continuous, PA Likert, Eigenvalues>1, Eigenvalues Ratio>4*
              n=450, h²=0.90, skew=0.00;
              n=450, h²=0.90, skew=2.50;
              n=800, h²=0.90, skew=2.50
Rule 2        n=800, h²=0.90, skew=0.00       PA Continuous, PA Likert, Eigenvalues>1, Eigenvalues Ratio>4, RMSEA ML, RMSEA GLS*
Rule 3        n=800, h²=0.20, skew=2.50       PA Continuous, PA Likert, RMSEA ML, RMSEA GLS
Rule 4        n=100, h²=0.20, skew=0.00;      Chi-square ML, Chi-square GLS, PA Continuous, PA Likert
              n=450, h²=0.90, skew=2.00
Rule 5        n=450, h²=0.20, skew=0.00;      Chi-square ML, Chi-square GLS, PA Continuous, PA Likert, RMSEA ML, RMSEA GLS
              n=800, h²=0.90, skew=0.00
Rule 6        n=100, h²=0.90, skew=0.00       Chi-square GLS, Eigenvalues>1, Eigenvalues Ratio>4, PA Continuous, PA Likert*

* Note: The ratio-of-first-to-second-eigenvalues-greater-than-three rule also met the 80% criterion, but it was not included in the combination because whenever the ratio-greater-than-four rule is satisfied, the ratio-greater-than-three rule is satisfied as well. This rule was excluded because of redundancy.

Research Questions 3A and 3B
Best and Optimal Combinations of Decision-Making Rules and Indices

Research Question 3A investigated whether a combination of the individual decision-making rules or indices performed best in all conditions. Research Question 3A also examined whether any combination performed better than any one individual rule or index. This was examined by comparing accuracy rates. Research Question 3B examined combinations for optimal performance in specific conditions. As mentioned in the methodology section, combinations were generated based on the results from Research Question 2B, and accuracy rates were computed and assessed. A combination was considered best if it met the 80% criterion for all conditions. Likewise, a combination was considered optimal if it met the 80% criterion for specific conditions.

Strict Unidimensionality

The nine decision-making methods were grouped into four main groups based on the theoretical and statistical similarities of the individual methods.²⁸ The four main groups were the following: 1) ML chi-square test and GLS chi-square test (chi-square group); 2) eigenvalues-greater-than-one rule, ratio-of-first-to-second-eigenvalues-greater-than-three rule, and ratio-of-first-to-second-eigenvalues-greater-than-four rule (Eigenvalue group); 3) PA Continuous and PA Likert (PA group); and 4) ML RMSEA and GLS RMSEA (RMSEA group).
Table 15 displays the mean accuracy rates for the four main groups for strict unidimensional measures. In addition to these groups, the combinations that were formulated from Research Question 2B were applied. Table 16 presents the results for the new combination rules for strict unidimensionality.

²⁸ In order to get a thorough understanding of the similar theoretical and statistical foundations of the decision-making methods that were grouped together, several different resources can be referenced: Tabachnick & Fidell (2001), Byrne (1998), Horn (1965), and Cota, Longman, Holden, Fekken, & Xinaris (1993).

Results for the Four Main Combination Groups

Table 15
Strict Unidimensional Results of the Mean Accuracy Rates for the Four Main Groups

                                          Magnitude of h²
Skewness   Decision-Making Rule           .20      .90

Sample Size 100
0.00       Chi-square                     0.98     0.84
           Eigenvalue Rules               0.10     1.00
           PA                             0.96     1.00
           RMSEA                          0.00     0.00
2.50       Chi-square                     0.69     0.01
           Eigenvalue Rules               0.00     1.00
           PA                             0.44     1.00
           RMSEA                          0.00     0.00

Sample Size 450
0.00       Chi-square                     0.95     0.76
           Eigenvalue Rules               0.69     1.00
           PA                             1.00     1.00
           RMSEA                          0.79     0.43
2.50       Chi-square                     0.32     0.00
           Eigenvalue Rules               0.00     1.00
           PA                             0.83     1.00
           RMSEA                          0.11     0.00

Sample Size 800
0.00       Chi-square                     0.97     0.78
           Eigenvalue Rules               0.96     1.00
           PA                             1.00     1.00
           RMSEA                          1.00     1.00
2.50       Chi-square                     0.29     0.00
           Eigenvalue Rules               0.00     1.00
           PA                             0.98     1.00
           RMSEA                          0.98     0.03

The PA group met the 80% criterion for all simulation conditions except for one combined set: sample size of 100, skewed distributions, and magnitude of communality of 0.20. This is the same set of conditions in which no individual decision-making rule or index met the 80% criterion.
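The statement about the PA group can be checked directly against the PA rows of Table 15. The following is a small sketch for that check; the variable names and the data layout are assumptions made for illustration, and "met the 80% criterion" is read as an accuracy of at least 0.80.

```python
CRITERION = 0.80
COMMUNALITIES = (0.20, 0.90)

# PA-group mean accuracy rates from Table 15, keyed by (sample size, skewness).
# Each tuple holds the rates for h2 = 0.20 and h2 = 0.90, in that order.
pa_group = {
    (100, 0.00): (0.96, 1.00),
    (100, 2.50): (0.44, 1.00),
    (450, 0.00): (1.00, 1.00),
    (450, 2.50): (0.83, 1.00),
    (800, 0.00): (1.00, 1.00),
    (800, 2.50): (0.98, 1.00),
}

# Collect every (n, skewness, h2) cell that falls short of the criterion.
misses = [
    (n, skew, h2)
    for (n, skew), rates in pa_group.items()
    for h2, rate in zip(COMMUNALITIES, rates)
    if rate < CRITERION
]

print(misses)
```

The same loop could be pointed at the chi-square, eigenvalue, or RMSEA rows to list the cells in which those groups fall below 80%.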
Results for the New Combination Rules

Table 16
Strict Unidimensional Results of the Mean Accuracy Rates for the New Rules

                                          Magnitude of h²
Skewness   Decision-Making Rule           .20      .90

Sample Size 100
0.00       Rule 1                         1.00     1.00
           Rule 2                         1.00     1.00
           Rule 3                         0.96     1.00
           Rule 4                         0.96     1.00
           Rule 5                         0.96     1.00
           Rule 6                         1.00     1.00
           Rule 7                         0.96     1.00
2.50       Rule 1                         0.91     1.00
           Rule 2                         0.81     1.00
           Rule 3                         0.44     1.00
           Rule 4                         0.44     1.00
           Rule 5                         0.44     1.00
           Rule 6                         0.81     1.00
           Rule 7                         0.44     1.00

Sample Size 450
0.00       Rule 1                         1.00     1.00
           Rule 2                         1.00     1.00
           Rule 3                         1.00     1.00
           Rule 4                         1.00     1.00
           Rule 5                         1.00     1.00
           Rule 6                         1.00     1.00
           Rule 7                         1.00     1.00
2.50       Rule 1                         0.85     1.00
           Rule 2                         0.85     1.00
           Rule 3                         0.83     1.00
           Rule 4                         0.83     1.00
           Rule 5                         0.83     1.00
           Rule 6                         0.85     1.00
           Rule 7                         0.83     1.00

Sample Size 800
0.00       Rule 1                         1.00     1.00
           Rule 2                         1.00     1.00
           Rule 3                         1.00     1.00
           Rule 4                         1.00     1.00
           Rule 5                         1.00     1.00
           Rule 6                         1.00     1.00
           Rule 7                         1.00     1.00
2.50       Rule 1                         0.98     1.00
           Rule 2                         0.98     1.00
           Rule 3                         0.98     1.00
           Rule 4                         1.00     1.00
           Rule 5                         1.00     1.00
           Rule 6                         1.00     1.00
           Rule 7                         1.00     1.00

Superior or Best Combination: Research Question 3A

As shown in Table 16, three combinations (i.e., new rules) met the 80% criterion in all conditions explored: Rule 1, Rule 2, and Rule 6. Therefore, there were three superior or best combinations for strict unidimensionality (i.e., Research Question 3A).

Research Question 3A also investigated whether a combination performed better than any one individual decision-making rule or index. There was no one superior or best individual rule or index found in Research Question 2A for strict unidimensionality. However, accuracy rates for the combinations and individual rules were assessed and compared in order to determine which combinations performed better than individual rules and indices in certain sets of conditions (this can be seen by comparing Table 6 to Table 16).

There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices.
The first set of conditions was when sample size was 100, distributions were non-skewed, and magnitude of communality was 0.20. In this context, Rule 1, Rule 2, and Rule 6 performed better than the individual rules and indices. The second set of conditions was when sample size was 100, distributions were skewed, and magnitude of communality was 0.20. In this set of conditions, Rule 1 and Rule 2 performed better than the individual rules and indices. The third set of conditions was when sample size was 450, distributions were skewed, and magnitude of communality was 0.20. In this situation, all seven combinations performed better than the individual rules and indices. The fourth set of conditions was when sample size was 800, distributions were skewed, and magnitude of communality was 0.20. In this context, Rule 4, Rule 5, Rule 6, and Rule 7 performed better than the individual rules and indices. In the remaining sets of conditions, the individual rules and indices performed just as well as the combinations.

Optimal Combinations: Research Question 3B

Research Question 3B examined combinations for optimal performance in specific conditions. Combinations were considered optimal if they met the 80% criterion for a certain set of conditions. Rule 1, Rule 2, and Rule 6 met the 80% criterion (i.e., were considered optimal) in all sets of conditions, and hence were considered superior and optimal for strict unidimensional measures. All seven rules would have been considered optimal or superior had one set of conditions not posed a problem (sample size=100, magnitude of communality=0.20, skewness=2.50). In other words, all seven rules met the 80% criterion for all conditions explored besides this one set of conditions, whereas Rule 1, Rule 2, and Rule 6 did meet the 80% criterion for this set.

Essential Unidimensionality

Similar to what was done for strict unidimensionality, the nine decision-making methods were grouped into four combinations.
Again, these combination groups were formed based on the theoretical and statistical similarities of the individual methods. Table 17 displays the mean accuracy rates for the four main groups for essential unidimensional measures. In addition to these groups, the combinations that were formulated from Research Question 2B were applied. Table 18 presents the results for the new combination rules for essential unidimensionality.

Results for the Four Main Combination Groups

Table 17
Essential Unidimensional Results of the Mean Accuracy Rates for the Four Main Groups

# of Items with non-zero                          Proportion of h² on 2nd Factor
loadings on 2nd Factor    Decision-Making Group   .05      .30      .50

Skewness=0.00, Communality=.20, Sample Size=100
3.0                       Chi-square              0.98     0.99     1.00
                          Eigenvalue Rules        0.07     0.04     0.02
                          PA                      0.94     0.92     0.90
                          RMSEA                   0.00     0.00     0.00
6.0                       Chi-square              1.00     0.99     0.99
                          Eigenvalue Rules        0.07     0.00     0.00
                          PA                      0.86     0.86     0.66
                          RMSEA                   0.00     0.00     0.00

Skewness=0.00, Communality=.20, Sample Size=450
3.0                       Chi-square              0.94     0.98     0.95
                          Eigenvalue Rules        0.56     0.24     0.01
                          PA                      1.00     0.99     0.99
                          RMSEA                   0.79     0.80     0.77
6.0                       Chi-square              0.99     0.93     0.93
                          Eigenvalue Rules        0.62     0.00     0.00
                          PA                      1.00     1.00     0.82
                          RMSEA                   0.84     0.72     0.67

Skewness=0.00, Communality=.20, Sample Size=800
3.0                       Chi-square              0.96     0.97     0.90
                          Eigenvalue Rules        0.93     0.39     0.04
                          PA                      1.00     1.00     0.97
                          RMSEA                   1.00     1.00     1.00
6.0                       Chi-square              0.98     0.95     0.73
                          Eigenvalue Rules        0.88     0.03     0.00
                          PA                      1.00     1.00     0.69
                          RMSEA                   1.00     1.00     1.00

Skewness=0.00, Communality=.90, Sample Size=100
3.0                       Chi-square              0.90     0.93     0.87
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.98
                          RMSEA                   0.00     0.00     0.00
6.0                       Chi-square              0.97     0.87     0.78
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.51
                          RMSEA                   0.00     0.00     0.00

Skewness=0.00, Communality=.90, Sample Size=450
3.0                       Chi-square              0.77     0.46     0.08
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.90
                          RMSEA                   0.44     0.16     0.00
6.0                       Chi-square              0.84     0.15     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.01
                          RMSEA                   0.62     0.03     0.00

Skewness=0.00, Communality=.90, Sample Size=800
3.0                       Chi-square              0.80     0.25     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.81
                          RMSEA                   1.00     0.99     0.20
6.0                       Chi-square              0.82     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.00
                          RMSEA                   1.00     0.57     0.00

Skewness=2.50, Communality=.20, Sample Size=100
3.0                       Chi-square              0.66     0.77     0.80
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.40     0.52     0.42
                          RMSEA                   0.00     0.00     0.00
6.0                       Chi-square              0.68     0.79     0.90
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.40     0.39     0.39
                          RMSEA                   0.00     0.00     0.00

Skewness=2.50, Communality=.20, Sample Size=450
3.0                       Chi-square              0.43     0.43     0.44
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.88     0.79     0.81
                          RMSEA                   0.08     0.14     0.16
6.0                       Chi-square              0.40     0.54     0.46
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.83     0.82     0.44
                          RMSEA                   0.13     0.21     0.19

Skewness=2.50, Communality=.20, Sample Size=800
3.0                       Chi-square              0.40     0.29     0.46
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.93     0.92     0.82
                          RMSEA                   0.99     1.00     1.00
6.0                       Chi-square              0.33     0.41     0.45
                          Eigenvalue Rules        0.00     0.00     0.00
                          PA                      0.99     0.88     0.59
                          RMSEA                   1.00     1.00     0.59

Skewness=2.50, Communality=.90, Sample Size=100
3.0                       Chi-square              0.00     0.01     0.01
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.90
                          RMSEA                   0.00     0.00     0.00
6.0                       Chi-square              0.01     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     0.98
                          PA                      1.00     0.95     0.24
                          RMSEA                   0.00     0.00     0.00

Skewness=2.50, Communality=.90, Sample Size=450
3.0                       Chi-square              0.00     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.61
                          RMSEA                   0.00     0.00     0.00
6.0                       Chi-square              0.00     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     0.99     0.00
                          RMSEA                   0.00     0.00     0.00

Skewness=2.50, Communality=.90, Sample Size=800
3.0                       Chi-square              0.00     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.26
                          RMSEA                   0.04     0.01     0.00
6.0                       Chi-square              0.00     0.00     0.00
                          Eigenvalue Rules        1.00     1.00     1.00
                          PA                      1.00     1.00     0.00
                          RMSEA                   0.03     0.00     0.00

Overall, the PA group was most consistent with meeting the 80% criterion.
There were a few sets of conditions in which the PA group did not meet the 80% criterion, such as when the proportion of communality on the second factor was 0.50 and the number of items loading on the second factor was six (these two conditions combined form a strong secondary dimension). In addition, when sample size was 100, distributions were skewed, and magnitude of communality was 0.20, the PA group did not meet the 80% criterion. The chi-square group met the 80% criterion for this set of conditions when the proportion of communality on the second factor was 0.50, whether the number of items loading on the second factor was three or six.

Results for New Combination Rules

Table 18
Essential Unidimensional Results of the Mean Accuracy Rates for the New Rules

Items = number of items with non-zero loadings on the second factor.
Columns .05, .30, and .50 = proportion of h2 on the second factor.

Items   Decision-Making Group   .05    .30    .50

Skewness=0.00, Communality=.20, Sample Size=100
3       Rule 1                  0.94   0.92   0.90
        Rule 2                  0.93   0.91   0.90
        Rule 3                  0.94   0.92   0.90
        Rule 4                  1.00   1.00   1.00
        Rule 5                  1.00   1.00   1.00
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  0.86   0.86   0.66
        Rule 2                  0.87   0.87   0.71
        Rule 3                  0.86   0.86   0.66
        Rule 4                  1.00   1.00   0.99
        Rule 5                  1.00   1.00   0.99
        Rule 6                  1.00   1.00   0.99

Skewness=0.00, Communality=.20, Sample Size=450
3       Rule 1                  1.00   0.99   0.99
        Rule 2                  1.00   1.00   0.99
        Rule 3                  1.00   0.99   0.99
        Rule 4                  1.00   1.00   0.99
        Rule 5                  1.00   1.00   0.99
        Rule 6                  1.00   1.00   0.99
6       Rule 1                  1.00   1.00   0.82
        Rule 2                  1.00   1.00   0.90
        Rule 3                  1.00   1.00   0.89
        Rule 4                  1.00   1.00   0.98
        Rule 5                  1.00   1.00   0.98
        Rule 6                  1.00   1.00   0.98

Skewness=0.00, Communality=.20, Sample Size=800
3       Rule 1                  1.00   1.00   0.97
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   1.00
        Rule 4                  1.00   1.00   0.98
        Rule 5                  1.00   1.00   1.00
        Rule 6                  1.00   1.00   0.98
6       Rule 1                  1.00   1.00   0.69
        Rule 2                  1.00   1.00   0.89
        Rule 3                  1.00   1.00   1.00
        Rule 4                  1.00   1.00   0.83
        Rule 5                  1.00   1.00   1.00
        Rule 6                  1.00   1.00   0.83

Skewness=0.00, Communality=.90, Sample Size=100
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.98
        Rule 4                  1.00   1.00   0.99
        Rule 5                  1.00   1.00   0.99
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.51
        Rule 4                  1.00   1.00   0.86
        Rule 5                  1.00   1.00   0.86
        Rule 6                  1.00   1.00   1.00

Skewness=0.00, Communality=.90, Sample Size=450
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.90
        Rule 4                  1.00   1.00   0.90
        Rule 5                  1.00   1.00   0.90
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.01
        Rule 4                  1.00   1.00   0.01
        Rule 5                  1.00   1.00   0.01
        Rule 6                  1.00   1.00   1.00

Skewness=0.00, Communality=.90, Sample Size=800
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.82
        Rule 4                  1.00   1.00   0.81
        Rule 5                  1.00   1.00   0.82
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.00
        Rule 4                  1.00   1.00   0.00
        Rule 5                  1.00   1.00   0.00
        Rule 6                  1.00   1.00   1.00

Skewness=2.50, Communality=.20, Sample Size=100
3       Rule 1                  0.40   0.52   0.42
        Rule 2                  0.40   0.52   0.42
        Rule 3                  0.40   0.52   0.42
        Rule 4                  0.78   0.87   0.83
        Rule 5                  0.78   0.87   0.83
        Rule 6                  0.78   0.87   0.83
6       Rule 1                  0.40   0.39   0.39
        Rule 2                  0.40   0.39   0.39
        Rule 3                  0.40   0.39   0.39
        Rule 4                  0.82   0.86   0.00
        Rule 5                  0.82   0.86   0.94
        Rule 6                  0.82   0.86   0.94

Skewness=2.50, Communality=.20, Sample Size=450
3       Rule 1                  0.88   0.79   0.81
        Rule 2                  0.89   0.79   0.81
        Rule 3                  0.89   0.80   0.83
        Rule 4                  0.92   0.83   0.84
        Rule 5                  0.92   0.83   0.84
        Rule 6                  0.92   0.83   0.84
6       Rule 1                  0.83   0.82   0.44
        Rule 2                  0.85   0.83   0.49
        Rule 3                  0.85   0.83   0.49
        Rule 4                  0.87   0.86   0.64
        Rule 5                  0.87   0.86   0.64
        Rule 6                  0.87   0.86   0.64

Skewness=2.50, Communality=.20, Sample Size=800
3       Rule 1                  0.93   0.92   0.82
        Rule 2                  0.99   0.99   0.90
        Rule 3                  1.00   1.00   1.00
        Rule 4                  0.96   0.92   0.84
        Rule 5                  1.00   1.00   1.00
        Rule 6                  0.96   0.92   0.84
6       Rule 1                  0.99   0.88   0.59
        Rule 2                  0.99   0.90   0.80
        Rule 3                  1.00   1.00   0.99
        Rule 4                  0.99   0.88   0.69
        Rule 5                  1.00   1.00   0.99
        Rule 6                  0.99   0.88   0.69

Skewness=2.50, Communality=.90, Sample Size=100
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.90
        Rule 4                  1.00   1.00   0.90
        Rule 5                  1.00   1.00   0.90
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   0.98
        Rule 2                  1.00   1.00   0.98
        Rule 3                  1.00   0.95   0.24
        Rule 4                  1.00   0.95   0.24
        Rule 5                  1.00   0.95   0.24
        Rule 6                  1.00   1.00   0.98

Skewness=2.50, Communality=.90, Sample Size=450
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.61
        Rule 4                  1.00   1.00   0.61
        Rule 5                  1.00   1.00   0.61
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   0.95   0.00
        Rule 4                  1.00   0.95   0.00
        Rule 5                  1.00   0.95   0.00
        Rule 6                  1.00   1.00   1.00

Skewness=2.50, Communality=.90, Sample Size=800
3       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.26
        Rule 4                  1.00   1.00   0.26
        Rule 5                  1.00   1.00   0.26
        Rule 6                  1.00   1.00   1.00
6       Rule 1                  1.00   1.00   1.00
        Rule 2                  1.00   1.00   1.00
        Rule 3                  1.00   1.00   0.00
        Rule 4                  1.00   1.00   0.00
        Rule 5                  1.00   1.00   0.00
        Rule 6                  1.00   1.00   1.00

Superior or Best Combination: Research Question 3A

As shown in Table 18, Rule 4, Rule 5, and Rule 6 (i.e., new rules) met the 80% criterion in all conditions explored. Therefore, there were three superior or best combinations for essential unidimensionality (i.e., Research Question 3A). Research Question 3A also investigated whether a combination performed better than any one individual decision-making rule or index. There was no one superior or best individual rule or index found in Research Question 2A for essential unidimensionality. Accuracy rates for the combinations and individual rules were compared in order to determine which combinations performed better than individual rules and indices in certain sets of conditions (this can be seen by comparing Tables 7 through 10 to Table 18). There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices. The first set of conditions was when sample size was 100, distributions were non-skewed, and magnitude of communality was 0.20.
In this context, Rule 4, Rule 5, and Rule 6 performed better than the individual rules and indices for all proportions of communality on the second factor and both numbers of items loading on the second factor. The second set of conditions was when sample size was 450, distributions were non-skewed, and magnitude of communality was 0.20. In this set of conditions, Rule 4, Rule 5, and Rule 6 performed better than the individual rules and indices when the proportion of communality on the second factor was 0.30 or 0.50, whether the number of items loading on the second factor was three or six. The third set of conditions was when sample size was 100, distributions were skewed, and magnitude of communality was 0.20. In this situation, Rule 4, Rule 5, and Rule 6 performed better than the individual rules and indices for all proportions of communality on the second factor and both numbers of items loading on the second factor. The fourth set of conditions was when sample size was 450, distributions were skewed, and magnitude of communality was 0.20. In this context, all six rules performed better than the individual rules and indices for all proportions of communality on the second factor and both numbers of items loading on the second factor. In the remaining sets of conditions, the individual rules and indices performed just as well as the combinations.

Optimal Combinations: Research Question 3B

Research Question 3B examined combinations for optimal performance in specific conditions. Combinations were considered optimal if they met the 80% criterion for a certain set of conditions. Rule 4, Rule 5, and Rule 6 met the 80% criterion in all conditions investigated, as mentioned for superior methods. All six rules met the 80% criterion in all conditions except when sample size was 100, distributions were skewed, and magnitude of communality was 0.20. Therefore, all six methods would have been superior or optimal if this set of conditions had not posed a problem.
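The 80% criterion described above can be expressed as a short check. The following is a minimal sketch, not the procedure actually used in the simulation: it assumes the criterion is applied to every cell of a set of conditions, and the function and variable names are hypothetical. The accuracy rates are a small excerpt from Table 18 (six items loading on the second factor), with the three values per rule corresponding to proportions of .05, .30, and .50.

```python
# Illustrative check of the 80% criterion; all names here are hypothetical.
CRITERION = 0.80

# Excerpt of mean accuracy rates from Table 18 (six items on the second factor).
table18_excerpt = {
    "skew=0.00, h2=.20, n=100": {
        "Rule 1": [0.86, 0.86, 0.66],
        "Rule 2": [0.87, 0.87, 0.71],
        "Rule 4": [1.00, 1.00, 0.99],
    },
    "skew=0.00, h2=.20, n=450": {
        "Rule 1": [1.00, 1.00, 0.82],
        "Rule 2": [1.00, 1.00, 0.90],
        "Rule 4": [1.00, 1.00, 0.98],
    },
}


def is_optimal(rates, criterion=CRITERION):
    """A rule is treated as optimal for a condition if every rate meets the criterion."""
    return all(rate >= criterion for rate in rates)


def optimal_rules(cells):
    """Map each set of conditions to the rules that are optimal within it."""
    return {
        condition: [rule for rule, rates in rules.items() if is_optimal(rates)]
        for condition, rules in cells.items()
    }


def superior_rules(cells):
    """Rules that are optimal in every set of conditions supplied."""
    per_condition = optimal_rules(cells)
    all_rules = sorted({rule for rules in cells.values() for rule in rules})
    return [r for r in all_rules if all(r in ok for ok in per_condition.values())]


if __name__ == "__main__":
    print(optimal_rules(table18_excerpt))
    print(superior_rules(table18_excerpt))
```

Within this excerpt, only Rule 4 meets the criterion in both sets of conditions; applying the same check to the full table would require entering every cell.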
Summary for Strict Unidimensionality

Overall, both PA methods provided the strongest (i.e., highest accuracy rates) and most consistent (across various conditions) performance for strict unidimensionality. There were several conditions in which chi-square GLS outperformed chi-square ML, and both performed relatively poorly for skewed distributions. All three eigenvalue rules performed generally poorly when magnitude of communality was 0.20 and favorably when magnitude of communality was 0.90. The RMSEA indices were inconsistent. On the whole, when sample size was 100, distributions were skewed, and magnitude of communality was 0.20, all nine decision-making rules generated considerably low accuracy rates, except for chi-square GLS (0.69). There were no main effects of the independent variables on the PA methods, whereas the other seven decision-making rules and indices had several main effects. The eigenvalues-greater-than-one rule and the ratio-of-first-to-second-eigenvalues-greater-than-three rule had a main effect of sample size only. The ratio-of-first-to-second-eigenvalues-greater-than-four rule had a main effect of magnitude of communality only.

There was no superior or best decision-making rule or index for all conditions of strict unidimensionality. There were several optimal decision-making rules and indices for various sets of conditions, and these were formulated into combinations (see Table 13). There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices. The one set of conditions that posed a problem for individual decision-making rules was when sample size was 100, distributions were skewed, and magnitude of communality was 0.20, as mentioned above. However, three combinations (i.e., new rules) met the 80% criterion in all conditions explored, including this one problem set of conditions.
Summary for Essential Unidimensionality

As seen in strict unidimensional measures, both PA methods provided the strongest and most consistent performance. Chi-square GLS generated the highest accuracy rates when sample size was 100, distributions were skewed, and magnitude of communality was 0.20. All three eigenvalue rules performed generally poorly when magnitude of communality was 0.20 and favorably when magnitude of communality was 0.90, and the RMSEA indices were inconsistent; these were the same results found with strict unidimensionality. Likewise, on the whole, when sample size was 100, distributions were skewed, and magnitude of communality was 0.20, all nine decision-making rules generated considerably low accuracy rates, except for chi-square GLS. All nine decision-making rules and indices had main effects of all conditions, except for the ratio-of-first-to-second-eigenvalues-greater-than-three rule and the ratio-of-first-to-second-eigenvalues-greater-than-four rule. The ratio-of-first-to-second-eigenvalues-greater-than-three rule had main effects of proportion of communality on the second factor and number of items loading on the second factor, and the ratio-of-first-to-second-eigenvalues-greater-than-four rule had main effects of sample size, skewness, and proportion of communality on the second factor. Because the number of cells was larger for the essential unidimensional investigation, more replications were conducted, and therefore the overall sample size was much larger for essential unidimensional models (n=7200). Consequently, even the smallest of differences may be generating statistically significant main effects of the conditions.
Both rules for the ratio of first-to-second eigenvalues (i.e., greater than three and greater than four) had the highest accuracy rates among the nine decision-making rules when the proportion of communality on the second factor was 0.50, distributions were skewed, and magnitude of communality was 0.90 (regardless of sample size and the number of items loading on the second factor). Similarly, the RMSEA indices generated the highest accuracy rates among the nine decision-making rules when the proportion of communality on the second factor was 0.50, distributions were skewed, magnitude of communality was 0.20, and sample size was 800.

There was no superior or best decision-making rule or index for all conditions of essential unidimensionality. There were several optimal decision-making rules and indices for various sets of conditions, and these were formulated into combinations (see Table 14). Rule 4, Rule 5, and Rule 6 (i.e., new rules) met the 80% criterion in all sets of conditions explored. Therefore, there were three superior or best combinations for essential unidimensionality. There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices. When sample size was 100, distributions were skewed, and magnitude of communality was 0.20, all individual decision-making methods failed to detect unidimensionality. However, there were three superior or optimal combinations (i.e., new rules), which met the 80% criterion in all sets of conditions investigated, including this problem set of conditions.

Chapter VII

Discussion

The purpose of the current study was to extend previous research by investigating how the nine decision-making rules and indices performed individually and in combination under varying conditions when assessing the unidimensionality of item response data.
The overall objective was to provide guidelines to assist the social and behavioral science researcher in the decision-making process of retaining factors in an assessment of unidimensionality. For that reason, the intention of this chapter is to discuss the results of the computer simulation with an eye towards highlighting the contributions and providing a set of guidelines for researchers. The limitations of the investigation are explained, along with the program of research that stems from such limitations. As illustrated in Chapter III, general predictions were made with regard to the performance of the nine individual decision-making rules and indices. The results from these predictions are introduced and integrated with the previous literature findings.

General Predictions of the Study

To review, the predictions or hypotheses that were made with regard to the outcomes of the computer simulation were as follows: (1) the chi-square statistic will be sensitive to sample size, (2) overall, the eigenvalues-greater-than-one rule will perform poorly, except when sample size is large, (3) overall, PA for both continuous and Likert data will perform better than the other rules and indices, except when sample sizes are small, and (4) the RMSEA for ML and GLS FA will provide inconsistent (i.e., unreliable) results. As stated previously, these predictions were based on previous research findings, as Chapter III illustrated.

Chi-square Statistic

The first hypothesis stated that the chi-square statistic would be influenced by sample size. Fabrigar et al. (1999) claimed that the chi-square statistic was extremely sensitive to sample size. This was partially true in the current simulation. There was a main effect of sample size on chi-square GLS, but not chi-square ML, for strict unidimensionality. Both ML and GLS chi-square tests had main effects of sample size for the essential unidimensional investigation.
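The sensitivity of the chi-square test to sample size follows from how the statistic is commonly computed. The sketch below assumes the usual form T = (N - 1) * F_min, where F_min is the minimized discrepancy between the sample and model-implied matrices; some software uses N rather than N - 1. The discrepancy value of 0.03 and the five-item scale (which gives 5 degrees of freedom for a one-factor model) are hypothetical and are not taken from the simulation. The sample sizes match those of the simulation design.

```python
# Hypothetical illustration: the same degree of misfit evaluated at three sample sizes.
CRITICAL_VALUE_DF5 = 11.070  # upper 5% point of the chi-square distribution with 5 df


def chi_square_statistic(f_min, n):
    """Test statistic under the assumed form T = (N - 1) * F_min."""
    return (n - 1) * f_min


def retains_one_factor(f_min, n, critical=CRITICAL_VALUE_DF5):
    """True if the one-factor null hypothesis is not rejected at the 5% level."""
    return chi_square_statistic(f_min, n) <= critical


if __name__ == "__main__":
    for n in (100, 450, 800):
        t = chi_square_statistic(0.03, n)
        print(n, round(t, 2), retains_one_factor(0.03, n))
```

Under these assumed values, the one-factor model is retained at a sample size of 100 but rejected at 450 and 800, even though the degree of misfit is identical.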
Eigenvalue Rule

The second hypothesis stated that the eigenvalues-greater-than-one rule would perform poorly, especially for small sample sizes. This prediction proved to be partially true in this dissertation. In the current simulation study, for strict unidimensional measures, this rule performed inadequately when sample sizes were small and when communalities were low. When communalities were high, however, even for small sample sizes, the eigenvalues-greater-than-one rule had a 100% accuracy rate (see Table 6). Likewise, when assessing essential unidimensionality, for small sample size and low communalities, the eigenvalues-greater-than-one rule performed extremely poorly (accuracy rates of 0.00), but when communalities increased to 0.90, this rule met the 80% criterion in all cases, except when the proportion of communality on the second factor was 0.50.

This rule was previously found to be most effective when sample sizes were large (Gorsuch, 1983), as was the case with the current findings. Hattie (1984) found the eigenvalues-greater-than-one rule to overestimate the number of factors for unidimensional cases. Fabrigar et al. (1999) reported that there had been no study that found the eigenvalues-greater-than-one rule to work. However, the results of this simulation show that this rule actually performs quite well under certain sets of conditions. Researchers continue to widely apply this rule, as presented in Chapter IV. It seems appropriate at this point to assess whether researchers are actually using this rule under appropriate sets of conditions (see guidelines below).

PA Methods

The third hypothesis claimed that PA for both continuous and Likert data would perform better than the other rules and indices, except when sample sizes were small. Again, this prediction proved to be accurate. For both strict and essential unidimensionality, both PA methods provided the strongest accuracy rates overall.
The PA methods did not perform well when sample size was n=100. Likewise, Crawford and Koopman (1973) found that sample size and different factoring methods influenced the effectiveness of PA methods. Overall, however, this method was highly recommended in the literature for making decisions about the number of factors to retain.

RMSEA Indices

The fourth prediction stated that the ML and GLS RMSEA would provide unreliable results. This hypothesis was partially confirmed. For both strict and essential unidimensional measures, the RMSEA indices provided erratic results. However, for essential unidimensional measures, the RMSEA values generated the highest accuracy rates among the nine decision-making rules when the proportion of communality on the second factor was 0.50, distributions were skewed, magnitude of communality was 0.20, and sample size was 800. In fact, overall, for non-skewed distributions with large sample sizes and communalities of 0.20, these indices performed quite well. Byrne (1998) found the cut points for the RMSEA index to be unreliable, but according to Browne and Cudeck (1992), RMSEA was a promising approach. Fabrigar et al. (1999) recommended using RMSEA, but also pointed out that the performance of this index lacked empirical evidence.

Contributions

Foremost, there were several theoretical contributions to the research literature. First, the development and systematic comparison of several optimal combinations of decision-making rules and indices had never been conducted prior to this study. Second, a simultaneous, head-to-head comparison of nine widely used decision-making rules and indices for determining unidimensionality, utilizing the same sample of data, had never been investigated prior to this dissertation.
Together, these two theoretical contributions shaped a practical contribution: the development of a (preliminary) decision-making methodology for determining unidimensionality (i.e., a set of guidelines).

An additional contribution of this study was the development of a procedural definition of essential unidimensionality. Essential unidimensionality is defined conceptually as one dominant factor with the inclusion of one or more underlying secondary minor factors. As introduced in the methodology, essential unidimensionality was technically defined through the simultaneous manipulation of (1) the magnitude of communality, (2) the proportion of communality on the second factor, and (3) the number of items with non-zero loadings on the second (minor) factor.

As mentioned above, the overall objective of the present study was to provide guidelines to assist researchers. It is vital for researchers in today's computer-based society, where a variety of data analyses can be conducted quickly and efficiently, to be able to make decisions with confidence with regard to retaining factors and potentially using multiple criteria. By using the outcomes (i.e., guidelines) from this study, researchers can be assisted in making appropriate inferences and high-stakes decisions in policy, education, and health care when interpreting the results of measures that consist of item response data.

Guidelines

The following guidelines are based on the simulation results outlined in Chapter VI. These guidelines will serve as advice for social and behavioral science researchers in the decision-making process of retaining factors in an assessment of unidimensionality. These guidelines are preliminary in that future research will need to further investigate how they perform using multiple data sets. This advice is guided by what researchers actually have knowledge of before making decisions.
For example, sample size, magnitude of communality, and skewness of the distribution of items are information that a researcher can actually obtain, whereas information on population data is not necessarily available to day-to-day researchers. Guidelines are provided for both strict and essential unidimensional measures. The guidelines provide recommendations based on certain values of conditions (e.g., sample sizes of 100, 450, and 800), and researchers will need to determine how closely their sample data approximate the conditions represented in these guidelines.

In summary, the new combination rules provided extremely high accuracy rates, and in most cases performed better than or just as well as the individual rules and indices. For that reason, it is recommended to use one or more of the new combination rules that are outlined in Tables 13 and 14 for determining both strict and essential unidimensionality. A set of guidelines is provided below for researchers who prefer using individual rules and indices (see Tables 20 and 22). In addition, as illustrated in Tables 19 and 21, even though a researcher may prefer the use of individual rules, there are several sets of conditions in which a new combination rule should be used instead of an individual rule or index. This is due to the combinations performing significantly better than the individual rules for those particular sets of conditions. Overall, for individual rules in both strict and essential unidimensional investigations, both PA continuous and PA Likert provided the strongest and most consistent results, except for one set of conditions (small sample sizes, skewed distribution, and low communalities), in which chi-square GLS provided the strongest results.

Strict Unidimensionality

There was no superior or best individual decision-making rule or index for all sets of conditions of strict unidimensionality.
There were several optimal decision-making rules and indices for various sets of conditions, and these were formulated into combinations. There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices. Therefore, it is recommended that the following combination rule(s) be applied for the following sets of conditions, as shown in Table 19.

Table 19
Recommended Combination Rules for Strict Unidimensionality

Recommendation   Simulation-Design Conditions   New Combined Rules
1                n=100, h2=0.20, skew=0.00      Rule 1, Rule 2, and Rule 6
2                n=100, h2=0.20, skew=2.50      Rule 1 and Rule 2
3                n=450, h2=0.20, skew=2.50      Any of the new seven rules
4                n=800, h2=0.20, skew=2.50      Rule 4, Rule 5, Rule 6, and Rule 7

Note: Refer to Table 13 in order to view the rules and indices of the specific combined rules outlined in this table.

In the remaining sets of conditions, the individual rules and indices performed just as well as the combinations. Therefore, if a researcher prefers to use individual methods, it is recommended that the following individual rules and indices be used for the following sets of conditions.

Table 20
Recommended Individual Rules for Strict Unidimensionality

Recommendation   Simulation-Design Conditions   Individual Rules
5                n=100, h2=0.90, skew=0.00;     Eigenvalues>1, Eigenvalues Ratio>3,
                 n=100, h2=0.90, skew=2.50;     Eigenvalues Ratio>4, PA Continuous,
                 n=450, h2=0.90, skew=0.00;     OR PA Likert
                 n=450, h2=0.90, skew=2.50;
                 n=800, h2=0.90, skew=2.50.
6                n=450, h2=0.20, skew=0.00      PA Continuous OR PA Likert
7                n=800, h2=0.20, skew=0.00      PA Continuous, PA Likert, ML RMSEA,
                                                OR GLS RMSEA
8                n=800, h2=0.90, skew=0.00      Eigenvalues>1, Eigenvalues Ratio>3,
                                                Eigenvalues Ratio>4, PA Continuous,
                                                PA Likert, ML RMSEA, OR GLS RMSEA

Again, there were three combination rules (Rule 1, Rule 2, Rule 6) that were considered superior, and hence could be applied to all sets of conditions. However, if a researcher would prefer to utilize one method, recommendations five through eight in Table 20 are deemed appropriate.

Essential Unidimensionality

There was no superior or best individual decision-making rule or index for all sets of conditions of essential unidimensionality. There were numerous optimal decision-making rules and indices for various sets of conditions, and these were formulated into combinations. There were four different sets of conditions in which the combinations performed better than the nine individual rules and indices. Therefore, it is recommended that the following combination rule(s) be applied for the following sets of conditions, as shown in Table 21.

Table 21
Recommended Combination Rules for Essential Unidimensionality

Recommendation   Simulation-Design Conditions   New Combined Rules
1                n=100, h2=0.20, skew=0.00;     Rule 4, Rule 5, and Rule 6
                 n=100, h2=0.20, skew=2.50.
2                n=450, h2=0.20, skew=0.00      Rule 4, Rule 5, and Rule 6**
3                n=450, h2=0.20, skew=2.50      Any of the new six rules

Note: Refer to Table 14 in order to view the rules and indices of the specific combined rules outlined in this table.
**In addition, when the proportion of communality on the second factor was 0.05, both PA methods individually performed just as well.

In the remaining sets of conditions, the individual rules and indices performed just as well as the combinations.
Therefore, if a researcher prefers to use individual methods, it is recommended that the following individual rules and indices be used for the following sets of conditions for essential unidimensional measures.

Table 22
Recommended Individual Rules for Essential Unidimensionality

Recommendation   Simulation-Design Conditions   Individual Rules
4                n=100, h2=0.90, skew=0.00;     Eigenvalues>1, Eigenvalues Ratio>3,
                 n=100, h2=0.90, skew=2.50;     Eigenvalues Ratio>4, PA Continuous,
                 n=450, h2=0.90, skew=0.00;     OR PA Likert*
                 n=450, h2=0.90, skew=2.50;
                 n=800, h2=0.90, skew=0.00;
                 n=800, h2=0.90, skew=2.50.
5                n=800, h2=0.20, skew=0.00      PA Continuous, PA Likert, ML RMSEA,
                                                OR GLS RMSEA**
6                n=800, h2=0.20, skew=2.50      RMSEA GLS***

Note: *When the proportion of communality on the second factor is 0.50, just the rules for the ratio of first-to-second eigenvalues greater than three and four are recommended.
**When the proportion of communality on the second factor is 0.50, just the RMSEA indices are recommended.
***When the proportion of communality on the second factor is 0.50, both RMSEA indices are recommended.

Again, there were three combination rules (Rule 4, Rule 5, Rule 6) that were considered superior, and hence could be applied to all sets of conditions. However, if a researcher would prefer to utilize one method, recommendations four through six in Table 22 are deemed appropriate.

Applying the New Guidelines to the Students in My Classroom Scale

Turning our attention back to the example that was introduced in Chapter I, the Students in My Classroom Scale was found to be a unidimensional model using the ratio-of-first-to-second-eigenvalues-greater-than-three rule and the ratio-of-first-to-second-eigenvalues-greater-than-four rule. Using the trends and findings from this simulation study, the dimensionality of the Students in My Classroom Scale can be assessed according to the new set of guidelines.
It seems appropriate to utilize the guidelines for an essential unidimensional model because this scale was identified as a two-factor model by several of the other decision-making rules in Chapter I (i.e., a strong secondary minor dimension could be present). As indicated by Table 14, application of combination Rule 1 would be most appropriate. The sample size of this data set was n=450, the magnitude of communalities ranged from 0.50 to 0.80, and the skewness indicators of the item distributions ranged from 0.001 to 1.00. Therefore, for these data, communalities would be considered high (0.90) and skewness would be considered minimal (0.00), which are the specified conditions for Rule 1 (as well as the sample size of n=450). Combination Rule 1 includes PA continuous, PA Likert, the eigenvalues-greater-than-one rule, and the ratio-of-first-to-second-eigenvalues-greater-than-four rule. According to the methodology of using combinations, at least one of the individual rules and indices in this combination needs to identify the model as unidimensional. In Chapter I, it was shown that the ratio-of-first-to-second-eigenvalues-greater-than-three rule and the ratio-of-first-to-second-eigenvalues-greater-than-four rule identified unidimensionality for the Students in My Classroom Scale. Rule 1 would identify this scale as a unidimensional model because this combination includes the ratio-of-first-to-second-eigenvalues-greater-than-four rule.

Furthermore, according to the recommendations for the application of individual rules and indices found in Table 22, recommendation four could also be utilized. Recommendation four suggests that the ratio-of-first-to-second-eigenvalues-greater-than-four rule be used when sample size is 450, magnitude of communality is 0.90, and skewness is 0.00, which are similar conditions to those found in the Students in My Classroom data. As stated above, this rule identified a unidimensional model in Chapter I.
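The way a combination rule reaches its decision can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in the study: the eigenvalues are hypothetical (they are not the Students in My Classroom values), the two PA decisions are placeholders that would come from separate parallel analyses, and the eigenvalues-greater-than-one rule is assumed to indicate unidimensionality when exactly one eigenvalue exceeds 1.0.

```python
# Hypothetical sketch of combination Rule 1 for essential unidimensionality.


def eigenvalues_greater_than_one(eigenvalues):
    """Assumed reading of the rule: unidimensional if exactly one eigenvalue exceeds 1.0."""
    return sum(1 for value in eigenvalues if value > 1.0) == 1


def eigenvalue_ratio_rule(eigenvalues, threshold):
    """Unidimensional if the ratio of the first to the second eigenvalue exceeds the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    return ordered[0] / ordered[1] > threshold


def combination_rule(decisions):
    """A combination identifies unidimensionality if at least one member rule does."""
    return any(decisions.values())


# Hypothetical eigenvalues for a 10-item scale (they sum to the number of items).
hypothetical_eigenvalues = [4.6, 1.1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

rule_1_decisions = {
    "PA continuous": False,  # placeholder: would come from a separate PA run
    "PA Likert": False,      # placeholder: would come from a separate PA run
    "eigenvalues > 1": eigenvalues_greater_than_one(hypothetical_eigenvalues),
    "ratio > 4": eigenvalue_ratio_rule(hypothetical_eigenvalues, 4.0),
}

if __name__ == "__main__":
    print(rule_1_decisions)
    print("Unidimensional by Rule 1:", combination_rule(rule_1_decisions))
```

With these hypothetical values, two eigenvalues exceed 1.0, so the eigenvalues-greater-than-one rule does not indicate unidimensionality, while the ratio of first-to-second eigenvalues exceeds four; the combination therefore identifies the scale as unidimensional, mirroring the logic applied to the Students in My Classroom Scale.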
Interestingly, the simulation results of this dissertation proved to be effective in practice. In other words, the simulation results indicated that the ratio-of-first-to-second-eigenvalues-greater-than-four rule should be used under conditions where sample size is n=450, communality estimates are high (0.90), and item distributions are approximately normal (skewness=0.00). As stated previously, these conditions were similar to those of the Students in My Classroom data. When the ratio-of-first-to-second-eigenvalues-greater-than-four rule (as well as the recommended combination Rule 1) is applied to these data, a unidimensional model is selected. Previous research has also shown this scale to be unidimensional (Roberts, Horn, & Battistich, 1995).

Limitations and Future Program of Research

There are advantages and disadvantages to conducting a computer simulation study rather than employing real data. One disadvantage, in particular, is that computer simulated data may not necessarily correspond precisely to data configurations in the real world. Therefore, inferences made from the computer simulation are limited to the conditions and the values of the conditions that were selected for the simulation design. In addition, the literature review, simulation design, and chosen parameters of this dissertation were somewhat restricted by the software. This study reflected current practice, which restricted me to software packages and analyses that are widely and commonly used (e.g., SPSS and SAS do not analyze polychoric or tetrachoric correlation matrices). Conversely, one advantage of this dissertation was the ability to systematically manipulate specific conditions of interest (e.g., sample size). This allowed for the evaluation of procedures (i.e., rules and indices) under a wide variety of conditions that would be very challenging (if not impossible) to evaluate simultaneously with real data.
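To make concrete what systematically manipulating conditions means, the following is a minimal sketch of a simulation loop of this general kind. It is not the design used in this dissertation: it assumes a strictly unidimensional one-factor model with equal loadings, normally distributed continuous items (the skewness and Likert conditions are left out), a hypothetical scale length of 10 items, and a hypothetical 200 replications per cell. The sample sizes and communalities are the values named in this chapter.

```python
import numpy as np

def simulate_one_factor(n, n_items, h2, rng):
    """Draw n cases on n_items items that share one common factor.

    Each item has communality h2 (loading sqrt(h2)) and unit variance.
    """
    factor = rng.standard_normal((n, 1))
    unique = rng.standard_normal((n, n_items))
    return np.sqrt(h2) * factor + np.sqrt(1.0 - h2) * unique

def ratio_rule_hit_rate(n, n_items, h2, cutoff, reps, seed=0):
    """Proportion of replications in which the eigenvalue ratio exceeds cutoff."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        data = simulate_one_factor(n, n_items, h2, rng)
        corr = np.corrcoef(data, rowvar=False)
        eigs = np.sort(np.linalg.eigvalsh(corr))[::-1]
        hits += int(eigs[0] / eigs[1] > cutoff)
    return hits / reps

# Cross the sample sizes and communalities discussed in this chapter.
for n in (100, 450, 800):
    for h2 in (0.20, 0.90):
        rate = ratio_rule_hit_rate(n, n_items=10, h2=h2, cutoff=4.0, reps=200)
        print(f"n={n:3d}  h2={h2:.2f}  ratio>4 hit rate={rate:.2f}")
```

Each printed rate is the proportion of replications in which the rule identified the (truly unidimensional) data as unidimensional, which is the kind of quantity that would be compared against a criterion such as 80%.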
Type I Error

The Type I error rate (α) is defined as the probability of rejecting a true null hypothesis. Would using multiple decision-making rules simultaneously (i.e., use of combinations) lead to a build-up of the Type I error rate? The build-up of the Type I error rate is applicable to this situation in that false statements or decisions could be made. For example, an instrument could fail to meet unidimensionality even though the measure actually included one dominant underlying factor. In spite of this, Type I error is not necessarily an appropriate error to discuss in the context of this dissertation because there is not necessarily a statistical null hypothesis. When determining the number of factors to retain, or determining unidimensionality, the chi-square ML and GLS methods have a formal statistical foundation, in that a null hypothesis is used. In the context of this dissertation, the null hypothesis could be taken to be that the number of factors equals one (i.e., H0: unidimensionality). The chi-square tests for ML and GLS result in either rejecting or failing to reject this null hypothesis. For that reason, the ML and GLS chi-square tests carry the potential for build-up of the Type I error rate when a combination includes both of these methods. Such rules include Rule 2 and Rule 6 for strict unidimensionality, and Rule 4 and Rule 5 for essential unidimensionality.

How does this influence the effectiveness of the combinations? For strict unidimensionality, Rule 1, Rule 2, and Rule 6 (i.e., new combination rules) met the 80% criterion in all conditions explored, and were therefore considered superior. Furthermore, all seven new rules met the 80% criterion in all conditions explored, except for one set of conditions (sample size=100, magnitude of communality=0.20, and skewness=2.50).
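The build-up of Type I error discussed above can be illustrated with generic arithmetic. The sketch below is not a result of this dissertation: it assumes k tests that are each run at the same α, and it shows the familywise rate under independence together with the Bonferroni upper bound. Because the ML and GLS chi-square tests are computed on the same data, they are not independent, so the familywise rate for that pair would be expected to lie somewhere between α and the Bonferroni bound. The function names are hypothetical.

```python
def familywise_error_independent(alpha, k):
    """Probability of at least one false rejection among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

def bonferroni_bound(alpha, k):
    """Upper bound on the familywise rate that holds under any dependence."""
    return min(1.0, k * alpha)

alpha = 0.05
for k in (1, 2, 3):
    fwe = familywise_error_independent(alpha, k)
    bound = bonferroni_bound(alpha, k)
    print(f"k={k}: independent tests {fwe:.4f}, Bonferroni bound {bound:.4f}")
```

With two tests at α = 0.05, the independent-tests figure is 1 − 0.95² = 0.0975, just under the bound of 0.10.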
In order to avoid build-up of Type I error for strict unidimensionality, Rule 1 could be used for the problem condition noted above (sample size=100, magnitude of communality=0.20, and skewness=2.50), and for the rest of the conditions, the remaining four rules (excluding Rule 2 and Rule 6) could be applied. Likewise, Rule 4, Rule 5, and Rule 6 met the 80% criterion in all conditions explored for essential unidimensionality, and were therefore considered superior. In addition, all six new rules met the 80% criterion in all conditions explored, except for one set of conditions (sample size=100, magnitude of communality=0.20, and skewness=2.50). In order to avoid build-up of Type I error for essential unidimensionality, Rule 4 could be used for this problem condition, and for the rest of the conditions, the remaining four rules (excluding Rule 5 and Rule 6) could be applied.

Program of Future Research

This dissertation investigated numerous significant and germane conditions, but several variables were held constant or covered only a limited range of values. For example, this study did not explore the number of scale points. The number of scale points was selected based on current research practices (Chapter IV) and was held constant at five. West, Finch, and Curran (1995) found that factor loadings and factor correlations (in CFA) are underestimated when there are two or three response categories. Overall, published reviews of the FA literature rarely address the influence of the number of scale points on EFA solutions. This is significant in that EFA methods were developed on the basis of continuous item responses. Future research will investigate how the number of scale points may influence individual and combined decision-making rules and indices when assessing unidimensionality.

Furthermore, future studies will include the examination of polychoric and tetrachoric correlation matrices. Currently, the research literature provides various recommendations as to which correlation matrices should be used.
In the case of ordered categorical variables, polychoric correlations are often computed, and for dichotomous item response formats tetrachoric correlations are often computed; these are then used as the input for FA. Comrey and Lee (1992) recommended product-moment correlation coefficients over tetrachoric correlations for FA. De Ayala and Hertzog (1991) found polychoric correlation coefficients to be quite promising in determining the number of factors. The current study utilized Pearson product-moment (PPM) correlation matrices so as to reflect commonly used statistical software packages; PPM is the default correlation matrix in most of the commonly used packages, such as SPSS and SAS. In order to gain a better understanding as to which correlation matrices perform optimally, future investigations will include polychoric and tetrachoric correlation matrices.

Moreover, the current study utilized uncorrelated factors. Hattie (1984) found that the intercorrelation between factors was important when determining unidimensionality. Several rules and indices distinguished between one and two factors when the intercorrelation was 0.10, but not when the intercorrelation was 0.50 (Hattie, 1984). Future research will also include factors that are correlated.

PA Likert will be examined more extensively. This study found no difference between PA Likert and PA continuous. The current simulation investigated PA Likert by creating and applying thresholds that matched the population. Researchers in practice would not necessarily have access to population data. Future investigations would need to be conducted on real data, where the PA cut-off points or threshold values for Likert data would be developed from the real data.

Finally, the guidelines of the combinations need to be further investigated.
Tables 13 and 14 provide optimal combinations in that the combinations of decision-making methods were selected based on the performance of the individual decision-making methods. The combinations were therefore expected to perform well. Although De Ayala and Hertzog (1991) suggested that the use of multiple methods may not necessarily be appropriate, several researchers have recommended the use of combinations (Hattie, 1985; Fabrigar et al., 1999), and the combinations performed quite well in the current simulation. Therefore, future research will further investigate the performance of these combinations under different conditions (e.g., varying the correlation between factors and the number of scale points) using multiple data sets, including real data. This will allow for the examination of the replicability of the performance of the new combination rules.

In Conclusion

The conclusions and set of guidelines provided in this chapter should not be considered black and white. These guidelines do not provide a procedure that guarantees valid inferences. Validity is an ongoing, integrative process that needs to be re-investigated continuously for a measure and its scores. As recommended by Hubley and Zumbo (1996):

It is not just the test scores that need to be validated, but the theory behind the inferences made from the test scores. It is important that both the data and the theory remain in touch because it is the theoretical conception of the construct that dictates the nature of the data used first to validate the scores on the test, and then interpretation. The data must be used to validate, reject, or revise the theory. As a result, all data (e.g., correlational work, group differences, reliability, observed changes) are potentially useful as evidence for construct validity. (p. 212)

Researchers need to place the process of assessing unidimensionality in context and survey the purpose and use of the instruments under investigation.
References

American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Aron, A., & Aron, E. (2002). Statistics for psychology (second edition). Upper Saddle River, NJ: Prentice Hall.
Bartlett, M. S. (1950). Tests of significance in factor analysis. The British Journal of Psychology, 3, 77-85.
Battistich, V., Solomon, D., Kim, D., Watson, M., & Schaps, E. (1995). Schools as communities, poverty levels of student populations, and students' attitudes, motives, and performance. American Educational Research Journal, 32, 627-658.
Boyd, K. C., & Gorsuch, R. L. (2003). Factor replication, factor invariance, and salient loadings: Three objective criteria for number of factors. Manuscript submitted for publication.
Briggs, N. E., & MacCallum, R. C. (2003). Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38(1), 25-56.
Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods and Research, 21, 230-258.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates.
Cattell, R. B. (1958). Extracting the correct number of factors in factor analysis. Educational Researcher, 18, 791-838.
Cattell, R. B. (1962). The basis of recognition and interpretation of factors. Educational and Psychological Measurement, 22, 667-697.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-267.
Cattell, R. B., & Jaspers, J. A. (1967). A general plasmode for factor analytic exercises and research. Multivariate Behavioral Research Monographs, 3, 224-245.
Cattell, R. B., & Vogelmann, S. (1977). A comprehensive trial of the scree and K.G. criteria for determining the number of factors. Multivariate Behavioral Research, 12, 289-325.
Cliff, N., & Hamburger, C. D. (1967). The study of sampling errors in factor analysis by means of artificial experiments. Psychological Bulletin, 68(6), 430-445.
Cliff, N., & Pennell, R. (1967). The influence of communality, factor strength, and loadings on the sampling characteristics of factor loadings. Psychometrika, 32(3), 309-326.
Cliff, N. (1988). The eigenvalues-greater-than-one rule and the reliability of components. Psychological Bulletin, 103, 276-279.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (second edition). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (third edition). Mahwah, NJ: Lawrence Erlbaum Associates.
Comrey, A. L., & Lee, H. B. (1973). A first course in factor analysis. New York: Academic Press.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (second edition). Hillsdale, NJ: Erlbaum Associates.
Cota, A. A., Longman, R. S., Holden, R. R., Fekken, G. C., & Xinaris, S. (1993). Interpolating 95th percentile eigenvalues from random data: An empirical example. Educational and Psychological Measurement, 53, 585-596.
Crawford, C. B. (1975). Determining the number of interpretable factors. Psychological Bulletin, 82, 226-237.
Crawford, C. B., & Koopman, P. (1973). A note on Horn's test for the number of factors in factor analysis. Multivariate Behavioral Research, 8, 117-125.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Davison, M. L., & Sireci, S. G. (2000). Multidimensional scaling. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 323-352). New York: Academic Press.
De Ayala, R. J., & Hertzog, M. A. (1991). The assessment of dimensionality for use of item response theory. Multivariate Behavioral Research, 26(4), 765-792.
De Champlain, A., & Gessaroli, M. E. (1998). Assessing dimensionality of item response matrices with small sample sizes and short test lengths. Applied Measurement in Education, 11(3), 231-251.
DiStefano, C. (2002). The impact of categorization with confirmatory factor analysis. Structural Equation Modeling, 9, 327-346.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Fava, J. L., & Velicer, W. F. (1992). The effects of over-extraction on factor and component analysis. Multivariate Behavioral Research, 27, 387-415.
Fava, J. L., & Velicer, W. F. (1996). The effects of under-extraction in factor and component analyses. Educational and Psychological Measurement, 56(6), 907-929.
Finch, J. F., & West, S. G. (1997). The investigation of personality structure: Statistical models. Journal of Research in Personality, 31, 439-485.
Gessaroli, M. E., & De Champlain, A. F. (1996). Using an approximate chi-square statistic to test the number of dimensions underlying responses to a set of items. Journal of Educational Measurement, 33, 157-179.
Glenberg, A. (1996). Learning from data: An introduction to statistical reasoning. New York: Lawrence Erlbaum Associates.
Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377-393.
Gorsuch, R. L. (1973). Factor analysis. Philadelphia: W.B. Saunders Company.
Gorsuch, R. L. (1983). Factor analysis (second edition). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gorsuch, R. L. (1997a). New procedures for extension analysis in exploratory factor analysis. Educational and Psychological Measurement, 57, 725-740.
Gorsuch, R. L. (1997b). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68(3), 532-560.
Gorsuch, R. L. (2003). Factor analysis. In J. A. Schinka & W. F. Velicer (Vol. Eds.), Handbook of psychology: Vol. 2. Research methods in psychology (pp. 143-164). New York: John Wiley & Sons.
Green, S. B., Lissitz, R. W., & Mulaik, S. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurement, 37, 827-839.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1-10.
Guttman, L. (1954). A new approach to factor analysis: The radex. In P. F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 258-348). Chicago: Free Press.
Hakstian, A. R., & Muller, V. J. (1973). Some notes on the number of factors problem. Multivariate Behavioral Research, 8, 461-475.
Hambleton, R. K., & Rovinelli, R. J. (1986). Assessing the dimensionality of a set of test items. Applied Psychological Measurement, 10(3), 287-302.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications, Inc.
Hattie, J. (1984). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 20, 1-14.
Hattie, J. (1985). An empirical study of the various indices for determining unidimensionality. Multivariate Behavioral Research, 19, 49-78.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Hoyle, R. H., & Duvall, J. L. (2004). Determining the number of factors in exploratory and confirmatory factor analysis. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 301-313). Thousand Oaks, CA: Sage Publishing.
Hubley, A. M., & Zumbo, B. D. (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207-215.
Humphreys, L. G. (1952). Individual differences. Annual Review of Psychology, 3, 131-150.
Humphreys, L. G. (1962). The organization of human abilities. American Psychologist, 17, 475-483.
Humphreys, L. G., & Ilgen, D. R. (1969). Note on a criterion for the number of common factors in factor analysis. Educational and Psychological Measurement, 29, 571-578.
Humphreys, L. G., & Montanelli, R. G. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10, 193-205.
Jöreskog, K. G. (1969). A general approach to maximum likelihood factor analysis. Psychometrika, 34, 183-202.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151.
Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys. New York: John Wiley & Sons, Inc.
Linn, R. L. (1968). A Monte Carlo approach to the number of factors problem. Psychometrika, 33, 37-72.
Lumsden, J. (1957). A factorial approach to unidimensionality. Australian Journal of Psychology, 9, 105-11.
MacCallum, R. C. (1990). The need for alternative measures of fit in covariance structure modeling. Multivariate Behavioral Research, 25, 157-162.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84-99.
MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36(4), 611-637.
MacCallum, R. C. (2004, May). Factor analysis models as approximations. Paper presented at the meeting of the Chapel Hill Conference on Factor Analysis, Chapel Hill, University of North Carolina.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379-396.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955-966.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum.
Mundfrom, D. J., Shaw, D. G., & Lu Ke, T. (2005). Minimum sample size recommendations for conducting factor analysis. International Journal of Testing. Manuscript in press.
Pett, M. A., Lackey, N. R., & Sullivan, J. J. (2003). Making sense of factor analysis. Thousand Oaks, CA: Sage.
Preacher, K. J., & MacCallum, R. C. (2002). Exploratory factor analysis in behavior genetics research: Factor recovery with small sample sizes. Behavior Genetics, 32(2), 13-43.
Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift's electric factor analysis machine. Understanding Statistics, 2(1), 13-43.
Roberts, W., Horn, A., & Battistich, V. (1995, April). Assessing students' and teachers' sense of the school as a caring community. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Russell, D. W. (2002). In search of underlying dimensions: The use (and abuse) of factor analysis in Personality and Social Psychology Bulletin. Personality and Social Psychology Bulletin, 28(2), 1629-1646.
Schönemann, P. H. (1981). Power as a function of communality in factor analysis. Bulletin of the Psychonomic Society, 17, 57-60.
Spearman, C. (1904). 'General Intelligence' objectively determined and measured. American Journal of Psychology, 15, 201-293.
Steiger, J. H., & Lind, J. (1980). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (third edition). New York: Harper Collins College Publishers.
Tate, R. (2003). A comparison of selected empirical methods for assessing the structure of response to test items. Applied Psychological Measurement, 27(3), 159-203.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for the construct validity of scores: An historical overview and some guidelines. Educational and Psychological Measurement, 56, 213-224.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 110-126.
Velicer, W. F., Eaton, C. A., & Fava, J. L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R. D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy. Norwell, MA: Kluwer Academic.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications (pp. 56-75). Newbury Park, CA: Sage Publications.
Wood, J. M., Tataryn, D. J., & Gorsuch, R. L. (1996). Effects of under- and over-extraction on principal axis factor analysis with varimax rotation. Psychological Methods, 1, 354-365.
Zwick, W. R., & Velicer, W. F. (1982). Factors influencing four rules for determining the number of components to retain. Multivariate Behavioral Research, 17, 253-269.
Zwick, W. R., & Velicer, W. F. (1986). A comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.
Appendix A
Details of Criteria Reported in Journals

n | # of Items | # of Factors | Analysis | Extraction | Scale Pts | Notes
1049 | 18 | 4 | LISREL 8.0, principal axis, direct oblimin | RMSEA, CFI, GFI, chi-square, scree plot, loadings>.30 | 5 | JEP, 2002, 94(1)
190 | 24 | 4 | LISREL 8.0, principal axis, varimax | RMSEA, CFI, chi square, eigen>1.0 | 9 | JEP, 1999, 91(4); two different studies
945 | 105 | 1 | LISREL 8.0 | Modified parallel analysis, scree plot, eigen>1.0 | 2 | JEP, 1999, 91(3)
306 | 30 | 5 | LISREL 8.0, PCA, varimax | Chi square, RNI, TLI | 8 | JEP, 1999, 91(4)
101 | 15 | 4 | Common FA, varimax | Eigen>1.0, scree plot | Mixed | JEP, 2004, 96(1), 3-18; various scales
650 | 54 | 6 | SPSS 2001, principal axis | Parallel analysis, eigen>1.0, scree plot, loadings>.40 | 4 | JEP, 2004, 96(1), 110-118
95 | 14 | 4 | ML, varimax | Eigen>1.0 | Mixed | JEP, 2003, 95(4), 833-846; various scales
320 | 16 | 3 | Principal factor extraction, oblimin | Parallel analysis | Mixed | JEP, 2001, 93(4), 797-825; various scales
290 | 12 | 2 | ML, oblique quartimax | RMSEA | Mixed | PA, 2004, 16(2), 169-181; various scales
179 | 20 | 4 | PCA, varimax | Loadings>.30 | | PA, 2004, 20(2), 106-115
1045 | 89 | 5 | PCA, varimax | Scree plot, communalities, loadings>.30 | Dichotomous | PA, 2004, 20(2), 134-146
300 | 21 | 2 | PCA, varimax | Eigen>1.0, loadings>.40 | 5 | PA, 2004, 15(3), 384-391
378 | 30 | 3 | PCA, varimax | Eigen>1.0, scree plot | 5 | PA, 2003, 15(3), 399-412
425 | 73 | 4 | Principal axis, direct oblimin | Loadings>.30 | 7 | PA, 2004, 19(2), 85-91
598 | 30 | 1 | PCA | Loadings>.30 | 4 | PA, 2001, 13(2), 277-293
254 | 49 | 6 | PCA, direct oblimin | Loadings>.30 | 5 | PA, 2000, 17(3), 241-250
740 | 13 | 4 | ML, principal factor, promax | Scree plot, eigen>1.0 | Mixed | PA, 2000, 12(4), 426-239; various scales
436 | 95 | 2 | PCA, varimax | Scree plot, eigenvalues>1.0 | Mixed | PA, 2000, 12(1), 77-88
293 | 23 | 2 | PCA, varimax | Scree plot, loadings>.30 | 7 | PA, 2000, 16(1), 66-76
903 | 102 | 5 | LISREL 8.0, ML | Chi square, RMSEA | 6 | PA, 2004, 16(1)
367 | 30 | 9 | LISREL 8.0, ML | Chi square, RMSEA | Mixed | PA, 2002, 14(4), 439-450
598 | 30 | 1 | Principal axis, varimax | Loadings>.30, chi square, CFI, RMSEA | 4 | PA, 2001, 13(2), 277-293
433 | 38 | 5 | ML, oblique | Scree plot, loadings>.40 | 6 | PA, 2001, 13(1), 99-109
221 | 14 | 4 | Principal axis, direct oblimin | Eigen>1.0, communalities>.30, loadings>.40 | 5 | PA, 1999, 11(4), 525-533
447 | 10 | 3 | ML, promax | Loadings>.25 | 6 | BJEP, 2003, 73(3), 329-341
333 | 51 | 6 | SPSS 6.1, AMOS, PCA, varimax | Eigen>1.0, scree plot, loadings>.40 | 5 | BJHP, 2001, 6(4), 373-384
174 | 22 | 2 | LISREL 8.0, PCA, varimax | Eigen>1.0, scree plot, loadings>.30, RMSEA, chi-square | 7 | EJPA, 2002, 18(2), 112-117
634 | 60 | 4 | SYSTAT 8.0, principal axis | Scree plot, loadings>.30 | 3 | EJPA, 2002, 18(1), 30-42
473 | 73 | 2 | Stat Soft, hierarchical FA, PCA, Schmid-Leiman's algorithm | Loadings>.30 | 5 | EJPA, 2003, 19(2), 101-116
1059 | 50 | 2 | LISREL 8.0, PCA, oblique | Chi square, RMSEA, GFI | 5 | EJPA, 2002, 18(1), 16-29
668 | 16 | 2 | LISREL 8.0, ML | RMSEA, chi square | 5 | EJPA, 2002, 18(2), 158-[end page illegible]; specification of non-normal distribution
179 | 20 | 3 | PCA, varimax | Loadings>.30, scree plot | 4 | EJPA, 2004, 20(2), 106-115
1045 | 99 | 5 | PCA, varimax | Loadings>.30, communalities | Dichotomous | EJPA, 2004, 20(2), 134-146
425 | 13 | 4 | Principal axis, varimax, PCA, oblimin | Eigen>1.0, loadings>.30 | 7 | EJPA, 2003, 19(2), 85-91
1230 | 11 | 1 | PCA, oblique | Not reported | 5 | EJPA, 2002, 18(1), 16-29
174 | 9 | 1 | ML | Eigen>1.0, loadings>.50 | 7 | EJPA, 2002, 18(1), 43-51
233 | 28 | 7 | PCA, varimax | Eigen>1.0 | 4 | EJPA, 2002, 18(1), 63-77
9520 | 91 | 4 | Principal axis, promax | Parallel analysis | 5 | EJPA, 2002, 18(2), 97-112
678 | 27 | 5 | PCA, varimax | Eigen>1.0, loadings>.30 | 3 | EJPA, 2002, 18(3), 259-274
1180 | 37 | 3 | PCA | Scree plot, eigen>1.0 | 5 | EJPA, 2001, 17(2), 87-97
254 | 49 | 6 | PCA, direct oblimin | Loadings>.30 | 5 | EJPA, 2001, 17(2), 241-250
293 | 23 | 2 | PCA, varimax | Scree plot, eigen>1.0 | 5 | EJPA, 2000, 16(1), 66-76
212 | 36 | 2 | Principal axis, varimax | Scree plot, eigen>1.0, loadings>.40 | 4 | HP, 2002, 21(3), 254-262
388 | 21 | 2 | PCA | Parallel analysis, loadings>.40 | 5 | HP, 2002, 21(6), 564-572
1382 | 77 | 8 | Not reported | Loadings>.40 | 4 | HP, 2002, 21(1), 51-60
2864 | 21 | 6 | Stata 6.0, not reported | Not reported | 4 | HP, 2002, 23(4), 51-60
899 | 100 | 3 | LISCOMP, oblique | Loadings>.40 | 5 | HP, 2000, 5(3), 386-402
465 | 167 | 6 | PCA | Loadings>.30 | 6 | HP, 1999, 3(4), 1076-8998
193 | 66 | 4 | Not reported | Not reported | 4 | HP, 1999, 4(1), 15-28
240 | 10 | 1 | Direct oblimin | Eigen>1.0 | 4 | HP, 1999, 18(4), 333-345
902 | 17 | 4 | ML, oblique | Not reported | 4 | HP, 2000, 5(1), 127-141
708 | 16 | 4 | Principal axis, PCA, varimax | Eigen>1.0, loadings>.40 | 5 | HP, 2000, 5(1), 111-126
191 | 6 | 2 | ML, oblique | Chi-square | 4 | HP, 2000, 19(2), 155-164
496 | 18 | 3 | PCA, varimax | Eigen>1.0, loadings>.40 | 7 | HP, 2000, 5(3), 386-402
842 | 10 | 3 | Principal axis, oblique | Not reported | 7 | HP, 2000, 5(4), 417-427
340 | 21 | 2 | PCA, promax, varimax | Parallel analysis, salience of rotated factors | 4 | HP, 2001, 20(2), 112-119
398 | 17 | 3 | PCA, oblique | Eigen>1.0, scree plot, loadings>.40 | 6 | EPM, 2004, 43, 19-28
583 | 15 | 3 | SPSS 10.0, LISREL 8.0, direct oblimin | Eigen>1.0, scree plot | 5 | EPM, 2002, 62(6), 1028-1041
358 | 20 | 3 | PCA, varimax, ML, oblimin | Eigen>1.0, scree, parallel analysis | 6 | EPM, 2003, 63(3), 465-483
509 | 92 | 9 | PCA, varimax, principal axis | Eigen>1.0, scree plot, p:r ratio | 4 | EPM, 2002, 62(1), 79-96
547 | 25 | 4 | PCA, varimax, oblique | Eigen>1.0, scree plot | 7 | EPM, 2001, 61(5), 818-826
503 | 32 | 3 | Common, oblique | Eigen>1.0, scree plot | 5 | EPM, 2001, 61(5), 849-865
275 | 20 | 2 | Principal axis, oblique | Eigen>1.0, parallel analysis | 4 | EPM, 2000, 60(3), 439-447
458 | 46 | 4 | Principal axis, oblique, promax, orthogonal, varimax | Eigen>1.0, scree plot | Dichotomous | EPM, 2000, 60(1), 100-116
140 | 17 | 3 | PCA, varimax | Eigen>1.0, loadings>.45, minimum gap between salient coeff on mult factors | 7 | EPM, 1999, 59(2), 310-324
Page Metadata
Item Metadata
Title | Assessing unidimensionality of psychological scales : using individual and integrative criteria from factor analysis |
Creator |
Slocum, Suzanne Lynn |
Publisher | University of British Columbia |
Date Issued | 2005 |
Description | Whenever one uses a composite scale score from item responses, one is tacitly assuming that the scale is dominantly unidimensional. Investigating the unidimensionality of item response data is an essential component of construct validity. Yet, there is no universally accepted technique or set of rules to determine the number of factors to retain when assessing the dimensionality of item response data. Typically factor analysis is used with the eigenvalues-greater- than-one rule, the ratio of first-to-second eigenvalues, parallel analysis (PA), root-mean-square- error-of-approximation (RMSEA), or hypothesis testing approaches involving chi-square tests from Maximum Likelihood (ML) or Generalized Least Squares (GLS) estimation. The purpose of this study was to investigate how these various procedures perform individually and in combination when assessing the unidimensionality of item response data via a computer simulated design. Conditions such as sample size, magnitude of communality, distribution of item responses, proportion of communality on second factor, and the number of items with nonzero loadings on the second factor were varied. Results indicate that there was no one individual decision-making method that identified undimensionality under all conditions manipulated. All individual decision-making methods failed to detect unidimensionality for the case where sample size was small, magnitude of communality was low, and item distributions were skewed. In addition, combination methods performed better than any one individual decision-making rule in certain sets of conditions. A set of guidelines and a new statistical methodology are provided for researchers. A future program of research is also illustrated. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2009-12-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0054414 |
URI | http://hdl.handle.net/2429/17203 |
Degree |
Doctor of Philosophy - PhD |
Program |
Measurement, Evaluation and Research Methodology |
Affiliation |
Education, Faculty of Educational and Counselling Psychology, and Special Education (ECPS), Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |