UBC Theses and Dissertations


A comparison of conventional and Rasch item analysis approaches applied to a grade four science test item pool

Knodel, John William


The purpose of this study was to compare the results of applying conventional and Rasch item analysis approaches to a grade four science test item pool. A 76-item modified version of the pilot tests used to construct the British Columbia Grade 4 Science Assessment Test administered in Spring 1978 was utilized. This item pool was administered to 527 grade four students attending 15 schools located in three adjacent South Okanagan school districts. Eleven booklets were eliminated through application of criteria aimed at controlling for possible effects of speededness. Item analyses were obtained using the 516 remaining booklets. Preliminary investigations of the test data indicated that it would be best to limit item analyses to the 30-item Concepts and 32-item Processes subtests in the item pool. Coefficient alpha indices and factor analysis data were used to assess the unidimensionality of the subtests. Coefficient alpha indices indicated strong subtest homogeneity. For each of the subtests, however, more than one common factor was found on which there were salient loadings. Study of clusters of items with salient pattern coefficients nevertheless failed to yield unique definitions of possible different traits being measured by the subtests. It was decided that the subtests were essentially unidimensional and that application of the Rasch model was justified. The LERTAP computer program was used for conventional item analysis. Four criteria relating to item difficulty, corrected item-subtest point-biserial correlations, distractor-subtest point-biserial correlations, and distractor difficulty were applied. The BICAL computer program was used for the Rasch item analyses. Rasch criteria used related to item mean square fit, item discrimination, and item difficulty. For Rasch Approaches I and IV all criteria were used. In Rasch Approaches II and V, the item difficulty criterion was eliminated. Rasch Approaches III and VI used only the mean square fit criterion.
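The conventional statistics named above (coefficient alpha and the corrected item-subtest point-biserial correlation) can be sketched as follows. The function names and the subjects-by-items 0/1 matrix layout are illustrative, not taken from the thesis or from LERTAP.

```python
import numpy as np

def coefficient_alpha(X):
    """Cronbach's coefficient alpha for a subjects-by-items 0/1 score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)          # variance of each item column
    total_var = X.sum(axis=1).var(ddof=1)      # variance of subtest totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_point_biserial(X, j):
    """Correlation of item j with the subtest total excluding item j itself."""
    rest = X.sum(axis=1) - X[:, j]
    return np.corrcoef(X[:, j], rest)[0, 1]
```

The "corrected" form subtracts the item's own score from the total before correlating, so an item cannot inflate its discrimination index by correlating with itself.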
For Rasch Approaches I, II, and III, Panchapakesan's correction-for-guessing formula was used to determine subject membership in the calibration sample. The random guessing level formula was applied in Rasch Approaches IV, V, and VI. Eight comparisons were made on the subtests resulting from the application of the Conventional Approach and the six Rasch approaches. Four of the comparisons were aimed at the item level. These included the percentage of items rejected by each approach, the efficiency of the Rasch approaches in eliminating items illustrating problems related to conventional criteria, the percentage overlap of rejected items among pairs of different approaches, and the percentage of items rejected solely on the basis of Rasch criteria in the Rasch approaches. Four comparisons focussed on the subtests as entities. The first involved comparisons of numbers of items in each subtest, subtest means, standard deviations, and score ranges, as well as Hoyt estimates of internal consistency and subtest standard errors of measurement. A second comparison involved correlations of subjects' scores among all Concepts subtests and among all Processes subtests. In a third comparison, correlated t-tests were performed among all possible pairs of Concepts subtests and among all possible pairs of Processes subtests. The final comparison involved the fit of items in the subtests to the Concepts and Processes items used in the final version of the British Columbia Grade 4 Science Test. The one conventional and six Rasch approaches produced quite different Concepts and Processes subtests as regards the specific items selected by each. Numbers of items in the subtests, and consequently subtest characteristics, were related more to the stringency of the criteria applied than to the approach (conventional or Rasch) used to build the subtests. The number of items in each subtest affected the reliability of the instrument.
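The random guessing screen described above can be illustrated with a minimal sketch. The rule shown here (excluding subjects whose raw score does not exceed the expected chance score of one correct answer per n_options alternatives) is an assumption about how such a screen operates; the thesis does not specify the exact cut, and Panchapakesan's formula is a distinct, more elaborate correction.

```python
import numpy as np

def above_chance(scores, n_items, n_options=4):
    """Flag subjects whose raw score exceeds the expected random-guessing
    score of n_items / n_options (assumed screening rule, for illustration)."""
    chance = n_items / n_options
    return np.asarray(scores) > chance
```

For a 32-item Processes subtest with four-option items, the expected chance score is 8, so only subjects scoring above 8 would enter the calibration sample under this rule.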
Use of the Spearman-Brown prophecy formula to adjust the Hoyt internal consistency estimates for the subtests yielded nearly equivalent reliabilities. The content sampling of the subtests was also a function of the stringency of the item analysis criteria used in their construction. Shorter subtests provided a poorer sampling of the content domain than did the longer subtests constructed using more lenient approaches. While there appears to be a similarity between the conventional p-value and the Rasch item difficulty index, application of Rasch criteria identified items with conventional item-subtest point-biserial problems less effectively. Rasch item analysis proved to be particularly inefficient in identifying items with conventional distractor problems. Although Rasch approaches produced subtests of equal reliability compared to those built using the conventional approach, and Rasch subtests ordered subjects in essentially the same fashion as subtests built using conventional methods, it was concluded that the conventional approach to item analysis should remain the method of choice. The conventional approach provides information not only to identify poor items but also to improve them. Rasch approaches provide information related to item quality, but do not provide insights for improvement of poor items. Rasch approaches to item analysis should therefore only be applied to large item pools where rejection of items would not seriously affect the resulting instruments' effectiveness in sampling the content domain.
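The length adjustment mentioned above uses the standard Spearman-Brown prophecy formula, rho_new = n*rho / (1 + (n-1)*rho), where n is the ratio of new to old test length. A minimal sketch (function name illustrative):

```python
def spearman_brown(r, old_len, new_len):
    """Project the reliability r of a test of old_len items
    to the reliability expected of a test of new_len items."""
    n = new_len / old_len
    return n * r / (1 + (n - 1) * r)
```

Doubling a test with reliability 0.5, for instance, projects a reliability of 0.667, which is how subtests of different lengths can be compared on a common footing.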



For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.