UBC Theses and Dissertations
An application of the Rasch logistic model to the assessment of change in mathematics achievement
O’Shea, Thomas Joe
The purpose of this study was to explore the use of the Rasch simple logistic model for measuring group change in mathematics achievement. A survey of previous studies revealed no consensus on how present-day students perform compared with their counterparts in the past. Few studies attempted to identify changing performance on specific topics within the mathematics curriculum. Groups were most commonly compared on the basis of grade equivalents or raw scores. Neither is satisfactory for measuring change: grade equivalents have no natural statistical interpretation, and raw scores yield ordinal rather than interval measures. The Rasch model offered a possible solution to the problem of scale, since it purports to yield measures of item difficulty and person ability on a common interval scale.

In 1964 and in 1970, all Grade 7 students in British Columbia wrote the Arithmetic Reasoning and Arithmetic Computation tests from the Stanford Achievement Test, Advanced Battery, Form L (1953 Revision). Random samples of 300 test booklets were available from each administration. In 1979, the same tests were administered to a sample of 50 Grade 7 classes selected from schools across the province and stratified by geographic region and school size. A random sample of 300 test booklets was drawn from the tests completed in 1979.

The reasoning and computation tests contained 45 and 44 items, respectively. The researcher reclassified the 89 items into ten content areas as follows:

1. Whole number concepts and operations (WNC) - 11 items
2. Applications using whole numbers (WNA) - 9 items
3. Common fraction concepts and operations (CFC) - 12 items
4. Applications using common fractions (CFA) - 8 items
5. Decimals (Dec) - 11 items
6. Money (Mon) - 9 items
7. Percent (Pct) - 8 items
8. Elementary algebra (Alg) - 9 items
9. Geometry and graphing (Geo) - 8 items
10. Units of measure (Mea) - 4 items

The computer program BICAL was used to estimate item difficulties and person abilities. A minimum cutoff score was established to eliminate examinees scoring near the guessing level. An item was deemed non-fitting if its fit mean square exceeded unity by four or more standard errors and its discrimination index was less than 0.70.

A key requirement of the study was to demonstrate that the two Stanford tests measured the same ability, thereby justifying the regrouping of the items. To this end, for each year a standardized difference score between Rasch ability on the reasoning test and on the computation test was calculated for each person. The distribution of these scores was compared with the unit normal distribution using the Kolmogorov-Smirnov statistic. Since no distribution differed from the unit normal at the 0.01 level of significance, the tests were assumed to measure the same ability. All 89 items were then calibrated together as a single test for each year. Items were deleted from the analysis if they showed lack of fit on two of the three administrations. The deletion process was terminated after two recalibrations, with 10 items eliminated.

For each pair of years, two standardized difference scores were calculated for the difficulty of each item: one reflected relative change of difficulty within the curriculum, and the other reflected absolute change of difficulty. For each content area, the mean difficulty and the standard error of the mean were calculated, and the standardized difference of the mean was determined for each comparison.

The small number of items subsumed under Units of Measure precluded any reliable conclusions on this topic. Of the remaining nine topics, only Elementary Algebra showed any relative change of difficulty: from 1964 to 1970 it became easier, both relatively and absolutely, probably because of increased emphasis on this topic within the curriculum.
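The abstract relies on the Rasch simple logistic model without stating it. Under that model, the probability that a person of ability β answers an item of difficulty δ correctly is exp(β − δ) / (1 + exp(β − δ)), so the log-odds of success equal β − δ; this is what places persons and items on one interval (logit) scale. The Python sketch below shows the model and a maximum-likelihood ability estimate for a raw score when item difficulties are already known. It is an illustration only, not BICAL's estimation procedure (BICAL calibrates items and persons together), and the function names and item values are hypothetical. It also shows why zero and perfect scores must be set aside: they have no finite estimate.

```python
import math
from typing import Sequence


def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the Rasch simple logistic model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def estimate_ability(raw_score: int, difficulties: Sequence[float],
                     tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood ability for a raw score, with item difficulties held fixed.

    Solves sum_i P(beta, delta_i) = raw_score by Newton-Raphson.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        # The likelihood has no finite maximum for zero or perfect scores.
        raise ValueError("zero and perfect scores have no finite estimate")
    # Start from the logit of the proportion correct, shifted by the mean difficulty.
    beta = sum(difficulties) / n + math.log(raw_score / (n - raw_score))
    for _ in range(max_iter):
        probs = [rasch_probability(beta, d) for d in difficulties]
        expected_score = sum(probs)
        # Derivative of the expected score with respect to beta (test information).
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected_score) / information
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical item difficulties (logits) for a short five-item test.
items = [-1.5, -0.5, 0.0, 0.8, 1.6]
for score in range(1, len(items)):
    print(score, round(estimate_ability(score, items), 3))
```

At the estimate, the expected score under the model equals the observed raw score, which is the defining property of the maximum-likelihood solution.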
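The item-fit rule and the deletion rule described in the abstract can be written as two small predicates. This is a sketch: it assumes the fit mean square, its standard error and the discrimination index are taken as reported by the calibration program, and it reads "two of the three administrations" as "at least two". The function names and the example statistics are hypothetical.

```python
from typing import Sequence


def is_non_fitting(fit_mean_square: float, fit_se: float, discrimination: float) -> bool:
    """Misfit rule: mean square exceeds 1 by >= 4 standard errors AND discrimination < 0.70."""
    standard_errors_above_unity = (fit_mean_square - 1.0) / fit_se
    return standard_errors_above_unity >= 4.0 and discrimination < 0.70


def should_delete(misfit_by_administration: Sequence[bool]) -> bool:
    """Delete an item that misfits in at least two of the three administrations."""
    return sum(misfit_by_administration) >= 2


# Hypothetical (mean square, SE, discrimination) for one item in 1964, 1970 and 1979.
stats = [(1.52, 0.10, 0.61), (1.18, 0.10, 0.83), (1.47, 0.11, 0.66)]
flags = [is_non_fitting(ms, se, disc) for ms, se, disc in stats]
print(flags, should_delete(flags))
```

Requiring both conditions means an item with a large mean square but adequate discrimination is retained.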
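The same-ability check can be sketched as follows: for each person, a standardized difference between the two Rasch ability estimates, then the Kolmogorov-Smirnov distance between the distribution of those differences and the unit normal. The sketch assumes the two abilities are expressed on comparable logit scales and that their standard errors are independent, and it uses the common large-sample critical value 1.63/sqrt(n) for the 0.01 level rather than exact tables. Function names and person values are hypothetical, and the thesis's exact computation may differ.

```python
import math
from typing import Sequence


def standardized_difference(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """(a - b) divided by the standard error of the difference, assuming independent errors."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def unit_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic_vs_unit_normal(scores: Sequence[float]) -> float:
    """One-sample Kolmogorov-Smirnov D: largest gap between the empirical CDF and N(0, 1)."""
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = unit_normal_cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n: int, c_alpha: float = 1.63) -> float:
    """Large-sample critical value; c_alpha = 1.63 corresponds to the 0.01 level."""
    return c_alpha / math.sqrt(n)


# Hypothetical (reasoning ability, SE, computation ability, SE) for a few persons.
persons = [(0.9, 0.35, 0.6, 0.33), (-0.2, 0.34, 0.1, 0.34), (1.4, 0.38, 1.5, 0.37)]
z = [standardized_difference(*p) for p in persons]
print([round(v, 2) for v in z])
print(round(ks_statistic_vs_unit_normal(z), 3))
print(round(ks_critical_value(300), 3))  # critical value for a sample of 300 persons
```

If D stays below the critical value, the differences are consistent with pure measurement error, which is the basis for treating both tests as measuring one ability.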
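The content-area comparison can be sketched in the same spirit: average the calibrated difficulties of the items in a topic, attach a standard error to that mean, and divide the change in the mean by the standard error of the change. The abstract does not say how the standard error of the mean was computed; this sketch assumes independent item calibration errors, so SE = sqrt(sum of se_i^2) / k for k items (using the spread of the difficulties, SD/sqrt(k), would be another defensible choice). The calculation is the same whether the difficulties are on the relative or the absolute scale. Function names and item values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def content_area_mean(difficulties: Sequence[float],
                      standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Mean item difficulty for a topic and its standard error.

    Assumption: calibration errors are independent, so SE(mean) = sqrt(sum se_i^2) / k.
    """
    k = len(difficulties)
    mean = sum(difficulties) / k
    se_mean = math.sqrt(sum(se ** 2 for se in standard_errors)) / k
    return mean, se_mean


def standardized_change(earlier: Tuple[float, float], later: Tuple[float, float]) -> float:
    """Positive values mean the topic became more difficult (higher logit difficulty)."""
    (m1, se1), (m2, se2) = earlier, later
    return (m2 - m1) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical calibrations (logits) of the same four items in two years.
year_a = content_area_mean([-0.4, 0.1, 0.3, 0.9], [0.13, 0.12, 0.12, 0.14])
year_b = content_area_mean([-0.1, 0.5, 0.4, 1.2], [0.13, 0.12, 0.12, 0.15])
print(year_a, year_b, round(standardized_change(year_a, year_b), 2))
```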
The topic Percent was more difficult in 1970 than in 1964. From 1970 to 1979, Elementary Algebra and both topics dealing with common fractions became more difficult. Overall, from 1964 to 1979, five of the nine topics became more difficult: WNA, CFC, CFA, Dec, and Pct. No topic became less difficult. A comparison of decisions made using the Rasch model and using the traditional model based on p-values showed the Rasch model to be more conservative. For example, from 1964 to 1979, all nine topics would have been judged more difficult under the traditional model. It was suggested that the differing decisions were due to the differing behaviour of the standard error of estimate in the two models. In the Rasch model, items of average difficulty are calibrated with the least standard error, whereas in the traditional model items of greatest and least difficulty are estimated with the least standard error. The question of which is preferable was left unresolved.
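The explanation offered for the differing decisions can be illustrated numerically. As a simplification, suppose every examinee has the same probability p of answering an item correctly. The Rasch calibration error of the item difficulty is then about 1/sqrt(N p(1 − p)), which is smallest at p = 0.5, whereas the standard error of a traditional p-value is sqrt(p(1 − p)/N), which is largest at p = 0.5 and shrinks towards the extremes. The sketch uses N = 300, the number of booklets sampled per administration (the number actually analysed after the cutoff score may be smaller). The two errors are in different units (logits versus proportions), so only their shapes across p are comparable; the function names are hypothetical.

```python
import math

N = 300  # booklets sampled per administration in the study


def rasch_item_se(p: float, n: int = N) -> float:
    """Approximate SE (logits) of a Rasch item difficulty when all n examinees succeed with probability p."""
    return 1.0 / math.sqrt(n * p * (1.0 - p))


def p_value_se(p: float, n: int = N) -> float:
    """Binomial SE of a classical item p-value (proportion correct)."""
    return math.sqrt(p * (1.0 - p) / n)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}   Rasch SE = {rasch_item_se(p):.3f} logits   p-value SE = {p_value_se(p):.4f}")
```

Under this simplification the product of the two errors is exactly 1/N, so wherever one model measures an item precisely the other measures it imprecisely. Very easy or very hard items therefore need larger changes to reach significance under the Rasch model, which is consistent with its more conservative decisions.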