UBC Theses and Dissertations

Score scale comparability in international educational assessments

Sandilands, Debra Anne


Many countries, including Canada, are increasingly using international educational assessments to compare achievement across countries and to make important decisions regarding issues such as educational policy and curriculum. Most large-scale assessments have different forms that are adapted and/or translated for use across multiple language and cultural groups. Equivalence and fairness for examinees of all groups must be established in order to support valid score comparisons across groups and the validity of decisions based on these assessments. This study investigated the degree of score comparability in the Reader booklet of the Progress in International Reading Literacy Study (PIRLS) 2001 at three levels of scores. At the item level, differential item functioning (DIF) analyses were conducted using Ordinal Logistic Regression and Poly-SIBTEST. DIF items were grouped into bundles and analyzed for differential bundle functioning (DBF) using Poly-SIBTEST. Differences in item response theory-based test characteristic curves (TCCs) were analyzed to investigate comparability at the scale level. The study focussed on four countries: Argentina, Colombia, England, and the USA. The results confirm previous studies that demonstrate a large degree of DIF in international educational assessments. Results also indicate a high degree of agreement between the two DIF methods in identifying DIF items, but fail to support correspondence between their effect size measures. This study expands the research base regarding DBF and demonstrates a two-stage approach to identifying potential causes of differential functioning. Results of the DBF analyses indicate that the cognitive levels tapped by reading comprehension questions may represent a source of bias leading to differential functioning in the Reader booklet.
This study also contributes preliminary evidence for the possibility that the use of international item parameters to create individual country scores may provide a relative advantage to some countries due to the locations of their score distributions, which may have implications regarding current score scale creation methods.

Attribution-NonCommercial-NoDerivatives 4.0 International