Score scale comparability in international educational assessments

UBC Theses and Dissertations

Sandilands, Debra Anne

Abstract

Many countries, including Canada, are increasingly using international educational assessments to compare achievement across countries and to make important decisions on issues such as educational policy and curriculum. Most large-scale assessments have different forms that are adapted and/or translated for use across multiple language and cultural groups. Equivalence and fairness for examinees of all groups must be established in order to support valid score comparisons across groups and the validity of decisions based on these assessments. This study investigated the degree of score comparability in the Reader booklet of the Progress in International Reading Literacy Study (PIRLS) 2001 at three score levels: item, bundle, and scale. At the item level, differential item functioning (DIF) analyses were conducted using Ordinal Logistic Regression and Poly-SIBTEST. DIF items were grouped into bundles and analyzed for differential bundle functioning (DBF) using Poly-SIBTEST. Differences in item response theory-based test characteristic curves (TCCs) were analyzed to investigate comparability at the scale level. The study focussed on four countries: Argentina, Colombia, England and the USA. The results corroborate previous studies demonstrating a large degree of DIF in international educational assessments. Results also indicate a high degree of agreement between the two DIF methods in identifying DIF items, but do not support a correspondence between their effect size measures. This study expands the research base on DBF and demonstrates a two-stage approach to identifying potential causes of differential functioning. Results of DBF analyses indicate that the cognitive levels tapped by reading comprehension questions may represent a source of bias leading to differential functioning in the Reader booklet.
This study also contributes preliminary evidence that using international item parameters to create individual country scores may give some countries a relative advantage because of the locations of their score distributions, which may have implications for current score scale creation methods.

Item Metadata

Title	Score scale comparability in international educational assessments
Creator	Sandilands, Debra Anne
Publisher	University of British Columbia
Date Issued	2008
Extent	2816742 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-04-27
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0054575
URI	http://hdl.handle.net/2429/7586
Degree	Master of Arts - MA
Program	Measurement, Evaluation and Research Methodology
Affiliation	Education, Faculty of; Educational and Counselling Psychology, and Special Education (ECPS), Department of
Degree Grantor	University of British Columbia
Graduation Date	2009-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
