AN APPLICATION OF THE RASCH LOGISTIC MODEL TO THE ASSESSMENT OF CHANGE IN MATHEMATICS ACHIEVEMENT

by

THOMAS JOE O'SHEA

B.Eng., McGill University, 1960
B.Ed., University of Saskatchewan, 1968
M.Ed., University of Manitoba, 1976

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF EDUCATION
in
THE FACULTY OF GRADUATE STUDIES
(Mathematics Education)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1979

Thomas Joe O'Shea, 1979

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics Education
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: October 12, 1979

ABSTRACT

Research Supervisor: Dr. D. F. Robitaille

The purpose of this study was to explore the use of the Rasch simple logistic model for the measurement of group change in mathematics achievement. A survey of previous studies revealed no consensus as to the performance of present-day students compared with their counterparts in the past. Few studies attempted to identify changing performance on specific topics within the mathematics curriculum. Groups were compared most commonly on the basis of grade equivalents or raw scores.
Neither of these is satisfactory for measuring change; grade equivalents have no natural statistical interpretation, and raw scores yield ordinal rather than interval measures. A possible solution to the problem of scale lay in the use of the Rasch model, since it purports to yield measures of item difficulty and person ability on a common interval scale.

In 1964 and in 1970, all Grade 7 students in British Columbia wrote the Arithmetic Reasoning and Arithmetic Computation tests from the Stanford Achievement Test, Advanced Battery, Form L (1953 Revision). Random samples of 300 test booklets were available from each administration. In 1979, the same tests were administered to a sample of 50 Grade 7 classes, stratified by geographic region and size of school, selected from schools across the province. A random sample of 300 test booklets was drawn from the tests completed in 1979.

The reasoning and computation tests contained 45 and 44 items, respectively. The items were reclassified by the researcher into ten content areas as follows:

1. Whole number concepts and operations (WNC) - 11 items
2. Applications using whole numbers (WNA) - 9 items
3. Common fraction concepts and operations (CFC) - 12 items
4. Applications using common fractions (CFA) - 8 items
5. Decimals (Dec) - 11 items
6. Money (Mon) - 9 items
7. Percent (Pct) - 8 items
8. Elementary algebra (Alg) - 9 items
9. Geometry and graphing (Geo) - 8 items
10. Units of measure (Mea) - 4 items

The computer program BICAL was used to determine estimates of item difficulties and person abilities. A minimum cutoff score was established to eliminate examinees near the guessing level. An item was deemed non-fitting if its fit mean square exceeded unity by four or more standard errors and its discrimination index was less than 0.70.

One of the key requirements of the study was to demonstrate that the two Stanford tests measured the same ability, thereby justifying the regrouping of the items. To this end, for each year a standardized difference score between Rasch ability on the reasoning test and on the computation test was calculated for each person. The distribution of such scores was compared with the unit normal distribution, using the Kolmogorov-Smirnov statistic. Since no distribution differed from the unit normal at the 0.01 level of significance, the tests were assumed to measure the same ability.

All 89 items were then calibrated as a whole for each year. Items were deleted from the analysis if they showed lack of fit on two of the three administrations. The deletion process was terminated after two recalibrations, with 10 items eliminated. For each pair of years, two standardized difference scores for the difficulty of each item were calculated: one reflected relative change of difficulty within the curriculum, and the other reflected absolute change of difficulty.
For each content area the mean difficulty and standard error of the mean were calculated, and the standardized difference of the mean was determined for each comparison.

The small number of items subsumed under Units of Measure precluded any reliable conclusions on this topic. Of the remaining nine topics only Elementary Algebra showed any relative change of difficulty; from 1964 to 1970 it became easier, both relatively and absolutely, probably due to increased emphasis on this topic within the curriculum. The topic Percent was more difficult in 1970 than in 1964. From 1970 to 1979, Elementary Algebra and both topics dealing with common fractions became more difficult. Overall, from 1964 to 1979, five of the nine topics became more difficult: WNA, CFC, CFA, Dec, and Pct. No topic became less difficult.

A comparison of decisions using the Rasch model and using the traditional model based on p-values showed the Rasch model to be more conservative. For example, from 1964 to 1979, all nine topics would have been judged more difficult by using the traditional model. It was suggested that the differing decisions were due to the differing behaviours of the standard error of estimate in the two models. In the Rasch model, items of average difficulty are calibrated with the least standard error, while in the traditional model items of greatest and least difficulty are estimated with the least standard error. The question of which is preferable was unresolved.
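The contrast between the two standard errors described in the abstract can be illustrated with a short sketch. It is not the BICAL procedure: it assumes the usual information approximation for the standard error of a Rasch item difficulty and the binomial standard error of a proportion correct, and the function names and the sample of 300 examinee abilities are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def rasch_item_se(difficulty, abilities):
    """Approximate standard error of an item difficulty estimate (in logits).

    Assumes the information approximation: 1 / sqrt(sum of P * (1 - P)).
    """
    information = 0.0
    for ability in abilities:
        p = rasch_probability(ability, difficulty)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


def p_value_se(p, n):
    """Binomial standard error of a proportion correct (p-value) from n examinees."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample: 300 examinees spread symmetrically around 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0] * 60

# Rasch standard error for an easy, an average, and a hard item.
for difficulty in (-3.0, 0.0, 3.0):
    se = rasch_item_se(difficulty, abilities)
    print(f"Rasch difficulty {difficulty:+.1f}: SE = {se:.3f}")

# Traditional standard error for a hard, an average, and an easy item.
for p in (0.05, 0.50, 0.95):
    se = p_value_se(p, len(abilities))
    print(f"p-value {p:.2f}: SE = {se:.4f}")
```

Under these assumptions the Rasch standard error is smallest for the item located at the centre of the ability distribution, whereas the p-value standard error is largest at p = 0.50 and shrinks toward the extremes.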
TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Acknowledgements

CHAPTER I  STATEMENT OF THE PROBLEM
  Background to the Problem
  Reported Change in British Columbia
  The Interpretation of Change
  The Problem of Scale
  An Alternative Approach
  Purpose of the Study
  Significance of the Study

CHAPTER II  REVIEW OF THE LITERATURE
  Studies on Change in Mathematics Achievement
    Studies at the District Level
    Studies at the Provincial or State Level
    Studies at the National Level
  The Rasch Logistic Model
    The Estimation of Parameters
    The Standard Error of Parameters
      Standard Error of Item Difficulties
      Standard Error of Person Abilities
    The Evaluation of Fit
    The Need for Recalibration
    Implications of the Model
      Antecedent Conditions
        The unidimensionality condition
        The influence of guessing
        The question of item discrimination
      Consequent Conditions
        Sample-free item calibration
        Test-free person calibration
    The Issue of Sample Size

CHAPTER III  DESIGN OF THE STUDY
  Sampling Procedures
  Data Collection
  Verification of the Data
  BICAL
  Editing the Data
    The Deletion of Persons
    The Deletion of Items
  Testing the Unidimensionality of the Two Tests
  Testing the Changes in Item Difficulty
  Testing Change in Content Area Difficulty

CHAPTER IV  RESULTS
  Verification of Data
  The Deletion of Persons
  Summary Raw Score Statistics
  Tests of Unidimensionality
  Item Calibration
  Changes in Item Difficulty
  Changes in Content Area Difficulty
  Comparison of Results Using Rasch and Traditional Procedures

CHAPTER V  DISCUSSION AND CONCLUSIONS
  Comparison of the Rasch and Traditional Models
  Change in Achievement in British Columbia
    Sampling and Motivation Considerations
    Change in Achievement by Content Area
  Limitations of the Study
  Some Concerns and Suggestions for Future Research

REFERENCES

APPENDIX A  British Columbia Report on the Testing of Arithmetic, Grade VII, March 1964 and May 1970
APPENDIX B  Stanford Achievement Tests: Arithmetic Reasoning and Arithmetic Computation
APPENDIX C  The Computer Program BICAL
APPENDIX D  Correspondence

LIST OF TABLES

1.1   Summary of Reported Changes in Difficulty Value
1.2   Item Difficulties on the Reasoning Test
1.3   Item Difficulties on the Computation Test
1.4   Summary of Changes in Difficulty Values After Reanalysis of the Data
2.1   Summary of Studies on Arithmetic Achievement
3.1   1979 Sample by Geographic Regions
4.1   Summary Raw Score Statistics
4.2   Mean Ability Estimates on the Reasoning and Computation Tests
4.3   Distribution of Standardized Difference Scores on the Reasoning and Computation Tests
4.4   Number of Subjects in Each Calibration
4.5   Items Not Meeting Fit Mean Square Criterion
4.6   Characteristics of Non-Fitting Items
4.7   Non-Fitting Items on Recalibration
4.8   Ill-Fitting Items in Final Calibration
4.9   Item Difficulties and Standard Errors
4.10  Summary Statistics for Abilities
4.11  Distributions of Standardized Scores Related to Relative Changes in Item Difficulty
4.12  Items Changing in Relative Difficulty
4.13  Number of Relative Difficulty Changes
4.14  Distributions of Standardized Scores Related to Absolute Change in Item Difficulty
4.15  Items Changing in Absolute Difficulty
4.16  Number of Absolute Difficulty Changes
4.17  Summary Statistics for Content Areas
4.18  Changes in Content Area Difficulty
4.19  Traditional Analysis of Change
4.20  Items Showing Discrepancies Between Decisions Using the Rasch and Traditional Models
4.21  Content Area Decisions Using the Rasch and Traditional Models
5.1   Time Allotted to the Study of Arithmetic
5.2   Mean Satisfaction Ratings of Content Areas on the 1977 Grade 8 Assessment

LIST OF FIGURES

1.1   A hypothetical example of the relationship between raw score and ability
2.1   Raw scores by item matrix
2.2   Item characteristic curves (ICC's)
2.3   ICC's with a guessing parameter
2.4   ICC's with a discrimination parameter
3.1   Two hypothetical distributions of traditional and Rasch item difficulties
3.2   The testing of change in the relative difficulty of items
3.3   The testing of change in the absolute difficulty of items
4.1   Patterns of ill-fitting items
5.1   The relationship between %-difficulty and Rasch difficulty
5.2   The relationship between Rasch item difficulty and standard error
5.3   Variation in confidence bands within the traditional and Rasch models
5.4   Effect of varying standard errors on decisions on changing item difficulty

Acknowledgements

I wish to express my thanks to my committee chairman, Dr. David Robitaille, for the way in which he supervised the production of this dissertation. He was generally supportive, sometimes demanding, and at all times he showed confidence in my ability to pursue the topic as I saw fit.

I would also like to thank my committee members: Dr. Merle Ace, for guidance in the use of the Rasch model; Dr. Todd Rodgers, for clarification of statistical issues; Dr. James Sherrill, for detailed textual comments; and Dr. Gail Spitler, for placing the study in the context of general educational issues.

Financial support for the study was provided through a grant from the Educational Research Institute of British Columbia, and their support was much appreciated.

Finally, I wish to acknowledge the contribution of Peg McCartney whose support was a constant source of strength. Her loss changed the pursuit of the doctorate from a joy to a job.
CHAPTER I

STATEMENT OF THE PROBLEM

It is currently fashionable to lament the level of mathematical knowledge possessed by graduates of public educational institutions (e.g., Beltzner, Coleman, & Edwards, 1976). Moreover it is a common belief that schools were more successful in teaching fundamental mathematical skills at some point in the past (e.g., Armbruster, 1977). Objective evidence which might confirm such a belief is difficult to find.

Critics are divided as to the existence and implications of objective measures of mathematical achievement. For example, in a recent book entitled Why Johnny Can't Add: The Failure of the New Math, Morris Kline (1973) attacked the curriculum reforms of the 1960's. In spite of the suggestiveness of the title, Kline failed to document such a failure. In particular, one might have expected his chapter entitled "The Testimony of Tests" to contain data supporting his implied contention that computational proficiency had declined. On the contrary, his main argument in that chapter is that no suitable tests or testing programs have yet been devised to carry out the necessary measurements.

The most ambitious attempt to date to assess the state of mathematics education in the United States is the report (1975) of the National Advisory Committee on Mathematical Education (NACOME).
The committee reviewed curriculum reforms from 1955 to 1975, identified new curricular emphases in current programs, outlined various patterns of instruction in use, addressed the problem of teacher education, and, finally, tackled the question of evaluation. Their conclusions concerning achievement are overshadowed by the reaction of the committee to the evaluation procedures themselves: "Unfortunately, evaluation in American mathematics education is characterized by use of limited techniques inappropriately matched to goal assessment tasks" (p. 119). The committee recommended that the use of grade-equivalent scores be abandoned. They argued that testing samples of students would avoid the problem of unjustified over-testing. They suggested that the use of standard norm-referenced tests to assign an overall measure of performance was not appropriate for programs with specific goals. They preferred the development of suitable collections of test items on carefully constructed, objective-directed test scales. They concluded:

    To make evaluation play a positive and effective role in school mathematics today there is an urgent need to develop a much broader collection of measurement techniques and instruments and to match these evaluation tools more appropriately to the varied purposes of evaluation. (p. 135)

In 1973 the Science Council of Canada agreed to fund a project proposed by six collaborating national mathematics societies.
The aim of the project was to take inventory of the mathematical sciences in Canada and to formulate policy in the national interest. The study was completed in 1975 and published a year later as Mathematical Sciences in Canada (Beltzner et al., 1976). Chapter IV of the study dealt with the teaching of mathematics in Canadian elementary and secondary schools. The authors concluded that "the overall picture in Canada at present contains so much distress, unease and confusion that energetic steps must be initiated immediately to improve the situation" (p. 114). This conclusion appears to be based on two classes of evidence: (1) opinion expressed in briefs and by individuals, and (2) objective data. Regarding the latter:

    The most convincing objective piece of information which was presented to the Mathematics Study consisted of a Report on the Testing of Arithmetic issued by the Department of Education of British Columbia. (pp. 113-114)

This same report also formed the main subject of the 1969-1970 annual report of the Director of the Research and Standards Branch of the British Columbia Department of Education (1971).¹ In his report, the Director summarized some results of two arithmetic testing programs which had been carried out in 1964 and in 1970. His summary dealt with the analysis of responses to each test item, and with the performance of the British Columbia students as a whole compared with their United States counterparts. He concluded that there were evident computational difficulties that indicated "a need for a return to that neglected and unpopular procedure called 'repetition' or 'drill'" (p. G 62). He pointed out, however, that the students in 1964 had scored considerably better than the United States standardization group and, although performance had declined, the 1970 British Columbia scores corresponded approximately to American test norms.

¹ The British Columbia Department of Education was reorganized and renamed the British Columbia Ministry of Education in 1976-77. The terms "Department" and "Ministry" are interchangeable in this study.

The study carried out by the British Columbia Department of Education has been influential both at the provincial level, where it was used to bolster arguments for a return to "drill", and at the national level, where it was cited as objective evidence of a currently unsatisfactory state of affairs in mathematics education. To warrant such influence it is reasonable to assume that the study was well-documented and founded on a solid inferential base. This may not be the case, as the following discussion will show.

Reported Change in British Columbia

The 1970 Report on the Testing of Arithmetic issued by the British Columbia Department of Education constitutes Appendix A of this study. It should be referred to for details. What follows here is a general description of that study and a critique of its procedures.
In March 1964 the Department of Education administered the Stanford Achievement Test, Advanced Battery, Partial, Form L, to the population of Grade 7 students of British Columbia. Completed forms were returned by 29 204 students out of an estimated enrolment of 29 533.

The Stanford Achievement Test has a long history with many revisions dating back to 1923. The edition used for the British Columbia study was the 1953 revision, which was standardized in the spring of 1952 on a norm sample representative by geographic region and by size of school system in the U.S.A., excluding pupils in segregated Negro systems (Kelley, Madden, Gardner, Terman, & Ruch, 1953). The content of the Stanford battery was based on the curriculum of American schools of the late 1940's. The battery consisted of six tests: Reading was measured by two tests, Paragraph Meaning and Word Meaning; Language and Spelling were each measured by a single test; Arithmetic was measured by two tests, Arithmetic Reasoning and Arithmetic Computation. The battery was administered in four sittings, each of approximately forty minutes duration, over four days. The two arithmetic tests required one sitting each. They are contained in Appendix B.

The 1964 testing program was undertaken to assist in establishing reasonably consistent standards across the province. Each classroom teacher received a pupil report and a class listing.
Summaries were prepared for each classroom, each school, and each school district. Computer programs were used to determine distributions of scores and to calculate percentiles (Conway, 1964).

A random sample of three hundred completed test batteries was drawn with one hundred from each of the upper third, middle third, and lower third as defined by total score on the battery. The one hundred papers for each of the upper third and lower third were used to calculate item difficulties and validities for all the items on the six tests. The difficulty value was defined to be the percentage of the two hundred respondents who either failed to respond to the item or who gave an incorrect response. The validity figure for each item was determined by subtracting the percentage of respondents in the lower third sample who responded correctly from the percentage of those in the upper third sample who responded correctly. This is abbreviated as U-L% in the report. These two calculations appear to have been customary with the Research and Standards Branch in all its testing programs. The three hundred test papers were filed along with data sheets showing details of the tabulations and calculations.

In May 1970 the Department readministered the two arithmetic tests from the same battery to all Grade 7 students in British Columbia. From the estimated 40 252 students enrolled, 38 377 completed forms were returned. "The purpose of the second administration was to determine the changes that had occurred in achievement in the ordinary arithmetic type of item" (British Columbia Department of Education, 1970, p. 1). A procedure similar to that of 1964 was followed in analyzing the data in 1970. It is not clear, however, whether an analysis was made for each classroom, school, and district as in the previous administration. Modal-age grade equivalents based on the 1952 U.S. norms were again determined for the population. The excess over U.S. modal-age grade equivalents was found to have dropped by 0.9 years on the reasoning test and by 1.1 years on the computation test.

Validity figures for each item were determined as in 1964. Again three hundred papers were drawn as a stratified random sample. This time, however, the one hundred papers for the middle third were used as well as the one hundred for each of the upper and lower thirds for determining the item difficulties. The tables on pages 5 and 6 of Appendix A set out the values obtained on the two administrations.

With respect to the item analysis all that has been discussed to this point is the computational procedure. Now the essence of the problem may be delineated. The purpose of the 1970 program was to assess change in performance since 1964. This was done in two ways: the first was to compare the grade equivalent means; the second was to compare the difficulty of each item as determined in 1964 and in 1970.
Conclusions were drawn with respect to items which had changed in difficulty, with respect to areas of the curriculum which had become more difficult, and with respect to the reasoning processes of students based on patterns of item difficulty.

The question arises as to what criteria were used to decide that a change had indeed taken place in the difficulty of an item. It appears from the tables on pages 5 and 6 in Appendix A that a change of more than 2% on the reasoning test and of more than 3% on the computation test was considered to reflect a true change in item difficulty. The exceptions to this rule are items 2 and 4 on the reasoning test and item 5 on the computation test. A summary of the numbers of changes is given in Table 1.1.

Table 1.1
Summary of Reported Changes in Difficulty Value

Test          No. of Items Decreasing    No. of Items Increasing
              in Difficulty              in Difficulty
Reasoning              12                         17
Computation            11                         26

Of critical importance is the fact that nowhere in the report itself was there any mention that the item difficulty values were determined on the basis of samples. That information was obtained only upon examination of the files which contained detailed summary sheets of the sample responses. The 1964 figures were based on a sample of just two hundred papers even though a further one hundred were available. In 1970 the Department decided to use the additional one hundred papers from the middle third to obtain more representative item difficulties on the 1970 administration.
Since the figures were sample based there should have been an attempt to take that fact into account when deciding which items had significantly changed in difficulty. A need to reanalyze the data is clearly indicated by this oversight.

Tables 1.2 and 1.3 summarize the results of the reanalysis of the data carried out for the present study. The 1964 difficulty figures were determined using all three hundred available papers. Item responses were tabulated directly from the original test papers and a comparison was made with the Department's summary sheets. Some discrepancies were found. An examination of individual papers showed some scoring errors. In most of these cases the item had been designated correct although more than one response had been marked by the student. In re-marking the papers, where the evidence was strong that the student favoured the correct alternative, the item was marked correct. Where it was not clear which alternative the student wished the marker to accept, the item was marked incorrect. As a result of this procedure a total of 35 changes were made. No item differed from the original tally by more than 2 out of 200.

The same procedure was followed in recording the information from the 1970 sample test papers. Six scoring changes were made. The difficulty figures for each year were extended to one decimal place. In the Ch* column of the tables the key is as follows:

    + : more difficult in 1970
    - : less difficult in 1970
    0 : no change in difficulty
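The way sampling error can be taken into account when judging a change in %-difficulty can be sketched as follows. This is an illustration only: the section does not state the decision rule actually applied in the reanalysis, so the binomial standard error, the two-tailed critical value of 1.96, and the function names are assumptions. The three items and their percentages are taken from the reanalysis columns of Table 1.2.

```python
import math


def percent_se(percent, n):
    """Binomial standard error, in percentage points, of a percentage from n papers."""
    p = percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def change_code(pct_1964, pct_1970, n_1964=300, n_1970=300, critical_z=1.96):
    """Return (z, code), where code is '+', '-' or '0' as in the Ch* column.

    The critical value of 1.96 is an assumed two-tailed 0.05 criterion.
    """
    se_diff = math.sqrt(percent_se(pct_1964, n_1964) ** 2
                        + percent_se(pct_1970, n_1970) ** 2)
    z = (pct_1970 - pct_1964) / se_diff
    if z > critical_z:
        return z, "+"
    if z < -critical_z:
        return z, "-"
    return z, "0"


# Reanalysis %-difficulty values for three reasoning items, as listed in Table 1.2.
for item, d64, d70 in [(8, 12.3, 17.0), (9, 15.3, 21.7), (33, 17.0, 11.0)]:
    z, code = change_code(d64, d70)
    print(f"item {item}: z = {z:+.2f}, change = {code}")
```

Under these assumptions items 8, 9, and 33 are coded 0, +, and - respectively, which agrees with the reanalysis Ch* entries for those items in Table 1.2, although a change of 6 percentage points on item 8 was reported as an increase in the original analysis.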
Table 1.2
Item Difficulties on the Reasoning Test

     | Original Analysis   | Reanalysis
Item | %-Diff              | 1964           | 1970           |
No.  | 1964 | 1970 | Ch*  | %-Diff | SE    | %-Diff | SE    | Ch*
   1 |    7 |    5 |  0   |    4.7 |   1.2 |    4.7 |   1.2 |  0
   2 |    3 |    6 |  0   |    3.7 |   1.1 |    5.7 |   1.3 |  0
   3 |    4 |    5 |  0   |    2.7 |   0.9 |    5.0 |   1.3 |  0
   4 |   15 |   13 |  -   |   14.0 |   2.0 |   13.3 |   2.0 |  0
   5 |   12 |    7 |  -   |   10.6 |   1.7 |    7.0 |   1.5 |  0
   6 |   10 |    7 |  -   |    8.3 |   1.6 |    7.3 |   1.5 |  0
   7 |   20 |   20 |  0   |   19.7 |   2.3 |   20.3 |   2.3 |  0
   8 |   11 |   17 |  +   |   12.3 |   1.9 |   17.0 |   2.2 |  0
   9 |   17 |   22 |  +   |   15.3 |   2.1 |   21.7 |   2.4 |  +
  10 |   18 |   14 |  -   |   19.0 |   2.3 |   13.7 |   2.0 |  0
  11 |   12 |   14 |  0   |    9.7 |   1.7 |   14.0 |   2.0 |  0
  12 |   18 |   17 |  0   |   18.7 |   2.3 |   17.3 |   2.2 |  0
  13 |   18 |   17 |  0   |   14.7 |   2.0 |   16.7 |   2.2 |  0
  14 |   39 |   38 |  0   |   36.7 |   2.8 |   37.7 |   2.8 |  0
  15 |   22 |   25 |  +   |   22.3 |   2.4 |   24.7 |   2.5 |  0
  16 |   25 |   30 |  +   |   21.3 |   2.4 |   30.0 |   2.7 |  +
  17 |   20 |   21 |  0   |   19.0 |   2.3 |   21.0 |   2.4 |  0
  18 |   23 |   26 |  +   |   22.0 |   2.4 |   26.3 |   2.6 |  0
  19 |   35 |   34 |  0   |   31.0 |   2.7 |   33.7 |   2.7 |  0
  20 |   34 |   36 |  0   |   31.7 |   2.7 |   36.3 |   2.8 |  0
  21 |   27 |   25 |  0   |   23.7 |   2.5 |   25.0 |   2.5 |  0
  22 |   20 |   37 |  +   |   18.3 |   2.2 |   36.7 |   2.8 |  +
  23 |   21 |   24 |  +   |   20.3 |   2.3 |   23.7 |   2.5 |  0
  24 |   58 |   58 |  0   |   60.0 |   2.8 |   57.7 |   2.9 |  0
  25 |   50 |   54 |  +   |   50.0 |   2.9 |   54.3 |   2.9 |  0
  26 |   39 |   44 |  +   |   41.7 |   2.9 |   44.0 |   2.9 |  0
  27 |   43 |   50 |  +   |   37.3 |   2.8 |   50.3 |   2.9 |  +
  28 |   64 |   61 |  -   |   65.3 |   2.8 |   61.0 |   2.8 |  0
  29 |   73 |   79 |  +   |   74.0 |   2.5 |   78.9 |   2.7 |  0
  30 |   64 |   71 |  +   |   64.3 |   2.8 |   71.0 |   2.6 |  0
  31 |   14 |   10 |  -   |   12.7 |   1.9 |    9.7 |   1.7 |  0
  32 |   36 |   36 |  0   |   35.0 |   2.8 |   36.7 |   2.8 |  0
  33 |   20 |   11 |  -   |   17.0 |   2.2 |   11.0 |   1.8 |  -
  34 |   57 |   52 |  -   |   56.7 |   2.9 |   51.3 |   2.9 |  0
  35 |   17 |   38 |  +   |   15.7 |   2.1 |   38.3 |   2.9 |  +
  36 |   34 |   36 |  0   |   34.7 |   2.8 |   36.3 |   2.8 |  0
  37 |   35 |   43 |  +   |   36.0 |   2.8 |   42.7 |   2.9 |  0
  38 |   20 |   16 |  -   |   22.0 |   2.4 |   15.7 |   2.4 |  0
  39 |   28 |   37 |  +   |   28.3 |   2.6 |   37.3 |   2.8 |  +
  40 |   46 |   45 |  0   |   49.0 |   2.9 |   45.3 |   2.9 |  0
  41 |   58 |   51 |  -   |   60.0 |   2.8 |   51.0 |   2.9 |  -
  42 |   30 |   34 |  +   |   30.0 |   2.7 |   34.3 |   2.7 |  0
  43 |   70 |   64 |  -   |   69.0 |   2.7 |   64.3 |   2.8 |  0
  44 |   59 |   55 |  -   |   59.7 |   2.8 |   54.7 |   2.9 |  0
  45 |   63 |   72 |  +   |   67.7 |   2.7 |   72.0 |   2.6 |  0

Table 1.3
Item Difficulties on the Computation Test

     | Original Analysis   | Reanalysis
Item | %-Diff              | 1964           | 1970           |
No.  | 1964 | 1970 | Ch*  | %-Diff | SE    | %-Diff | SE    | Ch*
   1 |    3 |    9 |  +   |    2.3 |   0.9 |    9.0 |   1.7 |  +
   2 |   12 |    7 |  -   |    9.7 |   1.7 |    7.0 |   1.5 |  0
   3 |    7 |   12 |  +   |    7.3 |   1.5 |   11.7 |   1.9 |  0
   4 |    2 |    2 |  0   |    2.0 |   0.8 |    1.7 |   0.7 |  0
   5 |   10 |   13 |  +   |   10.0 |   1.7 |   13.0 |   1.9 |  0
   6 |   14 |    9 |  -   |   12.7 |   1.9 |    8.7 |   1.6 |  0
   7 |   17 |   16 |  0   |   14.7 |   2.0 |   16.3 |   2.1 |  0
   8 |   12 |   11 |  0   |    9.7 |   1.7 |   11.3 |   1.8 |  0
   9 |   13 |   17 |  +   |   12.3 |   1.9 |   17.3 |   2.2 |  0
  10 |   11 |   15 |  +   |    8.3 |   1.6 |   15.0 |   2.1 |  +
  11 |   20 |   34 |  +   |   16.3 |   2.1 |   33.7 |   2.7 |  +
  12 |   10 |   23 |  +   |   10.0 |   1.7 |   23.0 |   2.4 |  +
  13 |   32 |   47 |  +   |   29.0 |   2.6 |   47.0 |   2.9 |  +
  14 |   32 |   50 |  +   |   29.0 |   2.6 |   50.0 |   2.9 |  +
  15 |   16 |   19 |  0   |   14.7 |   2.0 |   19.0 |   2.3 |  0
  16 |   26 |   45 |  +   |   24.7 |   2.5 |   45.0 |   2.9 |  +
  17 |   30 |   31 |  0   |   32.3 |   2.7 |   31.3 |   2.7 |  0
  18 |   23 |   45 |  +   |   23.3 |   2.4 |   45.0 |   2.9 |  +
  19 |   32 |   47 |  +   |   28.3 |   2.6 |   47.0 |   2.9 |  +
  20 |   28 |   46 |  +   |   25.3 |   2.5 |   46.3 |   2.9 |  +
  21 |   11 |    9 |  0   |    9.7 |   1.7 |    9.0 |   1.7 |  0
  22 |   38 |   46 |  +   |   32.0 |   2.7 |   46.0 |   2.9 |  +
  23 |   21 |   18 |  0   |   24.7 |   2.5 |   18.0 |   2.2 |  -
  24 |   45 |   53 |  +   |   46.3 |   2.9 |   53.3 |   2.9 |  0
  25 |   25 |   22 |  0   |   25.3 |   2.5 |   22.6 |   2.4 |  0
  26 |   42 |   48 |  +   |   43.0 |   2.9 |   48.3 |   2.9 |  0
  27 |   33 |   46 |  +   |   27.7 |   2.6 |   46.0 |   2.9 |  +
  28 |   31 |   44 |  +   |   29.7 |   2.6 |   44.0 |   2.9 |  +
  29 |   32 |   28 |  -   |   28.3 |   2.6 |   27.7 |   2.6 |  0
  30 |   17 |   17 |  0   |   16.3 |   2.1 |   17.0 |   2.2 |  0
  31 |   17 |   31 |  +   |   16.0 |   2.1 |   31.3 |   2.7 |  +
  32 |   54 |   59 |  +   |   50.0 |   2.9 |   59.0 |   2.9 |  +
  33 |   28 |   42 |  +   |   30.3 |   2.7 |   42.0 |   2.9 |  +
  34 |   60 |   79 |  +   |   61.0 |   2.8 |   79.0 |   2.4 |  +
  35 |   23 |   31 |  +   |   22.7 |   2.4 |   30.7 |   2.7 |  +
  36 |    5 |    8 |  0   |    4.7 |   1.2 |    7.7 |   1.6 |  0
  37 |   62 |   39 |  -   |   64.0 |   2.8 |   39.0 |   2.8 |  -
  38 |   66 |   76 |  +   |   66.3 |   2.7 |   75.7 |   2.5 |  +
  39 |   71 |   72 |  0   |   73.7 |   2.5 |   72.3 |   2.6 |  0
  40 |   88 |   88 |  0   |   86.3 |   2.0 |   88.3 |   1.9 |  0
  41 |   54 |   63 |  +   |   56.7 |   2.9 |   63.3 |   2.8 |  0
  42 |   47 |   46 |  0   |   49.0 |   2.9 |   46.3 |   2.9 |  0
  43 |   76 |   75 |  0   |   77.3 |   2.4 |   74.7 |   2.5 |  0
  44 |   80 |   91 |  +   |   85.0 |   2.1 |   91.3 |   1.6 |  +

The standard error (SE) for each item in the reanalysis was computed from the formula: $SE = \sqrt{d(100 - d)/n}$, where d is the %-difficulty of the item, and n is the sample size. The difference of the item difficulties was determined as well as the standard error of the
differences, $\sqrt{SE(64)^2 + SE(70)^2}$. [2] A change in item difficulty was deemed to have occurred when the difference in item difficulty estimates exceeded two standard errors; a summary of changes is given in Table 1.4. These may be compared with the original results in Table 1.1.

Table 1.4
Summary of Changes in Difficulty Values after Reanalysis of the Data

Test        | No. of Items Decreasing in Difficulty | No. of Items Increasing in Difficulty
Reasoning   | 2                                     | 6
Computation | 2                                     | 20

[2] In this study, subscripted variables are represented by placing what would be subscripts in parentheses.

The Interpretation of Change

Although the stated purpose of the 1970 project was to determine change, the Department also attempted to analyze specific weaknesses. In particular, efforts were made to identify types of errors, for example, "zero difficulty", and to ascribe reasons for lower performance on specific items, for example, lack of use of analogy. Overall, the Department appears to have attempted to interpret change at three different levels.

The first interpretation of the results was in terms of the global abilities "reasoning" and "computation". Comparisons were made by determining median modal-age grade equivalents on the two administrations.
This approach has several deficiencies. In the first place, the NACOME report (1975) recommended the abandonment of grade-equivalent scores for a number of reasons: they have no natural statistical interpretation, they lead to the popular expectation that all students should perform at grade level or above, and they are open to the misinterpretation that children scoring above grade level could perform satisfactorily at the grade level indicated. In the second place, knowing how well their students performed on tests of mathematical "reasoning" and mathematical "computation" gives practitioners little guidance as to what action should be taken in the classroom. This is particularly so in this instance since the Department's report (1970) itself notes: "The sub-tests overlap to a certain extent; almost all 'reasoning' items require some skill in computation and many of the 'computation' items require a certain amount of problem-solving ability" (p. 1).

A second interpretation of the results was in terms of students' ability to answer specific items correctly. In this instance each item is taken to be representative of an entire class of items. In general, for example, one is not interested in whether children can correctly determine the value of (1/4)/(1/2) or not, but whether they can perform the operation of division on unit fractions.
This raises the problem of item peculiarity: the results may have been different if the specific item had been (1/6)/(1/3). The current move toward mastery learning and criterion-referenced testing, in which a student responds to a number of similar items, indicates that the use of the traditional form of survey test to deal with very narrowly defined curricular areas is inadequate.

Finally, the results were interpreted in terms of performance in content areas. For example, the Director concluded that "pupils ... do much more poorly, however, in problems involving interest rates or fractions" (British Columbia Department of Education, 1971, p. G 62). The Director did not cite specific evidence to support his conclusion of poorer performance on fractions. His report did show that, of the twenty-six items on the computation test for which the Department claimed poorer performance in 1970, five dealt with fractions. Yet there were four other items dealing with fractions on the same test for which performance was unchanged. Furthermore, there were nine more questions on the reasoning test in which operations on fractions were required but none of these items showed changes in performance over time. Clearly there is a problem of establishing criteria on which to base such generalizations.

The value of grouping items on the Stanford Achievement Test appears to have been recognized. In the 1973 revision of the test the publishers provide an item analysis service (Rudman, 1977).
The national "p-value", that is, the percentage of students who answered the item correctly, is given for each item along with p-values for the class, school, and system. Each of the latter three is specially marked if it differs significantly from the national value. The items are grouped by instructional objective and item group mean p-values determined for each of the four reporting categories. For example, at the grade four level, a number of items are grouped under the objective "addition and subtraction algorithms"; others, under "multiplication and division algorithms" (Rudman, 1977, p. 181).

The analysis of change based upon groups of items having some common mathematical underpinning would be valuable for the practitioner. It would identify areas of relative strength and weakness in manageable curricular units. Change in the overall difficulty of the groups of items could be determined by comparing group means across time.

The Problem of Scale

A fundamental requirement underlying the calculation of statistics such as the mean and standard deviation is that of an equal interval scale. On such a scale equal differences in the measures correspond to equal differences in the amount of the underlying attribute. Physical measurement scales such as length in centimetres and temperature in degrees Celsius are examples. One can determine not only that individuals differ in height but also by how much they differ.
Ordinal measurement, on the other hand, allows only the arrangement of objects or individuals on a ranked basis without knowledge of true quantitative differences.

The scales which underlie the measurement of achievement are not easy to classify in clear-cut terms. Glass and Stanley (1970, p. 13) suggest, for example, that I.Q. scores might be called "quasi-interval". Three people having I.Q.'s of 50, 110, and 120 yield more than a simple ranking; it would also be expected that the second and third individuals are much more alike with respect to I.Q. than the first and second. Glass and Stanley do not offer recommendations on how to deal with such measures.

Ahmann and Glock (1971, p. 246) give a hypothetical example of a standardized vocabulary test on which two pairs of students achieve raw scores of 68 and 88, and 17 and 37, respectively. They point out that it is unlikely that the 20 raw-score unit difference in each case is indicative of the same true spread in vocabulary: "It is all too true that educational measurement habitually yields somewhat unequal units" (p. 246). They suggest that the increase from 68 to 88 is likely a greater achievement than an increase from 17 to 37. "It is typical of achievement tests to find the 'rubber units' contracted at the lower end of the raw-score distribution and expanded at the upper end" (p. 246).

Fischer (1976) states the problem more graphically.
In Figure 1.1, the horizontal axis represents a certain cognitive ability, and the vertical axis is the expected raw score on a test which measures this ability. The solid curve is the graph of the function relating test score to ability. X(1) and X(2) represent the abilities of two children before "treatment" and X(1)' and X(2)' represent their abilities after treatment. Assuming that they have derived exactly the same benefit from the treatment, the difference in their abilities is constant.

[Figure 1.1. A hypothetical example of the relationship between raw score and ability. Horizontal axis: ability; vertical axis: raw score, up to the test maximum.]

Although the benefit of treatment is the same for each individual, the raw score difference will not reflect this fact. Since D(1) is greater than D(2) the lower ability child will appear to have gained much more. If, however, the easy test items are replaced by more difficult ones the function relating ability and test score might be the dashed curve. The corresponding raw scores would show d(2) greater than d(1), implying that the higher ability person shows greater improvement. Hence the interpretation of raw scores is dependent upon the distribution of the difficulties of items making up the test.

Kerlinger (1964, p. 427) adopts a pragmatic point of view. He suggests that the best procedure is to treat ordinal measurements as though they were interval measurements except for the case of gross inequalities of intervals.
He further advises that the interpretation of statistical analyses based on interval measures where the data are basically ordinal be made cautiously. He suggests that the competent research worker should be aware of transformations which can change ordinal scales into interval scales when there is serious doubt as to interval equality.

If one is willing to assume that the normal distribution underlies the observed variable measures, the problem can be resolved. Standard procedures can be used to perform a nonlinear transformation of raw scores (e.g., Magnusson, pp. 235-238). However, Stevens (1951) characterizes such usage as an act of faith, pointing out that: "It is certainly not unreasonable to believe that this faith is often justified. What haunts us is the difficulty of knowing when it isn't" (p. 41).

An Alternative Approach

In 1960, Georg Rasch of the Danish Institute for Educational Research proposed several probabilistic models for the analysis of intelligence and achievement tests (Rasch, 1960). The particular model to be applied in the present study is known as the simple logistic model. Rasch, using the results of a Danish military test, argued from his data that items on the test could be ordered by their degree of difficulty as indicated by their percentage of correct solutions.
He argued further that the respondents could be ordered by their ability to solve the test items as indicated by their raw scores on the test.

The problem which Rasch faced was that of setting up a model which allowed measurement on a ratio scale rather than on an ordinal scale. Raw scores or percentages based upon raw scores are inadequate for a ratio scale in which one wishes to be able to state, for example, that person A has twice the ability of person B. If A obtains a score of 60% and B obtains a score of 30% and it is suggested that A has twice the ability of B, then it is not possible, on the same basis, to determine the score of person C who has twice the ability of A, since that score would have to be 120%. The same argument applies in the case of item difficulties.

In setting up the model, Rasch considered several desirable characteristics. The relative abilities of two individuals should be uniquely determined regardless of the particular item used as a stimulus. Correspondingly, the relative difficulties of two items should not depend on the ability of the particular person to whom they were administered. Furthermore, if one person was twice as able as another and one question was twice as difficult as another, the more able person should solve the more difficult problem with the same "expenditure of effort" as the less able solved the less difficult (Rasch, 1960, p. 73).

For a very capable person faced with a very easy item, the probability that the person would be able to solve the problem should be very nearly unity.
Only factors such as fatigue or boredom should result in a wrong answer. By the same token, a very dull person should have very little chance of correctly answering a very difficult item. Finally, for a problem neither too easy nor too difficult for the respondent, the outcome should be uncertain, and one would expect the probability of a correct solution to be around 0.5.

In general, the probability of a correct solution is a function of the ratio A/D, where A is the ability of the person and D is the difficulty of the item. Setting A/D = E, the problem becomes a matter of selecting a function of E such that the requirements of the preceding paragraphs are met. The simplest function occurring to Rasch was the function E/(1 + E). Substituting A/D for E results in the expression: A/(A + D). Inspection reveals that for A equal to zero the probability of solving an item is zero. For all persons of high A meeting questions of low D, the probability is close to unity for correct solutions. And, if A = D the probability is 0.5. Finally, the probability that a person will not solve the problem correctly is 1 - A/(A + D) = D/(A + D).

Rasch extended the model to a hypothetical situation in which a test of k items is administered to a group of n persons. He found that the model had several important properties. The estimate of a person's ability, A, can be derived solely from his or her raw score, r, regardless of which items contributed to that score.
Secondly, the estimate of an item's difficulty, D, can be derived from the number of times the item was correctly answered without regard for which persons solved the item correctly (Rasch, 1960, pp. 76-77).

Finally, for the specific problem at hand, there is one result of fundamental importance deriving from the Rasch model: the difficulty/ability scale is an equal interval scale. The traditional p-values and test raw scores serve as sufficient estimators for assigning positions on the common underlying metric both to items and to persons.

Purpose of the Study

The purpose of this study, then, was to apply the Rasch model in an attempt to determine change in the mathematics achievement of Grade 7 students in British Columbia. This was accomplished in several steps.

(1) The Rasch model was applied to the data on hand from the 1964 and 1970 administrations of the Stanford Achievement Tests to determine item difficulties for each year. Individual item difficulties were compared using a procedure elucidated in Chapter III. Items were grouped according to content into a number of meaningful and mutually exclusive subsets. Mean item difficulties for each group were compared from 1964 to 1970 in order to draw from the data conclusions more general than those made possible by simple item comparisons.

(2) In order to gain some insight into the performance of students in 1979, the 1964 and 1970 tests were administered to a sample of Grade 7 students in British Columbia in 1979. The same procedure as in (1) was followed in order to assess the extent of change between 1970 and 1979.
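The response function of the simple logistic model outlined earlier in this chapter, in which the probability of a correct solution is A/(A + D), can be checked numerically. The following is a minimal sketch rather than the estimation procedure used in the study; the function names `rasch_probability` and `log_odds` are hypothetical, and the ability and difficulty values are made up purely for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response, A / (A + D).

    Assumes a non-negative ability A and a positive difficulty D,
    both expressed on the ratio scale described in the text.
    """
    if ability < 0 or difficulty <= 0:
        raise ValueError("ability must be >= 0 and difficulty must be > 0")
    return ability / (ability + difficulty)


def log_odds(p):
    """Natural log of the odds p / (1 - p) of a correct response."""
    return math.log(p / (1.0 - p))


# When ability equals difficulty the outcome is a coin toss.
p_equal = rasch_probability(2.0, 2.0)

# Doubling both ability and difficulty leaves the probability unchanged,
# which is the "same expenditure of effort" requirement.
p_base = rasch_probability(3.0, 1.5)
p_doubled = rasch_probability(6.0, 3.0)

# The log-odds equal ln(A) - ln(D), so on the logarithmic scale ability
# and difficulty enter as a simple difference.
difference_on_log_scale = math.log(3.0) - math.log(1.5)

print(p_equal, p_base, p_doubled)
print(log_odds(p_base), difference_on_log_scale)
```

Taking logarithms is what turns the ratio A/D into a difference between a person location and an item location, which is one way of seeing why the model is called logistic and why items and persons can share a common scale.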
Significance of the Study

The conclusions reached in the NACOME report emphasized the need for better means of measuring change in achievement. In the literature viewed prior to this study, the application of the Rasch model to the problem had not been attempted. In theory, the Rasch model has a number of characteristics which make it particularly suitable for measuring change. By investigating a real situation, this study reveals some of the advantages and difficulties of applying the model.

In addition to contributing to the art of change analysis, the study should provide useful information on the state of mathematics achievement at the end of elementary school education in British Columbia. In helping to chart real change in mathematical performance it should provide data for curricular emphasis or alteration. The findings of the study are a useful supplement to the British Columbia Mathematics Assessment (Robitaille & Sherrill, 1977).

CHAPTER II

REVIEW OF THE LITERATURE

The review of the literature is divided into two major sections. In the first section, the methodology and findings of previous attempts to measure change in mathematics achievement in the elementary grades are outlined.
In the second section, the Rasch logistic model is described, and studies related to controversial issues surrounding the model are discussed.

STUDIES ON CHANGE IN MATHEMATICS ACHIEVEMENT

The paucity of information relating to changing student performance in Canada was noted by Hedges (1977):

[The] widespread argument on the question of comparative student achievement is aggravated by the almost total absence of long-term evaluation studies based on standardized tests. (p. 3)

The few measurements of change in mathematics achievement have generally been carried out on three distinct levels: (1) school district, (2) provincial or state, and (3) national. Each of these will be considered in turn. Because Canadian curricula and methods have always closely paralleled those in the United States, patterns of achievement among American school children have been used to yield information on the Canadian situation. This fact is of particular importance at the national level, where no Canadian studies exist.

Studies at the District Level

Hedges (1977) carried out a study of achievement in language arts and mathematics over a 40-year span. The study covered Grades 5 to 8 in the schools of St. Catharines, Ontario. Data were available from testing programs conducted in 1938 and 1952-54, each of which used the Dominion Arithmetic Test of Fundamental Operations, 1934 edition, for Grades 5, 6, and 7, and the Dominion Group Achievement Test, Part II, 1934 edition, for Grade 8.
The same tests were administered again in 1975-76. An attempt was made to adjust scores for changing socio-economic backgrounds and age-grade patterns of students. Furthermore, Hedges found that school objectives had changed sufficiently to require the creation of a "fair" test for 1976 which better reflected the arithmetic portion of the curriculum. By comparing the adjusted mean test scores, Hedges found that students in Grades 5 to 7 performed better than their earlier counterparts in fundamental operations in arithmetic when based on the test of fair comparison. On the other hand, he found that the 1975-76 Grade 8 students performed considerably less well than the other two groups in both arithmetic computation and arithmetic reasoning regardless of whether the original test or the fair test was used. Hedges conceded that the results were anomalous. He argued that the most likely reason for the conflicting results across grade levels was the elimination of high school entrance examinations and the consequent decrease in emphasis on mathematics in Grade 8 over the previous twenty years in Ontario.

In a review of Hedges' research, Winne (1979) pointed out several flaws in the data analysis. Since local norms were not available for the St. Catharines schools in 1934, Hedges resorted to using median provincial norms, arguing only that the local and provincial student populations were similar. Secondly, Winne found it impossible to replicate Hedges' figures by using the described adjustment to the mean scores.
Nevertheless, Winne concluded that the report was a healthy sign that the question of change was being examined objectively.

A shorter term study was conducted in 1974 by the North York (Ontario) Board of Education (Virgin & Darby, 1974). They wished to compare the mathematics achievement of students at that time with that of students in 1972. To this end, the Mathematics Computation subtest of the Metropolitan Achievement Test was administered to samples of approximately 1500 students in schools that had participated in the 1972 study at each of Grades 3, 5, and 6. Grade equivalent scores based on 1971 American norms were used in the analysis. For Grade 6, the expected mean grade equivalent at the time of testing in 1972 was 6.5, while the achieved value was 7.2. In 1974, it was 7.4. The researchers concluded that the 1974 level compared favourably with that of 1972.

The 1974 North York study was replicated in 1975 in the same school district (Virgin & Rowan, 1975). A 20% stratified random sample was chosen, with the result that 1442 students in Grade 6 were selected to write the same test as in 1972. The mean grade equivalent score was 7.1 with an expected value of 6.7. The researchers argued that the decline from previous years could be explained partly by the more representative sample. They also suggested that the changed sampling technique was partially responsible for an increased range of school mean grade equivalents.
A study of Grade 3 pupils in the city of Edmonton showed a slight decline in arithmetic achievement from 1956 to 1977 (Clarke, Nyberg, & Worth, 1977). In this instance, all Grade 3 pupils wrote the California Achievement Test (Arithmetic, 1950 edition) in both 1956 and 1977. Twelve of the eighty items were deemed inappropriate because of high difficulty level or irrelevance to the Alberta curriculum, and both the 1956 and 1977 tests were rescored to eliminate those items. The mean raw score dropped from 55.98 to 55.56 in 1977. At the same time, the standard deviation increased from 5.97 to 7.58. The authors suggested that the decrease in means was due to lower achievement on a few items which had received less emphasis in recent years.

Hammons (1972) carried out a study in Caddo Parish, Louisiana to determine whether any significant change had occurred in arithmetic computation and reasoning among Grade 8 students during the period 1960 to 1969. A representative sample of 1000 to 1500 students was selected for each of the odd-numbered years from 1961 to 1969. The standardized test used was the California Achievement Test. Analysis of variance and trend analysis revealed a significant declining trend of proficiency in computational skill, but there was no significant change in achievement in reasoning.

Hungerman (1975) carried out a study to compare the computational skills of Grade 6 students in a southeastern Michigan school district in 1975 with those of a similar group in 1965.
Ten schools were represented, with 305 students in 1965 and 386 in 1975. The test used in both years was the California Arithmetic Test (Part II: Fundamentals) which contained four sections, each of 20 questions: addition, subtraction, multiplication, and division. Fifteen separate analyses of covariance were performed, one for each section and one for the total computation score, using three different I.Q. covariates. Results differed slightly depending on the covariate selected. For reporting purposes here the "total I.Q." will be used. Hungerman's results showed no significant differences in total computation scores. However, on the subtests, the 1965 group was significantly favoured (p < 0.01) on addition and subtraction. The 1975 group performed significantly better (p < 0.05) on division, while the groups were not significantly different on multiplication. For individual items of computation, the 1975 group scored higher than the 1965 group on 10 addition items, 11 subtraction items, 13 multiplication items, and 16 division items. They also scored higher on all 33 whole number items, but lower on 20 of 30 fraction items, and on 5 of 8 decimal items. Hungerman suggested that a stable teaching staff and lack of any major socio-economic change were positive influences in maintaining computational skills.

In an attempt to extract more information from her data, Hungerman (1977) used profile analysis to determine relative performance within the categories of whole numbers and fractions. She also used median I.Q. scores to divide the subjects into high I.Q. and low I.Q. categories. Profile analysis generally confirmed the results of the analysis of covariance except for division, in which no significant difference was found in performance across all items. In the analysis by content, the 1975 group performed better than the 1965 group (p < 0.01) on the whole number questions, while this was reversed on operations on fractions (p < 0.01). Performance of the low I.Q. subgroup in general changed little from 1965 to 1975; the high I.Q. subgroup contributed most of the total change.

Studies at the Provincial or State Level

The 1970 study in British Columbia cited in Chapter I (British Columbia Department of Education, 1970) examined the performance of students at the end of the elementary school program. All Grade 7 students in the province were administered the Arithmetic Reasoning and Arithmetic Computation tests from the Stanford Achievement Test, Advanced Battery, Form L (1953 revision) in 1964 and 1970. In 1964, the median modal-age grade equivalents in British Columbia were 1.8 and 1.1 years greater than the United States norms on the reasoning and computation tests, respectively.
By 1970, the excesses over American modal-age grade equivalents had dropped to 0.8 and -0.1 on the two tests. The authors pointed out that the comparison in both cases was made on pre-1964 norms and suggested that new American norms in 1970 would be considerably lower.

Using a sample of three hundred papers stratified by total score, the 1970 British Columbia report cited changes in difficulty of items on the two Stanford tests. Particularly on the computation test, more items were more difficult in 1970 than were less difficult. However, as pointed out earlier, the conclusions were not based upon adequate sampling statistics.

In 1975, a study (Russell, Robinson, Wolfe, & Dimond) of the characteristics of elementary school mathematics programs in Ontario was released. Part of the intent of the study was to identify apparent trends in performance levels of students who had taken arithmetic tests as part of a continuing testing program. The researchers found only six counties in the province in which standardized tests had been administered over an extended number of years. In all cases some combination of obstacles made authentic comparison of the results difficult. Nevertheless, the investigators cited performance in one of these jurisdictions as showing a slight decline in Grades 4, 5, and 6 from 1968 to 1974. They argued, citing available statistics, that such a decline could be explained by a decline in the mean age of students at each grade level across that period of time.

Russell et al.
(1975) also asked teachers in 85 schools, selected on a stratified random basis, for their perceptions of trends in student performance. For Grades 6 and 8, approximately three-quarters of the sample felt that performance was either better or about the same in recent years. About one-half of the principals of the same schools, however, felt that performance had declined. On the basis of the questionnaire and the limited standardized test information available, the researchers argued that it was reasonable to conclude there had been no decline in standards on a provincial level in the period 1965 to 1975.

In Nova Scotia, a long-term study of competence in basic educational skills was carried out from 1955 to 1974 (McDonald, 1978). The Metropolitan Achievement Test battery was administered approximately every three years to provincial random samples of Grade 3 pupils. Comparisons were made on the basis of median grade equivalent scores. The results indicated no decline in performance. However, since three different editions of the tests were used (1947, 1959, and 1970) and the equating of test scores from one edition to the other was not discussed, it is difficult to tell whether an absolute level of performance was maintained, or whether the pupils simply performed at the same level as the norming group in each instance.

Roderick (1973) used results on the Iowa Tests of Basic Skills to compare the performance of Grade 6 and Grade 8 students in the state of Iowa in 1973 to that of 1965, 1951-55, and 1936.
Comparisons were made in the following topic areas: whole number computation; fractional number computation; decimals, percentages, and fractional parts; measurement and geometry; and problem-solving. Representative samples of schools in the state were selected. The particular statistical test used was not indicated. Roderick found the 1936 student performance superior to that of 1973 in all areas tested for each grade level. He found the 1951-55 students superior to the 1973 students in whole number computation and fractional number computation at Grade 6, and in decimals and percentages, and problem-solving at Grade 8. Students in 1965 in both grades were superior to the 1973 students in problem-solving, the only topic included in the 1965 testing. He concluded that the modern mathematics curriculum was seriously deficient with respect to many long-term curricular goals.

The results in a neighbouring state, however, do not reflect the same pattern. In 1950, Beckmann (1978) constructed a test based on the topics identified by the Commission on Post-War Plans of the National Council of Teachers of Mathematics as those which should have been mastered by a mathematically literate person. He administered his 109-item test to a sample of Grade 9 students across Nebraska in 1950, again in 1965, and finally, in 1975. The test was administered to over a thousand students in each case, and the same schools were used insofar as possible each time. Beckmann found a significant gain (p < 0.001) between 1950 mean scores and 1965 mean scores.
This was followed by a significant loss (p < 0.001) from 1965 to 1975. The net result left the 1975 students at about the same level as the 1950 students.

Scores in mathematics computation achieved by Grade 8 students in New Hampshire showed a consistent decline on grade equivalents from 1963 to 1967. In an investigation of whether the introduction of the modern mathematics program was having an effect on computational skills, Austin and Prevost (1972) classified Grade 8 students into three groups (traditional, transitional, and modern) depending upon the type of textbook used for teaching mathematics. Analysis of variance carried out in 1965 on raw scores on the Arithmetic Computation and Arithmetic Concepts subtests of the Metropolitan Achievement Test showed no difference among groups on the Concepts subtest. For the Computation subtest the modern group performed less well than the traditional group (p < 0.01), and also less well than the transitional group (p < 0.05). In 1967, students wrote the Arithmetic Computations, Concepts, and Applications subtests of the Stanford Achievement Test. Analysis of variance showed the modern group superior to the transitional group (p < 0.01) on the Applications subtest; the modern group superior to both the traditional and transitional groups (p < 0.01) on the Concepts subtest; and the modern group superior to the transitional group (p < 0.01) on the Computations subtest.

In a follow-up study (Austin & Prevost, 1972) the 1965 eighth-grade group was tested in Grade 10 in 1967.
Tests used were the Stanford High School Numerical Computation Test, and the Stanford High School Mathematics Test, Part A. The only significant difference showed the modern group superior to the traditional group (p < 0.01) on the Mathematics test. Using the two studies as evidence, the authors concluded that the type of mathematics text used did not differentially affect the ability of students to do arithmetic.

Studies at the National Level

In a novel approach to the problem, Maffei (1977) sent out 600 questionnaires to a stratified random sample of public high school mathematics chairpersons across all states of the United States. Each chairperson was asked to give the questionnaire to an experienced and effective mathematics teacher. The teachers were asked to state whether the mathematics achievement of students in their school was on the decline and, if so, to check the reasons for the decline. Seventy-nine percent of the teachers sampled believed there had been a decline. Most of the reasons cited for the decline centred on the student: less self-discipline, lower mathematical entry skills, lower reading comprehension skills, and higher absenteeism. The respondents also felt that mathematics teachers were less likely to set minimum academic pass standards.

A major study (NAEP, 1975) of the mathematical skills of American children and adults was carried out in 1972-73 by the National Assessment of Educational Progress (NAEP).
This organization was founded to survey the educational attainments of persons at ages 9, 13, 17, and 26 to 35 (adult) in ten learning areas, including mathematics. Over 90 000 individuals, statistically representative of the total population of the United States, were surveyed. The emphasis in the statistical analysis was at the item level, that is, on the percentage of respondents who correctly answered each item. The proportions of the most identifiable common errors for each age group for each exercise were also reported.

Although the NAEP data were intended to provide a baseline for future assessments in achievement, comparisons of the results were used in one instance to draw conclusions about the influence of the modern mathematics program. Carpenter, Coburn, Reys, and Wilson (1975) pointed out that the 13- and 17-year-old groups would have been taught throughout their school careers under the modern mathematics program, whereas the adult population would not. They argued that a detrimental influence of the new program on computational skills would be indicated if the younger groups performed less well than adults on computational questions. In fact, the 13-year-olds performed almost as well as adults on most computational tasks, and the 17-year-olds did better than the adults. Hence, they argued, no detrimental influence of the modern mathematics program was evident.

Finally, the general description of the trend in mathematics achievement stated in the NACOME report (1975) provides a succinct summary of the studies to that time.
The project team attempted to collect enough data to determine the truth or falsity of the charges of declining student competence in mathematics. They examined achievement data from four major sources: state assessment reports (particularly New York and California), performance on standardized test batteries and reports on norming samples from developers of standardized tests, research studies such as the National Longitudinal Study of Mathematical Abilities (NLSMA), and the National Assessment of Educational Progress. They came to two broad conclusions: (1) there has been a tendency for traditional classes to perform better on computation while modern classes do better in comprehension, and (2) mathematics achievement has shared in the general decline in basic scholastic skills since 1960. They noted, however, that the national picture was more complex than apologists or critics would make it out to be.

The studies cited are summarized in Table 2.1.

Table 2.1

| Investigator & Year | Time Span | Region | Grade Level | Content | Statistic Used | Findings |
|---|---|---|---|---|---|---|
| Hedges (1977) | 1938/54/76 | St. Catharines, Ontario | 5-8 | computations & reasoning | adjusted mean raw scores | Grades 5-7: 1976 > 1954 > 1938; Grade 8: 1952 > 1938 > 1976 |
| Virgin et al. (1974, 1975) | 1972/74/75 | North York, Ontario | 6 | computations | mean grade equivalents | 1974 > 1972 > 1975 |
| Clarke et al. (1977) | 1956-1977 | Edmonton, Alberta | 3 | mathematics achievement | mean raw scores | slight decline from 1956 to 1977 |
| Hammons (1972) | 1960-1969 | Caddo Parish, Louisiana | 8 | computation & reasoning | ANOVA | 1960 > 1969 |
| Hungerman (1975, 1977) | 1965-1975 | School district (Michigan) | 6 | computations | ANOVA & profile analysis | +, -: 1965 > 1975; ÷, whole #'s: 1975 > 1965 |
| B.C. Dept. of Ed. (1970) | 1964-1970 | British Columbia | 7 | reasoning & computation | median grade equivalents | 1964 > 1970 |
| Russell et al. (1975) | 1965-1975 | Ontario | 6, 8 | mathematics achievement | mean raw scores & questionnaires | no change |
| McDonald (1978) | 1955-1974 | Nova Scotia | 3 | computation & concepts | median grade equivalents | no change |
| Roderick (1973) | 1938/53/73 | Iowa | 6, 8 | basic arithmetic skills | not indicated | 1938 > 1973 (Grades 6, 8); 1953 > 1973 (Gr. 6: whole #'s, frac.); 1953 > 1973 (Gr. 8: dec., prob.-solv.) |
| Beckmann (1978) | 1950/65/75 | Nebraska | 9 | basic math. knowledge | t tests | 1965 > 1950; 1965 > 1975 |
| Austin & Prevost (1972) | 1963-1967 | New Hampshire | 8 | computation, concepts, & applications | ANOVA | Mod > Trad (computations); Mod > Trad, Trans (concepts); Mod > Trans (applications) |
| Maffei (1977) | unspecified | U.S.A. | sec. | math. achieve. | questionnaire | 79% of teachers believed decline |
| Carpenter et al. (1975) | 1965-1973 | U.S.A. | 8, 12 | computation | logic based on item analysis | no detrimental effect of modern program on computational skills |
| NACOME (1975) | 1960-1975 | U.S.A. | all | mathematics | meta-analysis | overall decline in mathematical skills; traditional > modern in computation; modern > traditional in comprehension |

THE RASCH LOGISTIC MODEL

In a concise exposition of the model, Rasch (1966a) indicated that there were just three assumptions underlying the model:

(a) To each situation in which a subject (s = 1, 2, ..., n) has to answer an item (i = 1, 2, ..., m) there is a corresponding probability of a correct answer ($X_{si} = 1$) which we shall write in the form

$$P\{X_{si} = 1\} = \frac{\lambda_{si}}{1 + \lambda_{si}}$$

(b) The situation parameter $\lambda_{si}$ is the product of two factors, $\lambda_{si} = \pi_s \cdot \omega_i$, where $\pi_s$ pertains to the subject and $\omega_i$ to the item.

(c) Given the values of the parameters, all answers are stochastically independent. (p. 50)

The subject parameter, $\pi_s$, is a measure of the ability of the subject with respect to the kind of item being answered, and may take any non-negative value, with higher values indicating greater ability. The item parameter, $\omega_i$, which may also be any non-negative value, is a measure of the easiness of the item. The model may also be set up using an item parameter, $1/\omega_i$, which measures the difficulty of the item, with higher values indicating greater difficulty.
It is this form of the model which is described in Chapter I of the present study. In this form, by replacing $\pi_s$ with $A(s)$ and $1/\omega_i$ with $D(i)$, the Rasch equation reduces to:

$$P(si) = \frac{A(s)}{A(s) + D(i)}$$

where $P(si)$ is the probability of subject s correctly solving item i, $A(s)$ is the ability of subject s, and $D(i)$ is the difficulty of item i. The underlying scale for $A(s)$ and $D(i)$ is a ratio scale ranging from 0 to $+\infty$.

A second approach is to use the equivalent logistic form of the model, derived as follows. Dividing numerator and denominator of the right hand side by $D(i)$,

$$P(si) = \frac{A(s)/D(i)}{1 + A(s)/D(i)} \qquad [1]$$

The probability of failure on the item,

$$Q(si) = 1 - P(si) = \frac{1}{1 + A(s)/D(i)}$$

The expression for the odds on success, $O(si)$, becomes

$$O(si) = P(si) : Q(si) = A(s)/D(i)$$

Taking the logarithm of both sides:

$$\ln[O(si)] = \ln[A(s)] - \ln[D(i)]$$

And, setting $\ln[A(s)] = a(s)$ and $\ln[D(i)] = d(i)$,

$$\ln[O(si)] = a(s) - d(i)$$

... $P_{i2} > P_{i3} > \dots > P_{ik}$, and that each item orders subjects by membership in a score group in the same way, that is, $P_{1j} < P_{2j} < P_{3j} < \dots < P_{(k-1)j}$.

The influence of guessing

The model assumes that the probability of a person with very low ability correctly answering an item of average difficulty is near zero. This condition may well apply to a test comprising open-ended questions only, but the situation becomes complicated when multiple-choice questions are used. Rasch (1960) recognized the problem but did not deal with it effectively.
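
As a rough numerical check on the derivation and on the low-ability assumption just stated, the following Python sketch compares the ratio form of the model with the logistic form obtained by solving ln[O(si)] = a(s) - d(i) for P(si). The function names and the numerical values are illustrative assumptions and do not come from the study.

```python
import math


def p_ratio(A: float, D: float) -> float:
    """Ratio form: P = A / (A + D), with A and D positive on the ratio scale."""
    return A / (A + D)


def p_logistic(a: float, d: float) -> float:
    """Logistic form: P = exp(a - d) / (1 + exp(a - d)), where a = ln A, d = ln D."""
    return 1.0 / (1.0 + math.exp(-(a - d)))


def log_odds(A: float, D: float) -> float:
    """Natural log of the odds on success, ln[P / (1 - P)]."""
    p = p_ratio(A, D)
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    A, D = 3.0, 1.5  # hypothetical ability and difficulty
    # The two forms give the same probability of success.
    print(p_ratio(A, D), p_logistic(math.log(A), math.log(D)))
    # The log odds equal a - d = ln A - ln D.
    print(log_odds(A, D), math.log(A) - math.log(D))
    # Four logits below the item difficulty the modelled probability is
    # close to zero, well under the 1-in-5 chance level of a hypothetical
    # five-alternative multiple-choice item.
    print(p_logistic(-4.0, 0.0), 1 / 5)
```

The last line of output makes the difficulty concrete: the model predicts a success rate for very low ability examinees that is far below what blind guessing alone would produce on a multiple-choice item.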
He analyzed two multiple-choice tests, arguing in one case that "it becomes possible to change the test form from multiple choice to free answers" (p. 62), and in the second case "with so many answers offered the deficiencies of a multiple-choice test are practically eliminated" (p. 62).

Other latent trait models attempt to estimate a separate "guessing" parameter for each item to account for item misfit at the low end of the ability continuum. Hambleton and Cook (1977) note that estimates of such a parameter generally are smaller than expected if the assumption is made that low ability examinees guess randomly on high difficulty items. They cite Lord (1974) in suggesting that this is probably due to the ability of item writers to develop attractive but incorrect alternative answers.

Before considering the problem of guessing further, it may be instructive to introduce the notion of the item characteristic curve (ICC). This is the function that relates the probability of success on an item to the ability measured by the test of which it is a part (Hambleton & Cook, 1977). In Figure 2.2, curve A is the item characteristic curve for an item of difficulty -1; B is the curve for difficulty 1.5. In each case the probability is 0.5 that a person with ability matched to the item difficulty will succeed on the item. In all cases, regardless of the ability of an individual, the probability of success is higher on item A than on item B since item B is more difficult.

Figure 2.2.
Item characteristic curves (ICC's).

In the Rasch model all ICC's have the same shape. A modification could be made if a guessing parameter were to be included in the model. Typical curves for five-alternative multiple-choice items might be as shown in Figure 2.3.

Figure 2.3. ICC's with a guessing parameter.

In each case the curve asymptotically approaches some hypothetical lower limit which is a function of the number of alternatives. Such a model might have the form:

$$P(si) = g + (1 - g) \cdot \frac{e^{a(s) - d(i)}}{1 + e^{a(s) - d(i)}}$$

where $g$ is the asymptotic intercept on the probability axis for the item. The values of $g$ may differ across items depending on how much guessing each item provokes.

Wright (1977a) made the observation that if one allows a parameter to measure the guessing potential on an item then a comparable person parameter representing a person's inclination to guess might equally well be admitted. He argued that such additional parameters "wreak havoc with the logic and practice of measurement" (p. 103).

Waller (1975) devised a procedure to remove the effects of random guessing by eliminating correct responses in the matrix for items deemed too difficult for particular ability examinees. The procedure is an iterative one requiring successive calibrations using increasing values of a cut-off value for the probability that an item is answered correctly by chance alone.
Waller found that the overall chi-square value for testing the fit first decreased with increasing cut-off probability and then increased. The minimum chi-square pinpointed the required probability levels for best estimation of item difficulties. Simulated data verified the effectiveness of the proposed procedure. Notably, although Waller acknowledged his debt to Wright for advice, the procedure has not been incorporated into the latest version of BICAL (Wright & Mead, 1978).

Panchapakesan (1969), using simulated data, proposed a model which would provide for "intelligent" guessing. In her model the number of distractors eliminated by an examinee was a function of the probability that he or she would correctly answer the item. The probability levels at which the number of effective distractors changed were arbitrarily set. Simulations of a 20-item multiple-choice (5 alternatives) test with a sample size of 1000 and varying ranges of abilities were carried out. She concluded that if the calibrating sample was able enough the effect of guessing would be negligible (p. 112). She cited Ross (1966) who also found guessing to be a negligible factor in his research, and she suggested that this may have been so because the average ability of the subjects was greater than the average difficulty of the tests.
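
One way to see why an able calibrating sample reduces the influence of guessing is to compare the guessing-parameter form of the model with the plain Rasch form. The Python sketch below is only an illustration under stated assumptions: it uses pure random guessing with a fixed lower asymptote g = 0.2 (one chance in five), which is not Panchapakesan's "intelligent" guessing model or Waller's procedure, and the function names and logit values are hypothetical.

```python
import math


def rasch_p(a: float, d: float) -> float:
    """Rasch probability of success for ability a and difficulty d (logits)."""
    return 1.0 / (1.0 + math.exp(-(a - d)))


def guessing_p(a: float, d: float, g: float) -> float:
    """Guessing form: P = g + (1 - g) * Rasch probability; g is the lower asymptote."""
    return g + (1.0 - g) * rasch_p(a, d)


def guessing_gap(a: float, d: float, g: float) -> float:
    """Extra success probability attributable to guessing; equals g * (1 - rasch_p)."""
    return guessing_p(a, d, g) - rasch_p(a, d)


if __name__ == "__main__":
    G = 0.2  # assumed: random guessing among five alternatives
    for a in (-2.0, 0.0, 2.0):  # ability relative to an item of difficulty 0
        print(a, round(rasch_p(a, 0.0), 3), round(guessing_gap(a, 0.0, G), 3))
```

Under these assumptions the gap between the two curves is g(1 - P), so it shrinks steadily as ability rises above item difficulty. This is consistent with the suggestion that guessing matters little when the average ability of the sample exceeds the average difficulty of the test.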
Panchapakesan proposed the following criterion for eliminating examinees from the calibration sample:

$$r = \frac{k}{m} + 2\sqrt{\frac{k(m-1)}{m^2}}$$

where $k$ is the number of items, $m$ is the number of alternatives, and $r$ is the score below which examinees are eliminated. In her equation, $k/m$ is the expected score based on random guessing, and $k(m-1)/m^2$ is the variance of that score. Thus the procedure eliminates scores less than two standard deviations above the expected score due to random response by all examinees. For the 20-item, five-alternative test used in her simulations, for example, the criterion gives $r = 4 + 2\sqrt{3.2} \approx 7.6$. It is difficult to reconcile this recommendation with her stated initial intention of allowing only for "intelligent" guessing. Nonetheless, the procedure appears to be reasonable, and provides a straightforward guideline for all multiple-choice tests.

Tinsley and Dawis (1975) acknowledged the possible effects of guessing on item calibration. They referred to Panchapakesan's criterion but decided, on the basis of their initially small sample sizes (89 to 319, mode 269), not to follow the recommended procedure. Wright and Mead (1978) suggested that, in achievement testing, it is desirable to set the lower limit "somewhat above the guessing level" (p. 65).

The question of item discrimination

As previously outlined, the Rasch model assumes all item characteristic curves have the same shape. This means, in terms of traditional item analysis, that all items have equal discrimination.
Whitely and Dawis (1974) interpreted this to mean that the rate at which the probability of passing the items increases with total score must be equal for all items (p. 166). Again, as for guessing, a modified model could be set out with item discrimination as a parameter. For example, if $c$ is the discrimination parameter, then the model might be:

$$P(si) = \frac{e^{c\,a(s) - d(i)}}{1 + e^{c\,a(s) - d(i)}}$$

Typical ICC's for this function might be as shown in Figure 2.4.

Figure 2.4. ICC's with a discrimination parameter.

In Figure 2.4 item A has the typical Rasch shape, and the value of $c$ is unity. The value of $c$ for item B is greater than unity, hence it better distinguishes higher ability examinees from those of lower ability. For item C the value of $c$ is less than unity; it discriminates less well than item A. The measure of item discrimination is a function of the slope of the ICC at the point where the probability of success is 0.5.

Admitting a parameter for item discrimination complicates the model even more than allowing a guessing parameter. The Rasch model requires that any individual stands a better chance of succeeding on an easy item than on a difficult one. In Figure 2.4 this does not hold. Individuals with ability a(1) will likely do better on item C than on item B whereas for individuals with ability a(2) that situation is reversed. Unfortunately for the Rasch model, items do differ in their discriminations.
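
The reversal described for items B and C can be reproduced numerically. The following Python sketch uses the discrimination form given above; the parameter values for the two items and the two ability levels are hypothetical choices made only to produce crossing curves, and are not taken from Figure 2.4.

```python
import math


def icc(a: float, d: float, c: float = 1.0) -> float:
    """P = exp(c*a - d) / (1 + exp(c*a - d)), the discrimination form in the text."""
    return 1.0 / (1.0 + math.exp(-(c * a - d)))


# Hypothetical items: B discriminates more sharply than C.
ITEM_B = {"c": 2.0, "d": 1.0}
ITEM_C = {"c": 0.5, "d": 0.0}


def easier_item(a: float) -> str:
    """Return the item on which a person of ability a is more likely to succeed."""
    return "B" if icc(a, **ITEM_B) > icc(a, **ITEM_C) else "C"


if __name__ == "__main__":
    for a in (0.0, 2.0):  # stand-ins for abilities a(1) and a(2)
        print(a, round(icc(a, **ITEM_B), 3), round(icc(a, **ITEM_C), 3), easier_item(a))
```

With these values the lower ability person finds item C easier and the higher ability person finds item B easier, so the items cannot be placed in a single order of difficulty. When c is equal for all items the curves never cross and the ordering is the same at every ability level.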
The lower limit for discrimination values of items on achievement tests is zero since negatively discriminating items are usually discarded, and the upper bound is likely around 2 (Hambleton & Cook, 1977). The question then becomes one of the robustness of the model, that is, how much deviation the model can tolerate and still provide useful estimates of difficulty and ability.

Panchapakesan (1969) established certain criteria for the identification of items with deviant discriminations. Computer simulations indicated that her criteria could consistently eliminate seriously deviant items but they could not identify items whose discriminations ranged from 0.8 to 1.25. Furthermore, for a simulated test of 20 items whose discriminations ranged from 0.8 to 1.2 on a sample size of 500 the mean-square for overall fit was not significantly large. Panchapakesan also considered the question of the effect of varying item discriminations on the measurement of the abilities of the examinees. Her simulations showed, when item difficulty and discrimination were uncorrelated, that even for the extreme range of discrimination used (0.4 < c < 2.5) the bias in ability estimates was less than the standard error of measurement. She concluded, "In practical applications the model is robust even when the condition of equal discrimination is not met" (p.
.100).., Dinero and H a e r t e l (1976) a l s c i n v e s t i g a t e d the a p p l i c a b i l i t y of the Rasch model with v a r y i n g item d i s c r i m i n a t i o n s . F o c u s s i n g upon the b i a s i n measuring a b i l i t y , they simulated 30-item t e s t s on 75 s u b j e c t s using f i v e d i s c r i m i n a t i o n variances r a n g i n g from 0.05 to 0.25 drawn from t h r e e d i s t r i b u t i o n s : normal, uniform, and p o s i t i v e l y skewed (the most l i k e l y i n p r a c t i c e ) . . They concluded t h a t the Rasch c a l i b r a t i o n procedure i s r o b u s t with r e s p e c t to 65 departures from homogeneity i n item d i s c r i m i n a t i o n . They went cn to d i s c u s s the seemingly c o u n t e r - i n t u i t i v e requirement of equal d i s c r i m i n a t i o n under the Easch model as opposed to maximum d i s c r i m i n a t i o n i n the c l a s s i c a l model., They suggested t h a t , although higher d i s c r i m i n a t i o n on an item r e s u l t s i n b e t t e r placement of an i n d i v i d u a l , t h i s i s achieved at the c o s t of l o s s of range. As an extreme example, an item having p e r f e c t d i s c r i m i n a t i o n y i e l d s i n f o r m a t i o n about only a s i n g l e p o i n t on the a b i l i t y s c a l e . . They a l s o cautioned t h a t e stimates of d i s c r i m i n a t i o n depend on the f i t of the item, t h a t i s , the worse the item f i t the l a r g e r the e r r o r of e s t i m a t e of i t s d i s c r i m i n a t i o n may be. Several other s t u d i e s u sing simulated data show s i m i l a r r e s u l t s . C a r t l e d g e (1975) found t h a t items with s l o p e s i n the r e g i o n of 0.90 to 1.10 were t r e a t e d as s i m i l a r items f i t t i n g the model. She concluded t h a t even when the s l o p e s vary as much as from 0.80 t;o 1.20 the model i s ro b u s t . 
The results of Hambleton (1969) accord with Cartledge's first finding, indicating that a range of 0.20 yields consistent fit to the model. However, when guessing was also introduced as a parameter, Hambleton found consistent rejection of the null hypothesis that the model fit the data. On the other hand, as previously mentioned, Wright and Mead (1978) found that simulated runs using exactly equal discriminations frequently yielded observed standard deviations of the estimates as large as 0.20. The inference which might be drawn is that a range of 0.60 to 1.40 is acceptable.

In an application of the model to actual results on objective achievement tests, including mathematics, Soriyan (1971) found that discrimination indices of items fitting the model were quite unequal, in the range of 0.50 to 1.25. It should be noted that this piece of research differs in kind from that on simulated data. In the latter case the discrimination parameters are known; in the former the values are those estimated after fitting the model.

Consequent Conditions

The preceding four sections were concerned with implications deriving from the assumptions, that is, antecedent conditions. They dealt with individual items and item fit to the model. Implications following from the consequence of the model, that is, of "specific objectivity", have been termed consequent conditions. They deal with sets of items and persons, that is, with tests and samples, and lead to questions of test fit.
The two consequent conditions in particular to be investigated are "sample-free item calibration" and "test-free person calibration".

Sample-free item calibration

One of the consequent conditions of the model is that estimates of the difficulties of items can be made without regard for the abilities of the persons in the calibrating sample. More correctly, difficulty estimates are made by taking into account the distribution of the abilities of the persons in the sample, thereby freeing the difficulty estimates from the particulars of the abilities (Wright, 1967). Considerable research has been carried out to determine whether item calibrations made using real persons conform to the model. The studies which follow are cited chronologically in order to show the progressive nature of the research.

In 1964, Brooks (cited in Tinsley, 1971) analyzed the results of 509 Grade 8 students and 544 Grade 10 students on the Lorge-Thorndike Intelligence battery. The battery comprised five verbal and three non-verbal tests made up of multiple-choice items with five alternatives. Brooks plotted difficulty levels of all items on the test as determined from the Grade 8 sample against those difficulty levels determined from the Grade 10 sample. He devised his own statistic for determining the fit of the points to a straight line with unit slope and concluded that the difficulties were invariant with respect to the ability of the calibrating sample.
Tinsley (1971), however, believed that it was not possible to judge the quality of Brooks' conclusion since the significance of Brooks' statistics could not be evaluated.

Anderson, Kearney, and Everett (1968) analyzed the responses to a 45-item intelligence type screening test for recruits to the Australian armed forces. Sample sizes were 608 and 874. The Pearson product-moment correlation between estimated item difficulties was 0.958, indicating difficulty invariance. When non-fitting items were removed and the analysis repeated, the correlation between the original difficulty level and the recalibrated values was 0.9999 for all cases. The authors cited this as evidence that it is not necessary to recalibrate item difficulties after deleting non-fitting items. The sample of 608 was broken down into six ability groupings and item difficulties were determined for each group. The correlation between item difficulties by groups and by total size was 0.996. The authors concluded:

    Invariance in Rasch's model appears to be established, in that neither the sample from which the scale values were derived, nor the presence of items that failed to meet the model, appears to have any real influence on the resultant scale values. (p. 237)

Tinsley (1971) applied the model to analogy tests. He made ten comparisons between various types of respondents on four different types of analogy tests with sample sizes ranging from 89 to 630.
Based on Pearson product-moment correlations he concluded that six of the ten comparisons (r > 0.88) supported the hypothesis of difficulty invariance. In two cases the sample sizes were deemed too small, and he suggested that the other two cases may have been invalid because of the test construction procedure. Tinsley also concluded that the deletion of non-fitting items increases the invariance of the item difficulty estimates.

Passmore (1974) applied the Rasch model to a large sample (6287) of nursing students who wrote the two-part National League for Nursing Achievement Test in Normal Nutrition. Two subtests of items fitting the model were identified and two non-overlapping samples were determined for each subtest by dividing the scores at the median subtest score. The item difficulty estimates correlated 0.994 and 0.997 for the two subtests, respectively. Passmore also noted that the differences between the two difficulty estimates for each item were not more than 2.0 standard errors of estimate for the 40 items on one subtest and no more than 1.25 standard errors for the 65 items on the other. He concluded that the Rasch model was found to provide sample-free test calibration.

In a 1976 study to determine the smallest sample size which would yield reliable item difficulty calibrations, the Northwest Evaluation Association (NWEA) of Portland, Oregon used the responses of 1400 students to a Grade 4 mathematics test (Forster, Ingebo, & Wolmut, undated (b)).
Five random samples were drawn for each of four different sample sizes: 50, 100, 200, and 300. The mean calibrated item difficulties for each sample size were correlated with those for the total group of 1400 students. The standard deviation of the item difficulty values for each sample size was divided by the standard deviation of the difficulty values determined by using the entire group of 1400 respondents. These ratios were compared to the value of unity, which should apply if the metrics are equal. A third statistic used in the comparison was the absolute value of the difference between the two estimates of difficulty values. Conclusions regarding sample size will be indicated later; here the main point of interest is the diversity of procedures used in the comparison.

The NWEA also carried out a study to determine whether item difficulties could be determined without random sampling (Forster, Ingebo, & Wolmut, undated (a)). The researchers divided the sample of 1400 students on a Grade 4 reading test in two ways: (1) above average versus below average, and (2) inner-city students versus others. On the basis of what they called "restricted correlations" they concluded that random samples were not required. This conclusion was confirmed in a replication study on 4000 Grade 4 students in reading and mathematics, and 4000 Grade 8 students in reading and mathematics.
Hashway (1977) criticized procedures for testing the difficulty invariance condition as indicated in the research to that date on several counts. He stated that the use of simple bivariate plots was inadequate since no tests of significance are applicable. Of more importance, he contended that the correlation coefficient was not the appropriate statistic (p. 42). He argued that the existence of a high correlation or a similar rank ordering does not imply equivalence of raw scores.

To overcome this problem Hashway suggested a different procedure for testing the item difficulty invariance property. He proposed looking at the regression equation between difficulty estimates:

$$d(ik) = b(1)\,d(ij) + b(2)$$

where d(ij) and d(ik) are the estimated item difficulties based on sample groups j and k, b(1) is the slope of the regression line, and b(2) is the intercept of the regression line, both of the latter parameters estimated using a least squares procedure. If the item difficulty estimates are equivalent then b(1) should not differ significantly from 1.0 and b(2) should not differ significantly from 0.0. The sufficient statistic for testing each hypothesis is the t statistic.

Hashway used this procedure to reanalyze data reported by Whitely and Dawis (1976) in which they, by applying an analysis of variance procedure, had found against the item difficulty invariance property.
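The regression check described above might be sketched in Python as follows. This is an illustration only: the standard-error formulas are the usual textbook ones for simple least squares regression, assumed here rather than taken from Hashway, and the difficulty values and function name are hypothetical.

```python
import math


def invariance_regression(d_j, d_k):
    """Regress difficulties from sample k on those from sample j and return
    t statistics for the hypotheses slope = 1 and intercept = 0."""
    n = len(d_j)
    mean_x = sum(d_j) / n
    mean_y = sum(d_k) / n
    sxx = sum((x - mean_x) ** 2 for x in d_j)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(d_j, d_k))
    slope = sxy / sxx                      # b(1)
    intercept = mean_y - slope * mean_x    # b(2)
    residuals = [y - (slope * x + intercept) for x, y in zip(d_j, d_k)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "t_slope": (slope - 1.0) / se_slope,   # tests b(1) = 1.0
        "t_intercept": intercept / se_intercept,  # tests b(2) = 0.0
        "df": n - 2,
    }


# Hypothetical difficulty estimates for five items from two samples.
sample_j = [-2.0, -1.0, 0.0, 1.0, 2.0]
sample_k = [-1.9, -1.1, 0.1, 0.9, 2.0]
result = invariance_regression(sample_j, sample_k)
```

Each t statistic would then be compared with the tabled critical value for n - 2 degrees of freedom; for the five made-up items above, the two-tailed 0.05 critical value with 3 degrees of freedom is 3.182, and neither statistic approaches it.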
Hashway's analysis negated the conclusions set out by the original investigators.

Hashway (1977) applied the regression procedure to his own research. He constructed two mathematics tests and administered each to samples of Irish students at two levels, approximately Grades 6 and 1, and at two times, fall and spring. Thus for each test four calibrations were available with twelve different paired comparisons possible. At the 0.05 level of significance he found none of the slopes differing from unity and none of the intercepts differing from zero. Furthermore he found the maximum observed difference of item difficulties for each item on the four calibrations to be less than 1.6 times the smallest standard error of the item's four difficulty estimates. This compared favourably with an expected 10% of the items which would have occurred on the basis of random error alone. He concluded that the item difficulty invariance property holds.

In the discussion of his results Hashway (1977, p. 148) expressed concern over the procedure. He suggested that the approach he used may be a necessary but not a sufficient condition for item difficulty invariance. He suggested an alternative procedure based on the analysis of standardized difference scores. This procedure parallels that to be elaborated in the next section of the present study.

Finally, Rentz and Rentz (1978) expressed the following caveat:

    One should regard "sample-free" and "item-free" with a little discretion. Items cannot just be given to any group of people; the sample must be comprised of appropriate people. We can not calibrate algebra problems on 2nd-graders; whether we calibrate them on 8th graders depends on the experiences of the 8th-graders. Items with which you intend to measure 9th-graders should be calibrated on people with 9th-grade experiences. While we do not have to pay particular attention to representativeness of the sample in any kind of strict sampling way, we have to exercise good judgement to make sure the sample is appropriate. There is usually no reason why one can't get a sample that is reasonably representative. If in general the group is appropriate in terms of the people for whom the test is designed, then the particular sample does not matter. (p. 26)

Test-free person calibration

The second consequent condition holds that just as estimates of item difficulty can be made without regard for the ability of the calibrating sample, so may estimates of person ability be made regardless of the difficulty of items used to assess that ability. This follows from the fundamental equation relating item difficulty and person ability; it may be thought of as a duality principle operating in the model.

Wright (1967) analyzed the scores of 976 students who had written the 48-item reading comprehension section of the Law School Admission Test. He divided the calibrated items from the test into two subtests of 24 items each.
The 24 easiest items were used to make up an Easy Test, and the 24 most difficult items constituted the Hard Test. Each of the 976 students was thus assigned two ability estimates, one based on the results on the Easy Test, and another based on the Hard Test. Each ability estimate was accompanied by its standard error of estimate. To assess whether equivalent ability estimates were made for each person, Wright determined the difference between the two estimates and divided by the standard error of the difference to obtain a standardized difference for each person. He argued that, if the estimates were statistically equivalent, the distribution of the standardized differences should have a mean of zero and a standard deviation of unity. The results showed a mean of 0.003 and standard deviation of 1.014. Without making further statistical analyses Wright concluded that test-free person measurement was indicated.

Willmott and Fowles (1974) used a similar procedure to analyze the responses to the 50 fitting items on a 70-item General Certificate of Education O-level Physics test. Two ability measures for each person were determined, one from the easiest 25 items, the other from the most difficult 25 items. Of the 745 candidates, the ability estimates for 703 (94.4%) differed by less than two standard errors. Since the significance level of this test was five percent, the authors concluded that, once the poor items had been edited out, the items in the test yielded person measures which were test-free.
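The standardized difference used in these two studies can be illustrated with a short Python sketch. The formula for the standard error of the difference (the square root of the sum of the squared standard errors) is an assumption that treats the two estimates as independent; neither study's exact formula is given here. The person data and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_difference(b_easy, se_easy, b_hard, se_hard):
    """Difference between two ability estimates in standard-error units.

    Assumption: the estimates are independent, so the standard error of
    the difference is sqrt(se_easy**2 + se_hard**2)."""
    return (b_easy - b_hard) / math.sqrt(se_easy ** 2 + se_hard ** 2)


def summarize(z_scores):
    """Mean, standard deviation, and share of |z| below two standard errors."""
    share_within_two = sum(1 for z in z_scores if abs(z) < 2.0) / len(z_scores)
    return mean(z_scores), stdev(z_scores), share_within_two


# Hypothetical (easy estimate, SE, hard estimate, SE) for four persons.
persons = [
    (0.5, 0.3, 0.1, 0.4),
    (1.2, 0.3, 1.5, 0.4),
    (-0.4, 0.3, -0.4, 0.4),
    (2.0, 0.3, 0.5, 0.4),
]
z = [standardized_difference(*p) for p in persons]
z_mean, z_sd, share = summarize(z)
```

Under test-free measurement the mean of such scores should lie near zero, the standard deviation near unity, and roughly 95 percent of the scores within two standard errors; the Willmott and Fowles figure of 703 out of 745 corresponds to 94.4 percent.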
Whitely and Dawis (1974) set up a test of the model similar to that of Wright. They pointed out that Wright's use of the term "statistically equivalent forms" falls under Lord and Novick's (1968) concept of tau-equivalent measures. The expected values for true scores are equal but the expected values of error variances are not necessarily equal. To test this equivalency of calibrated subsets Whitely and Dawis re-analyzed a portion of Tinsley's (1971) data. Responses from 949 subjects on a 60-item verbal analogies test were used to calibrate the items. Three different divisions of the item pool were then set up: (1) odd versus even items, (2) easy versus hard items, and (3) randomly selected subsets with no overlap. They found for two subset comparisons, odd/even and random, no significant differences in either the means of the ability estimates or in their variances. For the easy/hard comparison, however, both the means and the variances differed (p < 0.05). When comparisons were made on standardized differences, the only significant difference was in the variance on the easy/hard subtest comparison (p < 0.01). Whitely and Dawis concluded that the results indicated that the Rasch model would produce statistically equivalent forms for any item subset except under the most extreme conditions.

Passmore's study (1974), on the other hand, showed contrary results. The responses of 6287 nurses on two Rasch calibrated subtests of an achievement test in nutrition were used to test the ability invariance hypothesis.
The abilities of examinees scoring higher than the median score on each subtest were estimated from the easy, hard, odd, and even items selected from each subtest. Correlations between ability estimates on the easy/hard and odd/even items on each subtest were low, the highest being 0.382. Passmore concluded that item-free measurement was not attained to any reasonable or practical degree.

Rentz and Bashaw (1977) felt sufficiently comfortable with the ability invariance condition to apply the Rasch model to the problem of test equating. Using the results of the equating phase of the Anchor Test Study (Loret, Seder, Bianchini, & Vale, 1974) they converted the scores on twenty-eight reading tests for Grades 4, 5, and 6 students to a transformed Rasch scale. Each child responded to two reading tests yielding an estimate of the difference in the ability scale origin for the two tests. At each grade level, seven different batteries were used. Each battery was published in two forms and these were administered to fourteen additional samples at each grade level. This provided a basis for equating the tests within each grade level. The final step was to combine data across grade levels, that is, to carry out "vertical equating" of the tests, yielding a common ability scale for all tests and all grades.
The authors found that in comparing their results with results using equipercentile equating, most test pairs were in agreement. Rentz and Bashaw (1977) did not indicate any direct evidence for the item-free person measurement condition. That is, they did not attempt to determine the degree of agreement between estimates for each child's reading ability as determined from the two reading tests. Instead, they concentrated on demonstrating the stability of the raw score to estimated ability conversion for each test.

Slinde and Linn (1978) explored the adequacy of the Rasch model for vertical equating by using item response data from 1365 students on a 50-item College Entrance Board achievement test in mathematics. Fourteen items were deleted as being too difficult, too easy, or possibly speeded. Wright's procedure was followed by dividing the items into an easy subtest and a difficult one. Their results were similar to those of Wright (1967) and Whitely and Dawis (1974). However, Slinde and Linn continued their analysis by calibrating the items separately using the low ability, medium ability, and high ability students independently. When ability estimates were obtained on extreme groups by using difficulty estimates obtained from the opposite extreme groups there were discrepancies. For the middle group the authors pointed out that the examinees would do better by taking the difficult test when ability estimates were obtained from the high group, but would do better to take the easy test when the estimates were obtained from the low group.
They concluded that the Rasch model did not provide a satisfactory means of vertical equating, but conceded that the comparisons may have been more extreme than apt to be encountered when equating tests over several grades.

Hashway (1977) was concerned with the statistical procedures used to test the score invariance property. He extended the standardized difference procedure of Wright (1967) and Whitely and Dawis (1974). In that procedure, it will be recalled, if two estimates are available for the ability of each person they are subtracted and the difference is divided by the standard error of the difference of the two estimates. This results in a standardized difference score for each person. If the distribution of such scores is unit normal, that is, with mean of zero and variance of one, it may be assumed that differences are due to random error. Thus the first step, Hashway argued, is to compare the observed distribution with the unit normal distribution function. This can be done using either the chi-square or the Kolmogorov-Smirnov statistic. If the test statistic is not significant the variation in standardized difference scores may be assumed to be a result of random error only. If, however, the hypothesis of unit normality is rejected, a second step should be performed. Hashway argued that there were two reasons why the observed distribution would be non-normal: (1) there is greater concordance of estimates than expected from random error, or (2) there is greater discordance than expected.
Rejection of the first condition means acceptance of the second. For the first condition to hold, the distribution must be leptokurtic, and centred on zero with variance less than unity. Rejection of this leptokurtic property would imply that there is greater discordance than expected and the ability invariance condition does not hold. Hashway found that the distribution of standardized differences observed in his own research was leptokurtic and concluded that measurement based on Rasch instruments seemed to provide a stable mapping function.

The Issue of Sample Size

A controversy with respect to the size of sample required for the application of the Rasch model has recently surfaced. Whitely (1977) maintained that the key is the sample size required for adequately testing the fit of items to the model since the consequent conditions depend on this fit. She argued that a reasonably powerful significance test is needed to assess item fit and this can only occur with large sample sizes. She suggested that a sample size of less than 800 fails to detect sizeable differences. Wright (1977b) argued that the important consideration is the precision of the calibration of items and concluded that sample sizes of 500 are more than adequate in practice. He contended that sample size depends upon the desired standard error of item calibration, and on the effects of item imprecision on the measurement of person abilities.
The previously cited study of the NWEA (Forster et al., undated (b)) on adequate sample size found that a sample size of 200 provided nearly as accurate information as a sample size of 300. As a consequence the NWEA now uses 200 to 300 students in field testing new items for the NWEA item bank.

CHAPTER III

DESIGN OF THE STUDY

The purpose of the study was to apply the Rasch model to measure change in arithmetic achievement at the end of Grade 7 in the province of British Columbia. The procedure for making comparisons was devised using the data available from previous testing programs in 1964 and 1970, and then applied to data obtained from a sample selected and tested in 1979. The essential element in the analysis of change was the difficulty of the test items as established using the computer program BICAL (Wright & Mead, 1978). The item difficulties were used, in turn, to establish summary statistics for blocks of items grouped so as to provide measures of performance on particular topics within the elementary mathematics curriculum.

Sampling Procedures

In March, 1964, the British Columbia Department of Education administered the Stanford Achievement Test, Advanced Battery, Partial, Form L, to all students in Grade 7. A random sample of one hundred test booklets was selected at that time from each of the top, middle, and lower third of the distribution of total scores. In May, 1970, the Department administered the two arithmetic tests from the same battery to all Grade 7 students in British Columbia. A random sample of 300 papers was selected in a manner similar to that of 1964.
In 1979, it was clear that the representativeness of the 1964 and 1970 samples could not be replicated. The procedures of the previous years resulted in samples truly reflecting individual achievement throughout the province. The full authority and resources of the Department of Education were used in the earlier periods, while limited funding and reliance upon voluntary cooperation of persons in the field were characteristic of the data collection in 1979. Nevertheless, a procedure was established which, it was felt, resulted in a calibration sample sufficiently representative of the achievement of the population of Grade 7 students for purposes of comparison.

In order to produce item calibrations and estimates of abilities with standard errors comparable to those of previous years, a calibration sample size of 300, the same as in previous years, was opted for in the 1979 testing. In order to make the selection of subjects as similar to previous years as possible, it was necessary to select these 300 subjects from a larger sample using the same stratification as before. The size of the larger sample was dictated by the financial and clerical resources available to the researcher. This was set at 1500 students: just under four percent of the approximately 40 000 students in the Grade 7 population.

Ideally, the larger sample would have consisted of 1500 students drawn randomly from the population.
This was judged to be impractical, requiring the testing of a very small number of students in each of a large number of classrooms.

There were two essential factors to consider in selecting a representative group of students. The first was representativeness of teaching practice; the second, representativeness of student ability. In the former case, the important factor was the school; in the latter, the student within the school. Hence it was decided to construct the sample in two stages: (1) by securing a representative sample of schools to yield results on approximately 1500 students, and (2) by selecting a stratified random sample of 300 students from the total sample.

The education system of the province consists of 75 school districts. Following contemporary Ministry of Education procedures for selecting samples of classes from these districts, two blocking factors were used: geographic location and size of school. The geographic regions and the number in the sample from each region are shown in Table 3.1. Within each region, schools were ranked in order of their enrolment of Grade 7 students. The number of Grade 7 classes was estimated by dividing the school enrolment in Grade 7 by the average class size for the district in which the school was located. The average class size for the region was estimated by dividing the total number of students by the estimated number of classrooms in the region. Finally, the number of classes required for the sample was determined by dividing the targeted number of students for the region by the estimated average class size for the region. The result was rounded up to the next whole number.

Table 3.1
1979 Sample by Geographic Regions

Region                 Number of   Estimated #   Approximate   Targeted #
                       Districts   of Students   % of Total    in Sample
1. South Centre            16          5 900         14.9          223
2. Greater Vancouver        9         14 700         37.1          556
3. South Mainland          11          4 500         11.4          171
4. South Coast             13          6 800         17.2          258
5. Southeast               12          2 400          6.1           91
6. North                   14          5 300         13.4          201
Total                      75         39 600                     1 500

As an example of the foregoing, for Region 1 the estimated number of classrooms was 285. The average class size for the region was 5900/285 = 20.7. The targeted number of students in the sample was 223, requiring 223/20.7 = 10.8, or 11 classes.

For each region, the rank ordering of schools by enrolment was sectioned into strata equal in number to the required number of classes divided by two. Hence, in the example, five strata were defined, with two classes drawn from each stratum, except for the lowest from which three were selected. Each stratum contained 100%/5 = 20% of the students in the region. Since schools were ordered by enrolment, the top stratum contained fewer schools than the bottom. Within each stratum the first class was randomly chosen and the second was located symmetrically within the stratum with respect to the first. This procedure ensured that the selection of classes from schools was well distributed both across the entire enrolment range and within each stratum.
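The arithmetic of the Region 1 example can be retraced with a short sketch. The function names below are illustrative only and do not come from the study; the sketch also assumes that at least two classes are required in a region, and that an odd class is assigned to the lowest stratum, as in the example.

```python
import math

def classes_required(region_students, region_classrooms, target_students):
    """Return (average class size, classes to sample) for one region."""
    avg_class_size = region_students / region_classrooms
    # The quotient is rounded up to the next whole number of classes.
    return avg_class_size, math.ceil(target_students / avg_class_size)

def classes_per_stratum(n_classes):
    """Two classes per stratum; the lowest stratum absorbs any odd class.

    Assumes n_classes >= 2 (hypothetical helper, not part of the study).
    """
    n_strata = n_classes // 2
    allocation = [2] * n_strata
    allocation[-1] += n_classes - 2 * n_strata
    return allocation

# Region 1 figures quoted in the text: 5900 students, 285 classrooms,
# and a targeted sample of 223 students.
avg_size, n_classes = classes_required(5900, 285, 223)
print(round(avg_size, 1), n_classes, classes_per_stratum(n_classes))
```

With the Region 1 figures this gives an average class size of about 20.7, 11 classes, and five strata with the allocation 2, 2, 2, 2, 3.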
The resulting sample comprised 65 classes in 61 schools located in 35 districts.

Data Collection

Educational policy and funding for each school district in the province are determined by a district school board, and the district superintendent, responsible to the board, is charged with administering the affairs of the district. In late March, 1979, a letter was sent to the superintendent of each district, requesting permission to contact the principals of the schools in the sample (see Appendix D). It was emphasized that the study was designed to investigate performance on a province-wide basis and that strict confidentiality with respect to students, schools, and districts would be maintained.

Permission was granted by 30 of the 35 superintendents. In one case, the one school selected in the district had no Grade 7 students enrolled. In another, application forms to conduct research had to be filled out by the researcher, and permission was ultimately denied because of the short advance notification. In two other cases, the project conflicted with district-wide testing programs and consent was therefore withheld. In the fifth case, contact with the superintendent was not maintained due to his attendance at a lengthy conference. As a result, the sample was reduced to 52 classes in 49 schools.

In mid-April, a test package (see Appendix D) was sent to the principal of each school.
Each package contained a covering letter to the principal explaining the purpose of the study and asking for his or her cooperation, a letter to the teacher/test administrator asking that he or she administer the tests to the Grade 7 students, a set of detailed directions for administering the tests, forty copies of each of the reasoning and computation tests for each class selected in the school, and a stamped, self-addressed envelope in which to return the completed tests. Principals were asked to have the tests administered in the period between April 23 and May 4. They were also asked to select randomly the required number of classes (one or two) if there were more than this number in the school.

By mid-May responses had been received from all the schools in the sample. Forty-seven of the 49 principals cooperated in the study. One principal felt that the tests would give rise to too much student anxiety because of the number of items containing imperial units of measure. The staff of the second school had already been involved in another doctoral research study, and felt that their students were being over-tested. The end result was the return of 1277 completed papers from 50 classrooms across the province.

The 1277 returned papers were marked by hand and ordered by total score. One hundred papers were selected at random from each of the top 426 papers, the middle 425 papers, and the bottom 426 papers.
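The selection of the 300 calibration papers from the 1277 returns can be sketched as follows. This is a minimal illustration and not the procedure actually carried out by hand: the function name, the seed, and the use of list indices to stand for papers are all assumptions. The band sizes of 426, 425, and 426 are those given in the text.

```python
import random

def stratified_thirds(total_scores, sizes=(426, 425, 426), per_band=100, seed=1979):
    """Sample paper indices at random from the top, middle and bottom bands.

    total_scores[i] is the total score of paper i. The seed is arbitrary and
    only makes the illustration repeatable.
    """
    if sum(sizes) != len(total_scores):
        raise ValueError("band sizes must account for every paper")
    rng = random.Random(seed)
    # Order the papers from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    chosen, start = [], 0
    for size in sizes:
        band = order[start:start + size]
        chosen.append(rng.sample(band, per_band))
        start += size
    return chosen  # [top sample, middle sample, bottom sample]
```

Called on a list of 1277 total scores, the function returns three lists of 100 paper indices, one for each band.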
Verification of the Data

For the years 1964 and 1970 item responses from each of the 300 test booklets were coded by the author onto optical mark read (OMR) cards. Responses were coded 1,2,3,4,5 according to the response selected. The response was coded zero (0) if the item was left unanswered or if it was double marked. In some cases items were double marked but other notations on the booklet made it clear which response the student wished to have counted. In such cases the indicated response was accepted. The OMR cards were then read into a computer file.

In 1964 and 1970 the Department of Education had hand-prepared master summary sheets for the 300 tests in the form of an examinee by item matrix. Entries were 1 (one) if the correct answer was given and blank if not. In order to ensure comparability of the rescored items with the Department's scoring, a FORTRAN program was written to transform the data in the computer file into a similar examinee by item matrix of 1's and 0's. A comparison of this output with the Department's tally sheets, and referral to the original test booklets, served to verify the correct/incorrect matrix. Since the subsequent analysis used only this information, it was not considered necessary to verify the responses to items incorrectly answered.

The 300 papers selected in the 1979 sample were individually examined and corrected for anomalies in the selection of responses. The responses, alphabetical as on the test papers, were commercially keypunched directly from the test papers onto cards.
The keypunch operator verified the entries by keypunching the entire set of booklets twice, thereby ensuring agreement. A further check was made on the accuracy of the keypunching by randomly selecting a 10% sample for comparison with the original test papers. A small computer program was written to transform the alphabetically coded responses into the numerical values of 0,1,2,3,4,5 as for previous years.

BICAL

Estimates of item difficulties and person abilities were obtained using the computer program BICAL (Wright & Mead, 1978). A description of the components of the program and an annotated example of output are contained in Appendix C. The algorithm used by BICAL is the unconditional maximum likelihood procedure (UCON) and consists of the following steps:

(1) Determine the number of correct responses for each item, s(i), and the number of persons, n(r), at each score, r.

(2) Edit the data to remove items on which zero or perfect scores were achieved, that is, for which s(i) equals zero or N*, the number of persons in the sample. Edit the data to remove persons who achieved zero or perfect scores, that is, for whom r equals zero or K*, the number of items on the test. Let N and K be the number of persons and items remaining, respectively.

(3) For each raw score, r, assume a corresponding initial ability estimate, $a(r)^{0}$, such that $a(r)^{0} = \ln[r/(K-r)]$.

(4) For each item, i, assume a corresponding initial difficulty estimate, $d(i)^{0}$, such that $d(i)^{0} = \ln[(N-s(i))/s(i)]$.
(5) Centre the set of item difficulties at zero by subtracting the mean of the K item difficulties from each item difficulty.

(6) Through iteration, determine a revised set of item difficulties, d(i), by using Newton's method to solve each of N maximum likelihood equations.

(7) Through iteration, and using the revised set of item difficulties, determine a revised set of person abilities, a(r), by using Newton's method to solve each of K maximum likelihood equations.

(8) Repeat steps 5, 6, and 7 until stable values for item difficulties, d(i), are obtained.

(9) Correct for bias by multiplying each item difficulty, d(i), by (K-1)/K.

(10) Determine person abilities, a(r), for each raw score, r, using the unbiased item difficulties, d(i), determined in step 9.

(11) Correct for bias by multiplying each person ability, a(r), by (K-2)/(K-1).

(12) Determine the asymptotic standard error for each d(i) and a(r) from the second derivative of the log likelihood function.

Several fit statistics, and an index of discrimination, are determined for each item. The interpretation of each of these is illustrated in the sample output in Appendix C.

It will be recalled that the Rasch model does not include a "guessing" parameter for multiple-choice questions. Various suggestions have been made to allow for this fact when calibrating the items.
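The twelve UCON steps listed earlier in this section can be sketched in code. What follows is a minimal illustration and not the BICAL program itself. Several details are assumptions made only for the sketch: the data are edited in a single pass, each Newton solution uses a fixed number of steps, the difficulties are recentred immediately after each item update, and the standard errors are taken as the inverse square roots of the information sums. All function names are hypothetical.

```python
import math

def prob(b, d):
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))

def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def solve_items(d, a, n, s, steps=20):
    """Step 6: Newton's method on each item equation, abilities held fixed."""
    out = []
    for di, si in zip(d, s):
        for _ in range(steps):
            p = [prob(ar, di) for ar in a]
            expected = sum(nr * pr for nr, pr in zip(n, p))
            info = sum(nr * pr * (1 - pr) for nr, pr in zip(n, p))
            di += (expected - si) / info
        out.append(di)
    return out

def solve_persons(a, d, steps=20):
    """Steps 7 and 10: Newton's method on each raw-score equation."""
    out = []
    for r, ar in enumerate(a, start=1):
        for _ in range(steps):
            p = [prob(ar, di) for di in d]
            ar += (r - sum(p)) / sum(pi * (1 - pi) for pi in p)
        out.append(ar)
    return out

def ucon(X, tol=1e-4, max_cycles=100):
    """Return (d, a, se_d, se_a) for a persons-by-items matrix of 0/1 scores."""
    # Step 2: one editing pass for zero or perfect item and person scores.
    cols = [i for i in range(len(X[0])) if 0 < sum(row[i] for row in X) < len(X)]
    X = [[row[i] for i in cols] for row in X]
    K = len(cols)
    X = [row for row in X if 0 < sum(row) < K]
    N = len(X)
    # Step 1: item scores s(i) and counts n(r) of persons at raw scores 1..K-1.
    s = [sum(row[i] for row in X) for i in range(K)]
    n = [0] * (K - 1)
    for row in X:
        n[sum(row) - 1] += 1
    # Steps 3 to 5: log-odds starting values, difficulties centred at zero.
    a = [math.log(r / (K - r)) for r in range(1, K)]
    d = centre([math.log((N - si) / si) for si in s])
    # Steps 6 to 8: alternate item and person updates until d(i) is stable.
    for _ in range(max_cycles):
        new_d = centre(solve_items(d, a, n, s))
        change = max(abs(x - y) for x, y in zip(new_d, d))
        d = new_d
        a = solve_persons(a, d)
        if change < tol:
            break
    # Step 9, then steps 10 and 11: bias corrections.
    d = [di * (K - 1) / K for di in d]
    a = [ar * (K - 2) / (K - 1) for ar in solve_persons(a, d)]
    # Step 12: standard errors from the information sums (sketch assumption).
    se_d = [1 / math.sqrt(sum(nr * prob(ar, di) * (1 - prob(ar, di))
                              for nr, ar in zip(n, a))) for di in d]
    se_a = [1 / math.sqrt(sum(prob(ar, di) * (1 - prob(ar, di)) for di in d))
            for ar in a]
    return d, a, se_d, se_a
```

The function expects a list of rows, one per person, each a list of 0/1 item scores. It returns the item difficulties, the ability for each raw score from 1 to K-1, and the two sets of standard errors, all in logits.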
Waller (1975) suggested removing responses for items too difficult for examinees. Wright and Mead (1978) suggested accepting only scores of examinees somewhat above the guessing level. The suggestions for eliminating examinees are more practical than those for eliminating particular responses. Consequently, Panchapakesan's recommendation outlined in Chapter II was used to eliminate examinees from the item calibration procedure. That procedure establishes the score r below which examinees are eliminated, such that

$$r = \frac{k}{m} + 2\sqrt{\frac{k(m-1)}{m^{2}}}$$

where k is the number of items, and m is the number of alternatives per item. As an example, if all 89 items were to be calibrated, the use of this criterion would lead to the following decisions:

(1) Reasoning test: For items 1 to 30 there are five alternatives, yielding r = 10.38. For items 31 to 45 there are four alternatives, yielding r = 7.10. Total r for the test is 17.48. Therefore eliminate scores less than 18.

(2) Computation test: For all items there are five alternatives, yielding r = 14.11. Therefore eliminate scores less than 15.

(3) Total test: The sum of the r's is 31.59. Therefore eliminate scores less than 32.

Editing the Data

The Deletion of Persons

For reliable estimation of a student's mathematical ability it is necessary that the student have sufficient time to respond to all items on a test. That is, the test should be one of power rather than of speed. Speed is a factor which both complicates the model and confounds the interpretation of the results.
The authors of the Stanford Achievement Tests state that the time limits are generous and practically all pupils should have sufficient time to attempt all the questions (Kelley et al., 1953, p. 2). Nevertheless, in a preliminary inspection of the 1964 and 1970 data, blocks of unanswered items indicated that a number of individuals probably did not have time to finish.

It was necessary to establish a criterion for deleting persons who, it was suspected, simply did not have time to complete the tests. For the measurement of arithmetic skills there were three timed sections: two on the reasoning test, and the computation test itself. It was assumed that students answered the questions in the order in which the questions were presented on the tests, and that only those items at the end of a timed portion might have been left blank because the student did not have time to finish. It was also noted that the sample size of 300 was near the lower limit for effective calibration, and that the use of a cutoff score for guessing would further reduce the sample sizes. Hence, it was necessary to balance the undesirability of loss of subjects on which to calibrate items against the inclusion of inappropriate subjects. To this end, the following decision rule was decided upon: if, at the end of any of the three timed portions the subject omitted at least ten items in a row, that person was deleted from the data base. As a result, in the most severe case, the 1970 sample was reduced in size by four percent.
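The two rules for eliminating examinees, the guessing cutoff and the unfinished-test rule, can be sketched together. The sketch reads the cutoff as the chance score k/m plus two binomial standard deviations, a reading that reproduces the values 10.38, 7.10, and 14.11 quoted for the tests. The function names are illustrative. The sketch also assumes that a response code of 0 marks an omitted item and that each timed portion is supplied as a (start, stop) pair of item positions; the portion boundaries are not stated in this chapter and must be supplied by the user.

```python
import math

def guessing_cutoff(k, m):
    """Chance score k/m plus two standard deviations of the chance score."""
    return k / m + 2 * math.sqrt(k * (m - 1) / m ** 2)

def ran_out_of_time(responses, sections, run=10):
    """True if any timed portion ends with at least `run` omitted items.

    responses: item codes 0-5 in test order, with 0 assumed to mean omitted.
    sections: (start, stop) positions, 0-based and half-open, one per portion.
    """
    for start, stop in sections:
        tail = responses[start:stop][-run:]
        if len(tail) == run and all(code == 0 for code in tail):
            return True
    return False

# Minimum acceptable scores for the reasoning, computation and total tests.
reasoning = guessing_cutoff(30, 5) + guessing_cutoff(15, 4)
computation = guessing_cutoff(44, 5)
print(math.ceil(reasoning), math.ceil(computation),
      math.ceil(reasoning + computation))
```

The printed minimum scores are 18, 15, and 32, in agreement with the decisions listed for the three tests.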
The Deletion of Items

The deletion of persons is made before the calibration process; the deletion of items is made after. Decisions concerning the deletion of items therefore are aided by the use of statistical data which indicate how well the items fit the model. In principle, such decisions should be easier to defend than the rather arbitrary decision on the deletion of persons. In practice, this is not the case.

Criteria to evaluate item fit used by various researchers have been identified earlier. The most frequently cited criteria were the chi-square or mean square fit, residual discrimination values, and point-biserial correlations. No commonly accepted combination of criteria or significance levels was identified in the literature. The present study fell into the Rentz and Rentz (1978) category of "reject-the-worst" because only a limited number of items were available and maximum information was to be sought from the analysis. Therefore a reasonably permissive criterion was set for including items in the calibration process. Items were deemed to be non-fitting if: (1) the mean square fit exceeded unity by four or more standard errors, and, (2) the discrimination index was less than 0.70.

This criterion was established basically because of the practical demands of the study.
On one hand it was desirable to eliminate items which clearly failed to fit the model; on the other hand, the most important aspect of the analysis lay in the comparison of groups of items. It was felt that, in a group, the presence of one or two items which fit not as well as theoretically desirable would not adversely affect the comparisons. Preliminary analysis indicated that the criterion would eliminate about ten percent of the items on each test, and this was judged to be a satisfactory resolution of the problem.

Testing the Unidimensionality of the Two Tests

It was argued in Chapter II that there was a need to regroup the items on the reasoning test and the computation test. This could be done meaningfully only if the ability underlying each test was the same. There is some evidence from other sources that this is indeed the case. Merrifield and Hummel-Rossi (1976) subjected the Stanford Achievement Test: High School Basic Battery, 1965 edition, to factor analysis using the responses of a sample of 226 Grade 8 students on the nine tests of the battery. The three mathematics tests were found to lie in a compact cluster on one of two oblique factors. The authors suggested that the analysis indicated redundant information in the data.

Two procedures were used to test the unidimensionality of the tests. The first is referred to as the simple test and the second as the strict test.

Rentz and Rentz (1978) suggest that there are no separate adequate tests for unidimensionality.
They argue that the test of fit to the model will tell whether the antecedent conditions of the model have been met. They suggest that one might still get good fit on a set of mathematics items even though some appear to measure algebra and others to measure arithmetic. Wright and Panchapakesan (1969) state that the failure of any item to fit the model may be for two reasons: (1) the model is too simple, or (2) the item measures a different ability than the fitting items. Thus fit to the model is evidence that a set of items measures a unidimensional ability.

Based on these arguments, the simple test was to calibrate the entire set of 89 items as a single group for each of the three administrations. It was reasoned that the unidimensionality condition could be assumed if the items acted as a cohesive unit in all three instances with no clear separation of non-fitting items along subtest lines.

In addition to this test a much more stringent statistical test was used. The basic question was: given two tests A and B, how could one empirically decide whether they measure the same underlying ability? A strong indication of unidimensionality would be given if, after administering the two tests to a single sample of persons, the rank ordering of persons by raw scores was identical on each test. One would not expect equal person scores on the tests because of possible differences in the mean difficulty of items constituting the two tests.
But the Rasch model provides a measure for equating tests under exactly this condition, that is, it carries with it the property of test-free person calibration. If each person taking the tests is assigned the same Rasch ability score by each test, the condition of unidimensionality may be assumed.

Each test was calibrated independently, yielding two estimates of each person's ability. Each ability estimate derived from the computation test was then adjusted by the difference between the mean sample abilities on the two tests in order to make the ability scales comparable (Panchapakesan, 1969, p. 168).

Some elaboration of this procedure may be in order. Since each test was calibrated independently, and was centred at the mean item difficulty for its own collection of items, it was not expected that the mean ability level on each test would be the same. For example, a difficult set of items might show a mean ability of -0.3, whereas an independently calibrated set of easy items might show the mean ability of the same group of subjects to be 0.2. A linear shift of 0.5 units applied to each subject on one of the tests would bring the ability estimates onto the same scale.

Once two comparable ability estimates were obtained, the procedure suggested by Hashway (1977) for evaluating the score invariance property was invoked. A standardized difference score for each examinee was determined and the distribution of such scores across the sample was compared to the unit normal distribution.
The test for unit normality was the Kolmogorov-Smirnov statistic. In view of the fact that the described procedure really tested two conditions at once, unidimensionality and score invariance, the probability for rejection of the null hypothesis was set at 0.01. If the hypothesis of unit normality were rejected, the distribution was to be evaluated for shape. If it proved to be leptokurtic with variance less than unity, the condition of strict unidimensionality was to be assumed.

Summarizing the steps, for each of 1964, 1970, and 1979, the following procedure was carried out:

(1) BICAL was run on the 45 items of the reasoning test, with a minimum acceptable score of 18.

(2) BICAL was run on the 44 items of the computation test, with a minimum acceptable score of 15.

(3) The raw score for each person on each of the two tests was converted to a Rasch ability score using the conversion table in the BICAL output.

(4) Each person's computational ability score was incremented by an amount determined by subtracting the sample mean computation ability from the sample mean reasoning ability. This procedure adjusted the abilities to a common ability scale.

(5) A standardized difference score was determined for each person using the two ability scores and the standard error for each as indicated in the BICAL output. The calculation was as follows:

$$D = \frac{a(R) - a(C)}{\sqrt{se(R)^{2} + se(C)^{2}}}$$

where
a(R) = the examinee's reasoning ability,
a(C) = the examinee's adjusted computation ability,
se(R) = the standard error associated with a(R),
se(C) = the standard error associated with a(C).

(6) The distribution of standardized difference scores was tested for unit normality using the Kolmogorov-Smirnov statistic.

Testing the Changes in Item Difficulty

Once the unidimensionality assumption was justified, the item difficulty values used were those determined by calibrating the entire collection of items as a single test. The fit criteria cited in a previous section were applied to each of the 1964, 1970, and 1979 calibrations. Items which failed to meet these criteria on at least two of the three administrations were deleted from the analysis. The data for each year were recalibrated after removing the non-fitting items. The process was repeated until no non-fitting items were common to two of the three samples. Item difficulties used in the analysis were those calibrated on the final run.

Comparisons of item difficulties were based on the Rasch difficulty parameters, having the logit as the unit of measure for both item difficulties and person abilities. The difference between traditional comparisons and Rasch item difficulty comparisons can be clarified by referring to the top part of Figure 3.1, in which it is assumed that 89 items remain in the calibration. They are shown from left to right in order of increasing difficulty on a hypothetical 1964 administration.
Suppose that the increase in percentage difficulty, that is, in the proportion of people incorrectly answering the item, is linear across the sample in 1964. The mean difficulty is 30%. Now suppose that a hypothetical 1970 sample is uniformly worse in terms of raw scores across all items. The mean difficulty is now 50%. Hence, the difference in mean difficulty is 20%. When these data are analyzed by the Rasch model the distribution of item difficulties in each case is represented by the same curve, shown in the middle part of Figure 3.1, as each independent calibration is centred at the mean difficulty level, with a value of zero. The difference in mean difficulty will show up in the Rasch analysis as a difference in the mean abilities of the two groups since the two parameters are determined on a common scale. The mean ability of the 1964 sample would be about -0.9 logits, and that of the 1970 sample, 0.0 logits. To see, graphically, how the item difficulties differ, the superimposed calibration curves can be separated and shifted to the locations indicated by the mean abilities. The separated curves would be as shown in the lower part of Figure 3.1.

Figure 3.1. Two hypothetical distributions of traditional and Rasch item difficulties. (Horizontal axis in each panel: items 1 to 89, ordered from easy to difficult.)
Once independent estimates have been established for two administrations, two types of comparisons can be made. In the first case, by the item difficulty invariance property of the model, it is expected, within the limits of random error, that the two difficulty estimates for an item will be equal. If this expectation is not met it can be concluded that changes in item difficulty have occurred relative to each other.

Suppose in Figure 3.2 that the 1964 testing situation is the same as in Figure 3.1. Suppose again that the mean 1970 percent difficulty is 50%. This time, however, because of changed curriculum emphasis or teaching practice, increases in item difficulties are not uniform, and the irregular line represents the graph of the 1970 item difficulties. This irregularity will be reflected in the Rasch item difficulties, as shown by the irregular curve superimposed on the regular 1964 curve. Thus, comparison of item difficulty based on the unadjusted Rasch estimates yields information on change in difficulty within the set of items, relative to the mean difficulty level of the item group. The vertical segments show the magnitudes being tested.

Figure 3.2. The testing of change in the relative difficulty of items.

The second type of comparison which can be carried out is that of absolute difficulty across time. In each year the sample represented the overall distribution of both curriculum coverage and mathematical ability across the province.
Representativeness of curriculum coverage is the key to the first comparison of item difficulty within the aggregate of items. Representativeness of mathematical ability is the key to the second comparison. In the second case the difference in mean ability of the 1964 and 1970 samples provides an estimate of the difference in the origin of the difficulty scales. By adjusting each item difficulty on one of the administrations by this amount, the item difficulties are placed on a common scale. Again, it is expected, within the limits of random error, that the difficulty estimates for each item will be equal.

As before, the procedure may be illustrated graphically. Referring to Figure 3.3, suppose that the traditional difficulty distributions for the two administrations are the same as in Figure 3.2. The adjustment procedure described for Figure 3.1 is now used to separate the Rasch calibrations by the amount of the difference in the mean abilities. In the lower portion of the figure, it is again the magnitude of the vertical segments which is being tested for significance, but these segments now represent absolute change because of the shifted location of one of the curves.

[Figure 3.3. The testing of change in the absolute difficulty of items.]

For each type of comparison Hashway's (1977) procedure constituted an omnibus test of the equality of item difficulties.
This procedure paralleled that outlined in the previous section on testing score invariance. The Kolmogorov-Smirnov statistic was used to test the assumption of unit normality of the standardized difference scores of item difficulty. In each analysis, rejection of the hypothesis of equal item difficulties permitted analysis of change for individual items.

For the assessment of relative change, the standardized difference score for each item was determined by:

$$D = \frac{d(1) - d(2)}{\sqrt{se(1)^2 + se(2)^2}}$$

where
d(1) = item difficulty in year 1,
d(2) = item difficulty in year 2,
se(1) = standard error of item difficulty in year 1,
se(2) = standard error of item difficulty in year 2.

A positive value for D indicated that the item tended to be relatively more difficult in year 1.

For the assessment of absolute change, the standardized difference score for each item was calculated in a similar fashion except that the numerator of the function was replaced with $d(1) - [d(2) + k]$, with $k = m(1) - m(2)$, where m(1) = mean sample ability in year 1, and m(2) = mean sample ability in year 2. The standard errors remained the same, as k was treated as a constant applicable to all values of d(2). Again, a positive value for D indicated that the item tended to be absolutely more difficult in year 1.
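The two forms of the standardized difference score can be sketched in a few lines. This is a minimal illustration rather than the program used in the study: the function names are hypothetical, the item values are illustrative inputs, and k = 0.25 is an assumed value for m(1) - m(2). The cut-off of 2 is the decision rule adopted in this study for individual items.

```python
import math


def standardized_difference(d1, se1, d2, se2, k=0.0):
    """Standardized difference score D for one item.

    d1, d2   : item difficulty estimates (logits) in year 1 and year 2
    se1, se2 : their standard errors of calibration
    k        : 0 for the relative comparison; m(1) - m(2), the difference
               in mean sample ability, for the absolute comparison
    """
    return (d1 - (d2 + k)) / math.sqrt(se1 ** 2 + se2 ** 2)


def has_changed(score, cutoff=2.0):
    """Flag an item whose |D| exceeds the cut-off."""
    return abs(score) > cutoff


# Illustrative inputs only (one item, two administrations).
d1, se1 = -2.26, 0.33
d2, se2 = -2.28, 0.30
k = 0.25  # hypothetical value of m(1) - m(2)

relative_d = standardized_difference(d1, se1, d2, se2)
absolute_d = standardized_difference(d1, se1, d2, se2, k)
print(f"relative D = {relative_d:.3f}, changed: {has_changed(relative_d)}")
print(f"absolute D = {absolute_d:.3f}, changed: {has_changed(absolute_d)}")
```

Because k is treated as a constant, it shifts only the numerator; the denominator is identical in the relative and absolute comparisons.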
Items were deemed to have changed in difficulty if the absolute value of the standardized difference score of the item was greater than 2. This procedure paralleled that of the reanalysis outlined in Chapter I, and allowed a comparison of conclusions reached using the Rasch approach as opposed to the traditional approach.

Testing Change in Content Area Difficulty

As was indicated earlier in the study there was a need for a reporting unit larger than the single item. Such reporting units, or content areas, should be small enough to provide useful curricular information yet large enough to avoid the overwhelming influence of just one or two widely fluctuating item difficulties. Tentative units, subject to modification through the deletion of non-fitting items, were established using the following procedure.

Ten categories of content were proposed by the author. The author, and two other persons experienced in the theory and practice of teaching mathematics, independently assigned items to the ten categories. Where all three persons agreed on the placement of an item, that designation was final. Agreement was reached immediately on 60 of the 89 items. By broadening or narrowing the descriptive title of five categories, the remaining items were assigned. In all cases, assignment required unanimous agreement by the three judges. The content areas and the items assigned to them are as follows (R refers to items on the reasoning test; C to items on the computation test):

1. Whole number concepts and operations (11 items)
   R: 14, 31, 41
   C: 1, 4, 5, 6, 7, 8, 9, 18
2. Applications using whole numbers (9 items)
   R: 1, 2, 4, 6, 15, 22, 28, 32
   C: 31
3. Common fraction concepts and operations (12 items)
   R: 33, 34, 44
   C: 10, 11, 12, 13, 15, 21, 22, 24, 36
4. Applications using common fractions (8 items)
   R: 8, 9, 11, 13, 16, 19, 21, 23
5. Decimals (11 items)
   R: 5, 17, 35, 37, 39
   C: 2, 3, 16, 17, 20, 32
6. Money (9 items)
   R: 3, 7, 10, 18, 20, 29, 36, 43
   C: 38
7. Percent (8 items)
   R: 25, 30, 42
   C: 14, 19, 35, 41, 43
8. Elementary algebra (9 items)
   R: 12, 26, 38, 40
   C: 23, 37, 39, 40, 42
9. Geometry and graphing (8 items)
   R: 27, 45
   C: 25, 26, 29, 30, 34, 44
10. Units of measure (4 items)
   R: 24
   C: 27, 28, 33

Change for each group was assessed by comparing the mean values of the constituent item difficulties on the two administrations. Since the Rasch procedure placed estimated item difficulties on an equal interval scale, the calculation of the mean values was psychometrically defensible. Each difficulty value contributing to the mean had its own associated standard error of calibration. Each item was treated as a stochastically independent unit as required by the third assumption of the model. Hence, the variance of the sum of item difficulties was assumed equal to the sum of the variances, and the variance of the mean was determined by dividing the sum of the variances by the number of items making up the group.
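The content-area comparison can be sketched as follows. This is an illustration under stated assumptions, not the computation actually run for the study. It uses the usual result for a mean of independent estimates, in which the variance of the mean is the sum of the item variances divided by the square of the number of items; that divisor, the pooling of the two standard errors as the root of the sum of squares, the function names, and the three-item data set are all assumptions of the sketch. The decision rule of two standard errors is the one used in this section.

```python
import math


def mean_and_se(difficulties, std_errors):
    """Mean difficulty of a content area and the standard error of that mean."""
    n = len(difficulties)
    mean = sum(difficulties) / n
    # Independent items: Var(sum) = sum of item variances,
    # so Var(mean) = Var(sum) / n**2 (assumed divisor, see lead-in).
    var_mean = sum(se ** 2 for se in std_errors) / n ** 2
    return mean, math.sqrt(var_mean)


def compare_area(diff1, se1, diff2, se2, k=0.0, n_se=2.0):
    """Compare mean difficulty of one content area across two administrations.

    k = 0 gives the unadjusted (relative) comparison; k = m(1) - m(2)
    gives the comparison adjusted for the difference in mean ability.
    """
    mean1, sem1 = mean_and_se(diff1, se1)
    mean2, sem2 = mean_and_se([d + k for d in diff2], se2)
    difference = mean1 - mean2
    se_difference = math.sqrt(sem1 ** 2 + sem2 ** 2)
    return difference, se_difference, abs(difference) > n_se * se_difference


# Hypothetical three-item content area (difficulties and standard errors in logits).
year1_diff, year1_se = [-0.82, -0.66, -1.30], [0.19, 0.18, 0.22]
year2_diff, year2_se = [-0.75, -0.38, -1.03], [0.17, 0.16, 0.19]

difference, se_difference, changed = compare_area(
    year1_diff, year1_se, year2_diff, year2_se
)
print(f"difference = {difference:.3f}, SE = {se_difference:.3f}, changed: {changed}")
```

Passing a non-zero k to compare_area gives the adjusted comparison without altering the standard errors, mirroring the treatment of k as a constant for individual items.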
If the absolute value of the difference of the means was greater than two standard errors of the difference of the means, it was concluded that change had occurred in the overall group difficulty, and in the direction indicated by the means. Two types of comparisons, paralleling those for individual items, were made. Unadjusted score comparisons yielded information on changing emphasis within the curriculum. Comparisons of scores adjusted for calibrated ability differences indicated trends across time.

CHAPTER IV

RESULTS

The structure of this chapter parallels that of Chapter III. In Chapter III the procedures for gathering and analyzing data were made explicit; in the present chapter the results of those procedures are given in detail. In order to interpret and assess the significance of the results, considerable discussion is included in this chapter. However, a general discussion of the model and its suitability for measuring change is reserved for Chapter V, the final chapter.

Verification of Data

For 1964 and 1970 a comparison was made of the computer-generated matrix of 1's and 0's resulting from the rescoring of the tests, with the Department's tally sheet of 1's and blanks. It was found that the markers in 1964 had made 35 scoring errors (12 on one examinee), and two tabulation errors had been made on the tally sheet, for an overall error rate of 0.14%.
A check of the original booklets revealed that all the scoring errors consisted of the acceptance of the correct response on a multiply-marked item. A similar comparison for 1970 showed a total of six Departmental scoring errors.

In 1979, the responses had been commercially keypunched onto computer cards. A 10% random sample of thirty test papers selected from the 1979 sample yielded no discrepancies between keypunched responses and actual responses. From these results it was assumed that the keypunching firm's guarantee of accuracy was confirmed.

The Deletion of Persons

In order to eliminate subjects for whom the test appeared to be too long, it had been decided to remove those individuals who had omitted ten or more items at the end of any of the three timed portions of the tests. As a result, four persons were removed from the 1964 data base, twelve from that of 1970, and twelve from that of 1979. All subsequent analyses were therefore based upon samples of 296, 288, and 288 for the years 1964, 1970, and 1979, respectively.

Summary Raw Score Statistics

Summary statistics for the raw scores in the three samples are shown in Table 4.1. The tests contained 45 items on the reasoning test and 44 items on the computation test. There is a consistent decline in the mean score and a consistent increase in the variability. The reliability of the tests is consistently high.

Table 4.1
Summary Raw Score Statistics

        Reasoning Test          Computation Test        Total Test
Year    Mean   S.D.   Rel(1)    Mean   S.D.   Rel(1)    Mean   S.D.   Rel(2)
1964    31.31  6.39   0.83      30.69  6.08   0.83      62.00  11.66  0.85
1970    30.63  6.82   0.84      27.98  6.74   0.85      58.61  12.75  0.87
1979    28.03  7.95   0.88      25.03  6.84   0.84      53.06  14.07  0.89

(1) Hoyt estimate of reliability
(2) Cronbach's composite alpha

Tests of Unidimensionality

One of the key requirements of the study was to demonstrate that the reasoning and computation tests measured the same ability. Three procedures were used to investigate the unidimensionality of the two tests.

As part of the initial data analysis, the Pearson product-moment correlation coefficient between reasoning and computation raw scores was determined for each sample. The values were 0.745, 0.769, and 0.809 for the years 1964, 1970, and 1979, respectively. Corrected for attenuation, the coefficients were 0.898, 0.910, and 0.941, respectively. Using appropriate procedures (Glass & Stanley, pp. 308-310; Forsyth & Feldt, 1969), all six coefficients were found to be different from zero at the 0.001 level of significance. Thus, scores achieved by individuals on the two tests in any given year were highly positively correlated.

It must be noted that the correlation coefficients were based on raw scores rather than on Rasch abilities. An inspection of the sample test characteristic curve in Appendix C shows the transformation of raw scores to Rasch ability scores in the interval from -2 to +2 logits to be approximately linear. Generally, virtually no student abilities fell below -2 logits, and only about 10% were above +2 logits.
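The corrected coefficients quoted above are consistent with the standard correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below applies that formula to the observed correlations and the Hoyt reliabilities of Table 4.1; the function name is hypothetical, and the choice of the Hoyt estimates as the reliabilities entering the correction is an assumption, although it reproduces the reported values to three decimals.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# year: (observed r, reasoning reliability, computation reliability)
observed = {
    1964: (0.745, 0.83, 0.83),
    1970: (0.769, 0.84, 0.85),
    1979: (0.809, 0.88, 0.84),
}

corrected = {
    year: disattenuate(r, rel_r, rel_c)
    for year, (r, rel_r, rel_c) in observed.items()
}
for year, value in corrected.items():
    print(year, f"{value:.3f}")
```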
Hence, if Rasch abilities were used instead of raw scores, the significance of the correlation coefficients should not be materially diminished.

The second test for unidimensionality consisted of examining the non-fitting items when all 89 items were calibrated as a single group for each sample. It had been argued that, if the misfitting items were not predominantly reasoning items or predominantly computation items, it could be assumed that the two types of item measured the same ability.

The computer program BICAL was used to calibrate the items in each sample. All 89 items were included and, in each case, the minimum score of 32 was used to eliminate examinees near the guessing level. In all three cases, visual inspection of the output of items ordered from best to worst fit mean square showed no discernible separation between reasoning (R) and computation (C) items. Figure 4.1 shows the pattern of the worst-fitting 20% of the items for each year. Reading from left to right the items become more ill-fitting. The sequence of reasoning and computation items appeared to be randomly ordered.

Year    Worst-Fitting 20% of Items
1964    CCCCCRRRCRCRRRRCCC
1970    RCRRRCRCCRRRCRCCCR
1979    CRRCCRRCRRRCCCRCCR

Figure 4.1. Patterns of ill-fitting items.

The outcome may have resulted from the nearly equal numbers of items from each category. Had, for example, the tests consisted of 79 straightforward computations and 10 word problems, the word problems may have shown lack of fit as a unit. That is, the reading requirement in word problems may have ordered examinees in a different manner than the bulk of the remaining items.
Nevertheless, for the tests analyzed, no distinction was apparent between the two nominally different types of items. The results were accepted as a preliminary demonstration of unidimensionality.

The final test for unidimensionality consisted of comparing standardized difference scores with the unit normal distribution. The first step in the process was the determination of each person's ability on the reasoning test and the computation test. Table 4.2 shows the mean ability estimate on each test for each year. Each person was then assigned an adjusted difference score. For example, in 1970, the score was determined as: [reasoning ability] - [computation ability] - 0.22. Finally, each difference score was divided by the pooled standard error for the two originally determined abilities. The two-tailed probability level for rejection of the null hypothesis of unit normality had been set at 0.01. The test used was the Kolmogorov-Smirnov (K-S) goodness of fit test. Table 4.3 shows the results for the three years.

Table 4.2
Mean Ability Estimates on the Reasoning and Computation Tests

                       Year
Test            1964    1970    1979
Reasoning       1.17    1.06    0.70
Computation     1.23    0.84    0.41
(difference)   -0.06    0.22    0.29

Table 4.3
Distributions of Standardized Difference Scores on the Reasoning and Computation Tests

Year    Mean    S.D.    Skew     Kurt     K-S Z    P
1964    0.02    1.18     0.04    -0.07    1.28     0.078
1970    0.02    1.20    -0.17    -0.41    1.42     0.035
1979    0.00    1.14    -0.25    -0.16    1.21     0.108

In no case did the skewness or the kurtosis differ from zero by more than two of its own standard deviations.
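The procedure described above can be sketched in plain Python. Several points are assumptions of the sketch rather than details confirmed by the text: the pooled standard error is taken as the square root of the sum of the two squared standard errors, the K-S Z statistic is taken as the square root of n times the largest gap between the empirical distribution and the unit normal, and the probability is obtained from the asymptotic Kolmogorov series. The function names are hypothetical. Applied to the Z values of Table 4.3, the series gives probabilities close to those tabled, with small discrepancies attributable to the rounding of Z.

```python
import math


def standardized_differences(reasoning, computation, se_r, se_c, mean_shift):
    """Adjusted difference in ability divided by an assumed pooled standard error."""
    return [
        (br - bc - mean_shift) / math.sqrt(sr ** 2 + sc ** 2)
        for br, bc, sr, sc in zip(reasoning, computation, se_r, se_c)
    ]


def normal_cdf(x):
    """Cumulative distribution function of the unit normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_z(values):
    """K-S Z: sqrt(n) times the largest gap between the empirical CDF and the unit normal."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x)
        gap = max(gap, (i + 1) / n - cdf, cdf - i / n)
    return math.sqrt(n) * gap


def ks_p_value(z, terms=100):
    """Asymptotic two-tailed probability for a given K-S Z."""
    if z < 0.05:
        return 1.0  # the series converges slowly here; the probability is essentially 1
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
    return min(1.0, max(0.0, 2.0 * total))


# Z values as reported in Table 4.3.
for year, z in [(1964, 1.28), (1970, 1.42), (1979, 1.21)]:
    print(year, f"{ks_p_value(z):.3f}")
```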
The main contributing factor to the departure from normality appeared to be the standard deviation of the difference scores; the greater the divergence from unity, the lower the probability of unit normality. Nevertheless, in no case was the hypothesis of unit normality rejected at the predetermined 0.01 level of significance.

While the requirements of the study were met, the probability levels were quite low. Several alternative procedures were followed to see if the concordance between ability estimates could be improved.

In the first attempt at improving the agreement between the ability estimates, the non-fitting items on each test in 1964 were eliminated from the analysis. In particular, Items 27, 29, and 45 were removed from the reasoning test and Items 2, 5, 33, and 40 from the computation test. The criterion for removal was a fit mean square of four or more standard errors from unity and a discrimination index less than 0.70. The BICAL program was re-run and the standardized differences of ability scores were recalculated. The probability for rejection of the hypothesis of unit normality was 0.110, as compared with 0.078 in the first instance. The same procedure carried out on the 1970 data base resulted in the elimination of Items 4, 27, 29, 43, and 45 from the reasoning test and Items 5, 33, 39, 40, and 43 from the computation test. The probability for rejection of the hypothesis of unit normality was 0.034, as opposed to 0.035 in the former analysis.
The results in these two cases demonstrate the robustness of the ability estimates. They tend to confirm the conclusions reached in Chapter II that recalibration likely has little effect on the estimates of person abilities. Support for this position may be found in Wright and Douglas (1977b) where, in simulated runs, random disturbances in item calibrations could be as large as 1 logit before distortions in estimates of abilities reached 0.1 logit.

A second alternative for improving the concordance between ability estimates was explored. Although the calibration procedure eliminates low-scoring examinees for purposes of item calibration, the ability estimates are given for all persons in the sample. Thus the mean ability estimate includes persons scoring below the chance level. It seemed plausible that the overall K-S probability should improve if persons scoring at the chance level on one or both tests were excluded from the sample. The 1970 data base was edited to remove such persons, and the mean ability estimates were recalculated. As a result, 15 persons were deleted from the sample. The reanalysis on 273 subjects produced a probability for rejecting the hypothesis of unit normality of 0.050. It was felt that the improvement from 0.035 was not important, and the reanalysis was not carried out on the remaining samples. The lack of substantial improvement may have been due, in part, to the small number of examinees eliminated from the sample.
Each of the three procedures tended to confirm the commonality of the ability trait on the two tests. This result, however, is applicable only to the very specific sets of items assigned by the Stanford Achievement Test developers to the categories "arithmetic reasoning" and "arithmetic computation". It was pointed out in Chapter I that at least one observer had noted an apparent overlap in content. The results may have been different had the two tests contained items more truly representative of the two descriptors.

In summary, the standardized difference scores appeared to be stable. Neither the deletion of non-fitting items nor that of persons scoring below the guessing level had a major effect on the distribution of scores. The hypothesis of unit normality was not rejected at the 0.01 level of significance for any of the three samples. The assumption that the abilities measured by the two tests were indistinguishable was considered to be upheld.

Item Calibration

With the assumption of unidimensionality accepted, the two tests were combined and treated as a single test for the remainder of the analysis. For each year the minimum acceptable score was set at 32. BICAL runs showed no zero scores and no perfect scores on any of the three samples. The size of each calibration sample is shown in Table 4.4. The use of a cut-off score to eliminate examinees scoring below the guessing level did not seriously affect the sample size for calibration.
Generally, the tests were easy for the students in each administration, producing few low raw scores. The most serious case was the initial calibration of the items in 1979 when 25 out of the 288 examinees were removed. Nevertheless, this accounted for less than 10% of the sample.

Table 4.4
Number of Subjects in Each Calibration

Year    Total Number    Subjects Scoring    Subjects in
        of Subjects     Less Than 32        Calibration
1964    296              4                  292
1970    288              7                  281
1979    288             25                  263

As Wright (1977b) points out, the standard error of the item calibration is dominated by the reciprocal of the square root of sample size. In this instance, the minimum standard errors in the years 1964, 1970, and 1979 were 0.126, 0.129, and 0.133, based on samples of 292, 281, and 263, respectively. Rounded to two decimal places, the standard errors are indistinguishable. The fit mean square standard error was 0.08 in 1964, 0.08 in 1970, and 0.09 in 1979.

The items for which the fit mean square (FMS) was four or more standard errors greater than unity on the 1964, 1970, and 1979 analyses are shown in Table 4.5 (R = reasoning, C = computation). Their discrimination values (Disc) are also indicated in the same table.

The items marked with an asterisk (*) in Table 4.5 demonstrated lack of fit on two out of the three administrations. They accounted for 6 of the 8 items not meeting the fit mean square criterion in 1964, 6 out of 7 in 1970, and 4 out of 5 in 1979. Eight items in total were deemed not to fit the model. The characteristics of these items are given in Table 4.6.
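The dependence of the calibration standard error on sample size can be checked informally. The sketch below assumes that the standard error varies exactly as the reciprocal of the square root of the calibration sample size, and scales the 1964 minimum standard error (0.126 on 292 subjects) to the 1970 and 1979 sample sizes; the function name is hypothetical. The scaled values fall within about 0.001 of the reported 0.129 and 0.133, which is consistent with sample size being the dominant term.

```python
import math


def scaled_se(reference_se, reference_n, n):
    """Standard error rescaled to sample size n, assuming SE varies as 1 / sqrt(n)."""
    return reference_se * math.sqrt(reference_n / n)


# Reference: minimum standard error of 0.126 on the 1964 calibration sample of 292.
scaled = {year: scaled_se(0.126, 292, n) for year, n in [(1970, 281), (1979, 263)]}
for year, value in scaled.items():
    print(year, f"{value:.4f}")
```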
Table 4.5
Items Not Meeting Fit Mean Square Criterion

        1964                    1970                    1979
Item    Disc    FMS     Item    Disc    FMS     Item    Disc    FMS
C5*     0.47    1.32    R27*    0.11    1.32    R33     1.02    1.41
R27*    0.15    1.35    C43*    0.49    1.33    C43*    0.28    1.48
R29*    0.19    1.38    R45*    0.11    1.34    C40*    0.17    1.59
R45*    0.18    1.40    C33*    0.08    1.37    C5*     0.35    1.74
R4*     0.59    1.43    C40*    0.49    1.40    R4*     0.53    1.74
C33*    0.07    1.45    C7      0.49    1.42
C2      0.49    1.45    R29*    0.33    1.49
C1      0.79    1.59

*: items showing lack of fit on two of three administrations

Table 4.6
Characteristics of Non-Fitting Items

        1964                     1970                     1979
Item    Diff    Disc    FMS      Diff    Disc    FMS      Diff    Disc    FMS
R4      -0.92   0.59    1.43     -1.28   0.69    1.16     -1.34   0.53    1.74
R27      0.56   0.15    1.35      0.93   0.11    1.32      0.38   0.45    1.24
R29      2.37   0.19    1.38      2.44   0.33    1.49      2.07   0.58    1.26
R45      2.06   0.18    1.40      2.06   0.11    1.34      2.12   0.35    1.25
C5      -1.33   0.47    1.32     -1.04   0.55    1.12     -1.48   0.35    1.74
C33      0.29   0.07    1.45      0.61   0.08    1.37      0.07   0.09    1.30
C40      3.33   0.50    1.28      3.20   0.49    1.40      2.93   0.17    1.59
C43      2.60   0.98    1.05      2.16   0.49    1.33      2.43   0.28    1.48

After deletion of the eight items the set of remaining items was recalibrated for each year using a minimum acceptable score of 27. In those three years there were 4, 5, and 14 subjects scoring below 27, resulting in calibration samples of 292, 283, and 274, respectively. The fit mean square standard errors remained unchanged. Two more items demonstrated lack of fit on two of the three calibrations, as shown in Table 4.7. Both items were of high difficulty and both demonstrated consistent lack of fit. Both were in the 20% of the original items having the highest fit mean square on each administration.
Table 4.7
Non-Fitting Items on Recalibration

        1964                     1970                     1979
Item    Diff    Disc    FMS      Diff    Disc    FMS      Diff    Disc    FMS
R43     2.26    0.36    1.34     1.74    0.27    1.36     1.82    0.34    1.36
C39     2.53    0.59    1.28     2.20    0.44    1.33     1.95    0.44    1.38

The two non-fitting items were deleted from the analysis and the remaining 79 items were recalibrated. The minimum acceptable score was set at 26. This resulted in the deletion of 3, 5, and 13 subjects, leaving 293, 283, and 275 in the calibration sample for each year. Fit mean square standard errors were unchanged. The ill-fitting items for each year are shown in Table 4.8. The only item showing lack of fit on two administrations was C13. In each instance the fit mean square was close to the critical value. In 1970 and 1979, the critical fit mean square values were 1.32 and 1.36; the values for Item C13 were 1.32 and 1.37. It was considered likely that the deletion of this item would bring about only marginal improvement in calibration, and the deletion process was stopped.

Table 4.8
Ill-Fitting Items in Final Calibration

        1964                     1970                     1979
Item    Diff    Disc    FMS      Diff    Disc    FMS      Diff    Disc    FMS
R8                               -0.75   0.64    1.42
R28      2.12   0.14    1.37
R33                                                       -1.47   0.55    1.48
C2      -1.12   0.60    1.92
C4      -3.20   0.61    1.33
C7                               -0.84   0.51    1.48
C13                               1.03   0.39    1.32      0.17   0.51    1.37
C21     -1.25   0.67    1.39

In summary, the ten items deleted were R4, R27, R29, R43, R45, and C5, C33, C39, C40, C43.
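The deletion rule applied in the successive calibrations can be expressed compactly. The sketch below is an illustration, not the BICAL procedure itself: the function names and data layout are hypothetical, and a small tolerance is added so that a fit mean square printed as exactly equal to the critical value counts as meeting it. The critical value is unity plus four fit mean square standard errors (1.32, 1.32, and 1.36 for the three years), and an item is classed as non-fitting when it fails both parts of the criterion on at least two administrations. The sample records are taken from Tables 4.5 and 4.6.

```python
def fails_fit(fms, disc, fms_se, n_se=4.0, disc_cutoff=0.70):
    """True when both parts of the criterion are met on one administration."""
    critical = 1.0 + n_se * fms_se
    # Small tolerance so that a value printed as equal to the critical value counts.
    return fms >= critical - 1e-9 and disc < disc_cutoff


def is_non_fitting(records, fms_se_by_year, min_years=2):
    """records maps year -> (discrimination, fit mean square) for one item."""
    failures = sum(
        1
        for year, (disc, fms) in records.items()
        if fails_fit(fms, disc, fms_se_by_year[year])
    )
    return failures >= min_years


FMS_SE = {1964: 0.08, 1970: 0.08, 1979: 0.09}

items = {
    "R4": {1964: (0.59, 1.43), 1970: (0.69, 1.16), 1979: (0.53, 1.74)},
    "C43": {1964: (0.98, 1.05), 1970: (0.49, 1.33), 1979: (0.28, 1.48)},
    "C1": {1964: (0.79, 1.59)},  # only the 1964 values are tabled for this item
}

flags = {name: is_non_fitting(rec, FMS_SE) for name, rec in items.items()}
print(flags)
```

Item C1 shows the effect of the second part of the rule: its fit mean square exceeds the critical value in 1964, but its discrimination of 0.79 is above the cut-off, so that administration is not counted as a failure.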
The value of a two-part criterion for assessing the fit of items to the model was borne out by an inspection of Table 4.5. Items were designated as non-fitting if, on two of the three administrations, they demonstrated a fit mean square four or more standard errors greater than unity, and a discrimination index less than 0.7. In Table 4.5, it can be seen that the same eight items would have been deemed non-fitting had only the fit mean square criterion been used. This is explained by the high negative correlation between fit mean square and discrimination (1964: -0.66; 1970: -0.90; 1979: -0.83). However, had such a single criterion been adopted, on recalibration, three more items, including the two deleted using the two-part criterion, would have been deleted. This, in turn, would likely have led to the deletion of a further three items on the second recalibration, with more deletions possible. Thus, the second part of the criterion formed a valuable check on the deletion process.

The problem of using a single criterion was encountered by Fryman (1976), who adopted the criterion of a chi-square probability less than or equal to 0.05 for rejection of an item. Applying this to an existing Mathematics Placement test of 100 items, and using a computer program different from BICAL, he eliminated 13 items on the first calibration, 18 on the second, 12 on the third, 7 on the fourth, and 8 on the fifth.
At this point he stopped since his stated intent was to develop an instrument which could be completed in a maximum of one hour. In his conclusion, Fryman cited Kifer, who suggested using an additional criterion based on the slope (discrimination). That suggestion results in a criterion similar to the one used in the present study.

A useful statistic which might have given objective evidence for the improvement in the estimates of parameters after recalibration is the overall chi-square statistic described in Chapter II. This statistic compares the observed and expected values across the entire raw score/item matrix. In the documentation for the program BICAL, Wright and Mead (1978) indicate how the statistic can be constructed. It is regrettable that they did not include it in their program.

The Rasch model appears to be suitable for detecting test items whose psychometric properties are suspect. With one exception, the non-fitting items consistently demonstrated high fit mean square and low discrimination. The only item which appeared to fit well in one year but not the others was C43; in 1964, both fit mean square and discrimination were quite acceptable.

It must be kept in mind that the criterion deleted items showing generally poor test characteristics on all three administrations. There still remained in the analysis those items showing lack of fit on a single administration. For example, in 1964, Items R28, C2, C4, and C21 did not meet fit criteria. In some cases the reason for the singularity of lack of fit was evident.
For example, in 1964, Item C4 was answered correctly by 97.3% of the examinees, providing little scope for any discriminating power. On the other hand, for R28, a difficult item, there was evidence of consistent guessing across all three years, but only in 1964 did the figures exceed the criterion.

Six of the ten deleted items were high difficulty items. The explanation of this fact may lie in the tendency of students to guess on such items. An inspection of some of the deleted items yielded possible explanations for lack of fit. For example, Item C33 was: Add 15 m. 8 cm., 4 m. 5 cm. One would expect the item to show little discrimination between high ability and low ability students, since the correct answer could be obtained by adding the component parts, without any knowledge of the conversion factor from metres to centimetres. Item C40, on the other hand, turned out to be one of the two most difficult items on the test. It required the multiplication of +4a and -3, an inappropriate item for Grade 7 students in British Columbia, since operations on integral algebraic expressions have never been part of the curriculum at that level. Results no better than chance could be expected, and were obtained. On the other hand, the reason for deletion of Item R4 remains obscure. There appears to be no obvious flaw in: "Dot's mother is going to buy tomato plants to set out. There are to be 14 rows with 18 plants in each row. How many plants will be needed?"

On the initial calibration, none of the items showed lack of fit on all three administrations. Six of the eight non-fitting items showed lack of fit on two successive administrations.
This may indicate changing trends in curriculum related to each item. For example, C43, a difficult question on simple interest, showed consistently increasing fit mean square and decreasing discrimination across time. This may reflect a move away from teaching this topic.

Final item difficulties generally were located in the interval from -3.0 to +3.0 logits. Only one or two items in each calibration fell outside these limits. This was to be expected since the initial difficulty estimates were set at ln[%incorrect/%correct]. For a difficulty value of +3.0, for example, approximately 95% of the responses would have to be incorrect, and this is roughly the upper limit for the usual standardized test. Item difficulties and their standard errors are shown in Table 4.9.

Table 4.9
Item Difficulties and Standard Errors

               1964                     1970                     1979
Item     Diff    Std    Adj       Diff    Std    Adj       Diff    Std    Adj
                 Err    Diff              Err    Diff              Err    Diff

R1      -2.26   0.33  -2.26     -2.28   0.30  -2.03     -2.63   0.30  -1.94
R2      -2.37   0.34  -2.37     -2.19   0.30  -1.94     -1.53   0.20  -0.84
R3      -2.79   0.42  -2.79     -2.37   0.31  -2.12     -1.96   0.23  -1.27
R5      -1.16   0.21  -1.16     -1.72   0.24  -1.47     -1.42   0.19  -0.73
R6      -1.45   0.23  -1.45     -1.97   0.26  -1.72     -1.69   0.21  -1.00
R7      -0.25   0.16  -0.25     -0.45   0.16  -0.20     -0.48   0.15   0.21
R8      -0.82   0.19  -0.82     -0.75   0.17  -0.50     -0.83   0.16  -0.14
R9      -0.66   0.18  -0.66     -0.38   0.16  -0.13     -0.27   0.14   0.42
R10     -0.32   0.16  -0.32     -1.00   0.19  -0.75     -0.99   0.17  -0.30
R11     -1.30   0.22  -1.30     -1.03   0.19  -0.78     -0.93   0.16  -0.24
R12     -0.32   0.16  -0.32     -0.69   0.17  -0.44     -0.50   0.15   0.19
R13     -0.72   0.18  -0.72     -0.93   0.18  -0.68     -0.29   0.14   0.40
R14      0.71   0.13   0.71      0.51   0.14   0.76     -0.19   0.14   0.50
R15     -0.13   0.15  -0.13     -0.19   0.15   0.06      0.13   0.14   0.82
R16     -0.20   0.16  -0.20      0.04   0.14   0.29      0.68   0.13   1.37
R17     -0.35   0.16  -0.35     -0.53   0.16  -0.28     -0.29   0.14   0.40
R18     -0.15   0.15  -0.15     -0.06   0.15   0.19     -0.44   0.15   0.25
R19      0.41   0.14   0.41      0.22   0.14   0.47      0.24   0.14   0.93
R20      0.46   0.14   0.46      0.40   0.14   0.65      0.11   0.14   0.80
R21      0.00   0.15   0.00     -0.24   0.15   0.01     -0.10   0.14   0.59
R22     -0.38   0.16  -0.38      0.40   0.14   0.65      0.48   0.13   1.17
R23     -0.20   0.16  -0.20     -0.28   0.15  -0.03      0.07   0.14   0.76
R24      1.86   0.13   1.86      1.47   0.13   1.72      1.11   0.13   1.80
R25      1.39   0.13   1.39      1.30   0.13   1.55      1.40   0.14   2.09
R26      0.98   0.13   0.98      0.79   0.13   1.04      0.66   0.13   1.35
R28      2.12   0.13   2.12      1.68   0.13   1.93      1.04   0.13   1.73
R30      2.09   0.13   2.09      2.16   0.14   2.41      2.19   0.16   2.88
R31     -0.85   0.19  -0.85     -1.61   0.23  -1.36     -2.07   0.24  -1.38
R32      0.64   0.13   0.64      0.42   0.14   0.67     -0.31   0.14   0.38
R33     -0.51   0.17  -0.51     -1.47   0.22  -1.22     -1.19   0.18  -0.50
R34      1.68   0.13   1.68      1.18   0.13   1.43      0.92   0.13   1.61
R35     -0.57   0.17  -0.57      0.42   0.14   0.67      0.47   0.13   1.16
R36      0.64   0.13   0.64      0.42   0.14   0.67      0.22   0.14   0.91
R37      0.69   0.13   0.69      0.66   0.13   0.91     -0.50   0.15   0.19
R38     -0.15   0.15  -0.15     -1.00   0.19  -0.75     -0.46   0.15   0.23
R39      0.27   0.14   0.27      0.42   0.14   0.67      0.11   0.14   0.80
R40      1.36   0.13   1.36      0.83   0.13   1.08      0.95   0.13   1.64
R41      1.90   0.13   1.90      1.12   0.13   1.37      0.48   0.13   1.17
R42      0.37   0.14   0.37      0.33   0.14   0.58     -0.29   0.14   0.40
R44      1.86   0.13   1.86      1.30   0.13   1.55      1.40   0.14   2.09

C1      -2.79   0.42  -2.79     -1.67   0.23  -1.42     -1.56   0.20  -0.87
C2      -1.12   0.21  -1.12     -1.78   0.24  -1.53     -2.07   0.24  -1.38
C3      -1.56   0.24  -1.56     -1.29   0.20  -1.04     -1.28   0.18  -0.59
C4      -3.20   0.51  -3.20     -3.42   0.51  -3.17     -2.93   0.34  -2.24
C6      -0.82   0.19  -0.82     -1.61   0.23  -1.36     -1.53   0.20  -0.84
C7      -0.69   0.18  -0.69     -0.84   0.18  -0.59     -0.73   0.16  -0.04
C8      -1.30   0.22  -1.30     -1.25   0.20  -1.00     -1.04   0.17  -0.35
C9      -0.85   0.19  -0.85     -0.69   0.17  -0.44     -0.64   0.15   0.05
C10     -1.40   0.23  -1.40     -0.90   0.18  -0.65     -0.27   0.14   0.42
C11     -0.51   0.17  -0.51      0.22   0.14   0.47      0.54   0.13   1.23
C12     -1.25   0.22  -1.25     -0.26   0.15  -0.01     -0.96   0.17  -0.27
C13      0.33   0.14   0.33      1.03   0.13   1.28      0.17   0.14   0.86
C14      0.31   0.14   0.31      1.10   0.13   1.35      0.64   0.13   1.33
C15     -0.72   0.18  -0.72     -0.69   0.17  -0.44     -0.29   0.14   0.40
C16      0.03   0.15   0.03      0.79   0.13   1.04      0.17   0.14   0.86
C17      0.48   0.14   0.48      0.22   0.14   0.47      0.17   0.14   0.86
C18      0.00   0.15   0.00      0.95   0.13   1.20      0.33   0.13   1.02
C19      0.27   0.14   0.27      0.96   0.13   1.21      0.87   0.13   1.56
C20      0.05   0.15   0.05      0.91   0.13   1.16      0.45   0.13   1.14
C21     -1.25   0.22  -1.25     -1.67   0.23  -1.42     -1.04   0.17  -0.35
C22      0.46   0.14   0.46      0.88   0.13   1.13      1.36   0.14   2.05
C23      0.07   0.15   0.07     -0.87   0.18  -0.62     -0.53   0.15   0.16
C24      1.21   0.13   1.21      1.25   0.13   1.50      1.22   0.14   1.91
C25      0.11   0.15   0.11     -0.28   0.15  -0.03     -0.53   0.15   0.16
C26      1.05   0.13   1.05      1.03   0.13   1.28      0.97   0.13   1.66
C27      0.23   0.14   0.23      0.90   0.13   1.15      2.36   0.16   3.05
C28      0.35   0.14   0.35      0.79   0.13   1.04      1.98   0.15   2.67
C29      0.25   0.14   0.25     -0.02   0.15   0.23     -0.50   0.15   0.19
C30     -0.57   0.17  -0.57     -0.78   0.17  -0.53     -0.88   0.16  -0.19
C31     -0.54   0.17  -0.54      0.16   0.14   0.41      0.40   0.13   1.09
C32      1.37   0.13   1.37      1.58   0.13   1.83      0.88   0.13   1.57
C34      1.93   0.13   1.93      2.68   0.16   2.93      2.00   0.15   2.69
C35     -0.06   0.15  -0.06      0.16   0.14   0.41      0.48   0.13   1.17
C36     -2.06   0.30  -2.06     -1.97   0.26  -1.72     -1.45   0.19  -0.76
C37      2.05   0.13   2.05      0.50   0.14   0.75      1.71   0.14   2.40
C38      2.17   0.13   2.17      2.45   0.15   2.70      2.09   0.15   2.78
C41      1.70   0.13   1.70      1.75   0.13   2.00      1.75   0.14   2.44
C42      1.32   0.13   1.32      0.90   0.13   1.15      1.08   0.13   1.77
C44      3.41   0.17   3.41      3.82   0.22   4.07      3.36   0.22   4.05

For each calibration the standard error of item difficulty ranged approximately from 0.130 to 0.400. On each test, items nearest in difficulty to the mean ability were calibrated with the lowest standard error. The effect of changing mean abilities was not great. For example, an item of average difficulty on each of the three administrations had a standard error of 0.148, 0.145, and 0.137 in 1964, 1970, and 1979. The constant decrease in standard error reflects the movement of the mean abilities toward the centre of the difficulty/ability scale.

Final summary statistics for abilities based on the 79 remaining items are given in Table 4.10.

Table 4.10
Summary Statistics for Abilities

Year     Mean    S.D.
1964     1.39    0.93
1970     1.14    0.96
1979     0.70    1.00

The decline in the mean raw scores on the test is reflected in the changing mean abilities. In 1964, the mean ability was 1.39, considerably above the zero point of the scale. By 1979, the mean ability score had declined to 0.70. In Rasch terms, this meant that, when confronted with any item, the odds on success for the average 1964 student were twice as great as those for the average 1979 student. This is arrived at by determining the value of e raised to the power 1.39 - 0.70 = 0.69, which is approximately 2. It is coincidental that one mean ability value also happens to be twice the other.

Because mean abilities were considerably greater than zero, the tests resulted in larger standard errors of person measurement for most examinees than would have been the case had the tests been more difficult.
That is, if the mean abilities had been centred around zero, the measurement of person abilities would have been more precise. However, the differences were not great: for the worst case, in 1964, the standard error for the average student was 0.29 as compared with a possible 0.26.

Changes in Item Difficulty

For each pair of years two standardized difference values for each item were calculated: one reflected relative difficulty change within the set of items, and the second reflected absolute change of difficulty across time. For changes in relative difficulty the standardized difference score was determined by subtracting the two difficulty estimates and dividing by the pooled standard error of the difficulty estimates. The resulting distributions and Kolmogorov-Smirnov statistics are shown in Table 4.11. In all cases the hypothesis of unit normality was rejected at the 0.01 level of significance. Thus, for all comparisons, the omnibus test indicated that the relative difficulty levels of some items had changed.

Table 4.11
Distributions of Standardized Scores Related to Relative Change in Item Difficulty

Comparison     Mean    S.D.    Skew    Kurt    K-S Z      p
1964-1970      0.00    2.38   -0.15    0.70     1.71    0.006
1970-1979     -0.10    2.30    0.34    1.22     1.85    0.002
1964-1979     -0.13    3.03    0.45    1.24     2.30    0.000

In Table 4.11, the negative mean standardized scores indicate that more items were relatively easier in 1979 than in 1964 or 1970.
However, since the difficulties are centred on their own mean for each year, the algebraic sum of the shifts in difficulty for each year must be zero. This would imply that, although fewer items were relatively more difficult, their average change in difficulty was greater than that for the items which had become easier. For example, there was a sharp increase in the relative difficulty of Items C27 and C28 unmatched by a similar shift for any easier items.

The criterion for deciding that a particular item had changed in difficulty was a standardized difference value whose absolute value exceeded two. Items which changed in relative difficulty in at least one comparison are shown in Table 4.12. A plus (+) indicates that the item had become more difficult in the latter year, a minus (-) indicates less difficulty, and a zero (0) indicates no change. Summary statistics for changes in relative item difficulty are given in Table 4.13.

Table 4.12
Items Changing in Relative Difficulty

Item   1964-70  1970-79  1964-79        Item   1964-70  1970-79  1964-79

R2        0        0        +           C1        +        0        +
R10       -        0        -           C2        -        0        -
R13       0        +        0           C6        -        0        -
R14       0        -        -           C10       0        +        +
R16       0        +        +           C11       +        0        +
R22       +        0        +           C12       +        -        0
R24       -        0        -           C13       +        -        0
R28       -        -        -           C14       +        -        0
R31       -        0        -           C16       +        -        0
R32       0        -        -           C18       +        -        0
R33       -        0        -           C19       +        0        +
R34       -        0        -           C20       +        -        0
R35       +        0        +           C21       0        +        0
R36       0        0        -           C22       +        +        +
R37       0        -        -           C23       -        0        -
R38       -        +        0           C25       0        0        -
R40       -        0        -           C27       +        +        +
R41       -        -        -           C28       +        +        +
R42       0        -        -           C29       0        -        -
R44       -        0        -           C31       +        0        +
                                        C32       0        -        -
                                        C34       +        -        0
                                        C35       0        0        +
                                        C37       -        +        0
                                        C42       0        0        +

+: items increasing in relative difficulty in latter year
-: items decreasing in relative difficulty in latter year
0: items showing no change in relative difficulty

Table 4.13
Number of Relative Difficulty Changes

Comparison     Easier in      Harder in
               Latter Year    Latter Year
1964-1970          14             16
1970-1979          15              9
1964-1979          20             14

For the omnibus test of absolute change in difficulty a procedure similar to that outlined for relative difficulty was followed. In this instance, however, the difficulty of each item was adjusted to the 1964 scale by adding the difference between the mean sample abilities. To the difficulty levels of the 1970 items the value of 0.25 was added to compensate for the decreased level of ability of the 1970 sample. To the 1979 difficulty levels, 0.69 was added. These values were obtained by subtracting the mean abilities shown in Table 4.10 for the years in the comparison (1.39 - 1.14 = 0.25 and 1.39 - 0.70 = 0.69). The adjusted difficulties are shown in Table 4.9. Table 4.14 shows the results of the analysis on these adjusted difficulties.

Table 4.14
Distributions of Standardized Scores Related to Absolute Change in Item Difficulty

Comparison     Mean    S.D.    Skew    Kurt    K-S Z      p
1964-1970      1.11    2.40   -0.04    0.56     3.10    0.000
1970-1979      1.91    2.26    0.57    1.60     5.21    0.000
1964-1979      2.97    2.97    0.68    1.52     6.05    0.000

The null hypothesis of unit normality was rejected at the 0.001 level of significance in all three comparisons.
Hence, in all comparisons, the absolute difficulty levels of some items had changed.

In Table 4.14, the mean values indicate a constant trend toward increasing difficulty. There is no doubt that the distributions reflect more than just random error in difficulty calibrations. In this case, there are no constraints on the movement of item difficulties as there were for relative difficulty. The difficulty values incorporate both change in relative difficulty due to changing emphasis within the curriculum and the effect of the changing ability of the samples.

To determine absolute changes in item difficulty, again the criterion of a standardized difference value whose absolute value exceeded 2 was used. Items changing in absolute difficulty are given in Table 4.15. Summary statistics for changes in absolute item difficulty are given in Table 4.16.

The separation of the difficulty estimates into the categories of relative and absolute can provide valuable information on change. For example, from Table 4.12, it can be seen that Items R14, R32, R37, R42, C29, and C32 were relatively easier in 1979 than in both 1964 and 1970. However, the effect of the general decline in ability was to eliminate those changes, leaving five of the six items unchanged in absolute difficulty. On the other hand, the relative gain in performance on Item R37 was sufficiently large to outweigh the decline in general ability, and the item was easier in 1979 than in previous years, as seen in Table 4.15.
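
The distinction between relative and absolute change can be illustrated with a short sketch. It assumes that the standardized difference is taken as the later difficulty minus the earlier one, divided by the square root of the sum of the squared standard errors, so that a positive value corresponds to the plus sign used in the tables. The function and variable names are illustrative only. The difficulties and standard errors are those listed in Table 4.9 for the 1964 and 1979 calibrations of two reasoning items (14 and 37), and 0.69 is the difference between the 1964 and 1979 mean abilities.

```python
import math

def standardized_difference(d_early, se_early, d_late, se_late):
    """Later minus earlier difficulty, divided by the pooled standard error."""
    return (d_late - d_early) / math.sqrt(se_early**2 + se_late**2)

def decision(z, cutoff=2.0):
    """'+' harder in the later year, '-' easier, '0' no change."""
    if z > cutoff:
        return "+"
    if z < -cutoff:
        return "-"
    return "0"

# Shift added to 1979 difficulties to place them on the 1964 scale (1.39 - 0.70).
ABILITY_SHIFT_1979 = 0.69

# (difficulty, standard error) in logits, as read from Table 4.9.
items = {
    "R14": {"1964": (0.71, 0.13), "1979": (-0.19, 0.14)},
    "R37": {"1964": (0.69, 0.13), "1979": (-0.50, 0.15)},
}

results = {}
for name, years in items.items():
    d64, se64 = years["1964"]
    d79, se79 = years["1979"]
    relative = standardized_difference(d64, se64, d79, se79)
    absolute = standardized_difference(d64, se64, d79 + ABILITY_SHIFT_1979, se79)
    results[name] = (decision(relative), decision(absolute))
    print(f"{name}: relative {relative:+.2f} ({decision(relative)}), "
          f"absolute {absolute:+.2f} ({decision(absolute)})")
```

Under these assumptions both items are flagged as relatively easier in 1979, but only the second remains easier once the 0.69-logit shift is added, which agrees with the pattern described in the preceding paragraph.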
Table 4.15
Items Changing in Absolute Difficulty

Item   1964-70  1970-79  1964-79        Item   1964-70  1970-79  1964-79

R2        0        +        +           C1        +        0        +
R3        0        +        +           C3        0        0        +
R5        0        +        0           C7        0        +        +
R6        0        +        0           C8        0        +        +
R7        0        0        +           C9        0        +        +
R8        0        0        +           C10       +        +        +
R9        +        +        +           C11       +        +        +
R11       0        +        +           C12       +        0        +
R12       0        +        +           C13       +        -        +
R13       0        +        +           C14       +        0        +
R15       0        +        +           C15       0        +        +
R16       +        +        +           C16       +        0        +
R17       0        +        +           C18       +        0        +
R19       0        +        +           C19       +        0        +
R21       0        +        +           C20       +        0        +
R22       +        +        +           C21       0        +        +
R23       0        +        +           C22       +        +        +
R25       0        +        +           C23       -        +        0
R26       0        0        +           C24       0        +        +
R28       0        0        -           C26       0        +        +
R30       0        +        +           C27       +        +        +
R33       -        +        0           C28       +        +        +
R35       +        +        +           C31       +        +        +
R37       0        -        -           C32       +        0        0
R38       -        +        0           C34       +        0        +
R39       +        0        +           C35       +        +        +
R40       0        +        0           C36       0        +        +
R41       -        0        -           C37       -        +        0
R44       0        +        0           C38       +        0        +
                                        C41       0        +        +
                                        C42       0        +        +
                                        C44       +        0        +

Table 4.16
Number of Absolute Difficulty Changes

Comparison     Easier in      Harder in
               Latter Year    Latter Year
1964-1970           5             24
1970-1979           2             41
1964-1979           3             49

There is a caveat regarding any conclusions drawn on changing difficulty for an individual item. The criterion for deciding change was a standardized difference for which the absolute value exceeded two. This is equivalent to setting an approximate alpha level of 0.05. With the large number of comparisons made, the probability is very high that a Type I error will have been made on at least one comparison.

Changes in Content Area Difficulty

The deletion of non-fitting items resulted in changes in the items constituting each content area. The items which failed to fit the model came from across the range of curriculum topics. Seven out of the ten topics lost one or two items.
The most serious effect likely was on Topic #10, Units of Measure, which lost one of its four items, further weakening a group already containing few elements. The revised item groupings were as follows:

1. Whole number concepts and operations (WNC) - 10 items
      R: 14, 31, 41
      C: 1, 4, 6, 7, 8, 9, 18
2. Applications using whole numbers (WNA) - 8 items
      R: 1, 2, 6, 15, 22, 28, 32
      C: 31
3. Common fraction concepts and operations (CFC) - 12 items
      R: 33, 34, 44
      C: 10, 11, 12, 13, 15, 21, 22, 24, 36
4. Applications using common fractions (CFA) - 8 items
      R: 8, 9, 11, 13, 16, 19, 21, 23
5. Decimals (Dec) - 11 items
      R: 5, 17, 35, 37, 39
      C: 2, 3, 16, 17, 20, 32
6. Money (Mon) - 7 items
      R: 3, 7, 10, 18, 20, 36
      C: 38
7. Percent (Pct) - 7 items
      R: 25, 30, 42
      C: 14, 19, 35, 41
8. Elementary algebra (Alg) - 7 items
      R: 12, 26, 38, 40
      C: 23, 37, 42
9. Geometry and graphing (Geo) - 6 items
      C: 25, 26, 29, 30, 34, 44
10. Units of measure (Mea) - 3 items
      R: 24
      C: 27, 28

For each content area the mean difficulty was calculated. The standard error of the mean was determined by summing the squares of the constituent standard errors, dividing by the number of items in the group, and taking the square root of the result. Mean values, standard errors of the mean, and means adjusted to the 1964 scale for each administration are shown in Table 4.17.

Table 4.17
Summary Statistics for Content Areas

                   1964                     1970                     1979
Content      Mean    Std    Adj       Mean    Std    Adj       Mean    Std    Adj
Area                 Err    Mean              Err    Mean              Err    Mean

 1. WNC     -0.79   0.18  -0.79     -1.04   0.24  -0.79     -0.99   0.20  -0.30
 2. WNA     -0.55   0.22  -0.55     -0.50   0.21  -0.25     -0.51   0.18   0.18
 3. CFC     -0.18   0.19  -0.18     -0.09   0.16   0.17      0.03   0.15   0.72
 4. CFA     -0.44   0.17  -0.44     -0.42   0.16  -0.17     -0.18   0.14   0.52
 5. Dec     -0.17   0.17  -0.17     -0.03   0.17   0.22     -0.30   0.16   0.39
 6. Mon     -0.03   0.21  -0.03     -0.09   0.19   0.16     -0.21   0.16   0.48
 7. Pct      0.87   0.14   0.87      1.11   0.14   1.36      1.01   0.14   1.70
 8. Alg      0.76   0.14   0.76      0.07   0.15   0.32      0.42   0.14   1.11
 9. Geo      1.03   0.15   1.03      1.08   0.17   1.33      0.74   0.16   1.43
10. Mea      0.81   0.14   0.81      1.05   0.13   1.30      1.82   0.15   2.51

In all three administrations, the order of difficulty of the ten content areas was roughly the same as their numerical order, Whole Numbers being easiest with Geometry, Percent, and Units of Measure consistently being the most difficult. The standard error of the means tended to decrease to a minimum value as the mean group difficulty approached the mean ability for the year. In general, as the mean difficulty increased, the standard error decreased.

To determine relative change, the standardized difference of the content area means for the years in the comparison was determined. Any standardized difference of means for which the absolute value was greater than two was taken to indicate a change. To determine absolute change a similar procedure was followed. In this case, the comparison was made of the difficulties adjusted to the 1964 scale. The standard errors of the means were the same as in the test for change in relative difficulty. The results of both relative and absolute comparisons are given in Table 4.18.

Table 4.18
Changes in Content Area Difficulty

Content     No. of     1964-1970      1970-1979      1964-1979
Area        Items      Rel   Abs      Rel   Abs      Rel   Abs

 1. WNC       10        0     0        0     0        0     0
 2. WNA        8        0     0        0     0        0     +
 3. CFC       12        0     0        0     +        0     +
 4. CFA        8        0     0        0     +        0     +
 5. Dec       11        0     0        0     0        0     +
 6. Mon        7        0     0        0     0        0     0
 7. Pct        7        0     +        0     0        0     +
 8. Alg        7        -     -        0     +        0     0
 9. Geo        6        0     0        0     0        0     0
10. Mea        3        0     +        +     +        +     +

The range of the standard error of the difference of means in the year to year comparisons was approximately 0.20 for the higher difficulty topics to 0.30 for those of less difficulty. Thus, in order to be found significantly different, the measures had to differ by roughly 0.40 to 0.60 logits. For each comparison, only one topic changed in relative difficulty. On absolute difficulty, however, there was evidence for increasing difficulty, with six of ten content areas more difficult in 1979 than in 1964. The sole exception was the Elementary Algebra section which became easier from 1964 to 1970, but that advantage was lost from 1970 to 1979.

The interpretation of these results must be tempered by the knowledge that the content areas were made up of differing numbers of items. The statistical procedure for deciding that a change had occurred did not take this factor into account. The degree of confidence with which generalizations can be made is dependent upon the number and representativeness of the items subsumed under any one topic. For example, although Topic #10, Units of Measure, showed consistently increasing difficulty, that unit contained only three items: one on time which was unchanged in difficulty, and two on Imperial units of weight, the latter two dominating the former. More confidence should be placed in conclusions reached on, say, Topic #3, Common Fraction Concepts and Operations, which contained 12 diverse items.
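
The content-area calculation described above can be sketched as follows for Topic #10, Units of Measure. This is an illustration under stated assumptions rather than the procedure actually programmed for the study: the standard error follows the verbal description (the square root of the sum of squared item standard errors divided by the number of items), the standardized difference is taken as later minus earlier, and all names are hypothetical. The item values are those listed in Table 4.9 for the three items assigned to the topic (reasoning item 24 and computation items 27 and 28).

```python
import math

def area_mean_and_se(items):
    """Mean difficulty of a content area and its standard error, computed as
    sqrt(sum of squared item standard errors / number of items)."""
    n = len(items)
    mean = sum(d for d, _ in items) / n
    se = math.sqrt(sum(s**2 for _, s in items) / n)
    return mean, se

# (difficulty, standard error) for the three Units of Measure items, Table 4.9.
mea_1964 = [(1.86, 0.13), (0.23, 0.14), (0.35, 0.14)]
mea_1979 = [(1.11, 0.13), (2.36, 0.16), (1.98, 0.15)]

# Shift added to 1979 values to express them on the 1964 scale (1.39 - 0.70).
ABILITY_SHIFT_1979 = 0.69

mean64, se64 = area_mean_and_se(mea_1964)
mean79, se79 = area_mean_and_se(mea_1979)
pooled = math.sqrt(se64**2 + se79**2)

z_relative = (mean79 - mean64) / pooled
z_absolute = (mean79 + ABILITY_SHIFT_1979 - mean64) / pooled

print(f"1964: mean {mean64:.2f}, SE {se64:.2f}")
print(f"1979: mean {mean79:.2f}, SE {se79:.2f}")
print(f"relative z = {z_relative:.2f}, absolute z = {z_absolute:.2f}")
```

The means and standard errors obtained this way round to the entries for this topic in Table 4.17. The standardized differences differ slightly from the tabled values depending on whether rounded or unrounded means and standard errors are carried through, but both exceed two by a wide margin.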
Comparison of Results Using Rasch and Traditional Procedures

To determine whether decisions on change would differ depending on whether the Rasch model or the traditional approach were used, a further analysis was made of the 79 retained items. P-values, in the form of the percentage of incorrect responses on each item for each year, were calculated, along with the standard error associated with each P-value. The usual standardized difference scores were calculated. The results are shown in Table 4.19. If the absolute value of the standardized difference exceeded two, the item was presumed to have changed in absolute difficulty.

Discrepancies between decisions made using the two models were observed in 27 cases out of the 237 item comparisons made. The items on which discrepancies occurred and the nature of those discrepancies are shown in Table 4.20. In 24 cases, use of the traditional model would lead to the judgment of increasing difficulty, whereas a conclusion of no change would be made using the Rasch model. In three cases the Rasch model indicated decreasing difficulty, while the traditional approach indicated no change. The mean percent difficulty of the 24 items was 22.1, and that of the 3 items, 46.6.

An analysis of content area difficulty parallel to that previously described for the Rasch model was carried out using P-values, traditional standard errors, and the customary standardized difference of group means. In this case only the absolute change in difficulty was determined from year to year.
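
How the two procedures can lead to different decisions for the same item is sketched below for reasoning item 5 in the 1964-1979 comparison. The sketch assumes the same standardized difference (later minus earlier, divided by the square root of the sum of squared standard errors) is applied both to percent-incorrect values and to Rasch difficulties adjusted to the 1964 scale. The function names are illustrative, and the inputs are the values printed for this item in Tables 4.19 and 4.9.

```python
import math

def standardized_difference(x_early, se_early, x_late, se_late):
    """Later value minus earlier value, divided by the pooled standard error."""
    return (x_late - x_early) / math.sqrt(se_early**2 + se_late**2)

def decision(z, cutoff=2.0):
    """'+' harder in the later year, '-' easier, '0' no change."""
    if z > cutoff:
        return "+"
    if z < -cutoff:
        return "-"
    return "0"

# Traditional analysis: percent incorrect and its standard error (Table 4.19).
z_trad = standardized_difference(10.47, 1.78, 18.75, 2.30)

# Rasch analysis: difficulty on the 1964 scale and its standard error (Table 4.9).
z_rasch = standardized_difference(-1.16, 0.21, -0.73, 0.19)

print(f"traditional: z = {z_trad:.2f}, decision {decision(z_trad)}")
print(f"Rasch:       z = {z_rasch:.2f}, decision {decision(z_rasch)}")
```

With these inputs the traditional value exceeds two while the Rasch value does not, so the item would be judged more difficult under the traditional approach and unchanged under the Rasch model.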
The r e s u l t s are shown i n Table 4,21, 136 Table 4. 19 T r a d i t i o n a l A n a l y s i s o f Change Percent I n c o r r e c t Standard E r r o r Stand, D i f f e r e n c e Item r 1964 1970 1S79 1964 1970 1979 64-70 70-79 64-79 B 1 4.43 6.60 11.81 1. 24 1. 47 1, 90 0.97 2. 17 •Z m 12 E 2 4. 39 6.60 19. 10 1. 19 1. 47 2. 32 1-1.7 4. 56 5. 64 E 3 3.04 5.90 14.24 1. 00 1. 39 2. 06 1.67 3, 35 4. 88 E 5 10. 47 9.03 18.75 1. 78 i . 69 2. 30 -0,59 3. 40 2. 84 E 6 8.45 7.64 17,36 1. 62 1. 57 2. 24 -0.36 3. 56 3. 23 E 7 19. 93 21.87 31.94 2. 33 2. 44 2. 75 0.58 2, 74 3. 33 E 8 13.51 17.71 25.35 1. 99 2. 25 2. 57 1.40 2, 24 3 . 64 E 9 15. 20 22.22 34.03 2.09 2. 45 2, 80 2.18 3. 17 5. 39 E10 18.92 14.58 23.96 2. 28 2l 08 2. 52 -1.40 2, 87 1, 48 E11 9. 12 14.58 23.96 1. 68 2. 08 2. 52 2.04 2. 37 4. 90 E12 19.26 18.40 30,21 2. 30 '2i 29 2. 71 -0,26 3, 33 • 3. 08 E13 14. 19 15.28 31.94 2. 03 2'. 12 2. 75 0.37 4. 79 5. 19 B14 36.49 37.50 34.03 2. 80 2. 86 2. 80 0,25 -0. 87 -0. 62 E15 21. 62 25.69 41,67 2. 40 2. 58 2. 91 1.16 4. 11 5. 32 E16 20.61 28.82 51.04 2. 3 6 2i 67 2. 95 2.30 5. 58 £. 06 E17 18. 92 20. 14 32.99 2. 28 !2. 37 2, 78 0.37 3. 52 3. 92 E18 21.62 27.43 30.56 2. 40 2. 63 2. 72 1,6 3 0, 83 2. 46 E19 30. 41 31.94 42.71 2. 68 2. 75 2, 92 0.40 2, 68 3. 11 E20 31.42 35.42 40.28 2. 70 2. 82 2. 90 1.0 2 1. 20 2, 24 E21 23. 65 24.31 35.76 2, 47 2. 53 2. 83 0.19 3. 02 3. 22 E22 18.58 35.42 47.22 2. 26 2. 82 2. 95 4.65 2. 89 7. 71 E23 20. 61 23. 95 38.89 2. 36 2. 52 2. 88 0.97 3. 90 4. 92 B24 59.80 57.29 60.07 2. 85 2l 92 2. 89 -0.61 0. 68 0. 07 E25 50. 34 53.47 64.93 2. 91 2. 94 2, 82 0.76 2. 81 3. 60 E26 41.55 43.06 51.04 2. 87 2. 92 2. 95 0.37 1. 92 2. 31 £28 64. 86 61. 11 59.38 2. 78 2. 88 2,; 90 -0.94 -0. 43 -1. 37 E30 64. 19 70. 14 78. 13 2. 79 2. 70 2. 44 1,5 3 2. 19 3, 76 £31 13. 18 9.72 14.93 1. 97 i . 75 2. 10 -1.31 i . 90 0. 61 B32 34.80 35.76 32.29 2. 77 2. 83 2, 76 0-24 -0, 88 -0. 64 E33 16,89 11. 11 21,53 2. 18 1. 86 2. 43 -2.02 3,41 1. 
42 B34 56.42 51.04 55.21 2. 89 2, 95 2, 94 -1. 30 1. 00 -0. 29 E35 15. 88 36. ,11 47. 22 2. 13 2. £4 2. 95 5.71 2. 72 8.62 E36 34.80 36.11 43.40 2. 77 2. 84 2. 93 0.33 1. 79 2, 13 £37 35. 81 40.62 30.21 2. 79 2. 90 2. 71 1,2 0 -2. 62 -1. 44 E38 21,28 14. 93 32.29 2. 38 2. 10 2. 76 -2.00 5, 00 3. 02 £39 28. 04 36.11 39.58 2. 62 2. 84 2. 8 9 2.09 0. 86 2. 96 E40 49.32 44.10 57,64 2. 91 2. 93 2. 92 -1.27 3. 28 2. 02 £41 60. 47 49.65 47. 22 2, 85 ^ * 95 2. 95 -2.64 -0. 58 -3. 23 E42 29.73 34.03 33.68 2, 66 2; 80 2. 79 1. 11 -0. 09 1. 02 £44 59. 80 53.47 65.28 2, 85 2, 94 2. 81 -1.54 2. 90 1. 37 Table 4. 19 - c o n f a. 137 Item -Percent I n c o r r e c t 1964 1970 1979 Standard E r r o r Stand. D i f f e r e n c e 1964 1S70 1979 64-70 70-79 64-79 1 2 3 4 6 7 8 9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 C28 C29 C30 C31 C32 C34 C35 C36 C37 C38 C41 C42 C44 3.38 10. 01 7. 77 2.70 13.18 14. 53 9.46 13. 18 8.45 16. 55 9.80 29. 05 28.72 14. 19 23.99 31.76 23.99 28. 38 24.32 9. 80 31.42 24. 66 46.28 25. 34 42,91 27. 36 29.39 27. 70 16.22 16. 22 49.66 61. 15 22.64 5.07 63.51 65. 8 8 56.42 48.65 84.80 9.37 9.03 12.50' 3.82 10.07 16.67 12.50 18.75 15.97 31.94 24.31 48. 26 49.65 18.40 43.40 32. 64 46. 18 46.53 45.49 9.37 44.79 16.32 52.43 23.96 48.26 45. 14 43.40 27.78 17.36 31. 25. 59.03 78. 47 30.90 7.64 37.15 75.00 62.50 45. 14 90.97 17.01 14.93 20.49 11.11 18.75 28.47 22.92 28.82 33.33 48.26 23.61 41. 32 51.39 32.64 41.67 41.67 44.79 55.56 46.53 21.87 63.89 29 61.46 29 56*60 80.90 75 25, 45 17 86 69 30.56 00 49 55*90 75*69 47.92 18.06 70.83 77.43 70.83 58.68 90.62 1.05 1.81 1.56 0. 94 1.97 2.05 1.70 1. S7 1.62 2. 16 1.73 2.64 2.63 2.03 2.49 2.71 2.49 2.62 2.50 1.73 2.70 2.51 2.90 2.53 2.88 2.60 2.65 2.61 2. 15 2.15 2.91 2.84 2.44 1.28 2.80 2.76 2.89 2.91 2.09 1.72 1.69 1. 95 1. 13 1.78 2*20 1.95 2.30 2. 16 2.75 2. 53 2. S5 2.95 2.29 2.93 2.77 2.94 2. 94 2. 94 1.72 2.94 2. 18 2.95 2.52 2.95 2.S4 2. 93 2. 
€4 2.24 2^74 21 90 2,43 2.73 1.57 2.85 2*56 2i86 2-S4 1.69 2.22 2.10 2.38 1.86 2.30 2.66 2.48 2.67 2.78 2.95 2. 51 2.91 2.95 2.77 2.91 2.91 2. 94 2.93 2.94 2*44 2.84 2.68 2.87 2.70 2.93 2.32 2.53 2. 72 2.56 2.94 2.93 2. 53 2.95 2.27 2.68 2.47 2.68 2.91 1.72 2.97 -0.72 1.8 9 0.76 •1. 17 0.71 1.17 1.84 2.79 4.40 4.73 4.85 5.29 1.38 5.06 0.23 5.76 4.60 5,4 9 -0.17 3.35 -2.51 1.49 -0.39 1.30 4.53 3.55 0.02 0.37 4.32 2.28 4.64 2. 26 1.27 -6.59 2.42 1.50 -0.85 2.30 2.72 2. 19 2.59 3.36 2.98 3.42 3-30 2.85 4.93 4.05 -0. 19 -1.68 0.4 2 3.96 -0.42 2.25 -0.33 2.17 0.25 4.19 4.68 3.72 2.19 1.60 2.01 9.55 8.35 0.73 2.25 3.55 -0.76 -0.79 4.24 3.78 8.60 0.68 2. 13 3.28 -0.14 5.55 1.49 4:47 4. 04 1.84 4. 15 4.47 4.74 7.73 8.67 4.53 3. 12 5.73 5.37 4.62 2.49 5.41 6.90 5.75 4.04 8.29 1.23 2.7 2 1.22 :: 3.33 15.38 12.63 0.76 2.63 8.04 1.51 3.82 6.61 4.99 1.89 3. 12 2.66 2. 44 2. 15 138 Table 4.20 Items Showing D i s c r e p a n c i e s Between D e c i s i o n s Using the Easch and T r a d i t i o n a l Models D e c i s i o n .. •, < D e c i s i o n Item Years Easch Trad Item Years Easch . Trad E 1 70-79 0 + C 1 70-79 0 + E 1 64-79 0 + G 2 70-79 0 E 5 64-79 0 + C 3 70-79 0 + E 6 64-79 0 + C 4 70-79 0 E 7 70-79 0 C 4 64-79 0 E 8 70-79 0 + C 6 70-79 0 + E10 70-79 0 + C13 70-79 - 0 E11 64-70 0 + C17 70-79 0 + E18 64-70 0 + C17 64-79 0 + E20 64-70 0 + C19 70-79 0 + E28 64-79 - 0 C30 70-79 0 E36 64-79 0 + C30 64-79 0 + E37 64-79 - 0 E38 64-79 0 + E40 64-79 0 Table 4.21 Content Area D e c i s i o n s Using the Easch and T r a d i t i o n a l Models 1964-1970 1970-1979 1964-1S79 Content — • — -• Stand D i f f Dec Stand D i f f Dec Stand D i f f Dec Area Easch Trad E T Easch Trad E T ' Easch Trad E T 1. .WNC 0.00 0.76 0 0 1. 58 1,60 0 0 1.82 2,3 6 0 + 2,. ,WNR 1. 00 1.43 0 0 1.55 2.25 0 + • 2.61 3.71 + + 3. CFC 1.36 1.59 0 0 2*24 2.65 + + 3.75 4.27 + + 4; CFR 1. 23 1.73 0 0 3.25 3.55 + + 4,36 5.28 + 5. Dec 1.62 2.26 0 + 0.7 3 1.09 0 0 2.40 3.33 + + 6. Mon 0. 
67 0.88 0 0 1.29 1.81 0 0 1.93 2.67 0 +
7. Pct 2.47 2.44 + + 1.71 1.98 0 0 4.19 4.49 + +
8. Alg -2.14 -1.86 - 0 3.85 4.11 + + 1.77 2.27 0 +
9. Geo 1.32 1.36 0 0 0.43 1.02 0 0 1.82 2.33 0 +
10. Mea 2.56 2.43 + + 6.10 6.03 + + 8.29 8.90 + +

CHAPTER V

DISCUSSION AND CONCLUSIONS

There are numerous contentious issues in the measurement of change. The specific problem to which this study was directed was the problem of scale. Because the Rasch model purports to yield measures of item difficulty and person ability on a common equal-interval scale, this model was selected as the basis for the study. Once item difficulties were established using Rasch procedures, traditional statistical procedures utilizing standardized difference scores were used to assess change. A large part of the discussion in this chapter centres on how change decisions differ depending on whether the item difficulties used are those generated in the Rasch model or traditional p-values.

The other major section of this chapter focusses on the changing achievement patterns in British Columbia as determined by the Rasch analysis. Some consideration is given to the problem of sampling variations for the three test administrations. The question of how declining performance on specific topics might be viewed by the educational community is partially resolved by reference to a previous study of achievement in British Columbia.

Comparison of the Rasch and Traditional Models

On the basis of the Rasch analysis, 29 of 79 items had changed in absolute difficulty from 1964 to 1970.
In the preliminary traditional reanalysis in the initial phase of the study, 28 of the same 79 items were judged to have changed in difficulty over the same time span. The only item on which decisions differed was Item R38: easier in 1970 using the Rasch item difficulties, no change using the %-difficulties. This prompted the question of how decisions might differ, in general, depending on which model was used.

In the preliminary traditional reanalysis, the entire sample of 300 subjects had been used for each of 1964 and 1970. In order to make the results comparable with those obtained using the Rasch procedure, the %-difficulties were recalculated after deleting the same subjects as had been deleted for the Rasch analysis. In 27 out of 237 comparisons, discrepancies were found to exist between the Rasch and traditional decisions, including three on the 1964-1970 comparison. In 24 of these cases, items were deemed to have changed in difficulty using the traditional model, while no change in difficulty was indicated using the Rasch model. Hence, the Rasch model appeared to be more conservative.

In attempting to determine the reason for differing interpretations, consideration was given to the mechanics for deciding change: the standardized difference. The formula for determining absolute change in difficulty was:

D = \frac{d(1) - d(2)}{\sqrt{se(1)^2 + se(2)^2}} \qquad [1]

where d(2) was the

Figure 5.1. The relationship between %-difficulty and Rasch difficulty.
administration. As previously pointed out, this has the effect of increasing the difference between the item difficulties as compared with the traditional model. But there is also the effect of the increasing standard error of the estimates toward the extremes. This will tend to reduce the value of D in equation [1]. The question then is: will the increase in the standard error be sufficiently large to offset the increase in difficulty value?

To help answer this, the graph in Figure 5.2 was constructed. The graph portrays the relationship between item difficulty and its standard error on the data from the 1979 test administration. From Figure 5.1 it was noted that the transformation from %-difficulty to Rasch difficulty was basically linear in the interval from 20% to 80% difficulty. For the 1979 data that translated into the interval from -1.2 to 2.2 logits. For that same interval the relationship between Rasch difficulty and standard error is curvilinear, with increases in standard error accelerating as the item difficulties move outward from the mean sample ability position. This tends to decrease the value of D in equation [1], making the Rasch test more conservative in this central region.
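The way a growing standard error pulls down D can be sketched numerically. The following is only an illustration under stated assumptions, not the calibration procedure used in the study: it uses the usual asymptotic formula for the standard error of a Rasch item calibration, SE = 1 / sqrt(sum of P(1 - P)), and a hypothetical sample of 300 persons who all sit at 0 logits. The function names, the 0.32-logit shift, and the choice of test points are illustrative. The last line checks the standardized difference against one row of Table 4.19 (56.42 and 51.04 percent incorrect, standard errors 2.89 and 2.95), taking the later administration first so that the sign agrees with the table.

```python
import math


def rasch_item_se(item_difficulty, abilities):
    """Asymptotic standard error (logits) of a Rasch item calibration.

    SE = 1 / sqrt(sum(P * (1 - P))), where P is the modelled probability
    that a person of ability b answers the item correctly.
    """
    information = 0.0
    for b in abilities:
        p = 1.0 / (1.0 + math.exp(-(b - item_difficulty)))
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


def standardized_difference(d1, d2, se1, se2):
    """Difference of two difficulties divided by the SE of the difference."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical sample (an assumption): 300 persons, all at ability 0 logits.
abilities = [0.0] * 300

# The standard error grows as the item moves away from the ability position.
for d in (0.0, 1.0, 2.0, 3.0):
    print(f"difficulty {d:4.1f} logits  SE = {rasch_item_se(d, abilities):.3f}")

# The same 0.32-logit increase in difficulty, at the centre and at the easy extreme.
shift = 0.32
for start in (0.0, -3.0):
    later = start + shift
    big_d = standardized_difference(
        later, start,
        rasch_item_se(later, abilities), rasch_item_se(start, abilities),
    )
    print(f"item moving from {start:5.2f} to {later:5.2f} logits  D = {big_d:.2f}")

# Check against a Table 4.19 row (later administration minus earlier one).
print(round(standardized_difference(51.04, 56.42, 2.95, 2.89), 2))
```

Under these assumptions the standard error rises from about 0.12 logits at the centre to about 0.27 logits at 3 logits from the centre, so a fixed shift in difficulty yields a smaller D for an extreme item than for a central one.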
For the extreme regions it might be expected that the rapidly increasing standard error more than offsets the increased spread in item difficulty noted earlier, thereby maintaining the conservative nature of the model. However, a mathematical demonstration is required to resolve the issue, and that has not been done in this study.

Figure 5.2. The relationship between Rasch item difficulty and standard error.

Further consideration must be given to the standard error. As pointed out in Chapter II, it is here that the Rasch model differs from the classical model. The classical standard error of estimate is maximum for items of 50% difficulty, and decreases to zero at either extremity. The Rasch standard error of item calibration is minimum for items centred at the mean ability level and increases toward the extremities.

Figure 5.3. Variation in confidence bands within the traditional and Rasch models.

Figure 5.3 portrays 79 items from a hypothetical test arranged in order from easiest to hardest. The upper portion of the figure shows the confidence band of ±2 standard errors about the %-difficulty values. In the lower portion of Figure 5.3, the %-difficulty values have been transformed into approximate Rasch equivalents, assuming the idealized curve from Figure 5.1. The confidence band of ±2 standard errors is shown as before. The scales have been exaggerated to make the differing effects clear.

The effect of the different behaviours of the standard errors is shown in Figure 5.4. A hypothetical case is given in the diagram in which it is assumed that the items have not changed in relative difficulty; they have all increased in difficulty by the same number of percentage points. In the upper portion of Figure 5.4, at time A the range in %-difficulty was 6 to 86 units with a mean of 46; at time B, the range was 14 to 94 with a mean of 54. The dashed lines are the limits of the region whose vertical height equals two standard errors of the difference of the difficulties for an item, assuming a sample size of 300. Because the standard error of estimate for the difficulty of an item decreases toward the extremes, so will the width of the envelope in Figure 5.4 decrease toward the extremes. In this example, the items having difficulties near the mean will not have changed in difficulty, whereas those on the extremes will. The crossover points are around the 47% and 57% difficulty levels at time A.

The results of transforming the %-difficulties for times A and B into Rasch difficulties are shown in the lower part of Figure 5.4. The separation between the curves is about 0.32 logits.
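The traditional half of this hypothetical case can be checked with a few lines of arithmetic. The sketch below assumes the binomial standard error sqrt(p(1 - p)/n) for a proportion of incorrect responses and a decision rule of |D| > 2, matching the two-standard-error envelope described above; both function names and the three test items are illustrative. The final line uses a plain logit transform, ln(p / (1 - p)), only as a stand-in for the idealized curve of Figure 5.1 (an assumption of this sketch, not the transformation used in the study) to look at the separation for the item of mean difficulty.

```python
import math

N = 300       # sample size assumed in the hypothetical case
SHIFT = 0.08  # every item becomes 8 percentage points harder from time A to time B


def classical_se(p, n=N):
    """Binomial standard error of a proportion of incorrect responses p."""
    return math.sqrt(p * (1.0 - p) / n)


def classical_d(p_a, p_b, n=N):
    """Standardized difference between the %-difficulties at times A and B."""
    return (p_b - p_a) / math.sqrt(classical_se(p_a, n) ** 2 + classical_se(p_b, n) ** 2)


def logit(p):
    """Log-odds of p; used here only as an idealized %-to-logit transform."""
    return math.log(p / (1.0 - p))


# Easiest item, item of mean difficulty, and hardest item at time A.
for p_a in (0.06, 0.46, 0.86):
    d = classical_d(p_a, p_a + SHIFT)
    verdict = "changed" if abs(d) > 2.0 else "no change"
    print(f"time A {p_a:.0%} -> time B {p_a + SHIFT:.0%}:  D = {d:.2f}  ({verdict})")

# Separation in logits for the item of mean difficulty (46% -> 54%).
print(f"logit separation at the mean: {logit(0.54) - logit(0.46):.2f}")
```

Under these assumptions the item of mean difficulty falls just inside the envelope while the easiest and hardest items fall outside it, and the logit separation for the mean item agrees with the figure of about 0.32 logits.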
The dashed lines again represent the envelope for two standard errors of the difference of the difficulties. Again the item of mean difficulty lies within the envelope, and the judgment of no change is made. In this case, however, the standard error of estimate of item difficulty increases toward the extremes, producing the same effect on the standard error of the difference of the difficulties. As a result, the envelope opens out toward the extremes. Since the vertical distance between the curves is constant, no item is concluded to have changed in difficulty. The net effect is to make the Rasch approach more conservative than the traditional approach.

Figure 5.4. Effect of varying standard errors on decisions on changing item difficulty.

If the argument based on the behaviour of standard errors shown in Figure 5.4 has merit, in this study conflicting decisions should have been concentrated on items having difficulties near the extreme ends of the distribution. Furthermore, since the items were generally easy in all three test administrations, with only about one-fifth of the 237 item difficulties exceeding 50% difficulty, it would be expected that the discrepancies in decision-making would occur predominantly for items at the easier end of the scale. This expectation was confirmed. The mean %-difficulty on the 24 items on which the Rasch model was conservative was about 23%, while the overall item difficulty on the tests was about 28%.
That is, the conflicting decisions were made on items less difficult than average. For the three items on which the Rasch model was less conservative, the mean %-difficulty was about 47%. It can only be conjectured that discrepancies here occurred through a combination of standard error variations and fluctuation in calibration, exemplified in Figure 5.1.

The relatively conservative nature of the Rasch model in making change decisions at the item level was carried on into the decisions concerning change in topic difficulty. The standardized differences of mean group difficulties are almost invariably less in the Rasch approach than otherwise (see Table 4.21). This tendency led to conflicting decisions on two topics in the 1964-1970 comparison, one topic in the 1970-1979 comparison, and four topics in the 1964-1979 comparison. The difference is most dramatic in looking at change from 1964 to 1979. On the traditional comparison one would conclude that decline had occurred on all ten topics, whereas the Rasch comparisons would yield more conservative results.

It would appear that the decision as to which model to use rests mainly on the user's view of which model most appropriately represents the confidence interval for items at the extremes of the difficulty range. For difficult items, the issue may be resolved in favour of the Rasch procedure by appealing to the argument that uncertainty increases as items become more likely candidates for guessing or random responses. For easy items, however, the situation is unclear. It has been suggested that boredom, or carelessness, may introduce uncertainty into the results on very easy items.
This is certainly possible, but the suggestion does not have the same intuitive force as that for guessing on difficult items. The issue remains basically unresolved.

Change in Achievement in British Columbia

The present study was designed to explore the use of the Rasch model to measure change. Investigation has shown the Rasch model to be more conservative than the alternative traditional approach, each using the test item as the unit of analysis. That is of theoretical interest. The study was also intended to investigate and report upon real change which had taken place in the mathematics achievement of Grade 7 students in British Columbia. Principals and superintendents were promised a summary of the findings. That is of practical interest. Which change is "real": Rasch or traditional? Since the letter requesting the cooperation of school principals had indicated that a different kind of statistical model would be used to carry out the analysis, it was decided that a basic commitment to the Rasch model had been made. Consequently, all further discussion of results and trends is based on decisions reached using the Rasch model.

Sampling and Motivation Considerations

In contrast to 1964 and 1970, the data collection for 1979 relied on the voluntary cooperation of personnel in the field.
Consequently, there was some concern that, without the persuasive force of authority, school districts would be reluctant to cooperate in a study which might not reflect well on the comparative achievement of students. These concerns proved to be unfounded, as almost 90% of the superintendents agreed to allow the researcher to contact school principals. There still remained the question of how well schools would cooperate, but again the return rate of completed tests from schools was very high, over 95%. As a result, completed test papers were returned for 1277 students, approximately 86% of the design sample of 1500.

The loss of fourteen schools from the sample was not distributed equally across the six geographic regions. The greatest loss was from the Greater Vancouver region, with 15 out of 24 schools cooperating. The second greatest loss was from the North region, with 5 of 9 schools remaining in the sample. Regions 3 and 4 lost one school each; there were no losses from Regions 1 and 5. Most of the schools lost were located in larger school districts and situated in urban centres. While the loss of such schools may bias the sample, because of the wide variability of schools within each district, no firm conclusion can be made in this regard.

A second factor which may have a bearing on the validity of the year-to-year comparison is the changing sampling procedure. In 1964, the sample was drawn from the population stratified by performance on the entire Stanford Achievement Battery.
In 1970, the selection was the same but performance was based on just the two arithmetic tests. In 1979, the criterion was performance on the two arithmetic tests, but the sample was constructed to reflect the geographic diversity and variation in school size in the province. Although the samples for 1964 and 1970 probably fairly represented the regions through the random selection process, there is no guarantee, with a sample size of 300, that this was achieved. The same argument applies to school size. Nevertheless, these potential sample differences appeared to be overwhelmed by the magnitude and consistency of change over the years in this study. In another situation where evidence for change was less clear, the sampling variations might have been considered a more important factor.

A third complicating factor is that of student motivation. In 1964 and in 1970, the testing program was province-wide. In 1964 the results for each individual and each class were returned to the teacher. It is assumed that the pupils writing the tests were aware that this would be the case. It is not known whether results of the 1970 testing were forwarded to the classroom teacher, but the tests were administered under the authority of the Department. In both instances, students were under some pressure to perform as well as possible. That condition did not hold in 1979. In 1979, the design of the study did not require the determination of class averages or summary statistics for districts.
This, combined with the prevailing view regarding the need for confidentiality of test results, resulted in a decision not to return results to classroom teachers or principals. This may have reduced both the motivation of students to succeed on the test and that of teachers to ensure proper test conditions.

Change in Achievement by Content Area

With sampling and motivation reservations in mind, the results for each content area on each comparison may be examined. In the discussion, one of the topics, #10, Units of Measure, will be dealt with on its own at a later stage. Only the nine remaining topics are involved in each comparison to follow.

The comparison which prompted this study was that of 1964 to 1970. The results of the present study indicate that the Director's concerns for declining performance had some foundation. Standardized differences of mean topic difficulties show a general trend supporting the Director's conclusions. However, just one topic can be judged to have increased significantly in difficulty. On Topic #7, Percent, three of seven items (C14, C19, C35) increased in absolute difficulty, with two of those three (C14, C19) also increasing in relative difficulty. Changes on the latter two items were quite large: on each item, in 1964 about one-quarter of the examinees responded incorrectly; in 1970, that proportion increased to one-half. The questions were straightforward, for example, C14: 30% of $40 = ?
It is difficult to understand how these items were so much more difficult while performance on word problems requiring the same computation was not significantly reduced.

The sole topic which prevailed against the general declining trend was Elementary Algebra, on which performance improved significantly from 1964 to 1970. Five out of seven items improved relatively (R38, R40, C23, C37, C42) and three of these showed absolute improvement (R38, C23, C37). The characteristic common to the five items showing relative improvement was the inclusion of a variable. The results would seem to indicate a relatively greater emphasis within the curriculum on algebraic usage, and a consequent improvement in student performance on this topic.

From 1970 to 1979, three topics increased significantly in difficulty. Once again, the fact that all numerical changes, significant or not, were in the direction of increasing difficulty indicates an overall trend. The improvement in Elementary Algebra from 1964 to 1970 was lost from 1970 to 1979. Although the topic did not receive relatively less emphasis, performance declined on six of the seven component items.

Students' understanding of, and ability to manipulate, common fractions decreased from 1970 to 1979. On Topic #3, Common Fraction Concepts and Operations, 9 out of 12 items were more difficult in 1979. On Topic #4, Applications Using Common Fractions, 7 out of 8 items showed increasing difficulty.
On the absolute comparison from 1964 to 1979, five topics increased in difficulty. In addition to the three topics previously discussed, the persistent trend of increasing difficulty resulted in a significant decline in performance on Topic #2, Applications of Whole Numbers, and Topic #5, Decimals. The proportions of items in each group which were more difficult in the latter year were as follows: Applications of Whole Numbers, 4 of 8; Common Fraction Concepts and Operations, 9 of 12; Applications Using Common Fractions, 8 of 8; Decimals, 6 of 11; Percent, 6 of 7. The decline in performance in Elementary Algebra from 1970 to 1979 offset the improvement from 1964 to 1970, leaving a net effect of no change.

A discussion of change on Topic #10, Units of Measure, has been postponed until now because it serves to illustrate a fundamental problem with the test administration in 1979. In 1970, the Canadian government announced plans to convert from the imperial system of weights and measures to the metric system. The changeover was to be completed by 1981. In September, 1973, all pupils at the primary level were to begin using the metric system. The Council of Ministers of Education, Canada, agreed that instruction in Canadian public schools should be predominantly metric by 1978. The Metric Commission of Canada recommended that the changeover to the metric system be done with a minimum of conversion from imperial to metric units. This policy was to be followed in the schools where students were to be encouraged to "think metric" through immersion in the metric system.
Of eight unsolicited letters from teachers and principals who administered the tests in 1979, seven pointed out the difficulty of using an old test to assess students who were used to metric measures. The obvious problem lies with the units themselves, but one correspondent also suggested that the use of commas instead of spaces in larger numbers was a source of confusion on three items.

The metric problem had been recognized prior to sending out the 1979 tests. However, regardless of the directives of the Ministry of Education on metrication, it was unclear what was actually being done in the schools. For example, in the provincial mathematics assessment of 1977, Robitaille and Sherrill suggested that the majority of elementary teachers in the province were still using both metric and imperial units of measure in their teaching. It was also conjectured by the researcher that the emphasis on a new measurement system might have resulted in teachers looking at the inadequacies of the old system, thereby evoking an awareness of imperial units.

The results from the 1979 tests appear to support the concerns raised by the teachers. Fifteen items out of 89 on the tests involved the use of imperial units. Eleven of these increased in difficulty from 1970 to 1979. However, seven had increased in difficulty from 1964 to 1970, suggesting that more than a problem of units was involved.
The final comparison shows that 14 had become more difficult from 1964 to 1979, although, oddly enough, the item requiring the reading of a gas meter in cubic feet had become easier.

Of the fifteen items, only two required a knowledge of a base other than ten. These were items involving addition and subtraction of two or more quantities in units of pounds and ounces. These two items had been placed under Topic #10 along with Item R24, requiring subtraction of hours and minutes, and Item C33, requiring the addition of metres and centimetres. The latter was the only metric item on the two tests, and it was eliminated in the calibration process, leaving just three items in the topic. Although the topic showed relative and absolute increase in difficulty in all but one comparison, it was considered to be too restricted in content to permit further generalization.

On thirteen of the fifteen items using imperial measure, the unit was incidental to the computation process. For example, Item R16: "At 8 miles an hour, how many miles can a skater go in 4 1/2 hours?" While it is not obvious that the use of unfamiliar units invalidates the question, the effect may be to detract from the students' performance. This was a factor which particularly affected conclusions reached regarding Topic #4, Applications Using Common Fractions, for which all but one of its 8 items made use of imperial units.
The problem also occurs, to a lesser extent, on Topic #2, Applications Using Whole Numbers, where 4 of 8 items involved imperial measure. The problem does not arise on any other topic.

Before outlining possible reasons for changing performance, several aspects of the tests and topics should be reviewed and clarified. In the first place, both the number and nature of the topics into which the items were grouped were arbitrarily determined. The initial classification of the items was that which seemed most natural to the researcher. Another investigator may well have reduced the number of topics or reassigned the items. Secondly, there was no choice in the scope and depth of the items assigned to each category. The content of each category was determined solely by the items available from the test. There is no assurance that each topic is adequately represented by the items subsumed by it. Therefore, when performance on a topic is referred to, it must be thought of as achievement on this topic as defined by these items from this test. The temptation to generalize beyond the data must be resisted.

It was not the intent of the study to attempt to identify, in any systematic way, correlates of change. No demographic data such as age, sex, socio-economic status, or language spoken at home were gathered. The nature of changing population characteristics and their relationship to performance in school is complex.
No doubt, societal factors such as increasing urbanization, television, increased permissiveness, drug usage, and marital breakdown all contribute to change in achievement in school, but no attempt has been made here to assess their influence. When the factors which possibly influence change are narrowed down to the school and the subject within the curriculum, hypotheses can be formed with somewhat greater confidence.

The first factor which may help to explain the consistent decline in performance on the test as a whole from 1964 to 1979 is the changing curriculum. As pointed out previously, the tests were based on the American curriculum of the late 1940's, and there is a high probability that the tests were also appropriate for the curriculum of British Columbia elementary schools in 1964. Canadian and American mathematics curricula have generally been comparable at any given time. In British Columbia, a series of textbooks entitled Study Arithmetics had been used as supplementary texts for Grades 3 to 6 in the early 1940's. It was adopted as the authorized textbook series, and hence course of study, in 1947. It continued as the sole authorized textbook series until the Seeing Through Arithmetic series was phased in between 1962 and 1966. At the Grade 7 level, Mathematics for Canadians, 7 was used from the mid-1950's to the mid-1960's. It replaced Junior Mathematics, Book 1, which had been used since the early 1940's.
Therefore, in general, there was a stability of curriculum lasting over twenty years on which teachers could base their teaching and testing. The Stanford Achievement Tests likely were broadly representative of that curriculum.

A revised elementary mathematics program was phased in during the years from 1962 to 1967. By 1970, all Grade 7 students had had seven years of the new program. That program expanded the elementary mathematics curriculum to include "modern" topics such as the terminology of sets, the number line, the properties of number systems, and numeration systems with bases other than ten. As well, the program included informal geometry as a fundamental component for each grade.

The large scale curriculum development projects of the 1950's and 1960's were replaced in the 1970's by an emphasis on local curriculum improvement. This development, together with a change in the provincial government, resulted in the decentralization of the curriculum from 1973 to 1976. The Seeing Through Arithmetic series was replaced by a multiple authorization of three different series. Teachers were expected to follow the Department's curriculum guide and to use the best parts of each series to provide the optimum program for their students. The teaching of metric units became a priority at all levels. At the Grade 7 level the concepts of functions and flow-charting were introduced.

Thus, it can be seen that in 1964 the tests used in this study likely sampled the totality of the students' mathematical knowledge, whereas by 1979
the students had been exposed to a much broader curriculum than that measured by the tests. Two points can be made. In the first place, the test results do not reflect the child's body of mathematical knowledge in 1979. Performance on topics common to previous years may be poorer, but the child's knowledge is likely to be broader. Secondly, because of an expanded curriculum, one might expect performance on a portion of the curriculum to decline if the overall time devoted to learning mathematics in the elementary school remained the same. That is, if the time on a specific task decreased, lower performance might be expected.

Table 5.1 shows how the time allotted to the study of arithmetic changed from 1958 (British Columbia Department of Education, 1957) to 1972 (British Columbia Department of Education, 1972). It can be seen that, for the last four years of the elementary program, on which the bulk of the test items in this study were based, there has been little change in the time allotment. Consequently, if time on task is a significant factor, the observed decline in achievement should not be unexpected.

Table 5.1
Time Allotted to the Study of Arithmetic*

                                   Grade
Year    1         2         3         4         5         6         7
1958    100       100       150       200       200       200       240
1972    150-160   150-200   200-210   200-220   200-220   200-220   200-240

* in minutes per week

Another factor which might be expected to influence achievement is teaching method. If the proponents of such innovations as open area schools, discovery learning, and team teaching are correct, improvement in achievement should result from the adoption of any or all of the above.
It appears that this question must remain unresolved for, in spite of the rhetoric of reformers and the exhortations of curriculum guide writers, instructional practices seem to have remained largely unchanged. An extensive survey of British Columbia mathematics teachers in 1977 (Robitaille & Sherrill, 1977a) concluded that:

    the teacher of mathematics is highly traditional in character ... the most frequently used teaching techniques are total class instruction and teacher explanation. Among the most commonly used student activities are individual work and textbook exercises ... these results indicate that few organizational innovations are being used in the mathematics classes of the province. (p. 44)

Turning to specific changes by topic, the improvement in Elementary Algebra from 1964 to 1970 is explainable by the adoption of the modern mathematics program in the mid 1960's. That program placed much emphasis on the solution of open sentences. Thus performance became better relative to other topics, and the absolute achievement of students increased. The increasing difficulty of questions on Percent from 1964 to 1970 might be attributed, in part, to the changed method for solving such problems. The previous program had placed reliance on rules, whereas the new program tried to apply a single structure using rate pairs, proportions, and the solution of equations. It is possible that both teachers and students found the new procedure more difficult.
From 1970 to 1979, the increasing difficulty with respect to Operations on, and Applications of, Common Fractions may be due to the introduction of the metric system in the mid 1970's. This may have served to remove some of the emphasis on common fractions, since, with metric units, the necessity for dealing with thirds, twelfths, sixteenths, and the like, is reduced. There appears to be no identifiable reason for the declining performance on Elementary Algebra from 1970 to 1979, and on Applications of Whole Numbers, and Decimals from 1964 to 1979. The latter is particularly puzzling, as it might be thought that the introduction of the metric system would lead to increased facility in the use of decimal fractions.

A criticism that might be made of a study of this sort is that comparisons with past years do not help to assess performance relative to the expectations of contemporary society. That is, performance may have declined over previous years, but may still be acceptable in its own time, or the opposite may hold. To gain some insight into how the 1979 performance might be viewed, an analysis was made of a previous study in British Columbia. In the spring of 1977, the Learning Assessment Branch of the British Columbia Ministry of Education administered provincially developed mathematics tests to all students in the province enrolled in Grades 4, 8 and 12.
The tests were constructed to measure minimum basic skills which the student might be expected to possess at each grade level. For each grade, results on each item were judged by a fifteen-member interpretation panel consisting of seven mathematics teachers at that grade level, two supervisors of instruction, two teacher educators, two school trustees, and two members of the public at large. The panel rated performance on each item on a five point scale indicating their satisfaction with the results, as follows:

5 - strength
4 - very satisfactory
3 - satisfactory
2 - marginally satisfactory
1 - weakness

On the assumption that the interpretation of the performance of Grade 8 students in 1977 would not differ substantially from that of Grade 7 students in 1979, this researcher assigned 59 out of the 60 items on the Grade 8 test to the ten content categories used in the present study. Number 10, Units of Measure, became Units of Metric Measure. The rating for each item, obtained from Robitaille and Sherrill (1977b), ranged from 1 to 5, and the mean rating for the topic was calculated, with the results shown in Table 5.2.

Table 5.2
Mean Satisfaction Ratings of Content Areas on the 1977 Grade 8 Assessment

Content Area         No. of Items    Mean
 1. WNC                   12         3.5
 2. WNA                    2         4.5
 3. CFC                   10         2.6
 4. CFA                    0         ---
 5. Dec                    8         2.75
 6. Mon                    2         3.0
 7. Pct                    3         2.0
 8. Alg                    3         2.33
 9. Geo                   14         2.36
10. Mea (metric)           5         3.6

Little reliance can be placed on ratings of content areas which contain few items. For inclusion in the discussion, the minimum number of items required in a given content area was arbitrarily set at 5. As a result, five topics may be considered: WNC, CFC, Dec, Geo, and Mea (metric). On three of these five topics, performance was judged to be less than satisfactory, that is, their mean satisfaction rating was less than 3.0. One topic showing weakness was Geometry, but the changing role of geometry in the curriculum since 1964 prevents any general conclusions from being drawn. On the other two topics (Common Fractions Concepts and Operations, and Decimals), performance at the Grade 7 level had declined from 1964. The combination of the results from the two studies indicates the need for a serious reappraisal of the effectiveness of instruction on these two topics.

Limitations of the Study

The limitations of the study derive from three sources: the model itself, the nature of the data, and the purpose of the study.

Rentz and Bashaw (1977) identify two differing applications of the Rasch model: test construction and test analysis. In the former case the test maker can use the Rasch model as a guide, with the freedom to select the best set of items which fit the model. In the latter case the collection of test items is virtually fixed; it becomes necessary to rely on the robustness of the model to accommodate less than ideal items. This study falls into the second category.
In 1970 some teachers raised objections that the Stanford Achievement Tests were not based on the British Columbia modern mathematics curriculum (British Columbia Department of Education, 1971). In fact, as has been noted, the test was based on the American curriculum of the late 1940's. Thus, the assessment of change was based upon those elements of the curriculum of the past fifteen years common to that of thirty years ago. Hence the study is limited in the scope of the curriculum with which it deals. It does not attempt to assess change in the overall mathematics program.

Finally, the primary purpose of the study was to document and measure change. Although some suggestions are made concerning the reasons for change, it was not the intent to undertake an investigation into the correlation between other factors and change in mathematical performance. Any identified changes, of course, cannot be generalized beyond the boundaries of the province of British Columbia.

Some Concerns and Suggestions for Future Research

The most suitable criterion to be used as a measure of item fit in the Rasch model is still an open question. The use of the fit mean square has recently been criticized by George (1979), who maintains that the use of the statistic does not detect unacceptable variations in discrimination.
In a series of BICAL calibrations on items from an English achievement test, George concluded that dissimilarities in item discrimination could produce discrepancies in difficulty estimates based on high and low ability groups. While the implications of George's study are most serious for applications of the Rasch model such as test linking and vertical equating, there is also reason for concern in comparing the performance of two groups of differing abilities on the same test, as was the case in the present study. In view of the controversy in the literature concerning the problem of fit, this difficulty will likely remain for some time to come.

A second matter which requires further investigation is the nature of the mathematical relationship between item difficulty and standard error. Hashway (1977) conjectured that the standard error may be a hyperbolic tangent function of the parameter. It has been pointed out that the basic element responsible for the sometimes conflicting decisions reached in the Rasch and traditional models was the differential behaviour of the standard error. The relationship needs to be clarified in order to permit a thorough analysis of the differing results when the two approaches are used.

Difficulty with standard error goes beyond the mathematical.
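As a rough illustration of how the standard error of an item difficulty estimate can vary with the difficulty itself, the sketch below uses a common large-sample approximation for the Rasch model: the standard error is taken as the reciprocal square root of the summed binomial variances p(1 - p) over the persons tested. The approximation, the function names, and the ability values are assumptions introduced here for illustration; they are not taken from BICAL or from Hashway (1977), and person abilities are treated as known.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_standard_error(difficulty, abilities):
    """Approximate standard error of an item difficulty estimate.

    Assumption: abilities are treated as known, so the information about the
    item is the sum of p * (1 - p) over persons.
    """
    information = 0.0
    for ability in abilities:
        p = rasch_probability(ability, difficulty)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)

# Hypothetical sample: 300 persons spread evenly over three ability levels.
abilities = [-1.0, 0.0, 1.0] * 100

for difficulty in (-3.0, -1.5, 0.0, 1.5, 3.0):
    se = item_standard_error(difficulty, abilities)
    print(f"difficulty {difficulty:+.1f} logits: standard error {se:.3f}")
```

Under these assumptions the standard error is smallest for items centred on the ability of the group and grows toward both extremes of the difficulty scale, and quadrupling the number of persons halves it.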
There is a further question of which model best reflects the reality of testing. An argument has been made for preferring the Rasch model for difficult items, but the case at the other end of the difficulty scale is not clear. A well thought out rationale needs to be developed in order to decide which is preferable overall.

Finally, if the Rasch model is to be used for future comparisons of performance, procedures based upon the item as the unit of analysis are recommended over those using the person. In general, the standard error of estimate of the difficulty (ability) parameter is inversely proportional to the square root of the number of persons (items). Since the number of examinees is larger than the number of items, estimates of item difficulties are more precise than estimates of person abilities. Furthermore, because the number of examinees can be increased without limit, the distribution of the item difficulties is nearly continuous, while that of the abilities is discrete, having the same number of steps as possible raw scores. Hence, changes in item difficulties are more likely to be detected than changes in abilities. If the test is divided into content areas containing a few items in each, the distinction becomes even more important.

REFERENCES

Ahmann, J. S., & Glock, M. D. Evaluating Pupil Growth: Principles of Tests and Measurements (4th Ed.). Boston: Allyn and Bacon, 1971.

Anderson, J., Kearney, G. E., & Everett, A. V. An evaluation of Rasch's structural model for test items. British Journal of Mathematical and Statistical Psychology, 1968, 21, 231-238.

Armbruster, F. E. The more we spend, the less children learn. The New York Times Magazine, August 28, 1977.

Austin, G. R., & Prevost, P. Longitudinal evaluation of mathematical computational abilities of New Hampshire's eighth and tenth graders, 1963-1967. Journal for Research in Mathematics Education, 1972, 3, 59-64.

Beckmann, M. W. Basic competencies -- twenty-five years ago, ten years ago, and now. The Mathematics Teacher, 1978, 71, 102-106.

Beltzner, K. P., Coleman, A. J., & Edwards, G. D. Mathematical Sciences in Canada. Ottawa: Printing and Publishing Supply and Services Canada, 1976.

British Columbia Department of Education. Programme of Studies for the Elementary Schools of British Columbia. Victoria, B.C.: Department of Education, Division of Curriculum, 1957.

British Columbia Department of Education. A Report on the Testing of Arithmetic. Victoria, B.C.: Department of Education, Research and Standards Branch, 1970.

British Columbia Department of Education. Ninety-ninth Annual Public Schools Report, 1969-70. Victoria, B.C.: Department of Education, 1971.

British Columbia Department of Education. Instructional services circular #761: elementary school -- time allotment guidelines. Victoria, B.C.: Department of Education, September 9, 1972.

Carpenter, T. P., Coburn, T. G., Reys, R. E., & Wilson, J. W. Results and implications of the NAEP mathematics assessment: secondary school.
The Mathematics Teacher, 1975, 68, 453-470.

Cartledge, C. McC. A comparison of equipercentile and Rasch equating methodologies (Doctoral dissertation, Northwestern University, 1976). Dissertation Abstracts International, 1976, 37, 5141A. (University Microfilms No. 76-2215)

Choppin, B. H. Item bank using sample-free calibration. Nature, 1968, 219, 870-872.

Choppin, B. H. Recent developments in item banking: a review. In N. M. de Gruijter & L. J. Th. van der Kamp (Eds.), Advances in Psychological and Educational Measurement. New York: Wiley, 1976.

Clarke, S. C. T., Nyberg, V., & Worth, W. H. General Report on Edmonton Grade III Achievement: 1956 - 1977 Comparisons. Edmonton, Alta.: Alberta Education, 1977.

Conway, C. B. Grade VII Aptitude-Achievement Survey, March 9th - 13th, 1964. Victoria, B.C.: Department of Education, Division of Tests and Standards, 1964.

Dinero, T. E., & Haertel, E. A computer simulation investigating the applicability of the Rasch model with varying item characteristics. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, 1976. (ERIC Document Reproduction Service No. ED 120240)

Fischer, G. H. Some probabilistic models for measuring change. In N. M. de Gruijter & L. J. Th. van der Kamp (Eds.), Advances in Psychological and Educational Measurement. New York: John Wiley and Sons, 1976.

Forbes, D. W. The use of Rasch logistic scaling procedures in the development of short multi-level arithmetic achievement tests for public school measurement.
Paper presented at the annual meeting of the American Educational Research Association, San Francisco, 1976. (ERIC Document Reproduction Service No. ED 128400)

Forster, F., Ingebo, G., & Wolmut, P. Can Rasch item levels be determined without random sampling? Monograph V, Vol. I. Portland, Ore.: Northwest Evaluation Association, undated(a)

Forster, F., Ingebo, G., & Wolmut, P. What is the smallest sample size needed for field testing? Monograph II, Vol. I. Portland, Ore.: Northwest Evaluation Association, undated(b)

Forsyth, R. A., & Feldt, L. S. An investigation of empirical sampling distributions of correlation coefficients corrected for attenuation. Educational and Psychological Measurement, 1969, 29, 61-71.

Fryman, J. G. Application of the Rasch simple logistic model to a mathematics placement examination (Doctoral dissertation, University of Kentucky, 1976). Dissertation Abstracts International, 1976, 37, 5626A. (University Microfilms No. 77-5689)

George, A. A. Theoretical and practical consequences of the use of standardized residuals as Rasch model fit statistics. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, 1979.

Glass, G. V, & Stanley, J. C. Statistical Methods in Education and Psychology. Englewood Cliffs, N.J.: Prentice-Hall, 1970.

Hambleton, R. K. An empirical investigation of the Rasch test theory model (Doctoral dissertation, University of Toronto, 1969).
Dissertation Abstracts International, 1971, 32, 4035A. (Microfilm available through the National Library of Canada, Ottawa)

Hambleton, R. K., & Cook, L. L. Latent trait models and their use in the analysis of educational test data. Journal of Educational Measurement, 1977, 14, 75-95.

Hammons, D. W. Student achievement in selected areas of arithmetic during transition from traditional to modern mathematics (1960-1969) (Doctoral dissertation, The Louisiana State University and Agricultural and Mechanical College, 1972). Dissertation Abstracts International, 1972, 33, 2237A. (University Microfilms No. 72-28,349)

Hashway, R. M. A comparison of tests derived using Rasch and traditional psychometric paradigms (Doctoral dissertation, Boston College, 1977). Dissertation Abstracts International, 1977, 38, 744A. (University Microfilms No. 77-17,594)

Hedges, H. G. Achievement in Basic Skills: A Longitudinal Evaluation of Pupil Achievement in Language Arts and Mathematics. Toronto: Minister of Education, 1977.

Hungerman, A. D. 1965-1975: Achievement and Analysis of Computation Skills Ten Years Later. Ann Arbor, Mich.: The University of Michigan, 1975. (ERIC Document Reproduction Service No. ED 128202)

Hungerman, A. D. 1965-1975: Achievement and Analysis of Computation Skills Ten Years Later (Part II). 1977. (ERIC Document Reproduction Service No. ED 144839)

Kelley, T. L., Madden, R., Gardner, E. F., Terman, L. M., & Ruch, G. M.
Stanford Achievement Test, Intermediate and Advanced Partial Batteries, Forms J, L, M, and N: Directions for Administering. New York: World Book Co., 1953.

Kerlinger, F. N. Foundations of Behavioral Research. New York: Holt, Rinehart and Winston, 1964.

Kifer, E., & Bramble, W. The calibration of a criterion-referenced test. Paper presented at the annual meeting of the American Educational Research Association, Chicago, 1974. (ERIC Document Reproduction Service No. ED 091434)

Kline, M. Why Johnny Can't Add: The Failure of the New Math. New York: St. Martin's Press, 1973.

Larson, R., Martin, W., Searls, D.,

Form L

The Stanford Arithmetic test was administered as part of a battery to 29,204 B.C. Grade VII students out of 29,533 enrolled in March, 1964. It consisted of 45 Reasoning and 44 Computation items. The sub-tests overlap to a certain extent: almost all 'reasoning' items require some skill in computation and many of the 'computation' items require a certain amount of problem-solving ability. Both sub-tests consist chiefly of applications to every-day life: the purchase of gasoline, renting a boat, reading a map, combining fractions, finding percentages.

The same test was reprinted and readministered in May, 1970. In the meantime, Grade VII enrolment had increased to 40,252 of whom 38,377 or 95% were tested. Of these, almost every pupil, including migrants from other provinces, would have had at least 6 years of "modern maths". The purpose of the second administration was to determine the changes that had occurred in achievement in the ordinary arithmetic type of item.
There had been rumours that while children were learning modern maths well in advance of their parents they were weak in the solution of the types of mathematical problems that were occurring at home. The changes in a great majority of the modern math items are, of course, impossible to determine because there is no previous basis of comparison.

It should be mentioned that in almost all surveys conducted in B.C. the average B.C. pupil has been away above the U.S. norm in Arithmetic Reasoning and slightly above the U.S. norm in Arithmetic Computation. That has not been surprising because B.C. also has been well above the U.S. norm in mental age, usually determined from verbal group tests which have a higher correlation with reasoning than with computation items. That was true in 1964. In terms of the U.S. modal-age grade equivalents of that date, B.C. medians were 18 months or 1.8 school years ahead in Reasoning and 11 months or 1.1 school years ahead in Computation. When the same test was used in 1970, the B.C. students were found to have lost most of their advantage in Reasoning and more than that in Computation, as follows:

                          Excess over U.S. Modal-Age     Median, May 1970, in Terms
                          Grade Norms (Yrs. Mo.)         of March, 1964 %iles
                          1964         1970              1964         1970
Arithmetic Reasoning      1.8           .8               38.3         50
Arithmetic Computation    1.1          -.1               25.4         50

What is of even greater concern are the differences obtained when individual schools are compared over the 6.2 year period. As an example we may take a school in which the physical facilities have been greatly improved with an open area and all kinds of modern equipment. As it has become an experimental school, the pupil/staff ratio has decreased and particular attention has been paid to "modern maths" demonstrations.
The school is in an average area: Arithmetic stanines were 5.1 and 4.8 in 1964, and if anything, the neighbourhood seems to have improved since that time. Meanwhile, the Grade VII enrolment has increased from 47 to 73. Pupils are now on a level and continuous progress, rather than a grade system, with those called "Grade VII" being at the appropriate levels. There is no evidence of excessive promotion by age in the age-grade relationships, with the enrolments in Grades V and VI being 83 and 77 respectively. Here are the comparative results for the school:

                                    Pupils     Mean (1964)   Equiv.   Equiv.     Gr. Eq. Years
                                    Writing    Stanine       L.G.     Gr. Eq.    1970-1964
Arith. Reasoning     March, 1964    47         5.15          C        9.5
                     May, 1970      73         4.16          C-       8.1        -1.4 - .2 = -1.6
Arith. Computation   March, 1964    47         4.79          C        9.1
                     May, 1970      73         3.11          D        7.4        -1.7 - .2 = -1.9

It must be pointed out that the grade-equivalent comparisons are in terms of pre-1964 U.S. norms. We have no new data but there are rumours that if new U.S. norms were prepared in 1970, they too would be considerably lower. We do have the B.C. norms in terms of letter grades and percentiles for 1964, however, and a comparison of the percentiles for the two administrations produces the results given on page 1. The detail given on the norm sheet shows that in Reasoning the 99th percentile is exactly the same, but that the difference increases for average and weaker students. The latter is more pronounced in Computation, i.e. the weaker students are being left farther behind in comparison with 1964. This is a matter of considerable concern, because although results in Grades XI and XII show that we are doing a good job of producing a selected group of mathematicians, we still have to deal with members of a much larger group who will have to buy groceries, read a map scale, make time payments and pay taxes. It is in the applied-mathematics type of item that the greatest increase in difficulty, i.e. in number of errors, is found.
But first, was the test valid?

Validity

The validity of a test may be considered from several angles:

(a) Text-book validity - does the test measure what has been taught? This criterion is important in classroom tests.

(b) Curricular validity - does the test measure what a group of knowledgeable persons, or teachers in general, have decided are reasonable outcomes in a particular field (say, Arithmetic) at a particular level (say, Grade VII)?

(c) Statistical validity - This is determined for the individual items and involves the assumption that the pupils who get the most correct answers are, in general, the best in Arithmetic, and those who obtain the lowest scores are the worst. If we compare success in individual items of the upper and lower thirds of the students, the most valid items are those where the differences are greatest, i.e. the items that do the best job of distinguishing the best pupils from the weakest ones.

Subjectively, the text-book validity of the Stanford test is much lower in 1970 than it was in 1964. It is still relatively high in (b) however when we consider the application of modern mathematics to everyday computations and problems as they are met. Mileages, interest and taxation have unfortunately not been abolished. And statistically, almost all of the items are more valid in 1970 than they were in 1964 (see the Table). That is largely due to the fact that the average difficulty of the items increased in relation to the average ability of the students. As the difficulty of an item approaches 50% the possible maximum validity rises and most of the items previously had difficulties of less than 50%. It should be noticed that while difficulty and validity are related they are not the same. A test may be invalid because it is entirely too difficult for the pupils, but it is not valid merely because it is easy.
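The link between item difficulty and the largest attainable validity can be sketched numerically. The example below assumes that validity is the difference in percent correct between the upper and lower thirds of pupils, as described in (c), that the three thirds are of equal size, and that difficulty is the percent of all pupils answering incorrectly. The function name and the sample difficulty values are illustrative only and are not taken from the survey data.

```python
def max_validity(difficulty_pct):
    """Largest possible upper-minus-lower-third difference (percentage points).

    difficulty_pct is the percent of all pupils who miss the item. The bound is
    reached when errors are concentrated as far as possible in the lower third
    first, then the middle third, then the upper third.
    """
    d = difficulty_pct / 100.0
    bound = min(3.0 * d, 1.0, 3.0 * (1.0 - d))
    return round(100.0 * bound, 1)

for difficulty in (5, 10, 20, 35, 50, 65, 80, 95):
    print(f"difficulty {difficulty:2d}%: maximum U-L validity {max_validity(difficulty)}%")
```

Under these assumptions a very easy item cannot separate the upper and lower thirds by much, because too few pupils miss it; the ceiling rises as difficulty increases and is at its maximum by the time difficulty reaches 50%.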
Statistically the Stanford items proved to have excellent validity in both the 1964 and the 1970 administrations. The item-difficulties and item-validities for the two years are given in the Table on page [number illegible]. It will be noticed that there is a general increase in the number of errors with the exception of a few items. One of these: If 2m + 10 = 28, m = ?, is definitely of a "new maths" emphasis. Another involved the setting up of an equation. Most of the remainder involve the meaning of mathematical terminology.

It is in the application of the terminology and in straight everyday computation that most of the old errors remain and, in fact, have increased. In at least seven of the 44 Computation items and one of the Reasoning items a "zero difficulty" is apparent, e.g. 400 X 201 or 3520 ÷ 5. In 1964, 13% and 23% of the pupils marked the respective wrong answers; in 1970 these became 17% and 45%. Operations involving fractions, decimals and percents also have become more difficult.

Pupils have always found the following progressively harder: [three operations on pairs of quarters; operators illegible in the source]. Then they encounter 1/4 ÷ 1/4. The latter can be stated as, "How many quarters are there in one-quarter?" from which the step can be made to: "How many halves are there in one quarter?" (1/4 ÷ 1/2) and instead of the obvious answer, "None", they can be shown that there is 1/2 of 1/2 in 1/4.

In the past, thousands of mediocre children learned by rote the "invert and multiply" method of dividing by fractions. Many of them never understood what they really were doing, but most of them got the correct answer. If we are now going to emphasize understanding we must see that it is complete, e.g. that 1/4 ÷ 1/16 is really, "How many sixteenths are there in one quarter?" and a diagram shows that the answer can quite logically be larger than either of the original fractions.

Another obvious conclusion that one reaches when studying the items is that many pupils do not reason by analogy. For example, compare the difficulties of the following items in the Computation sub-test:

                            1964        1970
  [item illegible]          D = 5%      D = 8%      (The answer 10 was given.)
  1/10 + 1/10 = ?           D = 20%     D = 34%     (The answer 1/5 was given but pupils chose "not given" or "2".)

Teachers may draw additional conclusions from the listings of the "more difficult" and "less difficult" items on pages 7 and 8. A brief indication of the process involved has been given for each one that is listed. The remainder are omitted, not because they are easy or hard, but because no significant change in difficulty has occurred since 1964.

Stanford Arithmetic Tests, Advanced, Form L - Grade VII
Difficulty and Validity of Items - March 1964 vs. May 1970

REASONING

          % Difficulty     Validity (U-L%)             % Difficulty     Validity (U-L%)
  Item    Mar.    May      Mar.    May        Item     Mar.    May      Mar.    May
  No.     1964    1970     1964    1970       No.      1964    1970     1964    1970
    1       7       5        9       7         24       58      58       34      43
    2       3       6        4       9         25       50      54       48      57
    3       4       5        5       9         26       39      44       36      47
    4      15      13       12      15         27       43      50        4      11
    5      12       7       17       6         28       64      61       14      25
    6      10       7       17      14         29       73      79        6      10
    7      20      20       19      23         30       64      71       27      39
    8      11      17       14      16         31       14      10       17      19
    9      17      22       20      29         32       36      36       36      54
   10      18      14       23      27         33       20      11       25      12
   11      12      14       12      25         34       57      52       30      40
   12      18      17       26      22         35       17      38       25      57
   13      18      17       28      35         36       34      36       25      40
   14      39      38       42      55         37       35      43       35      47
   15      22      25       26      26         38       20      16       24      25
   16      25      30       33      51         39       28      37       31      57
   17      20      21       30      22         40       46      45       31      45
   18      23      26       32      24         41       58      51       47      55
   19      35      34       50      48         42       30      34       16      31
   20      34      36       31      4?         43       70      64       19      18
   21      27      25       36      49         44       59      55       41      44
   22      20      37       15      59         45       63      72       13       8
   23      21      24       28      47

COMPUTATION

          % Difficulty     Validity (U-L%)             % Difficulty     Validity (U-L%)
  Item    Mar.    May      Mar.    May        Item     Mar.    May      Mar.    May
  No.     1964    1970     1964    1970       No.      1964    1970     1964    1970
    1       3       9        3      12         23       21      18       24      32
    2      12       7        3       8         24       45      53       31      46
    3       7      12        9      16         25       25      22       23      26
    4       2       2        3       3         26       42      48       53      56
    5      10      13        4       7         27       33      46       39      68
    6      14       9       11      10         28       31      44       37      52
    7      17      16        9      15         29       32      28       31      50
    8      12      11       16      19         30       17      17       24      30
    9      13      17       15      22         31       17      31       26      46
   10      11      15       16      27         32       54      59       28      46
   11      20      34       25      37         33       28      42        5       9
   12      10      23       13      36         34       60      79       47      27
   13      32      47       18      27         35       23      31       21      40
   14      32      50       34      46         36        5       8        6      18
   15      16      19       25      30         37       52      39       50      64
   16      26      45       35      59         38       66      76       41      36
   17      30      31       33      35         39       71      72       22      22
   18      23      45       22      43         40       88      88        7       7
   19      32      47       38      51         41       54      63       34      46
   20      28      46       28      48         42       47      46       50      70
   21      11       9       12      19         43       76      75       18      20
   22      38      46       43      70         44       80      91       28      14

Content of Items Changing in Difficulty from 1964 to 1970

Reasoning

Less Difficult in 1970
Item
   4. 14 X 18
   5. subtract $ and ¢
   6. (75¢/25¢ = 3) X 3
  10. time at 40¢ per hour
  28. reading gas meter (still very difficult)
  31. identification of thousands position
  33. lowest common denominator (cf. application in Computation items 10, 11)
  34. smallest fraction in group of 4
  38. setting up equation
  41. 4^ [exponent illegible] (cf. Computation item 44)
  43. meaning of "dividends"
  44. meaning of "quotient"

More Difficult in 1970
Item
   8. conversion of map scale
   9. distracting data. Addition of mixed numbers with price immaterial.
  15. 60¢ at 2 for 15¢
  16. 4 1/2 X 8
  18. (2 X 34¢) + (4 X 21¢)
  22. average height (n.b. Computation 31)
  23. conversion of map scale
  25. %
  26. + for speed and - for wind
  27. radius, diameter, circumference
  29. zero difficulty? $400/10,000 miles
  30. instalment % (zero difficulty?)
  35.
decimal fraction = to ^ (much more d i f f i c u l t ) 37. rounding decimal to whble number 39. estimation of largest product: 888 X 101 vs # 888 X 90.9 42. 1057. 45. areas -8-184 Content of Items Changing in Di f f i c u l t y from 1964 to 1970 Computation Less D i f f i c u l t in 1970 Item 2. addition of $ and i (but cf. subtraction in Computation 3) 6. 37 X 16 29. reading temperature graph 37. solution of equation (D 1970/164 39/62 V 1970/'64 64/50) More D i f f i c u l t in 1970 1. 205 X'7 3. $51.03 - 4,55 5. sum of 4 numbers 9. 400 X 201 10. 11/12 - 2/3 11. 1/10 + 1/10 12. 3/5 X 7/10 13. 1/4 -r 1/2 14. 30% of $10 16. 200 X 2.5 18. 3520/5 19. 20% of $500 20. .081/9 22. 16 3/4 X 8 24. 1/4 -f- 4 26. (100%) - 61% 27. addition and subtraction 28. of lb. and oz. 31. average' of 3 numbers 32. 6.71/2.2 33. metres and centimetres 34. 1/2 X 15 X 18 35. If 25% = x, 100% = ? 38. interest & taxation 41. (really problems) 44. substitution in equation zero d i f f i c u l t y it tt tt it common denominator reduction of fraction tricky: -S- v_s. 'of decimal fraction in I's, estimation of correct answer? estimation of correct answer or zero zero d i f f i c u l t y zero d i f f i c u l t y zero d i f f i c u l t y 128 + 24/4 8 (a+b) see # 13 ("not given" in pie diagram but valid) reduction to lowest terms estimation of answer decimal fraction see 27 and 28 (very d i f f i c u l t - new math?) %'s %'s irfcaning of r^ Research and Standards Branch 185 B.C. Norms for STANFORD ARITHMETIC TEST, ADVANCED PARTIAL, FORM L RAW SCORES, Grade VII, May 1970 vs. March 1964 Arithmetic Letter Num. Per- Reasoning Computat ion Grade Equiv. centile 1964 1970 1964 1970 A 9 99 43 43 42 41 95 4J 40 40 38 ---40 39 8 90 39 36 7 39 38 B 85 37 35 33 37 80 36 37 36 33 75 35 36 32 70 35 C+ 6 34 ' 3 1 65 35 34 30 34 ~" 33 29 55 32 50 33 31 32 45 32 30 31 40 31 ---- 29 — -28 27 26 30 25 35 30 28 C- 4 29 24 30 29 27 23 25 28 25 27 22 ---20 27 24 26 21 D 15 26 22 25 20 3 2 10 ^ . 
2 4 20 23 18 5 21 17 21 15 — -E 1 1 15 11 16 11 Means: 32.0 30.0 31.2 27.2 If class medians are used, they should be compared with the 50th percentile in the appropriate Table. N (1964) 29,204/29,533 N (1970) 38,377/40,252 Research an;' Standards Branch B.C. Norms for STANFORD ARITHMETIC TEST, ADVANCED PARTIAL, FORM L Modal-Age Grade Equivalents at the Grade VII-6 Level, March 1964, and VII-9 Level, May 1970 Arithmetic Letter Num. Per- Reasoning Computation Grade 1964 1970 1964 1970 12.3 12.3 12.1 11.8 11.4 IITo 11.5 11.1 11.2 I57I 11.2 10.7 10.7 9.7 11.0 10.4 9.3 10.4 10.7 10.1 9.0 10.4 8.7 9.7. 9 ' 7 8.5 10.1 9.3 3.2 — -9.4- ~ — 9.7 9.0 8.0 9.1 7.9 9.4 7.7 9.1 8.5 8.5 40 877 8.1 — — 8.2' 7.3 35 8.5 7.9 4 8.0 7.1 30 8.1 7.6 7.0 -- -- 25 7~9 --- 772 777 6~9 --• 20 7.6 7.0 7.5 6.7 15 • 7.4 r 6.6 7.3 6.6 3 .. , , ... 2 10 7.0 6.3 7.0 6.4 5 -- 6.4 5.9 6.7 5.7 ... 1 1 5.5 4.8 5.9 4.9 Equiv. of Mean Scores: 9.1 8.5 8.5 7.7 187 APPENDIX B STANFORD ACHIEVEMENT TESTS: ARITHMETIC REASONING AND ARITHMETIC COMPUTATION DO NOT C O P Y 188 Advanced Battery ARITHMETIC TESTS STANFORD ACHIEVEMENT TEST TRUMAN L. KELLEY • RICHARD MADDEN • ERIC F. GARDNER • LEWIS M. TERMAN • GILES M. RUCH N a m e . Date of T e s t . BRITISH COLUMBIA EDITION Surname Schoo l District S c h o o l Number Name . Day Reasoning Pa jNTES I N U . S . A . Copyright 1954 by Harcourt, Brace & World, Inc., New York Copyright in Great Britain. AU rights reserved. This test is copyrighted. The reproduction ol any part o/ il by mimeograph, hectograph, or in any other way, whether the reproductions are sold ar are lurnUhed free lor me. is a violation ol the copyright law. R e p r o d u c e d by permiss ion for research purposes only. C o p y r i g h . 1953 by M a r c o u n Brace Jovanov ich . A l l nghts reserved. DO NOT COPY 189 6t*nford Advanced Part T E S T 5 Arithmetic Reasoning P A R T I * n D I R E C T I O N S : Work an example, and then compare your answer with the answers which follow it. 
I f your answer is one of those given, mark the answer space that has the same letter as your answer. Sometimes the correct answer is not given. If you do not find the correct answer, mark the space under the letter for not given. S A M P L E S : 6 1 How many balls are 3 balls and 4 balls? o o 3 b 4 c 7 d 12 e not given si M 8 2 How many books are 3 books and 2 books? ' / f 2 g 3 hi i 6 not given s. 1 Alice has done 14 problems and R u t h has done 8. H o w many more problems must R u t h a do to equal A l i c e 9 G 6 6 8 c 14 d 22 e not given . . 2 The teacher has 27 sheets of paper. H o w many children wil l get paper if she gives / each child 3 sheets? f 3 g 9 A 24 J 30 j not given ; ' 3 Mother bought groceries for $1.19. She gave the clerk two half dollars and a quarter. How much change should she receive? c a 16c b 19c c 25c d $1.25 e not given . . 3 * Dot's mother is going to buy tomato plants to set out. There are to be 14 rows with IS plants in each row. H o w many plants wil l be needed? / 252 g 262 h 352 i 362 j not given < ' 6 Father spent $37.25 last month for gasoline and oil . The gasoline alone cost him $34.67. What did he spend for oil? a a $2.58 b $2.62 c $3.52 d $3.62 e not given s 6 Onions cost 25c for 3 bunches. How many bunches can be bought for 75 c? / f 9 g 10 h 12 i 25 j not given 7 On an average day, Jane's hens lay a dozen eggs, which will sell for 80c. How much would that amount to in 7 days? a a 56c b 80c c 87c d $5.60 e not given 8 The scale of a map reads that 1 inch = 80 miles. How many inches long must a line on the map be to show a distance of 60 miles? r (\ g l\ h 9 i 48 ; not given -9 Thelma wants to buy 1^ yards of ribbon at 12c per yard, and 2$ yards at 30c per yard. How many yards of ribbon does she want to buy? Q a 31 6 3 | c 16 cf 110 e not given 1 0 D i c k and his father are going to rent a boat for fishing. If they leave at 10 A . M . and return at 2 P.M., how much must they pay for renting the boat at 40c an hour? 
f 40c g 80c h $1.60 J $2.00 j not given . . . . . . . 1 1 It is 90 miles to Cloverdale. The scheduled time for the mail train is 3} hours and that for the streamliner is 2 hours. How many more hours does the mail train take? „ a l 6 2 c 3 j d b\ e n o t given " 1 2 Y o u know how much money you had at the start and at the finish of an automobile trip. T o find out how much money you spent on the trip, you would — / / add g multiply h subtract i divide j not given i : 1 3 On three days, it rained J inch, + inch, and J inch. H o w much did it rain during al l of a these days? a 1 J " b 2 " c2j" d 3 " e not given .3 1 4 On M a y 1, $640 was deposited in a checking account. Since then there has been a deposit of $360, a withdrawal of $70, and another withdrawal of $110. How much is / the balance now? / $100 g $720 A $820 i $1180 j not given . . n 1 1 1 1 Go on tn the next page. DO NOT COPY 190 Stanford Advanced Pftrti&l: L T E S T 5 Arithmetic Reasoning (Continued) • 12 1 8 When ice-cream bars are 2 for 15^, how many can be bought for 60« 1 7 A car's mileage read 4185.4 miles at the beginning of a trip. At the end it read 4211.6 miles. How long was the trip? o a 26 mi. 6 26.2 mi. c 27.2 mi. d 73.8 mi. e not given >; 1 8 Frank wants 2 balls at 34e each and 4 toy cars at 21t each. H o w much wil l they cost / all together? / 55e £ $ 1 . 1 0 h $1.42 t $1.52 j not given :» 1 9 B i l l worked 2{ hours. Fred worked I J hours. Ned worked 5 hours. How many hours longer did Ned work than BUI? „ a \ 6 2 ; c 2 ; d 3\ e not given 19 8 0 Father bought a radio. The price of the radio plus the carrying charge was $52.50. He paid $20 in cash and agreed to pay the rest in 5 equal monthly payments. How much wi l l each monthly payment be? / / $4 g $6.50 h $10.50 t $14.50 / not given x 3 1 How many miles can a man walk in an hour at the rate of | mile in 15 minutes? 
„ a 2 6 2 | c 3 j d 4 e not given 21 2 2 The heights of the 5 boys on a basketball team are: 64 inches, 60 inches, 65 inches, 57 inches, and 59 inches. What is the average height of the players in inches? / / 59 g 60 h 61 i 64 j not given n ii 2 3 How many miles apart are two towns that are 3 i inches apart on the map, if the map scale reads 1 inch = 20 miles? o a 3 I 6 7 c 2Z\ d 70 e not given a i How many hours, to the near-1 " • • " **" 2 2 4 Ben slept from 9:20 P . M . unti l 6:35 the next morning. est quarter hour, did he sleep that night? j f 81 g9\ h 9\ i 151 j not given 2 < 3 8 R u t h budgets her yearly allowance this way: clothes, $80; lunches, $50 ; shows, $20; carfare, $20 ; miscellaneous, $30. What per cent of her allowance does she spend for , clothes? a 25 6 331 c 40 d 80 e not given »; 2 8 If +250 is the miles per hour which an airplane would travel if there were no wind, and —40 represents the loss of speed in miles per hour, due to a cross wind of 60 miles an hour, how many miles of forward progress does the plane make in an hour? f 150 mi. g 190 mi. h 210 mi. i 270 mi. ;' not given s» 2 7 If the radius of a circle is doubled, the circumference wil l be increased how many times? a 2 6 3 ; d 61 e not given 2 8 What is the reading of the gas meter shown at the left, in cubic feet? / 762 g 25,700 h 75,200 / j 76,200 not given =s;: 2 9 M r . Jof.es bought a car for $2000. At the end of the year he sold it for $1600. The difference is called depreciation. If he drove the car 10,000 miles, how much was the cost per mile for depreciation? a a 1.6c 6 2'c c 3.6c d 4c e not given » 3 0 Furniture which sells for 5500 cash costs on an installment plan $90 down and 10 equal payments of Ho each. B y what per cent is the installment-purchase cost greater / than the cash price? / 5 % g 6 % h 9 % i 18% j not given 30 • 12' Go on to the next page. 
DO NOT COPY 191 B U t J o r d A d v a n c e d P»iti»i: L T E S T 5 Arithmetic Reasoning PART I I 4 i i 3 DIRECTIONS: The answer to each of these examples can be thought out without doinp any figuring on paper. Y o u are to think out the answer and mark the answer space that is lettered the same as your choice. S 1 In which number is the 9 in the thousands position? n i. c d a 1,988 6 11,911 c 19,111 d 88,911 3 2 If all the odd-numbered houses are on the same side of the street, which of the following would be on the same side as N o . 2437? f / „ h e No. 2432 / No. 2524 g No. 2645 h No. 3724 ' 3 3 What is the lowest common denominator for | , and i? a & c , a 8 6 16 c 32 d 64.' 33 3 4 Which is the smallest fraction? < / s e e f la £ 15 ^ 21 .34 a b c « | = a .20 6 . 0 5 c .01 d .00\ 35 3 6 The amount left from a sale after costs and expenses are taken out is called — <• / „ e rent / profit g -wholesale h commission sr.. 3 7 H o w much is 46.735 rounded off to a whole number? a b c a 46 6 46.7 c 46.8 d 47 37 3 8 Dorothy earned 6 cents and spent d cents. How many cents did she have left? b d < I 9 e bd f -j 8 b - d h g 3* 3 9 B y estimation, choose the example which wi l l have the largest product. a 888 6 888 c 888 d 888 06c X 90.9 x 9.09 x 10.1 x 101 3<> 4 0 Which is the same as "18 more than a number = 4 4 " e 1 8 ^ = 1 4 h N=18 + 44 40; a b c « 4 J = a 2 6 4 c 8 d 16 4.. 4 2 H o w would 1059c of a number compare in size with the number? e more than twice / slightly larger g slightly smaller e / „ h less than half 42 4 3 For the use of money paid for a share of its stock, a company pays — a b c a dividends 6 bonds c a premium d a mortgage .3 4 4 B y estimation, choose the quotient which wil l be larger than 1. < 1 Q e l 3 6 -r-135| /125 + 125| g l 4 8 -H 148 h 152 + 153 .44 18A/ / ^ = 44 g N + 1 8 = 4 4 p ; p 9 4 5 When the dimensions of a square are doubled, its area becomes how many times as large a 2 6 4 c 6 d 8 Stop. b N o . 
BIQUT l t t 4 1 1 7 1 1 11 i i it i i it it i i IT it it n i i n n 14 is M r n «>• i i u i n i i s u r m i n LEWIS M. TERMAN • GILES M. RUCH BRITISH COLUMBIA EDITION Name. School District Number. School . Name . Date or Test. Day Computation Copyright 1954 by Harcourt, Bract & World, Inc., New York nu~T» •» v*.*. Copyright in Great Britain. AU rights reserved. This test is copyrighted. The reproduction o\ any part ol U by mimeograph, hectograph, or in any other way, whether the reproduction, are sold or are furnished tree lor use, is a violation oi the copyright Um. Reproduced by permission for research purposes only. Copyright 195S by Harcourt Brace jovanovich. All rights reserved. DO NOT COPY 193 Stanford Advanced Parti*!. L T E S T 6 Arithmetic Computation 14 DIRECTIONS : Work each example. Then compare your answer with the answers given at the right of the example. If your answer is one of those given, mark the answer space that has the same letter as your answer. Sometimes the correct answer is not given If the correct answer is not given, mark the answer space under the letter for not given. Look carefully at each example to see what it tells you to do. If you need to do any figuring, use a separate sheet of paper. i M u l t i p l y 205 7 2 A d d $8.70 5.65 s Subtract $5.03 4.55 a b c rf f a 1235 b 1505 c 1615 rf 1705 e not given , f $13.35 g $13.45 h $14.35 i $15.35 ; not given a b c il c a $.48 b $.58 c $1.48 d $1.52 e not given f a h i j 34)68 ( l g 2 h 3 i 20 ; not given < » A d d 638 67 56 334 n b c rf f a 985 b 995 c 1085 d 1195 e not given 6 M u l t i p l y 37 16 f 582 g 592 h 602 i 692 j not given i Subtract 211.3oo ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ g ^ 8 42)1428 9 M u l t i p l v 400 201 io Subtract \ \ 1 1 Add JV f 33 S 36 £ A 304 I 340 j not given a 8400 b 80,400 c 82,600 d 84,400 e not given / „ h i: j | * J i f y not given a b c il a i i c 2 rf 11 e not given 1 2 f x 7 _ TIT -13 ± + \ = £ * ! * ? 
l 2 f o ; not given (> c rf c a 2 6 1 c J d | e not given / 0 * » 3 " 30% of $ 4 0 = / * & « * i ^ o t ^ e n » Add 2 1 a b c d e a 5 6 4 c 3 | rf 3 e not given »: [ 14 ] Go on to the next page. DO NOT COPY Bt*nJord Advanced Part ia l : L T E S T 6 A rithmetic Computation (Continued) 16 200 X 2.5 " Add 4667.55 786.68 99.64' 6547.78 is 5)3520 i» Selling Price / .50 g 5 h 5.00 i 500 ;' not given a 12,001.65 e not given. 6 12,100.65 c 12,101.65 rf 12,110.65 « f 74 g 704 h 724 i 740 j not given = $500 Rate of Commission = 207c a $520 6 $40 c $100 d $400 e not given v. Commission so 9).081 / .009 g .09 h .9 i 9 ; not given *1 Subtract 2 | a l 6 11 c2 d 31 e n o t given " M u l t i p l y 161 8 f 24 I g 128 A 1281 . J 134 ; not given " If 8 y = 56. a 7 6 8 c 49 rf 64 e not given 24 1 ^ 4 g 1 A 4 ' i 16 j not given 1 % Recreation. « What per cent of the taxes was spent for hospitals 15 % Hospitals and welfare? ,.,r a 15% 6 209c c 35% rf 6 0 % e n o t given.. - 20 ~e Welfare 25 7o General 2 6 What per cent was spent for "other costs"? government f 39% g 40% h 6 0 % ' 6 1 % j not given -<> 2 7 Subtract 9 lb. 8 oz. 6 lb. 12 oz. a 16 lb. 8 oz. rf 2 lb. 2 oz. b 16 lb. 4 oz. c 2 lb. 12 oz. e not given * * A d d 61b. 9 o z . 121b. 13 oz. 71b. 8 o z . / 28 lb. g 27 lb. 14 oz. h 27 lb. 6 oz. / i 26 lb. 14 oz. ;' not given -8 T E M P E K A T V R E CHART T 2 9 How many degrees warmer was it at 3 P . M . than it was at 9 A . M . ? a 4° 6 8= c 10 ? rf 18= o e not given -"•> 3 0 How much did the temperature fall from 2 P . M . to 6 P . M . ? , f 5° g 10' k 18° i 20° / j not given *> 3 1 F i n d the average 16 ft. 9 ft. 11 ft. a 9 ft. 6 11 ft. c 12 ft. d 36 ft. e not given 3, [ 15 ] Go on to the next page. DO NOT1 CQPY 195 Stanford Advanced Partial. L TEST 6 Arithmetic Computation (Continued) '16 » 2.2)6.71 / .305 g 3.05 A 30.5 i 305 j not given 3 J " A d d 15 m. 8 cm. 4 m. 5 cm. a 19 m. 13 cm. b 19 m. 3 cm. c 20 m. 3 cm. a b e d rf 21 m. 3 cm. 
e not given

34 If A = 1/2 bh, what is the area of the triangle shown at the left?
   f 16 1/2   g 33   h 135   i 270   j not given

35 If 25% of an amount is $1.25, what is the amount?
   a $20   b $6.00   c $.3125   d $.0125   e not given

36 [stem illegible]
   f 6   g 10   h 20   i 40   j not given

37 If 2m + 10 = 28, m = ?
   a 9   b 12   c 16   d 18   e not given

38 [stem and answers illegible] Amount of Tax = ?
   j not given

39 -21 + 3
   a 18   b 7   c -21   d -24   e not given

40 Multiply +4a by -3
   f 12a   g -12   h 12   i 1a   j not given

41 Principal = $600   Annual Interest = $30   Rate of Interest = ?
   a 6%   b 2%   c 5%   d [illegible]   e not given

42 If n/3 = 18, n = ?
   f 6   g 15   h 21   i 54   j not given

43 Principal = $800   Rate 3%   Time 6 mo.   Interest = ?
   a $4   b $12   c $24   d $144   e not given

44 If A = πr², what is the area of the circle shown at the left? (π = 3.14)
   f 78.50 sq. ft.   g 77.50 sq. ft.   h 31.40 sq. ft.   i 15.70 sq. ft.   j not given

Stop.

APPENDIX C

THE COMPUTER PROGRAM BICAL

The computer program used to perform the analysis of data was BICAL (Wright & Mead, 1978). The description in this appendix relies heavily on the documentation given for the program.

The program consists of three major sections: input, estimation, and fit. The input portion reads the control cards, stores the data for each person, and computes the frequency of raw scores and the proportion of correct responses on each item (marginals). As part of this section a subroutine eliminates zero and perfect scores from item and person files.
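The reason for this editing step can be sketched in a few lines of Python. Under the Rasch model a first approximation to a person's ability is the log-odds of the raw score, ln[r/(L - r)] for a score of r on L items, and this quantity is undefined when r = 0 or r = L. The sketch is illustrative only; it is not BICAL's subroutine, and the function names and sample scores are hypothetical.

```python
import math


def drop_extreme_scores(raw_scores, n_items):
    """Keep only raw scores strictly between zero and the perfect score.

    Returns the retained scores and the number of scores dropped.
    """
    kept = [r for r in raw_scores if 0 < r < n_items]
    return kept, len(raw_scores) - len(kept)


def raw_score_logit(r, n_items):
    """Log-odds of success implied by raw score r on n_items items."""
    return math.log(r / (n_items - r))


if __name__ == "__main__":
    scores = [0, 14, 30, 44, 43]              # hypothetical raw scores, 44 items
    kept, dropped = drop_extreme_scores(scores, 44)
    print(kept, dropped)
    print([round(raw_score_logit(r, 44), 2) for r in kept])
```

A score of exactly half the items gives a logit of zero; scores of 0 or 44 would require the logarithm of zero or a division by zero, which is why such records carry no finite estimate and are removed before calibration.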
The user has the option of specifying minimum and maximum scores to be included in the calibration sample. This feature helps to alleviate the guessing problem by eliminating from the calibration process those scores less than that due to chance alone on multiple-choice questions. There is also a facility by which items may be removed from the analysis, thereby allowing recalibration without changing other control cards. There are several forms in which data may be input; in the present study each item was coded 0, 1, 2, 3, 4, or 5 according to response selected, and a scoring key was provided.

The estimation section calculates the estimates of ability and difficulty from the marginal person and item score distributions. There are two estimation options available. The first (PROX) is an approximate method using Cohen's procedure (Wright & Douglas, 1977a) which may be used for long tests with symmetrical score distributions. The second (UCON) is the corrected unconditional maximum likelihood procedure (Wright & Douglas, 1977b) suggested for use with shorter tests and skewed distributions. Given the final difficulty estimates, the program computes the corresponding ability estimates for all raw scores. The origin for both persons and items is at the centre of estimated item difficulties.

The fit section computes a mean square test of fit for each item and organizes the results into a summary table. The sample is divided into a number of subgroups (maximum of six) stratified by total scores.
The observed successes on each item in each score subgroup are compared with those predicted for that subgroup from the total sample estimate. The model suggests that the difficulty estimates are independent of the ability of the sample, hence there should be close agreement between observed and predicted successes on each item. Finally, BICAL calculates a residual index of the slope of each item characteristic curve after the model has been fitted. This statistic may be interpreted as an index of item discrimination.

An example of the output of BICAL is included in this appendix. The following detailed description of the printout on each page should give some insight into the data analysis.

Page 1 lists the control cards. It also lists the item responses for the first subject and gives the total number of items and subjects to ensure that the data file was read correctly.

Page 2 tabulates the responses to each item. The letter in each item name indicates whether the item was from the reasoning test or the computation test. The figure in the "unknown" column is the number of times the respondents omitted the item or responded unacceptably, for example, by marking several responses for the item.

Page 3 summarizes the editing process. Four subjects whose scores were less than the minimum of 15 were deleted from the analysis. No one achieved a perfect score.

Page 4 shows the frequencies of raw scores and the corresponding histogram. An inspection of the histogram shows a preponderance of high scores.

Page 5 shows the number of correct responses for each item on the test and the corresponding histogram. Again, casual inspection shows the items to be generally easy.

Page 6 lists the item difficulty estimates and the associated estimates of the standard error of calibration. The mean estimate of ability is 1.23. Items having difficulties close to this value show the least standard error. These items are best matched to the ability of the subjects and are best estimated. Items of least difficulty are least well estimated, as shown by the higher standard error of calibration. The scale factors at the top of the page are related to the PROX (Cohen) procedure of approximate estimation.

The unnumbered page following page six shows the relationship between raw scores and estimates of person abilities, along with the standard error of measurement related to each score. The mean ability at the bottom of the page is the summary statistic for the abilities of all persons in the sample. The curve is the function relating raw scores on the vertical axis and ability on the horizontal axis, with the typical logistic shape.

Pages 7 and 8 contain information on which to judge the degree of fit to the model for each item. The subjects are divided into, in this case, six groups ranging from low ability to high ability. The score range, number of subjects, and mean ability for each group are shown at the bottom of page 7. The figures in the six columns under "item characteristic curve" are the fraction of each group correctly answering a given item.
On item C19, for example, 33% of the 45 persons in the lowest ability group gave correct answers. Moving across the groups for item C19 it will be noted that the item is well-behaved; as ability increases the proportion of correct answers increases. This trend is not evident for item C39.

The six columns under "departure from expected ICC" show how the obtained values differ from values predicted from the theoretical item characteristic curve based on the mean group ability and the estimated difficulty of the item. For example, 16% of the first group correctly answered item C39, and this is 8 percentage points more than expected.

Under the section labelled "fit mean square" the "within group" column indicates the variance remaining in the groups after removing the effect of differences in the shapes of characteristic curves. If the correct proportion of the group succeeded but the wrong people in the group were successful, the within group value will be large relative to the between group variance. The "between group" variance serves to evaluate the agreement between the observed item characteristic curve and the theoretical curve. These mean squares have expected values of unity. A non-significant value indicates that statistically equivalent estimates of item difficulty are produced regardless of which scoring group was used in the calibration. The basic fit statistic for each item is the one given under the heading "total". Its expected value again is unity.
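The comparison of observed and expected proportions described above can be sketched as follows. The sketch assumes the simple logistic (Rasch) form, in which the probability of success depends only on the difference between ability and difficulty. Dividing each departure by a binomial standard error before squaring is one conventional standardization and is assumed here only for illustration; BICAL's exact computations are not reproduced. All function names and numbers are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the simple logistic model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def departures_from_icc(observed, group_abilities, difficulty):
    """Observed minus expected proportion correct for each score group."""
    return [obs - rasch_probability(b, difficulty)
            for obs, b in zip(observed, group_abilities)]


def standardized_squares(observed, group_sizes, group_abilities, difficulty):
    """Squared departures, each scaled by its assumed binomial standard error."""
    squares = []
    for obs, n, b in zip(observed, group_sizes, group_abilities):
        p = rasch_probability(b, difficulty)
        z = (obs - p) * math.sqrt(n / (p * (1.0 - p)))
        squares.append(z * z)
    return squares


if __name__ == "__main__":
    # Hypothetical values for one item and three score groups.
    observed = [0.30, 0.55, 0.80]        # proportion correct in each group
    sizes = [45, 50, 48]                 # persons in each group
    abilities = [-0.5, 0.4, 1.6]         # mean ability of each group (logits)
    difficulty = 0.2                     # estimated item difficulty (logits)
    print([round(d, 3) for d in departures_from_icc(observed, abilities, difficulty)])
    print([round(s, 2) for s in standardized_squares(observed, sizes, abilities, difficulty)])
```

Departures that change sign systematically from the lowest to the highest group point to an observed curve that is steeper or flatter than the model curve, while large squared values in a single group flag a local disagreement.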
The total fit mean square will be large when the observed trend does not follow the predicted trend, for example, if too many higher ability persons fail or vice versa. It is an indicator of disagreement between the ability called for on the item and that defined by the aggregate of items.

The discrimination index is related to the pattern of "departures from expected ICC". Its model value is unity. If the pattern of departures runs from negative to positive across the six groups, the discrimination index will be greater than unity, for example, item C19. If the trend is positive to negative the index will be less than unity, for example, item C40. The index is a measure of the linear residual trend across score groups. The point biserial values are the customary correlations between a subject's success on each item and his or her estimated ability score from the test.

On page 8, the items are arranged in three different ways in order to facilitate retrieval of information. The fit order on the right hand side is the most useful in selecting non-fitting items.

Page 9 shows the relationship between probability of success for a score group on a given item and the mean square for the group. The latter figure is obtained by standardizing and squaring the figures in the centre panel on page 7.
This plot is useful on multiple-choice questions where the presence of guessing is indicated by large values of mean squares located to the left of the chance probability level for the test. In this instance item 40 would be a good candidate for guessing.

Pages 10, 11, and 12 contain two-way plots of item difficulty, residual discrimination, and total fit mean square. They might be useful in determining by inspection any particularly interesting trends.

1964 computation test calibration

CONTROL PARAMETERS
NITEM NGROP MINSC MAXSC  LREC  KCAB SCORE   1   2   3   4   5   6   7   8   9  10  11
   44     0    14    43   160     2     0   0   0   0   0   0   0   0   0   0   0   0

COLUMNS SELECTED
         1         2         3         4         5         6         7         8
1********0*********0*********0*********0*********0*********0*********0*********0
1111111111 1111111111 1111111111 1111111111 1111

KEY            5312521525 2124243231 3411313425 3213521255 3421
FIRST SUBJECT  0020000000 1 1241115124 2353152425 4325314432 3113431131 3443300000
FIRST SUBJECT  0020000000 2 5312352524 5142334235 3122434545 5515154131 5155000000

NUMBER OF ITEMS   44
NUMBER OF SUBJT  296

1964 computation test calibration

ALTERNATIVE RESPONSE FREQUENCIES

SEQ  ITEM
NUM  NAME      1     2     3     4     5  UNKN   KEY
  1  C1        0     4     1     1   290     0     5
  2  C2       11     3   267     8     6     1     3
  3  C3      276     5     7     0     8     0     1
  4  C4        0   291     1     2     2     0     2
  5  C5        0     3    11    14   268     0     5
  6  C6        6   258     2    10    19     1     2
  7  C7      255     4     5     1    28     3     1
  8  C8        6    11     3     5   269     2     5
  9  C9       24   260     0     0    10     2     2
 10  C10       7    11     3     2   272     1     5
 11  C11       9   248     4     0    34     1     2
 12  C12     269     1     5     3    15     3     1
 13  C13      29   210    12    31    13     1     2
 14  C14       6    14    15   211    43     7     4
 15  C15       8   255    11    10    11     1     2
 16  C16       2     5     8   225    53     3     4
 17  C17      13    12   202     3    63     3     3
 18  C18      60   228     0     4     3     1     2
 19  C19      19     9   213    12    39     4     3
 20  C20     224    52     8     2     5     5     1
 21  C21       0     1   269    20     4     2     3
 22  C22       6    11    30   203    44     2     4
 23  C23     224    17     4    10    33     8     1
 24  C24     159    94     4    19    17     3     1
 25  C25      35    11   223     3    20     4     3
 26  C26     169     5     7    14    98     3     1
 27  C27       2    12   215    10    50     7     3
 28  C28       6     5     7   209    67     2     4
 29  C29      25   214    22    11    21     3     2
 30  C30      13     8     4    18   250     3     5
 31  C31       9     2   249    16    19     1     3
 32  C32      13   150    48     5    78     2     2
 33  C33     207     2    65     3    15     4     1
 34  C34      19    26   115    65    61    10     3
 35  C35      13    20    28     3   231     1     5
 36  C36       5   284     4     0     2     1     2
 37  C37     108    13    59    57    47    12     1
 38  C38      36   101    28    32    78    21     2
 39  C39      38    67    18    77    78    18     5
 40  C40     101    29     7    98    40    21     5
 41  C41      27    42   130    17    62    18     3
 42  C42      81    19    15   152    17    12     4
 43  C43      15    68    94    51    44    24     2
 44  C44      45    15    55    79    72    30     1

1964 computation test calibration                                   PAGE 3

NUMBER OF ZERO SCORES        0
NUMBER OF PERFECT SCORES     0
NUMBER OF ITEMS SELECTED    44
NUMBER OF ITEMS NAMED       44
SUBJECTS BELOW 14            3
SUBJECTS ABOVE 43            0
SUBJECTS IN CALIB.         293
TOTAL SUBJECTS             296

REJECTED ITEMS
ITEM NUMBER   ITEM NAME   ANSWERED CORRECTLY
NONE

SUBJECTS DELETED   =   0
SUBJECTS REMAINING = 293
ITEMS DELETED      =   0
POSSIBLE SCORE     =  44
MINIMUM SCORE      =  14
MAXIMUM SCORE      =  43

1964 computation test calibration

DISTRIBUTION OF ABILITY

SCORE  COUNT  PROPORTION
    1      0     0.0
    2      0     0.0
    3      0     0.0
    4      0     0.0
    5      0     0.0
    6      0     0.0
    7      0     0.0
    8      0     0.0
    9      0     0.0
   10      0     0.0
   11      0     0.0
   12      1     0.00
   13      2     0.01
   14      1     0.00
   15      1     0.00
   16      3     0.01
   17      0     0.0
   18      1     0.00
   19      3     0.01
   20      5     0.02
   21      6     0.02
   22      7     0.02
   23      9     0.03
   24      9     0.03
   25     10     0.03
   26     11     0.04
   27     21     0.07
   28     15     0.05
   29     14     0.05
   30     15     0.05
   31     12     0.04
   32     14     0.05
   33     17     0.06
   34     31     0.11
   35     18     0.06
   36     23     0.08
   37     16     0.05
   38     11     0.04
   39      5     0.02
   40      5     0.02
   41      6     0.02
   42      4     0.01
   43      0     0.0
   44      0     0.0

1964 computation test calibration                                   PAGE 5

[Histogram: ITEM DISTRIBUTION OF EASINESS, giving the count and proportion of correct responses for each of the 44 items.]

1964 computation test calibration

PROCEDURE USED            UCON
DIFFICULTY SCALE FACTOR   1.148
ABILITY SCALE FACTOR      1.35
NUMBER OF ITERATIONS =    2

SEQ  ITEM     ITEM     STANDARD  LAST DIFF    PROX     FIRST
NUM  NAME  DIFFICULTY   ERROR     CHANGE      DIFF     CYCLE
  1  C1      -2.922     0.413     -0.018    -3.153    -2.905
  2  C2      -1.269     0.205     -0.012    -1.293    -1.257
  3  C3      -1.708     0.242     -0.014    -1.777    -1.695
  4  C4      -3.332     0.502     -0.018    -3.626    -3.315
  5  C5      -1.311     0.208     -0.013    -1.339    -1.299
  6  C6      -1.004     0.188     -0.010    -1.007    -0.994
  7  C7      -0.840     0.178     -0.009    -0.831    -0.831
  8  C8      -1.400     0.215     -0.013    -1.436    -1.387
  9  C9      -1.004     0.188     -0.010    -1.007    -0.994
 10  C10     -1.545     0.227     -0.013    -1.596    -1.532
 11  C11     -0.662     0.169     -0.008    -0.643    -0.654
 12  C12     -1.355     0.212     -0.013    -1.387    -1.342
 13  C13      0.177     0.140     -0.000     0.221     0.177
 14  C14      0.158     0.140     -0.000     0.201     0.158
 15  C15     -0.871     0.180     -0.009    -0.864    -0.862
 16  C16     -0.127     0.148     -0.003    -0.087    -0.123
 17  C17      0.327     0.137      0.002     0.371     0.325
 18  C18     -0.148     0.149     -0.003    -0.109    -0.145
 19  C19      0.139     0.141     -0.000     0.182     0.139
 20  C20     -0.105     0.147     -0.003    -0.065    -0.102
 21  C21     -1.355     0.212     -0.013    -1.387    -1.342
 22  C22      0.308     0.137      0.001     0.353     0.307
 23  C23     -0.105     0.147     -0.003    -0.065    -0.102
 24  C24      1.055     0.129      0.010     1.090     1.045
 25  C25     -0.063     0.146     -0.003    -0.022    -0.060
 26  C26      0.893     0.129      0.008     0.931     0.885
 27  C27      0.080     0.142     -0.001     0.123     0.081
 28  C28      0.196     0.140      0.000     0.240     0.196
 29  C29      0.099     0.142     -0.001     0.143     0.100
 30  C30     -0.690     0.170     -0.008    -0.673    -0.683
 31  C31     -0.690     0.170     -0.008    -0.673    -0.683
 32  C32      1.216     0.128      0.012     1.247     1.204
 33  C33      0.290     0.138      0.001     0.334     0.289
 34  C34      1.771     0.131      0.019     1.788     1.752
 35  C35     -0.237     0.152     -0.004    -0.200    -0.233
 36  C36     -2.296     0.310     -0.016    -2.437    -2.281
 37  C37      1.890     0.132      0.020     1.904     1.870
 38  C38      2.012     0.134      0.021     2.024     1.991
 39  C39      2.444     0.143      0.024     2.450     2.420
 40  C40      3.386     0.180      0.028     3.403     3.359
 41  C41      1.523     0.129      0.016     1.546     1.507
 42  C42      1.168     0.128      0.012     1.200     1.157
 43  C43      2.676     0.150      0.025     2.682     2.651
 44  C44      3.233     0.172      0.028     3.245     3.206

ROOT MEAN SQUARE = 0.013
MEAN ABILITY = 1.23

COMPLETE SCORE EQUIVALENCE TABLE

 RAW            LOG     STANDARD
SCORE  COUNT  ABILITY    ERRORS
   43      0     4.71     1.06
   42      4     3.93     0.78
   41      6     3.44     0.66
   40      5     3.06     0.59
   39      5     2.75     0.55
   38     11     2.48     0.51
   37     16     2.24     0.48
   36     23     2.02     0.46
   35     18     1.83     0.44
   34     31     1.64     0.43
   33     17     1.47     0.41
   32     14     1.31     0.40
   31     12     1.15     0.39
   30     15     1.00     0.39
   29     14     0.86     0.38
   28     15     0.72     0.37
   27     21     0.59     0.37
   26     11     0.46     0.36
   25     10     0.33     0.36
   24      9     0.20     0.36
   23      9     0.08     0.36
   22      7    -0.05     0.36
   21      6    -0.17     0.35
   20      5    -0.29     0.36
   19      3    -0.42     0.36
   18      1    -0.54     0.36
   17      0    -0.67     0.36
   16      3    -0.79     0.36
   15      1    -0.92     0.37
   14      1    -1.06     0.37
   13      2    -1.20     0.38
   12      1    -1.34     0.39
   11      0    -1.49     0.39
   10      0    -1.65     0.41
    9      0    -1.82     0.42
    8      0    -2.00     0.44
    7      0    -2.20     0.46
    6      0    -2.42     0.49
    5      0    -2.67     0.53
    4      0    -2.96     0.57
    3      0    -3.32     0.64
    2      0    -3.80     0.76
    1      0    -4.57     1.04

[Plot alongside the table: TEST CHARACTERISTIC CURVE, log ability axis from -6 to 6.]

MEAN ABILITY = 1.23
SD OF ABILITY = 0.88

1964 computation test calibration                                   PAGE 7

           ITEM CHARACTERISTIC CURVE       DEPARTURE FROM EXPECTED ICC           FIT MEAN SQUARE
SEQ ITEM   1ST  2ND  3RD  4TH  5TH  6TH     1ST   2ND   3RD   4TH   5TH   6TH   WITHN BETWN        DISC POINT
NUM NAME   GRP  GRP  GRP  GRP  GRP  GRP     GRP   GRP   GRP   GRP   GRP   GRP   GROUP GROUP TOTAL  INDX BISER
 1  C1    0.93 1.00 1.00 0.98 1.00 0.97   -0.01  0.03  0.02 -0.01  0.01 -0.02    1.19  2.41  1.51  0.69  0.09
 2  C2    0.84 0.83 0.89 0.93 0.96 0.94    0.09 -0.02 -0.01  0.00  0.01 -0.03    1.45  1.13  1.44  0.66  0.15
 3  C3    0.91 0.88 0.91 0.93 0.96 0.99    0.09 -0.02 -0.02 -0.02 -0.01  0.00    1.05  0.72  1.05  0.73  0.13
 4  C4    1.00 0.95 0.98 1.00 0.98 1.00    0.04 -0.03 -0.01  0.01 -0.01  0.00    1.04  1.14  1.04  1.97  0.04
 5  C5    0.82 0.86 0.98 0.91 0.94 0.93    0.06 -0.00  0.08 -0.03 -0.01 -0.05    1.48  2.44  1.50  0.56  0.14
 6  C6    0.76 0.81 0.84 0.93 0.92 0.97    0.06 -0.01 -0.02  0.02 -0.02  0.00    1.01  0.30  0.99  0.89  0.23
 7  C7    0.71 0.81 0.86 0.81 0.96 0.96    0.05  0.02  0.02 -0.08  0.03 -0.00    1.00  0.93  1.00  0.87  0.26
 8  C8    0.76 0.88 0.91 0.95 0.98 0.97   -0.02  0.01  0.00  0.02  0.02 -0.01    1.55  0.22  1.52  1.03  0.26
 9  C9    0.73 0.83 0.86 0.93 0.92 0.96    0.03  0.02 -0.00  0.02 -0.02 -0.01    1.07  0.22  1.06  0.85  0.26
10  C10   0.82 0.86 0.91 1.00 0.96 0.97    0.02 -0.03 -0.01  0.05 -0.00 -0.01    1.77  0.67  1.75  0.93  0.20
11  C11   0.56 0.79 0.86 0.81 0.94 1.00   -0.07  0.03  0.04 -0.07  0.02  0.05    0.80  1.50  0.82  1.12  0.36
12  C12   0.71 0.93 0.91 0.95 0.94 0.99   -0.05  0.07  0.01  0.02 -0.02  0.01    0.87  0.63  0.86  1.08  0.28
13  C13   0.40 0.64 0.75 0.77 0.78 0.87   -0.02  0.06  0.09  0.01 -0.05 -0.03    1.10  0.78  1.09  0.85  0.31
14  C14   0.47 0.55 0.55 0.84 0.86 0.93    0.04 -0.03 -0.12  0.07  0.03  0.03    0.92  1.16  0.93  1.10  0.40
15  C15   0.60 0.81 0.86 0.91 0.94 1.00   -0.07  0.01  0.01  0.01  0.01  0.04    0.74  0.86  0.74  1.16  0.40
16  C16   0.40 0.64 0.73 0.84 0.88 0.99   -0.10 -0.01 -0.00  0.03  0.02  0.06    0.78  1.25  0.79  1.32  0.46
17  C17   0.49 0.50 0.55 0.72 0.80 0.93    0.10 -0.04 -0.08 -0.01 -0.00  0.04    0.99  1.00  0.99  1.00  0.37
18  C18   0.44 0.60 0.73 0.86 0.94 0.94   -0.06 -0.06 -0.01  0.05  0.07  0.02    0.88  0.91  0.88  1.24  0.41
19  C19   0.33 0.57 0.61 0.81 0.88 0.97   -0.10 -0.02 -0.06  0.05  0.05  0.07    0.82  1.62  0.83  1.38  0.48
20  C20   0.42 0.62 0.68 0.86 0.94 0.94   -0.07 -0.03 -0.04  0.05  0.08  0.02    0.85  1.04  0.85  1.26  0.44
21  C21   0.76 0.93 0.93 0.86 0.96 0.99   -0.01  0.07  0.03 -0.08  0.00  0.01    1.00  1.27  1.00  0.94  0.24
22  C22   0.16 0.60 0.73 0.74 0.90 0.90   -0.24  0.05  0.09  0.01  0.10  0.01    0.80  3.20  0.85  1.37  0.52
23  C23   0.58 0.76 0.57 0.79 0.86 0.93    0.09  0.12 -0.16 -0.02 -0.00  0.00    1.03  1.87  1.05  0.83  0.27
24  C24   0.13 0.45 0.48 0.65 0.67 0.74   -0.10  0.09  0.03  0.08  0.02 -0.05    0.99  1.39  1.00  0.98  0.41
25  C25   0.53 0.57 0.75 0.88 0.84 0.89    0.05 -0.06  0.03  0.08 -0.02 -0.04    1.07  0.97  1.07  0.87  0.32
26  C26   0.29 0.21 0.52 0.74 0.65 0.86    0.02 -0.19  0.03  0.14 -0.04  0.04    0.92  2.19  0.95  1.16  0.46
27  C27   0.36 0.60 0.75 0.72 0.90 0.94   -0.09 -0.01  0.06 -0.06  0.06  0.03    1.05  1.13  1.05  1.23  0.44
28  C28   0.33 0.62 0.73 0.63 0.92 0.91   -0.09  0.05  0.07 -0.13  0.10  0.01    0.88  1.99  0.90  1.15  0.44
29  C29   0.49 0.55 0.73 0.72 0.92 0.87    0.05 -0.05  0.05 -0.05  0.09 -0.04    1.01  1.17  1.01  0.91  0.34
30  C30   0.64 0.71 0.84 0.95 0.90 0.96    0.01 -0.05  0.02  0.07 -0.02  0.00    1.10  0.61  1.09  1.01  0.33
31  C31   0.58 0.71 0.91 0.88 0.94 0.97   -0.05 -0.05  0.08  0.00  0.02  0.02    0.83  0.82  0.83  1.17  0.39
32  C32   0.36 0.24 0.39 0.56 0.57 0.77    0.15 -0.09 -0.03  0.03 -0.05  0.00    1.07  1.65  1.08  0.84  0.36
33  C33   0.62 0.64 0.66 0.72 0.73 0.76    0.23  0.09  0.02 -0.02 -0.07 -0.13    1.44  5.59  1.53  0.23  0.09
34  C34   0.04 0.19 0.23 0.42 0.51 0.74   -0.09 -0.03 -0.06  0.03  0.03  0.08    0.91  1.35  0.92  1.37  0.48
35  C35   0.49 0.74 0.77 0.81 0.82 0.97   -0.03  0.06  0.02 -0.01 -0.06  0.04    0.95  0.93  0.95  1.05  0.36
36  C36   0.89 0.95 0.95 0.95 1.00 1.00   -0.00  0.01 -0.00 -0.02  0.02  0.01    0.72  0.49  0.71  0.99  0.21
37  C37   0.13 0.19 0.25 0.30 0.47 0.67    0.01 -0.01 -0.01 -0.06  0.01  0.03    1.00  0.26  0.98  1.07  0.42
38  C38   0.02 0.24 0.27 0.33 0.41 0.63   -0.09  0.06  0.03 -0.01 -0.02  0.02    0.95  0.97  0.96  1.10  0.42
39  C39   0.16 0.21 0.18 0.28 0.22 0.44    0.08  0.09  0.01  0.03 -0.10 -0.07    1.22  2.35  1.24  0.56  0.23
40  C40   0.11 0.07 0.09 0.09 0.18 0.21    0.08  0.02  0.02 -0.02  0.03 -0.09    1.63  2.88  1.65  0.51  0.12
41  C41   0.18 0.36 0.30 0.42 0.45 0.77    0.01  0.09 -0.05 -0.03 -0.10  0.06    1.09  1.17  1.10  0.99  0.37
42  C42   0.16 0.26 0.39 0.53 0.73 0.83   -0.06 -0.08 -0.04 -0.00  0.10  0.05    0.88  1.17  0.88  1.32  0.51
43  C43   0.09 0.07 0.14 0.21 0.27 0.46    0.03 -0.03 -0.00  0.00 -0.01  0.00    1.08  0.24  1.06  0.97  0.33
44  C44   0.02 0.02 0.09 0.16 0.10 0.39   -0.01 -0.04  0.01  0.03 -0.08  0.05    0.85  0.95  0.85  1.17  0.35

SCORE RANGE     14-24  25-27  28-30  31-33  34-35  36-43
N =                45     42     44     43     49     70
MEAN ABILITY    -0.14   0.49   0.86   1.33   1.71   2.50
GROUP MN SQ       2.2    0.9    0.7    0.9    1.0    2.0    SE = 0.2
SD(MN SQ)         3.2    1.3    1.3    1.3    1.1    3.3    EXPECT 1.4

PLUS = TOO MANY RIGHT    MINUS = TOO MANY WRONG

                WITHN  BETWN  TOTAL
DEG OF FREEDOM    287      6    293
STD ERROR        0.08   0.58   0.08

1964 computation test calibration

      SERIAL ORDER        |     DIFFICULTY ORDER     |           FIT ORDER
SEQ ITEM  ITEM  DISC  FIT | SEQ ITEM  ITEM  DISC  FIT | SEQ ITEM  ITEM  DISC  FIT  POINT
NUM NAME  DIFF  INDX MNSQ | NUM NAME  DIFF  INDX MNSQ | NUM NAME  DIFF  INDX MNSQ  BISER
 1  C1   -2.92  0.69 1.51 |  4  C4   -3.33  1.97 1.04 | 36  C36  -2.30  0.99 0.71  0.21
 2  C2   -1.27  0.66 1.43 |  1  C1   -2.92  0.69 1.51 | 15  C15  -0.87  1.16 0.73  0.40
 3  C3   -1.71  0.73 1.04 | 36  C36  -2.30  0.99 0.71 | 16  C16  -0.13  1.32 0.79  0.46
 4  C4   -3.33  1.97 1.04 |  3  C3   -1.71  0.73 1.04 | 11  C11  -0.66  1.12 0.81  0.36
 5  C5   -1.31  0.56 1.50 | 10  C10  -1.55  0.93 1.74 | 31  C31  -0.69  1.17 0.82  0.39
 6  C6   -1.00  0.89 0.99 |  8  C8   -1.40  1.03 1.51 | 19  C19   0.14  1.38 0.83  0.48
 7  C7   -0.84  0.87 1.00 | 21  C21  -1.35  0.94 1.00 | 22  C22   0.31  1.37 0.84  0.52
 8  C8   -1.40  1.03 1.51 | 12  C12  -1.35  1.08 0.86 | 20  C20  -0.11  1.26 0.85  0.44
 9  C9   -1.00  0.85 1.05 |  5  C5   -1.31  0.56 1.50 | 44  C44   3.23  1.17 0.85  0.35
10  C10  -1.55  0.93 1.74 |  2  C2   -1.27  0.66 1.43 | 12  C12  -1.35  1.08 0.86  0.28
11  C11  -0.66  1.12 0.81 |  6  C6   -1.00  0.89 0.99 | 42  C42   1.17  1.32 0.88  0.51
12  C12  -1.35  1.08 0.86 |  9  C9   -1.00  0.85 1.05 | 18  C18  -0.15  1.24 0.88  0.41
13  C13   0.18  0.85 1.09 | 15  C15  -0.87  1.16 0.73 | 28  C28   0.20  1.15 0.89  0.44
14  C14   0.16  1.10 0.92 |  7  C7   -0.84  0.87 1.00 | 34  C34   1.77  1.37 0.92  0.48
15  C15  -0.87  1.16 0.73 | 31  C31  -0.69  1.17 0.82 | 14  C14   0.16  1.10 0.92  0.40
16  C16  -0.13  1.32 0.79 | 30  C30  -0.69  1.01 1.08 | 26  C26   0.89  1.16 0.94  0.46
17  C17   0.33  1.00 0.99 | 11  C11  -0.66  1.12 0.81 | 35  C35  -0.24  1.05 0.95  0.36
18  C18  -0.15  1.24 0.88 | 35  C35  -0.24  1.05 0.95 | 38  C38   2.01  1.10 0.95  0.42
19  C19   0.14  1.38 0.83 | 18  C18  -0.15  1.24 0.88 | 37  C37   1.89  1.07 0.98  0.42
20  C20  -0.11  1.26 0.85 | 16  C16  -0.13  1.32 0.79 |  6  C6   -1.00  0.89 0.99  0.23
21  C21  -1.35  0.94 1.00 | 20  C20  -0.11  1.26 0.85 | 17  C17   0.33  1.00 0.99  0.37
22  C22   0.31  1.37 0.84 | 23  C23  -0.11  0.83 1.04 | 24  C24   1.06  0.98 1.00  0.41
23  C23  -0.11  0.83 1.04 | 25  C25  -0.06  0.87 1.07 | 21  C21  -1.35  0.94 1.00  0.24
24  C24   1.06  0.98 1.00 | 27  C27   0.08  1.23 1.05 |  7  C7   -0.84  0.87 1.00  0.26
25  C25  -0.06  0.87 1.07 | 29  C29   0.10  0.91 1.00 | 29  C29   0.10  0.91 1.00  0.34
26  C26   0.89  1.16 0.94 | 19  C19   0.14  1.38 0.83 |  4  C4   -3.33  1.97 1.04  0.04
27  C27   0.08  1.23 1.05 | 14  C14   0.16  1.10 0.92 |  3  C3   -1.71  0.73 1.04  0.13
28  C28   0.20  1.15 0.89 | 13  C13   0.18  0.85 1.09 | 23  C23  -0.11  0.83 1.04  0.27
29  C29   0.10  0.91 1.00 | 28  C28   0.20  1.15 0.89 |  9  C9   -1.00  0.85 1.05  0.26
30  C30  -0.69  1.01 1.08 | 33  C33   0.29  0.23 1.52 | 43  C43   2.68  0.97 1.05  0.33
31  C31  -0.69  1.17 0.82 | 22  C22   0.31  1.37 0.84 | 27  C27   0.08  1.23 1.05  0.44
32  C32   1.22  0.84 1.08 | 17  C17   0.33  1.00 0.99 | 25  C25  -0.06  0.87 1.07  0.32
33  C33   0.29  0.23 1.52 | 26  C26   0.89  1.16 0.94 | 30  C30  -0.69  1.01 1.08  0.33
34  C34   1.77  1.37 0.92 | 24  C24   1.06  0.98 1.00 | 32  C32   1.22  0.84 1.08  0.36
35  C35  -0.24  1.05 0.95 | 42  C42   1.17  1.32 0.88 | 41  C41   1.52  0.99 1.09  0.37
36  C36  -2.30  0.99 0.71 | 32  C32   1.22  0.84 1.08 | 13  C13   0.18  0.85 1.09  0.31
37  C37   1.89  1.07 0.98 | 41  C41   1.52  0.99 1.09 | 39  C39   2.44  0.56 1.24  0.23
38  C38   2.01  1.10 0.95 | 34  C34   1.77  1.37 0.92 |  2  C2   -1.27  0.66 1.43  0.15
39  C39   2.44  0.56 1.24 | 37  C37   1.89  1.07 0.98 |  5  C5   -1.31  0.56 1.50  0.14
40  C40   3.39  0.51 1.65 | 38  C38   2.01  1.10 0.95 |  8  C8   -1.40  1.03 1.51  0.26
41  C41   1.52  0.99 1.09 | 39  C39   2.44  0.56 1.24 |  1  C1   -2.92  0.69 1.51  0.09
42  C42   1.17  1.32 0.88 | 43  C43   2.68  0.97 1.05 | 33  C33   0.29  0.23 1.52  0.09
43  C43   2.68  0.97 1.05 | 44  C44   3.23  1.17 0.85 | 40  C40   3.39  0.51 1.65  0.12
44  C44   3.23  1.17 0.85 | 40  C40   3.39  0.51 1.65 | 10  C10  -1.55  0.93 1.74  0.20

MEAN      0.00  1.01 1.05
S.D.      1.51  0.29 0.25

CORRELATION   DIFF*DISC = -0.10   DIFF*MNSQ = -0.07   DISC*MNSQ = -0.62

1964 computation test calibration                                   PAGE 9

[Plot: ITEM MN.SQ. FOR EACH GROUP (Y, 0.0 to 20.0) VERSUS PROB(RIGHT) (X). PLOT SYMBOL = SEQ NUMBER.]

1964 computation test calibration                                   PAGE 10

[Plot: TOTAL FIT MEAN SQUARE (Y, 0.0 to 2.5) VERSUS DIFFICULTY (X, -3.33 to 3.39). PLOT SYMBOL = SEQ NUMBER.]

1964 computation test calibration                                   PAGE 11

[Plot: TOTAL FIT MEAN SQUARE (Y, 0.0 to 2.5) VERSUS DISCRIMINATION (X, 0.0 to 1.97). PLOT SYMBOL = SEQ NUMBER.]

1964 computation test calibration                                   PAGE 12

[Plot: DISCRIMINATION (Y, 0.0 to 2.5) VS DIFFICULTY (X, -3.33 to 3.39). PLOT SYMBOL = SEQ NUMBER.]

APPENDIX D

CORRESPONDENCE

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA
V6T 1W5

FACULTY OF EDUCATION

March 26, 1979.

Mr. D. R. Sutherland,
Superintendent,
School District No. 77,
Box 339,
Summerland, B.C.,
V0H 1Z0.

Dear Mr. Sutherland:

I am writing to enlist your support for an important research project which Mr. Thomas O'Shea, my research assistant, and I are conducting this spring. The study is designed to obtain information concerning the changes in achievement levels in Grade 7 mathematics from 1964 to the present.

In 1964 and 1970, the Ministry of Education administered standardized achievement tests in mathematics to Grade 7 students throughout the province, and these data have been made available to us.
Preliminary analysis indicates that, although some decline occurred, changes appear to be confined to specific content areas within the curriculum. We propose to administer these same tests to a sample of present Grade 7 mathematics classes and to compare and contrast the resultant achievement patterns.

The sample has been constructed in such a way as to minimize the chances of a given class being asked to participate in any Ministry-sponsored projects this spring. The list of schools from your district whose participation is requested is attached.

Administration of the tests requires two forty-five minute class periods for each classroom selected, preferably on consecutive days. Detailed instructions, administrative directions, and test materials will be mailed directly to the principals of the schools involved. We hope to have the teachers administer the tests in the week of April 23-27.

Strict confidentiality with respect to students, schools, and districts will be observed. The study will result in comparisons in performance across time on a province-wide basis only. A summary of the findings will be sent to you before the commencement of the 1979-1980 school year.

Permission is granted for Dr. David Robitaille and Mr. Thomas O'Shea of the Faculty of Education, University of British Columbia, to contact the following schools with regard to the administration of standardized tests in arithmetic to Grade Seven students:

Superintendent          Date          School District

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA
V6T 1W5

FACULTY OF EDUCATION

April 1979

Principal,
T. M. Roberts School,
10 Wattsville St.,
Cranbrook, B.C.,
V1C 2A2.

Dear Sir/Madam:

The superintendent of your district has given me permission to contact you in order to enlist your help in carrying out an important research project which Mr. Thomas O'Shea, a doctoral student at U.B.C., and I are conducting this spring. The project, which has been approved by our Behavioural Sciences Screening Committee for Research Involving Human Subjects, is being undertaken as a doctoral dissertation in the Department of Mathematics Education at U.B.C. Financial support has been provided through a grant from the Educational Research Institute of British Columbia.

In 1964 and 1970, the Ministry of Education administered standardized tests in mathematics to Grade 7 students throughout the province, and these data have been made available to us. Preliminary analysis, using a new statistical model, indicates that, although some decline in performance occurred, changes appear to be confined to specific content areas within the mathematics curriculum, for example, operations on common fractions.
We propose to administer these same tests to a sample of present Grade 7 mathematics classes and to compare and contrast the resultant achievement patterns.

Your school has been selected as part of a stratified random sample, based on geographic region and school size, of over 60 schools in more than 30 districts throughout the province. The sample has been constructed in such a way as to minimize the chance that your school will be asked by the Ministry of Education to participate in any projects this spring. We believe that the results will be of interest to you and your teachers by helping to identify continuing strengths or potential weaknesses within the elementary mathematics curriculum.

Strict confidentiality with respect to students, schools, and districts will be observed. The study will result in comparisons in performance across time on a province-wide basis only. A summary of the findings will be sent to your district superintendent before the beginning of the 1979-80 school year.

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA
V6T 1W5

FACULTY OF EDUCATION

April 18, 1979

To the Teacher/Test Administrator

The district superintendent, and your principal, have given us permission to ask for your help in conducting an important research project in British Columbia schools. A letter to your principal contains information on the background and purpose of the project.
Briefly, the study is designed to yield information concerning changes in Grade 7 mathematics achievement from 1964 to the present. Claims of a general decline in performance have not been substantiated by our preliminary analysis of data from 1964 to 1970. However, some decline in specific content areas within the mathematics curriculum seems to be indicated. We hope to be able to identify particular topics on which the performance of present-day students is different from that of students in 1964 or 1970.

Your Grade 7 class has been selected as part of a random sample of over 60 classes across the province. The study is designed so that conclusions can be drawn regarding the provincial Grade 7 population only. No comparisons are possible of individuals, classes, schools, or districts. Student names and school district numbers are necessary for clerical purposes only. Once the data have been transferred from the test papers, no identifying codes will be retained. The names of students are required on the test papers only to ensure that the two parts of the test which each student writes may be matched. If you prefer to use some other means of identifying papers which will accomplish the same purpose, please feel free to do so.

The tests to be administered are identical to those used in the 1964 and 1970 testing programs. They are the Arithmetic Reasoning and Arithmetic Computation tests from the Stanford Achievement Test.
S p e c i f i c i n s t r u c t i o n s regarding a d m i n i s t r a t i o n procedures are contained i n the document Di r e c t i o n s f o r A d m i n i s t r a t i o n which i s e n c l o s e d . Please f o l l o w these c l o s e l y s i n c e they are based on the o r i g i n a l d i r e c t i o n s f o r g i v i n g the t e s t s . The a d m i n i s t r a t i o n of each t e s t r e q u i r e s about one c l a s s p e r i o d of 45 minutes. It would be p r e f e r a b l e to g i v e the t e s t s on two c o n s e c u t i v e days, 224 ARITHBETIC ACHIEVEMENT TESTS D i r e c t i o n s f o r A d m i n i s t r a t i o n The t e a c h e r s h o u l d become t h o r o u g h l y f a m i l i a r w i t h a l l o f t h e f o l l o w i n g d i r e c t i o n s b e f o r e g i v i n g t h e t e s t s . G e n e r a l D i r e c t i o n s 1. B e f o r e b e g i n n i n g each t e s t , see t h a t the desks are c l e a r e d and t h a t each p u p i l has an e r a s e r and one o r two sharpened p e n c i l s , p r e f e r a b l y w i t h v e r y s o f t l e a d s . Pens s h o u l d not be used. A s u p p l y of e x t r a p e n c i l s s h o u l d be a t hand. S c r a t c h paper s h o u l d be p r o v i d e d . 2. A n a t u r a l c l a s s r o o m s i t u a t i o n s h o u l d be r e t a i n e d as f a r a s p o s s i b l e . P r o v i s i o n s h o u l d be made t o e n s u r e q u i e t and freedom from i n t e r r u p t i o n s of any k i n d . 3. The t e a c h e r s h o u l d t a k e p a i n s t o ensure t h a t the p u p i l s u n d e r s t a n d what t h e y are t o do i n each t e s t and how they a r e t o r e c o r d t h e i r answers. T h i s can be done b e s t by r e a d i n g the d i r e c t i o n s v e r b a t i m and s u p p l e m e n t i n g w i t h e x p l a n a t i o n s as q u e s t i o n s from t h e p u p i l s i n d i c a t e need. 
Hhen d o i n g t h i s , the t e a c h e r s h o u l d not g i v e h e l p on s p e c i f i c t e s t q u e s t i o n s , but may f u l l y c l a r i f y t h e d i r e c t i o n s . *». A f t e r a t e s t has been s t a r t e d , t h e t e a c h e r s h o u l d c i r c u l a t e about t h e room t o see t h a t i n s t r u c t i o n s a r e b e i n g f o l l o w e d . Hhen th e y are n o t , c l a r i f y f o r the i n d i v i d u a l p u p i l but do n o t d i s t u r b t h e e n t i r e c l a s s . 5. Adhere t o the time l i m i t s . A watch w i t h a second-hand s h o u l d be used i n o r d e r t o g u a r a n t e e u n i f o r m i t y o f t i m e . 6. F o l l o w i n g i s t h e s c h e d u l e f o r the a r i t h m e t i c t e s t s : FIBST SITTING D i s t r i b u t i n g b o o k l e t s , r e a d i n g d i r e c t i o n s , e t c . 5 min. T e s t 1: A r i t h m e t i c S e a s o n i n g ....... Work time 35 min. T o t a l 40 mia. SECOND SITTING D i s t r i b u t i n g b o o k l e t s , r e a d i n g d i r e c t i o n s , e t c . 5 min. T e s t 2: A r i t h m e t i c Computation ..... Work t i m e 35 min. 7o£&X 40 IBxii• I f a l l p u p i l s f i n i s h a t e s t b e f o r e the recommended time has e l a p s e d , time may be c a l l e d . 7. Under no c o n d i t i o n s s h o u l d a t e s t be s t a r t e d u n l e s s s u f f i c i e n t t ime i s a v a i l a b l e t o c omplete i t . 225 S p e c i f i c D i r e c t i o n s To a d m i n i s t e r each t e s t , say t o t h e p u p i l s : " T h i s i s a t e s t t o show how such you have l e a r n e d i n a r i t h m e t i c . Mhen you get y o u r t e s t b o o k l e t , do not w r i t e on i t or open i t u n t i l I t e l l you t o . " (Be s u r e p u p i l s do n o t open b o o k l e t s . ) Pass out t h e t e s t b o o k l e t s . Then s a y ; "Now look a t the f r o n t page where i t says •Name'. ( P o i n t t o t h e p r o p e r p l a c e . ) W r i t e your f i r s t and l a s t naaes h e r e . 
Be sure to write plainly. (Pause.) In the second line, write your school district number, and the name of your school. (Pause.) In the third line, write the date."

After the blanks have been filled in, continue:

"Now listen carefully. You must do your best, but I do not expect you to be able to answer all the questions. Do not start until I say 'BEGIN' and when I say 'STOP' put your pencil right down. If you break your pencil, hold up your hand and I will give you another. After we have begun you must not ask questions." (Continue with the directions for the first test, given below.)

First Sitting - Arithmetic Reasoning

"Now open your booklet, Arithmetic Reasoning. Fold the page back, like this, so that only the first page of questions is showing." (Demonstrate.)

"Look at the top of the page, where it says 'Directions'." (Hold up a booklet and point to the proper place.)

"They say: 'Work an example, and then compare your answer with the answers which follow it. If your answer is one of those given, mark the answer space that has the same letter as your answer. Sometimes the correct answer is not given. If you do not find the correct answer, mark the space under the letter for "not given".' Now look at the samples." (Hold up a booklet and point to the sample exercises.)

"The first sample says: 'How many are 3 balls and 4 balls? 3 4 7 12 not given'. Which is the correct answer?" (Wait for the class to answer.)

"Yes, the answer is '7'.
The letter beside the '7' is 'c', so the answer space under the letter 'c' has been filled in. Now study the second sample. What is the answer?" (Pause for reply.)

"Yes, '5' is the correct answer to this problem, but it is not listed among the choices. Hence, the correct answer for this example is the answer 'not given', so you fill in the space under the letter 'j'."

"For each example on this page and on the next page, decide which is the correct answer, and fill in the answer space below the letter which represents the answer you have chosen. Use the scratch paper you were given to figure on."

"Begin with Question No. 1 and answer as many questions as you can. When you finish the first two pages, go right on to Part Two, on the last page. When you finish the last page, go back and check your answers. READY. BEGIN!"

(Record the starting time. Add twenty-five minutes and ten minutes.)

After twenty-five minutes, say: "If you have not already started work on Part Two on the last page, do so now." (Make sure the pupils do this.) Then say: "Go on working."

After an additional ten minutes - i.e., at the end of thirty-five minutes - say: "STOP! Put your pencil down." Collect the test booklets immediately.

(The first sitting ends here.)

Second Sitting - Arithmetic Computation

Distribute the test booklets. Have the pupils complete the title page as in the first sitting (see 'Specific Directions'). Continue with:

"Now open your booklet, Arithmetic Computation.
Fold the booklet back, like this, so that only the first page of questions is showing." (See that all do this correctly.)

"Look at the top of the page, where it says 'Directions'." (Hold up a booklet and point to the proper place.)

"They say: 'Work each example. Then compare your answer with the answers given at the right of the example. If your answer is one of those given, mark the answer space that has the same letter as your answer. Sometimes the correct answer is not given. If the correct answer is not given, mark the answer space under the letter for "not given". Look carefully at each example to see what it tells you to do. If you need to do any figuring, use a separate sheet of paper.'"

"Now begin with Question No. 1 and answer as many questions on this page and the next two pages as you can. When you finish the last page, go back and check your work. Are there any questions about what you are to do? (Pause.) READY. BEGIN!"

(Record the starting time and add thirty-five minutes.)

After thirty-five minutes, say: "STOP! Close your booklet and put your pencil down." Collect the test booklets immediately.

(The second sitting ends here.)