"Education, Faculty of"@en . "Educational and Counselling Psychology, and Special Education (ECPS), Department of"@en . "DSpace"@en . "UBCV"@en . "Gard, Barbara Kathleen"@en . "2010-07-15T02:12:53Z"@en . "1986"@en . "Master of Arts - MA"@en . "University of British Columbia"@en . "This study investigated item characteristics which may affect the validity of the Slosson Intelligence Test (SIT) when used with school children in British Columbia. The SIT was developed as a quick, easily administered individual measure of intelligence to correlate highly with the Stanford-Binet Intelligence Scale as an anchor test. Use of the SIT has become widespread, but little technical information is available to support this.\r\nTo examine the internal psychometric properties of the SIT for British Columbia schoolchildren, SIT responses were collected from 319 children (163 males, 156 females) in three age groups (7 1/2, 9 1/2, and 11 1/2 years). These data were subjected to a variety of item analysis procedures. Indices were produced for: item difficulty, item discrimination (item-total test score correlations), rank correlation between empirically determined item difficulties and item order given in the test, test homgeneity, and item-pair homogeneity.\r\nResults of the item analyses suggest that the SIT does not function appropriately when used with British Columbia school children. Two-thirds of the item difficulty indices were found to be outside the desired range: one-third of the items did not discriminate effectively; and many items are not in correct order of difficulty in administration of the SIT. The thesis discusses effects of these findings on the test's internal consistency, criterion validity, and technical utilization. Factors which may underlie the shift in item difficulties are also discussed."@en . "https://circle.library.ubc.ca/rest/handle/2429/26474?expand=metadata"@en . 
"ANALYSIS OF ITEM CHARACTERISTICS OF THE SLOSSON INTELLIGENCE TEST FOR BRITISH COLUMBIA SCHOOL CHILDREN By BARBARA KATHLEEN GARD A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTERS OF ARTS in THE FACULTY OF EDUCATION (Department of Educational Psychology and Special Education) We accept t h i s thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA March 1986 \u00C2\u00A9 B a r b a r a Kathleen Gard, 1986 In presenting t h i s thes is i n p a r t i a l fu l f i lment of the requirements for an advanced degree at the Univers i ty of B r i t i s h Columbia, I agree that the L ib ra ry s h a l l make i t f ree ly ava i l ab le for reference and study. I further agree that permission for extensive copying of t h i s thes is for scho la r ly purposes may be granted by the head of my department or by h i s or her representat ives. I t i s understood that copying or pub l i ca t ion of t h i s thes is for f i n a n c i a l gain s h a l l not be allowed without my wr i t t en permission. Department of\" ftrliinati nnal P s y n V i n i n e y & S p p n i a i Education The Univers i ty of B r i t i s h Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 Date March 1986 ABSTRACT This study investigated item c h a r a c t e r i s t i c s which may a f f e c t the v a l i d i t y of the Slosson I n t e l l i g e n c e Test (SIT) when used with school children i n B r i t i s h Columbia. The SIT was developed as a quick, e a s i l y administered i n d i v i d u a l measure of i n t e l l i g e n c e to c o r r e l a t e highly with the Stanford-Binet I n t e l l i g e n c e Scale as an anchor t e s t . Use of the SIT has become widespread, but l i t t l e t echnical information i s a v a i l a b l e to support t h i s . 
To examine the internal psychometric properties of the SIT for British Columbia schoolchildren, SIT responses were collected from 319 children (163 males, 156 females) in three age groups (7 1/2, 9 1/2, and 11 1/2 years). These data were subjected to a variety of item analysis procedures. Indices were produced for: item difficulty, item discrimination (item-total test score correlations), rank correlation between empirically determined item difficulties and item order given in the test, test homogeneity, and item-pair homogeneity.

Results of the item analyses suggest that the SIT does not function appropriately when used with British Columbia school children. Two-thirds of the item difficulty indices were found to be outside the desired range; one-third of the items did not discriminate effectively; and many items are not in correct order of difficulty in administration of the SIT. The thesis discusses effects of these findings on the test's internal consistency, criterion validity, and technical utilization. Factors which may underlie the shift in item difficulties are also discussed.

Robert Conry, Ph.D.
Research Supervisor

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
Acknowledgements

Chapter I Introduction
    The Slosson Intelligence Test
    Importance of the Study

Chapter II Literature Review
    Initial Development of the Slosson Intelligence Test
    SIT Test Construction
    Norm Population
    SIT Reliability
    SIT Validity
    Concurrent Validity of the Stanford-Binet and the SIT
    Concurrent Validity of the SIT and the Wechsler Scales
    Concurrent Validity of the SIT with Achievement Tests
    Content Analysis of SIT Items
    Summary
    Purpose of the Study

Chapter III Methodology
    Sample Characteristics
    Data Collection
    Data Analysis
    Item Difficulty
    Item Discrimination (Item-Test Correlation)
    Comparison of Rank Order Item Difficulties
    Loevinger's Coefficient of Test Homogeneity
    Loevinger's Coefficient of Item Pair Homogeneity

Chapter IV Results
    Analysis of Item Difficulty Indices
    Analysis of Item Discrimination Indices
    Comparison of Rank Order of Item Difficulty
    Test Homogeneity
    Homogeneity of Pairs of Test Items
    Summary

Chapter V Summary and Conclusions
    Purpose of the Study
    Summary of Test Findings
    Limitations of the Study
    Conclusion

References
Appendix A: Breakdown of Sample by Stratification Variables
Appendix B: Computational Formulas for Determining Test Homogeneity and Item Pair Homogeneity
Appendix C: Item Categorization Schemas of the SIT
Appendix D: Coefficients of Homogeneity of Item Pairs

LIST OF TABLES

Table 1 Common item range composition by age
Table 2 SIT Items: Frequency of response, index of difficulty, index of discrimination and rank order misplacement by difficulty
Table 3 Percentage of items falling below, within and above the preferred range of difficulty
Table 4 Items changing rank order position by more than ten places from the order of presentation for the British Columbia sample
Table 5 Categorization of items changing rank position difficulty by more than ten positions

ACKNOWLEDGEMENTS

I wish to express my appreciation to Dr. Barbara Holmes for her kindness in permitting me to use, for the purpose of my analyses, the SIT data she collected as part of her large province-wide anchor-norming study.
I also wish to thank my committee chair, Dr. Robert Conry, for his many readings of drafts and invaluable advice; my committee members, Dr. Julie Conry and Dr. Buff Oldridge, for their suggestions and support; and my advisor, Dr. Emily Goetz, for her friendly encouragement to bring this project to a close.

Chapter I
Introduction

The Slosson Intelligence Test (Slosson, 1977, 1983) was developed to meet a need for an easily administered, brief test of intelligence. Since its introduction over twenty years ago, the Slosson Intelligence Test (SIT) has proven popular (Brown & McGuire, 1976). The appeal of the SIT derives from the test's easy administrative and scoring procedures, brevity, low cost, and availability to a broad range of professionals without specific training in intelligence testing. However, review of the technical information available regarding the development of the test suggests that a paucity of research underlies the widespread acceptance of the SIT. It is the purpose of this study to examine the internal psychometric properties of the SIT in regard to the use of the test with British Columbia schoolchildren.

The benefits of standardized tests such as the SIT derive from careful technical construction and selection of test items, and large representative norming procedures. Examination of SIT manuals (Slosson, 1977, 1983) indicates a lack of technical information pertaining to the development of the test. For example, norm tables are based on data collected from a small, sketchily described, regional sample, only one reliability measure is reported, and validity indices cited are based on studies with limited samples. These technical weaknesses suggest that the psychometric properties of the SIT need to be further examined in relation to those populations to whom the test is administered.
Evaluation of the psychometric properties of a test provides information as to how effectively the construct tested is being measured (Jensen, 1980). The functioning of a test of general mental ability, such as the SIT, can be judged through examining the properties of test items relative to the test as a whole (Jensen, 1980). To determine item characteristics, technical procedures known as item analyses are conducted. Item analyses provide information regarding the shape and distribution of test scores, discrimination among test takers, test score variance and the internal consistency reliability of the test (Jensen, 1982; Nunnally, 1978).

To evaluate the effectiveness of the SIT in terms of the internal psychometric properties of test items for a British Columbia sample of schoolchildren, SIT responses, collected as part of a province-wide norming study (Holmes, 1981), were subjected to a variety of item analytic procedures. These included analysis of item difficulty, analysis of item-total test score correlations, correlation of the rank order of item difficulties between the British Columbia and norm samples, test homogeneity, and homogeneity of adjacent item pairs (item-pair homogeneity). For valid and reliable test measures, these item indices should remain relatively stable across the populations to whom the test is applied.

The Slosson Intelligence Test

The SIT is an individually administered test composed of a series of primarily verbal questions arranged in order of increasing difficulty. Each question is credited as a pass or fail depending on whether or not a correct response is given. The starting point varies for each individual based on age and ability, and testing stops when consecutive failures occur.
Since the SIT was designed for administration across a broad age range, test items of varying difficulty were selected and rank ordered from easiest to most difficult. Rank ordering of items permits the use of basal and ceiling points to shorten test administration without losing the test reliability achieved from test length. The basal point is the test item below which it can be assumed that all earlier (less difficult) items will be answered correctly if given, and the ceiling point is the test item above which failure can be assumed on all higher placed (more difficult) items if given. For the SIT, the basal and ceiling points are set, respectively, at ten consecutive correct and ten consecutive incorrect responses, and each individual is tested only on the subset of items which fall between his or her basal and ceiling points.

The development of the SIT, both in structure and content, was based largely upon the 1960 Stanford-Binet, and high correlations obtained between the two tests were interpreted as a strong indication that the SIT provided a valid substitute for the Stanford-Binet. In 1972 the Stanford-Binet was renormed and it was found that IQ scores dropped by approximately six points, reflecting an increased sophistication among the general population (Terman and Merrill, 1973). The change in the Stanford-Binet IQ tables resulting from the 1972 renorming lowered the correlation between the SIT and the Stanford-Binet, and the SIT norms required revision in order to re-establish high correlational validity with the Stanford-Binet.

The equi-percentile method of equating test scores was selected to rescale SIT IQ's to the 1972 revision of the Stanford-Binet.
This method involves using the scores of one test (the 1972 Stanford-Binet) as an anchor against which the second test's (the SIT) scores are distributed and matched according to percentile ranks. The SIT norm tables were developed from ratio IQ's and, in order to obtain deviation IQ's which correlated highly with the 1972 Stanford-Binet deviation IQ's, the SIT simply matched IQ's along chronological and mental age. As test content on the Stanford-Binet was unchanged, raw scores (mental age) had remained constant and could function as the equating variable (Armstrong & Jensen, 1982, 1984). The 1981 norms were published in the second edition of the SIT (Slosson, 1983) and a correlation of .95 with the 1972 Stanford-Binet was reported (Armstrong & Jensen, 1982, 1984).

Importance of the Study

The study reported here addresses several important issues relevant to the administration and interpretation of the SIT with British Columbia schoolchildren. First, the SIT is generally accepted as a test of general intellectual ability (Brown & McGuire, 1976). However, the widespread use of the SIT is not supported by available empirical research validating the test's claims. Second, the technical information available on the SIT is meager and outdated: the test questions and item order are based on data collected on a limited sample over twenty years ago. The validity and reliability of any test application and interpretation rest upon the technical strengths of the test itself. Third, the 1981 renorming of the SIT failed to improve the technical weaknesses of the earlier version.
No item analyses are reported in the 1983 manual, a representative norm sample was not collected for the purpose of rescaling the IQ scores, and the renorming consisted only of matching IQ's through use of the equi-percentile method to equate them to the 1972 Stanford-Binet norm tables. Fourth, the SIT's psychometric properties have not been examined relative to the British Columbia population so as to support local use and interpretation of test results.

Chapter II
Literature Review

This chapter reviews literature relevant to the use of the Slosson Intelligence Test with British Columbia schoolchildren. The development of the SIT and the technical data presented in the test manual are reviewed, and its standardization, validity, and reliability are discussed.

Initial Development of the Slosson Intelligence Test

Richard L. Slosson designed the Slosson Intelligence Test (SIT) as a short, easily administered measure of intelligence. The major intelligence tests available prior to the development of the SIT were the Stanford-Binet and the Wechsler Scales of Intelligence, which require extensive training in administrative, scoring, and interpretive procedures. The SIT was constructed to be attractive to professionals who need an indication of an individual client's intellectual ability, but who either are not specially trained in intelligence test administration or want a less time-consuming assessment instrument.
For example, the SIT manual states that \"the test has been made for the use of school teachers, principals, psychometrists, psychologists, guidance counselors, social workers, school nurses and other responsible persons who, in their professional work, often need to evaluate an individual's mental ability\" (Slosson, 1977, p.iii) and that the SIT \"yields sufficiently valid IQ's, for children four years of age into adulthood, as to furnish a useful screening instrument in the hands of responsible, professional persons\" (p.viii). Although Slosson uses the term \"screening instrument,\" the interpretative potential of IQ scores from the SIT is equated with that of the Stanford-Binet. For example, the manual states that \"IQ is a numerical score...(which) gives an indication of a person's ability to learn, solve and understand problems. It is a 'rough' measure of an individual's capacity to reason, judge and retain knowledge\" (1977, p.24); also, that \"it is generally proposed and accepted that the results of IQ tests should be or can be used as achievement predictors\" (1983, p.41), and that \"the Slosson (SIT) is a valid, reliable, individual IQ test that achieves its stated purpose\" (Slosson, 1983, p.49).

Introduced in 1961, the SIT quickly gained popularity. For example, a survey of test use in clinics across the United States found the SIT to be one of the ten most frequently administered tests across all age levels (Brown & McGuire, 1976). Frequent use of the SIT has also been noted in British Columbia schools, the arena for this study (Holmes, 1981).

The design of the SIT is founded on the assumption that individuals gain knowledge over time. Like the Stanford-Binet, the SIT is composed of a series of test questions ordered by ascending difficulty.
SIT items are assigned a chronological age equivalent corresponding to the age at which a child of average ability is expected to pass or fail the item. A number of months of mental age credit is assigned to each question, with the total obtainable per chronological year level equal to twelve. To obtain a full year's mental age credit, all questions within that age level need to be answered correctly. An individual's Intelligence Quotient (IQ) is determined by comparing the number of months mental age credit received from correct test responses to the individual's chronological age. For example, an individual of average intelligence would be expected to answer test questions correctly up to a difficulty level equal to his chronological age, while an individual with above-average intelligence would be able to answer test items rank ordered in difficulty above his chronological age. SIT test questions range from an infant level (0-0.5 months) to an adult level (27-0 years).

SIT item design was based on two well-established measures of cognitive functioning (Slosson, 1977). At the infant and early childhood age levels, test questions follow the format of the Gesell Infant Scale of Development and are comprised primarily of performance-type items which involve fine motor skills, while school-age child to adult level questions are based on the Stanford-Binet. From items 5-4 up, all questions are administered verbally and require a verbal response. IQ determination is based on the sum of mental age credits given for all items answered correctly plus credit for all pre-basal items. No credit is given for items after the ceiling point.
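The scoring rule just described can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the credit of two months per item, the pre-basal credit of 96 months, and the response pattern are all hypothetical and are not taken from the SIT manual. The ceiling is taken as ten consecutive failures, and the quotient is the ratio of mental age to chronological age multiplied by 100.

```python
# Hypothetical sketch of SIT-style scoring. Item credits, the pre-basal
# credit, and the response pattern are illustrative assumptions only.

def mental_age_months(responses, credits, pre_basal_months, ceiling_run=10):
    # responses: pass (True) or fail (False) for items given above the basal point
    # credits: months of mental age credit attached to each of those items
    total = pre_basal_months  # all pre-basal items are credited as passed
    consecutive_failures = 0
    for passed, credit in zip(responses, credits):
        if passed:
            total += credit
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == ceiling_run:
                break  # ceiling reached: later items earn no credit
    return total


def ratio_iq(mental_age, chronological_age):
    # Ratio IQ: mental age over chronological age (same units), times 100.
    if chronological_age <= 0:
        raise ValueError('chronological age must be positive')
    return 100.0 * mental_age / chronological_age


# A child aged 9 years 6 months (114 months) with 96 months of pre-basal credit.
responses = [True] * 10 + [False, True, False, True] + [False] * 10 + [True]
credits = [2] * len(responses)  # assumed: each item is worth two months
ma = mental_age_months(responses, credits, pre_basal_months=96)
print(ma, round(ratio_iq(ma, 114)))
```

In this example the final pass falls after the ceiling and therefore earns no credit, which is why misordered items matter: an item that is easier than its position implies may never be reached.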
The SIT originally used the ratio IQ formula to determine IQ, where IQ = (MA / CA) x 100 (MA = number of months mental age credit obtained; CA = chronological age). The 1981 revision of the SIT involved equating or rescaling SIT IQ's to match the 1972 Stanford-Binet deviation IQ norm tables.

SIT Test Construction

The SIT test manual (1977, 1983) provides scanty information regarding the construction of the SIT. The manual states that item design was based on the Stanford-Binet test questions and that \"the most favorable\" (p.iv) items were selected over several years of testing (Slosson, 1977). Items which teachers reported to be difficult to administer or score were eliminated. No other selection criteria and no item statistics are provided.

Norm Population

No account of the norming procedures is given in the 1977 SIT test manual. Sample size and stratification information such as age, sex or socio-economic status of the sample is not detailed. The information given regarding the sample used for concurrent validity studies of the SIT with the Stanford-Binet is:

The children and adults used in obtaining comparative results, came from both urban and rural populations in New York State. The referrals came from cooperative nursery schools, public, parochial and private schools, from junior and senior high schools. They came from gifted as well as retarded classes - White, Black and some American Indian. Some came from a city Youth Bureau, some from a Home for Boys. The very young children resided in an infant home. The adults came from the general population, from various professional groups, from a university graduate school, from a state school for the retarded and from a county jail (Slosson, 1977, p.iv).

No further details regarding the sample composition are provided.
Development of the 1981 norm tables was based on a sample of 1109 subjects, aged 2 years 3 months to 18 years. Data were collected between 1968 and 1977 and included some of the original sample data (S. Slosson, personal communication, February 13, 1986). The sample was drawn only from the New England area. Sample distribution was analyzed within four age groups: below 6-6, 6-7 to 10-7, 10-7 to 13-6, 13-7 and above; and within three IQ ability levels: below 84, 84-116, above 116 (Armstrong & Jensen, 1982; Slosson, 1983). No other sample characteristics or selection procedures are described.

SIT Reliability

Only one reliability measure is reported in the SIT test manuals (Slosson, 1977, 1983). A test-retest reliability of .97 was obtained over a two-month interval for a sample of 139 subjects aged four to fifty. No new reliability information is given in the 1983 manual, but the SIT technical manual reports an additional test-retest correlational finding of .93 for a sample of 350 individuals over a ten-week interval (Armstrong & Jensen, 1982).

SIT Validity

The SIT manual (1977) reports \"sufficient\" test validity, based on a total of nine concurrent validity studies (p.viii). Correlation coefficients are given for only four of the studies while IQ scores alone are reported in the other studies. Six studies report the relationship of the SIT with the Stanford-Binet, two with the WAIS or WISC as well as the Stanford-Binet, and one with the Cattell Infant Intelligence Scale. Six of the studies interpret findings based on data from fewer than 25 individuals. Concurrent validity coefficients, reported by age levels (four and up), for the SIT and the Stanford-Binet fall in the mid-90's (r = .90 to .98).
On the basis of the high correlations obtained between Stanford-Binet and SIT IQ scores, the test author concludes that the SIT is a valid assessment instrument for individuals four years of age and up (Slosson, 1977, 1983).

The 1983 edition of the SIT reviews validity data from various correlational studies of the SIT with the Stanford-Binet, Wechsler scales and other achievement measures. A median correlation of .90 (range .96 to .60) is reported for 18 studies correlating Stanford-Binet and SIT IQ scores carried out between 1963 and 1974; a median correlation of .75 with Wechsler full-scale IQ's (range .96 to .52), .82 with Wechsler verbal IQ's (range .96 to .44) and .62 with Wechsler performance scale scores (range .84 to .10) was based on 18 correlational studies conducted between 1968 and 1974. The median of eighteen correlational studies between the SIT and various achievement tests was found to be .55 with a range of .83 to .24.

Concurrent Validity of the Stanford-Binet and the SIT

A ten-year review (1963-1974) of research involving the concurrent validity of the SIT was carried out by Stewart and Jones (1976). Ten concurrent validity studies of the SIT and the Stanford-Binet were reviewed (Armstrong & Jensen, 1972; Armstrong & Mooney, 1971; Carlisle, Shinedling & Weaver, 1970; DeLapa, 1973; Johnson & Johnson, 1971; Jongeward, 1968; Lamp, Traxler & Gustafson, 1973; Ritter, Duffey & Fischman, 1973; Stewart, Wood & Gallman, 1971; Stewart & Myers, 1974). Validity correlation coefficients ranged from .60 to .94 with a median of .90. Although some studies did not obtain validity coefficients as high as those reported in the SIT manual (DeLapa, 1973; Johnson & Johnson, 1971; Jongeward, 1968; Lamp et al., 1973; Stewart & Myers, 1974), the median correlation of .90 generally supports Slosson's finding that the SIT measures a construct similar to that of the Stanford-Binet. Mean IQ scores obtained on the two measures differed by four points or less in each of the 10 studies. Stewart and Jones concluded that the ranked ordering of children on the SIT and the Stanford-Binet was nearly equivalent. However, they cautioned against substitution of the SIT for the Stanford-Binet because large enough discrepancies occurred between IQ scores on the two tests to have resulted in misclassification of a significant proportion of individuals.

Rotatori, Sedlak, and Freagon (1979) administered both tests to 40 severely or profoundly retarded children, aged 11 to 19. A correlation of .90 was obtained between IQ scores on the two measures, which concurs with the earlier findings. High agreement was found between the rank ordering of individuals on the two tests. However, it was noted that scores on the SIT were more than seven points higher than those on the Stanford-Binet in 75% of the cases. Rogers (1982) cautioned against use of the SIT after finding that IQ scores on the SIT were 9 to 40 points (mean = 20) higher than Stanford-Binet IQ scores for nine 3 to 6 year olds. The larger difference in IQ scores obtained between these two studies and the earlier studies may be an artifact of the renorming of the Stanford-Binet in 1972.

Concurrent Validity of the SIT and the Wechsler Scales

Stewart and Jones (1976) also summarize the findings of 15 studies reporting on the correlation of the SIT with WISC or WAIS IQ's (Houston & Otto, 1968; Jerrolds, Calloway & Gwaltney, 1972; Jongeward, 1968; Kaufman & Ivanoff, 1969; Lamp et al., 1973; Lessler & Galinsky, 1971; Martin & Rudolph, 1972; Maxwell, 1971; Stewart et al., 1971; Stewart & Myers, 1974; Swanson & Jacobson, 1970).
Overall, it was noted that the SIT correlated highest with the Wechsler Verbal Scale, slightly lower with the Wechsler Full-Scale, and considerably lower with the Performance Scale. SIT correlations ranged from .52 to .96 with a median of .83 with the Verbal Scale; from .44 to .94 with a median of .74 with the Full-Scale; and from .10 to .84 with a median of .65 with the Performance Scale. Stewart and Jones (1976) conclude that it is not \"justifiable to treat the SIT IQ as a direct substitute for the Wechsler IQ\" (p.375) because the two tests differ in the skills they measure, especially the Wechsler Performance Scale. They also note that use of the SIT usually results in substantially higher IQ scores than the Wechsler scales, which could lead to misclassification of intellectual ability.

A number of studies reporting Wechsler-SIT correlations have been published in the ten years since the Stewart and Jones review (Baum & Kelly, 1979; Covin, 1977a, 1977b; Crofoot & Bennett, 1980; Dirks, Wessels, Quarforth & Quervon, 1980; Lowrance & Anderson, 1979; Mize, Calloway & Smith, 1979; Rotatori et al., 1979; Rust & Lose, 1980; Smith, 1981; Vance, Lewis & DeBell, 1979). In general, their findings support the conclusions drawn from the earlier studies. SIT IQ scores correlate highest with the Wechsler Verbal Scale (range .41 to .92; median .61) and lowest with the Performance Scale (range .003 to .70; median .51). A number of studies found the SIT to yield IQ scores significantly different from those of the Wechsler. The SIT tended to overestimate IQ scores at the higher end of the IQ range and to underestimate IQ at the lower range of intelligence (Baum & Kelly, 1979; Covin, 1977a, 1977b; Crofoot & Bennett, 1980; Dirks et al., 1980; Lowrance & Anderson, 1979; Mize et al., 1979).
These researchers caution against substitution of the SIT for the Wechsler scales because of the potential misclassification of individuals. For the British Columbia sample used in this study, Holmes (1981) reports a correlation of .75 between the SIT and the WISC-R Verbal Scale, .48 with the Performance Scale, and .71 with the Full-Scale.

Concurrent Validity of the SIT with Achievement Tests

The correlation of various achievement tests with the SIT has also been reported in a number of studies. Stewart and Jones (1976) summarize fourteen studies carried out between 1967 and 1974 with a variety of tests including the Wide Range Achievement Test, the Peabody Picture Vocabulary Test, and the California Achievement Test. Correlations ranged from .24 to .83 with a median correlation of .55.

A review of the research shows that more recent studies have found similar correlations between the SIT and achievement tests (Baum & Abelson, 1981; Cianflone & Zullo, 1980; Colarusso, McLesky & Gill, 1977; Coleman, Brown & Ganong, 1980; Covin, 1977a, 1977b; Crofoot & Bennett, 1980; Grossman & Johnson, 1983; Hale, Douglas, Cummins, Rittgarn, Breeds & Dabbert, 1978; Klein, 1978; Martin, Blair & Vickers, 1979; Rust & Lose, 1980; Smith, 1981; Vance et al., 1979). Correlational findings of the SIT with achievement tests, including the Peabody Picture Vocabulary Test, the Wide Range Achievement Test, Stanford Achievement Tests, the McCarthy, and the Shipley Institute of Living Scale, ranged from .31 to .94 with a median of .56. Similarly, a correlation of .62 was found between the SIT and the PPVT for a British Columbia norming sample (Holmes, 1981).
Content Analysis of SIT Items

Several researchers have examined the content of SIT items in terms of the intellectual functions they measure relative to the Stanford-Binet and the WISC. Nicholson (1970) applied Sattler's Stanford-Binet classification scheme to SIT items and Stone (1975) adapted Valett's classification scheme to determine the degree of similarity of item content between the two tests. Both reports note a high, but not exact, correspondence between the proportion and type of mental functions evaluated. Boyd (1974) and Fudala (1979) analyzed the item content of the SIT relative to the WISC and conclude that SIT item content corresponds to the WISC Verbal Scale. Comparison of the four categorization schemas shows no major discrepancies between classification of individual items if allowance is made for the different terms used by Nicholson (i.e. language for vocabulary, social intelligence for information, memory for digit span). SIT items were categorized as vocabulary, information, arithmetic, similarities, digit span, and visual-motor.

Summary

In summary, the above review of the development of the SIT and of studies which have examined the test's concurrent validity indicates technical weaknesses in both the original and revised editions of the SIT. Test norms are limited, reliability information is lacking, item statistics are not given, and concurrent test validity is based on small samples. These areas of weakness suggest a need for further evaluation of the SIT for the populations to whom the test is given.

Purpose of the Study

The present study is designed to examine the internal psychometric properties of the SIT in relation to use of the test with British Columbia schoolchildren.
It should be noted here that Holmes' (1981) data used for the present item analyses were collected on the 1977 edition of the SIT. The findings of this study, however, are equally applicable to the administration of the revised edition of the SIT (1983) because no changes were made in the test items themselves, or in the order of their presentation. As test items and item order are identical in the original and the revised editions of the SIT, administration procedures and questions asked remain the same for both editions. IQ scores, which do differ between editions, are not involved in the item analyses conducted in this study. Therefore, the findings reported in this paper are applicable to the interpretation of results arising from use of either edition of the SIT.

Five research questions are addressed in this study, and all of them relate to the adequacy of use of the SIT with British Columbia schoolchildren:

1. How adequate or effective are the range and distribution of SIT item difficulty indices?
2. How adequate or effective are the range and distribution of SIT item-total test score correlations (item discrimination)?
3. How adequate or effective is the correlation between the rank order of SIT item difficulties for the British Columbia sample and the rank order given in the test?
4. How adequate is the SIT's test homogeneity?
5. How adequate are the range and distribution of adjacent item pair homogeneity?

Chapter III
Methodology

This chapter presents the methods used to collect and analyze the data. Subject characteristics, testing procedures, and data analysis methodology are outlined.
Sample Characteristics

In this investigation, analysis of SIT item reliability for British Columbia children used Holmes' (1981) data, collected to norm several psycho-educational measures frequently administered to British Columbia schoolchildren (Wechsler Intelligence Scale for Children-Revised, Raven's Standard Progressive Matrices, the Peabody Picture Vocabulary Test, the Mill Hill Vocabulary Test, and the Slosson Intelligence Test). Holmes selected children in three age groups as a representative sample of the British Columbia population of schoolchildren. The stratification variables identified were based on those used in the standardization of the WISC-R (Wechsler, 1974) and included: age, sex, geographic region, community size, and size of school. A breakdown of sample characteristics is given in Appendix A.

SIT tests were given to a total of 319 children (163 males and 156 females) in three age groups: 7 1/2 year olds (n = 108), 9 1/2 year olds (n = 111), and 11 1/2 year olds (n = 100). At the time of testing, children were within 3 months of the midyear (for example, 7 years 3 months to 7 years 9 months for the youngest group). The three age groups correspond to grades 2, 4, and 6 and were chosen to be representative of elementary schoolchildren.

Data Collection

The SIT (1977) was individually administered to each child in the study. Testing was conducted during school hours in a quiet location within the child's school. Test administration was counter-balanced. All tests were administered by trained personnel familiar with SIT test procedures.

Data Analysis

As basal and ceiling points were not constant for all children, the set of children and the number of responses varied among SIT items.
In order to carry out item analyses, a set of items administered to a majority of the children in the sample was identified. These items are referred to as the common item range, which for the purpose of item analysis was defined as those items given to fifty percent or more of the children tested at each of the three age levels. This criterion was used in an item analysis of the Peabody Picture Vocabulary Test (Berry, 1977), which also had variable data per individual. It was adopted in the present study for comparability of research method, and because it provided more than 50 responses per item for analysis. The items analyzed at each age level are identified in Table 1.

Table 1
Common item range composition by age

Age           Common Item Range    Number of Items
7 1/2         5-4 to 12-4          43
9 1/2         6-10 to 15-4         52
11 1/2        7-0 to 18-6          65
All Groups    5-4 to 18-6          75

To investigate the psychometric properties of the SIT for the British Columbia sample, the following five types of item analysis were conducted.

Item Difficulty

Item difficulty (p) values indicate the proportion of individuals who answer a dichotomously scored item correctly and reflect the extent to which items discriminate between individuals. Item difficulty values range from 0 to 1. As values approach 0, items are more difficult (fewer children pass the item); as values approach 1, items are easier (more children pass the item). As item difficulties approach .5, the distribution of test scores becomes more normal, and the standard deviation increases (Nunnally, 1978). Items in the middle range of difficulty (.25 to .75) are preferred for their potential to disperse test scores and enhance individual differences (Stanley & Hopkins, 1981).
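The difficulty index described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function names, the use of None to mark an item that was not administered to a child, and the response data are all hypothetical. It assumes that p is computed only over the children who were actually given the item, which is consistent with the varying response frequencies that result from basal and ceiling rules.

```python
from typing import Optional, Sequence


def item_difficulty(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Proportion of administered responses that are correct (1 = pass, 0 = fail).

    None marks a child who was not given the item (assumed convention), so
    that child is left out of the denominator. Returns None when no child
    was given the item.
    """
    scored = [r for r in responses if r is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)


def difficulty_band(p: float, low: float = 0.25, high: float = 0.75) -> str:
    """Classify p against the preferred middle range (.25 to .75 inclusive)."""
    if p < low:
        return "too difficult"
    if p > high:
        return "too easy"
    return "preferred range"


# Hypothetical responses from eight children; not data from this study.
hypothetical_items = {
    "item_a": [1, 1, 1, 1, 1, 1, 1, 0],
    "item_b": [1, 0, 1, 0, None, 1, 0, None],
    "item_c": [0, 0, 0, 1, 0, 0, 0, 0],
}

for name, responses in hypothetical_items.items():
    p = item_difficulty(responses)
    print(name, p, difficulty_band(p))
```

In this made-up example the three items fall above, within, and below the preferred range, respectively, which is the three-way split that the difficulty criterion is meant to detect.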
Difficulty values influence the discriminating power of items, which in turn influences test variance and internal consistency reliability (Stanley & Hopkins, 1981). As item difficulty values approach .5, item intercorrelations and internal consistency reliability are maximized. Items which fall at either end of the difficulty continuum fail to discriminate between individuals and add no reliable variance (Nunnally, 1978). Out of order items, in terms of increasing difficulty level, and item difficulties which do not maximize discrimination are two common test problems (Bornstein, McLeod, McLurg & Hutchinson, 1983).

To compute the index of difficulty, correct responses were assigned a value of 1 and incorrect responses 0. Total test score was equal to the number of correct items in the common item range for the relevant age level.

Item Discrimination (Item-Test Correlation)

Item discrimination, sometimes referred to as item validity, is the correlation of an item with a criterion and is a form of the Pearson product-moment correlation coefficient. Point-biserial r (r_pbis) is the preferred product-moment correlation coefficient used to estimate the relationship between a dichotomously scored test item and a continuous variable, here total test score (Nunnally, 1978). Item discrimination values reflect how well an item discriminates between high and low scores on the overall test and range from -1.0 to +1.0. Items that correlate highly with total test score (approach +1.0) increase the reliability of individual differences and the test's standard deviation (Jensen, 1980; Nunnally, 1978).
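As an illustration of the point-biserial coefficient described above, the following Python sketch uses the textbook form r_pbis = ((M1 - M0) / s) * sqrt(p * q), where M1 and M0 are the mean total scores of children who pass and fail the item, s is the population standard deviation of total scores, and p and q are the proportions passing and failing. The function name and the scores are hypothetical, and the sketch is not taken from the analysis software used in the study.

```python
import math
from typing import Sequence


def point_biserial(item: Sequence[int], total: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total test score."""
    n = len(item)
    if n == 0 or n != len(total):
        raise ValueError("item and total must be non-empty and of equal length")
    n_pass = sum(item)
    n_fail = n - n_pass
    if n_pass == 0 or n_fail == 0:
        # With no variance in the item the correlation is undefined.
        raise ValueError("undefined when every child passes or every child fails")
    mean_total = sum(total) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in total) / n)
    if sd_total == 0:
        raise ValueError("undefined when total scores do not vary")
    mean_pass = sum(t for x, t in zip(item, total) if x == 1) / n_pass
    mean_fail = sum(t for x, t in zip(item, total) if x == 0) / n_fail
    p = n_pass / n
    return (mean_pass - mean_fail) / sd_total * math.sqrt(p * (1 - p))


# Hypothetical scores for six children; not data from this study.
item_scores = [1, 1, 1, 0, 0, 0]
total_scores = [10, 9, 8, 7, 5, 3]
print(round(point_biserial(item_scores, total_scores), 3))
```

Note that the coefficient cannot be computed for an item that every child passes or every child fails, since such an item has no variance.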
Test reliability is greatest when item-total test score correlations fall above .30 (Nunnally, 1978). Discrimination values are related to item difficulty and are maximized when item difficulty levels approach .5.

For the purpose of this item analysis, point-biserial correlations were determined for each item in the common item range. Total test score for each age level was defined as the sum of the number of items answered correctly, with credit for pre-basal items.

Comparison of Rank Order Item Difficulties

By arranging test items in order of increasing difficulty and using basal and ceiling rules as entry and exit points, tests such as the SIT can be administered to a wide age and ability range of individuals and yet be kept brief. The use of basal and ceiling rules involves the prediction that all pre-basal items would be passed and all post-ceiling items would be failed, and requires the rank order by difficulty of test items to remain relatively constant across samples (Nunnally, 1978). Items which differ in rank order between groups may be suspected of bias and may reflect different learning opportunities or changes in the cultural knowledge base over time (Jensen, 1982; Terman & Merrill, 1973). If internal criteria suggest the presence of bias, then the test's predictive validity may be biased for different cultural groups, and the presence of bias lowers the validity of the test as a whole. Items which maintain their same rank order placement across groups may be considered to be measuring the same ability (Jensen, 1982).

Two comparisons of the rank order of item difficulties were made between the British Columbia sample responses and the test presentation order.
First, Spearman's rank order correlation coefficient between the two groups was obtained (Jensen, 1982). Spearman's r ranges from -1 (perfectly reversed ranks) through 0 (no rank association) to +1 (perfect correspondence in ranks). Correlations above .95 are desired and represent a very high degree of similarity in the order of item difficulty (Jensen, 1982).

Second, rank order can also be evaluated in terms of change in item position between groups. For the purpose of this study, items were identified as misranked if rank order placement changed by more than ten positions between the two samples. A movement of more than ten rank positions was selected to take into account the SIT's basal and ceiling criterion of ten correct and incorrect responses in a row, respectively. Rank position of item difficulty was defined as location of placement in the test for the SIT items and by order of item difficulty values for the sample group.

Loevinger's Coefficient of Test Homogeneity

Loevinger's index of test homogeneity measures internal consistency or the degree to which items are ordered according to increasing difficulty (Loevinger, 1947; Cliff, 1979). The coefficient of test homogeneity is defined in terms of \"the probabilities of passing successive items and probabilities of passing the easier of two items granted that the harder of the two is passed, for all pairs of items\" (Loevinger, 1947, p. 31). For a perfectly homogeneous test, in theory, every individual would pass all items up to a certain point and fail all subsequent items. A test departs from homogeneity when an individual passes one or more items after a failure has occurred. The coefficient of homogeneity equals 1.0 for a perfectly homogeneous test and 0 for a perfectly heterogeneous test.
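The idea of a departure from homogeneity can be made concrete with a small Python sketch. It counts, for one child's responses listed in presentation order, the number of item pairs in which an earlier (presumed easier) item was failed and a later (presumed harder) item was passed. This count is only the raw ingredient of a homogeneity index; it is not Loevinger's coefficient, whose computational formula is given in Appendix B, and the function name and response patterns are hypothetical.

```python
from typing import Sequence


def count_order_violations(pattern: Sequence[int]) -> int:
    """Count (earlier item failed, later item passed) pairs in one response pattern.

    The pattern lists 1 (pass) or 0 (fail) in presentation order, which is
    presumed to run from easiest to hardest. A perfectly homogeneous pattern,
    all passes followed by all failures, has a count of zero.
    """
    violations = 0
    failures_so_far = 0
    for response in pattern:
        if response == 0:
            failures_so_far += 1
        else:
            # This pass follows every failure seen so far.
            violations += failures_so_far
    return violations


# Hypothetical response patterns; not data from this study.
print(count_order_violations([1, 1, 1, 0, 0]))  # passes, then failures
print(count_order_violations([1, 0, 1, 0, 1]))  # passes after failures
```

The first pattern shows the ideal case described above, and the second shows a child who continues to pass items after failing earlier ones.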
Three implications may be drawn when a test of ability is perfectly homogeneous: (1) that all easier or earlier items have been passed when it is known that a harder or later item has been passed; (2) that all individuals with correct responses to an item have higher total test scores than those individuals who fail the item; and (3) that an individual who obtains a higher score on the test than another individual has more of the ability the test is measuring.

For the purpose of data analysis, the coefficient of SIT test homogeneity was determined for each age level for items responded to by 30 or more individuals within the age group. Item difficulty was based on each item's rank order placement in the test. Loevinger's coefficient of test homogeneity is capable of handling incomplete or tailored dichotomous response data, such as on the SIT. The computational formula for the coefficient of test homogeneity is given in Appendix B.

Loevinger's Coefficient of Item Pair Homogeneity

Loevinger's (1947) coefficient of item pair homogeneity identifies discrepancies in difficulty values between pairs of items. A discrepancy occurs when an individual passes the harder of two items but fails the easier when items are presumed ordered by increasing difficulty. The coefficient of homogeneity of an item pair is equal to 1 when the item pair is perfectly homogeneous and equal to 0 when the items are unrelated. For the purpose of this analysis of the SIT, homogeneity coefficients were computed for all pairs of items answered by at least 100 out of the total 319 subjects administered the SIT. The coefficient takes into account chance expectancy.
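One commonly cited form of an item pair homogeneity coefficient compares the observed proportion of discrepancies (easier item failed, harder item passed) with the proportion expected by chance if the two items were independent: H = 1 - observed / expected. The Python sketch below follows that form. It is an assumption that this matches the formula in Appendix B, which is not reproduced in this section; the function name, the None convention for unadministered items, and the data are hypothetical.

```python
from typing import Optional, Sequence


def pair_homogeneity(
    easier: Sequence[Optional[int]], harder: Sequence[Optional[int]]
) -> float:
    """Item pair homogeneity as 1 - observed / chance-expected discrepancies.

    `easier` is the item presumed easier (earlier in the test). Only children
    who answered both items are used; None marks an unadministered item.
    """
    pairs = [(e, h) for e, h in zip(easier, harder) if e is not None and h is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no child answered both items")
    p_easier = sum(e for e, _ in pairs) / n
    p_harder = sum(h for _, h in pairs) / n
    # Discrepancy: the presumed easier item is failed, the harder one passed.
    observed = sum(1 for e, h in pairs if e == 0 and h == 1) / n
    expected = (1 - p_easier) * p_harder  # chance rate under independence
    if expected == 0:
        raise ValueError("undefined when no discrepancy is possible by chance")
    return 1 - observed / expected


# Hypothetical responses from eight children; not data from this study.
easier_item = [1, 1, 1, 1, 0, 0, 1, 1]
consistent_harder = [1, 1, 0, 0, 0, 0, 0, 1]   # never passed when easier item failed
discrepant_harder = [1, 0, 0, 0, 1, 0, 0, 1]   # passed once when easier item failed
print(pair_homogeneity(easier_item, consistent_harder))
print(round(pair_homogeneity(easier_item, discrepant_harder), 3))
```

Under this form, a pair with no discrepancies yields 1, a pair with as many discrepancies as chance predicts yields 0, and a pair with more discrepancies than chance predicts yields a negative value.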
The formula for the coefficient of item pair homogeneity used in the present analysis is given in Appendix B.

In summary, this study examined the validity of the SIT, for use with schoolchildren in British Columbia, by means of assessing its internal psychometric properties according to the following five criteria:

1. item difficulty levels approach the desired value of .5 and fall within the range of .25 to .75;
2. item-total test score correlations (item discrimination indices) fall above .30, and approach the desired value of 1.0;
3. rank order correlation between the order of items by difficulty for the British Columbia sample of schoolchildren and the SIT test order of items (by difficulty level) falls at or above .95; items given to the British Columbia sample of schoolchildren do not differ by more than ten rank positions from SIT test item order;
4. test homogeneity of the SIT will approach perfect homogeneity of 1.0; and
5. item-pair homogeneities will be significantly positive and approach the \"perfect homogeneity\" value of one.

Chapter IV
Results

This chapter reports the results of five item analyses of the SIT for British Columbia schoolchildren: (1) analysis of item difficulty; (2) analysis of item discrimination; (3) comparison of rank order of item difficulty for the British Columbia sample and the norm sample; (4) test homogeneity; and (5) homogeneity of adjacent item pairs.

Analysis of Item Difficulty Indices

Item difficulty values (p) were determined for items falling within the common item range for each of the three age-groups. The obtained values are given for each age-group in Table 2.
Examination of the values indicates that item difficulty indices ranged from .04 to 1.0, where actual item difficulty decreases as a value of 1 is approached. Table 2 suggests that, for the British Columbia sample, many items are not functioning effectively, as items with a difficulty value of .25 to .75 are desired and test variance and reliability are maximized when item difficulty approaches .5. Table 2 also suggests that items do not consistently increase in difficulty over age, as desired for tests of ability employing basal and ceiling points and given to a broad age range of individuals. The interpretation may be drawn, therefore, that for this sample, some SIT items may be misplaced relative to their rank order of item difficulty.

Table 2
SIT Items: Frequency of Response, Index of Difficulty, Index of Discrimination, and Rank Order Misplacement by Difficulty for British Columbia Schoolchildren on the SIT

Item       Frequency (f)      Difficulty (p)   Discrim. (r_pbis)   Rank misplacement
Age        7     9    11       7     9    11       7     9    11       7     9    11
N        108   111   100     108   111   100     108   111   100     108   111   100

5-4       61     -     -     1.0     -     -       -     -     -       0     -     -
5-6       63     -     -     .98     -     -     .08     -     -      .5     -     -
5-8       63     -     -     .98     -     -     .01     -     -      .5     -     -
5-10      75     -     -     .93     -     -     .32     -     -     1.5     -     -
6-0       78     -     -     .91     -     -     .37     -     -       3     -     -
6-2       87     -     -     .95     -     -     .39     -     -       2     -     -
6-4       91     -     -     .88     -     -     .44     -     -     2.5     -     -
6-6       97     -     -     .88     -     -     .36     -     -     1.5     -     -
6-8       97     -     -     .94     -     -     .26     -     -       4     -     -
6-10     102    57     -     .76   .91     -     .38   .30     -       5     7     -
7-0      106    90    53     .60   .90   .94     .39   .14   .24       9   7.5   6.5
7-2      106    90    53     .82   .98   .96     .49   .19   .37      .5     1     2
7-4      106    91    53     .93   .99   1.0     .17   .14     -     6.5     3     2
7-6      107    92    53     .60   .97   .92     .59   .21   .40       6     2     7
7-8      107    98    54     .82   .96   .96     .35   .16   .32     2.5   1.5     1
7-10     108   100    54     .43   .88   .94     .60   .47   .34       7   4.5   1.5
8-0      108   100    54     .81   .92   .93     .25   .20   .19       3     1   2.5
8-2      108   103    58     .33   .70   .79     .51   .52   .51       8     9    11
8-4      108   106    67     .59   .81   .88     .43   .21   .28       3     5     5
8-6      108   106    69     .60   .83    .8     .27   .32   .43       0     3     5
8-8      108   106    70     .13   .20   .26     .46   .35   .50      13    25  35.5
8-10     108   107    71     .83   .96   .99     .17   .19   .36      11   8.5    10
9-0      108   107    72     .75   .93   .96     .28   .08   .40       7     8     9
9-2      108   107    75     .62   .88   .93     .44   .15   .34       6   3.5   4.5
9-4      108   108    78     .18   .51   .82     .38   .53   .49      13   8.5     1
9-6      108   110    89     .24   .71   .91     .35   .42   .39       3     0     4
9-8      108   111    92     .73   .90   .95     .41   .33   .39      10   8.5    11
9-10     107   111    94     .06   .24   .61     .20   .39   .46     9.5  16.5    10
10-0     107   111    95     .03   .32   .61     .23   .51   .40    11.5  11.5     9
10-2     107   111    95     .27   .64   .67     .38   .41   .40       3     1     4
10-4     105   111    95     .39   .60   .81     .36   .32   .45       7    .5     4
10-6     105   111    95     .19   .55   .78     .45   .54   .51       1     0    .5
10-8     104   111    94     .22   .85   .89     .41   .33   .34       3    11    10
10-10     98   111    95     .34   .74   .78     .46   .27   .24       9     9   2.5
11-0      92   111    95     .15   .32   .59     .45   .35   .42       2   5.5   4.5
11-2      92   111    95       0   .08   .17       -   .20   .46     6.5    19    28
11-4      91   111    95     .25   .60   .64     .49   .28   .14       9   6.5     1
11-6      77   110    99       0   .08   .35       -   .27   .47     4.5    17  15.5
11-8      77   110    97     .03   .25   .52     .19   .37   .46       5     4     4
11-10     77   110    97     .06   .36   .74     .09   .41   .42     2.5     2     7
12-0      74   110    97     .08   .34   .45     .30   .39   .34       6     2   6.5
12-2      60   108    97     .07   .50   .66     .23   .31   .38       6     7     7
12-4      55   106    98     .05   .38   .79     .27   .66   .58       4     6    14
12-6       -   106    96       -   .04   .16       -   .25   .43       -    14  11.5
12-8       -   102    96       -   .15   .35       -   .32   .35       -    35   8.5
12-10      -   101    96       -   .13   .23       -   .25   .23       -     5    13
13-0       -    99    96       -   .06   .20       -   .30   .51       -    10    15
13-2       -    91    95       -     0   .03       -     -   .18       -    13    24
13-4       -    91    95       -   .14   .13       -   .32   .27       -     1    14
13-6       -    91    95       -   .24   .43       -   .54   .54       -   5.5    .5
13-8       -    84    95       -   .65   .79       -   .61   .53       -    23    22
13-10      -    78    94       -   .18   .40       -   .31   .32       -     5     0
14-0       -    71    92       -   .15   .49       -   .48   .53       -   4.5     9
14-2       -    64    89       -   .09   .30       -   .41   .43       -     1     1
14-4       -    64    89       -   .28   .61       -   .49   .59       -    13    17
14-6       -    63    87       -   .02   .16       -   .16   .25       -   3.5   9.5
14-8       -    60    86       -   .12   .43       -   .45   .55       -     5   6.5
14-10      -    59    86       -   .08   .23       -   .27   .49       -     3     1
15-0       -    59    85       -   .41   .48       -   .55   .42       -    13    14
15-2       -    59    85       -   .02   .02       -   .09   .26       -     1    14
15-4       -    59    85       -   .51   .59       -   .54   .41       -  27.5  20.5
15-6       -     -    80       -     -   .45       -     -   .51       -     -  14.5
15-8       -     -    79       -     -   .47       -     -   .43       -     -    17
15-10      -     -    78       -     -   .54       -     -   .53       -     -    22
16-0       -     -    78       -     -   .44       -     -   .51       -     -    16
16-3       -     -    77       -     -   .23       -     -   .40       -     -     7
16-6       -     -    76       -     -   .01       -     -   .00       -     -     8
16-9       -     -    75       -     -   .21       -     -   .21       -     -     7
17-0       -     -    75       -     -   .09       -     -   .13       -     -    .5
17-3       -     -    73       -     -   .10       -     -   .32       -     -     3
17-6       -     -    73       -     -   .03       -     -   .18       -     -     1
17-9       -     -    69       -     -   .09       -     -   .26       -     -   4.5
18-0       -     -    66       -     -   .03       -     -   .30       -     -     1
18-3       -     -    62       -     -   .26       -     -   .37       -     -  17.5
18-6       -     -    54       -     -   .04       -     -   .15       -     -     5

Note. A dash marks a cell with no entry.

The percentage of items which fall below, within or above the desired difficulty range of .25 to .75 is given in Table 3. Table 3 indicates that approximately one-third of the items are too easy and one-third too difficult. Therefore, for this sample of British Columbia schoolchildren, two-thirds of the items do not work effectively to maximize test reliability.

Comparison of item difficulties across the three age levels shows that, as expected, test items decrease in difficulty over age. This finding suggests that the items are functioning as desired between age levels; however, the two-year age differences between the groups tested weaken the significance of this finding.
Table 3
Percentage of items falling below, within and above the preferred range of difficulty for British Columbia children at three age levels

                  Item Difficulty (p) Range
Age        (p) < .25    (p) = .25 to .75    (p) > .75    Number of Items
7 1/2        35              30               35              43
9 1/2        34.5            36.5             29              52
11 1/2       28              38               34              65

Analysis of Item Discrimination Indices

Item-test correlation coefficients were computed for items falling in the common range for each of the three age levels. The obtained point-biserial correlation coefficients (r_pbis) are given in Table 2. Correlations range from .00 to .66, where correlations of .30 and above are accepted as contributing to test reliability. These findings suggest that of items in the common item range, 35%, 39% and 20% (n = 43, 52, 65) are not discriminating well at the 7 1/2, 9 1/2, and 11 1/2 year old age levels, respectively, for this sample.

Comparison of Rank Order of Item Difficulty

The rank order of item difficulty for the British Columbia sample of schoolchildren was compared to item presentation order. Spearman's rank order correlation coefficients were computed for each of the three age levels over the common item range. Correlation values of .88, .79, and .81 were obtained for the 7 1/2, 9 1/2 and 11 1/2 year olds, respectively, indicating a degree of similarity in the rank ordering of items by difficulty for the two samples.

The degree of difference in the rank order placement of items for the two samples was determined for each age level and is presented in Table 2. Rank order changes in difficulty ranged from 0 to 35. Items were found both to be easier and harder for the British Columbia sample relative to their placement in the test.
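The two rank order comparisons reported above can be illustrated with a short Python sketch. It assumes that an item's empirical rank is obtained by ranking difficulty values from easiest (highest p) to hardest, with tied values sharing the average of their ranks, and that displacement is the absolute difference between that rank and the item's position in the test. The function names and the six difficulty values are hypothetical and are not taken from Table 2.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from smallest (rank 1) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def empirical_ranks(p_values: Sequence[float]) -> List[float]:
    """Rank items from easiest (highest p, rank 1) to hardest."""
    return average_ranks([-p for p in p_values])


def misranked_positions(p_values: Sequence[float], threshold: float = 10) -> List[int]:
    """1-based test positions whose empirical rank differs by more than threshold."""
    ranks = empirical_ranks(p_values)
    return [pos for pos, r in enumerate(ranks, start=1) if abs(pos - r) > threshold]


# Hypothetical difficulty values listed in presentation order.
p_in_test_order = [0.95, 0.90, 0.60, 0.70, 0.30, 0.10]
presentation = list(range(1, len(p_in_test_order) + 1))
rho = pearson(presentation, empirical_ranks(p_in_test_order))  # Spearman's coefficient
print(round(rho, 3), misranked_positions(p_in_test_order, threshold=0.5))
```

Spearman's coefficient is computed here as the Pearson correlation of the two sets of ranks, which handles tied difficulty values. The threshold of 0.5 is used only so that the small example flags something; the study's criterion corresponds to the default of 10.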
The items which changed more than 10 positions in terms of rank order of difficulty are listed by age level in Table 4.

Table 4
Items changing rank order position by more than ten places from the order of presentation for the British Columbia sample

            Age Level
7 1/2       9 1/2       11 1/2
8-8**       8-8**       8-8**
8-10        9-10        9-8
9-4         10-0*       11-2*
10-0*       10-8        11-6*
            11-2*       12-4
            11-6*       12-6*
            12-6*       12-10
            13-2*       13-0
            13-8*       13-2*
            14-4*       13-4
            15-0*       13-8*
            15-4*       14-4*
                        15-0*
                        15-2
                        15-4*
                        15-6
                        15-8
                        15-10
                        16-0
                        18-3

*  change in rank position by more than ten positions at two age levels
** change in rank position by more than ten positions at three age levels

Twenty-five items were found to have shifted more than ten positions. Table 5 presents these items by content and direction of rank position movement. Items were classified as information, similarities, short-term memory, arithmetic-reasoning, arithmetic-information, vocabulary or visual-motor. Categories were based on previously developed schemes (Boyd, 1974; Fudala, 1979; Nicholson, 1970; Stone, 1975), with two exceptions: numerical reasoning was broken down into the categories of arithmetic-reasoning and arithmetic-information on the basis of item content, and digit span and/or sentence memory were combined into the category of short-term memory. Classification across schemes was generally consistent. The majority of the discrepancies which did occur concerned the categorization of items as information versus arithmetic. This was corrected for in the present study by the inclusion of both an arithmetic-reasoning and an arithmetic-information category. The number and percentage of items by category are given in Appendix C for each of four previously developed item classification schemas.
The classification of items falling in the common item range and the number and percentage of items per category which changed relative difficulty are also presented in Appendix C.

Test item content is not consistent over age. Of the questions which were easier for the British Columbia sample, 8 out of 12 were classified as vocabulary items, and 7 out of 13 of the harder items were arithmetic-information. Over all SIT items analyzed, 46% of the vocabulary items and 100% of the arithmetic-information items changed rank position by more than 10 places.

Table 5
Categorization of items changing rank position of difficulty by more than ten positions

Items Less Difficult than Test Placement

Item    Age        Category               Question
8-10    7          Vocabulary             What does destroy mean?
9-8     11         Vocabulary             What was a dungeon used for?
12-4    11         Vocabulary             What does scarce mean?
13-8    9, 11      Vocabulary             What does tremendous mean?
14-4    9, 11      Vocabulary             What is the principal kind of work done by a pharmacist?
15-0    9, 11      Vocabulary             What is the principal kind of work done by an architect?
15-4    9, 11      Vocabulary             What does fragrant mean?
15-6    11         Arithmetic Reasoning   What is the area or how many square feet are there in a room 9' wide and 12' long?
15-8    11         Similarity             In what ways are an octopus and an octave alike?
15-10   11         Vocabulary             What does environment mean?
16-0    11         Arithmetic Reasoning   A boy who had $5.00 took his girl to the movies. If the tickets cost $.75 each, and they both had $.30 milkshakes, how much did he have left?
18-3    11         Short-term Memory      Say these numbers backwards: '8 3 2 9 4 7'

Items More Difficult than Test Placement

Item    Age        Category               Question
8-8     7, 9, 11   Short-term Memory      Listen carefully and say exactly what I say: \"The train goes fast on the tracks carrying people and bags of mail\"
9-4     7          Vocabulary             What does vacant mean?
9-10    9          Arith-Info             How many inches in two feet?
10-0    7          Arith-Info             How many minutes in 3/4 of an hour?
10-8    9          Arithmetic Reasoning   If a boy had 45 cents, how many nickel or 5 cent candy bars could he buy?
11-2    9, 11      Vocabulary             What does it mean to be thrifty?
11-6    9, 11      Vocabulary             What would a man do if he took an inventory of his store?
12-6    9, 11      Arith-Info             How many inches in two yards?
12-10   11         Information            What should be a healthy person's temperature?
13-0    11         Arith-Info             How many feet in thirteen yards?
13-2    9, 11      Arith-Info             How many pints in a gallon?
13-4    11         Arith-Info             How many pounds in a ton?
15-2    11         Arith-Info             How many feet in a mile?

Test Homogeneity

Loevinger's coefficient of test homogeneity was computed for the SIT items. For the 7 1/2, 9 1/2 and 11 1/2 age groups, the obtained coefficients of test homogeneity were equal to .003, .004, and .006, respectively. These low coefficients suggest that items on the SIT are not arranged according to order of difficulty for the sample of children tested.

Homogeneity of Pairs of Test Items

Loevinger's coefficient of item pair homogeneity was determined for item pairs responded to by one hundred or more of the children tested, grouped over age. Coefficients ranged from -.22 to .88, where items which are perfectly homogeneous have a coefficient of 1.0 and items which are unrelated approach 0; the coefficients are listed in Appendix D. The coefficient median fell at .21, and 75% of the coefficients fell in the range of .06 to .44. The large number of low coefficients reflects discrepancies in item pairs where the harder item is passed and the easier failed, suggesting that items on the SIT are not in order of increasing difficulty for the British Columbia sample.

Summary

The results of the analyses suggest that SIT items are not working as desired for the British Columbia sample of schoolchildren tested.
Two-thirds of the items in the common item range were not functioning effectively in terms of item difficulty, one-third of the items were not discriminating well, and many items were misordered according to increasing level of difficulty for the sample tested. The item weaknesses identified are recognized to lower a test's overall internal consistency reliability and, consequently, its criterion validity.

Chapter V
Summary and Conclusions

This chapter includes a summary of the findings of the item analyses of the SIT and a discussion of them in relation to the effectiveness of the SIT for use with British Columbia schoolchildren.

Purpose of the Study

The purpose of the present study was to examine the effectiveness of the SIT as a measure of general intelligence for British Columbia schoolchildren through analysis of the internal psychometric properties of the test items. Test items' psychometric properties affect the distribution of test scores, internal consistency reliability and criterion validity. Five item characteristics which influence test interpretation were examined: item difficulty, item discrimination, rank order correlation of item difficulty between the British Columbia sample and the standardization group, test homogeneity, and homogeneity of item pairs. For a test to discriminate most effectively, item difficulty values should be in the range of .25 to .75, item discrimination values should fall above .30, and items should be arranged by increasing level of difficulty.
Summary of Test Findings

The item difficulty values obtained for the British Columbia sample indicate that only about one-third of the items (30% to 38%, depending on age level) achieved the desired range of difficulty of .25 to .75. This suggests that, for this sample, approximately two-thirds of the items are failing to discriminate between individuals on the trait measured by the test. Items falling outside the optimal difficulty range do not raise test variance, and they lower internal consistency reliability. For tests of general intelligence, such as the SIT, a wide dispersion of test scores is desirable in order to maximize discrimination between individuals.

For the British Columbia sample, approximately one-third of the item discrimination values were found to be too low to discriminate effectively between high and low scorers on the test as a whole for each of the age levels assessed. Items which are good discriminators are passed by individuals with higher test scores than those who fail the item. Items which are poor discriminators diminish the reliability of individual differences and the test's internal consistency reliability.

The SIT's incorporation of basal and ceiling entry and exit points assumes that items are presented in order of increasing difficulty. Analysis of the item difficulty indices showed that items falling in the common item range do not consistently increase in difficulty within each age-group for the British Columbia sample. A consistent decrease in difficulty over age was noted. However, the two-year gap between the age-groups measured reduces the significance of the finding.
Spearman's rank order correlation coefficient was used to compare the relative rank order of item difficulty for the British Columbia sample and the order of item presentation. If a test is measuring the same ability across groups, the relative rank order of the items should not vary. The obtained rank order correlations of .79 to .88 between the norm and the British Columbia sample for the three age-groups are respectable but fall short of the desired value of .95.

To further examine which items varied in difficulty between groups, items which changed rank order placement by more than ten positions were identified. This criterion was chosen to take into account the SIT's basal and ceiling rules. Rank order position changes ranged from 0 to 35, and twenty-five items were found to have shifted by more than ten positions.

Analysis of items by content suggested that different types of items were easier or harder for the British Columbia sample than for the standardization group. Of the easier items, eight were classified under vocabulary. These items called for the meaning of destroy, dungeon, scarce, tremendous, pharmacist, architect, fragrant and environment. Of the 13 harder items, seven involved non-metric arithmetic knowledge. These items included converting feet to inches, yards to inches, gallons to pints, tons to pounds and miles to feet. Three vocabulary words, thrifty, vacant and inventory, were also more difficult for the British Columbia sample. Shifts in difficulty between groups may reflect different learning opportunities or cultural experiences, or may be an artifact of changes in the common knowledge base which occur over time (Jensen, 1982).
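The rank order comparison and the ten-position criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the proportions passing are invented, ties in difficulty are broken by presentation order, and the untied form of Spearman's coefficient, 1 - 6 * sum(d^2) / (n * (n^2 - 1)), is used. The function names are hypothetical and are not taken from the study.

```python
def empirical_ranks(p_values):
    # Rank 1 is the easiest item (highest proportion passing).
    # Ties are broken by presentation order, a simplifying assumption.
    order = sorted(range(len(p_values)), key=lambda i: (-p_values[i], i))
    ranks = [0] * len(p_values)
    for rank, item in enumerate(order, start=1):
        ranks[item] = rank
    return ranks

def spearman_rho(ranks_a, ranks_b):
    # Untied form of Spearman's rank order correlation coefficient
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def shifted_items(ranks_a, ranks_b, threshold=10):
    # Indices of items whose rank placement changed by more than the threshold
    return [i for i, (a, b) in enumerate(zip(ranks_a, ranks_b)) if abs(a - b) > threshold]

# Hypothetical proportions passing, listed in order of item presentation
p_values = [0.9, 0.6, 0.8, 0.3, 0.4]
presentation = list(range(1, len(p_values) + 1))
observed = empirical_ranks(p_values)

print(observed)
print(round(spearman_rho(presentation, observed), 2))
print(shifted_items(presentation, observed, threshold=10))
```

With only five invented items no item can move more than ten positions, so the last call returns an empty list; with the 75 items of the common item range the same function would list the items that moved far enough to matter under the basal and ceiling rules.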
Examination of the content of items which changed significantly in difficulty suggests that the changes can be attributed either to cultural differences or to changes which occur over time (time-factor). The finding that 46% of the vocabulary items in the common item range changed significantly in difficulty, becoming either easier or harder, supports the time-factor interpretation, as vocabulary use is recognized to alter over time. For example, the words \"tremendous\" and \"environment\" are more commonly used today than twenty years ago. A cultural or educational difference interpretation of changes in vocabulary difficulty is less supported, since British Columbia children have exposure to a similar language base (through TV and print media) as their American counterparts.

The role of cultural factors in the variation of the psychometric properties between the two groups is suggested by the increase in difficulty of the non-metric arithmetic problems for the British Columbia sample, although this interpretation is not clear-cut because the age of the test is a confounding factor. A cultural interpretation may be drawn because the metric system was adopted in Canada in the early 1970s and children are less familiar with non-metric arithmetic values than at the time the SIT was developed. Therefore, the increase in difficulty of the arithmetic problems may be related to the change in the math system taught to British Columbia schoolchildren. However, as twenty years have elapsed since the standardization of the SIT, the age of the test confounds the interpretation of this finding.
An analysis of item difficulty for a comparable present-day American sample is needed to determine whether the change in item difficulty may be attributable to cultural differences or to the test's age.

Loevinger's coefficient of test homogeneity was also used to evaluate the degree to which items on the SIT were ordered by increasing difficulty for the British Columbia sample. When a test is perfectly homogeneous, an individual's total test score reflects his or her ability relative to the trait measured by the test, and higher test scores can be interpreted in terms of greater ability. The obtained coefficients of test homogeneity approached zero, indicating that for this sample, test items are not ordered in terms of increasing difficulty.

Lack of homogeneity is also reflected in the low adjacent item pair homogeneity coefficients. For this sample of British Columbia schoolchildren, the median fell at .21 and 75% of the coefficients fell in the range of .06 to .44, where a coefficient of 0 suggests that the pair of items are unrelated in regard to the ability that they measure.

Limitations of the Study

The findings of the present study must be interpreted relative to the following limitations on its generalizability. First, all subjects in this study resided within the province of British Columbia and may not, therefore, be representative of children in other provinces of Canada or the United States. Second, the sample selected was limited to three age groups, 7 1/2, 9 1/2, and 11 1/2 (corresponding to grades 2, 4, and 6), and is therefore not representative of the total age-range of individuals to whom the SIT is administered.
Third, all subjects were drawn from regular class placements and findings may not, therefore, be generalizable to children in special education programs. The sample was, however, carefully selected to be representative of the British Columbia population for the age groups tested and the findings, therefore, may be generalizable to those age groups within the British Columbia population and may possibly be extended to the British Columbia population of schoolchildren at large.

It must be noted that lack of an American comparison group limits the interpretation of this study's findings. Data from an American comparison group might provide insight as to whether the shift in difficulty values noted reflects the limited nature and age of the norm data or cultural differences. For example, if the results of an American comparison group matched the findings of the norm data or they matched on all but the arithmetic-information items, then a cultural difference interpretation would be favoured. If the variations are a factor of the age of the test, then similar results would be expected for an American comparison sample as those found for the British Columbia sample. The degree to which this sample of British Columbia children is or is not representative of Canadian and American children's performance on the Slosson Intelligence Test limits the generalizability of the study's findings.

Although cultural influences cannot be separated from the technical weakness and age of the SIT item analysis data without obtaining comparative information for a present-day sample of American schoolchildren, the information provided by these item analyses should be considered useful for interpretation of the SIT with British Columbia schoolchildren.
Conclusion

Technical weaknesses evident in the SIT manuals (1977, 1983) suggested a need to examine the psychometric properties of the test when administered to British Columbia schoolchildren. Analyses indicated that a significant proportion of test items are not working to discriminate between individuals on the trait measured by the test. Additionally, items were not consistently ordered by their increasing level of difficulty.

Misranking of items can have two results. One, children of younger ages, who would gain credit for knowing the item if it were administered, receive no credit for it if the item comes after the child's ceiling point: it would not be administered. Two, the limit (ceiling) is extended upward for children who pass a misplaced, easy item. In the most extreme case, a child may correctly answer the easy (misplaced) item after nine consecutive failures; he must then be given additional questions until ten consecutive items are failed. This could amount to 19 wrong responses in 20 items. This pattern was noted several times during data collection. Repetitive failure is undesirable during test administration, as it can have deleterious effects on the child and invalidate further testing.

Two possible factors have been presented in explanation of the noted differences in item difficulty and discrimination values for the British Columbia sample and the standardization group: (1) cultural or educational bias resulting from different learning experiences encountered by the two groups, or (2) changes in the content knowledge base of the general population. An alternative explanation of the differences in obtained item difficulty values is related to the limited nature of the original norm sample.
The standardization data was collected solely from New England residents, and therefore may not be representative of the United States at large. Discrepant difficulty values would perhaps not have been found if the norm sample had been more representative. As there is no way of assessing what item difficulty indices would have been in a well-stratified sample collected at the time of the limited norm data, this alternative cannot be evaluated empirically. However, analysis of a well-stratified present-day American sample would shed light on this issue. Evidence which indicates that item difficulties do change over time weakens support for this interpretation (Terman & Merrill, 1973; Wechsler, 1974; Dunn & Dunn, 1981). The possibility that testing a present-day American sample would support a cultural difference interpretation of the shift in difficulty on the arithmetic items also cannot be dismissed without empirical evidence.

In conclusion, it is hypothesized that the differences in item difficulty between the British Columbia sample and the norm sample are a function of the age of the SIT norms. Nearly a quarter of a century has passed since the Slosson Intelligence Test was constructed and the test's items have not been re-evaluated or updated, despite the appearance of a 1981 revision of norms. To support this hypothesis, further research is needed to gather comparative item analysis data for a present-day American sample of children. Until such data is collected, it is not possible to determine the factors contributing to the differences in item difficulty. This information is not, however, needed in order to caution against use of the SIT with British Columbia schoolchildren.

References

Armstrong, R.J. & Jensen, J.A. (1972).
The validity of an abbreviated form of the Stanford-Binet Intelligence Scale, Form L-M. Educational and Psychological Measurement, 32, 463-467.
Armstrong, R.J. & Jensen, J.A. (1982). Slosson Intelligence Test (SIT) for Children and Adults. Technical Manual. East Aurora, NY: Slosson Educational Publications.
Armstrong, R.J. & Jensen, J.A. (1984). Slosson Intelligence Test (SIT) for Children and Adults. Norms tables: Application and Development. East Aurora, NY: Slosson Educational Publications.
Armstrong, R.J. & Mooney, R.F. (1971). The Slosson Intelligence Test: Implications for reading specialists. Reading Teacher, 24, 336-340.
Baum, D.D. & Abelson, G. (1981). Comparison of Slosson and Peabody IQs among white kindergarten children. Psychological Reports, 48, 754.
Baum, D.D. & Kelly, T.J. (1979). The validity of the Slosson Intelligence Test with learning disabled kindergartners. Journal of Learning Disabilities, 12, 268-270.
Berry, G.M. Jr. (1977). An investigation of the item ordering of the Peabody Picture Vocabulary Test by sex and race. (Doctoral dissertation, University of Connecticut, 1977). Dissertation Abstracts International, 38, 6642A.
Borstein, R.A., McLeod, J., McClurg, E. & Hutchinson, B. (1983). Item difficulty and content bias on the WAIS-R information subtest. Canadian Journal of Behavioral Science, 15, 27-34.
Boyd, J.E. (1974). Use of the Slosson Intelligence Test in reading diagnosis. Academic Therapy, 9, 441-444.
Brown, W.R. & McGuire, J.M. (1976). Current psychological assessment practices. Professional Psychology, 7, 475-484.
Carlisle, A., Shinedling, M. & Weaver, R. (1970). Note on the use of the Slosson Intelligence Test with mentally retarded residents. Psychological Reports, 26, 865-866.
Cianflone, R. & Zullo, T.O. (1980). The relationship of an early measure of intelligence to the ability to learn sight vocabulary words and to later achievement.
Educational and Psychological Measurement, 40, 1197-1200.
Cliff, N. (1979). Test theory without true scores. Psychometrika, 44, 373-391.
Colarusso, R., McLesky, J., Gill, S.H. (1977). Use of the Peabody Picture Vocabulary Test and the Slosson Intelligence Test with urban black kindergarten children. Journal of Special Education, 11(1), 19-23.
Coleman, M., Brown, G. & Genong, L. (1980). A comparison of PPVT and SIT scores of young children. Psychology in the Schools, 17, 178-180.
Covin, T.M. (1977a). Comparison of SIT and WISC-R IQ's among special education candidates. Psychology in the Schools, 14, 19-23.
Covin, T.M. (1977b). Relationship of the SIT and PPVT to the WISC-R. Journal of School Psychology, 15, 259-260.
Crofoot, M.J. & Bennett, T.S. (1980). A comparison of the screening tests and the WISC-R in special education evaluations. Psychology in the Schools, 17, 474-478.
DeLapa, G. (1973). Correlates of Slosson Intelligence Test, Stanford-Binet Form L-M, and achievement indices. (Doctoral dissertation, West Virginia University, 1967). Dissertations Abstracts International, 3498-A. (University Microfilms no. 68-2678).
Dirkes, J., Wessels, K., Quarforth, J. & Quervon, B. (1980). Can short-form WISC-R IQ tests identify children with high full scale IQ? Psychology in the Schools, 17, 40-41.
Dunn, L.M. & Dunn, L.M. (1981). Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service.
Fudala, J.B. (1979). Differential evaluation of students with the SIT. Academic Therapy, 15, 61-64.
Grossman, F.M. & Johnson, K.M. (1983). Validity of the Slosson and Otis-Lennon in predicting achievement of gifted students. Educational and Psychological Measurement, 43, 617-622.
Hale, R.L., Douglas, B., Cummins, A., Rittgarn, G., Breeds, B. & Dabbert, D. (1978). The Slosson as a predictor of Wide Range Achievement Test performance. Psychology in the Schools, 15, 507-509.
Holmes, B.J. (1981). Individually-administered intelligence tests: An application of anchor test norming and equating procedures in British Columbia. (Doctoral dissertation, University of British Columbia, 1981). Dissertation Abstracts International, 42, 2626A.
Houston, C. & Otto, W. (1968). Poor readers' functioning on the WISC, Slosson Intelligence Test and Quick Test. Journal of Educational Research, 62, 157-159.
Jensen, A.R. (1980). Bias in Mental Testing. New York: The Free Press.
Jensen, A.R. (1982). Straight Talk about Mental Tests. New York: The Free Press.
Jerrolds, B.W., Callaway, B., & Gwaltney, W.K. (1972). Comparison of the Slosson Intelligence Test and WISC scores of subjects referred to a reading clinic. Psychology in the Schools, 9, 409-410.
Johnson, D.L. & Johnson, C.A. (1971). Comparison of four intelligence tests used with culturally disadvantaged children. Psychological Reports, 28, 209-210.
Jongeward, P.A. (1968). A validity study of the Slosson Intelligence Test for use with educable mentally retarded students. Journal of School Psychology, 7, 59-63.
Kaufman, H. & Ivanoff, J. (1969). The Slosson Intelligence Test as a screening instrument with a rehabilitation population. Exceptional Children, 35, 745.
Klein, A.E. (1978). The reliability and predictive validity of the Slosson Intelligence Test for pre-kindergarten pupils. Educational and Psychological Measurement, 38, 1211-1217.
Lamp, R.E., Traxler, A.J. & Gustafson, P.P. (1973). Predicting academic achievement of disadvantaged fourth grade children using the Slosson Intelligence Test. Journal of Community Psychology, 1, 339-341.
Lessler, K. & Galinksy, M.D. (1971). Relationship between the Slosson Intelligence Test and Wechsler Intelligence Scale for Children scores in special education candidates. Psychology in the Schools, 8, 341-344.
Loevinger, J. (1947).
A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61, No. 4.
Lowrance, D. & Anderson, H.N. (1979). A comparison of the Slosson Intelligence Test and the WISC-R with elementary school children. Psychology in the Schools, 16, 361-364.
Martin, J.D., Blair, G.E. & Vickers, D.M. (1979). Correlation of the Slosson Intelligence Test with the California Short-Form Test of Mental Maturity and the Shipley-Institute of Living Scale. Educational and Psychological Measurement, 39, 193-196.
Martin, J.D. & Rudolph, L. (1972). Correlates of the Wechsler Adult Intelligence Scale, the Slosson Intelligence Test, ACT scores and grade point averages. Educational and Psychological Measurement, 32, 459-462.
Maxwell, M.T. (1971). The relationship between the Wechsler Intelligence Scale for Children and the Slosson Intelligence Test. Child Study Journal, 1, 164-171.
Mize, J.M., Calloway, B. & Smith, J.W. (1979). Comparison of reading disabled children's scores on the WISC-R, Peabody Picture Vocabulary Test and Slosson Intelligence Test. Psychology in the Schools, 16, 356-358.
Nicholson, C.I. (1970). Analysis of functions of the Slosson Intelligence Test. Perceptual and Motor Skills, 31, 627-631.
Nunnally, J.C. (1978). Psychometric Theory. New York: McGraw Hill.
Ritter, D., Duffey, J. & Fischman, R. (1973). Comparability of Slosson and Stanford-Binet estimates of intelligence. Journal of School Psychology, 11, 224-227.
Rogers, S.J. (1982). Problems with the Slosson Intelligence Test for pre-school children. Journal of School Psychology, 20(1), 65-68.
Rotatori, A.F., Sedlak, B. & Freagon, S. (1979). Usefulness of the Slosson Intelligence Test with severely and profoundly retarded children. Perceptual and Motor Skills, 48, 334.
Rust, J.O. & Lose, B.D. (1980).
Screening for giftedness with the Slosson and the scale of rating behavioral characteristics of superior students. Psychology in the Schools, 17, 446-451.
Slosson, R.L. (1977). Slosson Intelligence Test (SIT) for Children and Adults. East Aurora, New York: Slosson Educational Publishers.
Slosson, R.L. (1983). Slosson Intelligence Test (SIT) for children and adults (2nd ed.). East Aurora, New York: Slosson Educational Publishers.
Smith, S.A. (1981). Slosson and Peabody IQ's of mentally retarded adults. Psychological Reports, 48, 786.
Stanley, J.C. & Hopkins, K.D. (1981). Educational and psychological measurement and evaluation (6th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Stewart, K.D. & Jones, E.C. (1976). Validity of the SIT. A ten-year review. Psychology in the Schools, 13, 372-380.
Stewart, K.D. & Myers, D.G. (1974). Long term validity of the Slosson Intelligence Test. Journal of Clinical Psychology, 30, 180-181.
Stewart, K.D. & Wood, D.Z. & Gallman, W.A. (1971). Concurrent validity of Slosson Intelligence Test. Journal of Clinical Psychology, 27, 218-220.
Stone, M. (1975). An interpretive profile for the Slosson Intelligence Test. Psychology in the Schools, 12, 330-333.
Swanson, M.S. & Jacobson, A. (1970). Evaluation of the Slosson Intelligence Test for screening children with learning disabilities. Journal of Learning Disabilities, 3, 318-320.
Terman, L.M. & Merrill, M.A. (1973). Stanford-Binet Intelligence Scale (3rd ed.). New York: Houghton-Mifflin.
Vance, H.B., Lewis, R. & DeBell, S. (1979). Correlations of the Wechsler Intelligence Scale for Children - Revised, Peabody Picture Vocabulary Test, and Slosson Intelligence Test for a group of learning disabled students. Psychological Reports, 44, 735-738.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children - Revised. New York: Psychological Corporation.
Appendix A

Breakdown of Sample by Stratification Variables

Variable          Stratification                           N
Sex               Male                                     163
                  Female                                   156
Age               7 years 3 months to 7 years 9 months     108
                  9 years 3 months to 9 years 9 months     111
                  11 years 3 months to 11 years 9 months   100
Community Size    Under 1000                               45
                  1000 to 50,000                           80
                  Over 50,000                              192
School Size       Under 150                                47
                  151 to 300                               80
                  Over 300                                 192
Zone              Okanagan                                 43
                  Metropolitan Vancouver                   133
                  Fraser Valley                            26
                  Vancouver Island                         48
                  Kootenay                                 26
                  Northern British Columbia                43

Appendix B

Computational Formulas for Determining Test Homogeneity and Item Pair Homogeneity

Computational Formula for Determining Test Homogeneity (Loevinger, 1947)

Est H = [ N * sum_k (X_k^2 - X_k) + sum_i N_i^2 - (sum_k X_k)^2 ] / [ 2N * (sum_i i*N_i - sum_k X_k) + sum_i N_i^2 - (sum_k X_k)^2 ]

Where:
H = coefficient of test homogeneity
X = raw score
N_i = number passing the ith item, when items are ordered according to decreasing number passing
sum_k = summation for all N individuals
sum_i = summation for all m items

Computational Formula for Determining the Coefficient of Homogeneity of Item Pairs (Loevinger, 1947)

H_ij = 1 - (N * K) / (P_h * Q_e)

Where:
H_ij = the coefficient of homogeneity of two items
N = number of cases
P_h = number passing the harder item
Q_e = number failing the easier item
K = number passing the harder and failing the easier item

Appendix C

Item Categorization Schemas of the SIT

                     Nicholson      Boyd           Stone          Fudala
                     N = 149        N = 65         N = 122        N = 159
                     (2:0-27:0)     (4:5-15:10)    (2:0-20)       (2:1-27:0)
Category             N     %        N     %        N     %        N     %
Information          26    17       15    23       35    29       26    16
Similarities         26    17       12    18       17    14       20    13
Short-Term Memory    13    9        6     9        12    10       11    7
Arithmetic           31    21       15    23       26    22       36    23
Vocabulary           49    33       17    26       28    23       61    38
Visual-Motor         4     3                       4     3        3     2

SIT Items by Category in the Common Item Range

Category                  Items   Too Easy   Too Hard   Sum   % of Total
Information               12      [?]        [?]        1     [?]
Similarities              11      [?]        [?]        1     9
Short-Term Memory         7       [?]        [?]        2     29
Arithmetic-Reasoning      13      [?]        [?]        3     23
Arithmetic-Information    7       0          7          7     100
Vocabulary                24      8          3          11    46
Sum                       75      13         12         25    33

Appendix D

Coefficients of Homogeneity of Item Pairs (N>100)

Passed   Failed   H_ij    N        Passed   Failed   H_ij    N
5-10     5-8                       11-6     11-4     -.11    282
6-0      5-10     .10     100      11-8     11-6     .21     284
6-2      6-0      .26     107      11-10    11-8     .50     284
6-4      6-2      .44     121      12-0     11-10    .25     281
6-6      6-4      .35     134      12-2     12-0     .15     265
6-8      6-6      .11     160      12-4     12-2     .23     258
6-10     6-8      .60     171      12-6     12-4     .88     254
7-0      6-10     .12     185      12-8     12-6     .18     234
7-2      7-0      .14     249      12-10    12-8     .24     228
7-4      7-2      .06     249      13-0     12-10    .24     226
7-6      7-4      .21     250      13-2     13-0     .62     201
7-8      7-6      .19     252      13-4     13-2     .002    201
7-10     7-8      .43     259      13-6     13-4     .05     199
8-0      7-10     .07     262      13-8     13-6     .15     190
8-2      8-0      .22     262      13-10    13-8     .30     179
8-4      8-2      .14     270      14-0     13-10    .09     168
8-6      8-4      .17     282      14-2     14-0     .25     158
8-8      8-6      .47     284      14-4     14-2     .21     158
8-10     8-8      .01     282      14-6     14-4     .38     155
9-0      8-10     .38     287      14-8     14-6     .07     151
9-2      9-0      .16     286      14-10    14-8     .43     150
9-4      9-2      .56     290      15-0     14-10    .06     149
9-6      9-4      .30     294      15-2     15-0     -.18    149
9-8      9-6      .12     307      15-4     15-2     .004    149
9-10     9-8      .52     310      15-6     15-4     .22     131
10-0     9-10     .39     312      15-8     15-6     .35     126
10-2     10-0     .13     313      15-10    15-8     .25     125
10-4     10-2     .14     311      16-0     15-10    .36     124
10-6     10-4     .44     311      16-3     16-0     .26     122
10-8     10-6     .25     309      16-6     16-3     -.22    121
10-10    10-8     .44     303      16-9     16-6     .05     120
11-0     10-10    .23     298      17-0     16-9     -.16    117
11-2     11-0     .32     298      17-3     17-0     -.08    110
11-4     11-2     .03     297      17-6     17-3     .00     109
"@en . "Thesis/Dissertation"@en . "10.14288/1.0054614"@en . "eng"@en . "Special Education"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en . "Graduate"@en . "Analysis of item characteristics of the Slosson Intelligence Test for British Columbia school children"@en . "Text"@en . "http://hdl.handle.net/2429/26474"@en .