The purpose of the present study was to simulate the Anchor Test Study for reading achievement tests using five individually administered intelligence tests: the Wechsler Intelligence Scale for Children—Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), the Slosson Intelligence Test (SIT), the Standard Progressive Matrices (SPM), and the Mill Hill Vocabulary Scale (MHVS). Three major objectives were adopted from the Anchor Test Study: to prepare tables of equivalent score values for the conversion of scores from one test to another; to compare linear and equipercentile equating procedures in the derivation of equivalent scores; and to develop provincially representative norms for the five tests. The rationale for the present study was that intelligence tests are commonly used interchangeably, on the apparent assumption that an equivalency relationship exists among common-purpose tests. The primary focus of the present study was an empirical investigation of the viability of this use. In addition, American and British norm-referenced intelligence tests are interpreted in British Columbia as if the population of children to whom they are applied were identical to the population of children for whom each of the tests was prepared. An ancillary focus was therefore to determine the relevance of existing norms for use in British Columbia.
All five tests were administered to a stratified random sample of 340 children at three age levels: 115 aged 7½ years, 117 aged 9½ years, and 108 aged 11½ years. The population from which the sample was drawn consisted of all non-Native Indian, English-speaking children at these three age levels attending public and independent schools in British Columbia. This population was further restricted to exclude children in classes for the physically handicapped, emotionally disturbed, and trainable mentally retarded.
The stratification variables employed were geographic region, community size, school size, age, and sex. In addition, information was collected on a sixth variable, level of education of the head of the household, to provide a description of the sample using a socioeconomic index.
The tests were first scored using the norms tables in their respective manuals. Statistical tests for differences of means and variances between the B.C. sample and the original standardization samples revealed that, in most cases, B.C. children scored significantly higher and with less variability (p < .05). Therefore, new norms tables were prepared for each test. These consisted of IQ conversion tables for the WISC-R, PPVT, and SIT, and percentile ranks associated with raw scores for the SPM and MHVS. The renorming procedure involved lowering and spreading out the IQ score scales to a mean of 100 and a standard deviation of 15. As a result, students scored lower with the B.C. norms than with the published norms. This difference was most pronounced in the lower score ranges.
In the equating phase of the study, the equivalence of each of the PPVT, SIT, SPM, and MHVS to the three WISC-R IQ scales was examined using both psychological and statistical criteria of equivalence. Pairs of tests were defined as nominally parallel (Lord & Novick, 1968) if they were psychologically similar in terms of content and purpose, and statistically similar as defined by a disattenuated correlation coefficient ≥ .70. Thirteen test pairs were identified that satisfied the dual criteria for equivalency. Both linear and equipercentile equating procedures were applied to the observed score distributions of these test pairs. The accuracy of the results was judged by comparing the conditional root-mean-square errors of equating associated with the equating procedures. These errors averaged 12 score points and were similar across all procedures.
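As a rough illustration of the statistical criterion and the two equating procedures named above, the following Python sketch shows one common textbook form of each. It is not the study's own computation: the function names, the toy score lists, the midpoint definition of percentile rank, and the linear interpolation between score points are all assumptions made for this example.

```python
import math
from bisect import bisect_right
from statistics import mean, pstdev


def disattenuated_r(r_xy, rel_x, rel_y):
    """Observed correlation corrected for unreliability in both tests."""
    return r_xy / math.sqrt(rel_x * rel_y)


def linear_equate(x, xs, ys):
    """Map score x on test X to the scale of test Y by matching z-scores."""
    return mean(ys) + (pstdev(ys) / pstdev(xs)) * (x - mean(xs))


def percentile_rank(x, scores):
    """Percent of scores below x plus half of those equal to x (assumed definition)."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def equipercentile_equate(x, xs, ys):
    """Map x to the Y score that has the same percentile rank in ys.

    Interpolates linearly between adjacent observed Y scores; no
    smoothing is applied (a simplification for this sketch).
    """
    p = percentile_rank(x, xs)
    y_points = sorted(set(ys))
    ranks = [percentile_rank(y, ys) for y in y_points]  # strictly increasing
    if p <= ranks[0]:
        return float(y_points[0])
    if p >= ranks[-1]:
        return float(y_points[-1])
    i = bisect_right(ranks, p)  # ranks[i - 1] <= p < ranks[i]
    lo_r, hi_r = ranks[i - 1], ranks[i]
    lo_y, hi_y = y_points[i - 1], y_points[i]
    return lo_y + (hi_y - lo_y) * (p - lo_r) / (hi_r - lo_r)


# Hypothetical toy data, not taken from the study.
test_x = [1, 2, 3, 4, 5]
test_y = [10, 20, 30, 40, 50]

# A pair would meet the statistical criterion if the corrected r is >= .70.
meets_criterion = disattenuated_r(0.70, 0.90, 0.90) >= 0.70
linear_score = linear_equate(4, test_x, test_y)
equipercentile_score = equipercentile_equate(3, test_x, test_y)
```

When the two score distributions have the same shape, as in this toy data, the linear and equipercentile conversions agree; they diverge as the shapes differ, which is one reason for comparing the two procedures.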
It was concluded that none of the test pairs considered in the study were equivalent, or parallel, and that, consequently, their interchangeable use is erroneous. Further, it was concluded that test equivalence requires a close correspondence of content in terms of item similarity. Without such correspondence, differences between tests render equating inappropriate.

