Producing equivalent examination forms : an assessment of the British Columbia Ministry of Education… MacMillan, Peter D. 1991

Full Text

PRODUCING EQUIVALENT EXAMINATION FORMS: AN ASSESSMENT OF THE BRITISH COLUMBIA MINISTRY OF EDUCATION EXAMINATION CONSTRUCTION PROCEDURE

by

PETER DENTON MACMILLAN
B.Sc., University of British Columbia, 1972

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS in THE FACULTY OF GRADUATE STUDIES, Department of Educational Psychology

We accept this thesis as conforming to the required standard

© THE UNIVERSITY OF BRITISH COLUMBIA
August 1991

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Educational Psychology
The University of British Columbia
Vancouver, Canada

ABSTRACT

Questions have been raised concerning the equivalency of the January, June, and August forms of the British Columbia provincial Grade 12 examinations for a given subject. The procedure for constructing these examinations was changed as of the 1990/91 school year. The purpose of this study was to duplicate this new procedure and assess the equivalency of the forms that resulted. An examination construction team, all of whom had previous experience with the British Columbia Ministry of Education's Student Assessment Branch, simultaneously constructed two forms of a Biology 12 examination from a common table of specifications using a pool of multiple choice items from previous examinations. A sample of students was obtained in the Okanagan, Thompson, and North Thompson areas of British Columbia. Both forms were administered to each student, as required by the test equating design chosen (Design II; Angoff, 1971). The data sample consisted of responses from 286 students.

The data were analyzed using a classical item analysis (LERTAP; Nelson, 1974) followed by a 2x2 order-by-form fixed effects ANOVA with repeated measures on the second factor. Item analysis revealed that all items on both forms performed satisfactorily, ruling out the alternative hypothesis that flawed items were the cause of the lack of equivalence found. Results showed a significant (p<.05) difference between the means of the two forms, no significant (p>.25) order effect, and a significant (p<.25) order-by-form interaction. Linear and equipercentile equatings were carried out and yielded very similar results. Equating error, judged using the conditional root mean square error of equating, was 4.86 points (9.35%) for both equatings. Equivalency was also judged employing a graphical procedure in which the deviation of the equating function from the identity function was plotted with error bands produced using the standard error of equating. The procedure showed the two forms to be nonequivalent, particularly for the lower scoring students. The source of the nonequivalency was investigated by separating the forms into three subtests based on the pairs of items possessing or lacking item statistics at the time of test construction. The linear equating followed by the graphical analysis was repeated for the pairs of subtests.
The pairs of subtests comprised of item pairs for which difficulty (p) values were available at the time of construction for one or both of the items in a pair were found to be equivalent. In contrast, the pair of subtests comprised of items for which p values were unavailable for either item in a pair at the time of construction was found to be nonequivalent. It was concluded that the examination construction procedure in its present state cannot be relied on to produce equivalent forms. An experienced examination construction team was unable to accurately match items on the basis of difficulty when the items had no prior item statistics. As such, a necessary requirement for the construction of equivalent forms is that item statistics be available at the time of construction.

Research Supervisor: Dr. D. J. Bateson

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT

Chapter 1  INTRODUCTION
  The Grade 12 Examination Program
  Purpose of the Examinations
  Nonequivalency
  Changes in Examination Assembly Procedures
  Statement of the Problem
  Definition of Terms
    Equating
    Parallel Forms
    Equivalent Tests
    Equivalent Scores
    Comparable Scores
  Hypotheses
  Outline of Thesis

Chapter 2  REVIEW OF THE LITERATURE
  Equivalence versus Comparability
  Parallelism
    Psychological Parallelism
    Statistical Parallelism
  Equating Designs
  Equating Methods
    Linear Equating
      Design I Equivalent Groups Design
      Design II Test-Retest Design
      Design III Anchor Test Design
    Equipercentile Equating
  Equating Error
    Conditional Root Mean Square
    The Average Absolute Difference
    Standard Error of Equating
      Linear Equating
        Design I
        Design II
        Design III
      Equipercentile Equating
        Design I
        Design II
        Design III
    Use of Standard Error of Equating
    Magnitude of the Standard Error of Equating
  Selection of Data Collection Design
  Selection of Equating Model

Chapter 3  EXAMINATION CONSTRUCTION
  Content, Materials and Assembly Team
    Choice of Subject
    Number of Forms
    Type of Item
    Content Tested
    Item Bank
    Examination Assembly Team
  Table of Specifications
  Examination Construction Process
  Examination Construction
    Allocation of Items to Forms
    Assessment of Comparability
    Production of Examination Booklets

Chapter 4  METHODOLOGY AND PRELIMINARY ANALYSIS
  Sample Selection
    Sample Size
    School Selection
  Examination Administration
  Scoring and Data Preparation
  Preliminary Analysis
    Response Rate
    Preliminary Classical Item Analysis
    Order of Administration

Chapter 5  ANALYSES AND RESULTS
  Pre-Equating Analysis
    Statistical Properties of the Forms
  Equating
    Linear Equating Angoff's Design II
    Linear Equating
    Equipercentile Equating
    Comparison of the Two Equatings
    Differential Performance at Lower Achievement Levels
  Assessment of Equivalence of Forms
    CRMS and AAD
    Standard Error of Equating
    Nonequivalence
  Source of Nonequivalence
    Equating of Subtests

Chapter 6  CONCLUSIONS
  Summary
    Purpose and Problem
    Procedure
    Analysis and Results
  Limitations
  Conclusions
  Implications
    Implications for Practice
    Implications for Future Research

REFERENCES

APPENDIX A: Ministry Examination Construction Process
APPENDIX B: Materials for Construction Team
APPENDIX C: Examination Forms
APPENDIX D: Request and Permission Forms
APPENDIX E: Teacher Administration Instructions
APPENDIX F: Equipercentile Equating Tables
LIST OF TABLES

Table 1   1989 Examination Means and Standard Deviations
Table 2   Three Random Group Equating Designs
Table 3   1990 Biology 12 Table of Specifications
Table 4   Form A and Form B Table of Specifications
Table 5   Test Administration Design and Estimated Enrollment
Table 6   Summary ANOVA
Table 7   Cell Means
Table 8   Psychometric Properties of the Two Forms
Table 9   Weighted and Pooled Means and Variances
Table 10  Results of the Linear Equating
Table 11  Results of Equipercentile Equating

LIST OF FIGURES

Figure 1  Linear Equating
Figure 2  Equipercentile Equating
Figure 3  Standard Error of Equating: Total Examination
Figure 4  Standard Error of Equating: P Values Known for Both Items
Figure 5  Standard Error of Equating: P Value Known for One Item Only
Figure 6  Standard Error of Equating: P Values Unknown for Both Items

ACKNOWLEDGEMENT

I would like to express my appreciation and gratitude to the many individuals and organizations that made completion of this study possible. All of my thesis committee members have offered insightful comments and given me valuable support beyond what is normally required, and all have given me much more. Special thanks are offered to: Dr. David J. Bateson, for first stirring my interest in educational measurement; Dr. Harold Ratzlaff, for the encouragement that led to my entrance into the MERM area; and Dr. W. Todd Rogers, not only for his advice and support during my course work both at the University of British Columbia and later the University of Alberta, but most especially during the writing of this thesis.

This study could not have been carried out without a grant from the B.C. Ministry of Education. Personal financial assistance in the form of course fee reimbursement was given by the Kamloops District Teachers' Association. I appreciate the time and effort put forth by the students and teachers who took part in this study. I am especially indebted to the examination assembly team for the use of their time and expertise. Finally, this thesis could not have been completed without the invaluable support of my family, and my friends and colleagues at UBC, the U of A, and in the Kamloops School District.

CHAPTER 1
INTRODUCTION

In 1983, the British Columbia Ministry of Education reintroduced provincial examinations for thirteen Grade 12 subjects. While there have been, and still are, many issues surrounding British Columbia governmental examinations (Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Rogers, 1990; Ministry of Education, 1985; Sullivan, 1988), this study is concerned with only one of these issues. Seven years after the reinstatement, there are still concerns about the ability of the test makers to create the equivalent forms that are required for use at the three testing times - January, June, and August - throughout the school year. In an attempt to address this concern, the Ministry has changed its procedures for examination construction. The purpose of this study was to assess the success of these new procedures.

The Grade 12 Examination Program

All Grade 12 students in British Columbia are required to write school leaving examinations in each examinable course in which they are enrolled. All Grade 12 students must write at least one examination in order to graduate, either English 12 or Communications 12.
Presently, examinations are administered in fifteen courses: Algebra (Mathematics as of September 1990), Biology, Chemistry, Communications, English, English Literature, Francais-langue, French, Geography, Geology, German, History, Latin, Physics, and Spanish. The scores from each examination are combined with the school-awarded mark so that the examination counts for 40% of the final mark for each course. The examinations are offered on three occasions during the school year: January, June, and August. The January sittings allow students in semestered schools to write examinations for their first term courses at the appropriate time. The June sittings are the times at which students in semestered schools write their second term examinations. Students in nonsemestered schools write all of their examinations in June. Some students are granted the opportunity to write one or more examinations in August.

Purpose of the Examinations

The expressed purposes of the Grade 12 examinations are

... to insure that grade 12 students meet consistent provincial standards of achievement, in academic subjects. The examination program will also ensure that graduating students from all schools will be treated equitably when applying for admission to universities and other post-secondary institutes. An additional purpose is to respond to strong public concerns for improved standards in education. (Ministry of Education, 1983, p. 6)

The stated reasons for the examinations center around a desire for consistent standards, whether the consumers of this information be the Ministry, universities, teachers, parents, or students themselves. It is this consistency that is the issue of this study.

Nonequivalency

The examinations are prepared using a common table of specifications, the assumption being that the forms produced will be equivalent. Preliminary inspection of the examinations using statistical criteria indicates there may be differences between forms in some subject areas (Ministry of Education, 1989). Table 1 displays means and standard deviations of the multiple choice sections for the 1989 examinations.

Table 1
1989 Examination Means and Standard Deviations

                  January                    June
Subject      n       M      S.D.       n       M      S.D.     t-test
Alg        2857    34.0    10.19     9795    33.5     9.79     -2.33*
Bio        2007    32.3     8.28     7741    31.9     8.96     -1.89
Chem       1234    32.2     7.06     5881    32.5     7.43      1.35
Comm       1094    35.5     4.01     4618    31.8     4.47    -26.88*
Eng        6420    18.3     3.74    21338    18.4     3.71      1.88
Lit         577    19.0     5.20     3354    20.0     5.07      4.28*
Fren       1136    45.4     7.74     4838    42.9     8.42     -9.63*
Geog       1599    35.9     7.55     6644    35.5     7.80     -1.88
Hist       1562    31.9     8.12     6205    34.4     7.65     11.00*
Phys        432    39.4    10.02     3343    45.0     9.37     11.01*

* p<.05

Examination of Algebra 12 (Alg), Biology 12 (Bio), Chemistry 12 (Chem), English 12 (Eng), and Geography 12 (Geog) shows little difference, less than one point, between the means of the respective January and June examinations. While the Algebra means are significantly different (t=-2.33, p<.05), the difference of 0.5 points might be considered trivial. Communications 12 (Comm), English Literature 12 (Lit), French 12 (Fren), History 12 (Hist), and Physics 12 (Phys) all show significant (p<.05) and nontrivial differences. The equal means of the January and June forms of the examinations for the first group of subjects, Algebra, Biology, Chemistry, English, and Geography, suggest that the January and June forms for a given subject are of equal difficulty and that the January and June examinee populations are of equal ability within the various subject areas.
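The t statistics reported in Table 1 can be approximated from the summary statistics alone. The sketch below is illustrative only: the source does not state which t-test variant was used, so a pooled-variance two-sample t is assumed, and the results will be close to, but not necessarily identical with, the tabled values. Function and variable names are invented for the example.

```python
from math import sqrt

def two_sample_t(n1, m1, sd1, n2, m2, sd2):
    """Pooled-variance two-sample t statistic computed from summary statistics."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))                             # SE of the mean difference
    return (m2 - m1) / se                                          # sign convention: June minus January

# Biology 12, January vs. June 1989 (values taken from Table 1)
print(round(two_sample_t(2007, 32.3, 8.28, 7741, 31.9, 8.96), 2))
```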
If, then, the examinee populations are of equal ability, the difference in means for the second group of subjects must be due to January and June forms that vary in difficulty. The Ministry of Education is aware of the possible lack of equivalency between forms, as reflected by the following statement which appears in its Biology 12 Report to Schools (1988):

Without field test information it is not possible to ensure that the January, June, and August examinations will be of equal difficulty. (Biology 12 Report to Schools, Appendix C, p. 2)

The Ministry has attempted to address this problem through the use of a standard setting process (Ministry of Education, 1989, p. 75). Students do not receive raw score percentages as their examination marks. Instead, the province-wide distribution of examination results for a given subject is compared with the province-wide distribution of school-awarded marks for that subject. If a difference between the two distributions occurs, the examination distribution is altered to more closely fit the school-awarded marks distribution. This is done in spite of a Ministry caveat that the two marks do not measure the same skills or knowledge.1

Changes in Examination Assembly Procedures

In an attempt to address the apparent lack of equivalency of examinations in a given school year, the Ministry implemented a new test construction procedure (see Appendix A) in 1990 (Ministry of Education, 1990). Prior to 1990, the forms for a given school year were produced separately at different times of the year, although by the same team. Beginning with the 1990 examinations, the three forms were assembled simultaneously using items that had already been developed and, in some cases, field tested.

1 The August results are not discussed for two reasons. First, the Ministry of Education does not routinely release information pertaining to the August examinations as it does for the January and June sittings. Second, students writing one or more examinations in August do so either because of a previous examination failure that school year or due to some other reason acceptable to the Ministry of Education. The examinees are clearly neither a random nor a representative sample of the population of students writing examinations in any given school year. Knowledge of the August results would yield no information by which the equivalency of the three forms could be assessed.

Statement of the Problem

The purpose of this study was to empirically test the assumption that the examination forms produced by the new procedure are equivalent. The specific question addressed was: For a given subject, will the two forms of the government examination constructed at the same time by the same team from a common table of specifications be equivalent?

It is in the interests of all the stakeholders that the exams be made as fair and consistent as possible. Not only must the results of the examinations be as consistent as possible, but it must also be demonstrated that this consistency exists. There is an "...obligation to review whether a practice has appropriate consequences for individuals and institutions, and especially to guard against adverse consequences" (Messick, 1989, p. 20). The intent of this study was to determine whether or not the procedure of simultaneous construction of the examination forms would produce the equivalent forms of an examination that would ensure consistent treatment of all examinees.
Definition of Terms

Equating

For the purposes of this study, the definition of equating proposed by Angoff (1971) was adopted:

equating, or the derivation of equivalent scores, concerns itself with the problem of unique conversions which may be derived only across test forms that are parallel, that is forms that measure, within acceptable limits, the same psychological function. (p. 563)

Parallel Forms

Equivalence is dependent upon parallelism, in fact is synonymous with it (Holmes, 1981, p. 7). Parallel tests can be defined using both statistical and psychological criteria. Gulliksen (1950) defined parallelism as follows:

In addition to equal means, variances, and reliabilities, parallel tests should have approximately equal validities for any criterion... The tests should contain items dealing with the same subject matter, items of the same format, etc. In other words, the tests should be parallel as far as psychological judgement is concerned. (pp. 173-174)

Equivalent Tests

Dorans and Lawrence (1990) describe tests as being equivalent if the equating function, the function that maps scores on one test to scores on another, is the identity function, y = x (p. 253).

Equivalent Scores

Test forms which meet the criteria for parallelism are referred to as parallel forms or equivalent forms. The scores obtained from these forms are interchangeable; they can be interpreted in the same way.

Comparable Scores

If the tests are not parallel, then comparable scores may be calculated, but these scores do not function in the same fashion. For example, scores on a spelling quiz could be mapped to comparable scores on a multiplication quiz, yet no one would claim these scores measure the same construct. Such scores, while comparable, are not equivalent. Rogers and Holmes (1987) state that "{computing comparable scores} can be done irrespective of test content and carries no implication of the interchangeability of tests" (p. 4). If two scores are merely comparable, they correspond to equal standard scores or the same rank position if the score distributions are similar. However, the scores do not carry any connotation as to the equality of the amount of the trait(s) being measured (Rogers & Holmes, 1987, p. 5). The score conversions are nonunique and are not generalizable beyond the group for which they were established (Millman & Linoff, 1964).

Hypotheses

Given the Gulliksen (1950) definition of parallel forms, the following three hypotheses for the study were formulated:

1. H0: \mu_a = \mu_b;  H1: \mu_a \neq \mu_b, where \mu_a and \mu_b are the means on Form A and Form B, respectively.

2. H0: \sigma^2_a / \sigma^2_b = 1;  H1: \sigma^2_a / \sigma^2_b \neq 1, where \sigma^2_a and \sigma^2_b are the variances of Form A and Form B, respectively.

3. H0: \rho_{aa} = \rho_{bb};  H1: \rho_{aa} \neq \rho_{bb}, where \rho_{aa} and \rho_{bb} are the internal reliabilities of Form A and Form B, respectively.

A fourth hypothesis, based on work by Loret, Seder, Bianchini, and Vale (1974, p. 8) and Holmes (1981, p. 80), was:

4. H0: {}_c\rho_{ab} = 1;  H1: {}_c\rho_{ab} < 1, where {}_c\rho_{ab} is the intertest correlation coefficient between Form A and Form B corrected for attenuation.

A fifth hypothesis, based on Dorans' and Lawrence's work, was:

5. H0: F(x) = I(x);  H1: F(x) \neq I(x), where F(x) is the equating function and I(x) is the identity function y = x.
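Hypotheses 1 and 2 can be examined directly from paired scores on the two forms. The sketch below is only an illustration and is not the analysis used in this study (the thesis tests the means with a 2x2 order-by-form ANOVA, reported in Chapter 5); the variance comparison shown uses the Pitman-Morgan approach for correlated variances, a technique substituted here for illustration.

```python
import numpy as np
from math import sqrt

def check_means_and_variances(form_a, form_b):
    """Paired checks related to hypotheses 1 and 2 for two forms taken by the same examinees."""
    a = np.asarray(form_a, dtype=float)
    b = np.asarray(form_b, dtype=float)
    n = len(a)
    # Hypothesis 1: paired t statistic on the mean difference (n - 1 df).
    d = a - b
    t_means = d.mean() / (d.std(ddof=1) / sqrt(n))
    # Hypothesis 2: Pitman-Morgan test; with paired data, equal variances imply
    # zero correlation between the sums and the differences (n - 2 df).
    r = np.corrcoef(a + b, a - b)[0, 1]
    t_vars = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return t_means, t_vars
```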
Outline of the Thesis

Chapter Two begins with an elaboration of equivalent and comparable scores, discussed in relation to psychological and statistical parallelism. Data collection designs and equating methods, along with associated equating error estimates, are discussed. The relative merits and weaknesses of each model as applied to this study are used to select the most feasible data collection design and equating process. In Chapter Three the construction of the examination forms used in this study is described. Chapter Four includes a description of both the methodology and the preliminary data analysis, while the main analyses and the corresponding results are presented in Chapter Five. A summary of the study together with conclusions, implications for practice, and implications for future research are presented and discussed in Chapter Six.

CHAPTER 2
REVIEW OF THE LITERATURE

As described in Chapter 1, questions have been raised regarding the equivalency of the three forms of a Grade 12 examination that are administered during a given school year. Described in this chapter are various procedures that can be employed to test the equivalency of test forms. First, equivalent and comparable scores are discussed in relation to psychological and statistical parallelism and the differences between them noted. This is followed by an outline of standard data collection designs and corresponding analytical procedures for assessing the equivalence of different test forms. Various error estimates of equating are then described. The chapter concludes with the rationale for the design selected and used in the present study.

Equivalence versus Comparability

The distinction between equating and comparing is somewhat blurred due to inconsistent use in the literature (Rogers & Holmes, p. 5). As described in Chapter One, comparable scores can be produced irrespective of test content, but these scores are not interchangeable and do not carry any connotation about the amount of trait measured. Equivalent scores are interchangeable and represent equal amounts of a trait. Likewise, the terminology describing the mathematical process for obtaining equivalent and comparable scores is also blurred. Angoff (1971) uses the term calibrating as the more inclusive term for the process that maps a score on one test to a score on another test. Thus the calibration process produces comparable scores. In the more restrictive case of parallel tests, the process of calibrating is properly described as equating. The result of an equating is equivalent scores on forms that are said to be equivalent forms. The production of equivalent scores is dependent on the requirement of parallel forms being met.

Parallelism

Gulliksen (1950) addressed the interchangeability of forms when he suggested that "two tests are parallel when it makes no difference which test you use" (p. 11). Later, Lord and Novick (1968) clarified the definition of parallel when they stated: "...parallel measurements measure exactly the same thing in the same scale and, in a sense, measure it equally well for all persons" (p. 48). Parallelism can be described in both psychological and statistical terms. Both psychological and statistical parallelism must be satisfied if the forms are to be considered parallel.

Psychological parallelism. Psychological parallelism requires that the test forms to be equated measure the same psychological trait or function (Rogers & Holmes, p. 6). Earlier, Wesman (1958) suggested that "the degree to which the conditions of parallelism are met are most closely achieved when forms are designed to be parallel" (p. 8).
Since the two forms of the Biology 12 examination, the examination selected for study in this thesis, were produced simultaneously from the same table of specifications, they met the criteria suggested by Wesman. Therefore they were judged to be psychologically equivalent.

Statistical parallelism. In addition to Gulliksen's (1950) statistical criteria of equal means, variances, and reliabilities, Loret et al. (1974, p. 8) and Holmes (1981, p. 80) employed the intertest correlation coefficient corrected for attenuation, {}_c\rho_{ab}, as a test for parallel forms. The basic formula for this coefficient is:

{}_c\rho_{ab} = \rho_{ab} / \sqrt{\rho_{aa} \rho_{bb}}    (1)

where {}_c\rho_{ab} is the intertest correlation corrected for attenuation, \rho_{ab} is the intertest correlation, and \rho_{aa} and \rho_{bb} are the internal consistency reliabilities of Form A and Form B (Cronbach, 1971, p. 489). A value of .95 is a commonly accepted cutoff point for equating studies (Linn, 1975, p. 206).
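The quantities in equation (1) can be computed directly from two sets of item responses. The following is a minimal sketch, not the analysis actually run in the thesis (which used LERTAP and SPSS-X); the internal consistency reliabilities are taken here to be Cronbach's alpha, an assumption, and the function names are invented.

```python
import numpy as np

def cronbach_alpha(items):
    """items: examinees x items matrix of 0/1 scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def disattenuated_r(items_a, items_b):
    """Intertest correlation corrected for attenuation, as in equation (1)."""
    x = np.asarray(items_a).sum(axis=1)   # total score on Form A
    y = np.asarray(items_b).sum(axis=1)   # total score on Form B
    r_ab = np.corrcoef(x, y)[0, 1]
    return r_ab / np.sqrt(cronbach_alpha(items_a) * cronbach_alpha(items_b))
```

A value of disattenuated_r at or above the .95 cutoff cited above would be consistent with parallel forms.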
Equating Designs

Angoff (1982) presented three basic designs for use in developing equivalent or, in the case of a lack of parallelism, comparable scores. These designs are shown in Table 2.

Table 2
Three Random Group Equating Designs

Design     Group 1     Group 2
  I           X           Y
  II         X:Y         Y:X
  III        X, U        Y, U

In Design I, the Equivalent Groups Design, two groups formed by random selection are administered two forms of the same instrument (X and Y). The Equivalent Groups Design employs random groups and assumes equality of the two groups. Design II, the Test-Retest Design, involves the administration of both forms to one group in one order and to a randomly different group in the reverse order, thereby allowing for an assessment of a possible order effect. The order effect may consist of practice effects or fatigue effects. The randomization is often achieved by randomly splitting the group in two by spiralling the test booklets so that half the sample write the forms in one order while the other half write in the reverse order. Design III, the Anchor Test Design, consists of different forms administered to random groups as in Design I but with a set of common or anchor items (labelled U in Table 2). The anchor items may be placed within the forms of the test or administered in a separate form. Anchor items are placed on all forms of the instrument so that comparisons between the performance of the two groups can be made. Use of the Anchor Test Design, Design III, reduces the need for the assumption of equivalency of the two groups required by Design I.

Equating Methods

An equating method is an empirical procedure for determining a transformation to map the scores on one form of a test onto the scores of a second parallel form of the test (in the case of parallel tests) or to map the scores of one test onto another (in the case of comparable scores). There are three methods for determining the transformation: linear equating, equipercentile equating, and item characteristic curve (ICC) equating (Petersen, Marco, & Stewart, 1982, p. 73). Linear and equipercentile equating may be carried out using observed scores (Braun & Holland, 1982, pp. 9-49) within the classical test model; ICC equating is carried out using item response models.2 Observed score equating can be carried out if the forms are equally reliable (Angoff, 1982, pp. 58-59). Examination of the internal consistency reliability estimates for the 1989 Grade 12 examinations revealed that eight of the ten examinations' reliabilities differ by .02 or less (Ministry of Education, 1989, pp. 66-67). As the forms in this study are designed following Ministry procedures, it is likely that equally reliable forms will result. The following discussion of equating deals with the case of equally reliable forms. The reader is referred to Angoff (1971) for the case of unequally reliable forms.

2 Classical test-model theory has been usefully applied to examinee achievement and aptitude test performance for many years. Item Characteristic Curve (ICC) equating, which uses item response theory, is an alternative to classical observed score equating (Lord, 1980). The reader is referred to Hambleton (1989) for a more complete introduction to item response theory.

Linear Equating

In linear equating it is assumed that two scores are equal if they correspond to equal standard score deviates (Angoff, 1971, p. 564):

(Y - M_y) / s_y = (X - M_x) / s_x    (2)

where M_y and M_x and s_y and s_x are, respectively, the means and standard deviations of Form Y and Form X. The linear nature is more apparent if the equation is rewritten to give:

Y = AX + B    (3)

with coefficients A and B defined according to the equating design.

Design I Equivalent Groups Design

The equivalent groups design consists of one administration of Form X to one group and one administration of Form Y to another group assumed to be equivalent to the first group. The values of the coefficients A and B can be calculated according to the following equations:

A = s_y / s_x    (4)

B = M_y - A M_x    (5)

(Angoff, 1971, pp. 564-575).

Design II Test-Retest Design

In the case of the test-retest design, each examination form is administered to both Group 1 and Group 2. Therefore there are two means and two standard deviations for each group. Provided the sample sizes are equal, the values for A and B can be calculated according to the following equations (Angoff, 1971, pp. 564-577):

A = \sqrt{ (s^2_{y1} + s^2_{y2}) / (s^2_{x1} + s^2_{x2}) }    (6)

B = 0.5 (M_{y1} + M_{y2}) - 0.5 A (M_{x1} + M_{x2})    (7)

However, as pointed out in Glass and Hopkins (1984), if the sample sizes are not equal, then this must be taken into account when computing values for A and B. The equations then used are:

A = \sqrt{ s^2_y / s^2_x }    (8)

B = [ (n_{y2} M_{y2} + n_{y1} M_{y1}) / (n_{y2} + n_{y1}) ] - A [ (n_{x2} M_{x2} + n_{x1} M_{x1}) / (n_{x2} + n_{x1}) ]    (9)

where s^2_y is the weighted average variance on Form Y, s^2_x is the weighted average variance on Form X, M_{y1} is the mean on Form Y of the group of examinees who wrote Form X first, M_{y2} is the mean on Form Y of the group of examinees who wrote Form X second, M_{x1} is the mean on Form X of the group of examinees who wrote Form X first, M_{x2} is the mean on Form X of the group of examinees who wrote Form X second, and n_{x1}, n_{x2}, n_{y1}, and n_{y2} are the respective sample sizes (Angoff, 1971, pp. 574-575). The weighted average variances are calculated with the equations:

s^2_y = [ (n_{y1} - 1) s^2_{y1} + (n_{y2} - 1) s^2_{y2} + n_{y1} (M_{y1} - M_y)^2 + n_{y2} (M_{y2} - M_y)^2 ] / (n_y - 1)    (10)

s^2_x = [ (n_{x1} - 1) s^2_{x1} + (n_{x2} - 1) s^2_{x2} + n_{x1} (M_{x1} - M_x)^2 + n_{x2} (M_{x2} - M_x)^2 ] / (n_x - 1)    (11)

where s^2_{x1}, s^2_{x2}, s^2_{y1}, and s^2_{y2} are the variances of the respective subsamples, and the other variables are defined as before (Glass & Hopkins, 1984, p. 53). Angoff (1971) describes a variation of these procedures (p. 575). The data on Form X for the two half groups are combined, and likewise the data on Form Y for the two half groups. The equating is then carried out using equation 4 and equation 5.
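The Design II computations above reduce to a few lines of arithmetic. The sketch below is illustrative only (function and variable names are invented); it implements equations (8) through (11), the unequal-sample-size case, which also covers equal samples.

```python
import numpy as np

def weighted_variance(groups):
    """Equations (10)/(11): pool subsample variances and means around the grand mean."""
    ns = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    grand_mean = (ns * means).sum() / ns.sum()
    num = ((ns - 1) * variances).sum() + (ns * (means - grand_mean) ** 2).sum()
    return num / (ns.sum() - 1), grand_mean

def design2_linear(x1, x2, y1, y2):
    """Y = A*X + B for Design II; x1/y1 are scores of the group that wrote Form X first,
    x2/y2 of the group that wrote Form X second."""
    s2x, mx = weighted_variance([x1, x2])
    s2y, my = weighted_variance([y1, y2])
    A = np.sqrt(s2y / s2x)   # equation (8)
    B = my - A * mx          # equation (9), written with the weighted means
    return A, B
```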
Design III Anchor Test Design

The anchor test design consists of a set of common items that can be described as forming a test U. Tests X and Y are then regressed on U to form estimates for the means (\hat{\mu}_x and \hat{\mu}_y) and standard deviations (\hat{\sigma}_x and \hat{\sigma}_y). The values for A and B can be calculated according to the following equations:

A = \hat{\sigma}_y / \hat{\sigma}_x    (12)

B = \hat{\mu}_y - A \hat{\mu}_x    (13)

where the estimated means are:

\hat{\mu}_x = M_{x1} + b_{xu1} (\hat{\mu}_u - M_{u1})    (14)

\hat{\mu}_y = M_{y2} + b_{yu2} (\hat{\mu}_u - M_{u2})    (15)

and the estimated standard deviations are:

\hat{\sigma}_x = \sqrt{ s^2_{x1} + b^2_{xu1} (\hat{\sigma}^2_u - s^2_{u1}) }    (16)

\hat{\sigma}_y = \sqrt{ s^2_{y2} + b^2_{yu2} (\hat{\sigma}^2_u - s^2_{u2}) }    (17)

where b_{xu1} and b_{yu2} are the regression coefficients of X on U and of Y on U for the respective subsamples (Angoff, 1971, pp. 564-577).

Equipercentile Equating

Equipercentile equating is based on a definition of equivalent scores (Flanagan, 1951; Lord, 1950) that considers two scores equivalent if their corresponding percentile ranks are equal. As such, equipercentile equating makes no assumptions about the score distributions for the two forms. The procedure first requires the computation of percentile ranks for the raw score distributions. These percentile ranks are plotted against the raw scores. Smoothing of the line may be done by hand, although computer programs that simulate the smoothing process have been developed (Lindsay and Prichard, 1971). Scores from the smoothed curves are plotted against one another and this new curve is then smoothed. Score conversions are taken from this final smoothed curve.

Equating Error

Three error measures are considered when assessing the degree of equivalence as determined by an equating method. The first two, the conditional root-mean-square (CRMS) and the average absolute difference (AAD), are used to judge the equivalence between scores within the sample(s) used. The third, the standard error of equating (s_{y*}), is used to assess the degree of sampling error in the estimates across repeated replications.

Conditional Root Mean Square

The CRMS is a measure of the fit between an observed score on X and the equivalent score on X derived from a knowledge of a score on Y. The formula used to calculate the CRMS is:

CRMS = \sqrt{ \sum (X - Y^*)^2 / (n - 1) }    (18)

where X is the score on Form X, Y^* is the score derived from Form Y, and n is the number of subjects writing both Form X and Form Y (Bianchini & Loret, 1974, p. 158). This error is an estimate of "the degree to which a score read from the equating table would differ from the score a pupil would have earned had he been given the equated test" (Jaeger, 1973, p. 7). Holmes (1981) judged discrepancies of 12 points or more (M=100, S.D.=15) as an indication of nonequivalency (p. 137). In a practical sense, the magnitude of the error as it pertains to test interpretation can be considered an indicator of equivalency (Holmes, p. 136). As well, the CRMS can serve as an indicator of the suitability of the method of equating. The magnitude of the conditional root mean square can be compared to another similar error estimate in an attempt to assess the equivalency of the forms and the interpretability of the individual scores. The standard error of measurement (SEM) "can be viewed as the standard deviation of the discrepancies between a typical examinee's true score and the observed scores over an infinite number of repeated testings" (Crocker & Algina, 1986, p. 150).

The Average Absolute Difference

The AAD is defined as:

AAD = \sum |X - Y^*| / n    (19)

where X is the score on Form X, Y^* is the score derived from Form Y, and n is the number of pairs of scores. Like the CRMS, the AAD provides an indication of the degree to which individuals achieve the same score on the two forms to be equated.
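For two forms taken by the same examinees, an unsmoothed equipercentile conversion and the two sample-based error indices above can be computed as in the sketch below. This is schematic only (no smoothing, invented function names) and is not the smoothed procedure described above or used in the thesis.

```python
import numpy as np

def equipercentile_table(x_scores, y_scores, max_score):
    """Map each Form Y raw score to the Form X score with the same percentile rank."""
    x = np.sort(np.asarray(x_scores, dtype=float))
    y = np.asarray(y_scores, dtype=float)
    table = {}
    for score in range(max_score + 1):
        pr = (np.sum(y < score) + 0.5 * np.sum(y == score)) / len(y)  # percentile rank on Y
        table[score] = np.quantile(x, min(max(pr, 0.0), 1.0))          # same-rank score on X
    return table

def crms_and_aad(x_scores, y_equivalents):
    """Equations (18) and (19) for paired scores from the same examinees."""
    d = np.asarray(x_scores, dtype=float) - np.asarray(y_equivalents, dtype=float)
    crms = np.sqrt(np.sum(d ** 2) / (len(d) - 1))
    aad = np.mean(np.abs(d))
    return crms, aad
```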
Standard Error of Equating

Unlike the CRMS and the AAD, which are indicators of the similarity of an individual's scores on the two forms, the standard error of equating, s_{y*}, is a measure of error in the equating procedure itself. The standard error of equating "reflects the degree to which the equating results would vary if the same equating procedure and method were applied to different representative samples of pupils" (Jaeger, 1973, p. 7). Unlike the CRMS and the AAD, which are defined as above regardless of the equating design used, the formula for s_{y*} depends upon the design used and the equating method employed.

Linear Equating

Design I. For Design I, the Equivalent Groups Design, the error is estimated by the equation:

s_{y*} = \sqrt{ 2 s^2_y (z^2_x + 2) / n }    (20)

where s_{y*} is the standard deviation of the error estimate, s^2_y is the variance of the scores on Form Y, z_x is a standard score on Form X, and n is the total number of examinees in both groups (Angoff, 1971, p. 570).3

3 Braun and Holland (1982, pp. 32-36) discuss the more general case of unequal group size and nonnormal score distributions.

Design II. For Design II, the Test-Retest Design, the appropriate formula is:

s_{y*} = \sqrt{ s^2_y (1 - r_{xy}) [ z^2_x (1 + r_{xy}) + 2 ] / n }    (21)

where s^2_y is the variance of the Form Y scores, r_{xy} is the correlation between the Form X and Form Y scores, z_x is the standard score for each Form X score, and n is the number of examinees (Angoff, 1971, p. 575).

Design III. For Design III, the Anchor Test Design, the appropriate formula is:

s_{y*} = \sqrt{ 2 s^2_y (1 - r^2) [ z^2_x (1 + r^2) + 2 ] / n }    (22)

where s^2_y is the estimated variance of the Form Y scores, r is the estimated correlation between the estimated Form X and estimated Form Y scores, z_x is the standard score for each Form X score, and n is the total number of examinees (Angoff, 1971, p. 577).

Equipercentile Equating

Lord (1982) demonstrated that the magnitude of the standard error of linear equating was one half that of the equipercentile error in the middle of the score range and comparatively smaller at the extremes (p. 173). Even though Lord derives standard errors of equipercentile equating for Design I and Design II, he concludes that linear equating is clearly the method of choice (p. 174). Given Lord's strong preference, and for reasons that will be reported in Chapter 5, only the equipercentile standard error of equating for Design I is discussed here.

Design I. For Design I, the Equivalent Groups Design, the error is estimated by the equation:

s_{y*} = \sqrt{ s^2_y p q (n_x^{-1} + n_y^{-1}) / \phi^2 }    (23)

where s^2_y is the estimated variance of the Form Y scores, p is the proportion of scores below a score x_n, q is 1 - p, \phi is the standard normal density at the unit normal value below which p of the cases fall, n_x is the number of examinees writing Form X, and n_y is the number of examinees writing Form Y (Petersen, Kolen & Hoover, 1989, p. 251).

Design II. For Design II, the Test-Retest Design, the appropriate formula and related discussion are found in Lord (1982).

Design III. For Design III, the Anchor Test Design, the appropriate formula and related discussion are found in Jarjoura and Kolen (1985).
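Equation (21), the Design II case relevant to this study, can be evaluated at any Form X raw score. The sketch below is illustrative only; names are invented and the normal-theory formula is taken as given.

```python
import numpy as np

def se_equating_design2(x_scores, y_scores, score_points):
    """Standard error of linear equating, Design II (equation 21), at chosen raw scores."""
    x = np.asarray(x_scores, dtype=float)
    y = np.asarray(y_scores, dtype=float)
    n = len(x)
    r_xy = np.corrcoef(x, y)[0, 1]
    s2_y = np.var(y, ddof=1)
    z = (np.asarray(score_points, dtype=float) - x.mean()) / x.std(ddof=1)
    return np.sqrt(s2_y * (1 - r_xy) * (z ** 2 * (1 + r_xy) + 2) / n)
```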
Use of Standard Error of Equating

The standard error of equating (s_{y*}) was used to judge the equivalency of the two forms in the Anchor Test Study; Linn (1975) reported that errors of less than one score point were found in that study (p. 207). Dorans and Lawrence (1990) used this error to place a heuristic confidence band around the equating function (p. 247). They suggested that the null hypothesis for equating be that the equating function equals the identity function (p. 253). They then suggested placing bands of plus or minus two standard errors of equating (\pm 2 s_{y*}) around the equating function to define the region in which the identity function must fall if the tests are to be judged equivalent at the .05 level of significance. To present this information, Dorans and Lawrence suggested using a graph with the ratio of the difference between predicted score and observed score, divided by the standard error of equating, plotted on the ordinate and the observed score plotted on the abscissa. Forms were judged equivalent if the plotted function did not exceed bands of \pm 2 units.

While attractive due to its ease of interpretation, the graph of this ratio versus raw score has two flaws. First, the standard error of equating is not constant throughout the range of scores, but this is not apparent in such a graph since the standard error of equating is not plotted. Second, the plot of the equating function is distorted, giving the appearance of being least equivalent near the mean of the scores. Both these shortcomings are corrected by instead using bands formed by \pm 1.96 s_{y*} placed around the line y = 0. The line y = 0 is equivalent to the difference between the equating function and the identity function when the forms are perfectly equivalent. Forms are judged as equivalent if the plot of the difference between the equating function and the identity function falls within the standard error bands. This plot accurately displays both the nature and magnitude of the equating function and the standard error of equating.

Magnitude of the Standard Error of Equating

Sample size, equating design, and equating method all affect the magnitude of the standard error of equating. Within a given equating method, sample size and design interact to determine the magnitude of equating error. With Design I, large sample sizes may be needed to reduce the standard error of equating. For example, Dorans and Lawrence (1990) used two random groups of 48,639 and 43,845 subjects in an equating study in which the statistical equivalence of nearly identical test editions of the Scholastic Aptitude Test (SAT) was investigated. Angoff (1971) found that for a test-retest reliability of .80, ten times the sample size is needed to reduce the standard error of equating for Design I to the standard error of equating for Design II (p. 575). With Design III, the anchor test scores are used to reduce error caused by group differences, whether the differences result from random or non-random group assignment. Continuing his earlier example, Angoff states that when the correlation between the anchor test and the forms to be equated equals .87, the equating error is one-fourth as large at the mean as it would be using Design I. The sample size using Design I would need to be four times that of the Design III sample if the same magnitude of error is desired. Given a set sample size and a set equating design, the method of equating also influences equating error. As described earlier, the error of equipercentile equating is generally larger than that of linear equating throughout most of the score range.
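The corrected graphical check described above reduces to comparing the equating function's deviation from the identity line against \pm 1.96 s_{y*} at each score point. A minimal sketch, building on the earlier illustrative functions (design2_linear and se_equating_design2, both assumptions of this sketch rather than code from the thesis):

```python
import numpy as np

def equivalence_band_check(score_points, A, B, se_at_points):
    """Deviation of the linear equating function from the identity line, with 1.96*SE bands.

    Returns, for each score point, the deviation (A*x + B) - x and whether it stays inside
    the +/- 1.96 standard-error band at that point."""
    x = np.asarray(score_points, dtype=float)
    deviation = (A * x + B) - x
    inside = np.abs(deviation) <= 1.96 * np.asarray(se_at_points, dtype=float)
    return deviation, inside

# Example wiring (hypothetical values): A, B from design2_linear; se from se_equating_design2.
```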
Also, since students should be given the opportunity to learn the material to be tested, the testing should take place as near to the end of the school year as possible. However, teachers, wary of the pending examinations, expressed concern about the time that could be spared for a research project. It was therefore decided, given the limited time for testing and in light of cost considerations, that Design II, the Test-Retest Design (see Table 2), would be employed to collect the data. Further, these restrictions were the reasons that two rather than three test forms were constructed at one time for a single subject, Biology 12.

Selection of Equating Model

Linear equating is appropriate if the X and Y distributions differ only in their means and standard deviations (Braun & Holland, 1982, p. 17). As it is totally analytical and free from smoothing errors, it is the preferred method if the differences between the shapes of the score distributions can be regarded as trivial. Dorans and Lawrence (1990) support the choice of the linear method, arguing that because the null hypothesis for parallel forms is a linear identity function, it does not make sense to use equipercentile equating if forms are expected to be nearly parallel (p. 253). If a linear equating function cannot be assumed, then equipercentile equating can be used. However, the lack of assumptions about the score distributions and the consequent wider applicability have a price. The error of equipercentile equating is larger than that of linear equating throughout the score range (Lord, 1982, pp. 173-174). This makes it less powerful in detecting nonequivalency between forms (Dorans & Lawrence, 1990, p. 253). Ceiling and floor effects can cause a linear equating to produce equated scores outside of the test score range, in which case equipercentile equating can be used; this occurred in the Anchor Test Study (Bianchini & Loret, 1974).

Skaggs and Lissitz (1986) suggest that no single method of equating is consistently superior to the others. If tests are reliable, nearly equal in difficulty, and samples are nearly equal in ability, then most linear, equipercentile, and IRT models will produce satisfactory results (p. 523). For small sample sizes (n<1000), it is unclear which of these methods yields the best results. Earlier, Kolen and Whitney (1982), using the Test of General Educational Development in an equating study with a sample size of 200, found that linear and Rasch (a one-parameter logistic item response model) equating methods produced acceptable results for horizontal equating4 of achievement tests, while equipercentile and three-parameter logistic (item response model) equating methods produced unacceptable results.

4 Equating tests of similar difficulty is referred to as horizontal equating.

The examination forms for this study were simultaneously assembled according to the specifications of a common specification table. The intent of the new Ministry procedure is to construct equivalent forms; there should be no difference in difficulty. Equally reliable test forms would also be expected since most examination pairs are already equally reliable. A test-retest design was used; there should be no difference in the ability of the examinees from one testing day to the next. Under these conditions, linear equating (Petersen, Cook, & Stocking, 1983) using the linear method for equally reliable tests (equations 3-6) appeared to be the most appropriate method to follow.
However, given the uncertainty in the literature about which method is superior, it was also decided that both Angoff's linear procedure for Design II in which the data are pooled across order of administration, and equipercentile equating using pooled data would be performed. If the linear equating using pooled data proves feasable, only this method will be retained. CHAPTER 3 EXAMINATION CONSTRUCTION As described in Chapter 1, beginning with the 1989/90 school year the three forms of each course examination are to be constructed at the same time in an attempt to produce a greater degree of equivalency among the three forms. The degree to which this desired outcome is met was tested in this study. Due to restrictions of time for testing, two rather than three test forms were constructed at one time for a single subject, Biology 12. The construction of these two forms is described in the present chapter. Content, Materials, and Assembly Team Choice of Subject The choice of Biology 12 was based on three factors. First, there was a large enrollment in this course. The large enrollment means a large potential sample size representing a wide cross section of Grade 12 students. Second, there was a sufficiently large number of multiple choice items available that would be suitable for use with the 1990 Table of Specifications adopted for this study. Finally, an intact experienced team of school teachers was available to assemble the Biology 12 forms from the pool of available items. Number of Forms Two forms, each containing 52 items, were produced. Producing three forms, as is the intention of the full Ministry examination program, would have required a larger sample of both items and people as well as increased testing time, all factors which could not be accommodated in the present study. The forms were administered to students near the end of the school year to better approximate the actual time of writing. Teachers, while willing to cooperate, were willing to administer only two forms. However, if the two forms for this study, which were constructed following procedures set forth by the Ministry of Education, were found to be equivalent, then it would be reasonable to expect the process would be capable of yielding three equivalent forms. Contrarily, if the two forms were found not to be equivalent, then it would be impossible that three forms would be. Type of Item In 1987, 1988, and 1989, the provincial Biology 12 examination has consisted of approximately 50 multiple choice and 5 supply items. These items formed the pool of field tested items that were used for construction of the two forms administered in the present study. Since copies of previous examinations may be obtained from the Ministry, it is likely that each student would be familiar with at least some of the items in this pool, particularly since many teachers often use these items as review material. This creates a potential problem. An item that is recognized by an examinee because of previous experience with that item would be more likely to be answered correctly than if the item were new to the examinee. If two examination forms are created to be of equal difficulty using previous item difficulty statistics, yet one form contains more items that are recognized by the examinees sitting the two forms, this form will appear easier due to the recognized items. 
Since the items to be used were to be selected from a pool of items available to the examinees, the greater number of multiple choice items than supply items provided a larger pool from which to select the items for the two forms to be constructed. This greater number of multiple choice items means that the likelihood of students recognizing a multiple choice item would be lower in comparison to the likelihood of recognizing a supply item. More importantly, the frequency of item recognition over the entire examinee population should be randomly equal on the two forms. With the much smaller sample of supply items, recognition of items is more likely to occur and an unequal number of items recognized on each form is more likely to result. Thus, given the limited time for testing and the greater number of potential multiple choice items, the decision was taken to include only multiple choice items on the two forms. Content Tested The Biology 12 curriculum consists of a core area and six options. The provincial examination for Biology 12 reflects this structure; students must write a set of core items and then select two of the six options. Assuming for the sake of argument that each option is chosen by an equal number of examinees, then each option is attempted by approximately 1/3 of the total examinees. If it were essential to obtain item statistics on the optional items equivalent in precision to the statistics for items included in the core area, a sample of examinees three times as large as that for the core items would be needed. Consequently, given the restrictions mentioned earlier, the sample of items for the tests used in this study was restricted to the core content only. Item Bank The item bank available consisted of 288 multiple choice items in the core topics of 'cells' and 'humans' that were used on the 1987, 1988, and 1989 Biology 12 examinations. However, only the 184 items from the 1987 and 1988 examinations had any available item statistics, and only the percentage of students who chose each option was available. Statistics for the 1989 items were not available at the time of examination assembly. Examination Assembly Team The team of five teachers responsible for the construction of the two forms of the Biology 12 examination used in this study were members of a team currently involved in the Ministry's item banking project for Biology 12. The five teachers each had received training and gained experience in item writing, item classification, and mini-test construction. The five teachers had established a stable working relationship through working on the item banking project for over a year. The; team chairperson was a teacher who, for several years prior to the item banking project, had served on the Ministry's Biology 12 examination production team. As such, they were comparable in knowledge and experience to the Ministry of Education teams that construct the Biology 12 examinations. Due to illness, one member of the team later dropped out of the project. Table of Specifications The specifications table used, presented in Table 3, was adapted from the 1990 Table of Specifications for Biology 12. 
Table 3
1990 Biology 12 Table of Specifications (core topics by cognitive level: K*, U/A, HMP)

Methods & principles (I Experimental Design, II Homeostasis): 5 items, drawn from the topics below
Cells (III Cell compounds, IV Ultrastructure, V Ultraprocesses, VI Photosynthesis): subtotal 19 items
Humans (VII Cells, Tissues; VIII Digestive; IX Circulatory; X Nervous; XI Excretion & Respiration; XII Endocrine): subtotal 33 items
TOTAL: 52 items (K 15, U/A 29, HMP 8)

*K = Knowledge; U/A = Understanding/Application; HMP = Higher Mental Process

The specifications table includes the core topics of 'Cells' and 'Humans' and reflects the restriction to multiple choice items. It should be noted that each item was classified on a two dimensional grid of topic and cognitive level. The British Columbia Ministry of Education uses Bloom's taxonomy (Bloom, 1956) collapsed to three levels: knowledge (K), understanding/application (U/A), and higher mental processes (HMP). This table contains the information given to the examination assembly team to form a basis for their selection of items. The numbers in the table are the intended numbers of items by topic and cognitive level. A total of five items from the 'Cells' and 'Humans' topics are also included by the Ministry in the category 'Methods and Principles'. Since these five items are already included within the 'Cells' or 'Humans' topics, they are not counted again in the item subtotals or the item total.

Examination Construction Process

The Ministry document titled 'FLOWCHART and TIMELINES for the Production of the 1991 Examinations DRAFT' (see Appendix A) describes the entire new procedure to be followed to construct the forms of the examination. Only the stages that are directly related to this study are discussed below. A package consisting of instructions, the examination specifications table, items, and item option p-values (when available) was sent to all team members. The materials, with the exception of copies of the items, are included in Appendix B.

Examination Construction

Prior to meeting on the examination assembly day, the team leader assigned a topic area and corresponding subtopics (see Table 3) to each member of the team. Each team member was asked to confirm, through personal professional judgement, the curricular validity of the items referenced on the basis of previous years' classification for the subtopics for which they were responsible. Their judgements were then discussed at the beginning of the meeting to achieve consensus on curricular validity. Only items which the team unanimously judged to measure the subtopic were retained.

Allocation of Items to Forms

The published table of specifications (Table 3) gives subtotals for only the very broad topics of 'Cells' and 'Humans' by cognitive level. Following identification of suitable items, the assembly team determined the number of items to be included for each subtopic by cognitive level. The results of this assignment are shown in Table 4. To actually assemble the test forms, the team divided into two pairs, each pair responsible for one of the topic areas. Working within each subtopic by cognitive level sub-cell, each pair first categorized each retained item according to difficulty level. The proportion of examinees answering each item correctly (item p values) was available for the 1987 and 1988 items. The difficulty levels for the 1989 items were established subjectively by each pair of team members.
Table 4
Form A and Form B Table of Specifications

                              Cognitive level (Form A, Form B)a
Topic/subtopic                 K         U/A        HMP       Subtotal
Methods & principles
  I   Experimental Design                                     5 items
  II  Homeostasis
Cells
  III Cell compounds          1,1        2,3        1,1
  IV  Ultrastructure          1,1        3,3        0,0
  V   Ultraprocesses          3,3        3,2        2,2
  VI  Photosynthesis          0,0        3,3        0,0
  Subtotal                    5,5       11,11       3,3       19,19
Humans
  VII  Cells, Tissues         1,1        0,0        0,0
  VIII Digestive              3,3        4,4        0,0
  IX   Circulatory            2,2        4,4        1,1
  X    Nervous                3,3        4,4        0,1
  XI   Excrete & Respire      2,2        4,4        1,0
  XII  Endocrine              1,1        3,3        0,0
  Subtotal                   12,12      19,19       2,2       33,33
TOTAL                        17,17      30,30       5,5       52,52

a Number of items on Form A, number of items on Form B.

Once this had been done, items within the same subcell were paired according to similar difficulty. One item of each pair was then randomly assigned to each form. Three different situations occurred during the matching of items. First, both items in a pair had p value information; there were 24 such pairs. Second, an item with a known p value was matched with an item whose difficulty was judged subjectively by the test assembly team; there were 17 such pairs, with 8 items with known p values placed on Form A and 9 on Form B. The remaining 11 pairs were formed with no prior p value information about either item. In four instances, a matched pair of items within a subtopic area could not be formed. In these cases, items from different subtopics within the same topic area, at the same cognitive level, and with similar difficulty levels were combined for assignment purposes. The final number of items selected for each cell is reported in Table 4. As shown in Table 4, the number of items classified as either K or U/A is greater than the intended number shown in Table 3. This difference is due to the lack of availability of matched item pairs within some subtopics at the HMP cognitive level. Items were placed on each form according to the order given in the table of specifications (see Table 4). Draft copies of each form were prepared. These draft forms were checked by the team for duplication of items and for the possibility that information in one item could be used to correctly answer another item.

Assessment of Comparability

The composition of the two forms is shown in Table 4, and a copy of each examination form is provided in Appendix C.5 The item subtotals for both the topic by cognitive level and subtopic by cognitive level classifications are identical on both forms. The only differences between the structures of the two test forms are in the 'Cells by U/A' cell ('cell compounds' and 'ultraprocesses' subcells) and the 'Humans by HMP' cell ('nervous' and 'excrete and respire' subcells). These differences were not expected to create any differences in examinee performance between the two forms. The mean p value for the items of known p value was 0.64 for each form.

Production of Examination Booklets

The final forms of the examination were printed by a professional printer using camera ready copy provided by the researcher. Prior to printing, the camera ready copy was reviewed by a Biology 12 teacher who was not on the examination assembly team. This was done to provide an independent check of the technical aspects of the forms: the clarity and accuracy of the diagrams, the correctness of spelling, spacing, and item formatting.

5 Form A and Form B were initially referred to as Form I and Form II respectively. The designations Form I and Form II are used consistently throughout the appendices.
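The pairing-and-random-assignment step described above is easy to express in code. The sketch below is only an illustration of the idea (invented data structures, not the team's actual procedure, which also relied on subjective difficulty judgements where p values were missing): within one table-of-specifications cell, items are ordered by difficulty, adjacent items are paired, and a random draw sends one item of each pair to each form.

```python
import random

def assign_pairs_to_forms(cell_items, rng=random.Random(1991)):
    """cell_items: list of (item_id, p_value) for one subtopic-by-cognitive-level cell."""
    ranked = sorted(cell_items, key=lambda item: item[1])    # order by difficulty (p value)
    form_a, form_b = [], []
    for first, second in zip(ranked[0::2], ranked[1::2]):    # adjacent items form a pair
        pair = [first, second]
        rng.shuffle(pair)                                    # random assignment within the pair
        form_a.append(pair[0])
        form_b.append(pair[1])
    return form_a, form_b

# Hypothetical cell with four items of known difficulty:
a, b = assign_pairs_to_forms([("87-12", 0.71), ("88-03", 0.69), ("87-40", 0.55), ("88-22", 0.52)])
```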
This is followed by a description of the test administration and scoring procedures. The second section contains a description of the preliminary analysis of the data. The purposes of the preliminary analysis were to determine the characteristics of the sample and the characteristics of the two forms that were relevant to the choice of equating methods used to test the equivalence of the two forms. Sample Selection Sample Size The desired sample size was set at 500. For forms that should be very nearly parallel, one might want to detect a one mark difference between the means using an alpha level of .05. Using these values, and in light of the results of previous Biology 12 examinations (standard deviation 10 for approximately 50 items), the t-test between independent groups indicated a sample size of 400 would provide the statistical precision necessary to test the equivalence of the two forms using the test-retest design with independent groups (Angoff, 1971, pp. 58-59). School Selection Sample schools were selected in two stages. First, officials in four school districts in the Okanagan, Thompson, and North Thompson geographical areas in British Columbia were asked for permission to approach Biology 12 teachers in their districts to request that they and their students participate in the study (see Appendix D). These four districts were selected so that the combined Biology 12 course enrollment was greater than 700, thereby allowing for possible nonresponse at the district, school, and student levels. Initial verbal inquiries of teachers in these districts yielded a high expressed willingness to participate; therefore an initial target sample size of 700 was judged adequate. Permission was granted in three districts. Officials in the fourth district indicated that they did not wish to have students in their district approached. No reason for refusal was given nor was one requested. Attempts to replace this district with a district of similar population size in the same region were unsuccessful due to the lateness of the request. Consequently, the potential sample size at the end of stage 1 was approximately 600. In the second stage, a letter was sent to all Biology 12 teachers within participating districts. This letter contained a brief explanation of the process of the study and a request that their Biology 12 classes take part (see Appendix D). Of the 16 schools in which these teachers taught, teachers in 15 schools initially agreed. The single Biology 12 teacher in the remaining school explained that because of time considerations his classes would not be able to participate. All Biology 12 teachers in each of the remaining 15 sample schools agreed to administer the two forms of the examination to all students in their Biology 12 classes. Based on teacher enrollment figures the total number of potential subjects was 597. This figure of 597 is somewhat high as teachers who did not have exact figures readily available when contacted were asked to overestimate. As this estimate was used to determine the number of examination forms to be printed, overestimation was more desirable than underestimation. The request for permission letters along with administration instructions and the 'Request for Ethical Review' form were sent to The University of British Columbia Behavioral Sciences Screening Committee For Research and Other Studies Involving Human Subjects. A copy of the Certificate of Approval received is provided in Appendix D. 
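The sample-size reasoning outlined above can be made concrete with a short calculation. The sketch below (in Python) shows a conventional normal-approximation check of the number of examinees needed to detect a one-mark difference between two independent group means, together with the standard error of such a difference; the 80% power figure and the use of scipy are illustrative assumptions only, since the precision criterion actually applied in the study is not restated here.

```python
# Illustrative precision check for a two-group comparison of form means.
# Assumptions (not stated in the study): a normal approximation to the
# two-sample t test, a common SD of 10 marks, and a 1-mark difference.
from scipy.stats import norm

def n_per_group(delta=1.0, sd=10.0, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

def se_of_difference(n_per_grp, sd=10.0):
    """Standard error of the difference between two independent means."""
    return sd * (2 / n_per_grp) ** 0.5

if __name__ == "__main__":
    print(round(n_per_group()))        # n per group under the illustrative 80% power assumption
    print(se_of_difference(200))       # precision with about 200 examinees per group (N = 400)
```

With roughly 200 examinees per group, the standard error of a difference between two form means is about one mark, which is the order of precision referred to above; the power figure in the first function is an illustrative assumption rather than the criterion used in the study.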
Examination Administration
Students were tested by their teachers during the last four weeks of the school year. The test administration design and the estimated enrollment, as reported by the teachers for their respective schools, are shown in Table 5. As shown, Form A was scheduled first in 5 schools and Form B first in 10 schools. While there are advantages to interleaving the forms within classes, students within a given class were all given the same form at the same time. Further, all classes within a school were given the forms in the same order and on the same day. The concern that students would share information between test sittings was the reason for both practices. Teachers were asked to allow up to a maximum of 5 school days between administrations and not to administer both forms at one sitting.

Table 5
Test Administration Design and Estimated Enrollment

School     Order     Estimated enrollment
  1         AB              32
  2         BA              42
  3         BA              27
  4         BA              35
  5         BA              35
  6         BA              35
  7         BA              25
  8         AB              10
  9         BA              14
 10         AB              46
 11         BA              41
 12         BA               7
 13         BA              73
 14         AB             130
 15         AB              45

Total enrollment: AB 291, BA 306 (n = 597)

Beyond these instructions, teachers were allowed to schedule the examinations at their convenience. This was necessary given the time of year at which the testing was completed. During the last four weeks of school there is a general increase in the number of activities that impinge upon the time spent in the classroom. However, despite these activities, it was necessary to administer the tests toward the end of the school year because the content examined required the students to have completed the core sections of the Biology 12 curriculum. Teachers were asked not to inform the students following the first test administration that they (the students) would be tested with the second form. Further, they were asked not to discuss content relevant to the examinations during the interval between testings. The necessary number of test forms (see Appendix D) and instructions (see Appendix E) were sent to each cooperating school during the first week of May. Students were allowed 40 minutes to write each test form. Students recorded their responses on an NCS General Purpose Answer Sheet; the same sheet was used on both occasions, with Form A answers recorded on side 1 and Form B answers on side 2. Following both test administrations, teachers collected the answer sheets and returned them to the researcher for scoring and analysis.

Scoring and Data Preparation
The completed answer sheets were processed using the optical scanner maintained by the Educational Measurement Research Group (EMRG) at the University of British Columbia. The tests were scored and an item analysis was performed using LERTAP (Nelson, 1974). Scoring and statistical analyses were completed using various SPSS-X (SPSS Inc., 1988) computer programs. The analyses were performed on the Amdahl 470/V8 computer maintained by the University Computing Centre at the University of Alberta.

Preliminary Analyses
Response Rate
Altogether, answer sheets were received for 312 students, or 52.3% of the total teacher-estimated enrollment. After initial agreement to participate, teachers at two schools opted out of the study, reducing the potential sample size by 175 students; the reason given for opting out was lack of time. An additional 73 answer sheets from one school were lost in the mail. The mail loss, the two schools opting out, and the 312 completed forms account for 560 of the estimated 597 students. The difference of 37 (6.2%) is likely due to a number of causes.
Answer sheets should not have been returned for students who had written only one or neither of the two forms, and class enrollments may also have decreased slightly as students dropped Biology 12 near the end of the school year.

From the answer sheets received, those from one school with seven students were deleted due to failure to follow instructions: the students in this school were allowed to begin their second form once they had completed their first form, rather than waiting until the next class, the stipulated minimum amount of time between testings. Eight students who, due to absence, completed only one of the two forms were removed from the sample. Lastly, 11 students who failed to see Item #52 on Form A were removed; this problem is discussed in detail in the preliminary classical item analysis. Thus, the final number of usable Form A and Form B pairs was 286.

The loss of 248 students due to the schools opting out and the mail loss reduces the sample size considerably. The schools were urban schools that enrolled an estimated two to six classes of Biology 12. The 1990 Biology 12 examination means and standard deviations for the two schools that opted out revealed little difference between either of these schools and the means and standard deviations for their district. Based on these results there is no suggestion of bias due to the loss of these two schools.

Preliminary Classical Item Analysis
The student responses were analyzed using LERTAP (Nelson, 1974). Inspection of the percentage of students who responded to each item on each form revealed that the last item on Form A had a larger nonresponse than the preceding 51 items: twelve subjects (4.0%) of the sample did not respond to it. This nonresponse rate is substantially higher than the mean item nonresponse rate of 0.1% (SD = 0.2%) for the remaining items. The value of the point biserial, 0.00, for the nonresponse category indicates no correlation between the ability of the subjects and the lack of response to this item. Further, Item #52 appeared alone on the second-to-last page while the last page was blank. The placement of the item on the form, together with the item results, suggested that the 12 students likely did not see this item. However, one of the 12 students also omitted Item #51. Thus, in contrast to the other 11 students, it seemed plausible that this particular student omitted Item #52 for reasons other than poor placement of the item. Consequently, only 11 students were deleted from the sample.

Order of Administration
The order of administration was tested with a 2x2 (order of administration-by-test form) fixed effects ANOVA with repeated measures on the second factor. The results are summarized in Table 6. As revealed in Table 6, there is no significant order of administration effect (F = 1.33, p = .25), but there is a significant examination form effect (F = 25.18, p < .05).

Table 6
Summary ANOVA

Source                          df       MS         F        p
Between subjects
  Order                          1     209.51      1.33     .25
  Subjects(order)              284     156.99     12.61     .00
Within subjects
  Form                           1     313.56     25.18     .00
  Order x Form                   1      61.45      4.93     .03
  Form x Subjects(order)       284      12.45

There is also a significant examination form by administration order interaction (F = 4.93, p < .05).
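For readers who wish to reproduce this analysis, the sketch below shows one way to set up the 2x2 order-by-form ANOVA with repeated measures on form. The long data layout, the placeholder scores, and the use of the pingouin package are illustrative choices only; any routine for a mixed (one between-subjects, one within-subjects factor) ANOVA would serve equally well.

```python
# A sketch of the 2 x 2 (order-by-form) ANOVA with repeated measures on form.
# Scores below are placeholders, not data from the study.
import pandas as pd
import pingouin as pg

# One row per student per form: student id, administration order (AB or BA),
# form written (A or B), and the raw score on that form.
long = pd.DataFrame({
    "student": [1, 1, 2, 2, 3, 3, 4, 4],
    "order":   ["AB", "AB", "AB", "AB", "BA", "BA", "BA", "BA"],
    "form":    ["A", "B", "A", "B", "A", "B", "A", "B"],
    "score":   [29, 31, 33, 35, 27, 30, 28, 29],
})

# Mixed ANOVA: 'order' is the between-subjects factor, 'form' the repeated factor.
aov = pg.mixed_anova(data=long, dv="score", within="form",
                     subject="student", between="order")
print(aov)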
Of interest here was whether this interaction was attributable to differences within form between the two examination occasions. Examination of the cell (form by administration order) means and application of Scheffé's multiple comparison procedure revealed that these differences were not significant (p > .05). The significant interaction was instead attributable to differences between the means on the two forms within schools in which the two test forms were administered in the same order. In the sample of eight schools in which Form B was administered first, the Form B means exceeded the Form A means in all but one school (a small school, n = 9). In the three schools in which Form A was administered first, the Form B means were greater (see Table 7).

Table 7
Cell Means

Order                 AB                BA
n                     84               202
Form A         29.12 (56.00%)    28.51 (54.83%)
Form B         31.46 (60.50%)    29.42 (56.58%)

Deletion of the one anomalous school from the analysis revealed little change in the results. Consequently, given the lack of a known cause for this one school's results and the lack of significant change when this school was removed from the sample, the students in this school were retained.

CHAPTER 5
ANALYSES AND RESULTS

Chapter Five begins with the presentation of the psychometric characteristics of the two forms. The characteristics of the two forms are then used to justify the equating techniques used. The second portion of the chapter consists of a description of the analyses and a presentation of results in the sequence in which the analyses were conducted. The chapter concludes with an exploration of the source of the nonequivalence identified.

Pre-Equating Analyses
Statistical Properties of the Forms
Presented in Table 8 are the mean, standard deviation, range, internal consistency (Hoyt, 1941), standard error of measurement (SEM), skewness, and kurtosis of the distributions of scores on Form A and Form B. As shown, the mean scores of Form A, 28.69 (55.17%), and Form B, 30.02 (57.73%), are significantly different (t = 5.02, p < .05), while the standard deviations (t = 0.58, p = .56) and internal consistency estimates (t = 0.92, p = .36) are not (see Chapter 1, Hypotheses 1, 2, and 3). The values of skewness and kurtosis for Form A and Form B are each within one standard error of each other. The intercorrelation coefficient corrected for attenuation (.97) exceeds the value of .95 that Linn (1975) describes as a commonly accepted cutoff for equating studies (see Chapter 1, Hypothesis 4).

Table 8
Psychometric Properties of the Two Forms (n = 286)

                              Form A      Form B
Mean                           28.69       30.02
Standard deviation              9.46        8.97
Minimum score                   9.00       12.00
Maximum score                  51.00       51.00
Internal consistency a           .89         .87
SEM                             3.14        3.15
Skewness                        0.24        0.28
Kurtosis                        2.13        2.24
Correlation coefficient               .85
Disattenuated correlation             .97

a Hoyt (1941)

Equating
Linear Equating
Angoff's Design II
Since the internal consistency estimates of the two tests did not differ significantly, and the intercorrelation coefficient corrected for attenuation (.97) exceeded .95, Lord's (1950) linear equating method for equally reliable tests for Design II (Angoff, 1982; 1971) was used.
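A minimal sketch of the form of this linear conversion is given below. It assumes, consistent with the description above, that a Form A score is mapped to the Form B scale by matching standardized deviates computed from the pooled (across order of administration) means and standard deviations; the function and variable names are illustrative, not those of any program used in the study.

```python
# Linear conversion for equally reliable tests:
#   x_B* = M_B + (S_B / S_A) * (x_A - M_A)
# with moments pooled across the two order-of-administration groups.
import numpy as np

def pooled_moments(scores_order_ab, scores_order_ba):
    """Pool the scores from the AB-order and BA-order groups for one form."""
    pooled = np.concatenate([scores_order_ab, scores_order_ba])
    return pooled.mean(), pooled.std(ddof=0)

def linear_equate(x_a, mean_a, sd_a, mean_b, sd_b):
    """Derived Form B score for a raw Form A score."""
    return mean_b + (sd_b / sd_a) * (x_a - mean_a)
```

Because the conversion depends only on the two means and two standard deviations, identical weighted and pooled estimates necessarily yield identical equating lines, which is the point verified next.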
Although Angoff described a variant of Design II in which the data are pooled across order of administration (see Chapter 2), the Form A and Form B samples were not initially combined. Before equating, the weighted means and variances for each form to be used for the linear equating were compared with the pooled estimates that Angoff suggested could be used. As shown in Table 9, the weighted means and variances and the corresponding pooled means and variances for Form A and for Form B were found to be essentially identical (less than a 0.1% difference in the Form A and Form B variances). The equivalence of the weighted and pooled estimates means that the equating results based on the pooled variances would be identical to those obtained using the weighted variances. All equatings were therefore completed using the pooled data.

Table 9
Weighted and Pooled Means and Variances

Sample           n       Mean     Variance      SD
Form A
  AB             84      29.12      99.60      9.98
  BA            202      28.51      85.56      9.25
  Weighted      286      28.69      89.42      9.46
  Pooled        286      28.69      89.39      9.46
Form B
  AB             84      31.46      88.74      9.42
  BA            202      29.42      76.04      8.72
  Weighted      286      30.02      80.33      8.96
  Pooled        286      30.02      80.41      8.97

Linear Equating
The results of the linear equating using the pooled estimates are summarized in Table 10 and illustrated in Figure 1. As shown in Table 10, while the means of the derived scores on Form B and the obtained scores on Form B are equal, the CRMS is large, 4.87. The value for the AAD is also large, 3.74. Further, the corresponding range of the differences between B and B*, -19 to 13, is wide.

Table 10
Results of the Linear Equating

Variable       Mean      SD       Range
Score A        28.69     9.46     9 to 51
Score B        30.02     8.97    12 to 51
Score B*       30.01     8.97    12 to 51
B - B*          0.01     4.87   -19 to 13
AAD             3.74     3.11     0 to 19

Figure 1 is a graph of derived Form B scores versus Form A scores. The identity function, B = A, has been added to the graph to serve as a point of reference, as this is the line that would result if the two forms were perfectly equivalent.

Figure 1. Linear equating: derived Form B scores plotted against Form A scores, with the identity function B = A shown for reference (both axes 0 to 60).

Examination of Figure 1 reveals that the significant difference of -1.33 (2.56%) observed between the mean on Form A and the mean on Form B (Table 10) does not appear to be constant across all performance levels. That is, the equating line is not parallel to the linear identity function B = A. In particular, students at the lower performance levels (below a score of 26 (50.0%) on Form A) would have received derived scores 2 marks (3.8%) above those they would have received had there been perfect equivalence. For those in the middle range of performance the difference was about 1 mark, while for the most able examinees (above a score of 39 (75.0%) on Form A) the difference between the derived score and the score that would have been received under perfect equivalence was essentially zero. These linear equating results lend support to the conclusion of nonequivalency derived from the significant difference of the means, and add the suggestion that the nonequivalency arises because of the poor fit between the derived scores and the observed scores on Form B for the less able examinees. Collectively, these results confirm the finding that the two forms were not equivalent for the sample of students included in this study.
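The two agreement indices reported in Table 10 can be computed directly from the paired observed and derived scores. The sketch below assumes that the CRMS is the root mean square of the observed-minus-derived differences and that the AAD is the corresponding mean absolute difference, which is consistent with the values reported; the code is illustrative only.

```python
# Agreement indices between observed Form B scores and Form B scores derived
# from Form A through the equating table, computed over examinees.
import numpy as np

def crms(observed_b, derived_b):
    """Conditional root mean square of the observed-minus-derived differences."""
    d = np.asarray(observed_b) - np.asarray(derived_b)
    return np.sqrt(np.mean(d ** 2))

def aad(observed_b, derived_b):
    """Average absolute difference between observed and derived scores."""
    d = np.asarray(observed_b) - np.asarray(derived_b)
    return np.mean(np.abs(d))
```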
Equipercentile Equating
The steps followed to complete the equipercentile equating were as follows:
1. A table of relative cumulative frequencies was prepared separately for the distribution of scores on Form A and the distribution of scores on Form B.
2. The cumulative frequencies and the raw scores for each distribution were plotted on arithmetic graph paper, with raw scores placed on the horizontal axis. Hand-smoothed curves were drawn through the plots of the two distributions.
3. Score values from the smoothed plots were read and recorded for each percentile within each distribution.
4. The score values from step 3 for Form A were plotted against the score values for Form B and a smoothed curve was drawn.
5. A table of equivalent score values was prepared from this final curve. (Angoff, 1971, pp. 571-576)
Tables of cumulative frequencies and score values (steps 1, 3, and 5) are provided in Appendix F.

The equipercentile equating results are summarized in Table 11 and illustrated in Figure 2. The means for Form A and Form B and the derived scores on B, along with their corresponding standard deviations, are reported in Table 11 together with the values of the CRMS and AAD. The mean difference, B - B*, of 0.18 is likely due to rounding and smoothing error at the various stages of the equipercentile equating process. The CRMS is large, 4.86. The value for the AAD is also large, 3.73. Further, the corresponding range of the differences between B and B*, -20 to 13, is wide.

Table 11
Results of the Equipercentile Equating

Variable       Mean      SD       Range
Score A        28.69     9.46     9 to 51
Score B        30.02     8.97    12 to 51
Score B*       29.84     8.91    11 to 51
B - B*          0.18     4.86   -20 to 13
AAD             3.73     3.10     0 to 20

Examination of Figure 2 reveals that the significant difference of -1.33 (2.56%) observed between the mean on Form A and the mean on Form B (Table 11) does not appear to be constant across all performance levels. That is, the equating line is not parallel to the linear identity function B = A.

Figure 2. Equipercentile equating: the equating function and the identity function B = A plotted over the score range (both axes 0 to 60).

In particular, students at the lower performance levels (below a score of 22 (42.3%) on Form A) would have received derived scores 2 marks (3.8%) above those they would have received had there been perfect equivalence. For those in the middle range of performance the difference was about 1 mark, while for the most able examinees (above a score of 37 (71.2%) on Form A) the difference between the derived score and the score that would have been received under perfect equivalence was essentially zero. These equipercentile equating results lend support to the conclusion of nonequivalency derived from the significant difference of the means and add the suggestion that the nonequivalency arises because of the poor fit between the derived scores and the observed scores on Form B for the less able examinees.

Comparison of the Two Equatings
The results of the linear and equipercentile equatings were very similar. The conditional root mean square errors for the two equatings were 4.87 (9.37%) and 4.86 (9.35%) respectively, and the mean absolute differences between observed and derived Form B scores were 3.74 (7.19%) and 3.73 (7.17%) respectively. The two equating results also suggest that the two forms perform differently across ability levels: there is approximately a 2 mark difference at the lower level of examination performance, about a 1 mark difference for the middle range of scores, and essentially no difference at the upper level of achievement. Earlier in the chapter, given that the weighted and pooled means and variances agreed, the decision to pool was taken. Using the pooled data, the equipercentile equating results agreed with the results of the linear equating. As the equipercentile equating showed no advantage as judged by the CRMS and the AAD, all subsequent analyses were conducted using linear equating with the pooled sample of 286 students.
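As a complement to the hand-smoothing procedure described at the beginning of this section, the sketch below shows an equipercentile mapping computed by linear interpolation of percentile ranks. Because interpolation replaces hand smoothing, the resulting conversion would differ slightly from the one reported here; the underlying logic, matching scores with equal percentile ranks, is the same.

```python
# Equipercentile equating by interpolation of percentile ranks (an
# illustrative stand-in for the hand-smoothed procedure described above).
import numpy as np

def percentile_ranks(scores, max_score=52):
    """Percentile rank (midpoint convention) at each possible raw score."""
    scores = np.asarray(scores)
    points = np.arange(max_score + 1)
    below = np.array([(scores < x).mean() for x in points])
    at = np.array([(scores == x).mean() for x in points])
    return points, below + at / 2

def equipercentile(scores_a, scores_b, max_score=52):
    """Derived Form B score for each possible Form A raw score."""
    pts_a, pr_a = percentile_ranks(scores_a, max_score)
    pts_b, pr_b = percentile_ranks(scores_b, max_score)
    return pts_a, np.interp(pr_a, pr_b, pts_b)
```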
Differential Performance at Lower Achievement Levels
Both equatings revealed greater differences in performance on the two forms by lower achieving students. This is contrary to a suggestion by Petersen, Kolen, and Hoover (1989) that test forms may appear equivalent at chance levels of achievement (p. 243). The interpretability of scores, particularly the lower scores, may be questionable due to the phenomenon of guessing. If these lower scores are suspect, then the significant differences found may in fact be due to guessing. Various studies indicate, however, that there is no clear advantage to either applying a correction-for-chance formula or removing chance-level scores from the sample (Angoff, 1989; Albanese, 1988; Donlon, 1981; Tinkelman, 1971). In contrast, the performance of high scoring students was similar on both forms. This might be explained by the presence of a ceiling effect: if both forms were too easy, high performing students would be expected to achieve similar and near perfect scores. However, the observed score distributions do not show the strong negative skewness (see Table 8) that would be expected if a ceiling effect were present. Therefore this explanation was rejected.

Assessment of Equivalence of Forms
CRMS and AAD
The conditional root mean square, CRMS, was described in Chapter 2 as an indicator of the degree to which a score earned by a student would differ from a score derived for that student from an equating table. Similarly, the average absolute difference, AAD, provides an indication of the degree to which an individual would achieve the same score on the two forms to be equated. As reported in Table 8, the standard error of measurement (SEM) was 3.15 (6.1%), while the CRMS was 4.87 (9.37%) (Table 10). The SEM provides a logical lower bound for the CRMS: Form B cannot be more parallel to Form A than Form A is to itself, nor can Form A be more parallel to Form B than Form B is to itself. The CRMS was 1.55 times greater than the SEM. The AAD, like the CRMS, can serve as an indicator of the usefulness or interpretability of the parallel form scores. As reported in Table 10, the AAD was 3.74 (7.19%). In the ideal situation, the AAD would be zero. This value is judged too large to be an indication that Form A and Form B are parallel forms.

Standard Error of Equating
The standard error of equating (sD*) reflects the degree to which the equating results would vary if the equating were repeated with a different sample. The modified Dorans and Lawrence (1990) approach described in Chapter 2 allows a graphical interpretation of equivalence. As stated there, the confidence bands are established about the line that would be obtained given exact equivalency. As shown in Figure 3, the plot of the equating function cuts through the 95% confidence band limits about the identity function. As the difference between the equating function and the identity function exceeds the standard error of equating, the two forms are judged to be nonequivalent overall (see Chapter 1, Hypothesis 5).

Figure 3. Standard error of equating: total examination.

As suggested by Figures 1 and 2, and confirmed by Figure 3, while the forms are equivalent for the highest achievers, over most of the score range a difference that is too large to be attributed to equating error is found.
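The graphical judgement just described can be sketched as follows: the deviation of the equating function from the identity function is plotted with a band of plus or minus 1.96 standard errors of equating around zero, and nonequivalence is flagged wherever the deviation falls outside the band. In the sketch the standard errors are taken as given rather than computed, since their computation depends on the equating design; the plotting code is an illustrative sketch only.

```python
# Graphical equivalence check: deviation of the equating function from the
# identity function with a +/- 1.96 standard-error-of-equating band.
import numpy as np
import matplotlib.pyplot as plt

def equivalence_plot(score_a, derived_b, se_equating):
    """score_a, derived_b, se_equating: arrays over the raw-score range."""
    score_a = np.asarray(score_a)
    deviation = np.asarray(derived_b) - score_a      # equating function minus identity
    band = 1.96 * np.asarray(se_equating)
    plt.plot(score_a, deviation, label="equating function - identity")
    plt.plot(score_a, band, "k--", label="+/- 1.96 s_D*")
    plt.plot(score_a, -band, "k--")
    plt.axhline(0.0, color="grey")                   # exact equivalency
    plt.xlabel("Score A")
    plt.ylabel("Deviation (Form B units)")
    plt.legend()
    plt.show()
    return np.abs(deviation) > band                  # True where judged nonequivalent
```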
Nonequivalence
Source of Nonequivalence
As described in Chapter 3, prior p value information was available and used to form 24 pairs of items; one item from each pair was randomly assigned to each form. An additional 17 pairs of items, in which only one item had a prior p value, were created. Again, one item from each pair was assigned to each form, so that 8 items with prior p value information were placed on one form and 9 items on the other form. For the remaining 11 pairs of items, judgements were made about item p value similarity prior to the assignment of items to forms. These three different conditions of p value information within item pairs were the basis for the formation of three subtests.

Equating of Subtests
It was hypothesized that the lack of parallelism noted in the two forms was attributable to differences arising from an inability to match items on difficulty level when p values were not available. To test this hypothesis, the linear equating was replicated for each of the three subtests formed within each test. The equating results were examined using the modified Dorans and Lawrence (1990) graphical analysis. As the separate subtests comprise varying numbers of items, all values reported on the graphs are percentages in order to facilitate comparison among the three.

As previously described, the first pair of subtests each consisted of 24 items. Difficulty indices (p values) were known for all of these items prior to administration. The mean p value for the Form A subtest items was .64, while for the Form B subtest the mean p value was .65. Equivalent subtests were expected. As shown in Figure 4, the equating function lies between the standard error of equating bands, +/- 1.96 sD*: the two subtests comprised of pairs of items for which p values were known for both items are equivalent.

Figure 4. Standard error of equating: p values known for both items.

The second set of subtests consisted of 17 items each. Eight items with known p values were placed on Form A, while the remaining nine items with known p values were placed on Form B. The Form A subtest mean p value was .62 for the items with prior p values; the Form B subtest mean p value was .63 for the items with prior p values. As with the first pair of subtests, equivalent subtests were expected. As seen in Figure 5, the subtests in which the p value was known for only one item of each pair also proved to be equivalent.

Figure 5. Standard error of equating: p value known for one item of each pair only. (The figure plots the deviation of the equating function from exact equivalency against Score A, in percent, with the standard error bands shown.)

The third set of subtests consisted of 11 items each. None of these items had prior p values. The results for these subtests, formed from pairs of items for which difficulty was qualitatively judged, are markedly different (see Figure 6): these subtests are not equivalent. The source of the nonequivalence of the total forms therefore appears to be the pairs of items for which p value data were not available for either item.

Figure 6. Standard error of equating: p values unknown for both items. (Score A plotted in percent, 0 to 120.)

CHAPTER 6
CONCLUSIONS

This chapter contains four sections. First, a summary of the study, including its purpose, procedure, and findings, is given. Limitations of the study are then discussed. The limitations are followed by the conclusions of the study. The chapter ends with implications for practice and implications for research.
Summary Purpose and Problem Questions have been raised concerning the equivalency of the January, June, and August forms of the British Columbia provincial Grade 12 examinations for a given subject. The implementation of a new procedure for constructing these forms began in the 1990/91 school year. The change in procedure was expected to improve the ability of the examination construction team to produce equivalent forms. The purpose of this study was to duplicate this new procedure and assess the equivalency of the forms that result. Procedure An examination construction team, all of whom had previous experience with the British Columbia Ministry of Education's Student Assessment Branch, simultaneously constructed two forms of a Biology 12 examination from a common table of specifications using a pool of multiple choice items from previous examinations. Some of the items had accompanying p value information at the time of examination construction, while other items lacked this information. A sample of students was obtained in the Okanagan, Thompson, and North Thompson areas of British Columbia. Both forms were administered to each student, as required by the test equating design (Design II (Angoff, 1971)) chosen. The final usable data sample consisted of responses from 286 students, with Form A administered first to a subsample of 84 students and Form B administered first to the other 202 students. Analysis and Results The data were first analyzed using a classical item analysis followed by a 2x2 order-by-form fixed effects ANOVA with repeated measures on the second factor. Classical item analysis revealed all items on both forms performed satisfactorily, ruling out the alternate hypothesis of a flawed item(s) being the cause of the lack of equivalence found. The ANOVA revealed a significant difference in the means of the two forms. While no significant order effect was found, a significant order-by-form interaction was present. No other significant differences between the two forms were found. Linear and equipercentile equatings were carried out. Since the weighted and pooled estimates for the variances of the two forms were found to be equal, the pooled estimates were used for these analyses. Both equatings yielded essentially the same results with the same degree of agreement between observed scores and derived scores (B derived from A) in the sample as estimated by the CRMS and the AAD. Both revealed that students at high levels of achievement found the forms of equal difficulty while students at lower levels of achievement found one form to be easier than the other form. Since the linear and equipercentile method yielded similar results, and in light of the arbitrariness associated with equipercentile equating, all further analyses were conducted using the linear equating method and computation of standard error of equating. The forms were judged to be nonequivalent as the CRMS and AAD were large and a plot of the equating function deviated around the identity function fell outside the range of the standard error of equating estimated for the sample. The source of the nonequivalency was examined by separating each of the two forms into three subtests based on the items possessing or lacking p values at the time of test construction. The first subtest pair of forms consisted of 24 pairs of items with prior p value information available for both items in the pair. 
The second subtest pair of forms were created from an additional 17 pairs of items in which only one item of the pair had a prior p value. The third pair of subtest forms were created from the remaining 11 pairs of items for which qualitative judgements had been made about item difficulty prior to item assignment to form. The forms were judged to be equivalent or nonequivalent based on whether the plot of the equating function deviated around the identity function fell inside or outside the range of standard error of equating estimated for the sample. The first pair of subtests were judged to be equivalent, as were the second pair of subtests. Only the pair of subtests formed from items with no p value present for either item were judged to be nonequivalent. Limitations The generalizability of the study is limited by several factors. This study was restricted to multiple choice items in the core curriculum of one subject area. Factors not considered were supply items, optional areas within a subject area, and other subject areas. However, since the multiple choice items on the core areas are most probably the most reliable and stable items, if two equivalent forms of a test using only these items cannot be constructed, there is little hope that equivalency using the other forms of items or other topic areas can be achieved. Regarding the sample, although there was sufficient range in scores to examine the student performance at all plausible achievement levels (scores ranged from 17% to 98%) for the experimental examinations, the score distributions obtained cannot be assumed to represent provincial norms. Conclusions The central problem of this study was: will the new procedure for the construction of equivalent forms from a common table of specifications result in equivalent forms? The major conclusion is that the procedure in its present state cannot be relied on to produce equivalent forms. Subsequent investigation following the finding of nonequivalent forms suggests that the lack of equivalency results from the inability of an experienced examination assembly team to accurately match levels of difficulty for pairs of items which do not have prior item statistics accompanying them. Conversely, an experienced examination assembly team is able to produce equivalent forms if prior p value information is available. This study utilized an item pool in which the proportion of the items with prior p value information was greatly in excess of the 20% goal for the 1990 examinations (see Appendix A). The lack of equivalency would be expected to worsen when the proportion of items with prior p values is decreased. Equivalency of forms can be judged using a combination of a fit statistic, the conditional root mean square (CRMS), and a measure of sample error in the equating, the standard error of equating (sy*). Implications Implications for Practice The procedure for examination construction used in this study cannot be expected to produce equivalent forms within a given subject area. The use of a trained and experienced examination construction team will not guarantee equivalent forms, items with p value information must be used. The magnitude of the CRMS and AAD indicate that a student's score on one form cannot be taken as a reasonable estimate of the score that would have been obtained had a different form been written. This is of particular importance when interpreting the student scores obtained on the August supplemental examination. 
The score from the August supplemental examination is used in place of a January or June score that is either absent or unsatisfactory. The group writing the August supplemental cannot be assumed to be randomly equivalent to the larger population. The awarding of provincial scholarships depends in part on the scores achieved on these examinations. Scores from a student writing an examination in January, for example Biology 12, must be compared with scores from another student writing a different Biology 12 examination in June. The modified Dorans and Lawrence (1990) procedure could be used to judge January and June forms as equivalent or nonequivalent in the score range of interest. If a pair of forms is equivalent, no equated score need be used; the scores are interchangeable. If the January and June forms are judged nonequivalent in the score range of interest, then an equated score could be given to the January student. In either situation, the January and June examinees in a given subject area can be compared.

Implications for Future Research
The implication that classical item statistics could be used to produce parallel forms should be tested for multiple-mark (supply) questions. The possibility of creating statistically parallel optional sections within an examination form should also be explored. Other subject areas should be investigated. There should be an attempt to duplicate the results that suggested equivalent examination forms were produced when item pairs in which only one member of the pair possessed prior item statistics were used.

REFERENCES

Albanese, M. A. (1988). The projected impact of the correction for guessing on individual scores. Journal of Educational Measurement, 25, 149-157.

Anderson, J. O., Muir, W., Bateson, D. J., Blackmore, D., & Rogers, W. T. (1990). The impact of provincial examinations on education in British Columbia: General report. Victoria, BC: Ministry of Education.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 508-600). Washington, DC: American Council on Education.

Angoff, W. H. (1982). Summary and derivation of equating methods used at ETS. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 55-69). New York: Academic Press.

Angoff, W. H. (1989). Does guessing really help? Journal of Educational Measurement, 26, 323-336.

Bianchini, J. C., & Loret, P. G. (1974). Anchor Test Study. Final report. Project report and volumes 1 through 30. (ERIC Nos. ED 092 061 through ED 092 631)

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives. Handbook 1: The cognitive domain. New York: McKay.

Braun, H. I., & Holland, P. W. (1982). Observed-score equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9-49). New York: Academic Press.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 443-507). Washington, DC: American Council on Education.

Donlon, T. F. (1981). Uninterpretable scores: Their implications for testing practice. Journal of Educational Measurement, 18, 213-219.

Dorans, N. J., & Lawrence, I. M. (1990). Checking the equivalence of nearly identical test editions. Applied Measurement in Education, 3, 245-254.

Flanagan, J. C. (1951). Units, scores, and norms. In E. F.
Lindquist (Ed.), Educational measurement (pp. 695-763). Washington, DC: American Council on Education.

Glass, G. V., & Hopkins, K. D. (1984). Statistical methods in education and psychology. Englewood Cliffs, NJ: Prentice-Hall.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 147-200). New York: Macmillan.

Holmes, B. J. (1981). Individually administered intelligence tests: An application of anchor test norming and equating procedures in British Columbia. Unpublished doctoral dissertation, University of British Columbia, Vancouver.

Hoyt, C. J. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153-160.

Jaeger, R. M. (1973). The national test-equating study in reading (the Anchor Test Study). NCME Measurement in Education, 4 (Whole No. 4).

Jarjoura, D., & Kolen, M. J. (1985). Standard errors of equipercentile equating for the common item nonequivalent populations design. Journal of Educational Statistics, 10, 143-160.

Kolen, M. J., & Whitney, D. R. (1982). Comparison of four procedures for equating the tests of general educational development. Journal of Educational Measurement, 19, 279-294.

Lindsay, C. A., & Prichard, M. A. (1971). An analytical procedure for the equipercentile method of equating tests. Journal of Educational Measurement, 8, 203-207.

Linn, R. L. (1975). Anchor Test Study: The long and the short of it. Journal of Educational Measurement, 12, 201-204.

Lord, F. M. (1950). Notes on comparable scales for test scores (Research Bulletin). Princeton, NJ: Educational Testing Service.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Lord, F. M. (1982). The standard error of equipercentile equating. Journal of Educational Statistics, 7, 165-174.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Loret, P. G., Seder, A., Bianchini, J. E., & Vale, C. A. (1974). Anchor Test Study: Equivalence and norms tables for selected reading achievement tests (grades 4, 5, 6). Washington, DC: U.S. Government Printing Office.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 13-102). New York: Macmillan.

Millman, J., & Linof, J. (1964). The comparability of fifth grade norms of the California, Iowa, and Metropolitan achievement tests. Journal of Educational Measurement, 1, 135-137.

Ministry of Education. (1983). Ministry policy circular. Victoria, BC: Author.

Ministry of Education. (1985). Let's talk about schools: A report to the Minister of Education and the people of British Columbia. Victoria, BC: Author.

Ministry of Education. (1988). Biology 12 report to schools. Victoria, BC: Author.

Ministry of Education. (1989). Report to schools. Victoria, BC: Author.

Ministry of Education. (1990). Flowchart and timelines for the production of the 1991 examinations (draft). Victoria, BC: Author.

Nelson, L. R. (1974). Guide to LERTAP use and interpretation [Computer program manual]. Dunedin, New Zealand: University of Otago, Education Department.

Petersen, N. S., Cook, L. L., & Stocking, M. L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8, 137-156.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L.
Linn (Ed.), Educational measurement (3rd ed.) (pp. 221-262). New York: Macmillan.

Petersen, N. S., Marco, G. L., & Stewart, E. E. (1982). A test of the adequacy of linear score equating models. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 71-136). New York: Academic Press.

Rogers, W. T. (1990). Current educational climate in relation to testing. Alberta Journal of Educational Research, 36, 52-64.

Rogers, W. T., & Holmes, B. J. (1987). Individually administered intelligence test scores: Equivalent or comparable? Alberta Journal of Educational Research, 33, 2-20.

Skaggs, G., & Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56, 495-529.

SPSS Inc. (1988). SPSS-X user's guide (3rd ed.) [Computer program manual]. Chicago: Author.

Sullivan, B. M. (1988). A legacy for learners: The report of the Royal Commission on Education. Victoria, BC: Ministry of Education.

Tinkelman, S. N. (1971). Planning the objective test. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 46-80). Washington, DC: American Council on Education.

Wesman, A. G. (1958). Comparability vs. equivalence of test scores. Test Service Bulletin No. 53. New York: The Psychological Corporation.

Appendix A
Ministry Examination Construction Process

[This appendix reproduces the Ministry document 'FLOWCHART and TIMELINES for the Production of the 1991 Examinations (DRAFT)' (Ministry of Education, 1990). The flowchart lays out the 1990/91 production schedule: item development by the examination writing teams (staggered submissions on April 30 and July 15, 1990), word processing of the submitted items by the desktop publishing unit, pooling of items for the preparation team (with the stated ideal that 20% of the items on each examination be field-tested, item-banked items), selection of items to prepare three sets of examinations by the examination preparation team (August 1990), production of review-ready first drafts, external review (September 1990), processing of suggested changes, joint review by reviewers and preparers (October 1990), internal review by the Student Assessment Branch (November 1990), trial writing of the three sets of examinations, and delivery of camera-ready copy to the Queen's Printer for blue-line production. The scanned flowchart itself could not be reproduced legibly in this text version.]
Appendix B
Materials for Construction Team

BIOLOGY 12 - PROVINCIAL EXAMINATION - 1990 TABLE OF SPECIFICATIONS
[A reproduction of the Ministry's published 1990 Biology 12 table of specifications, showing, for each topic (Methods and Principles; Cells; Humans; and the optional topics Immunology, Skeletal System, Reproduction and Embryology, Genetic Disorders and Engineering, Cancer, and Sensory Receptors), the percentage of the examination allocated at each cognitive level (Knowledge, Understanding and Application, Higher Mental Processes), with 52% of the examination allocated to multiple-choice questions and 48% to written-response questions. A note to the table indicates that questions related to Experimental Design and Homeostasis are embedded within the Cells and Humans topics. The scanned table could not be reproduced legibly in this text version.]

TABLE 5A. MULTIPLE-CHOICE SUMMARY, JANUARY 1987 (N = 1,924)
[For each multiple-choice item the table reports the item number, its topic and cognitive level code, and the percentage of students choosing each alternative, with the correct response starred; the percentage choosing the correct response indicates the difficulty level of the item. Percentages are based on all students who responded to the item. Item 16 was deleted at the recommendation of the markers. The scanned table could not be reproduced legibly in this text version.]

TABLE 5B. MULTIPLE-CHOICE SUMMARY, JUNE 1987 (N = 7,288)
[Same layout as Table 5A, for the June 1987 administration. Item 18 was deleted at the recommendation of the markers.]
TABLE 5A. MULTIPLE-CHOICE SUMMARY, JANUARY 1988 (N = 2,139)
[Same layout as the 1987 summaries: item number, topic and cognitive level code, and the percentage of students choosing each alternative, with the correct response starred. The scanned table could not be reproduced legibly in this text version.]

TABLE 5B. MULTIPLE-CHOICE SUMMARY, JUNE 1988 (N = 7,992)
[Same layout as Table 5A, for the June 1988 administration.]

Appendix C
Examination Forms

FORM I

NAME:          NUMBER:          SCHOOL:

BIOLOGY 12

GENERAL INSTRUCTIONS
1. Write your name and school in the space provided above.
2. DO NOT WRITE YOUR NAME ON THE ANSWER SHEET.
3. Ensure that the Form Number (I or II) on this booklet and the answer sheet are the same.
4. Ensure that the I.D. Number (1 to 600) on the answer sheet is the same as on the exam booklet.
5. When writing the second form (I or II) ensure that the I.D.
Number i s the SAME I.D. as on the f i r s t form. (I or II) 6. When instructed to open this booklet, check the numbering of the pages to ensure that they are numbered i n sequence from page 1 to the l a s t page which i s i d e n t i f i e d by END OF EXAMINATION. 7. At the end of the examination, place your answer sheet inside the front cover of this bookelt and return the booklet and your answer sheet to the supervisor. Value: 52 marks (one mark per question) Time: 40 minutes INSTRUCTIONS: For each question, select the BEST answer and record your choice on the answer sheet provided. Using an HB pe n c i l , completely f i l l in the c i r c l e that has the le t t e r corresponding to your answer. Page - 1 Use the following diagram to answer question 1. I h r e e - l e l l e r cocons of cessenger RWA and tht « no acifis s p e c i f i e d by the codons . A A U A A C AAA ) ) Asparag.nr Lysine C A U C A C CAA C A C > > H.H<0-AC G A U GAC G A A GAG ) A i p i M . c acid G-uun'C acd U A U UAC UAA UAG ) ) (7eirrw\a:c']' ACU ACC AC A ACG i C C ' J ccc C C A CCG I P<0'-<nc CC'J GCC G C A G C O I Ai j r . .n ( UCJ uCC U C A f A j J AG A AGO } ) Arj*w\e C C J C C C CGA CGG 1-\ GGu GCC G G A CGG 1 uGu UGC U C A L- 'GG I (Teim*nj:o*J* AUU A-_C AUA } } Cuu CL.-C CUA CuG \ Cuu C-jr-G'JG 1 UCTJ u-JC U U A uuG > ) • I t inv . C<*3 Ol 11*4 IC«fr\jl •o« Ol Using the above table of codons, which amino acid sequence below i s formed from the DNA strand? A T A C G A C A A G C C A. Tyrosine-alanine-valine-arginine. B. Tyrosine-arginine-leucine-alanine . C. Methionine-alanine-leucine-alanine. D. Methionine-arginine-glutamic acid-cysteine. Use the following information to answer 2. glucose + fructose ^ sucrose + water The above reaction represents A. reduction. B. synthesis. C. hydrolysis. D. denaturation. Page - 2 Use the following diagram to answer question 3. 1 The box which indicates a single nucleotide i s A. 1. B. 2. C. 3. D. 4. Which of the following would describe the t e r t i a r y structure of a protein? A. A chain of amino acids in a linear sequence. B. A molecule whose specific shape i s s p i r a l . C. A molecule characterized by side chains of aminoacids. D. A molecule whose spe c i f i c shape i s determined by fold i n g back on i t s e l f . Use the following table to answer question 5. 1. CELL MEMBRANE 2. CELL WALL 3. CHLOROPLAST 4. NUCLEAR MEMBRANE 5. RIBOSOME 6. MITOCHONDRION Which of the above structures are found in eukaryotic c e l l s BUT NOT i n prokaryotic cel l s ? A. B. C. D. 1, 1, 2, 3, 3, 2, 4, 4, 5. 5. 6 . 6. Page - 3 90 The 5-carbon compound that reacts with carbon dioxide i n the Calvin cycle i s A. PGA (phosphoglyceric acid). B. RuBp (ribulose biphosphate). C. PGAIi (phosphoglyceraldehyde). D. NADP (nicotinamide adenine dinucleotide phosphate) . Use the following diagram to answer question 10. GRAPH x GRAPH Y product product Progress of Reaction Progress of React ion E a = energy o f ac t i v c l i E a i s greater i n graph X than in graph Y. The BEST explanation for this i s that the energy of a c t i v a t i o n i n Y i s A. raised by the addition of an enzyme. B. lowered by the addition of an enzyme. C. raised by the addition of more substrate. D. lowered by the addition of more substrate. Page - 5 91 Use the following graph to answer question 11. RATE OF REACTION vs. SUBSTRATE CONCENTRATION (AT OPTIMUM TEMPERATURE) SUBSTRATE CONCENTRATION -The above graph contains "data on the interaction of an enzyme at its optimum temperature with varying amounts of substrate. 11. 
Which statement BEST explains why the graph r i s e s at C? A. More enzyme was added. B. More substrate was added. C. The temperature was lowered. D. A competitive i n h i b i t o r was added. 12. The c e l l membrane surrounds some molecules i n solution and takes them into the c e l l . This i s an example of A. d i f f u s i o n . B. pinocytosis. C. phagocytosis. D. f a c i l i t a t e d transport. 13. During yeast c e l l fermentation, pyruvic acid i s converted to A. alcohol. B. glycogen. C. l a c t i c acid. D. active acetate. Page - 6 Use the following diagram to answer question -1'4 . X . Y It solution to salt and water If the experimental set up i l l u s t r a t e d above were allowed to s i t for 5 hours, what net movement of s a l t and water would occur? A. S a l t moves from Y to X; water moves from X to Y. B. S a l t moves from X to Y; water moves from Y to X. C. S a l t moves from Y to X; water moves from Y to X. D. Sal t moves from X to Y; water moves from X to Y. Use the following diagram to answer question 15. In aerobic respiration, most of the ATP is produced at A. 1. B. 2. C. 3. D. 4. Page - 7 16- The difference between active transport and f a c i l i t a t e d transport i s that ONLY A. active transport requires ATP. B. f a c i l i t a t e d transport requires ATP. C. active transport uses carrier molecules. D. f a c i l i t a t e d transport uses c a r r i e r molecules. 17. Which of the following occurs in BOTH c y c l i c and noncyclic photophosphorylization? A. O2 i s produced. B. ATP i s produced. C. C0-> i s broken down. D. H26 i s a hydrogen donor. Use the following diagram to answer question 18. CO, Visible licht Glucose 18. The name given to the pathway labelled "1" i s A. hydrolysis. B. Krebs cycle. C. Calvin cycle. D. photophosphorylation. 19. The production of glucose occurs during A. aerobic respiration. B. anaerobic respiration. C. the carbon dioxide-reducing reactions. D. the light-capturing reactions. 20. Which of the following i s c l a s s i f i e d as an organ? A. Bone. B. Skin. C. Blood. D. Muscle. Where i n the digestive tract does the chemical digestion o protein begin? A. B. Mouth. Stomach. C. Small intestine. D. Large intestine. Protein i s digested by A. li p a s e . B. tryps i n . C. amylase. D. secretin. Which enzyme catalyses the following reaction? fat droplets + water —-—^ glycerol + fatty acids A. Lipase. B. Pepsin. C. Maltase. D. Trypsin. An example of chemical digestion i s A. chewing. B. absorption. C. hydrolysis. D. p e r i s t a l s i s . Which of the following i s NOT a component of gastric juice A. Water. B. Pepsin. C. Amylase. D. Hydrochloric acid. Blockage of the b i l e duct would MOST LIKELY A. decrease b i l e production. B. a f f e c t the digestion of fat. C. rai s e the pH of the duodenum. D. decrease the quantity of feces. Page - 9 Use the following diagram to answer question 27. X 27. Identify the structure labelled X i n the above diagram. A. Cecum. B. Duodenum. C. Cardiac Sphincter. D. Pyloric sphincter. 28. Which ion is necessary for blood clotting? A. Iron.. B. Sodium. C. Calcium. D. Chloride. 29. Which of the following are absorbed by lacteals? A. Starches. B. Fatty acids. C. Amino Acids. D. Monosaccharides. Page - 10 MOST of the carbon dioxide transported to the lungs by the ci r c u l a t o r y system i s i n the form of If a person with blood group A receives type AB blood in a transfusion, which of the following occurs? A. Agglutination. B. Cl o t t i n g . C. Erythroblastosis. D. No reaction. 
Which blood vessels have the greatest surface area to volume ratio?
A. Veins.
B. Venules.
C. Arterioles.
D. Capillaries.

In a blood pressure reading of 120/80, 80 represents the pressure
A. in the capillaries.
B. in the veins.
C. when the ventricles have relaxed.
D. when the ventricles have contracted.

Use the following diagram to answer question 34.
[Diagram with points labelled X, Y, and Z.]
34. Which of the following is true of blood velocity?
A. Velocity is fastest at Z, slowest at X.
B. Velocity is fastest at Y, slowest at Z.
C. Velocity is fastest at X, slowest at Y.
D. Velocity is fastest at X, slowest at Z.

35. Which of the following is associated with increased heartbeat and breathing rate?
A. Aldosterone.
B. Adrenalin.
C. Acetylcholine.
D. Cholinesterase.

36. What type of neuron transmits an impulse from a receptor to the CNS (central nervous system)?
A. Motor.
B. Efferent.
C. Sensory.
D. Interneuron.

37. The correct pathway of an impulse passing through a reflex arc is
A. receptor, sensory neuron, motor neuron, effector.
B. effector, sensory neuron, motor neuron, receptor.
C. sensory neuron, motor neuron, effector, receptor.
D. motor neuron, receptor, sensory neuron, effector.

38. Which of the following describes an effect of alcohol on an individual?
A. Narcotic.
B. Stimulant.
C. Depressant.
D. Hallucinogen.

39. The control of heartbeat, breathing, and reflex activities lies with the
A. cerebrum.
B. thalamus.
C. cerebellum.
D. medulla oblongata.

40. The sodium/potassium pump in neurons is involved with
A. excretion of salts.
B. resting potential.
C. synaptic transmission.
D. threshold level.

Use the following diagram to answer question 41.
[Graph: action potential, from -50 mV through 0 mV to +40 mV, with stages labelled.]
41. During which stage in the above graph is energy used?
A. W
B. X
C. Y
D. Z

Use the following diagram to answer question 42.
[Diagram with a structure labelled 2.]
42. The structure labelled 2 in the above diagram is
A. an axon.
B. a dendrite.
C. a nucleus.
D. a node of Ranvier.

43. Receptors in the breathing centre of the medulla oblongata are stimulated by
A. low O2 levels.
B. high O2 levels.
C. low CO2 levels.
D. high CO2 levels.

44. Urea is produced in the
A. liver.
B. kidney.
C. pancreas.
D. large intestine.

45. During expiration (exhalation) the diaphragm
A. relaxes and the rib cage lifts.
B. relaxes and the rib cage drops.
C. contracts and the rib cage lifts.
D. contracts and the rib cage drops.

46. Which would be an indication of kidney malfunction?
A. Salts in the loop of Henle.
B. Glucose in the glomerulus.
C. Hemoglobin in the Bowman's capsule.
D. Uric acid in the collecting ducts.

In the kidney, nutrients and salt ions are selectively reabsorbed into the
A. renal artery.
B. collecting duct.
C. afferent arteriole.
D. peritubular capillaries.

Fluids appear in the Bowman's capsule as a result of
A. reabsorption.
B. blood pressure.
C. osmotic pressure.
D. tubular excretion.

If a person has had very little to drink on a hot day the pituitary gland would respond by secreting more
A. adrenalin.
B. thyroxin.
C. PTH (parathormone).
D. ADH (antidiuretic hormone).

A hormone is injected into muscle tissue. The muscle cells are observed to increase their rate of cellular respiration. This hormone was MOST LIKELY obtained from the
A. testes.
B. thyroid.
C. pancreas.
D. posterior pituitary.
Hormones travel to their target cells by way of
A. villi.
B. ducts.
C. lacteals.
D. capillaries.

Which of the following BEST displays a graph representation of a homeostatic mechanism?

FORM II

NAME:                NUMBER:                SCHOOL:

BIOLOGY 12

GENERAL INSTRUCTIONS
1. Write your name and school in the space provided above.
2. DO NOT WRITE YOUR NAME ON THE ANSWER SHEET.
3. Ensure that the Form Number (I or II) on this booklet and the answer sheet are the same.
4. Ensure that the I.D. Number (1 to 600) on the answer sheet is the same as on the exam booklet.
5. When writing the second form (I or II) ensure that the I.D. Number is the SAME I.D. as on the first form. (I or II)
6. When instructed to open this booklet, check the numbering of the pages to ensure that they are numbered in sequence from page 1 to the last page, which is identified by END OF EXAMINATION.
7. At the end of the examination, place your answer sheet inside the front cover of this booklet and return the booklet and your answer sheet to the supervisor.

Value: 52 marks (one mark per question)
Time: 40 minutes

INSTRUCTIONS: For each question, select the BEST answer and record your choice on the answer sheet provided. Using an HB pencil, completely fill in the circle that has the letter corresponding to your answer.

Use the following diagram to answer question 1.
Which is the correct relationship?
A. DNA at X produces mRNA (messenger RNA) which is used at W.
B. DNA at Y produces mRNA (messenger RNA) which is used at X.
C. DNA at W is used to make protein at X.
D. DNA at X is used to make protein at Z.

A carbon compound has 2 hydrogen atoms for every oxygen atom. The compound is a
A. fat.
B. protein.
C. nucleotide.
D. carbohydrate.

Use the following chart to answer question 3.
BASE COMPOSITION OF A PIG'S THYMUS CELLS
X: 27%   Guanine: 21%   Y: 21%   Z: 27%

The above chart shows an experimental analysis of the DNA bases contained in the cells of a pig's thymus. The correct identification of the bases X, Y and Z respectively is
A. adenine, cytosine, thymine.
B. thymine, adenine, cytosine.
C. cytosine, adenine, thymine.
D. cytosine, thymine, adenine.

The anticodon on tRNA is
A. identical to the codon on mRNA.
B. identical to the triplet code on rRNA.
C. complementary to the codon on mRNA.
D. complementary to the triplet code on DNA.

DNA molecules isolated from onion and human cells differ in their
A. type of sugars.
B. sequence of bases.
C. number of strands.
D. order of phosphates.

Use the following diagram to answer question 6.
Identify the structure labelled Z in the above diagram.
A. Lysosome.
B. Ribosomes.
C. Golgi apparatus.
D. Endoplasmic reticulum.

According to the fluid mosaic model of the cell membrane,
A. proteins are scattered throughout a double layer of phospholipid.
B. proteins are sandwiched between two layers of cellulose.
C. phospholipids are sandwiched between two layers of protein.
D. phospholipids are scattered throughout a double layer of cellulose.

Use the following chart to answer question 8.
[Diagram: compartment X (2% salt, 98% water) and compartment Y, separated by a semi-permeable membrane.]
8. In the above experimental situation, osmosis is the term used to describe the movement of
A. salt from X to Y.
B. salt from Y to X.
C. water from Y to X.
D. water from X to Y.

9.
Which cellular function would be interfered with if lysosomes were destroyed?
A. RNA synthesis.
B. Cell secretion.
C. Cell digestion.
D. Packaging of molecules.

Use the following information to answer question 10.
NAD → NADH2
SUBSTRATE A → SUBSTRATE B

10. What kind of chemical reaction does NAD → NADH2 represent?
A. Reduction.
B. Synthesis.
C. Oxidation.
D. Phosphorylation.

Use the following diagram to answer question 11.
[Diagram: pathway involving X, Y and Z, with enzyme 1 and enzyme 2; inhibition (negative feedback).]
11. With reference to the above diagram, when the amount of "Z" increases in a cell, enzyme "1" will convert
A. less X to Y.
B. less Y to X.
C. more X to Y.
D. more Y to X.

12. The cell process which uses energy to bring substances into the cell is
A. osmosis.
B. diffusion.
C. active transport.
D. facilitated transport.

13. With all other factors constant, which of the following will increase the quantity of the end product of an enzyme-catalysed reaction?
A. Increase the temperature to 100 °C.
B. Increase the amount of the substrate.
C. Maintain the pH.
D. Maintain the reaction at body temperature.

14. The Calvin cycle reactions in photosynthesis produce
A. ATP.
B. water.
C. oxygen.
D. carbohydrates.

Use the following graph to answer question 15.
[Graph: oxygen production during photosynthesis vs. temperature; amount of oxygen released plotted against temperature (5 to 25 °C).]
According to the above graph, which of the following conditions must be kept constant?
A. Temperature and relative humidity.
B. Water supply and oxygen concentration.
C. Light intensity and carbon dioxide concentration.
D. Oxygen and carbon dioxide concentration.

Substances which may contribute atoms to an enzyme-catalyzed reaction are called
A. inhibitors.
B. coenzymes.
C. apoenzymes.
D. heavy metals.

The Calvin cycle in photosynthesis uses energy from
A. NADPH2 produced only in the thylakoids.
B. NADPH2 and ATP produced in the thylakoids.
C. NADPH2 and ATP produced in the stroma.
D. NADPH2 produced only in the stroma.

A lowered concentration of environmental carbon dioxide would affect a plant by
A. raising its rate of respiration.
B. lowering its rate of respiration.
C. raising its rate of photosynthesis.
D. lowering its rate of photosynthesis.

During the light-capturing reactions of photosynthesis, light energy is
A. used to make glucose.
B. used to make carbon dioxide.
C. changed into three carbon sugars.
D. transformed into chemical energy.

Which one of the following is an example of a tissue?
A. Eye.
B. Skin.
C. Blood.
D. Pancreas.

Blockage of the substances leaving the pancreas would
A. decrease bile production.
B. raise the pH of the duodenum.
C. decrease the quantity of feces.
D. affect the digestion of protein.

The conversion of amino acids to glucose occurs mainly in the
A. liver.
B. thymus.
C. spleen.
D. pancreas.

Which of the following would produce the greatest amount of energy per gram?
A. Fat.
B. Sugar.
C. Starch.
D. Protein.

The correct sequence of structures that food contacts as it moves along the digestive system is
A. mouth, stomach, large intestine, small intestine, anus.
B. pharynx, small intestine, stomach, large intestine, anus.
C. mouth, esophagus, stomach, small intestine, large intestine.
D. esophagus, pharynx, stomach, large intestine, small intestine.

25.
The substance secreted by the walls of the stomach that prevents the stomach from digesting itself is
A. water.
B. mucus.
C. pepsin.
D. gastrin.

26. Which digestive enzyme is incorrectly matched to its substrate?
A. Lipase - fat.
B. Pepsin - protein.
C. Trypsin - nucleic acid.
D. Salivary amylase - starch.

Use the following information to answer question 27.
27. Which number represents the organ that secretes amylase?
A. 1.
B. 2.
C. 3.
D. 4.

28. Deoxygenated hemoglobin is found in its greatest concentration in
A. pulmonary veins.
B. systemic arterioles.
C. pulmonary arteries.
D. coronary arteries.

29. Failure of the lymphatic system to collect tissue fluids results in
A. infection.
B. tissue swelling.
C. excessive urination.
D. growth of adipose tissue.

30. Which of the following are necessary for the production of thrombin in an area of injured tissue?
A. Fibrin and calcium ions.
B. Fibrin and thromboplastin.
C. Calcium ions and prothrombin.
D. Fibrinogen and thromboplastin.

31. A blood pressure reading of 165/100 is characteristic of
A. hypotension.
B. hypertension.
C. a normal resting adult.
D. insufficient dietary salt.

32. Blood plasma with fibrinogen removed is known as
A. serum.
B. lymph.
C. formed elements.
D. tissue fluid.

Which blood vessel is labelled X in the above diagram?
A. Anterior vena cava.
B. Pulmonary trunk.
C. Pulmonary vein.
D. Aorta.

In the fetus, blood passes from the right atrium to the left atrium through the
A. semi-lunar valves.
B. ductus venosus (venous duct).
C. foramen ovale (oval opening).
D. ductus arteriosus (arterial duct).

Which part of the nervous system regulates body temperature?
A. Thalamus.
B. Cerebellum.
C. Hypothalamus.
D. Cerebral cortex.

A sensory neuron carries information
A. to muscles.
B. from the central nervous system.
C. to the central nervous system.
D. between interneurons.

37. When frightened, a person's pupil size increases. The part of the nervous system that controls this response is the
A. sensory nervous system.
B. somatic nervous system.
C. sympathetic nervous system.
D. parasympathetic nervous system.

38. Sodium ions are moved out of a neuron by
A. passive transport.
B. diffusion.
C. osmosis.
D. active transport.

39. The type of sensation experienced as a result of a nerve impulse depends on the
A. area of the brain stimulated.
B. part of the spinal cord that is stimulated.
C. strength of the impulse.
D. type of effector involved.

40. Damage to the occipital lobes of the brain may result in impaired
A. smell.
B. speech.
C. vision.
D. hearing.

Use the following diagram to answer question 41.
41. Which number represents an effector?
A. 1
B. 2
C. 3
D. 4

42. Which of the following causes human lungs to inflate?
A. Contraction of abdominal muscles.
B. Contraction of the diaphragm.
C. Relaxation of the diaphragm.
D. Relaxation of intercostal muscles.

43. Below the pharynx, which is the correct descending anatomical order?
A. Larynx, epiglottis, trachea, alveoli.
B. Epiglottis, trachea, larynx, alveoli.
C. Epiglottis, larynx, trachea, alveoli.
D. Larynx, trachea, epiglottis, alveoli.

44.
If airplane passengers were to experience a lower level of atmospheric oxygen, what response would occur immediately?
A. The heart rate would decrease.
B. Calcitonin levels would increase.
C. The breathing rate would increase.
D. More hemoglobin would be produced.

45. After prolonged sweating and little water intake, your body would respond by
A. secreting more ADH.
B. reducing aldosterone levels.
C. increasing urine production.
D. decreasing collecting duct permeability.

46. Which of the following is the BEST indicator of abnormal kidney function?
A. Glucose in the urine.
B. Protein in the urine.
C. Hormones in the urine.
D. Electrolytes in the urine.

47. Hemoglobin pigments are excreted by the
A. liver.
B. marrow.
C. spleen.
D. pancreas.

Use the following diagram to answer question 48.
48. Pressure filtration occurs at the region labelled
A. 1
B. 2
C. 3
D. 4

49. Damage to the posterior pituitary gland would have an effect on
A. adrenal glands.
B. kidney function.
C. the pancreas.
D. the thyroid gland.

50. Which of the following glands has both an endocrine and an exocrine function?
A. Adrenals.
B. Thyroid.
C. Pituitary.
D. Pancreas.

51. Deamination of amino acids produces ammonia which is then converted to
A. urea.
B. protein.
C. creatinine.
D. bile.

52. Which hormone is a secretion of the adrenal medulla?
A. Cortisol.
B. Adrenalin.
C. Aldosterone.
D. Testosterone.

END OF EXAMINATION

Appendix D
Request and Permission Forms

CONSENT FORM

I, Superintendent of School District ________, consent to the involvement of Biology 12 students in the examination equivalency study as outlined in the letter from Peter MacMillan. Peter MacMillan has permission to contact any or all Biology 12 teachers in the district requesting their participation. It is the teachers' professional decision as to whether or not they wish their classes to participate.

Appendix E
Teacher Administration Instructions

ADMINISTRATION INSTRUCTIONS TO THE TEACHER

Advance Information
Students may be told ahead of time that they will be writing two forms of an examination that is equivalent to the multiple choice section of the Biology 12 government examination. Students should be told one class in advance of when an exam writing will occur. Students may be given any information about the study that you wish to give BUT only after the second form has been written. If you or the students wish any further information or have any comments about the study, please send the requests or comments to me when the answer sheets are being returned.

Prior to the Examination
Please check that you have equal numbers of FORM I and FORM II of the examination. Feel free to xerox extra exam copies if necessary. Extra answer sheets have been included with the package.

Your school has been selected by lot to write either FORM I or FORM II first. Please ensure that all students in all classes in your school write the two forms in the sequence selected for you. Please attempt to coordinate the classes in which the forms are written so that discussion between classes is minimized.
The other form should be written as soon as feasible after the first form; back to back or within two to three class periods would be ideal.

At the Time of Administration
Give the students answer sheets and tell them TO PUT NAMES ON THEM BUT NOT TO BUBBLE IN THEIR NAMES. Review directions for marking bubble sheets found on side 2 of the answer sheet. Have students bubble in the sex, grade, and birth date columns. DO NOT bubble in any I.D. number even if you are intending to mark the sheets using a scanner in your school. IF YOUR SCHOOL IS SELECTED TO WRITE FORM I FIRST, BUBBLE IN THE '0' BUBBLE OF SPECIAL CODE K. IF YOU ARE WRITING FORM II FIRST, BUBBLE IN THE '0' BUBBLE OF SPECIAL CODE L. WHEN WRITING FORM I BUBBLE IN ANSWERS 1 TO 52. WHEN WRITING FORM II BUBBLE IN ANSWERS 101 TO 152.

The students should read the Student Instruction Sheet, replacing general instructions 2 to 5 when appropriate. All other rules for examination writing shall be the same as for the Biology 12 government examination, e.g. calculator usage, scrap paper, teacher non-assistance; the exception being that the students are supervised by their classroom teacher.

After Administration of First Form
Collect all test booklets with answer sheets placed in the booklets. Check for name but remove any bubbled in names on answer sheets. Separate and store examination booklets and answer sheets in a secure location. DO NOT RETURN, OR MARK, OR DISCUSS THE EXAM AT THIS TIME!

Administration of the Second Form
Return the student answer sheets to the original owners. Follow the same administration procedure as for the first form.

Appendix F
Equipercentile Equating Tables

13 Sep 90   SPSS-X RELEASE 3.0 FOR IBM MTS   22:01:32   University of Alberta

SCOREA
VALUE  FREQUENCY  PERCENT  VALID PERCENT  CUM PERCENT
 9      1     .3     .3      .3
10      1     .3     .3      .7
12      2     .7     .7     1.4
13      4    1.4    1.4     2.8
14      4    1.4    1.4     4.2
15      7    2.4    2.4     6.6
16      5    1.7    1.7     8.4
17      9    3.1    3.1    11.5
18     15    5.2    5.2    16.8
19      9    3.1    3.1    19.9
20      7    2.4    2.4    22.4
21     16    5.6    5.6    28.0
22     11    3.8    3.8    31.8
23      8    2.8    2.8    34.6
24      9    3.1    3.1    37.8
25     11    3.8    3.8    41.6
26      9    3.1    3.1    44.8
27      9    3.1    3.1    47.9
28      9    3.1    3.1    51.0
29     14    4.9    4.9    55.9
30      6    2.1    2.1    58.0
31     12    4.2    4.2    62.2
32     10    3.5    3.5    65.7
33      8    2.8    2.8    68.5
34     12    4.2    4.2    72.7
35      5    1.7    1.7    74.5
36      3    1.0    1.0    75.5
37      7    2.4    2.4    78.0
38      8    2.8    2.8    80.8
39      9    3.1    3.1    83.9
40      6    2.1    2.1    86.0
41      8    2.8    2.8    88.8
42      7    2.4    2.4    91.3
43      2     .7     .7    92.0
44      6    2.1    2.1    94.1
45      5    1.7    1.7    95.8
46      4    1.4    1.4    97.2
47      2     .7     .7    97.9
48      5    1.7    1.7    99.7
51      1     .3     .3   100.0
TOTAL  286  100.0  100.0

PERCENTILE  VALUE      PERCENTILE  VALUE
1.00        11.740     100.00

VALID CASES  286     MISSING CASES  0

SCOREB
VALUE  FREQUENCY  PERCENT  VALID PERCENT  CUM PERCENT
12      3    1.0    1.0     1.0
14      1     .3     .3     1.4
15      3    1.0    1.0     2.4
16      4    1.4    1.4     3.8
17      3    1.0    1.0     4.9
18      7    2.4    2.4     7.3
19     15    5.2    5.2    12.6
20     11    3.8    3.8    16.4
21     10    3.5    3.5    19.9
22      9    3.1    3.1    23.1
23     15    5.2    5.2    28.3
24     19    6.6    6.6    35.0
25      6    2.1    2.1    37.1
26      9    3.1    3.1    40.2
27     12    4.2    4.2    44.4
28      7    2.4    2.4    46.9
29      9    3.1    3.1    50.0
30     11    3.8    3.8    53.8
31     10    3.5    3.5    57.3
32     14    4.9    4.9    62.2
33      7    2.4    2.4    64.7
34      5    1.7    1.7    66.4
35     15    5.2    5.2    71.7
36     11    3.8    3.8    75.5
37      3    1.0    1.0    76.6
38     10    3.5    3.5    80.1
39      6    2.1    2.1    82.2
40      9    3.1    3.1    85.3
41      8    2.8    2.8    88.1
42      7    2.4    2.4    90.6
43      5    1.7    1.7    92.3
44      2     .7     .7    93.0
45      3    1.0    1.0    94.1
46      3    1.0    1.0    95.1
47      5    1.7    1.7    96.9
48      4    1.4    1.4    98.3
49      2     .7     .7    99.0
50      1     .3     .3    99.3
51      2     .7     .7   100.0
TOTAL  286  100.0  100.0

PERCENTILE  VALUE      PERCENTILE  VALUE
1.00        12.000     100.00

VALID CASES  286     MISSING CASES  0

Table F3
Equipercentile (Pooled Variance Estimates)

Score A   Score B*   Difference
1-8       -          -
 9        11         -2
10        12         -2
11        13         -2
12        14         -2
13        15         -2
14        16         -2
15        17         -2
16        18         -2
17        19         -2
18        20         -2
19        21         -2
20        22         -2
21        23         -2
22        24         -2
23        24         -1
24        25         -1
25        26         -1
26        27         -1
27        28         -1
28        29         -1
29        30         -1
30        31         -1
31        32         -1
32        33         -1
33        34         -1
34        35         -1
35        36         -1
36        37         -1
37        37          0
38        38          0
39        39          0
40        40          0
41        41          0
42        42          0
43        43          0
44        45         -1
45        46         -1
46        47         -1
47        48         -1
48        49         -1
49        50         -1
50        51         -1
51        52         -1
52        -          -

Table F4
Cumulative Percentiles (CP) and Scores on Form A and Form B

CP  Score A  Score B     CP  Score A  Score B     CP  Score A  Score B
 1    11.0    12.0       35    23.2    24.6       68    32.8    34.2
 2    12.5    14.6       36    23.5    24.8       69    33.3    34.4
 3    13.2    15.5       37    23.7    25.1       70    33.5    34.6
 4    13.8    16.3       38    24.1    25.4       71    33.9    34.9
 5    14.4    17.0       39    24.3    25.6       72    34.4    35.1
 6    15.0    17.7       40    24.6    25.9       73    34.7    35.5
 7    15.3    17.9       41    24.9    26.1       74    35.0    35.8
 8    15.8    18.2       42    25.2    26.5       75    35.6    36.1
 9    16.2    18.4       43    25.5    26.7       76    36.0    36.6
10    16.6    18.6       44    25.8    27.0       77    36.4    37.0
11    16.9    18.8       45    26.1    27.4       78    36.8    37.3
12    17.2    19.0       46    26.4    27.7       79    37.2    37.7
13    17.4    19.3       47    26.7    28.0       80    37.5    38.0
14    17.7    19.5       48    27.0    28.3       81    37.8    38.4
15    17.9    19.7       49    27.3    28.6       82    38.3    38.8
16    18.1    20.0       50    27.7    29.0       83    38.7    39.-
17    18.3    20.3       51    28.0    29.2       84    39.1    39.6
18    18.5    20.5       52    28.2    29.5       85    39.5    39.9
19    18.9    20.9       53    28.5    29.7       86    40.0    40.3
20    19.1    21.0       54    28.7    29.8       87    40.4    40.7
21    19.5    21.3       55    28.9    30.3       88    40.8    41.1
22    19.7    21.6       56    29.3    30.6       89    41.1    41.5
23    19.9    21.9       57    29.7    30.9       90    41.5    41.8
24    20.2    22.1       58    30.0    31.2       91    42.0    42.3
25    20.5    22.3       59    30.2    31.5       92    42.6    43.0
26    20.7    22.6       60    30.5    31.7       93    43.3    43.8
27    20.9    22.8       61    30.7    31.9       94    44.0    44.6
28    21.1    23.1       62    31.0    32.1       95    44.7    45.4
29    21.4    23.2       63    31.3    32.4       96    45.2    46.3
30    21.7    23.4       64    31.6    32.9       97    46.0    47.0
31    21.9    23.7       65    31.9    33.1       98    46.9    48.8
32    22.2    23.8       66    32.2    33.6       99    47.5    49.0
33    22.3    24.0       67    32.6    34.0      100    51.0    51.0
34    22.8    24.2
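A note on reproducing these tables: the Score A to Score B mapping in Tables F3 and F4 follows from the two score distributions above, since each examinee wrote both forms (Design II). The sketch below is a minimal illustration of single-group equipercentile equating, not the analysis actually carried out in the study; the function names (percentile_ranks, equipercentile_equate), the mid-percentile-rank definition, and the use of simple linear interpolation between integer score points are assumptions introduced here for illustration only.

import numpy as np

def percentile_ranks(scores, max_score=52):
    # Mid-percentile rank of every integer score point: the percentage of
    # examinees scoring below the point plus half the percentage exactly at it.
    scores = np.asarray(scores)
    n = scores.size
    ranks = np.empty(max_score + 1)
    for point in range(max_score + 1):
        below = np.count_nonzero(scores < point)
        at = np.count_nonzero(scores == point)
        ranks[point] = 100.0 * (below + 0.5 * at) / n
    return ranks

def equipercentile_equate(form_a_scores, form_b_scores, max_score=52):
    # For each Form A score point, return the Form B score with the same
    # percentile rank, interpolating linearly between integer score points.
    pr_a = percentile_ranks(form_a_scores, max_score)
    pr_b = percentile_ranks(form_b_scores, max_score)
    score_points = np.arange(max_score + 1)
    return np.interp(pr_a, pr_b, score_points)

# Hypothetical usage with two vectors of raw scores (0 to 52):
#   equated = equipercentile_equate(form_a_scores, form_b_scores)
#   equated[30] is the Form B score treated as equivalent to 30 on Form A.

Because both forms were administered to the same sample, the two distributions can be compared directly; a counterbalanced or anchor-test design would require additional adjustment before the percentile ranks could be matched in this way.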
