Reporting, Grading, and the Meaning of Letter Grades in Science 9: Perspectives of Teachers, Students, and Parents

by Susan R. Brigden

B.Sc., University of British Columbia, 1973
Professional Teaching Certificate, University of British Columbia, 1974
M.A., University of British Columbia, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Curriculum Studies

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
© Susan R. Brigden, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Curriculum Studies
The University of British Columbia
Vancouver, Canada

Abstract

This study investigates the reporting and grading, as well as the meaning of letter grades, of students in Science 9 from the perspectives of teachers, students, and parents in five schools from two British Columbia school districts, one urban and one rural. To that end, four research questions guided the data collection and analyses: (1) What reporting methods do teachers use to communicate information about student learning in Science 9 to students and parents, and what are teachers', students', and parents' opinions of those reporting methods? (2) What grading components do teachers incorporate into Science 9 letter grades, and what grading components do students and parents believe teachers incorporate into Science 9 letter grades? (3) What meanings do teachers, students, and parents attribute to Science 9 letter grades? and (4) What are students' and parents' perceptions about some possible effects of student progress reports in Science 9?

A mixed-methodology design was employed to collect the data. Quantitative data, collected via self-administered written questionnaires from the five Science 9 teachers, 43 students, and 21 parents who volunteered to participate in the study, were used to identify participants' practices and perceptions about grading and reporting. Qualitative data, collected via individual, audio-taped interviews conducted with a subset of the people who completed questionnaires (all five teachers, 16 students, and seven parents), were used to verify, clarify, and expand the questionnaire data. Observational notes and collected documents (e.g., report card forms) also served as data sources.

The results of this study show that most of the participants were generally satisfied with most aspects of the reporting of student progress in Science 9. However, individual teachers consider different kinds of assessment information when they assign Science 9 letter grades, teachers are not always clear and consistent about what they intend letter grades to mean, and students' and parents' beliefs about the grading components and meanings of Science 9 letter grades vary widely. The results of this study also indicate that the information communicated by a letter grade is not always clear and consistent.
That the meaning of a letter grade is not always clear has implications for the ways in which letter grades are used by students and parents. The results of this study indicate that some students' attitudes, behaviours, and decisions could be affected by the grades they receive in Science 9. However, in order for students' attitudes, behaviours, and decisions to be appropriate, their interpretations of the meanings of letter grades must be appropriate. Given the multiple meanings attributed to a Science 9 letter grade, it is likely that people's inferences and actions based on a letter grade will not always be appropriate.

This study raises a number of issues. Two classes of issues are discussed: those arising from the research findings, and those arising from the methodology of the study. An example of an issue arising from the research findings is that the process of assigning letter grades is problematic. An example of an issue arising from the methodology is that participants do not always interpret questionnaire items in the way they are intended.

This study contributes to our understanding of teachers' grading practices with respect to the assignment of Science 9 letter grades, and it provides information about students' and parents' understandings of those grading practices. The study also provides insight into teachers', students', and parents' understandings of the meaning of letter grades. In addition, the results of this study help us understand some possible consequences of reports of student progress from the perspectives of students and parents. Another contribution is a direct result of the methodology of the study — by interviewing a subset of the questionnaire respondents after they had completed the questionnaires, it was possible to learn more about how different people interpreted the questionnaire items; that is, it was possible to explore the internal validity of the study. As a result, this study offers evidence about the value of employing more than one data collection method when conducting research.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments

CHAPTER 1  Introduction to the Study
    Background
    The Purpose and Research Questions of the Study
    Definition of Terms
    Delimitations and Limitations of the Study
    Organization of the Thesis

CHAPTER 2  The Personal and Historical Context of the Study
    A Personal Perspective
    A North American Perspective
        Introduction
        American Educational Concerns of the 1950s and the 1960s
        Educational Reform, Accountability, and Standardized Testing in the 1970s and 1980s
        Testing and Assessment Under Review
    A Uniquely British Columbian Perspective
        The Sullivan Royal Commission and its Legacy for Learners
        Assessment, Evaluation, and Reporting Under the Year 2000 Program
        Public Reaction to the Year 2000
        Government Reaction to Public Pressure
    Validity and Assessment
    Summary

CHAPTER 3  Related Literature
    Assessing, Evaluating, and Reporting Student Progress
        Assessing Student Progress
            Introduction
            Purposes of Assessment
            Assessment Methods
        Grading Student Progress
            Introduction
            Purposes of Grades
            Types of Grading Systems
                Letter grades
                Numerical grading systems
                Pass-fail system
            Types of Comparisons Used When Grades Are Assigned
                Achievement as a basis for comparison
                Aptitude or ability
                Effort
                Improvement or growth
            Summary
        Components of Letter Grades
        Reporting Student Progress
            Introduction
            Purpose of Reports of Student Progress
            Methods of Reporting Student Progress
                Formal reporting methods
                Informal reporting methods
    Classroom Assessment and Grading Practices: Teacher-Related Research
        Assessment Methods Used by Teachers
        Sources of Assessment Devices Used by Teachers
        Teachers' Measurement Knowledge and Grading Practices
    Classroom Assessment and Grading Practices: Student-Related Research
        Students' Perceptions About Grading
        Impact of Assessment on Students
    Classroom Assessment and Grading Practices: Parent-Related Research
        Parents' Perceptions About Grading
    Summary

CHAPTER 4  Research Method
    Research Design
    Study Context
    Recruiting the Participants
        Teacher Recruitment
        Student Recruitment
        Parent Recruitment
        Collecting the Consent Forms
    Data Collection Methods
        Written Survey Questionnaires
            Development of the Questionnaires
                Student and parent questionnaires
                Teacher questionnaire
            Description of the Questionnaires
                Student and parent questionnaires
                Teacher questionnaire
            Administration and Distribution of the Questionnaires
        Interview Schedules
            Introduction
            Development and Description of the Interview Schedules
            Conducting the Interviews
        Collection of Documents
    Research Protocol
    Data Management
    Excerpt Designations and Presentation
    Data Analysis

CHAPTER 5  The People of the Study
    Introduction
    The Classes of the Study
        Overall Rates of Participation
        The Classes of Cityside School District
            Wade Mitchell's Class
            David Turner's Class
            Elena Kovac's Class
        The Classes of Whitewater School District
            Henry Szabo's Class
            Robert Reid's Class
    The Teachers, Students, and Parents of the Study
        The Teachers of the Study
            Teachers' General Background Information
            Teachers' Education in Student Assessment and Grading
            Teachers' Beliefs About the Purposes of Assessment and Grading
        The Students of the Study
        The Parents of the Study
    Summary

CHAPTER 6  Reporting Student Progress in Science 9
    Introduction
    Methods Used to Report Student Progress in Science 9
    How Teachers Determine What to Assess and Grade
    How Teachers Assign Letter Grades
    Participants' Preferred Reporting Methods
        Methods Preferred by Teachers
        Methods Preferred by Students
        Methods Preferred by Parents
    Students' and Parents' Satisfaction With Reporting Methods
    Students' and Parents' Concerns About Assessment and Reporting Methods
    Students' and Parents' Beliefs in the Accuracy of Science 9 Letter Grades
    Grading Information Distributed to Students and Parents
    Summary

CHAPTER 7  Components of Science 9 Letter Grades
    Components of Science 9 Letter Grades: Questionnaire Results
        Grading Components Incorporated into a Science 9 Letter Grade: Teachers' Reported Practices
        Students' Beliefs About Science 9 Grading Components
        Parents' Beliefs About Science 9 Grading Components
    Components of Science 9 Letter Grades: Interview Results
        Test Results
        Lab Assignments
        Homework
        Effort/Work Habits
        Participation in Class Activities
        Behaviour in Class
        Attendance
        Project Work
        Attitude
        Performance Tasks
        Notebook
        Self-evaluation
        Learning Journal
        Learning Ability
        Work Portfolio
    Summary

CHAPTER 8  Meanings Attributed to Science 9 Letter Grades
    Meanings Attributed to Science 9 Letter Grades: Questionnaire Data
        Meanings Attributed by Teachers
        Meanings Attributed by Students
        Meanings Attributed by Parents
    Meanings Attributed to Science 9 Letter Grades: Interview Data
        Letter Grade Compares a Student to Teacher Expectations
        Letter Grade Reflects the Effort of a Student
        Letter Grade Reflects the Ability of a Student
        Letter Grade Shows the Achievement of a Student
        Letter Grade Shows How Much a Student Improved
        Letter Grade Compares a Student to Ministry Standards
        Letter Grade Compares a Student to Other Students
    Summary

CHAPTER 9  The Effects of Science 9 Progress Reports: Perceptions of Students and Parents
    Introduction
    Effects of Progress Reports on Students' Feelings About Studying Science in School
    Effects of Progress Reports on the Amount of Work Done in Science Class
    Effects of Progress Reports on the Amount of Homework Done in Science
    Effects of Progress Reports on Students' Confidence in Ability to do Science
    Effects of Progress Reports on Parent-Child Relationship
    Effects of Progress Reports on Educational Decisions
    Effects of Progress Reports on Vocational Decisions
    Summary

CHAPTER 10  Conclusion
    Introduction
    Findings Related to the Research Questions
    Issues Arising From the Research Findings
        Educational Measurement is Imprecise
        Assessment is a Subjective Process
        Multiple Methods Required for Fair and Accurate Assessment
        Appropriate Methods Required for Fair and Accurate Assessment
        Assessment Information Must be Linked to Purpose
        Assessment and Grading Practices Must be Clearly Articulated
        Process of Assigning Letter Grades is Problematic
        Judgment is an Integral Part of the Grading Process
        Meaning Arises From Comparison Basis Used to Assign Letter Grade
        Consequences of Evaluation can be Positive, Negative, or Neutral
    Discussion
    Contribution of the Study
    Issues Arising Out of the Methodology of the Study
    Implications for Teacher Education and Professional Development
    Implications for Reporting Student Progress
    Implications for Further Research

References

Appendix A - Teacher Letter of Explanation
Appendix B - Teacher Consent Form
Appendix C - Student/Parent Consent Form
Appendix D - Student Questionnaire
Appendix E - Parent Letter of Transmittal and Questionnaire
Appendix F - Teacher Questionnaire
Appendix G - Student Interview Schedule
Appendix H - Parent Interview Schedule
Appendix I - Teacher Interview Schedule

List of Tables

Table 1   Student and Parent Participation Rates for All Classes
Table 2   Participation Rates for Wade Mitchell's Class
Table 3   Participation Rates for David Turner's Class
Table 4   Participation Rates for Elena Kovac's Class
Table 5   Participation Rates for Henry Szabo's Class
Table 6   Participation Rates for Robert Reid's Class
Table 7   Summary of Students' Background Information
Table 8   Summary of Students' Letter Grades and Work Habits Ratings
Table 9   Summary of Parents' Background Information
Table 10  Summary of Children's Letter Grades and Work Habits Ratings
Table 11  Science 9 Grading Components: Questionnaire Results
Table 12  Agreement of Students and Parents with Science Teacher About Science 9 Grading Components: Questionnaire Data
Table 13  Meanings Attributed to Science 9 Letter Grades: Questionnaire Results
Table 14  Agreement of Students and Parents with Science Teacher About Intended Meaning of Science 9 Letter Grade: Questionnaire Data
Table 15  Comparison of Contrasting Intended Meaning of Letter Grade Statements: Questionnaire Results

List of Figures

Figure 1. Reported range of scores associated with each level of achievement for end-of-course grades. Source: Administrative Handbook (BCME, 1986)
Figure 2. Research protocol
Figure 3. Sample teacher transcript excerpt designation
Figure 4. Sample student transcript excerpt designation
Figure 5. Sample parent transcript excerpt designation
Figure 6. Sample interview transcript
Figure 7. Information printed on the back of the Cityside District report card form
Figure 8. Information printed on the back of the Whitewater District report card form
Acknowledgments

Anyone who reads this document will realize that it was a long time in the making. I realize that it would never have been completed had I not had the support of my friends, fellow graduate students, dissertation committee, family, and, of course, the teachers, students, and parents of the study.

Five teachers volunteered for this study. Not only did they answer my questions and provide needed documents, they encouraged their students and their students' parents to volunteer, as well. Of necessity, these generous people must remain anonymous. Nevertheless, I give them my deepest thanks because, without them, there would be no study.

Whether they know it or not, all of my friends (and a number of acquaintances) helped me in this endeavour. They listened to my ideas and gave me theirs — everyone has something to say about grading and reporting! There are, however, a few who did more than listen; they helped me pre-test the instruments used to collect the data. So, thanks to you all: Ruth and Heidi Erlandsen; Brenda, Ted, and Samantha Ingram; Janet and Sarah Mauza; Sandy Sovio; Heather Kelleher; and Stuart Brigden. In addition, I must thank my long-time friend, Marie McNeil, who gave me food and lodging those many times when I was too tired to make the long drive home.

I owe thanks to another group of friends, as well — the friends I met in grad school. Friends like Cynthia Nicol, Sandra Crespo, Renee Fountain, Heather Kelleher, and Clare Brooks, whose phone calls and e-mail messages, like voices in the wilderness, brought comfort to me during those long, lonely periods of isolation and frustration.

This has been a long and difficult process for me, and without the help of my committee, I would not have overcome some of the hurdles I encountered. So, I'd like to thank Harold Ratzlaff who, before he retired, reviewed my research proposal and instruments; Tony Clarke, who replaced Harold, and brought with him insightful comments, needed encouragement, and his cheery "G'day"; Ann Anderson, whose empathy and positive attitude made me believe that I had something to contribute and would be able to do so; and, of course, Dave Bateson, who is responsible for me becoming a graduate student so many years ago.

I first met Dave when he was still a doctoral student. David had volunteered to drive to a rural village, some three hours from Vancouver, to teach an introductory statistics course to graduate students, and I ended up taking his course. I was not a graduate student at the time, however. I only took the course because my friends and fellow teachers needed enough people in the course to warrant Dave teaching it. It was as a result of Dave's enthusiasm and extraordinary teaching ability that I became interested in statistics, and assessment and evaluation. It was due to his encouragement that I decided to pursue a master's degree and then a doctorate. Over the years, Dave helped me earn money to pay for my education by giving me the opportunity to work with him on several large-scale assessment projects and recommending me for others. I learned much from these projects, and I learned much from Dave. His patience and encouragement have been continuous. So, Dave, thank you for giving me so many opportunities to learn about assessment and evaluation, higher education, and, most importantly, myself.
Without you in my life, I would never have reached a goal I had secretly dreamed of, but never believed I could attain.

Finally, I must thank my family. When I started this long journey through the hallowed halls of the Faculty of Graduate Studies, my husband was my family. Along the way we had two sons — Jordan and Matthew. My sons have grown up with a mother who has nearly always been a student — and, for that matter, a father who has been a student for much of their lives, as well. They've seen me through the best of times and the worst of times. And, sometimes, they've not seen me at all. Despite all this, they have grown up to be sons of whom I am very proud. Their constant questions — How's it going today, Mum? Are you nearly finished? When will you be finished? Are you ever going to finish? — made me feel guilty, but kept me going. So, Jordan, it's with love that I say thanks for being patient, for turning your music down when I asked, and for giving me a hug when I needed one. And, Matthew, it is with love that I also thank you for your patience, for helping me make dinner when we needed to eat, and for making me laugh when I needed to laugh.

To be truthful, of the many people who have supported me throughout this journey, I owe the most to my husband, Stuart. It was he who, after a couple of particularly difficult years in our lives, encouraged me to finish my half-completed master's degree because he knew that I would never have forgiven myself if I hadn't. It was he who encouraged me to continue on and pursue a doctorate despite the extra financial and parental burdens it would place upon his shoulders. It was he who listened to my concerns and ideas and read my work, in the morning, when he was barely awake, and late at night, when he wanted to sleep. And, lastly, it was he, with his patience, understanding, and love, who gave me the strength, and time, to persevere until I reached my goal. So, thank you, Stu — you are truly my best friend!

CHAPTER 1
INTRODUCTION TO THE STUDY

Background

Letter grades have been the most common method used to communicate information about student performance, or achievement, in school to students and their parents during the 20th century (Gronlund & Linn, 1990). For this reason, virtually all of us have had some experience with letter grades during our lives. As students, information about our progress in school is summarized and communicated to us via letter grades. As parents, we study the letter grades on our children's report cards to determine how well they are doing in school. And, if we are teachers, we struggle to condense what we know about a student's progress in school into a letter grade so that we can report our judgment about that progress to the student and his or her parents. At one time or another, then, nearly all of us have had occasion to think about how letter grades are determined and what they mean.

In British Columbia (B.C.), "reporting refers to the communication among educators, students and parents about student learning" (British Columbia Ministry of Education [BCME], 1993d, p. 12), and formal and informal reports "communicate to parents significant aspects of students' progress in learning ... [and] ... describe, in relation to the curriculum, student progress in intellectual, social, human, and career development" (BCME, 1994a, p. 3).
Information communicated to students on their report cards provides them with a summary of their learning progress, can motivate them to work harder, and can help them make educational and vocational decisions (Gronlund & Linn, 1990; Worthen, Borg, & White, 1993). Information communicated to parents on report cards helps them understand how well their children are achieving the intended learning outcomes so that they can help their children with their learning, give them support and encouragement, and help them make educational and vocational decisions (Gronlund & Linn, 1990; Worthen et al., 1993).

Information about students' progress in school can be reported to students and their parents in a number of different ways (e.g., letter grades, anecdotal comments, notes, parent-teacher conferences, student-led conferences, telephone calls). Throughout the 1990s, as a result of the Year 2000 initiatives (e.g., BCME, 1990a, 1990b, 1992a, 1992b) and subsequent BCME policies, students, their parents, and teachers in B.C. became familiar with a variety of student progress reporting methods. Which methods are most understandable, and meaningful, became a topic of debate for many people in B.C. during the early part of this decade. Much of the debate pitted anecdotal reporting against the more traditional report cards with letter grades. Throughout 1993, newspapers published articles and letters that strongly favoured the traditional letter grade approach to reporting student progress (e.g., Balcom, 1993a; McCormick, 1993), as well as those that favoured anecdotal reporting (e.g., Smith, 1993; Young, 1993).

In the early 1990s, anecdotal reporting was introduced at the primary level when the Year 2000 Primary Program was implemented. At that time, the BCME planned to have the Year 2000 Intermediate Program implemented by the mid-1990s and the Year 2000 Graduation Program implemented by the late 1990s. In anticipation of the Year 2000 Intermediate Program, some schools issued anecdotal reports in lieu of letter grades at the intermediate level. Predictably, not all parents were pleased with the new reporting method — although some parents seemed to support anecdotal reports at the primary level, others felt they were inappropriate at the intermediate and graduation levels (Bachor & Anderson, 1993b).

In late 1993, reacting to the public's dissatisfaction with many of the changes proposed by the Year 2000 program, the B.C. government decided not to implement several of the proposed changes and to rescind others that had already been implemented. Of particular significance to this study was the government's decision to replace anecdotal reports with structured written reports for Kindergarten to Grade 3 students, and to require structured written reports and letter grades for Grades 4 to 7 students and letter grades with written comments for students in Grades 8 to 12 (BCME, 1993c). The government abandoned anecdotal reports because parents complained that they could not understand them. Often, parents complained that anecdotal reports did not show how well their child was doing compared to the other students in the class. Many parents stated they preferred letter grades on their children's report cards — that is, they wanted a reporting method with which they were familiar.
Bachor and Anderson (1993b) found "many parents were positively disposed towards grades because grades let parents know where the student stands, the parents are familiar with grades, and grades can be motivational for the student in the sense of something to strive for" (p. 53, [emphasis in original]). It is no wonder that parents feel comfortable with letter grades — after all, letter grades have been the most common method of grading students throughout this century (Gronlund & Linn, 1990; Worthen et al., 1993). Parents are familiar with letter grades and like them because they believe letter grades let them "know where the student stands"; that is, they believe they know what they mean. But what does a letter grade mean to parents? Or to students? And what does a teacher intend a letter grade to mean?

Few studies have been published that examine what parents and students think letter grades mean. Those that have been published (e.g., Friedman & Manley, 1991; Waltman & Frisbie, 1993, 1994) have certain limitations restricting the conclusions which can be drawn from them. For example, when Friedman and Manley (1991) investigated high school grading practices from the perspectives of teachers, administrators, counselors, parents, and students, they did not situate their study in a specific educational context. That is, they did not ask the participants to refer to a specific school subject, reporting period, or grade-level as they completed their questionnaires. Consequently, participants could have referred to a number of different reporting situations as they completed their questionnaires, thereby making it difficult to compare their perceptions.

Even though Waltman and Frisbie (1993) situated their study in a specific educational context — fourth-grade mathematics — they only compared parents' and teachers' perceptions about the meaning of letter grades. Students were not included in their study because they believed the primary purpose of a report card grade is to communicate information about student achievement from the teacher to the parent and, because students get on-going feedback from their teachers, they do not "need the information provided by report card grades" (p. 1). In addition, because they wanted "to minimize the possibility that parent responses would be directly influenced by the actual performance level of their fourth-grade child" (p. 5), Waltman and Frisbie administered the questionnaires before the teachers had distributed the first report card; hence, parents were, presumably, indicating not what they thought a letter grade given in mathematics by a particular teacher meant, but what letter grades given in fourth-grade mathematics, in general, meant. Given that the parents in their study may have had varied experiences interpreting fourth-grade mathematics letter grades, and some may not have had any previous experience interpreting such letter grades at all, it is possible that they envisioned a variety of different reporting situations as they completed their questionnaires. If this was the case, then the consistency of the findings may be compromised and difficult to interpret.

As Bachor and Anderson (1993b) pointed out, "the assessment of student achievement is a major element of the educational process. In B.C., the Year 2000 initiatives ... [had] a pronounced emphasis on student assessment using a wide array of procedures for the collection of achievement information" (p. 1).
Assessment results can be communicated to students and parents in a variety of ways, one of which is through the use of letter grades on a report card; yet, teachers sometimes intend letter grades to mean different things for different students (Brookhart, 1992; Waltman & Frisbie, 1993, 1994). If letter grades are to be an effective way of communicating information about student progress, then teachers must be clear and consistent about the meaning of the letter grades they assign, and students and their parents must interpret them in the way the teacher intended — teachers, students, and parents "must have a clear and consistent understanding of what the grade represents" (Waltman & Frisbie, 1993, p. 17).

If letter grades are to be an effective way of communicating information, then we need to know what meanings are attributed to letter grades by teachers, students, and parents. If students and parents interpret letter grades in the way the teacher intended, then communication about student progress can be effective and will help improve student learning and educational and vocational decisions. However, if there are differences in the meanings students, parents, and teachers attribute to letter grades, communication may not be effective, student learning may not improve, and poor decisions may be made. Once the meanings of letter grades have been identified and described, teachers, students, and parents can work to improve the quality of communication and, hence, improve student learning. With a better understanding of the meanings attributed to letter grades by students and parents, teachers can then critically reflect upon their assessment and reporting practices to determine whether or not their practices are conveying the information in the way, or ways, in which they were intended — that is, in ways that support student learning.

To better understand the meanings that students, parents, and teachers attribute to letter grades, all three stakeholders must be included in educational research. In addition, such research must be placed within a specific educational context with respect to subject, grade-level, and reporting period to ensure that, as much as is possible, participants refer to similar reporting situations as they complete questionnaires and/or participate in interviews. To do this, research must be conducted in a way that enables participants to refer to a specific letter grade assigned for a specific course at a specific time of the year.¹ Such is the aim of this study.

¹ I decided to collect my data part way through the Science 9 course, at the end of a term, rather than at the end of the course, because I felt it would be difficult, if not impossible, to get people to participate in a study after the course had been completed and new courses were underway, as is the case for semestered schools, or during the summer holidays, as is the case for non-semestered schools.

The Purpose and Research Questions of the Study

A report card letter grade is a symbol used to communicate information about student progress in school from the teacher who assigned the letter grade to a number of possible audiences. While it is recognized that letter grades can be used to communicate information about student achievement to school administrators, other teachers, other educational institutions, and, on occasion, potential employers (Gronlund & Linn, 1990; Worthen et al., 1993), it is an assumption of this study that the primary purpose of letter grades is to communicate information about student achievement to students and their parents (or guardians).

Communication is the process whereby people attempt to transmit messages, or information, to one another (Deaux & Wrightsman, 1988).
To communicate effectively, people must give similar meanings to the words, symbols, and/or gestures used to transmit information (Tait & Wibe, 1992), and they "must share certain beliefs and suppositions that will enable them to coordinate their communicative efforts" (Deaux & Wrightsman, 1988, p. 129). Miscommunication takes place when the receiver of a message misunderstands, does not understand, or misinterprets that which was to have been communicated; that is, miscommunication takes place when the receiver of the information does not attribute the same meaning as that intended by the sender.

For a report card letter grade to be an effective means of communicating information about student progress in school, the teacher must be clear about the meaning of the letter grade, and the student and his or her parent(s) must attribute the same meaning to it as the teacher intended. Conversely, if different meanings are attributed to a letter grade by the teacher, student, and parent(s), there will be miscommunication. When a letter grade is misinterpreted, the inferences made about student progress toward the expected learning outcomes of a course based on that letter grade may not be valid. To attribute the same meaning to a letter grade, and thereby make valid inferences and decisions, the teacher, student, and parent(s) should have a common understanding of the processes involved in the generation of the letter grade; that is, the more familiar students and parents are with a teacher's grading practices (e.g., the kinds of assessment information included in the letter grade, the emphasis given to each grading component), the greater the likelihood that they will attribute the appropriate meaning to the letter grade and, thereby, make valid inferences and decisions. Information about student progress in a particular course is communicated, via a letter grade, primarily for the purposes of improving student learning and making educational and vocational decisions. If these purposes are to be met, teachers, students, and parents must all attribute the same meaning to a letter grade.

In light of the above, the purpose of this study is to identify and describe some of the practices, opinions, and beliefs of the teachers, students, and parents of five different classes located in two different B.C. school districts with respect to grading and letter grades in Science 9 so that we might better understand those practices and beliefs. To that end, both quantitative (written survey questionnaires) and qualitative (semi-structured interviews, document collection) data collection techniques were employed, and the following four questions guided this study:

1. What reporting methods do teachers use to communicate information about student learning in Science 9 to students and parents, and what are teachers', students', and parents' opinions of those reporting methods?

The method(s) used to report students' progress help give meaning to the letter grades a teacher assigns, and students' and parents' opinions of the reporting method(s) used by the teacher affect how letter grades are interpreted.
The purpose of this question, therefore, is to describe the method(s) teachers use to report student progress, and teachers', students', and parents' opinions of those methods.

2. What grading components do teachers incorporate into Science 9 letter grades, and what grading components do students and parents believe teachers incorporate into Science 9 letter grades?

The grading components incorporated into a letter grade help give meaning to the letter grades a teacher assigns; at the same time, students' and parents' beliefs about the components of a letter grade affect how they interpret that grade. Hence, the purpose of this question is to describe the components of Science 9 letter grades, and students' and parents' beliefs about those components.

3. What meanings do teachers, students, and parents attribute to Science 9 letter grades?

The purpose of this question is to describe the meaning(s) teachers intend their letter grades to communicate about student progress in Science 9 (e.g., that it shows how well a student in Science 9 performed compared to the other students in the class), and the meaning(s) students and parents attribute to Science 9 letter grades.

4. What are students' and parents' perceptions about some possible effects of student progress reports in Science 9?

Two reasons for reporting information about student progress in school are to improve student learning and to make educational and vocational decisions. Accordingly, the purpose of this question is to describe students' and parents' perceptions of the effect of progress reports on several factors affecting student learning in school (e.g., the amount of homework they do, their work habits in school), and on vocational and educational decisions.

Definition of Terms

In this study, assessment and evaluation are viewed as two closely related, but distinct, processes. Assessment takes place when a teacher collects information about student knowledge, skills, and attitudes. Evaluation takes place when a teacher uses the information collected via the assessment process to make a judgment about a student's learning progress, or achievement, in a particular unit, course, or grade-level in school. Whenever a teacher grades or assigns a letter grade to a student, she does so as a result of an evaluation process; therefore, for the purpose of this study, teacher evaluation of student achievement, or progress, is equated with grading, and a letter grade is the symbolic indication of the evaluation, or judgment, a teacher has made about a child. Hence, the terms used in this document are defined as follows:

• Assessment "is the systematic gathering of information about what students know, are able to do and are working toward" (BCME, 1994a, p. 103).

• Evaluation is the process of judging student performance in a particular unit, course, or grade-level in school based on assessment information.

• Grade refers to a symbol (i.e., letter, number) that summarizes "complex information about a student's performance in a particular area" (Worthen et al., 1993, p. 378).

• Grade-level refers to the year of schooling in which a child is enrolled (e.g., Grade 9).

• Grading factors or grading components refer to the kinds of information considered by a teacher and incorporated into a letter grade.²

• Reporting "refers to the communication among educators, students and parents about student learning" (BCME, 1993c, p. 12).
• A report of student progress, or student progress report, is the method by which information about student learning is communicated to students and parents; such reports can be classified as formal or informal.³

• A report card is a formal written summary report about a student's progress in school.

² Grading components are discussed in more detail in Chapter 3.
³ Reports of student progress are discussed in more detail in Chapter 3.

Delimitations and Limitations of the Study

Delimitations: This study is restricted to the teachers, students, and parents from five Science 9 classes who volunteered to complete written survey questionnaires about grading and letter grades, and to a subset of them who agreed to be interviewed. It is also restricted to participants' perceptions about grading and letter grades given at the end of a term, part way through a course, rather than at the end of a course.

Limitations: There are four limitations to this study. The first limitation is that all of the participants were volunteers. As such, they cannot be considered to be random samples of any specific target populations (Borg & Gall, 1979). The samples in this study are known to be highly biased in favour of non-immigrant, English-speaking students and non-immigrant, English-speaking parents with fairly high levels of education (most have a diploma or a degree); therefore, the views and opinions expressed by the people in this study should not be thought of as either exhaustive or typical of those existing in the total population. In the end, the volunteer nature of the participants decreases the generalizability of the findings, with the result that the study will not be generalizable to all student grading situations.

The second limitation concerns the participation rates of students and parents. Overall, they were low, and the rates varied greatly from class to class. Moreover, no students were interviewed from two of the classes, and no parents were interviewed from three. Had the purpose been to test hypotheses and make statistical inferences, the rates would have been extremely disappointing and the data unusable. However, because the purpose of the study is to explore people's practices, beliefs, and opinions as they pertain to grading and letter grades in Science 9, I believe that, in the end, a satisfactory number, and variety, of people participated.

The third limitation has to do with the methods used to collect information from the participants; there is a possibility that the organization of the questionnaires and interviews could have prevented people from expressing views different from those implied by the response options. Despite the fact that I gave them the opportunity to add any other information they wanted, by including fixed-response items on the questionnaires I could have limited participants' abilities to identify or recall some of their other views. The findings emerging from this study, therefore, are limited by the structure of the questionnaires and interviews.

A fourth limitation arises from the fact that I met only once with each of the people I interviewed. As a result, each person had only one opportunity to discuss their questionnaire responses and share their views with me. Moreover, they had no opportunity to discuss the data and results with me. Had member checks (Lincoln & Guba, 1985; Merriam, 1988) been conducted, the internal validity of the study would have been strengthened.
Accordingly, the interpretations presented in this document are mine, and it is recognized that the findings could be subject to other interpretations.

Organization of the Thesis

There are 10 chapters in this document. The first four chapters present the foundations of the study: Chapter 1 introduces the study, Chapter 2 presents the personal and historical contexts of the study, Chapter 3 discusses literature related to the study, and Chapter 4 describes the methodology of the study. Chapter 5 describes the people of the study, and Chapters 6 through 9 present the outcomes of the analyses of the data for the four questions of the study. The final chapter discusses the results and implications of the study.

CHAPTER 2
THE PERSONAL AND HISTORICAL CONTEXT OF THE STUDY

The context of the study is presented in this chapter. In it, I describe personal experiences that help frame the study, consider several recent developments in North American and B.C. education relevant to the study, and discuss validity theory as it applies to student testing and assessment.

A Personal Perspective

Among other things, I am the mother of two school-age children, I am an educator, and I am a student. Each of these aspects of my life has helped form the backdrop of this study. Each of these aspects of my life has helped me interpret my children's report card letter grades. In this section, I describe several experiences that compelled me to pursue this study — that is, I describe the personal context of the study.

During the first few years of his schooling, my eldest son's progress was reported via anecdotal report cards. Because I had consistently monitored what he was doing in school, and because he was an academically strong student, I always felt that I understood what his report cards meant and that I knew how he was progressing in school; I was a proud parent with a child who seemed confident about his ability to succeed in school. When he entered the intermediate level in school, we were told there would be letter grades on his report card — anecdotal report cards were no longer to be sent home, and progress was to be reported via letter grades, work habit symbols, and written comments. He looked forward to his first term report card, convinced that he would get "straight As". A good report card it was; however, there was that "B" in Language Arts that really bothered him.

"I don't know why I got a 'B'. I did all my work. I finished all my Reader's Response sheets and even did extra work for bonus marks."

"Bonus marks?" I asked.

"Yes, Mrs. ____ uses comparative marking and allows us to work for bonus marks if we want to improve our letter grade."

"Mmmmmm," I replied.

Because my son was highly motivated to get "straight As", he decided to find out what other students had done to get an "A" in Language Arts. He discovered that some students had received many bonus points for the glossary section of their Reader's Response sheets — in the glossary section, students were expected to give definitions for key words used in a story. As it turned out, my son had defined each term according to its context in the story; the other students, however, had written down as many definitions as they could find in the dictionary for each key word. Any guesses as to how my son did his work for the rest of the year? And, yes, he did get his "A" in Language Arts!

As a more than interested parent and educator, I have to admit that my son's strategy for getting an "A" in Language Arts dismayed me.
I could see that he was being rewarded with a higher letter grade for what, in my opinion, was lower quality work. Not for the first time, I had to question just what letter grades were supposed to mean and to wonder about the consequences of using letter grades. I reflected upon my struggles as a science and band⁴ teacher during report card time, when I tried to be as fair as possible with my students. I shuddered when I thought about some of the decisions I had made as a practicing teacher vis-à-vis assessment and evaluation, and I wondered about the effects they may have had upon my students. I began to think that I should put aside my proposed dissertation topic on attitudes toward science and start to investigate teachers' grading practices instead — but I didn't. I didn't, that is, until I helped my son through two more reporting periods that year.

⁴ In addition to teaching junior and senior science courses, I taught Band 8 to 12.

As I have already suggested, by the second term, my son had figured out how to achieve an "A" in Language Arts — it was in science that the vagaries of grading student progress were revisited during the second term. The science letter grade for the second report card was to be based on a single research project. Because each child had been given a sheet outlining how marks were to be allotted for the project, my son knew exactly what he had to do. He worked hard. He did everything he had to do. His science report looked very good to me. It must have looked good to his teacher, too, because the day he handed it in he came home to tell me that he had received 100% for his science report.

"Well done, Hon!" I said.

"Yeah, but I might not get an 'A'," he replied.

"Oh?"

"Well, you know that Mrs. ____ uses comparative marking ..."

"Yes?" I said through my teeth.

"Well, ... she told us that she still has to take all the projects home and line them up on her living room floor from best to worst before she can decide who will get an 'A'. She can only give out so many As, you know!"

"Mmm. Well, how do you think you did on your project?" I asked, as I tried to avoid casting aspersions against a teacher whom I knew to be very good — in spite of my son's description of her assessment and evaluation practices.

"I did a very good job. I learned a lot about swifts."

"Yes, and you learned a lot about word processing, too."

"Yeah. But what about Julie, Mum? She worked really hard, but her project looked the worst. She's probably going to get a 'D'."

"Mmmmmm," was all I could say.

He did get his "A" in science that term. I don't know how Julie fared. I, however, began to seriously think about changing my dissertation topic.

My son's experiences during the last term of that year provided the final catalyst I needed to change topics. Once again, he was working for an "A" in Language Arts; to get one, the teacher had told the students that they needed to achieve a score of, at least, 500 points on their Reader's Response sheets. Well, my son — bolstered by his bonus point strategy — worked hard, handed in the required five out of a possible seven Reader's Response sheets by the day
It seems that, because additional time had been allotted to them, some of the other children in my son's class had decided to complete more than the five required response sheets and had been able to amass a total of 700 points each. As a result, my son was told by his teacher that he probably would not be getting an " A " after all. By this time he didn't seem to be quite as concerned about his letter grade, he was, however, once again concerned about Julie. "But what about Julie ? " he asked. "She worked really hard and thought that she would be getting a 'C', but now she's probably going to get a 'D'. She was crying." I had to wonder about Julie's fate. Could she have already told her parents that she was going to get a " C " on her report card? Would there be any consequences for her if there was a " D " on her report card instead of the expected "C"? What did a letter grade mean in my son's class, anyway? The primary purpose of a report of student progress is to communicate to the students, and their parents or guardians, information about student learning — about their performance, or achievement, in school (BCME, 1994a). Yet, how was I to interpret the letter grades assigned to my child on his report card? How did other parents interpret the letter grades on their children's report cards? How did the students in my son's class interpret their letter grades? What did the teacher intend a letter grade to mean? How do teachers learn how to assess and evaluate student performance in school? How do they decide which aspects of student performance to assess and evaluate for the purpose of grading? With these, and many more, questions motivating me, I began reading research on grading and letter grades, and began the long journey toward the completion of a study that will, hopefully, help to provide insight into some of these questions — a study that will explore grading, reporting, and letter grades from the perspectives of teachers, students, and parents. 15 A North American Perspective Introduction In addition to my own personal experiences, several relatively recent events in education help frame this study. Some of these events originated in the United States (U.S.A.) (e.g., National Commission on Excellence in Education [NCEE], 1983) and apply more to education in the U.S.A., however, I discuss them in this section because, due to its vast size and close proximity, American educational policies and practices have often influenced education in Canada (Tomkins, 1979). Many Canadian educators complete their graduate work at American universities and return to Canada with American ideas, while American experts are often commissioned to do research and to help with Canadian educational problems. As Tomkins concluded, "the professionalism of Canadian educational theory and practice ... [has become] ... in essence Americanization" (1979, p. 8). Rather than responding in its own way to its own problems, Canada has often viewed its problems as mirror images of American ones, resulting in the "indiscriminate importation of American curriculum materials and projects that naturally enough ... [deal] ... with American social problems" (Tomkins, 1981, p. 164). Because developments in American education tend to affect education throughout North America, they cannot be ignored and must be considered to be part of the context of this study. Other events, like the 1988 Royal Commission on Education (Sullivan, 1988), originated in B.C. and have a distinctly B.C. flavor. 
These I discuss in the next section.

American Educational Concerns of the 1950s and the 1960s

On October 4, 1957, the then Soviet Union successfully launched the Sputnik satellite. As a result, science and technology education became a "national priority" in the United States (Duschl, 1990). Money poured into American education, and many curriculum projects were launched (Worthen & Sanders, 1987). Many of those projects, and the materials developed for them, were imported from the U.S.A. into B.C. and the rest of Canada.

During the 1960s, Americans were concerned not only with their place in the world vis-à-vis science and technology, they were also concerned with the civil rights of their citizens and the educational opportunities of minority children (Worthen & Sanders, 1987). As a result of the 1964 Civil Rights Act and the 1965 Coleman Report, even more money was devoted to American education. Eventually, government concern that "money authorized for education would be spent as intended" (Worthen & Sanders, 1987, p. 17) led to legislation mandating educational evaluation and a new emphasis on educational accountability through student testing (Popham, 1990).

Educational Reform, Accountability, and Standardized Testing in the 1970s and 1980s

In the 1970s, reports that Scholastic Aptitude Test (SAT) scores were declining in the U.S.A., and that many students were graduating from secondary schools unable to read or write, led some states to institute minimum competency testing programs (Brandt, 1989). Then, in 1983, the National Commission on Excellence in Education published its report, A Nation at Risk (NCEE, 1983), which, according to Berlak (1992), "told the American public that a major cause, if not the major cause, of America's fall from grace as the world's pre-eminent economic power was the failure of the nation's schools to educate a competent, dedicated work force" (p. 2). In response to this criticism, the educational reform movement moved into high gear, and state-wide testing programs were implemented for the purpose of assessing student progress in school. By the end of the 1980s, "virtually every state had instituted a combination of top-down measures intended to raise educational standards ... by far the most common measure [was] statewide testing programs throughout the grades" (Berlak, 1992, pp. 4-5), and the U.S.A. had, what some called, a "culture of testing" (e.g., Kleinsasser, Horsch, & Tastad, 1993; Wolf, Bixby, Glenn, & Gardner, 1991). Inevitably, as newspapers began to publish the results of these testing programs on a district-by-district and a school-by-school basis, the scores generated by standardized testing programs, designed and implemented for the purpose of making student promotion and graduation decisions (Worthen, 1993a), were also used for purposes other than those for which they had been intended (Popham, 1990). The extent of the impact of this culture of testing is reflected in this passage written by Shepard in 1989:

In the United States today, standardized testing is running amok. Newspapers rank schools and districts by their test scores. Real estate agents use test scores to identify the "best" schools as selling points for expensive housing. Superintendents can be fired for low scores, and teachers can receive merit pay for high scores. Superintendents exhort principals and principals admonish teachers to raise test scores — rather than to increase learning. (p. 4)
By the mid-1980s, many, if not most, administrators, teachers, and students in the United States had become participants in "high-stakes" testing programs as standardized tests were used to assess students, teachers, administrators, and educational programs, and test results were being used in a variety of ways not originally intended by their developers.

The situation in Canada during the mid-1980s, however, appears to have been somewhat different from that in the U.S.A. — Canada did not embrace standardized testing to the same extent as the United States had (McLean, 1985). Although both B.C. and Alberta re-instated provincial Grade 12 examinations during the 1980s, and standardized testing was common at both the district and provincial levels in Canada, McLean concluded that the results of standardized tests were not widely used throughout the country. That may have been the case when McLean examined Canadian student evaluation practices during the mid-1980s; by the mid-1990s, however, the results of standardized testing were indeed being used more extensively — and used in a manner similar to that observed by Shepard in 1989. For example, in February of 1996, one B.C. newspaper, The Province, published a series of articles that linked school success to standardized test results (Austin, 1996a, 1996b, 1996c, 1996d; Proctor, 1996a, 1996b, 1996c). At the same time, the newspaper published lists that "ranked" schools on the basis of the 1995 B.C. Grade 12 provincial exam results ("Exam Results School by School," 1996a, 1996b, 1996c). For each of several subjects — Chemistry, English, English Literature, Geography, History, and Physics — all B.C. public secondary schools were ranked on the basis of the average score obtained by their students on the 1995 Grade 12 provincial exam. Also included in the list were the number of students who had written the exam for each subject and the district to which the school belonged. Accompanying the ranked list was the explanation:

The figures on this page show the average marks in last year's [1995] Grade 12 provincial exams for all British Columbia public high schools. The Province [newspaper] obtained the results from the Ministry of Education for each subject and each school. To ensure we compared only high school marks, we removed continuing education and correspondence courses from the list. The Ministry uses these same figures as one of the bench marks to judge how schools compare. The exam results count for 40 per cent (sic) of the student's total marks. ("Exam Results School by School," 1996a, 1996b, 1996c)

The Grade 12 provincial exam results, therefore, were used not only by a provincial newspaper to rank B.C. schools but, according to the newspaper, exam results were also used by the BCME "as benchmarks to judge how schools compare" — a discomforting thought given that the provincial exams were developed for the purpose of assessing student achievement, not to evaluate programs or schools. Just as standardized test results have been used for purposes other than those for which they were intended in the U.S.A., so have they been in B.C. It would seem that, once again, the influence of American educational policies and practices has extended into Canada, or at least into B.C.

Publishing lists of schools ranked by standardized test results, early in 1996, had definite consequences for people working and learning within the B.C. public education system.
For some, the consequences were positive — how could the students and teachers of the schools profiled in articles such as 'We're the best' (Austin, 1996d), Coach ignites classes: Cheri Smith heads the hottest science department in B.C. (Austin, 1996a), or Kits all-round best: Athletics and the arts are equally important (Proctor, 1996c) feel anything other than immense pride in themselves? For others, however, the consequences were negative. Gene Darreth, a secondary school teacher, articulated some of the negative consequences of ranking schools by test results in a letter he sent to The Province. He wrote:

I commend your paper for publishing some very rare positive views about school teachers. But to blindly rate the schools as though education were some great competition is pathetic and damaging. ... We are a very small school in a community with severe social and economic problems. Teachers at this school struggle every day to build hope, self-esteem and pride in native cultural history amid a very difficult social situation in which alcohol, drugs, suicide and family abuse have had a very damaging effect on almost everyone. When our students saw their school rated last in a major newspaper, it made them feel worthless, embarrassed and hopeless. How could our students be judged against students whose reality and opportunity are totally different? It was brutally unfair to coldly list us all without taking into account the many variables of the schools and communities involved. (Darreth, 1996)

There are consequences when standardized testing programs are instituted. Some consequences are intended — student achievement is assessed and decisions about promotion or graduation are made based on test results. Other consequences are unintended. Some unintended consequences may be perceived as positive, as is the case when schools receive accolades for the high scores of their students, or they may be perceived as negative, as is the case when schools are ranked at the bottom of the list due to low test scores. Like their American counterparts, educators and students in B.C. are now participants in a "high-stakes" testing program — the Grade 12 Provincial Examination Program — the results of which are used not only to assess student achievement but to rank schools by students' test scores, as well.

Testing and Assessment Under Review

The increased reliance on testing in education in the U.S.A. during the 1970s and 1980s encouraged researchers to identify and address some of the problems associated with it. Cannell's 1988 article described the "Lake Wobegon Effect," in which he asserted that "standardized, nationally normed achievement tests give children, parents, school systems, legislatures, and the press misleading reports on achievement levels ... [because] ... these tests allow all the states to claim to be above the national average!" (p. 6). Cannell (1988) argued that the main reason more than 50% of American students scored above average on standardized tests was out-of-date test norms. As he observed, "An above-average score does not mean that the average student or the district or the state is above the current year's average. It means only that the score achieved is better than the mean score achieved by the norm group in years past" (p. 7). This revelation helped to touch off debates that focused on issues such as the adequacy of national norming samples (e.g., Lenke & Keene, 1988; Linn, Graue, & Sanders, 1990; Phillips & Finn, Jr., 1988;
Shepard, 1990); the kinds of inferences that can be made based on test scores (e.g., Lenke & Keene, 1988); test-wiseness (Carter, 1986; Rogers & Bateson, 1991); and appropriate test preparation strategies (e.g., Mehrens & Kaminski, 1989; Popham, 1991; Shepard, 1990). As the 1980s ended and the 1990s began, researchers continued to address a variety of other issues associated with testing and the assessment of academic achievement including: bias in test use (e.g., Cole & Moss, 1989; Linn & Drasgow, 1987); teachers' classroom assessment practices (e.g., Bachor & Anderson, 1991, 1993a, 1993b; Bateson, 1990; Brookhart, 1992, 1994; Olson, 1990; Stiggins, 1989; Stiggins & Conklin, 1992; Stiggins, Frisbie, & Griswold, 1989; Wilson, 1989, 1990, 1992); and the impact of testing on students, teaching, and learning (e.g., Airasian, 1988; Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Brandt, 1989; Crooks, 1988; Jervis, 1989; Nolen, Haladyna, & Haas, 1992; Smith, 1991). Research indicating that "high-stakes" testing helps determine how and what teachers actually teach (e.g., "Interview with Lorrie Shepard," 1991; Nolen et al., 1992; Smith, 1991) and what students will study for (e.g., Crooks, 1988) led some to argue that testing should "once again serve teaching and learning" (Wiggins, 1989b, p. 41).

In 1987, concern about the quality of student assessment devices and the use made of them by educators led representatives from several American professional education associations involved in teaching, teacher education, and student assessment to collaborate to develop standards for teacher competence in student assessment (Sanders, 1989, p. 25). The outcome of this collaboration was the publication of the document Standards for Teacher Competence in Educational Assessment of Students (American Federation of Teachers, National Council on Measurement in Education, National Education Association, 1990). The committee that produced this booklet believed that good student assessment is essential to good teaching, and that both practicing and preservice teachers should be trained to develop the competencies advocated by the seven standards.

Around the same time as the American standards were published, representatives from a number of Canadian professional education organizations and provincial governments began working together to produce a similar document for Canada. Published in 1993, the booklet Principles for Fair Student Assessment Practices for Education in Canada (Principles for Fair Student Assessment, 1993) presented nine different principles, and their related guidelines, that identified "the issues to consider in exercising professional judgment and in striving for the fair and equitable assessment of all students" (p. 3). Organized into two parts, the booklet presented five principles relevant to assessments carried out by teachers and four relevant to standardized assessments.

While some concerned educators developed standards and principles to guide teachers' assessment practices, others argued that teachers should replace traditional tests with other forms of assessment. For example, in their often-cited 1988 publication, Archbald and Newmann criticized the use of traditional standardized tests for the purpose of assessing student progress on the grounds that such tests "communicate very little about the quality or substance of students' specific accomplishments ...
[and because] the type of learning actually measured is often considered trivial, meaningless, and contrived by students and adult authorities" (p. 1). They contended that "a valid assessment system provides information about the particular tasks on which students succeed or fail, but more important, it also presents tasks that are worthwhile, significant, and meaningful — in short, authentic [emphasis in original]" (p. 1). Archbald and Newmann were not alone in their belief that assessments should be "authentic"; many other authors advocated the use of authentic assessment methods to assess student achievement (e.g., Berlak, Newmann, Adams, Archbald, Burgess, Raven & Romberg, 1992; McLean, 1990; Perrone, 1991; Shepard, 1989; Wasserman, 1993; Wiggins, 1989a, 1992, 1993; Wolf et al., 1991). Referred to variously as "direct assessment", "performance assessment", and "alternative assessment", authentic assessment is viewed by its advocates as an alternative to traditional paper-and-pencil tests because it involves the "direct examination of student performance on significant tasks that are relevant to life outside of school [emphasis in original]" (Worthen, 1993a).

As interest in authentic assessment increased, researchers began to address a number of issues pertaining to it including: the conditions needed to implement authentic assessment (Worthen, 1993b); its cost (Madaus & Kellaghan, 1993; Popham, 1993); its time requirements (Madaus & Kellaghan, 1993); its practicability (Madaus & Kellaghan, 1993); its effect on teachers and students (Madaus & Kellaghan, 1993); the value of the information elicited by it about students (Madaus & Kellaghan, 1993); validity issues as they apply to authentic assessment (Burger, 1994; Wiggins, 1993); and the generalizability of scores from performance-based assessments (Brennan & Johnson, 1995; Linn & Burton, 1994).

In B.C., the BCME advocated authentic assessment methods in its publication Reform of Assessment, Evaluation, and Reporting for Individual Learners: A Discussion Paper (BCME, 1992a). Authentic assessment has also influenced educational measurement literature and science education (e.g., Pine, Baxter, & Shavelson, 1991; Shavelson & Baxter, 1992) in recent years. Bateson (1992) suggested that the authentic assessment movement had initiated a paradigm shift in educational measurement. It is not yet possible to ascertain whether or not authentic assessment will truly revolutionize teachers' assessment practices, however, because there are many who believe in the continuing efficacy of standardized testing programs and other traditional assessment methods. Nevertheless, there is no doubt that the authentic assessment movement has influenced policy and practice in education throughout North America. Whatever its longevity, it is necessary to consider developments in the authentic assessment movement as part of the context of this study.

A Uniquely British Columbian Perspective

The Sullivan Royal Commission and its Legacy for Learners

Here in B.C., a major influence — if not the major influence — on education during the late 1980s and early 1990s was the 1988 British Columbia Royal Commission on Education (Sullivan, 1988), or as it is more commonly known, the Sullivan Commission. Initiated in March 1987, the Sullivan Commission took place at a time when birth rates were low and the population was aging — "two social factors with profound meaning for educational social planning" (p. 2).
It was also a time of social change as people in B.C. began to consider individual, minority, cultural, and language rights issues. The Sullivan Commission was initiated because social changes had "significant implications for provincial schools and educational policies in general ... [and it was] recognized that a reassessment was necessary if we are to look forward to the future with confidence" (p. 2).

There were three main purposes to the Sullivan Commission: the first, and primary, purpose was "to listen to what the people of British Columbia had to say about education and the school's role in the educational process" (Sullivan, 1988, p. 5); the second was to initiate "a series of research studies to examine some fundamental and vexing issues relating to the provision of educational services in the province" (p. 5); and the third purpose, based on the information generated as a result of the first two purposes, was "to develop and present a coherent understanding of the school's role in British Columbia society today and the meaning of education to a provincial community that is experiencing important social and economic changes" (p. 5). The Commission sought to understand the strengths and weaknesses of the B.C. educational system at that time. However, it did not conduct an audit of school productivity and efficiency.

Members of the Sullivan Commission traveled throughout B.C. listening to the people of the province talk about education. They traveled more than 24 000 km and visited 139 schools in 89 communities. They held 66 public hearings and attended 54 meetings with teachers and 23 student assemblies. People were invited to give both oral and written submissions. In the end, the "Commission received almost 2 350 written and oral submissions from individuals and groups in all parts of the province [attesting] to the public's great interest in schooling in British Columbia" (Sullivan, 1988, p. 3). The Commission's report, The Report of the Royal Commission on Education, 1988: A Legacy for Learners (Sullivan, 1988), contained 83 recommendations, several of which focused on student assessment and reporting.

The response of the Social Credit government of the day to the recommendations of the Sullivan Commission was "presented in A Mandate for the School System and in Policy Directions, and was given legislative form in the School Act [of September, 1989]" (BCME, 1990b, p. 3b). The booklet Year 2000: Framework for Learning (BCME, 1990b) delineated the BCME's policies as they pertained to the organization of educational services for students. Although it was not a curriculum guide, this booklet outlined the direction education was expected to take in B.C. The changes proposed for the provincial educational system were vast and of the kind Cuban (1988) referred to as "second-order changes" (p. 342). They were changes intended to alter the fundamental organization of schools in B.C., and to influence the ways teachers taught students, and assessed, evaluated, and reported student progress. The changes proposed by the BCME were designed to address changes in society, to incorporate new knowledge about how people learn, and to help schools better meet B.C.'s educational goal — the intellectual, human and social, and career development of students.

The key principles of learning articulated in the booklet Year 2000: Framework For Learning (BCME, 1990b) served as the foundation for the principles of curriculum and assessment.
As stated by the BCME (1990b), the key principles of learning were: "learning requires the active participation of the learner"; "people learn in a variety of ways and at different rates"; and "learning is both an individual and a social process" (BCME, 1990b, pp. 7-8). Following from these key principles of learning, the BCME asserted that curriculum and assessment should be "learner focused". The Year 2000: A Framework For Learning (BCME, 1990b) stated that:

learner-focused curriculum and assessment is developmentally appropriate, allows for continuous learning, provides for self direction, meets the individual learning needs of students as much as possible, and deals with matters of relevance to learners [emphasis in original]. (p. 9)

To accommodate learner-focused curriculum and assessment, new provincial programs were defined for B.C., and the 13 years of schooling were divided into three programs: the Primary Program (Grades K to 3), the Intermediate Program (Grades 4 to 10), and the Graduation Program (Grades 11 and 12).

The BCME published a number of documents to explain the Year 2000 program (e.g., BCME, 1990a, 1992a, 1992b, 1992c, 1992d). Particularly relevant to this study were three documents written about the changes to be implemented at the junior secondary level: The Intermediate Program: Foundations (BCME, 1992d), Supporting Learning: Understanding and Assessing the Progress of Children in the Primary Program. A Resource for Parents and Teachers (BCME, 1992b), and Reform of Assessment, Evaluation, and Reporting for Individual Learners: A Discussion Paper (BCME, 1992a).

Assessment, Evaluation, and Reporting Under the Year 2000 Program

The Year 2000 documents encouraged educators to move away from a heavy reliance on paper-and-pencil testing to more authentic methods (BCME, 1992b); they encouraged students, parents, and teachers to contribute to and participate in the assessment process; and they explained that letter grades were no longer to be used for reporting children's progress at the primary level because "letter grades were dependent on teacher and parent interpretation and often focused on surface knowledge rather than understanding" (BCME, 1992b, p. 10). As an alternative, the BCME planned "to move away from the sole reliance on letter grades and to move towards the use of descriptions and examples of student performance" (BCME, 1992a, p. 10), and primary teachers were expected to use samples of student work and anecdotal reports to communicate information about what students are learning and can do.

With respect to the Intermediate Program, The Intermediate Program: Foundations (BCME, 1992d) maintained that assessment, evaluation, and reporting should be embedded in the learning environment and enhance student learning. Assessment, evaluation, and reporting were defined by the BCME as follows:

Assessment involves gathering information about students' experiences and about what they have learned. Evaluation involves interpreting that information in order to make further curricular decisions. Reporting refers to the communicating of student learning to students and parents/guardians. (BCME, 1992d, p. 103)

The BCME viewed assessment, evaluation, and reporting as "part of an ongoing process of collaboration between students and teachers, that enables them to identify what was learned, how students work best, and how they can improve" (BCME, 1992d, p. 103),
and intermediate teachers were expected to "seek authentic evidence of what students can do, determining their strengths and needs through a variety of methods" (BCME, 1992d, p. 103). Moreover, the BCME took the position that symbols (i.e., letter grades) do not provide enough information about a child's progress in school because they do not explain what students are learning, and how they are developing and performing; as a consequence, the BCME decided to replace letter grades with anecdotal reports at the Intermediate level.

Public Reaction to the Year 2000

As might be expected, the Year 2000 changes were not met with enthusiasm by all British Columbians. People criticized the program because it promoted continuous progress (e.g., Balcom, 1993a, 1993b), downplayed competition in the schools (e.g., Balcom, 1993a; Foulds, 1993), and used anecdotal report cards (e.g., Balcom, 1993a, 1993b). According to Moira Baxter, the president of the B.C. Confederation of Parent Advisory Councils, parents were most concerned about assessment and reporting under the Year 2000 program (Balcom, 1993a). These newspaper excerpts reflect some of the concerns expressed by people in B.C. at the time:

Byron Price has never been a teacher. Nor has he been a parent. But as president of the newly formed Society For The Advancement of Education, he has a lot to say about the quality of education in B.C. schools. Most of it isn't good. ... Price believes most parents want to know how their children are doing compared to other students in the class. But it's getting tougher to find out. At the primary level, report cards don't show marks anymore, only anecdotal comments about a child's progress. (Balcom [Sun staff writer], 1993a)

As a parent, I'm frustrated because 'anecdotal reporting' does not tell me how my children are doing in relation to their peers. While the ministry of education tells me "this is not important," I feel it is. (McCormick [parent], 1993)

The ease with which children can fall behind their classmates is a major concern for parents. So too is the assessment and reporting process, especially anecdotal report cards. The carefully couched phrases are too vague, they say. Teachers complain it takes far more time to write individual comments for each child than it does to record letter grades. (Balcom [Sun staff writer], 1993d)

As the above excerpts show, some British Columbians did not like anecdotal report cards because they felt that they were too vague and that they did not indicate how well their children were doing in school compared to their peers. Not all people agreed with the opinions expressed above, however. The following excerpts represent some other points of view:

Primary school parents and conservative-minded educators who want the schools to award letter grades instead of anecdotal reports, as provided for in the Year 2000 program, are really wanting the schools to pick out winners and losers. ... Letter grades tell some children they are successful learners and tell others they are failures as learners. ... Shame on parents and educators who want schools in which failure is used in an absurd, futile attempt to promote learning. Our schools should not be in the business of picking winners and losers. (Young [retired school principal], 1993)

As a parent, I don't want to go back to letter grades in elementary school. An A, B, or C doesn't tell me nearly enough.
I want to know my children's strengths and weaknesses, their accomplishments and where they are having difficulties. ... I want to know especially what the teachers' expectations are for children of the same age and, in general, how my children measure up against those guidelines. ... My son's last report card told me — with lots of examples — what the teacher expected of the class that year. And I got a clear idea of exactly how well my son did. More teacher training seems the answer rather than going back to simple letter grades. (Roberts [parent], 1993)

I read Vaughn Palmer's Nov. 23 attack on 'edu-crats' and non-graded report cards only a few hours before seeing my daughter's first grade report card (A primer on the premier and the edu-crats). Those three pages of carefully considered narrative — I'm referring to the report card, not Mr. Palmer's column — told me more about my daughter's progress and her adjustment to the classroom than any list of arbitrary letter grades ever could. (Smith [parent], 1993)

The people quoted above supported anecdotal report cards because they believed that written narratives that described what their child can do, is interested in, and needs help with provided a clearer picture about their child's progress in school than did letter grades. The debate over the strengths and weaknesses of the Year 2000 program, in general, and over assessment and reporting, in particular, played out in the media throughout 1993.

Government Reaction to Public Pressure

So much pressure was put on the NDP government of the time that, late in the summer of 1993, it was announced that the Year 2000 program was to be reviewed and modified:

[Premier Mike] Harcourt says he's asked the minister of education for a full report on the problems plaguing education by Sept. 30. ... "Not only are changes needed but they're going to come quickly," he promised. "Parents want to be able to read a report card without having to have a Ph.D. The whole question of standards and evaluations is actively being pursued." (Balcom, 1993c, p. A1)

Premier Mike Harcourt says he will ensure much of the controversial Year 2000 program for education reform is jettisoned or overhauled. ... Harcourt said Tuesday the program will be replaced by a list of new reforms headed by a renewed emphasis on teaching basic skills and a system that tells parents in clear terms how well their kids are progressing. (Baldry, 1993)

... Mr. Harcourt cut adrift the Year 2000 program, which his government had been defending as a necessary reform to the system. "To put it bluntly, the report card is in on the Year 2000 and it's failed the grade," he said. "There are going to have to be quite substantial changes." ... By "substantive," the premier apparently meant a return to standards, testing, report cards, letter grades and clearly delineated classes. He didn't quite say that the Year 2000, once touted as the future, had become history, but he came close. (Palmer, 1993a)

The government had responded to the public's concerns about how student progress was assessed and reported, and promised to make changes. On November 16, 1993, those changes were announced. Of particular relevance to this study were the changes made to assessment and reporting policies. The media reported:

B.C. schools will keep what the government says is the best of the controversial Year 2000 program, and trash the rest. ...
In his long-awaited report to Premier Mike Harcourt on the future of education reform in B.C., Education Minister Art Charbonneau dumps anecdotal report cards and replaces them with "structured written" reports, effective September, 1994. ... He also reinstates letter grades, beginning in Grade 4, and erases a potential pitfall of Year 2000's emphasis on child-directed learning: that some students might float from grade to grade without completing their previous year's work. (Balcom, 1993e)

After much public debate, the government decided not to implement much of the Year 2000 program, and to replace the short-lived anecdotal report cards with "structured written" reports and letter grades.

In December 1993, the BCME spelled out the changes announced in November 1993 in the document The Intermediate Program Policy: Grades 4 to 10 (BCME, 1993d). Several policy statements concerning "student progress reports" and "letter grades and symbols" were presented in this document, including:

In Grades 4 to 7: teachers will prepare structured written comments and assign letter grades [italics added] ... school districts will decide how letter grades are to be communicated to parents; for example, letter grades could be included with the written comments in the report card or shared at a parent-teachers' conference. (p. 12)

Students from Grades 8 to 10: will receive letter grades [italics added] with whatever structured written comments are required to inform parents. Reporting will include: assignment of letter grades based on student achievement in relation to criteria and standards established for each subject or course ... increased use of self-assessment, peer assessments, portfolios and student-led conferences to supplement the reporting process ... use of informal reports such as telephone calls, student-parent-teacher conferences and journals. (p. 12)

The letter grades [italics added] for use in the Intermediate Program on term and final student progress reports will be A, B, C or the symbol IP. The letter grades will help describe what a student is able to do in relationship to expected outcomes. (p. 12)

It is interesting to note that the BCME revised its reporting policy and reinstated letter grades because parents complained that anecdotal reports did not tell them how their children were doing compared to their peers, a comparison that requires norm-referenced evaluation. The 1993 policy, however, clearly states that student evaluation is to be criterion-referenced and not norm-referenced. As such, it may not address the concerns of those parents who want to know how their child is doing compared to the other students in the class.

Parents who called for letter grades that showed how well their children had done in relation to other students may have done so because that is how they had been graded when they were students. After all, prior to 1981, the BCME had described letter grades in norm-referenced terms (e.g., above average, average). Parents who wanted norm-referenced letter grades were possibly unaware of the fact that, as of 1981, the BCME had described letter grades in criterion-referenced terms. The BCME's 1986 version of the Administrative Handbook for Elementary and Secondary Schools (Administrative Handbook) (BCME, 1986), for example, gave the following definitions of the letter grades "A", "B", and "C":
A = Excellent achievement. The student has achieved excellent performance for the subject/course/grade/level and is considered fully capable of handling subsequent work with ease at a superior level of performance.

B = Very good achievement. The student has achieved very good performance for the subject/course/grade/level and is considered fully capable of handling subsequent work with ease at a good level of performance.

C = Satisfactory achievement. The student has achieved the basic standard of performance widely expected for the subject/course/grade/level and is considered capable of handling subsequent work. (BCME, 1986, pp. 67-70)

As these examples show, a letter grade was expected to show a student's progress in comparison with the widely held expectations for the subject/course/grade/level at which s/he is working. (This is the definition that was in place when the data were collected for this study.) The Administrative Handbook (BCME, 1986) also provided the reported range of scores associated with each letter grade. Figure 1 displays the letter grades to be assigned at the end of a course and their associated reported range of scores from the document.

Level of Achievement      Reported Range of Scores
A (Excellent)             86-100
B (Very Good)             73-85
C+ (Satisfactory)         67-72
C (Satisfactory)          60-66
P (Passing)               50-59
F (Failing)               below 50

Figure 1. Reported range of scores associated with each level of achievement for end of course grades. Source: Administrative Handbook (BCME, 1986).

It is important to understand what the BCME meant by this table. As the following excerpt shows, the BCME did not intend a teacher to assign a letter grade based on a student's percentage; rather, it expected a teacher to assign a percentage score for reporting purposes, based on the letter grade that corresponds to a student's level of achievement:

It should be noted that it is not the Ministry's intent to set pre-determined percentages that students must obtain in order to attain certain letter grades. The intent, rather, is to standardize the reporting of different levels of achievement. Teachers may require different percentages for letter grades during the school year according to the difficulty of tests and other considerations. On the final report, however, "A" level achievement (as determined by the teacher during the year) is to be reported in the 86% to 100% range, thereby standardizing achievement reporting. The following material outlines a method that could be used to standardize the reporting of achievement levels. This method acknowledges the importance of teacher judgment in assessing student performance. A teacher's evaluation of a student takes place over an entire semester or year. If, as a result of that evaluation, the teacher's assessment is that the student has demonstrated (for example) an excellent level of achievement, then a score in the 86-100 range ("A" level achievement) should be reported as the school mark. Teachers might wish to use the following procedures in assigning percentages.

1. Identify students by groups with respect to achievement (i.e., the "A" group, the "B" group, etc.)
2. Within each group, rank students according to their measured achievement during the course.
3. Assign each student a percentage in the appropriate range for each group's achievement. (BCME, 1986, pp. 70-71)
When a teacher assigns letter grades as the BCME intended, s/he is required to make a judgment about a student's performance first, and then assign a percentage score to represent that judgment. Yet, in my experience, this is not how teachers determine letter grades — I know that it is not how I assigned letter grades, nor is it how my sons' teachers assign letter grades. In most cases, teachers calculate an overall percentage score for a student for the reporting period and then convert that percentage into a letter grade according to the ranges shown in Figure 1. The BCME, however, expects a teacher to judge a student's level of achievement compared to the widely held expectations for Science 9, assign a letter grade based on that level of achievement, and then assign a percentage in the appropriate range for that achievement (e.g., a student who has demonstrated a "very good" level of achievement — "B" level achievement — is assigned a percentage in the 73% to 85% range). Given that teachers often explain to students and parents how they calculated a letter grade based on the student's percentage for the term, it is not surprising that students and parents are not aware of the BCME's expectations and believe that letter grades are to be assigned based on a percentage score for the term.

When the government announced its new policy in November 1993, the meanings of letter grades were given as follows:

A — excellent or outstanding achievement in relation to expected learning outcomes.
B — very good achievement in relation to expected learning outcomes.
C — satisfactory achievement in relation to expected learning outcomes.
IP [In Progress] — expected learning outcomes not achieved and further development required. (BCME, 1993d, p. 12)

Although similar to the system that had been in place prior to the Sullivan Commission, the new policy replaced the grades "D" and "E" with "IP", and defined the letter grades somewhat differently — student achievement was now to be described in relation to "expected learning outcomes" instead of "widely held expectations". (The terms "expected learning outcomes" and "intended learning outcomes" are used interchangeably throughout this document.) The 1993 definitions make it very clear that letter grades are to be criterion-referenced. As is to be expected, reaction to the changes was both positive and negative and, for a short time after the announcement, both criticism (e.g., Palmer, 1993b) and praise (e.g., Balcom, 1993e) were directed at the government.

Several months after the government announced that a new reporting policy was to take effect in September 1994, British Columbians were invited to review and comment on a draft version of the reporting policy presented in the BCME paper Policy for Reporting Student Progress in British Columbia: Kindergarten to Grade 12. Draft for discussion purposes (BCME, 1994c). In September 1994, a newly articulated policy for reporting student progress, along with guidelines for teachers and administrators to use when reporting student progress, was published in the booklet Guidelines for Student Reporting for Kindergarten to Grade 12 (BCME, 1994a). At the same time, two other documents, Report to Parents (BCME, 1994d) and Parents' Guide to Standards (BCME, 1994b), written to explain the new policy to parents, were distributed throughout the province. The policy for reporting student progress adopted in September 1994 was similar to the one proposed by the government in December 1993.
Gone were anecdotal reports; instead, parents were to be provided with a minimum of three formal report cards (structured written reports and letter grades) and two informal reports (e.g., conferences, telephone calls, notes, interim reports) every year (BCME, 1994a). Whether or not students were to receive letter grades on their report cards varied depending on their level in school. The BCME guidelines for assigning letter grades set out for teachers in 1994 — and still in effect today — were as follows:

... The assessment and evaluation of the student's performance demonstrated through the learning activities is collected and recorded. ... The teacher judges the student's overall performance in relation to the outcomes for the unit or term and decides whether the overall performance is outstanding, very good, good, satisfactory, minimally acceptable, progressing but needs more time to complete requirements or not demonstrating minimally acceptable performance. ... The Ministry-approved letter grades that correspond to the level of performance demonstrated by the student are assigned. (BCME, 1994a, p. 20)

As was the case prior to 1993, teachers were expected to determine a student's level of achievement first, and then assign a letter grade based on that level of achievement. The percentages associated with letter grades were only to be included for courses numbered 11 and 12 (BCME, 1994a), and again, a reporting percentage was to be assigned only after the letter grade was determined.

Formal reports prepared for students in Kindergarten to Grade 3 were to include a structured written report, but letter grades were not required. For students in Grades 4 to 7, formal reports were to include structured written reports, and letter grades were to be written on the reports unless the school district chose to communicate letter grades to parents by another method (e.g., parent-teacher conferences). Letter grades, supplemented where appropriate by written comments, were to be written on report cards for students in Grades 8 to 12. In addition, students enrolled in Grade 11 and Grade 12 courses were to have percentages recorded on their report cards.

The BCME stipulated that letter grades assigned by teachers were to be "criterion-referenced" because "criterion-referenced letter grades in Grades 4 to 12 indicate students' levels of performance as they relate to the expected learning outcomes set out in provincial curriculum guides for each subject or course and grade" (BCME, 1994a, p. 4). Teachers in B.C., therefore, were expected to evaluate a student's performance by comparing it to "established criteria rather than to the performance of other students" (BCME, 1994a, p. 14), as is the case in norm-referenced evaluation. The BCME took the position that, because norm-referenced evaluation is based on a normal distribution that "shows how achievement in a particular area is distributed over an entire population" (BCME, 1994a, p. 16), it is appropriate for large-scale system analysis, ranking students for scholarships, or diagnosing students' learning difficulties, but not for describing a "student's individual progress ... [because] it compares student achievement to that of others rather than comparing how well a student meets the criteria of a specified set of learning outcomes" (BCME, 1994a, p. 16).
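The procedural difference described above — judging a level of achievement first and only then attaching a reporting percentage, versus computing a percentage first and converting it to a letter grade — can be made concrete with a short sketch. The following Python fragment is purely illustrative and is not drawn from any BCME document; the function names are hypothetical, and the score ranges are those reported in Figure 1 (BCME, 1986).

# Illustrative sketch only (not BCME code): two directions for producing
# a letter grade. The names are hypothetical; the reported ranges come
# from Figure 1 (Administrative Handbook, BCME, 1986).

REPORTED_RANGES = {
    "A": (86, 100),   # Excellent
    "B": (73, 85),    # Very Good
    "C+": (67, 72),   # Satisfactory
    "C": (60, 66),    # Satisfactory
    "P": (50, 59),    # Passing
    "F": (0, 49),     # Failing
}

def percentage_first(term_percentage):
    """Common classroom shortcut: calculate an overall percentage for the
    term, then convert it to the letter grade whose range contains it."""
    for letter, (low, high) in REPORTED_RANGES.items():
        if low <= term_percentage <= high:
            return letter
    raise ValueError("percentage must be between 0 and 100")

def judgment_first(judged_letter):
    """The Ministry's intended direction: the teacher first judges the
    student's level of achievement against the expectations for the
    course, assigns the corresponding letter grade, and only then selects
    a reporting percentage from within that grade's range."""
    low, high = REPORTED_RANGES[judged_letter]
    return judged_letter, (low, high)

# A student averaging 84% becomes a "B" under the shortcut ...
print(percentage_first(84))        # -> B
# ... whereas a teacher who judges the work "excellent" reports an "A"
# and then chooses a reporting percentage between 86 and 100.
print(judgment_first("A"))         # -> ('A', (86, 100))

The two functions can disagree in practice: a teacher who judges a body of work "excellent" reports an "A" regardless of whether the term average happens to fall below 86%, which is precisely the distinction between the Ministry's intent and the common percentage-conversion practice discussed above.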
In December 1993, the BCME announced that letter grades would be restricted to the symbols "A", "B", "C", and "IP"; yet, when the new reporting policy was officially introduced in September 1994, the approved letter grades were:

A The student demonstrates excellent or outstanding performance in relation to the expected learning outcomes for the course or subject and grade.

B The student demonstrates very good performance in relation to the expected learning outcomes for the course or subject and grade.

C+ The student demonstrates good performance in relation to the expected learning outcomes for the course or subject and grade.

C The student demonstrates satisfactory performance in relation to the expected learning outcomes for the course or subject and grade.

C- The student demonstrates minimally acceptable performance in relation to the expected learning outcomes for the course or subject and grade.

IP (In Progress). The student is making progress, but it has been determined that additional time is required to meet the expected learning outcomes for the course or subject and grade. Guidelines for assigning an IP must be followed. Expectations and timelines must be attached for each assigned IP.

F Failed or failing. The student has not demonstrated, or is not demonstrating, minimally acceptable performance in relation to the expected learning outcomes for the course or subject and grade. F (Failed) may only be assigned if an IP (In Progress) has been previously assigned. (BCME, 1994a, p. 8)

Originally, the "IP" letter grade was to be optional in the 1994-95 school year but mandatory in the 1995-96 school year (BCME, 1994a). However, in its June 18, 1996 policy circular, the BCME stated that the "IP" letter grade would be implemented in a two-stage process. During the first phase (1995-96 school year), the "IP" letter grade was to be used for Grades 4 to 7; during the second phase (1997-98 school year), for Grades 8 to 12 (BCME, 1996).

In its policy circular of June 10, 1997 (BCME, 1997), the BCME announced revisions to the above letter grade system which were to be implemented as of September 1997. First, teachers were to use the symbol "I" (In Progress or Incomplete) instead of "IP":

"I" will be used to cover broader circumstances than has been the case with "IP". In addition, the requirements when using "I" will be more flexible. Documentation, timelines and administrative details will be determined by school districts. (BCME, 1997, p. 1)

Second, "the old letter grades 'D' and 'E' will no longer be used as of September 1997" (BCME, 1997, p. 1). Consequently, as of September 1997, B.C. teachers were required to use the following letter grade system on formal reports: "A", "B", "C+", "C", "C-", "I", and "F" (for a final grade only).

As previously mentioned, Kindergarten to Grade 3 students were to be issued structured written reports, students in Grades 4 to 7 were to be issued letter grades and structured written reports (comments), and students in Grades 8 to 12 were to be issued letter grades, percentages and, where appropriate, structured written comments. Written comments were expected to describe "what students are able to do; the areas in which students require further attention or development; and ways to support students in their learning" (BCME, 1996, p. 1).
In addition to letter grades and structured written comments, the BCME stipulated that "all formal reports will include a description of student behaviour, attitudes, work habits and effort" (BCME, 1996, p. 2).

Although the debate about the best method for reporting student progress subsided somewhat after the new policy was introduced in the fall of 1994, it has never truly ended, and people continue to publicly express their views about how student progress should and should not be reported to parents in B.C. (e.g., Balabanov, 1996; Young, 1996) — such is the context of education in B.C., and of this study.

Validity and Assessment

Recent developments in measurement specialists' conceptions of validity as it applies to testing and assessment that helped to shape this study are presented in this section. Conceptions of validity have changed over time. Various trends and developments in educational research and testing have influenced how validity has been conceived and how validation research has been conducted. Most notions of validity have been formulated with commercial, standardized tests in mind. Validity as it applies to teachers' classroom assessment practices cannot be overlooked, however, because students spend much more time completing teacher-made tests, and other assessment devices, than they do standardized tests (Crooks, 1988). Conceptions of validity have been formulated as a consequence of the widespread use of standardized tests — rather than as a consequence of teachers' classroom assessment practices — because high-stakes standardized tests enjoy a higher profile and, therefore, receive more criticism than do teacher-made tests. Over the years, test developers and measurement specialists have often reformulated their conceptions of validity to address a variety of concerns associated with standardized tests.

Validity has always been a very important concept in educational and psychological testing, and test developers have generally been expected to provide some sort of validity
When the 1974 version of the Standards for Educational and Psychological Tests (Standards)(American Psychological Association, American Educational Research Association, & National Council on Measurement in Education, 1974) was published to replace the outdated 1966 version, validity was conceived of somewhat differently: Questions of validity are questions of what may properly be inferred from a test score; validity refers to the appropriateness of inferences from test scores or other forms of assessment, (p. 25) As of the 1974 Standards, then, the "inferences" made from "test scores" were to be validated rather than the test itself. New to the 1974 Standards was the section entitled "Standardsfor the use of tests" (p. 56) which "explicitly introduced] concern for bias, adverse impact, and other 38 social consequences of the uses and misuses of tests" (Messick, 1989b, p. 18) — concerns which were becoming prevalent as more and more testing programs were put into place. In the most recent version of the Standards for Educational and Psychological Testing (APA et al., 1985), validity referred "to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores" (p. 9) and it is viewed as a "unitary concept". No longer are three different kinds of validity recognized, instead three "categories of validity evidence" have been identified: "content-related, criterion-related, and construct-related evidence of validity" (p. 9). The Standards maintained that, while "these categories are convenient... the use of category labels does not imply that there are distinct types of validity or that a specific validation strategy is best for each specific inference or test use" (p. 9). The Standards (1985) no longer focus solely on the test developer; rather, they stress that, "although the test developer should supply the needed information, the ultimate responsibility for appropriate test use lies with the user" (p. 3). It is the test user who "should know the purposes of the testing and the probable consequences" (p. 41) and judge a test's appropriateness "in the context of the larger assessment process" (p. 41). More recent conceptions of validity (e.g., Cronbach, 1988; Messick, 1989b; Shepard, 1993) extend the conception outlined in the 1985 Standards and reflect the importance of values and consequences in validation research. Messick (1989a) considered values to be an important consideration in validity inquiry because they influence our theories, our decision to use or not use a test for a particular purpose, the names given to tests, the labels given to constructs, and how we interpret test results (Messick, 1989b). As Messick asserted, "values are intrinsic to the meaning and outcomes of testing" (1989a, p. 10) and other forms of assessment. Messick also stressed the importance of the consequences of test interpretation and use in a number of his articles (e.g., Messick, 1975, 1988, 1989a, 1989b). He suggested that, when a decision is to be made about a particular test use, it is important to ask: "Should the test be used for the proposed purpose?" (Messick, 1975, p. 962). To answer this question "requires an evaluation of the potential consequences of the testing in terms of social values" (Messick, 1975, 39 p. 962). If a test, used for a particular purpose, leads to unintended and undesirable consequences for individuals or for society, then "the intended ends do not provide sufficient justification" (Messick, 1989b, p. 85) for using the test. 
Evidence of social consequences is an important component of Messick's (1989a, 1989b) "unified validity framework". Messick (1989b) defined validity as "an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment [emphasis in original]" (p. 13). This definition reflects what Messick viewed to be the key issues of validity: "the meaning, relevance, and utility of scores, the import or value implications of scores as a basis for action, and the functional worth of scores in terms of the social consequences of their use" (Messick, 1989a, p. 5). Recent conceptions of validity have important implications for teachers' classroom assessment and grading practices. To begin with, because validity refers to the appropriateness, meaningfulness, and usefulness of the inferences drawn from assessment results, "the development or selection of assessment methods for collecting information [about student knowledge, skills, attitudes, and behaviours] should be clearly linked to the purposes for which inferences and decisions are to be mate" (Principles for Fair Student Assessment, 1993, p. 5). That is, teachers must determine the adequacy and appropriateness of the results of each assessment method they employ vis-a-vis the inferences and decisions they are required to make about student progress in school. Moreover, because letter grades reflect a teacher's judgments, or inferences, about students' progress toward the expected learning outcomes of a course based on the results of their assessments, teachers must determine the adequacy and appropriateness of those judgments. In addition, teachers must provide information that will help students and parents to make inferences about student progress in school based on letter grades that are adequate and appropriate, and thereby, ensure that any actions taken as a result of those inferences (e.g., decide to work harder in school, enroll in subsequent courses) are appropriate, meaningful, and useful. 40 Summary In this chapter, I have discussed circumstances in my personal life, events in education in both North America and B.C., and developments in validity theory. They have been included here because my experiences as a teacher, mother, student, and researcher influenced my choice of research topic; affected the decisions I made as I developed my research plan, collected and analyzed the data, and prepared this document; and motivated me to see this study through to the end. 41 CHAPTER 3 RELATED LITERATURE Two areas of literature important to this study are discussed in this chapter. The first section discusses assessing, evaluating (grading), and reporting student progress, while the second reviews research that has investigated teachers' assessment and grading practices. Assessing, Evaluating, and Reporting Student Progress Assessment, evaluation, and reporting of student progress, or achievement, are integral facets of the education process. When student progress in school is appropriately and effectively assessed, evaluated, and reported, teachers, students, parents, and others can monitor student learning and make appropriate educational decisions (BCME, 1994a; Popham, 1995). More detailed discussions of these facets are presented below. 
Assessing Student Progress

Introduction

Whenever teachers collect information, or evidence, about what students know, what students can do, or how students feel about something (i.e., students' attitudes), they are assessing. An assessment method is the strategy or technique that a teacher uses to collect assessment information. Teachers use a variety of assessment methods to collect evidence about students' performance in relation to the expected learning outcomes for a course and grade. Some of these methods are informal while others are formal. For instance, as students complete a lab designed to investigate the freezing and boiling points of water, a science teacher may observe and question the students while they work to determine how well they understand the purpose of the lab, whether they can appropriately and effectively use a thermometer, and whether they understand and can apply laboratory safety rules — in this case, the teacher is informally collecting information about what students know and can do, and may use this information to give immediate feedback to the students so that they can successfully and safely complete the lab.
Data and other information, both anecdotal and numerical, collected through assessment activities are commonly used to monitor student progress (BCME, 1994a; Popham, 1995; Principles for Fair Student Assessment, 1993); diagnose students' strengths and weaknesses (Popham, 1995); group and place students (Stiggins & Conklin, 1992); diagnose individual and group needs (Stiggins & Conklin, 1992); screen students in and out of programs (Stiggins, 1994); generate, 43 or assign, letter grades (Popham, 1995; Stiggins, 1994; Stiggins & Conklin, 1992); determine teacher's own educational effectiveness (Popham, 1995); judge curricular adequacy (Sanders, 1989); and evaluate teachers and other educators (Popham, 1995; Stiggins, 1994). In addition, the assessment process can help define what is valued by the teacher and/or society (Stiggins, 1994), control and motivate students (Ebel, 1979; Stiggins, 1994; Stiggins & Conklin, 1992), and prepare students for later assessment (Stiggins & Conklin, 1992). Students, therefore, are assessed for a number of different purposes. However, as Wilson's (1990) research suggested, the main purpose for assessing students in school, at both the elementary and secondary levels, is for "the generation of marks for reporting purposes" (p. 13) that is, for grading. Whatever its purpose, the strategy or technique used to acquire assessment information "should be appropriate for and compatible with the purpose and context of the assessment" (Principles for Fair Student Assessment, 1993, p. 5). Assessment Methods Over the years, educators have developed numerous strategies and techniques to assess students' progress in school. Included among these methods are teacher-made paper-and-pencil tests, standardized tests, portfolios of student work, observations, checklists, interviews, student self-evaluation, peer evaluation, performance assessments, performance reviews, projects, daily practice assignments, oral questioning, interviews, oral reports, written reports, rating scales, and attitude scales (BCME, 1994a; Popham, 1995; Stiggins, 1994). The strategy or technique used by a teacher depends upon the type of information that is to be collected which, in turn, depends upon the purpose of the assessment. Hence, an assessment method "should be developed or chosen so that the inferences drawn about the knowledge, skills, attitudes, and behaviours possessed by each student are valid and not open to misinterpretation" (Principles for Fair Student Assessment, 1993, p. 5). 44 Because this study does not focus on the mechanics of student assessment, a detailed discussion of the wide variety of assessment tools and methods available to teachers for collecting information about student progress is not included here — in-depth discussions of these assessment methods can be found elsewhere (e.g., Gronlund & Linn, 1990; Stiggins, 1994; Worthen et al., 1993). Grading Student Progress Introduction Grading is the process of assigning a symbol to indicate how well a student has performed in a particular area. The symbol used to summarize a student's performance, or achievement, in a particular area is called a grade — a grade may be a letter or a number and should always be accompanied by a complete definition of that grade or number. A grade communicates a judgment about the general performance of a student, but does not explain how or why that judgment was made. 
Grades, such as letter grades, are used extensively because they reduce a large amount of information about a student's performance in a particular area of school work to a single symbol that can be easily used for administrative purposes and readily interpreted by students and parents (Worthen et al., 1993). Moreover, identically defined grades can be readily and conveniently combined to calculate an overall percentage or grade point average (GPA). The process of grading involves evaluation: whenever a teacher grades a student's performance, she assigns a symbol that indicates her evaluation, or judgment, of that performance in relation to specified standards or criteria (BCME, 1994a; Popham, 1995). The standards used by teachers may vary; however, in B.C., "standards are realistic expectations of what students need to know and be able to do as a result of their education[;]... provincially mandated curriculum guides express these standards as expected 'learning outcomes'" (BCME, 1994a, p. 13).

Purposes of Grades

Although the basic function of grades is to inform students, and their parents, about their progress in school (BCME, 1993b), they are also used by teachers to make instructional decisions; by counselors to help students make educational and vocational decisions; by administrators to make decisions about promotion, academic awards, and scholarships; by other schools and/or post-secondary institutions to make admission decisions; and by some employers to make hiring decisions (Gronlund & Linn, 1990; Worthen et al., 1993). In addition, grades are used to control students' behaviour in class (e.g., "If you don't behave, you won't get an 'A'.") and to motivate students so that they will work harder to achieve higher grades (Kohn, 1994). In most cases, however, the primary purpose of a letter grade is to communicate information about a student's performance, or achievement, in school to the student and his or her parents.

Types of Grading Systems

Several commonly used grading systems are discussed in this section. Key features of each grading system are described, along with some of their strengths and weaknesses.

Letter grades

The most common grading system is the five-point letter grade system (Gronlund & Linn, 1990; Popham, 1990; Worthen et al., 1993). In this system, a student's progress is indicated by a single letter that represents a specific descriptor, or adjective, describing student progress (e.g., "A" = excellent, "B" = good, "C" = fair, "D" = poor, and "F" = fail). Typically, each letter grade is associated with a percentage range. Several variations of this system exist. For instance, teachers may use "+" or "-" signs to increase the number of points on the scale (e.g., C+, C-), or numbers (1, 2, 3, 4, 5) may be used instead of letters (Gronlund & Linn, 1990). The following version of the five-point letter grade system is currently used in B.C.: "A", "B", "C+", "C", "C-", "I", and "F" (the last for the final grade only).8

8 See Chapter 2 for a discussion of the meanings of these letter grades.
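To make the mechanics of such a system concrete, the following minimal Python sketch converts percentage scores to the B.C. letter grades listed above and combines identically defined grades into a simple GPA. The percentage cut-offs and grade point values are assumptions made for illustration only; they are not the official BCME definitions.

```python
# A sketch of a five-point letter grade system like the one described above.
# The cut-offs and grade point values are illustrative assumptions, not the
# official BCME definitions of the B.C. letter grades.

CUTOFFS = [        # (minimum percentage, letter grade), highest first
    (86, "A"),
    (73, "B"),
    (67, "C+"),
    (60, "C"),
    (50, "C-"),
    (0,  "F"),     # "F" is reported for the final grade only; "I" reflects
]                  # teacher judgment and has no percentage cut-off here

GRADE_POINTS = {"A": 4.0, "B": 3.0, "C+": 2.5, "C": 2.0, "C-": 1.0, "F": 0.0}

def percent_to_letter(percent):
    """Convert a percentage score to the corresponding letter grade."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    raise ValueError("percent must be between 0 and 100")

def grade_point_average(letters):
    """Combine identically defined letter grades into a single GPA."""
    return sum(GRADE_POINTS[letter] for letter in letters) / len(letters)

print(percent_to_letter(88))                            # A
print(round(grade_point_average(["A", "B", "C+"]), 2))  # 3.17
```

The sketch shows both the appeal of the system (a single symbol, easily combined) and its cost: the GPA of 3.17 carries no trace of which subjects were strong or weak.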
The popularity of the letter grade system can be attributed to its conciseness and convenience, and to its usefulness in predicting future achievement in school (Gronlund & Linn, 1990; Popham, 1990). Because letter grades have been widely used for a long time throughout North America, teachers are comfortable using them, students and parents are familiar with them and believe they know what they mean, and comparisons among students can readily be made (Bailey & McTighe, 1996). However, the previous discussion about the recent changes to the definitions of these seemingly familiar letter grades in B.C. suggests that these positive aspects may be illusory. Even if they are realized, however, the letter grade system is not without its shortcomings. Gronlund and Linn (1990) identified three: "(1) they typically are a combination of achievement, effort, work habits, and good behavior; (2) the proportion of pupils assigned each letter grade varies from teacher to teacher; [and] (3) they do not indicate a pupil's strengths and weaknesses in learning" (p. 430). In addition, the meaning of letter grades may vary from teacher to teacher, course to course, school to school, or from one time period to another (Hills, 1981). To overcome these shortcomings, letter grades need to be supplemented with other kinds of information, such as written comments or parent-student-teacher conferences (Gronlund & Linn, 1990; BCME, 1992d).

Numerical grading systems

A numerical grading system indicates student progress via a numerical score or a percentage. Numerical grading systems are used because they are easy to calculate and explain, can be easily converted to a letter grade (e.g., 86-100% = A), and "lend an aura of objectivity to student evaluation" (Bailey & McTighe, 1996, p. 125). The most common numerical grading system used in education is the percentage grading system (Worthen et al., 1993). In this system, a student's performance in a particular area is indicated by a number between 0 and 100 "that supposedly reflect[s] the percentage of material the student has mastered" (Worthen et al., 1993, p. 389). This assumption, however, is not necessarily valid: given the variability in the tests and other assessment devices teachers use to collect information about what students know and can do, it would be a rare case in which a student's percentage score actually represented the amount of material mastered (Worthen et al., 1993). Early in this century, the percentage grading system was the most popular way to report student achievement. However, concern that it was difficult to differentiate between the many small increments on the percentage scale (e.g., between 81% and 82%, or 55% and 60%) led to a decrease in its popularity and to the adoption of grading systems with fewer, and larger, increments on their scales (i.e., the five-point letter grade system) (Hills, 1981; Worthen et al., 1993).

Pass-fail system

Another grading system consists of only two categories: "pass" and "fail". Although not as popular as it once was, it was originally implemented for two reasons: first, "it was viewed as a means of avoiding the nonegalitarian aspects of traditional grading procedures" (Cunningham, 1986, p. 184), and second, it allowed "students to take some courses, usually elective courses, under a pass-fail option that is not included in their grade-point average" (Gronlund & Linn, 1990, p. 430), thereby enabling them to explore new, and unfamiliar, areas of study by removing the fear of a possibly lower grade-point average.
A major shortcoming of the pass-fail system lies in the fact that less information is provided about a student's performance than with the traditional letter grade system, because there is no indication of the level of student learning. Because students' records are often considered incomplete when pass-fail grades are assigned, some schools are hesitant to accept students who have completed numerous pass-fail courses (Gronlund & Linn, 1990). In addition, the pass-fail system appears to affect student motivation; that is, students tend to work only for a pass and, as a result, do not show the same kind of achievement as when letter grades are assigned (Gronlund & Linn, 1990; Worthen et al., 1993).

Types of Comparisons Used When Grades Are Assigned

According to Worthen et al. (1993), "all grading systems consider how well a particular student has done in comparison to some standard" (p. 384). There are a number of "bases for comparison"9 commonly used by teachers when grades are assigned. For example, a student's achievement can be compared to the performance of other students (i.e., norm-referenced or relative standards); to pre-specified standards (i.e., criterion-referenced or absolute standards); to their learning ability, or aptitude; to the effort they apply; or to the amount of improvement, or growth, they have shown over a term (BCME, 1979; Cunningham, 1986; Hills, 1981; Gronlund & Linn, 1990; Worthen et al., 1993). The basis for comparison used to assign a letter grade determines the meaning to be attributed to that grade.

Achievement as a basis for comparison

As Stiggins (1994) observed, "[achievement] has long represented the foundation of our grading process" (p. 369). When achievement serves as the basis for comparison10, students who demonstrate that they have learned more get higher grades than those who have learned less. Measurement specialists (e.g., Gronlund & Linn, 1990; Stiggins, 1994; Worthen et al., 1993) agree that grades should be assigned solely on the basis of achievement, and that other factors such as effort or aptitude should be reported on separately. The letter grades assigned to a student can be determined by comparing his or her performance to that of other students (norm-referenced grading), or to absolute standards (criterion-referenced grading). These two bases for comparison are discussed here.

9 A variety of terms are used in the literature for the term "basis for comparison". For example, Hills (1981) used the term "comparison basis" and Gronlund & Linn (1990) used "frame of reference".

10 Some authors (e.g., Gronlund & Linn, 1990) use the term "frame of reference" for the term "basis for comparison".

Achievement relative to other students: Norm-referenced grading

When norm-referenced grading (relative grading) is used to assign letter grades, students' grades are determined by their relative ranking within a group rather than by "some absolute standard of achievement" (Gronlund & Linn, 1990, p. 439). The reference group chosen for comparison may be the students in a Science 9 class, or some other group such as all students enrolled in the same course within the school. Because a student's grade is based on how well s/he has done compared to the rest of the students in the class, it is affected by the individual's performance as well as by the performance of the total group; as a consequence, the basis for comparison can shift according to the makeup of the reference group.
Moreover, because a student's grade depends more on the performance of the other students in the reference group than on his or her own performance, a grade assigned in this way shows neither how well a student has mastered the material nor what the student can do (Gronlund & Linn, 1990; Guskey, 1996; Worthen et al., 1993). Gronlund and Linn (1990) suggested that relative grading is widely used in schools "because much of classroom testing is norm referenced. That is, the tests are designed to rank pupils in order of achievement rather than to describe achievement in absolute terms" (p. 440). When relative grading is used to assign letter grades, the number of As, Bs, Cs, and so on is often determined before any letter grades are assigned. Relative grading is also known as "grading on the curve" because, traditionally, the number of each letter grade has been determined using the normal curve (Worthen et al., 1993). Hills (1981) believed that relative grading is appealing because "it is apparently readily understood by parents, teachers, and administrators, and ... [because] it doesn't require any soul-searching by teachers to determine what standards are appropriate" (p. 291); the numbers speak for themselves without any judgments required. However, as Gronlund and Linn (1990) argued, such grading "is seldom defensible for classroom groups because (1) the groups are usually too small to yield a normal distribution; (2) classroom evaluation instruments are usually not designed to yield normally distributed scores; and (3) the population becomes more select as it moves through the grades and the less-able pupils fail or drop out of school" (p. 441). For relative grading to be used appropriately, the reference group must be large, and the abilities of the students must be normally distributed. Because relative grading is more appropriate for large-scale assessments than for classroom assessment, the BCME takes the position that "a norm-referenced evaluation system is not meant for classroom assessment because a classroom does not provide a large enough reference group" (BCME, 1994a, p. 16) and that teachers are to assign criterion-referenced letter grades (see below).
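The mechanics of quota-based relative grading can be shown with a short Python sketch. The quota proportions and scores below are invented for illustration and are not drawn from the sources cited above; real curve-grading schemes vary.

```python
# A minimal sketch of norm-referenced ("curve") grading: the share of each
# letter grade is fixed before any grades are assigned, so a student's grade
# depends on rank within the reference group rather than on an absolute
# standard. The quota proportions below are illustrative assumptions only.

QUOTAS = [("A", 0.10), ("B", 0.20), ("C+", 0.40), ("C", 0.20), ("C-", 0.10)]

def grade_on_the_curve(scores):
    """Assign letter grades by rank using pre-set quotas."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    grades, cumulative, start = {}, 0.0, 0
    for letter, share in QUOTAS:
        cumulative += share
        end = round(cumulative * len(ranked))
        for name in ranked[start:end]:
            grades[name] = letter
        start = end
    return grades

scores = {"Ann": 92, "Bo": 88, "Cal": 81, "Dee": 77, "Eli": 74,
          "Fay": 70, "Gus": 66, "Hal": 61, "Ida": 55, "Jo": 48}
print(grade_on_the_curve(scores))
# Ann gets the lone "A" and Jo the lone "C-". If two stronger students
# joined the class, several grades would drop even though no one's own
# achievement had changed.
```

The closing comment captures the core objection: under quotas, a grade is a statement about rank, not about mastery.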
Achievement relative to absolute standards: Criterion-referenced grading

Absolute grading (also referred to as criterion-referenced or mastery grading) takes place when a teacher compares each student's performance to a pre-specified set of standards. These standards are generally specified by the teacher based on her knowledge of the subject and her perceptions of what students should be able to do (Hills, 1981). Because the standards are pre-specified, a teacher should be able to clearly communicate to the students her expectations and what her grades mean. As a result, Hills contended, if they work hard and the teacher provides adequate support, all students in a class should be able to obtain the best grade possible. In an absolute grading system, it is assumed that the percentage achieved by a student on a criterion-referenced test represents the percentage of the material he has mastered (Worthen et al., 1993). However, the perceived precision of this type of grading system is illusory because the difficulty level of teacher-made tests can vary: given two tests purportedly designed to assess students' understanding of the same concept, the class average on one may be 90% and on the other 60%; hence, percentage grades are not as precise as they first appear to be (Worthen et al., 1993). As Gronlund and Linn (1990) observed: "The absolute system of grading is much more complex than it first appears. To use absolute level of achievement as a basis for grading requires that (1) the domain of learning tasks be clearly defined, (2) the standards of performance be clearly specified and justified, and (3) the measures of pupil achievement be criterion referenced" (p. 440). To meet these requirements, teachers need time, experience, well-developed curricula with well-defined learning outcomes and criteria, and reliable and valid measurement instruments: conditions that rarely exist, and are difficult to attain, in the real world of the individual classroom teacher.

Aptitude or ability

Some teachers assign letter grades by comparing students' achievement, or what they have learned, to their aptitude, or ability to learn (Popham, 1990; Worthen et al., 1993). When ability is the basis for comparison, two students with the same achievement in a subject would be assigned different grades if the teacher believed their abilities were different. That is, given that two students have demonstrated the same level of achievement, the student with a perceived low ability would be assigned a higher letter grade than the student with a perceived high ability. A major problem with this method is that ability, or aptitude, is very difficult to measure, and even if the measurement were accurate, there is little reason to believe that aptitude is stable over time (Worthen et al., 1993). Despite this problem, some teachers continue to assign letter grades based on the perceived ability of their students (Brookhart, 1992; Friedman & Manley, 1991).

Effort

When effort serves as the basis for comparison for evaluating student progress, a teacher assigns letter grades based on how hard she believes students have worked. Such grades do not represent the achievement of students; they represent a teacher's perception of the effort expended by students as they worked to learn what is required of them in school. Consequently, students who are perceived to have tried harder are assigned higher grades (Stiggins, 1994). As Hills (1981) observed, teachers who subscribe to this method of assigning letter grades believe students who have difficulty learning are more deserving than students who learn easily, and tend to justify this grading method on the basis of motivation, in the belief that the main purpose of grades is to motivate students rather than to communicate information about student achievement.

Improvement or growth

Letter grades are sometimes assigned by teachers on the basis of student improvement, or growth (Waltman & Frisbie, 1993; Worthen et al., 1993). When improvement serves as the basis for comparison for assigning letter grades, students who show the most improvement in a particular subject area, often determined from pre- and post-test scores, are assigned better letter grades than those who show less improvement. However, due to the "ceiling effect" (Borg & Gall, 1979), students who start a course with a lot of knowledge about a subject may not show as much improvement when assessed at the end of a unit or course as those who started with relatively little knowledge. Moreover, if students are informed about how they will be evaluated, as advocated by the Principles for Fair Student Assessment Practices for Education in Canada (1993) and consistent with our values as a fair and democratic society, most students would work very hard to score as close to "0" as possible on the pre-test, thereby destroying the integrity of the method. Such letter grades can be misleading because a letter grade assigned for the purpose of conveying information about student achievement does not necessarily represent how much a student knows and/or can do (Worthen et al., 1993).
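The arithmetic behind these objections is easy to demonstrate. The sketch below uses invented pre- and post-test scores to show how gain scores reward a low starting point and cap the apparent growth of students who begin near the top of the scale.

```python
# An illustration, with invented scores, of grading on improvement (growth).
# Gain = post-test minus pre-test. Because of the ceiling effect, a student
# who starts near the top of the scale cannot show a large gain, so ranking
# by improvement can invert the ranking by final achievement.

students = {
    # name: (pre-test %, post-test %)
    "Kim": (90, 96),   # strong at the start: gain capped near the ceiling
    "Lee": (40, 70),   # weak at the start: large gain, lower final score
}

for name, (pre, post) in students.items():
    gain = post - pre
    print(f"{name}: pre={pre}, post={post}, gain={gain}")

# Ranked by gain, Lee (30) outranks Kim (6), even though Kim finished with
# the higher achievement (96 vs. 70). A letter grade assigned on gain alone
# therefore need not represent what a student knows and can do at the end.
```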
Summary

Several bases for comparison used when teachers assign letter grades have been described in this section. As noted above, measurement specialists agree that letter grades should be assigned solely on the basis of student achievement. Moreover, they agree that, for the purpose of grading, a student's achievement should be compared absolutely to pre-set standards rather than relatively to other students; that is, grades should be criterion-referenced, not norm-referenced. However, at times, teachers use some other basis for comparison (e.g., effort, ability), either on its own or in combination with other comparison bases, when assigning grades. When this is the case, a letter grade is no longer intended to communicate information solely about student achievement, the primary purpose of a letter grade; it is now intended to communicate information about a variety of different student characteristics. For that reason, its meaning becomes unclear, and the letter grade becomes ineffective as a tool for communicating information about student achievement.
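A brief sketch makes this loss of meaning concrete. The marks and the 70/30 weighting below are invented for illustration; the letter grade conversions assume the cut-offs sketched earlier in this chapter.

```python
# An illustration, with invented marks and weights, of how mixing bases of
# comparison changes what a letter grade means. The same achievement record
# yields different grades once a teacher-judged effort mark is blended in.

achievement = 62.0   # percent, from tests and marked lab reports
effort      = 95.0   # teacher's judgment of how hard the student works

grade_achievement_only = achievement
grade_blended = 0.7 * achievement + 0.3 * effort   # assumed 70/30 weighting

print(grade_achievement_only)  # 62.0 -> "C"  under the earlier cut-offs
print(grade_blended)           # 71.9 -> "C+" : same achievement, higher grade

# A reader of the blended grade cannot tell how much of it reflects what the
# student knows and can do, and how much reflects perceived effort, so the
# grade no longer communicates achievement clearly.
```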
Components of Letter Grades

When it comes to the kinds of information teachers should incorporate into letter grades, measurement specialists tend to agree that teachers should consider only student achievement (e.g., Gronlund & Linn, 1990; Principles for Fair Student Assessment, 1993; Worthen et al., 1993). Furthermore, "if letter grades are to serve as valid indicators of achievement, they must be based on valid measures of achievement" (Gronlund & Linn, 1990, p. 437). Valid measures of achievement are those tests, teacher observations, written assignments, and other devices that are developed or selected to most directly measure expected learning outcomes in a reliable fashion. At the same time, measurement specialists also tend to agree that non-achievement factors relating to student development (e.g., effort, behaviour, motivation, aptitude, neatness, class participation, work habits, improvement, or attitude) are important, but should be reported separately by other methods, such as written comments or checklists, and not incorporated into a letter grade (e.g., Gronlund & Linn, 1990; Principles for Fair Student Assessment, 1993; Stiggins, 1994; Stiggins & Conklin, 1992; Worthen et al., 1993). The inclusion of non-achievement factors in letter grades is problematic because it is difficult to find reliable methods to assess such factors, and "when letter grades combine various aspects of pupil development, ... they lose their meaningfulness as a measure of achievement... [and] suppress information concerning other important aspects of development" (Gronlund & Linn, 1990, p. 437).

Besides, non-achievement factors such as effort, behaviour, and work habits are likely already reflected in a student's assessment results (e.g., criterion-referenced test scores) in that, all other things being equal, the student who has put in more effort (or shown better behaviour, better work habits, etc.) will probably achieve better results on assessments than will a student who has put in less. Purposefully factoring an effort component into a letter grade, in effect, "double counts" effort and therefore overemphasizes it.

In addition to the development-related non-achievement factors mentioned above, various authors have highlighted other factors that they believe probably should not be considered when letter grades are determined.

• Gronlund and Linn (1990) argued that the amount of work done by a student should not be factored into the letter grade because it is the quality of the work that is important, not the quantity.

• Some teachers allow students the opportunity to raise their grades by completing extra work; however, Hills (1981) argued that this is an acceptable practice only if the higher grade is given for better quality of work rather than for extra quantity.

• The BCME has stated that exercises done for practice or drill "should not contribute to the term or final letter grade" (BCME, 1994a, p. 18) because such exercises are designed to help students learn, not to collect assessment data.

• Hills (1981) objected to grading on tardiness or absences from class because grades that factor in this kind of information are being used as a discipline method and do not accurately reflect student achievement.11

• Stiggins (1994) viewed the inclusion of homework (e.g., whether or not it was completed; the amount done) as problematic because it is not always possible to know how much of it was completed by the student and how much by a helpful parent.12

• Information collected "for diagnosing student needs, providing students with practice performing or evaluating performance, and tracking student growth during instruction" (Stiggins, 1994, p. 381) should not be considered when determining letter grades because such information is collected for purposes other than assessing student achievement.

11 Some teachers inadvertently include attendance in their letter grades by assigning a score of "0" when a student is absent for a test.

12 It is also not always possible to know whether incomplete homework was under the control of the student rather than due to some unavoidable family or environmental situation.

When it comes to the factors that should and should not be considered by teachers when letter grades are determined in B.C., the BCME has taken a position similar to that of measurement specialists. It has asserted that although it is important for teachers to communicate information to parents about aspects of student development other than achievement, "assessing behaviour, effort, motivation and interest and including them in a grading system is problematic" (BCME, 1994a, p. 22). As a result, teachers are required to include written comments about a student's attitudes, work habits, and effort in formal reports (BCME, 1994a). In addition, most B.C. report cards have a specified area and another set of symbols (i.e., G = good; S = satisfactory; N = needs improvement)13 for at least some of these factors.

13 As an example of how history or previous experience might lead to misinterpretation, at one time the Ministry of Education in B.C. defined the G, S, N effort/work habits symbols as G = good, N = normal, and S = slow, but satisfactory (along with U = unsatisfactory). It is possible, therefore, that people familiar with these definitions could misinterpret the effort/work habits symbols used on report cards today.
Despite the general consensus that achievement should be the only factor teachers consider when they assign letter grades, research shows that teachers often consider non-achievement factors such as behaviour in class, attendance, tardiness, and work habits when they determine letter grades (e.g., Bachor & Anderson, 1993b; Friedman & Manley, 1991; Hobbs, 1992; Stiggins & Conklin, 1992; Waltman & Frisbie, 1993, 1994).

Reporting Student Progress

Introduction

After a teacher has assessed and evaluated (graded) a student's performance in a particular subject or course, that evaluation must be communicated to various audiences; that is, it must be reported. The purpose of reports of student progress and some reporting methods are discussed in this section.

Purpose of Reports of Student Progress

In B.C., reports of student progress document and communicate "significant aspects of students' progress in learning. They describe, in relation to the curriculum, student progress in intellectual, social, human and career development" (BCME, 1994a, p. 3). The information in a student's progress report can describe what a student is able to do, what s/he is working toward, and the areas in which s/he needs help. A progress report can also provide information about important aspects of a child's development (e.g., behaviour, work habits), or about the objectives of the school (Gronlund & Linn, 1990). The information provided by progress reports can motivate students, and enable parents to give their children support and encouragement.

Methods of Reporting Student Progress

A variety of methods can be used to report student progress; some of these can be considered formal (e.g., a report card or form with letter grades, a written narrative report), and others informal (e.g., a telephone call, a brief written note) (BCME, 1994a). Descriptions of several different reporting methods follow.

Formal reporting methods

In B.C., a formal report is one that is written on a form that has been approved by either the BCME or a school board; such a report may include a structured written report, letter grades, and/or percentages, depending on the grade in which the student is enrolled. Three formal reports are required in B.C.: two during the school year and one at the end of the year (BCME, 1994a; BCME, 1996). The format of the report, the information provided, and the number of formal reports issued during the year may vary in other educational jurisdictions. Three formal reporting methods are report cards, written narrative reports, and checklists.

Report Cards: A common method of reporting student progress is the report card. Report cards are standardized forms designed so that grades and/or percentages can be recorded for each subject or course in which a student is enrolled. In addition, report cards often include sections where other student information (e.g., attendance, work habits) can be recorded and/or teachers' comments can be written (Worthen et al., 1993). Report cards may appeal to students and parents for a number of reasons.
For example, they may like report cards because student achievement is displayed in a concise fashion and, because report cards with letter grades have been widely used for a long time, most people are familiar with them and feel that they are easy to understand. In addition, when grades for different subjects, and for different terms, are recorded on the same report card, the subjects in which a student is doing well and/or poorly can be readily identified, and progress over time can be easily tracked. Report cards, therefore, are useful because they provide a convenient summary of a student's progress in school. However, the amount of detailed and personalized information a teacher communicates about a student's performance in school via a report card is often limited due to the time it takes to compile such information. Teachers often write short, generalized comments rather than extensive, personalized comments. As a result, the amount of information provided on a report card about a student's strengths and weaknesses, likes and dislikes, and plans and goals is limited or may be nonexistent. Teachers' comments about a student's progress in school may be even more limited and less personalized if the school uses computer-generated report cards that require teachers to select their comments from a pre-existing list.

For teachers, a major advantage of report cards with letter grades lies in the time it takes to complete them compared to the time required to complete other formal reporting methods: it generally takes less time to determine letter grades than it does to write a careful narrative report, or to construct and complete checklists, especially when computer grading programs are used to calculate grades and generate comments. In addition, the concise summary of student achievement provided by the grades written on report cards enables teachers and/or administrators to more easily track the progress of their students than is the case when student progress is reported via written narrative reports or checklists.

Narrative Reports: A written narrative report that carefully describes a child's progress in school is another way to communicate information about that progress. A narrative report can provide more than just a summative evaluation of a child's performance; it can describe what a child has done, is working on, and is working toward. A child's strengths and weaknesses can also be described in such a report, along with the teacher's suggestions for improvement. Several factors, however, limit a narrative report's usefulness as the sole method of reporting student progress. First, a great deal of teacher time and effort is required to write extensive and personalized reports for all of the students in a class. Second, there is a danger that, because of the time and effort required to write them, teachers may rely on stereotyped, instead of personalized, comments. Third, because a narrative report does not provide a convenient and systematic summary of student progress, it cannot be readily used for administrative purposes (Bailey & McTighe, 1996; Worthen et al., 1993). Furthermore, some parents have criticized the exclusive use of narrative reports on the grounds that they do not show how a child is doing compared to his or her peer group (e.g., Balcom, 1993a; McCormick, 1993). Nevertheless, a well-written narrative report is a useful supplement to other reporting methods such as letter grades (Gronlund & Linn, 1990).
Checklists: A checklist consists of a list of objectives, skills, tasks, or outcomes that are checked, or rated, by the teacher. Many different kinds of rating scales can be used on a checklist to rate a child's level of performance on a specific task or outcome. For example, by checking "Yes" or "No", a teacher may use a checklist to indicate whether a child can or cannot do a particular task. A teacher may check "never", "rarely", or "frequently" to indicate how often a child exhibits a given skill or behaviour. Or a teacher may use the scale "G", "S", and "N" (Good, Satisfactory, Needs Improvement) to indicate a student's work habits. As another alternative, a teacher may provide a list of expected learning outcomes and check off only those that the student has successfully achieved (Bailey & McTighe, 1996; Gronlund & Linn, 1990; Hills, 1981). When carefully constructed, a checklist can provide a detailed analysis of a student's strengths and weaknesses as they pertain to specific learning outcomes, and efficiently communicate this information to students and parents (Bailey & McTighe, 1996; Gronlund & Linn, 1990). Furthermore, a checklist combined with a rating scale is an effective way to report non-achievement factors (e.g., effort, behaviour). However, to be effective, checklists must be written concisely using terms appropriate to the audience, or audiences. As is the case for the narrative report, a checklist does not provide a convenient summary record of student progress; it may be most valuable when used in conjunction with letter grades or some other reporting method.
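As an illustration only, a checklist of the kind described above can be represented as a simple data structure and printed in report form. The outcomes, ratings, and helper function below are invented for this sketch and are not drawn from any B.C. form.

```python
# A sketch of a simple outcomes checklist with a rating scale, of the kind
# described above. The outcomes and ratings are invented for illustration.

SCALE = {"G": "Good", "S": "Satisfactory", "N": "Needs Improvement"}

checklist = [
    # (expected learning outcome, rating)
    ("Uses a thermometer appropriately and effectively", "G"),
    ("Applies laboratory safety rules",                  "S"),
    ("Records observations clearly and completely",      "N"),
]

def print_checklist(student, items):
    """Print a checklist-style progress report for one student."""
    print(f"Progress checklist: {student}")
    for outcome, rating in items:
        print(f"  [{rating}] {outcome}  ({SCALE[rating]})")

print_checklist("(sample student)", checklist)

# Unlike a single letter grade, the checklist points to specific strengths
# and weaknesses, but it offers no convenient one-symbol summary.
```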
Informal reporting methods

Teachers use several different methods to informally report information about student progress in school. These include telephone calls, written notes, interim reports, conferences (with the parents and the teacher, or with the student, parents, and teacher), and portfolio reviews (BCME, 1994a). In B.C., teachers are required to provide two informal reports for each student per year (BCME, 1996). Such reports may be used to describe "what the student is able to do, the areas of learning that require further attention or development, [and the] ways the teacher is supporting the student's learning needs (and, where appropriate, ways the student or the parents might support the learning)" (BCME, 1994a, p. 4).

Portfolios: In recent years, some educators have concluded that a portfolio of student work is an effective way to communicate information about student progress (e.g., Bailey & McTighe, 1996; Stiggins, 1994). A portfolio is a purposeful collection of student work that can be used to illustrate a student's performance in school. When carefully selected, the collection within a portfolio can show what a child is learning, how the child has grown over time, and what the child is working toward. It can also serve as the focal point of a discussion involving the student, the teacher, and/or the parents (Bailey & McTighe, 1996).

Portfolios, however, have certain drawbacks. One major drawback is the amount of time required to prepare a useful portfolio of student work. To begin with, a teacher must set aside time to determine and carefully describe the purposes and criteria for a portfolio, so that its contents will be of value to those who review it (e.g., the student, parents, the teacher, other teachers, administrators, employers), and so that students understand why they are preparing a portfolio of their work and the kinds of samples that need to be included. Once the purposes and criteria have been established, the process of putting a portfolio together also takes time because the student work within a portfolio must be systematically and continuously collected, reviewed, and discussed. The time and effort required to produce a useful portfolio is particularly problematic at the secondary level, where the large number of students taught by each teacher makes it very difficult to schedule and conduct individual student-teacher discussions of portfolio contents (Bailey & McTighe, 1996). Another drawback of portfolios concerns the difficulties that may arise when portfolios are used to assess student progress: because the samples of work selected by each student for the same unit or course can vary widely, the assessment, or scoring, of portfolios, and the evaluation of student progress based on that assessment information, may be inconsistent and unreliable (Worthen et al., 1993). Furthermore, although a portfolio may effectively show how a student has grown over the term, or the year, it does not provide a convenient summary of student performance, nor does it show how a student has done compared to his or her peers. As is the case for checkl