EFFECT OF TOPICAL KNOWLEDGE ON L2 WRITING

by

Ling He

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Language and Literacy Education)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2010

© Ling He, 2010

ABSTRACT

This study investigates the effect of topical knowledge on university-level ESL (English as a Second Language) students' writing in a testing situation, following Messick's (1989) validity theory, which embraces an integration of multiple types of validity evidence (content-, criterion-, and construct-based validity, along with social consequences) to support the inferences drawn from test scores. A total of 50 participants with different levels of English language proficiency and various ethnic, cultural, and linguistic backgrounds took part in the study in a metropolitan city in western Canada. Each student wrote two 60-minute essays: one responding to a prompt requiring general knowledge and the other responding to a prompt pertaining to specific prior knowledge. Using a mixed methods sequential explanatory design (Creswell & Plano Clark, 2007), the study collected two types of data to address its purposes: (1) quantitative data based on repeated direct measures of the effect of prompts on the overall writing scores, component scores (content, organization, and language), and indicator scores (idea quality, position-taking, idea development, idea wrap-up, cohesion, coherence, fluency, accuracy, and lexical complexity); and (2) qualitative interview data for an in-depth understanding of the writers' perceptions of the two writing prompts. The overall writing scores showed that students, especially those at the intermediate and advanced proficiency levels, performed significantly better on the general topic than they did on the specific topic. The topic-specific task produced lower scores on content, organization, and language due to poor idea quality, hidden position, insufficient idea development, weak idea wrap-up, a lack of coherence and cohesion, shorter length, more syntax and lexical errors, and less frequent use of academic words. Posttest interviews confirmed that participating students were challenged by the writing prompt that required specific prior knowledge. The findings suggest that topical knowledge is a fundamental schema for eliciting a writer's performance. Without such knowledge, an ESL writer, even one with high English proficiency, cannot achieve his or her optimal performance. The study calls attention to the effect of specific topical knowledge on ESL students' writing and the importance of developing appropriate prompts for writing tests.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

CHAPTER 1  INTRODUCTION
    1.1  Background of the Study
    1.2  Direct Measures of Writing
        1.2.1  Writing Task Characteristics
        1.2.2  Rating Characteristics
        1.2.3  Writer Characteristics
        1.2.4  Rater Characteristics
        1.2.5  Summary of Direct Measures of Writing
    1.3  Purposes of the Study and Research Questions
    1.4  Need for the Study
    1.5  A Note on Terms
        1.5.1  Writing Sample
        1.5.2  Task, Topic, and Prompt
        1.5.3  Topical Knowledge and Content Knowledge
        1.5.4  Test, Measure, and Assessment
    1.6  The Structure of the Chapters

CHAPTER 2  LITERATURE REVIEW
    2.1  Theoretical Framework
        2.1.1  Traditional Conceptualizations of Validity
        2.1.2  The Modern Concepts of Validity
    2.2  International and Local Standardized English Tests
        2.2.1  TOEFL
        2.2.2  IELTS
        2.2.3  MELAB
        2.2.4  CAEL
        2.2.5  LPI
        2.2.6  Summary of the Five Standardized English Tests
    2.3  Previous Studies on Topical/Content Knowledge in Writing
        2.3.1  Topical/Content Knowledge and L1 Writing
        2.3.2  Topical/Content Knowledge and L2 Writing
    2.4  Need for the Present Research

CHAPTER 3  RESEARCH METHODS
    3.1  Introduction
    3.2  Research Design
        3.2.1  The Major Types of Mixed Methods Designs
        3.2.2  The Mixed Methods Design of this Study
        3.2.3  Research Site
        3.2.4  Participants
        3.2.5  Instruments
            3.2.5.1  Background Information Questionnaire
            3.2.5.2  Writing Prompts
            3.2.5.3  Writing Tests
            3.2.5.4  Interview Protocol
    3.3  Scoring Procedures
        3.3.1  Analytic Scoring Rubric of this Study
        3.3.2  Rater Training
        3.3.3  Analytic Scoring
    3.4  Procedures of Data Analysis
        3.4.1  Phase One: Repeated Measures of a Paired t Test and a 3×2 ANOVA
        3.4.2  Phase Two: Repeated Measures of the Mixed Designs 3×2 ANOVAs
        3.4.3  Phase Three: Posttest Interviews

CHAPTER 4  FINDINGS AND DISCUSSION
    4.1  Findings and Discussion for Research Question One
    4.2  Findings and Discussion for Research Question Two
        4.2.1  Findings and Discussion on Component Scores
        4.2.2  Findings and Discussion on Indicator Scores
    4.3  Findings and Discussion for Research Question Three
        4.3.1  Topic Familiarity
        4.3.2  A Lack of Knowledge About Federal Politics
        4.3.3  A Lack of Vocabulary to Write on Unfamiliar Topics
        4.3.4  A Lack of Confidence to Comment on Authority
        4.3.5  A Lack of Understanding of the Expectations of Readers
    4.4  Summary of Results

CHAPTER 5  CONCLUSIONS
    5.1  Summary of Findings
    5.2  Significance of the Study
    5.3  Implications for Methods
    5.4  Implications for Testing
    5.5  Implications for Teaching
    5.6  Limitations of the Study and Direction for Future Research

REFERENCES

APPENDIX A  Topics and Standard Instructions From MELAB Assessment Battery
APPENDIX B  Descriptions of Score Levels for MELAB Compositions
APPENDIX C  Holistic Rating Scale for the Language Proficiency Index
APPENDIX D  Interview Guideline
APPENDIX E  Letter of Initial Contact
APPENDIX F  Letter of Permission
APPENDIX G  Consent Forms
APPENDIX H  Recruitment Poster
APPENDIX I  Focus-Group Study for Ranking Difficult or Easy Writing Topics
APPENDIX J  Writing Test Sheets for Prompt A
APPENDIX K  Writing Test Sheets for Prompt B
APPENDIX L  Six-point Analytic Rating Scale of this Study
APPENDIX M  Headwords of the Word Families in the Academic Word List
APPENDIX N  Ethics Review Certificate

LIST OF TABLES

Table 2.1   Summary of the Previous Studies on the Concept of Content
Table 3.1   Creswell and Plano Clark's Major Mixed Methods Design Types
Table 3.2   Descriptive Statistics for Participants' Country of Origin
Table 3.3   Descriptive Statistics for Participants' Gender
Table 3.4   Descriptive Statistics for Participants' Age and Duration in Canada
Table 3.5   Principal Component Analysis Matrix
Table 4.1   Paired-Samples t Test for Comparisons of Overall Writing Scores for Prompts
Table 4.2   Descriptive Statistics of Overall Writing Scores for Prompts
Table 4.3   3×2 Univariate ANOVA for Overall Writing Score Measures
Table 4.4   Post-hoc Analysis: Comparisons of Mean Differences of Proficiency Effects on Overall Writing Scores for Prompts
Table 4.5   Summary of Prompt Effects on Overall Writing Scores in Phase One
Table 4.6   Descriptive Statistics for Component Scores for Prompts
Table 4.7   3×2 Univariate ANOVA for Component Score Measures
Table 4.8   Post-hoc Analysis: Comparisons of Mean Differences of Proficiency Effects on Language Component Scores for Prompts
Table 4.9   Summary of Prompt Effects on Component Scores in Phase Two
Table 4.10  Descriptive Statistics of Indicator Scores for Prompts
Table 4.11  3×2 Univariate ANOVA for Indicator Score Measures
Table 4.12  Post-hoc Analysis: Comparisons of Mean Differences of Proficiency Effects on Fluency and Accuracy Indicator Scores for Prompts
Table 4.13  The Word Frequency of the Tokens of Prompt A and Prompt B
Table 4.14  Summary of Prompt Effects on Indicator Scores in Phase Two

LIST OF FIGURES

Figure 2.1  Messick's Progressive Matrix of Facets of Validity
Figure 2.2  Zumbo's Integrative Cognitive Judgment in the Contextualized and Pragmatic Explanation View of Validity and Validation
Figure 3.1  Mixed Methods Sequential Explanatory Design of this Study
Figure 3.2  Frequency of Participants' Choices of the Easy and Difficult Topics
Figure 3.3  Visual Digraph of the Six-point Analytic Rating Scale of this Study
Figure 3.4  The Boxplot Check for the Normal Sample Distribution
Figure 4.1  The Plot of Prompt Effects on "Overall" Writing Scores
Figure 4.2  The Plot of Prompt Effects on "Content" Component Scores
Figure 4.3  The Plot of Prompt Effects on "Organization" Component Scores
Figure 4.4  The Plot of Prompt Effects on "Language" Component Scores
Figure 4.5  The Plot of Prompt Effects on "Fluency" Indicator Scores
Figure 4.6  The Plot of Prompt Effects on "Accuracy" Indicator Scores
Figure 4.7  The Plot of Prompt Effects on "Lexical Complexity" Indicator Scores

ACKNOWLEDGEMENTS

I am heartily thankful to my research supervisor, Dr. Ling Shi, for her expert guidance, enthusiasm, and enduring support throughout all stages of this dissertation. Without her advice and assistance, this dissertation would not have come into being. I am also thankful to my research committee members, Drs. Bruno Zumbo and Monique Bournot-Trite, for their constructive and supportive suggestions, which have certainly helped shape this dissertation. I am deeply grateful to the Department of Language and Literacy Education at the University of British Columbia for providing opportunities that enabled me to work toward my academic goals, of which this dissertation is an important starting point. Finally, I owe special gratitude to my parents for their unwavering support and encouragement through my years of education; the sacrifices they made in the name of academia cannot be fully appreciated.
1  CHAPTER 1 INTRODUCTION  1.1  Background of the Study The cultural landscape of English-speaking universities is in transformation due to the  presence of an increasing number of international students from non-English speaking countries. Such ethnic and linguistic diversity has inspired scholars to investigate the problem of written language assessment in many different ways. Since standardized English language tests usually determine ESL students’ entry into universities in the target language countries, writing assessment teams confront challenges to provide greater test validity. Research has been conducted to identify potential variables affecting the performances of L2 test takers. Among the studies, some have examined the overall quality of written texts (Engber, 1995; Hamp-Lyons & Henning, 1991; Tsang, 1996), the accuracy of specific textual features of lexicon and syntax (e.g., Hamp-Lyons & Henning, 1991; Kobayashi & Rinnert, 1992; Ishikawa, 1995), or the appropriateness of metadiscourse, organization, cohesion or register (e.g., Devie, Railey, & Boshoff, 1993; Reynolds, 1995; Schnerfer & Connor, 1990; Shaw & Liu, 1998). Other individuals have investigated the general writing processes (Bosher, 1998; Penningtion & So, 1993; Whalen & Menard, 1995; Zamel, 1983) or a specific aspect of the writing process such as revision and prewriting activities (Hall, 1990; Henry, 1996; Porte, 1997). What is lacking, however, is research investigating topical or content knowledge in the writing prompts as a variable that potentially influences L2 writing. As such, the Conference on College Composition and Communication (CCCC) Resolution on  2  Testing (1979) emphasized that topic should be carefully considered before essay tests are used. Hoetker (1982) also points out that “[We] know almost nothing about topic variables” (p. 38). While over three decades have passed since the CCCC’s recommendation, research on the effects of topics on L2 writers’ test results is still scarce. Emphasizing the need for research on writing topics, leading scholars in L2 writing, Leki, Cumming, and Silva, (2008) have recently stated a need for “distinctions between different types of writing tasks and prompts and the qualities of writing that they [L2 writers] produce for assessment purposes” (p. 89). Topics assigned in essay tests demand greater attention because they initiate and direct the act of writing that produces samples for evaluation. This chapter discusses a few related aspects of L2 writing to introduce the background of the present study. The first section discusses the use of direct measures in large-scale L2 writing tests in terms of its theoretical influence, and test characteristics in terms of task or topic, rating, writers, and raters. Section 2 addresses the need for the present study from both a cross-cultural perspective which is derived from more than a single society and a within-cultural perspective which situates any claims within the criteria relevant to that specific context; Section 3 introduces and defines the technical terms used in this study to avoid any confusion; finally, Section 4 provides a brief overview of the structure of the subsequent chapters.  1.2  Direct Measures of Writing Different from indirect measures of writing proficiency (i.e., using discrete test items  such as multiple choices to test a writer’s grammar knowledge), direct measures based on  3  students’ writing samples have been used in large-scale writing assessment in the past decades. 
Given the fact that a direct measure examines the actual text the writer produces via a rating scale, it is considered a form of performance test; that is, the test assesses writing competence directly in a context which, as far as possible, approximates real-life language use. Therefore, such tests “rely heavily on the selection of appropriate topics for eliciting writing samples” (O’Donnell, 1984, p. 243). Thus, an appropriate topic is vital for a direct measure of writing competency. The trend of using direct performance tests for assessing writing ability has gained prominence in tandem with standardized English tests for assessment of L2 learners’ language proficiency. Specifically, a direct measure of writing was influenced by some leading theories on L2 proficiency from the 1970s to 1990s. First, Carroll’s skill model of language proficiency (1975) conceptualized writing and its relation to other basic language skills. According to Carroll (1975), writing is one of four integral skills accompanied by reading, listening, and speaking and should be measured directly through written products rather than through tests of knowledge concerning grammar and vocabulary. Carroll’s model, as Cumming (1997) has observed, set the initial type of performance testing which used a direct measure of L2 writing for education purposes. The advent of the communicative movement in language teaching in the 1970s imbued direct measures with a rationale based on the theory of communicative competence, represented by Canale and Swain’s influential theoretical framework (1980). Canale and Swain (1980) articulated four major types of competencies for effective communication: grammatical competence, sociolinguistic competence, discourse competence, and strategic  4  competence. The theory of communicative competence shifted the focus of assessment from language proficiency to the use of language, in the form of performance on written and/or spoken tasks. Following this trend, direct measures strive to assess the integrated use of linguistic and pragmatic skills across discourse. Later, Bachman (1990) and Bachman and Palmer (1996) expanded Canale and Swain’s (1980) theory by adding cognitive strategies to the framework. Cognitive strategies are appropriated in the high-level processes of planning and affective schemata (i.e., the affective or emotional correlates of topical knowledge) in language use. The emphasis on cognitive strategies indicates how psychological factors can influence one’s writing performance. In short, these theories of language proficiency consistently highlight not only the requirement of linguistic knowledge, but also the ability to position language resources to accomplish a specific task. From these theoretical perspectives, writing ability is seen as integral to effective communication through written texts which demonstrate an appropriate use of form and content. Compared with indirect multiple-choice measures which cannot provide a sufficient basis to evaluate the writer’s use of language beyond the testing situation, direct measures of writing are valued for their strength of employing actual writing samples which require more than accuracy of form and editing skills. Additionally, a direct measure of writing is reported to be fairer than a multiple choice test for ethnic minorities (White, 1985). 
Direct performance tasks are also perceived as one possible means to increase generalisability through sample representativeness, general descriptive parameters of the intended target situation, and the specific skills necessary for successful participation in the assessment (Weir, 1993). With these merits, a direct measure has become a widely accepted way for  5  evaluating L2 writing competence in standardized English tests (e.g., Bachman, 1990; Bachman & Palmer, 1996; Weigle, 2002) such as TOEFL (Test of English as a Foreign Language), IELTS (International English Language Testing System), CAEL (Canadian Academic English Language Assessment), and MELAB (Michigan English Language Battery) mainly for academic purposes (See the detailed review in the following Chapter 2. To understand the direct measure of writing, the following sections discuss four major components of test characteristics: tasks, ratings, writers, and raters.  1.2.1  Writing Task Characteristics  Direct measures of writing generally measure two types of tasks: the independent writing task and the integrative writing task. The independent writing task usually consists of a single topic, independent of all other test sections, in the form of a mini-text containing a single word or a phrase and a sentence or several sentences. The mini-text contains a word or a phrase requiring writers to generate relevant implications or predications. For example, the topics “Crowded Places” (p.15) and “Airports” (p. 51) in The LPI Workbook (University of British Columbia, 1997) do not provide the writer with a thesis for focus. The writer must develop his or her own thesis such as “Crowded places provide more chances for the likelihood of theft.” In contrast, the topic as a sentence or more provides writers with more explicit instructions or guidelines. For example, the following topic in the TOEFL iBT includes a question whose answer could lead to forming a thesis: “Many people visit museums when they travel to new places. Why do you think people visit museums? (Educational Test Service, 2006, p. 268)  6  Different from the independent writing task with a single topic, the integrated writing task is a series of related tasks. The integrated task is advocated by scholars for its interest in testing “language in use” through the integrative continuum toward a more direct, performance-based measure of language proficiency (e.g., Carroll, 1961; Oller, 1979). To complete integrated tasks, test takers need to employ more than one language skill simultaneously. For example, the integrated writing task of the TOEFL iBT is composed of three parts: First, the writer is required to read a passage concerning an academic topic, then listen to a speech on the same topic, and finally write to summarize key points in the listening passage and explain how these points are related to the content of the reading passage. Although the integrated writing task includes a series of activities, direct measures only evaluate the written products. The following is an example of the final summary task in the Official Guide to the New TOEFL iBT (Educational Testing Service, 2006):  Instructions You have 20 minutes to plan and write your response. Your response will be judged on the basis of the quality of your writing and on how well your response presents these points in the lecture and their relationship to the reading passage. Typically, an effective response will be 150 to 225 words.  
Task Summarize the points made in the lecture you just read, explaining how they cast doubt on points made in the reading. (p. 251)  7  A few previous studies (e.g., Hilgers, 1982; Newell, 1984; Square, 1983) have shown that the variations in the wording and the content of the topic may affect what and how a student writes. However, few studies have been done in this aspect, due to the fact that “our ability to propose and interpret the effect of the phrasing of essay topics is limited by our poststructuralist awareness that an essay topic is a text which, like any type of text, processes very limited control over what any particular reader will make of it” (Hoetker & Brossell, 1986, p. 328). In this sense, preparing a brief, simply phrased, writing topic for large-scale examination needs to minimize the opportunities for any miscommunication. In summary, direct measures of writing are employed for both independent and integrated writing tasks. Commonly, two elements, a topic or a prompt (the subject for writing) and instructions, comprise the structure of a writing task. For the independent writing task, the tests often contain a topic which can be in the form of a word, a phrase, or one or a few sentences. For the integrated writing task, the tests usually elicit the writing performance through more than one thematically linked activity. A clear and unambiguous task is the basis for writing performances that can be measured with a greater degree of consistency (Milanovic & Weir, 2007). Direct measures of writing skills focus on writing products because “process strategies are judged successful only as they result in a good written product” (Isaacson, 1988).  1.2.2  Rating Characteristics  To choose an appropriate rating scale along with criteria or rubrics based on the purpose of the evaluation is one of the first decisions for direct measures, because  8  unsystematic grading can threaten scoring validity. In addition, at the heart of the construct validity of many performance assessments is a rating scale, as it offers an operational definition of the construct being measured (McNamara, 1996). In general, there are three types of rating scales commonly used: holistic scoring, analytic scoring, and primary trait scoring (Weigle, 2002). Among them, holistic scoring and analytic scoring have been mainly used for direct measures of L2 writing in test situations (Canale, 1981; Carroll, 1980, Jacobs, Zinkgraf, Wormuth, Hartfiel, & Hughey, 1981; Perkins, 1983). Holistic scoring, also referred to as impressionistic marking, aims to rate overall proficiency level by assigning a single score to each written text based on raters’ immediate and general impression of the examinees’ final written products using a rating scale or scoring rubric. The rubric is often a five- or six-point continuum, where each point corresponds to a descriptor which can be either general or fairly specific. A distinct advantage of holistic scoring, from a practical perspective, is that essays can be measured rapidly and therefore the process is more economical than analytic scoring. In this regard, holistic scoring draws raters’ attention more to the strengths of essays rather than their weaknesses or shortfalls (Shaw & Weir, 2007). Researchers in both L1 and L2 writing generally agree that holistic scoring is reliable in giving useful ranking information when rater training and rating session administration are faithfully adhered to (Perkins, 1983; White, 1994). 
Because of these distinct strengths, holistic scoring is commonly used in large-scale assessment of writing.

However, in terms of the validity of the procedure, researchers have raised some concerns about holistic scoring. For example, holistic scoring has been critiqued for failing to provide useful diagnostic information about some aspects of a test taker's writing ability (Elbow, 1996), such as language accuracy, control of syntax, lexical range, and organization (Davies, Brown, Elder, Hill, Lumley, & McNamara, 1999). Holistic scoring is rather "problematic for second-language writers, since different aspects of writing ability develop at different rates for different writers" (Weigle, 2002, p. 114). In other words, the same holistic score assigned to two different texts may represent two entirely different sets of characteristics. In addition, the inter-rater reliability check for measures of linguistic accuracy of L2 writing is often inadequate (Hamp-Lyons, 1990; Henning, 1991; Polio, 1997; Raimes, 1990). These drawbacks may cause assessors to confound L2 writing competence with language proficiency (Cohen, 1994) and render "the validity of holistic scoring … an open question" (Charney, 1984, p. 68).

Different from holistic scoring, analytic scoring provides separate scores or component scores on specific features such as relevance and adequacy of content, organization, and lexical breadth and depth. Analytic scoring is preferred over holistic schemes by many writing specialists for the explicit diagnostic information it provides about students' writing abilities. As Shaw and Weir (2007) state,

Analytic scales are more suitable for second-language writing as different features of writing develop at different rates. This method, therefore, lends itself more readily to full profile reporting and could well perform a certain diagnostic role in delineating students' respective strengths and weaknesses in overall written production. (pp. 151-152)

It is true that analytic scales can measure specific textual features which L2 writers may have developed unevenly. For example, some writers may have an excellent control of sentence structure and grammar but lack knowledge in organizing their ideas in the manner expected in the target language. Using analytic scoring, a reliable total score can be derived from a sum of a set of averaged multiple ratings of components. Such multiple ratings awarded to the same essay tend to enhance the reliability of assessment (Hamp-Lyons, 1991; Huot, 1996; Shaw & Weir, 2007; Weir, 1990). The analytic scale is also welcomed for its practical and efficient rating procedure, which makes it easier to train raters (Cohen, 1994; McNamara, 1996).

The major disadvantage of analytic scoring is that it is time-consuming and costly for large-scale writing assessment. It is also sometimes challenging, even for experienced essay raters, to assign numerical scores based on certain descriptors (Hamp-Lyons, 1989). Critics of analytic scoring also point out that measuring the quality of individual aspects may maximize the role of autonomous text features and diminish the inter-language correlation of written discourse (Hillocks, 1995; Hughes, 2003; White, 1994). Thus, qualitative judgments concerning content, coherence, style, and language are not always easily accommodated by analytic scoring methods alone.
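To make the arithmetic behind an analytic total score concrete, the brief sketch below shows one way two raters' component ratings could first be averaged and then summed. The component names, six-point scale, and equal weights here are illustrative assumptions for this example only, not the rubric of this study (which is presented in Chapter 3 and Appendix L).

```python
# Illustrative sketch (invented ratings, not data from this study): deriving a
# composite analytic score from two raters' component ratings by averaging each
# component across raters and then summing the averaged components.

RATER_1 = {"content": 4, "organization": 5, "language": 3}  # hypothetical six-point ratings
RATER_2 = {"content": 5, "organization": 4, "language": 3}

def analytic_total(*rater_scores, weights=None):
    """Average each component across raters, then sum the (optionally weighted) averages."""
    components = list(rater_scores[0])
    weights = weights or {c: 1.0 for c in components}
    averaged = {c: sum(r[c] for r in rater_scores) / len(rater_scores) for c in components}
    return averaged, sum(weights[c] * averaged[c] for c in components)

averaged, total = analytic_total(RATER_1, RATER_2)
print(averaged)  # {'content': 4.5, 'organization': 4.5, 'language': 3.0}
print(total)     # 12.0 out of a maximum of 18 (3 components x 6 points each)
```

Rubrics differ in whether components are weighted equally and in how discrepant ratings are resolved (for example, by a third reading), so the simple averaging above is only one common convention.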
Research has shown that reliable and valid information gained from both analytic and holistic scoring instruments can inform test users (e.g., an educational institution) of test takers’ proficiency levels. However, the purpose of the writing task, whether for recruitment or diagnosis, is significant in deciding which scale to use. Holistic scoring, which assigns a  11  single score representing a reader’s general overall assessment of an essay, is often used to differentiate test takers by their relative ranking or to place test takers on a continuum across a range of scores; while analytic scoring, which specifies separate scores for specific features of writing, is considered to best serve classroom evaluation of learning, calling student writers’ attention to areas of needed improvement or to their achievement (Brown, 2004).  1.2.3  Writer Characteristics  Writer characteristics are another component of the direct measure of writing. With the globalization of education, more L2 writers of diverse backgrounds and writing skills choose to study in English-speaking countries. Research has shown that the writers themselves are a variable influencing their writing performances and writing processes (e.g., Freedman, 1983; Hamp-Lyons, 1990; Hoetker, 1979; O’Shea, 1987). Among various characteristics of L2 writers, L2 writing ability, L2 proficiency, L1 and L2 transfer, and L1 and L2 cultural backgrounds are four features which have a relationship to L2 writing performances. The following paragraphs will briefly review research on these features and their relationship to L2 writing. First, research has shown that L2 writing ability significantly influences the writer’s composing processes. Both the early study (e.g., Scardamalia & Bereiter, 1987) and the recent research (e.g., Eysenck & Keane, 2005) concur that there are two distinctive writing strategies. One is knowledge telling, a strategy which less skilled writers use as they generate content from memorized resources. The other is knowledge transforming, a strategy which skilled writers use as they continuously interact between developing knowledge and  12  developing text through the composing process itself. Bereiter and Scardamalia (1987) further point out that unfamiliarity with either text type or topic may inhibit the operation of knowledge-transforming processes. During knowledge transforming, skilled L2 writers, who were often older as observed by Leki, Cumming, and Silva (2008), considered the complexity of a task in terms of content, audience, register, and goal setting more than those less skilled L2 writers. Compared with less skilled writers who did little planning (Hyland, 2002), skilled writers were reported to pay greater attention to planning and monitoring their product (Field, 2004). Good L2 writers were also observed to be more willing to spend time practicing and revising their writing using various rhetorical strategies (Raime, 1987; Sasaki & Hirose, 1996; Sasaki, 2000; Victori, 1999). They have also been observed to write longer texts, use more grammatical features such as modals, nominations, verb tenses, voice, and varieties of words, and exhibited originality of ideas or content in each essay (e.g., Grant & Ginther, 2000; Linnarud, 1986). While the distinction between these two types of knowledge in composing processes has been observed (Scardamalia & Bereiter, 1987), Green (2007) points out that knowledge transforming does not exclude knowledge telling, but rather integrates with it. 
L2 writing ability is thus co-constructed by both knowledge telling and knowledge transforming during composing processes. Second, several studies have shown a correlation between L2 proficiency and L2 writing ability. For example, Aliakbari (2002) and Kiany and Nejad (2001) reported that a higher level of L2 proficiency was related to a higher L2 writing ability. Hirose and Sasaki (1994) also noted correlations between language proficiency levels and the quality of writing in terms of content, organization, language use, and discourse modes. Meanwhile, other  13  researchers observed how a certain level of L2 proficiency is necessary but not a sufficient condition for the prediction of L2 writing ability (Aliakbari, 2002) and that L2 writing is not independent of L1 writing ability (Carson & Kuehn, 1992; Sasaki & Hiroe, 1996; Zamel, 1982). As Cumming (1989) observes, among other L2 variables, L2 proficiency accounted for a large portion of variance in text quality. Third, a number of studies have found both positive and negative relationships between writers’ L2 and L1 writing abilities. Some claimed that L2 writers positively transferred L1 discourse competence to their L2 writing, either directly or indirectly. For example, Carson and Kuehn (1992) found that L2 writing ability significantly corresponds to their L1 writing ability. Sasaki & Hirose (1996) showed that writers’ initial competence in L1 writing might lead to more confidence in L2 writing. Similarly, Ma and Wen (1999) reported that L1 writing could predict and affect L2 writing ability, including L2 vocabulary comprehension and L2 discourse comprehension. In contrast, some findings suggest a negative relationship between L2 and L1 writing ability for writers at different age levels. For instance, while tracing the positive transfer from L1 to L2, Carson and Kuehn (1992) noticed that younger L2 writers who had more L2 educational experience but less L1 education had higher L2 writing ability, whereas older L2 writers who had high proficiency in their L1 but little L2 education, despite a comparatively longer period in the L2 context, demonstrated lower L2 writing ability. Worthy of language educators’ attention is that L2 writers may subconsciously transfer some alternative discourses or L1 heritage conventions and norms to their writing in the target language, which may eventually posit an effect on their L2 writing. As research shows, a transfer of rhetorical and linguistic features from L1 to L2 could indeed  14  result in problems such as rhetorical redundancy (Bartelt, 1983), use of general statements (Leung, 1984), lack of audience awareness in persuasive writing (Zainuddin & Moore, 2003), use of L1 linguistic patterns and rules (Achiba & Kuromiya, 1983; Indrassuta, 1988; Janopoulous, 1986), and wrong use of coordinators (Hinkel, 2001). Fourth, scholars in the field have largely concurred with the view that the L2 writing ability is influenced by multiple factors related to writers’ L1 cultural backgrounds. 
These factors include L1 rhetorical practices and conventions (Atkinson, 2004; Connor, 1996; Hirose, 2003; Kachru, 1995; Kaplan, 1966), L1 educational experiences (Carson, Carrell, Silberstein, Kroll, & Kuehn, 1990; Mohan & Lo, 1985), and genres of communication with more specific differences related to literacy in various cultures and contexts (e.g., Cope & Kalantzis, 1993; Halliday, 1994; Hyland, 2003; Johns, 2003; Martin, 1985, 2002).These factors call for L2 writers’ tacit incorporation of the mainstream ideology being related to English and English-speaking cultures such as the negotiation of meaning and voice for a shared communicative purpose (e.g., Casanave, 2004; Hyland, 2007; Ramanathan & Atkinson, 1999), peer review (e.g., Zhu, 2006), critical thinking (e.g., Canagarajah, 2006; Kubota, 2004), and textual ownership (e.g., Canagarajah, 2002; Hyland, 2000; Pennycock, 2001; Shi, 2006). Due to the impact of L1 cultural backgrounds, L2 writers who lack knowledge of English conventions may engage in inappropriate textual borrowing (Pennycock, 1996; Shi, 2004); behave like a “hermit” (Matsuda, 2002 & 2003); follow an inductive style (Kaplan, 1972; Li, 1996), or “poetry, flowery, and florid styles” (Fagan & Cheong, 1987, p. 25), or an onion-like organization (Lisle & Mano, 1997); lack voice and originality in writing; participate passively in peer feedback (Nelson & Carson, 1995), and/or  15  include a quantity of shared knowledge in their writing (Grabe & Kaplan, 1989; Hinds, 1987). When L2 writers try on their L1 social and cultural markers or stylistic stance while writing in English, their written identities may feel disorienting to readers familiar with English discourse conventions (e.g., Rubi, 1995; Shen, 1989). Thus, to become legitimate members of the academic community, L2 writers experience frustration with the expectations and assumptions of the target community (Fox, 1994), and some of them resist conforming to these standards (Balcher, 1991; Casannave, 1992). These cultural factors indicate that L2 writers need to develop their knowledge both cognitively (e.g., based on their prior knowledge as well as familiarity with similar texts) and socially (e.g., based on their understanding of the cultural conventions in the target language) to become legitimate members of the target community. As Kubota (1998) noted, low-proficiency L2 writers were generally those who knew little about cultural conventions and were even poor at organizing a coherent text on their L1.  1.2.4  Rater Characteristics  Raters’ variations in evaluating examination form another important component of direct measures of L2 writing ability. According to McNamara (1996), there are four ways in which raters can be at variance with one another: (1) tendency to overall leniency, (2) bias towards certain groups of test takers or types of tasks, (3) rating behavior, and (4) interpretations and application of the rating scale. Such variability, especially inter-rater reliability, has been documented in research. For example, Shi (2001) illustrates how native  16  and nonnative EFL teachers made different evaluations on the content, language, and organization, of some English essays written by Chinese university students. The results suggest that raters’ backgrounds and their teaching standards may be responsible for rater variability, which threatens the validity of scoring. 
Similarly, Lumley (2002) demonstrates the complexity of rater variability by examining the rating process, in which raters had to balance the rating scale against their own intuitive impression of text quality. It is noteworthy that Eckes's recent study (2008) investigated 64 raters scoring test takers' writing performance on a large-scale assessment using a four-point scale. A multifaceted Rasch analysis confirmed the hypothesis that raters with different backgrounds differed significantly in their interpretations of the importance of the scoring criteria in an EFL writing context. These studies confirm earlier findings concerning the wide variability in rating associated with raters' interpretive evaluations of language features (Diederich, French, & Carlton, 1961; Edgeworth, 1890; Homburg, 1984).

Researchers in the field consistently concur with the importance of establishing the reliability of scoring among a pool of raters (e.g., Cumming, 1997; Cushing-Weigle, 1994; Weir, 2005) and the importance of training to reduce rater variability and maximize consistency among raters (i.e., inter-rater reliability) (e.g., Alderson, Clapham, & Wall, 1995; Bachman & Palmer, 1996; Brown, 1995; Eckes, 2008; Lumley, 2002; Weigle, 1998). L2 writing experts (e.g., Hamp-Lyons, 1990; Henning, 1991; Raimes, 1990) have pointed out that establishing and maintaining inter-rater agreement is merely the first step toward reliability. Cumming (1997) argues that "… [using] a simple calculation of agreement with a pool of raters on each test administration … may neglect actual differences among raters, for example, differences over time and over geographical locations" (p. 55). Also, after having reviewed various studies of L2 writing, Polio (1997) emphasizes that it is inadequate to make a claim about L2 writers' linguistic accuracy simply by checking reliability among raters, since there are other factors, such as writers and tasks, in the direct measures. These scholarly insights suggest caution in scoring conditions where raters mediate the test scores; that is, the reliability of a rating scale depends on the raters who operate it, particularly in large-scale assessments where the raters' decisions directly affect test takers' lives.

In general, while scoring validity is critical, the rater is an important variable in the direct measure of writing. If raters fail to mark student writing in response to the rating criteria, the test becomes invalid even though it may have been well developed. Thus, rater monitoring and rater training are a necessary component of a valid writing test.
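As a concrete illustration of the kind of first-step agreement check discussed above, the sketch below computes exact agreement, adjacent agreement, and a Pearson correlation for two hypothetical raters. The scores are invented for illustration; this is not the rater-consistency procedure used in this study.

```python
# Illustrative sketch with invented scores: a first-pass inter-rater agreement check.
# As the studies cited above caution, such summary indices are only a starting point;
# they do not model rater severity, drift over time, or task effects.

from statistics import mean, stdev

rater_a = [4, 3, 5, 2, 4, 3, 5, 4]  # hypothetical six-point essay scores
rater_b = [4, 4, 5, 2, 3, 3, 4, 4]

exact = mean(a == b for a, b in zip(rater_a, rater_b))              # identical scores
adjacent = mean(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))  # within one scale band

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"exact agreement:    {exact:.2f}")
print(f"adjacent agreement: {adjacent:.2f}")
print(f"Pearson r:          {pearson_r(rater_a, rater_b):.2f}")
```

This limitation is one reason studies such as Eckes (2008) turn to multifaceted Rasch analysis, which models rater severity explicitly rather than relying on agreement indices alone.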
1.2.5  Summary of Direct Measures of Writing

A direct measure of writing ability consists of a test taker's composition in response to a prompt, scored by raters according to the selected scales. The measure is mediated by various characteristics of L2 writing, including tasks, rating, writers, and raters. First, rhetorical specifications in prompts (a single word, a phrase, a sentence, or a few sentences) can make the writing task easy or difficult. In terms of rating, two commonly used scales, holistic and analytic, are applied based on the purposes of the measures. Compared with holistic rating, which is commonly used for a summative purpose, analytic rating is commonly used for an informative purpose. For writers, L1 and L2 are interrelated in a number of ways: L2 writing ability and L2 proficiency, L1 and L2 writing ability transfer, and L1 and L2 cultural backgrounds. Finally, inter-rater reliability is a necessary check of the consistency between raters, but not the only one. Raters' training, previous backgrounds, and assumptions can influence the validity of test scores. A question arises here: Can the single-prompt form of the direct measure fit writers of different backgrounds and different needs? This question leads to the purpose of the present study.

1.3  Purposes of the Study and Research Questions

The general purpose of this study is to investigate the effects of prompts on ESL students' writing performance in a testing situation. To achieve this goal, I have designed the study to compare two writing tasks or prompts: one (Prompt A) that requires general knowledge and the other (Prompt B) that requires specific knowledge. I collected 100 sample essays from 50 participants who were international students or newly arrived immigrants in Canada from three geographical areas: Mainland China, Taiwan, and South Korea. At the time of data collection, the participants were taking either basic or higher proficiency level (intermediate and advanced) English courses in a college in a metropolitan city in one of Canada's western provinces. Each participant wrote two 60-minute essays on the randomly assigned Prompt A or Prompt B. Written samples of the two writing tasks were compared, and follow-up interviews were conducted to understand the writers' experiences of completing the two tasks. Three research questions guided the study:

RQ1: Do ESL students across proficiency levels perform differently in terms of overall writing scores when responding to a prompt (which I refer to as 'Prompt A') which requires general knowledge in comparison to a prompt which requires specific topical knowledge (referred to as 'Prompt B')?

RQ2: Do general knowledge and specific topical knowledge prompts (Prompt A and Prompt B, respectively) have different effects on specific textual features in ESL students' writing across proficiency levels in terms of content (idea quality, position-taking, idea development, and idea wrap-up), organization (coherence and cohesion), and language (fluency, accuracy, and lexical complexity)?

RQ3: How do participants perceive their writing performances for the two prompts that require either general or specific topical knowledge?

1.4  Need for the Study

The validity of direct measures of L2 writing is influenced by numerous variables, such as the characteristics of task, rating scales, writers, and raters. Among these variables, the variable of topic has not been much researched. As is shown in the aforementioned discussion, the current standardized English tests, either international or local, measure writing competence through different writing tasks focusing on different content domains, though their test scores are used as equivalent to each other. This leads to concerns about the validity of the tests. If validity means to evaluate what a test is supposed to measure, then the construct of the task is vital, since any construct-irrelevant variance will minimize the validity of the test. If a test is intended to measure writing ability but at the same time requires certain content knowledge, this may hinder writers who lack the relevant knowledge from performing well in the writing test. These measurement errors can have negative consequences for test takers. The lack of investigation of the effects of writing topics, therefore, needs to be addressed.
The need for this study is also supported by findings of a recent study conducted by He and Shi (2008) on ESL students’ perceptions and experiences of standardized English writing tests. The researchers interviewed 16 international students (13 from Mainland China and 3 from Taiwan) in a Canadian university to explore their perceptions and experiences of two standardized English writing tests: the Test of Written English (TWE) and the essay task in the English Language Proficiency Index (LPI). In western Canada, the TWE is used as an entrance test for international students who speak English as a second or foreign language (ESL/EFL) whereas the LPI is required, in many post-secondary institutions, for all incoming ESL/EFL students and some native-English-speaking students whose final English mark from high school is below a certain level. As international students, all participants in the study passed the TWE but many took the LPI repeatedly before passing it. The study raised questions about the validity of the test as participants complained about some culturally biased essay prompts in the LPI. Further investigation is needed to address the effects of topics and issues of fairness and equity in L2 writing assessment.  1.5  A Note on Terms L2 writing research is a relatively young field, and there is not yet consensus on the  use of a number of terms. Thus, it is necessary to define some of the terms used in this thesis.  21  1.5.1  Writing Sample  The term “writing sample(s)” refers to the written product of the writing tests. Other terms such as “writing products”, “written products”, “written text”, and “essays” are used interchangeably with “writing samples” in this study.  1.5.2  Task, Topic, and Prompt  The writing task given to students for a writing test will be referred to variously as “the writing topic”, “the writing task”,” the writing prompt”, and “the test question”. The writing task generally consists of two main parts: (a) the topic or stimulus on what the writer should write; and (b) the instructions pertaining to how the writer should address the topic.  1.5.3  Topical Knowledge and Content Knowledge  In previous research, “topical knowledge” refers to the knowledge required for the given writing task or the prompt, whereas “content knowledge” may refer to “academic subject matter” (Crandall & Tucker, 1990), or any topic or theme of interest to the learners (Chaput, 1993; Genesee, 1994). In this study, “content knowledge”, and “topical knowledge”, because they are intertwined in the meanings of both subject matter knowledge and socio-cultural knowledge, are used interchangeably in contrast with “general knowledge”.  22  1.5.4  Test, Measure, and Assessment  The terms “test”, “measure”, and “assessment” are used interchangeably. The writer, however, is aware that the term “test” is commonly used to refer to a set of questions, exercises, or practical activities to measure someone's skill, ability, or knowledge. In comparison, the term “measure” tends to emphasize the quantity of product in terms of length and size, for example, whereas the term “assessment” seems to focus on decisionmaking based on the test and measurement results.  1.6  The Structure of the Chapters This dissertation includes five chapters. Chapter 1 is essentially an orientation to the  study. Introductory materials are presented to clarify and signify the importance of the study along some background information. 
Chapter 2 consists of a discussion of the theoretical framework and a review of previous literature pertinent and applicable to the study. Significant studies on content knowledge and their impact on L2 writing performance are cited. Chapter 3 includes the detailed procedures of mixed methods for conducting the study. Data sampling, collection, and analyses are specifically demonstrated. Chapter 4 presents the results of the study, supported by statistical tables and figures. Finally, Chapter 5 concludes by focusing on contributions, implications, limitations, and recommendations for further research.  23  CHAPTER 2 LITERATURE REVIEW  This chapter is divided into four major parts: (1) theoretical framework, (2) standardized English tests, (3) previous studies on topical knowledge referencing both L1 and L2 writing, and (4) the need for research on topical knowledge in L2 writing. I first discuss the theoretical framework of test validity, embarking on a brief discussion of the traditional concept of validity leading to an overview of the evolution of the modern concept of validity advanced by Samuel Messick (1989). I then review existing international and local Canadian standardized English tests. Next, I summarize empirical studies on topical knowledge and its impact on writing performances in both L1 and L2 contexts. For studies pertaining to L2 writing, I also review research on the assessment of content (i.e., the content criterion) in writing. Finally, I discuss the need for research on the role of topical knowledge in L2 writing.  2.1  Theoretical Framework 2.1.1  Traditional Conceptualizations of Validity  Conceptions of validity have evolved in the past 50 years, but one conception has remained constant: validity is the most fundamental consideration in the evaluation of measures and interpretations of assessment results for a given purpose. The validity of a test is commonly defined as the degree to which “a test measures what it is supposed to measure” (Garrett, 1937, p. 324). Through this lens, validity depends upon the accuracy of  24  the operational measures. In another traditional understanding, validity is “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1985, p. 226). Validity thus refers to the inferences one makes from the test scores used for a given purpose in a given context. Shifting from test measures to inference-making, the concept of validity calls for greater attention to the dynamic, constantly changing social contexts test takers and test users (such as educators, teachers, and test administers) come from. The emphasis on the role of inference-making in the conception of validity is important, for “it highlights that the validity of the inferences one makes from the test scores is bounded by place, time, and the use of the scores resulting from a measurement operation” (Zumbo, 2007, p. 48). Validity is not only the procedures of the measures but also the truthfulness of the inferences that are made from the measures. To support such inferences, accumulation of evidence is the key. In this regard, the Standards for the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) (1985) state the conceptual difference between validity and validation as below:  Validity is the most important consideration in test evaluation. 
The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences. Validity, however, is a unitary concept. (p. 9)  25  While the concept of validity appears straightforward, there are several different types of validity that are relevant in the social sciences. Traditionally, validity evidence has been gathered or identified in three basic types of categories: content-based, criterion-based, and construct-based. These categories provide a convenient approach to organize and discuss validity evidence. Each of these types of validity requires a different approach to assessing the extent to which a test measures what it purports to. The primary purpose of this section is to discuss these three major types of validity and their different meanings, uses, and limitations. Content-based validity, which is usually established by content experts, refers to the extent to which test questions or contents represent skills in the specified subject area. Notably, content validity has played a major role in the development and assessment of various types of tests in psychology and especially education. Fundamentally, “Content validity depends on the extent to which an empirical measurement reflects a specific domain of content….The researcher must be able to specify the full domain of content that is relevant to the particular measurement situation” (Carmines & Zeller, 1979, p. 20). For example, a reading test to measure ESL learners’ reading comprehension would not be content valid if the reading topic focused only on a special area beyond the knowledge of most of the examinees. Thus, content validity is concerned with sample-domain representativeness. In this sense, the knowledge and skills covered by the test items should be representative of the larger targeted domain or construct. As Cronbach and Meehl (1955) emphasize, the “acceptance of the universe of content as defining the variable to be measured is essential” (p. 282). One limitation of establishing a content-valid measure, as researchers  26  (e.g., Carmines & Zeller, 1979; Nunnally, 1978) observe, is a lack of well-defined and objective criteria for determining the extent to which a measure has attained content validity. In most situations, content validity is assessed by evidence of agreement in judgments. However, since the validation of content during this process possibly involves the experts’ subjective values, content validity is relative and requires the accumulation of more validity evidence. A second basic type of validity is criterion-based validity. Criterion-based validity seeks to demonstrate “the extent to which the ‘criterion’ of the test has actually been reached…. [It] is best accumulated through a comparison of results of an assessment with results of some other measure of the same criterion” (Brown, 2004, p. 24). Criterion-based validity is thus used to examine the accuracy of an operational measure by comparing it with another measure which has been demonstrated to be valid. Different from content validity, criterion-based validity centers on the effectiveness of predicting criterion or indicators of a construct. For example, to assess whether the validity of a written driving test is an accurate test of driving skills, one compares the scores on the written driving test with the scores from the road driving test. 
Obviously, the written test will not be useful unless it correlates significantly with the outside criterion (e.g., a road driving test). Thus, the degree of criterion-based validity depends on the extent to which the test corresponds to the criterion. Technically, criterion-based validity can be categorized into two types: concurrent validity and predictive validity. The only difference between these two concerns the current or future existence of the criterion.  27  Criterion-based validity has been used mainly in psychology and education to analyze the validity of certain types of tests and selection procedures. One limitation of criterionbased validity is the difficulty of measuring many of the abstract concepts used in the social sciences. For example, phenomena such as self-esteem and cognition are abstract concepts rather than concrete objects; thus, there is less likely to be a relevant outside criterion against which the measure of such phenomena can be reasonably evaluated. The third basic type of validity is construct validity. Construct validity, as Carmines and Zeller (1979) put it, “is concerned with the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses concerning the concepts (or construct) that are being measured” (p. 23). Similarly, Brown (2004) defines construct validity as “any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception” (p. 25). Construct validity compensates for the limited usefulness of content- and criterion-based validity and is grounded upon clearly formulated theoretical expectations. However, it also subjects itself to empirical challenges. If empirical evidence is consistent with the theoretical expectations, then the measurement is valid; otherwise, the measurement is considered to lack construct validity. As Cronbach and Meehl (1955) explain, “Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured” (p. 282).  Similarly,  Brown (2004, p. 25) points out that construct validity asks whether a particular test “actually tap[s] into the theoretical construct as it has been defined ”. In other words, construct validity is a relationship between theory and practice: whether the theorized psychological construct match up with a specific scale or test measurement applied in research, or whether a scale or  28  test measures the theorized psychological construct adequately. According to Carmines and Zeller (1979), three steps should be followed for the judgment of the construct validity of a piece of research:  First, the theoretical relationships between the concepts themselves must be specified. Second, the empirical relationships between the measures of the concepts must be examined. Finally, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure. (p. 23)  They point out, “Construct validity is enhanced if one has obtained multiple indicators of all the relevant variables” (p. 26). Comparatively speaking, construct validity holds more weight for its explanatory focus than content- and criterion-based validity which focus on representativeness or prediction. Moreover, in contrast to content validity and criterion-based validity, construct validity has generalized applicability in social sciences, where there is more subjectivity to concepts. 
As such, researchers, especially in education and language studies, usually establish construct validity, including defining its theoretical framework, before conducting their main research.

2.1.2  The Modern Concepts of Validity

The three most basic types of validity summarized by Carmines and Zeller (1979) were broadly applied in psychology and education until the theory of validity was reconceptualized as a unified concept by Samuel Messick’s influential work (1989) in the spirit of the AERA/APA/NCME Standards (1985). According to the AERA/APA/NCME Standards (1985),

Validity is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself. (p. 9)

Specifically, Messick (1989) argues that the traditional conception of validity is incomplete because it ignores the dynamic social dimension in terms of the value implications of scores and the consequences of test use. In this sense, the conception of validity is expanded to refer to the extent to which evidence and theory jointly support the interpretations of test scores for a given purpose. Messick’s work (1989) foregrounds a modern concept of validity. However, the effort to find an alternative to the traditional concept of validity can be traced back to Cronbach and Meehl’s (1955) pioneering work, which heavily influenced contemporary discussions of validity in educational assessment (McNamara & Roever, 2006). In their notable research article published in Psychological Bulletin, Cronbach and Meehl (1955) explicate the concept of construct validity and frame their new concept of construct validity as an alternative to the traditional criterion-based validity. In their words,

Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he [sic] is concerned and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria. (p. 283)

Cronbach and Meehl’s (1955) emphasis on trait interpretations indicates the importance of construct validation. That is, the conception of validity goes beyond “how” to measure to the “what”, “who”, and “when” of assessment. This shift in attention takes into account the heterogeneous populations in which a construct is shaped and reshaped by diverse individual differences. As McNamara and Roever (2006) comment, Cronbach and Meehl’s emphasis on “trait” frames the target of validation in terms of language individuality and cognition. Cronbach and Meehl (1955) embrace the collective roles of two types of evidence that support interpretations: one is weak, referencing any sort of evidence, while the other is strong, based on logical or empirical findings. Following Cronbach and Meehl’s (1955) work, Messick (1989) makes a major contribution to the development of validity theory by making construct validity a central aspect in a unitary model of validity. Messick’s validity theory (1989) embraces the notion of validity as a unitary concept. In his own words, “validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (p.
13). There is a broad consensus that Messick’s modern concept of validity has most influenced current thinking and practices in language testing (e.g., Bachman, 1990; Chapelle, 1998; Kane, 2001; Kunnan, 2000; Linn, 1997; Lynch, 1997; Shepard, 1997; Shohamy, 2000). Unlike the traditional view that each type of validity has a separate role, Messick (1989) emphasizes the integration of multiple supplementary forms of all kinds of validity evidence to answer an interdependent set of questions. This unitary notion of validity consists of several distinct aspects working in tandem: content validity, criterion-based validity, and construct validity function as general validity standards for all measures. Messick’s progressive matrix (1989, p. 20) is shown in Figure 2.1.

Figure 2.1 Messick’s Progressive Matrix of Facets of Validity

                        Test Interpretation        Test Use
Evidential basis        Construct Validity         Construct Validity + Relevance/Utility
Consequential basis     Value Implications         Value Implications + Social Consequences

Messick (1989) expands the traditional conception of validity by taking into account both evidence of value implications and social consequences of the score. There are four cells in the matrix, distinguished by test interpretation and use on one level and evidential basis and consequential basis on the other. Construct validity, as the first concept introduced in the matrix, moves forward to join the concept in the next cell and becomes an essential component in each of the four cells. In other words, construct-based validity is embedded in the content- and criterion-based evidence alongside testing consequences and value implications of scores. The evidential basis for interpreting the content aspect of construct validity includes content relevance or utility of test use; that is, how well a test or an experiment measures the construct that theory claims. Compared with the evidential basis, the consequential basis for interpreting the content aspect of construct validity requires judgments of the value implications of score interpretation and potential social consequences of test use, such as bias and fairness. Accordingly, the matrix emphasizes the importance of using theoretical as well as empirical evidence to support the interpretation of test scores in the validation process. Through empirical investigation, Messick intends construct validation to examine empirical predictions with reference to the specified theoretical relationships. Through empirical investigations of the three basic types of validity, Messick calls for evidence of applicability and usefulness by checking the appropriateness of the assessment for the measure of the claimed construct. Through value implications and evidence of testing consequences, Messick (1996) embraces fairness in test use, acknowledging that all interpretations involve a questioning of values and intended or unintended consequences of score interpretation. Evaluating unintended consequences from testing is important, for it concerns negative consequences or washback to individuals and groups. The core of Messick’s progressive matrix calls for the use of evidence to support test inferences in validating test use, a view that has been widely accepted in the educational measurement community (American Educational Research Association, 1999). Messick (1989) identifies two sources of invalidity: construct under-representation and construct-irrelevant variance.
According to Messick (1989), construct under-representation occurs when “the test is too narrow and fails to include important dimensions or facets of the construct” (p. 34). In this sense, a test can only address a limited sample of the target domain or construct. As a result, test takers can often predict test content. Thus, the test scores may in effect reflect a relatively limited measurement of test takers’ ability. While emphasizing construct representation as a fundamental feature of construct validity, Messick (1989) demonstrates that construct under-representation threatens authenticity, or maximal construct representation. In contrast, as Messick (1989) points out, construct-irrelevant variance exists when the “test contains excess reliable variance that is irrelevant to the interpreted construct” (p. 34). Therefore, construct-irrelevant variance contaminates score interpretation. This type of invalidity can lead to either “construct-irrelevant easiness” (p. 34), which causes one to score higher than one would under normal circumstances, or “construct-irrelevant difficulty” (p. 34), which causes a notably lower score. In tandem with the threat of construct under-representation to authenticity, construct-irrelevant variance threatens directness, that is, minimal construct-irrelevant variance, or an assurance that nothing irrelevant has been added that distorts the assessment of the construct. Messick (1989) emphasizes that authenticity and directness actually “constitute tacit validity standards” (p. 65). Messick’s call for attention to both threats indicates the importance of being explicit about the inferences that are made in a given context. His emphasis on the validity standards of authenticity and directness is also of particular practical relevance to performance assessments, which frequently attempt to assess integrated skills and knowledge in close simulations of the real world. Scoring and interpretation are more challenging in such a simulation, since it may not measure all the factors a writer would face in a real-world context.

Zumbo’s (2009) explanatory model of validity further expands Messick’s validity theory by embracing integrative cognitive judgment in the contextualized and pragmatic explanation view of validity and validation (see Figure 2.2).

Figure 2.2 Zumbo’s Integrative Cognitive Judgment in the Contextualized and Pragmatic Explanation View of Validity and Validation (Zumbo, 2009, p. 70)

According to Zumbo, validity is the explanation, and such explanation is informed by contexts. These contexts include four elements: validity, psychometrics, social consequences, and utility. It is these elements that influence and shape each other for the achievement of validity. Zumbo insists on using both the statistical methods and the more qualitative methods of psychometrics to support the inference to the explanation. As Zumbo (2009) argues,

The process of validation involves consideration of the statistical methods, as well as the psychological and more qualitative methods of psychometrics, work to establish and support the inference to the explanation – i.e., validity itself; so that validity is the explanation, whereas the process of validation involves the myriad methods of psychometrics to establish and support that explanation.
The process of validation also includes the utility and evidence from test use such as sensitivity and specificity of the decisions (e.g., pass/fail, presence/absence of disease) made from test scores and predictive capacity (e.g., predictive regression equations); as well as the fourth element of social consequences. (p. 70)

Notably, Zumbo (2009) insists that construct validity should focus on explaining test scores. He views psychometric analysis as the means for attaining more empirical evidence to support the claim of validity. Different from Messick’s unitary model of validity, which views construct validity as the totality of all other validity evidence (including value and social consequences), Zumbo separates the concept of validity from utility, social consequences, and psychometrics, all of which he deems contextual phenomena shaping validity. In doing so, Zumbo further clarifies the relationship between the conceptions of validity and validation, maintaining that validity is different from but informs validation. Specifically, he claims:

We can see that validity is separate from utility, social consequences, and the psychometrics, but validity is shaped by these. Furthermore, the inferences are justified by the psychometric, social consequences, and utility but validity is something more because it requires the explanation. (p. 69)

Zumbo’s (2009) explanatory view of validity highlights the complex reality that a homogeneous population, which psychometrics assumes, does not exist; everyone scores differently. As Zumbo emphasizes, “It is important to note that validity continues to be deeply rooted in the notion of ‘individual differences’ or disposition theory, as dispositional theory has evolved over the decades” (p. 68). In sum, Zumbo expands the evidential basis for test validation by providing a richer explanation of the processes of responding to tests and of variation in test scores, hence promoting richer psychometric theory-building.

The above review illustrates how the concept of validity has shifted in the past decades, from one of purely criterion-based validity to the addition of construct validity as an alternative (Cronbach & Meehl, 1955), then towards an embrace of both evidential and consequential bases of score interpretation and test use in social contexts (Messick, 1989), and finally, to an emphasis on the process of validation (Zumbo, 2009). Despite the changes and development, the core of validity theory has consistently been construct validity. Indeed, the five components (content, criterion, construct, value, and social consequences) of the unitary concept of validity (Messick, 1989) provide theoretical and practical bases on which inferences are drawn from tests and testing, especially for L2 researchers and educators who seek to facilitate L2 learners’ language learning. Of the theories reviewed, Messick’s (1989) four-faceted matrix is most directly applicable to the current study’s focus on the effects of topical knowledge; thus, Messick’s concept provides the theoretical framework for exploring the present research questions, while the broader notion of the development of construct validity, from Cronbach and Meehl (1955) to Zumbo (2009), informs the study.
A standardized test is one that is supposedly given under the same conditions to everyone who takes it. Proper use of such test instruments requires giving and evaluating the test under controlled conditions such as “certain standard objectives, or criteria, that are held constant across one form of the test to another” (Brown, 2004, p. 67). Among the five tests, the TOEFL iBT (Test of English as a Foreign Language Internet Based Test), the IELTS (International English Language Testing System), and the MELAB (Michigan English Language Battery) are international tests, while the CAEL (Canadian Academic English Language Assessment) and the LPI (Language Proficiency Index) are local or national tests. While the international tests are designed for a large number of test takers in different countries, the local tests are for test takers who plan to enter certain local universities or work for certain local companies. All five tests include a section which assesses writing competence for academic purposes. I discuss and compare the purpose, criterion and rating of these tests in the following sections.  38  2.2.1  TOEFL  The TOEFL, a testing service launched by the Educational Testing Service (ETS) in 1963, is available throughout the world. The major purpose of the test is to assess the English language proficiency of nonnative English speakers who wish to enter universities or colleges in North America. According to The Official Guide to New TOEFL iBT (ETS, 2006), “The TOEFL test is administered in more than 180 countries, making it the most accessible test in the world....More than 5,000 colleges and universities in 90 countries accept TOEFL scores” (p. 1). The current TOEFL iBT, also called “New TOEFL” or “New Generation TOEFL,” differs from the early paper/pencil and computer-based tests. The TOEFL iBT has four sections: (a) a 60-100-minute reading section (b) a 60-90-minute listening section (c) a 20-minute speaking section, and (d) a 50-minute writing section. The score of TOEFL iBT is based on performances on the four sections: listening (0-30), reading (0-30), speaking (0-30), and writing (0-30). The total score (120) is the sum of the four skill scores. The total test time is between a minimum of 1.5 hours and a maximum of 4 hours. In the TOEFL writing section, test takers are expected to complete two types of writing tasks: (a) an integrated writing task and (b) an independent writing task. The former starts with a three-minute reading task on an academic topic, followed by a listening task on a lecture related to the reading topic, and finally a 150-250 word summary of the listening passage to explain how it is related to the reading passage. The latter contains a prompt asking for the writer’s view about or attitude to an opinion statement. The prompt for the independent writing task is typically phrased as follows:  39  Do you agree or disagree with the following statement? Always telling the truth is the most important consideration in any relationship. Use specific reasons and examples to support your answer. (ETS 2006, p. 259)  Although the test takers are allowed to write as much as they wish in the time allotted, an effective response is typically about 300 words (ETS, 2006). The TOEFL writing section is scored holistically on a 5-point scale. Appendix 1 contains a description of the writing at each level on the scale. 
For both the writing tasks in the TOEFL iBT, the test takers typically write with some errors and produce essays that are not well-researched because of the time limit and lack of resources. The TOEFL iBT is praised for its emphasis on the performance of all four skills in contrast to the earlier paper/pencil and computer based tests which did not contain a speaking section (Butler et al., 2000) and focused on discrete knowledge about the forms of the English language (e.g., multiple choice items about grammar or vocabulary) rather than communicative performances (Alderson & Hamp-Lyons, 1996; Bailey, 1999). The new writing section laudably tests how well the examinees can use English instead of how well they know English, and this new direction can divert test takers’ attention from rote memory in test preparation. A large number of researchers advocate the innovations of the TOEFL iBT, believing that the performance elicited in the writing section can lead to a positive washback in teaching and learning and finally help students perform better in universities (Cumming, Grant, Mulcahy-Ernt, & Powers, 2004; Cumming, Kantor, Powers, Santos, & Taylor, 2000; Hamp-Lyons & Kroll, 1997).  40  2.2.2  IELTS  The IELTS, jointly managed and administered by the University of Cambridge Local Examinations Syndicate and the British Council and IDP Education Australia, is another large-scale standardized English test available in many countries. The test was initially used by British universities to assess the English language proficiency of applicants whose first language was not English. The test scores have also been used by immigration authorities and government agencies in many countries such as Australia, Canada, New Zealand, the UK and the USA. There are two versions of the IELTS test: the Academic Module for students seeking entry to a university or an institution of higher education, and the General Training Module, either for people applying for immigration or students seeking entry to a secondary school or vocational training courses. Like the TOEFL, the IELTS test consists of four components: (a) a 60-minute reading section; (b) a 60-minute writing section, (c) a 30-minute listening section, and (d) an 11 to 14 minute speaking section. The total test time is two hours and 55 minutes. The Academic Writing Module of the IELTS tests examinees’ ability to produce two different writing tasks: Writing Task 1 is a 150-word descriptive report on some graphic or pictorial information provided. Writing Task 2 is a 250-word argumentative essay on a given topic; examinees are expected to organize their answers clearly and give some examples to support their points from their own knowledge and experiences. For the General Training Module, the writing test includes two different tasks: Task 1 is a 150-word letter, while Task 2 is a 250-word essay based on a given topic. Examinees are allowed one hour to complete the two tasks in each module.  41  Scoring of the writing tasks in IELTS follows a 9-band scale, from 1 (the lowest) to 9 (the highest) (see The IELTS Task 2 Writing Band Descriptor. Retrieved in April 2010 from http://www.ielts.org/pdf/UOBDs_WritingT2.pdf). The standard of scoring for writing is based on four major aspects: task response, coherence and cohesion, lexical resource, and grammatical range and accuracy. Each band of score indicates a proficiency level of English. 
For example, a score of 9 means that a candidate is an expert user of English with complete understanding of syntax and grammar and possession of a rich vocabulary. Similarly, a score of 7 is an indication that a candidate is a good user of English despite some inaccuracies. The IELTS has become popular in recent years since the launch of the New TOFEL. The IELTS may be chosen by many examinees who aspire to avoid low scores on the TOEFL’s new integrated writing task due to their poor listening and reading ability. These L2 writers may feel “safe” with the IELTS writing tasks which are not integrated with listening and reading tasks. Many researchers (Chalhoub-Deville & Turner, 2000; Cheng, Watanabe & Curtis, 2004; Clapham, 1996; Green, 2007; Mayor, 2006; Ross, 2005; Shaw & Weir, 2007) have discussed the potentially positive washback of the IELTS to classroom teaching for the need of more background knowledge required by the writing task in the test.  2.2.3  MELAB  The MELAB (Michigan English Language Battery) is also a testing service used throughout the world. In the manner of TOEFL, it assesses the English language proficiency of nonnative English speakers who seek education or employment opportunities in Englishspeaking countries such as the United States, Canada, and Britain. The MELAB test is  42  administered by University of Michigan English Language Institute (ELI-UM) and ELI-UM authorized official examiners in the United States and Canada. The MELAB includes three compulsory components: (a) a 30-minute impromptu writing task on one of two assigned topics; (b) a 25-minute listening task; and (c) a 75-minute grammar and reading comprehension task using multiple-choice items, a cloze test, and some comprehension questions based on four or five reading passages. For the MELAB writing section, examinees choose from two given prompts, which may be expository, descriptive, or argumentative/persuasive. One typically calls for content based on examinees’ personal experience, and the other requires knowledge from the external world. Appendix A shows the standard for MELAB composition form and instructions. The essays are graded on a 10-point holistic scale including 97, 93, 87, 83, 77, 73, 67, 63, 57, and 53, where 97 is the highest level and 53 the lowest. A score of 97 means that topic is richly and fully developed, with mature syntax, accurate morphological control, appropriate organization, and free of mechanical errors. A score of 73 is the midpoint of the scale, meaning topic development is present but the essay lacks clarity or focus. A score of 53 refers to an extremely short essay of about 40 words or fewer, which communicates nothing and is poor in both form and content. The essay scores based on the 10-point scale are converted to a score equivalent to the other sections of the test. Appendix B contains a description of the writing at each level on the scale. Two raters independently measure each essay and the average score is taken. A third rater is ready for adjudication if the two raters’ scores are extremely different.  
43  2.2.4  CAEL  The Canadian Academic English Language Assessment (CAEL) is a highly integrated language proficiency test designed for in-house use: to determine students’ eligibility for admission to a university program at Carlton University, Ottawa; for example, to determine an appropriate placement for low proficiency students in the university’s ESL program (Fox, 1999; Jennings, Fox, Graves, & Shohamy, 1999).The listening passages are extracted from first-year lectures and the reading passages are taken from academic articles. Based on a 1987-1989 survey which showed how students’ needs are not served in widely used standardized English tests such as the TOEFL and IELTS, the CAEL test replicates authentic university academic activities (e.g., taking notes, writing definitions, filling in flow charts and diagrams). The CAEL assessment truly reflects what students are required to do and allows instructors to “teach to the test”. The CAEL includes four sections: (a) a 50-minute Reading Section that requires test takers to identify main ideas, extract specific information, understand vocabulary in context, and follow a logical or chronological sequence of events, (b) a 20-minute Lecture Section in which test takers answer questions while listening to a prerecorded lecture, (c) a 25-minute Oral Language Test, in which students make short presentations, summarize main points, and respond to group discussions, and (d) an approximately 45-minute Writing Section with a writing task using information obtained from the Reading Section and the Listening Section. “The essay topic asks test takers to [state whether they] agree or disagree with a claim, argue for or against a position, [or] discuss advantages and disadvantages of a course of action. The  44  test takers are encouraged to plan their essay before they begin writing” (See CAEL website http://www.cael.ca/). The writing samples are scored by a 10-90 band, from 10 (the lowest proficiency level) to 90 (the highest proficiency level). Each band of score indicates a proficiency level of English. Assessment Band Scores provide a detailed description of the standard created by “a group of experienced ESL/EAP teachers at the Language Assessment and Testing Research Unit in the School of Linguistics and Applied Language Studies at Carleton University” (See CAEL website http://www.cael.ca/tsu/bandscores.shtml). For example, a score of 80-90 indicates that a candidate is an “Expert Writer” with exceptional competency required for academic English use including fluency, accuracy, flexibility and adaptability for academic tasks; a score of 10-20 indicates that a candidate is a “Very Limited Writer” who demonstrates very little ability to use English for the academic purposes. A full overview of the development of the CAEL Assessment is provided in the work by Fox, Pychyl, and Zumbo (1993). The assessment results of the CAEL are accepted by 117 Canadian institutions across eight provincial areas in Canada: British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, New Brunswick, Nova Scotia, and Prince Edward Island, in addition to professional associations (See CAEL website http://www.cael.ca/taker/ who.shtml).  2.2.5  LPI  Like the CAEL, the Language Proficiency Index (LPI), administered by the University of British Columbia (UBC), is another local English proficiency test in Canada.  45  The test is used by over 20 post-secondary institutions in Western Canada. 
ESL/EFL students and some native-English-speaking students whose final English marks from high schools are below 75% are required to take the test in order to either enter a university or a college or register for the compulsory 100-level English courses at the LPI host university. The test has also been used by real estate agents, notaries public, police recruitment, and Immigration Canada to assess the English level of immigrants. The LPI test consists of four parts comprising 80 points: (a) identifying errors in sentence structure (10 points), (b) identifying errors in English usage (10 points), (c)evaluating and/or summarizing short prose passages (20 points), and (d)writing an expository essay (40 points). The Writing Section of the LPI requires test takers to write a 300- to 400-word argumentative essay on one of three prompts provided. Many of these prompts are related to Canadian culture, which is difficult for international students who do not possess the relevant knowledge (He & Shi, 2008). The examinees are expected to either state a point of view and explain it or express agreement or disagreement with a given opinion statement as in the following example:  •  Would you rather support a national charity or a local charity? Be specific.  •  "Gifted athletes should be admitted to college without having to meet the regular admission standards." Agree or disagree.  •  Should governments spend more money on health care or on education? Be specific. (See the LPI website http://www.ares.ubc.ca/LPI/test_sections.html#section4)  46  The essay is assessed using a 6-point holistic rating scale where level 6 is the highest score. Essay Level 5 is the passing score and is defined as “effective proficiency” (See Appendix C for the criteria for the LPI Writing). Each essay is read by two raters. If both do not place the essay at the same level, it will be discussed by a committee of secondary and post-secondary English specialists until a consensus is reached. Students’ success or failure of the LPI test is determined by the results of essay assessment. In other words, a student with a writing score of 4 fails the entire test. Specifically, the characteristics that the markers look for are as follows:  • The writer has dealt clearly and specifically with one of the topics that was on the exam. • The essay has a clear structure that is easy for the reader to follow. That is, the first paragraph begins with a brief introduction that leads into a clearly expressed topic statement. That topic is then developed in two or three welldeveloped paragraphs, each of which contains at least four or more sentences. Finally, the essay ends with a concluding paragraph that does something more than just repeat the idea or ideas in the topic statement. • The sentence structure of the essay is varied and correct. • The English usage is exact, not too simple, and is idiomatic. (See the LPI website http://www.ares.ubc.ca/LPI/test_sections.html#section4)  47  2.2.6  Summary of the Five Standardized English Tests  The five major standardized English tests discussed above are often accepted by many universities and colleges in North America for admission purposes. All tests include a writing component with either an integrated task (the TOEFL iBT, and CAEL) or an independent writing task (the ILETS, MELAB, and LPI). 
The writing sections in the TOEFL iBT, IELTS, and MELAB measure the examinees’ writing competence using a general topic related to the examinees’ lives and experiences, whereas the writing sections in the CAEL and LPI measure test takers’ writing ability using topics that require prior knowledge of a specific topic or subject matter. For example, writing in the CAEL is integrated with tasks such as reading an academic text and listening to a university lecture. Similarly, the LPI writing task measures L2 test takers’ English language proficiency in a Canadian context, which is needed for their post-secondary studies. Compared with the international standardized English tests, the two local tests are academically oriented and tied to local academic requirements at universities such as Carleton University and the University of British Columbia. As the CAEL claims, “We have consistently found the test to be a much more accurate instrument for predicting academic success than the other tests available” (See the CAEL website http://www.cael.ca/edu/whyuse.shtml). Thus, direct measures of L2 writing are used in both international and national standardized English tests, though these tests have different priorities in their assessment.

2.3  Previous Studies on Topical/Content Knowledge in Writing

The relationship between content or topical knowledge and composing has not been investigated in depth in writing research. Despite Braddock, Lloyd-Jones, and Schoer’s (1963) landmark report on the state of knowledge in composition, which identified content knowledge as a key variable in writing research, little work has been conducted on the effects of assigned topics on writing performances. Given the scant literature on the impact of topic or content knowledge on writing performance, this section reviews and discusses relevant findings from both L1 (first language) and L2 writing, since research does show close similarities in composing strategies as well as discourse-level features between L1 and L2 writers (Jones & Tetroe, 1987; Raimes, 1985; Shaw & Weir, 2007; Zamel, 1982, 1983).

2.3.1  Topical/Content Knowledge and L1 Writing

A body of research has investigated the effects of task complexity on L1 writing performance in terms of discourse mode (e.g., Engelhard, Gordon & Gabrielson, 1991; Hillocks, 1986), task choice (e.g., Hoetker, 1979; Hoetker & Brossell, 1986), wording (e.g., Brossell & Hoetker, 1984; Greenberg, 1981), cognitive demands (e.g., Freeman & Pringle, 1980; Matsuhashi, 1982; Ruth & Murphy, 1984), rhetorical specification (e.g., Brossell, 1983; Hoetker, 1982; Hoetker & Brossell, 1986; Plasse, 1981), subject matter (e.g., Bereiter & Scardamalia, 1987; Flower & Hayes, 1981), and prior knowledge (e.g., Chesky, 1984; Chesky & Hiebert, 1987; Langer, 1984; McCutchen, 1986). These studies generally focus on the writing of L1 school-aged children and high school students, particularly on their writing processes, such as retrieval, organization, general writing strategies, and task focus. A central component of this focus was the type of knowledge involved, such as domain knowledge, subject knowledge, content knowledge, linguistic knowledge, genre knowledge, socio-cultural knowledge, prior knowledge, declarative knowledge, and/or procedural knowledge. L1 research thus shifted its attention to the effects of these types of knowledge on writing performance.
Although no theorist has operationally defined how domain knowledge influences the writing process, as Faigley, Cherry, Jolliffe, and Skinner (1985) point out, the knowledge writers need has been described in various ways. Bereiter and Scardamalia (1987), Flower and Hayes (1981), and Moffett (1968, 1983), for example, suggest that good writers possess the necessary subject knowledge for optimal performance. Similarly, Purves and Purves (1986) examine the dimensions of topics in writing and reveal that content knowledge, along with language knowledge, genre knowledge, and socio-cultural knowledge, can have an impact on writing performance. In this vein, the act of writing involves a great deal of knowledge concerning not only linguistic and structural conventions particular to written language (e.g., text form, text function), but also topic or content knowledge that is situated within particular cultures and contexts. Specifically, Jolliffe and Brier (1988) address four types of interrelated components of domain knowledge competent writers need in writing for academic purposes:

• Knowledge of the discipline as a discourse community;
• Knowledge of the subject matters writers in a discipline may write about, the methods writers in the discipline use to investigate subject matters in order to write about them, and the lines of argument or explanation writers employ in their texts;
• Knowledge of the ways writers in a discipline organize, arrange, and format their texts; and
• Knowledge of the acceptable styles -- in general terms, the syntax and diction that writers in a discipline employ. (p. 38)

Later, Hayes (1990) comments that writing is a composing activity that involves not only declarative knowledge about subject matter, rhetorical goals, and organization, but also procedural knowledge or strategies for composing. These types of knowledge, as Jennings and Purves (1991) state, “in addition to the knowledge about the whole world at large, are lodged in the mind and are called into play in different situations where reading and writing are called for” (p. 6). While theory has acknowledged that various types of knowledge are needed for the composing process, how content knowledge specifically affects the writing process remains a relatively unexplored topic (Faigley et al., 1985; Hayes, 1990; McCutchen, 1986). As Ruth and Murphy (1988) observe, only a small number of studies, mainly undertaken between the late 1970s and early 1980s, have examined the dimensions of topics in designing writing tasks in L1 contexts (Brossell, 1983; Odell, 1970; Odell, Cooper, & Courts, 1978; Greenberg, 1981a; Hoetker, 1982; Purves, Soter, Takala, & Vahapassi, 1984). Among them, a few scholars do call attention to the gap in research about writing tasks, specifically on the development of topics for direct measures of writing competency (Greenberg, 1981b; Hoetker, 1982; Odell, 1979; Odell, Cooper, & Courts, 1978). They inquire, “How should researchers frame a writing task so as to obtain the best possible work from students? Is there any aspect of the rhetorical context that we need not include in a writing task” (Odell, Cooper, & Courts, 1978, p. 11)? “Is it in fact true that different kinds of writing tasks elicit different kinds of writing performance from students? Does one writing task elicit a greater number of abstract (or connotative or formal) word choices than do other tasks?” (Odell, 1979, p. 41), and “How do students read writing tasks? Which aspects of the directions do they understand?
Or use? Or ignore” (Greenberg, 1981b, p. 8)? In general, these questions indicate the need for research on the roles of topic-specific knowledge in the development of direct measure writing tasks.

Among the existing research in L1 writing, several studies have investigated the effect of prior knowledge of the subject matter of a writing task, including topic preference and subject familiarity. These studies are discussed in the following, for they provide a relevant background for the current study. Notably, Bereiter and Scardamalia (1982) and Gradwohl and Schumacher (1989) explored L1 children’s writing and reported no significant differences in the quality of organization between preferred and disliked writing topics. On the other hand, in a substantive study on the role of subject knowledge in writing, Langer (1984) found different results, claiming significant correlations between students’ subject knowledge and holistic ratings of their texts. Prior to writing, participants in Langer’s study (students in four 10th-grade American history classes) were assessed for topic-specific knowledge about two writing topics they were to write on. During the knowledge assessment, participants were asked to write down key words about any ideas related to the topic. The responses were then rated for five analytic knowledge scores: overall quality, coherence, syntactic complexity, audience, and function. The results of the analysis evidenced a high correlation between each of the knowledge scores and the holistic scores of the essays students wrote after the knowledge measures. As Langer (1984) commented, “the data clearly suggested a strong and consistent relationship between topic-specific background knowledge and the quality of student writing” (p. 146). Langer (1984) also found other evidence of topic effects on writing performance during her analysis of the content of four teacher-developed assignments. The writing assignments generally required two different types of response pattern knowledge: one related to knowledge of compare/contrast organizational patterns, the other pertaining to knowledge of the more general expository pattern using a thesis-support structure. Langer (1984) found a strong correlation between topical knowledge and students’ writing performance. For example, those who lacked topical knowledge failed to include sufficient details or amounts of information to elaborate upon their thesis statements. Langer (1984) concluded her analysis of topic differences with the following statement:

These findings imply that different assignments, given for different purposes, tap different aspects of a writer’s knowledge of a topic. A low score on a particular paper might not mean that a student does not know the information, but that knowledge that was available was not organized in a useful way for that particular assignment. (p. 144)

Langer’s (1984) findings on the interactions between topical knowledge and writing performance show that prompt types did not affect students’ organization of the information but that background knowledge was strongly related to essay quality in terms of appropriateness and relevancy. That is, a writer must have some knowledge of a subject in order to write about it well. Like Langer, Chesky and Hiebert (1987) investigated the effect of high and low prior knowledge, together with other variables, on high school students’ overall writing performance.
In their study, Chesky and Hiebert examined correlations between prior knowledge and students’ holistic writing scores using measures of five components: thesis statements, cohesion, essay length, T-unit length, and error analysis. The measures revealed mixed results: students with high prior knowledge outperformed those who had low prior knowledge except in T-unit length and error analysis, which showed no significant difference between groups. The findings indicate that prior knowledge generally plays a positive role in writing, although it is still unknown how prior knowledge might influence content processes or a writer’s thinking process during text composition.

While the above researchers investigated the effect of prior knowledge of the subject matter of a writing task, or content knowledge, on writing products, a few others traced the effect of such knowledge on writing processes. For example, McCutchen (1986), inconsistent with the results of Bereiter et al. (1982) and Gradwohl et al. (1986), found that prior knowledge affected children’s writing processes, including retrieval, organization, general writing strategies, and task focus. Specifically, McCutchen (1986) investigated the importance of domain knowledge (about football) for writing among the 300 children from grades four, six, and eight participating in the study. The subjects were divided into two groups, low-knowledge and high-knowledge, and all responded to two types of topics: a general topic related to school or friends, and a subject topic requiring domain knowledge of football. The texts were measured on the basis of coherence, conceptual structure, and content. The results of the analyses showed that the high-knowledge group of students, irrespective of age, wrote on the football topic more coherently and in more elaborate depth than did the low-knowledge group. McCutchen (1986) concluded that topical knowledge differences “affect what gets said, but knowledge of another sort affects how it gets said” (p. 442). Similarly, using protocol analyses, Schumacher, Gradwohl, Brezin, and Parker (1986) investigated the impact of content knowledge on various writing processes (e.g., written lexical retrieval, organizational plan, writing strategies, and verbal protocols) in children’s writing. They found that children performed significantly differently when using subject knowledge versus general knowledge. Compared to young writers with low subject knowledge, those with high subject knowledge retrieved information more efficiently during the writing process and thus could generate more sophisticated text. These findings indicate cognitive and psychological concerns in writing. As McPeck (1981), Resnick (1987), and Glaser (1984) have argued, good thinking is subject-specific and dependent on domain knowledge. These studies indicate that prior knowledge affects the quality of both the written products and the writing process, at least for some domains.

In summary, most investigations in L1 writing have compared high school students’ levels of prior knowledge and the impact of such knowledge on their writing performances in both products and processes. The preliminary data of these studies were mainly based on the writing of high school-aged participants. Empirical evidence has not confirmed similar findings for adult writers and L2 writers. In addition, since there were mixed results regarding the role of content knowledge, more empirical evidence is needed to verify the findings.
2.3.2  Topical/Content Knowledge and L2 Writing

Since the 1980s, a body of research has investigated the effects of task complexity on the writing performance of university-level ESL students. Some studies focused on student writing in classroom settings (Kuiken & Vedder, 2007; Tedick, 1990; Winfield & Barnes-Felfeli, 1982), whereas others explored student writing performances in standardized English tests (He & Shi, 2008; Jennings, Graves, & Shohamy, 1999; O’Loughlin & Wigglesworth, 2003; Spann, 1990). These studies examined the task effects on L2 writing performance in terms of (1) prompt types (Winfield & Barnes-Felfeli, 1982; Tedick, 1990; Spann, 1990), (2) task choice (Jennings, Graves, & Shohamy, 1999), (3) specific linguistic features such as accuracy, syntactic complexity, and lexical variation (Kuiken & Vedder, 2007), (4) the quantity of information and manner of presentation of information (O’Loughlin & Wigglesworth, 2003), and (5) the L2 students’ perceptions of their writing performance (He & Shi, 2008). In addition, several empirical studies have examined the assessment of content in L2 writing. Among the studies reviewed here, however, only three address the effect of topic (Winfield & Barnes-Felfeli, 1982; Tedick, 1990; Spann, 1993), while six address the assessment of content in L2 writing: Hamp-Lyons (1991), Lumley (2000), Norton (2000a), Sakyi (2000), Vaughan (1991), and Shi (2001). These studies are presented chronologically in the following sections for their relevance to the current study.

Research on the effect of topic on L2 writing performance dates back to Winfield and Barnes-Felfeli’s study in 1982. Winfield and Barnes-Felfeli (1982) investigated the effects of culturally familiar and culturally unfamiliar materials on L2 writing among 20 university students enrolled in intermediate-level ESL classes. The participants were divided into two groups. One group consisted of 10 Spanish-speaking adults from various Latin American countries, while the other group contained 10 adults from various other countries whose first languages were Hebrew, Arabic, Navajo, Lugbura, Persian, Greek, Kamba, and Hindi. Prior to writing, students were presented with two thematic paragraphs of equal length, one from a Spanish book, Don Quixote, of which the Spanish-speaking students had greater previous knowledge, and the other from the Japanese Noh Theater, of which all students had little or no knowledge. After reading the two paragraphs silently, students were required to write down anything they could remember of the themes of the two paragraphs within 15 minutes. During this period, students were not allowed to check either the previous reading paragraphs or any dictionaries, since thematic fluency was of interest. The results of the 2 × 2 mixed analyses of variance on students’ written products showed that the native Spanish-speaking group wrote significantly differently when responding to the culturally familiar material and the culturally unfamiliar material in terms of fluency, grammaticality, and complexity. Specifically, the native Spanish speakers wrote less fluently and with more grammar errors on the nonfamiliar material than they did on the familiar material. In contrast, the other group showed no such differences in their writing using both materials. Winfield and Barnes-Felfeli (1982) reported that cultural familiarity with the content had a positive impact on the students’ writing.
Tedick (1990) investigated the effects of topic variables on writing performance and the impact of subject matter knowledge on the writing performances of ESL graduate students at three English proficiency levels: beginning, intermediate, and advanced. Using a holistic rating based on overall quality, fluency, and syntactic complexity, Tedick measured 105 participants’ written responses to two types of topics: one general topic and one field-related topic. The results of a 3 × 2 multivariate analysis of variance showed that “L2 writers produced qualitatively better writing when provided with a topic that allows them to make use of their prior knowledge” (p. 136). The study suggests a positive effect of subject matter familiarity on the writing performances of ESL writers. As Tedick argued, “If L2 writers are capable of producing syntactically complex utterances with fewer errors, their familiarity with the subject matter allows them to demonstrate this capability” (p. 136).

In the same year that Tedick (1990) published that study, Spann (1990) reported on the effects of prompts on L2 writing performance in the essay examination of the MELAB. A total of 88 ESL graduates and undergraduates in the United States from three language proficiency levels (beginning, intermediate, and advanced) were asked to write a 30-minute impromptu essay on two MELAB prompts which required different content or rhetorical modes. Using official MELAB norms, Spann scored students’ written responses to the two prompts in terms of fluency, length, and syntax. Results of t tests for comparisons of students’ responses revealed that participants’ performance on the two types of prompts generally showed no difference as measured by holistic ratings. Textual analysis of student writing revealed some relations between the holistic score and length, lexis, and rhetorical features. Spann suggests caution in generalizing the findings due to a lack of participants’ background information (e.g., duration in the U.S., program fields, and reasons for studying), small sample sizes, and low inter-rater reliability. However, based on the findings, Spann argues that prompt developers should take particular care to make the subject content accessible or universal for the sake of test validity.

In tandem with the above three studies, which directly investigated the effect of content knowledge on L2 writing performances, several scholars in L2 writing have explored the assessment of content in ESL or EFL composition in their empirical research. As a pioneer of the research in the field, Hamp-Lyons (1990) conducted an experimental study to examine how raters made judgments about L2 students’ essays in response to discipline-based writing tasks administered by the British Council’s ELTS (English Language Testing Service). The study aimed to determine what the raters valued or did not value in the assessment of essays. Four experienced readers who had received training for scoring participated in the study, and they were all nonmembers of the discipline. Raters spent one day rating 23 ELTS essays individually. Then they were asked to report their scores on each essay separately and explain their rationales for the scores. This explanation process was audio-taped. The tape was later transcribed to determine possible discrepancies and/or similarities in criteria that raters used in their scoring of the essays.
Along with other findings, the analysis of the audiotape data revealed that four raters consistently considered  59  relevance and argument as important content criteria in determining the judgments of the students’ written texts. In other words, how much appropriate knowledge writers brought to address the subject-matter topics and whether writers argued for a position while addressing the topic from the perspective of argument, rather than for factual accuracy, played a major part in the raters’ decision. Regarding relevance as a content criterion, Hamp-Lyons (1990) suggests that different discipline areas of the four raters could account for their different judgments about relevancy to content criteria. As she stated in this study, “…because of the curious discipline-related…nature of the ELTS and of the writing test in particular, the aspect of content quality that most engages readers’ attention is relevance” (1990, p. 138).This indicates that a content criterion on which ESL students’ writing is judged may vary due to raters’ specific academic knowledge of the writing task. In a similar vein, Vaughan (1991) examined the mental processing of nine trained raters, two of whom were native English speakers and others who were non-native English speakers (including L1 Chinese, Hebrew, French, and Spanish speakers), when they used a six-point holistic rating to assess six essays written by university students. The transcribed verbal comments of raters showed that content appears as the most mentioned problem and that raters focused on different elements of content in different ways. A total of 25 comments were made, among these nine raters, for the criteria of content for assessing native and nonnative English-speaking students’ essays, as follows:  More interesting as it goes along Boring  60  Worse than the other essays in content Content offensive Reader feels student is lying Too long Reader amused by content Reader laughs (sardonically) at content Reader disagrees with content Substantial, “authentic” content Well-developed example Writer shows lack of critical thinking Content or point unclear Good point as made Well argued Writer takes audience into consideration Thoughtful sentence Similar to other essays in content: having the same problem Only personal experience used Under developed example Better than other essays in content Good choice of words Content weak, simplistic or unsophisticated Cliché ending  61  Reader agrees with or likes content (Vaughan, 1991, p. 123)  These 25 comments, as the raters indicated, are the criteria of content for grading student writing. The criteria cover topic interest, appropriateness, originality, argument, agreement or disagreement, supporting evidence and elaboration, critical thinking, clarity, lexical sophistication, syntactic complexity, and register. What is worth noting is that among the listed criteria of content, raters’ agreement or disagreement was considered for assessing quality of writing. In other words, raters’ sometimes differing understanding and interpretation of criteria could affect the judgement of ESL students’ writing scores. As Vaughan (1991) observed:  Despite their similar training, different raters focus on different essay elements and perhaps have individual approaches to reading essays. Holistic assessment is a lonely act. Frequent end-of-tape comments such as ‘I don’t know what someone else might say,’ reveal the uncertainly of raters to whether their judgments were within the established criteria. 
Each rater comes to rely on his own method. (pp. 120-121)  Such varying views on the content indicate that individual raters may inconsistently judge the quality of writing content. Lumely (2000) reported a different criterion for content in his study investigating the assessment criteria of a large-scale writing test, the Special Test of English Proficiency  62  (STEP), which was used by the Australian government to assist immigration decisions. In the study, four experienced raters assessed 24 ESL writing samples using a five-point analytic scale. The scale contained four rating categories, among which TFA (Task Fulfillment and Appropriacy) distinguished “content” from “meaning”. According to TFA, “content” is about ideas and argument in response to the writing task, while meaning focuses on clarity, confusion, or comprehensibility of what is said. The results of the analysis revealed the superficiality of rating scales in comparison with the complexity of written texts and the raters’ judgments. As Lumely pointed out, the implicitness and inadequacy of the scale criteria led raters to rely heavily on their intuitive impression of the texts. In this case, it cannot be guaranteed that all raters will assess “content,” even with a focus on ideas and argument, in an identical way. In investigating the factors that could affect raters’ holistic rating of ESL writing, Sakyi (2000) also reported raters’ perspectives on content. To develop a theoretical model for measuring holistic assessment of ESL writing, Sakyi investigated the validity of holistically scored L2 English essays written by first-year ESL university students and scored by six raters. Analysis of the rater’s think-aloud protocols revealed that each rater reacted to the content based on “whether or not what the writer was trying to say made sense and whether they had adequate English to say whatever they wanted to say” (p. 135). In other words, appropriateness and adequacy of ideas were deemed an important content criterion. In his suggested tentative model that showed factors affecting holistic scores of written compositions, Sakyi approached content as a broad concept that encompasses “idea development, organization, support, logic, relevance, and quantity” (p. 146). Such a claim is  63  in contrast to the traditional belief that organization is separate from content as a different aspect in writing assessment (Breland & Jones, 1984; Freedman, 1979; Mendelsohn & Cumming, 1987; Santos, 1988; Song & Caruso, 1996). Norton (2000a) has compared the uses of the scoring guides for assessment of L2 writing competency in three university entrance tests: DEC (run by the Department of Education and Culture) in South Africa, TWE (Test of Written English) in the USA, and CLBA (Canadian Language Benchmark Association) in Canada. According to Norton, test creators have different perceptions and priorities of content in assessing writing. The DEC test includes “content” and “language” as two separate criteria. The content is specified as organization and ideas. Unlike the DEC, the TWE frames content as rhetoric, in comparison with language which was specified as syntax; both content and language are given equal weight in scoring for the TWE. Differing from the DEC and the TWE, the CLBA conceptualizes content as the relationship between the reader and the writer in making judgments about L2 writing proficiency. 
By emphasizing so, Norton urges that we not take the existing assessment criteria for granted but validate measurement processes with consideration of test consequence; that is, whether the construct claimed to be measured is situated within a clear theory that is fair to all writers. Norton listed the specific criteria that the DEC markers applied to determine the quality of content as follows:  How well did the candidate relate to the topic? Is the topic introduced and concluded effectively?  64  Does the essay hold the reader’s attention through interesting description, or imaginative writing, or perceptive ideas? Is it generally coherent? (p. 7)  Shi (2001) examined different interpretations of content between 46 native and nonnative EFL teachers while they were scoring 10 expository English essays written by Chinese students. Shi coded the teachers’ recounts of reasons for scores into five scoring categories: general, content, organization, language, and length. Among these categories, Shi presented content as a two-faceted concept with two subcategories: (a) ideas, including both macro and micro comments in response to the thesis, and (b) argument, focusing on idea elaboration (e.g., originality, relevance, depth, and maturity), paragraph development (e.g., unity, supporting details, counter argument, logic), and rhetoric (clarity, conciseness, objectivity). Such a comprehensive description of “content”, as Shi (2001) pointed out, indicated how teachers/raters evaluated the content of the writing of EFL students from various perspectives. In addition, the differences in the frequencies of the comments made by the two groups of participating teachers (native- and nonnative-English speaking) implied differences in their instructional goals. Content assessment, the study suggests, is not neutral but value-based. To summarize, the existing literature on the issue of content assessment of L2 writing indicates no consensus on the definition and application of content as one of the major criterion in L2 writing assessment (see Table 2.1). However, among the three existing studies that directly investigated the role of topical knowledge in L2 writing, Winfield & Barnes-  65  Felfeli (1982) and Teddick (1990) showed that subject familiarity had an impact on writing performance. Although Spann (1990) did not report the effects of topical knowledge on writing, he did point out that more caution was necessary while interpreting the results, considering a lack of participants’ background information and the limited size of the sample. Winfield & Barnes-Felfeli (1982), Teddick (1990), and Spann (1990) also indicate that content is defined and interpreted differently from one situation to another and from one rater to another. Although the criteria are coded in different ways, all the six studies reviewed include “relevance”, “argument”, or “idea” as the major criteria for content in L2 writing assessment. It is important to include an explicit theoretical assumption for the concept of content to achieve fairness in L2 writing assessment. To be ethically accountable, both test developers and test users must create fair and unbiased tests as well as use tests in a way that is fair for all test-takers (Hamp-Lyons, 1997; Kunnan, 2000; Norton, 1997; Shohamy, 1993, 2000).  2.4  Need for the Present Research Given such a small body of research on the role of content or topical knowledge in L2  writing performance, there remains a substantial number of unanswered questions. 
It appears that previous studies have consistently confirmed the importance of knowledge in the writing task. However, what kind of knowledge and whether and how intercultural and cross-cultural knowledge carries a writer through a writing task need to be experimentally confirmed. The present study therefore aims to explore empirical evidence to understand how prior knowledge affects the writing processes and shapes the texts of L2 writers.  66  Table 2 .1 Summary of the Previous Studies on the Concept of Content Studies Hamp-Lyons, 1990 Vaughan,1991  Lumely, 2000 Norton, 2000a  Definition of Content • relevance and argument • • • • • • • •  topic interest appropriateness, originality, idea sophistication argument, supporting evident and elaboration raters’ agreement or disagreement critical thinking sentence structure, word choice, clarity audience ideas and argument  • •  How well did the candidate relate to the topic? Is the topic introduced and concluded effectively? Does the essay hold the reader’s attention through interesting description, or imaginative writing, or perceptive ideas? Is it generally coherent? idea development, organization, support, logic, relevance, and quantity idea (general or specific comments on ideas and thesis) argument (General or specific comments on aspects of arguments such as balance, use of comparison, counter arguments, support, uses of details or examples, clarity, unity, maturity, originality, relevance, logic, depth, objectivity, conciseness, development and expression)  •  Sakyi, 2000  • •  Shi, 2001  • •  67  CHAPTER 3 RESEARCH METHODS  3.1  Introduction In this chapter I begin by reminding the reader of the motivations for this study and  the research questions. Then I describe the methods used to answer the research questions. Along the way, I describe the statistical techniques used and clarify their rationals. The literature review in the previous chapter examined the existing methodological problems that threaten the validity of the inferences test users or educators make from L2 writing assessment in standardized English tests. The purpose of this study is to investigate the effects of prompt types on the ESL students’ writing performance in a testing situation. To achieve this goal, this study was designed to answer three research questions:  RQ1: Do ESL students across proficiency levels performance differently in terms of overall scores when responding to a prompt (which I refer to as ‘Prompt A’) which requires general knowledge in comparison to a prompt which requires specific topical knowledge (referred to as ‘Prompt B’)? RQ2: Do general knowledge and specific topical knowledge prompts (Prompt A and Prompt B, respectively) have different effects on specific textual features in ESL students’ writing across proficiency levels in terms of content (quality of ideas, position-taking, idea development, and idea wrap-up), organization (coherence and cohesion), and language (fluency accuracy, and lexical complexity)?  68  RQ3: How do participants perceive their writing performances for the two prompts that require either general or specific topical knowledge?  In an attempt to answer these research questions, I investigated the impact of generalknowledge versus specific-knowledge prompts on the performances of 50 ESL students across different language proficiency levels (basic, intermediate, and advanced) based on the entrance placement tests they received at the beginning of their program studies at a college. 
A total of 100 writing samples written by the 50 ESL students in response to two types of prompts were collected. To avoid any subjective assumptions that could influence the selection of writing topics, two prompts, one requiring the general knowledge and the other requiring specific topical knowledge, were pooled out from a pilot study in which 20 students participated. I will describe the pilot study in detail later (see Section 3.3.5.2). The two prompts were then randomly assigned to the 50 students, who completed the two writing tasks on two occasions with a week in between. Finally, I interviewed five volunteers to explore their perceptions about their writing performances. The 100 students’ essays were analyzed by a mixed-methods explanatory design (Creswell & Plano Clark, 2007), through three phases, discussed in detail below. Phase One: I made a comparison of the effects of prompt types (general versus culture-specific) on the overall writing scores and then the interaction of these prompts across the three language proficiency levels – i.e., prompt types and proficiency levels are the independent variables, and the overall writing score is the dependent variable.  69  Phase Two: I focused on the specific textual features of the writing samples in response to the general knowledge prompt (A) and the specific topical knowledge prompt (B). It is important to note that the six-point analytic scoring rubric designed by this study involved seven numerical indicators that can be summarized into three scores (content, organization, and language) based on the LPI six-point holistic scale and previous literature on content criteria for assessing L2 writing. However, one of the composite scores, language, required the use of statistical principal components analysis to resolve the scaling issue (i.e., three indicators comprising this composite score are on different metrics and the raw composite score would be difficult to interpret). Therefore, two of the composite scores, content and organization, are derived by simply summing the respective numerical indicators and the third, language, was derived using principal component analysis as a mechanism for computing the composite score. These three composite scores, based on the seven indicators in the analytical scaling of this study, are referred to as “components” throughout because they reflect the three components (content, organization, and language) of the analytic scoring rubric. As in Phase One, the three language proficiency levels and the two prompt types are used as independent variables; however, for this phase the dependent variables are (i) the three component scores (content, organization, and language), and (ii) each of the seven measured indicators (idea quality, position-taking, idea development, idea wrap-up, fluency, accuracy, and lexical complexity). Phase Three: posttest interviews were conducted to understand students’ perceptions about the writing tasks.  70  In summary, the mixed methods explanatory design started with quantitative methods (Phase One and Phase Two) followed by qualitative methods (Phase Three). In the quantitative phases, the dependent variables are the continuous scores of the students’ writing, including the overall score of the essay, the score of each component, and each indicator, whereas independent variables are the two prompts (general knowledge and specific topical knowledge) and three levels of language proficiency (basic, intermediate, and advanced).  
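Although all scoring and analysis in this study were carried out with SPSS, the logic of assembling the three component scores from the indicator scores can be illustrated with a short sketch. The data file, the column names, and the use of scikit-learn below are illustrative assumptions rather than the study's actual procedure; the sketch simply shows simple sums for content and organization and a first principal component for language.

```python
# Illustrative sketch: assembling component scores from indicator scores.
# File name and column names are hypothetical; the study itself used SPSS.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

ind = pd.read_csv("indicator_scores.csv")

# Content and organization: simple sums of their rubric indicators (same 6-point metric)
ind["content"] = ind[["idea_quality", "position_taking",
                      "idea_development", "idea_wrapup"]].sum(axis=1)
ind["organization"] = ind["coherence_cohesion"]

# Language: its three indicators (word count, % error-free T-units, % academic words)
# sit on different metrics, so standardize them and take the first principal component
lang = ind[["fluency", "accuracy", "lexical_complexity"]]
z = StandardScaler().fit_transform(lang)
pca = PCA(n_components=1)
ind["language"] = pca.fit_transform(z).ravel()

print(pca.components_)                                   # loadings of the three indicators
print(ind[["content", "organization", "language"]].head())
```

Standardizing the three language indicators before extracting the first principal component is one way of handling the fact that raw word counts and the two percentage-based indicators are not directly comparable; the resulting composite can then be rescaled to whatever range the rating scale requires.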
3.2  Research Design In order to provide the reader with background information about mixed methods  designs and also a description of my design choice, I first briefly review the major types of mixed methods designs in the literature and then discuss the research design of this current study in detail.  3.2.1  The Major Types of Mixed Methods Designs  According to Creswell & Plano Clark (2007) (see Table 3.1), there are four major types of mixed methods designs: Triangulation, Embedded, Explanatory, and Exploratory. These designs will be discussed below in terms of their models, corresponding timing, weighting, and mixing decisions.  71  Table 3.1 Creswell and Plano Clark’s Major Mixed Methods Design Types Design Type Variants  Timing  Weighing Mixing  Notation  concurrent: usually Triangulation -convergence -data transformation quan and qual equal -validating quan data at the same -multilevel time  merge the data during the interpretation or analysis  quan + qual  Embedded  embed one type of data within a larger design using the other type of data between the two phases  quan (qual) or qual (quan)  connect the data between the two phases  qual" quan  -embedded experimental -embedded correlational  concurrent or unequal sequential  Explanatory -follow-up sequential: explanations quan followed -participant by qual selection Exploratory -instrument sequential: development qual followed -taxonomy by quan development Note. quan =quantitative; qual=qualitative  usually quan usually qual  quan" qual  The Triangulation Design has a single phase where quantitative and qualitative data are given equal emphasis. This design is categorized into different models according to the way data are mixed. For example, if the results of the quantitative and qualitative are converged during the interpretation, it is a convergence model. If one type of data is transformed into the other type to answer a research question based on both types of data, it is a data transformation model. When the quantitative and qualitative data are collected simultaneously and the qualitative data are used to validate the quantitative results, it is a validating quantitative model. When the two sets of data, quantitative and qualitative, are collected to represent different levels of analysis within a system, it is a multilevel model.  72  Different from the Triangulation Design, the Embedded Design allows quantitative and qualitative data to be collected either concurrently or sequentially, and the two sets of data carry unequal emphasis. Specifically, there are two models in an Embedded Design: (1) the experimental model, wherein the quantitative data are used to answer primary questions in an experimental design, while the qualitative data are embedded before, during, or after the intervention to answer a secondary question related to the experiment, and (2) the correlation model, wherein the quantitative data are used to answer primary questions in a correlation design, while the qualitative data are embedded to explain the predictor and outcomes variable in the design. Unlike the Triangulation and the Embedded Designs which include one major phase, the Explanatory Design consists of two phases (quantitative followed by qualitative) and the two types of data are considered sequentially, with more weight towards a qualitative emphasis. There are two phases in an Explanatory Design. 
If the first phase uses quantitative methods and the second phase is based on the results of the first phase and uses qualitative methods to verify or explain these results, it is called the follow-up explanations model. The other is the participant selection model wherein the intent is to select participants to best answer the qualitative research questions. Like the Explanatory Design, the Exploratory Design includes two phases but starts with qualitative followed by quantitative methods, with more weight on qualitative findings. The two models in the Exploratory Design are the instrument development model, wherein the two phases are interrelated by the development of an instrument based on the results of  73  the first qualitative phase and the taxonomy development model, wherein the second phase is meant to quantitatively generalize the qualitative results. Regarding the types of mixed methods designs, mixed methods researchers (e.g., Creswell, 2003; Creswell & Clark, 2007; Rossman & Wilson, 1985; Tashakkori & Teddlie, 1998) recommend that choosing a type of mixed methods design should take research purposes into consideration. As Creswell and Plano Clark (2007) suggest,  We strongly recommend that researchers carefully select a single design that best matches the research problem. This will make the study more manageable and simpler to implement and describe. In addition, it provides the researcher with a framework and logic to guide the implementation of the research methods. (p. 79)  3.2.2  Mixed Methods Explanatory Design of this Study  The data collection of this study took the form of a three-phase Sequential Mixed Method Explanatory Design (see Table 3.2.1, for a summary of design by Creswell & Plano Clark, 2007) in response to the research purposes and the nature of research question. In this design, I first collected and analyzed the quantitative (numerical) data and then the qualitative (textual) data. The rationale for using this method was that quantitative data and their analysis would provide a general view of the findings, whereas the subsequent qualitative data and their analysis would refine and explain those quantitative results with more depth and breadth (Creswell, 2003; Creswell & Clark, 2007; Rossman & Wilson, 1985; Tashakkori & Teddlie, 1998).  74  Specifically, the Sequential Explanatory Design of this study (Figure 3.1) was implemented in three major phases, each of which included specific procedures. As a reminder, dependent variables were the overall writing score for each type of prompt (for Phase One) and the component and indicator scores (for Phase Two), while the independent variables were prompts, with two levels (Prompt A and Prompt B), and proficiency, with three levels (basic, intermediate, and advanced).  75  Figure 3.1 Mixed Methods Sequential Explanatory Design of this Study  Topical Knowledge and L2 Writing  Phase One Repeated Meaures: A Paired-Samples t Test & A 3 x 2 ANOVA for Overall Writing Scores  Phase Two Repeated Measures: 3 x 2 ANOVAs for Components and Indicator Scores  Phase Three Posttest Interviews: Discourse Analysis for Understanding Writers’ Perceptions of Writing Two Prompts  Findings and Inference Making  76  In Phase One, participants’ overall writing score for the two prompts were entered into SPSS (Statistical Package for the Social Sciences) version 16 to conduct two repeated measures of statistical analyses for comparisons of the overall writing performance. 
The first repeated measure test used a paired -samples t test to detect the overall writing scores on the two prompts (Prompt A and Prompt B) by comparing the group mean difference. The pairedsamples t test was applied for this phase because the same participants participated in both experimental conditions (i.e., tests of Prompt A and Prompt B) in this study. The second repeated measure analysis in this phase used a 3 ! 2 univariate analyses of variance (ANOVA) with both within-group and between-group factors to identify the main effects of prompts and interaction effects of prompts on overall writing performance across the three different proficiency levels, respectively. The two-way ANOVA included two independent variables in the measures. The first independent variable was “prompt”, a within-subjects variable which included two types: general knowledge (Prompt A) and specific knowledge (Prompt B). The second independent variable was “proficiency”, a between-subjects variable which had three levels: basic, intermediate, and advanced. The two-way ANOVA was applied for group comparison of the different combinations of pairs of means in an attempt to avoid a Type I error if multiple t tests were conducted separately. Finally, the Tukey b test of significant simple main effects (a type of post-hoc analysis) was used to investigate any statistically significant interactions. In Phase Two, participants’ component and indicator scores were entered into SPSS version 16 for a series of repeated measures by 3 ! 2 analysis of variances (ANOVAs) to analyze specific rhetorical features of student writing by identifying the effects of the three  77  proficiency levels (a between-groups factor) and two prompts (a repeated measures factor) on each of the three component scores (content, organization, and language) and the seven indicator scores (idea quality, position-taking, idea development, and idea wrap-up in content; fluency, accuracy, and lexical complexity in language) in total. Following Huberty and Morris’ (1989) recommendations, given that the research questions treated each dependent variable separately and that the component and indicator scores were also treated separately, a series of univariate ANOVAs were conducted. The Type I error rate was corrected via Bonferroni methods (i.e., for the component scores it is .05/3 = .0167 for each ANOVA in that section, and .05/7 = .007 for each ANOVA for each of the indicator variables). Finally, as in Phase One, the Tukey b test of significant simple main effects was used when an interaction was found. In Phase Three, posttest semi-structured individual interviews were conducted to explore participants’ perceptions about writing two prompts. The findings helped explain and verify the quantitative findings generated from Phase One and Phase Two. The semistructured format using a set of questions based on the Interview Guideline (Appendix D) included two parts: (1) background information, regarding the participants’ age, gender, duration of stay in Canada, language proficiency levels, majors, previous education, program of study, and experiences of taking English language proficiency tests; (2) their perceptions about the two writing prompts. The interviews were held on a one-on-one basis in an attempt to prevent participants from being influenced by their peers. All interviews were at a location convenient to the participant and conductive to tape-recording. Each interview lasted approximately one hour  78  in length. 
Although interviews should be ideally conducted in participants’ native languages (Spradley, 1980), I did not limit the language for this study; instead, the four participants who spoke Chinese were allowed to switch between English and Chinese while answering the interview questions; I spoke both English and Chinese fluently. However, I used English when interviewing the student from South Korea. The Interview Guideline was used as a guide but remained open to ideas and knowledge that emerged. Throughout the interview, I tried to put precedence on seeking rapport with the interviewees. To do so, I allowed the interviewees to raise questions and concerns, and change the format and sequence of the questions as necessary. I audio-recorded the interviews and later transcribed them verbatim because transcribing, as “a selective process reflecting theoretical goals and definitions” (Ochs, 1979, p. 44), is an integral part of data analysis and interpretation (e.g., Hutchby & Wooffitt, 1998; Ochs, 1979). During the transcribing process, I also attended to those nonverbal cues (based on my observation notes) in the interview transcript such as a pause, a gesture, laughter or an awkward response. The interview transcriptions were coded using Microsoft Word 2007 and saved as computer files. By collecting multiple phase datasets, this study aimed to gather and interpret as much information as possible to answer the research questions.  3.2.3  Research Site  This present research was conducted at one site: City College, which is in a metropolitan city in western Canada. The college is a private educational institute that  79  provides both academic and vocational training programs to meet students’ specific career goals and offers a two-year Associate of Arts Degree under the written consent of the local Ministry of Advanced Education. Every year the college enrolls a large number of international students who speak English as a second language. One of the college missions is to help the newly arrived ESL students to participate in the local Canadian society through its English curricula. The English for Academic Purposes (EAP) program offered at the college consists of seven intensive English courses, including training in academic reading, writing, speaking, vocabulary, and idioms: two at the beginning level, two at the intermediate level, and one at the advanced level. The advanced course focuses on the application of rhetorical strategies in academic writing processes (e.g., planning, revising, and proofreading, unity, coherence, and cohesion) to writing products. Besides the EAP courses, a test preparation course for the LPI was offered as needed to meet the need of the students who hoped to enter the local universities which require an LPI score. All students admitted to the college are required to take a placement test, using Canadian Language Benchmark, for their English proficiency levels at the beginning of the program. Such proficiency level information is used to determine the English courses they take. Students are regularly tested to adjust their English classes if they are making progress.Within such a bound context, City College was chosen as the research site due to its rich resources including culturally diverse ESL students, comprehensive English courses for different levels of college ESL learners, and LPI test preparation courses. 
In February 2008, I initially contacted the president of City College by email (see Appendix E for the Letter of the Initial Contact) to get the permission for conducting the  80  current study. After a few follow-up meetings with the president, I received the formal approval (see Appendix F for the Letter of Permission) to conduct the study at the college in May 2008.  3.2.4  Participants  A total of 50 ESL students from City College voluntarily took part in the study. All were currently enrolled ESL students at the college. They were informed of the study through their instructors, the researcher, and a poster on campus. In the beginning of May 2008, the president of City College sent an email message to all the instructors at the college to inform them of the study and get their assistance for recruitment. Five instructors responded and expressed willingness to help recruit participants. They distributed the Consent Letter (Appendix G) to students who might be interested in the study. The Consent Letter introduced the purpose of the study, the benefits to the participants, the amount of time needed for this study, and the participants’ rights. The five instructors also posted a recruitment notice in their classrooms. Meanwhile, I put up a recruitment poster (Appendix H) in the hallways inside the teaching building at the college. Prior to data collection, each participant signed a written consent form, indicating his or her agreement to participate in the study. The 50 participants in this study, aging from 17 to 35, was composed of the ESL students from Mainland China (n = 35), Taiwan (n = 9), and South Korea (n = 6) (Table 3.2), among whom were 29 females and 21 males (Table 3.3). They had been in Canada from 0.2 to 4 years (Table 3.4). All participants met the three required sampling criteria for the study:  81  They 1) were all registered students at the College, 2) spoke English as a second language, and 3) had sufficient language proficiency to write English essays and orally communicate about their writing experiences. The 50 participants, who were spread across five classes, turned out to be almost evenly distributed across three language proficiency levels (basic = 17, intermediate = 16, and advanced = 17).  Table 3.3 Descriptive Statistics for Participants’ Country of Origin Proficiency Basic  Intermediate  Advanced  Nationality Chinese Taiwanese  Number of Participants 8 5  South Korean  4  Total Chinese Taiwanese  17 13 1  South Korean  2  Total Chinese Taiwanese  16 10 7  Total  17  Table 3.2 Descriptive Statistics for Participants’ Gender Proficiency Basic  Intermediate  Advanced  Gender male female Total male female Total male female Total  Number of Participants 10 7 17 9 7 16 10 7 17  82  Table 3.4 Descriptive Statistics for Participants’ Age and Duration in Canada Proficiency Basic Intermediate Advanced  Age Age Duration Age Duration Age Duration  Minimum * 18 0.2 19 0.8 18 0.3  Maximum* 23 4 35 3.5 36 4  Mean 20.1 2.2 26 2.1 23.8 2.1  Note. * years  All participants in this study had at least a high school education. Most of the students at the intermediate English proficiency level had received higher education in their home countries before coming to Canada. These previous programs of study were diverse, including engineering, accounting, business, mathematics, physics, arts, and foreign languages.  3.2.5.  
Instruments  Four instruments were utilized in this study: (1) a background information questionnaire, (2) writing prompts, (3) writing tests, and (4) interview protocol.  3.2.5.1  Background Information Questionnaire  A questionnaire, part of the questions asked in the Interview Guideline (Appendix D), was used to obtain the necessary information about the participants’ backgrounds such as age, gender, duration in Canada, language proficiency levels, education levels, and program  83  of study. All participants completed the questionnaire at the beginning of writing for either Prompt A or Prompt B. The information about the participants’ language proficiency levels were integrated in the analyses of this study to examine how language proficiency might interact with the writing prompt to influence students’ writing performances.  3.2.5.2  Writing Prompts  Prior to the beginning of the study, a focus-group study was conducted to select the two writing prompts for this investigation. A group of 20 participants from City College voluntarily took part in a general survey involving a total of 24 prompts in The LPI Workbook (The University of British Columbia, 2008), a book intended to help students prepare for the LPI test, the test I was focusing on in this study. The 24 prompts (Appendix I) were organized in the same order as those in the LPI Workbook, and the participants were asked to rank the prompts as either “difficult” or “easy”. The participants’ rankings were entered into SPSS for statistical analysis. The frequency of participants’ choices of easy or difficult essay topics showed that 97 percent participants chose Topic 2 as “difficult”, and 100 percent checked Topic 17 as “easy” (see Figure 3.2). Based on the results, Topic 2 about federal politics was pooled out as the “difficult” topic, while Topic 17, about students’ choice of their program of study in the future was selected as the “easy” topic (see Figure 3.2). Topic 17 is named as Prompt A (choice of what to study) and Topic 2 as Prompt B (Canadian federal politics). Notably, Prompt A was a general topic, while Prompt B required specific knowledge about federal politics.  84  3.2.5.3  Writing Tests  Two writing tests following the same criteria, formats, and procedures as the actual LPI writing test were used to collect the primary data for Prompts A and Prompt B. The two prompts were randomly assigned to 50 students during the tests. The students were required to write an essay within 60 minutes responding to either Prompt A or Prompt B on the issued two-page writing sheets (Appendixes J and K) in one test. The students then wrote the second essay responding to the other prompt in the following week. To encourage students to write more, I did not set a word limit for each essay although the LPI test requires 300 words.  Figure 3.2 Frequency of the Participants’ Choices of Easy and Difficult Topics  Easy  Difficult  The Number of Students  30 25 20 15 10 5 0 1  3  5  7  9  11  13  15  17  19  21  23  Twenty Four Promps  Prompt A: If you plan to a college or a university, what factors will influence your choice of what to study? Provide reasons. (p. 81) Prompt B: Explain why you do OR do not take an interest in federal politics. Be Specific. (p. 15)  85  3.2.5.4  Interview Protocol  Five participants from three proficiency levels voluntarily took part in a semistructured interview. Each participant had one hour interview to talk about their experience of writing two prompts. 
I carefully read the interview data and generated the emerged patterns, which helped to understand the participants’ perceptions of writing two prompts.  3.3  Scoring Procedures Scoring procedures in this study included three major steps: (1) developing a rubric  for analytic scoring, (b) pre-rater training, and (c) analytic scoring and post-inter-rater reliability check.  3.3.1  Analytic Scoring Rubric of this Study  A six-point analytic rating scale with three components, i.e., content, organization, and language (Appendix L) was designed according to previous literature on content criteria for assessing L2 writing and the criteria of the six-point holistic rating rubric of the LPI, wherein Level 6, Advanced Proficiency, is the highest proficiency level, and Level 0 is the lowest. Notably, the holistic rating scale of the LPI measures content through “original insights”, “complex and straightforward concepts”, “coherent articulation of ideas”, organization through “excellent organizational ability” and “clarity in development and organization”, and language through “exceptional fluency”, “fluent competency”, and “error  86  in expression” (See Language Proficiency Index website http://www.ares.ubc.ca/LPI/ 03_LPI_Test.html#top3_2). The pass score for the LPI is 5 or a Level 5, Effective Proficiency. To adapt to these criteria, the six-point analytic rating scale used in the present study rates from Level 6 to Level 0 across three components, “content”, “organization”, and “language”, each of which included its observed indicators (see Figure 3.3). Please note that Figure 3.3 is not a ‘structural equation model” and hence should not be interpreted as such.  87  Figure 3.3 Visual Diagraph of the Six-point Analytical Rating Scale of this Study  Analytic Scoring Rubric  Analytic Rating Scale for Writing Performace  Components  Indicators idea quality  Content  postion-taking idea development idea wrap-up  Organization  coherence & cohesion  fluency  Language  accuracy lexcial complexity  The writing components and indicators of the rating scale were developed by adapting categories used in previous research and standardized writing tests. Specifically, the component of content entails four indicators: (1) idea quality, emphasizing relevance, originality, and depth to the topic, (2) position-taking, looking at the thesis statement, position-taking, and the limitations of the thesis statement, (3) idea development, examining factors in body paragraph writing such as topic sentences, supporting details, and final  88  sentence, and (4) idea wrap-up, showing a sense of closure by summarizing the main ideas discussed in the body paragraphs and further comments by making a final statement such as prediction and recommendation in the conclusion paragraph. Similarly, the component of organization measures (1) coherence and (2) cohesion, by focusing on not only smooth transitions within sentences and between paragraphs, but also the organization of discourse with all elements present and fitting together logically (e.g., the presence of an introduction, body, and conclusion). Similarly, the component of language is composed of three other indicators: (1) fluency measured by overall length of the easy, (2) accuracy, measured by the percentage of error-free T-units, and (3) lexical complexity, measured by the percentage of frequency of academic words in each essay. 
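The two count-based indicators in the language component lend themselves to automatic computation, whereas accuracy depends on the raters' T-unit error judgments. The sketch below illustrates these measures; the simple tokenizer, the `academic_words` set (standing in for Coxhead's Academic Word List), and the token-level matching (rather than word-family matching) are simplifying assumptions, not the exact procedure used in this study.

```python
import re

def fluency(essay: str) -> int:
    """Overall length: total number of word tokens in the essay."""
    return len(re.findall(r"[A-Za-z']+", essay))

def accuracy(error_free_t_units: int, total_t_units: int) -> float:
    """Percentage of error-free T-units; the error judgments come from the raters."""
    return 100 * error_free_t_units / total_t_units

def lexical_complexity(essay: str, academic_words: set) -> float:
    """Percentage of word tokens that appear in the (assumed) academic word set."""
    tokens = [w.lower() for w in re.findall(r"[A-Za-z']+", essay)]
    return 100 * sum(w in academic_words for w in tokens) / len(tokens)

# Toy example with a hypothetical essay and a toy academic word set
sample = "The data indicate that prior knowledge affects writing."
print(fluency(sample), lexical_complexity(sample, {"data", "indicate", "prior"}))
```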
Using this established analytic scale, this study aimed to accumulate both overall scores of the essays on the two prompts and sub-scores of each component and each indicator to answer the research questions. Pedagogically, such detailed information on the basis of analytic measures is also more useful for diagnostic purposes than a single holistic score.  3.3.2  Rater Training  Rater training is an important part of any rigorous research on writing assessment (Hamp-Lyons, 1990). To ensure the internal reliability of the established analytic rubric, the researcher conducted a 45-minute training session for the two raters, of whom one was a native-English speaker, and the other was a nonnative-English speaker. Both were experienced English writing instructors in an English institute in Canada. They had taught nonnative English speakers before and were familiar with essay ratings. During the training  89  session, the raters received two sample essays chosen from The LPI Workbook (The University of British Columbia, 2008) and evaluated them together according the criteria of the established analytic rating scale. While reviewing the criteria, the raters explained and discussed their thoughts and decisions in order to reach a consensus. In this process, I further clarified the meanings of certain indicators and explained T-units and error-free T-unit determination. For example, a compound sentence was coded as two T-units, but a complex sentence was coded as one T-unit. However, I was cautious not to provide any explanations about the “good quality” of an essay in case my subjective views might mislead the raters’ judgment or direct their attention toward certain aspects and thereby invalidate the rating scores. I made frequent calibration checks throughout the scoring session.  3.3.3  Analytic Scoring  The 100 essays on Prompt A and B, assigned to the two raters, were analytically scored using the six-point scale to measure the three components (content, organization, and language). In order to avoid raters’ biases due to their impressions of students’ handwriting (Markham, 1976; McColly, 1970), I typed all the students’ essays before issuing them for marking. The students’ names on the essays were also eliminated. After the two raters marked each essay individually, their scores were averaged. A randomly selected third rater was prepared to evaluate the essays in any cases where the scores differed by more than one point, but no such discrepancies occurred. Two raters scored the accuracy indicator together based on the percentage of error-free T-units in relation to the total number of T-units of each essay. In accordance with the previous literature (e.g., Larsen-Freeman, 1983; Henry, 1996;  90  Hirano, 1991; Sharma, 1980), the error-free T-units in the present study were defined as those T-units that did not have any syntactic, lexical, spelling, and punctuation errors. Error-free Tunits were counted as one of the criteria for the judgement of the language component in this study because they are “... now considered to be more valid measures of growth in a second language” (Gaies, 1980, p. 55). Given that the measure of tasks were based on counts, I scored the indicators in the language component; namely, fluency based on the total number of words of the essay and lexical complexity based on the percentage of the frequency of academic words in the essay. 
The academic words were counted based on the Academic Word List (AWL) (Appendix M) by Coxhead (1998), which was derived from a corpus of over 400 written academic texts, about 3.5 million words. These academic words were chosen by examining the range and frequency of words not included in the General Service List (GSL) (West, 1953), which contains the first 2,000 most frequently occurring words of English. For more information about the GSL frequency numbers, see the website (http://jbauman.com/gsl.html). The Microsoft Word 2007 and the online software, The AWL Highlighter, were used to facilitate the calculation of the overall length and the academic words in each essay. The language component was scored twice by the converted scale to fit the six-point analytic scale. First, the raw indicator scores in language were calculated based on a rescaled 0-100 scale considering the nature of the indicators (fluency, accuracy, and lexical complexity) mainly measured on the counts. Second, the component score of language by the 0-100 scale was reconverted to a six-point scale in order to be consistent with the ratings of the other two component scores, content and organization, in the analytic rating scale for this  91  study. During these processes, I conducted the weighting check using statistical principle component analysis for three observed indicator scores (fluency, accuracy, and lexical complexity) and revealed their approximate weighting, ranging from 0.35 to 0.918 with a weighting mean 0.713 (M =0.713) (Table 3.7.3). The weighting check for the three indicators scores here further validated the Analytic Rating Scale applied in the current study.  Table 3. 5 Principle Component Analysis Matrixa,b Prompt A fluency A accuracy A lexical complexity A  Component 1 .92 .89 .53  Prompt B fluency B accuracy B lexical complexity B  Component 1 .86 .72 .35  Note. Extraction Method: Principle Component Analysis a. 1 components extracted b. Language Proficiency  In the scoring session, two raters marked independently during three different times within two weeks. After all essays were marked, I collected them immediately and verified the number of T-units and error-free T-units. As stated in the Consent Letter, such measures of the students’ essays were conducted as both a supplement to the findings of the study and a benefit to the participants, who got feedback on their essays from me through either free individual English tutoring or lecturing to groups. Inter-rater reliability was checked by the Pearson product-moment coefficient, a measure used to examine the agreement or the  92  correlation between two raters on the marking. The Pearson product-moment coefficient of the three component scores (content, organization, and language) between the two raters was higher than 0.85 (r > 0.85), ranging from 0.86 to 0.89.  3.4  Procedures of Data Analysis The data collection and analysis took multiple phases to attend to the purpose of the  current study. I collected two types of data sequentially: (1) quantitative data by repeated measures of a paired-samples t test and a series of repeated measures of 3 ! 2 ANOVAs for comparisons of prompt effects on the overall writing performance and specific textual features, and (2) qualitative verbal discourse data for the in-depth understanding of the writers’ perceptions about the writing tasks. 
The two datasets, quantitative followed by qualitative, were merged in the final interpretation in order to accumulate persuasive evidence to answer the research questions. Specifically, I analyzed these mixed datasets to answer the three research questions of this study through the following three phases: (1) repeated measures using a paired-samples t test and a 3 × 2 univariate ANOVA, (2) a series of repeated-measures 3 × 2 ANOVAs, and (3) posttest retrospective interviews. The specific procedures of data analysis are described in the following sections.

3.4.1  Phase One: Repeated Measures of a Paired t Test and a 3 × 2 ANOVA

A repeated-measures paired-samples t test and a 3 × 2 ANOVA were conducted to investigate the effects of prompts on the overall writing scores (i.e., the average of the three component scores of the essay) for the two prompts across different proficiency levels. Repeated measures were chosen because the observations were mutually dependent: the same group of participants (n = 50) wrote both Prompt A and Prompt B. As a reminder, the paired-samples t test compared the mean difference in overall writing scores between Prompt A and Prompt B, whereas the repeated-measures 3 × 2 ANOVA investigated the interaction between prompt types and proficiency levels. The 3 × 2 design included three effects: the first (main) effect was prompt, with two levels (Prompt A and Prompt B); the second effect was proficiency, with three levels (basic, intermediate, and advanced); and the third was the interaction between prompt and proficiency. In other words, the repeated-measures 3 × 2 ANOVA in this phase examined the effects on the dependent variable, the overall writing score, of two independent variables: a between-subjects variable, proficiency (basic, intermediate, and advanced), and a within-subjects variable, prompt (Prompt A and Prompt B).

To check the assumption of a normal sampling distribution for the paired-samples t test and the repeated-measures 3 × 2 univariate ANOVAs, the data were first examined with boxplots, which provide a simple graphical representation of the distribution of the data. The resulting boxplots (Figure 3.4) showed a normal distribution of the dependent variables, as indicated by the nearly even whisker lengths for the overall writing scores on Prompt A, with no detected outliers except one on the language component. In tandem, the skewness of both the overall writing scores and the component scores for Prompt A and Prompt B was checked in SPSS. The resulting skewness values ranged from -0.003 to -0.314, which indicated a normal sampling distribution of the dependent variables. The boxplots for Prompt B (see the right-hand column) were visually skewed, which suggested the effect of Prompt B, the prompt considered difficult, on writing performance. A repeated-measures paired t test was then run to compare the effects of the prompts on the overall writing scores of students' essays, followed by a repeated-measures 3 × 2 univariate ANOVA to determine whether there was an interaction between the two prompts and the three proficiency levels. During this phase, the component scores and the individual indicators within each component were not examined.
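For readers who wish to follow the logic of this phase outside SPSS, the analyses can be mirrored with open-source tools. The sketch below is only an illustration: the file name and column names are assumed, and the pingouin package is used simply because it offers a convenient mixed (split-plot) ANOVA; it is not the software used in this study.

```python
# A minimal sketch of the Phase One analyses, assuming a long-format table with one
# row per participant-by-prompt observation (hypothetical file and column names).
import pandas as pd
from scipy import stats
import pingouin as pg  # provides a mixed (between x within) ANOVA

scores = pd.read_csv("overall_scores.csv")      # columns: id, proficiency, prompt, overall
wide = scores.pivot(index="id", columns="prompt", values="overall")

# Normality screening: skewness of the overall scores for each prompt
print(wide.apply(stats.skew))

# Paired-samples t test: the same 50 writers produced both essays
t, p = stats.ttest_rel(wide["A"], wide["B"])
print(f"paired t(49) = {t:.2f}, p = {p:.4f}")

# 3 x 2 mixed ANOVA: proficiency (between-subjects) x prompt (within-subjects)
aov = pg.mixed_anova(data=scores, dv="overall", within="prompt",
                     subject="id", between="proficiency")
print(aov[["Source", "F", "p-unc", "np2"]])     # np2 = partial eta squared
```

The same mixed-ANOVA call extends to the Phase Two analyses described in the next section, applied to each component and indicator score in turn with the Bonferroni-adjusted alpha levels noted earlier.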
These analyses aimed to answer Research Question One: Do ESL students across proficiency levels perform differently in terms of overall scores when responding to a prompt (which I refer to as Prompt A) which requires general knowledge in comparison to a prompt which requires specific knowledge (referred to as Prompt B)?  95  Figure 3. 4 The Boxplot Check for the Normal Sampling Distribution Prompt A  Prompt B  96  Prompt A  Prompt B  97  3.4.2  Phase Two: Repeated Measures of the Mixed Design 3 ! 2 ANOVAs  Repeated measures of 3 ! 2 ANOVAs were conducted to investigate the effects of prompts on three component scores (content, organization, and language), the individual indicator scores in each component, and the effects of interactions among components or indicators (or both) across the different language proficiency levels. Each component score is the average of its indicators. As a reminder, the 3 ! 2 design included three effects. The first (main) effect was of prompt with two levels (Prompt A and Prompt B). The second effect was the three proficiency levels (basic, intermediate, and advanced). The third was the interaction between prompt and proficiency. In other words, the repeated measures of the 3 ! 2 ANOVAs in this phase examined the effects of prompts on the dependent variables, including (a) components (content, organization, and language), and (b) indicators (idea quality, positiontaking, idea development, idea wrap up, coherence and cohesion, fluency, accuracy, and lexical complexity) across two independent variables of both a between-subjects variable proficiency (basic, intermediate, and advanced) and a within-subjects variable prompt (Prompt A and Prompt B). Given that content has four indicator scores, organization has just one component score, and language has three indicator scores, there are a total of 10 dependent variables for the components (n=3) and their indicators (n=7). These analyses aimed to answer Research Question Two: Do general knowledge and specific knowledge prompts (Prompt A and B, respectively) have different effects on specific textual features in ESL students’ writing across proficiency levels in terms of content (quality of ideas, position-  98  taking, idea development, and idea wrap-up), organization (coherence and cohesion), and language (fluency, accuracy, and lexical complexity)? To display the effects of the interactions between prompt types and individual indicators in each component across three proficiency levels, the plots were run. A subsequent investigation using a post-hoc Tukey b test was conducted to examine the detected significant interactions in the 3 ! 2 ANOVA, if any.  3.4.3  Phase Three: Posttest Interviews  Follow-up retrospective interviews were of great importance to understand students’ opinions about the two writing tasks. A retrospective interview was conducted in the same week following the writing tests while the participants’ memories were still fresh. Through a series of semi-structured questions (see Appendix D), the interviews solicited participants’ perspectives and attitudes towards the two writing tests. Five out of 50 participants across the three language proficiency levels volunteered to participate in the interviews. The participants were not informed of their essay scores until after the interviews so as not to affect their responses to the interview questions. To analyze the data, I first carefully read through the interview data to prepare the ground for analysis. 
Specifically, I used the Interview Guideline (see Appendix D), which includes a set of questions. I then annotated the emerging themes and formulated them as initial categories of information. Finally, I reread the data to identify specific themes related to the writers' perceptions of their writing experiences. The patterns generated from the interviews were used to answer Research Question Three: How do participants perceive their writing performances for the two prompts that require either general or specific knowledge?

CHAPTER 4 FINDINGS AND DISCUSSION

This chapter reports the results generated by the following three tasks: (1) identification of the effects of prompts on the overall writing scores by a repeated-measures paired-samples t test, and of the interaction effects of prompt and proficiency by a repeated-measures 3 × 2 univariate ANOVA, (2) identification of the prompt effects on the component scores and the indicator scores, and of the interactions of these scores (i.e., components and indicators) across the between- and within-subjects variables (i.e., the two prompts and three language proficiency levels), by repeated-measures 3 × 2 ANOVAs, and (3) identification of the themes in students' perceptions of their writing performance on the two prompts via posttest retrospective interviews.

4.1  Findings and Discussion for Research Question One

The result of the paired-samples t test revealed a statistically significant group mean difference (t(49) = 10.56, p < 0.05) in the overall writing scores for the two prompts (Table 4.1). The general-knowledge task (Prompt A) had a higher overall writing score than the specific-knowledge task (Prompt B) across all proficiency levels (Table 4.2).

Table 4.1 Paired-Samples t Test for Comparisons of Overall Writing Scores for Prompts

Pair 1: Prompt A - Prompt B    M = 1.51   SD = 1.01   SEM = .14   95% CI [1.22, 1.79]   t = 10.56   df = 49   N = 50

Note. The mean difference is significant at the .05 level; *p < .05, two-tailed. M = mean; SD = standard deviation; SEM = standard error of the mean; CI = confidence interval.

Table 4.2 Descriptive Statistics of Overall Writing Scores for Prompts

Component                   Proficiency     M      SD     N
Overall Score (Prompt A)    Basic           1.64   .83    17
                            Intermediate    3.43   .59    16
                            Advanced        4.63   .41    17
                            Total           3.23   1.40   50
Overall Score (Prompt B)    Basic            .62   .47    17
                            Intermediate    2.05   .92    16
                            Advanced        2.51   1.25   17
                            Total           1.72   1.23   50

Note. M = mean; SD = standard deviation; N = number of participants.

The results of the 3 × 2 univariate ANOVA showed statistically significant main effects of prompt type (F(1, 47) = 135.47, p < .05, partial η² (effect size) = .74) and of proficiency level (F(2, 47) = 59.82, p < .05, partial η² = .72). In addition, there was a statistically significant prompt-by-proficiency interaction (F(2, 47) = 6.42, p < .05, partial η² = .22).

Table 4.3 3 × 2 Univariate Analysis of Variance for Overall Writing Score Measures

Source                   df    F         η²     p
Between subjects
  Proficiency levels      2    59.82     .72    .001*
  Error                  47    (.88)
Within subjects
  Prompt                  1    135.47    .74    .001*
  Proficiency × Prompt    2    6.42      .22    .003*
  Error (Prompt)         47    (.42)

Note. *p < .05, two-tailed. Values enclosed in parentheses represent mean square errors.

Figure 4.1 is a plot of the means of the overall writing scores for Prompt A and Prompt B across proficiency levels.
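An interaction plot of this kind can be drawn directly from the score table. The following sketch, which reuses the hypothetical long-format file assumed in the earlier analysis sketch, is one illustrative way to produce such a figure; it is not the source of Figure 4.1.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format table: one row per participant-by-prompt observation
scores = pd.read_csv("overall_scores.csv")   # columns: id, proficiency, prompt, overall

means = (scores.groupby(["proficiency", "prompt"])["overall"]
               .mean()
               .unstack("prompt")
               .reindex(["basic", "intermediate", "advanced"]))

for prompt in means.columns:
    plt.plot(means.index, means[prompt], marker="o", label=f"Prompt {prompt}")

plt.xlabel("Proficiency level")
plt.ylabel("Mean overall writing score")
plt.title("Prompt effects on overall writing scores")
plt.legend()
plt.show()
```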
Figure 4.1 is a plot of the means of the overall writing scores for Prompt A and Prompt B across proficiency levels. Because the two lines for the two prompts are not parallel, the interaction of prompt by proficiency is evident in the plot.

Figure 4.1 The Plot of Prompt Effects on "Overall" Writing Scores

Although the 3 × 2 univariate ANOVA revealed the main effect of prompts on the overall writing scores and the effect of the interaction between prompt and proficiency, it did not provide specific information about which groups (i.e., basic, intermediate, and advanced) were affected. To examine the results of the interaction (as depicted in the 3 × 2 ANOVA in Table 4.3), a post-hoc pairwise comparison using a Tukey b analysis was conducted to investigate at which proficiency levels the overall writing scores for the two prompts differed. The post-hoc analysis detected a statistically significant mean difference of proficiency effects on the overall writing scores for all pairwise comparisons across proficiency levels (p = .001) (see Table 4.4). In other words, students' overall writing scores for the two prompts were affected by both prompt type and proficiency level. The detected interaction effects here suggest that the effect of prompts on the overall writing scores was unsystematic and needs to be understood along with consideration of proficiency levels, or vice versa. The finding here also indicates that proficiency levels alone cannot predict students' overall writing scores on the two prompts; other factors, such as specific topical knowledge, need to be considered. Note that, given the findings of simple main effects (wherein the main effect of prompt type depended on the level of proficiency or vice versa, and there was no difference at the basic proficiency level), the main effect of proficiency is not interpreted on its own with a post-hoc test of proficiency level.

Table 4.4 Post-hoc Tests: Comparisons of Mean Differences of Proficiency Effects on Overall Writing Scores for Prompts

                                                            95% CI
(I) Proficiency   (J) Proficiency   MD (I-J)   SE    Sig.a    LB      UB
Basic             Intermediate      -1.61*     .23   .001*    -2.17   -1.05
                  Advanced          -2.44*     .23   .001*    -2.99   -1.89
Intermediate      Basic              1.61*     .23   .001*     1.05    2.17
                  Advanced           -.83*     .23   .002*    -1.39    -.27
Advanced          Basic              2.44*     .23   .001*     1.89    2.99
                  Intermediate        .83*     .23   .002*      .27    1.39

Note. Based on observed means. *The mean difference is significant at the .05 level; *p < .05, two-tailed. a. Adjustment for multiple comparisons: Bonferroni. CI = confidence interval; MD = mean difference; SE = standard error; LB = lower bound; UB = upper bound.

Table 4.5 summarizes the findings from the repeated measure of the paired-samples t test and the 3 × 2 univariate ANOVA, along with the follow-up post-hoc analysis, and thus the answer to Research Question One. There was a main effect of prompt as well as an interaction effect on the overall writing scores of the ESL students across all proficiency levels; that is, students at the basic, intermediate, and advanced proficiency levels performed significantly better on the general topic of Prompt A (choice of what to study) than they did on the specific topic of Prompt B (federal politics).

Table 4.5 Summary of Prompt Effects on Overall Writing Scores in Phase One Proficiency Basic Intermediate Advanced  Prompt A low low low  Prompt B low high high  Main Effect yes yes yes  Interaction yes yes yes  These results concur with previous findings that students’ writing performances improve when they are familiar with the content or topical knowledge, in either their L1 (Bereiter & Scardamalia,1982; Chesky & Hibert,1987; Gradwhol & Schumacher,1989; Langer,1984; McCutchen,1986; Purves & Purves,1986; Schumacher, Gradwohl, Brezin, & Parker, 1986) or L2 (Tedick, 1990; Winfield & Barnes-Felfeli, 1982). The statistically significant mean differences of the ESL students’ overall writing scores for the two prompts suggest that L2 writers, even those at the higher proficiency levels, cannot be assumed to possess sufficient knowledge to perform optimally on writing tasks; instead, they need a prior  106  knowledge base for writing on the specific topic as was the case with Prompt B. Writing is a metacognititive activity which involves recursive and nonlinear mental processes (Hayes & Flower, 1980), and during these processes for text generation the writers, in particular L2 writers, draw more heavily on their existing fundamental knowledge than on new, novel language creations (Ellis, 2002; Machon & de Haan, 2008; Wray, 2002). The evidence here indicates the importance of knowledge for information processing which can be enhanced by the comprehensive structure of all language knowledge and use that writers have received from their lifelong language input, including topical knowledge and an understanding of the norms of the target language (Ellis, 2002). If the writers have more topical knowledge, they can better access their own schemata or prior knowledge. An important insight has emerged from these results regarding the effect of topical knowledge on writing performance: it is a false assumption that writers’ failure to perform satisfactorily is only due to their limited writing skill. Notably, when L2 writers in this study were confronted with a lack of topical knowledge for the completion of the writing task, they were clearly at a disadvantage. This further confirms the role of topical knowledge in information processing, which can support active writing processes (planning, formulating, and revising) and optimize writing performance. The present finding is also consistent with similar claims from previous L1 writing studies which maintain that good writing depends on domain knowledge and ways of thinking (Glaser, 1984; McPeck, 1981; Resnick, 1987). It is understood here that “domain knowledge” implies familiarity with a specific subject, while “ways of thinking” refers to thought patterns influenced by writers’ diverse social, ethnic, and educational backgrounds. This important insight also calls attention to the relationship  107  between content and meaning in writing: writing must display content for its meaning; however, the type of knowledge or content measured pertains directly to establishing the validity of L2 writing assessments. Indeed, the research to date has reported that writers openly protest problematic prompts (Petersen, 2009; Weigle, 2002). If the topic is favorable to one group but not another, the writing task is biased and will negatively affect test takers. 
Compared with those at the intermediate and advanced levels, students at the basic level scored more poorly consistently for the specific-knowledge based topic (Prompt B) (M =.62) than for the general topic (Prompt A) (M= 1.64) (see Table 4.2). This was possibly due to their lack of both language proficiency and specific topical knowledge for the writing tasks. The average duration in the host country of the participants was just over two years, and some of the participants had just begun their program of study five months ago. These students might still rely heavily on linguistic and content knowledge they had learned in their home countries. As other researchers have stated, text models and discourse types in L1 writing vary across the world (Kobayashi & Rinnert, 2008; Shi & Kubota, 2007). These confounding factors could make the lower-proficiency students feel more challenged in response to the specific knowledge Prompt B. Moreover, as previously discussed, lowerproficiency L2 writers are generally unwilling to spend time planning, practicing, and revising their writing using various rhetorical strategies (Hyland, 2002; Raime, 1987; Sasaki & Hirose, 1996; Sasaki, 2000; Victori, 1999). Such reluctance may worsen if L2 writers lack topical knowledge. If good topical knowledge facilitates L2 writers’ performance, writers with weak L2 language proficiency need to rely more on such knowledge to help information processing in response to the writing task. The detected interaction effect between prompt  108  and proficiency on the students’ overall writing scores reminds test users to consider students’ proficiency levels while assessing their performance on the two prompts. In fact, the higher proficiency groups (intermediate and advanced) decreased in their overall writing scores on the specific Prompt B. Thus, L2 writers’ language proficiency is not the only factor that accounts for their writing performance; instead, topical knowledge or content knowledge is clearly another indispensable factor affecting students’ writing. The students’ overall difference in performance for Prompt A and B in this study raises two issues concerning the validity of L2 English writing assessment: the test construct under assessment and the methods employed in the assessment. On the one hand, the prompt on Canadian federal politics attempts to assess L2 writers’ writing ability within the assumed legitimate content of the host culture. This would appear to be aligned with the development of language testing, which has shifted from the traditional concept of language proficiency based on four language skills (i.e., listening, speaking, reading, and writing) to embrace the notion of communicative competence (Canale & Swain, 1980), which focuses on language performance in specific contexts. It is noteworthy that the construct of communicative competence intends to encompass not only linguistic knowledge (sentence-level grammar, vocabulary, and mechanics) but also knowledge of the social appropriateness of language use and means of structuring discourse to enhance the rhetorical effect for communicative goals (Bachman, 1990; Bachman & Palmer, 1996; Canale, 1983; Canale & Swain, 1980). From this perspective, Prompt B seems an attempt to link the writing task with the specific knowledge in the local context of the target language. Admittedly, the local LPI test is a  109  gatekeeper for Canadian educational institutions through a consistent and standardized measure of Canadian English language proficiency. 
On the other hand, the skills that require writers to apply the topical knowledge demanded in Prompt B are more difficult to assess indirectly in an artificially formalized language assessment situation, like the LPI, than they would be in authentic circumstances driven by the demands of everyday community activity. Although timed essay tests or direct measures of writing are considered the best method for large-scale assessment and accurate placement, they are still not the ideal means for assessing writing competence (Peterson, 2009). Other confounding factors, such as writers’ stress in the impromptu timed test and the time needed to acquire the topical knowledge required by the prompt, contribute to the limitation of such tests for effectively assessing writing proficiency. These factors can have more influence on L2 examinees, such as the newly arrived participating ESL students in this study. It would appear unreasonable to assume these L2 learners from non-English-speaking countries could obtain knowledge of a complex phenomenon like federal politics in such a short time. Thus, there is a dilemma between what to test and how to test: while the test construct tries to be in accordance with its theory (the notion of communicative competence) by including content pertaining to the real world (e.g., federal politics), the test itself is limited in its power due to the assumption of the homogeneous population and the impossible duplication of real contexts as well as other cognitive factors for knowledge accumulation. This seems one of the most challenging issues in the assessment of language proficiency based on the notion of communication competence; that is, to what extent the assessed items  110  should reflect the characteristics of test content in response to the real heterogeneous population. While assessing language proficiency, it is essential to recognize the immediate connection between the test construct or content under assessment and the task used in the assessment because the validity of language proficiency assessment, like all tests, rests on the equivalence to all examinees. In this regard, Prompt B is biased against the newly arrived ESL writers. As the AERA (American Education Research Association), APA (American Psychology Association), and NCME (National Council on Measurement in Education) Standards (1999) emphasize, tests must be equivalent for examinees of different backgrounds:  The term equivalence refers to the degree to which test scores can be used to make comparable inferences for different examinees… In general, linguistic and cultural characteristics of the intended examinee population should be reflected in examinee samples used throughout the processes of test design, validation, and norming. (pp. 92 - 93)  Similarly, Spann (2000) comments on the importance of matching test content to the targeted examinee population in language testing:  As we move from a skill model (i.e., the traditional four skills of listening, speaking, reading, and writing) towards a comprehensive model (i.e., linguistic, pragmatic, sociolinguistic, and strategic competences), we find general purpose  111  skills testing somewhat inadequate and we seek instead to match test content and tasks more closely to the specified examinee population.(p. 36)  Following Spann (2000), I believe that Prompt B, by relying on the specific knowledge in assessing writing competence, is construct irrelevant which, according to Messick (1989), creates a major threat to validity.  
4.2  Findings and Discussion for Research Question Two

Consistent with the main effect of prompt on the overall writing score in Phase One, the results of the repeated measures of 3 × 2 ANOVAs in Phase Two also revealed that students at all three proficiency levels scored higher on the components (content, organization, and language) for the general knowledge task (Prompt A) than for the specific knowledge task (Prompt B) (see Table 4.6).

Table 4.6 Descriptive Statistics of Component Scores for Prompts

Component                  Proficiency     M     SD    N
Content (Prompt A)         Basic           1.17  .63   17
                           Intermediate    3.25  .56   16
                           Advanced        4.75  .38   17
                           Total           3.05  1.59  50
Content (Prompt B)         Basic           .70   .48   17
                           Intermediate    2.53  1.19  16
                           Advanced        3.23  1.59  17
                           Total           2.15  1.59  50
Organization (Prompt A)    Basic           1.11  .82   17
                           Intermediate    3.46  .65   16
                           Advanced        4.82  .57   17
                           Total           3.12  1.7   50
Organization (Prompt B)    Basic           .53   .60   17
                           Intermediate    2.67  1.26  16
                           Advanced        3.18  1.62  17
                           Total           2.11  1.68  50
Language (Prompt A)        Basic           2.65  1.55  17
                           Intermediate    3.58  .81   16
                           Advanced        4.34  .66   17
                           Total           3.52  1.27  50
Language (Prompt B)        Basic           .64   .45   17
                           Intermediate    .97   .55   16
                           Advanced        1.13  .67   17
                           Total           .91   .59   50

Note. M = mean; SD = standard deviation; N = the number of participants

4.2.1  Findings and Discussion on Component Scores

The reader should recall that, for the three component scores, the Bonferroni-adjusted critical value for statistical significance is .0167. The results of the repeated measures of the 3 × 2 ANOVAs revealed that the independent variable "prompt," a within-subjects variable consisting of two levels (Prompt A and Prompt B), produced statistically significant main effects on all three component scores (see Table 4.7): content (F(1, 47) = 30.90, p < .0167, partial η² = .40), organization (F(1, 47) = 35.11, p < .0167, partial η² = .43), and language (F(1, 47) = 308.53, p < .0167, partial η² = .88). There was also an interaction effect of prompt and proficiency on language (F(2, 47) = 5.45, p = .007, partial η² = .19; see Table 4.7). There were no interaction effects on content (F(2, 47) = 3.88, p > .0167, partial η² = .14) or organization (F(2, 47) = 3.72, p > .0167, partial η² = .14). The absence of interaction effects for these two components suggests that the effects of prompts on content and organization were systematic across the three proficiency levels and did not depend on proficiency level. Conversely, the detected interaction effect suggests that the effect of prompts on the language component was unsystematic and needs to be understood along with consideration of proficiency levels, or vice versa. Without interaction effects of prompt and proficiency on content and organization, it seems easy to identify how a lack of specific topical knowledge could hinder students from achieving optimal writing performances. The students' lower writing scores for Prompt B indicate that their language resources were not readily sufficient for a satisfactory writing performance on a specific-knowledge topic, which requires not only procedural knowledge (i.e., knowledge of strategies for composing, or knowledge of how to perform a task) but also declarative or factual knowledge (i.e., knowledge about the subject matter and the appropriate register in specific contexts, or knowledge of "what" to write). Such prompt effects can be seen even more clearly in the identified main effects of proficiency levels on the components.

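Phase Two repeats the mixed-design ANOVA over several dependent variables and judges each effect against a Bonferroni-adjusted alpha (.05/3 ≈ .0167 for the three components; .05/7 ≈ .007 for the seven indicator scores). A minimal sketch of that loop follows; as before, the data layout, column names, and the pingouin call are illustrative assumptions rather than the study's actual scripts.

```python
# Sketch of the Phase Two analysis: one 3 x 2 mixed ANOVA per dependent
# variable, with a Bonferroni-adjusted critical p value per family of tests.
# Assumed long-format data: columns 'student', 'proficiency', 'prompt',
# plus one column per component or indicator score (hypothetical names).
import pandas as pd
import pingouin as pg

scores = pd.read_csv("component_and_indicator_scores.csv")

components = ["content", "organization", "language"]            # 3 DVs
# Coherence and cohesion is scored identically to the organization
# component, so seven indicator DVs remain for the Bonferroni adjustment.
indicators = ["idea_quality", "position_taking", "idea_development",
              "idea_wrap_up", "fluency", "accuracy", "lexical_complexity"]

def run_family(dvs, family_alpha=0.05):
    """Run a 3 x 2 mixed ANOVA for each DV in a family and flag effects
    that survive the Bonferroni-adjusted threshold (alpha / number of DVs)."""
    adjusted_alpha = family_alpha / len(dvs)
    for dv in dvs:
        aov = pg.mixed_anova(data=scores, dv=dv, within="prompt",
                             subject="student", between="proficiency",
                             effsize="np2")
        for _, row in aov.iterrows():
            flag = "significant" if row["p-unc"] < adjusted_alpha else "n.s."
            print(f"{dv:20s} {row['Source']:14s} "
                  f"F = {row['F']:7.2f}  p = {row['p-unc']:.4f}  "
                  f"np2 = {row['np2']:.2f}  ({flag})")

run_family(components)   # critical p = .05 / 3 = .0167
run_family(indicators)   # critical p = .05 / 7 = .007 (approximately)
```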
Besides the main effects of prompts on the component scores, the results of the repeated measures of the 3 × 2 ANOVAs also revealed statistically significant main effects of proficiency on each of the component variables: content (F(2, 47) = 80.49, p < .0167, partial η² = .77), organization (F(2, 47) = 70.76, p < .0167, partial η² = .75), and language (F(2, 47) = 10.77, p < .0167, partial η² = .31) (see Table 4.7).

Table 4.7 3 × 2 Univariate Analysis of Variance for Writing Component Score Measures

Source                    df    F        η²     p
Between subjects
Proficiency Level
  content                 2     80.49    .77    .001**
  organization            2     70.76    .75    .001**
  language                2     10.77    .31    .001**
Error
  content                 47    (1.01)
  organization            47    (1.28)
  language                47    (.95)
Within subjects
Prompt Type
  content                 1     30.90    .40    .001**
  organization            1     35.11    .43    .001**
  language                1     308.53   .87    .001**
Proficiency × Prompt
  content                 2     3.88     .14    .028
  organization            2     3.72     .14    .032
  language                2     5.45     .19    .007**
Error (Prompt)
  content                 47    (.66)
  organization            47    (.18)
  language                47    (.01)

Note. **The mean difference is significant at the .0167 level; **p < .0167, two-tailed. Values enclosed in parentheses represent mean square errors.

Given the main effects of proficiency and the interaction effect on the language component score (p = .007), a post-hoc analysis using a Tukey b test was applied to investigate at which proficiency levels the language component scores for the two prompts differed (Table 4.8). The post-hoc test revealed a significant mean difference of proficiency effects on the language component between the basic and the advanced levels (p = .001), but no such mean differences for the pairs between the basic and the intermediate (p = .03) or between the intermediate and the advanced (p = .144) for the two prompts. The findings here suggest that the effects of prompts on language were unsystematic or unequal. In other words, no such mean differences of proficiency effects on the language component scores between the basic and the intermediate levels, or between the intermediate and advanced levels, would be expected if the difficulty of the two prompts were the same. However, the plots of the component scores for content, organization, and language show that even students at the higher proficiency levels scored much lower on Prompt B than they did on Prompt A (see Figure 4.2, Figure 4.3, and Figure 4.4). A comparison of the three figures illustrates a bigger difference between the prompts for the language component scores than for the content and organization scores. This decrease in scores from Prompt A to Prompt B indicates the effect of the prompts and shows that proficiency levels alone could not guarantee the ESL students' effective text generation. Thus, the specific topical knowledge called for in Prompt B affected even the high-proficiency students' writing scores on each component in comparison with their scores for Prompt A.

Table 4.8 Post-hoc Analysis: Comparisons of Mean Differences of Proficiency Effects on Language Component Scores for Prompts

                                                                 95% CI
Component   (I) LP         (J) LP         MD (I-J)   SE    Sig.a    LB      UB
Language    Basic          Intermediate    -.63**     .24   .03     -1.21    -.05
                           Advanced       -1.09**     .24   .001**  -1.66    -.52
            Intermediate   Basic             .63**    .24   .03       .05    1.21
                           Advanced         -.46      .24   .14     -1.04     .12
            Advanced       Basic            1.09**    .24   .001**    .52    1.66
                           Intermediate      .46      .24   .14      -.12    1.04

Note. Based on observed means. **The mean difference is significant at the .0167 level, **p < .0167, two-tailed.
a. Adjustment for multiple comparisons: Bonferroni. CI = confidence interval; MD = mean difference; SE = standard error; LB = lower bound; UB = upper bound

Figure 4.2 The Plot of Prompt Effects on "Content" Component Scores

Figure 4.3 The Plot of Prompt Effects on "Organization" Component Scores

Figure 4.4 The Plot of Prompt Effects on "Language" Component Scores

Table 4.9, summarizing the findings from the repeated measures of the 3 × 2 univariate ANOVAs along with the follow-up post-hoc analysis, visually displays the prompt effects on the component scores for Research Question Two.

Table 4.9 Summary of Prompt Effects on Writing Component Scores in Phase Two

Proficiency     Component      Prompt A   Prompt B   Main Effect   Interaction
Basic           content        high       low        yes           no
                organization   high       low        yes           no
                language       high       low        yes           yes*
Intermediate    content        high       low        yes           no
                organization   high       low        yes           no
                language       high       low        yes           no
Advanced        content        high       low        yes           no
                organization   high       low        yes           no
                language       high       low        yes           yes*

Note. *Interaction effects for the pair of the basic and the advanced levels, but not for the pairs of the basic and intermediate levels or the intermediate and advanced levels.

4.2.2  Findings and Discussion on Indicator Scores

To identify specific factors that could account for the detected differences in prompt effects, I then conducted further analysis by applying repeated measures of the 3 × 2 ANOVA to every individual indicator in each component in Phase Two. The reader should recall that, for the indicator scores, the Bonferroni-adjusted critical value for statistical significance is .007. The descriptive statistics showed that the general knowledge task (Prompt A) had higher group mean indicator scores than the specific knowledge task (Prompt B) in terms of idea quality, position-taking, idea development, idea wrap-up, coherence and cohesion, fluency, accuracy, and lexical complexity (see Table 4.10).

Table 4.10 Descriptive Statistics of Indicator Scores for Prompts Indicator idea quality (Prompt A)  idea development (Prompt A)  Proficiency Basic Intermediate Advanced Total Basic Intermediate Advanced Total Basic Intermediate Advanced Total Basic Intermediate Advanced Total Basic  M 1.82 3.39 4.91 3.37 1.11 2.92 3.54 2.52 1.04 3.43 4.88 3.11 .77 2.81 3.39 2.32 1.46  SD .66 .47 .47 1.39 .73 1.16 1.73 1.63 .88 .59 .32 1.73 .62 1.25 1.69 1.68 .84  N 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17  idea development (Prompt B)  Intermediate Advanced Total Basic  3.58 4.8 3.28 .68  .74 .43 1.56 .53  16 17 50 17  Intermediate Advanced Total Basic Intermediate Advanced Total  2.68 3.16 2.16 .34 2.61 4.43 2.46  1.32 1.57 1.62 .57 1.13 .68 1.89  16 17 50 17 16 17 50  idea quality (Prompt B)  position-taking (Prompt A)  position-taking (Prompt B)  idea wrap-up (Prompt A)  120  Indicator idea wrap-up (Prompt B)  Proficiency M SD Basic .24 .44 Intermediate 1.71 1.48 Advanced 2.84 1.47 Total 1.59 1.62 coherence & cohesion (Prompt A) Basic 1.11 .82 Intermediate 3.46 .65 Advanced 4.82 .57 Total 3.12 1.71 coherence & cohesion (Prompt B) Basic .53 .60 Intermediate 2.68 1.26 Advanced 3.18 1.62 Total 2.11 1.68 fluency (Prompt A) Basic .96 .59 Intermediate 2.92 1.26 Advanced 3.26 .94 Total 2.37 1.4 fluency (Prompt B) Basic .61 .46 Intermediate 1.8 1.02 Advanced 1.8 1.02 Total 1.39 1.03 accuracy (Prompt A) Basic 3.4 2.27 Intermediate 4.01 .88 Advanced 4.87 .76 Total 4.1 1.58 accuracy (Prompt B) Basic 1.71 1.74 Intermediate .13 .10 Advanced .19 .19 Total .16 .16 lexical complexity (Prompt A) Basic 3.59 2.18 Intermediate 3.81 .88 Advanced 4.87 .77 Total 4.1 1.52 lexical complexity (Prompt B) Basic 1.14 .82 Intermediate .98 .80 Advanced 1.42 1.31 Total 1.18 1.01 Note. M= mean; SD = standard deviation; N= the number of participants  N 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50 17 16 17 50  121  Table 4.10, showing the repeated measures of the 3 ! 2 ANOVAs, revealed that there were statistically significant main effects of prompts on most indicators including idea quality,(F (1,47) = 56.36, p < .007, partial est #2 = .71); position taking (F(1, 47) = 78, p < .007, partial est #2 = .77); idea development (F(1, 47) = 57.65, p < .007, partial est #2 = .71); idea wrap-up (F(1, 47) = 71.03, p < .007, partial est #2 = .75); coherence and cohesion (F (1, 47) = 70.76, p < .007, partial est #2 = .75); fluency (F(1, 47) = 23.32, p < .007, partial est #2 = .5) except indicators of accuracy (F(1, 47) = .47, p > .007, partial est #2 = .03), and lexical complexity (F(1, 47) = .56, p > .007, partial est #2 = .03). The present finding suggests that the main effects of prompts were not systematic on all indicators and need to be understood along with consideration of proficiency levels or vice versa.  122  Table 4.11 3 ! 
2 Univariate Analysis of Variance for Indicator Score Measures Source  Measure  df Between Subject  F  !2  p  56.36 78.00 57.65 71.03 70.76 23.32 4.25  .71 .77 .71 .75 .75 .50  .001*** .001*** .001*** .001*** .001*** .001*** .02  Proficiency Levels idea quality position-taking idea development idea wrap-up coherence-cohesion fluency accuracy lexical complexity  2 2 2 2 2 2 2 2  3.68  .03 .03  .03  Error  Prompt Types  idea quality 47 position-taking 47 idea development 47 idea wrap-up 47 coherence cohesion 47 fluency 47 accuracy 47 lexical complexity 47 Within Subject idea quality 1 position-taking 1 idea development 1 idea wrap-up 1 coherence cohesion 1 fluency 1 accuracy 1 lexical complexity 1  (1.16) (1.18) (1.32) (1.35) (1.28) (1.34) (1.13) (1.8) 23.43 19.30 46.86 22.71  .33 .29 .50 .33  .001*** .001*** .001*** .001***  35.11 69.06 352.11 164.62  .43 .60 .88 .78  .001*** .001*** .001*** .001***  123  Source Measure Prompt ! Proficiency idea quality position-taking idea development idea wrap-up coherence cohesion fluency accuracy lexical complexity Error (prompt) idea quality position-taking idea development idea wrap-up coherence cohesion fluency accuracy lexical complexity  df  F  !2  2 2 2 2  2.32 4.09 2.77 4.09  .10 .15 .11 .15  .11 .02 .07 .01  2 2 2 2  3.72 7.84 5.72 1.70  .14 .25 .22 .07  .03 .001*** .006*** .19  47 47 47 47  (.77) (.81) (.66) (.82)  47 47 47 47  (.72) (.35) (1.1) (1.29)  p  Note. ***. The mean difference is significant at the .007 level. ***p <.007, two tailed. Values enclosed in parentheses represent mean square errors.  Further analysis of the students’ indicator scores by the 3 ! 2 ANOVA indeed provided detailed, useful information to trace differences in specific linguistic features used by students in each of the prompts. The descriptive statistics (Table 4.10) showed that students at the higher proficiency levels (intermediate and advanced) scored lower on indicator scores of position-taking and idea development for Prompt B than Prompt A. Rather than make a direct claim about thesis in the introductory paragraph, some of the students  124  hesitated to state their position pertaining to federal politics when writing for Prompt B. For example, Cho, who was from South Korea and at the intermediate level, tacitly avoided taking a position in response to Prompt B. The following is an excerpt from his essay:  Whenever I think about politics, I get sleepy. Actually I borrowed a federal politics book from the library, but I don’t understand it as I am a girl or Korean who certainly doesn’t know much about politics.  Similarly, Jenny, who had immigrated from China and lived in Canada for a year, wrote, “Presently, some people take a tremendous interest in federal politics, yet others show great interest in federal politics…. Federal politics is not related to my personal life.” Rather than explicitly stating her position, she imbedded it; that is, she included an implicit thesis statement. In the same way, Su, an intermediate ESL student from Taiwan, wrote the following reasons in response to Prompt B explaining why he was not interested in Canadian federal politics. The spelling and grammar errors in the following samples were shown in the original samples.  The cause of I am not interested in any federal politics or government is in Taiwan. Taiwanese government is messed up because whenever there is a discussion there is fighting going on. My mom is always saying that it is a shame when politicians fight. 
She also said  125  that people from other countries have to match those discussion because they think it is fun to watch. Federal politics are boring because there is nothing to care about because all the politicians care about themselves. There are two things they care about, are they making enough money and how much they are making. The politicians are always fighting for these two questions. This situation happens all the time and it gets boring every time you watch. Personally, I do not care about any federal politics because basically most politics they make are for themselves. For example, when politicians argue, they argue for their own benefits. Sometimes they will argue for societies and people in their region. I do not care what they argue as long as it is good for the society and people they care. Federal politics are boring and these is nothing to care about; therefore, I am not interested in any federal politics.  Apparently, having no empirical evidence from the literature that could be used to support his points of view during the test, Su heavily drew on his life experiences in his home country to support his position. Rather than respond to the prompt by analysis or illustrations directly related to the topic itself, he explained his point of view based on what he knew about Taiwan. From an Asian cultural upbringing where the elderly are usually associated with authority or the power of the hierarchical relationship in a family where parents rank at the top (Pye, 1985), Su held his mother’s opinion as bolstering evidence in defense of his thesis. However, such a strategy may not be effective in the Canadian context, for there are more differences than similarities between the two countries, each displaying a varying  126  sociocultural system. As a result, Su could not effectively clarify his point of view in support of his position on federal politics for Prompt B. Rinnert and Kobayashi (2009) did capture instances in which L2 writers were more willing to write about things they held positive views about. Some of the positive views, according to Canagarajah (2006), were shaped by the diverse social contexts students came from. Like Su, most participating students digressed while responding to Prompt B since they were not familiar with the Canadian federal politics. Similarly, in responding to Prompt B, David from the basic level lacked cohesiveness or clarity while trying to defend his opinion in his essay:  Most people like to find something they think interesting to know and study. Because federal politics are too many backgrounds and relationship between people and other people, federal politics is boring. I come from China, China is not federal system. So, I never know about federal politics. If I want to know federal system, I will go to study their backgrounds. In China, I only know a little of backgrounds about the government, and I feel it is not interesting. Therefore, I do not like to know about federal politics. Federal politics are a complicated subject. We need to know how the government work, and how many parts in the government. Moreover, we need to know lots of relationship between government and people. That like you are doing a different math question.  To illustrate, David was trying to clarify his views about federal politics, but he redundantly stated his opinions and only presented two major facts --- “I come from China”  127  and “I do not know federal politics” --- without more relevant logical explanations and evidence. 
Apparently, such a superficial discussion without convincing evidence or explicit explanations fails to present a fully developed position in response to Prompt B with relevant, fully extended, and well-supported ideas. In contrast, participants generally presented a well-developed response to Prompt A (the choice of what to study). Su presented a clear central topic within each paragraph, using relevant personal experiences to support the discussion in the body paragraphs. Given that the prompt asked for the students’ opinion about what to choose for their study, Su’s personal experiences as supporting evidence sounded plausible. In addition, he skillfully managed paragraphs employing logically organized information and ideas. See the following sample he wrote for Prompt A.  Attending a college or a university is necessary for everyone, but choose what to study is another manner. The facts which will influence your choice of what to study are your interests and what do you want to be in the future. One’s interests will influence one’s choice because the more interesting, the more you want to know. When I study the subject I am interested in, obviously, I am going to get a good grade on the subject. I choose my courses based on how much interest I am into that subjects or courses. I choose math and computer, for I love electrical devices. Another reason is choose a subject area to study based on what you want to be in the future. For example, I want to be an electrical engineer, so I would choose any subjects that are about electronics. Electronics is a huge area, so it is not an easy subject to study. Being  128  an electrical engineer, I will have to choose computers and math to increase my knowledge about electronics. A college or university student will choose their own courses for their own purposes. The fact that will influence on what to study are their own interests and what they want to be in the future.  Like Su, David wrote Prompt A in greater depth and breadth, along with more convincing supporting evidence than Prompt B. See the following essay he wrote for Prompt A. In a university students have to choose some classes for study. Someone says a good beginning is a half of success. So, if students want to be successful after university, they have to know what to study is good for themselves. However, two important factors, such as interest and preferable majors, will influence student choice of what to study. When students choose classes, they will think about what classes they think interesting. Firstly, if students think the class is interesting for them, they will learn fast and well. Because they are willing to learn, they will enjoy the class. For example, a student likes to learn math, so he want to do lots of homework about math. After that, his math marks are higher than other students. Next, when students love to go to the class, they do not feel any uncomfortable with it. Students have a good mood when they take the class which is interesting for them. For example, in the math class, teacher usually gives lots of homework after the class. And then, many students feel a little bit unhappy, but some of them will ask teacher to give more questions to them. Therefore, interesting is an important factor.  129  Preferable majors is an important factor that can impact students to choose what to study. Because students’ majors like a direction of study, they have to choice a right direction of study. 
In summary, interest and preferable majors are important factors for students choice of what they to study.  Although the essay had visible syntax problems, David did incorporate some features which are valued in the English-speaking context. With a clear thesis statement in the introductory paragraph (“Two important factors, such as interest and preferable majors, will influence student choice of what to study”), David quickly delivered a clear objective for the essay. Then he included a topic sentence, such as “interest” and “preferable majors” at the beginning of each body paragraph, and followed up with an explanation or a discussion using supporting evidence (e.g., math class). In addition, David used appropriate transitions (e.g., firstly, next, for example, in summary) both within sentences and between paragraphs to achieve coherence and cohesion. The accumulated evidence here shows that the ESL students’ topical knowledge is part of the metaknowledge that can encourage them to elaborate, embellish, and enliven some explicit and intentional textual-linguistic and rhetorical choices in response to writing tasks. That is, topical knowledge can allow ESL students to function sufficiently and effectively in composing an essay in response to a prompt, since“[e]ven the best writers will fail to produce good writing if they do not posses adequate information about the subject presented” (Ruth & Murphy, 1988, p. 253). As a direct measure of general writing  130  proficiency relies heavily on the selection of appropriate topics for eliciting writing samples, choosing an appropriate prompt with a neutral topic with some a personal connections is important for both the test taker and the test user. Importantly, the repeated measures of the 3 ! 2 ANOVAs (see Table 4.10) revealed that there were statistically significant interaction effects between prompts and proficiency levels on indicator scores of fluency (F(2, 47) =7.84, p < .007, partial est #2 = .25), and accuracy (F (2, 47) = 5.72, p < .007, partial est #2 = .22), but no such interaction effects were detected on the other indicator scores including idea quality (F(2, 47) = 2.32, p > .007, partial est #2 = .09), position-taking (F(2, 47) = 4.09, p > .007, partial est #2 = .15), idea development (F(2, 47)=2.77, p > .007, partial est #2 = .11), idea wrap-up (F(2. 47) = 4.09, p > .007, partial est #2 = .15), coherence and cohesion (F(2, 47) =3.72, p > .007, partial est #2 = .14), lexical complexity (F(2. 47) = 1.70, p > .007, partial est #2 = .07). In other words, the effects of prompts on most indicators were systematic across different proficiency levels except on indicators of fluency and accuracy, which need to be understood along with consideration of proficiency levels or vice versa. The plots of the repeated measure of the 3 ! 2 ANOVAs displayed detected simple main effect or interaction effect differences of the prompt types on the indicators fluency and accuracy, respectively. It is noticed here that students at higher proficiency levels (intermediate and advance) scored poorer for Prompt B on federal politics than for Prompt A and that their accuracy scores even dropped to the same point as the students at the basic  131  level did (Figure 4.5 and Figure 4.6). The three proficiency lines were not parallel, and the interactions between prompt types and proficiency levels were obvious.  
Figure 4.5  The Plot of Prompt Effects on “Fluency” Indicator Scores  Figure 4.6  The Plot of Prompt Effects on “Accuracy” Indicator Scores  132  Given the inconsistent main effect of proficiency and interaction effects of prompts and proficiency levels on indicators of fluency and accuracy, a post-hoc analysis was conducted to examine at which proficiency levels fluency and accuracy indicator scores differed. Table 4.12, listing the post-hoc analysis using a Tukey b, revealed a significant mean difference of proficiency effects on the pairs of fluency between the basic and the intermediate (p =. 001) as well as the basic and the advanced (p = .001). There was no such mean difference of proficiency effects on the pair of fluency between the intermediate and the advanced (p =.831), nor were there such proficiency effects on the accuracy indicator across all pairwise comparisons of proficiency levels (p =.538, p =.19, p =.016). The findings here suggest the unsystematic effects of prompt types on the specific textual-linguistic features used by the writers. The fluency indicator needs to take proficiency levels into consideration when investigating the effects of prompts, while the accuracy indicator has no such need because the effects of prompts on it do not depend on proficiency levels or vice versa. There were no interaction effects between prompt and proficiency on most indicator scores detected by the post-hoc analysis, which further emphasizes the significant effects of prompts on the specific textual-linguistic features. Thus, the sharp drop of the accuracy indicator scores for Prompt B across all proficiency levels (see Figure 4.6 above) was mainly due to the difficulty of Prompt B. As the repeated measures of the 3 ! 2 ANOVAs show, there were a big effect size for the main effects of prompts on accuracy (F (49, 1) =352.11,  133  p < .007, partial est #2 = .88) (see Table 4.11 above). While in accordance with the existing literature claim that essay length symbolizes writing quality rating at low proficiency level (e.g., Jarvis, Grant, Bikowski, & Ferris, 2003), the present findings suggests that a lack of specific topical knowledge can hinder L2 writers’ text generation, including fluency and accuracy even for those at higher proficiency levels.  Table 4.12 Post-hoc Analysis: Comparisons of Mean Difference of Proficiency Effects on Fluency and Accuracy Indicator Scores for Prompts  (J) LP MD (I-J) SE Sig.a LB UB Intermediate -1.58*** .28 .001 -2.27 - .89 Advanced -1.74*** .28 .001 -2.42 -1.06 Intermediate Basic 1.58*** .28 .001 .89 2.27 Advanced -.17 .28 .831 -.86 .52 Advanced Basic 1.74*** .28 .001 1.06 2.42 Intermediate .17 .28 .831 -.52 .86 accuracy Basic Intermediate .28 .26 .538 -.91 .53 Advanced -.74*** .26 .016 -1.37 -.12 Intermediate Basic .28 .26 .538 - .35 .91 Advanced .46 .26 .190 -1.1 .17 Advanced Basic .74*** .26 .016 .12 1.37 Intermediate .46 .26 .190 -.17 1.1 Note. Based on observed means ***. The mean difference is significant at the .007 level, ***p <.007, two tailed. a. Adjustment for multiple comparisons: Boniferron CI= confident interval; MD= mean difference; SE= standard error; LB= lower bound; UB=upper bound Indicator fluency  (I) L P Basic  134  Meanwhile, the results of the repeated measures of 3 ! 2 ANOVA (see Table 4.10 above) showed a lower frequency of academic words for Prompt B than Prompt A based on the Academic Word List (AWL) devised by Coxhead (1998). 
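The lexical complexity indicator referred to here is simply the percentage of an essay's running words (tokens) that appear on the AWL. A rough sketch of such a count is given below; the file names, the flat word-list format, and the tokenization are my own simplifications (the study used the online Academic Vocabulary Highlighter), so it should be read as illustrative rather than as the instrument actually employed.

```python
# Rough sketch of the lexical complexity measure: the percentage of an
# essay's tokens that appear on the Academic Word List (AWL).
# File paths and the one-word-per-line list format are assumptions.
import re

def tokens(text: str) -> list[str]:
    """Lower-cased word tokens, ignoring punctuation and digits."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def awl_percentage(essay_text: str, awl_words: set[str]) -> float:
    """Share of tokens that match an AWL entry (headwords and family
    members are assumed to be listed explicitly in the word-list file)."""
    toks = tokens(essay_text)
    if not toks:
        return 0.0
    academic = sum(1 for t in toks if t in awl_words)
    return 100.0 * academic / len(toks)

# One AWL entry per line in a plain-text file (hypothetical path).
with open("awl_word_families.txt", encoding="utf-8") as f:
    awl = {line.strip().lower() for line in f if line.strip()}

with open("essay_prompt_b.txt", encoding="utf-8") as f:
    essay = f.read()

print(f"AWL coverage: {awl_percentage(essay, awl):.1f}% of tokens")
```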
As shown in Table 4.10, there were a big effect size of the main effects of prompts on lexical complexity, measured by the percentage of frequency of academic words of each essay (F(49, 1) = 164.15, p < .007, partial est #2 = .78). The results of the Academic Vocabulary Highligher analysis showed that over 90% of the total words (tokens) of 10,907 for Prompt A, and of 6,726 for Prompt B were general-service vocabulary on the General Service List (West, 1953). Less than 10% of the tokens for both prompts were academic lexical words listed in the AWL. Table 4.12 shows a glimpse of the running word frequency for both prompts, starting from the most frequent words. Among the displayed word frequency results, only one academic word, “federal” was on the AWL (Coxhead, 1998) possibly because the word was used in the instructions for Prompt B.  135  Table 4.13 The Word Frequency of the Tokens of Prompt A and Prompt B Prompt A  Prompt B  10,907 token  6,726 token  Frequency 477 416 384 312 262 237 211 192 183 154 146 145 138 127 124 118 116 104 100 93 93 87 82 79 76 73 70 69 67 58  Word to the I a in of is my and will university study for that you it have what can they are choice if be choose or college job students me  Frequency 312 249 206 180 168 152 128 127 117 114 113 98 82 81 80 68 64 63 62 59 56 53 51 45 45 43 40 39 38 37  Word I the to politics in and is federal* a it not of that for are people about have they will because my can government do am as know think or  136  In response to Prompt A, which was more related to their daily life, participating students had greater vocabulary for generating, elaborating, and producing ideas for the topic, whereas in their response to the specific knowledge Prompt B, students digressed or repeatedly used certain informal words or expressions to address the topic. Three unparalleled lines in Figure 4.4 show the obvious decrease of academic words from Prompt A to Prompt B. The lexical complexity indicator scores of the writers at the basic and the intermediate levels were close for two prompts, and we see the two lines overlapped at a certain point.  Figure 4.4 The Plot of Effects of Prompts on “Lexical Complexity” Indicator Scores  137  While previous studies have pointed out that academic vocabulary causes a great deal of difficulty for learners (Cohen, Glasman, Rosenbaum Cohen, Ferrara, & Fine, 1988), the findings here provided some complementary suggestions regarding the effect of writing tasks on eliciting vocabulary, among other confounding factors, can bring about such learning difficulties. Table 4.14, summarizing the findings through the repeated measure of and the 3 ! 2 univariate ANOVAs along with the follow-up post-hoc analysis, suggests the answer to Research Question Two.  
Table 4.14 Summary of Prompt Effects on Indicator Scores in Phase Two Proficiency Basic  Intermediate  Indicator idea quality position-taking idea development idea wrap-up coherence & cohesion fluency accuracy lexical complexity idea quality position-taking idea development idea wrap-up coherence & cohesion fluency  Prompt A low low low low low  Prompt B high high high high high  Main Effect yes yes yes yes yes  Interaction no no no no no  low low low low low low low low  high high high high high high high high  yes yes yes yes yes yes yes yes  yes* no no no no no no no  low  high  yes  yes*  138  Proficiency  Advanced  Indicator accuracy lexical complexity idea quality position-taking idea development idea wrap-up coherence & cohesion fluency accuracy lexical complexity  Prompt A low low low low low low low  Prompt B high high high high high high high  Main Effect yes yes yes yes yes yes yes  Interaction no no no no no no no  low low low  high high high  yes yes yes  yes* no no  Note. *Interaction effects for the pair of fluency between the basic and intermediate levels as well as the basic and advanced levels except the pair of fluency between the intermediate and advanced levels.  The important implication of the results here is that writing prompts which differ in content or knowledge requirements elicit different textual-linguistic features in terms of content, organization, and language. The implication also indicates that there is no causal effect between language proficiency and L2 writing performance; namely, a higher level of language proficiency alone is not sufficient to predict L2 writing competence, although they are associated. Without a knowledge base about the topic, L2 writers are at a disadvantage when it comes to the metacognitive processes of composing. This knowledge base is even more important for L2 students who bring their EFL writing experiences and norms, which is different from ESL norms, to an English-dominant country. The research to date has emphasized that L2 writing is different from L1 writing, since it “draws more heavily on already chunked or automatized language than on new, novel language creations, in that text  139  generation is made up of predominantly fluent outbursts of language” (Manch$n & Hann, 2008, p. 236). As the participants in this study just arrived, they could not have received adequate exposure to Canadian culture, either through formal curricula or through informal life experience, to gain the specific knowledge required for writing Prompt B. While concurring with the existing research claims that L2 proficiency accounts for the quality of writing in terms of content, organization, language use, and discourse modes (Cumming, 1989; Hirose & Sasaki, 1994), the results of the analysis here have some complementary suggestions: proficiency assessments are relevant only under the condition that irrelevant content contamination is controlled. If writers have no topical knowledge base, proficiency becomes an empty, meaningless concept. These findings, through repeated measures of 3 ! 2 ANOVAs along with follow-up post-hoc analyses showed systematic main effects of prompt types on the quality of the ESL students’ writing and the specific textual-linguistic features generated by the two prompts, in terms of components such as content, organization, and language or individual indicators including idea quality, position-taking, idea development, idea wrap-up, coherence and cohesion, fluency, accuracy, and lexical complexity. 
There were also systematic main effects of proficiency on the overall writing scores, component scores, and indicator scores on the two prompts as well. However, the post-hoc analysis revealed statistically significant mean differences of proficiency effects on the component scores of content and organization and most of the indicators scores. There were no effects of proficiency on the component of language and the indicator of fluency at high proficiency levels (intermediate and advanced), nor were any significant mean difference of proficiency effects on accuracy and lexical  140  complexity across all proficiency levels. Instead, there was a sharp drop of the indicator scores for accuracy and lexical complexity from Prompt A to Prompt B for students at the intermediate and advanced levels. In sum, the results of the 3 ! 2 ANOVAs consistently reveal effects of prompts on both components (content, organization, and language) and individual indicators within each component. Specifically, topical knowledge called for in Prompt B affected the participating students’ ability to fully address all parts of the writing task. Compared with Prompt A, which students addressed more fully, their responses to Prompt B restricted in content and loose in organization. These findings answered Research Question Two: the specific knowledge called for in Prompt B negatively influenced the specific L2 textual-linguistic features in content, organization, and language and their sub-features such as idea quality, position-taking, idea development, idea wrap-up, coherence and cohesion, fluency, accuracy, and lexical complexity. In light of test validity, Prompt B created construct irrelevance which distorted assessment of writing competence, thereby causing a notably lower score among L2 writers across all proficiency levels. As Messick (1989) addresses, the threat of construct-irrelevance to directness, in tandem with the threat of construct under-representation to authenticity, accounts for a large portion of variance in the invalidity of a test. Thus, the relevance of test tasks should be at the heart of any writing performance assessment. Meanwhile, the low scores on both Prompt A and B shown among the students at the basic proficiency level suggest that L2 writing is an integrative continuum, wherein writers at the lower level of language proficiency rely more on topical knowledge in their information processing. To  141  assess knowledge of specific content instead of writing proficiency, the topic should be determined by the content that is to be assessed, and an achievement test rather a language proficiency test is needed. The central concern illuminated by these findings pertains to the question: what kind of content should be included in the writing task in a language proficiency test? These findings suggest that the writing task should be checked for bias with cultural informants who share or have experience with the culture of the examinees for the sake of content validity.  4.3  Findings and Discussion for Research Question Three  After participants completed the two writing tests, five participants across the three language proficiency levels, basic (Mike, Bing), intermediate (Cho), and advanced (Paul, Hua), volunteered to participate in interviews. Paul and Hua achieved their bachelors’ degrees in China before immigrating to Canada eight months ago. Paul majored in physics, while Hua studied computer science. 
During the interviews, they both expressed happiness at having the opportunity to release their frustrations about responding to Prompt B and the difficulties they encountered in English writing at the college. Cho, a South Korean student, had been living in Canada for one year at the time of the interview. Cho was verbally quiet during my lecture providing participants with feedback on their essays, but she was active in response to the interview questions, particularly surrounding her perplexities in writing for Prompt B. She said that her goal in Canada was to study commerce or economics at the university level. She frankly articulated her concerns about writing with difficult topics such as Prompt B on the LPI test, which is a requirement of a local university she was attempting  142  to enter. Mike and Bing had graduated from high school in Taiwan five months before the interview. They did not hide the fact that they only wrote for Prompt A. They attended the test for Prompt B but did not complete the task due to their lack of knowledge of the topic, as they said in the interview. On the whole, the participants felt comfortable with writing about their choice of study (Prompt A), but found writing about federal politics (Prompt B) rather difficult. Five themes emerged as the five participants commented on their writing for the two prompts: the easiness in writing for Prompt A due to topic familiarity, and the difficulties in writing for Prompt B due to a lack of 1) knowledge about federal politics, 2) vocabulary to write on unfamiliar topics, 3) confidence to comment on political authorities, and 4) understanding of the cultural expectations of the readers.  4.3.1  Topic Familiarity  As they explained why they felt Prompt A (choice of what to study) was easy to write about, the five participants consistently said that Prompt A was more interesting than Prompt B (federal politics) as it was related to their life and they had prior knowledge for the topic. As Mike said, “Prompt A is easier and that is what I am going to think about after I am done with my study at this college.” Similarly, Bing firmly confirmed that Prompt A was easier. According to him, “Writing is just like reading, and both needs something you know to help you either writing or understanding what is said there.” Cho explained that she had more words for writing Prompt A compared with Prompt B. As she commented, “You have vocabulary for it [Prompt A]....It[Prompt A] is a daily life topic.”  143  4.3.2  A Lack of Knowledge about Federal Politics  While recounting their perceptions about writing for the two prompts, the interviewees all felt it difficult to write for Prompt B in depth and breadth due to their unfamiliarity with the Canadian government systems. As Bing explained in English to the researcher in the interview:  Most of the students here [in this class] came here [Canada] not long and haven’t settled down. What they know seems only something like the names of democracy party and conservative. For other parties, they have no ideas. They are not familiar with the politics here yet. They don’t know the government systems either. So, it is more difficult for us to write this topic.  Mike also complained that Prompt B was too difficult for him due to his unfamiliarity with the topic, commenting:  To me, a good writing should be easy and understandable. Also the content is interesting. I sat there thinking for a long time but got no ideas where to start writing about it. You just don’t know it. 
How can you write about it?  Regarding their difficulty in writing for Prompt B due to their lack of topical knowledge, all interviewees showed a willingness to learn more about the Canadian society and culture. In this regard, Mike articulated his great difficulty in writing for Prompt B: “I  144  don’t know much about federal politics. I just came here last term. I may learn more about it in the future.” Meanwhile, in his reflective recount, Bing stated his belief in the importance of practice in English: “I need more practice and write more. [I] also need to increase vocabulary. Also [I] don’t just translate my Chinese thoughts to English .... I need to do more reading and also watch more English TV to influence my thinking way.” The students’ recounts also pointed to the influence of previous education and cultural backgrounds on their writing. Compared with the Western education system, which emphasizes creative thinking and problem-solving ability, the Asian curriculum places a greater emphasis on classroom achievements and what students learn from their textbooks. There is still little opportunity for professional preparation in EFL countries, so teachers of L2 writing often rely on textbooks as their source of pedagogical knowledge (e.g., Matsuda, 2005). Teaching English in China is different from that in an English-speaking country due to a group of related factors, such as Chinese culture, pedagogical traditions, and test-based teaching (He, 2002). In addition, Chinese culture is known for its long literary tradition. One example of the tradition is Civil Service exams (196 BC – 1905), which tested one’s ability to write a rigid Chinese rhetorical pattern known as the “Eight Legged Essay.” Following such a tradition of focusing on the product rather than the process of writing, many Chinese students and writing instructors still believe that learning means copying models while testing means writing model texts from memory (e.g., Ergaugh, 1990; You, 2004). By memorizing model texts, some Chinese students are able to pass various exams including the Test of Written English. Students receiving schooling in China are used to being tested on their memorization of class notes and textbook content. Accordingly, when students had to  145  write about Canadian federal politics, which was beyond what they had learnt or memorized, they felt lost. Such different educational backgrounds may account for the reasons why Bing insisted during his talk,  I wish my teacher could talk something about Canadian politics before they asked us to write about it. We are not taught. It [knowledge about Prompt B] is not in our textbook.  As such, students with Asian cultural backgrounds in this study might assume that learning English in the host country was done in the same manner as in their home country. For example, while sharing his previous experience of writing English essays in China using TOEFL materials published by a private language school, Hua said,  I often went to writing samples published by the New Oriental school and looked for some transition words I could use in my own writing, like which word was used in the introduction, conclusion, and so on. Those sample essays could give me some good expressions.  From a reader-responsible culture that emphasizes flowery and ornate prose, Hua tried to include more “good expressions” in his writing. When these students were faced with Prompt B, they felt suddenly alienated, since they had not accumulated such knowledge in the target language. 
As Mike commented,

Something you have experienced before can make it easy to write about it [Prompt B]. At least you are clear how to express in Chinese… If you’ve never experienced it, it is certainly difficult to write about it.

Cho also stated frankly, “I like to write something related to my experiences.” Similarly, Hua aired his opinion that the precondition for writing a good essay was “not the type of Writing but something you are familiar with.” During the conversations with the interviewees, I noticed that the word “experience” in their recounts carried a broad sense, referring to knowledge accumulated through both explicit and implicit L2 learning. Learning English writing, therefore, can be seen as a progressive process involving their real-life exposure to the target language in the host culture as well as classroom learning. Some of them even believed that topical knowledge could influence their preferences for certain discourse modes. As Hua commented, “Narrative topics about [personal] life is easier than argumentative ones since you generally have something to say there.”

At this point, all interviewees concurred that a lack of topical knowledge on Canadian federal politics blocked both an active writing process and an optimal writing performance on Prompt B. The students’ recounts were consistent with the statistical results detected in Phases One and Two. The students’ previous experience of L2 writing instruction in EFL contexts shows that we cannot take for granted that L2 writers have a static knowledge base. Rather, the more exposure and access a writer has to the target language and culture, the better he or she can reshape his or her knowledge for composing in the target language.

4.3.3 A Lack of Vocabulary to Write on Unfamiliar Topics

Interviewees also expressed that a lack of vocabulary was another obstacle while writing for Prompt B. Some considered that without the necessary subject words for federal politics they could not develop their thoughts and ideas for the essay. For instance, Cho recounted that she was cautious and struggling while writing for Prompt B: “I thought a long time thinking about Topic 2, but finally I still couldn’t write more and I give up. I don’t want to write something wrong. I need to know more words to express my views.” Paul told me that he had no words about Canadian federal politics to communicate his thoughts:

I needed to express “  ” (“run for” in Chinese), but I did not know which word I can use…. There were many situations like this in writing Topic 2 [Prompt B]. It was a test and we were not allowed to use a dictionary. So, I was stuck there most of the time…. To write Topic 2 well, you need to know a lot of words about the topic. Otherwise, you can write or only talk about something superficial.

Students’ talk also indicated cultural and linguistic barriers between languages. It was noted that interviewees had memorized a certain number of English words based on their denotation (i.e., dictionary meaning); however, these words, learned as a foreign language in their local cultural contexts, did not carry identical connotations in the target culture, and thus might fail to convey ideas accurately for writing on a culturally bound topic like Canadian federal politics.
Research to date has found evidence of a greater and more direct reliance on L1 lexical knowledge (Manchon & Roca de Larios, 2007; Wang & Wen, 2002) and on source texts in L2 writing (Moore, 1997; Shi, 2004, 2010). This may account for Mike’s response:

It [English] is different from what I learned before. Sometimes I am not sure what those words really mean and how to use them correctly although I know them. For example, I know “political” is something related to government. But I heard people use it to say about a persons, saying something like “You are political.” I feel English words are more complex. We need to know lots of words about government and politics for Prompt B. I am just poor at them.

4.3.4 A Lack of Confidence to Comment on Authorities

The four interviewees from China said that they felt nervous about making comments on federal politics for Prompt B. During the interview, Paul relayed his ideas using the case of Lai Changxing, a Chinese smuggling ringleader given ‘most-wanted’ status by the Chinese government. Recently, the Canadian government had issued Lai a work authorization in Canada. Paul showed puzzlement at such permission from the Canadian government while making comments on the case:

You really don’t understand Canadian politics, so you dare not to make more comments on it. This made writing topics 2 [Prompt B] more difficult. It’s not just writing, actually what’s right or wrong also made this topic hard. I feel somewhat nervous about saying something wrong there.

At this point, I noticed that those participants from China appeared more cautious at the beginning and asked about the purpose of this study although they had been informed of the study in detail. In my view, the Chinese students’ nervousness may be attributed to the influence of Confucian culture, which values morality, stability, and harmony, and the fulfillment of moral obligations between the ruler and the ruled through social stability and obeisance to authority. In Confucian culture, revering authority is viewed as a moral virtue, and a “good” piece of writing is judged, among other criteria, by its reverence to authority. The four interviewees’ lack of confidence to comment on Canadian federal politics suggests that Confucian philosophy still influences some areas of Chinese behavior even though it emerged in China in the late sixth century B.C., roughly two and a half millennia ago.

4.3.5 A Lack of Understanding of the Expectations of Readers

When the interviewees were asked what they found most difficult in writing Prompt B, they said that they were not clear about what they were expected to write in terms of the appropriateness of form, content, and accuracy to satisfy the readers, who were local instructors. They found that translating their thoughts in the manner of their home language distorted their desired meaning. As they described, they often fell into a dilemma where they carefully looked up new words in the dictionary and double-checked their meanings before incorporating them into the essay, but their essays turned out somewhat awkward, lacking clarity and accuracy, according to their instructors. Paul shared his difficulty regarding this aspect:

The most difficult thing to me [in writing Prompt B] is I don’t know which words can be used, which words not look awkward to the people here [Canada]. Sometimes you have certain words in mind, but after you use them, you can feel that they are so awkward but you don’t know why.
As interviewees indicated, they needed not only vocabulary for communicating their ideas for Prompt B, but also an understanding of the cultural connotations of words. Interviewees had no idea how to meet the expectations of readers in the target language context. Such a lack of understanding of readers’ expectations in the target language may help explain Paul’s puzzled behavior in his writing for Prompt B:

For example, to express “  ” [in Chinese], English says “learn a lesson”, but we write in our Chinese way and say “receive a lesson”. We translate word for word from our Chinese way. Also, to express an idea, we often have to look up the words in the dictionary. Actually [foreigners] here don’t use those big words but very simple and common words. Most of the time my writing sounds so awkward although I spend a lot of time.... Also, in Chinese we first say year, then month, and last date, but in English it is just opposite. Most English are in this way. You have to think in an opposite way in English.

Interviewees also commented on what makes a good English essay and their understanding of how to balance form and content. As Paul said,

Sometimes content should be more important than form, in my opinion. Even if you write a perfect form like English way, but if you don’t have content there, to me, that’s not a good writing. But the fact is if we don’t match English form, our writing is considered bad or off topic. Also, you can’t write English too fancy like you use so many beautiful words in Chinese.

In this regard, Hua emphasized that language development must be based on “absorbing other cultures and toleration of other cultures.” Hua thought highly of the current study, commenting that it was an act of “exploring the development of a language in cultures”, as did other participants.

4.4 Summary of Results

In summary, the quantitative data analyses consistently show statistically significant main effects of prompts on students’ overall writing performance across all three proficiency levels. The main effects of prompts were also evident for the three components (content, organization, and language) along with their indicators of various textual features. In addition, there was an interaction effect between prompt and proficiency for the overall writing scores, but no such interaction effects on the components of content and organization and most of the indicators. The qualitative interview data in Phase Three were consistent with the quantitative results detected in Phases One and Two: participants found Prompt B more challenging than Prompt A because of their lack of specific knowledge about federal politics, of specific vocabulary for an unfamiliar topic, and of an understanding of readers’ expectations. Some participants also expressed their lack of confidence in commenting on government authorities.

CHAPTER 5 CONCLUSIONS

This study reveals several theoretical, methodological, and pedagogical implications pertaining to construct validity in assessing L2 writing. The study explored the effect of topical knowledge on ESL students’ writing performance in testing situations; its findings are based on data analyses of the participating ESL students’ (a) overall scores, (b) specific textual features, and (c) perceptions of their experiences in responding to prompts requiring either general knowledge or specific knowledge. The findings assist in providing an understanding of the role of topical knowledge in L2 writing and assessment.
They also reveal implications relevant to language testing and ESL pedagogy for test makers, administrators, and researchers alike. In this chapter, I summarize the findings and comment on the implications of the study with respect to research methodology, testing, and pedagogy. I conclude the chapter with some limitations of the study and suggestions for further research.

5.1 Summary of Findings

The first research question asks whether ESL students across proficiency levels perform differently when responding to Prompt A, which requires general knowledge, in comparison to Prompt B, which requires knowledge of federal politics. The paired-samples t tests and the 3 × 2 univariate ANOVA on the repeated measures consistently revealed statistically significant main effects of prompts on the ESL students’ overall writing performance across all proficiency levels (p < 0.05). The findings suggest that the participating ESL students, especially those at the intermediate and advanced proficiency levels, performed significantly better on the general topic (Prompt A) than they did on the specific topic (Prompt B).

The second research question asks whether ESL students across proficiency levels perform differently in terms of specific textual features, including content (idea quality, position-taking, idea development, and idea wrap-up), organization (coherence and cohesion), and language (fluency, accuracy, and lexical complexity), when responding to Prompt A, which requires general knowledge, in comparison to Prompt B, which requires specific topical knowledge. The findings suggest that the participating students performed better in content, organization, and language for Prompt A than they did for Prompt B. In general, students scored lower on Prompt B due to poor idea quality, implicit position, insufficient idea development, weak idea wrap-up, a lack of coherence and cohesion, shorter length, more syntax and lexical errors, and less frequent use of academic words. The effects of the prompts held even for students at higher proficiency levels, as shown by their lower component and indicator scores for Prompt B than for Prompt A. The repeated-measures 3 × 2 ANOVAs revealed statistically significant main effects of prompts across all proficiency levels at both the component level (content, organization, and language) (p < .0167) and the indicator level (idea quality, position-taking, idea development, idea wrap-up, coherence and cohesion, fluency, accuracy, and lexical complexity) (p < .007). No interaction effects between prompt and proficiency were detected on the components of content and organization and most of the indicators across all proficiency levels (p > .0167), except for the language component between the basic and the advanced levels, and its indicator fluency between the basic and the intermediate levels as well as the basic and the advanced levels (p < .007).
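For readers who want a concrete picture of this kind of analysis, the sketch below shows how a paired-samples t test and a 3 × 2 (proficiency × prompt) mixed ANOVA on overall writing scores might be set up in Python. It is a minimal illustration only: the data file, the column names, and the use of scipy and pingouin are assumptions made for illustration, not the study’s actual analysis script; the reported thresholds (.05/3 ≈ .0167 at the component level, .007 at the indicator level) suggest a Bonferroni-style adjustment for the repeated comparisons.

```python
# Illustrative sketch only: paired t test and 3 x 2 mixed ANOVA on writing scores.
# The file name and columns (student, proficiency, prompt, score) are hypothetical.
import pandas as pd
from scipy import stats
import pingouin as pg

# Long-format data: one row per student per prompt (A or B).
df = pd.read_csv("writing_scores.csv")

# Paired-samples t test on overall scores, Prompt A vs. Prompt B.
wide = df.pivot(index="student", columns="prompt", values="score")
t, p = stats.ttest_rel(wide["A"], wide["B"])
print(f"paired t = {t:.2f}, p = {p:.4f}")

# 3 (proficiency: basic/intermediate/advanced) x 2 (prompt: A/B) mixed ANOVA,
# with prompt as the within-subject factor and proficiency between subjects.
aov = pg.mixed_anova(data=df, dv="score", within="prompt",
                     subject="student", between="proficiency")
print(aov[["Source", "F", "p-unc"]])

# Bonferroni-style threshold when the same comparison is repeated over the
# three components (.05 / 3 = .0167) or over several indicators.
alpha_component = 0.05 / 3
```

Under this hypothetical setup, a significant main effect of prompt together with a non-significant prompt × proficiency interaction would mirror the pattern reported above for the content and organization components.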
The third research question asks how ESL students perceive their writing for the general knowledge task compared to the specific knowledge task. The findings of the qualitative posttest interviews generated five themes relating to cultural factors that could influence their writing performance: (1) topic familiarity with the writing task, (2) a lack of knowledge concerning federal politics, (3) a lack of vocabulary for writing on unfamiliar topics such as federal politics, (4) a lack of confidence in commenting about political authorities, and (5) a lack of understanding of readers’ expectations in the host country.

The findings of this study suggest that the knowledge-specific prompt causes construct-irrelevant variance in the manner in which it elicits ESL students’ writing performance. These findings should generate a greater awareness of validity and its criteria. Such awareness can be enhanced from two points of view. One is cross-cultural: the perspective derived through cross-societal comparisons. In this broad sense, validity should not solely correlate with criteria in a single context (e.g., Canada), but in a broad milieu that considers diverse cultures and situations. The other is within-culture: the perspective that situates any claims within the criteria relevant to a specific context (e.g., Prompt B, with its expectation of students’ topical knowledge pertaining to federal politics). These two perspectives raise further questions about the background knowledge tacitly assessed in L2 language proficiency exams: whose knowledge should test developers conform to, that of test users (e.g., test developers and test administrators), that of test takers, or both? As Messick’s unitary validity theory (1989) reminds us, construct validity must accommodate values and social consequences along with other evidence.

It is clear so far that the divergence caused by the construct irrelevance of specific topical knowledge in Prompt B (federal politics) threatens the accurate assessment of L2 English writing proficiency in a globalized context. Here, the norms of native English-speaking contexts, such as the assumption that L2 students should possess the host-country knowledge required by Prompt B, do not correspond to the normative features of nonnative varieties of English contexts. For example, the purposes of English learning in China still focus on test-based and literal skills including grammar and word choice (Reichelt, 2009), an orientation that relies less on the cultural knowledge of native English-speaking countries. The findings of this study urge test users to distinguish deficiencies in L2 writing ability from L2 writers’ interpretations of writing topics resulting from their L1 cultural and educational backgrounds. This is particularly significant in the case of high-stakes, norm-referenced tests such as the LPI, which is administered for use in a specific local context and relies heavily upon local knowledge. Current standardized English tests conform to a single criterion: native English-speaking linguistic and cultural norms. Given this limitation of criterion validity, it is advisable that test evaluators conduct construct validation; namely, that they identify and explain the sources of variability in L2 writing assessment contexts and determine how these variances may negatively influence test scores.
5.2 Significance of the Study

This study has been conducted in response to the need, noted by a number of scholars in the psychometric measurement field, to gather evidence from different aspects of validity related to criteria, content, construct, values, and social consequences in order to interpret test scores (e.g., Cronbach & Meehl, 1955; Messick, 1989, 1994, 1996; Zumbo, 2005, 2007, 2009). The findings of this study support the notion that construct irrelevancy, as Messick (1989) observes, is a tangible threat to the validity of a test. A writing prompt about federal politics reflects construct-irrelevant variability and poses a challenge to L2 writers due to a lack of topical knowledge. The study thus raises an awareness of the content factor in language tests: a language test may not be valid if it relies on content knowledge that students do not possess. These problems are likely to occur in a test such as the LPI, which is developed for both L1 and L2 students but administered within one language and culture. Judgments of the relevance of test tasks should be at the center of any writing performance assessment. The present findings help fill the void concerning how writing topics or prompts, a “neglected variable” (Ruth & Murphy, 1988, p. xv), influence the quality of L2 writing. The study supports the claim that subject familiarity indeed affects L2 writing performance (Teddick, 1990; Winfield & Barnes-Felfeli, 1982). Unlike other studies, the present study is centered on the test taker, not the test giver or rater. The study suggests that it is important to recognize what topical knowledge is relevant to test takers and how.

Second, the results of the present study reject the common assumption that language proficiency is the main factor determining writing performance; instead, the quality of writing is also influenced by one’s topical or content knowledge. This study thus reshapes the construct of L2 writing. The perspective of communicative competence (Canale & Swain, 1980; Bachman, 1990; Bachman & Palmer, 1996) helps us see L2 writing as a constantly changing concept that encompasses not only linguistic competence (appropriate and broad lexis, fluent and accurate syntax, and accurate mechanics), but also discourse knowledge (content or topical knowledge, coherence, cohesion, appropriate conventions, and register). At the center of this more nuanced concept of L2 writing lie culture, situation, and purpose. A test of L2 writing should therefore consider what kinds of knowledge domains should be included in the construct of communicative competence. The inclusion of discourse knowledge in our understanding of L2 writing allows us to consider the complexities inherent in using language, including applying content or topical knowledge and accepting values in conflict with the discourses of which one is a member. Researchers have explored various factors influencing L2 writing, such as communication in different cultures and contexts (e.g., Cope & Kalantzis, 1993; Halliday, 1994; Hyland, 2003; Johns, 2003; Martin, 1985, 2002), the influence of L1 (Atkinson, 2004; Connor, 1996; Hirose, 2003; Kaplan, 1966; Kubota, 2004; Shi, 2001, 2006), negotiation of meaning, voice, and style (e.g., Casanave, 2004; Cumming, 2002; Fu, 1995; Hamp-Lyons, 1991; Hinkel, 1994), expression of one’s cultural identity (Ivanic, 1998; Norton, 2000b), and development of a writer’s own knowledge and authorship (Canagarajah, 2002; Hyland, 2002b; Pennycook, 2001).
This study produces findings complementary to these factors, indicating that these factors, to a certain extent, may all be part of the evaluation of the content validity of a test.

5.3 Implications for Methods

This study is a methodological advance in investigating the effect of topical knowledge on writing performance using a mixed methods explanatory design through multiple phases. It includes repeated measures of the paired-samples t test, 3 × 2 univariate ANOVAs, and in-depth qualitative posttest interviews. Unlike the few existing studies, which employed either a quantitative or a qualitative method to investigate the effect of prompts on L2 writing performance (Spann, 1993; Teddick, 1990; Winfield & Barnes-Felfeli, 1982), this study explored the quantitative findings and then verified the numerical results with follow-up interviews. While holistic scoring in the literature is known for its “limited usefulness as a research tool for examining such topic effects” (Hoetker, 1982, p. 139), this study fills the methodological gap by using analytic scoring, which provides more information for an understanding of the investigated phenomena. To my knowledge, this is the first L2 writing study to apply an advanced statistical technique, principal component analysis, to test the weighting of individual indicators (e.g., fluency, accuracy, and lexical complexity in the component “language”) for verifying the validity of the content domain in the measures of L2 writing competency, using a converted scale. Thus, this study is the first attempt to investigate the effects of prompts on ESL writing performance in such depth and breadth.

The study used a multidimensional analysis through an integration of research methodologies. It takes a mixed methods explanatory design as a platform to investigate both quantitative and qualitative epistemologies through different phases, each of which informs the other in the analyses. In this manner, different methodological procedures are integrated into a coherent unitary analytical tool to meet the research purposes. Through the overall comparison of written samples, the analyses of individual textual features, and verbal recounts, the study provides practical insights concerning the manner in which texts are constructed and assessed in response to different writing tasks. The merging and connection of the three-phase process in data collection, analysis, and interpretation allow for greater depth and breadth of evidence regarding test validity.
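As a rough sketch of the principal component analysis described above, the following code shows one way to examine how standardized indicator scores load onto a single component such as “language.” The indicator column names, the file, the standardization step, and the use of scikit-learn are assumptions made for illustration, not the study’s actual procedure or data.

```python
# Illustrative sketch only: weighting indicators within a component via PCA.
# Column names (fluency, accuracy, lexical_complexity) are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scores = pd.read_csv("indicator_scores.csv")  # hypothetical analytic-scoring data
indicators = ["fluency", "accuracy", "lexical_complexity"]

# Standardize the indicators onto a common (converted) scale so that no
# indicator dominates the component simply because of its raw score range.
X = StandardScaler().fit_transform(scores[indicators])

# Extract the first principal component as a candidate "language" component.
pca = PCA(n_components=1)
component_scores = pca.fit_transform(X)

# Loadings indicate how strongly each indicator contributes to the component;
# they can be read as empirical weights for the analytic rubric.
loadings = pd.Series(pca.components_[0], index=indicators)
print(loadings)
print("variance explained:", pca.explained_variance_ratio_[0])
```

If the first component accounts for most of the shared variance and all loadings are positive and comparable, collapsing the three indicators into a single weighted “language” score is defensible; weak or mixed loadings would argue for reporting the indicators separately.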
5.4 Implications for Testing

The results of this study show that aspects of test construction, such as content validity, construct validation, and item writing, influence test performance. The study calls for further research on the currently advocated construct of communicative competence in terms of which forms of knowledge underlying communicative competence are assessed. The present findings urge test developers to examine test specifications, which include test-taker characteristics, test items, and scoring techniques, to reduce threats to validity. This study emphasizes fairness and equality in language testing. It urges test developers and users to pay attention to the process of test development, especially the appropriateness of test tasks, before the test product is put into practice.

The present findings call for in-depth rhetorical analysis that goes beyond the assumption that L2 students confront writing challenges different from those of L1 students (e.g., Connor, 1996; Kachru, 1995; Kaplan, 1966) and asks what types of features and knowledge may enable L2 writers to meet the requirements of a specific writing task in appropriate response to the target language context. To avoid item biases towards a certain examinee population, test developers should seek information pertaining to the specific skills needed to perform certain writing tasks. Test developers, test users, and test takers need to co-operate at the very early stages of test development and get feedback from informants (e.g., L2 learners). If an item tests knowledge available only to a particular cultural group, it should be discarded. Content validity should not be limited to experts’ judgments without a consideration of the test takers’ previous educational and cultural backgrounds. Instead, test developers and users need to clarify the meaning of the construct, taking into account test takers’ characteristics, to reformulate the construct for the specific purpose of the test. Topics can be checked for bias during the trial period of test development with informants who have experience of the cultures of the potential test takers. Fairness of testing and validity of the test can be enhanced through direct dialogue among test developers, users, and examinees.

5.5 Implications for Teaching

Another implication of this study is the importance of acquiring the target language through greater exposure to the target language society, not only through subconscious acquisition (for example, informal learning such as reading novels and watching television), but also through conscious, formal instruction in written composition at school. Scholars advocating language socialization address the importance of increasing social access to optimize L2 learners’ academic literacy acquisition (e.g., Block, 2003; Duff, 2003/2007; Lantolf, 1996; Norton, 2000b; Norton & Toohey, 2001). This learning process includes group dynamics and activities such as role-taking (Lantolf, 2005), negotiation of task and self (Bourdieu, 1990), investment and cultural capital (Bourdieu, 1977; Norton Peirce, 1995; Norton, 1997), language as social semiotic (Halliday, 1978), and participation in a certain community (Pavlenko & Lantolf, 2000). These teaching and learning activities, as researchers believe, can significantly help second language students’ development of meta-level cognitive and academic literacy, which better enables L2 learners to communicate ideas in the target language (e.g., Raimes, 1985; Zamel, 1983) and to critique various discourses (Gee, 1993, 1998). To achieve this goal, this study proposes a pedagogy that encourages maximum social access to the target language discourses, following a “social-contextual approach” that “demystifies the institutional structure of knowledge” (Bizzell, 1982, p. 196). Knowledge acquisition for the demands of academic writing is socially generated and situated within local cultural contexts; therefore, L2 writing pedagogy should focus on the conventions of academic discourse, highlighting the relationship between discourse, community, and knowledge.
At the same time, L2 learners’ previous knowledge, as an essential resource, should also be valued to encourage students to take some ownership of their writing. This study calls for a situated pedagogical model, one which embraces both the cross-culture and the inter-culture perspectives, for the teaching of L2 writers. To apply the model, newcomers’ programs should take into account the unique needs of English language learners. Such attention enables L2 writers, as is the case in the current study, to avoid the risk of failing a writing test due to a lack of specific topical knowledge. It is necessary to have a Newcomers’ Centre in an educational institution to provide a series of courses for new international students during the first few months of enrollment (Marler, 2006). The courses should entail not only writing skills but also the cultural knowledge of the target language required in the writing.

5.6 Limitations of the Study and Directions for Future Research

The present study focuses on a neglected field in writing assessment and research: the effect of topical knowledge on ESL students’ writing performance in testing situations. Since the study focuses strictly on Asian ESL students from three geographical areas (Mainland China, Taiwan, and South Korea), the findings cannot be extended to students of other cultural backgrounds. Future research should investigate the differences in writing performance among students with different backgrounds. Assessing writing in the target language is not merely a matter of testing a set of skills, but a progressive continuum that accommodates a set of culturally approved conventions as well as topical knowledge in composing written discourse. The results of this study in regard to topical knowledge in L2 writing clearly indicate that English as a world language has expanded the traditional concept of validity. This challenge underscores the urgent need for further theoretical conceptualizations and more empirical studies that investigate the impact of writers’ topical knowledge on the quality of their writing.

REFERENCES

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. TESOL Quarterly, 13, 280-287.
Aliakbari, M. (2002). Writing in a foreign language: A writing problem or a language problem? PAAL 6, 157-168.
Atkinson, D. (2004). Contrasting rhetorics/contrasting cultures: Why contrastive rhetoric needs a better conceptualization of culture. Journal of English for Academic Purposes, 3, 277-289.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1985). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Bailey, K. (1999). Washback in language testing. TOEFL Monograph No. 15. Princeton, NJ: Educational Testing Service.
Bakhtin, M. M., Ed. (1981).
The Dialogic Imagination: Four Essays. University of Texas Press, Austin. Balcher, D. (1991). Nonnative writing in a corporate setting. The Technical Writing Teacher, 18, 104-115. Bartelet, H. G. (1983). Transfer and variability of retrial redundancy in Apachean English interlanguage. In S. Gass & L. Selinker (Eds.), Language transfer in language learning (pp. 297-305). Rowley, IVIA: Newbury House. Barton, D., & Hamilton, M. (2000). Literacy practices. London, England: Routledge. Bereiter, C., & Scardamalia, M. (Eds.) (1982). From conversation to composition: The role of institution in a developmental process (Vol. 2). Hillsdale, NJ: Lawrence Erlbaum. Bereite, C., & Scardamalia. M. (1987). The psychology of written composition. Hillside, NJ: Lawrence Erlbaum Association. Bizzell, P. (1982). College composition: Initiation into the academic discourse community. Curriculum Inquiry, 12, 191-207. Block, D. (2003). The social turn in second language acquisition. Georgetown, Washington, D.C.: Georgetown University Press. Brown, H. D. (2004). Language assessment principles and classroom practices. New York, USA: Pearson Education, Inc.  166  Bosher, S. (1998). The composing processes of three Southeast Asian writers at the postsecondary level: An exploratory study. Journal of Second Language Writing, 7, 205-241. Bourdieu, P. (1977). Cultural reproduction and social reproduction. In: J. Karabel, & A. H. Halsey (Eds.), Power and ideology in education (pp. 487-511). Oxford University Press, New York. Bourdieu, P. (1990). The logic of practice. Stanford, CA: Stanford University Press. Braddock, R., Lloyd-Jones, R., & Schoer, L. (1963). Research in written composition, Urbana, IL: National Council of Teacher of English. Brossell, G. (1983). Rhetorical specification in essay examination topics. College English, 45, 165-173. Brossell, G. & Hoetker Ash. B. (1984). An experiment with the wording of essay topics. College Composition and Communication, 35, 423-425. Canagarajah, S. (2002). Critical academic writing and multilingual students. Journal of English for Academic Purposes, 1, 29-44. Canagarajah, S. (2006). Changing communicative needs, revised assessment objectives: Testing English as an international language. Language Assessment Quarterly, 3, 229-242. Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47. Canale, M. (1981). Communication: How to evaluate it? Bulletin of the Canadian Association of Applied Linguistics, 3, 77-94.  167  Canale, M. (1983). From communicative competence to communicative language pedagogy. In J. Richards & R. Schmidt (Eds.), Language and communication (pp. 2-27). London: Longman. Carroll, B. J. (1980). Testing communicative performance. London: Pergamon Institute of English. Carroll, J. B. (1961). Fundamental considerations in testing for English language proficiency of foreign students. In Testing English proficiency of foreign students (pp 30 - 40). Washington, D.C.: Center for Applied Linguistics.  Carroll, J. B. (1972). Fundamental considerations in testing for English language proficiency of foreign students. In H. B. Allen & R. N. Campbell (Eds.), Teaching English as a Second Language: A book of readings (2nd ed., pp. 364-372). McGraw Hill, New York: McGraw-Hill Hook Company. Carroll, J. B. (1975). The teaching of French as a foreign language in eight countries. New York: John Wiley & Sons. Carmines, E. G. & Zeller, R. A. (1979). Reliability and validity assessment. 
Beverly Hills, CA: Sage. Carson, J., & Kuehn, P. (1992). Evidence of transfer and loss in developing second language writers. Language Learning, 42,157-182. Carson, J., Carrell, P., Silberstein, S., & Kroll, B. (1990). Reading-writing relationship in first and second language. TESOL Quarterly, 24, 245-266.  168  Casanave, C. (1992). Cultural diversity and socialization: A case study of a Hispanic woman in a doctoral program in sociology. In D. Murry (Ed.), Diversity as resource: Redefining cultural literacy (pp. 148-180). Washington, DC: TESOL. Casanave, C. P. (2004). Controversies in second language writing. Ann Arbor: The University of Michigan Press. CCCC resolution on testing. (1979). College Composition and Communication, 30, 391. Chalhoub-Deville, M., & Turner, C. E. (2000). What to look for in ESL admission tests: Cambridge certificate exams, IELTS, and TOEFL. System, 28, 523-539. Charney, D. (1984). The validity of using holistic scoring to evaluate writing: A critical overview. Research in the Teaching of English, 18(1): 65-81. Chapelle, C. (1998). Construct definition and validity inquiry in SLA and research. In F. L. Bachman & A. D. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 32-70). Cambridge: Cambridge University Press. Cheng, L., Watanabe, Y., & Curtis, A. (2004). Washback in language testing: Research contexts and methods. Lawrence Erlbaum and Associates. Chesky, J. A.(1984). The effect of prior knowledge and audience on writing. Dissertation Abstracts International, 45, 2740A. (University Microfilms No. DA 8428407) Chesky, J., & Hibert, E. H. (1987). The effect of prior knowledge and audience on high school students' writing. Journal of Educational Research, 80, 304-313. Clapham, C. (1996). The development of IELTS: A study in the effect of background knowledge on reading comprehension. Cambridge: UCLES/Cambridge University Press.  169  Cohen, A. (1994). Assessing language ability in the classroom. Boston, MA: Heinle and Heinle. Cohen, A., Glasman. H., Rosenbaum-Cohen, P. R., Ferrara, J., & Fine, J. (1988). Reading English for specialised purposes: Discourse analysis and the use of standard informants. In P. Carrel, J. Devine, & D. Eskey (Eds.), Interactive approaches to second language reading (pp. 152-167). Cambridge: Cambridge University Press. Conner, U. (1996). Contrastive rhetoric: Cross-cultural aspects of second-language writing. New York: Cambridge University Press. Cope, B., & Kalantzis, M. (1993). The powers of literacy: A genre approach to teaching writing. Philadelphia, PA: University of Pittsburgh Press. Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34, 213-239 Crandall, J. A., & Tucker, G. R. (1990). Content-based language instruction in second and foreign languages. In S. Anivan (Ed.), Language teaching methodology for the nineties, (pp.83-96). Singapore: SEAMEO Regional Language Centre. Creswell, J. W. (1995). Research design: Qualitative and quantitative approaches. Thousand Oaks, CA: Sage Publications, Inc. Creswell, J. W. (2003). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage Publications, Inc. Creswell, J. W., & Plano Clark, V. L. (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage Publications, Inc. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.  170  Cumming, A. (1989). Writing expertise and second language proficiency. 
Language and Learning, 39, 81-141. Cumming, A. (1997). The testing of writing in a second language. In D. Corson & C. Clapham (Eds.), Language Assessment, Vol. 7 of Encyclopedia of language and education: Language testing and assessment (pp. 51- 63). Dordrecht, Netherlands: Kluwar. Cumming, A. (2002). Assessing L2 writing: Alternative constructs and ethical dilemmas. Assessing Writing, 8, 73-83 Cumming, A. (2006). Goals for academic writing. Amsterdam, Philadelphia: John Benjamins Publishing Company. Cumming, A., Grant, L., Mulxahy-Ernt, P., Powers, D. (2004). A teacher-verification study of speaking and writing prototype tasks for a new TOEFL. Language Testing, 21, 159-197. Cumming, A., Kantor, R., Powers, D., Santos, T., & Taylor, C. (2000). TOEFL 2000 writing framework: A working paper. TOEFL Monograph No. 18. Princeton, NJ: Educational Testing Service. Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T (1999). Dictionary of language testing. Cambridge: Cambridge University Press/UCLES. Devine, J., Railey, K., & Boshoff, P. (1993). The implications of cognitive models in L1 and L2 writing. Journal of Second Language Writing, 2, 203-225. Diederich, P. B., French, J. W., & Carlton, S. T (1961). Factors in judgments of writing quality. Research Bulletin 61-15. Princeton, NJ: Educational Testing Service.  171  Duff, P. (2003). New direction in second language socialization research. Korean Journal of English Language and Linguistics, 3, 309-339. Duff, P. (2007). Second language socialization as sociocultural theory: Insights and issues Language Teaching, 40, 309-319. Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25, 155-185. Edgeworth, F. Y. (1890). The element of chance in competitive examinations. Journal of the Royal Statistical Society, 53, 460-475 and 644-663. Educational Testing Service (2006). The official guide to new TOEFL iBT. USA: McGraw Hill. Elbow, P. (1996). Writing assessment: Do it better, do it less. In W. Lutz, E. White, & S. Kamusikiri (Eds.), The Politics and Practices of Assessment in Writing (pp. 120-34). Modern Language Association of America. Ellis, N. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24, 143-188. Engber, C. (1995). The relationship of lexical proficiency to the quality of ESL composition. Journal of Second Language Writing, 4, 139-155. Engellhard, G., Jr., Gordon, B., & Gabrielson, S. (1992). The influence of mode of discourse, experiential demand, and gender on the quality of student writing. Research in the Teaching of English, 26(3), 315-336. Erbaugh, M. S. (1990). Taking advantage of China's literary tradition in teaching Chinese students. The Modern Language Journal, 74, 15-27.  172  Eysenck, M. and M. Keane (2005). Cognitive psychology (5th edition). Hove: Psychology Press. Fagan, E. R., & Cheong, P. (1987). Contrastive Rhetoric: Pedagogical implications for the ESL teacher in Singapore. RELC Journal, 18 (1), 19-31. Faigley, L., Cherry, R. D., Jolliffe, D. A., & Skinner, A. M. (1985). Assessing writers' knowledge and processes of composing. Norwood, NJ: Ablex. Field, J. (2004). Psycholinguistics: The key concepts. London: Routledge. Flower, L. S., & Hayes, J. (1981). A cognitive process theory of writing. College Composition and Communication, 32, 365-387. Fox, H. (1994). Listening to the world: Cultural issues in academic writing. Urbana, IL: NCTE. Fox, J. (1999). 
Carleton Academic English Language (CAEL) Assessment Test Manual. Ottawa: Carleton University Press. Fox, J. A., Pychyl, T. A., & Zumbo, B. D. (1993). Psychometric properties of the CAEL assessment, I: Overview of development, format, and scoring procedures. Carleton Papers in Applied Language Studies, 10, 1-12. Freeman, S. W. (1983). Student characteristics and essay test writing performance. Research in the Teaching of English, 17, 313-325. Freeman, A., & Pringle, I. (1980). Writing in the college years: Some indices of growth. College Composition and Communication, 31(3), 311-324. Fu, D. (1995). My trouble is my English. Portsmouth, NH: Boynton/Cook.  173  Garrett, H. (1937). Statistics in psychology and education (2nd ed.). Oxford, England: Longmans. Gaies, S. (1980). T-unit analysis in second language research: Applications problems and limitations. TESOL Quarterly, 14, 53-60 Grabe, W., & Kaplan, R. (1989). Writing a second language: Contrastive rhetoric. In D. Johnson & D. Roen (Eds.), Richness in writing (pp. 263-283). New York, New York: Longman. Gee, J. P. (1993). Critical literacy/socially perspective literacy: A study of language in action. Australia Journal of Language and Literacy, 16, 333-355. Gee, J. P. (1998). What is literacy? In V. Zamel & R. Spack (Eds.), Negotiating academic literacies (pp. 51-59). New Jersey, USA: Lawrence Erlbaum Associates. Genesee, F. (1994). Integrating language and content: Lessons from immersion. Educational Practice Report 11. Retrieved April 10, 2010 from http://www.cal.org/resources/ digest/ncrcds05.html Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39, 93-104. Gradwohl, J. M., & Schumacher, G. M. (1989). The relationship between content knowledge and topic choice in writing. Written Communication, 6, 181-195 Grant, L., & Ginther, L. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9, 123-145. Green, A. (2007). IELTS washback in context: Preparation for academic writing in higher education. Cambridge: Cambridge University Press.  174  Greenberg, K. L. (1981a). The effects of variations in essay questions on the writing performance of CUNNY freshmen. New York: City University of New York, Instructional Resource Center. Greenberg, K. L. (1981b). Some relationships between writing tasks and students' writing performance (with WAT Question Pilot Series attachment). Paper presented at the 71st Annual Convention of the National Council of Teachers of English, Boston, MA. Gronlund, N. E. (1985). Measurement and evaluation in teaching. New York: Macmillan. Hall, C. (1990). Managing the complexity of revising across languages. TESOL Quarterly, 24, 43-60. Halliday, M. A. K. (1978). Language as social semiotic. London: Edward Arnold. Halliday, M. (1994). An introduction to functional grammar. London: Edward Arnold. Hamp-Lyons, L. (1989). Raters respond to rhetoric in writing. In H.W. Dechert & Raupauch (Eds.), Interlingual processes (pp. 229-244). Tubingen: Gunter Narr. Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 67-87). Cambridge: Cambridge University Press. Hamp-Lyons, L. (1991). Reconstructing "academic writing proficiency". In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 127-53). Norwood, NJ: Albex Publishing Corporation. Hamp-Lyons, L. (1997). Ethnics in language testing. In C. Clapham & D. 
Corson (Eds.), Encyclopedia of language and education, Volume 7: Language testing and assessment (pp. 323-333). Netherlands: Kluwer Academic Publishers.  175  Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41, 337-373. Hamp-Lyons, L. & Kroll, B. (1997). TOEFL 2000-writing: Composition, community, and assessment. TOEFL Monograph Report No. 5. Princeton, NJ: Educational Testing Service. Hayes, J. R. (1990). Individuals and environments in writing instruction. In B. F. Jones & L. Idol (Eds.), Dimensions of cognitive instruction. Hillsdale, NJ: Erlbaum. Hayes, J. R., & Flower, L. S. (1980). Identifying the organization of writing processes. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 31-50). Hillsdale, NJ: Lawrence Erlbaum. He, L. (2002). Test taker characteristics and performance on a Cloze test: An investigation using DIF methodology. Unpublished master thesis. University of British Columbia, British Columbia, Canada. He, L., & Shi, L (2008). ESL students' perceptions and experiences of standardized English writing tests. Assessing writing. Assessing Writing, 13, 130-149. Henning, G. (1991). Issues in evaluating and maintaining an ESL writing assessment program. In L. Hamp-Lyons (Ed.) Assessing second language writing in academic contexts (pp. 273-91). Norwood, NJ: Ablex Publishing Corporation Henry, K. (1996). Early L2 writing development: A study of autobiographical essays by university-level students of Russian. The Modern Language Journal, 80, 309-326.  176  Hilgers, T. L. (1982). Experimental control and the writing stimulus: The Problem of unequal familiarity with content. Research in the Teaching of English, 16(4), 381-390. Hillocks, Jr. G. (1986). Research on written composition: New directions for teaching. Urbaba, IL: National Conference on Research in English. Hillocks, G. (1995). Teaching writing as reflective practice. New York: Teachers College Press. Hinds, J. (1987). Reader versus writer responsibility: a new typology. In U. Connor, & R. B. Kaplan (Eds.), Writing across languages: Analysis of L2 text, reading (pp. 141-152). MA: Addison-Wesley Publishing Company. Hinkel, E. (1994). Native and nonnative speakers' pragmatic interpretations of English texts. TESOL Quarterly, 28, 353-376. Hinkle, E. (2001). Matters of cohesion in L2 academic writing. Applied Language Learning, 12, 111-132. Hirano, K. (1991). The effect of audience on the efficiency of objective measures of EFL proficiency in Japanese university students. Annual Review of English Language Education in Japan, 2, 21-30. Hirose, K. (2003). Comparing L1 and L2 organizational patterns in the argumentative writing of Japanese EFL students. Journal of Second Language Writing 12, 181-209. Hirose, K., & Sasaki, M. (1994). Explanatory variables for Japanese students' expository writing in English: An exploratory study. Journal of Second Language Writing, 3, 203-229.  177  Hoetker, J. (1979). On writing essay topics for a test of the composition skills of prospective teachers: with a review of literature on the creation, validation, and effects of topics on easy examination. Volume Four of Five. (ERIC Document Reproduction Service No. ED194 615). Hoetker, J. (1982). Essay examination topics and student writing. College Composition and Communication, 33, 377-392. Hoetker, J., & Brossell, G. (1986). 
A procedure for writing content-fair essay examination topics for large-scale writing assignments. College Composition and Communication, 37, 328-335. Homburg, T. J. (1984). Holistic evaluation of ESL compositions: Can it be validated objectively? TESOL Quarterly, 18, 87-107. Horowitz, D. (1991). ESL writing assessments: Contradictions and resolutions. In L. HampLyons (Ed.), Assessing second language writing in academic contexts (pp. 71-85). Norwood, NJ: Ablex. Hout, B. (1996). Toward a new theory of writing assessment. College Composition and Communication, 47, 549-566. Huberty, C. J., & Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105, 302-308. Hughes, A. (2003). Testing for language teachers, Cambridge: Cambridge University Press. Hutchby, I. & Wooffitt, W. (1998). Conversation analysis. Cambridge: Polity Press. Hyland, K. (2002a). Teaching and researching writing. London: Longman.  178  Hyland, K. (2002b). Authority and invisibility: Authorial identity in academic writing. Journal of Pragmatics, 34, 1091-1112. Hyland, K. (2003). Second language writing. Cambridge: Cambridge University Press. Indrasuta, C. (1988). Narrative style in the writing of Thai and American students. In. A. Purves (Ed.), Writing across language: Issues in contrastive rhetoric (pp. 203-227). Newbury Park, CA: Sage. Isaacson, S. (1988). Assessing the writing product: Qualitative and quantitative measures. Exceptional Children, 54, 528-534. Ishikawa, S. (1995). Objective measurement of low-proficiency EFL narrative writing. Journal of Second Language Writing, 4, 253-272. Ivanic, R. (1998). Writing and identity: The discoursal construction of identity in academic writing. Amsterdam: John Banjamins Publishing Company. Jacobs, H. L., Zingraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House. Janopoulos, M. (1986). The relationship of pleasure reading and second language writing proficiency. TESOL Quarterly, 20,763-768. Javis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377-403 Jennings, E. M., & Purves, A.C. Eds. (1991). Literate system and individual lives. State University of New York Press: Albany.  179  Jennings, M., Janna, F., Graves, B., & Shohamy, E. (1999). The test-takers’ choice: An investigation of the effect of topic on language-test performance. Language Testing, 16(4), 426-456. Johns, A. M. (2003). Genre and ESL/EFL composition instruction. In B. Koll (Ed.) Exploring the dynamics of second language writing (pp. 195-217). New York, NY: Cambridge University Press. Jolliffe, D., & Brier, E.M. (1988). Studying writers' knowledge in academic disciplines. In D. A. Jolliffe (Ed.), Advances in writing research (Vol. 2): Writing in academic disciplines (35-87). Norwood, NJ: Ablex. Jones, S., & Tetro, J. (1987). Composing in a second language. In A. Matsuhashi (Ed.) Writing in real time: Modeling production processes (pp. 34-57). Norwood, NJ: Ablex Publishing Corporation. Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38, 319-342. Kachru, Y. (1995). Contrastive rhetoric in world English. English Today, 41, 21-31. Kaplan, R. B. (1966). Cultural thought patterns in Inter-cultural Education. Language Learning, 16, 1-20. Kiany, G. R., & Nejad, M. K. (2001). 
On the relationship between English proficiency, writing ability, and the use of conjunctions in Iranian EFL learners' compositions. ITL Review of Applied Linguistics, 133/134, 227-241.  180  Kobuta, R. (1998). An Investigation of L1-L2 transfer in writing among Japanese university students: Implications for contrastive rhetoric. Journal of Second Language Writing, 7, 69-100. Kobuta, R. (2004). The politics of cultural differences in second language education. Critical Inquiry in Language Studies, 1, 21-39. Kobayashi, H., & Rinnert, C. (1992). Effects of first language on second language writing: Translation versus direct composition. Language and Learning, 42, 183-215. Kuiken, F., Vedder, I. (2007). Task complexity and measures of linguistic performance in L2 writing. IRAL- International Review of Applied Linguistics in Language Teaching, 45(3), 261-284 Kunnan, A., Ed. (2000). Fairness and justice for all. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 1-14). Cambridge: Cambridge University Press. Langer, J. A. (1981). From theory to practice: A pre-reading plan. Journal of Reading, 25, 152-156. Langer, J. A. (1984). Where problems start: The effects of available information on responses to school writing tasks. In A.N. Applebee (Ed.), Contexts for learning to write: Studies of secondary school instruction (pp. 135-148). Norwood, NJ: Ablex. Langer, J. A. (1984a). The effects of available information on responses to school writing tasks. Research in the Teaching of English, 18, 27-44. Langer, J. A. (1984b). Examining background knowledge and text comprehension. Reading Research Quarterly, 19, 468-481.  181  Lantolf, J. P. (1996). SLA theory building: "Letting all the flowers bloom!" Language Learning, 46, 713-749. Lantolf, J. P. (2005). Sociocultural and second language learning research: An exegesis. In E. Hinkel (Ed.), The handbook of research in second language teaching and learning (pp. 335-353). Mahwah, NJ: Erlbaum . Larsen-Freeman, D. (1983). Assessing global second language writing proficiency. In H. W. Seliger & M. Long (Eds.), Classroom-oriented research in second language acquisition (pp. 287-304). Rowley, MA: Newbury House. Leki, I., Cumming, A., & Silva, T. (2008). A synthesis of research on second language writing in English. New York and London: Routledge Taylor & Francis Group. Leung, L. (1984). The relationship between first and second language writing. Language Learning and Communication, 3, 187-202. Li, X.-M. (1996). "Good writing" in cross-cultural context. Albany, NY: SUNY Press. Lisle, B., & Mano, S. (1997). Embracing a multicultural rhetoric. In C. Severino, J. C. Guerra, & J. E. Butler (Eds.), Writing in multicultural settings (pp. 12-26). New York: Modern Language Association. Linn, R. L. (1997). Evaluating the validity of assessments: The consequences of use. Educational Measurement: Issues and Practice, 16(2), 14-16. Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learners' written English. Malmo, Sweden: Liber Forlag. Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to raters? Language Testing 19, 246-276.  182  Lynch, B. K. (1997). In search of the ethical language test. Language Testing, 14, 315-327. Ma, G., & Wen, Q. (1999). The relationship of second language learners' linguistic variables to second language writing ability. Foreign Language Teaching and Research, 4, 34-39. Marler, B. (2006). 
How can we best serve "newcomers," students who come with interrrupted formal schooling and from educational backgrounds that are very different from those in the United States? In E. Hamayan & R. Freeman (Eds.), English Language Learner at School (pp. 214-215). Philadelphia: Caslon Publishing. Martin, J. R. (1985). Process and texts: Two aspects of human semiosis. In: J. D. Benson & W. S. Greaves (Eds.), Systemic Perspectives on Discourse, Vol.1, No. 15 (pp. 248-274). Norwood, NJ: Ablex Language Teaching, London. Martin, J. R. (2002) Writing history: Construing time and value in discourses of the past. In M. Schleppergrell & C. Colombi (Eds.), Developing Advanced Literacy in First and Second Languages (pp. 87-118). Mahwah, N.J.: Erlbaum. Manchon, R. M., & de Haan, P. (2008). Writing in foreign language contexts: An introduction. Journal of Second Language Writing, 17 (1), 1-6. Matsuda, P. K. (2002). Negotiation of identity and power in a Japanese online discourse community. Computers and Composition, 19(1), 39-55. Matsuda, P. K. (2003). Second language writing in the 20th century: A situated historical perspective. In B. Kroll (Ed.), Exploring the dynamics of second language writing (pp. 15-34). New York: Cambridge University Press.  183  Matsuda, P. K. (2005). Historical inquiry in second language writing. In P. K. Matsuda & T. Silva (Eds.), Second language writing research: Perspectives on the process of knowledge construction (pp. 33-46). Mahwah, NJ: Lawrence Erlbaum Associates. Matsuhashi, A. (1982). Explorations in the real-time production of written discourse. In M. Nystrand (Ed.), What writers know: The language, process, and structure of written discourse (pp. 269-290). New York: Academic Press. Mayor, B. M. (2006). Dialogic and hortatory features in the writing of Chinese candidates for the IELTS test. Language, Culture and Curriculum, 19(1), 104-121. McColly, W. (1970). What does educational research say about the judging of writing ability? The Journal of Educational Research, 64, 148-156. McCuthen, D. (1986). Domain knowledge and linguistic knowledge in the development of writing ability. Journal of Memory and Languages, 25, 431-444. McNamara, T. F. (1996). Measuring second language performance. London: Longman. McPeck, J. E. (1981). Critical thinking and education. New York: St. Martin's Press. Markham, L. (1976). Influences of handwriting quality on teacher evaluation of written work.American EducationalResearch Journal, 13 (4), 277-283. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3 ed., pp. 13-103). New York: American Council on Education and Macmillan. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23. Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241-256.  184  Moffett, J. W. (1963/1983). Teaching the university of discourse. Boston: Houghton Mifflin Mohan, B., & Lo, W. (1985). Academic writing and Chinese students: Transfer and developmental factors. TESOL Quarterly, 19, 515-534. Nelson, G., & Carson, J (1995). Social dimensions of second-language instruction: Peer response as cultural context. In D. Rubin (Ed.), Second identity and style in written communication (pp. 89-109). Hillside, NJ: Erlbaum. Newell, G. E. (1984). Learning from writing in two content areas: A case study/Protocol analysis. Research in the Teaching of English, 18(3), 265-287. Norton Peirce, B. (1995). 
Norton Peirce, B. (1995). Social identity, investment, and language learning. TESOL Quarterly, 29(1), 9-31.
Norton, B. (1997). Language, identity, and the ownership of English. TESOL Quarterly, 31, 409-429.
Norton, B. (2000). Writing assessment: Language, meaning, and marking memoranda. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 20-29). Cambridge: Cambridge University Press.
Norton, B. (2000b). Non-participation, imagined communities, and the language classroom. In M. Breen (Ed.), Learner contributions to language learning: New directions in research (pp. 159-171). London: Pearson Education.
Norton, B., & Toohey, K. (2001). Changing perspectives on good language learners. TESOL Quarterly, 35, 307-322.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Ochs, E. (1979). Transcription as theory. New York, NY: Academic Press.
O'Donnell, H. (1984). The effect of topic on writing performance. English Education, 16, 243-249.
Oller, J. W. (1979). Language tests at school. London: Longman Group.
O'Loughlin, K., & Wigglesworth, G. (2003). Task design in IELTS academic writing task 1: The effect of quantity and manner of presentation of information on candidate writing. IELTS Research Reports, 4, 89-131.
Odell, L., Cooper, C. R., & Courts, C. (1978). Discourse theory: Implications for research in composing. In C. R. Cooper & L. Odell (Eds.), Research on composing: Points of departure. Urbana, IL: National Council of Teachers of English.
O'Shea, J. (1987). Writing apprehension and university tests of writing competence. English Quarterly, 20, 285-295.
Pavlenko, A., & Lantolf, J. (2000). Second language learning as participation and the (re)construction of selves. In J. Lantolf (Ed.), Sociocultural theory and second language learning (pp. 155-177). Oxford: Oxford University Press.
Perkins, K. (1983). On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly, 17, 651-671.
Perkins, D. N., & Salomon, G. (1989). Are cognitive skills context bound? Educational Researcher, 18(1), 16-25.
Pennington, M., & So, S. (1993). Comparing writing process and product across two languages: A study of 6 Singaporean university student writers. Journal of Second Language Writing, 2, 41-63.
Pennycook, A. (1996). Borrowing others' words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30, 201-230.
Pennycook, A. (2001). Critical applied linguistics: A critical introduction. Mahwah, NJ: Lawrence Erlbaum Associates.
Petersen, J. (2009). This test makes no freaking sense: Criticism, confusion, and frustration in timed writing. Assessing Writing, 14, 178-193.
Plasse, L. A. (1981). The influence of audience on the assessment of student writing. Unpublished doctoral dissertation, University of Connecticut, Storrs, CT.
Polio, C. (1997). Measures of linguistic accuracy in second language writing research. Language Learning, 47(1), 101-143.
Porte, G. (1997). The etiology of poor second language writing: The influence of perceived teacher preferences on second language revision strategies. Journal of Second Language Writing, 7, 43-46.
Purves, A. C., & Purves, W. C. (1986). Viewpoints: Cultures, text, models, and the activity of writing. Research in the Teaching of English, 20, 174-197.
Purves, A. C., Soter, A., Takala, S., & Vahapassi, A. (1984). Towards a domain-referenced system for classifying composition assignments. Research in the Teaching of English, 18, 385-416.
Pye, L. W. (1985). Asian power and politics: The cultural dimensions of authority. Cambridge, MA: Harvard University Press.
Raimes, A. (1985). What unskilled ESL students do as they write: A classroom study of composing. TESOL Quarterly, 19, 229-258.
Raimes, A. (1987). Language proficiency, writing ability, and composing strategies: A study of ESL college student writers. Language Learning, 37, 439-468.
Raimes, A. (1990). The TOEFL test of written English: Causes for concern. TESOL Quarterly, 25, 407-430.
Ramanathan, V., & Atkinson, D. (1999). Individualism, academic writing, and ESL writers. Journal of Second Language Writing, 8, 45-75.
Reichelt, M. (2009). A critical evaluation of writing teaching programmes in different foreign language settings. In R. M. Manchon (Ed.), Writing in foreign language contexts (pp. 183-206). Buffalo, NY: Multilingual Matters.
Resnick, L. (1987). Education and learning to think. Washington, DC: National Academy Press.
Reynolds, D. (1995). Repetition in nonnative speaker writing: More than quantity. Studies in Second Language Acquisition, 17, 187-209.
Rinnert, C., & Kobayashi, H. (2009). Situated writing practice in foreign language settings: The role of previous experience and instruction. In R. M. Manchon (Ed.), Writing in foreign language contexts: Learning, teaching, and research (pp. 23-48). Buffalo, NY: Multilingual Matters.
Ross, S. J. (2005). The impact of assessment method on foreign language proficiency growth. Applied Linguistics, 26, 317-342.
Rossman, G. B., & Wilson, B. L. (1985). Numbers and words: Combining quantitative and qualitative methods in a single large scale evaluation study. Evaluation Review, 9, 627-643.
Ruth, L., & Murphy, S. (1984). Designing writing tasks for the assessment of problems of meaning. College Composition and Communication, 35(4), 410-422.
Ruth, L., & Murphy, S. (1988). Designing writing tasks for the assessment of writing. Norwood, NJ: Ablex Publishing Corporation.
Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate composition. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129-145). Cambridge: Cambridge University Press.
Sasaki, M., & Hirose, K. (1996). Explanatory variables for EFL students' expository writing. Language Learning, 46(1), 137-168.
Sasaki, M. (2000). Towards an empirical model of EFL writing processes: An exploratory study. Journal of Second Language Writing, 9, 259-291.
Scardamalia, M., & Bereiter, C. (1987). Knowledge telling and knowledge transforming in written composition. In S. Rosenberg (Ed.), Advances in applied psycholinguistics, Vol. 2: Reading, writing and language learning (pp. 142-175). Cambridge: Cambridge University Press.
Schumacher, G. M., Gradwohl, J. M., Brezin, M., & Parker, E. G. (1986). Children's in-depth knowledge and writing processes: Of legends on broomsticks and rudderless boats. Paper presented at the National Reading Conference, Austin, TX.
Sharma, A. (1980). Syntactic maturity: Assessing writing proficiency in a second language. In R. Silverstein (Ed.), Occasional papers in linguistics, No. 6 (pp. 318-325). Carbondale: Southern Illinois University.
Shaw, P., & Liu, E. (1998). What develops in the development of second-language writing? Applied Linguistics, 19, 225-254.
Shaw, S. D., & Weir, C. J. (2007). Examining writing: Research and practice in assessing second language writing (Studies in Language Testing 26). Cambridge, UK: Cambridge University Press.
Shen, F. (1989). The classroom and the wider culture: Identity as a key to learning English composition. College Composition and Communication, 40, 459-466.
Shepard, L. A. (1997). The centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16(2), 5-13.
Shi, L. (2001). Native and nonnative speaking EFL teachers' evaluation of Chinese students' English writing. Language Testing, 18, 303-325.
Shi, L. (2003). Writing in two cultures: Chinese professors return from the West. The Canadian Modern Language Review, 59, 369-391.
Shi, L. (2004). Textual borrowing in second language writing. Written Communication, 21, 171-200.
Shi, L. (2006). Cultural backgrounds and textual appropriation. Language Awareness, 15, 264-282.
Shi, L., & Kubota, R. (2007). Patterns of rhetorical organization in Canadian and American language arts textbooks: An exploratory study. English for Specific Purposes, 26, 180-202.
Shi, L. (2010). Textual appropriation and citing behaviors of university undergraduates. Applied Linguistics, 31, 1-24.
Shohamy, E. (1997). Testing methods, testing consequences: Are they ethical? Are they fair? Language Testing, 14, 340-349.
Shohamy, E. (2000). Fairness in language testing. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 15-19). Cambridge: Cambridge University Press.
Schneider, M., & Connor, U. (1990). Analyzing topical structure in ESL essays: Not all topics are equal. Studies in Second Language Acquisition, 12, 411-427.
Spaan, M. (1990). The effect of prompt in essay examinations. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 98-122). Alexandria, VA: Teachers of English to Speakers of Other Languages.
Spaan, M. (2000). Enhancing fairness through a social contract. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 35-38). Cambridge: Cambridge University Press.
Spradley, J. P. (1980). Participant observation. New York, NY: Holt, Rinehart & Winston.
Squire, J. R. (1983). Composing and comprehending: Two sides of the same basic process. Language Arts, 60(5), 581-589.
Tedick, D. J. (1988). The effects of topic familiarity on the written composition of international graduate students. Ohio State University Research and Scholarly Activities Forum. Columbus, OH: Ohio State University Council of Graduate Students.
Tedick, D. J. (1990). ESL writing assessment: Subject-matter knowledge and its impact on performance. English for Specific Purposes, 9, 123-143.
Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches. Thousand Oaks, CA: Sage.
Tashakkori, A., & Teddlie, C. (2003). Handbook of mixed methods in social and behavioral research. Thousand Oaks, CA: Sage.
Tsang, W. K. (1996). Comparing the effects of reading and writing on writing performance. Applied Linguistics, 17, 210-233.
University of British Columbia (2005). The LPI workbook (2nd ed.). Vancouver, Canada: University of British Columbia.
University of British Columbia (1997). The LPI workbook (2nd ed.). Vancouver, Canada: University of British Columbia.
Vaughan, C. (1991). Holistic assessment: What goes on in the raters' minds? In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111-125). Norwood, NJ: Ablex.
Victori, M. (1999). An analysis of writing knowledge in EFL composing: A case study of two effective and two less effective writers. System, 27, 537-555.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15, 263-287.
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.
Weir, C. J. (1994). Understanding and developing language tests. London: Prentice Hall.
West, M. (1953). A general service list of English words. London: Longman.
Whalen, K., & Menard, N. (1995). L1 and L2 writers' strategies and linguistic knowledge: A model of multiple-level discourse processing. Language Learning, 45, 381-418.
White, E. (1985). Teaching and assessing writing. San Francisco: Jossey-Bass.
White, E. M. (1994). Teaching and assessing writing: Recent advances in understanding, evaluating, and improving student performance. San Francisco: Jossey-Bass Publishers.
Winfield, F. E., & Barnes-Felfeli, W. (1982). The effects of familiar and unfamiliar context on foreign language composition. Modern Language Journal, 66, 373-378.
Wray, A. (2002). Formulaic language and the lexicon. New York: Cambridge University Press.
You, X. (2004). The choice made from no choice: English writing instruction in a Chinese university. Journal of Second Language Writing, 13, 97-110.
Zainuddin, H., & Moore, R. A. (2003). Audience awareness in L1 and L2 composing of bilingual writers. TESL-EJ, 7(1), 1-18. Retrieved May 15, 2007, from http://wwwwriting.berkeley.edu/TESL-EJ/ej25/a2.html
Zamel, V. (1983). The composing processes of advanced ESL students: Six case studies. TESOL Quarterly, 17, 165-187.
Zamel, V. (1985). Responding to student writing. TESOL Quarterly, 19(1), 79-101.
Zhu, W. (2006). Interaction and feedback in mixed peer response groups. In P. K. Matsuda, M. Cox, J. Jordan, & C. Ortmeier-Hooper (Eds.), Second-language writing in the composition classroom (pp. 186-209). Boston, MA: Bedford/St. Martin's.
Zumbo, B. D. (2005). Reflections on validity at the intersection of psychometrics, scaling, philosophy of inquiry, and language testing. Paper presented at the 27th Language Testing Research Colloquium (LTRC), Ottawa, Canada.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26: Psychometrics (pp. 45-79). The Netherlands: Elsevier Science B.V.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65-82). Charlotte, NC: Information Age Publishing.

APPENDIX A

Topics and Standard Instructions from MELAB Assessment Battery Part 1: Composition

NAME_____________________________________DATE_________________________
(family/last/surname) (given/first name)

INSTRUCTIONS:

1. You will have 30 minutes to write on one of the two topics printed below. If you do not write on one of these two topics, your paper will not be graded. If you do not understand the topics, ask the examiner to explain or to translate them.

TOPICS: SET____________ (CIRCLE THE LETTER OF THE TOPIC YOU CHOOSE) A. _____________ B. _____________

2. You may make an outline if you wish, but your outline will not be used to determine your grade.

3. Write about 1 to 2 pages. You will lose credit if your paper is extremely short.
Write on both sides of the paper. Ask the examiner for more paper if you need it.

4. You may make any changes or corrections in the body of the composition. You will not be graded on the appearance of your paper, but be sure your handwriting is legible. Do not waste time copying your composition over.

5. Your essay will be graded on the clarity of your writing and the linguistic range and accuracy you show.

PLEASE SIGN YOUR NAME BELOW WHEN YOU HAVE UNDERSTOOD THESE INSTRUCTIONS

NAME (SIGNATURE) ________________________________________________

START HERE:

APPENDIX B

Descriptions of Score Levels for MELAB Compositions

97  Topic is richly and fully developed. Flexible use of a wide range of syntactic (sentence-level) structures, and accurate morphological (word forms) control. There is a wide range of appropriately used vocabulary. Organization is appropriate and effective, and there is excellent control of connection. Spelling and punctuation appear error free.

93  Topic is fully and complexly developed. Flexible use of a wide range of syntactic structures. Morphological control is nearly always accurate. Vocabulary is broad and appropriately used. Organization is well controlled and appropriate to the material, and the writing is well connected. Spelling and punctuation errors are not distracting.

87  Topic is well developed, with acknowledgement of its complexity. Varied syntactic structures are used with some flexibility, and there is good morphological control. Vocabulary is broad and usually used appropriately. Organization is controlled and generally appropriate to the material, and there are few problems with connection. Spelling and punctuation errors are not distracting.

83  Topic is generally clearly and completely developed, with at least some acknowledgement of its complexity. Both simple and complex syntactic structures are generally adequately used; there is adequate morphological control. Vocabulary use shows some flexibility, and is usually appropriate. Organization is controlled and shows some appropriacy to the material, and connection is usually adequate. Spelling and punctuation errors are sometimes distracting.

77  Topic is developed clearly but not completely and without acknowledging its complexity. Both simple and complex syntactic structures are present; in some "77" essays these are cautiously and accurately used, while in others there is more fluency and less accuracy. Morphological control is inconsistent. Vocabulary is adequate, but may sometimes be inappropriately used. Organization is generally controlled, while connection is sometimes absent or unsuccessful. Spelling and punctuation errors are sometimes distracting.

73  Topic development is present, although limited by incompleteness, lack of clarity, or lack of focus. The topic may be treated as though it has only one dimension, or only one point of view is possible. In some "73" essays both simple and complex syntactic structures are present, but with many errors; others have accurate syntax but are very restricted in the range of language attempted. Morphological control is inconsistent. Vocabulary is sometimes inadequate, and connection is often absent or unsuccessful. Spelling and punctuation errors are sometimes distracting.

67  Topic development is present but restricted, and often incomplete or unclear. Simple syntactic structures dominate, with many errors; complex syntactic structures, if present, are not controlled. Lacks morphological control.
Narrow and simple vocabulary usually approximates meaning but is often inappropriately used. Organization, when apparent, is poorly controlled, and little or no connection is apparent. Spelling and punctuation errors are often distracting.

63  Contains little sign of topic development. Simple syntactic structures are present, but with many errors; lacks morphological control. Narrow, simple vocabulary inhibits communication. There is little or no organization, and no connection apparent. Spelling and punctuation errors often cause serious interference.

57  Often extremely short; contains only fragmentary communication about the topic. There is little syntactic or morphological control. Vocabulary is highly restricted and inaccurately used. No organization or connection is apparent. Spelling is often indecipherable and punctuation is missing or appears random.

53  Extremely short, usually about 40 words or less. Communicates nothing, and is often copied directly from the prompt. There is little sign of syntactic or morphological control. Vocabulary is extremely restricted and repetitively used. There is no apparent organization or connection. Spelling is often indecipherable and punctuation is missing or appears random.

APPENDIX C

Holistic Rating Scale for the Language Proficiency Index

Essay Level 6: Advanced Proficiency: Demonstrates exceptional fluency marked by a wide range of skills, including excellent organizational abilities and original insights. Advanced proficiency is indicated in the clear articulation of both complex and straightforward concepts.

Essay Level 5: Effective Proficiency: Demonstrates fluent competency. For the most part, the writing is clear and controlled. Occasional errors in expression and structure do not significantly detract from the coherent articulation of ideas. There is clarity in development and organization.

Essay Level 4: Adequate Proficiency: Demonstrates adequate competency with satisfactory organization and structure, although expression errors are evident, particularly occurring in occasional clusters. This is writing that requires some revision.

Essay Level 3: Developing Proficiency: Demonstrates some familiarity with written communication, using simple tools of expression. Major problems in diction, sentence structure, and organization, however, are evident. Numerous second language errors may also be present. This is writing that requires considerable revision.

Essay Level 2: Minimal Proficiency: Demonstrates only limited ability in written communication. Second language expression errors are the dominant feature of this writing.

Essay Level 1: No Proficiency: Demonstrates very limited knowledge of written English. The writer does not have the language skills to write a series of statements that can be understood by the reader.

Essay Level 0: Essay Cannot Be Evaluated: Essays are placed at Level 0 when no essay has been attempted, when too little has been written to allow a fair evaluation, or when the essay does not directly address any of the given topics.

APPENDIX D

Interview Guideline

Background Questions

o How long have you been in Canada?
o What's your major?
o How many years have you learned English?
o Have you ever taken TOEFL, LPI, or other English language proficiency tests? If yes, how many times?
o What do you think of your English level: basic, intermediate, or advanced?

Experiences with L2 writing

o You have written two essays with different prompts focusing on either Canadian or home culture.
Do you think the writing topics influenced the quality of your writing? Why or why not? o What kind of writing topics do you usually feel like writing about? Why? o What do you feel is the most difficult in writing an English essay? What strategies do you use to write an English essay? o Do you think writing an English essay is the same as or different from writing an essay in your native language? Why? o What do you think should be improved in your English writing?  199  APPENDIX E  Letter of Initial Contact  THE UNIVERSITY OF BRITISH COLUMBIA Department of Language and Literacy Education 2034 Lower Mall Vancouver, B.C. Canada V6T 1Z2 Tel: (604) 822-5788 Fax: (604) 822-3154  Dr. Marv Westrom, President Alexander College 305-4538 Kingsway Burnaby BC, V5H 4T9 February 14, 2008  Dear Dr. Westrom: I am writing to ask for your permission to conduct my doctoral study at your college. The study aims to investigate the impact of topic and culture knowledge on Second Language writing. My supervisor is Dr. Ling Shi in the Department of Language and Literacy Education in Faculty of Education at the University of British Columbia. I have recently told you about my study and you expressed interest in helping me with the data collection. I appreciate your support. I have now worked out the details about how to conduct the interviews with the students and collect their writing samples. I would like to discuss with you about the procedures and will call you to make an appointment. I am looking forward to seeing you. Sincerely yours, Ling He PhD Candidate Language and Literacy Education Faculty of Education The University of British Columbia  200  APPENDIX F  Letter of Permission #300 4680 Kingsway Burnaby, BC V5H 4L9  ALEXANDER COLLEGE  April 10, 2008  Dear the UBC Office of Research Ethic: This letter is to show our permission to Ling He to conduct her doctoral study at our college. Our college is a private educational institute that provides both academic and vocational training in areas of highest need. We accept many nonnative English speaking students who are new to Canada. I am happy that Ling has chosen our college as her research site. I believe that the bound context of our college will provide Ling with rich resources to conduct her study which aims to investigate the factors that could influence the English as a second language students’ writing performance. Indeed, it is important to understand these minority groups’ needs to help them integrate into Canadian mainstream culture and education system. I will provide any possible convenience to Ling for her collecting data at our college. Should you have any questions or concerns regarding this letter, please contact me at the address above.  Sincerely,  Marv Westrom Ph.D. (Education) President, Alexander College and Professor Emeritus, UBC  201  APPENDIX G Consent Form  Title of Study:  Effect of Topical Knowledge on L2 Writing Performance  Principal Investigator:  Dr. Ling Shi, Associate Professor Department of Language and Literacy Education, Faculty of Education, UBC Phone: 604 822-4335 Email: ling.shi@ubc.ca  Co-Investigator:  Ling He PhD. Candidate Program of Teaching English as a Second Language Department of Language and Literacy Education, Faculty of Education, UBC Phone: 604-2218863 Email: lingheli@interchange.ubc.ca  Purpose: The purpose of this study is to investigate the impact of content and cultural knowledge on L2 test takers’ writing performance. 
Specifically, it will examine the relationship between the topical and cultural knowledge required by the writing prompts and the students' writing performances.

Procedures: If you choose to become involved in this study, you will be asked both semi-structured and open-ended questions regarding your experience of writing on two writing topics. The interview will be recorded. Amount of time required: 30 minutes for the interview and 1 hour for writing each prompt. If needed, I will also cross-check the researcher's report and interpretation of the data with you. Thus, the total time involved will be 2.5 hours at most.

Rights: Your participation in this study is entirely voluntary, and you have the right to refuse to participate, or withdraw at any time. During the interview, you can refuse to answer any of the questions.

Confidentiality: To ensure the confidentiality of the data, you and your institute will be assigned pseudonyms. These names will be used in the researcher's notes and data analysis and in the poster.

Compensation and benefits for participants: To thank you for your participation in the study, I will provide free tutoring or a lecture commenting on the two essays you will write and answer your questions about your writing on the two prompts. You will not receive any payment as a result of your participation in this study.

Contact for information about the study: If you are willing to participate in this study, contact Ling He by telephone at 604-221-8863 or by email at lingheli@interchange.ubc.ca. If you have any questions about this study, you may contact Ling He. You may also contact our professor, Dr. Ling Shi, by telephone (604-822-4345) or e-mail (ling.shi@ubc.ca).

Contact for concerns about the rights of research subjects: If you have any concerns about your rights as a participant in this study, you may contact the Research Subject Information Line in the UBC Office of Research Services and Administration at 604-822-8598.

Consent Form

I have read the consent form and recognize that my participation in this study is entirely voluntary and that I am free to withdraw at any time during the course of the study without consequence. I understand that any information resulting from this study will be strictly confidential. I realize that I may ask for further information about this study if I wish to do so at any time. I have received a copy of this consent form for my own records. I agree to participate in this study.

_____________________________ Subject Signature

_____________________________ Print Name of the Subject

_____________________________ Date

APPENDIX H

Recruitment Poster

L2 WRITING PERFORMANCE RESEARCH PROJECT

! Are you an ESL student at the Vancouver Center College? Are you preparing to take the LPI test?
! Do you want to write and practice for the LPI test?
! Do you want to talk about your experiences or difficulties in English writing?
! Are you interested in getting feedback on your writing?
! If you answered yes to all of the above and are interested in being part of a study about cultural knowledge and its impact on L2 writing performance, please contact Ling He at 604 221-8863 or e-mail lingheli@interchange.ubc.ca

APPENDIX I

Focus-Group Study for Ranking Difficult or Easy Writing Topics

Instruction: Please mark each of the topics on the following sheet as easy or difficult.

Topic #  Writing Topics in the LPI Workbook (2008)  Easy/Difficult

1  Some hospitals run lotteries for which the tickets cost as much as $100.
Do you think this is a fair way to raise money? Explain why you would OR would not buy such a lottery ticket. Be specific.

2  Explain why you do OR do not take an interest in federal politics. Be specific.

3  Will you OR will you not take an interest in the 2010 Winter Olympics in Whistler and Vancouver? Provide reasons.

4  If there is a major court trial that is frequently reported on in the media, do you OR do you not follow it with interest? In either case, explain why.

5  Identify something that you have succeeded at in your life, and explain why you succeeded. (Was it natural talent, hard work, the influence of family or friends, or help from a coach or teacher?)

6  What are some of the things about which you are superstitious? Explain why you came to have these superstitions.

7  Do you think stories in the media related to human health, or a possible threat to human health, are interesting? Explain why you do or do not. Be specific.

8  Explain why you think being a young person today is more difficult than it was about 20 years ago. Be specific.

9  What are some of the disadvantages of being either left-handed or very tall? Give specific examples.

10  In your opinion, if public transportation in a large city were free, would enough people leave their cars at home to significantly reduce traffic congestion? Why OR why not? Be specific.

11  Identify two or three of the heroes you have looked up to during your lifetime. Are they OR are they not still heroes to you? Provide reasons.

12  Which individuals or groups contribute to making your school or workplace a special place? Be specific. Provide reasons.

13  What, in your view, are some of the potential problems associated with the Internet? Be specific.

14  Agree OR disagree with the following statement: "The larger a city gets, the more it has to offer the people who live there." Provide reasons.

15  Write an essay in which you contrast different attitudes you have had at different times in your life toward some persons, activities, or places. Be specific.

16  What personality trait or attitude do you most dislike in other people? Provide reasons. Be specific.

17  If you plan to attend a college or a university, what factors will influence your choice of what to study? Provide reasons.

18  Do you OR do you not think that the world is undergoing a period of global warming at the present time? Provide evidence in support of your opinion.

19  Write an essay in which you explain why you would OR would not run for public office such as city councillor. Provide reasons.

20  Identify some of the strategies that you used to get through your years as a teenager. Give examples. Be specific.

21  What, in your opinion, are some of the reasons that radio talk shows (including the numerous sports talk shows) are often more irritating than informative? Be specific.

22  Agree OR disagree with the following statement: "There has never been a better time to be alive than the present." Provide reasons. Be specific.

23  What conflict or social issue that is frequently in the news do you feel you have heard enough about? Provide reasons.

24  In what way OR ways do you think your life will be different from the life of your parents? Be specific.
APPENDIX J

Writing Test Sheet for Prompt A

Name___________________  Class_________________

Essay Writing Instructions: Write an essay that develops some central claims, ideas, or opinions on the topic suggested below. Your essay should contain several well-developed paragraphs that support and illustrate your central idea.

Prompt A: If you plan to attend a college or a university, what factors will influence your choice of what to study? Provide reasons.

[Two ruled pages for the handwritten response follow in the original test sheet.]

APPENDIX K

Writing Test Sheet for Prompt B

Name___________________  Class_________________

Essay Writing Instructions: Write an essay that develops some central claims, ideas, or opinions on the topic suggested below. Your essay should contain several well-developed paragraphs that support and illustrate your central idea.

Prompt B: Explain why you do OR do not take an interest in federal politics. Be specific.
[Two ruled pages for the handwritten response follow in the original test sheet.]

APPENDIX L

Six-point Analytic Rating Scale of this Study

Each specific indicator below is rated from 0 to 6.

Content
  Idea quality: Relevance, originality, depth to the topic
  Position-taking: Clear thesis statement in the introduction
  Idea development: Clear logic in paragraph development (a topic sentence with one idea/unity, supporting details/evidence with thorough discussions)
  Idea wrap-up: A definite sense of closure by summarizing the main ideas discussed in the body paragraphs and further comments by making a final statement such as a prediction or recommendation in the conclusion paragraph

Organization
  Cohesion: Smooth transitions and/or connections between sentences and paragraphs
  Coherence: The organization of discourse with all elements present and fitting together logically (e.g., the presence of an introduction, a thesis statement, rhetorical support, and a conclusion)

Language
  Fluency: Length of the essay (i.e., the number of words of each essay)
  Accuracy: The ratio of error-free words to the length of each essay
  Lexical Complexity: The total number of academic
words of each essay  3  4  5  6  212  APPENDIX M  Headswords1 of the Word Families in the Academic Word List Numbers indicate the sublist of the Academic Word List (e.g., abandon and its family members are in Sublist 8). Sublist 1 contains the most frequent words in the list, and Sublist 10 contains the least frequent. abundant  8  aspect  2  coincide  9  abstract  6  assemble  10  collapse  10  academy  5  assess  1  colleague  10  access  4  assign  6  commence  9  accompany  8  assume  1  commission  2  accurate  6  attach  6  commit  4  achieve  2  attain  9  communicate  4  acknowledge  6  attitude  4  community  2  acquire  2  attribute  4  compatible  9  adapt  7  author  6  compensate  3  adequate  4  authority  1  compile  10  adjacent  10  automate  8  complement  8  adjust  5  available  1  complex  2  administrate  2  aware  5  component  3  adult  7  behalf  9  component  3  advocate  7  benefit  1  comprehensive  7  affect  2  bias  8  comprise  7  aggregate  6  bond  6  compute  2  aid  7  brief  6  conceive  10  213  albeit  10  bulk  9  concentrate  4  allocate  6  capable  6  concept  1  alter  5  capacity  5  conclude  2  alternative  3  category  2  concurrent  9  ambiguous  8  cease  9  conduct  2  amend  5  challenge  5  confer  4  analogy  9  channel  7  confine  9  analyse  1  chapter  2  confirm  7  annual  4  chart  8  conflict  5  anticipate  9  chemical  7  conform  8  apparent  4  circumstance  3  consent  3  append  8  cite  6  consequent  2  appreciate  8  civil  4  considerable  3  approach  1  clarify  8  consist  1  appropriate  8  classic  7  constant  3  approximate  4  clause  5  constitute  1  arbitrary  8  code  4  constrain  3  area  1  coherent  9  construct  2  consult  5  document  3  flexible  8  consume  2  domain  6  fluctuate  8  contact  5  domestic  4  focus  2  contemporary  8  dominate  3  format  9  context  1  draft  5  formula  1  contract  1  drama  8  forthcoming  10  contradict  8  duration  9  foundation  7  214  contrary  7  dynamic  7  found  9  contrast  4  economy  1  framework  3  contribute  3  edit  6  function  1  controversy  9  element  2  fund  3  convene  3  eliminate  7  fundamental  5  converse  9  emerge  4  furthermore  6  convert  7  emphasis  3  gender  6  convince  10  empirical  7  generate  5  cooperate  6  enable  5  generation  5  coordinate  3  encounter  10  globe  7  core  3  energy  5  goal  4  corporate  3  enforce  5  grade  7  correspond  3  enhance  6  grant  4  couple  7  enormous  10  guarantee  7  create  1  ensure  3  guideline  8  credit  2  entity  5  hence  4  criteria  3  environment  1  hierarchy  7  crucial  8  equate  2  highlight  8  culture  2  equip  7  hypothesis  4  currency  8  equivalent  5  identical  7  cycle  4  erode  9  identify  1  data  1  error  4  ideology  7  debate  4  establish  1  ignorance  6  decade  7  estate  6  illustrate  3  decline  5  estimate  1  image  5  215  deduce  3  ethic  9  immigrate  3  definite  7  evaluate  2  implement  4  demonstrate  3  eventual  8  implement  4  denote  8  evident  1  implicit  8  deny  7  evolve  5  imply  3  depress  10  exceed  6  impose  4  derive  1  exclude  3  incentive  6  design  2  exhibit  8  incidence  6  despite  4  expand  5  incline  10  detect  8  expert  6  income  1  deviate  8  explicit  6  incorporate  6  device  9  export  1  indicate  1  differentiate  7  expose  5  individual  1  dimension  4  external  5  induce  8  diminish  9  extract  7  inevitable  8  discrete  5  facilitate  5  infer  7  
discriminate  6  factor  1  infrastructure  8  displace  8  feature  2  inherent  9  display  6  federal  6  inhibit  6  dispose  7  fee  6  initial  3  distinct  2  file  7  initiate  6  distort  9  final  2  injure  2  distribute  1  finance  1  innovate  7  diverse  6  finite  7  input  6  insert  7  minimise  8  precede  6  216  insight  9  minimum  6  precise  5  inspect  8  ministry  6  predict  4  instance  3  minor  3  predominant  8  institute  2  mode  7  preliminary  9  instruct  6  modify  5  presume  6  integral  9  monitor  5  previous  2  integrate  1  motive  6  primary  2  integrity  10  mutual  9  prime  5  intelligence  6  negate  3  principal  4  intense  8  network  5  principle  1  interact  3  neutral  6  prior  4  intermediate  9  nevertheless  6  priority  7  internal  4  nonetheless  10  proceed  1  interpret  1  norm  9  process  1  interval  6  normal  2  professional  4  intervene  7  notion  5  prohibit  7  intrinsic  10  notwithstanding  10  project  4  invest  2  nuclear  8  promote  4  investigate  4  objective  5  proportion  3  invoke  10  obtain  2  prospect  8  involve  1  obvious  4  protocol  9  isolate  7  occupy  4  psychology  5  issue  1  occur  1  publication  7  item  2  odd  10  publish  3  job  4  offset  8  purchase  2  217  journal  2  ongoing  10  pursue  5  justify  3  option  4  qualitative  9  label  4  orient  5  quote  7  labour  1  outcome  3  radical  8  layer  3  output  4  random  8  lecture  6  overall  4  range  2  legal  1  overlap  9  ratio  5  legislate  1  overseas  6  rational  6  levy  10  panel  10  react  3  liberal  5  paradigm  7  recover  6  licence  5  paragraph  8  refine  9  likewise  10  parallel  4  regime  4  link  3  parameter  4  region  2  locate  3  participate  2  register  3  logic  5  partner  3  regulate  2  maintain  2  passive  9  reinforce  8  major  1  perceive  2  reject  5  manipulate  8  percent  1  relax  9  manual  9  period  1  release  7  margin  5  persist  10  relevant  2  mature  9  perspective  5  reluctance  10  maximise  3  phase  4  rely  3  mechanism  4  phenomenon  7  remove  3  media  7  philosophy  3  require  1  mediate  9  physical  3  research  1  218  medical  5  plus  8  reside  2  medium  9  policy  1  resolve  4  mental  5  portion  9  resource  2  method  1  pose  10  respond  1  migrate  6  positive  2  restore  8  military  9  potential  2  restrain  9  minimal  9  practitioner  8  restrict  2  retain  4  status  4  thesis  7  reveal  6  straightforward  10  topic  7  revenue  5  strategy  2  trace  6  reverse  7  stress  4  tradition  2  revise  8  structure  1  transfer  2  revolution  9  style  5  transform  6  rigid  9  submit  7  transit  5  role  1  subordinate  9  transmit  7  route  9  subsequent  4  transport  6  scenario  9  subsidy  6  trend  5  schedule  8  substitute  5  trigger  9  scheme  3  successor  7  ultimate  7  scope  6  sufficient  3  undergo  10  section  1  sum  4  underlie  6  sector  1  summary  4  undertake  4  secure  2  supplement  9  uniform  8  seek  2  survey  2  unify  9  select  2  survive  7  unique  7  219  sequence  3  suspend  9  utilise  6  series  4  sustain  5  valid  3  sex  3  symbol  5  vary  1  shift  3  tape  6  vehicle  8  significant  1  target  5  version  5  similar  1  task  3  via  8  simulate  7  team  9  violate  9  site  2  technical  3  virtual  8  so-called  10  technique  3  visible  7  sole  7  technology  3  vision  9  somewhat  7  temporary  9  visual  8  source  1  tense  8  volume  3  specific  1  terminate 
 8  voluntary  7  specify  3  text  2  welfare  5  sphere  9  theme  8  whereas  5  stable  5  theory  1  whereby  10  statistic  4  thereby  8  widespread  8  220  Appendix N Ethics Review Certificate THE UNIVERSITY OF BRITISH COLUMBIA Research Ethics, Office of Research Services Suite 102, 6190 Agronomy Road Vancouver, B.C. V6T 1Z3 Phone: 604-827-5112 Fax: 604-822-5093 Our File: H08-00400 April 24, 2008  Dr. Ling Shi, Language and Literacy Education Dear Dr. Ling Shi, RE: Your proposed study: Topic and Culture Knowledge and its Impact on L2 Writing Performance The University of British Columbia Behavioural Research Ethics Board has reviewed the protocol for your proposed research project. The Committee found the procedures to be ethically acceptable and a Certificate of Approval will be issued upon the Committee's receipt of written agency approval from the Alexander College. If you have any questions, please call me at 604-827-5112.  Sincerely,  Shirley A. Thompson Manager, Behavioural Research Ethics Board  
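The quantitative Language indicators defined in Appendix L (fluency as essay length, accuracy as the proportion of error-free words, and lexical complexity as the count of Academic Word List items from Appendix M) lend themselves to straightforward computation. The sketch below is only an editorial illustration of that idea, not the scoring procedure used in the study: the function name, the suffix-based word-family matching, and the assumption that erroneous words have already been flagged by a rater are all simplifications introduced here for the example.

import re

def language_indicators(essay_text, awl_headwords, flagged_positions):
    """Return (fluency, accuracy, lexical_complexity) for one essay.

    awl_headwords: set of headwords from the Academic Word List (Appendix M).
    flagged_positions: set of token indices a rater has marked as erroneous.
    """
    tokens = re.findall(r"[A-Za-z']+", essay_text.lower())

    # Fluency: length of the essay in words.
    fluency = len(tokens)

    # Accuracy: ratio of error-free words to essay length.
    accuracy = (fluency - len(flagged_positions)) / fluency if fluency else 0.0

    # Lexical complexity: tokens whose (crudely matched) headword is in the AWL.
    suffixes = ("", "s", "es", "ed", "ing", "ion", "ions", "ly")
    def is_academic(token):
        return any(token == hw + suf for hw in awl_headwords for suf in suffixes)

    lexical_complexity = sum(1 for t in tokens if is_academic(t))
    return fluency, accuracy, lexical_complexity

# Toy example with a small slice of the word list.
sample = "I assume that federal politics affect every community in significant ways."
awl = {"assume", "federal", "affect", "community", "significant"}
print(language_indicators(sample, awl, flagged_positions=set()))

In practice the study's component and indicator scores came from trained raters applying the six-point scale; a script along these lines would at most support the word counts, error ratios, and academic-word tallies reported alongside those ratings.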
