UBC Theses and Dissertations

An exploratory study of the hypothesis of divisible versus unitary competence in second language proficiency 1983

AN EXPLORATORY STUDY OF THE HYPOTHESIS OF DIVISIBLE VERSUS UNITARY COMPETENCE IN SECOND LANGUAGE PROFICIENCY

by

ROSS PATRICK BARBOUR
B.A., The University of British Columbia, 1968

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS
in
THE FACULTY OF GRADUATE STUDIES
English Education
Department of Language Education

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 1983
© Ross Patrick Barbour, 1983

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of English Education
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Date: Sept. 13, 1983

Abstract

In this research Oller's question 'Is language proficiency divisible into components?' was explored by determining which of three models best fit the experimental data: a model postulating numerous specific sources of variance (the extreme divisible model), a model postulating a single, large source of variance (the unitary model), or a model postulating a large general factor and several smaller specific factors. Following analysis of data gathered in a preliminary study, four tests which had clearly recognizable contrasts in content (grammar vs. vocabulary) and mode (listening vs. reading) were constructed to identify linguistic and method variance in a correlation matrix of language proficiency variables. These four measures were pilot tested, revised, and administered in conjunction with eight other language measures to a group of beginning-level ESL learners. The data were factor analyzed using image analysis to explore the relative congruency of the three models to the data. In addition, the relationships between the tests and the demographic variables age, sex, length of time in English Canada, and first language were also investigated.

In the factor analysis, both of the methods used to determine the number of factors to be retained in the final solution indicated three. (The methods used were the Kaiser-Guttman criterion of selecting factors with eigenvalues greater than one in a principal components analysis and the inspection of a varimax rotation of a full image analysis to determine the first factor with negligible coefficients.) When transformed using a Harris-Kaiser oblique transformation (Independent Clusters), the data presented evidence for a grammar factor, a vocabulary factor and an age-related factor which may be linked closely to the hearing ability of the students. In addition, the analyses suggested the possibility that a listening-mode factor and what I have termed a 'speed of processing factor' were also influencing the variables. The factors, however, were highly correlated, suggesting the presence of a strong general factor underlying all of the measures.
The analyses of the specific relationships between each of four demographic variables (age, sex, first language, and the length of time the subject had been in Canada) and each of the twelve language variables revealed a strong negative correlation between the language measures and two of the demographic variables, age and length of time in Canada. In addition, this set of analyses revealed that the Chinese as a group performed differently than non-Chinese as a group. The analysis of sex produced no significant findings.

The conclusion of the study was that the language proficiency data in this study was best modelled by a large general factor and two specific, content-related factors, grammar and vocabulary. The possibility of specific factors related to mode was not ruled out.

Table of Contents

Abstract
List of Tables
Acknowledgements
I. INTRODUCTION TO THE STUDY
    1.1 Background
    1.2 Overview Of Experimental Procedures
    1.3 Definition Of Terms
    1.4 Questions And Areas Of Exploration
        1.4.1 Divisibility
        1.4.2 Interpretation
        1.4.3 Subsidiary Questions
    1.5 Assumptions
    1.6 Limitations
    1.7 Significance Of Study
    1.8 Organization Of The Study
II. BACKGROUND
III. DESIGN AND PROCEDURES
    3.1 Population
    3.2 The Tests And Demographic Variables
        3.2.1 Tests Developed For Interpretive Purposes
        3.2.2 Subtests From The Progress Assessment Battery
        3.2.3 Composition
        3.2.4 Supplementary Tests
        3.2.5 The Demographic Variables
    3.3 Administration Procedures
    3.4 Statistical Procedures
        3.4.1 Data Preparation And Description
        3.4.2 Factor Analysis
        3.4.3 The Subsidiary Analyses
IV. PRELIMINARY STUDY AND PILOT
    4.1 Preliminary Study
    4.2 The Pilot Study
V. FINDINGS OF THE STUDY
    5.1 The Factor Solutions And The Divisibility Hypotheses
        5.1.1 Factor Solution For Entire Set
        5.1.2 The Subset Of Chinese Speakers
        5.1.3 The Non-Chinese Speakers
        5.1.4 Summary
    5.2 The Demographic Variables And Their Relation To The Tests
        5.2.1 Age
        5.2.2 Length Of Time In Canada
        5.2.3 First Language
        5.2.4 Sex
VI. SUMMARY, CONCLUSIONS, AND IMPLICATIONS
    6.1 Summary
        6.1.1 The Factor Analyses And Interpretation
        6.1.2 The Demographic Variables
        6.1.3 Age
        6.1.4 First Language
        6.1.5 Length Of Time In Canada
    6.2 Conclusions
    6.3 Implications And Suggestions For Future Research
        6.3.1 The Correlation Of The Factors
        6.3.2 Age
        6.3.3 Hearing
        6.3.4 First Language
        6.3.5 Length Of Time In An English Speaking Environment
BIBLIOGRAPHY
APPENDIX A - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM LISTENING-STRUCTURE TEST USED IN PILOT STUDY AND MAIN RESEARCH
APPENDIX B - EXAMPLE ITEMS FROM THE READING VOCABULARY TESTS USED IN THE PILOT STUDY AND THE MAIN RESEARCH
APPENDIX C - INTRODUCTION AND SAMPLE ITEMS FROM LISTENING VOCABULARY TEST USED IN PILOT STUDY AND MAIN RESEARCH
APPENDIX D - EXAMPLE ITEMS FROM THE READING GRAMMAR TESTS USED IN THE PILOT STUDY AND THE MAIN RESEARCH
APPENDIX E - EXAMPLE OF CONVERSATION COMPLETION TYPE OF SUBTEST USED IN ASSESSMENT BATTERIES
APPENDIX F - EXAMPLE OF ERROR CORRECTION TEST FORMAT USED IN ASSESSMENT BATTERIES
APPENDIX G - INTRODUCTION, SAMPLE ITEM, AND SAMPLE ANSWER SHEET FROM LISTENING COMPREHENSION TEST USED IN MAIN RESEARCH
APPENDIX H - ORAL INTERVIEW GUIDELINES AND SAMPLE SCORING SHEET
APPENDIX I - COMPOSITION MARKING GUIDE
APPENDIX J - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM PHONEME DISCRIMINATION TEST
APPENDIX K - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM THE BOWEN-FORMAT LISTENING TEST
APPENDIX L - EXPERIMENTAL LISTENING TEST USED IN PRELIMINARY STUDY
APPENDIX M - THE AUXILIARY FACTOR ANALYSES
APPENDIX N - CORRELATION MATRIX OF ALL VARIABLES

List of Tables

I. Breakdown of sample by first language
II. Breakdown of cases by sex
III. Summary statistics for tests used for interpretive purposes
IV. Summary statistics of subtests in the assessment battery
V. Subtest-test correlations of Concom with previous assessment batteries
VI. Correlations of previous oral assessments with progress assessment batteries
VII. Summary of ratings and computed score used for composition grade
VIII. Summary statistics for three supplementary tests
IX. Descriptive statistics of subtests in preliminary research
X. Varimax rotated factor solution for seven English tests in preliminary tests, level B4 (n = 73)
XI. Descriptive statistics of subtests in pilot study (n = 60)
XII. Varimax rotation of PC solution for 7 subtests in the pilot study (n = 60)
XIII. Image analysis followed by Harris-Kaiser (Independent Clusters) on full (n = 181) data set
XIV. Image analysis followed by Harris-Kaiser (Independent Clusters) on Chinese subjects (n = 121)
XV. Image analysis followed by Harris-Kaiser (Independent Clusters) on 60 non-Chinese subjects, retaining three factors
XVI. Correlations of age and length of time in Canada (LOT) with language measures
XVII. Analysis of means grouped by language
XVIII. Analysis of means, subjects grouped by sex
XIX. Factor by factor comparison of subsets of the variables (principal components followed by varimax rotation)

Acknowledgement

I would like to express my sincere thanks to all those people who contributed to this work. In particular I would like to thank Dr. Jamie Patrie who suggested that I attend the First Annual TESOL Summer Institute, Dr. Bernard Mohan who supported my application, and Dr. John W. Oller, Jr.
who treated me to the most inspiring six weeks of my academic career. I would also like to thank Graham Evans, Geoffrey Flack, Tracy Johnson, Sherie Kaplan, Deborah Messenberg, Lucille Milligan, Donna McGee and Margaret Thompson for donating class-time for the administration of the various tests and Dr. Todd Rodgers for his valuable advice on statistical matters. I must especially thank Dr. J. Belanger who kept on saying "This looks good, but why don't you recast...?" Finally, I wish to thank Atsuko and Ian who put up with three years of "I'm off to the computer centre."

I. INTRODUCTION TO THE STUDY

1.1 Background

In the last fifteen years, there has been some controversy over the appropriate model to use in the construction of language proficiency tests. Several theorists (cf., Cooper, 1968; J.B. Carroll, 1968; B.J. Carroll, 1980; Canale and Swain, 1980) have proposed complex models of proficiency which divide language ability according to skills components such as reading, writing, and speaking, and linguistic or socio-linguistic components such as grammar, phonology and register. However, until recently, little research had been done to validate these components. Harris (1968) asked:

    What evidence do we really have, for example, to justify the neat division of most language tests into listening-speaking-writing-grammar components as the most accurate and efficient means of evaluating language 'competence'? (p. 44)

Oller (1976a, 1976b, 1979a), questioning not only the validity of the components but also the validity of the division of language competence, outlined three hypotheses which he felt would have to be supported or refuted prior to real validation of any particular components.
Simply stated (see Chapter II for further explication), the alternative hypotheses were

H1) Language proficiency is divisible into unrelated components (the model of separate traits).
H2) Language proficiency is not divisible into unrelated components (the extreme unitary model).
H3) Language proficiency is partially divisible (the model postulating a large general factor accompanied by several specific factors).

Initially, Oller (1979b) felt that the third choice would be the most 'parsimonious' model (i.e., the simplest model capable of explaining the most variance). However, as a result of initial research on the hypotheses and his own pragmatic[1] approach to language, he began to advocate the second hypothesis, the extreme unitary trait model. Very recently, though, as a result of reassessment of some of the earlier research and as a result of research by Bachman and Palmer (1981, 1982), he has renounced the extreme unitary trait model (Oller, 1981a). Bachman and Palmer (1981) say:

    As Oller (1981, forthcoming) has indicated, there now seems to be a consensus among researchers that models including both general and specific factors will provide the best explanations for language test data. (p. 450)

As yet, however, no trait other than the general trait has received strong construct validation through repetition of research on different samples of subjects. In fact, successful replication of research on different samples will be difficult since, as Powers (1982) points out, it is likely that the number and nature of underlying factors found in any set of data will depend to a significant extent on such non-linguistic variables as native language, level of proficiency of the group, and, of course, the content of the tests. Thus, despite the reported consensus that H3 (the hypothesis of a large general factor and several smaller specific factors) will provide the best explanation of language test data, research investigating such data must, for the time being, begin with consideration of all three hypotheses and the implicit call for empirical support of theory.

[1] The term 'pragmatic' here refers to the study of relationships between expressions in a formal system and things external to the system (Oller, 1978; see also Ingram, 1978).

1.2 Overview Of Experimental Procedures

The purpose of my research was to explore Oller's divisibility hypotheses concerning language proficiency (listed above) and the concomitant problem of the construct validity of certain components of that proficiency. More particularly, the present study consisted of a search for evidence of underlying factors or relevant, interpretable sources of variance in language proficiency measures. Following analysis of data gathered in a preliminary study, four tests which had clearly recognizable characteristics of content (specifically grammar in contrast with vocabulary) and mode (listening in contrast with reading) were constructed to identify linguistic and method related sources of common variance in a correlation matrix of language proficiency variables. These four measures were pilot tested, revised, and subsequently administered in conjunction with eight other language measures to a group of beginning-level ESL[2] learners at a local college. The data which were gathered were factor analysed to explore the validity of the constructs of mode and content as underlying sources of variance. In addition, the relationship between the tests and the demographic variables age, sex, length of time in English speaking Canada (Lot) and first language was also investigated.

[2] English as a second language.
The underlying relationships among the language measures and the demographic variables of age and Lot were investigated using correlation and factor analysis. The effects of sex and first language (broadly defined as Chinese or not Chinese) on the language measures were investigated using differences of means and their associated significance levels. The study of first language was extended by deriving factor solutions for the two sub-groups, Chinese and non-Chinese, as well as for the entire sample.

1.3 Definition Of Terms

1. 'Vocabulary' is used to designate the tests with items in which the linguistic relationship among the stem, the correct choice, and the distractors is one of word meaning rather than syntax, phonology or orthography.

2. 'Structure' is used to designate tests with items in which the linguistic relationship among the stem, the correct answer and the distractors is one of syntax rather than word meaning, phonology, or orthography.

3. 'Listening-mode' refers to the fact that the test was presented entirely on tape with no printed component (other than numbers or letters to indicate choices).

4. 'Reading-mode' refers to the fact that the test used printed materials only with no aural component involved during the administration.

5. Throughout this paper short labels are used to designate the variables in the research. Following is a brief explanation of these labels. Full descriptions of the tests are given in Chapter III, and examples are found in Appendices A to K.

    A. Comp -- a composition task. The students write a short narrative.
    B. Readstru -- a reading mode, multiple-choice test that focuses on structure.
    C. Liststru -- a listening mode, multiple-choice test that focuses on structure.
    D. Readvoc -- a reading mode, multiple-choice test that focuses on vocabulary.
    E. Listvoc -- a listening mode, multiple-choice test that focuses on vocabulary.
    F. Concom -- a conversation completion task. Students fill in the blanks in a dialog.
    G. Oral -- an oral interview.
    H. Errcorr10 -- an error correction task that has 10 items.
    I. Listcomp -- a listening comprehension test.
    J. Listbow -- a listening mode test that was developed from a format proposed by Donald Bowen (1975).
    K. Listphon -- a listening mode, multiple-choice test that focuses on phoneme discrimination.
    L. Errcorr20 -- an error correction task that has 20 items.

6. 'Progress assessment battery' refers to a set of tests given to the students at the end of each term. The format and content of these tests are changed from administration to administration. The battery is used to help determine who is ready for the next level in the language program and who appears to need more work at the present level.

1.4 Questions And Areas Of Exploration

The questions in this study fall into three categories. First is the central problem of the divisibility of language proficiency. As I will explain in the following section, this is the problem of determining how many of the statistical factors generated by a factor analysis solution are of theoretical importance. Second is the problem of interpretation.
Seven questions are presented which will aid in interpreting the factors in terms of the content and mode of the salient variables. The third category contains four subsidiary areas of exploration regarding the relations between the four demographic variables (age, length of time in Canada, sex, and first language) and the tests.

1.4.1 Divisibility

The 'divisibility problem' as presented by Oller (1976a, 1976b, 1979a) is the question of whether language proficiency is better modelled as a single trait, the sum of several independent traits, or the sum of a single dominant trait and several subsidiary traits. The focus of his hypotheses, when restated in operational terms, is on the common or shared variance of language tests and whether there is a single general factor, several factors, or a large general factor and several minor ones. The implication is that there is a correct choice among the three hypotheses. Cattell (1958) offers a different psychometric viewpoint of factors and their influence on manifest variables:

    In certain real but special cases it may be that a quite restricted, finite number of factors are actually operative, defining the characteristics of a population species, over the particular set of variables chosen; but in general, in our interacting universe, an infinite number of factors can and do influence the given objects and their dimensions, though the variance from a majority of the factors would be extremely small. (p. 802)

By taking this point of view in the present research, I have altered the focus of the question from "Is language divisible and, if so, how many factors are there?" to "How many of the myriad underlying factors are important?" or, as Hakstian and Muller state it:

    The number of factors problem in this case, reduces to the task of identifying those factors whose influence is great upon the variables sampled from the domain of interest and those whose influence while real is slight. (1973, p. 461)

Therefore, the operational form of the primary question I have asked in this research is

1. Using the factor analytic strategies and techniques outlined by Hakstian and Bay (1973), what is the number of factors that should be retained when analyzing the correlation matrix of language and demographic variables?

1.4.2 Interpretation

In a solution which indicates that there is more than one factor significantly influencing the variables, a problem as important as the number of factors is their interpretation. Interpretation involves looking at the patterns of high and low coefficients on the rotated factors and relating them to theoretically important aspects of the variables. In this research, the important aspects of the tests are the content and the mode. The following questions are central to interpretation in this research:

1. Do the tests designated as vocabulary tests (Readvoc and Listvoc) primarily load on the same factor?
2. Do the tests designated as structure tests (Readstru, Errcorr10, and Errcorr20) primarily load on the same factor?
3. Do the group of tests designated as structure tests and the group of tests designated as vocabulary tests cluster on different factors?
4. Do all of the reading-mode tests load on a single factor?
5. Do all of the listening-mode tests load on a single factor?
6. Do the listening-mode tests and the reading-mode tests cluster on distinct factors?
7. Do the intuitively complex (in terms of content) tests (Comp, Oral, and Concom) show factorial complexity by loading on more than one factor?
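
The interpretation questions above amount to simple checks on a table of rotated factor coefficients. The following is a minimal sketch of such checks and not the procedure used in this study: the coefficients are invented purely for illustration, the 0.30 salience cutoff is an assumed convention, and the helper names (primary_factor, salient_factors, share_primary) are hypothetical. Only the test labels are taken from the definitions in section 1.3.

```python
# Hypothetical rotated coefficients on three factors (F1, F2, F3).
# These numbers are invented for illustration; they are NOT results from the study.
LOADINGS = {
    "Readvoc":   [0.10, 0.62, 0.05],
    "Listvoc":   [0.08, 0.55, 0.20],
    "Readstru":  [0.58, 0.12, 0.02],
    "Errcorr10": [0.64, 0.05, 0.10],
    "Errcorr20": [0.61, 0.09, 0.07],
    "Comp":      [0.41, 0.38, 0.03],
}

SALIENT = 0.30  # assumed cutoff for calling a coefficient "salient"


def primary_factor(test):
    """Index of the factor with the largest absolute coefficient for a test."""
    row = LOADINGS[test]
    return max(range(len(row)), key=lambda j: abs(row[j]))


def salient_factors(test):
    """Indices of all factors whose absolute coefficient reaches the cutoff."""
    return [j for j, value in enumerate(LOADINGS[test]) if abs(value) >= SALIENT]


def share_primary(tests):
    """True when every listed test has the same primary factor."""
    return len({primary_factor(t) for t in tests}) == 1


vocabulary = ["Readvoc", "Listvoc"]
structure = ["Readstru", "Errcorr10", "Errcorr20"]

# Questions 1 and 2: does each content group load primarily on one factor?
print("vocabulary on one factor:", share_primary(vocabulary))
print("structure on one factor:", share_primary(structure))

# Question 3: do the two content groups sit on different factors?
print("groups on different factors:",
      primary_factor(vocabulary[0]) != primary_factor(structure[0]))

# Question 7: is a complex test salient on more than one factor?
print("Comp salient factors:", salient_factors("Comp"))
```

The mode questions (4 to 6) would follow the same pattern with the tests grouped by listening-mode and reading-mode instead of by content.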

1.4.3 Subsidiary Questions

During the development and administration of the tests used for the investigation of the divisibility hypotheses, I became concerned about whether or not certain characteristics of the subjects in the sample were associated with scores on specific measures. Therefore, I also explored the relationships between test scores and the four demographic variables: age, sex, first language, and the length of time the subjects had been in Canada. Although each of the questions was motivated by concerns regarding particular influences in specific tests, these initial problems are not amenable to hypothesis testing because of the post hoc nature of the analysis. Thus, I asked the more general question: Does it seem likely that these demographic variables influenced the factor analysis of the data? The aim of this section is not to test particular hypotheses but to extend the exploration of the main area of investigation to other areas which could be fruitful for further research. The following are the specific concerns and the more general questions that were generated as a result of these.

1. One of the processes commonly associated with aging is a loss of hearing. The importance of this fact in the context of this study is the possibility that the listening tests could cluster together as a result of differences in hearing ability rather than because they are measuring a 'listening-skill' dimension. That is, what might be construed as a listening-skill factor might in reality be a hearing factor. If age can be considered as an indirect measure of hearing and if hearing
If age can be considered as an indirect measure of hearing and i f hearing 1 0 loss is the only age-related factor influencing the variables, then the expected pattern in the data would be for age to have negative correlations associated with the l i s t e n i n g tests and zero or close to zero correlations with the paper and pencil tests. In order to investigate this in the larger context of a possible over a l l age-related factor, the correlations of age with the language variables were inspected, and factor solutions with and without age were compared.3 2. Possible bias against female subjects prompted examination of the effect of sex on the variables. The items in the two vocabulary tests were composed by a male, and I f e l t that this might have resulted in a predominance of words that were more familiar to male subjects than to female subjects. To explore t h i s problem, I calculated the means on the tests for each group and the associated t-test of the difference of means and compared results for the vocabulary tests with the results for the rest of the data. 3. The large proportion (70 per cent) of Chinese speakers in the sample and in the population at large ( i . e . , students in the beginning l e v e l at the community college) suggested the need 3 The i n i t i a l motivation for investigating age as an influencing variable was my concern that older students, because of a natural loss of hearing, were being discriminated against in the listening-mode tests. It has been my experience that a number of students, p a r t i c u l a r l y older ones, exhibit behavior which could be associated with a p a r t i a l loss of hearing. A simple example is the tendency of some older students not to repeat high frequency /s/ and / z / sounds at the end of words when presented with the words o r a l l y . When the words are presented in written form, the same students have no trouble producing the correct sound. 
11 to examine the relationship between f i r s t language and performance on the tests. Two possible problems might be associated with such an unbalanced sample. The f i r s t problem is that second language acquisition may be i n t r i n s i c a l l y related to f i r s t language. Swinton and Powers (1980) suggest that d i f f e r e n t factor solutions (both in number of factors and interpretation) are associated with d i f f e r e n t language groups. If this were true, then combining the large homogenous group of Chinese speakers with the more heterogenous remainder would obscure rather than c l a r i f y the factor solution. The second problem results from the possible cumulative effects of methods used in constructing and revising several of the measures. Listvoc, Readvoc, and L i s t s t r u are composed of items that have been tested and revised in successive administrations to samples from the same general population as the research i t s e l f . ( L i s t s t r u in particular was composed of items that had undergone several revisions.) Since the majority of the subjects in the preliminary study, the p i l o t study, and the main research were Chinese, I was concerned that these three tests could have developed a bias against" that group. If th i s were true, and i f the bias were the only factor influencing this u In multiple-choice test construction d i s t r a c t o r s are chosen for their effectiveness in leading poorer students away from the correct choice. One measure of effectiveness i s the number of students who choose the p a r t i c u l a r d i s t r a c t o r . Since i t is possible that some distra c t o r s are more ef f e c t i v e against some language groups than others, those that are e f f e c t i v e against the Chinese would be chosen rather than those which aren't simply because of the larger proportion of Chinese in the samples. 
1 2 d a t a , then the means f o r the Chinese s p e a k e r s would be lower than the means f o r the non-Chinese speakers on these t h r e e t e s t s but not on the o t h e r s . The p o s s i b i l i t y of d i f f e r e n t f a c t o r s o l u t i o n s f o r the groups i s i n v e s t i g a t e d by computing s o l u t i o n s f o r the f u l l d a ta s e t and f o r the two s e p a r a t e groups, Chinese and non-Chinese and then comparing the r e s u l t s . In a d d i t i o n , the p o s s i b i l i t y of d i f f e r i n g group p r o f i c i e n c i e s i s i n v e s t i g a t e d by comparing the means of the two groups on the t w e l v e language measures. 5 4. The f i n a l a u x i l i a r y q u e s t i o n examines the r e l a t i o n s h i p between time i n Canada and language p r o f i c i e n c y . T h i s q u e s t i o n was prompted l a r g e l y by my c l a s s r o o m e x p e r i e n c e . I t o f t e n seems t h a t , even w i t h i n the same c l a s s , s t u d e n t s who have been i n Canada l o n g e r can c a r r y on more e x t e n s i v e , 'deeper' c o n v e r s a t i o n s than those who have not been here so l o n g . F u r t h e r m o r e , i t has been my i m p r e s s i o n t h a t t h i s depth d e r i v e s from a l a r g e r s t o r e of f u n c t i o n a l v o c a b u l a r y r a t h e r than from a b e t t e r g r a s p of language s t r u c t u r e . T h i s i m p r e s s i o n i s su p p o r t e d by Powers' (1982) i n t e r p r e t a t i o n of the r e s u l t s of the r e s e a r c h on t h e TOEFL (Test of E n g l i s h as a F o r e i g n Language) t e s t t h a t he and Swinton d i d (Swinton and Powers, 1980). 
He sug g e s t s t h a t r e s u l t s show: 5 The a p p r o p r i a t e method of a n a l y s i s of the above two problems would i n v o l v e m u l t i v a r i a t e t e s t s of hypotheses r e g a r d i n g means and v a r i a n c e - c o v a r i a n c e m a t r i c e s such as tho s e suggested by K e n d a l l (1980) and K r i s h n a i a h and Lee (1980). However, the m i s s i n g d a t a and the uneven sample s i z e s make such an approach t e c h n i c a l l y complex, perhaps i m p o s s i b l e and c e r t a i n l y beyond the scope of t h i s r e s e a r c h . 1 3 ...that vocabulary, more than any other component, develops with experience or exposure. (p. 334) To explore t h i s , the correlations between the language measures and the reported length of time a student had been in Canada were calculated and compared. 1.5 Assumptions Certain assumptions are made about the conditions in the research. These are 1. Students did not pass on information regarding the tests to students in the following classes. 2. Students were serious in their attempt on each test. 3. Missing data is determined by random causes. 1.6 Limitations The sample in th i s research seems representative of, but not formally generalizable to, adult ESL students at the beginner l e v e l in community college classes in the Vancouver area. The formal ge n e r a l i z a b l i t y of the s p e c i f i c results of this research must be q u a l i f i e d by parameters from two broad areas: the demographic c h a r a c t e r i s t i c s of the sample and the content and presentation of the curriculum. The importance of demographic parameters in influencing the results of investigations of the d i v i s i b i l i t y hypotheses has been established (Powers, 1982; Swinton and Powers, 1980). 
In the sample used for the present research, the two salient demographic features that will limit generalizability are the predominance of a single language group (70 percent were Chinese speaking) and a broad age range. Results, therefore, may not be generalizable to more heterogeneous language groups, to homogeneous non-Chinese language groups, or to groups with a narrow age range.

The curriculum of the program has a specific grammatical outline and a more general subject area outline (Thompson, 1978). That is, while all instructors cover the same grammatical points, the context (and thus vocabulary) in which they are presented is more varied. The focus of the present research is on the contrast between a grammar factor and a vocabulary factor, and it may be that the program-wide uniformity in structure content combined with the program-wide diversity in vocabulary content will produce distinctions that would not be found in groups involved in other methods of formal instruction.

1.7 Significance Of Study

On the level of theory and construction of language proficiency models, this study will add to the body of knowledge associated with Oller's three divisibility hypotheses in four ways. First, it will add to statistical information that could support or question the psychological validity of the established linguistic distinction between the constructs of grammar and vocabulary, and the pedagogically accepted distinction between the skills of listening and reading. Second, this study has the potential to provide a contribution to future research.
If strong evidence is found to support a specific (rather than a general) construct, and if the construct is reliably measured by any of the variables in the research, then this study could supply a marker variable[6] to be used in subsequent research. Third, if only weak evidence is found, then this study can help the design of new research by indicating which content areas should be studied further and which test formats are most likely to become more effective through revision. Fourth, this study introduces non-linguistic variables into the correlation matrix used for factor analysis. If these variables prove useful in clarifying relationships between linguistic variables, future researchers will be able to design experiments that can control for these sources of variance.

[6] A marker variable is a test or device which is accepted as a measure of a particular construct. These variables are used to link together research in an area by providing established reference points for new research.

1.8 Organization Of The Study

The basis, procedures and results of this study are presented in six chapters.

Chapter I. An introduction to and overview of the study. This chapter presents the problem, a summary of the background, the questions and areas of exploration, the assumptions, limitations and a statement of the significance of the study.

Chapter II. A review of the related research. This chapter briefly outlines several language testing models, the divisibility hypotheses and related empirical research, and the relation of the statistical tool used, factor analysis, to the process of validation of theory.

Chapter III. The experimental procedures. This chapter describes the sample, the language measures, and the demographic variables.
It also outlines the procedures used in compiling and preparing the raw data and the statistical methods used in the study.

Chapter IV. Preliminary analysis and pilot study. This chapter outlines the results of a preliminary analysis and a pilot study, both of which were used to design the present study.

Chapter V. Summary of the findings. This chapter presents a summary of the results of the factor analyses and the exploratory consideration of correlations and differences in means.

Chapter VI. Conclusions and implications for further research. This chapter presents an overall review of the study, the conclusions I have drawn from the results, and some suggestions and implications for further research.

II. BACKGROUND

Models that propose a variety of domains or components of language proficiency (J.B. Carroll, 1968; Cooper, 1968; Canale and Swain, 1980; B.J. Carroll, 1980) provide theoretical surveys on which to base tests. To support the validity of any particular model, there must be evidence that the components are valid constructs. Oller's formulation of three hypotheses (1979) concerning the apportionment of variance in a battery of language tests represented a call for the empirical research that he and others had recognized as being sparse (Harris, 1968; Upshur, 1976; Ingram, 1979). However, the problem of disentangling and identifying the numerous sources of variance that influence language test performance extends beyond language proficiency to the tests themselves and to the subjects who take the tests. Results so far indicate that not only may test format (Farhady, 1979) and test method (Bachman and Palmer, 1981, 1982) contaminate research results but that sample and individual variables such as first language (Swinton and Powers, 1980), first language proficiency (Johansson, 1973), and intelligence (Flahive, 1980) may do so as well.
A further source of complexity is the diversity of factor analysis techniques commonly used in the analysis of test battery data.[7] An investigation of Oller's hypotheses, then, entails consideration of the theoretical models of second language testing, the effect of the actual instruments on the results, the characteristics of samples used in research, and the methods used to analyze the data.

[7] One textbook on the subject (Harman, 1976) discusses nine different methods of obtaining an initial solution and twelve methods of transforming these in order to obtain better interpretability.

The problems of what to test and how to test when measuring second language proficiency have been addressed by many theorists in the field of second language education (cf. Harris, 1968; Oller, 1976a, 1979a; Allen and Davies, 1979; Canale and Swain, 1980; Davies, 1968). In general, the specifications that are drawn up state explicitly several separate skills and components, and the implication is that it is important to take samples from each of these areas independently in order to obtain a complete profile of the learner's language proficiency(ies).
Most, if not all, textbooks on second language testing make divisions according to skills (reading, writing, speaking, listening) and some aspect of language itself such as syntax, morphology or semantics (see for example Allen and Davies, 1979; Harris, 1968; Vallette, 1977). However, the nature of the skills or components often differs, depending on whether the viewpoint of the theoretical model of language that underlies the test is psycholinguistic, sociolinguistic, pragmatic, functional-notional or otherwise. As Davies (1968) points out when discussing the validity of a second language test:

    It is the test constructor's assumptions in language learning that are really being analyzed. A good test is a device for framing these assumptions.... (p. 10)

J.B. Carroll (1968) focuses on language as behaviour and stresses the necessity of sampling from broad classes of stimuli and responses. He makes a distinction between productive and receptive skills and gives examples of such areas "...in which individual differences are to be sought or measured...." (p. 51) as lexicon, grammar, and phonology. In keeping with his behaviouristic approach, he presents an extensive taxonomy of possible responses to various tasks as an example of how to cover the domains of interest.

Cooper (1968), drawing from sociolinguistic theory, adds a third dimension to the usual two-dimensional, skills-by-language-component matrix used by many when proposing specifications for a test.
Along one axis are the usual categories of skills (reading, auditory comprehension, etc.). Along a second are placed the commonly found divisions of language aspect (morphology, syntax, etc.). As an extension of this second axis, which he labels 'Knowledge,' the concept of context is added. Finally, he proposes a third axis or dimension, Language Variety, which then provides "...84 logically distinct 'cubes' each formed by the combination of a skill, a variety, and a type of linguistic or communicative knowledge" (p. 64).

Recently two other complex models for testing language proficiency (in particular communicative competence) have emerged. Canale and Swain (1980) have proposed a theoretical framework which differentiates numerous aspects of communicative competence. They outline three general areas to be tested: grammatical competence, sociolinguistic competence, and strategic competence. Each general area also contains subcategories such as rules of morphology, rules of syntax, sociocultural rules and rules of discourse. Each one of these would need to be sampled in a testing situation. B.J. Carroll (1980), drawing extensively on work by Munby (1978), outlines a variety of functional parameters which must be considered when drawing up the specifications for test content. These include purpose, setting, interaction, dialect and units of meaning. Although these models may have intuitive and logical power, there has not been any strong empirical support for one over another.

In the mid 1970's, Oller began to question this lack of empirical support for the validity of the various components in the different models. Aiming at the logical precedent for such models, he proposed two hypotheses regarding the divisibility of language (Oller, 1976b).
In hypothesis one, he suggests that language is amenable to division into unique cells defined by various skills and their intersection with linguistically posited components such as syntactic, semantic, phonological or communicative competencies. His second hypothesis is that the opposite is true: language proficiency is not divisible into subcomponents or skill areas. Later, in response to Upshur's (1976) suggestion of a logically possible middle ground, Oller (1979a) expanded the number of hypotheses to three. His third hypothesis states that language ability is partially divisible, with a large part of it taken up by a central core (perhaps to be called global proficiency), but also includes unique skills or components which account for some of the differences between people. Placing the hypotheses in the context of expected results from language tests which attempt to measure unique abilities in supposed components or skill areas, he summarizes the hypotheses as follows:

    The Divisibility Hypothesis (H1): there will be reliable variance shared by tests that assess the same component, skill, aspect, or element of language proficiency, but essentially no common variance across tests of different components, skills, aspects, or elements;

    The Indivisibility Hypothesis (H2): there will be reliable variance shared by all of the tests and essentially no unique variance shared by tests that purport to measure a particular skill, component, or aspect of language proficiency;

    The Partial Divisibility Hypothesis (H3): there will be a large chunk of reliable variance shared by all of the tests, plus small amounts of reliable variance shared by only some of the tests. (1979, p. 426)

Despite the explicit description of separate components in such models as those of Carroll or Cooper summarized above and the strong implied endorsement of H1, many theorists appear to have accepted the existence of some major latent trait that may contribute holistically to language proficiency; that is, they give tacit support to H3. For example, Cooper (1968) does not insist that his "cubes" define operationally distinct or orthogonal constructs. He suggests that some may co-vary. Carroll (1968), whose model Cooper elaborated upon, acknowledges the existence of such a trait and links it to the strong Verbal factor found in many factor analytic studies done on various mental measurement batteries.

Oller, however, tended to support the unitary trait model (H2). Much of the research on the hypotheses focussed on factor analytic studies of batteries of tests given to groups of foreign students in the United States. In many cases, these studies appeared to find a single general factor which accounted for most if not all of the reliable variance in such diverse instruments as achievement tests, intelligence tests, and a variety of English language tests (Scholz et al., 1980; Flahive, 1980; Hendricks et al., 1980; Oller and Hinofotis, 1980; Scholz and Scholz, 1979; Stump, 1978; Strieff, 1978). Oller's interpretation of those results led him to a rejection of H1 and H3 and consequently of the type of tests which purport to test minute components of English (often called discrete point tests). He turned instead to the more integrative types of test which recognize "...the pointlessness of attempting to isolate the components of phonology, morphology, phrase structure, transformational rules, semantics and pragmatics." (1979, p. 25)

Recently, however, as a result of both reexamination of design flaws in earlier studies and the emergence of new research, Oller has changed his point of view. He (Oller, 1981) now suggests that a method of factoring based on the classical factor model rather than the method of principal components should have been used for previous analyses.[8] In addition to this weakening of the empirical grounds on which Oller based his earlier position, recent research has presented strong evidence which supports H3. Bachman and Palmer (1981), using confirmatory factor analysis (Joreskog, 1969, 1978), have shown that, for their data, a model which includes a general factor, a reading factor and a separate speaking factor is statistically superior to the unitary model favoured by Oller. In response to these findings, Oller (1981a) states:

    The research of Bachman and Palmer has eliminated the strong version of the unitary factor hypothesis. The position that I took in several earlier publications in regard to the possibility that such a factor might prove to be the best explanation for pragmatic language processing tasks in general (Oller 1978; Oller and Hinofotis; Oller 1979, Appendix), has been proven wrong. (p. 141)

[8] For a discussion of the difference see Harman, 1976, Chapter Two. For a brief explanation of the particular consequences of using the one instead of the other see Oller, 1981a.

Rejection of H2 and acceptance of H3 does not simplify the search for a generalized language proficiency model. It complicates the problem by establishing the large number of hypothesized language components and skills as legitimate objects of research. As Oller (1981a) points out:

    What Bachman and Palmer have succeeded in showing is that there are undoubtedly significant factors in language proficiency tests beyond the well-established general factor.
    The number and exact nature of those additional factors, however, remains largely obscure. (p. 130)

Judging from the various findings of previous related research, the number and nature of factors found in future research will depend on what is looked at, who is looked at, and how the data are analyzed. In the research that prompted Oller to reconsider his position, Bachman and Palmer (1981) produced evidence to support a speaking and a reading factor. In a separate piece of research using a similar design, they (Bachman and Palmer, 1982) found evidence of two correlated trait factors, which they labelled 'grammatical/pragmatic competence' and 'sociolinguistic competence', and two method factors, writing and interview, which were uncorrelated. In a study of alternate items for the TOEFL test, Pike (1979) reports finding three groupings or clusterings of scores: listening comprehension, English structure, and writing ability.

A study by Swinton and Powers (1980), which also supports the hypothesis of divisible language proficiency, introduces the complication that the results of an analysis may be closely related to the characteristics of the subjects in a sample. The study is impressive because of its size and design, and interesting because of its diverse results. The researchers factor analyzed[9] the 149-item correlation matrices derived from data obtained in the administration of the TOEFL test to seven language groups (African, Arabic, Chinese, Farsi, Germanic, Japanese, Spanish). The samples contained from 600 to 998 subjects. The different solutions established that at least three factors were necessary in each solution. Furthermore, all solutions supported the concept of a separate listening factor and to some extent there was agreement that a vocabulary factor was present. However, the number and interpretation of the other factors tended to differ depending on the particular language group being analyzed. On the basis of the differing means between the language groups, Swinton and Powers linked these factor differences to overall proficiency rather than first language. They proposed that:

    One hypothesis that could be investigated is the extent to which separate factors (or components of variation) are more likely to emerge as the overall language proficiency of the sample increases. (p. 15)

Such a proposal raises the possibility that a general model of language proficiency would need to be dynamically conditioned, rather than statically defined, over the range of second language proficiency.

[9] They used a MinRes initial solution. See Harman, 1976.

In addition to sources of variance associated with mode, linguistic component, and possibly first language or general proficiency, a variety of other non-linguistic links to performance on second language tests have been found. Flahive (1980) found strong positive correlations between several language proficiency tests and scores on a non-verbal intelligence test (Raven's Progressive Matrices). Johansson (1973) found a correlation between first language performance and second language performance.
Gardner (1982) reported that affective variables measured in the Attitude/Motivation Test Battery predicted French grades (median correlation .37). The Swinton and Powers (1980) study cited earlier found a positive correlation between the factor defined as 'vocabulary' and the two variables age and undergraduate vs. graduate matriculation status. Powers (1982) indicated that this suggested

    ...that this (vocabulary) dimension of variance was both reliably determined and distinct from the other factors. (p. 333)

The importance of these non-linguistic sources of variation in a set of data that will be analyzed is that they may, if ignored, obscure true relations between linguistic variables or create spurious ones.

As noted earlier, Oller (1981a) indicates that the method of factor analysis used in any particular research will also affect the nature of the solution and its interpretation. Generally, research that has addressed the validation of components of language has used factor analysis (Swinton and Powers, 1980; Scholz et al., 1980; Oller and Hinofotis, 1980; Bachman and Palmer, 1981, 1982). Factor analysis, however, is a general term covering a variety of statistical techniques which are used to discover an underlying factorial composition of a data set. A factor analysis usually follows a sequence.
First, an initial solution is derived which establishes some estimate of the dimensions of the factorial space of the data. That is, it provides a general idea of the number of important underlying common factors acting in the data set. The studies by Oller and Hinofotis (1980), Scholz et al. (1980) and Hendricks et al. (1980) used the method of principal components, which was subsequently criticized. Swinton and Powers (1980) and Pike (1979) used a minimum residuals method while Bachman and Palmer (1981, 1982) relied on a maximum likelihood initial solution. Both of these latter methods are based on the classical factor model (see Harman, 1976).

Often these initial solutions are, as Hakstian and Bay (1973) suggest, "interpretively useless" (p. 29) since they do not clearly indicate the relationships between the factors and the variables. In order to provide some meaning to the factors, the axes of the space must be shifted while the projections of the variables remain stable in their relation to one another. This is referred to as a transformation (Hakstian and Bay, 1973).

This sequence of an initial solution followed by a transformation is often repeated with different numbers of factors before a preferred solution is found. Which type of transformation is eventually chosen will depend on the relative simplicity of the structure of the solution and whether the solution is meant to be exploratory or confirmatory. Harman (1976) has pointed out that "...a given matrix of correlations can be factored in an infinite number of ways." (p. 4) The important points in choosing a preferred solution are, he says, statistical simplicity[10] and scientific meaningfulness. Or, as stated by Hakstian and Bay (1973):

    The guiding principle in such transformation is Thurstone's notion of simple structure, or the idea that each factor should be interpretable in terms of (or have high loading by) a small number of variables, with the remaining variables relatively free of the influence of (or loading near zero on) that factor. (p. 29-30)

[10] Harman gives a more detailed outline of the concept of simple structure. (1976, p. 97-98)

To further aid in the choice of a solution, Hakstian[11] has divided factor analytic research into:

    that motivated by either taxonomic or explanatory interests on the part of the investigator... The taxonomic view of factor analysis regards factors as merely convenient groupings or clusters of variables -- groupings that carry little construct validity or epistemological status. The explanatory view of factors and factor analysis, on the other hand, regards factors as causal agents -- valid and replicable constructs that determine the covariation among the phenotypic constructs in the domain of interest. (p. 16)

[11] Hakstian and Bay, 1973. See also Hakstian and Muller, 1973.

Research which concerns the construct validity of various components of language usually takes the 'explanatory' view of factors.

Factor analysis, both taxonomic and explanatory, can also be divided into 'exploratory' and 'confirmatory.' Philosophically, the difference is whether or not the researcher approaches the data with a hypothetical structure in mind and wishes to confirm it. Statistically, the two are different in that in an exploratory analysis "...the shape of the final solution is not influenced by conditions outside of the analysis" (Hakstian and Bay, 1973, p. 69). In confirmatory factor analysis, on the other hand, constraints are put on the solution by setting a target matrix which embodies a theoretical model proposed by the researcher. Joreskog (1969, 1978) outlines one method of determining the statistical significance of the closeness of fit of such a solution.

That there is a difference between the philosophical and statistical understanding of the term 'confirmatory' is important in the context of language research. Bachman and Palmer (1981) tested "...over 20 different causal models...." (p. 78) against their data. That is, they were using a statistical confirmatory analytic technique in an exploratory manner,[12] not 'confirming' the validity of a pre-existing theory.

In summary, the results of research on the divisibility hypotheses and the problem of the validation of language proficiency constructs are characterized by four themes.
First, there are the various theories (linguistic, sociolinguistic and psycholinguistic) of competence and performance. These, of course, determine the nature of the actual measuring devices, from which springs the second problem: the identification of method, format, or mode variance in the results. Third, there is the nature of the sample itself. In the multi-dimensional, real universe that the subjects bring with them to the testing environment, a variety of psychological, experiential, and demographic variables have been identified that seem to have significant effects on performance on language tests. Finally, there are the methods used in analyzing the data. Different approaches have been shown to bring different results and interpretations (see in particular Oller, 1981); as yet no single method can be said to have the unqualified support of all in the field. Research which seeks to discover the nature of language proficiency and whether it is unitary, divisible, or dominated by a global trait but also composed of subsidiary, specific traits must address all four of these themes.

[12] This is not meant to be criticism of their work. Joreskog (1978) points out that in some cases exploratory techniques may actually obscure particular types of structures within the data.

III. DESIGN AND PROCEDURES

3.1 Population

The subjects in the study were E.S.L. students in a metropolitan community college.
They were adults (18 years and older) from a variety of l i n g u i s t i c , c u l t u r a l and educational backgrounds. Analysis of the demographic variables shows 14 di f f e r e n t languages (see Table I ) , and an approximately even (55 percent male: 45 percent female) d i v i s i o n of sexes (see Table I I ) . The age ranged from 19 to 73 years old. Table I - Breakdown of sample by f i r s t language Language Frequency 1 . Chinese 121 2. Vietnamese 22 3. Japanese 4 4. Punjabi 4 5. Spanish 4 6. Gujarati 3 7. Greek 2 8. Korean 2 9. Portuguese 2 10. Hindi 1 11. I t a l ian 1 12. Polish 1 13. Russian 1 14. Tagalog 1 15. (non-chinese) 9* Missing 2 (* These cases were known to be non-Chinese but the actual language was not known) 32 Typical previous groups included a wide range of backgrounds: farm workers with l i t t l e or no education and professionals such as doctors, dentists, or engineers. The majority of the students are immigrants or Canadian c i t i z e n s . People who have come to Canada on student or v i s i t o r ' s visas are not permitted to enrol in thi s program. Some of the subjects may hold diplomatic visas. Table II - Breakdown of cases by sex MALE N % 93 (55) Missing cases: 12 (not reported) FEMALE N % 76 (45) The subjects' a b i l i t y in English can best be i l l u s t r a t e d with an outline of the hierarchy of the entire program. The college program has three l e v e l s : beginners, intermediate and advanced. In each l e v e l there are two sub-levels, lower and upper. Those students who wish to go on to study in content areas in colleges or u n i v e r s i t i e s generally have to take another year of language studies beyond the 'advanced' l e v e l . In summary then, there are six sub-levels or steps leading from zero proficiency through to a le v e l before college preparation. The data were gathered from students at the second sub-level 33 (Upper Beginners). 
Several of the tests in the analysis were those in an assessment battery used to promote students from Upper Beginners to Lower Intermediate.

3.2 The Tests And Demographic Variables

Twelve language proficiency measures and four demographic variables provided the data for the investigation. In the descriptions and explanations which follow, I have grouped the language proficiency measures into four categories. First are four measures (Liststru, Readvoc, Listvoc, and Readstru) which were included in the research to mark contrasts in mode (listening vs. reading) and content (grammar vs. vocabulary). Second are four subtests (Concom, Errcorr10, Listcomp, and Oral) from the progress assessment battery administered to the population at the end of the term. Third is the composition score, which is from data collected in an internal college project on the development of a composition rating scale. In the fourth category are three supplementary tests (Listphon, Listbow, and Errcorr20).

3.2.1 Tests Developed For Interpretive Purposes

Table III presents summary statistics from four tests that were constructed specifically to identify contrasts in mode (listening and reading) and content (grammar and vocabulary) in the interpretation of the results of the factor analysis.[13]

[13] The development and pilot testing of these four measures is outlined in Chapter IV, Preliminary Study and Pilot.

These tests all show moderate reliabilities, ranging from .65 to .74. Low to moderate internal reliabilities may derive either from error variance or from the response of the variable to more than one underlying source of variance (Magnussan, 1967). As Borg and Gall point out (1978), error variance in measures will obscure finer distinctions that would otherwise be made apparent.
Thus, while being a drawback in that they may indicate error variance which is clouding real distinctions among the variables, the low to moderate values of the reliability coefficients will not invalidate any distinctions that are found. If, on the other hand, the reliability estimates have been depressed by factorial complexity of the variables, then this will be revealed in the analysis because the variable will load on more than one factor.

Table III - Summary statistics for tests used for interpretive purposes

                 Mean    s.d.    No. of Items     N      Rel.
    LISTSTRU     12.3    3.7          28         155     .66
    READVOC      16.0    4.1          27         146     .69
    LISTVOC       8.0    3.2          20         167     .65
    READSTRU     18.9    3.9          30         164     .74

Among the tests in Table III, Listvoc is the only test that shows the adverse effect of being too difficult. Since the mean (8.0) is only about one standard deviation (3.2) above the chance score of five, there is probably some error variance being generated by guessing. As noted earlier, this effect would be reflected in the reliability.

1. Liststru (listening-structure) is a multiple-choice English grammar (structure) test in listening mode. (See Appendix A for script of introduction, sample items, and sample answer sheet.) It is an extended and revised version of the multiple-choice listening structure test used in the pilot study and in content is almost identical to Readstru, described below. The prototype of Liststru was simply a reading-mode (paper and pencil) grammar test transformed completely into a listening-mode test with all parts, stem and options, being heard by the subject. This test was included to investigate the effect of mode. If listening mode is a unique source of variance (different from both content and reading mode) then this test should exhibit factorial complexity.
That is, there should be common variance with Readstru and with some other factor that could be identified as strongly related to the mode of listening.

2. Readvoc (reading-vocabulary) is a multiple-choice vocabulary test in reading mode. (See Appendix B.) It was designed to identify the presence, if any, of a "vocabulary" factor underlying the twelve variables. The format and mode are identical to Readstru. Therefore, if format and mode are sources of variance, this test will overlap to some degree with Readstru, even if there is a component of language proficiency that could be labelled 'vocabulary.' The extent of the overlap will give some indication of the strength of mode and format in contrast to content as sources of variance.

3. Listvoc (listening-vocabulary) is the aural form of Readvoc (see Appendix C). That is, it is a multiple-choice English vocabulary test in listening mode. In fact, as pointed out in Chapter IV, Listvoc and Readvoc are merely presentations in different modes of items randomly selected from the same item pool. The test was included to highlight and make interpretable a vocabulary factor if one could be educed. It forms a clear mode/content contrast with Readstru.

4. Readstru (reading-structure) is a multiple-choice English grammar test in reading mode. (See Appendix D.) Like Liststru, it was included in order to identify a grammar or structure factor if one was influencing the set of variables. This test is not a version of the reading-mode structure test described in Chapter IV, although it is very similar. It is one module of a multiple-choice English grammar test that was being developed at the college at the time of the research.

3.2.2 Subtests From The Progress Assessment Battery

Four of the measures used in the research were the four subtests in the progress assessment battery given to students at the end of the term.
Table IV presents the summary statistics for these tests. The reliabilities of Errcorr10 (.75) and Listcomp (.70) are moderate. Relative to the length of the test, the reliability (.75) of the ten-item Errcorr10 is very high. According to the Spearman-Brown formula for correction for attenuation (Ebel, 1974), the reliability of this test would be .90 if made the same length (30 items) as Readstru: tripling the length gives 3(.75) / (1 + 2(.75)) = .90. The reliability is no doubt helped by the test's independence from error variance created by guessing.

Table IV - Summary statistics of subtests in the assessment battery

                  Mean    s.d.    No. of Items     N      Rel.
    CONCOM         7.3    1.9        (10)         181      -
    ERRCORR10      5.9    2.4         10          181     .75
    LISTCOMP      12.8    3.5         20          181     .70
    ORAL          14.4    2.6        (25)         181      -

The two tests (Oral and Concom) which were subjectively graded have no measure of reliability from which to estimate error variance. The narrow standard deviation (1.91) of Concom suggests that the test was not making as clear distinctions among students as the other measures, and consequently low correlations between this test and any others may be as much a reflection of this as of a difference in language dimension. In addition, this test showed a slight ceiling effect, with a third of the students obtaining 90 percent or greater. Both the narrow standard deviation and the ceiling prevent it from displaying an accurate representation of the relationship between this kind of task and content and the others in the analysis. Interpretation of the results of the analysis is tempered by this information.

The oral test does a better job of spreading out students than does Concom. In addition, there is no ceiling effect on the distribution, so these two problems will not be present in the interpretation of the factor analyses or correlations.

1. The test labelled Concom (conversation completion) is a completion type of exercise in which the student writes the answer in a blank (see Appendix E). In this particular type of test the student reads a short introduction which outlines a situation. This is followed by an incomplete dialog in which several of the sentences are replaced by blank lines. The student's task is to fill these blanks with appropriate, grammatically correct (though not necessarily complete) responses.

This test was graded by the students' own instructors, who used the following guidelines for marking. Each blank was assigned an equal percentage of the total. The written responses were first considered for appropriateness. If the response did not follow from or lead into the rest of the dialog, it was given zero for that part. For example, in the following

    a: And how are you today?
    b: ____________
    a: Oh, that is too bad. How long have you felt like that?

if the student wrote "Fine, and you?" then the mark for that response was zero. Similarly, if the response was not comprehensible because of structure, word usage, spelling, or handwriting, the mark was zero. Those parts which did not receive zero were checked for grammatical accuracy and spelling. A single point was removed for each major structural error (incorrect deletion of a verb or subject, wrong tense, word order, etc.); half points were removed for spelling errors or minor structural errors (deletion or insertion of articles, plural or third person 's', countable nouns treated as uncountable and vice versa). A student's score was the sum of the remaining points.
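As an illustration only, the marking guidelines just described can be sketched in a few lines of Python. The names `Blank`, `score_blank` and `score_concom` are hypothetical, and two details are assumptions not stated in the guidelines: the value of two points per blank (the text says only that each blank carried an equal share of the total) and the rule that a blank cannot score below zero.

```python
from dataclasses import dataclass


@dataclass
class Blank:
    """One blank in the dialog, as judged by the marker (hypothetical structure)."""
    appropriate: bool        # response follows from / leads into the dialog
    comprehensible: bool     # readable despite structure, usage, spelling, handwriting
    major_errors: int = 0    # e.g. missing verb or subject, wrong tense, word order
    minor_errors: int = 0    # spelling, articles, plural or third-person 's', etc.


def score_blank(blank: Blank, points_per_blank: float = 2.0) -> float:
    """Score a single blank; points_per_blank is an assumed value."""
    # Inappropriate or incomprehensible responses receive zero for that part.
    if not (blank.appropriate and blank.comprehensible):
        return 0.0
    # One point off per major error, half a point per minor or spelling error.
    remaining = points_per_blank - 1.0 * blank.major_errors - 0.5 * blank.minor_errors
    # Assumption: a blank is never allowed to contribute a negative score.
    return max(remaining, 0.0)


def score_concom(blanks: list[Blank], points_per_blank: float = 2.0) -> float:
    """A student's score is the sum of the points remaining on each blank."""
    return sum(score_blank(b, points_per_blank) for b in blanks)


example = [
    Blank(appropriate=True, comprehensible=True),                   # full marks
    Blank(appropriate=False, comprehensible=True),                  # e.g. "Fine, and you?"
    Blank(appropriate=True, comprehensible=True, major_errors=1, minor_errors=1),
    Blank(appropriate=True, comprehensible=True, minor_errors=1),
]
total = score_concom(example)
```

The sketch makes one property of the procedure visible: appropriateness and comprehensibility act as gates before any grammatical deduction is applied, so a single score mixes several kinds of judgment.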
On the face of it, the combination of the task and the evaluation method clearly leads to complexity, covering reading comprehension, situational proficiency, structure, vocabulary, and spelling. The score was included in the analysis to see if such hypothetical complexity would be borne out statistically.

Unlike the compositions, this exercise has no measure of inter-rater reliability. It was not estimated in the assessment battery procedure, and the original product of the student was not available afterwards for re-evaluation. I felt that if the measure displayed low communality with the other variables and had an erratic or unstable behavior in the analysis then it could be dropped. Experience with similar tests in the pilot and previous administrations of the battery suggested that it would show moderate communality with the other tests (r = .37 to .60; see Table V).

Table V - Subtest-test correlations of Concom with previous assessment batteries

    Date           R        N
    Aug 1979      .60       89
    Feb 1980      .59       73
    May 1980      .68       98
    June 1980     .47      119*
    Aug 1980      .44      108*
    Oct 1980      .37      131*

    (Those tests marked with * are correlated only with the reading/writing subtest total)

2. Errcorr10 (error-correction, 10 items) is the second test in the assessment battery. It is an error correction format with 10 items (see Appendix F).
In this format the student is given a short reading passage of fifty to one hundred words. Parts of the passage (words or phrases) are underlined. The student's task is to determine whether the underlined portion is in error or not and correct it if it is. While such tests may contain a variety of errors (vocabulary, usage, spelling, or structure), this test included only structural errors.

This format had been used extensively on the different assessment batteries at the beginner levels over the preceding four years. In each analysis the format had shown good reliability (.69 to .80) and a good tendency to spread students out despite the short length. (See Chapter IV for statistics on two other such tests.) With the items focussing as they do on structural errors, the test can be labelled a grammar test and was included in the analysis with the expectation that it would show common variance with Readstru.

3. The test labelled Listcomp (listening-comprehension) is a multiple-choice form of the type of test commonly termed listening comprehension (see Appendix G). In this form of test, subjects hear a conversation between two people, three to five lines long and lasting four to fourteen seconds. (In this particular test they heard the conversation twice.) Afterwards, the students hear several questions. These questions ask for recall of details of the conversation or for inferences about the people and their location or activities. Following each question, answer choices are given. The student circles the letter on the answer sheet which corresponds to the correct choice.
(In this form of the test for the present research, neither the conversations nor the questions and choices were presented in print.)

This general format had been included on the assessment battery four times in the three years preceding the research. In all four administrations the results were unsatisfactory because of low reliabilities. Despite this, inspection of the individual item statistics suggested that it was possible to create an effective test using this format but that care would have to be taken to avoid making one that was too difficult. Because of this history, I also felt that a twenty-item test might be more effective and reliable than the usual ten-item one. However, including an extended version of this type of test in the assessment battery was impractical because of time constraints. Instead, an entirely new test was created using new items modelled on those from the previous tests which displayed satisfactory item characteristics. I intended to administer one test before the assessment battery and one within that battery, then combine the two scores to produce a single listening comprehension mark. When the first module was administered, it was obvious that it was still far too difficult for the target population. The assessment-battery module was simplified and lengthened, and the number of options decreased from four to three. Of these, only two of the options were aural. The third option was that neither of the first two was correct.

4. The Oral test is an eight-minute, four-part, guided interview with an instructor. (See Appendix H for guidelines and sample score sheet.) It consisted of a warmup, a free-speaking period, a question-making section and a language-use section.
In each part, the focus of the interview was different and the students' responses were evaluated according to slightly different criteria. The weighting of the parts was approximately equal, but the criteria did tend to emphasize accuracy of structure.

The oral interview method of testing language proficiency has a great deal of face validity. However, as with the conversation completion exercise (Concom), this measure has no estimate of inter-rater reliability. The practical problems associated with obtaining such estimates are large. Mullins (1980) has shown that the best reliability can be obtained when interviewers are given a general scale on which to base their judgments. In addition, previous analyses of the battery showed significant positive correlations of similar oral assessments with the total test (see Table VI). These ratings were given under similar conditions to those in this study: raters were not the students' own instructors and had no knowledge of the students' previous performance on any language tests.

Table VI - Correlations of previous oral assessments with progress assessment batteries

    Date            R        N
    Aug. 1979      .69       89
    May 1980       .88       98
    June 1980      .70*     119
    Dec 1980       .77       73

    (* does not include oral test in total test)

If the matrix of language variables does allow a multi-factor solution, the oral measurement will have important interpretive power because it cannot be logically associated with reading mode or paper-and-pencil tests as such. Thus it has the important potential of distinguishing linguistically related common variance from mode or method related common variance.

3.2.3 Composition

The composition score (Comp) is based on data gathered in a separate project done by a committee at the college to develop a program-wide scale for rating student compositions.
The final form of the scale consisted of descriptions of five skill levels in three hypothesized components of writing skill: semantics, syntax and orthography (see Appendix I). In the initial stage of development, samples of student writing from several levels were scrutinized and descriptions of the written work at the various levels were composed. These descriptions were distributed to other instructors for suggestions on clarification of wording. Next, a group-training session was held during which instructors used the scale to evaluate samples of students' work.

In the final stage, each Wednesday for three consecutive weeks each student in the ESL program wrote a composition based on a set of pictures which depicted a storyline involving several people. They had one hour to complete their work. After each writing, the papers were graded separately, first by the students' instructors and then by another instructor. Because the results of these evaluations would be used to promote the students, if there was a difference of three or more points between the first two raters the paper was evaluated by a third rater and the discrepant grade was eliminated.[14] At the end of the three weeks, then, three sets of two (or three) ratings on the students' composition proficiency were available.

[14] This is in agreement with the procedure recommended by Diederich (1974).

For the purpose of the present research, only ratings on the final compositions were used. (See Table VII for summary.) This was done for two reasons. First, I felt that the instructors had become more adept at using the scale by that time and that their evaluations had become stable and more in agreement with the scale. This was indicated by the increase in the inter-rater correlation (Pearson product-moment) from .73 on the first set to .89 on the final set.
In addition, the final composition was written in the same time period as the rest of the measurements done for the research.

Table VII - Summary of ratings and computed composition score used for grade

                     Mean    s.d.    Total Possible     N      Rel.
    First Rater      7.36    2.46         25           175      -
    Second Rater     7.40    2.46         25           175      -
    Composition     14.8     4.78         50           175     .89(a)

    (a) correlation of first and second rater

The score used in the research was the sum of the scores given by the two raters or, in the sixteen cases where a third rater was required, the two scores closest together. In no case did the third rater fall exactly between the first two scores.

3.2.4 Supplementary Tests

In both the preliminary research and the pilot study, only two listening-mode tests were included. However, Harman (1976) suggests it is necessary to have at least three tests loading on a factor to define it and Gorsuch (1974) recommends five. Consequently, following the pilot, two more listening-mode tests were developed and pilot tested. An additional error correction format test was also administered. Table VIII presents the summary statistics for these three supplementary tests.

Table VIII - Summary statistics for three supplementary tests

                  Mean    s.d.    Number of Items     N      Rel.
    Listphon      46.3    9.3           62           134     .92
    Listbow       15.1    4.6           29           157     .80
    Errcorr20      7.7    3.71          20           153     .81

1. The test labelled Listphon (listening-phoneme) is a listening test in which the students must distinguish between vowel phonemes (see Appendix J). It is an extension of a commonly used classroom sound discrimination exercise. In such an exercise, the student hears three words and is required to determine which one (first, second, third) is different from the other two. For this test, two other options were included: the choice of all words different or all words the same. The material for the test was taken from a pronunciation book containing minimal pairs (Nilson and Nilson 1973). Two each of thirty-one of the thirty-three vowel contrasts given in the book were chosen.[15]

[15] Two contrasts based on the phoneme /ɔ/ (as pronounced in 'caught' in some American dialects) were omitted because this distinction is not made in Canadian English.

This phoneme discrimination test was pre-tested on a sample of thirty-nine students. The results of this pre-testing showed a reliability of .92 and a good spread (sd = 8.25). Once its reliability was determined, no item analysis or revision was done despite there being numerous items that were obviously ineffective. I felt that since certain languages have more trouble with some contrasts than others, the elimination of "ineffective" items might bias the particular test strongly against the Chinese, who made up 70 percent of the population.

The test was created and included because it was short (thirteen minutes), easy to create, and represented an extreme end of the discrete-integrative test item scale. I felt that if a listening factor were found and if this particular measure were closely related to it, then it could be used effectively as an indirect measure of that factor.

2. The test labelled Listbow (listening-Bowen) was a listening test based on a format developed by Bowen (1975) which he called an integrative test of English grammar (see Appendix K). He suggested that it:

    ... measures the ability of a subject to reconstruct obscured words by means of sentence analysis, carried out not as a separate academic task, but in the normal procedure of understanding what the sentence says. It is a task built on the assumption that the ability to handle reduced redundancy is a valid measure of linguistic competence. The reduced redundancy in this type of test is a consequence not of deliberate deletions or masking by superimposed noise, but of the reductions, assimilations, and contractions that normally accompany sentence production by native speakers functioning in a relaxed, informal context. (p. 2)

The test requires a student to listen to a sentence and write the second word. Usually this word has been reduced by a contraction or run together with the preceding or following word. For example, the student might hear "Where'd he go?" and be expected to write 'did'.

The script as presented by Bowen was too long for the present study. In his research, Bowen used sixty different items which he presented to the subjects twice each in the same sitting. First, all items in which the focal reduction resulted in the same sound were grouped together, and then following that the same items were presented again in random order. As a result the total test was 120 items long.

For a pilot run of this format, the first half of the script as presented by Bowen was recorded and administered to an Upper Intermediate class. The results of this pilot indicated that the test was too difficult for the Upper Beginners level and consequently would be inefficient. The format was kept but fifteen pairs of simpler items aimed at the approximate level of the target population were created. The items in each pair had as their focal reduction the same sort of syntactic relationship. (For example, a subject pronoun in a simple 'be' question.) To ensure a good range of marks, the first fifteen items were spoken at a reduced speed and the second fifteen at a speed approaching natural speech. A pilot run of this test suggested it would be satisfactory (r = .80, sd = 3.82) and after some minor revision, a new tape was made and administered to the subjects.
To ensure consistency in the results, I marked the papers. As mentioned earlier, the instructions to the students stipulated that only the second word in each sentence be written down. In a few cases the students wrote down more than one word for several of the items. Where this happened, the items were marked as incorrect, even if the correct word was included. In three cases (of 160) the students did this for all of the attempted items. The Listbow tests for these subjects were deleted completely.

During the administration of this form it was found that the focal reduction of one item[16] was indistinguishable even to native speakers. This item was not included in any of the subsequent analyses.

[16] This was item twenty. The reduction of the pronoun 'her' in the sentence "Is this her book?" was interpretable as either 'her' or 'your.'

3. The final supplementary test is Errcorr20 (error correction, twenty items), which has exactly the same format as Errcorr10 (see Appendix F) except that there are twenty underlined items. During the design of the experiment, I was not sure whether there would be an error correction type of exercise on the assessment battery. Since this format had proved efficient in the pilot study and in previous assessment batteries, I felt it imperative to include such a test (Errcorr20) in the research. When I found that there would also be one on the battery (Errcorr10), I kept both.

3.2.5 The Demographic Variables

Students reported data used for the demographic variables on a form which they filled out on the day following the progress assessment battery. First language, age, and sex were used as reported by the students. To reduce potential errors in arithmetical calculations, students were asked for the year and month they arrived in Canada. Data for the variable length of time in Canada (Lot) were calculated from this information.
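The derivation of Lot from a reported year and month can be sketched as follows. This is a minimal illustration, not the transformation actually used: the function name, the choice of whole months as the unit, and the reference date passed in are all assumptions, since the text states only that Lot was calculated from the reported year and month of arrival.

```python
def months_in_canada(arrival_year: int, arrival_month: int,
                     ref_year: int, ref_month: int) -> int:
    """Whole months between arrival and a reference date (assumed unit).

    The reference date (for example, the month in which the form was
    filled out) is supplied by the caller; it is not given in the text.
    """
    if not 1 <= arrival_month <= 12 or not 1 <= ref_month <= 12:
        raise ValueError("months must be in the range 1-12")
    months = (ref_year - arrival_year) * 12 + (ref_month - arrival_month)
    if months < 0:
        # An arrival date after the reference date signals a reporting or entry error.
        raise ValueError("arrival date is later than the reference date")
    return months


# Hypothetical student who arrived in September 1980, measured in April 1981.
lot_example = months_in_canada(1980, 9, 1981, 4)
```

Asking for two simple facts and doing the subtraction centrally, as the procedure describes, keeps the arithmetic out of the respondents' hands and makes implausible values easy to flag.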
3.3 Administration Procedures

The tests developed for the purpose of the present research were administered in conjunction with a progress assessment battery at the end of the regular four-month term of instruction. I did not supervise the administration of the assessment battery (Oral, Concom, Errcorr10, Listcomp). However, the same guidelines were followed by the instructors for both the assessment battery and the research tests. The research tests (Listbow, Liststru, Readvoc, Listvoc, Listphon, Errcorr20) and the composition were administered in the two-week period preceding the administration of the assessment battery. The administration of the battery and Readstru was done in one day during the regular class period. The administration of the oral test was spread over two days preceding the administration of the assessment battery.

In the administration of each test, the instructors followed the same general rules: no dictionaries or notes were allowed; no help was given by the instructor after the actual test had started. For the paper and pencil tests, the instructor waited for all students to be in the room and then presented the examples and reviewed them with the students. If the students had problems, the instructor continued to give help with understanding the task. Generally, the kinds of tasks used (choosing a correct answer, writing in a word or sentence, or writing a connected set of sentences) were all common class exercises and presented nothing novel to the students. In the listening tests, some of the tasks were unfamiliar and consequently more examples were given for these tests. The Listbow test, perhaps the most novel, included a total of eight examples. For all these listening-mode tests the students had the option of hearing the introduction with the examples several times.
At the end of each of the six tests constructed and administered specifically for the present research, the answer sheets and test papers were collected by the instructor and given to me. In order to gain the cooperation of the instructors, I allowed them to use the test papers the following day for teaching purposes. For the evening classes, this was allowed on the same day, as there was no security problem. In addition, as far as was possible, the answer sheets were graded for the instructors and a class list of results handed back to them the following day.

On the day following the progress assessment test, students filled out a form which requested a variety of biographical details including those four (age, sex, first language, and length of time in Canada) used in the present research.

The administration of the oral test was conducted over a period of two days by four instructors who were experienced in the use of the method. Students left the regular classroom, went to an office for the interview, then returned to class and sent another student.

3.4 Statistical Procedures

The statistical procedures can be divided into three areas: data preparation and description, the factor analyses, and the subsidiary analyses. Prior to analysis, the raw data had to be transcribed into a form that could be read by computer. This was done using a microcomputer for data entry and for transfer of the data to disk storage at the University of British Columbia. When this was completed, the computer at the University of British Columbia was used to obtain summary statistics, do the factor analyses and complete the subsidiary analyses.

3.4.1 Data Preparation And Description

After the administration of all of the tests, the answer sheets, the progress assessment battery booklets, and copies of the Readstru and composition scores were collected, collated and placed in class groups.
Next, the data were transcribed onto magnetic disc. This was done using a microcomputer for data entry. Because one of the steps in the research was to calculate reliability estimates, it was necessary to encode the option chosen by each student on each item of the multiple-choice tests (excluding Readstru).[17] In order to handle the resulting 40 thousand discrete pieces of data effectively and to diminish chances of entry error, I wrote a data entry program for a microcomputer. This program was designed so that as each set of test responses was entered, the length of the set was checked against the length of the appropriate test. If these did not match, a signal was given and that particular test was re-entered. This avoided gross errors that might have resulted from entering a set of responses under the wrong heading or from adding or deleting a single response in a test and entering the following responses displaced by one item number.

[17] This had been done on optical-read score sheets and analyzed separately.

After the data had been transferred from floppy disk to storage on the computer system at the University of British Columbia, an error check was made by comparing a listing of the data with thirty randomly drawn sets of the original papers. This error check revealed negligible error rates in the three categories of data: item responses, raw test scores, and biographical information. In this thirty-subject sample, a total of seven item responses (out of 217 items for each subject) were incorrect, for an error rate of .1 percent. In the same sample, two errors occurred in the entry of the thirteen biographical variables. This represents an error rate of about .5 percent. The inspection of the eleven raw scores and subject measures for each subject in the error sample revealed two errors.
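The three error rates from the thirty-subject check can be reproduced with a short calculation. The sketch below assumes that each rate is simply the number of errors divided by the number of values checked (values per subject multiplied by thirty subjects); the function name is hypothetical.

```python
def error_rate_percent(errors: int, values_per_subject: int, subjects: int = 30) -> float:
    """Entry errors as a percentage of all values checked in the sample."""
    return 100.0 * errors / (values_per_subject * subjects)


# Counts taken from the error check described above.
item_rate = error_rate_percent(7, 217)    # item responses: 7 errors in 217 x 30 values
bio_rate = error_rate_percent(2, 13)      # biographical variables: 2 errors in 13 x 30
score_rate = error_rate_percent(2, 11)    # raw scores and subject measures: 2 in 11 x 30

for label, rate in [("items", item_rate), ("biographical", bio_rate), ("scores", score_rate)]:
    print(f"{label}: {rate:.2f} percent")
```

Under that assumption the three rates come to roughly .1, .5 and .6 percent, all well under one percent of the values inspected.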
Although this represents only a .6 percent error rate for entry in this subset of the variables, the nature of the particular errors found in this check prompted a check of the full data set. The two errors that were found occurred in the same two variables (Oral and Concom). The values for these two measures had been transposed and it appeared that the errors were systematic rather than random. Consequently, the values for these two variables were rechecked in the entire data set and three additional errors were discovered. The errors that were found in each category were corrected, of course, but the overall error rate indicated that the data could be used as entered and corrected, without further editing.

After the error check, the computing facilities at the University of British Columbia and subroutines from the Statistical Package for the Social Sciences (SPSS)¹⁸ were used to:

1. score the multiple-choice tests
2. compute alpha reliabilities for the multiple choice tests
3. transform the reported year and month of arrival in Canada to a single variable
4. assemble a 195-subject data set that included the twelve language variables and the four demographic variables.

¹⁸ Nie et al., 1975

In order to make this data base more stable, all cases which were missing data on four or more of the variables were deleted. This reduced the total number of cases to 181. Following the deletion of these cases, SPSS was used to obtain descriptive statistics and histograms for each of the variables. This information was used in evaluating the tests as language measures, to determine the suitability of all of the variables for further statistical analysis, and to supply an overview of the demographic features of the sample.

In preparation for the factor analysis, two alternative methods were used to replace missing data in the score matrix.
First, the step-wise regression procedure in SPSS was used to obtain regression estimates (or 'predicted scores') of the missing data. The second approach was to replace each missing value with the mean of the respective variable. As a result of these procedures, three data bases were available for analysis: one with missing data points, one with missing data replaced by regression estimates, and one with missing data replaced by mean scores.

To decide which of the three data sets to use in the factor analyses, the FACTOR procedure in SPSS was invoked and a principal components solution followed by a varimax rotation was performed on all three sets. The solutions showed very little difference. The mean difference of the highest and lowest loading for each variable on each factor was less than .060 and the single largest difference was .13 (on age, third factor: -.74 compared to -.87). A similar comparison was done on the twelve language variables. The mean difference of the highest and lowest loading for each variable on each factor in this set was less than .055. The single largest difference was .12 (Listvoc on the third factor: .78 compared to .90). This similarity among the solutions for the different methods of treating missing data indicated that a choice among them could be based on criteria which were external to the actual solution.

The data set which had missing values replaced by mean values was subsequently chosen for all further analyses. The greatest advantage to using mean scores to fill in missing data is that it is cheaply and easily done. This was important in the later analyses because the data was split into two groups and of course the missing values had to be filled in again using the new means.

3.4.2 Factor Analysis

The principal statistical method was factor analysis, which was used for three purposes.
First, as mentioned above in section 3.4.1, it was used to decide the most appropriate treatment of missing data. Second, it was used to choose the best subset of variables for the final solution. Finally, it was used to arrive at representative solutions to the central problem of the research: demonstrating the divisibility of language proficiency and giving meaning to the components.

Incomplete component analysis followed by a varimax rotation was used in the analyses done to determine which missing data treatment to use and in determining the final subset of variables. In these cases it was the comparability of clusterings of variables and agreement on the number of factors for each solution that was of interest. The Kaiser-Guttman criterion (Harman, 1976; Hakstian and Bay, 1973) of eigenvalues greater than 1.0 was used throughout these analyses to serve as a standard for determining the number of factors.

In deriving the final solutions, Hakstian and Bay's (1973) strategies for exploratory factor analysis were followed. First, an image analysis was used, followed by a varimax rotation. Then, combining inspection of the results of this with the results of a principal components-varimax combination and using the criteria recommended by Hakstian and Bay (1973), a decision was made about the number of factors. Finally, an image analysis was done again, this time retaining the number of factors that had been indicated by the earlier procedures. This was followed by a Harris-Kaiser oblique transformation (independent clusters), which allows the axes (and thus the factors) to be correlated. It also has the effect of bringing the factor solution closer to the criteria of simple structure. These strategies, applied first to the full 181-case set of data, were repeated for the Chinese speakers and for the non-Chinese speakers.
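The Kaiser-Guttman rule described above can be illustrated with a short sketch. This is not the SPSS or Alberta General Factor Program procedure used in the research; it is a minimal Python example that assumes a small hypothetical correlation matrix (two pairs of related tests). The helper names `jacobi_eigenvalues` and `kaiser_guttman_count` are introduced here for illustration only, and a plain Jacobi rotation routine stands in for a library eigenvalue solver so that the example is self-contained.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations (illustrative helper)."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal elements are numerically zero.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp).
                diff = a[q][q] - a[p][p]
                if diff >= 0:
                    theta = 0.5 * math.atan2(2 * a[p][q], diff)
                else:
                    theta = 0.5 * math.atan2(-2 * a[p][q], -diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_guttman_count(corr):
    """Number of eigenvalues greater than 1.0, plus the eigenvalues themselves."""
    eigenvalues = jacobi_eigenvalues(corr)
    return sum(1 for value in eigenvalues if value > 1.0), eigenvalues


# Hypothetical correlations: tests 1-2 and tests 3-4 form two related pairs.
example_corr = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
n_factors, eigenvalues = kaiser_guttman_count(example_corr)
print(n_factors, [round(value, 2) for value in eigenvalues])
```

For this hypothetical matrix the eigenvalues are 1.8, 1.4, 0.4 and 0.4, so the rule retains two factors, one for each pair of related tests.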
The missing values in the two sub-groups were filled with group means. All factor solutions were generated using either SPSS or the Alberta General Factor Program (Hakstian and Bay, 1973).

3.4.3 The Subsidiary Analyses

The subsidiary analyses were done using SPSS. The data were first divided according to sex, and then means and t-tests were calculated for the twelve language variables and Lot and age. Next the data were regrouped according to first language (Chinese and non-Chinese) and the means and t-tests were again calculated. One of the correlation matrices that were included in the output for the factor analyses was also used in the subsidiary analyses.

IV. PRELIMINARY STUDY AND PILOT

4.1 Preliminary Study

In the fall of 1979, I began a project of revising and standardizing an ESL progress assessment battery in the English Language Training program of a metropolitan community college. Much of this work involved gathering statistics on each of the items and subtests already being used and then using this information to improve the battery as a whole. In addition to the project of revising the established battery, I began a parallel project of experimenting with a variety of new items and tasks. I intended that this serve the dual purposes of expanding the number of usable items and subtests and of investigating Oller's three divisibility hypotheses.

The target population of the battery and of the experimental items was a group of beginner-level adult ESL learners of mixed linguistic and educational background and ages.¹⁹ The purpose of the battery was to determine which students were proficient enough to move on to more advanced language study and which needed to continue to work at their present level.
At the time the first set of data was gathered, the students who took this battery and the experimental items were at the fourth level of a ten-level program ranging from no ESL proficiency through to pre-college entrance. The labels for each level were Beginners 1 (B1), Beginners 2 (B2), Beginners 3 (B3), Beginners 4 (B4), Intermediate 1, Intermediate 2, Intermediate 3, Intermediate 4, Lower Advanced, and Upper Advanced. Although students often took longer to move through a level, the progress battery was administered every two months.

¹⁹ This is essentially the same population as that in the research.

In February 1980, as part of the validation procedure, the battery and an experimental listening test were given to both the B4 level and the level below (B3).²⁰ Summary statistics are presented in Table IX. These statistics allow several comments to be made concerning the validity of the battery, its subtests and the experimental listening test. First, the differences in means between the two levels, B3 and B4, show that each of the subtests was clearly distinguishing between the groups. The difference in means on the total test is particularly large, which can be taken as one demonstration of its validity. Furthermore the overall estimate of reliability for the battery was moderately good for both the target B4 level (Cronbach's alpha=.79) and for the combined levels (Cronbach's alpha=.68). Since the tests had already been inspected by a number of instructors for content and face validity, it can be said generally that the battery as it stood and was used then was a valid measurement of language proficiency.
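For readers unfamiliar with the reliability estimate quoted above, the following minimal sketch shows how Cronbach's alpha is computed from a matrix of part scores. The scores are hypothetical (they are not data from this study) and the function name `cronbach_alpha` is an illustrative choice; the alpha values in the text came from SPSS, not from this code.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for rows of examinees by columns of subtests (or items)."""
    k = len(score_rows[0])
    # Sample variance of each part, and of each examinee's total score.
    part_variances = [variance([row[j] for row in score_rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in score_rows])
    return (k / (k - 1)) * (1 - sum(part_variances) / total_variance)


# Hypothetical scores: five examinees on four subtests.
scores = [
    [4, 5, 7, 5],
    [6, 7, 11, 9],
    [5, 6, 9, 7],
    [3, 4, 6, 4],
    [7, 8, 12, 10],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Because the four hypothetical subtests rank the examinees almost identically, alpha here is very high (about .98); less consistent parts would pull the value down toward the range reported for the battery.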
²⁰ Descriptions of the subtests in the battery and of the experimental listening test are given in Appendices E to H and L respectively.

Table IX - Descriptive statistics of subtests in preliminary research

TEST                                B3        B4        B3&4
                                    (N=71)    (N=73)    (N=144)

1: Listen 1           Mean          4.41      6.11      5.27
   (10 items)         SD            1.76      1.53      1.85
                      Range         0-8       2-9       0-9
                      Hoyt Rel      .35       .20       .43

2: Listen 2           Mean          5.78      7.53      6.67
   (10 items)         SD            1.7       1.83      2.09
                      Range         1-9       3-10      1-10
                      Hoyt Rel      .40       .58       .59

3: MC (Reading)       Mean          7.35      11.18     9.29
   (18 items)         SD            2.33      2.81      3.21
                      Range         2-13      5-18      2-18
                      Hoyt Rel      .46       .50       .66

4: Error Correction   Mean          5.52      9.12      7.35
   (15 items)         SD            2.27      2.44      2.97
                      Range         0-11      3-14      0-14
                      Hoyt Rel      .46       .50       .66

5: Composition        Mean          6.14      9.16      7.67
   (15 marks)         SD            3.04      2.64      3.21
                      Range         0-14      2-14      0-14

6: Conversation       Mean          4.89      8.33      6.63
   Completion         SD            2.83      2.29      3.09
   (12 marks)         Range         0-11      0-12      0-12

Oral Interview        Mean          -         15.15     -
   (25 marks)         SD            -         4.38      -
                      Range         -         5-23      -

Total Test            Mean          34.09     51.44     42.89
   (Excluding Oral)   SD            7.54      9.22      11.72
                      Range         18-53     23-65     18-65
                      Alpha Rel     .49       .79       .68

To explore the validity of the subtests in the context of Oller's divisibility hypotheses, I factor analyzed the data using a truncated principal components solution followed by a varimax rotation. The results are presented in Table X.

Table X - Varimax rotated factor solution for seven English tests in preliminary tests, level B4 (n=73)

TEST                    FACTOR I    FACTOR II
1. LISTENING 1            .009        .736
2. LISTENING 2            .129        .761
3. MULTIPLE CHOICE        .343        .386
4. ERROR CORR.            .728        .157
5. COMPOSITION            .703        .162
6. CONVER. COMPL.         .809       -.117
7. ORAL                   .610        .350

While Factor I of Table X is not readily interpretable, Factor II is clearly related to listening mode since both of the subtests in listening mode load heavily (Listening 1 = .736, Listening 2 = .761) on Factor II. The loading of .35 of the Oral test on Factor II is consistent with any language proficiency model which included listening as a component. In an interview the subject will receive aural cues for his/her responses. Oller and Hinofotis (1980) also found some support for the notion of a listening factor.

The moderately weak loadings of the Multiple Choice test on both Factor I (.343) and Factor II (.386) show that this test has low communality with the other tests and also suggest that the test may be factorially complex. (That is to say it may be measuring more than one component of language.) Inspection of the items in the Multiple Choice test produces evidence to support this conjectured complexity, as the following examples illustrate:

#6. Why _____ come to my party last week?
    1. don't you
    2. you don't
    3. didn't you
    4. you didn't

#4. "Where is Bill?"
    "I saw him go outside with a hammer, a saw and some nails. I think he is going to _____."
    1. work in the garden
    2. cut the fruit tree
    3. paint the garage
    4. fix the fence

#12. "How was the test?"
     "I got the best mark in the class."
     "_____"
     1. What a pity.
     2. You have my sympathy.
     3. Congratulations.
     4. Better luck next time.

In number six, the correct answer is determined structurally, using past tense and word order. In number four, the correct answer is determined by the meaning of the words, connecting "hammer," "saw," and "nails" with "fix" and "fence." In number twelve, the answer is determined by the recognition of the correct social formula. As these examples show, the Multiple Choice test contained content from three different areas: vocabulary, grammar, and social idiom.
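The low communality noted above for the Multiple Choice test can be checked directly from the Table X loadings. For an orthogonal (varimax) solution, a test's communality is the sum of its squared loadings. The sketch below is a worked illustration only; the dictionary layout and the function name `communality` are assumptions made for the example.

```python
# Varimax-rotated loadings from Table X (Factor I, Factor II).
loadings = {
    "Listening 1": (0.009, 0.736),
    "Listening 2": (0.129, 0.761),
    "Multiple Choice": (0.343, 0.386),
    "Error Correction": (0.728, 0.157),
    "Composition": (0.703, 0.162),
    "Conversation Completion": (0.809, -0.117),
    "Oral": (0.610, 0.350),
}


def communality(factor_loadings):
    """Sum of squared loadings: share of a test's variance explained by the factors."""
    return sum(value ** 2 for value in factor_loadings)


communalities = {test: communality(values) for test, values in loadings.items()}

# List the tests from lowest to highest communality.
for test, h2 in sorted(communalities.items(), key=lambda item: item[1]):
    print(f"{test:<26}{h2:.2f}")

lowest = min(communalities, key=communalities.get)
```

On these figures the Multiple Choice test has a communality of about .27, the lowest of the seven tests, while every other test is near or above .50.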
In summary, although the preliminary study was done on a group of measures that were not specifically designed to investigate the divisibility of language proficiency, the study revealed two potentially successful avenues for investigation of Oller's hypotheses: contrasting listening mode tests with tests in other²¹ modes, and contrasting content in the form of vocabulary, grammar, and socially acceptable idiom or formulas. These two avenues were the subject of a pilot study.

²¹ The tests which did not load on the putative listening factor represent three other modes of testing: reading, writing, and speaking.

4.2 The Pilot Study

To investigate the implication of the preliminary study that mode and content constitute significant, contrasting sources of variance in language test data, I constructed four multiple-choice tests, two vocabulary tests and two grammar tests (see Appendices A to D), using items drawn from an item-bank that had been developed during the revision of the progress assessment battery. One of the vocabulary tests and one of the grammar tests were then converted to listening mode by tape recording the items, with stem and numbered options each repeated but not presented on paper. Subjects were given answer sheets on which they circled the number of the correct choice.

In August, 1981, the new reading-mode grammar test was incorporated into the regular progress assessment test as the multiple choice section. The three other experimental tests were administered in conjunction with the battery in four of the eleven Upper Beginners classes²² and the data were gathered for analysis. Unfortunately, the quality of the sound of the listening comprehension test in the assessment battery was poor, resulting in the elimination of the test.
The oral interview was not included because at that time, the college's testing policy had changed. Previously, all students took the oral test, but at the time of this study only students who scored above (about) 60 percent on the paper and pencil test (including the listening test) were allowed to take the oral. The test statistics are presented in Table XI and the results of a truncated principal components solution followed by a varimax rotation are presented in Table XII.

²² It should be noted that in the interim between the preliminary research and the pilot study, the original levels B3 and B4 were merged into a single level, Upper Beginners. At the same time, the term was extended from two months to four months. The net result was that at the time of the pilot study, there were a greater number of students taking the progress assessment battery.

Table XI - Descriptive statistics of subtests in pilot study (n=60)

NAME                    ITEMS    MEAN     SD      RANGE    HOYT REL
1. MC STRUC (READ)       20      11.43    3.25    6-18       .65
2. ERROR CORR.           20       8.57    4.30    1-18       .80
3. MC VOCAB (LISTEN)     16       7.45    3.13    1-14       .69
4. MC STRUC (LISTEN)     19       8.20    3.07    2-16       .58
5. MC VOCAB (READ)       27      16.20    4.81    3-25       .78
6. COMPOSITION          (15)      8.13    3.70    0-15        -
7. COMPLETION           (12)      6.85    3.12    0-12        -

In Table XII, the loadings of .83 for the reading-mode structure test and .77 for the listening-mode structure test strongly associate Factor I with the measurement of grammar. Similarly, the loadings of .78 for the reading-mode vocabulary test and .88 for the listening-mode vocabulary test link Factor II with the measurement of vocabulary.
Although this pattern of a content-related contrast in factors is distinct from the mode-related contrast found in the preliminary study, it does support the theoretical analysis made of the complexity of the multiple choice test in the preliminary study.

Table XII - Varimax rotation of PC solution for 7 subtests in the pilot study (N=60)

NAME                          FACTOR I    FACTOR II
1. MC VOCAB (LISTEN)            .07         .88
2. MC VOCAB (READ)              .29         .78
3. MC STRUC (LISTEN)            .77         .30
4. MC STRUC (READ)              .83         .09
5. ERR. CORR.                   .83         .06
6. COMPOSITION                  .76         .38
7. CONVERSATION COMPLETION      .80         .19

It was very unfortunate that the listening comprehension test had to be deleted, for it would appear on the surface and from the results of the preliminary research that it is a different kind of test. However, despite the drawback, there was enough evidence to suggest that in the presence of an expanded battery of tests, these four tests (the multiple-choice, reading-mode grammar and vocabulary tests, and the multiple-choice, listening-mode grammar and vocabulary tests) would act as effective marker variables at least for a vocabulary/structure dichotomy and possibly for a mode contrast.

V. FINDINGS OF THE STUDY

The factor analyses done on the three sets of data²³ present evidence for a grammar factor, a vocabulary factor, and an age-related factor (possibly hearing). In addition, the analyses suggest the possibility that a listening-mode factor and what I have termed a 'speed of processing factor' are also influencing the variables.

The analyses of the specific relationships between each of four demographic variables (age, sex, first language, and the length of time the subject had been in Canada, abbreviated Lot) and each of the twelve language variables reveal a strong correlation between the language measures and two of the demographic variables, age and Lot.
In addition, this set of analyses reveals that the Chinese as a group performed differently than the non-Chinese as a group. Because of these findings, age has been included in the matrices which are analyzed in the divisibility study and the Chinese and non-Chinese speakers have been treated separately as well as together. The analysis in which subjects were grouped by sex produced no significant findings.

²³ The data were first analyzed as a complete set then divided into those who spoke Chinese and those who did not. For convenience, I refer to the groups as the combined group, the Chinese speakers, and the non-Chinese speakers.

5.1 The Factor Solutions And The Divisibility Hypotheses

For the divisibility hypotheses, the most significant feature of the three different solutions (the combined group, the Chinese speakers and the non-Chinese speakers) is that the two different statistical criteria²⁴ agree on a three-factor solution. That is, they both support some form of divisibility in language proficiency. Of equal importance to the divisibility hypotheses is that in all three solutions, two of the factors are characterized by having high coefficients from the language measures. In the solution for each group, one factor consistently provides evidence for the validity of a grammar or structure factor while another factor supports the concept of a vocabulary factor. The third factor in each solution was associated with high coefficients from age. However, the configuration of the other coefficients on this factor suggests three different interpretations depending on the group for which the solution was done: hearing (the combined group and the Chinese-speaking subset), listening, and 'speed of processing' (the non-Chinese speaking subset).
The interpretive and theoretical power of all of these factors, though, must be tempered with the caution that in each solution the factors were correlated and thus cannot be considered as truly independent sources of variation in the data.

²⁴ These were the Kaiser-Guttman criterion of selecting factors with eigenvalues greater than one in a principal components analysis, and inspection of a varimax rotation of a full image analysis.

5.1.1 Factor Solution For Entire Set

Table XIII shows the solution arrived at for the combined group. The three matrices that are presented are the Phi matrix, which shows the correlation of the factors with each other; the pattern matrix, which gives an indication of the relative strength of each factor in the variable; and the structure²⁵ matrix, which gives the correlation of the variables with the factors.²⁶

The first factor in Table XIII can be defined by the clustering of high coefficients from the grammar tests: Readstru, Errcorr10, and Errcorr20. The other tests with moderate coefficients on this factor (Comp, Concom) are consistent with its interpretation as a grammar factor.

²⁵ The word "structure" is a term from the field of factor analysis and is unrelated to structure in the sense of grammar.

²⁶ When looking at an oblique factor solution (one in which the factors have been permitted to correlate) the pattern matrix gives the clearest picture of the underlying factorial composition of the variables. This is the matrix that is used for interpretation of the factors. The Phi matrix reveals how much the factors are correlated. If the correlation between two factors is close to zero, then the factors are acting independently and represent true differences in dimensions. (When the correlation is set at zero in a solution, the solution is termed orthogonal.)
As the correlations between factors increase, their interpretability as separate influences decreases and so does theoretical power. Harman (1976) gives some examples of how to interpret oblique solutions in Chapter IV.

Table XIII - Image analysis followed by Harris-Kaiser (independent clusters) on full (n=181) data set

               PATTERN MATRIX              STRUCTURE MATRIX
                 I      II     III           I      II     III
ERRCOR10        0.74  -0.01   0.03          0.72  -0.51   0.61
READSTRU        0.74  -0.07   0.12          0.68  -0.50   0.46
ERRCOR20        0.63  -0.07   0.11          0.67  -0.46   0.59
CONCOM          0.46  -0.01   0.01          0.46  -0.32   0.39
COMP            0.38  -0.02   0.36          0.69  -0.56   0.69
LISTSTRU        0.24  -0.12   0.19          0.49  -0.44   0.47
READVOC         0.07   0.13   0.56          0.46  -0.34   0.52
LISTCOMP        0.02  -0.16   0.51          0.57  -0.56   0.65
LISTVOC        -0.28   0.06   0.77          0.33  -0.32   0.49
ORAL           -0.07  -0.37   0.31          0.45  -0.55   0.52
LISTBOW         0.00  -0.46   0.17          0.48  -0.60   0.53
LISTPHON        0.16  -0.45  -0.18          0.33  -0.43   0.30
AGE             0.15   0.71  -0.18         -0.20   0.46  -0.22

CORRELATION MATRIX OF FACTORS (PHI)
                 I      II     III
I               1.00  -0.71   0.85
II                     1.00  -0.76
III                           1.00

VARIANCE OF FACTORS
                 I      II     III
                2.00   1.10   1.49

However, the low coefficient of Liststru on this first factor weakens the interpretation. Since Errcorr10, Errcorr20, and Readstru are presented in print, this may also be a 'paper-and-pencil' factor. Yet for verification of this argument, the value of Readvoc, which is also a 'paper-and-pencil' test, should be closer to the value of the grammar tests (.63 to .74) rather than almost negligible (.07).

Factor II is also interpretable. Of the three tests which have moderate coefficients on this factor (Listphon, Listbow, Oral), two are listening tests. These two tests (Listphon and Listbow) are the only two of the five listening tests in which items are not repeated. Furthermore, these two are the least contextualized of the listening tests.
That is, they contain the least amount of redundant information. (Listphon has none at all.) This lack of extra information makes the items much more difficult for those subjects who, because of hearing problems, miss part or all of an item. I included Age, the variable with the highest coefficient on this factor, as a variable to be analyzed specifically because I had noted that several of the older students were hard of hearing. In this sense, I had intended it to act as an indirect measure of hearing. Consequently, a possible interpretation of this factor is that it represents the influence of the physiological variable of hearing rather than a linguistic component of the tests.

The moderate coefficient on Factor II from Oral is also consistent with its interpretation as a 'hearing' factor. Comprehension is included in the evaluation guidelines for the interview and it is conceivable that interviewers have interpreted manifestations of hearing problems as indications of a weakness in English. Asking for repetition or answering the 'wrong' question would be two such behaviours.²⁷

²⁷ Another possible explanation for the moderate coefficient of Oral on this factor is that it indicates a bias on the part of the interviewers against older people. If this were the case, of course, the loading is unrelated to hearing.

The interpretation of the third factor in Table XIII is straightforward. The two vocabulary tests, Listvoc and Readvoc, have the highest coefficients on this factor. The fact that they are in different modes gives a great deal of strength to the interpretation of this as a vocabulary factor. Furthermore, the three other tests (Oral, Comp, and Listcomp) which have moderate-to-high coefficients on this factor are not only entirely consistent with this interpretation but also add strength to it.
These three tests are in different modes, which suggests strongly that the common feature in the different tests that causes them to cluster together is one of content rather than mode. Consideration of the content of these three tests also supports the interpretation of Factor III as a vocabulary factor. Correct use of words will have a positive influence on both oral and composition grades. In the listening-comprehension test, ten of the twenty questions require the students to draw inferences about the location, actions or characters involved in the short dialogs. Such inferences draw heavily on an understanding of specific words and phrases used in the dialog.

The final important aspect of this solution is the Phi matrix in Table XIII. It indicates that the three factors are highly correlated. One explanation for correlated factors is that the factors themselves are responding to a single, higher-order factor. Such an explanation in this solution lends support to the concept of the indivisibility of language proficiency. The two linguistic factors would have to be considered as different manifestations of a single global proficiency factor. A different explanation lies in the measuring devices themselves: they can be viewed as tapping different conglomerations of several (hypothetical) components of language proficiency. For example, as noted in the discussion on the third factor, several of the tests, while not being designated as 'vocabulary' tests, can be thought of as responding to differences along some 'word knowledge' dimension in addition to their putative purposes. Composition in particular must be considered a task involving the integration of grammar and vocabulary. In fact, of the twelve language measures, only one, Listphon, does not integrate structure and vocabulary into either the task or the product.
With such inherent theoretical and practical complexity, it is not surprising that some variables show statistical complexity²⁸ and that the factors themselves are correlated. To establish the construct validity of distinct linguistic factors, it would be necessary to overcome this inherent problem of complexity. Some methods of creating less complex tests such as structure-free vocabulary tests are discussed in Chapter VI.

²⁸ A complex variable is one which has moderate to high coefficients on two or more factors.

In summary, the solution for the full set indicates the influence of a structure factor (Factor I), an age-related factor which I have argued is best designated as a hearing factor (Factor II), and a vocabulary or word knowledge factor (Factor III). The high correlation of the factors in this oblique solution reflects the integrated nature of most of the tasks and may also show that, in fact, there is only a single significant source of variance influencing the language variables.

5.1.2 The Subset Of Chinese Speakers

Table XIV presents the solution arrived at for the more homogeneous subset of Chinese speakers. Here again the three-factor solution is the preferred one. The pattern of coefficients on this matrix is similar to the one on the full data-set matrix. This is not surprising, of course, since the Chinese group represents two-thirds of the combined group.

Factor I is still clearly a structure or grammar factor. In this solution, the coefficient of Liststru (.45) is higher than in the solution for the combined group (.24). This adds strength to the interpretation of Factor I as a grammar factor as opposed to a reading-skill or method factor because Liststru is in listening mode whereas the other three grammar tests (Errcorr10, Errcorr20, Readstru) are in reading mode.
The age-related factor in this solution, although similar in configuration, does not account for as much variance as the same factor in the solution for the combined group. This is apparent not only through the relatively lower values for Oral, Listbow, Listphon, and age, but also through the differences in the variance of the factor in the two solutions: 1.10 (Table XIII) in the combined group and .73 (Table XIV) in the Chinese speakers. In terms of underlying influences on the linguistic variables, this information suggests that in this solution, age is acting much less as an indirect measure of some physiological (or psychological) impediment to language learning or production than in the solution to the larger group.²⁹

²⁹ The average ages of the two subsets of the data were very close: 31.7 for the Chinese speakers and 32.2 for the non-Chinese. This suggests that the difference in strength of the age-related factor is not merely a reflection of an age difference in the two groups.

Table XIV - Image analysis followed by Harris-Kaiser (Independent Clusters) on Chinese subjects (n=121)

                 PATTERN MATRIX           STRUCTURE MATRIX
               I      II     III         I      II     III
ERRCOR10     0.83  -0.02  -0.10        0.76  -0.50   0.68
ERRCOR20     0.68   0.08   0.07        0.69  -0.40   0.64
READSTRU     0.68  -0.08   0.01        0.73  -0.52   0.67
LISTSTRU     0.45  -0.02   0.07        0.49  -0.31   0.47
CONCOM       0.48  -0.09  -0.14        0.42  -0.31   0.36
COMP         0.49   0.13   0.36        0.74  -0.41   0.73
LISTCOMP     0.14  -0.05   0.49        0.62  -0.44   0.65
READVOC      0.06   0.09   0.51        0.47  -0.26   0.51
LISTVOC     -0.30   0.07   0.76        0.34  -0.19   0.44
ORAL        -0.10  -0.22   0.51        0.51  -0.47   0.55
LISTBOW     -0.06  -0.33   0.37        0.49  -0.51   0.52
LISTPHON     0.29  -0.38  -0.16        0.38  -0.47   0.33
AGE          0.09   0.60   0.06       -0.24   0.50  -0.22

CORRELATION MATRIX OF FACTORS (PHI)
               I      II     III
I            1.00
II          -0.64   1.00
III          0.91  -0.60   1.00

VARIANCE OF FACTORS
               I      II     III
             2.40   0.73   1.55

Characterized by salient coefficients from the two vocabulary tests, Factor III is clearly interpretable as a vocabulary factor. The difference between the third factor in this solution and the third factor in the solution for the combined group is that both Oral and Listbow have larger coefficients in this solution. As noted in the discussion on the combined group, Oral having a high coefficient on this factor is consistent with the vocabulary interpretation. It can also be argued that a moderate coefficient from Listbow is consistent with the vocabulary interpretation. Bowan describes his listening test as an integrative test of English grammar. While the focus of each item is a structural point, the path to the correct answer lies through the comprehension of the entire sentence. Quite possibly, it was word knowledge that presented the greatest barrier to comprehension. If this were the case, then the test would be functioning more as a vocabulary test than a grammar test.

The Phi matrix in this solution shows once again that the factors are highly correlated. In this solution, Factors I and III (the structure factor and the vocabulary factor) are even more highly correlated (.91) than in the solution for the combined group (.85). Such a high correlation between factors weakens the theoretical power of the factors as distinct traits despite their apparently clear identification in the pattern matrix.
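The relation among the three matrices reported in Table XIV can be sketched as follows. This is a minimal illustration rather than part of the original analysis: it assumes the standard identity for oblique solutions, in which the structure matrix equals the pattern matrix post-multiplied by Phi, and it uses three rows transcribed from Table XIV. The function name structure_row is hypothetical.

```python
# Values transcribed from Table XIV (Chinese-speaking subset).
# Assumption: structure = pattern x Phi, the usual identity for an
# oblique factor solution. Used here only as a consistency check.

PHI = [
    [1.00, -0.64, 0.91],
    [-0.64, 1.00, -0.60],
    [0.91, -0.60, 1.00],
]

PATTERN = {
    "ERRCOR10": [0.83, -0.02, -0.10],
    "LISTVOC": [-0.30, 0.07, 0.76],
    "AGE": [0.09, 0.60, 0.06],
}


def structure_row(pattern_row, phi):
    """Return one row of the structure matrix: pattern_row times phi."""
    n = len(phi)
    return [sum(pattern_row[k] * phi[k][j] for k in range(n)) for j in range(n)]


STRUCTURE = {name: structure_row(row, PHI) for name, row in PATTERN.items()}

for name, row in STRUCTURE.items():
    print(name, [round(value, 2) for value in row])
```

For these three rows the products agree with the tabled structure coefficients to within about .02, the discrepancy being attributable to rounding in the printed values. The check also shows why a variable such as Errcorr10, with a near-zero pattern coefficient on Factor III, can still have a large structure coefficient on it: the correlation of .91 between Factors I and III carries the Factor I loading across.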
Whether this correlation of factors is a result of the complexity of the majority of the variables as mentioned earlier, or whether the identification of the two factors in the solutions as grammar and vocabulary factors is merely 'fooling oneself with factor analysis' as Nunnally (1978) puts it, will have to be determined in further research. However, it is clear from the solutions on these two data sets (and from the solution on the non-Chinese set discussed in the following section) that the most fruitful direction in any research which attempts to define separate factors in language proficiency at the level of ability of subjects in this study will be to focus on grammar and vocabulary.

In summary, in the factor analysis for the Chinese group, the solution indicated three factors, one identified as a structure or grammar factor, one identified as a vocabulary factor and the third as an age-related factor, possibly related to the physiological variable of hearing.

5.1.3 The Non-Chinese Speakers

Although it is open to criticism on statistical grounds, the solution for the non-Chinese speaking group is interesting in that it is both similar to and different from the other two solutions.³⁰ Table XV shows that in this solution too, Factor I, with salient coefficients from the three reading-mode grammar tests, remains interpretable as a structure factor. In addition, Factor III, with high coefficients from the two contrasting mode vocabulary tests, is still identifiable as a vocabulary factor.

³⁰ In a fourteen-variable matrix there are 91 different correlations. Among this many, especially with only 60 subjects, the probability is high that some values are spuriously large or small. Consequently it is difficult to know if a particular coefficient is a result of chance correlation or truly reflects the influence of an underlying factor. Second, this smaller group is in a sense a microcosmic reflection of the complete data set. Within the group of 'non-Chinese speakers' are 22 Vietnamese speakers. If first language is a direct factor in learning English as a second language (Chapter VI suggests an alternative interpretation) then once again having a large homogeneous group within an otherwise heterogeneous sample would be expected to obscure the results.

Table XV - Image analysis followed by Harris-Kaiser (Independent Clusters) on 60 non-Chinese subjects, retaining three factors

               I      II     III         I      II     III
ERRCOR20     0.72   0.08  -0.07        0.72   0.42   0.35
ERRCOR10     0.65  -0.04   0.13        0.70   0.37   0.46
READSTRU     0.67  -0.01  -0.11        0.61   0.28   0.24
CONCOM       0.54  -0.06   0.10        0.56   0.27   0.35
COMP         0.37   0.40   0.02        0.59   0.61   0.44
READVOC      0.19  -0.06   0.50        0.42   0.31   0.56
LISTVOC     -0.13   0.08   0.66        0.26   0.37   0.63
LISTCOMP    -0.04   0.47   0.31        0.38   0.63   0.63
LISTSTRU     0.14   0.39   0.12        0.41   0.53   0.41
LISTPHON    -0.03   0.62   0.33        0.13   0.43  -0.00
ORAL        -0.07   0.62   0.01        0.25   0.59   0.31
LISTBOW      0.10   0.69  -0.08        0.42   0.69   0.35
AGE          0.31  -0.65  -0.03       -0.04  -0.49  -0.21

CORRELATION MATRIX OF FACTORS (PHI)
               I      II     III
I            1.00
II           0.53   1.00
III          0.53   0.55   1.00

VARIANCE OF FACTORS
               I      II     III
             1.98   2.20   0.95

Factor II is age-related, with a coefficient of .65 for age on that factor. However, it is certainly different in nature from the age-related factor found in the larger sets. In both of the previous analyses only two tests (Listbow and Listphon) clustered with age. In this solution there are six: Comp, Listcomp, Liststru, Listphon, Oral, Listbow. Because of this, it is difficult to think of it as simply a hearing factor.
On the other hand, four of the six tests that have high or moderate coefficients on this factor are listening-mode tests and another one is the oral interview which, as has been noted, does involve listening. Thus, this solution on the non-Chinese set could be construed as supporting the concept of a 'listening-skill' factor or perhaps a bi-polar hearing/listening factor.

Another interpretation for this factor is that it measures the speed with which language processing tasks are handled. This interpretation stems from a common feature of tests of such apparent mode and content diversity as a composition task, an oral interview, Bowan's listening test and a listening comprehension test.³¹ All of these tasks can be viewed as dynamic, involving a marshalling of several skills or language components under the pressure of time. Age, too, has its highest coefficient on this factor but it is negative. One trait associated with aging may be a slowing down of the speed with which language (or any other information) is processed. This would be reflected particularly in tests and measures in which the information flow was continuous and not controlled by the recipient. Listening tests and the oral interview would both fall into this category. A slowing down of language processing would also be reflected in language production tasks that were constrained by time such as the oral interview (again) and the composition task.

³¹ This interpretation may also be relevant to the interpretation of the third factor found in the complete data set and in the Chinese subset. There, too, the salient coefficients belonged to Oral, Listcomp, Listbow, and Comp. On the other hand, in these solutions age did not cluster with these four.
Establishing the validity of such a trait is outside the scope of the data in this study, but some suggestions regarding its implications for further research are presented in Chapter VI.

The Phi matrix in Table XV is also of interest. The correlations between the three factors range between .53 and .55, which is somewhat less than those in the solution for the Chinese-speaker subset. This difference suggests that in this sample there is more distinction between the factors and that there was more heterogeneity in skills in the non-Chinese group than in the Chinese-speaking group.

In summary, the analysis of the non-Chinese speaking subset reiterates the presence of both a grammar and a vocabulary factor. It also introduces the possibility of a hearing/listening factor (as opposed to a strictly hearing factor found in the previous analyses) or a speed of processing factor.

5.1.4 Summary

In conclusion, the factor analyses provided moderate evidence that a multiple factor solution is preferred in the analysis of this matrix of language and demographic variables. They also produced support for the argument that knowledge of English grammar and knowledge of English vocabulary are identifiable sources of variance within the matrix. The analyses also suggested but did not clearly support the notion of a source of variance related to the modality (e.g., listening) of the instrument. In addition, I have presented a brief description of a hypothetical 'speed of processing' factor that would be amplified in tests in listening mode though present in other modes. In this factor a major source of variation would be the speed with which language was processed, and since listening tests present language in a stream uncontrolled by the subject, these tests would be particularly influenced by such a trait.
In all solutions, the factors were highly correlated, leaving open the question of how many of the identified factors are truly significant.

5.2 The Demographic Variables And Their Relation To The Tests

Inspection of the means and correlations of the variables suggested that three of the four demographic variables (age, Lot, and first language but not sex) might be significantly influencing the shape and interpretation of the factor solutions. Therefore, a series of subsidiary analyses was done in which the solutions for different subsets of the data were compared. While adding age to the language-variables matrix did clarify and simplify the solution, adding Lot did not.

5.2.1 Age

The correlation of age with each test and p-values associated with a one-tailed t-test are presented in Table XVI. As noted in Section 1.4.3, age (as an indirect measure of hearing) was expected to have negative correlations associated with the listening tests and zero or close to zero correlations with the paper and pencil tests. As Table XVI shows, this pattern was not found. While three listening-mode tests (Listbow, Listphon, and Listcomp) do show strong negative correlations with age, two reading-mode tests (Readstru and Errcorr10) do also, which suggests that age may also be associated with some other negative effect on language proficiency. Furthermore, Oral, which is not a listening test, has a larger negative correlation than three of the listening tests (Listcomp, Liststru, and Listvoc).
While some of this can be attributed to the fact that an oral interview has a listening/hearing component, it seems reasonable to expect that a hearing problem could be compensated for by the interactive nature of the task. That is, a subject being interviewed could ask the evaluator to speak louder and more clearly.³²

Table XVI - Correlations of age and length of time in Canada (LOT) with language measures

                     AGE                       LOT
              R     (N)     P           R     (N)     P
LISTBOW     -.43    146   .000        -.14    138   .057
LISTPHON    -.41    126   .000        -.43    118   .000
ORAL        -.36    168   .000        -.03    157   .375
LISTCOMP    -.28    168   .000         .00    157   .481
READSTRU    -.19    151   .009        -.09    141   .140
ERRCOR10    -.18    168   .011        -.04    157   .315
LISTSTRU    -.17    142   .021        -.02    132   .402
COMP        -.11    162   .081         .03    152   .377
CONCOM      -.11    168   .078        -.10    157   .099
LISTVOC     -.11    155   .094         .21    145   .005
ERRCOR20    -.03    143   .333        -.07    133   .210
READVOC     -.00    134   .480        -.01    124   .440

One possible explanation for the pattern that appears in Table XVI is that the variable age is acting as an indirect measure of two (or possibly more) otherwise independent influences, for example hearing and the 'speed of processing' factor proposed earlier. In tests which are affected by both of these influences, the effects would be amplified, making the statistical correlation large. Where only one or the other is acting, the correlation would be proportionately reduced. Possibly this 'speed of processing' factor in some tests (for example Listbow and Listphon) is combining with the hearing problem to increase the correlation with age, but in other tests (especially in reading and writing mode) acting alone, and thus producing a lower correlation with age. In still other tests (Listvoc or Liststru for example) this factor may not be an influence at all, leaving the correlation of the test with age the result of the influence of the hearing factor.
Clear characterization of these hypothetical age-related traits will require research which includes a direct hearing measure and several tests designed to accentuate differences along the hypothesized 'speed of processing' dimension.

Whatever the underlying causes, the pattern of correlations in Table XVI is a convincing argument for including age in the factor analysis. It is quite possible that a group of variables which are not directly related to each other cluster together in a solution because of a common influence of age on the correlations. Without age to identify the cluster, the interpretation would be misleading.

³² As noted earlier, though, these very actions may be interpreted by the interviewer as indicating poor language comprehension.

To investigate the possibility that an age-related factor was producing spurious linguistic clusters in the analyses, I compared several factor solutions which included and excluded age in the matrix of language variables. These solutions are presented and discussed in Appendix M. The comparison indicated that retaining age in the matrix clarified the relationships among the linguistic variables.

5.2.2 Length Of Time In Canada

Table XVI also presents the correlations of Lot with each of the language measures. Since the correlation of age with Lot was .42 (n=155, p=.000), it is difficult to know how much of the negative and near-zero correlations of Lot with the language variables is a result of the mediating effect of age.³³ That there are negative correlations of Lot with nine of the language measures is somewhat surprising. It seems intuitively unreasonable to expect that any of the language measures would be in fact negatively related to the length of time a person had been in Canada, for this could imply a loss of language proficiency over the period a subject had been in Canada.
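The mediating effect of age can be made concrete with a first-order partial correlation. The sketch below is an illustration only and was not part of the original analysis. It takes the correlations of Listbow with Lot (-.14) and with age (-.43) from Table XVI and the correlation of age with Lot (.42) from the text, and it ignores the fact that the three coefficients rest on somewhat different numbers of subjects. The function name partial_correlation is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Illustrative inputs: Table XVI and the age-Lot correlation in the text.
# Assumption: the pairwise coefficients are treated as if they came from
# one common set of subjects, which is not strictly the case here.
r_listbow_lot = -0.14
r_listbow_age = -0.43
r_age_lot = 0.42

r_partial = partial_correlation(r_listbow_lot, r_listbow_age, r_age_lot)
print(round(r_partial, 3))
```

Under these assumptions, the small negative correlation between Listbow and Lot becomes a small positive value once age is held constant, which is consistent with the suggestion that age is responsible for at least some of the negative coefficients.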
It is more reasonable to suggest that some third influence is clouding the result. One suggestion is that a group of older students has 'plateaued'³⁴ because of language fossilization (cf. Selinker and Lamendella, 1979; Vigil and Oller, 1976) at something less than the necessary proficiency to exit this level. Other students who arrived in Canada at the same time as these older students may have already been promoted out of the level at the time the tests were given, and thus these younger, more capable students who had been in Canada equally long as the older students would not have been included in the data. In addition, younger, more capable students who arrived in Canada after the older subjects and were included in the data may have surpassed their elders' ability within a single term. Under these conditions it is easy to see that even if there were no relation between the length of time a subject had been in Canada and his language proficiency, for this set of subjects there would be a pattern of correlations similar to that in Table XVI.

³³ Two unrelated variables can show a statistical correlation if they are both correlated to a third. Similarly, a relationship that does exist between two variables can be hidden if the two variables are both correlated to a third variable but in an opposite manner.

³⁴ At the college where the data was gathered there was, at the time of the research, a class specifically for such older students who did not seem to be progressing in the regular classes. Because of budget constraints this was offered at only one time during the day. It is reasonable to suppose that similar students were attending the regular beginners classes at the other three times during the day.

Because of the generally complex interrelationship of age and Lot and the language measures, I felt it was necessary to compare solutions including and excluding Lot in a manner similar to that done with age. However, although including Lot in the matrix did simplify the solution (see Appendix M) in terms of statistical complexity, it did not improve the interpretability of the factors and so Lot was not included in the matrix in the investigation of the divisibility hypotheses.

5.2.3 First Language

Table XVII presents the results of the investigation of the association of first language with test scores. These results strongly support the argument for the necessity of analyzing the Chinese and non-Chinese separately in the factor analyses. The t-values and associated tests of significance are given in Table XVII, not as a means of accepting or rejecting hypotheses but, as Kruskell (1968, p. 238) expresses it, "as a means of measuring the surprisingness of the observed..." patterns in the data.

The results are somewhat surprising. First, in general, the data suggest that something other than construction-induced bias is influencing the variables. The three tests (Listvoc, Liststru, Readvoc) that motivated a contrast between Chinese and non-Chinese speakers are marked with an asterisk.³⁵ Two of the three particular tests, Readvoc and Listvoc, do show statistically significant differences in means. However, the third, Liststru, which had been subjected to the most extensive revision, does not.
Furthermore, Comp, which is not multiple choice and therefore not susceptible to construction-induced bias, also displays a statistically significant difference in means. Thus, the pattern is not that predicted by the hypothesized construction-induced bias.

³⁵ In these tests, the selection of items and distractors had been based on item-response statistics gathered on samples from the same general population as the research itself. Liststru in particular was composed of items that had undergone several revisions.

Table XVII - Analysis of means grouped by language

VARIABLE    GROUP     N OF CASES    MEAN    S.D.   t-VALUE   DEGREES OF FREEDOM   2-TAIL PROB.
*READVOC    CHINESE      100       15.21    4.16     3.82          144               0.000
            OTHERS        46       17.86    3.29
COMP        CHINESE      116       14.16    4.34     2.77          173               0.006
            OTHERS        59       16.24    5.31
*LISTVOC    CHINESE      110        7.50    2.95     3.21          165               0.002
            OTHERS        57        9.10    3.26
ORAL        CHINESE      122       14.05    2.46     2.29          179               0.023
            OTHERS        59       15.00    2.87
LISTCOMP    CHINESE      122       12.43    3.44     2.28          179               0.024
            OTHERS        59       13.67    3.42
CONCOM      CHINESE      122        7.09    1.87     2.00          179               0.047
            OTHERS        59        7.68    1.89
LISTBOW     CHINESE      108       14.93    4.49     1.34          155               0.182
            OTHERS        49       16.00    4.86
ERRCOR20    CHINESE      101        7.38    3.61     1.49          151               0.139
            OTHERS        52        8.32    3.88
*LISTSTRU   CHINESE      106       12.10    3.51     1.12          153               0.264
            OTHERS        49       12.81    4.02
READSTRU    CHINESE      111       18.70    3.87     1.11          162               0.269
            OTHERS        53       19.43    4.10
LISTPHON    CHINESE       92       45.89    9.97     0.80          132               0.424
            OTHERS        42       47.28    7.69
ERRCOR10    CHINESE      122        5.87    2.46     0.41          179               0.679
            OTHERS        59        6.03    2.22
LOT         CHINESE      110       31.36   32.99     0.33          155               0.738
            OTHERS        47       29.46   31.32
AGE         CHINESE      116       31.68   11.80     0.23          166               0.819
            OTHERS        52       32.15   13.46

* Multiple choice tests which underwent extensive revision.
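The t-values in Table XVII can be approximately recovered from the tabled group sizes, means, and standard deviations. The sketch below is a minimal illustration under one assumption: that the comparison is a pooled-variance (equal variance) t-test, which is what the reported degrees of freedom (the two group sizes summed, less two) suggest. The function name pooled_t is hypothetical, and the example uses the Readvoc row.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for two independent groups, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, df


# Readvoc row of Table XVII: Chinese speakers first, then the other group.
t_readvoc, df_readvoc = pooled_t(100, 15.21, 4.16, 46, 17.86, 3.29)
print(round(t_readvoc, 2), df_readvoc)
```

The recomputed value differs slightly from the tabled 3.82 because the printed means and standard deviations are rounded to two decimal places; the degrees of freedom match exactly.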
The second 'surprising' fact about Table XVII is that the mean of the Chinese speakers is lower on all of the tests. That these two linguistically defined groups appear to differ in proficiency does not necessarily indicate that the difference results from the difference in language. It could indicate that the two groups differed significantly on some other demographic variable such as level of education. Table XVII does suggest, though, that the source of the difference is not linked to age or Lot. The means of the two groups on these two variables are very close in value. Resolution of the exact nature of the source of the differences will need further research, and some suggestions regarding this will be made in Chapter Six.

5.2.4 Sex

Table XVIII presents the results of the investigation of the effect of sex on test scores. It presents no evidence to suggest that in this research sex would be linked to any factors that might arise in the factor analysis. The means and standard deviations suggest homogeneity of the two samples. Although two of the differences in means (Listphon, Lot) do approach statistical significance, in a comparison of this many means it is more appropriate to consider these as random events than to attempt to interpret them. In consequence, the analysis of the variable sex is not pursued further.

Table XVIII - Analysis of means, subjects grouped by sex

VARIABLE    GROUP   N OF CASES    MEAN    S.D.   t-VALUE   DEGREES OF FREEDOM   2-TAIL PROB.
COMP          F         75       15.26    4.95     0.80          162               0.427
              M         89       14.66    4.59
LEVIF1        F         69       18.89    3.90    -0.35          150               0.728
              M         83       19.12    3.90
ERRCOR10      F         76        6.01    2.39     0.35          167               0.723
              M         93        5.88    2.39
ERRCOR20      F         67        7.79    3.73     0.38          142               0.701
              M         77        7.55    3.51
CONCOM        F         76        7.42    1.64     0.85          167               0.397
              M         93        7.18    2.02
ORAL          F         76       14.57    2.60     0.94          167               0.351
              M         93       14.19    2.64
READVOC       F         59       15.72    4.27    -1.01          132               0.317
              M         75       16.42    3.74
LISTCOMP      F         76       12.97    3.26     0.29          167               0.769
              M         93       12.81    3.57
LISTVOC       F         71        8.01    3.22    -0.36          155               0.717
              M         86        8.19    3.08
LISTPHON      F         60       44.51   10.78    -1.95          123               0.054
              M         65       47.81    8.04
LISTSTRU      F         66       12.53    3.36     0.21          141               0.835
              M         77       12.40    3.87
LISTBOW       F         65       15.44    4.21     0.16          145               0.876
              M         82       15.32    4.72
LOT           F         69       38.04   42.60     2.49          154               0.014
              M         87       25.18   19.87
AGE           F         73       32.35   10.93     0.42          161               0.675
              M         90       31.53   13.54

VI. SUMMARY, CONCLUSIONS, AND IMPLICATIONS

6.1 Summary

This study investigated the interrelationships among twelve English language measures and four demographic variables in an attempt to answer the questions 'Is second language proficiency divisible into components and if so what are the components?' The data on the variables were gathered on adult ESL learners in a language course at a community college. The language measures came from three sources. Six were constructed specifically for the research (Liststru, Readvoc, Listvoc, Errcorr20, Listbow, Listphon); four were subtests used in a progress assessment battery at the college (Concom, Listcomp, Errcorr10, Oral); and two were measures used in conjunction with the assessment tests but developed independently as separate projects (Comp, Readstru). In order to distinguish between linguistic and possible non-linguistic sources of variation in the students' scores, information on four demographic variables (age, sex, length of time in the country, and first language) was also gathered.

During the course of the analysis, it became clear that the effects of age and first language were influencing the relationships among the variables. In addition, there was a high, positive correlation between age and the length of time the subjects had been in Canada.
As a result of these findings, the design of the analysis was extended to account for and clarify the effects these variables had on the language measures. This was done by including age in the correlation matrix used for analysis and by analyzing two subsets of the data, Chinese speakers and non-Chinese speakers, independently of each other. [...] larger group for further analysis.

The principal method of analysis in the investigation of the divisibility hypotheses was that recommended by Hakstian and Bay (1973): image analysis followed by an oblique transformation (Harris-Kaiser independent clusters). The interpretation of the factors focussed on the mode (in particular listening and reading) and content (in particular vocabulary and grammar) of the tests and on the theoretical effects of the demographic variable age. The statistical methods used in the four subsidiary problems were comparison of group means (sex and first language) and correlations (age and Lot) with the language measures.

6.1.1 The Factor Analyses And Interpretation

In the analysis related to the divisibility hypotheses, the data were treated first as a complete set (181 subjects) and then divided into two groups, Chinese speakers (121 subjects) and non-Chinese speakers (60 subjects). In each of the three analyses, the solution indicated three underlying factors influencing the language variables: a structure or grammar factor, a vocabulary or word knowledge factor and an age-related factor. The specific interpretation of the age-related factor is different according to the solution: for the two larger sets it appears to be related to hearing; in the smallest set (non-Chinese speakers) it is more complicated and can be interpreted as either a bi-polar listening/hearing factor or as a 'speed of processing' factor.
Although the distinct nature of the three factors is apparent in each solution, the high correlation between the factors prevents their unqualified interpretation as traits which operate independently.

The clearest and most interpretable factor to appear in each of the solutions is a 'structure' or grammar factor. In each solution, three of the structure-content tests (Readstru, Errcorr10, and Errcorr20) clustered together. The strongest identification of this structure factor is in the solution for the Chinese-speaking subset. In this analysis, all four of the measures designated as structure tests (the three mentioned previously and Liststru) have high coefficients on Factor I and negligible coefficients on Factors II and III. Furthermore, those measures (Concom and Comp) that also loaded on the grammar factor are consistent with the interpretation since the evaluation method in both of these tests includes consideration of grammar.

In addition to the structure factor, a clustering of variables that is interpretable as a vocabulary factor appears in each of the three solutions. This cluster is identifiable by the presence of moderate-to-high coefficients from the two vocabulary tests, Listvoc and Readvoc. However, what strengthens this interpretation is that the two tests are in different modes and therefore the commonality cannot be attributed to modality. Further strength is given to the vocabulary interpretation by the contrasting modes yet comparable content of other tests that have significant coefficients on this factor. For example, Comp, Oral, and Listcomp are all quite different methods of testing, yet clearly performance on all three will be positively influenced by recognition of or correct use of words.
The third factor in each set was identifiable by the high coefficient of age, though the interpretation of the factor varies depending on data set. In the complete set and in the Chinese-speaking set, age clustered with Listphon and Listbow. Certain features of these tests put a particularly heavy load on the students' hearing ability. First, Listphon is a listening test in which the student must distinguish between minimal pairs³⁶ which are presented devoid of context. Second, the items in Listbow are spoken at near-natural speed, unlike the other listening tests. Finally, while the items in the other tests are repeated, in these two tests they are not. These aspects of the tests support the suggestion that the age-related factor in the solution for the combined groups and for the Chinese-speakers is best interpreted as a hearing factor.

³⁶ A minimal pair is a pair of words which differ in only one phoneme, e.g., pin and pen.

The age-related factor in the solution for the non-Chinese speakers was different from the solutions for the other two data sets in that age clustered with four listening tests (Liststru, Listcomp, Listphon, Listbow), the oral test and to some extent the composition test. One explanation of this clustering is that it represents a chance³⁷ constellation of tests and does not indicate a true relationship between any particular pair of variables. However, since four out of five of the listening tests have their highest coefficients on this factor, if it is not an artifact of chance, it is clearly related to listening mode. Because of this and because the coefficients of all of these tests are opposite in sign to that of age, it is reasonable to interpret it as a bi-polar listening/hearing factor.
Another possible interpretation of the age-related factor takes into account the significant coefficients of Comp (.47) and Oral (.62) on this factor. This interpretation postulates a 'speed of processing' dimension. According to this explanation, in tests which are constrained by time limits, the greatest source of variation among students is their differing ability to integrate all of their linguistic components of 'proficiency' quickly. Listening tests and oral interviews (in which the speed of the language input or stimulus is not controlled by the subject) and time-limited, in-class compositions would exhibit the common influence of such a factor.

While the interpretation of the different clusters in the different solutions is straightforward, their strength is not such that these clusters can be said to represent strong, independent aspects of language proficiency or that the language skill of these sets of subjects is characterized by distinct sub-skills that account for most of the variance in the tests.

³⁷ As mentioned earlier, the solution for this group is subject to criticism on statistical grounds because of the size of the sample.

In each of the solutions, all three of the factors which were derived were highly correlated.³⁸ These correlations have two possible explanations. First, they may indicate that in fact only a single dominant factor is influencing the language variables despite the agreement of the separate criteria used to resolve the problem of the number of factors. Three factors were clearly indicated both by the Harris-Kaiser criteria of the number of eigenvalues greater than one in a principal components solution and by the method of inspecting a varimax rotation of a full image analysis for the number of factors with significant loadings.
A second possible explanation is that the tests themselves overlap in tapping not only a large general factor but also various combinations of other underlying factors to such a degree that the variables are too complex to produce any solution that is both simple in structure and yet still uncorrelated.

[38] In terms of individuals, the correlation of factors suggests that those who performed well in the tests of one cluster also tended to do well in the tests of another cluster. In the combined groups, the age-related factor was negatively correlated with the two linguistic factors. This indicates that subjects who were older or did poorly on the tests in this cluster tended to do poorly on most tests.

6.1.2 The Demographic Variables

The subsidiary analyses of the four demographic variables suggest that the age and first language of the subjects (as represented by the dichotomy Chinese speakers and non-Chinese speakers) are strongly associated with performance on the language tests. The sex of the subjects, on the other hand, does not appear to be associated with language performance at all. The variable Lot (length of time in Canada), possibly because of its correlation of .42 with age, showed a complicated and ambiguous set of relationships with the language variables.

6.1.3 Age

The variable age appears to be an indirect measure of one or more underlying factors which inhibit performance on the language tests and possibly language acquisition itself. As mentioned earlier, one of the inhibiting factors may be hearing while another could be a slowing down of mental processes in general and language processes (speed of processing) in particular.

6.1.4 First Language

When subjects were categorized as Chinese-speakers and non-Chinese speakers and the data re-analyzed, the means of the Chinese speaking group were lower on all twelve language variables.
This finding may not be generalizable to the larger population of language learners because the category 'Chinese-speakers' may be biased by a covert factor, previous education for example. However, within this set of data, the differences between the two groups were of sufficient size to warrant separate analyses of the two groups.

6.1.5 Length Of Time In Canada

The correlation between the measure of the length of time the students had been in Canada and the language scores indicated more of a link to age than I had expected. I have suggested that this statistical link is an artifact of the language program at the college rather than a general demographic connection. As a result of the testing and promotion system at the college, there is a tendency for less capable students to be moved along to and then stopped and held at the proficiency level in the program at which this research was done. Many of these slower students are older and consequently there is probably in the sample an unrepresentatively high proportion of older students who have been in Canada a longer time than the younger students have. This study suggests potential differences, but the relationship between the length of time in Canada and language acquisition will need to be re-addressed in other research.

6.2 Conclusions

Of Oller's (1979a) three divisibility hypotheses, this study tends to support H3 or the model of a general factor plus small specific factors. First, the high correlations of the factors in each solution suggest that a strong global or general proficiency factor is operating, causing moderate correlations among all of the language measures.
Second, the consistent emergence of the grammar, the vocabulary, and the age-related factors in each solution argues very strongly for the concept of multiple, identifiable, specific factors underlying the data, factors which must be taken into account when developing models of either language proficiency or performance on language tests.

A proficiency model based on the analyses of the combined group or of the Chinese-speakers subset would include a general factor, a grammar factor and a vocabulary factor. In these two analyses, I believe, the age-related factors reflect an underlying physiological rather than linguistic factor and consequently should not be included in a model of language proficiency. If it were desired to explain language test performance then, certainly, the age-related factor would need to be more clearly defined and then added to the proficiency model.

A model to fit the non-Chinese speaker subset cannot, of course, be fully defined until some better identification of the age-related factor in this set is available. However, this model, too, would include a large general factor and the two specific factors, vocabulary and structure. The age-related factor in the solution for the subset of non-Chinese speakers suggested two interpretations: a true listening-skill factor or a speed of processing factor. If either or both of these represent real factors, then a language proficiency model must include them, too.

This research extends Powers' (1982) contention that it is necessary to describe the sample clearly in a divisibility study. Not only does it appear necessary to describe the sample, but it seems crucial to at least explore the relationships between the language variables and the particular demographic and experiential parameters of a sample.
In this study, the age and first language of the subjects and the length of time they had been in Canada were clearly linked to some or all of the language variables. The addition of age actually helped clarify the clusters of linguistic variables. Furthermore, since the solutions for the two subsets were not identical, it can be inferred that the significance of the many underlying factors related to language proficiency may vary according to changes in the nature of the sample. It is clearly in the interest of language acquisition and language testing research to know which factors are stable through the whole population and which factors lose or gain significance depending on characteristics of the sample.

6.3 Implications And Suggestions For Future Research

The findings of this study have clear implications for future research design. First, they show the necessity of fundamentally pure tests if orthogonal factors are to be derived. Almost as important, they indicate that non-linguistic variables must be taken into account if a clear linguistic solution is desired. The specific non-linguistic variables suggested by the research are age, hearing, and any categorical variables such as first language or level of education that may divide the sample into groups that perform significantly differently from each other.

6.3.1 The Correlation Of The Factors

One reason that I have suggested for the correlation of the factors in the factor analyses is the complexity of the tests. In order to establish a dichotomy between grammar and vocabulary, a variety of tests will need to be constructed that put as great a load as possible on one trait while remaining as free as possible of the influence of the other.
For vocabulary this might include a simple synonym/antonym test, or even a test in which students list as many words as they know related to some subject (e.g., parts of the body, kitchen utensils, etc.). Still another structure-free vocabulary test would be of the category/example type where students indicate the word that 'doesn't fit.' Creating a vocabulary-free grammar test seems impossible but, by ensuring that all vocabulary used in the grammar tests consists of simple, high frequency words, part of this problem can be solved. The structure tests used in the current research follow this approach to some extent.

Still another way to measure both grammar and vocabulary independently may be to evaluate a composition in two objective ways: first, by taking some simple measure of grammatical accuracy and then by using a measure of diversity of vocabulary. The measure of grammatical accuracy might merely be based on the number or percentage of correct sentences.[39] For a vocabulary measure, a word frequency count may suffice. Such a count is performed quickly by a computer. There are already numerous word processing programs that will not only count and list the number of different words to appear in a passage but also check spelling.

6.3.2 Age

Certainly in studies where the age range is as broad as it is in this, some account must be taken of the effect of age on the different variables in the study. In this study, not only did all language measures correlate negatively with age, certain of them were affected more than others. By ignoring age and omitting it from the equation, a researcher runs a strong risk of keeping a non-linguistic factor in a matrix but providing no way of identifying it in a solution.

6.3.3 Hearing

In this research I have tentatively linked the age-related factor to hearing.
The evidence to support this is not conclusive, but it seems to warrant further studies into this particular aspect of language learning. Clearly, a hearing measure is needed in research where listening tests are involved, particularly where the upper bound of age is as high as it is in this study. Comparing hearing capability with performance on any test (and on progress in language acquisition in general) may prove instructive as well, for obviously being hard of hearing would affect aural learning in general, not only performance on a listening test. Such research would not only be mandatory in the development of theories concerning listening but may also be very useful in counselling adult learners.

[39] At the level of language competence of the subjects of this research, it might be appropriate to ignore punctuation and spelling and to include partial marks for correct clauses. By being too strict there may not be a wide enough spread in scores for the measure to be useful. That is to say, while two compositions may contain an equal percentage of correct sentences, one may have far more 'almost' correct sentences than the other.

6.3.4 First Language

Identification of large, homogeneous subgroups can clarify a study. In this study the homogeneous group that was identified was 'Chinese-speakers.' There may, however, be more effective ways of grouping subjects or of defining variables that will clarify the apparent differences between groups. For example, although treating the subset of Chinese speakers separately gave a clearer factor solution than treating the entire group did, it may have been even more effective to have obtained an approximate measure of the subjects' exposure to formal education.
A number of my Chinese students have commented on the fact that the Cultural Revolution disrupted their education. If a lack of formal education impedes language acquisition, then possibly some subgroup of Chinese accounts for the poorer performance of the group as a whole.

6.3.5 Length Of Time In An English Speaking Environment

The hypothesis that the length of time a subject spends in an English speaking environment is related more strongly to some components of English proficiency than to others is too strong intuitively to abandon. Powers' (1982) interpretation of his earlier study (Swinton and Powers, 1980) gives indirect support to this concept. He has suggested that a vocabulary factor is most influenced by experience and exposure. Although the research was done on subjects who had studied English as a foreign language (EFL), it is reasonable to expect a similar result for ESL subjects. The reason my research did not clarify the issues was that the relationship between the linguistic variables and Lot had been confounded by Lot's relation with age. This appears to have been a result of a group of older students who had become 'stuck' at the one level. If possible, a sample should be taken from a program in which students remain only a limited amount of time.[40] The suggestion for further research is to deal with a program that does not have proficiency barriers at various points in the program.

[40] One such program would be the five month, Canada Manpower sponsored language classes given in various colleges and other centres across the country.

BIBLIOGRAPHY

1. Allen, J.P.B., and Alan Davies (eds.) 1979. Testing and Experimental Methods. London: Oxford University Press.
2. Bachman, L.F., and A.S. Palmer. 1982. "The Construct Validity of Some Components of Communicative Proficiency." TESOL Quarterly 16:449-465.
3. Bachman, L.F., and A.S. Palmer. 1981. "The Construct Validation of the FSI Oral Interview." Language Learning 31:67-86.
4. Bowen, J.D. 1975. An Experimental Integrative Test of English. Work Papers in Teaching English as a Second Language, Vol. 9. Los Angeles: University of California, Dept. of English.
5. Borg, W.R., and M.D. Gall. 1979. Educational Research: An Introduction. New York: Longman, Inc.
6. Canale, M., and M. Swain. 1980. "Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing." Applied Linguistics 1:1-47.
7. Carroll, B.J. 1980. Testing Communicative Performance. Oxford: Pergamon Press.
8. Carroll, J.B. 1968. "The Psychology of Language Testing." In Alan Davies, ed. Language Testing Symposium: A Psycholinguistic Approach. London: Oxford University Press, 1968, p. 46-69.
9. Cattell, R.B. 1962. "The Basis of Recognition and Interpretation of Factors." Educational and Psychological Measurement 22:667-697.
10. Cattell, R.B. 1958. "Extracting the Correct Number of Factors in Factor Analysis." Educational and Psychological Measurement 18:791-837.
11. Cooper, Robert L. 1968. "An Elaborated Language Testing Model." Language Learning 3:57-65.
12. Davies, Alan (ed.) 1968. Language Testing Symposium: A Psycholinguistic Approach. London: Oxford University Press.
13. Diederich, P.B. 1974. Measuring Growth in English. Champaign, Illinois: National Council of Teachers of English.
14. Ebel, Robert L. 1972. Essentials of Educational Measurement. Englewood Cliffs, New Jersey: Prentice-Hall Inc.
15. Farhady, H. 1979. "The Disjunctive Fallacy Between Discrete Point and Integrative Tests." TESOL Quarterly 13:347-357.
16. Flahive, Douglas E. 1980. "Separating the g Factor from Reading Comprehension." In John W. Oller Jr., and Kyle Perkins, eds. Research in Language Testing. Massachusetts: Newbury House, 1980, p. 34-46.
17. Gardner, R.C., and L. Gliksman. 1982. "On 'Gardner on Affect': A Discussion of Validity as it Relates to the Attitude/Motivation Test Battery: A Response From Gardner." Language Learning 32:191-200.
18. Gorsuch, R.L. 1974. Factor Analysis. Philadelphia: Saunders.
19. Hakstian, A. Ralph, and Kyung S. Bay. 1973. User's Manual to Accompany the Alberta General Factor Analysis Program (AGFAP). Alberta: University of Alberta.
20. Hakstian, A.R., and V.J. Muller. 1973. "Some Notes on the Number of Factors Problem." Multivariate Behavioral Research 4:461-475.
21. Harman, H.H. 1976. Modern Factor Analysis (Third Edition). Chicago: The University of Chicago Press.
22. Harris, David P. 1968. Testing English as a Second Language. New York: McGraw-Hill.
23. Hendricks, D., George Scholz, Randon Sperling, Marianne Johnson and Lela Vandenberg. 1980. "Oral Proficiency Testing in an Intensive English Language Program." In John W. Oller Jr., and Kyle Perkins, eds. Research in Language Testing. Massachusetts: Newbury House, 1980, p. 77-90.
24. Ingram, E. 1978. "The Psycholinguistic Basis." In Bernard Spolsky, ed. Papers in Applied Linguistics: Advances in Language Testing Series: 2. Approaches to Language Testing. Arlington, Virginia: Center for Applied Linguistics, 1978, p. 39-58.
25. Johansson, S. 1973. "Partial Dictation as a Test of Foreign Language Proficiency." Swedish-English Contrastive Studies Report No. 3, Department of English, Lund University, Sweden.
26. Joreskog, K.G. 1969. "A General Approach to Confirmatory Maximum Likelihood Factor Analysis." Psychometrika 32:443-482.
27. Joreskog, K.G. 1978. "Structural Analysis of Covariance and Correlation Matrices." Psychometrika 43:443-477.
28. Kendall, M. 1980. Multivariate Analysis. London: Charles Griffin and Company.
29. Krishnaiah, P.R., and J.C. Lee. 1980. "Likelihood Ratio Tests for Mean Vectors and Covariance Matrices." In P.R. Krishnaiah, Handbook of Statistics, Vol. 1: Analysis of Variance.
Amsterdam, New York: North-Holland Publishing Company, 1980.
30. Kruskell, J. 1968. "Tests of Significance." In David L. Sills, ed. Encyclopedia of the Social Sciences, Volume 14. The MacMillan Company and The Free Press, 1968, p. 238-249.
31. Magnusson, D. 1967. Test Theory. Addison Wesley.
32. Mullins, K.A. 1980. "Rater Reliability and Oral Proficiency Evaluations." In John W. Oller Jr., and Kyle Perkins, eds. Research in Language Testing. Massachusetts: Newbury House, 1980, p. 91-101.
33. Munby, J.L. 1978. Communicative Syllabus Design. Cambridge University Press.
34. Nilsen, D.F., and A.P. Nilsen. 1973. Pronunciation Contrasts in English. New York: Regents Publishing Company.
35. Nie, N., C.H. Hull, J.G. Jenkins, K. Steinbrenner, D.H. Bent. 1975. SPSS: Statistical Package for the Social Sciences. McGraw-Hill Book Company.
36. Nunnally, J.C. 1978. Psychometric Theory. McGraw-Hill Book Company.
37. Oller, John W. Jr. 1976a. "Language Testing Today: an Interview with John Oller." English Teaching Forum. July. P. 22-27.
38. Oller, John W. Jr. 1976b. "A Program for Language Testing Research." Language Learning 4:141-165.
39. Oller, John W. Jr. 1978. "Pragmatics and Language Testing." In Bernard Spolsky, ed. Papers in Applied Linguistics: Advances in Language Testing Series: 2. Approaches to Language Testing. Arlington, Virginia: Center for Applied Linguistics, 1978, p. 39-58.
40. Oller, John W. Jr., ed. 1979a. Language Tests at School: A Pragmatic Approach. London: Longman.
41. Oller, John W. Jr. 1979b. (Class notes from course Advanced Issues in Language Testing at First Annual TESOL Summer Institute)
42. Oller, John W. Jr. 1981. "Language Testing Research (1979-1980)." In R.B. Kaplan, General ed., Randall L. Jones, and G.R. Tucker, Co-editors. Annual Review of Applied Linguistics 1980. Rowley, Massachusetts: Newbury House, 1981, p. 124-150.
43.
Oller, John W. Jr., and Frances Butler Hinofotis. 1980. "Two Mutually Exclusive Hypotheses about Second Language Ability: Indivisible or Partially Divisible Competence." In John W. Oller Jr., and Kyle Perkins, eds. Research in Language Testing. Massachusetts: Newbury House, 1980, p. 13-23.
44. Oller, John W. Jr., and Kyle Perkins (eds.). 1980. Research in Language Testing. Massachusetts: Newbury House 206-229.
45. Pike, L.W. 1979. An Evaluation of Alternative Item Formats for Testing English as a Foreign Language. TOEFL Research Reports (~2). Princeton: Educational Testing Service.
46. Powers, D.E. 1982. "Selecting Samples for Testing the Hypothesis of Divisible Versus Unitary Competence in Language Proficiency." Language Learning 32:331-335.
47. Scholz, George E., and Celeste M. Scholz. 1979. "Testing in an EFL/ESP Context." In Carlos A. Yorio, Kyle Perkins and Jacquelyn Schacter (eds.) On TESOL '79 The Learner in Focus. Washington, D.C.: Teachers of English to Speakers of Other Languages, 1979, p. 206-209.
48. Scholz, George, Debby Hendricks, Randon Spurling, Marianne Johnson, and Lela Vandenberg. 1980. "Is Language Ability Divisible or Unitary? A Factor Analysis of 22 English Language Proficiency Tests." In John W. Oller Jr., and Kyle Perkins, eds. Research in Language Testing. Massachusetts: Newbury House, 1980, p. 24-33.
49. Selinker, L., and J. Lamendella. 1979. "The Role of Extrinsic Feedback in Interlanguage Fossilization." Language Learning 29:368-375.
50. Streiff, Virginia. 1978. "Relationships among Oral and Written Cloze Scores and Achievement Scores." In John W. Oller Jr., and Kyle Perkins (eds.) Language in Education: Testing the Tests. Massachusetts: Newbury House, 1978, p. 65-102.
51. Stump, Thomas A. 1976. "Cloze and Dictation Tasks as Predictors of Intelligence and Achievement Scores." In John W. Oller Jr., and Kyle Perkins (eds.)
Language in Education: Testing the Tests. Massachusetts: Newbury House, 1978, p. 65-105.
52. Swinton, S.S., and D.E. Powers. 1980. Factor Analysis of the Test of English as a Foreign Language for Several Language Groups. TOEFL Research Reports 6. New Jersey: Educational Testing Service.
53. Upshur, John A. 1976. Discussion of "A Program for Language Testing Research." Language Learning 4:167-174.
54. Valette, Rebecca M. 1977. Modern Language Testing. New York: Harcourt Brace Jovanovich.
55. Vigel, J., and J.W. Oller. 1976. "Rule Fossilization: A tentative Model." Language Learning 26:281-295.

APPENDIX A - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM LISTENING-STRUCTURE TEST USED IN PILOT STUDY AND MAIN RESEARCH

A. Introduction

Here is another listening exercise. This is a review of Beginners' grammar. Look at your answer sheet and listen to the first example:

My book *****[41] on the table.
a. are   b. is   c. do
(Repeat)

The answer to that is obviously letter "b" ... "is." Did you circle letter "b?" Let's try example two:

Bill, where ***** my jacket?
a. you put   b. you did put   c. did you put
(Repeat)

Did you circle letter "c?" That is the correct answer. Now try example three:

He ***** to Eaton's tomorrow, to pick up some shoes.
a. is going   b. will   c. has gone
(Repeat)

The correct answer to that one is "a" ... "is going." Did you circle "a?" Now try example four. It is different.

Yesterday, John ***** to work.
a. going   b. goes   c. gone
(Repeat)

There is no correct answer. Did you circle no? If you want, your teacher will play the instructions again. All right. Let's begin.

[41] In this script, the ***** indicates the sound of a bell.

B. Example items

(The following are the first five examples from the final version of this test, used in the main study. Each question is repeated.)

1.
He couldn't buy a sandwich because he didn't have ***** money.
a. some   b. many   c. enough

2. Hey, don't eat that sandwich. It is *****.
a. my   b. I   c. mine

3. Mrs. Wright can't go with us because ***** car is not working.
a. she's   b. hers   c. her

4. John, ***** you tired last night.
a. have   b. were   c. do

5. Edward, how ***** you come to school last Thursday?
a. will   b. do   c. are

C. Sample answer sheet

The following is an example answer sheet for the first five questions on the listening-structure multiple choice test:

Example 1.   a   b   c   NO
Example 2.   a   b   c   NO
Example 3.   a   b   c   NO
Example 4.   a   b   c   NO
1.   a   b   c   NO
2.   a   b   c   NO
3.   a   b   c   NO
4.   a   b   c   NO
5.   a   b   c   NO
6.   a   b   c   NO
7.   a   b   c   NO
8.   a   b   c   NO
9.   a   b   c   NO

APPENDIX B - EXAMPLE ITEMS FROM THE READING VOCABULARY TESTS USED IN THE PILOT STUDY AND THE MAIN RESEARCH

1. He is the thief who _____ all my money.
a. borrowed   b. stole   c. loaned   d. reduced

2. I was laid off so now I am _____.
a. looking for a job   b. employed   c. in bed   d. tired

3. When people buy a house or a car, the first money they pay is the _____.
a. monthly payment   b. principal   c. interest   d. down payment

4. My pencil is broken. Could you _____ me yours for a minute?
a. exchange   b. offer   c. lend   d. borrow

5. Could you tell me your address? I have _____ it.
a. visited   b. forgotten   c. remembered   d. (no correct answer)

6. This old bicycle is not working. I am going to take it to the bicycle shop for _____.
a. a refund   b. repairs   c. a mechanic   d. (no correct answer)

APPENDIX C - INTRODUCTION AND SAMPLE ITEMS FROM LISTENING VOCABULARY TEST USED IN PILOT STUDY AND MAIN RESEARCH

A. Introduction

Hello, are you ready for another listening exercise? This is a vocabulary exercise. How many English words do you know? Look at your answer sheet. Now listen to this:

I want to buy a coffee. Would you lend me *****?
a. some water   b. some money   c. your car
Listen again. (Repeat)

The answer to that is obviously letter "b" ... "some money."
Did you circle "b?" Let's try example two.

John is in the classroom, reading *****.
a. a movie   b. sandwich   c. a book
(Repeat)

Did you circle letter "c" ... "a book?" That's the correct answer. Now try example three.

I need a ***** because I'm going to write a letter.
a. pen   b. shoe   c. doctor
(Repeat)

The correct answer to that one is "a" ... "pen." Did you circle "a?" Now try example four. It's different.

I'm going to the ***** to buy some stamps.
a. bank   b. beach   c. movie
(Repeat)

None of the words are correct, are they? Did you circle no for no correct answer? If you want, your teacher will play the examples again.

B. Sample items

The following are the first five items used on the listening vocabulary test in the main research.

1. He wants to save money so he's going to open *****.
a. a deposit   b. a check   c. an account

2. When you borrow money from a bank, you pay *****.
a. back   b. cash   c. interest

3. My son was sick so I made ***** with the doctor.
a. an appointment   b. a prescription   c. a telephone

4. The federal government takes two hundred dollars from my pay every month. I don't like paying *****.
a. insurance   b. income   c. taxes

5. If you want help in a department store, ask the *****.
a. secretary   b. teller   c. deposit

(NOTE: The answer sheet is the same as in the listening structure test.)

APPENDIX D - EXAMPLE ITEMS FROM THE READING GRAMMAR TESTS USED IN THE PILOT STUDY AND THE MAIN RESEARCH

1. He couldn't buy a sandwich because he didn't have _____ money.
1. some   2. many   3. enough   4. more

2. This is a low priced car. It is _____ the others.
1. as not expensive as   2. not as expensive   3. not as expensive as   4. as expensive not

3. My mother can't go with us because _____ car is not working.
1. she   2. she's   3. hers   4. her

4. That car is _____ car of all.
1. more comfortable   2. the most comfortable   3. most comfortable   4. the more comfortable

5. It is a big class but there aren't _____ in it.
1.
much women   2. many women   3. a lot women   4. some women

6. She _____ come to the meeting tomorrow because she has a dentist appointment.
1. doesn't   2. hasn't   3. won't   4. couldn't

APPENDIX E - EXAMPLE OF CONVERSATION COMPLETION TYPE OF SUBTEST USED IN ASSESSMENT BATTERIES

COMPLETE THE CONVERSATION

Mary took a suit to the dry cleaners last week. She picked it up this morning. The zipper is broken. She is at the dry cleaners now. She is complaining to the manager.

Manager: Good afternoon. May I help you?
Mary:
Manager: What's the matter?
Mary:
Manager: Do you have your bill?
Mary:
Manager: O.K. We'll repair it for you.
Mary:
Manager: It will be ready on Saturday.
Mary:

10 marks

APPENDIX G - INTRODUCTION, SAMPLE ITEM, AND SAMPLE ANSWER SHEET FROM LISTENING COMPREHENSION TEST USED IN MAIN RESEARCH

A. Introduction

Listening test. Listen to these examples and do them with your teacher.

WOMAN: Is this your sweater or John's?
MAN: Let me see. Oh, it's mine.
(Repeat)
Question 1: What does the sweater belong to?   a. the man   b. John
Question 2: What colour is the sweater?   a. red   b. blue
(Pause 5)

Example 2

WOMAN: That bus is late again.
MAN: Yes, it always is when it rains.
(Repeat)
Question 1: What are the man and woman doing?   a. drinking coffee   b. waiting for a bus
Question 2: What is the weather like?   a. cold   b. sunny
(Pause 5)

Instructors, you may play the examples several times. Be sure the students understand. Let's begin.

B. Sample Item from Listening Comprehension test

WOMAN: Would you like some dessert, sir?
MAN: Hmmm, yes, please. What's good today?
WOMAN: Well, there's chocolate cake. We also have cherry pie.
MAN: Cherry pie? Hmmm, no. Give me a piece of the cake.
(Repeat)
Question 1: What does the man want?   a. chocolate cake   b. cherry pie
Question 2: Where are these people?   a. in a restaurant   b. at home

C. Sample Answer Sheet

1.   a   b   no
2.   a   b   no
3.   a   b   no
4.   a   b   no
5.   a   b   no
6.   a   b   no
7.   a   b   no
8.   a   b   no
9.   a   b   no
10.   a   b   no
11.   a   b   no
12.   a   b   no

APPENDIX H - ORAL INTERVIEW GUIDELINES AND SAMPLE SCORING SHEET

Upper Beginners Oral Test -- Score Sheet

Part A   General Comprehension                                              /5
Part B   'Free' Speaking     Fluency 0 1 2   +   Accuracy 0 1 2             /4
Part C   Question Making     Content/Production 0 1 2   Accuracy 0 1 2      /4
Part D   Language Use        Appropriateness/Completeness   X   Accuracy
         1.                  0 1 2   X   0 1/2 1                            /2
         2.                  0 1 2   X   0 1/2 1                            /2
         3.                  0 1 2   X   0 1/2 1                            /2
                                                                            /20

ORAL TEST
Upper Beginners

Part A (one minute)

Use these questions to set the students at ease and to test their general comprehension. Speak in a conversational tone and at regular speed. If the students answer in any way (short, long) that shows comprehension of the questions, give full marks.

1. How are you today?
2. Sit down.
3. What is your name?
4. How do you spell your (first, last) name?
5. Who is your teacher?
6. How long have you been in Upper Beginners?
7. Is this your first interview?
8. Where do you live?
9. (If they give an address, area, ask:) Where is that?

Part B (two minutes)

Ask the student to tell you about ONE of the following:

1. Educational background
2. Employment background
3. Activities on a particular job
4. (For unemployed students who also have little to say about their education...) Day-to-day activities

Some suggested lead-ins

1. Did you go to school in _____? Tell me a little about what you studied.
2. How many different jobs have you had? (Have you had several jobs?) Tell me a little about those different jobs.
3. Are you working now? Tell me what you do on your job. (or) Tell me about where you work.
4. (Housewives, young students and some others may not have anything to say about the first three topics, ask ....) What do you do during the day?

Part C

The student must ask at least three relevant questions from the point of view of someone renting an apartment.
You may guide the student as to acceptability of the questions she/he asks and prompt for more.

Acceptable ... Questions relating to rent, number of bedrooms, the floor it's on, nearness to schools/shops, when it is available.

Lead-in .... You are looking for an apartment. I am the apartment owner. Ask me some important questions about my apartment.... Tester engages in conversation with the student.

Part D (three minutes)

Using the language.... Students must make an appropriate response to each of the problems presented. The appropriate response includes: complexity, stress, register, intonation, appropriateness of the utterance to the situation, mood created by the response. Choose one of the three questions for each communicative type.

Apologizing

1. Your friend invited you to go to a movie tomorrow night. You can't go because you have something else to do. What do you say? Tell him/her why.
2. You are sick today and can't go to work. You phone your employer. What do you say to him?
3. You are late for class. Your teacher looks upset. What do you say to her/him? Make an excuse and a promise.

Complaining

1. You bought a hamburger at a take-out restaurant. It is cold and doesn't taste good. What do you say?
2. You took a dress to the dry cleaners last week. When you picked it up it had a button missing and the zipper was broken. What do you say?
3. You bought some milk at the corner store this morning, but it is sour. You take it back to the store. What do you say?

Social Situation

1. Your friend's mother died last week. What do you say to her?
2. Invite me to have a cup of coffee with you after class.
3. Introduce me to your friend.

APPENDIX I - COMPOSITION MARKING GUIDE

Verbal Description for Free-Writing Assessment -- Beginners to College Entry

Level 1: Semantics (Function, Vocabulary, Organization)

Can give (ask for) concrete info. (name, address, phone, place of work, etc.)
With a picture series of everyday basic experience for guidance, can write very brief, simple description or report. Can't handle discussion. Vocabulary limited and often inappropriate. Any organization due to picture guidance.

Level 1: Syntax

(Interclause) Can produce some simple sentences (affirmative, negative, interrogative). May attempt simple co-ord (and, or, but, so) and sub-ord (when, because).

(Intraclause) Demonstrates awareness of past/present/future time but not always correct. Frequent errors of the following types: word order, word-form, fragments, run-ons, pronoun and subject-verb agreement, prepositions and articles.

Level 1: Orthography (Punctuation, Spelling, Readability)

Little or no punctuation or capitalization. Frequent spelling errors in common words. Letters unclear, messy paper. Almost impossible to read.

Level 2: Semantics

Can rearrange stock phrases and patterns to handle basic personal and survival areas. With a verbal rather than pictorial stimulus, some difficulties with describing and reporting in these areas. Can't handle discussion. Vocabulary is high frequency and generally appropriate for these basic areas. Not necessarily well organized.

Level 2: Syntax

(Interclause) Simple sentences generally mastered as well as some success in simple co-ord and sub-ord from Level 1. Other types of sub-ord may be attempted.

(Intraclause) Use of past/present/future generally correct. Problems in use of simple vs. continuous and present perfect. Few problems with word order. Prepositions of time/place/direction generally correct but continuing difficulties with word form, fragments and run-ons, agreement, idiomatic prepositions and articles.

Level 2: Orthography

Some attempt at punctuation. Less than impossible to read. Frequent spelling errors.
Level 3: Semantics

Can handle description and reporting in everyday situations but has some difficulties with discussion. Exhibiting choice about vocabulary which is adequate for informal communication and some use of idioms. Great difficulties with abstract or distant levels of the topic if attempted. Some effort at organization including transitions.

Level 3: Syntax

(Interclause) Simple sentences, co-ord & sub-ord. (adv. + adj.) generally correct. May have difficulties with N. clauses especially from questions. May attempt abridgements and phrases (participle, gerund, infinitive) but unsuccessfully.

(Intraclause) Past/Pres./Future, contin. and Perfect under control. May have problems with past perf., use of tenses in conditions and sequencing across sentences. Few problems with frags., r.o.'s and agree. Use of articles and common idiomatic prepositions generally correct but difficulties with 'the' deletion, less common id. preps. and word form.

Level 3: Orthography

Punctuation correct most of the time, especially capitalization and period but use of comma may be erratic. Some attempt at paragraphs. Few spelling errors in common words.

Level 4: Semantics

Can handle description, reporting and discussion on everyday level. Beginning to control topic at more abstract or distant levels. Appropriate use of idioms and lower frequency vocab. Some flaws in organization. May contain some redundancy.

Level 4: Syntax

(Interclause) N. clauses, abridgements, phrases generally correct. May attempt absolute constructions, abstract noun phrases and appositive phrases.

(Intraclause) Few, if any problems with odd use of tenses and sequencing. Few problems with word form. Use of less common id. preps. generally successful. Few, if any problems with articles including 'the' deletion.
Level 4: Orthography

Control of remaining punctuation: commas, colons, semicolons, quotation marks, etc. Spelling correct except for words natives would find difficult. Proper paragraphing.

Level 5: Semantics

Acceptable for college entry. Can handle description, reporting and discussion even at abstract and distant levels. Vocab. and idioms appropriate to the task and vary in frequency... high and low. Clear, logical patterns, adequate development and lack of redundancy. Occasional evidence of unnatural but correct English.

Level 5: Syntax

(Interclause) Can handle a wide variety of grammatical functions and sentence types with few, if any, errors. Absolute constructions, abstract noun phrases and appositive phrases are correct if attempted.

(Intraclause) Any intra-clause errors probably due to carelessness.

Level 5: Orthography

Beautiful, easy to read. No errors which would produce any misunderstanding or embarrassment.

APPENDIX J - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM PHONEME DISCRIMINATION TEST

A. Introduction

How well can you hear the different sounds of English? In each question you will hear three words. Circle the number of the one that is different.

Listen to example one.
    meet   meet   mate
The third word was different. So you should circle three.

Sometimes all of the words are the same. Look at example two.
    meet   meet   meet
You should circle 's' for same.

Sometimes all of the words are different. Listen to example three.
    meet   mate   mote
You should circle 'd' for different.

Here are five more easy examples. Do them on your answer sheet.

Example four:
    steak   stock   steak
The answer is two. Did you circle two?

Example five:
    brick   break   break
The answer is one. Did you circle one?

Example six:
    coin   coin   coin
They are all the same. Did you circle 's'?

Example seven:
    can   can   coin
Number three is the right answer.
Did you circle three?

Example eight:
    coin   can   cane
Those are all different. Did you circle 'd'?

Teachers, you may play the examples again.

All right. Let's begin.

B. Sample Items

1.  gene     gin      gin
2.  swayed   swede    swayed
3.  leaned   leaned   leaned
4.  bit      bait     bet
5.  peck     pick     peck
6.  sling    sling    sling
7.  sting    stung    sting
8.  lace     less     less
9.  pad      paid     pad
10. lake     lake     lake

C. Sample Answer Sheet

 1.   1   2   3   s   d
 2.   1   2   3   s   d
 3.   1   2   3   s   d
 4.   1   2   3   s   d
 5.   1   2   3   s   d
 6.   1   2   3   s   d
 7.   1   2   3   s   d
 8.   1   2   3   s   d
 9.   1   2   3   s   d
10.   1   2   3   s   d
11.   1   2   3   s   d
12.   1   2   3   s   d
13.   1   2   3   s   d
14.   1   2   3   s   d
15.   1   2   3   s   d

APPENDIX K - INTRODUCTION, SAMPLE ITEMS, AND SAMPLE ANSWER SHEET FROM THE BOWEN-FORMAT LISTENING TEST

A. Introduction

Listening Exercise. When people speak English quickly, some words become shorter. For example, we don't say 'He is going.' we say 'He's going.' Sometimes words get pushed together. For example, we don't say 'Is he going?' we say 'Is-he going?'

In this exercise, listen carefully to the sentences. Then write only the second word you hear. Look at your answer sheet now, and do the examples with me. The first one has been done for you.

Example A: What's he doing? (Pause 5) Did you hear 'is'? i-s. The sentence is 'What is he doing?'

Example B: Did he leave his book? (Pause) Did you write 'he', h-e? The sentence is 'Did he leave his book?'

Example C: What did you do yesterday? (Pause) The answer is 'did', d-i-d. The sentence is 'What did you do yesterday?'

Example D: What colour is her car? (Pause) Did you write 'colour'? The sentence is 'What colour is her car?'

Here are four more examples. Do them with your instructor. Write your answers under group two.

E. Where did you go last week?
F. Did he do his homework?
G. When is he coming?
H. What kind of ice-cream do you like?

B. Sample items from test

1. She's leaving tomorrow morning.
2. Did he pay for the dinner?
3. John and Nancy are coming to the party tonight.
4. He's tired of that, isn't he?
5. Is your brother coming to pick you up?
6. What do you think the weather will be like tomorrow?
7. What did he do all day at the library?
8. Well, there are no more books here.

C. Sample answer sheet

 1. ______    2. ______    3. ______    4. ______    5. ______    6. ______
 7. ______    8. ______    9. ______   10. ______   11. ______   12. ______

APPENDIX L - EXPERIMENTAL LISTENING TEST USED IN PRELIMINARY STUDY

Introduction

Instructors, be sure the students are looking at the columns of NGs and OKs.

In this test you will hear some short conversations between a woman and a man. You will hear them only once. After each conversation decide if the man's answer is OK or not OK. If it is OK, circle OK on the answer sheet. If it is not OK, circle NG for no good. Here are four examples.

Example one.
    Good morning. How are you?
    Fine and you?
(4 second pause) The man's answer is OK. Did you circle OK?

Example two.
    What's the weather like today?
    It's Monday.
(4 second pause) The man's answer is certainly not OK. Did you circle NG for no good?

Example three.
    Where is my book?
    On the table.
(4 second pause) The man's answer is short but it is good. Did you circle OK?

Example four.
    Do you go to school every day?
    Yes I do, but only on Tuesdays and Thursdays.
(4 second pause) The first part of the man's answer is OK but the second part is no good. Did you circle NG?

Teachers, be sure the students understand. You may play the examples again.

Items

    When is the party going to be?
    Next Tuesday.

    Why didn't you come yesterday?
    I had to see a doctor.

    How long is your holiday?
    Two or three kilometers.

    Is it warm enough in here?
    Yes it is. I need my coat.

    Is this the first time you've been to Vancouver?
    Yes, that's right. I have only been here once before.

    I'm not feeling very well.
    Oh really, what happened to you?

    Why are your hands dirty?
    Because I've just finished cleaning the cupboards.

    I have never played cards.
    It is like tennis.
    This sweater isn't big enough for me.
    Yes, it is too big, isn't it?

    I want five seventeen cent stamps.
    I'm sorry. We have just run out.

Sample answer sheet

Example 1.   OK   NG
Example 2.   OK   NG
Example 3.   OK   NG
Example 4.   OK   NG

 1.   OK   NG
 2.   OK   NG
 3.   OK   NG
 4.   OK   NG
 5.   OK   NG
 6.   OK   NG
 7.   OK   NG
 8.   OK   NG
 9.   OK   NG
10.   OK   NG

APPENDIX M - THE AUXILIARY FACTOR ANALYSES

The purpose of this series of analyses was to investigate the effect on the factor matrix of the two variables of undetermined reliability (Concom and Oral) and of the two demographic variables age and Lot, and to determine which combination of language and demographic variables to use in arriving at preferred solutions. Missing data was replaced with group means and the method of factor analysis was principal components followed by a varimax rotation. The results of this series of factor analyses --done on six subsets of the variables-- are presented in Table XIX.

In the series of six analyses, each of which included eleven or more variables, all solutions produced three factors. The most important fact about Concom and Oral was that they did not create or define factors when they were introduced. That is to say, the pattern of loadings was substantially the same with or without either of these two variables. The conversation completion (Concom) consistently clustered with the three structure tests and did not reveal any complexity. The loadings of Oral on the other hand did change depending on the presence or absence of age. This is not surprising given the correlation (-0.36) of age and Oral. I did not consider this vacillation as inherent weakness or unreliability on the part of Oral because Comp, Errcorr20, Listcomp, and Listbow were also affected by the addition/deletion of this variable.

In general, the inclusion of the two demographic variables simplified the solutions.
That is, when these variables were included, the number of variables loading on three factors decreased. In the solution for the twelve linguistic measures there were two variables, Comp and Listcomp, which "spread out" over all three factors. When age or Lot was included, these reduced to two-variable complexity, and it became clear that there were in fact only two linguistic factors influencing these two and the rest of the language variables. The third major factor was a demographically defined one.

By adding age to the matrix of language variables, Thurstone's criteria of simple structure were more clearly met. However, when both age and Lot were included, the solution lost some of its interpretive power in that the coefficients for Readvoc and Liststru became more evenly distributed on two factors. Because of this, the final solutions did not incorporate Lot.

Table XIX - Factor by factor comparison of subsets of the variables (principal components followed by varimax rotation)

FACTOR I
COMP        .62    .64    .57    .53    .67    .64
READSTRU    .78    .69    .70    .72    .71    .70
ERRCOR10    .79    .76    .80    .77    .77    .75
ERRCOR20    .76    .72    .65    .63    .74    .72
CONCOM      .63    **     **     .47    **     **
ORAL        .18    .26    .24    .22    **     **
READVOC     .38    .37    .29    .31    .38    .35
LISTCOMP    .38    .39    .35    .36    .43    .40
LISTVOC     .05    .14    .11    .08    .14    .08
LISTPHON    .25    .27    .20    .21    .27    .26
LISTSTRU    .42    .42    .39    .35    .45    .43
LISTBOW     .25    .33    .26    .16    .37    .35
LOT        -.09   -.05   -.02    **    -.03    **
AGE         .08    .04    **     **     .00    .01

FACTOR II
COMP        .06    .07    .09    .41    .09    .16
READSTRU    .16    .17    .14    .27    .16    .19
ERRCOR10    .11    .12    .07    .22    .12    .17
ERRCOR20    .05    .06    .12    .31    .05    .06
CONCOM      .08    **     **     .12    **     **
ORAL        .30    .27    .16    .47    **     **
READVOC    -.06   -.02    .02    .18    .01    .02
LISTCOMP    .14    .15    .06    .38    .18    .28
LISTVOC    -.17   -.12   -.21   -.10    .07    .06
LISTPHON    .66    .51    .58    .39    .53    .44
LISTSTRU    .13    .13    .10    .35    .14    .21
LISTBOW     .44    .39    .34    .76    .40    .47
LOT        -.75   -.58   -.60    **    -.58    **
AGE        -.79   -.74    **     **    -.72   -.82

FACTOR III
COMP        .51    .44    .52    .38    .40    .42
READSTRU    .16    .17    .21    .10    .10    .11
ERRCOR10    .23    .20    .23    .19    .19    .17
ERRCOR20    .24    .21    .30    .19    .15    .21
CONCOM      .05    **     **     .15    **     **
ORAL        .63    .50    .52    .31    **     **
READVOC     .49    .40    .47    .45    .40    .48
LISTCOMP    .66    .60    .63    .49    .54    .52
LISTVOC     .78    .66    .65    .78    .74    .75
LISTPHON    .12    .09    .21    .02    .07    .05
LISTSTRU    .39    .31    .36    .22    .27    .26
LISTBOW     .55    .44    .51    .19    .36    .29
LOT         .28    .20    .11    **     .21    **
AGE        -.28   -.25    **     **    -.17   -.03

APPENDIX N - CORRELATION MATRIX OF ALL VARIABLES

            COMP     READST   ERCO10   ERCO20   CONCOM   ORAL
COMP        1.00
READSTRU    0.543    1.00
ERRCOR10    0.588    0.678    1.00
ERRCOR20    0.610    0.606    0.639    1.00
CONCOM      0.328    0.446    0.385    0.400    1.00
ORAL        0.421    0.360    0.294    0.382    0.209    1.00
READVOC     0.481    0.360    0.393    0.429    0.285    0.336
LISTCOMP    0.511    0.415    0.433    0.450    0.342    0.454
LISTVOC     0.401    0.163    0.259    0.228    0.153    0.301
LISTPHON    0.365    0.273    0.337    0.286    0.153    0.252
LISTSTRU    0.457    0.385    0.424    0.471    0.199    0.305
LISTBOW     0.507    0.404    0.345    0.427    0.220    0.475
LOT         0.025   -0.091   -0.038   -0.070   -0.103   -0.025
AGE        -0.110   -0.193   -0.176   -0.036   -0.109   -0.367

            READVO   LISTCO   LISTVO   LISTPH
READVOC     1.00
LISTCOMP    0.421    1.00
LISTVOC     0.476    0.454    1.00
LISTPHON    0.226    0.241    0.075    1.00
LISTSTRU    0.256    0.457    0.278    0.236
LISTBOW     0.327    0.442    0.285    0.418
LOT         0.013    0.003    0.213   -0.438
AGE         0.004   -0.276   -0.106   -0.413

            LISTSR   LISTB    LOT      AGE
LISTSTRU    1.00
LISTBOW     0.407    1.00
LOT        -0.021   -0.135    1.00
AGE        -0.171   -0.434    0.416    1.00
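
Appendix M mentions two procedural steps: missing data replaced with group means, and a varimax rotation of the principal-component loadings. As a rough illustration only, these two steps might be sketched in Python as below. This is not the software used in the study. The function names are hypothetical, the "group" is assumed to be whatever set of rows is passed in, the loadings in `unrotated` are invented for the example (they are not taken from Table XIX), and the rotation is Kaiser's pairwise form of raw varimax without row normalization. The principal components extraction itself is not shown.

```python
import math


def impute_group_means(rows):
    """Replace None in each column with the mean of that column's observed values.

    Assumption: all rows passed in belong to one group.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def varimax_criterion(loadings):
    """Raw varimax criterion: variance of squared loadings, summed over factors."""
    n = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        sq = [row[k] ** 2 for row in loadings]
        total += sum(s * s for s in sq) / n - (sum(sq) / n) ** 2
    return total


def varimax(loadings, sweeps=50, tol=1e-10):
    """Rotate factor pairs in their plane until no pair needs further rotation."""
    rot = [list(row) for row in loadings]
    n, m = len(rot), len(rot[0])
    for _ in range(sweeps):
        largest = 0.0
        for p in range(m - 1):
            for q in range(p + 1, m):
                u = [row[p] ** 2 - row[q] ** 2 for row in rot]
                v = [2 * row[p] * row[q] for row in rot]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = 0.25 * math.atan2(d - 2 * a * b / n,
                                        c - (a * a - b * b) / n)
                cos_p, sin_p = math.cos(phi), math.sin(phi)
                for row in rot:
                    x, y = row[p], row[q]
                    row[p] = x * cos_p + y * sin_p
                    row[q] = -x * sin_p + y * cos_p
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return rot


# Hypothetical scores with missing values (None), one group of three students.
scores = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_group_means(scores)

# Hypothetical unrotated loadings: five variables on three factors.
unrotated = [
    [0.70, 0.30, 0.10],
    [0.75, 0.25, -0.20],
    [0.60, -0.50, 0.15],
    [0.55, -0.45, -0.30],
    [0.50, 0.10, 0.60],
]
rotated = varimax(unrotated)
```

An orthogonal rotation of this kind leaves each variable's communality (its row sum of squared loadings) unchanged. It only redistributes the loadings across factors, which is why the appendix can compare patterns of loadings across subsets.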
