A STATISTICAL ANALYSIS OF FINDING THE BEST PREDICTOR OF SUCCESS IN FIRST YEAR CALCULUS AT THE UNIVERSITY OF BRITISH COLUMBIA

By

ROBERT EUGENE LEE

B.Sc., University of Victoria, 1980

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

The Department of Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

October, 1987

© Robert Eugene Lee, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Date: [illegible], 1987

ABSTRACT

In this thesis we focus on students who graduated from a B.C. high school in 1985 and then proceeded directly to the University of British Columbia (UBC), registering in a first year calculus course in the 1985 fall term. From these data, we want to determine the best predictor of success in first year calculus at UBC among three candidates: the high school assigned grade for Algebra 12, the provincial grade for Algebra 12, and the average of the high school and the provincial grade for Algebra 12. We first analyze the data using simple descriptive statistics and continuous methods such as regression and analysis of variance techniques.
In subsequent chapters, the categorical approach is taken and we use scaling techniques as well as loglinear models. Finally, we summarize our analysis and give conclusions in the final chapter.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

I    INTRODUCTION
II   DATA DESCRIPTION
III  INITIAL ANALYSIS AND CORRELATION STUDY
     III-1. Introduction
     III-2. Simple Descriptive Statistics
     III-3. Ecological Correlation
     III-4. Conditional Correlation Phenomena
     III-5. Summary
IV   DATA ANALYSIS (Continuous Methods)
     IV-1. Introduction
     IV-2. Simple and Multiple Regression
     IV-3. Analysis of Variance
     IV-4. Analysis of Covariance
     IV-5. Discriminant Analysis
     IV-6. Summary
V    SCALING GRADES AND OPTIMAL SCORES
     V-1. Introduction
     V-2. Canonical Correlations
     V-3. Monotone Methods
     V-4. Summary
VI   DATA ANALYSIS (Categorical Methods)
     VI-1. Introduction
     VI-2. Prediction of Success
     VI-3. Loglinear Models
     VI-4. Measures of Association
     VI-5. Uniform Association Model
     VI-6. The Logit and GSK Models
     VI-7. Summary
VII  RESULTS AND CONCLUSIONS
BIBLIOGRAPHY

LIST OF TABLES

1. Abbreviated Listing of the Original Data
2. Listing of the B.C. High Schools Included in this Analysis
3. Simple Statistics by Course by Mark Variable
4. Simple Statistics by Group by Mark Variable
5. Simple Statistics by School by Mark Variable
6. Simple Statistics by School District by Mark Variable
7. Comparison of Euclid/Non-Euclid Students
8. Ecological Correlation Example
9. Conditional Correlations by Function
10. Conditional Correlations by Predictor Variable
11. R-Squared and Cp Values by Course
12. R-Squared and Cp Values by Group
13. Student-Newman-Keuls Multiple Range Test on Schools
14. Student-Newman-Keuls Multiple Range Test on Courses
15. Student-Newman-Keuls Multiple Range Test on Groups
16. Classification of Euclid High School B Students by Course
17. Classification of Euclid High School B Students by Group
18. Correlations and Measures of Association by Course
19. Correlations and Measures of Association by Group
20. Optimal, Normal, and Average Scores for All Students
21. Correlations and Measures of Association by School
22. Prediction of Success and Failure at UBC
23. Prediction of Success and Failure in UBC by Course and Group
24. Assigning Weights to Groups
25. Assigning Weights to Schools
26. School Cut-Off Scores by Cf and Cs Values
27. UBC Cut-Off Scores by Cf and Cs Values

LIST OF FIGURES

1. Frequency Distribution of UBC Marks by Course
2. Frequency Distribution of UBC Marks (All Courses Combined)
3. Frequency Distribution of UBC Marks by Group
4. Faces of B.C. High School Districts with 10 or More Students
5. Scatter Plot of Data Points and Group Means
6. Normal Quantile Plots of the Residuals

ACKNOWLEDGEMENTS

I would like to thank Dr. Harry Joe for having suggested this thesis topic to me and for his patience and guidance in supervising me throughout the study. I am also grateful to Dr. Mohan Delampady for his time in reading my thesis and for his comments and suggestions. In addition, I would also like to thank Dr. George Bluman for having originated this project and for his time in discussing the study with me. Finally, I would like to thank the National Research Council of Canada for their financial support.

CHAPTER I
INTRODUCTION

For several years, Dr. George Bluman of the Department of Mathematics at the University of British Columbia (UBC) has been collecting data on performances of students in first year calculus courses at UBC. He has focused mainly on B.C.
high school graduates entering UBC and enrolling in a first year calculus course. Recently, Dr. Bluman had several questions (which motivated this thesis) pertaining to the relationships between the 1985 high school assigned grade for Algebra 12, the 1985 provincial Algebra 12 grade, and the resulting UBC grade for the 1985 fall term. More specifically, his questions of concern were as follows:

1. Which of the following is the best predictor of success in first year calculus at UBC?
   a. High school assigned grade for Algebra 12.
   b. Provincial grade for Algebra 12.
   c. "Blended" (average of high school and provincial) grade for Algebra 12.
2. Look at Question 1 for each of the following groups of schools.
   a. Vancouver public high schools.
   b. Greater Vancouver Regional District (GVRD) public high schools (excluding Vancouver public high schools).
   c. Rest of B.C. public high schools.
   d. Private Non-Catholic high schools.
   e. Private Catholic high schools.
3. Look at Question 1 school-by-school for each of the 50 largest schools.
4. Compare the high school grade with the provincial grade for breakdowns of Questions 2 and 3.
5. Consider the significance of performance and/or participation in the 1985 Euclid Math Contest as a predictor. The annual Euclid Math Contest, which is optionally written by high school students in April of each year, is a much harder exam than either the provincial exam or any high school exam.
   a. Focus on comparing B (high school) students with or without participation in the Euclid Contest by individual school.
   b. Look at Euclid as a predictor for the result on the provincial exam and first year calculus (focusing on B students).
      Conjecture: B students who wrote the Euclid performed like A students on the provincial exam and performed like B+ students in first year calculus. For the same schools, B students who did not write the Euclid performed like B students on the provincial exam and like B students in first year calculus.
In addition, Dr. Bluman also believed that high schools from the Interior did not prepare their students for first year calculus at UBC as well as schools in the Vancouver Region.

For the 1985 data, Dr. Bluman was able to obtain information from the Ministry of Education pertaining to the high school mark and the provincial mark for Algebra 12 for each student enrolled in a first year calculus course at UBC in the 1985 fall term. He also collected the results of the Euclid Math Contest for those UBC students who had written it in 1985. The four first year calculus courses taught at UBC in 1985 were as follows:

- Mathematics 100: Basic course required by all science students.
- Mathematics 120: Honours version of Mathematics 100.
- Mathematics 140: For Commerce and Economics students.
- Mathematics 153: Engineering students (4-year programme).

Students in Math 120 had the same basic course as Math 100, but were given harder problems. At Christmas, they wrote the same exam as the Math 100 students. In addition, Engineering students taking Math 153 also received harder problems; hence this course has a higher standard than Math 100 but lower than Math 120. An adjustment factor to make the Math 153 students' marks more comparable with those of the other math students is discussed in the next chapter.

The results and conclusions of this analysis can be beneficial for administrative purposes in determining enrollment restrictions and admission standards. Enrollment quotas are sometimes necessary when a department is faced with a fixed financial budget and faculty complement. Such quotas should not be filled on a first-come, first-served basis; rather, performance should be the main criterion. In the Department of Mathematics there are many sections of first year calculus taught and the resulting failure rate is somewhat high (20% for the 1985 data).
Hence, enrollment restrictions based on predicted performance should help to reduce the failure rate and thus result in better use of departmental funds.

Chapter II describes the data and gives some frequency plots. Chapter III gives an initial analysis of the data using simple statistics and discusses several applications of correlation. Data analysis treating the marks as continuous variables is found in Chapter IV. Chapter V looks at the data from a categorical approach and Chapter VI gives a data analysis based on grades (categories) implementing several models used in categorical data. Finally, Chapter VII gives summaries of results and conclusions.

CHAPTER II
DATA DESCRIPTION

In the 1985 fall term, there were 1687 first-time students (originating from 164 different B.C. high schools) registered in a first year calculus course at UBC. These students had graduated in 1985 and then proceeded directly to UBC (i.e., they did not take any time off between leaving high school and first entering university). Of the 164 high schools, 51 sent 10 or more students. Nearly 2.5% of the 1687 students had withdrawn from the math course before its completion. These withdrawn students are excluded from or included in different aspects of the study as outlined in subsequent chapters.

In each high school, an Algebra 12 mark (out of 100) was assigned to the student. In addition, a mandatory provincial Algebra 12 exam (out of 100) was written in January or June. These data were collected from the Ministry of Education. Along with these marks, Dr. Bluman gained access to marks resulting from the 1985 Euclid Math Contest (percentage out of 100), which was optionally written by students in April. The student's name was used as an identifier in matching the three aforementioned marks with the UBC mark (out of 75) received. Finally, the raw data were entered into a dataset using one record per student. Table 1 shows an abbreviated listing of the dataset.
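As a rough illustration of the "one record per student" layout, the following Python sketch models a single record. The class and field names are hypothetical, chosen to mirror the seven columns of Table 1 as they are described after the table; the 12-point Math 153 adjustment (capped at 75) and the rounded blended mark follow the rules stated later in this chapter. Two choices are assumptions rather than something the text states: a withdrawn student (UBC mark of 0) is left unadjusted, and the blended mark is rounded half up, which is what the rows of Table 1 suggest (99 and 94 blend to 97).

```python
from dataclasses import dataclass
from typing import Optional

# Course codes as listed for column 2 of Table 1.
COURSES = {0: "Math 100", 2: "Math 120", 4: "Math 140", 5: "Math 153"}

MATH_153 = 5        # course code for Math 153
ADJUSTMENT = 12     # the "12-point" increase applied to Math 153 marks
MAX_UBC_MARK = 75   # UBC marks are out of 75


@dataclass
class StudentRecord:  # hypothetical name: one instance per student
    school: int               # column 1: high school code (1..164)
    course: int               # column 2: 0, 2, 4 or 5
    ubc: int                  # column 3: UBC mark out of 75 (0 = withdrawn)
    hs: Optional[int]         # column 4: high school Algebra 12 mark
    prov: Optional[int]       # column 5: provincial Algebra 12 mark
    euclid: Optional[float]   # column 7: Euclid mark, None if not written

    @property
    def withdrawn(self) -> bool:
        return self.ubc == 0

    @property
    def blended(self) -> Optional[int]:
        """Column 6: average of hs and prov, rounded half up (assumed)."""
        if self.hs is None or self.prov is None:
            return None
        return (self.hs + self.prov + 1) // 2

    @property
    def adjusted_ubc(self) -> int:
        """UBC mark with the Math 153 adjustment, capped at 75."""
        # Assumption: withdrawn students keep their mark of 0.
        if self.course == MATH_153 and not self.withdrawn:
            return min(self.ubc + ADJUSTMENT, MAX_UBC_MARK)
        return self.ubc


# Student number 5 of Table 1: Math 153, unadjusted UBC mark 47.
student_5 = StudentRecord(school=3, course=5, ubc=47, hs=90, prov=96, euclid=None)
print(student_5.adjusted_ubc, student_5.blended)  # prints: 59 93
```

Integer arithmetic is used for the blended mark because Python's built-in `round` rounds halves to the nearest even number, which would turn 96.5 into 96 rather than the 97 shown in Table 1.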
Table 1: Abbreviated Listing of the Original Data

         [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
[1,]     1     0     58    94    88    91    NA
[2,]     1     4     74    99    94    97    NA
[3,]     1     4     26    61    60    61    NA
[4,]     2     0     56    80    71    76    12.75
[5,]     3     5     47    90    96    93    NA
[6,]     3     4     38    67    62    65    NA
[7,]     3     0     55    85    97    91    NA
[8,]     3     5     52    98    97    98    34.50
[9,]     3     0     62    89    96    93    NA
[10,]    4     0     62    94    89    92    NA
[11,]    4     5     17    80    65    73    NA
[12,]    5     0     30    73    67    70    NA
[13,]    5     0     39    73    82    78    NA
[14,]    6     0     33    66    50    58    NA
[15,]    6     5     38    88    85    87    50.25
[16,]    7     0     72    75    86    81    NA
[17,]    7     0     33    76    71    74    NA
[18,]    7     0     52    66    68    67    NA
[19,]    7     0     57    78    76    77    NA
[20,]    8     0     66    92    91    92    NA
[21,]    9     0     0     80    53    67    NA
[22,]    9     0     40    87    68    78    NA
[23,]    9     0     51    86    75    81    NA
[24,]    10    4     44    80    54    67    NA
[25,]    10    0     28    77    47    62    NA
[26,]    11    5     33    79    83    81    NA
[27,]    11    0     53    86    90    88    NA
[28,]    11    4     34    73    77    75    NA
[29,]    11    0     67    88    88    88    NA
[30,]    11    4     10    60    75    68    NA
[31,]    11    0     38    82    84    83    NA
[32,]    11    0     50    76    83    80    NA
[33,]    12    4     47    72    88    80    32.50
[34,]    12    0     66    76    86    81    NA
[35,]    12    0     0     70    70    70    NA
[36,]    12    0     74    93    92    93    77.00
[37,]    12    0     47    77    90    84    28.00
[38,]    12    0     63    79    96    88    40.25
[39,]    12    0     55    73    83    78    37.50
[40,]    12    0     71    89    96    93    32.25
[41,]    12    0     62    81    86    84    35.75
[42,]    13    0     49    87    69    78    NA
[43,]    13    0     50    88    69    79    NA
[44,]    13    0     53    82    72    77    NA
[45,]    13    0     58    92    76    84    NA
[46,]    13    0     53    93    79    86    NA
[47,]    14    0     43    94    68    81    NA
[48,]    14    0     54    73    72    73    NA
[49,]    14    0     14    77    53    65    NA
[50,]    14    0     51    94    78    86    NA
  ...
[1680,]  162   0     53    78    79    79    NA
[1681,]  162   0     62    92    91    92    NA
[1682,]  162   0     42    78    74    76    NA
[1683,]  162   0     45    76    78    77    NA
[1684,]  163   0     24    87    54    71    NA
[1685,]  164   0     51    87    69    78    NA
[1686,]  164   5     45    94    80    87    NA
[1687,]  164   0     45    85    75    80    NA

Column 1 represents the high school code, which is translated in Table 2. The math course taken at UBC is under column 2, with codes 0, 2, 4, and 5 corresponding to Math 100, Math 120, Math 140, and Math 153 respectively.
Column 3 gives the UBC mark (out of 75) received; a mark of 0 means that the student had withdrawn from the course. As was mentioned earlier in Chapter I, the Math 153 marks should be adjusted upwards to be more comparable with the other math courses. Dr. Bluman suggested a "12-point" increase (which has been incorporated into this analysis); 12 is the difference between the average of the Math 100 and Math 120 means and the Math 153 mean. For example, student number 5 in Table 1 had a UBC mark of 47 in Math 153, thus his or her mark is adjusted to 59 (47 + 12). This "12-point" increase naturally has a ceiling of 75, i.e., adjusted mark = min(unadjusted mark + 12, 75). Columns 4, 5, and 6 represent the high school assigned Algebra 12 mark (out of 100), the provincial Algebra 12 mark (out of 100), and the (rounded) blended mark (out of 100) respectively. These columns contain a few missing values (NA) for one reason or another. Finally, the Euclid mark (percentage out of 100) is found under column 7. A missing value means that the student did not write the Euclid Math Contest.

Table 2 gives a list of all 164 high schools (stemming from the 1687 students) along with their high school district code and their group code (1 = Vancouver public schools, 2 = GVRD public schools, 3 = Rest of B.C. public schools, 4 = Private Non-Catholic schools, and 5 = Private Catholic schools).

Table 2: Listing of the B.C. High Schools Included in this Analysis

School  School                                Group  Number of  Number Writing
Code    District  School Name                 Code   Students   Euclid
1       01        Fernie                      3      3
2       02        Mount Baker                 3      1          1
3       03        Selkirk                     3      5          1
4       04        David Thompson (Invermere)  3      2          0
5       07        Salmo                       3      2          --
6       07        L.V. Rogers                 3      2          1
7       11        J. Lloyd Crowe              3      4          0
8       11        Rossland                    3      1          --
9       12        Grand Forks                 3      3          --
10      13        Boundary Central            3      2          --
11      14        Southern Okanagan           3      7          --
12      15        Penticton                   3      9          7
13      18        Golden                      3      5          --
14      19        Revelstoke                  3      5          --
15      21        Pleasant Valley             3      2          --
16      22        Clarence Fulton             3      3          --
17      22        Vernon                      3      2          1
18      23        George Pringle              3      2          0
19      23        Kelowna                     3      1          1
20      23        K.L.O.                      3      3          --
21      23        Spring Valley               3      3          0
22      23        Mount Boucherie             3      7          5
23      23        Okanagan Mission            3      2          --
24      23        Immaculata                  5      1          --
25      24        Kamloops                    3      12         2
26      24        Chase                       3      1          1
27      24        Brocklehurst                3      1          --
28      24        Norkam                      3      1          0
29      24        Westsyde                    3      4          3
30      26        Clearwater                  3      3          --
31      27        Columneetza                 3      16         --
32      27        Peter Skene Ogden           3      3          --
33      28        Quesnel                     3      5          3
34      28        Correlieu                   3      3          3
35      29        Lillooet                    3      1          1
36      30        Ashcroft                    3      1          --
37      31        Merritt                     3      1          1
38      32        Hope                        3      3          --
39      33        Chilliwack                  3      6          2
40      33        Sardis                      3      5          4
41      34        Abbotsford                  3      12         2
42      34        W.J. Mouat                  3      6          0
43      35        Langley                     2      6          1
44      35        D.W. Poppy                  2      4          3
45      35        Mountain                    2      3          0
46      36        Queen Elizabeth             2      7          2
47      36        Semiahmoo                   2      19         5
48      36        North Surrey                2      6          1
49      36        Lord Tweedsmuir             2      5          --
50      36        Princess Margaret           2      3          --
51      36        Guildford Park              2      2          --
52      36        Earl Marriott               2      2          --
53      36        Frank Hurt                  2      3          3
54      36        Fraser Valley Christian     4      1          --
55      37        Delta                       2      6          4
56      37        North Delta                 2      26         5
57      37        South Delta                 2      35         16
58      37        Seaquam                     2      18         11
59      38        Richmond                    2      49         10
60      38        Steveston                   2      56         12
61      38        Matthew McNair              2      48         9
62      39        King George                 1      4          0
63      39        Britannia                   1      21         6
64      39        Magee                       1      32         --
65      39        Kitsilano                   1      15         7
66      39        John Oliver                 1      54         9
67      39        Lord Byng                   1      10         2
68      39        Templeton                   1      25         11
69      39        Vancouver Technical         1      30         8
70      39        Point Grey                  1      26         2
71      39        Gladstone                   1      24         3
72      39        Sir Winston Churchill       1      61         16
73      39        Killarney                   1      63         17
74      39        Sir Charles Tupper          1      14         7
75      39        David Thompson (Vancouver)  1      28         9
76      39        Prince of Wales             1      36         3
77      39        Windermere                  1      20         4
78      39        Eric Hamber                 1      69         --
79      39        University Hill             1      10         4
80      39        St. George's                4      32         15
81      39        Crofton House               4      8          6
82      39        Vancouver College           5      30
83      39        York House                  4      15         6
84      39        Little Flower               5      20         --
85      39        Notre Dame                  5      16         6
86      39        St. Patrick's               5      1          --
87      39        International College       4      2          --
88      40        New Westminster             2      15         3
89      40        Relevant                    4      1          --
90      41        Burnaby Central             2      34         6
91      41        Burnaby North               2      11         4
92      41        Burnaby South               2      21         --
93      41        Alpha                       2      14         7
94      41        Cariboo Hill                2      3          1
95      41        St. Thomas More             5      13         6
96      41        Marian Regional             5      2          --
97      42        Maple Ridge                 2      5          2
98      42        Garibaldi                   2      3          3
99      42        Pitt Meadows                2      3          1
100     43        Centennial                  2      19         1
101     43        Port Moody                  2      6          2
102     43        Port Coquitlam              2      11         1
103     44        Sutherland                  2      10         6
104     44        Argyle                      2      21         12
105     44        Handsworth                  2      30         20
106     44        Windsor                     2      12         4
107     44        Carson Graham               2      25         19
108     44        Seycove                     2      1          0
109     44        St. Thomas Aquinas          5      2          1
110     45        Hillside                    2      36         8
111     45        Sentinel                    2      39         17
112     45        West Vancouver              2      22         3
113     46        Elphinstone                 3      4          --
114     46        Chatelech                   3      2          --
115     47        Max Cameron                 3      7          --
116     48        Howe Sound                  3      7          4
117     49        Sir Alexander MacKenzie     3      1          1
118     50        George M. Dawson            3      1          --
119     52        Prince Rupert               3      8          0
120     54        Houston                     3      1          --
121     54        Smithers                    3      3          --
122     55        Lakes District              3      2          2
123     56        Fraser Lake                 3      1
124     56        Fort St. James              3      1
125     57        Duchess Park                3      5          1
126     57        Prince George               3      12         3
127     57        Kelly Road                  3      4          --
128     57        MacKenzie                   3      2          --
129     57        D.P. Todd                   3      2          --
130     57        College Heights             3      4          --
131     59        South Peace                 3      3          2
132     60        North Peace                 3      7          --
133     61        Oak Bay                     3      2          --
134     61        Esquimalt                   3      1          1
135     61        Mount Douglas               3      5          1
136     61        Reynolds                    3      1          1
137     61        St. Michael's               4      6          1
138     61        St. Margaret's              4      1          --
139     61        Norfolk House               4      2          --
140     63        Stelly's                    3      1          --
141     63        Parkland                    3      2          1
142     64        Gulf Island's               3      1          --
143     65        Cowichan                    3      10         2
144     65        Brentwood                   4      8          3
145     65        Shawnigan Lake              4      3          --
146     65        Queen Margaret's            4      1          --
147     68        Nanaimo District            3      12         3
148     69        Ballenas                    3      2          0
149     69        Kwalikum                    3      1          --
150     70        Alberni District            3      15         5
151     71        Georges P. Vanier           3      5          0
152     71        Highland                    3      3          0
153     72        Robron                      3      7          --
154     72        Southgate                   3      2          0
155     75        Mission                     3      3          --
156     77        Summerland                  3      4          --
157     80        Mount Elizabeth             3      1          0
158     85        North Island                3      9          --
159     85        Port Hardy                  3      1          --
160     86        Kaslo                       3      1          --
161     86        Prince Charles              3      3          --
162     88        Caledonia                   3      8          0
163     92        Nisgha                      3      1          --
164     98        F.H. Collins                3      3
Total                                                1687       420

Vancouver public schools are all the public high schools in Vancouver; GVRD public schools represent public high schools in Langley, Surrey, White Rock, Delta, Richmond, New Westminster, Burnaby, Maple Ridge, Pitt Meadows, Coquitlam, Port Moody, Port Coquitlam, North Vancouver, and West Vancouver; and Rest of B.C. represents all the other public high schools. The Private Catholic/Non-Catholic schools were determined by the classifications found in B.C. Telephone Directories. In addition, the number of students from each school and the number of students writing the Euclid Math Contest appear in Table 2. A dash means that the school did not participate in the Euclid Math Contest whereas a zero indicates that no students participated.

Figure 1 gives a frequency distribution of the UBC mark by course, excluding those who dropped out of the math course. Figure 2 has the frequency distribution of the UBC mark for all courses combined while Figure 3 has the frequency distribution by group. Private Catholic and Non-Catholic schools are aggregated together. In these plots, there are noticeable peaks at 38, 47, 49, 58, and 60. This conforms with the policy of instructors not assigning borderline marks (such as 59).

[Figure 1: Frequency Distribution of UBC Marks by Course. Panels: Math 100, Math 120, Math 140, and Math 153 Mark Distribution; horizontal axis: mark received (0 to 80).]

[Figure 2: Frequency Distribution of UBC Marks (All Courses Combined). Horizontal axis: Math Mark Received.]

[Figure 3: Frequency Distribution of UBC Marks by Group. Panels: Vancouver Schools, GVRD Schools, Rest of B.C. Schools, and Private Schools; horizontal axis: UBC Math Mark Received (0 to 80).]

In closing this chapter, we list the statistical packages that were used in this analysis. A complete reference is given in the bibliography.

- SAS (Statistical Analysis System).
- GLIM (Generalized Linear Interactive Modelling).
- BMDP (Biomedical Computer Programs).
- S (Statistics).

More detail concerning which procedures were used for analyses is given in subsequent chapters.

CHAPTER III
INITIAL ANALYSIS AND CORRELATION STUDY

III-1. Introduction

We first seek answers to some of the questions using simple descriptive statistics (sample means and sample correlations). Although the data represent a "population", we can still view this as a representative sample from a larger population. The mark variables will be treated as continuous variables. The statistical package which was used extensively for matrix manipulation and calculations of simple descriptive statistics was S.

In Section 2 of this chapter, we will look at Questions 1 through 5 using correlation as a measure of predictability.
Section 3 will discuss ecological correlation and derive conditions whereby the "blended" mark is more correlated with the UBC mark than either the high school or the provincial mark. The conditional correlation phenomenon is covered in Section 4. Finally, Section 5 gives a summary with conclusions based on the initial analysis.

III-2. Simple Descriptive Statistics

In Tables 3 through 6, simple descriptive statistics (sums, sample means, sample standard deviations, and sample correlations with the UBC mark) are provided for the variables high school mark (HS), provincial mark (PROV), blended mark (BLND), and the UBC mark (UBC). Table 3 compares the three predictor variables (HS, PROV, and BLND) by a course breakdown whereas Table 4 examines these statistics by group. Table 5 contains only those high schools that sent 10 or more students to UBC who enrolled in a first year calculus course; there are 51 such high schools. Finally, Table 6 is by high school district, for districts with 10 or more students enrolling in first year calculus.

The sample correlation statistic gives a measure of how well two variables are linearly related. The statistic ranges from -1 to +1. A positive value means that as one variable increases, the other tends to increase as well. A value of +1 indicates a perfect linear relation with positive slope. The correlation coefficient, however, cannot be used to prove a "cause-and-effect" relationship. Returning to our tables, the predictor variable with the highest correlation with the UBC mark would suggest intuitively that it was the best predictor of success amongst the three pre-university marks. In all these tables, the sums (n) across rows may differ due to missing values. With the exception of Math 120, Table 3 indicates that the blended mark is the most highly correlated (amongst the three mark variables) with the UBC mark by course and overall.
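To make the comparison concrete, the sketch below computes the sample correlation of each predictor with the UBC mark and reports the predictor with the largest value. It is written in Python purely for illustration (the analysis itself used S), and the function names are hypothetical. The seven input rows are the Math 100 students in rows 12, 13, and 16 to 20 of Table 1, which are far too few to reproduce any figure in the tables of this section. Pairs with a missing value are dropped, which is an assumption about how the differing n in the tables arose.

```python
import math
from typing import Dict, Optional, Sequence


def sample_correlation(x: Sequence[Optional[float]],
                       y: Sequence[Optional[float]]) -> float:
    """Pearson sample correlation over pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def best_predictor(predictors: Dict[str, Sequence[Optional[float]]],
                   ubc: Sequence[Optional[float]]) -> str:
    """Name of the predictor most highly correlated with the UBC mark."""
    corr = {name: sample_correlation(marks, ubc)
            for name, marks in predictors.items()}
    return max(corr, key=corr.get)


# Rows 12, 13 and 16-20 of Table 1 (all Math 100, so no adjustment applies).
ubc = [30, 39, 72, 33, 52, 57, 66]
predictors = {
    "HS":   [73, 73, 75, 76, 66, 78, 92],
    "PROV": [67, 82, 86, 71, 68, 76, 91],
    "BLND": [70, 78, 81, 74, 67, 77, 92],
}

for name, marks in predictors.items():
    print(name, round(sample_correlation(marks, ubc), 2))
print("highest:", best_predictor(predictors, ubc))
```

In this sketch a tie resolves to the first predictor listed, whereas the tallies discussed in this section count a tie toward each of the tied predictors.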
Table 3: Simple Statistics by Course by Mark Variable

            High School Mark        Provincial Mark         Blended Mark            UBC Mark
Course      N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD
Math 100    1044  79.6  9.9   0.62  1046  78.9  11.4  0.66  1042  79.5  9.6   0.71  1048  50.5  14.7
Math 120    64    89.9  7.1   0.39  64    93.7  4.9   0.27  63    92.0  5.7   0.35  65    59.6  9.3
Math 140    450   73.7  10.2  0.52  451   73.2  11.9  0.57  449   73.8  9.8   0.62  452   47.0  13.1
Math 153    82    89.2  6.3   0.62  82    89.5  8.2   0.53  82    89.6  6.2   0.66  82    55.3  11.9
Total       1640  78.9  10.6  0.60  1643  78.4  12.2  0.64  1636  78.9  10.4  0.68  1647  50.1  14.2

Table 4: Simple Statistics by Group by Mark Variable

                      High School Mark        Provincial Mark         Blended Mark            UBC Mark
Group                 N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD
Vancouver             530   77.7  10.9  0.58  531   77.0  12.3  0.69  527   77.7  10.6  0.69  534   52.5  13.1
GVRD                  617   78.6  10.8  0.69  617   80.6  11.1  0.65  616   79.8  10.2  0.72  618   49.9  14.4
Rest of B.C.          332   81.3  9.7   0.61  334   77.7  12.5  0.65  332   79.7  10.3  0.69  334   48.0  14.7
Private Non-Catholic  78    78.1  9.7   0.56  78    77.8  13.9  0.73  78    78.2  10.8  0.72  78    48.9  14.8
Private Catholic      83    79.7  10.2  0.64  83    74.9  13.7  0.43  83    77.6  9.9   0.63  83    46.4  15.4
Total                 1640  78.9  10.6  0.60  1643  78.4  12.2  0.64  1636  78.9  10.4  0.68  1647  50.1  14.2

Table 5: Simple Statistics by School by Mark Variable

                            High School Mark        Provincial Mark         Blended Mark            UBC Mark
School                      N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD
Nanaimo District            12    80.3  11.7  0.59  12    81.1  8.4   0.66  12    81.0  10.0  0.64  12    59.8  10.8
David Thompson (Vancouver)  28    82.1  9.1   0.43  28    82.5  10.0  0.74  28    82.5  9.1   0.65  28    58.4  13.5
Sir Winston Churchill       60    74.4  11.9  0.76  60    78.6  12.0  0.75  60    76.8  11.4  0.79  60    50.7  12.4
Carson Graham               24    81.3  11.4  0.75  24    86.8  9.1   0.56  24    84.3  9.8   0.08  24    55.5  ?
Killarney                   63    73.2  11.6  0.73  63    79.1  14.7  0.72  63    76.4  12.5  0.75  63    55.3  11.3
Burnaby North               11    76.1  11.0  0.76  11    81.1  11.8  0.58  11    78.9  11.1  0.68  11    55.2  15.4
Argyle                      21    84.6  10.7  0.70  21    84.3  10.9  0.66  21    84.7  10.5  0.70  21    54.6  12.7
Lord Byng                   10    72.4  15.8  0.82  10    77.5  11.2  0.82  10    75.2  13.0  0.85  10    54.4  16.9
Centennial                  19    81.2  10.2  0.88  19    82.0  9.6   0.72  19    82.2  9.5   0.83  19    54.2  17.4
Seaquam                     17    81.1  8.0   0.70  17    85.4  8.6   0.80  17    83.6  7.9   0.80  17    53.8  14.3
John Oliver                 52    79.2  9.0   0.73  51    76.2  13.7  0.70  50    78.0  10.9  0.74  53    53.7  12.8
Gladstone                   24    84.1  6.4   0.07  24    76.7  10.1  0.57  24    80.6  7.9   0.63  24    53.5  11.0
Notre Dame                  10    81.2  9.0   0.75  16    75.2  7.8   0.72  10    78.5  8.4   0.78  16    53.4  10.4
Vancouver Technical         29    73.5  12.5  0.68  29    81.0  11.3  0.58  29    77.9  11.3  0.66  29    53.4  13.0
Magee                       28    81.6  9.9   0.76  31    78.2  12.3  0.61  28    80.0  9.7   0.74  31    53.2  11.?
St. Thomas More             13    77.4  10.8  0.76  13    72.5  11.7  0.74  13    75.2  10.6  0.79  13    52.5  10.9
Britannia                   21    70.5  8.0   0.39  21    77.2  10.0  0.04  21    74.2  8.4   0.50  21    52.3  11.0
Templeton                   25    78.1  9.8   0.80  24    70.5  11.9  0.72  24    78.1  10.2  0.78  25    52.3  13.8
University Hill             10    78.0  11.7  0.82  10    81.8  11.3  0.76  10    80.2  11.4  0.80  10    52.2  12.7
New Westminster             14    78.5  8.0   0.56  14    80.0  9.5   0.75  14    79.9  8.7   0.08  14    51.5  13.8
Eric Hamber                 69    81.4  9.4   0.03  69    75.6  10.0  0.58  69    78.8  8.8   0.07  69    51.0  12.7
Matthew McNair              46    82.1  8.3   0.78  40    82.6  11.9  0.70  46    82.6  9.8   0.70  46    51.0  13.2
Kitsilano                   15    83.2  8.6   0.37  15    75.6  11.1  0.33  15    79.8  9.1   0.37  15    50.5  10.1
Burnaby South               18    83.9  8.0   0.74  18    74.1  9.5   0.49  18    79.3  8.1   0.68  18    50.4  15.5
St. George's                32    78.6  8.5   0.73  32    81.6  12.4  0.76  32    80.4  10.0  0.77  32    49.8  15.7
Steveston                   53    78.0  11.5  0.77  53    82.4  9.7   0.67  53    80.4  10.1  0.76  53    49.6  17.3
Prince George               12    80.4  7.9   0.66  12    88.0  10.9  0.52  12    87.4  0.0   0.56  12    49.3  11.0
Hillside                    35    70.0  10.0  0.40  35    81.1  10.1  0.63  35    76.0  9.6   0.57  35    49.1  13.9
Burnaby Central             33    79.8  11.6  0.09  33    76.9  13.5  0.66  33    78.6  11.9  0.72  33    49.0  15.3
Sentinel                    39    74.3  11.1  0.71  39    79.4  11.8  0.72  39    77.1  11.0  0.75  39    49.0  14.6
Alpha                       14    83.6  9.3   0.76  14    79.4  9.5   0.65  14    81.7  0.2   0.72  14    48.8  13.3
Richmond                    48    73.4  11.4  0.79  47    80.8  10.6  0.69  47    77.2  10.4  0.78  48    48.7  12.9
Prince of Wales             34    77.4  10.5  0.53  34    73.6  12.9  0.74  34    75.7  10.7  0.70  34    48.6  13.0
West Vancouver              22    72.0  11.0  0.73  22    79.5  9.4   0.61  22    76.0  9.6   0.73  22    48.5  12.4
Semiahmoo                   19    78.0  10.5  0.05  19    80.7  9.7   0.48  19    79.9  9.6   0.60  19    48.1  11.3
York House                  13    85.6  10.1  0.80  13    73.5  10.7  0.79  13    79.8  13.1  0.81  13    48.0  17.6
South Delta                 34    80.0  9.1   0.73  34    78.9  9.5   0.76  34    80.0  8.8   0.79  34    47.9  15.3
Handsworth                  29    78.2  8.0   0.72  29    85.0  10.7  0.68  29    81.9  8.8   0.73  29    47.2  12.9
Port Coquitlam              10    80.9  10.7  0.92  10    74.5  11.0  0.70  10    78.0  10.3  0.85  10    40.6  15.8
Point Grey                  25    81.2  8.5   0.70  25    74.4  12.7  0.82  25    78.0  10.0  0.82  25    46.4  17.9
North Delta                 25    76.3  11.5  0.88  25    79.0  12.0  0.72  25    78.0  11.3  0.83  25    46.1  19.5
Windsor                     11    78.7  8.6   0.69  11    73.6  13.7  0.47  11    76.3  10.5  0.59  11    45.8  12.5
Sir Charles Tupper          13    80.9  11.7  0.56  13    70.0  15.5  0.58  13    75.7  13.1  0.59  13    45.1  13.2
Alberni District            15    78.8  10.2  0.70  15    82.1  10.9  0.62  15    80.7  10.3  0.68  15    44.8  11.7
Windermere                  20    72.2  9.2   0.80  20    71.2  10.7  0.78  20    72.0  9.5   0.82  20    44.1  12.6
Columneetza                 16    77.6  8.5   0.30  10    79.7  0.8   0.57  10    78.9  8.7   0.49  10    43.8  14.6
Little Flower               19    84.5  7.7   0.84  19    63.7  16.3  0.60  19    74.5  10.7  0.73  19    42.8  15.8
Vancouver College           29    77.3  11.3  0.67  29    82.5  11.1  0.53  29    80.1  10.0  0.68  29    42.2  18.0
Cowichan                    10    72.8  12.0  0.91  10    69.8  14.4  0.78  10    71.5  12.7  0.87  10    38.3  17.8
Kamloops                    12    72.7  6.4   0.56  12    72.8  9.4   0.61  12    72.9  7.0   0.66  12    38.2  13.1
Abbotsford                  12    83.8  9.0   0.78  12    78.0  10.8  0.05  12    81.1  9.5   0.74  12    37.7  15.7
Total                       1279  78.2  10.8  0.62  1279  78.8  12.0  0.02  1275  78.8  10.4  0.68  1283  50.4  14.2

Table 6: Simple Statistics by School District by Mark Variable

                 High School Mark        Provincial Mark         Blended Mark            UBC Mark
School District  N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD    Corr  N     Mean  SD
68               12    80.3  11.7  0.59  12    81.1  8.4   0.66  12    81.0  10.0  0.64  12    59.8  10.8
23               19    85.2  8.1   0.83  19    81.0  13.8  0.87  19    83.3  10.4  0.90  19    54.2  13.9
42               11    82.1  10.7  0.69  11    83.8  9.7   0.77  11    83.2  9.9   0.76  11    53.8  14.1
35               13    85.5  8.5   0.58  13    79.2  12.8  0.41  13    82.5  10.1  0.51  13    52.9  11.3
43               34    82.0  10.2  0.85  35    80.7  10.5  0.72  34    81.5  9.7   0.83  35    52.1  16.2
39               650   78.1  10.8  0.56  651   76.9  12.7  0.66  647   77.8  10.5  0.68  654   51.5  13.8
44               97    80.8  10.0  0.70  97    83.8  11.1  0.61  97    82.5  9.8   0.69  97    51.5  12.4
61               18    79.6  8.2   0.48  18    79.6  11.9  0.71  18    79.8  9.4   0.66  18    51.3  14.0
41               94    80.2  10.7  0.67  94    76.6  12.1  0.59  94    78.7  10.6  0.68  94    50.4  14.3
40               15    78.1  8.4   0.59  15    78.9  11.5  0.78  15    78.7  9.5   0.74  15    50.1  14.4
38               147   77.8  11.1  0.75  146   81.9  10.7  0.67  146   80.1  10.3  0.76  147   49.7  14.6
57               29    84.8  10.2  0.58  29    82.3  13.9  0.70  29    83.8  11.4  0.68  29    49.1  12.1
45               96    72.4  10.8  0.62  96    80.0  10.6  0.67  96    76.5  10.1  0.68  96    48.9  13.7
85               10    82.3  8.5   0.65  10    75.2  14.1  0.46  10    79.1  10.9  0.55  10    48.7  15.6
37               81    79.3  9.8   0.78  81    80.3  10.3  0.75  81    80.1  9.5   0.81  81    48.6  16.4
36               48    78.5  10.3  0.65  48    76.3  12.0  0.61  48    77.7  10.1  0.69  48    46.3  13.5
33               10    76.8  11.8  0.41  11    73.1  12.6  0.69  10    75.4  11.4  0.60  11    46.0  13.7
70               15    78.8  10.2  0.70  15    82.1  10.9  0.62  15    80.7  10.3  0.68  15    44.8  11.7
65               22    75.5  11.1  0.70  22    76.1  15.1  0.76  22    76.0  12.3  0.78  22    44.5  15.9
27               19    77.5  8.4   0.34  19    77.3  13.8  0.47  19    77.6  10.4  0.46  19    44.2  13.8
34               17    83.6  8.1   0.64  17    77.6  10.1  0.47  17    80.9  8.8   0.57  17    42.0  16.0
24               19    75.7  9.3   0.78  19    74.8  13.0  0.75  19    75.5  10.6  0.79  19    41.1  15.1
Total            1476  78.5  10.7  0.61  1478  78.6  12.2  0.63  1472  78.8  10.4  0.68  1482  50.3  14.2

Now looking at correlations by group in Table 4, we see once again that the blended mark has the highest or very close to the highest correlation with UBC for every group. At the other end of the spectrum, the high school mark has the lowest correlation with the UBC mark for students overall and for Math 100, Math 140, Vancouver, Rest of B.C., and Private Non-Catholic students. The provincial mark had the lowest correlation with the UBC mark in the remaining categories. Table 5, which lists statistics for "large" (10 or more students) schools in descending order of UBC means, reveals that in 24/51 = 47% of the schools the blended mark had the highest correlation with the UBC mark. The high school mark and provincial mark scored 25/51 = 49% and 8/51 = 16% respectively. Note that ties in correlation (e.g., PROV and BLND for Seaquam) result in the percentages summing to over 100%. The total of all students summed over the 51 high schools has the blended mark more correlated with the UBC mark than either the high school mark or the provincial mark. Finally, in Table 6, for the 22 school districts with 10 or more students enrolling in first year calculus, the blended mark, provincial mark, and high school mark have the highest correlation in 9/22 = 41%, 7/22 = 32%, and 6/22 = 27% of the districts respectively. Again, the list is in descending order of UBC means.

We could represent the means of the mark variables and the UBC variable of Table 6 graphically and geographically as in Figure 4. The S function "faces" is a graphical method for plotting multivariate data, the idea being that facial expressions are easier to identify with.
For each face (the number is the school district), the area represents the number of students originating from that school district, with large values of n corresponding to large areas. It should be noted that although the ratio of the largest to the smallest n is 65, the ratio of the graphical areas does not reflect this! The means of the high school mark, the provincial mark, and the blended mark calculated over each school district are represented by the width of the eyebrows, the width of the eyes, and the length of the nose respectively; wider or longer features imply higher means. Finally, the width and curve of the smile represent the average UBC mark over the school district, with a broad smile representing a high UBC mean and a small frown implying a low UBC mean.

In viewing Figure 4, we see somewhat wide smiles (moderately high UBC means) complemented by average to moderate eyebrow widths (high school means), eye widths (provincial means), and nose lengths (blended means) in the lower mainland and GVRD (including Vancouver). However, among the Interior school districts, we see some sad faces (school districts 24, 27, and 57). In particular, students in school district 57 do very well in their pre-university math (the wide eyes and eyebrows with a long nose) but end up with a "frown" (low UBC mean). This would suggest either that some Interior high schools are not preparing their students for first year calculus as well as the schools in the GVRD, or that their marking standards are higher than for the GVRD!

In Table 7, we consider the validity of the conjecture by looking at the provincial and UBC means by high school letter grade (A or B) and by participation in the Euclid Math Contest. Overall, the high school B students who participated in the Euclid performed on the provincial exam like the high school A students who did not write the Euclid (both averaging about 85%).
Meanwhile, the high school B Euclid students performed slightly lower at UBC (average of 54) than the high school A Non-Euclid students (average of 58).

Table 7: Comparison of Euclid/Non-Euclid Students

                                    Provincial Mark                       UBC Mark
                                    Euclid            Non-Euclid          Euclid            Non-Euclid
School                       Grade  N    Mean  SD     N    Mean  SD       N    Mean  SD     N    Mean  SD
South Delta                  A      7    89.3  7.0    6    81.3  4.5      7    62.9  6.4    6    49.7  13.0
                             B      7    80.9  7.5    7    76.6  5.7      7    55.0  7.3    7    43.6  13.0
Steveston                    A      10   93.9  4.0    5    93.2  3.6      10   70.3  4.1    5    66.0  5.9
                             B      2    78.5  17.7   20   80.7  7.4      2    51.0  0.0    20   47.2  13.8
Sir Winston Churchill        A      7    94.4  2.5    3    86.0  11.0     7    71.6  5.4    3    67.0  12.1
                             B      9    88.7  3.8    13   79.9  9.2      9    62.4  10.3   13   60.3  7.3
David Thompson (Vancouver)   A      4    89.5  5.2    8    87.8  6.5      4    65.8  5.7    8    62.9  7.2
                             B      3    88.0  11.3   8    79.3  8.8      3    61.7  4.2    8    53.1  16.9
Notre Dame                   A      2    86.5  10.6   4    79.8  6.1      2    67.0  9.9    4    56.3  9.2
                             B      3    74.7  6.7    4    70.0  2.9      3    53.0  7.8    4    53.3  3.5
Argyle                       A      10   93.2  4.7    3    84.7  8.3      10   60.5  9.9    3    66.7  8.4
                             B      2    77.0  0.0    3    71.3  3.5      2    43.5  7.8    3    41.7  3.5
Handsworth                   A      4    93.3  3.6    2    99.0  1.4      4    56.0  4.5    2    63.0  1.4
                             B      13   86.8  6.4    3    85.3  8.5      13   50.2  11.0   3    43.0  11.3
Alberni District             A      2    94.5  0.7    2    94.5  7.8      2    60.0  4.2    2    57.5  0.7
                             B      3    87.0  1.7    4    78.5  6.1      3    43.3  7.6    4    39.5  7.9
Total                        A      242  91.4  6.1    278  84.5  9.0      243  62.6  9.1    280  58.2  10.2
                             B      145  84.7  7.6    504  76.0  10.0     145  54.4  9.9    504  48.1  12.5

III-3. Ecological Correlation

Whenever using correlation, one should be cautious about correlation coefficients based on rates or averages. These, better known as ecological correlations (see Robinson [1950] or Freedman et al. [1978]), can be very misleading, as the following contrived example indicates. Consider the variables X and Y in Table 8 grouped by a third variable Z.
Table 8: Ecological Correlation Example

X    Y    Z    Group mean of X    Group mean of Y
1    1    A    1.5                1.5
1    2    A
2    1    A
2    2    A
1    3    B    2.0                2.5
1    4    B
3    1    B
3    2    B
2    3    C    2.5                3.5
2    4    C
3    3    C
3    4    C

In Figure 5, we have a scatterplot of Y versus X giving a correlation of 0 and a scatterplot of the group means $\mu_X$ and $\mu_Y$ yielding a sample (weighted) correlation of 1! This is due to a wide spread about the mean in each group.

[Figure 5: Scatter Plot of Data Points and Group Means. Panels: Scatter Plot of All Data Points (r = 0); Scatter Plot of Group Means.]

Hence, the ecological correlation (correlation of the means) can be quite misleading. Ecological correlations are frequently used in the social sciences, such as political science and sociology, where the averages are found over regions or types. Percentages (for dichotomous variables) as well as means are subject to ecological correlation misinterpretation; Robinson [1950] gives an illustrative sociological example. Robinson also defines the following relationship between the total individual correlation ($\rho_t$), the ecological correlation ($\rho_e$), and the within-areas (weighted average by size) correlation ($\rho_w$):

$$\rho_e = \frac{\rho_t - \sqrt{(1-\eta_x^2)(1-\eta_y^2)}\;\rho_w}{\eta_x \eta_y} \tag{3.1}$$

where $\eta_x$ and $\eta_y$ are correlation ratios measuring "the degree to which the values of X and Y show clustering by area" (Robinson [1950]). More specifically, Knapp [1977] gives the following formulas. Let $X_{ij}$ be the ith observation in the jth group on X ($i = 1, \ldots, n_j$; $j = 1, \ldots, m$; $\sum_j n_j = n$); then let

$$\mu_{jx} = \frac{1}{n_j}\sum_{i} X_{ij} \tag{3.2}$$

$$\mu_x = \frac{1}{n}\sum_{j}\sum_{i} X_{ij} \tag{3.3}$$

$$SSt_x = \sum_{j}\sum_{i} (X_{ij} - \mu_x)^2 \tag{3.4}$$

$$SPt = \sum_{j}\sum_{i} (X_{ij} - \mu_x)(Y_{ij} - \mu_y), \qquad SPb = \sum_{j} n_j(\mu_{jx} - \mu_x)(\mu_{jy} - \mu_y),$$

$$SSw_x = \sum_{j}\sum_{i} (X_{ij} - \mu_{jx})^2 \tag{3.5}$$

$$SSb_x = \sum_{j} n_j(\mu_{jx} - \mu_x)^2. \tag{3.6}$$

Let similar quantities be defined with Y replacing X and let

$$\rho_t = \frac{SPt}{\sqrt{SSt_x\,SSt_y}}, \qquad \rho_e = \frac{SPb}{\sqrt{SSb_x\,SSb_y}}.$$

Equation (3.2) is the mean of X for the jth group, (3.3) is the grand mean of X, (3.4) is the total corrected sum of squares for X, (3.5) is the within-group sum of squares for X, and (3.6) is the between-group sum of squares for X.
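The contrived example of Table 8 can be checked numerically. The following is a minimal sketch in Python (the thesis itself used S and SAS); the function names are illustrative only, and the data are the twelve points of Table 8. It computes the total correlation over all points and the size-weighted correlation of the group means.

```python
from math import sqrt

# The twelve (x, y, group) points of Table 8.
DATA = [
    (1, 1, "A"), (1, 2, "A"), (2, 1, "A"), (2, 2, "A"),
    (1, 3, "B"), (1, 4, "B"), (3, 1, "B"), (3, 2, "B"),
    (2, 3, "C"), (2, 4, "C"), (3, 3, "C"), (3, 4, "C"),
]


def mean(values):
    return sum(values) / len(values)


def weighted_corr(xs, ys, ws):
    """Pearson correlation of (xs, ys) with observation weights ws."""
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    sp = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    ssx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    ssy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    return sp / sqrt(ssx * ssy)


def total_and_ecological(data):
    """Return (total correlation, ecological correlation of group means)."""
    xs = [x for x, _, _ in data]
    ys = [y for _, y, _ in data]
    rho_t = weighted_corr(xs, ys, [1] * len(data))
    groups = sorted({g for _, _, g in data})
    gx = [mean([x for x, _, z in data if z == g]) for g in groups]
    gy = [mean([y for _, y, z in data if z == g]) for g in groups]
    gn = [sum(1 for _, _, z in data if z == g) for g in groups]
    rho_e = weighted_corr(gx, gy, gn)  # group means weighted by group size
    return rho_t, rho_e


rho_t, rho_e = total_and_ecological(DATA)
print(f"total correlation      = {rho_t:.3f}")
print(f"ecological correlation = {rho_e:.3f}")
```

Weighting the group means by group size matches the definition of SPb and SSb above; with equal group sizes, as here, the weights do not change the result.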
Now consider the case when the ecological correlation is larger than the total correlation, that is, when $\rho_e > \rho_t$, or

$$\frac{\rho_t - \sqrt{(1-\eta_x^2)(1-\eta_y^2)}\;\rho_w}{\eta_x\eta_y} > \rho_t,$$

$$(1 - \eta_x\eta_y)\,\rho_t > \sqrt{(1-\eta_x^2)(1-\eta_y^2)}\;\rho_w,$$

$$k\rho_t > \rho_w, \tag{3.7}$$

where

$$k = \frac{1 - \eta_x\eta_y}{\sqrt{(1-\eta_x^2)(1-\eta_y^2)}}.$$

Now, using calculus, k in equation (3.7) takes a minimum value of 1 when $\eta_x = \eta_y$. Therefore, since $k \ge 1$, whenever the within correlation is smaller than the (positive) total correlation, the ecological correlation is greater than the total correlation. Now the within correlation is usually smaller than the total correlation, especially for relatively homogeneous sub-groups. Hence, ecological correlations tend to be larger than total correlations!

Now returning to our data, in large groupings (such as Math 100 or Vancouver schools) the blended mark tends to have the highest correlation with the UBC mark. Because the blended mark is an average, a natural question would be "Is there an ecological correlation analogy for the blended score?" Mathematically, we seek conditions on three variables $X_1$, $X_2$, and $Y$ such that

$$\mathrm{corr}\left(\frac{X_1 + X_2}{2},\, Y\right) \ge \mathrm{corr}(X_i, Y), \qquad i = 1, 2. \tag{3.8}$$

We start off with the assumption that both $X_1$ and $X_2$ have a common standard deviation $\sigma$. Now, by the Cauchy-Schwarz inequality,

$$\mathrm{var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma^2 + \mathrm{cov}(X_1, X_2)}{2} \le \frac{\sigma^2 + \sigma^2}{2} = \sigma^2.$$

And so, for a nonnegative covariance with Y,

$$\mathrm{corr}\left(\frac{X_1 + X_2}{2},\, Y\right) = \frac{\mathrm{cov}\left(\frac{X_1 + X_2}{2},\, Y\right)}{\left[\mathrm{var}\left(\frac{X_1 + X_2}{2}\right)\mathrm{var}(Y)\right]^{1/2}} \ge \frac{\mathrm{cov}\left(\frac{X_1 + X_2}{2},\, Y\right)}{\sigma\,\sigma_Y}.$$

Furthermore, if we assume that the covariance of $X_1$ with Y is identical to the covariance of $X_2$ with Y, then the right-hand side equals $\mathrm{corr}(X_i, Y)$ and (3.8) holds. An interpretation of this, analogous to ecological correlation, is that averaging reduces some variability and so the correlation increases.

III-4. Conditional Correlation Phenomena

We start off this section with a few examples which are referenced in an article by Akemann et al. [1983]. These examples concern university admission to graduate programmes based on the undergraduate Grade Point Average (GPA) and the Graduate Record Examination (GRE) scores.
- It was noted in "the 6-year period 1969-1975 that the correlation between the GPA and GRE scores among entering graduate students in psychology at the University of Oregon was negative for each of six entering classes."
- At another university, it was concluded that students gained admittance to graduate school if they had a high undergraduate GPA and low GRE scores or vice versa. Hence admitted students generate a negative correlation between GPA and GRE scores.
- At the University of Southern California, it was recommended that the GRE be dropped as a requirement for admission because of the low correlation.

In the above examples, admission was based on receiving a "high value" on a function of the two scores; this will be clarified later on. With this restricted range, the tendency of the correlation to be low is known as the conditional correlation phenomenon. From a mathematical standpoint, if we had two variables X and Y which are used to predict a third variable Z by some function W(X,Y), then the correlation between X and Y would tend to be lower over the restricted range where W(X,Y) is "large". More precisely, consider the following three admission strategies.

• $aX + bY > \alpha$, with $a, b > 0$.
• $\min(X,Y) > \alpha$.
• $\max(X,Y) > \alpha$.

Now assuming the joint distribution of the two variables is bivariate normal, and that $\rho \ne \pm 1$ for the full population, Akemann et al. [1983] have shown that the following conditional correlation results hold.

$$\lim_{\alpha \to \infty} \rho[(X,Y) \mid aX + bY > \alpha] = -1, \qquad (a, b > 0),$$

$$\lim_{\alpha \to \infty} \rho[(X,Y) \mid \min(X,Y) > \alpha] = 0,$$

$$\lim_{\alpha \to \infty} \rho[(X,Y) \mid \max(X,Y) > \alpha] = -1.$$

To understand these results, consider two athletic events, say the 100-meter race and the long jump. In the entire population the correlation of the performance between these two events is probably positive; however, the reduced population of world-class athletes in either event would lead to a negative correlation! Now returning to our data, let us test for this phenomenon.
Table 9 contains the conditional correlations for the three above-mentioned functions (with a = b = 0.5) on the variables HS and PROV. To save space, the percentage of admissible students is used instead of $\alpha$. For example, in the top 10% of the students based on their blended mark, the conditional correlation is -0.039. In Table 10, the correlations of the predictor variables (HS, PROV, and BLND) with UBC, conditional on the predictor variable being above certain quantiles, are displayed. The blended mark seems to be less susceptible to the conditional correlation phenomenon (showing no negative correlations) than either the high school mark or the provincial mark. Hence, due to its "stability", the blended mark may be thought of as the "best" of the three predictors. In addition, note that the conditional correlations between the blended mark and the UBC mark fluctuate between the 20% and 4% admissible ranges. This can be explained by the clustering of high or low UBC marks between percentages.
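The conditional correlation phenomenon can also be illustrated by simulation. The sketch below is not the thesis's computation: it draws hypothetical standard bivariate normal pairs (the correlation 0.66, the sample size, and the seed are arbitrary choices made here) and recomputes the correlation among the top fraction of cases admitted under each of the three strategies, in the spirit of Table 9.

```python
import random
from math import sqrt


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sp = sum((x - mx) * (y - my) for x, y in pairs)
    ssx = sum((x - mx) ** 2 for x, _ in pairs)
    ssy = sum((y - my) ** 2 for _, y in pairs)
    return sp / sqrt(ssx * ssy)


def simulate(n, rho, seed):
    """Draw n standard bivariate normal pairs with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, rho * z1 + sqrt(1 - rho ** 2) * z2))
    return pairs


def conditional_corr(pairs, score, admit_fraction):
    """Correlation among the top admit_fraction of pairs ranked by score."""
    ranked = sorted(pairs, key=score, reverse=True)
    keep = max(3, int(len(ranked) * admit_fraction))
    return pearson(ranked[:keep])


# Hypothetical population; 0.66 is an arbitrary illustrative correlation.
pairs = simulate(20000, 0.66, seed=1987)
strategies = {
    "0.5X+0.5Y": lambda p: 0.5 * p[0] + 0.5 * p[1],
    "min(X,Y)": lambda p: min(p),
    "max(X,Y)": lambda p: max(p),
}
fractions = (1.0, 0.5, 0.1, 0.01)
for name, score in strategies.items():
    row = [conditional_corr(pairs, score, f) for f in fractions]
    print(f"{name:10s}", " ".join(f"{r:7.3f}" for r in row))
```

Ranking by the admission function and keeping the top fraction plays the role of raising the threshold $\alpha$; the limits quoted from Akemann et al. [1983] concern the behaviour as that fraction shrinks toward zero.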
Table 9: Conditional Correlations by Function

Admissible   0.5HS + 0.5PROV   Min(HS,PROV)   Max(HS,PROV)
100%          0.656             0.656          0.656
90%           0.562             0.633          0.542
80%           0.492             0.586          0.453
70%           0.434             0.579          0.352
60%           0.342             0.524          0.265
50%           0.284             0.469          0.180
40%           0.213             0.432          0.085
30%           0.153             0.469          0.022
20%           0.055             0.454         -0.090
15%          -0.022             0.402         -0.125
10%          -0.039             0.341         -0.153
5%           -0.176             0.193         -0.230
4%           -0.232             0.271         -0.272
3%           -0.274             0.187         -0.272
2%           -0.321             0.183         -0.451
1%           -0.521            -0.103         -0.495

Table 10: Conditional Correlations by Predictor Variable

Admissible   (HS,UBC)   (PROV,UBC)   (BLND,UBC)
100%          0.603      0.637        0.682
90%           0.568      0.596        0.643
80%           0.538      0.578        0.603
70%           0.504      0.522        0.565
60%           0.455      0.466        0.518
50%           0.407      0.434        0.484
40%           0.373      0.373        0.443
30%           0.339      0.306        0.353
20%           0.245      0.187        0.254
15%           0.225      0.122        0.258
10%           0.235      0.070        0.211
5%           -0.014     -0.020        0.151
4%           -0.014     -0.020        0.253
3%           -0.151     -0.055        0.018
2%           -0.112     -0.055        0.018
1%           -0.152     -0.078        0.325

III-5. Summary

Based upon the initial analysis in this chapter, the blended mark seems to be the best predictor of success in large groups such as the students in Math 100, Math 140, and Math 153 or the students originating from Vancouver, GVRD, Rest of B.C., and Private Non-Catholic schools. On the other hand, when analyzing the data in small groups (Math 120 or Private Catholic schools) or by individual schools, the high school mark tends to be the best predictor. In Section 3, we discussed ecological correlations and derived sufficient but not necessary conditions whereby the blended mark would have the largest correlation with the UBC mark. Finally, we looked at the conditional correlation phenomenon and its implications for admission strategies.

CHAPTER IV

DATA ANALYSIS (Continuous Methods)

IV-1. Introduction

We begin our data analysis by treating all the variables as continuous variables. The withdrawn students are excluded from the methods in this chapter. Section 2 uses simple and multiple regression models. Analysis of variance techniques are used in Section 3, and Section 4 covers analysis of covariance using HS, PROV, or BLND as a covariate. Section 5 uses an application of discriminant analysis to classify the students. A summary is given in Section 6.

IV-2. Simple and Multiple Regression

When one wants to predict a variable's values from another's values, the idea of regression comes to mind. Hence, if we let UBC be the response variable and HS, PROV, and BLND be predictor variables, then we can look at the $R^2$ statistics for the three separate simple least squares regressions in determining which independent variable (HS, PROV, or BLND) explains the most variation in the dependent variable (UBC). However, since the $R^2$ value is just the square of the correlation coefficient of the last chapter, the same conclusions would be reached here; namely, BLND would be the best predictor in explaining the variation in the UBC marks in large groups. In terms of overall students, the $R^2$ values for HS, PROV, and BLND are 0.36, 0.41, and 0.46 respectively. Now after performing a multiple regression of UBC on HS and PROV, we find that $R^2$ = 0.47. Although the correlation between HS and PROV is moderate (0.66), one has to be cautious of the effects of multicollinearity. Nevertheless, the multiple regression coefficients for HS and PROV were 0.43 and 0.50 respectively. The coefficient for BLND from the simple regression of UBC on BLND was 0.93. Since the coefficients for HS and PROV are each approximately one half of the coefficient for BLND, this would seem to suggest that the weights (of 0.5) used for the blended mark were appropriate. Figure 6 gives normal quantile plots of the residuals to check the adequacy of the regression model.
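The remark that the simple-regression $R^2$ is just the squared correlation coefficient can be verified directly. The following minimal sketch uses six made-up mark triples (not the thesis data) and hypothetical helper names; the blended mark is taken as the equally weighted average of the other two, as in the text.

```python
from math import sqrt


def simple_regression(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, 1 - rss / syy


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up marks for six hypothetical students (illustration only).
hs = [62, 70, 75, 81, 88, 93]
prov = [58, 74, 70, 85, 84, 95]
ubc = [35, 44, 47, 55, 58, 66]
blnd = [(h + p) / 2 for h, p in zip(hs, prov)]  # equal weights of 0.5

for name, xs in (("HS", hs), ("PROV", prov), ("BLND", blnd)):
    _, slope, r2 = simple_regression(xs, ubc)
    print(f"{name:5s} slope={slope:.3f}  R^2={r2:.4f}  r^2={pearson(xs, ubc) ** 2:.4f}")
```

The two right-hand columns agree for every predictor, which is why the regression comparison cannot rank the predictors differently from the correlation study of Chapter III.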
[Figure 6: Normal Quantile Plots of the Residuals. Panels: Residuals From Regression of UBC on HS; Residuals From Regression of UBC on PROV.]

Another criterion often used to determine the "best" predictor is Mallows' $C_p$ statistic, which is calculated by

$$C_p = \frac{RSS_p}{s^2} + 2p - n,$$

where n is the number of observations, p is the number of parameters including the intercept, $RSS_p$ is the residual sum of squares for the regression equation with p parameters, and $s^2$ is the usual estimate of $\sigma^2$ based on the full regression. The $C_p$ statistic is often used in multiple regression to find an "optimal" subset of explanatory variables to be included in the regression equation. As more variables are included in the regression equation, $RSS_p$ will tend to decrease, but at the same time the 2p term will increase the $C_p$ value. Hence, with these trade-offs, one usually selects small subsets of variables with low $C_p$ values.

Although our predictor variables (HS, PROV, BLND) are not necessarily independent of one another, we nevertheless calculate the $C_p$ values (as well as the $R^2$ values) for each predictor variable by course as outlined in Table 11 and by group as in Table 12. The results using the $C_p$ criterion are the same as those of the last chapter for the different courses and groups. Similarly, the results are the same when we look at the 51 "large" schools.

IV-3. Analysis of Variance

In this section, we shall use analysis of variance techniques to test for differences in UBC performances among the 51 "large" schools. In addition, we shall examine the effects of the math course taken and the high school grouping on the UBC mark.
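Before the tabulated values, the $C_p$ formula of Section IV-2 can be sketched in a few lines. The function name and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow; the sketch also shows the standard property that the full model's $C_p$ equals its own number of parameters, because $s^2$ is estimated from that model.

```python
def mallows_cp(rss_p, s2_full, p, n):
    """Mallows' C_p = RSS_p / s^2 + 2p - n."""
    return rss_p / s2_full + 2 * p - n


# Hypothetical numbers, not taken from the thesis.
n = 100                 # number of observations
p_full = 3              # intercept plus two predictors in the full model
rss_full = 970.0        # residual sum of squares of the full model
s2 = rss_full / (n - p_full)   # usual estimate of sigma^2 from the full model

cp_full = mallows_cp(rss_full, s2, p_full, n)
cp_subset = mallows_cp(1500.0, s2, 2, n)   # a one-predictor submodel

print(f"C_p (full model)  = {cp_full:.2f}")
print(f"C_p (submodel)    = {cp_subset:.2f}")
```

A submodel that fits nearly as well as the full model has a $C_p$ close to its own p, which is why small subsets with low $C_p$ values are preferred.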
Table 11: $R^2$ and $C_p$ Values by Course

Course      Statistic   High School   Provincial   Blended
Math 100    R^2         0.38          0.43         0.51
            C_p         258.38        157.84       0.73
Math 120    R^2         0.15          0.07         0.13
            C_p         3.94          9.72         5.78
Math 140    R^2         0.27          0.33         0.38
            C_p         84.19         40.70        0.82
Math 153    R^2         0.38          0.28         0.44
            C_p         13.29         28.37        4.33
Total       R^2         0.36          0.41         0.47
            C_p         315.68        181.92       1.34

Table 12: $R^2$ and $C_p$ Values by Group

Group                   Statistic   High School   Provincial   Blended
Vancouver               R^2         0.33          0.47         0.48
                        C_p         172.67        23.23        15.65
GVRD                    R^2         0.47          0.42         0.52
                        C_p         69.69         133.64       4.21
Rest of B.C.            R^2         0.37          0.42         0.47
                        C_p         63.90         29.02        0.57
Private Non-Catholic    R^2         0.31          0.53         0.52
                        C_p         37.41         1.13         4.30
Private Catholic        R^2         0.41          0.19         0.40
                        C_p         7.04          40.31        9.46
Total                   R^2         0.36          0.41         0.47
                        C_p         315.68        181.92       1.34

To test for school differences, we first assume that the 51 schools' UBC grades are normally distributed with common variance and that the observations are independent. Normal scores plots by school seem to support the normality assumption, while the sample standard deviations by school (found in Table 5) are not too different! Using the SAS procedure GLM (General Linear Models), we test the hypothesis that the 51 schools have a common mean. The resulting F-ratio (2.45) is then compared with the F distribution having (50, 1232) degrees of freedom. The resulting p-value is 0.0001; hence we conclude that there are significant differences in the high school means. Table 13 gives a listing in descending order of school means using the Student-Newman-Keuls multiple range test ($\alpha$ = 0.05). The means that are joined by the same letter are not significantly ($\alpha$ = 0.05) different.

Similarly, we conclude that UBC means for different math courses or different groups are statistically ($\alpha$ = 0.05) different. Their corresponding Student-Newman-Keuls multiple range tests appear below in Tables 14 and 15 respectively. The means (under SNK) with the same letter are not significantly ($\alpha$ = 0.05) different.
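The F-ratio used to test for a common mean can be sketched as follows. This is a generic one-way analysis of variance in plain Python, not the SAS GLM output; the three small "schools" are made-up marks. The degrees of freedom follow the same pattern as in the text, where 51 schools and 1283 students give (50, 1232).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up UBC marks for three hypothetical schools.
schools = [[60, 62, 58], [50, 49, 51], [40, 44, 42]]
f_ratio, df_b, df_w = one_way_anova(schools)
print(f"F = {f_ratio:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

The p-value would then be read from the F distribution with these degrees of freedom; that step needs a distribution function and is left out of this sketch.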
Table 13: Student-Newman-Keuls Multiple Range Test on Schools

SNK    Mean     Group   N    School
a      59.833   3       12   Nanaimo District
a      58.357   1       28   David Thompson (Vancouver)
a      56.700   1       60   Sir Winston Churchill
b a    55.542   2       24   Carson Graham
b a    55.302   1       63   Killarney
b a    55.182   2       11   Burnaby North
b a    54.571   2       21   Argyle
b a    54.400   1       10   Lord Byng
b a    54.211   2       19   Centennial
b a    53.765   2       17   Seaquam
b a    53.660   1       53   John Oliver
b a    53.500   1       24   Gladstone
b a    53.438   5       16   Notre Dame
b a    53.414   1       29   Vancouver Technical
b a    53.194   1       31   Magee
b a    52.538   5       13   St. Thomas More
b a    52.286   1       21   Britannia
b a    52.280   1       25   Templeton
b a    52.200   1       10   University Hill
b a    51.500   2       14   New Westminster
b a    50.986   1       69   Eric Hamber
b a    50.957   2       46   Matthew McNair
b a    50.467   1       15   Kitsilano
b a    50.389   2       18   Burnaby South
b a    49.781   4       32   St. George's
b a    49.642   2       53   Steveston
b a    49.333   3       12   Prince George
b a    49.143   2       35   Hillside
b a    49.030   2       33   Burnaby Central
b a    49.000   2       39   Sentinel
b a    48.786   2       14   Alpha
b a    48.667   2       48   Richmond
b a    48.618   1       34   Prince of Wales
b a    48.545   2       22   West Vancouver
b a    48.105   2       19   Semiahmoo
b a    48.000   4       13   York House
b a    47.912   2       34   South Delta
b a    47.172   2       29   Handsworth
b a    46.600   2       10   Port Coquitlam
b a    46.360   1       25   Point Grey
b a    46.120   2       25   North Delta
b a    45.818   2       11   Windsor
b a    45.077   1       13   Sir Charles Tupper
b a    44.800   3       15   Alberni District
b a    44.100   1       20   Windermere
b a    43.813   3       16   Columneetza
b a    42.842   5       19   Little Flower
b a    42.207   5       29   Vancouver College
b      38.300   3       10   Cowichan
b      38.167   3       12   Kamloops
b      37.667   3       12   Abbotsford

Table 14: Student-Newman-Keuls Multiple Range Test on Courses

SNK    Mean     N      Course
a      59.631   65     Math 120
b      55.341   82     Math 153
c      50.457   1048   Math 100
d      47.015   452    Math 140

Table 15: Student-Newman-Keuls Multiple Range Test on Groups

SNK    Mean     N     Group
a      52.476   534   Vancouver
b a    49.872   618   GVRD
b a    48.949   78    Private Non-Catholic
b      48.009   334   Rest of B.C.
b      46.361   83    Private Catholic

Next we again restrict ourselves to the "large" schools and test for differences in group means with schools nested within groups. Hence, we also want to test for differences among schools within groups. The analysis of variance model is as follows.

$$Y_{ijk} = \mu + \tau_i + \beta_{(i)j} + \epsilon_{(ij)k} \tag{4.1}$$

where
$\mu$ is the grand UBC mean,
$\tau_i$ is the ith treatment (group) level, $i = 1, \ldots, 5$,
$\beta_{(i)j}$ is the effect due to the jth school in the ith group,
$\epsilon_{(ij)k}$ is the error term from the jth school in the ith group,
$Y_{ijk}$ is the UBC mark for the kth student in the jth school in the ith group.

Now assume that the $\epsilon_{(ij)k}$ are independently normally distributed with common variance $\sigma^2$. The hypotheses of interest are

• $H_0$: $\tau_1 = \tau_2 = \cdots = \tau_5 = 0$.
• $H_0$: $\beta_{(i)j} = 0$ for all j and all i.

An ANOVA of the model (4.1) was performed and the resulting p-value (0.0004) leads us to conclude that there is significant variability of school means within geographical regions. Furthermore, the normal scores plot of the resulting residuals does not contradict the normality assumption.

IV-4.
Analysis of Covariance

In the preceding section, we used analysis of variance to try to explain the differences in performance among the large schools nested within groups. In this section, we shall use analysis of covariance to eliminate the effects of students having different Algebra 12 marks. The variables HS, PROV, and BLND will each in turn be used as a covariate in the nested model of the last section. Our analysis of covariance model can be written as follows.

$$Y_{ijk} = \mu + \tau_i + \beta_{(i)j} + \gamma(X_{ijk} - \bar{X}) + \epsilon_{(ij)k}$$

where
$\mu$ is the grand UBC mean,
$\tau_i$ is the ith treatment (group) level, $i = 1, \ldots, 5$,
$\beta_{(i)j}$ is the effect due to the jth school in the ith group,
$\epsilon_{(ij)k}$ is the error term from the jth school in the ith group,
$X_{ijk}$ is the covariate (HS, PROV, or BLND) mark,
$\bar{X}$ is the grand mean of the covariate,
$\gamma$ is the regression coefficient,
$Y_{ijk}$ is the UBC mark for the kth student in the jth school in the ith group.

In using this model, we made the assumption that the regression coefficients for each of the 5 groups were equal. The three ANOCOVAs (one with each predictor variable as a covariate) all suggest that there is significant variability of school means within geographic regions after the effect of the covariate (Algebra 12 mark) is removed. This may mean either that there are actual subtle differences among the schools (within groups) in their teaching ability or that the grading standards of schools (within groups) are not equivalent! The $R^2$ values for the HS, PROV, and BLND models are 0.53, 0.49, and 0.56 respectively. Hence, the blended mark once again seems to be the best of the three predictors (in explaining the total variation)! Finally, the normal scores plots of the residuals seem to satisfy our normality assumption in each of the ANOCOVA models.

IV-5.
Discriminant Analysis

In this section, we use discriminant analysis (see Anderson [1958] or Morrison [1976]) to examine the conjecture that high school B students who wrote the Euclid Math Contest performed like high school A students who did not participate in the Euclid Math Contest. Using the SAS procedure DISCRIM, we first determine a classification rule using only the Non-Euclid high school students. Based on their performances on the provincial exam and their first year calculus mark, the students are grouped into 6 classes corresponding to their high school letter grade (A, B, C+, C, P, and F). The prior probabilities are taken to be equal for each class; this assumption will not lead to misleading results as long as there is enough data. Next we classify (based on the above classification scheme) the high school B students who wrote the Euclid Math Contest. The distributions by course and by group appear in Tables 16 and 17 respectively.

In Table 16, we see that, of all the high school B students who wrote the Euclid, 68% were classified as academically similar to high school A students who did not write the Euclid. Also, these "B-students" are "behaving" like "A-students" (by showing the highest percentage) in every course (see Table 16) and in every group (see Table 17), with the exception of the Private Catholic schools. This would suggest some validity to the conjecture.
Table 16: Classification of Euclid High School B Students by Course

Course           A        B       C+      C       P      F       Total
Math 100    N    60       21      4       1       0      12      98
            %    61.22    21.43   4.08    1.02    0.00   12.24   100.00
Math 120    N    10       0       3       0       0      0       13
            %    76.92    0.00    23.08   0.00    0.00   0.00    100.00
Math 140    N    19       2       3       3       0      0       27
            %    70.37    7.41    11.11   11.11   0.00   0.00    100.00
Math 153    N    7        0       0       0       0      0       7
            %    100.00   0.00    0.00    0.00    0.00   0.00    100.00
Total       N    98       31      13      2       0      1       145
            %    67.59    21.38   8.97    1.38    0.00   0.69    100.00

Table 17: Classification of Euclid High School B Students by Group

Group                        A       B       C+      C      P      F       Total
Vancouver               N    25      7       3       1      0      0       36
                        %    69.44   19.44   8.33    2.78   0.00   0.00    100.00
GVRD                    N    38      25      8       0      0      0       71
                        %    53.52   35.21   11.27   0.00   0.00   0.00    100.00
Rest of B.C.            N    11      2       0       0      0      7       20
                        %    55.00   10.00   0.00    0.00   0.00   35.00   100.00
Private Non-Catholic    N    10      2       1       0      0      0       13
                        %    76.92   15.38   7.69    0.00   0.00   0.00    100.00
Private Catholic        N    2       3       0       0      0      0       5
                        %    40.00   60.00   0.00    0.00   0.00   0.00    100.00
Total                   N    98      31      13      2      0      1       145
                        %    67.59   21.38   8.97    1.38   0.00   0.69    100.00

IV-6. Summary

The techniques in this chapter gave results similar to the initial analysis conducted in Chapter III. The blended mark was seen as an overall better predictor in the regression and the analysis of covariance models. However, in viewing the three predictors by individual schools, the high school mark was the best predictor most (49%) of the time, while the blended mark was a close second (47%) and the provincial mark a distant last (16%). The Student-Newman-Keuls multiple range tests that were conducted reveal that there are subtle differences among the course means, with Math 120 having the highest (59.6) and Math 140 having the lowest (47.0), whereas students from Vancouver schools averaged 52.5 at UBC and Private Catholic students had a 46.4 UBC mean. Finally, the application of discriminant analysis seems to justify the conjecture of Euclid high school B students performing like Non-Euclid high school A students.

CHAPTER V

SCALING GRADES AND OPTIMAL SCORES

V-1.
Introduction

In this chapter (as well as the next), we take the categorical approach to analyzing the data. The Algebra 12 marks are converted to grades by the following scheme.

- 86-100 represents an A letter grade.
- 73-85 represents a B letter grade.
- 67-72 represents a C+ letter grade.
- 60-66 represents a C letter grade.
- 50-59 represents a P (Pass) letter grade.
- <50 represents an F (Fail) letter grade.

The UBC marks are translated as follows.

- 60-75 represents a I (First Class) standing.
- 49-59 represents a II (Second Class) standing.
- 38-48 represents a P (Pass) standing.
- 30-37 represents an S (Supplemental) standing.
- <30 represents an F (Fail) standing.

Hence, we can represent our data in a two-way contingency table whereby the cells in the table represent counts which correspond to the cross-classification of the two factors. In general, let X and Y be two classification variables with r and c classes respectively. Let n be the total sample size and $n_{ij}$ be the number of observations that are classified into the ith class of variable X and the jth class of variable Y. Furthermore, let $n_{i\cdot} = \sum_j n_{ij}$ (ith row sum), $n_{\cdot j} = \sum_i n_{ij}$ (jth column sum), and $n = \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$ (total sample size).

In our grade data, we must first decide how to classify the 2.5% of students who withdrew from the UBC first year calculus course. It is often believed that students withdraw from a course for academic reasons; however, there may be other reasons involved, such as financial support. On one hand, we could simply omit these students from the study. However, we can compare the high school grade frequency distribution of the students who withdrew with that of the students who had at least a pass, and also with that of the students who failed (see Table 18), by applying a $\chi^2$ test of homogeneity of populations.
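The Pearson statistic for such a test of homogeneity can be sketched as follows. The counts below are hypothetical (Table 18 is not reproduced here), laid out as two populations by the six high school letter grades, so the statistic has (2 - 1)(6 - 1) = 5 degrees of freedom as in the text; the function name is illustrative only.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square for an r x c table of counts; returns (stat, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity of the row populations.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts by high school grade (A, B, C+, C, P, F).
failed = [5, 30, 20, 15, 8, 2]
withdrawn = [2, 12, 9, 8, 4, 2]
stat, df = chi_square_homogeneity([failed, withdrawn])
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

The statistic is then compared with the $\chi^2$ distribution on the stated degrees of freedom; small expected counts (as in the sparse grade categories here) make that approximation rougher.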
The Pearson $\chi^2$ statistic for the population of students with at least a pass and the population of students who dropped out is 46.21 with 5 degrees of freedom, which is highly significant, whereas the Pearson $\chi^2$ statistic for the students who failed and the students who withdrew is 9.74 with 5 degrees of freedom, which is not significant (p-value of about 10%). It should be noted that in the original data (i.e., prior to applying the adjustment factor to the Math 153 marks), the Pearson $\chi^2$ statistic for "failed-students" and "withdrawn-students" was highly nonsignificant (p-value of 0.63)! In addition, if we look at the proportion of students receiving each high school grade for those who failed first year calculus and for those who withdrew as in Table 18, we see that the proportions of each grade are not that different for the two groups. Furthermore, if students were withdrawing for reasons other than academic ones, then one would expect the proportions for the withdrawn students to be somewhat the same. Hence, for this chapter (and the one after), the withdrawn students are aggregated with the students who failed the first year calculus course.

In Section 2, we look at canonical correlations and optimal scores. Other methods of assigning scores, including a normally based method which yields monotone scores, will be discussed in Section 3. A summary in Section 4 will conclude this chapter.

V-2. Canonical Correlations

In our two-way contingency table let $x_i$ and $y_j$ be scores assigned to the ith and jth categories of variables X and Y respectively. The sample correlation coefficient is given by

$$\rho = \frac{\sum_i\sum_j n_{ij}(x_i - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_i n_{i\cdot}(x_i - \bar{x})^2\;\sum_j n_{\cdot j}(y_j - \bar{y})^2}}.$$

Since the correlation coefficient is scale invariant, without loss of generality we may assume that the scores have mean 0 and variance 1. Hence, the sample correlation can be written as

$$\rho = \frac{\sum_i\sum_j n_{ij}x_i y_j}{n - 1}.$$

Suppose we want to find scores $\{x_i\}$ and $\{y_j\}$ such that $\rho$ is maximized.
Now let $\lambda$ and $\mu$ be Lagrange multipliers. Then the Lagrangian $\mathcal{L}$ can be written as follows:

$$\mathcal{L} = \frac{\sum_i\sum_j n_{ij}x_i y_j}{n-1} - \lambda\left(\frac{\sum_i n_{i\cdot}x_i^2}{n-1} - 1\right) - \mu\left(\frac{\sum_j n_{\cdot j}y_j^2}{n-1} - 1\right).$$

Kendall and Stuart [1977] give an outline of the steps in determining these scores. McKeeman [1978] also gives a nice variation following the steps of Anderberg [1973]. Following Kendall and Stuart's derivation of the optimal scores, we end up with the following equation.

$$\sum_j \frac{n_{ij}}{n_{\cdot j}} \sum_k n_{kj}x_k = \rho^2\,(n_{i\cdot}x_i).$$

Thus acceptable values of $\rho^2$ are just the eigenvalues of the matrix $NN'$, where $N$ is the r by c matrix whose (i, j)th element is

$$\frac{n_{ij}}{\sqrt{n_{i\cdot}\,n_{\cdot j}}}.$$

In matrix notation, we have

$$NN'u = \rho^2 u,$$

where $u$ is an eigenvector with elements $u_i = x_i\sqrt{n_{i\cdot}}$.

Now to calculate the scores $\{x_i\}$, we first find the nonzero eigenvalues of $NN'$. The largest eigenvalue is 1, which corresponds to assigning a common score to each category. The remaining nonzero eigenvalues are the squared canonical correlations. Letting $\rho^2$ be the largest one, $\rho$ represents the maximum correlation possible between X and Y. Next, we calculate the corresponding eigenvector and standardize the values (i.e., mean 0 and variance 1) to give us the $\{x_i\}$ scores. Then we can compute the $\{y_j\}$ scores by

$$y_j = \frac{\sum_i n_{ij}x_i}{\rho\,n_{\cdot j}}.$$

The next step in our data analysis is to apply the above canonical correlation method to obtain optimal scores for our letter grades and UBC standings. The above algorithm was implemented in S and the results are found in Tables 19 and 20. In column 6 of Table 19, under the heading "Maximum", we list the maximum correlation (the other statistics will be discussed later on) based on the optimal scores for each of the predictor variables (HS, PROV, or BLND) with the UBC standings by course. Note that the blended letter grades had the highest maximum correlation with the UBC standings for each course with the exception of Math 153 (suggesting it is the best predictor).
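The eigenvalue procedure described above can be sketched as follows. This is not the thesis's S implementation: it is a plain Python illustration that uses power iteration (a choice made here for self-containment) after removing the trivial eigenvalue 1, standardizes the scores with weights $n_{i\cdot}/n$ rather than with the $n - 1$ divisor, and runs on a hypothetical 3 by 3 table of counts. It assumes the table is not exactly independent, so that the maximum correlation is positive.

```python
from math import sqrt


def optimal_scores(table, iterations=500):
    """Maximum correlation and optimal scores for an r x c table of counts."""
    r, c = len(table), len(table[0])
    row = [sum(t) for t in table]
    col = [sum(t[j] for t in table) for j in range(c)]
    n = sum(row)
    # N has elements n_ij / sqrt(n_i. n_.j); M = N N' is an r x r matrix.
    N = [[table[i][j] / sqrt(row[i] * col[j]) for j in range(c)] for i in range(r)]
    M = [[sum(N[i][j] * N[k][j] for j in range(c)) for k in range(r)] for i in range(r)]
    # Deflate the trivial eigenvalue 1, whose unit eigenvector is sqrt(n_i./n).
    u0 = [sqrt(row[i] / n) for i in range(r)]
    M = [[M[i][k] - u0[i] * u0[k] for k in range(r)] for i in range(r)]
    # Power iteration for the largest remaining eigenvalue rho^2.
    u = [float(i + 1) for i in range(r)]
    for _ in range(iterations):
        v = [sum(M[i][k] * u[k] for k in range(r)) for i in range(r)]
        norm = sqrt(sum(value * value for value in v))
        u = [value / norm for value in v]
    rho2 = sum(u[i] * sum(M[i][k] * u[k] for k in range(r)) for i in range(r))
    rho = sqrt(rho2)
    # Row scores: weighted mean 0 and weighted variance 1 by construction.
    x = [u[i] / u0[i] for i in range(r)]
    # Column scores from the row scores, as in the formula for y_j above.
    y = [sum(table[i][j] * x[i] for i in range(r)) / (rho * col[j]) for j in range(c)]
    return rho, x, y


# Hypothetical counts: rows are three letter grades, columns three standings.
counts = [[40, 15, 5], [20, 30, 20], [5, 15, 40]]
rho, x, y = optimal_scores(counts)
print(f"maximum correlation = {rho:.3f}")
print("row scores   :", [round(s, 3) for s in x])
print("column scores:", [round(s, 3) for s in y])
```

The sign of the scores is arbitrary (both sets can be negated together), and for a 2 by 2 table the maximum correlation reduces to the absolute value of the usual phi coefficient.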
Overall, the blended grade has the highest maximum correlation with the UBC grade (0.637); the provincial grade is not far behind with a correlation of 0.614; and the high school grade has the lowest maximum correlation (0.560). This ordering also holds in both Math 100 and Math 140. Similarly, Table 20 has the maximum correlation by group. The blended letter grades attained the highest maximum correlation with the UBC standings in three of the groups (GVRD, Private Non-Catholic, and Private Catholic schools), while the provincial grade has the highest maximum correlation with the UBC grade in the Vancouver and Rest of B.C. regions (where the blended grade was a very close second!). The results for the 51 large schools indicate that in 45% of the schools the high school grade had the highest maximum correlation with the UBC grade. The provincial and blended grades were highest in 20% and 35% of the schools respectively. Finally, the maximum correlation between the high school letter grades and the provincial letter grades was highest in the Rest of B.C. region (0.644) and lowest in the Private Non-Catholic schools (0.512). The Vancouver and GVRD schools were fairly close to the Rest of B.C. (0.621 and 0.628 respectively).

Table 19: Correlations and Measures of Association By Course

Course     Predictor  Pearson  Spearman  Normal  Maximum  Kendall  Somers  Lambda    Beta
Math 100   HS           0.546     0.565   0.560    0.566    0.489   0.513   0.182   0.779
           PROV         0.601     0.618   0.618    0.623    0.536   0.552   0.206   0.949
           BLND         0.634     0.650   0.649    0.653    0.571   0.604   0.225   1.071
Math 120   HS           0.357     0.327   0.310    0.492    0.311   0.364   0.042   0.294
           PROV        -0.148    -0.168  -0.161    0.172   -0.161  -0.393   0.000  -1.204
           BLND         0.218     0.093   0.165    0.521    0.090   0.148   0.000   0.147
Math 140   HS           0.462     0.496   0.482    0.513    0.418   0.423   0.173   0.622
           PROV         0.496     0.556   0.539    0.608    0.466   0.465   0.209   0.748
           BLND         0.543     0.592   0.573    0.613    0.503   0.515   0.189   0.841
Math 153   HS           0.422     0.373   0.412    0.533    0.348   0.475   0.021   0.465
           PROV         0.393     0.404   0.411    0.424    0.373   0.459   0.125   0.465
           BLND         0.444     0.436   0.441    0.476    0.404   0.565   0.021   0.512
Total      HS           0.530     0.559   0.547    0.560    0.479   0.495   0.193   0.747
           PROV         0.574     0.611   0.600    0.614    0.524   0.534   0.214   0.892
           BLND         0.607     0.636   0.625    0.637    0.551   0.573   0.217   0.980

Table 20: Correlations and Measures of Association By Group

Group                 Predictor  Pearson  Spearman  Normal  Maximum  Kendall  Somers  Lambda    Beta
Vancouver             HS           0.512     0.544   0.530    0.548    0.464   0.461   0.162   0.705
                      PROV         0.619     0.665   0.650    0.668    0.575   0.565   0.239   1.057
                      BLND         0.603     0.641   0.629    0.643    0.558   0.557   0.219   0.988
GVRD                  HS           0.595     0.633   0.618    0.643    0.549   0.566   0.267   0.955
                      PROV         0.571     0.609   0.597    0.613    0.527   0.556   0.212   0.881
                      BLND         0.640     0.671   0.659    0.675    0.586   0.621   0.238   1.106
Rest of B.C.          HS           0.544     0.543   0.544    0.560    0.473   0.519   0.130   0.749
                      PROV         0.635     0.638   0.642    0.663    0.549   0.560   0.192   1.056
                      BLND         0.620     0.635   0.630    0.660    0.555   0.595   0.181   1.011
Private Non-Catholic  HS           0.505     0.543   0.517    0.619    0.463   0.490   0.185   0.693
                      PROV         0.631     0.695   0.649    0.742    0.594   0.597   0.352   1.048
                      BLND         0.639     0.672   0.637    0.759    0.571   0.582   0.315   1.011
Private Catholic      HS           0.579     0.585   0.581    0.663    0.498   0.524   0.119   0.864
                      PROV         0.393     0.386   0.394    0.570    0.323   0.321   0.186   0.464
                      BLND         0.564     0.569   0.569    0.683    0.481   0.497   0.186   0.832
Total                 HS           0.530     0.559   0.547    0.560    0.479   0.495   0.193   0.747
                      PROV         0.574     0.611   0.600    0.614    0.524   0.534   0.214   0.892
                      BLND         0.607     0.636   0.625    0.637    0.551   0.573   0.217   0.980

We now focus on the optimal scores assigned to letter grades and the UBC standings. To be brief, we shall only look at the scores for HS with UBC, PROV with UBC, and BLND with UBC. Table 21 lists these scores under "Optimal" (the other scores will be discussed later). It has been suggested by Gilula [1986] that "appropriate groupings of rows and columns of a two-way contingency table can often simplify the analysis of association between two categorical random variables."
In that same article, Gilula explains the use of optimal scores to suggest groupings (i.e., categories which have approximately the same score should be grouped together). The optimal scores also indicate the "distance" between categories. For example, the UBC score (based on PROV) for first class students is further away from that of second class students (difference of 1.219) than the score of pass students is from that of second class students (difference of 0.827). This would suggest that, on average, second class students are closer to passing students than they are to first class students. Finally, note that the optimal scores obtained are not always monotone as the levels of grading would suggest (e.g., see HS); the rationale is that very few students with a high school letter grade of F get into UBC! Monotone methods will be considered in the next section.

Table 21: Optimal, Normal, and Average Scores for All Students

                     Letter grade                                     UBC standing
Scores   Predictor       A       B      C+       C       P       F       I      II       P       S       F
Optimal  HS          1.237  -0.013  -1.100  -1.470  -1.739  -1.387   1.235   0.288  -0.756  -1.253  -1.544
         PROV        1.270  -0.067  -0.947  -1.264  -1.525  -1.761   1.330   0.111  -0.716  -1.224  -1.439
         BLND        1.302  -0.050  -0.981  -1.581  -1.687  -2.410   1.271   0.211  -0.702  -1.281  -1.535
Normal   HS          1.209  -0.029  -0.852  -1.409  -2.248  -2.654   1.233   0.177  -0.541  -1.129  -1.854
         PROV        1.192  -0.003  -0.737  -1.216  -3.340  -2.338   1.236   0.181  -0.537  -1.126  -1.852
         BLND        1.249  -0.014  -0.874  -1.501  -1.704  -3.757   1.235   0.179  -0.539  -1.128  -1.852
Average                                                              1.235   0.170  -0.539  -1.128  -1.853

V-3. Monotone Methods

The ordinal nature of our letter grades and UBC standings suggests the scores assigned to the categories should also be ordered. As we have seen in the last section, the optimal scores corresponding to the maximum correlation do not always preserve monotonicity. In this section, we shall look at several methods of assigning monotone scores to our grade categories.
A simple method of generating monotone scores is to just assign the values 4, 3, 2.5, 2, 1, and 0 to the letter grades A, B, C+, C, P, and F, and the values 4, 3, 2, 1, and 0 to I, II, P, S, and F. The resulting correlation is found under the label "Pearson" in Tables 19, 20, and 22.

Spearman's rank correlation ranks the categories with respect to their marginal frequency distribution. More specifically, the first row category receives rank $(n_{1\cdot} + 1)/2$, the second row category receives rank $n_{1\cdot} + (n_{2\cdot} + 1)/2$, and the kth row category receives rank $\sum_{i=1}^{k-1} n_{i\cdot} + (n_{k\cdot} + 1)/2$. The results of Spearman's correlation are also found in Tables 19, 20, and 22.

Table 22: Correlations and Measures of Association By School

School                      Predictor  Pearson  Spearman  Kendall  Somers  Lambda
Kamloops                    HS           0.441     0.432    0.395   0.486   0.250
                            PROV         0.428     0.527    0.432   0.449   0.375
                            BLND         0.527     0.589    0.534   0.657   0.375
Columneetza                 HS           0.579     0.606    0.550   0.672   0.111
                            PROV         0.677     0.657    0.593   0.681   0.222
                            BLND         0.650     0.615    0.559   0.677   0.222
Abbotsford                  HS           0.817     0.810    0.747   0.846   0.286
                            PROV         0.636     0.668    0.604   0.651   0.286
                            BLND         0.648     0.619    0.542   0.565   0.286
Semiahmoo                   HS           0.434     0.519    0.429   0.446   0.333
                            PROV         0.519     0.542    0.455   0.475   0.083
                            BLND         0.558     0.566    0.479   0.500   0.167
North Delta                 HS           0.820     0.840    0.765   0.779   0.579
                            PROV         0.670     0.722    0.634   0.651   0.421
                            BLND         0.752     0.767    0.673   0.691   0.368
South Delta                 HS           0.459     0.456    0.399   0.431   0.200
                            PROV         0.652     0.698    0.602   0.640   0.280
                            BLND         0.678     0.686    0.611   0.686   0.240
Seaquam                     HS           0.623     0.663    0.617   0.728   0.300
                            PROV         0.602     0.450    0.413   0.482   0.100
                            BLND         0.555     0.437    0.403   0.460   0.100
Richmond                    HS           0.700     0.745    0.661   0.662   0.281
                            PROV         0.655     0.689    0.603   0.633   0.290
                            BLND         0.634     0.752    0.665   0.703   0.355
Steveston                   HS           0.741     0.771    0.692   0.707   0.378
                            PROV         0.688     0.739    0.644   0.694   0.324
                            BLND         0.728     0.736    0.656   0.698   0.324
Matthew McNair              HS           0.715     0.726    0.647   0.701   0.242
                            PROV         0.681     0.705    0.614   0.649   0.303
                            BLND         0.767     0.784    0.704   0.769   0.273
Britannia                   HS           0.457     0.450    0.378   0.382   0.231
                            PROV         0.608     0.537    0.464   0.470   0.231
                            BLND         0.486     0.602    0.512   0.535   0.385
Magee                       HS           0.709     0.677    0.619   0.636   0.389
                            PROV         0.622     0.672    0.585   0.587   0.318
                            BLND         0.611     0.626    0.564   0.554   0.389
Kitsilano                   HS           0.214     0.145    0.124   0.132   0.444
                            PROV         0.500     0.300    0.265   0.284   0.222
                            BLND         0.115     0.085    0.079   0.080   0.444
John Oliver                 HS           0.659     0.697    0.605   0.602   0.313
                            PROV         0.551     0.661    0.563   0.544   0.303
                            BLND         0.685     0.718    0.643   0.634   0.344
Lord Byng                   HS           0.722     0.681    0.587   0.500   0.750
                            PROV         0.717     0.831    0.763   0.676   0.250
                            BLND         0.846     0.839    0.783   0.684   0.500
Templeton                   HS           0.635     0.741    0.650   0.653   0.333
                            PROV         0.655     0.641    0.579   0.561   0.429
                            BLND         0.662     0.701    0.645   0.653   0.429
Vancouver Technical         HS           0.523     0.632    0.565   0.540   0.235
                            PROV         0.500     0.534    0.463   0.471   0.176
                            BLND         0.456     0.552    0.470   0.465   0.235
Point Grey                  HS           0.638     0.668    0.582   0.657   0.263
                            PROV         0.715     0.797    0.694   0.702   0.421
                            BLND         0.697     0.727    0.646   0.724   0.316
Gladstone                   HS           0.459     0.466    0.429   0.455   0.286
                            PROV         0.508     0.493    0.445   0.426   0.357
                            BLND         0.465     0.489    0.463   0.477   0.286
Sir Winston Churchill       HS           0.614     0.667    0.572   0.535   0.226
                            PROV         0.613     0.656    0.588   0.560   0.258
                            BLND         0.653     0.713    0.627   0.585   0.194
Killarney                   HS           0.678     0.761    0.675   0.631   0.459
                            PROV         0.649     0.729    0.633   0.607   0.378
                            BLND         0.697     0.764    0.674   0.639   0.378
Sir Charles Tupper          HS           0.654     0.595    0.508   0.530   0.300
                            PROV         0.684     0.635    0.527   0.513   0.500
                            BLND         0.688     0.677    0.583   0.583   0.300
David Thompson (Vancouver)  HS           0.313     0.281    0.260   0.231   0.000
                            PROV         0.757     0.642    0.604   0.545   0.222
                            BLND         0.663     0.528    0.492   0.455   0.222
Prince of Wales             HS           0.545     0.547    0.478   0.483   0.182
                            PROV         0.576     0.614    0.530   0.523   0.364
                            BLND         0.605     0.599    0.518   0.523   0.273
Windermere                  HS           0.726     0.728    0.686   0.696   0.545
                            PROV         0.660     0.606    0.538   0.517   0.273
                            BLND         0.697     0.656    0.602   0.585   0.273
Eric Hamber                 HS           0.590     0.612    0.531   0.565   0.250
                            PROV         0.585     0.561    0.489   0.499   0.292
                            BLND         0.582     0.598    0.524   0.556   0.271
University Hill             HS           0.908     0.916    0.873   0.818   0.800
                            PROV         0.784     0.761    0.689   0.656   0.600
                            BLND         0.904     0.885    0.816   0.743   0.800
St. George's                HS           0.698     0.716    0.641   0.657   0.389
                            PROV         0.766     0.807    0.717   0.733   0.611
                            BLND         0.790     0.836    0.733   0.736   0.556
Vancouver College           HS           0.614     0.611    0.512   0.518   0.200
                            PROV         0.437     0.505    0.426   0.475   0.200
                            BLND         0.625     0.650    0.545   0.570   0.250
York House                  HS           0.651     0.656    0.612   0.686   0.364
                            PROV         0.457     0.500    0.441   0.438   0.636
                            BLND         0.615     0.634    0.564   0.595   0.545
Little Flower               HS           0.699     0.701    0.640   0.748   0.083
                            PROV         0.464     0.458    0.395   0.380   0.167
                            BLND         0.647     0.640    0.567   0.565   0.250
Notre Dame                  HS           0.584     0.631    0.575   0.554   0.143
                            PROV         0.491     0.586    0.642   0.538   0.286
                            BLND         0.521     0.595    0.533   0.518   0.286
New Westminster             HS           0.612     0.692    0.615   0.708   0.455
                            PROV         0.783     0.803    0.735   0.814   0.636
                            BLND         0.705     0.717    0.655   0.754   0.545
Burnaby Central             HS           0.682     0.682    0.613   0.633   0.500
                            PROV         0.573     0.600    0.522   0.519   0.292
                            BLND         0.692     0.669    0.599   0.608   0.417
Burnaby North               HS           0.685     0.694    0.589   0.568   0.500
                            PROV         0.392     0.569    0.500   0.513   0.500
                            BLND         0.614     0.628    0.561   0.561   0.333
Burnaby South               HS           0.659     0.661    0.608   0.648   0.429
                            PROV         0.538     0.544    0.471   0.484   0.286
                            BLND         0.656     0.651    0.600   0.707   0.357
Alpha                       HS           0.597     0.546    0.500   0.596   0.444
                            PROV         0.633     0.588    0.657   0.603   0.333
                            BLND         0.698     0.666    0.634   0.677   0.444
St. Thomas More             HS           0.777     0.675    0.619   0.596   0.333
                            PROV         0.747     0.764    0.699   0.631   0.167
                            BLND         0.761     0.743    0.686   0.639   0.333
Centennial                  HS           0.783     0.605    0.559   0.564   0.364
                            PROV         0.609     0.540    0.496   0.523   0.273
                            BLND         0.665     0.496    0.437   0.455   0.182
Port Coquitlam              HS           0.845     0.852    0.788   0.786   0.429
                            PROV         0.577     0.614    0.558   0.595   0.429
                            BLND         0.750     0.764    0.667   0.644   0.429
Argyle                      HS           0.639     0.712    0.636   0.694   0.636
                            PROV         0.559     0.610    0.572   0.588   0.364
                            BLND         0.622     0.078    0.636   0.662   0.455
Handsworth                  HS           0.614     0.631    0.556   0.623   0.105
                            PROV         0.517     0.517    0.467   0.546   0.105
                            BLND         0.718     0.736    0.656   0.743   0.158
Windsor                     HS           0.230     0.213    0.207   0.222   0.286
                            PROV         0.182     0.236    0.204   0.196   0.143
                            BLND         0.273     0.247    0.216   0.220   0.143
Carson Graham               HS           0.603     0.654    0.578   0.573   0.400
                            PROV         0.392     0.367    0.335   0.378   0.067
                            BLND         0.541     0.579    0.514   0.546   0.133
Hillside                    HS           0.393     0.479    0.386   0.379   0.087
                            PROV         0.566     0.552    0.483   0.527   0.174
                            BLND         0.465     0.516    0.435   0.439   0.130
Sentinel                    HS           0.628     0.684    0.578   0.575   0.280
                            PROV         0.614     0.644    0.563   0.587   0.240
                            BLND         0.628     0.667    0.584   0.623   0.200
West Vancouver              HS           0.597     0.762    0.667   0.680   0.438
                            PROV         0.554     0.541    0.466   0.497   0.188
                            BLND         0.697     0.718    0.626   0.642   0.313
Prince George               HS           0.344     0.355    0.312   0.414   0.143
                            PROV         0.671     0.621    0.568   0.686   0.143
                            BLND         0.420     0.398    0.364   0.483   0.143
Cowichan                    HS           0.895     0.902    0.795   0.763   1.000
                            PROV         0.870     0.820    0.778   0.757   0.667
                            BLND         0.845     0.874    0.824   0.848   0.667
Nanaimo District            HS           0.465     0.207    0.186   0.160   0.500
                            PROV         0.507     0.418    0.381   0.357   0.500
                            BLND         0.551     0.457    0.417   0.378   0.500
Alberni District            HS           0.728     0.693    0.624   0.640   0.444
                            PROV         0.561     0.489    0.457   0.486   0.222
                            BLND         0.702     0.625    0.561   0.564   0.333
Total                       HS           0.545     0.574    0.492   0.502   0.186
                            PROV         0.563     0.602    0.515   0.526   0.195
                            BLND         0.610     0.636    0.551   0.570   0.206

There are many other methods of generating monotone scores (e.g., see Kimeldorf, May, and Sampson [1982]), but for our analysis we will use the "normal" method which is outlined in Kendall and Stuart [1977]; see also McKeeman [1978] for an application. We make the assumption of an underlying normal distribution which is divided into r intervals (for variable X) with boundary points $-\infty = x_0 < x_1 < x_2 < \cdots < x_r = +\infty$.
Let $p_i = n_{i\cdot}/n$ be the observed proportion in the ith interval $[x_{i-1}, x_i)$. Next let $m_i$ be the mean value of the ith interval,

$$m_i = \frac{\int_{x_{i-1}}^{x_i} z \, \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz}{\int_{x_{i-1}}^{x_i} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz}.$$

Using the cumulative values of the $p_i$'s, we can derive the $x_i$'s using the denominator. For example, $x_i = z_{p_1 + p_2 + \cdots + p_i}$, where $P(Z < z_\alpha) = \alpha$. Now the numerator is just

$$\frac{1}{\sqrt{2\pi}} \left( e^{-x_{i-1}^2/2} - e^{-x_i^2/2} \right).$$

The scores $\{m_i\}$ are then standardized (for convenience) to have mean 0 and variance 1 with respect to the frequency distribution of X. Similarly, scores are found for the Y variable.

Using S, we apply this bivariate normal procedure to our data. An example of the normal scores and resulting correlation is found in Table 21. Note that the scores for each grade level are very close to each other among the three predictors. Furthermore, the normal scores are close to their optimal counterparts and the resulting "normal" correlations are fairly close to the maximum correlation; e.g., in Table 19, we see that the (overall) correlations of HS, PROV, and BLND with UBC are 0.547, 0.600, and 0.625 for the normal scores and 0.560, 0.614, and 0.637 for the optimal scores respectively.

V-4. Summary

We started this chapter by treating variables categorically. In Section 2, we applied the optimal scores technique to our data and found that monotonicity of the scores was not preserved in some cases. This led us to investigate monotone methods for assigning scores, which was the topic of Section 3. In particular, we looked at Pearson's correlation, Spearman's rank correlation, and a normal scores method. We then applied the normal scores method to our data and noted the close similarities to the optimal scores. These normal monotone scores will be used in the next chapter.

CHAPTER VI
DATA ANALYSIS (Categorical Methods)

VI-1. Introduction

This chapter conducts data analysis using a categorical approach. In Section 2, we shall analyze the data from a "predictive" standpoint and formulate a method by which we can assign weights to schools or groups.
Section 3 will deal with loglinear models. Sections 4 and 5 will take advantage of the ordinal nature of the data by using measures of association and implementing the uniform association model, which relies on predetermined scores assigned to the categorical variables. In Section 6, we will use the logit and the GSK models, which make a distinction between one or more independent variables and a dependent or response variable. Finally, we summarize the results in Section 7.

VI-2. Prediction of Success

Let us define "success" at UBC to be either a first class or second class standing and "failure" to be anything below a second class. Similarly, define "success" at high school to be an A or B letter grade. Hence, we can construct three four-fold tables as in Table 23.

Now we can define the probability of success at UBC ($P_s$) to be the conditional probability of success at UBC given success in high school. For example, in the HS by UBC table, we have $P_s = 847/1186 = 0.714$. That is, a randomly chosen student who was successful in high school has a probability of 0.714 of succeeding at UBC. On the other hand, we may also look at the probability of doing poorly at UBC given that the student did poorly at high school. We shall call this the probability of failure ($P_f$). Table 24 gives values of $P_s$ and $P_f$ by course and by group. In Table 24, we also sharpen our definition of success in high school to be just an A letter grade.

With the definition of success in high school being an A or B letter grade, overall the provincial grade is a better predictor ($P_s = 0.723$) than either the high school grade ($P_s = 0.714$) or the blended grade ($P_s = 0.720$). This is also true for every category except Math 140 (where the blended grade is best), GVRD (where the high school grade is best), and Private Catholic schools (where the blended grade is best).
At the other end of the spectrum, overall the blended grade provides the best predictor of failure; i.e., a randomly chosen student who had a failing blended grade before entering UBC has an 80% chance of receiving a "failure" (below a II class) in first year calculus at UBC!

Table 23: Prediction of Success and Failure in UBC
(each cell gives the frequency, with the row percentage in parentheses)

High School Grade    UBC Failure    UBC Success (I, II)    Total
Failure              383 (77.69)    110 (22.31)              493
Success (A, B)       339 (28.58)    847 (71.42)             1186
Total                722            957                     1679

Provincial Grade     UBC Failure    UBC Success (I, II)    Total
Failure              408 (76.26)    127 (23.74)              535
Success (A, B)       318 (27.70)    830 (72.30)             1148
Total                726            957                     1683

Blended Grade        UBC Failure    UBC Success (I, II)    Total
Failure              386 (80.42)     94 (19.58)              480
Success (A, B)       335 (28.03)    860 (71.97)             1195
Total                721            954                     1675

After sharpening our definition of success at high school to be an A letter grade, we see from Table 24 that the blended grade is the overall best predictor of success (92%). This also holds in every category except Math 120 and the Vancouver schools.

Now let us define a function of the probability of success and the probability of failure. Let

$$P = c_s P_s + c_f P_f, \qquad c_s + c_f = 1, \quad c_s, c_f \ge 0,$$

be called the "prediction function." It is a measure of the "predicting power" of a predictor variable (HS, PROV, or BLND). One can use P to assign weights to groups such that the groups can be ordered relative to each other based on their "predictive ability"; i.e., high P values imply stronger predictive ability. More specifically, for predetermined values of $c_s$ and $c_f$, one can calculate P for each group and then scale the values by dividing through by the maximum P value. Hence, each group would be relative to the group with the maximum P value in a "predictive sense."
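The weighting scheme just described can be written out as a short Python sketch. The helper name `prediction_function` is made up for illustration; the $P_s$ and $P_f$ inputs are the blended-grade values reported by group in Table 25, with $c_s = 0.8$ and $c_f = 0.2$.

```python
def prediction_function(ps, pf, cs=0.8, cf=0.2):
    """P = cs * Ps + cf * Pf, with cs + cf = 1 and cs, cf >= 0."""
    if cs < 0 or cf < 0 or abs(cs + cf - 1.0) > 1e-12:
        raise ValueError("cs and cf must be nonnegative and sum to 1")
    return cs * ps + cf * pf

# (Ps, Pf) for the blended grade, as reported in Table 25
groups = {
    "Vancouver": (0.817, 0.717),
    "GVRD": (0.694, 0.873),
    "Private Non-Catholic": (0.679, 0.852),
    "Rest of B.C.": (0.663, 0.877),
    "Private Catholic": (0.630, 0.742),
}

predict = {g: prediction_function(ps, pf) for g, (ps, pf) in groups.items()}
best = max(predict.values())
# Weight = P relative to the group with the largest P
weights = {g: p / best for g, p in predict.items()}

for g in sorted(weights, key=weights.get, reverse=True):
    print(f"{g:22s} {predict[g]:.3f} {weights[g]:.3f}")
```

Because the inputs here are already rounded to three decimals, a recomputed weight can differ from the tabulated one in the last digit.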
As an example, let us consider the different groups with the blended grade, with $c_s = 0.8$ and $c_f = 0.2$. The results are tabulated (under the label "Predict") in Table 25. Similarly, using the blended grade and the same values for $c_s$ and $c_f$, we compute the weights for the 51 large schools (see Table 26). In both Tables 25 and 26, the groups or schools are arranged in descending order by weight.

Table 24: Prediction of Success and Failure in UBC by Course and Group

                                   Success = A, B             Success = A
Category              Predictor  P(Success)  P(Failure)   P(Success)  P(Failure)
Math 100              HS              0.704       0.801        0.859       0.560
                      PROV            0.709       0.757        0.880       0.574
                      BLND            0.705       0.816        0.921       0.671
Math 120              HS              0.887       0.000        0.958       0.313
                      PROV            0.801           -        0.885       0.000
                      BLND            0.889           -        0.911       0.286
Math 140              HS              0.678       0.751        0.855       0.590
                      PROV            0.698       0.771        0.909       0.609
                      BLND            0.703       0.788        0.906       0.595
Math 153              HS              0.790       1.000        0.844       0.474
                      PROV            0.797       0.750        0.845       0.400
                      BLND            0.780       1.000        0.862       0.656
Vancouver             HS              0.794       0.656        0.908       0.477
                      PROV            0.828       0.709        0.961       0.505
                      BLND            0.817       0.717        0.934       0.486
GVRD                  HS              0.735       0.848        0.903       0.600
                      PROV            0.670       0.842        0.871       0.627
                      BLND            0.694       0.873        0.940       0.621
Rest of B.C.          HS              0.019       0.879        0.777       0.617
                      PROV            0.689       0.767        0.848       0.595
                      BLND            0.663       0.877        0.860       0.603
Private Non-Catholic  HS              0.640       0.870        0.905       0.644
                      PROV            0.750       0.875        0.900       0.740
                      BLND            0.679       0.852        0.963       0.736
Private Catholic      HS              0.613       0.826        0.750       0.632
                      PROV            0.571       0.611        0.630       0.566
                      BLND            0.630       0.742        0.800       0.600
Total                 HS              0.714       0.777        0.866       0.666
                      PROV            0.723       0.763        0.885       0.580
                      BLND            0.720       0.804        0.918       0.577

Table 25: Assigning Weights to Groups

Group                    Ps      Pf  Predict  Weight
Vancouver             0.817   0.717    0.797   1.000
GVRD                  0.694   0.873    0.730   0.916
Private Non-Catholic  0.679   0.852    0.714   0.896
Rest of B.C.          0.663   0.877    0.706   0.886
Private Catholic      0.630   0.742    0.652   0.818

Overall, P calculated from the blended grade is 0.737, while P for HS and PROV is 0.727 and 0.731 respectively. The weights calculated in Tables 25 and 26 can be interpreted as a measure of how predictable a student can be.
For example, in Table 26 (where $c_s = 0.8$ and $c_f = 0.2$), a student from Cowichan is highly predictable in that if (s)he had a successful blended grade, then (s)he will most likely be successful at UBC, and if (s)he had a poor blended grade, then (s)he will probably do badly at UBC. In contrast, a student from Kamloops is highly unpredictable; i.e., (s)he may have a high blended grade but end up doing badly at UBC. Note that this may also have some implications for the grading standards of different schools!

Table 26: Assigning Weights to Schools

School                         Ps      Pf  Predict  Weight
Cowichan                    1.000   0.875    0.975   1.000
Centennial                  0.938   1.000    0.950   0.974
David Thompson (Vancouver)  0.920   1.000    0.936   0.960
Lord Byng                   1.000   0.600    0.920   0.944
Templeton                   0.917   0.840    0.903   0.926
St. Thomas More             1.000   0.600    0.900   0.923
Killarney                   0.950   0.696    0.899   0.922
Sir Winston Churchill       0.972   0.560    0.890   0.913
Vancouver Technical         0.895   0.727    0.861   0.883
Magee                       0.857   0.818    0.849   0.871
Britannia                   0.917   0.556    0.844   0.866
Port Coquitlam              0.833   0.800    0.827   0.848
Argyle                      0.778   1.000    0.822   0.843
Nanaimo District            0.900   0.500    0.820   0.841
Seaquam                     0.705   1.000    0.812   0.833
Notre Dame                  0.909   0.400    0.807   0.828
John Oliver                 0.844   0.636    0.802   0.823
Carson Graham               0.818   0.667    0.788   0.808
Alpha                       0.727   1.000    0.782   0.802
North Delta                 0.700   1.000    0.765   0.784
Matthew McNair              0.700   1.000    0.760   0.779
Gladstone                   0.700   1.000    0.760   0.779
Prince George               0.700   1.000    0.760   0.779
West Vancouver              0.714   0.875    0.740   0.766
Point Grey                  0.667   1.000    0.733   0.762
St. George's                0.607   1.000    0.733   0.762
New Westminster             0.667   1.000    0.733   0.752
Burnaby North               0.750   0.667    0.733   0.762
Hillside                    0.739   0.692    0.730   0.748
South Delta                 0.655   1.000    0.724   0.743
Eric Hamber                 0.717   0.750    0.724   0.742
Burnaby Central             0.696   0.818    0.720   0.739
Sentinel                    0.714   0.727    0.717   0.735
Prince of Wales             0.737   0.588    0.707   0.725
Steveston                   0.667   0.857    0.705   0.723
Burnaby South               0.625   1.000    0.700   0.718
Sir Charles Tupper          0.625   0.833    0.667   0.684
Windermere                  0.625   0.833    0.667   0.684
Richmond                    0.600   0.929    0.666   0.683
Columneetza                 0.571   1.000    0.657   0.674
University Hill             0.571   1.000    0.657   0.674
Handsworth                  0.615   0.750    0.642   0.659
York House                  0.545   1.000    0.630   0.653
Semiahmoo                   0.600   0.750    0.630   0.646
Little Flower               0.556   0.909    0.626   0.642
Kitsilano                   0.630   0.600    0.609   0.625
Alberni District            0.645   0.750    0.580   0.601
Windsor                     0.500   0.833    0.567   0.581
Vancouver College           0.409   0.875    0.502   0.516
Abbotsford                  0.300   1.000    0.440   0.451
Kamloops                    0.250   1.000    0.400   0.410

Our definition of the prediction function depends not only on the values of $c_s$ and $c_f$ but also on the "definition of success" in high school. In Table 24, we see that after sharpening the definition of success from an A or B to just an A, the resulting values of $P_s$ and $P_f$ increased and decreased respectively. An interesting question would be "What cut-off high school mark would maximize the prediction function for preassigned values of $c_s$ and $c_f$?" To pose this question mathematically, we first form a 100 by 2 contingency table with the rows corresponding to each high school mark (1, 2, ..., 100) and the columns to failure or success at UBC. Hence, we want to find $1 \le k \le 100$ such that

$$c_s \frac{\sum_{i \ge k} n_{i2}}{\sum_{i \ge k} n_{i\cdot}} + c_f \frac{\sum_{i < k} n_{i1}}{\sum_{i < k} n_{i\cdot}}$$

is maximized. This question does not have a "closed form" solution, but by enumerating k from 1 to 100 we can "pick out" the maximum value. Table 27 lists the cut-off scores that maximize the prediction function for several choices of $c_s$ and $c_f$.
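The enumeration over k can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `best_cutoff` is hypothetical, the counts are a made-up four-mark toy example rather than the 100 by 2 thesis table, "success" is taken to mean a mark of at least k, and cut-offs that leave one side of the split empty are skipped because the corresponding proportion is undefined.

```python
def best_cutoff(fail, success, cs, cf):
    """Return (k, P) maximizing the prediction function over cut-off marks k.

    fail[i] and success[i] are the counts of UBC failures and successes
    among students whose high school mark is i + 1.
    """
    best_k, best_p = None, float("-inf")
    for k in range(1, len(fail) + 1):
        hi_s, hi_f = sum(success[k - 1:]), sum(fail[k - 1:])  # marks >= k
        lo_s, lo_f = sum(success[:k - 1]), sum(fail[:k - 1])  # marks < k
        if hi_s + hi_f == 0 or lo_s + lo_f == 0:
            continue  # Ps or Pf is undefined for this cut-off
        p = cs * hi_s / (hi_s + hi_f) + cf * lo_f / (lo_s + lo_f)
        if p > best_p:
            best_k, best_p = k, p
    return best_k, best_p

# Hypothetical toy data: marks 1-2 always fail at UBC, marks 3-4 always succeed
toy_fail = [3, 3, 0, 0]
toy_success = [0, 0, 4, 4]
k, p = best_cutoff(toy_fail, toy_success, cs=0.6, cf=0.4)
print(k, p)
```

In this toy table the two outcomes are perfectly separated at mark 3, so the search returns that cut-off regardless of the weights chosen.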
For example, if the cut-off point of success was 89 for the provincial mark, then the prediction function with $c_s = 0.6$ and $c_f = 0.4$ would be maximized at that point. One noticeable trend is the large change in the cut-off score when $c_f$ goes from 0.5 to 0.4. This anomaly can be explained by the fact that when $c_s > c_f$ the cut-off score will tend to be high so as to maximize the probability of success. In Table 28, we repeat the above exercise but with the roles of UBC and HS (or PROV, or BLND) interchanged; i.e., we fix the definition of success for HS, PROV, or BLND to be an A or B and find the UBC cut-off score.

Table 27: School Cut-Off Scores by Cf and Cs Values

  Cf    Cs   High School  Provincial  Blended
0.50  0.50            44          46       52
0.40  0.60            99          89       86
0.30  0.70            99          98       96
0.25  0.75            99          98       96
0.20  0.80            99          98       96
0.15  0.85            99          98       96
0.10  0.90            99          98       96
0.05  0.95            99          98       96

Table 28: UBC Cut-Off Scores by Cf and Cs Values

  Cf    Cs   High School  Provincial  Blended
0.50  0.50            23          24       24
0.40  0.60            47          57       44
0.30  0.70            73          60       55
0.25  0.75            73          60       57
0.20  0.80            73          72       73
0.15  0.85            73          72       73
0.10  0.90            73          72       73
0.05  0.95            73          72       73

VI-3. Loglinear Models

We start this section by considering loglinear models (see Fienberg [1985] or Agresti [1984]) of three variables. Using the 4F program in BMDP to formulate partitioning likelihood ratio chi-squared arguments, the following loglinear models (using Agresti's notation) were chosen from a hierarchical scheme. Here the variables are UBC (U), HS (H), PROV (P), BLND (B), Group (G), and Course (C). In addition, $m_{ijk}$ represents the expected cell frequency.

$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{H} + \lambda_k^{G} + \lambda_{ij}^{UH} + \lambda_{ik}^{UG} + \lambda_{jk}^{HG}$$
$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{H} + \lambda_k^{C} + \lambda_{ij}^{UH} + \lambda_{ik}^{UC} + \lambda_{jk}^{HC}$$
$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{P} + \lambda_k^{G} + \lambda_{ij}^{UP} + \lambda_{ik}^{UG} + \lambda_{jk}^{PG}$$
$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{P} + \lambda_k^{C} + \lambda_{ij}^{UP} + \lambda_{ik}^{UC} + \lambda_{jk}^{PC}$$
$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{B} + \lambda_k^{G} + \lambda_{ij}^{UB} + \lambda_{ik}^{UG} + \lambda_{jk}^{BG}$$
$$\log m_{ijk} = \mu + \lambda_i^{U} + \lambda_j^{B} + \lambda_k^{C} + \lambda_{ij}^{UB} + \lambda_{ik}^{UC} + \lambda_{jk}^{BC}$$

For example, the first model was chosen from all possible models containing the variables UBC, HS, and Group.
Likewise, the second was chosen from all possible models containing the variables UBC, HS, and Course, and so on. Note that every model selected was the constant association model, which means that no pair of variables is conditionally independent given the third.

In fitting a loglinear model, permutations of the categories of any variable would affect neither the Pearson's chi-squared statistic nor the likelihood ratio statistic. Hence, these models do not make use of the ordinal nature of the data. In the next section, we shall look at measures of association for ordinal variables which depend on the ordering of the variable.

VI-4. Measures of Association

We can use Pearson's chi-squared test statistic to show that there is some evidence (at the 5% level) of dependence between the predictor variable (HS, PROV, or BLND) and UBC for various groups. These dependences, however, do not indicate the strength of the association. The notion of concordance and discordance for categorical data is similar to positive and negative correlation between continuous variables. When we have a two-way contingency table with ordinal-ordinal variables, a pair of observations chosen from the table is concordant if the observation that is higher on one variable is also higher on the other variable, and discordant if it is lower on the other. Measures of association are usually functions of the difference between the number of concordant and discordant pairs.

Kendall's tau-b measure of association is quite popular and is given preference over Goodman and Kruskal's gamma (another popular measure) because it takes tied pairs on the X and Y variables into account. Also, gamma tends to be unstable and can take a value of 1 in cases that do not represent complete dependence; e.g., all 0 counts except the diagonal and a 1 in an adjacent off-diagonal cell. For more information about these and other measures of association, see Agresti [1984].

The range for tau-b is between -1 and +1, with a value of 1 meaning total positive association.
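To make the contrast between gamma and tau-b concrete, here is a small Python sketch that counts concordant and discordant pairs directly. The function names are illustrative, and the 3 by 3 table is a hypothetical instance of the case mentioned above (counts on the diagonal plus a single 1 in an adjacent off-diagonal cell); it is not thesis data.

```python
import math

def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordinal two-way table."""
    r, c = len(table), len(table[0])
    C = D = 0
    for i in range(r):
        for j in range(c):
            for k in range(i + 1, r):      # second observation is higher on X
                for l in range(c):
                    if l > j:              # and higher on Y: concordant
                        C += table[i][j] * table[k][l]
                    elif l < j:            # but lower on Y: discordant
                        D += table[i][j] * table[k][l]
    return C, D

def gamma(table):
    """Goodman and Kruskal's gamma: ignores tied pairs entirely."""
    C, D = concordant_discordant(table)
    return (C - D) / (C + D)

def tau_b(table):
    """Kendall's tau-b: corrects the denominator for pairs tied on X or on Y."""
    C, D = concordant_discordant(table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    pairs = n * (n - 1) / 2
    tied_x = sum(t * (t - 1) / 2 for t in rows)
    tied_y = sum(t * (t - 1) / 2 for t in cols)
    return (C - D) / math.sqrt((pairs - tied_x) * (pairs - tied_y))

# Hypothetical table: diagonal counts plus a 1 in an adjacent off-diagonal cell
example = [[10, 1, 0],
           [0, 10, 0],
           [0, 0, 10]]
print(gamma(example))   # 1.0
print(tau_b(example))   # 0.96875
```

Here there are 310 concordant pairs and no discordant pairs, so gamma is exactly 1 even though the table is not perfectly dependent, while tau-b stays below 1 because the off-diagonal observation creates tied pairs.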
This statistic is calculated in Tables 19, 20, and 22 by course, group, and large school. With the exception of Math 120, the blended grade has the highest association with the UBC standing. Also note that the "degree to which BLND is better" than the other two predictor variables is more pronounced (e.g., HS = 0.489, PROV = 0.536, BLND = 0.571 for Math 100) using the tau-b statistic than when using the sample correlation statistic in our initial analysis. Looking at tau-b by group, we see there is no real winner among the predictors, as HS is the best for Private Catholic schools, PROV is the best for Vancouver and Private Non-Catholic schools, and BLND is the best for the GVRD and Rest of B.C. regions.

Another measure of association is Somers' d(Y|X), which is an asymmetric version of tau-b. This is intended for use when Y is a response variable (UBC in our case). Under the label "Somers" in Tables 19, 20, and 22, we see that the blended grade is the best predictor in every category (course or group) except for Math 120 (where HS is the best predictor) and the Private schools.

Another asymmetric measure of association is Lambda(Y|X), which is interpreted as the probable improvement in predicting Y given that one has knowledge of X. The square root of lambda behaves somewhat like a correlation coefficient, so that it is more comparable with tau-b. Overall, the blended grade is the best predictor using this measure, but by individual courses or groups, the provincial grade fares better!

VI-5. Uniform Association Model

In Section 3, we mentioned the fact that the loglinear models did not take advantage of the natural ordering of the grade levels. This section will look at the uniform association model, which is a simple loglinear model that uses predetermined monotone scores. More specifically, the model is given by

$$\log m_{ij} = \mu + \lambda_i^{X} + \lambda_j^{Y} + \beta (u_i - \bar{u})(v_j - \bar{v}),$$

where $u_1 < u_2 < \cdots < u_r$ and $v_1 < v_2 < \cdots < v_c$ are known scores assigned to X and Y respectively. Also, $\sum_i \lambda_i^{X} = \sum_j \lambda_j^{Y} = 0$.
This model has only one more parameter (namely $\beta$) than the complete independence model

$$\log m_{ij} = \mu + \lambda_i^{X} + \lambda_j^{Y}.$$

The $\beta$ parameter measures association between the X and Y variables, with $\beta = 0$ implying independence and $\beta > 0$ meaning positive association. The magnitude of $\beta$ can be interpreted as the log odds ratio per unit distance of the scores (see Agresti [1984]).

Returning to our data, we use the normal scores from the last chapter, which are both monotone and highly correlated. For the UBC scores, we will use the average of the 3 UBC normal scores resulting from the 3 predictors; see "Average" scores in Table 21. Then we use GLIM to fit the data using the uniform association model. The values of beta for HS by UBC, PROV by UBC, and BLND by UBC by course and by group are presented under the label "Beta" in Tables 19 and 20. Overall and in all courses except Math 120, the blended grade is the better predictor; however, only the GVRD shows that the blended grade is the best predictor.

VI-6. The Logit and GSK Models

Recall that in Chapter IV we used simple and multiple regression techniques to "predict" the UBC marks from the predictor variable (HS, PROV, or BLND). The UBC mark was the response variable while the high school (or provincial, or blended) mark acted as the independent variable. This dependent/independent relationship seems natural; however, in the loglinear models we have seen thus far in this chapter, we never made that distinction between the UBC grade and the high school (or provincial, or blended) grade. We shall now make that differentiation by analyzing our categorical data using linear models due to Grizzle, Starmer, and Koch [1969], which are better known as the GSK models.
The GSK models analyze data that can be represented by a two-way contingency table in which the rows represent populations formed from one or more independent variables and the columns correspond to observed responses on one or more dependent variables. The cell frequencies in the table are assumed to have come from a product multinomial distribution. Let $P_{ij}$ be the true probability of the $j$th response in the $i$th population, which is estimated by the sample proportion $p_{ij}$. A vector of functions $F(p)$ is formed, and the same functions of the true probabilities, $F(P)$, are assumed to follow the linear model $F(P) = XB$, where $X$ is a design matrix and $B$ is a vector of parameters. Using asymptotic statistical theory, Grizzle et al. [1969] give several chi-square test statistics for testing the goodness of fit of the model and for testing the significance of other sources of variation.

For our data, we will use the CATMOD (Categorical data Modeling) procedure in SAS to find the best predictor of the response variable UBC and to test several hypotheses using the CONTRAST statement of that procedure. We will analyze our data using two simple response functions. The first is the logit response function based on the probability of success over the probability of failure; i.e., for each population, the response function is $\log(P_s/P_f)$, where $P_s$ is the proportion of students who were successful (I or II Class standing at UBC) and $P_f = 1 - P_s$ is the proportion who were not successful. The second is the mean response function, which is equal to the sum of the sample proportions weighted by the average scores for all students; i.e., the response function for the $i$th population is $s_1 p_{i1} + s_2 p_{i2} + \dots + s_5 p_{i5}$, where the column index 1 to 5 stands for I Class to F and $s_1$ to $s_5$ are the average scores of Table 21.
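The two response functions just described can be sketched in a few lines of Python. This is only an illustration of the definitions, not the CATMOD computation itself: the function names, the cell counts, and the scores below are all hypothetical placeholders (in particular, the scores are not the "average scores" of Table 21).

```python
import math

def proportions(counts):
    """Turn the cell counts of one population (one table row) into sample proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("population has no observations")
    return [c / total for c in counts]

def logit_response(props, n_success_classes=2):
    """Logit response log(Ps / Pf), where the first classes (I and II) count as success."""
    p_s = sum(props[:n_success_classes])
    p_f = 1.0 - p_s
    if p_s <= 0.0 or p_f <= 0.0:
        raise ValueError("logit is undefined when success or failure is empty")
    return math.log(p_s / p_f)

def mean_response(props, scores):
    """Mean response s_1*p_i1 + ... + s_5*p_i5 for one population."""
    if len(props) != len(scores):
        raise ValueError("need one score per response class")
    return sum(s * p for s, p in zip(scores, props))

# Hypothetical counts for one population, ordered from I Class down to F.
counts = [10, 20, 30, 25, 15]
# Hypothetical monotone scores; NOT the values from Table 21.
scores = [1.5, 0.5, 0.0, -0.5, -1.5]

props = proportions(counts)
print("logit response:", round(logit_response(props), 4))
print("mean response:", round(mean_response(props, scores), 4))
```

The error raised for an empty success or failure group mirrors the practical difficulty with zero cells: the logit cannot be formed for a population in which everyone falls on one side.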
The CATMOD procedure is very sensitive to zero cell counts and requires a lot of trial and error in aggregating classes together. The following grouping scheme was used for the logit model. The UBC grade was divided into success (I, II) and failure, while the high school (or provincial, or blended) grade was also divided into success (A, B) and failure. A third variable CAT (categories) was introduced and partitioned into the following 4 categories.

- Category 1: All students who wrote the Euclid Math Contest and had enrolled in Math 100, Math 120, or Math 153.
- Category 2: All students who enrolled in the above courses and who did not write the Euclid exam.
- Category 3: All students who wrote the Euclid exam and registered in Math 140.
- Category 4: All students who enrolled in Math 140 but who did not write the Euclid exam.

The grouping scheme for the mean response model was similar except that there were 5 classes for the UBC grade (I Class to F).

The saturated linear model (i.e., HS and CAT main effects and the HS*CAT interaction term) was first used, and the interaction term was found to be quite nonsignificant (p-value of 0.31) while the main effects were highly significant. Hence, the linear model with main effects was chosen. Similarly, the best models using the provincial grade and the blended grade were also the linear models with main effects only. The corresponding p-values for the residual chi-square goodness of fit test were 0.31, 0.44, and 0.30 for the models involving the high school grade, the provincial grade, and the blended grade respectively. For a fixed number of degrees of freedom, a larger p-value means a smaller residual chi-square value, which in turn implies a better fit. Hence, the provincial grade was better in this case!
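The grouping scheme above can be written down as a small Python sketch, which also shows why aggregation matters: once students are cross-classified by CAT, predictor outcome, and UBC outcome, any cell with no students is a potential problem for the fit. The function names, the record layout, and the grade codes other than A, B, I, and II are hypothetical, and the four records are made up for illustration.

```python
from collections import Counter
from itertools import product

def dichotomize_ubc(standing):
    """UBC success is I or II Class standing; everything else is failure."""
    return "success" if standing in {"I", "II"} else "failure"

def dichotomize_predictor(letter):
    """Predictor success is an A or B letter grade; everything else is failure."""
    return "success" if letter in {"A", "B"} else "failure"

def course_category(course, wrote_euclid):
    """Map (course, Euclid participation) to the CAT categories 1 to 4."""
    if course in {"Math 100", "Math 120", "Math 153"}:
        return 1 if wrote_euclid else 2
    if course == "Math 140":
        return 3 if wrote_euclid else 4
    raise ValueError(f"unexpected course: {course}")

def empty_cells(records):
    """List the (CAT, predictor, UBC) cells that contain no students."""
    counts = Counter(
        (course_category(course, euclid),
         dichotomize_predictor(letter),
         dichotomize_ubc(standing))
        for course, euclid, letter, standing in records
    )
    levels = ("success", "failure")
    return [cell for cell in product((1, 2, 3, 4), levels, levels)
            if counts[cell] == 0]

# Made-up records: (course, wrote Euclid?, predictor letter grade, UBC standing).
records = [
    ("Math 100", True, "A", "I"),
    ("Math 120", False, "B", "P"),
    ("Math 140", True, "C", "F"),
    ("Math 140", False, "A", "II"),
]
print("empty cells:", len(empty_cells(records)), "of 16")
```

With real data one would merge categories until this list is empty (or nearly so) before fitting.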
Now from the linear model using the provincial grade, we conclude from an analysis of contrasts that students who wrote the Euclid Math Contest and students who did not perform significantly differently at UBC, both overall and by course (recall that Math 100, Math 120, and Math 153 students were grouped together). All the above results were similar when we used the mean response function.

Next we consider the different groups of students, with our CAT variable defined as follows.

- Category 1: Vancouver students who wrote the Euclid Math Contest.
- Category 2: Vancouver students who did not write the Euclid Math Contest.
- Category 3: GVRD students who wrote the Euclid exam.
- Category 4: GVRD students who did not write the Euclid exam.
- Category 5: Rest of B.C. schools.
- Category 6: Private school students who wrote the Euclid exam.
- Category 7: Private school students who did not write the Euclid exam.

Upon applying our logit model, we once again discover that the interaction term (predictor*CAT) is nonsignificant in each of the 3 predictor variable (HS, PROV, BLND) models. The resulting p-values were 0.44, 0.41, and 0.82 for the models involving HS, PROV, and BLND respectively. Hence, this time the blended grade is the best among the three predictors. The resulting analysis of contrasts using the blended model reveals that students who wrote the Euclid exam perform significantly differently from the students who did not write the Euclid exam in the Vancouver, the GVRD, and the Private schools groups. The same conclusions were reached when we ran the mean response model.

We now focus our attention on the Conjecture and use a logit model to test its validity. First we compare B high school students who wrote the Euclid Math Contest with A high school students who did not write the Euclid exam; here the provincial grade becomes the dependent variable. The provincial grade is classified as success (A or B) or failure.
The CAT variable, which is the only independent variable, is classified as follows.

- Category 1: A students who wrote the Euclid exam.
- Category 2: A students who did not write the Euclid exam.
- Category 3: B students who wrote the Euclid exam.
- Category 4: B students who did not write the Euclid exam.

Now applying the logit model with only one main effect (CAT), we conclude from the analysis of contrast table that categories 2 and 3 are not statistically different (p-value of 0.36). Likewise, we run the logit model using the UBC success (I or II Class) or failure classes. This time, there is slight evidence (p-value of 0.08) that the 2 groups may perform differently at UBC.

VI-7. Summary

In this chapter, we started analyzing our data using categorical methods. The prediction function was used as a tool to weight groups (schools) based on their "predictive ability." The constant association model which was selected in Section 3 did not make use of the natural ordering of the letter grades. Next we considered measures of association for ordinal-ordinal variables. Then we used the monotone scores from the last chapter to fit the uniform association model. Overall, the blended grade was the better predictor of the UBC standing based on the above measures. In Section 6, we treated the UBC grade as a response variable in the logit model and the mean response model. We saw that the provincial grade was the best predictor when categories were by course and by participation in the Euclid exam. The blended grade was the best predictor when we had categories by group and participation in the Euclid exam. Finally, we concluded that overall, B high school students who wrote the Euclid exam performed on the provincial exam like A high school students who did not write the Euclid exam.

CHAPTER VII
RESULTS AND CONCLUSIONS

In this final chapter, a review of the analysis and results pertaining to the first 6 chapters is given.
In addition, qualified conclusions will be made.

In Chapter I, we outlined the questions of concern; the primary one was determining the best predictor of success in first year calculus at UBC. The candidates were the high school grade, the provincial grade, and the blended grade, which is just the average of the high school grade and the provincial grade.

Chapter II describes the data in detail; we only considered those first-time students who had graduated from a B.C. high school in 1985 and then proceeded directly to UBC, registering in a first year calculus course in the 1985 fall term. In addition, we made an adjustment to the Math 153 (for Engineering students) marks so that they would be more comparable with the marks from other math courses. Frequency distribution plots of the marks were also displayed in Chapter II.

In Chapter III, an initial analysis using simple descriptive statistics was conducted; those students who had withdrawn before the completion of the course were excluded from this analysis. By course, the Math 120 (honours) students had the highest UBC sample mean of 59.6; the Math 153 (Engineering) students were second with an average of 55.3; students in Math 100 (basic course) were next with a mean of 50.5; and the Commerce students taking Math 140 were last with a 47.0 average. The overall UBC sample mean was 50.1. This ordering (Math 120 students on top to Math 140 students on the bottom) seems intuitively correct; however, it could have occurred by chance alone. The corresponding standard deviations are fairly small (14.7 for Math 100 is the largest among the math courses); hence, using them to construct approximate 95% confidence intervals for the means, we conclude that both Math 120 and Math 153 students are "better" than Math 100 and Math 140 students. Furthermore, Math 100 students are better than Math 140 students. The overall UBC mean is "captured" by the approximate 95% confidence interval (49.4, 50.8).
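One plausible reading of the comparison above is that two course means are declared different when their approximate 95% confidence intervals do not overlap. The sketch below illustrates that reading under stated assumptions: the means 50.5 and 59.6 and the standard deviation 14.7 come from the text, but the sample sizes and the Math 120 standard deviation are made-up placeholders, and the function names are hypothetical.

```python
import math

def mean_confidence_interval(mean, sd, n, z_crit=1.96):
    """Approximate 95% interval: mean +/- 1.96 * sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    half_width = z_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Means from the text; n values and the Math 120 sd are hypothetical.
math100 = mean_confidence_interval(mean=50.5, sd=14.7, n=900)
math120 = mean_confidence_interval(mean=59.6, sd=12.0, n=100)

print("Math 100 interval:", tuple(round(x, 2) for x in math100))
print("Math 120 interval:", tuple(round(x, 2) for x in math120))
print("overlap:", intervals_overlap(math100, math120))
```

Comparing intervals this way is conservative: two intervals can overlap slightly even when a direct test of the difference in means would be significant.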
Similarly, we conclude that students from Vancouver schools perform better at UBC than students from the GVRD schools, the Rest of B.C. schools, and the Private Catholic schools.

Next we address the question of determining which of the three pre-university marks (the high school, the provincial, or the blended) best predicts the UBC mark. The sample correlation coefficients stemming from different predictors, courses, and groups were significantly different from 0. Over all students, the blended mark was the most correlated with the UBC mark (0.68), while the provincial mark was second at 0.64 and the high school mark was last at 0.60. Testing the equality of the correlation coefficients from two populations requires two independent random samples. Since the independence assumption is unlikely to hold between the predictors (HS, PROV, or BLND), we are not justified in comparing two sample correlations directly to find out which one is significantly larger. However, we can form individual (approximate) 95% confidence intervals for each of the 3 sample correlations; here we assume that each sample correlation has been computed from N independent pairs of observations. We will use Fisher's z transformation, which has a variance-stabilizing property (see Morrison [1976]). More specifically, we transform the sample correlation r by z = atanh(r), where atanh is the inverse hyperbolic tangent function. Fisher [1921] showed that this transformation has an asymptotically normal distribution with mean approximately atanh(ρ) and variance approximately 1/(N-3). Each interval is then obtained by back-transforming the limits z ± 1.96/sqrt(N-3) with tanh. Upon forming the approximate 95% confidence intervals for each sample correlation, we cannot conclude that there is a "best" predictor for any of the math courses, groups, or overall. However, we can make the following conclusions.

- The blended grade is better than the high school grade in Math 100.
- The blended grade is better than the high school grade overall.
- The provincial grade and the blended grade are better than the high school grade for Vancouver schools.

Note that this method of comparing correlation coefficients does not have much power in detecting differences, but it is simpler mathematically than finding the variance of the difference of two coefficients. Similarly, there is not enough evidence to conclude a best predictor for each of the 51 large schools; one can only use the largest sample correlation as a guide. Recall that among the 51 schools, the blended mark had the highest sample correlation with the UBC mark in 47% of them, while the high school mark and the provincial mark took 49% and 16% respectively; since ties are possible, these percentages sum to over 100%. In the overall total of all large school districts, we can conclude that the blended mark is more correlated with the UBC mark than the high school mark is.

In Chapter III, we also looked at the Conjecture using simple sample means and saw that, overall, the high school B students who participated in the Euclid exam did perform on the provincial exam like high school A students who did not write the Euclid exam (both about 85% average).

Data analysis using continuous methods was the topic of Chapter IV. Again, those students who dropped out were excluded from the analysis. Using linear regression, we concluded that the blended mark explained the most variation in Math 100 ($R^2 = 0.51$), Math 140 ($R^2 = 0.38$), Math 153 ($R^2 = 0.44$), and overall ($R^2 = 0.47$). This was also true in every group with the exception of the Private schools. The math courses were found to be significantly different from each other under the Student-Newman-Keuls multiple range test at the 5% level. In addition, students from Private Catholic schools and Rest of B.C. schools are significantly different from those students in the Vancouver and GVRD regions.
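As an aside on the Fisher z intervals used for the Chapter III correlations, the calculation can be sketched as follows. The correlations 0.60, 0.64, and 0.68 are the overall values quoted in this chapter, but the sample size N = 1000 is a hypothetical placeholder (the actual N is not restated here), and the function name is an assumption of this sketch.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation via z = atanh(r), var = 1/(n - 3)."""
    if n <= 3:
        raise ValueError("need more than 3 pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("sample correlation must lie strictly between -1 and 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    # Back-transform the limits from the z scale to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)

N = 1000  # hypothetical sample size, for illustration only
for label, r in (("HS", 0.60), ("PROV", 0.64), ("BLND", 0.68)):
    lower, upper = fisher_interval(r, N)
    print(f"{label}: r = {r:.2f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because the three correlations are computed on the same students, non-overlap of these individual intervals is a cautious criterion rather than a formal test of the difference, which matches the remark above about low power.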
With the predictor variable (HS, PROV, or BLND) defined as a covariate, we used an ANOCOVA model in which the 51 large schools were nested within groups to show that the blended mark was the best predictor in explaining the total variation, and that there was significant variability of school means within geographic regions after the effects of the covariate (Algebra 12 marks) were removed. Finally, we used an application of discriminant analysis to show some validity to the Conjecture.

In Chapter V, we took the categorical approach in analyzing the data. We first decided to aggregate the withdrawn students with those who failed at UBC. Optimal scores, which yield the maximum correlation, were the topic of Section 2. Applying this technique to our data, we found that for every course except Math 153 (where HS was the best), the blended grade generated the maximum correlation with the UBC grade. The blended grade also induced the largest sample correlation with the UBC grade in the GVRD and Private schools, while the provincial grade provided the maximum values in the Vancouver and the Rest of B.C. schools. It was discovered that the optimal scores did not necessarily preserve ordering.

In Section 3, we generated normal scores, which maintain monotonicity. We observed that the sample correlations resulting from the normal scores were fairly close to the maximum correlations. The normal scores suggest that the blended grade is the best predictor overall and for every course except Math 120 (where the high school grade was best). Also, the blended grade was the best predictor in the GVRD and the Rest of B.C. schools; the provincial grade was the best in Vancouver and the Private Non-Catholic schools, while the high school grade was the best predictor for the Private Catholic schools.
Finally, in Chapter VI, we looked at the prediction of success, which is the conditional probability of being successful at UBC given that the student was successful in his or her pre-university math grade. We saw that the provincial grade had the highest conditional probability of success in every category except Math 140 (where the blended grade was the highest), students from the GVRD (where the high school grade was the best), and the Private Catholic schools (where the blended grade was the best). Overall, the provincial grade provided the largest conditional probability of success at UBC.

In Section 4, we looked at several measures of association for ordinal-ordinal variables, namely Kendall's tau-b, Somers' d Y|X (asymmetric measure), and Lambda Y|X (asymmetric measure). The blended grade had the highest association with the UBC grade overall and in every course except Math 120 when we used the Kendall's tau-b and Somers' d Y|X measures. The two measures also favoured the blended grade for the GVRD and the Rest of B.C. regions. Using the delta method (see Agresti [1984]), asymptotic standard errors can be calculated for these measures, and from normal theory (Agresti [1984]), approximate 95% confidence intervals can be constructed. Once again, we do not have enough evidence to conclude a "best" predictor; however, we can make the following statements based on the approximate 95% confidence intervals.

- The blended grade is better than the high school grade for Math 100.
- The blended grade is better than the high school grade overall.
- The provincial grade is better than the high school grade for Vancouver schools.

In Section 5, we took the normal scores from Chapter V and applied the uniform association model; note that the normal scores for the UBC grade were the average of the scores resulting from the 3 predictors. Again from theory (Agresti [1984]), we can construct approximate 95% confidence intervals and formulate some conclusions.
The resulting conclusions are identical to those made using the measures of association.

In Section 6, we used the logit model and the mean response model to conclude that the provincial grade was the best predictor (based on the residual chi-square goodness of fit test) when the categories were by course. However, using the category breakdown by groups resulted in the blended grade being the best predictor. It was also seen that students who wrote the Euclid exam performed significantly differently from those students who did not write the Euclid exam. Finally, using a logit model, we examined the Conjecture and concluded that B students who wrote the Euclid exam performed like A students who did not write the Euclid exam on the provincial exam. In addition, there was slight evidence that the 2 groups may perform differently at UBC.

In conclusion, although most of the test statistics used could not be used to formulate significant conclusions in finding the best predictor, we have seen that overall the blended grade was the highest in many situations and therefore conclude that it is the best predictor! As for finding the best predictor by course, by group, and by the 51 large schools, one should refer to the various tables.

BIBLIOGRAPHY

1. Agresti, A. (1984), Analysis of Ordinal Categorical Data, John Wiley & Sons.
2. Akemann, C.A., Bruckner, A.M., Robertson, J.B., Simons, S., and Weiss, M.L. (1983), "Conditional Correlation Phenomena With Applications to University Admission Strategies," Journal of Educational Statistics, Vol. 8, 5-44.
3. Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press.
4. Anderson, T.W. (1958), An Introduction to Multivariate Statistical Analysis, New York: John Wiley & Sons.
5. Dixon, W.D., Brown, M.B., Engleman, L., Frane, J.W, Hill, M.A., Jennrich, R.I., and Toporek, J.D., editors (1981), BMDP Statistical Software 1981, Los Angeles: University of California Press.
6. Fienberg, S.E. (1985), The Analysis of Cross-Classified Categorical Data, Second Edition, Cambridge, Massachusetts: The MIT Press.
7. Freedman, Pisani, R., and Purves, R. (1978), Statistics, New York: W.W. Norton.
8. Gilula, Z. (1986), "Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach," Journal of the American Statistical Association, Vol. 81, No. 395.
9. Grizzle, J.E., Starmer, C.F., and Koch, G.G. (1969), "Analysis of Categorical Data by Linear Models," Biometrics, Vol. 33, 133-158.
10. Kendall, M.G. and Stuart, A. (1977), The Advanced Theory of Statistics, Vol. 2, Fourth Edition, London: Charles Griffin and Company, Ltd.
11. Kimeldorf, G., May, J.H., and Sampson, A.R. (1982), Studies in the Management Sciences, Vol. 19, Optimization in Statistics, S.H. Zanakis and J.S. Rustagi, eds., North Holland, 117-130.
12. Knapp, T.R. (1977), "The Unit-of-Analysis Problem in Applications of Simple Correlation Analysis to Educational Research," Journal of Educational Statistics, Vol. 2, No. 3, 171-186.
13. McKeeman, C.A. (1978), A Statistical Analysis of Math 100 Grades Related to B.C. High School Factors, Master's Thesis, The Institute of Applied Mathematics and Statistics, The University of British Columbia.
14. Morrison, D.F. (1976), Multivariate Statistical Methods, New York: McGraw-Hill.
15. Robinson, W.S. (1950), "Ecological Correlations and the Behavior of Individuals," American Sociological Review, Vol. 15, 351-357.
16. SAS Institute Inc. (1985), SAS User's Guide: Statistics, Version 5 Edition, Cary, NC: SAS Institute Inc.
Item Metadata
Title | A statistical analysis of finding the best predictor of success in first year calculus at the University of British Columbia |
Creator |
Lee, Robert Eugene |
Publisher | University of British Columbia |
Date Issued | 1987 |
Description | In this thesis we focus on high school students who graduated from a B.C. high school in 1985 and then proceeded directly to the University of British Columbia (UBC) and registering in a first year calculus course in the 1985 fall term. From this data, we want to determine the best predictor of success (the high school assigned grade for Algebra 12, or the provincial grade for Algebra 12, or the average of the high school and the provincial grade for Algebra 12) in first year calculus at UBC. We first analyze the data using simple descriptive statistics and continuous methods such as regression and analysis of variance techniques. In subsequent chapters, the categorical approach is taken and we use scaling techniques as well as loglinear models. Finally, we summarize our analysis and give conclusions in the final chapter. |
Subject |
Education, Secondary -- British Columbia College students -- College students -- British Columbia Mathematics -- Study and teaching -- British Columbia |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-07-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0096973 |
URI | http://hdl.handle.net/2429/26430 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |