CLASSIFICATION CAPABILITIES OF NEURAL NETWORKS: A COMPARATIVE STUDY USING STUDENT ACADEMIC PERFORMANCE

by

SANSERN ART PROMPIBALCHEEP

B.B.A., Thammasat University, 1988
M.B.A., University of Colorado at Boulder, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES

Department of Management Information Systems

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1999
© Sansern Art Prompibalcheep, 1999

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Management Information Systems
The University of British Columbia
Vancouver, Canada
Date: April 30, 1999

Abstract

Among the emerging information technologies, neural networks have been increasingly recognized as a powerful method for classifying and predicting complex data. A number of neural network paradigms have been developed, each with specific features that suit particular tasks. The most popular neural network paradigm among users in the management area is backpropagation. This paradigm has been extensively tested and shown to outperform traditional techniques on several classification tasks.
However, only a few studies have compared the capabilities of the backpropagation paradigm with those of other paradigms that are potentially applicable to the same task. The main purpose of this thesis research is to investigate the capabilities and performance of two neural network paradigms: backpropagation and learning vector quantization (LVQ). The other purpose is to show that neural networks outperform a traditional technique, the ordered probit model, which is used as a performance benchmark.

In this study, the two neural network paradigms and the ordered probit model are used to classify and predict the academic performance of UBC Commerce students. For each paradigm, a number of neural network models with distinct configurations are developed. The first investigation determines how well each paradigm performs in classifying and predicting academic success. The results from running those models on both training and validation samples show that the backpropagation paradigm performs significantly better than the LVQ paradigm in most instances. The second investigation compares the best performance of those paradigms with the performance of the ordered probit model. An analysis of variance (ANOVA) test of the difference in prediction performance shows that both the backpropagation and LVQ paradigms achieve higher performance levels than the ordered probit model. However, the difference between the performance of backpropagation and the ordered probit model is significant only at the 90% confidence level, whereas the difference between the performance of LVQ and the ordered probit model is significant at the higher level of 95%. Essentially, the study has shown that the backpropagation paradigm, on average, still outperforms the LVQ paradigm in classifying and predicting complex data.
The study has also shown that both backpropagation and LVQ are significantly better prediction techniques than the ordered probit approach.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures

Chapter One: Introduction
    A Study of Neural Networks: Importance and Justification
    The UBC Undergraduate Business Program: Its Student Recruitment Process
    General Issues Concerning the Prediction of Academic Success
    Neural Networks as the Alternative Prediction Technique
    Scope of this Study

Chapter Two: Literature Review
    Prediction of Academic Performance by Traditional Techniques
    Prediction of Academic Performance by Neural Network Approaches
    Applications of Neural Networks to Managerial and Operation Tasks

Chapter Three: Neural Networks
    Introduction to Neural Networks
    Feedforward Multilayer Perceptron
    Self-Organizing Map (SOM)
    Learning Vector Quantization (LVQ)

Chapter Four: Research Purpose, Procedures, and Methodologies
    Purposes and Objectives
    Description of Input and Output Variables
    Privacy of Data
    Data Samples and Data Collection
    Data Analysis Methodologies
    Performance Evaluation Criteria

Chapter Five: Research Results
    Descriptive Statistics and Ordered Probit Model's Parameters
    Classification and Prediction Capabilities between Backpropagation and Learning Vector Quantization
    Classification and Prediction Capabilities between Ordered Probit Model and Neural Networks

Chapter Six: Analysis and Interpretation
    Classification Power of Backpropagation and Learning Vector Quantization
    Observations from Descriptive Statistics
    Performance Comparison between Neural Networks and Ordered Probit Model

Chapter Seven: Conclusions
    Potential Contributions to Academic Circles
    Limitations and Conditions
    Deliverables to the B.Com. Program
    Suggestion for Future Research
    Concluding Remarks

Bibliography
Appendix I: Tables of Results in Detail

List of Tables

Table 4.1: The grade ranges, their corresponding categorized groups, and their corresponding grade letters
Table 4.2: Frequency and percentage of complete, incomplete, and total records, classified by specialization
Table 4.3: Descriptives of means, standard errors, and effects for MATH 100 and MATH 140
Table 4.4: Analysis of Variance for testing the significant difference between means of MATH 100 and MATH 140
Table 4.5: Descriptives of means, standard errors, and effects for MATH 101 and MATH 141
Table 4.6: Analysis of Variance for testing the significant difference between means of MATH 101 and MATH 141
Table 4.7: Descriptives of means, standard errors, and effects for ENGL 110, ENGL 111, ENGL 120, and ENGL 121
Table 4.8: Analysis of Variance for testing the significant difference among means of ENGL 110, ENGL 111, ENGL 120, and ENGL 121
Table 4.9: Total number of training records and the proportion of them within each group, separated by Math options of each specialization
Table 4.10: Total number of cross-validation records and the proportion of them within each group, separated by Math options of each specialization
Table 5.1: Means, standard deviations, and ranges of values of input and output variables for the Accounting with MATH 140 & 141 option
Table 5.2: Regression coefficients, standard errors, and z-values for the ordered probit model of Accounting with MATH 140 & 141 option
Table 5.3: Means, standard deviations, and ranges of values for input and output variables for the Accounting with MATH 100 & 101 option
Table 5.4: Regression coefficients, standard errors, and z-values for the ordered probit model of Accounting with MATH 100 & 101 option
Table 5.5: Means, standard deviations, and ranges of values for input and output variables for the Finance with MATH 140 & 141 option
Table 5.6: Regression coefficients, standard errors, and z-values for the ordered probit model of Finance with MATH 140 & 141 option
Table 5.7: Means, standard deviations, and ranges of values for input and output variables for the Finance with MATH 100 & 101 option
Table 5.8: Regression coefficients, standard errors, and z-values for the ordered probit model of Finance with MATH 100 & 101 option
Table 5.9: Means, standard deviations, and ranges of values for input and output variables for the Marketing with MATH 140 & 141 option
Table 5.10: Regression coefficients, standard errors, and z-values for the ordered probit model of Marketing with MATH 140 & 141 option
Table 5.11: Means, standard deviations, and ranges of values for input and output variables for the Marketing with MATH 100 & 101 option
Table 5.12: Regression coefficients, standard errors, and z-values for the ordered probit model of Marketing with MATH 100 & 101 option
Table 5.13: Average correct classification rates, both each group and total, of all developed neural network models
Table 5.14: Means of aggregate performance levels, measured in terms of the number of correct classified cases and the corresponding percentage, of each network paradigm within each specialization track
Table 5.15: F-Ratios and their probability levels resulting from the ANOVA test of a significant difference between performance levels of the backpropagation paradigm and those of the LVQ paradigm
Table 5.16: The correct classified cases and their correlated percentages, as of each group and of total, of the training data set among three different methods
Table 5.17: The correct classified cases and their correlated percentages, as of each group and of total, of the validation data set among three different methods
Table 5.18: Means of performance levels, in terms of the correct classification percentage, of each classification method
Table 5.19: Analysis of Variance for testing the significant difference between the mean of correct classification rates of ordered probit model and that of backpropagation
Table 5.20: Analysis of Variance for testing the significant difference between the mean of correct classification rates of ordered probit model and that of learning vector quantization (LVQ)
Table 6.1: Neural network configurations that produce the best total correct classification rates
Table 1.1: Means and standard deviations of input variables within each categorized group, for the Accounting with Math 140 & 141 option
Table 1.2: Means and standard deviations of input variables within each categorized group, for the Accounting with Math 100 & 101 option
Table 1.3: Means and standard deviations of input variables within each categorized group, for the Finance with Math 140 & 141 option
Table 1.4: Means and standard deviations of input variables within each categorized group, for the Finance with Math 100 & 101 option
Table 1.5: Means and standard deviations of input variables within each categorized group, for the Marketing with Math 140 & 141 option
Table 1.6: Means and standard deviations of input variables within each categorized group, for the Marketing with Math 100 & 101 option
Table 1.7: Classification and prediction performance of the backpropagation models on the training data set of the Accounting with Math 140 & 141 option
Table 1.8: Classification and prediction performance of the backpropagation models on the validation data set of the Accounting with Math 140 & 141 option
Table 1.9: Classification and prediction performance of the learning vector quantization models on the training data set of the Accounting with Math 140 & 141 option
Table 1.10: Classification and prediction performance of the learning vector quantization models on the validation data set of the Accounting with Math 140 & 141 option
Table 1.11: Classification and prediction performance of the backpropagation models on the training data set of the Accounting with Math 100 & 101 option
Table 1.12: Classification and prediction performance of the backpropagation models on the validation data set of the Accounting with Math 100 & 101 option
Table 1.13: Classification and prediction performance of the learning vector quantization models on the training data set of the Accounting with Math 100 & 101 option
Table 1.14: Classification and prediction performance of the learning vector quantization models on the validation data set of the Accounting with Math 100 & 101 option
Table 1.15: Classification and prediction performance of the backpropagation models on the training data set of the Finance with Math 140 & 141 option
Table 1.16: Classification and prediction performance of the backpropagation models on the validation data set of the Finance with Math 140 & 141 option
Table 1.17: Classification and prediction performance of the learning vector quantization models on the training data set of the Finance with Math 140 & 141 option
Table 1.18: Classification and prediction performance of the learning vector quantization models on the validation data set of the Finance with Math 140 & 141 option
Table 1.19: Classification and prediction performance of the backpropagation models on the training data set of the Finance with Math 100 & 101 option
Table 1.20: Classification and prediction performance of the backpropagation models on the validation data set of the Finance with Math 100 & 101 option
Table 1.21: Classification and prediction performance of the learning vector quantization models on the training data set of the Finance with Math 100 & 101 option
Table 1.22: Classification and prediction performance of the learning vector quantization models on the validation data set of the Finance with Math 100 & 101 option
Table 1.23: Classification and prediction performance of the backpropagation models on the training data set of the Marketing with Math 140 & 141 option
Table 1.24: Classification and prediction performance of the backpropagation models on the validation data set of the Marketing with Math 140 & 141 option
Table 1.25: Classification and prediction performance of the learning vector quantization models on the training data set of the Marketing with Math 140 & 141 option
Table 1.26: Classification and prediction performance of the learning vector quantization models on the validation data set of the Marketing with Math 140 & 141 option
Table 1.27: Classification and prediction performance of the backpropagation models on the training data set of the Marketing with Math 100 & 101 option
Table 1.28: Classification and prediction performance of the backpropagation models on the validation data set of the Marketing with Math 100 & 101 option
Table 1.29: Classification and prediction performance of the learning vector quantization models on the training data set of the Marketing with Math 100 & 101 option
Table 1.30: Classification and prediction performance of the learning vector quantization models on the validation data set of the Marketing with Math 100 & 101 option
Table 1.31: Analysis of Variance for the training data set of the Accounting with Math 140 & 141 option
Table 1.32: Analysis of Variance for the validation data set of the Accounting with Math 140 & 141 option
Table 1.33: Analysis of Variance for the training data set of the Accounting with Math 100 & 101 option
Table 1.34: Analysis of Variance for the validation data set of the Accounting with Math 100 & 101 option
Table 1.35: Analysis of Variance for the training data set of the Finance with Math 140 & 141 option
Table 1.36: Analysis of Variance for the validation data set of the Finance with Math 140 & 141 option
Table 1.37: Analysis of Variance for the training data set of the Finance with Math 100 & 101 option
Table 1.38: Analysis of Variance for the validation data set of the Finance with Math 100 & 101 option
Table 1.39: Analysis of Variance for the training data set of the Marketing with Math 140 & 141 option
Table 1.40: Analysis of Variance for the validation data set of the Marketing with Math 140 & 141 option
Table 1.41: Analysis of Variance for the training data set of the Marketing with Math 100 & 101 option
Table 1.42: Analysis of Variance for the validation data set of the Marketing with Math 100 & 101 option

List of Figures

Figure 1.1: Number of students applying to the B.Com. Program and number offered admission during the ten-year period (1987 to 1997)
Figure 3.1: Feedforward three-layer neural network
Figure 3.2: Single neuron with summation and transfer functions
Figure 3.3: Sigmoidal or logistic function
Figure 3.4: Self-organizing map neural network
Figure 3.5: Learning vector quantization neural network
Figure 4.1: The architecture of neural network models, showing the components within their input and output layers
Figure 6.1: Self-organizing maps of the training data set, separated by categorized groups, for the Accounting with MATH 140 & 141 option
Figure 6.2: Self-organizing maps of the training data set, separated by categorized groups, for the Accounting with MATH 100 & 101 option
Figure 6.3: Self-organizing maps of the training data set, separated by categorized groups, for the Finance with MATH 140 & 141 option
Figure 6.4: Self-organizing maps of the training data set, separated by categorized groups, for the Finance with MATH 100 & 101 option
Figure 6.5: Self-organizing maps of the training data set, separated by categorized groups, for the Marketing with MATH 140 & 141 option
Figure 6.6: Self-organizing maps of the training data set, separated by categorized groups, for the Marketing with MATH 100 & 101 option
Figure 6.7: Bar chart comparing the correct classification rates, measured in percentages, of both neural network paradigms when applying to either training or validation data set

Chapter One
Introduction

This chapter provides a general overview of the research topic. It first addresses the importance of, and the need for, a study of neural networks as a classification technique. It then discusses the issues and difficulties in predicting the academic performance of students in general, and of University of British Columbia (UBC) Commerce students in particular.
Next, it briefly explains why and how neural networks can help resolve those difficulties. The last section describes the scope of the study.

A Study of Neural Networks: Importance and Justification

One common but critical task for people in the management area is to decide between two or more possible alternatives. This task is not easy, in theory or in practice, because each alternative involves several interrelated factors that must be weighed carefully before a final decision is made. The cumulative knowledge and experience of decision-makers fundamentally influence their ability to select the most preferable alternative. In today's dynamic environments, however, decisions must also be made in a timely manner. To achieve both accuracy and timeliness, decision-makers need support from effective tools or technologies. These tools would draw on the knowledge and experience of decision-makers to process all the complicating factors and suggest the best results within a short period of time.

A number of emerging information technologies have been developed to support decision-making tasks. Among them, neural network technology has become increasingly popular with users in both the academic and business worlds. Its abilities to recognize and learn complex patterns in data, to generalize the acquired knowledge to unseen observations, to create data abstractions for classification, and to handle noise within a data set effectively have made the neural network a powerful classification and prediction technique. Several researchers have suggested that the neural network is a desirable approach in situations where the exact form of the regression equation for a given set of data is not known.
The nonlinear regression performed within a neural network algorithm is superior to the traditional regression model when the data are high-dimensional and the relationship among variables does not correspond well to the assumed equation (Venugopal & Baets, 1994; Jain & Nag, 1997; Wang, 1998). Furthermore, the performance of a neural network does not depend on assumptions about the underlying distribution of, or the covariation among, groups of data.

Neural networks were initially used mainly for analyzing data in scientific areas, such as face recognition, fingerprint identification, interpretation of sonar traces, and vehicle navigation (Gorr et al., 1994). Subsequently, they have been increasingly used to solve a wide range of problems, such as prediction, classification, clustering, and error detection and control, in the social, managerial, and behavioral science fields, as well as to replace traditional techniques. In their 1997 survey, Wong, Bodnovich, and Selvi reported that more than 200 studies of neural network applications in business, management, and other related areas were conducted between 1988 and 1995 (Wong et al., 1997). Those studies designed and developed various neural network models for solving semi-structured or unstructured problems, as well as for supporting decision-making at either the management control or the operational control level.

There are several existing neural network paradigms. Each has its own distinct features and capabilities that are applicable to particular tasks. Unfortunately, our knowledge of those paradigms and their possible applications within the management area is still in its infancy. Many aspects of applying these neural networks are not well understood and, thus, not fully utilized.
As potentially extensive users of this technology, we need to study those aspects continually and thoroughly to ensure that we make the best use of the features of neural networks.

Most research studies that apply neural network technology to classify data patterns focus on the differences between the performance of one particular paradigm, backpropagation, and that of traditional techniques, such as multiple regression, discriminant analysis, and logistic regression (Salchenberger et al., 1992; Tam & Kiang, 1992; Bansal et al., 1993; Hruschka, 1993; Jain & Nag, 1995; Lenard et al., 1995; Shanker et al., 1996; Zhang & Hu, 1998). Other neural network paradigms also have the potential to solve the problems usually handled by backpropagation. To the author's knowledge, however, only a few studies explore how well different neural network paradigms perform on the same specific task (Doumas et al., 1995; Orwig et al., 1996). The author believes that, for a particular problem, several suitable paradigms should be implemented to identify which one produces the best results. After extensive experimentation with other paradigms, we might be able to argue that backpropagation is not the only favorable neural network option for solving data classification problems.

The UBC Undergraduate Business Program: Its Student Recruitment Process

Student recruitment is one of the most important tasks for any academic institution. Effective admission criteria, procedures, and tools ensure that an institution recruits the right students with desirable qualifications. Generally, the mission of any school is to produce graduates with high levels of knowledge, skill, and competency to serve the public at large.
A significant factor in accomplishing that mission is the quality of a school's input resources: the entering students. To admit the most desirable students from the pool of applicants, schools should have hands-on knowledge of the influential indicators of academic success that can serve as the main recruitment criteria. At the same time, they should be equipped with an effective support system that enhances decision-making on admissions.

The University of British Columbia has long been recognized for its strong focus on academic excellence. The university has a goal of becoming the best higher education institution, as well as of maintaining its strong position among the leading academic institutions worldwide (Stanbury et al., 1998). All of the university's management teams have consistently pursued this goal. Dr. Martha Piper, the current president of UBC, noted that UBC is responsible for "preparing the future citizens of the world" (Piper, 1997). She suggested that the most critical concern on which large, research-intensive universities like UBC must focus in the next decade is "the purpose of the undergraduate educational experience." In her speech on September 25, 1997, Dr. Piper stressed the importance of creating a better, university-wide undergraduate learning experience.

As a well-known business school, the Faculty of Commerce and Business Administration has set the objective of being the best business school in Canada (Stanbury et al., 1998). The Faculty has gained a reputation for academic excellence in both its undergraduate and graduate programs. The undergraduate program (B.Com. Program) is the largest program in the Faculty, producing about 450 graduates annually. These graduates directly or indirectly represent the Faculty whenever they make contact with outside communities.
Thus, the public perception of the Faculty is strongly influenced by the performance and quality of these graduates. To be recognized publicly as the most outstanding business school, the Faculty should focus its efforts on the continual improvement of the B.Com. Program.

The demand from applicants, both UBC first-year and transfer students, who would like to enter the B.Com. Program has exceeded the capacity of the program. According to Figure 1.1, the average number of applicants to the program was 1,976 annually. However, the program could accept only 526 of them, or about 25% of the total applicants. Limited admissions are necessary because the academic resources and services that the program possesses cannot effectively satisfy the demand from all applicants. To use its limited resources efficiently, the program should restrict its services to individuals who show strong potential for academic success.

[Figure 1.1: bar chart by year; series: Apply, Offered]

Figure 1.1: Number of students applying to the B.Com. Program and number offered admission during the ten-year period (1987 to 1997)

Note: The figures used to create the bar chart are excerpted from Exhibit 4-4 of the Interim Report of the Faculty of Commerce Undergraduate Program Review Committee as of March 20, 1998 (Stanbury et al., 1998).

Currently, the B.Com. Program treats a student's GPA before entering the program as its major recruitment criterion. This criterion applies to 99% of all entering students (Stanbury et al., 1998). The majority of entering students have just finished their first year in Arts and Science at UBC. These students are required to take courses in English, Economics, and Mathematics during their first year. Grades in these courses are also considered, together with the first-year GPA, in the admissions decision. In the Winter Session of 1997, the cutoff GPA was 68%. The same consideration applies to second- and third-year transfer students from other colleges.

This criterion reflects the belief that past academic performance is an indicator of future academic success. Several studies have found that a high school or first-year college GPA is a good predictor of the ultimate performance, i.e., the graduating GPA, of students at the university level (Domer & Johnson, Jr., 1982; Shaughnessy & Evans, 1986; Gramet & Terracina, 1988). In addition to the first-year GPA, the B.Com. Program also believes that performance in the relevant prerequisite courses (English, Economics, and Mathematics) should reflect the potential future performance of students within the Commerce and Business Administration disciplines. Proficiency in the content of those courses is necessary for students to be academically successful in the B.Com. Program.

However, the undergraduate program committee is still not satisfied with the existing admissions process. Although the committee believes that the above factors can be used to predict academic success, their predictive power is quite inconsistent. Some students who did not perform well before entering the program turned out to be very successful academically in the program. Conversely, some students whose academic performance in the first year was outstanding could not perform successfully after they entered the program. One possible reason for these two outcomes is that the most appropriate recruitment formula has not yet been found. Nobody can tell exactly what overall GPA, what grades in those individual subjects, or what else would be required of an individual student to ensure that he or she will perform successfully in the program.

The other concern of the B.Com. Program is the belief that the Faculty and the entire university are not able to attract top-notch students from within and outside the province of British Columbia (Stanbury et al., 1998). These students are identified as those who have excellent high school and/or college GPAs, can communicate effectively, or possess particular attributes "considered valuable in a classroom learning environment." The Faculty has worked on several alternatives in an attempt to attract more outstanding high school students to apply for admission to UBC. It is hoped that most of these students, if not all, will ultimately choose to enter the B.Com. Program in the second year. However, once the Faculty has a desirable pool of potential candidates, it needs the right recruitment criteria, as well as supporting tools, to help the admission committee screen all applicants and recruit those who show potential for success in the program. These criteria and tools would ensure that the right students enter the program and would help reduce the problem of enrolled students not performing as well as expected.

General Issues Concerning the Prediction of Academic Success

Researchers and management in business education, as well as in other disciplines, are still looking for the best solution(s) to the task of predicting academic success. A number of studies have sought to identify the factors that significantly influence the ultimate academic performance of students at the college level. Most of these studies focused on particular professional disciplines, such as Medical Science, Engineering, Architecture, Computer Science, and Business Administration, which face great demands in various respects from the public at large (Domer & Johnson, Jr., 1982; Campbell & McCabe, 1984; Eskew & Faley, 1988; Young, 1989).
It is hoped that knowing the factors that strongly affect academic performance will lead to ways of improving recruitment efforts, teaching and learning processes, and the quality of graduates and academic programs.

However, the results of those studies were somewhat conflicting and unsatisfactory. There is no general agreement on which factor or set of factors correctly predicts academic performance. Moreover, no one can tell which methods or approaches are the most suitable for the academic success prediction task. Particular factors were reported to be significant predictors in some studies but regarded as insignificant in others. Most of the explanatory variables included in the developed prediction models, such as multiple regression equations, did not contribute substantially to the variation of the predicted results. On average, these independent variables explain only 50% or less of the variation in the dependent variable, academic performance (Alspaugh, 1972; Konvalina et al., 1983; Ho & Spinks, 1985; Gramet & Terracina, 1988; Eskew & Faley, 1988; Young, 1994). This implies that some explanatory variables remain unidentified that could explain the variation better than those already identified.

The other probable reason for the unsatisfactory results is that the selected techniques are not appropriate for predicting the patterns in a given set of data. For example, a linear regression analysis might generate an incorrect regression function from a given set of data with non-linear relationships. In this case, researchers would need a priori knowledge of the appropriate regression equation that the data represent. Identifying the correct data relationship and choosing the correct equation are difficult, if not impossible.
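The point about fitting a linear function to non-linear data can be illustrated with a small synthetic example. The sketch below is not part of this study's analysis: the data, the quadratic relationship, and the helper names `fit_line` and `r_squared` are assumptions chosen purely for illustration. It fits a straight line by ordinary least squares to points generated from y = x squared and reports the proportion of variation explained, then repeats the fit after the correct functional form has been supplied.

```python
# Illustrative sketch only: synthetic data, hypothetical helper names.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(ys, preds):
    """Proportion of the variation in ys explained by preds."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1.0 - ss_resid / ss_total

# Assumed non-linear relationship: y = x^2 on a symmetric range of x.
xs = [k / 10 for k in range(-20, 21)]
ys = [x * x for x in xs]

# A straight line in x explains almost none of the variation.
slope, intercept = fit_line(xs, ys)
linear_r2 = r_squared(ys, [slope * x + intercept for x in xs])

# With a priori knowledge of the form (regress on x^2), the fit is exact.
zs = [x * x for x in xs]
slope_q, intercept_q = fit_line(zs, ys)
quadratic_r2 = r_squared(ys, [slope_q * z + intercept_q for z in zs])

print(f"R^2 with linear term only: {linear_r2:.3f}")
print(f"R^2 with correct form:     {quadratic_r2:.3f}")
```

In this constructed case the straight line explains essentially none of the variation, while the correctly specified equation explains all of it, which is why the choice of equation matters so much when the true relationship is unknown.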
Moreover, within a set of data observations, researchers might have to worry about multicollinearity, which damages the predictive power of regression equations due to a high degree of error variance (Tracey et al., 1983). In the case of discriminant analysis, researchers have to assume that observations within each categorized group are normally distributed, and that the covariance matrices of the variables in each group are nearly identical. Otherwise, the classification power of the discriminant function will be drastically reduced. However, data pertaining to demographic and behavioral characteristics usually violate these a priori conditions. These methodological problems could be to blame for the unsatisfactory results.

Neural Networks as the Alternative Prediction Technique

Neural network approaches have been selected to predict the academic success of Commerce students for two reasons. First, there might be numerous and diversified patterns within this data domain. Some data might contain significant noise, which obscures the correct pattern. Essentially, the prediction of academic performance data "requires potentially complex and subtle modeling", which should be able to effectively handle those data subtleties (Gorr et al., 1994). Neural networks seem to possess features that meet this requirement. Second, some prior studies were not very successful in applying neural networks to predict the academic success of students. In spite of this unfavorable performance, however, researchers in this area remain positive about the predictive capability of neural networks, thanks to their impressive performance in other areas. Further investigation of neural network paradigms other than backpropagation is, therefore, necessary and justifiable.
Scope of this Study

This research is aimed at classifying and predicting the performance of UBC Commerce students, using neural network technology. Currently, the B.Com. Program is not satisfied with its recruitment procedures, since the procedures cannot ensure that accepted students will perform successfully in the program. The B.Com. Program needs a decision support system that can enhance the prediction of the potential academic performance of individual students. Accurately predicted results should help the B.Com. Program committees make a better decision as to whether to accept or reject a particular student. This study is conducted with the majority group of entering students, those who finish their first-year studies at UBC. In addition, the subjects of this study are only the students who started their studies when or after the new four-year curriculum was first implemented. This study intends to cover most specializations offered in the B.Com. Program. Three specializations, Business Economics, General Management, and International Business, however, are not included, since their requirements are entirely different from the others and they have relatively few students.

The author is mainly interested in determining which neural network paradigm is the most effective in predicting the academic performance of UBC Commerce students. Two different neural network paradigms are adopted to perform this prediction task. The first paradigm is a feedforward multilayer perceptron with the backpropagation (BP) algorithm. The second is learning vector quantization (LVQ), a supervised learning version of the self-organizing map, with the competitive learning algorithm. The author would also like to determine the capabilities and performance of neural networks when compared to those of a comparable traditional technique.
Within this study, the ordered probit model is adopted as a performance benchmark, since its features and algorithms regarding data classification and prediction are comparable to those of neural networks.

In the next chapter, related research studies concerning the prediction of academic achievement and the applications of neural networks are fully reviewed. This literature review provides reasons for the conduct of this study and justifications for the procedures, methods, and approaches adopted within it. Chapter three provides basic knowledge about the concepts, theories, and applications of neural networks. It focuses mainly on three different neural network paradigms. After the "why" of this research study has been discussed, the "what and how" are fully described in chapter four. Chapter four first identifies both the purpose and the objectives of the study. It then explains the data description, data collection method, research procedures, and methodologies for data analysis and results evaluation. Chapter five reports all corresponding results. These results include the descriptive statistics, the ordered probit model's coefficients, the classification and prediction rates, and the parameters of the ANOVA tests. Chapter six provides the discussion of the data analyses and the interpretations of results and findings. Finally, chapter seven gives conclusions, which include new findings about the applications of neural networks that can contribute to the body of knowledge, any existing limitations and conditions regarding the results of this study, and suggestions for possible future research topics.

Chapter Two

Review of Literature

This chapter provides an in-depth review of research studies related to this thesis research. It describes the "what, why, and how" of the conduct of those studies, and their ultimate results. The contents of this chapter are organized into three parts.
The first part covers prior research studies mainly focusing on identifying factors that significantly influence academic performance in various academic disciplines. It also includes some studies that attempt to predict academic performance using various prediction techniques. The second part consists of prior studies dealing with the application of neural networks to predict academic performance. The research studies in the third part are those in which neural networks were applied to tasks other than the prediction of academic performance.

Prediction of Academic Performance by Traditional Techniques

A number of studies have been conducted to identify factors that significantly contribute to the variation in academic success of college students. Some studies investigate the factors that affect academic performance at the college level in general, while others focus on performance within particular disciplines. These research studies assumed some possible factors, and used conventional techniques, such as multiple regression or discriminant analysis, to test their hypotheses.

Several factors were identified as predictive indicators of academic success. These predictive factors can be categorized into three groups: demographic factors; past academic performance, or cognitive factors; and psychological factors. Academic success is either measured as final scores or grades, or classified into various academic standings.

Most studies depend solely on the availability and easy accessibility of data items and the ability to use them as predictive variables of future academic performance. These items are typically collected from students before they enter a college, and usually consist of only demographic data and pre-college academic records. The results from these studies are mixed. Some of them claim that demographic and academic backgrounds alone are capable of identifying the performance of students after matriculation (Mohammad & Almahmeed, 1988; Young, 1989). Other studies, on the other hand, argue that demographic and past academic data are not sufficient for accurately predicting performance at the college level. They strongly support the inclusion of psychological factors, such as motivation, attitude, and perception, in the prediction equations, since these factors can explain more of the variation in ultimate academic performance (Nisbet et al., 1982; Gramet & Terracina, 1988; Evans & Simkin, 1989).

There exists a behavioral viewpoint arguing that past academic performance strongly and significantly reflects future performance. This argument has been restated and confirmed by a number of educational research studies in various disciplines (Fowler & Glorfeld, 1981; Touron, 1983; Shaughnessy & Evans, 1986; Eskew & Faley, 1988). Past academic performance is usually measured as the overall grade point average earned in the last year of high school or at the college level before entering academic programs at a higher level. It sometimes covers the grades of particular courses taken in the past.

Some researchers argue that the closer the measurement of prior academic performance is to the point of prediction, the stronger the past performance will be as an indicator of future academic success. For example, grade point averages at both the high school and first-year college levels were claimed to be significant discriminators of success and failure among Architecture students (Domer & Johnson, Jr., 1982). Since the admitted students would enter the Architecture program in their second year, Domer and Johnson, Jr. argued that the admissions decision should not be made until the end of the first year.
They found that even though both high school and first-year college GPAs were good predictors, the latter predicted much more accurately. They, thus, did not recommend using pre-matriculation factors as the sole criteria for recruitment.

The role of post-matriculation factors as strong indicators of later academic achievement in college is also confirmed by other studies. In an attempt to predict the status of students after their freshman year, i.e., persistence, early withdrawal, or dropout, Pascarella, Duby, Miller, and Rasher adopted as many as 19 pre-college performance, demographic, and attitudinal variables plus two additional post-enrollment academic-related variables (Pascarella et al., 1981). Using multiple discriminant analysis [i] with the set of 19 pre-enrollment traits, they found that only nine were significant discriminators. The researchers could only distinguish between the dropout students and the combined group of persistent and early-withdrawal students. However, after they included one of the two post-enrollment factors, the first-quarter GPA, in the discriminant equation along with the best five pre-enrollment variables, they were able to clearly distinguish between the persisters and the early withdrawals.

[i] Discriminant analysis is a popular classification technique. It is used to assign a data observation to one of several distinct groups. The group to which the observation belongs depends upon a result calculated from the discriminant equation. The result falls into one of the ranges of values set for each group.

A further argument extends the previous one. It states that past performance in particular subjects is a good indicator of the later success of students pursuing similar or related fields. This is most evident in the fields of science, applied science, and medical science.
For example, it has long been hypothesized that individuals who are proficient in Mathematics should be able to perform successfully in Computer Science. This hypothesis was confirmed by several studies (Alspaugh, 1972; Fowler & Glorfeld, 1981; Konvalina et al., 1983; Campbell & McCabe, 1984; Butcher & Muth, 1985; Oman, 1986). Further, some of them found that past experience with computers during at least one high school level could identify success in college-level Computer Science courses. Eskew and Faley found that students who had pre-college exposure to accounting or bookkeeping courses performed significantly better in college-level introductory accounting courses than students who did not (Eskew & Faley, 1988).

Two studies identified gender as an explanatory factor (Campbell & McCabe, 1984; Cronan et al., 1989). These studies used discriminant functions to classify subjects into particular groups. The gender of the individual was consistently indicated as a significant variable in their classification models, even though the researchers believed that gender was not an achievement indicator. However, gender did not enter the classification model in the study by Fowler and Glorfeld. Rather, their model included age as a relevant variable (Fowler & Glorfeld, 1981). Ironically, similar to the role of gender, age was considered to have only marginal importance to the classification models. Age also showed conflicting results. While Fowler and Glorfeld reported an inverse relationship between age and academic success, Konvalina, Stephens, and Wileman instead found that the older the students were, the higher the scores they obtained (Konvalina et al., 1983).

The results from those studies, however, were not conclusive in terms of the significance of the relationships among variables. Some studies reported a very low or no significant relationship at all between any independent variables and aptitude or academic success.
In 1980, a study was conducted to investigate how well particular variables were able to predict individual programming skill (Mazlack, 1980). The researcher wanted to know whether academic discipline, gender, number of semesters in college, and the Programmer's Aptitude Test score would be correlated with academic performance. He found that none of them was significantly correlated with success in the programming course. He concluded that it was not possible to predict success in programming on the basis of personal and behavioral attributes, or of standardized written tests, when the subjects involved were at the college level.

In their attempt to predict individual mastery of computer concepts (Evans & Simkin, 1989), the researchers found that only a few of the variables they used were strong enough to be good predictors. They developed six different linear regression equations to measure different aspects of computer proficiency. None of them had strong explanatory power (their average R² was less than 24%). However, the researchers suggested that, beyond the set of typical predictive variables, such as demographic, behavioral, and academic factors, psychological factors could be important predictors of computer proficiency.

Finally, there is another group of studies that test the prediction capability of different traditional methods, using the same set of observations. The purpose behind these studies is to investigate other possible methods that would improve the accuracy of predicted results and, it is hoped, replace the generally used method. Tracey, Sedlacek, and Miars applied both standard least squares regression analysis and ridge regression to the same set of admissions variables to predict academic success (Tracey et al., 1983). Ridge regression was developed to overcome the weakness of least squares regression in the presence of multicollinearity.
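The contrast between least squares and ridge regression under multicollinearity can be sketched as follows. This is an illustrative example only: the two nearly identical predictors and the response are synthetic assumptions, not the admissions variables of Tracey et al., and the function names and the penalty value are hypothetical. The ridge estimate uses the standard closed form, which adds a penalty term to the diagonal of the cross-product matrix before solving.

```python
import numpy as np

def ols_coefficients(X, y):
    """Ordinary least squares estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_coefficients(X, y, penalty):
    """Ridge estimate: solve (X'X + penalty * I) b = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

# Synthetic, hypothetical data: two almost perfectly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 1e-3 * rng.normal(size=50)        # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=50)     # true coefficients are (1, 1)

ols = ols_coefficients(X, y)
ridge = ridge_coefficients(X, y, penalty=1.0)

print("least squares:", ols)    # can be far from (1, 1) and unstable
print("ridge        :", ridge)  # shrunk toward a stable solution
```

With predictors this strongly correlated, the least squares coefficients are very sensitive to noise, while the penalty keeps the ridge coefficients small and stable. Whether that stability translates into better predictions depends on the data, which is consistent with the modest gains reported in the study.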
Disappointingly, the ridge regression performed only as well as or slightly better than the least squares regression. Although its findings were not favorable, this last study indirectly opened the door to the quest for other viable alternatives. It implicitly suggested that there could be other methods that improve the accuracy of academic success prediction beyond what the standard methods could achieve.

Prediction of Academic Performance by Neural Network Approaches

From the survey of literature relating to the topic of this section, the author has found only two studies. Both applied the backpropagation neural network to predict the academic performance of college students. One was conducted with undergraduate students, the other with graduate students. The main purpose of these studies was to test the prediction capability of the backpropagation neural network, and to compare it with some traditional research methodologies.

The first study (Gorr et al., 1994) applied a backpropagation neural network, linear regression, stepwise polynomial regression, and a linear admission decision rule to predict student GPAs at the College of Pharmacy of North Dakota State University. The dependent variable to be predicted was the total GPA of all courses taken in the last two years of the five-year program. The set of independent variables in this study was quite subjective. The researchers decided to adopt the same set of variables used by the admissions committee in the admission decision formula. However, they added three more parameters to their developed models. Predicted results from the models of the four methods were compared in terms of mean errors and mean absolute errors within 95% confidence intervals. The study found that although there were differences among the central values of both error terms of the prediction methods, these differences were not statistically significant.
All prediction methods seemed to perform at roughly the same level, and the backpropagation neural network did not outperform any of them. To explain the results, the researchers provided two interpretations. First, they believed that no underlying structure of the data set within that domain could be detected. Second, even if an underlying structure did exist, they might not have used the full capacity of the backpropagation neural network approach.

The second study (Wilson & Hardgrave, 1995) was conducted at a major southwestern university to determine the capability of various classification techniques in predicting student success in the MBA program. Academic success was measured in terms of the first-year grade point average. The predicted GPAs were categorized into three groups: high risk, questionable, and no risk. The researchers identified eight independent variables: GMAT total, GMAT verbal, GMAT quantitative, undergraduate GPA, sex, attendance (full or part time), work experience, and age. Models from four different techniques, namely, least squares multiple regression, discriminant analysis, logistic regression, and the backpropagation neural network, were developed. The researchers selected the correct classification rate of the high-risk group as an indicator of prediction ability. The results showed that none of those approaches could accurately predict the students in the high-risk group. Moreover, the backpropagation neural network performed, on average, at the same level as the other approaches. The researchers blamed the composition of the data variables and the violation of statistical assumptions concerning the nature of the data as the possible causes of the below-expected performance levels of all models. However, they stated their belief that the backpropagation neural network would be a promising alternative classification methodology.
The results from both studies implied that the prediction and classification capabilities of backpropagation neural networks were not as impressive as had been claimed. Both studies suspected that some features of their research procedures, the less-than-full-capacity use of neural networks, and the nature of the data samples were the reasons for the unimpressive performance. Their reserved conclusions about the prediction capability of neural networks suggested that further investigation of this type of task would be beneficial.

Applications of Neural Networks to Managerial and Operation Tasks

According to their survey of journal articles published between 1988 and 1995, Wong, Bodnovich, and Selvi found that neural networks have been extensively used within the Finance and the Production, Operation, and Management Science areas (Wong et al., 1997). The most popular neural network paradigm for forecasting and classification tasks is the feedforward neural network with the backpropagation algorithm. Tam and Kiang applied the backpropagation neural network to predict the bankruptcy of banks in Texas based on 19 financial ratios (Tam & Kiang, 1992). They compared the performance of the neural network approach with that of the linear discriminant, logistic regression [ii], kNN [iii], and ID3 algorithm [iv] approaches. After applying the jackknife method [v] to reduce biases within the data sets, they found that the neural network provided better predictions than the other approaches. Salchenberger, Cinar, and Lash developed a feedforward neural network model to predict the financial health of thrift institutions (Salchenberger et al., 1992). The predicted results from the neural network were compared with those from the logit model. The neural network performed as well as or better than the logit model in all examined cases. Lenard, Alam, and Madey adopted a revised version of the backpropagation algorithm, called the GRG2 algorithm. They applied this GRG2 network, a typical backpropagation neural network, and a logit model to suggest for which firms the auditor should issue modified reports indicating a going concern uncertainty (Lenard et al., 1995). The GRG2 model was the best performer, while the logit model was the worst. The superiority of the backpropagation neural network over traditional statistical techniques was also confirmed by a number of studies applying those techniques to both real-world and simulated data samples (Bansal et al., 1993; Hruschka, 1993; Subramanian et al., 1993; Yoon et al., 1993; Jain & Nag, 1997; Zhang & Hu, 1998).

[ii] Logistic regression, or the logit model, utilizes a nonlinear logistic function to identify the group to which an observation is assigned. It is a technique that applies linear regression to data samples that can be classified into one of only two groups. A result from the logistic function is compared with a cut-off point to determine the ultimate group of the observation.

[iii] k-Nearest-Neighbor (kNN) is a non-parametric method that classifies an observation into one of several groups based on some quantitative independent variables. An unseen observation is assigned to the group to which most of its k nearest neighbor (training) observations belong.

[iv] The ID3 algorithm is simply a decision tree with a distinct classification algorithm. It employs a splitting procedure that repeatedly partitions a set of observations into disjoint groups.

[v] The jackknife method is a statistical procedure that helps produce unbiased estimates of error. It allows a user to determine the uncertainties of estimators, usually those derived from a data set with a small number of observations.

The superior prediction capability of a typical backpropagation network over the traditional techniques has not gone unquestioned. The typical backpropagation neural network sometimes does not significantly outperform some traditional techniques under specific circumstances. Patuwo, Hu, and Hung compared the backpropagation neural network with four other techniques - linear discriminant analysis, quadratic discriminant analysis, k-nearest-neighbor (kNN), and linear programming - in classifying simulated data samples (Patuwo et al., 1993). They found that, although backpropagation performed better than the other techniques on the training samples, it did slightly worse on the validation samples. However, they believed that for real-world samples, which usually violate several underlying statistical assumptions, backpropagation should be seriously considered over the traditional techniques because of its "generality and flexibility." Wang discussed the drawbacks of backpropagation neural networks when applied to various managerial tasks (Wang, 1995). He pointed out that individual backpropagation neural network models might produce different classification results for the same set of data. This variation happened by chance, and there was no clear explanation for its occurrence. Curry and Morgan discussed their concern about the deficiencies of the gradient descent methods utilized within the backpropagation algorithm (Curry & Morgan, 1997). They suggested some changes to the algorithm in order to improve the performance of the backpropagation neural network.

Other groups of researchers have investigated the usability of other neural network paradigms for managerial decision-making tasks. Some of them adopted the self-organizing map (SOM) neural network to handle data grouping problems. Orwig, Chen, and Nunamaker, Jr. applied a SOM neural network to the problem of classifying outputs from electronic brainstorming sessions [vi] (Orwig et al., 1997). They wanted to evaluate how well the SOM network would perform in that classification task, compared with a human expert and a Hopfield neural network [vii]. The results showed that the SOM network outperformed the Hopfield network in all aspects. However, the SOM network could at most perform as well as the human expert in some aspects. The weakness of the SOM network is its lack of precision in producing distinct topics from the brainstorming sessions.

[vi] An electronic brainstorming session is a part of the electronic meeting setting. Electronic brainstorming is a technique that helps electronically collect information related to complicated problems. It is useful for situations where maximum, unstructured, and anonymous participation is strongly required.

[vii] The Hopfield model is one of the primary and primitive neural network models. It consists of a single layer of neurons with binary values. These neurons, in association with a recurrent network, can be used as associative memories. A recurrent neural network with associative memory has the capability to produce a full picture of output data when supplied with only a portion of the related input data.

Doumas, Mavroudakis, Gritzalis, and Katsikas investigated the possibility of including neural networks within computer security systems to recognize and detect computer viruses (Doumas et al., 1995). They developed two neural network structures, backpropagation and SOM, to learn and classify the behaviors of the viruses. To compare the classification performance of both structures, Doumas and her colleagues set two criteria: the accurate classification rate and the computational efficiency. Both neural networks performed equally well in discriminating various patterns of computer viruses. However, the researchers recommended a backpropagation network for this task, since it requires less computation time in both the training and validation phases than the SOM network does.

A study conducted in 1995 by Chen, Mangiameli, and West investigated the classification and clustering capabilities of the SOM network, and compared them with those of seven other clustering techniques (Chen et al., 1995).
The purpose of this study was to determine how well each technique clustered data within individual sets with different levels of data cluster dispersion. They found that, for all four simulated data sets, with dispersion levels of very low, low, medium, and high, respectively, the SOM networks defined the cluster memberships of the data with significantly high percentages of accuracy. In addition, the SOM network outperformed, on average, the other clustering techniques at every level of data cluster dispersion. For a real-world data set, the SOM network had the lowest misclassification rate of all the techniques.

Hybrid neural networks are the result of combining, in one way or another, learning algorithms and architectures from both the supervised and unsupervised learning paradigms of neural networks. Hybrid neural networks have been shown to be able to solve various classification and prediction problems, but are still not very popular among researchers. The author has found only two studies implementing hybrid neural network models.

Learning vector quantization (LVQ), a supervised learning version of the SOM network, was adopted by Gupta, Chen, and Murtaza to classify industrial construction projects (Gupta et al., 1997). Due to the complicated, semi-structured, and risky nature of the problem, the advice of an expert had been the only possible method for classifying those projects. The researchers tried to show that the LVQ neural network could be another viable option for this classification task. After comparing the classification results from the network with those from the expert, they found that there was no significant difference. This means that the LVQ neural network performs as well as the expert and, thus, could be an alternative classification tool for those projects.

Huang and Kuh introduced another hybrid neural network structure (Huang & Kuh, 1992).
They believed that a new network structure combining a self-organizing feature map with a multilayer perceptron (MLP) would yield more accurate results in recognizing isolated words than would either the SOM or the MLP individually. In their hybrid neural network, the SOM part served as a mapping function that transformed higher-dimensional input signals into trajectories or clusters of lower-dimensional metrics on the map. The MLP part, with the backpropagation algorithm, then classified the trajectories into the corresponding words.

Wong, Bodnovich, and Selvi also reported that there has been an increasing amount of research on neural networks conducted for a wide range of business activities (Wong et al., 1997). They believed that neural networks would play a more critical role in supporting managerial decision making. They also suggested, among other possible areas of investigation, that the performance of different paradigms, architectures, and training schemes of neural networks should be further evaluated.

Chapter Three

Neural Networks

This chapter describes the concepts, theories, and learning algorithms of neural networks. The description covers the three different types of neural networks that are related to this study. Each neural network type is presented only at a fundamental level, which should be sufficient for readers to understand how the neural networks operate to solve classification problems.

Introduction to Neural Networks

The primary concept and model of the neural network was first introduced in 1943 by McCulloch and Pitts (Ritter et al., 1992). They suggested a primitive model of a single neuron with two possible states, active (excitatory) and silent (inhibitory). Since then, various neural network models have been developed by researchers from several disciplines.
The main objective of those models was to solve problems or to handle tasks that could not be successfully accomplished by conventional techniques. Aleksander (1989) provided a definition of the neural network as follows:

[Neural network is a] cellular network that has a natural propensity for storing experiential knowledge. [The network model] bears a resemblance to the brain in the sense that knowledge is acquired through training rather than programming and is retained due to changes in neuron functions. The knowledge takes the form of stable states or cycles of states in the operation of the network. A central property of such network is to recall these states or cycles in response to the presentation of cues.

Inspired by research studies of nervous systems, neural network models have been developed to imitate the information processing that happens within a human brain. All types of neural networks are generally composed of neurons, or groups of neurons, and connections. The strength of a connection is represented by its weight value. The direction of connections within a network can be one-way, two-way, or a combination of both. Distinct organizations of those components, along with distinct learning mechanisms, are developed for particular purposes or tasks. Most neural network models that have been created are either two-layer or multilayer networks.

Neural networks have been implemented to solve several complicated problems in both scientific and nonscientific tasks. Neural networks possess promising characteristics and properties that are suitable for those tasks. First, neural networks possess parallel processing capability. Each neuron within a network behaves like a single independent processor, yet all neurons can operate at the same time, in parallel. Parallel processing, therefore, enables a neural network to perform complex tasks much faster than traditional computation methods (Li, 1994; De Wilde, 1997).
Second, neural networks behave like an associative memory. This is the ability to recognize, recall, and draw inferences or associations among various information items. In other words, if we provide a neural network with only a portion of the entire set of input data, it should return the related complete output by recalling its past experience or knowledge of the same type of data (Wasserman, 1989; Wang, 1993; De Wilde, 1997).

Third, neural networks are fault-tolerant systems. The architecture of neural networks creates robust systems that can tolerate the malfunction of some neurons as well as incomplete or noisy data sets. Since the knowledge is evenly distributed over all individual storage elements and links, the loss of a few data items causes only a small degradation in the performance of a neural network (Knight, 1990; Li, 1994).

To make neural network models work effectively for particular tasks, users have to make sure that the selected architecture, learning algorithm, and related parameters are appropriate. Users also need to consider the appropriate type and format of input and output data (Caudill, 1991; Yale, 1997). All of these issues, if carefully planned, will greatly improve the performance of the neural network models and ensure more accurate results. At the same time, they will make analysis and interpretation of the outcomes more meaningful to a decision-maker.

Learning within neural networks:

The most remarkable characteristic of neural networks, rarely found within traditional computing systems, is their ability to learn from experience. A neural network basically consists of a collection of intercommunicating neurons. The knowledge a neural network has learned is stored in its individual neurons and in the connections among the neurons. A given learning algorithm trains a neural network to recognize patterns of data in any domain.
This learning process changes the values of the parameters within each neuron and of its connected weights. There are basically two distinct mechanisms of learning, supervised and unsupervised learning.

Supervised learning:

This learning mechanism is sometimes referred to as learning with the help of a teacher. To learn within this mechanism, a neural network requires a pair of an input vector and a target output vector for each individual observation in the training data set. The target output vector represents the correct or desired results that the neural network is supposed to produce. At the initial stage, given an input vector, the network computes and produces its own output vector using the initial values of its parameters. The produced output vector is then compared with the corresponding target output vector. The difference, if any, between those two vectors is fed back to the network. During the feedback process, the values of the connected weights and, possibly, the values of some parameters within neurons are adjusted. These changes are made to minimize the error between the produced outputs and the target ones. The learning and adjustment process is applied sequentially to individual training vectors, over and over again, until the difference for the entire training set has reached an acceptable level.

Unsupervised learning:

This learning mechanism has been recognized as closely resembling the actual learning mechanism and environment within the brain. A neural network with this learning mechanism does not require a target output vector for a particular input vector. The network learns to recognize patterns of input vectors by extracting their statistical properties, grouping vectors with similar properties together, and then assigning them to a distinct class. It is expected that the network would produce the same pattern of outputs for a subset of similar or closely related input vectors.
The outputs from this neural network, however, depend on the learning process, and their specific patterns are difficult to determine beforehand. The outputs generally need some transformation, visualization, and interpretation to make them more comprehensible and meaningful to users.

Input data for neural networks:

Neural networks can handle various types of data, from simple linearly correlated data to complex nonlinearly correlated data. However, researchers generally agree that the neural network approach is a very efficient and useful technique for finding relationships within a set of data that cannot be handled successfully by other methods or techniques. To enhance the capability of neural networks, input data with appropriate formats and properties should be prepared. Yale suggested some guidelines for preparing the right data for training neural networks (Yale, 1997). The sample of input data should be substantially large to provide a high level of confidence that a neural network will finally converge. A large sample size will also help ensure that all possible patterns or scenarios of input data are provided for training the neural network (Burke, 1991). The range of measurable values should be kept as tight as possible. This tight range reduces the chances of getting stuck in a local minimum. In the case of categorical variables, the numerical values just behave like labels and do not convey any meaning. It is more efficient for neural networks to learn if the categorical data are represented as a set of binary values, one for each possible category. Finally, the training data set should be uniformly distributed. There should be a relatively equal number of input data for each possible scenario.

In the next sections, some learning algorithms and their corresponding neural network architectures are discussed.
The intention is to provide general knowledge about the individual neural network paradigms that will be applied within this research. First, the author will discuss the multilayer perceptron network with the backpropagation learning algorithm, which is classified as a supervised learning mechanism. The self-organizing map network with a competitive learning algorithm, which is classified as an unsupervised learning mechanism, will be discussed next. Finally, learning vector quantization, which combines supervised and unsupervised learning mechanisms, will be covered.

Feedforward Multilayer Perceptron

Feedforward multilayer neural networks have been widely used to solve decision-making problems. They have been shown to perform complicated tasks that simpler neural networks cannot. Several research studies also show that this type of neural network outperforms conventional techniques and statistical tools within various domains. Their structure is designed to capture a complex mapping of inputs into desired outputs. Neurons in the middle layer are believed to function as pattern detectors. These neurons thus give the network an ability to make reasonable generalizations to unseen data.

Note: The concepts and theories of the feedforward multilayer perceptron are extracted and adapted from the following sources: Wasserman, 1989; Tam & Kiang, 1992; Hruschka, 1993; Zahedi, 1993; Rumelhart et al., 1994; Rao & Rao, 1995; Gurney, 1997; Rogers, 1997.

A feedforward multilayer network is composed of neurons hierarchically organized into at least two layers, i.e., input and output layers. Links fully connect neurons in different but adjacent layers, and there are no connections between nonadjacent layers or within the same layer. It is sometimes desirable to provide particular neurons with a bias that has a constant output value of 1.0. The bias behaves like an intercept in regression functions.
It shifts the transfer function (described below) along the horizontal axis, which allows the learning process to converge more rapidly.

The widely adopted multilayer perceptron network has three layers, i.e., the input layer, the hidden layer, and the output layer (see figure 3.1). The number of neurons in the input layer equals the number of variables in the input set. The number of neurons in the output layer usually depends on the number of possible categories or classes of input patterns. There are no generally accepted rules about how many neurons in the hidden layer will be optimal. The number is usually found by trial and error, and will be different for different tasks with different data properties.

[Figure 3.1: Feedforward three-layer neural network (input layer, hidden layer, output layer)]

All input and output signals generally flow in only one direction, from the input layer to the output layer. An individual neuron accepts input signals from the external environment or from other neurons in the previous layer. Within a neuron, all incoming inputs are combined with their corresponding connected weights. The total net input to the neuron is calculated by the following function,

$$NetX_j = \sum_{i=1}^{n} w_{ij}\, x_{ij},$$

if the neuron is in the hidden or output layer, or

$$NetX_j = X_j,$$

if the neuron is in the input layer, where $NetX_j$ is the total net input at neuron $j$, $x_{ij}$ is an input signal from neuron $i$ to neuron $j$, $X_j$ is the value of the input variable at input neuron $j$, $w_{ij}$ is the connected weight from neuron $i$ to neuron $j$, and $n$ is the number of neurons sending signals to neuron $j$. The net input is transformed into an output that is sent to neurons in the next layer or to the external environment. Figure 3.2 illustrates the transformation process occurring at a neuron.

[Figure 3.2: Single neuron with summation and transfer functions]

There are several transfer functions, but the most widely adopted one is the sigmoidal, or logistic, function (see figure 3.3).
This function,

$$Out_j = \frac{1}{1 + e^{-NetX_j}},$$

produces a continuous value between 0 and 1 (or, for rescaled variants of the function, between -1 and 1) for the output ($Out_j$) from neuron $j$.

[Figure 3.3: Sigmoidal or logistic function (output plotted against net input)]

Backpropagation is one type of supervised learning algorithm, usually applied to a feedforward multilayer network. The goal of this learning mechanism is to minimize the sum of squared errors of the network outputs. The learning process starts when an input vector is presented at the input neurons. At each neuron, a net input value is computed, and an output signal is produced following the rules and formulas presented above. When the network produces an ultimate output vector at the output layer, each component within that vector is compared with the corresponding component within the target output vector. The difference between the target and the actual values is computed.

Suppose that there is only one hidden layer. After an output neuron calculates the difference, expressed as

$$error_o = target_o - actual_o,$$

the error is propagated back to adjust all associated connections. The generalized delta rule is used for adjusting the error that is propagated back through the network. This adjusted error is computed as

$$\delta_o = [Out_o(1 - Out_o)]\, error_o,$$

where $Out_o$ (or $actual_o$) is the output signal produced by the output neuron, and the $[Out_o(1 - Out_o)]$ part is the derivative of the transfer function at the output neuron. This error is then used for calculating a change in weight value, following this equation,

$$\Delta w_{ho} = \alpha\, \delta_o\, Out_{ho},$$

where $w_{ho}$ is the weight from the hidden to the output layer, $\alpha$ is a learning rate with a value between 0.1 and 1.0, and $Out_{ho}$ is the output from a neuron in the hidden layer to all neurons in the output layer. The calculated change in weight value ($\Delta w_{ho}$) is then added to the corresponding weight of the connection between these two layers.
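The factor multiplying the output error in the adjusted error above is the derivative of the logistic transfer function. As a short supporting derivation (the symbols $\sigma$ and $z$ are shorthand introduced here for the transfer function and its net input, not notation from the sources):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\frac{d\sigma}{dz}
  = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
  = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
  = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```

Setting $z$ to the net input of an output neuron and $\sigma(z)$ to its output gives the factor Out(1 - Out). This factor is largest, 0.25, when the output is 0.5 and approaches zero as the output nears 0 or 1, so weight changes are small for neurons whose outputs are already close to either extreme.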
The error propagation process then moves further back, from neurons in the hidden layer to neurons in the input layer. Since there are no target outputs for neurons in the hidden layer, it is not possible to compute the error at those neurons directly. The indirect way is to consider the influence that a particular hidden-layer neuron has on all output neurons. It can then be assumed that the error at that hidden-layer neuron is the sum of the weighted errors at the output neurons connected from that hidden-layer neuron. The error at the hidden neuron can then be computed by this equation,

$$error_h = \sum_{o=1}^{m} w_{ho}\, error_o,$$

where $m$ is the number of output neurons. Again, this error has to be adjusted by the derivative of the transfer function at the hidden neuron. The adjusted error at a hidden-layer neuron is computed by the following equation,

$$\delta_h = [Out_h(1 - Out_h)]\, error_h.$$

The change in weight value at connections between input neurons and hidden-layer neurons is calculated as

$$\Delta w_{ih} = \alpha\, \delta_h\, Out_{ih},$$

where $Out_{ih}$ is the output from a neuron in the input layer to all neurons in the hidden layer.

After all corresponding weights have been updated for that iteration, they are used for computing the ultimate output of the network in the next iteration. This learning process repeats for the rest of the training data set, and may start over again, until the network produces a mean square error below a predetermined level. The set of last updated weights is then stored for generalization to unseen data vectors. At this point, the neural network has been completely trained.

The backpropagation algorithm is the most fundamental mechanism of this type of supervised learning. Several more complicated algorithms have been developed to improve the performance of feedforward multilayer networks. Those revised versions of backpropagation make the networks learn much faster, consume less computational memory, and increase the accuracy of the outputs (Demuth & Beale, 1998).
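As an illustration only, the update rules of this section can be sketched in Python for a three-layer network trained on a single input and target pair. The function names, the omission of bias terms, and the learning rate of 0.5 are assumptions made for this sketch, not details taken from the sources. The hidden-layer error follows the equation as stated in this section (a weighted sum of the raw output errors); many textbook derivations of gradient descent propagate the adjusted errors instead, which would mean using `delta_o` in place of `err_o` in the marked line.

```python
import math


def sigmoid(net):
    """Logistic transfer function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_ih, w_ho):
    """Forward pass through a three-layer network without bias terms.

    w_ih[i][h] is the weight from input i to hidden neuron h;
    w_ho[h][o] is the weight from hidden neuron h to output neuron o.
    """
    hidden = [sigmoid(sum(x[i] * w_ih[i][h] for i in range(len(x))))
              for h in range(len(w_ih[0]))]
    output = [sigmoid(sum(hidden[h] * w_ho[h][o] for h in range(len(hidden))))
              for o in range(len(w_ho[0]))]
    return hidden, output


def train_step(x, target, w_ih, w_ho, alpha=0.5):
    """Apply one update for a single training pair, modifying weights in place."""
    hidden, output = forward(x, w_ih, w_ho)
    n_h, n_o = len(hidden), len(output)
    # Output layer: raw error, then error adjusted by the sigmoid derivative.
    err_o = [target[o] - output[o] for o in range(n_o)]
    delta_o = [output[o] * (1.0 - output[o]) * err_o[o] for o in range(n_o)]
    # Hidden layer: weighted sum of output errors, as stated in the text,
    # computed with the weights as they were before this update.
    err_h = [sum(w_ho[h][o] * err_o[o] for o in range(n_o))  # or delta_o[o]
             for h in range(n_h)]
    delta_h = [hidden[h] * (1.0 - hidden[h]) * err_h[h] for h in range(n_h)]
    # Weight change = learning rate * adjusted error * signal on the link.
    for h in range(n_h):
        for o in range(n_o):
            w_ho[h][o] += alpha * delta_o[o] * hidden[h]
    for i in range(len(x)):
        for h in range(n_h):
            w_ih[i][h] += alpha * delta_h[h] * x[i]
    return sum(e * e for e in err_o)  # squared error measured before the update
```

Calling `train_step` repeatedly over the training pairs and stopping once the returned squared error falls below a chosen threshold mirrors the stopping rule described above.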
Self-Organizing Map (SOM)

The principle of the SOM network was first introduced in early 1981 by Teuvo Kohonen, based on the earlier work of Willshaw and von der Malsburg (Kiang et al., 1995). The network's structure and its learning mechanism were developed to closely resemble the topological organization and learning process of neurons within a human brain. The SOM network has a topology-preserving property, which captures an important aspect of the feature maps of the brain. The learning process of this network is based on competitive learning, or the winner-take-all algorithm, which is one type of unsupervised learning mechanism. This type of learning, along with the network's properties, enables the SOM network to extract hidden topological features of a data set without an outside teacher. Given only a set of input vectors, the network uses its learning algorithm to condense and map higher-dimensional input data into a lower-dimensional spatial representation. The simple visualization that results from this reduction of data dimensions facilitates the interpretation of complicated relationships among data observations. The SOM network has been used as an alternative to other traditional neural networks for similar tasks such as pattern recognition, classification and clustering, and process control.

Note: The concepts and theories of the self-organizing map are extracted and adapted from the following sources: Ritter & Kohonen, 1989; Kohonen, 1990; Hiotis, 1993; Chen et al., 1995; Kiang et al., 1995; Kohonen, 1995; Mazanec, 1995; Rao & Rao, 1995; Gurney, 1997; Orwig et al., 1997; Rogers, 1997.

Typically, a SOM neural network consists of two layers, an input layer and a Kohonen layer (see figure 3.4). Neurons in the input layer behave like feeders. They accept input vectors from the external environment and simply pass them to the Kohonen layer without any alterations.
Each input neuron corresponds to one variable within the input data set. Neurons in the Kohonen layer, or SOM neurons, are generally arranged in particular two-dimensional topologies, such as square, hexagonal, or random arrangements. Each of these neurons is connected to every neuron in the input layer. The set of connections to a single SOM neuron is called a codebook, or weight vector ($w_k = (w_{1k}, \ldots, w_{nk}) \in \mathbb{R}^n$). The location of individual SOM neurons and their neighborhood convey meanings and relationships among the input data vectors. Each SOM neuron is to become a representation of one homogeneous class, or category, of input data. Thus, adjacent neurons should represent similar classes. In other words, the greater the distance between two neurons in the SOM grid, the less similar the categories of data they represent.

[Figure 3.4: Self-organizing map neural network (input layer, Kohonen layer)]

The primary competitive learning algorithm is the basic learning mechanism for the SOM and other similar networks. Its concept is to make individual neurons in the Kohonen layer compete with one another to represent one of the clusters of similar input data. For each training step, only the neuron whose weight vector is closest to the input vector is the winner and is allowed to activate, while the other neurons are inhibited. The degree of similarity between the weight vector of a particular neuron and an input vector is measured by the Euclidean distance ($d_k$), which is calculated by this equation,

$$d_k = \sqrt{(x_1 - w_{1k})^2 + (x_2 - w_{2k})^2 + \cdots + (x_n - w_{nk})^2}.$$

Here $x_i$, where $i = 1, 2, \ldots, n$, is the $i$-th component of an input vector, and $w_{ik}$, where $i = 1, 2, \ldots, n$, is the corresponding weight connecting the $i$-th input neuron to the particular $k$-th Kohonen neuron.

For this primary competitive learning, individual neurons are trained independently. Only the one winning neuron is allowed to learn a particular class, by adjusting its weight vector to be closer to the input signals that belong to that class.
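The winner search and the winner-only adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the update form (moving the winner a fraction `alpha` of the way toward the input) is assumed here for the primary algorithm, since the passage above states only that the winner's weights move closer to the input.

```python
import math


def euclidean_distance(x, w):
    """Euclidean distance between an input vector x and a weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def find_winner(x, weights):
    """Return the index of the Kohonen neuron whose weight vector is closest to x."""
    distances = [euclidean_distance(x, w) for w in weights]
    return distances.index(min(distances))


def winner_take_all_step(x, weights, alpha=0.5):
    """Move only the winning neuron's weight vector toward x (assumed update form)."""
    k = find_winner(x, weights)
    weights[k] = [wi + alpha * (xi - wi) for xi, wi in zip(x, weights[k])]
    return k
```

For example, with weight vectors `[[0.0, 0.0], [1.0, 1.0]]` and the input `[0.9, 0.8]`, the second neuron wins and only its weights change; the first neuron's weights stay as they were.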
With only the winner learning at each step, different neurons learn different aspects of the input data. The order in which the neurons are assigned to capture different classes of input signals is, however, random, depending mostly upon the initial weight values. A neuron trained with this primary algorithm acts in the same manner as its counterpart in the backpropagation multilayer network, activating when an input data signal matches the group to which it is assigned.

The competitive learning algorithm of the SOM network is adjusted to include the neighborhood aspect, as well as the topological organization of the SOM neurons. At the beginning of a training period, not only the winning neuron but also its neighbors are allowed to tune themselves to similar input patterns. This tuning means that both the winning neuron and its neighbors adjust their weight vectors to be closer to those input vectors. After the network has been repeatedly presented with randomly selected input signals, each neuron gradually becomes, in an orderly fashion, a single prototype of a homogeneous set of data patterns.

Once the network is completely trained, a particular SOM neuron becomes a localized response to input vectors of a particular class. The position of that neuron within the SOM grid map reflects the most important feature coordinates of that input signal. The expected result from the SOM network is, hence, a spatial arrangement of the training input signals. Those that belong to a similar class will be clustered into a similar region. The fully trained SOM network can then be used to classify unseen observations into the corresponding regions on the map, based on their topological relationship with the prior training data set.

The learning algorithm within the SOM neural network can be summarized as follows. First, the weight values of the connections between input neurons and SOM neurons are initialized to small random values.
A learning rate and a neighborhood size (the number of neurons in the neighboring vicinity) are also initialized. The learning rate is usually between 0.1 and 1.0, and the initial neighborhood size should be nearly the size of the SOM layer, or the number of neurons in each dimension (Kiang et al., 1995). Second, each input signal is presented to the network. The distance between that signal and the weight vector of each SOM neuron is computed using the Euclidean distance function stated previously. The SOM neuron that has the minimum distance value is the winner, as shown in the following equation,

$$\| x - w_c \| = \min_{k} \{ \| x - w_k \| \}, \quad \text{for } k = 1, 2, \ldots, n,$$

where $x$ is an input vector, $w_c$ is the weight vector of the winning neuron, and $w_k$ is the weight vector of the $k$-th neuron within the SOM layer. Finally, the weight vectors of both the winning neuron and the neighborhood neurons are adjusted to get closer to that input vector. The weight vectors of the other neurons, outside the neighborhood vicinity, are kept intact. The adjusted values of each weight vector are calculated according to these equations,

$$w_k(t + 1) = w_k(t) + \alpha(t)\,[x - w_k(t)], \quad \text{if } k \in N_c(t),$$

or

$$w_k(t + 1) = w_k(t), \quad \text{if } k \notin N_c(t),$$

for $k = 1, 2, \ldots, n$, where $\alpha(t)$ is the learning rate at time $t$, and $N_c(t)$ is the neighborhood, of the current neighborhood size, around the winning neuron at time $t$. The learning rate and the neighborhood size are decreased at each iteration. This learning process is repeated for the whole set of input data. The process terminates when it reaches a predefined number of iterations set by the user.

Learning Vector Quantization (LVQ)

The above SOM network, with its unsupervised learning method, may not be very effective for classification tasks. Within the SOM method, users have no influence over which SOM neurons represent which particular classes of input data.
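For reference, the SOM training procedure summarized just above (random initialization, winner search, neighborhood update, and decaying learning rate and neighborhood) can be sketched as follows, before turning to its supervised counterpart. Several choices are assumptions made for this sketch and are not taken from the sources: the neurons are arranged on a line rather than a two-dimensional grid, every neuron inside the neighborhood receives the same update, the learning rate and radius decay linearly once per pass over the data, and the default initial radius is half the layer. With this uniform update, a radius that covers every neuron moves all weight vectors identically, which is why the default here is smaller than the near-full-layer size suggested above.

```python
import random


def train_som(data, n_neurons, n_iter=100, alpha0=0.5, radius0=None, seed=0):
    """Train a one-dimensional SOM and return its list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: initialize weights to small random values.
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
               for _ in range(n_neurons)]
    if radius0 is None:
        radius0 = max(1, n_neurons // 2)  # assumed default for this sketch
    for t in range(n_iter):
        frac = 1.0 - t / n_iter
        alpha = alpha0 * frac        # assumed linear decay of the learning rate
        radius = int(radius0 * frac)  # assumed linear shrinking of the neighborhood
        for x in rng.sample(data, len(data)):  # present inputs in random order
            # Step 2: the winner has the minimum (squared) Euclidean distance.
            c = min(range(n_neurons),
                    key=lambda k: sum((xi - wi) ** 2
                                      for xi, wi in zip(x, weights[k])))
            # Step 3: update the winner and its neighbors; others stay intact.
            for k in range(max(0, c - radius), min(n_neurons, c + radius + 1)):
                weights[k] = [wi + alpha * (xi - wi)
                              for xi, wi in zip(x, weights[k])]
    return weights
```

After training, an unseen observation can be assigned to the neuron with the smallest distance to it, in the same way the winner is found in step 2.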
Since users cannot control which SOM neuron represents which class, it is somewhat difficult to interpret and evaluate whether the neuron responding to a particular input vector really represents the intended class to which the input belongs. Because neurons in the SOM network compete freely with each other, two input vectors that are supposed to belong to different classes may be put into the same region just because the distance between them is small. To make their classification more accurate, SOM networks should be trained in a supervised manner.

A learning vector quantization (LVQ) neural network is a self-organizing map network that is trained with the supervised version of the competitive learning mechanism. The LVQ network can be viewed as a feedforward three-layer network that incorporates the Kohonen layer within its structure. It can therefore be used for pattern recognition and classification tasks. Since the input signals for these tasks are to be classified into a finite number of classes, subsets of corresponding connected weights are created to represent those classes. All of these connected weights have the same characteristics as the ones that learn patterns of data in SOM networks.

Note: The concepts and theories of learning vector quantization are extracted and adapted from the following sources: Kohonen, 1990; Kohonen, 1995; Gupta et al., 1997; Demuth & Beale, 1998.

The LVQ network consists of three layers, i.e., the input layer, the Kohonen or competitive layer, and the output layer (see figure 3.5). The input and Kohonen layers are basically the same as those of the SOM network in terms of their physical layouts and computational functions. Input neurons are fully connected to every neuron in the Kohonen layer. The output layer is composed of neurons, each of which represents a particular class or category. A relatively equal number of neurons in the Kohonen layer are assigned to each class.
As shown in the network structure (see also figure 3.5), the assignment is reflected by the fact that each output neuron is connected to only some neurons in the Kohonen layer. It is the neurons in the Kohonen layer that learn the patterns of input data and perform the classification. The outputs from the Kohonen layer are then fine-tuned to belong to a correct class at the output layer.

[Figure 3.5: Learning vector quantization neural network (input layer, Kohonen layer, output layer)]

The learning algorithm of this type of network is based on the SOM's competitive learning. Before the learning process begins, each subset of weight vectors is assigned to a class on a random basis. The initial random values of those weight vectors should roughly correspond to the probability density function of the input data. The training data set consists of both the input signals and their corresponding target classes. During the learning period, the class regions in the input space are defined by the nearest-neighbor comparison method. This method measures the distance between an input vector ($x$) and every weight vector ($w_i$). The neuron with the weight vector closest to that input signal is then selected as the winner. The procedures for measuring the distance (Euclidean distance: $d_i$) and declaring a winning neuron are the same as those of the SOM network.

The class of the winning neuron is compared with the target class of the training input signal. If the classes are the same, the weights of the winning neuron are adjusted in the direction that moves them closer to the input vector. However, if the class of the winning neuron is different from that of the input signal, two sets of weights, belonging to two different neurons, are adjusted. The weights of the winning neuron are moved further away from that training input signal.
At the same time, the weights of the closest non-winning neuron that belongs to the same class as the input signal are also adjusted. This adjustment moves that same-class neuron closer to the input signal, and increases the chance that the neuron becomes a winner.

Let $w_c$ be the weight vector of the winning neuron. After each iteration, $w_c$ is updated following these equations,

$$w_c(t + 1) = w_c(t) + \alpha(t)\,[x(t) - w_c(t)],$$

if the winning neuron is in the same class as the input vector ($x$), or

$$w_c(t + 1) = w_c(t) - \alpha(t)\,[x(t) - w_c(t)],$$
$$w_k(t + 1) = w_k(t) + \alpha(t)\,[x(t) - w_k(t)],$$

if the winning neuron is not in the same class as the input vector ($x$), where $w_k$ is the weight vector of the $k$-th neuron, which is the closest neuron to the input vector that belongs to the same class as the input vector, or

$$w_i(t + 1) = w_i(t),$$

if the $i$-th neuron is neither the winning neuron nor the adjusted same-class neuron $k$. Here $\alpha(t)$ is the learning rate at time $t$. The training process repeats until there is no misclassification or until it reaches the number of iterations set by the user.

Chapter Four
Research Purposes, Procedures, and Methodologies

This chapter outlines two main aspects: what goals this research study is trying to achieve and how to achieve them. The first part of the chapter defines the main purpose and objectives of this study. The rest of the chapter describes the research procedures and methodologies used as the means to achieve the purpose and objectives. This part includes the description of the data, the approaches used to collect and manage the data, and the tools and techniques for analyzing the data and evaluating the results.

Purposes and Objectives

This research study is focused on the prediction of academic performance of students in the B.Com. Program at UBC, using neural networks. The main purpose of this research is to investigate the capabilities and performance of neural networks when applied to a managerial decision-making task.
As mentioned previously, neural network technology has increasingly assumed an important role in solving managerial problems. Numerous insights about this technology, in terms of its applications within the management area, are still waiting to be uncovered. Specifically, the research aims to achieve the following objectives.

• To compare the classification and prediction capabilities of two neural network paradigms, which utilize different learning mechanisms.
• To prove that neural networks are more capable of handling complicated classification tasks than a comparable traditional method, which is the ordered probit model.
• To suggest the most appropriate method, the one that produces the best predicted results, to be embedded within the decision support system developed for the B.Com. Program.

This research mainly investigates the classification and prediction capabilities of three different approaches: two neural network paradigms and an ordered probit model. The first neural network paradigm is a feedforward multilayer network. It typically adopts the backpropagation learning algorithm. The second one is a supervised-learning version of the self-organizing map (SOM) network, known as learning vector quantization (LVQ). This network paradigm utilizes the competitive learning algorithm.

Most researchers still prefer using the backpropagation paradigm for their classification tasks. However, the LVQ network has promising potential for solving classification problems. Burke believed that the hybrid learning mechanism could improve the prediction accuracy, as well as other aspects of computational performance, of a neural network model (Burke, 1991). She argued that if the unsupervised learning portion of a neural network can recognize a good clustering of data, the rest of the network can then be further trained on the associations of groups, or combinations of groups, with categories of interest.
The hybrid network first finds the hidden structures or relationships within a data set by its competitive learning method. This finding simplifies the problem by reducing the dimensionality of the data. The supervised learning portion then makes a classification decision by matching the clustered low-dimensional data with the desired classes.

Why Select Backpropagation and Learning Vector Quantization

From the review of literature in the previous chapter, it is evident that the backpropagation neural network has been extensively utilized to solve classification and prediction problems. Despite its impressive performance in most situations, backpropagation did not achieve the anticipated performance levels in the prediction of student academic success, as reported in the corresponding studies of backpropagation neural network applications (Gorr et al., 1994; Wilson & Hardgrave, 1995). It could be that other neural network paradigms are more capable of handling this academic success prediction task. The other reason could be that the procedures used to implement the backpropagation neural network were not ideal, as the researchers of those studies also mentioned. The procedures utilized in those studies were likely to forfeit the chance of obtaining backpropagation neural networks with full classification and prediction capabilities. Within each of those studies, only one neural network with a particular configuration was applied to different sets of resampled data. It is thus questionable whether that particular neural network could perform optimally on every one of those data sets, even when they are from the same population.

Unlike the studies of backpropagation, two studies utilizing hybrid neural networks did not clearly prove that their hybrid networks were better than, or at least as good as, traditional techniques that can perform similar tasks.
It therefore cannot be fully argued that those hybrid neural networks are the most promising techniques for handling the tasks under study. The remaining question concerning the capabilities of the hybrid networks is thus how well they can compete with rival techniques in performing related tasks.

Both backpropagation and LVQ networks possess features suited to data classification and prediction. The major difference between them is in their learning algorithms, which adopt different concepts of pattern detection and recognition. The backpropagation algorithm adopts the gradient descent method, which uses the derivative of the transfer function to adjust the connected weights within the network. This is aimed at minimizing the squared errors of the outputs from the network. The LVQ algorithm adopts the Euclidean distance method, which reduces higher-dimensional data to a lower dimension, usually one or two. These lower-dimensional data are much easier for the fine-tuned classification process of the algorithm to assign to the right groups.

Even though the backpropagation neural networks in the prior studies did not perform satisfactorily in classifying and predicting academic success, it is reasonable to experiment with them again. The first reason for utilizing backpropagation neural networks in this study is that they can be used as a performance benchmark against which to compare the LVQ neural networks. The second reason is that the procedure used in this study for applying backpropagation neural networks is somewhat different from the procedures utilized in the prior studies.
This adopted procedure, as will be described in later chapters, is likely to facilitate the attainment of better-performing backpropagation neural networks and, it is hoped, to generate better prediction results than the procedures used in the prior studies.

Why Select Ordered Probit Model

Multiple regression analysis is a statistical technique that was commonly used in past studies of student academic performance. This regression technique is useful and appropriate for situations where continuous dependent variables, such as raw or actual scores, are predicted from sets of independent variables. Within this study, however, the author is interested in classifying a student with particular academic and demographic backgrounds into one of several categorized and ordered groups. The outcomes to be predicted in this study are discrete rather than continuous. Multiple regression, therefore, might not be a suitable approach for this prediction task. Moreover, the continuous predictions from multiple regression are not directly comparable to the categorical results produced by both neural network paradigms.

There are several statistical methods that estimate dependent variables with discrete values. One of them is the ordered probit model. This estimation technique is used to find relationships between an ordinal dependent variable and a set of independent variables. The ordinal variable is composed of categorical values with orders, such as rating or ranking scores. None of the categorical values convey quantitative meanings. They just represent and signify the existence of orders. The ordered probit model is, hence, more appropriate for the task of academic performance prediction within this research study. It can be used to predict how well an individual student performs academically by directly identifying the academic standing group to which he or she belongs.
Predicted outputs from the ordered probit model are also readily comparable to those predicted by neural networks. Moreover, to the author's knowledge, no past studies have compared the classification performance of neural networks with that of the ordered probit model.

The predicted outcomes from the ordered probit model will be used as a performance benchmark for comparison to those from neural network models. It is convincing to argue that neural networks are the better alternative if we can show that they produce better results than a comparable traditional technique, such as the ordered probit model. It is also interesting to find out whether, with a different procedure, the backpropagation paradigm would yield results similar to those in past academic success prediction studies. If the results from backpropagation are, again, not impressive, we can then determine whether the LVQ paradigm could yield better results. This finding would suggest which neural network paradigm is more appropriate for classifying and predicting academic data, which are naturally complicated. The question of whether any other neural network paradigms outperform backpropagation for the classification task could be partly answered here. Finally, it is also possible that neither of them performs satisfactorily compared to the adopted traditional method. This possibility would encourage future research on other potential neural network paradigms, or even on other innovative techniques.

Description of Input and Output Variables

Applicants to the B.Com. Program can be Arts and Science students who have finished their first-year studies at UBC, second- and third-year transfer students from other colleges, or mature students with relevant work experience (UBC Calendar, 1998).
Since students in the first group make up the biggest pool of applicants and their data are conveniently retrievable, they are the focus of this research study. The eligible first-year student applicants must have completed at least 30 credits on a full-time basis. They must also have taken the core requirements of English, Economics, and Mathematics during their first year.

According to the above admission requirements, as well as the findings from prior research studies about student academic performance, the author has come up with the following list of input variables. The selection of these variables also partly depends upon their availability and accessibility within the database at the Registrar's Office.

1. Gender
2. Age
3. First-Year GPA
4. Grade of an Economics course - ECON 100
5. Grade of English courses - ENGL 112, ENGL 110, ENGL 111, ENGL 120, ENGL 121
6. Grade of Math courses - MATH 100 & 101, MATH 120 & 121, or MATH 140 & 141

The output variable is the level of academic performance of a student in a particular specialization. This is measured by the grade point average of the five core courses that individual students are required to take in the specialization they choose. The B.Com. Program currently offers 10 specializations - Accounting, Commerce and Economics, Finance, General Business Management, Industrial Relations Management, International Business, Management Information Systems, Marketing, Transportation and Logistics, and Urban Land Economics. Each specialization, except Commerce and Economics, General Business Management, and International Business, requires its students to take at least five core courses from the provided course list. The following is a list of core courses provided within each specialization.
Accounting:
    COMM 353, 354, 450 (Mandatory)
    Two courses from COMM 452, 453, 454, 455, 459 (Elective)

Finance:
    COMM 371, 374 (Mandatory)
    One course from COMM 376, 377, 378, 379 (Elective)
    Two courses from COMM 471, 472, 475, 478 (Elective)

Industrial Relations Management:
    COMM 327, 328, 421, 425, 428 (Mandatory)

Management Information Systems:
    COMM 335, 436, 437, 438, 439 (Mandatory)

Marketing:
    COMM 362, 363, 365, 468 (Mandatory)
    One course from COMM 460, 461, 462, 463, 464, 466, 467, 469 (Elective)

Transportation and Logistics:
    COMM 349, 399, 441, 449 (Mandatory)
    One course from COMM 444, 445, 447 (Elective)

Urban Land Economics:
    COMM 307, 309, 407, 408 (Mandatory)
    One course from COMM 406, 409 (Elective)

The calculated grade point average of those core courses is classified into one of three groups depending on which grade range it falls into. The author adopted the UBC grading system and adjusted it to make the three grade ranges. To make the comparison among different methods easier and more interpretable, the classified groups will be represented by an index of 1, 2, or 3. However, due to the particular architecture of neural network models, each categorized group is also shown as a three-column vector consisting of binary values of 0 or 1. Therefore, group 1 is represented by 1 0 0, group 2 by 0 1 0, and group 3 by 0 0 1.

The following reasons explain why the author decided to have three categorized groups. First, prior studies concerning the classification of data observations generally implement a small number of categorized groups. Although the largest number of categorized groups found is four, the majority of those studies implement either two or three groups. It is evident that the more groups a category has, the harder and more complicated the interpretation of the results will be.
Second, the three-group category seems to reflect the typical human assessment - good, fair, or poor - of the performance level of any subject or entity. It also seems consistent with the philosophy behind the UBC grading system that converts percentage grades into letter grades. The author perceives all A-letter grades (A+, A, and A-), or grades of 80% and above, as the indication of good performance. All B-letter grades (B+, B, and B-) are considered fair or average performance, since they cover the middle range of the passing grade continuum, which runs from 50% to 100%. Grades of 67% and below, or about two-thirds of the full 100%, should be considered poor performance. This performance level is represented by the letter grades of C+, C, C-, and D. The details of the grades, both letter and percentage, in each categorized group are shown in table 4.1 below.

Corresponding Group    Grade Range (%)    Grade Letter
1                      80-100             A+, A, A-
2                      68-79              B+, B, B-
3                      50-67              C+, C, C-, D

Table 4.1: The grade ranges, their corresponding categorized groups, and their corresponding grade letters

The author is not interested in predicting the exact grade point average of a particular student. Such a prediction does not provide sufficient additional meaning, given the time and effort spent in coming up with the exact figure. Besides, the results from this study might later be used by the management of the B.Com. Program. Thus, summarized figures, which show the big picture, would be more useful for making a quick but accurate decision than detailed figures would. This point of view is supported by the study of Wilson and Hardgrave. They proposed the classification approach as an ultimate approach for predicting the academic performance of students.
They argued that a decision-maker would benefit more from the prediction of relative success/failure or good/poor performance of individual students than from the prediction of an actual GPA (Wilson & Hardgrave, 1995).

Privacy of Data

The privacy of individuals and confidentiality of their information are of utmost importance in this research study. Individuals' personal information is fully protected by the Freedom of Information and Protection of Privacy Act (under section 35). A disclosure of personal information is possible only if the released information is used solely for research or statistical purposes. To prevent individual students from incurring unintentional or accidental damages due to the disclosure and usage of their personal data, the following procedures are applied.

Data items that directly reveal the identity of individual students, such as name, social insurance number (SIN), and student number, will not be included in this study. Even though these data items are excluded, someone might argue that it would still be possible to identify the individual. Putting together the values of the remaining data items, as defined within the description of variables, could create a unique set of values, which, in turn, could be traced back to the owner. This possibility is, to some extent, reduced by the adoption of the average grade of the core courses instead of the individual grades. The average value makes a unique combination less likely.

Data Samples and Data Collection

The raw input data are collected from the student database at the UBC Registrar's Office. Some of these data are used as training samples for building up the models, while the others are used as cross-validation samples for testing the predictive power of the models. Theoretically, a large sample size of data is required to ensure that effective and powerful classification models are created.
For a complex problem such as this academic performance classification task, a big sample is necessary for the separation curves among classes of data to be clearly defined (Patuwo et al., 1993).

A set of data samples that covers all possible scenarios or patterns cannot be achieved by a traditional data collection method alone, say, a survey of data from voluntary participants. Since some of the required data items are the final grades students get from taking particular Commerce courses, only those who are willing to reveal their grades will fully participate. Students who do not perform well in those courses, or who do not care about this research project, would not participate. Mazlack identified that when questionnaires are distributed to the target group of students, only a tiny portion of them will be returned. This fairly low return rate "makes the representativeness of the sample open to questions due to student self-selection. For example, 'did only those who did well return the questionnaire, or did only those who liked the instructor return the questionnaire?'" (Mazlack, 1980). The set of data acquired by this means is theoretically biased, since it represents only some portions of the entire population. In other words, it consists only of academic performance data from students who are willing to reveal their personal information. This set of data cannot be used for training the neural networks since it provides incomplete knowledge of the data domain.

For the first data collection attempt, the author collected a sample of 1,729 records. These records belong to students who entered the program from 1991 to 1996. Less than 50% of these records are complete. A record is considered incomplete if it has one or more missing values. Details of all related figures regarding the first set of raw data are shown in the following table 4.2.
                        Complete Records    Incomplete Records    Total Records
Specialization          Freq     %          Freq     %            Freq     %
Accounting              202      40%        305      60%          507      100%
Finance                 149      37%        255      63%          404      100%
Marketing                95      35%        178      65%          273      100%
Urban Land Economics     46      27%        124      73%          170      100%
Transportation           17      29%         41      71%           58      100%
Industrial Relations      0       0%        109     100%          109      100%
MIS                       0       0%         69     100%           69      100%
Others                  N/A      N/A        N/A      N/A          139      100%

Table 4.2: Frequency and percentage of complete, incomplete, and total records, classified by specialization

The first complete set of data for each individual specialization is used as a training set for generating corresponding neural network models. It is also used for building up an ordered probit model corresponding to that specialization.

Given the number of available complete records, the author decided to analyze the student data of Accounting, Finance, and Marketing. Three other specializations - Business Economics, General Management, and International Business - require students to take more than five core courses. This requirement creates a diversity of courses that is difficult to handle, especially in terms of extracting all corresponding grades from the database. Further, only a small number of students are in those specializations. The author, hence, decided to discard them. Since Industrial Relations Management and Management Information Systems do not have any complete records, they are automatically excluded from this study. Urban Land Economics and Transportation and Logistics do not have a sufficient number of complete records, only 46 and 17 respectively. The author believes that these small sample sizes will not yield any statistically significant results. Thus, these two specializations were taken out of this study as well.

The second set of data is required for testing the developed models. Since there were no more complete records left, a second collection attempt was conducted to obtain more data.
This time, the data were collected further back, from 1987 to 1990. This data collection added another 469 records to the first data set. Unfortunately, no more complete data records were found in this additional set. Only a few records, which had been taken out during the first data clean-up process because of invalid values, were recovered and put back into the complete set. The third data collection attempt was conducted one semester after the first attempt, with the same group of students who entered the program from 1991 to 1996. This collection attempt also included another group of students who had just entered in 1997 or 1998. This time, the author did collect more complete records, which, combined with the leftover complete records from the second attempt, created a total of 58 complete records.

Data Analysis Methodologies

At the meeting with the director of the B.Com. Program, we agreed that the academic performance of students in different specializations would be considered separately. This separation means that prediction models will be developed specifically for each individual specialization. There are 10 specializations available in the B.Com. Program. Some specializations, however, have been excluded from the study for technical reasons, as described above. Consequently, the models will be developed only for the traditional specializations of Accounting, Finance, and Marketing.

According to the admission requirements, the applicants must have taken MATH 140 and MATH 141 before entering the program. However, the applicants can substitute the mathematics requirement with either MATH 100 and MATH 101 or MATH 120 and MATH 121 (UBC Calendar, 1998). These Math options make up two variables within the set of input variables. The author decided to exclude students who took MATH 120 and MATH 121, since they account for only 12 of 2,198 cases.
Further, the MATH 100 series and MATH 140 series cannot be treated as the same courses because they are at different levels of difficulty. Therefore, for each finalized specialization, there are two distinct sets of models. One is for students who take MATH 100 and MATH 101, and the other is for students who take MATH 140 and MATH 141.

The average grade of MATH 100 is 77.70, while the average grade of its matched MATH 140 is 81.92. Likewise, the average grade of MATH 101 is 76.59, and the average grade of MATH 141 is 82.74. The argument that the remaining two Math options differ in difficulty should be statistically confirmed using the analysis of variance (ANOVA) test. The test will tell whether the means of the matched Math courses - MATH 100 versus MATH 140 and MATH 101 versus MATH 141 - are significantly different at the alpha level of 0.05. The results of the ANOVA test, as well as some descriptive statistics, are shown in the tables below.

Term        Count    Mean        Standard Error    Effect
All         1,520    80.69276                      79.80966
MATH 100      442    77.6991     0.545166          -2.110564
MATH 140    1,078    81.92022    0.3490845          2.110564

Table 4.3: Descriptives of means, standard errors, and effects for MATH 100 and MATH 140

Source Term         Sum of Squares    df      Mean Square    F-Ratio    Probability Level    Power (Alpha = 0.05)
Between Groups      5585.401             1    5585.401       42.52      0.000000*            0.999997
Within Groups       199412.1          1518    131.365
Total (Adjusted)    204997.5          1519
(* Significant at alpha = 0.05)

Table 4.4: Analysis of Variance for testing the significant difference between means of MATH 100 and MATH 140

According to table 4.4, the F-ratio from the test of the first pair - MATH 100 and MATH 140 - is 42.52, which is much higher than the critical F at the alpha level of 0.05. Thus, the hypothesis that there was no significant difference between the means of MATH 100 and MATH 140 was rejected.
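The F-ratio reported in table 4.4 can be re-derived from the sums of squares and degrees of freedom given in the same table. The following Python sketch is only an arithmetic check, not the output of the statistical package used in the study; the helper name f_ratio is hypothetical and introduced here for illustration.

```python
# Arithmetic check of a one-way ANOVA F-ratio.
# The four input figures are copied from table 4.4 (MATH 100 versus MATH 140);
# the function name is a hypothetical helper, not part of the thesis.

def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F-ratio)."""
    ms_between = ss_between / df_between   # variation explained by the grouping
    ms_within = ss_within / df_within      # residual variation inside the groups
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, f = f_ratio(
    ss_between=5585.401, df_between=1,
    ss_within=199412.1, df_within=1518,
)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f:.2f}")
```

The sketch reproduces a within-groups mean square of about 131.365 and an F-ratio of about 42.52. For 1 and 1,518 degrees of freedom the critical F at alpha = 0.05 is roughly 3.85, so the hypothesis of equal means for MATH 100 and MATH 140 is rejected.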
In other words, MATH 100 was significantly different from MATH 140 and, therefore, cannot be treated as the same input variable.

Term        Count    Mean        Standard Error    Effect
All         1,502    81.12317                      79.66138
MATH 101      394    76.5863     0.606204          -3.075084
MATH 141    1,108    82.73647    0.3614906          3.075084

Table 4.5: Descriptives of means, standard errors, and effects for MATH 101 and MATH 141

Source Term         Sum of Squares    df      Mean Square    F-Ratio    Probability Level    Power (Alpha = 0.05)
Between Groups      10993.6              1    10993.6        75.93      0.000000*            1.000000
Within Groups       217182.6          1500    144.7884
Total (Adjusted)    228176.2          1501
(* Significant at alpha = 0.05)

Table 4.6: Analysis of Variance for testing the significant difference between means of MATH 101 and MATH 141

According to table 4.6, the F-ratio from the test of the second pair - MATH 101 and MATH 141 - is 75.93, which is much higher than the critical F at the alpha level of 0.05. Thus, the hypothesis that there was no significant difference between the means of MATH 101 and MATH 141 was rejected. In other words, MATH 101 was significantly different from MATH 141, so they cannot be treated as the same input variable either.

The English requirement states that the applicants must have taken ENGL 112 and one of these elective English courses - ENGL 110, ENGL 111, ENGL 120, or ENGL 121 (UBC Calendar, 1998). Since the number of students who take each of these elective English courses varies disproportionately (see table 4.7), the author decided to put them together as a single input variable called 'Elec ENGL'. However, before this merger can be justified, the author would have to show that there is no significant difference among these English courses. Again, the analysis of variance, this time for more than two groups, was applied for this test. The author set a hypothesis that there was no significant difference among the means of ENGL 110, ENGL 111, ENGL 120, and ENGL 121.
The results are shown in the following tables.

Term        Count    Mean        Standard Error    Effect
All           820    71.33781                      72.8422
ENGL 110      645    71.43566    0.3047501         -1.406539
ENGL 111      166    70.78313    0.6007167         -2.059065
ENGL 120        5    74.4        3.461296           1.557802
ENGL 121        4    74.75       3.869846           1.907802

Table 4.7: Descriptives of means, standard errors, and effects for ENGL 110, ENGL 111, ENGL 120, and ENGL 121

Source Term         Sum of Squares    df     Mean Square    F-Ratio    Probability Level    Power (Alpha = 0.05)
Between Groups      150.7054            3    50.23515       0.84       0.472857             0.233318
Within Groups       48880.72          816    59.90285
Total (Adjusted)    49031.43          819

Table 4.8: Analysis of Variance for testing the significant difference among means of ENGL 110, ENGL 111, ENGL 120, and ENGL 121

The average grades of ENGL 110, ENGL 111, ENGL 120, and ENGL 121 are 71.44, 70.78, 74.40, and 74.75, respectively. According to the ANOVA table (table 4.8), the calculated F-ratio is 0.84, which is lower than the critical F at the alpha level of 0.05. The above hypothesis, hence, could not be rejected. In other words, no significant difference was found among the means of these elective English courses. Therefore, they can be treated in a similar manner.

To train the neural network models, as well as to build the ordered probit models, training sets of data are needed. Since there are three separate specializations, and each specialization has two separate models based on the Math options, there will be six different training sets. Each individual training set will consist of two subsets - the training input subset and the target output subset.

As mentioned above, all of the complete records found during the first data collection process are used as training sets. However, the number of observations belonging to each group within the training sets is not distributed equally.
This unequal distribution might contradict the practice of some prior studies, which argue that a training set should have an equal proportion of all categorized groups (Shaun, 1995; Jain & Nag, 1997). Tam and Kiang, on the other hand, argue that a training set with equal proportions constitutes a small portion of the data and does not reflect the real distribution of the entire population. The matching process used to ensure the equal distribution of all categorized groups might introduce biases to the models (Tam & Kiang, 1992). For this study, if the number of cases for each categorized group had to be equal, the total number of cases within any individual training set would be reduced significantly. The resulting small sample sizes could weaken the predictive power of the models, since the samples might not represent all possible patterns of the data domain. The author, hence, decided to use all available complete records as the training cases. The following table shows the number of records and the proportion of records in each group, for each specialization track.

                                                         Proportion of Samples in each Categorized Group
Specialization    Math Option       Number of Records    (Group1:Group2:Group3)
Accounting        MATH 140 & 141    137                  40:74:23 (29%:54%:17%)
                  MATH 100 & 101     60                  24:30:6  (40%:50%:10%)
Finance           MATH 140 & 141     93                  42:49:2  (45%:53%:2%)
                  MATH 100 & 101     52                  26:24:2  (50%:46%:4%)
Marketing         MATH 140 & 141     79                  6:70:3   (7%:89%:4%)
                  MATH 100 & 101     15                  2:13:0   (13%:87%:0%)

Table 4.9: Total number of training records and the proportion of them within each group, separated by Math options of each specialization

After the models have been developed and trained, their generalization power has to be tested with unseen cases of data. Thus, a second set of cross-validation data comes into play. This validation set consists of records that are not included in the training set.
These records are from the second and third data collection attempts. Unlike the training set, the validation set should have a proportion of observations in each categorized group that represents the actual composition of the entire population (Markham & Ragsdale, 1995; Jain & Nag, 1997). An effort to test the prediction power of models using a data set that has an equal proportion of each group's observations would result in a misevaluation of a model's performance. For each specialization track, the number of validation records is about 10% or more of the number of training records. Table 4.10 below shows all related figures of the validation sets.

                                                         Proportion of Samples in each Categorized Group
Specialization    Math Option       Number of Records    (Group1:Group2:Group3)
Accounting        MATH 140 & 141    19                   4:11:4 (21%:58%:21%)
                  MATH 100 & 101     6                   4:2:0  (67%:33%:0%)
Finance           MATH 140 & 141    16                   5:8:3  (31%:50%:19%)
                  MATH 100 & 101     4                   2:2:0  (50%:50%:0%)
Marketing         MATH 140 & 141    12                   1:11:0 (8%:92%:0%)
                  MATH 100 & 101     1                   0:1:0  (0%:100%:0%)

Table 4.10: Total number of cross-validation records and the proportion of them within each group, separated by Math options of each specialization

There are eight input variables entering each particular model. The first variable is gender, which is represented by 0 if its value is female, and by 1 if its value is male. The second variable is the age of students when they entered the program. It was calculated by subtracting the birth year from the entering year. The next four variables are first-year GPA, ECON 100, ENGL 112, and Elec ENGL. The last two variables depend upon which Math option the model belongs to. They could be either MATH 100 and MATH 101 or MATH 140 and MATH 141. The values of the third to eighth variables are percentages on a 100-point scale.
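The encoding described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's actual data preparation: the class and field names are hypothetical, age is taken as entering year minus birth year, the group cutoffs follow table 4.1 with averages of 80 and above mapped to group 1 and 68 and above to group 2 (how fractional averages between the bands were handled is not stated), and the example record is made up.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical field names; the thesis only names the variables.
    gender: str          # "female" or "male"
    birth_year: int
    entering_year: int
    gpa1: float          # first-year GPA, in percent
    econ100: float
    engl112: float
    elec_engl: float     # merged elective English grade
    math_first: float    # MATH 100 or MATH 140, depending on the model
    math_second: float   # MATH 101 or MATH 141, depending on the model


def encode_inputs(rec):
    """Build the eight-element input vector in the order given in the text."""
    gender = 0 if rec.gender == "female" else 1
    age = rec.entering_year - rec.birth_year
    return [gender, age, rec.gpa1, rec.econ100, rec.engl112,
            rec.elec_engl, rec.math_first, rec.math_second]


def grade_group(core_average):
    """Map a core-course average (%) to group 1, 2, or 3 (assumed cutoffs)."""
    if core_average >= 80:
        return 1
    if core_average >= 68:
        return 2
    return 3


def one_hot(group):
    """Group 1 -> [1, 0, 0], group 2 -> [0, 1, 0], group 3 -> [0, 0, 1]."""
    return [1 if g == group else 0 for g in (1, 2, 3)]


# Made-up record, for illustration only.
rec = StudentRecord("female", 1975, 1994, 78.0, 82.0, 74.0, 71.0, 85.0, 88.0)
inputs = encode_inputs(rec)
target = one_hot(grade_group(72.4))
print(inputs, target)
```

Keeping the index form (1, 2, 3) and the three-element binary form side by side mirrors the text: the index is convenient for comparing methods, while the binary vector matches the three output neurons.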
Design of neural network models: The architectures of the two network paradigms are quite similar, except at their middle layers, which have different components and connections. The input layers have eight input neurons; each neuron corresponds to a particular input variable, as mentioned above. The output layers consist of three distinct neurons. Each output neuron represents one of three different academic standing groups, as also mentioned above. Figure 4.1 below illustrates some details of the architecture of these neural networks.

[Figure 4.1: The architecture of neural network models, showing the components within their input and output layers. Input layer: Gender, Age, GPA 1, ECON 100, ENGL 112, Elec ENGL, MATH 1?0, MATH 1?1. Middle layer. Output layer: Group 1, Group 2, Group 3.]

The first paradigm is a multilayer feedforward neural network. Most of the research studies in neural network applications agree that a three-layer neural network (with one hidden layer) is sufficient for effectively handling any complicated classification task (Caudill, 1991; Salchenberger et al., 1992; Tam & Kiang, 1992; Subramanian et al., 1993; Patuwo et al., 1993; Jain & Nag, 1995). However, there are no rules of thumb stating the optimal value of each associated parameter for any particular task. To keep things simple, the author followed what was conducted and suggested in the prior studies, along with some trial-and-error experimental values.

A model configuration that needs to be determined consists of the number of neurons in the hidden layer, the learning rate, the number of iterations (epochs), and the performance goal. The number of hidden neurons is the most critical factor affecting the prediction capability of a backpropagation neural network. Most researchers come up with their own formulas for identifying the potential number of hidden neurons.
However, those researchers generally agree that the number of hidden neurons should be neither too many nor too few (Salchenberger et al., 1992; Subramanian et al., 1993; Patuwo et al., 1993; Lenard et al., 1995; Jain & Nag, 1997; Zhang & Hu, 1998). A network with too many hidden neurons would create the overfitting problem. An overfitting network will accurately classify the training data, but at the expense of losing its predictive power when it encounters unseen validation data. On the other hand, a network with too few hidden neurons might not possess sufficient ability to learn all possible patterns of data. Since the hidden neurons behave like feature detectors, having a small number of them would force individual neurons to combine some distinct and separable patterns. After reviewing the prior studies, the author decided to try numbers of hidden neurons between 4 and 16 (between 50% and 200% of the number of input neurons), increasing by 2 neurons. In other words, the possible numbers of hidden neurons are 4, 6, 8, 10, 12, 14, and 16.

According to some researchers, the learning rate should be set neither too high nor too low (Green & Choi, 1997; Demuth & Beale, 1998). A high learning rate makes the network overlearn the data patterns and, at the same time, keeps its performance oscillating. This instability tends to reduce the generalization capability of the network. On the other hand, a low learning rate lengthens the learning time of a network before it can converge. Since the learning rates adopted in past studies ranged from as low as 0.1 to as high as 0.9, the author decided to take those two values at both ends and to include the middle value of 0.5 as the possible learning rates.

Training with the typical gradient descent backpropagation algorithm consumes quite a long time before a network can converge.
There are several revisions of the backpropagation algorithm that attempt to dramatically reduce both the time and memory required for training the network. Demuth and Beale, the authors of "Neural Network Toolbox," strongly suggested using the Levenberg-Marquardt algorithm. This algorithm can learn complicated patterns of data much faster since it can approach second-order training speed without computing the Hessian matrix (Demuth & Beale, 1998).

Individual backpropagation network models will iterate their learning process for up to 5,000 epochs. The performance goal is set at a mean squared error of 0.05. Each training run will stop when the performance goal has been met, the gradient has reached the minimum value, or the number of iterations has been completed.

Neurons in the hidden layer have the 'tansig' transfer function. This function accepts any input value, from -∞ to +∞, and produces an output value between -1 and +1. Output neurons contain the 'logsig' transfer function, which maps any real value into a value between 0 and +1. Since a categorized group is represented by a combination of three digits with binary values of 0 and 1, all three output values produced by the output neurons have to be converted into either 0 or 1. The conditions for conversion are as follows.

• The strongest output value will be assigned the value of 1.
• The two remaining lower output values will be replaced with the value of 0.

The converted output vector will then identify which group the network model has predicted.

The performance of backpropagation networks depends not only upon the optimal number of hidden neurons but also upon the right set of initial weights. Basically, poor performance of the network results when the network is stuck in a local minimum instead of the global minimum of the error curve.
One way to reduce the chance of ending in a local minimum is to make multiple training runs with different sets of random initial weights (Lippmann, 1987; Caudill, 1991; Demuth & Beale, 1998). For this study, each model with its specific configuration will be run five times. The averaged performance from those five runs of every possible model will then be compared to determine the best performer.

The second paradigm is the learning vector quantization (LVQ) neural network. There are only a few research studies regarding the application of this network paradigm, and the guidelines for setting the values of its parameters are still unclear. Since there are no evident rules for setting up appropriate configurations for the LVQ models, the author applies the same set of configurations used for the backpropagation models to the LVQ models. Applying the same set of configurations to both paradigms makes the performance comparison easy and consistent. However, due to some specifications of the LVQ algorithm, the LVQ models developed for the Marketing specialization in both Math options cannot follow all configurations that are applied to their backpropagation counterparts. Because of the highly unequal proportion of training records in each group (see table 4.9 for details), the possible number of middle, or Kohonen, neurons for the Marketing with MATH 140 & 141 option has to start from 8 instead of 4 neurons. For the same reason, the number of middle neurons for the Marketing with MATH 100 & 101 option will vary from 13 to 16 neurons. The author added the 13- and 15-neuron configurations in order to increase the number of observed results.

Brief Description of Ordered Probit Model

The ordered probit model is used as a traditional benchmark for the performance comparison with neural networks. Its estimation method of maximum likelihood is applied to the data of training sets.
The result from running the ordered probit model is a set of coefficient values. These values will then be used to generate an estimated linear equation for a predicted score ($S_j$). The sum of the predicted score and a random error is used to calculate the probability that an observed outcome belongs to a particular categorized group. In other words, a predicted output from the ordered probit model is the probability that the predicted score ($S_j$) plus the random error ($u_j$) lies between a pair of adjacent cut points ($k_{i-1}$ and $k_i$). The following equation shows the general pattern of such an ordered probit model.

$$\Pr(\text{output}_j = i) = \Pr\bigl(k_{i-1} < \beta_1\,\text{Gender} + \beta_2\,\text{Age} + \beta_3\,\text{GPA1} + \beta_4\,\text{ECON100} + \beta_5\,\text{ENGL112} + \beta_6\,\text{ElecENGL} + \beta_7\,\text{MATH1?0} + \beta_8\,\text{MATH1?1} + u_j < k_i\bigr)$$

Here, MATH1?0 and MATH1?1 stand for the two Math courses of the option in question (MATH 100 & 101 or MATH 140 & 141), and

• $u_j$ is a random error, assumed to be normally distributed;
• $i$ is a number that represents one of three categorized groups; within this study, $i$ can be 1, 2, or 3;
• $k_{i-1}$ and $k_i$ are two adjacent cut points;
• $\beta_1, \beta_2, \ldots, \beta_8$ are the coefficients of the input variables;
• $\Pr(\text{output}_j = i)$ is the probability that the output from the $j$th set of input items belongs to the $i$th categorized group.

(The concepts of the ordered probit model are extracted and adapted from Greene, 1997, and StataCorp, 1997.)

The procedure used to decide which academic standing group is the most likely prediction of the ordered probit model is quite similar to that used for the backpropagation neural networks. For each set of input items, i.e., each input record, the ordered probit model will produce three probability values. Each value represents the probability that the outcome belongs to the corresponding group. The ultimate predicted group is the group that has the highest probability among the three.
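The way the three probabilities arise from a predicted score and two cut points can be sketched as follows. This is an illustration only, not the Stata procedure used in the study: it assumes a standard normal error (the usual ordered probit normalization), and the score and cut points below are hypothetical values rather than estimates from the thesis data.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def group_probabilities(score, cut_points):
    """Pr(group i) = Phi(k_i - S) - Phi(k_(i-1) - S).

    The outermost cut points are taken as -infinity and +infinity, so two
    finite cut points yield three group probabilities.
    """
    bounds = [-math.inf] + list(cut_points) + [math.inf]
    cdf = [normal_cdf(b - score) for b in bounds]
    return [cdf[i + 1] - cdf[i] for i in range(len(bounds) - 1)]


# Hypothetical predicted score S_j and cut points k_1, k_2.
probabilities = group_probabilities(score=0.3, cut_points=(-0.5, 1.0))

# The predicted group is the one with the highest probability (groups 1 to 3).
predicted_group = probabilities.index(max(probabilities)) + 1
```

The three probabilities always sum to one, so choosing the largest of them is the same rule as the winner-take-all conversion applied to the neural network outputs.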
In other words, a particular group is selected as the predicted result of the ordered probit model if the corresponding outcome has its highest probability of belonging to that group. All predicted groups are kept for later comparison with the actual categorized groups and with the predicted groups from the neural network models.

The Neural Network Toolbox within the MATLAB® software package is selected to generate and run the neural network models of both paradigms. There are numerous neural network packages, both commercial and freeware, available on the market. Sexton and his colleagues stated that they tried most of the commercial packages, such as Neural Works Professional II/Plus, Brain Maker, and MATLAB, and found no difference in their performance (Sexton et al., 1998). The author adopts MATLAB since it provides all neural network features required for this research. For running the ordered probit model, Stata™ software is adopted. This software provides all features, procedures, and methods necessary for running the probit model.

Performance Evaluation Criteria

The most critical concern within the task of academic success prediction is how to correctly identify the ultimate performance of individual students. By developing a model that represents the patterns of data within the domain of consideration, we would expect the model to correctly predict the results for any unseen observations. Within the context of this study, the author focuses on the correct classification rate within each categorized group, as well as the total correct classification rate, to determine which models are the good performers. Good models should perform consistently well in prediction across every group.
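The two criteria named above, the correct classification rate within each group and the total correct classification rate, can be computed as in the following sketch. The function name and the sample labels are hypothetical; a group with no cases is reported as None, which corresponds to the "N/A" entries in the result tables.

```python
def classification_rates(actual, predicted, groups=(1, 2, 3)):
    """Return the correct classification rate (%) per group and in total."""
    rates = {}
    for group in groups:
        # Predictions made for the records that truly belong to this group.
        members = [p for a, p in zip(actual, predicted) if a == group]
        if members:
            rates[group] = 100.0 * sum(p == group for p in members) / len(members)
        else:
            rates[group] = None  # no cases in this group ("N/A")
    correct = sum(a == p for a, p in zip(actual, predicted))
    rates["total"] = 100.0 * correct / len(actual)
    return rates


# Hypothetical actual and predicted groups for six students.
actual_groups = [1, 1, 2, 2, 2, 3]
predicted_groups = [1, 2, 2, 2, 3, 3]
rates = classification_rates(actual_groups, predicted_groups)
```

A model can reach a high total rate while failing on a small group, which is why the per-group rates are reported alongside the total.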
To identify which neural network paradigm is better at classifying and predicting academic performance, the author will compare the means of the aggregate correct classification rates produced by each paradigm. However, to ensure that any difference between the performance levels of the two network paradigms does not occur by chance, the author will apply the one-way Analysis of Variance (ANOVA) test. The total correct classification rate, measured as the percentage of correctly classified cases, from each configuration of each network paradigm is included in the test. Since there are 21 different model configurations of each paradigm, the total number of observations will be 42. When used in the ANOVA test, this number of observations should be large enough to give statistically meaningful results, which supports the validity of any conclusions that are to be made.

The next step is to consider whether neural networks are superior to the ordered probit model in this classification task. The author will compare the correct classification results of each neural network paradigm with those of the ordered probit model. Again, the one-way ANOVA test is used to establish the statistical significance of these comparisons and their results.

There are, however, some procedural issues regarding the comparisons that need to be addressed and resolved. The first issue is how to determine and compare their classification performance. The performance of neural networks is quite dynamic. We can develop several neural network models with various configurations to work on the same set of data. Thus, neural networks could produce various classification results, ranging from excellent to very poor, when applied to the same data set. On the other hand, an ordered probit model, developed and tested with a given set of data, produces only a single fixed set of results.
Comparing the classification results from all good, fair, and poor performing neural networks with those from a static ordered probit model could mislead the interpretation of the comparison results. To avoid this problem, the author selects, out of the pool of network models, the single best performing neural network of each paradigm and compares its performance with that of the ordered probit model.

The second issue is the number of observations (observed correct classification results) generated from each classification approach. Since the best model of each approach produces a single total correct classification rate for each set of data, comparing that single rate from one approach with that of the other approach is not possible with the ANOVA test. Even if we pool the classification results from all six specialization tracks together, six observations from each approach might not be statistically sufficient. To increase the number of observations and the degrees of freedom, the author decided to use both the correct classification rates of individual groups and the aggregate correct classification rates as the observed results. This inclusion increases the number of observations from 6 to 19 cases, which, in turn, should strengthen the statistical significance of the test results.

Chapter Five
Research Results

This chapter summarizes the outcome of applying the neural networks and the ordered probit model. Its main purpose is to show the actual results objectively, as well as to summarize and address some interesting points of the outcomes. No analysis, interpretation, or opinion from the author will be presented in this chapter.

The chapter is divided into three parts. The first part presents descriptive statistics of the data sets, such as means and standard deviations.
It also shows some important parameters of the ordered probit models, such as coefficients and standard errors. The second part reports the classification performance of both backpropagation and learning vector quantization neural networks. It then shows the results of the performance comparisons between the two network paradigms using the ANOVA test. In the last part, the best performance of each neural network paradigm is selected and compared with that of the ordered probit model. The ANOVA tests of significant difference between the ordered probit model and each neural network are presented as well.

Descriptive Statistics and Ordered Probit Model's Parameters

The following tables show some descriptive statistics of the input (independent) and output (dependent) variables. These figures correspond to the data of the training sets. There are six separate sets of data. Each set belongs to one specialization (Accounting, Finance, or Marketing) with one of two different options of Math courses.

• Accounting specialization with MATH 140 & 141: There are 137 training samples for this track.

| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.29 | 0.5809 | 17 | 20 |
| First-Year GPA | 77.18 | 5.2078 | 66 | 91 |
| ECON 100 | 77.23 | 7.5608 | 61 | 96 |
| ENGL 112 | 71.20 | 6.9323 | 45 | 88 |
| Elective ENGL | 71.72 | 6.5099 | 53 | 90 |
| MATH 140 | 83.44 | 8.4826 | 61 | 99 |
| MATH 141 | 86.26 | 8.4656 | 63 | 99 |
| Average Grade | 75.01 | 7.2440 | 56.83 | 88.14 |

Gender: Number of Male = 52 cases; Number of Female = 85 cases

Table 5.1: Means, standard deviations, and ranges of values of input and output variables for the Accounting with MATH 140 & 141 option

The following table shows the coefficient values of all eight independent variables within the regression equation of the ordered probit model for the Accounting with MATH 140 & 141 option. Only the coefficient of the first-year GPA (GPA1) is significantly different from zero.
It can be argued that the first-year GPA is the only explanatory variable that is significantly related to the dependent variable. The pseudo R² is 0.177, which means that 17.7% of the variation in the predicted output (categorized group) is explained by the variation of the independent variables.

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | -0.3090 | 0.2309 | -1.338 | 0.181 |
| Age | 0.0293 | 0.1760 | 0.167 | 0.868 |
| GPA1 | -0.1141 | 0.0456 | -2.502* | 0.012 |
| ECON100 | -0.0135 | 0.0217 | -0.622 | 0.534 |
| ENGL112 | 0.0002 | 0.0181 | 0.013 | 0.990 |
| ElecENGL | 0.0134 | 0.0176 | 0.764 | 0.445 |
| MATH140 | 0.0064 | 0.0170 | 0.376 | 0.707 |
| MATH141 | -0.0255 | 0.0161 | -1.583 | 0.113 |

Pseudo R² = 0.1770 (* reject H0 at p < 0.05)

Table 5.2: Regression coefficients, standard errors, and z-values for the ordered probit model of Accounting with MATH 140 & 141 option

• Accounting specialization with MATH 100 & 101: There are 60 training samples for this track.

| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.40 | 0.5585 | 17 | 20 |
| First-Year GPA | 78.20 | 5.4704 | 64 | 90 |
| ECON 100 | 79.78 | 7.7396 | 61 | 97 |
| ENGL 112 | 71.22 | 6.4705 | 57 | 83 |
| Elective ENGL | 70.38 | 6.3675 | 55 | 86 |
| MATH 100 | 82.90 | 9.3713 | 58 | 99 |
| MATH 101 | 79.77 | 11.0888 | 50 | 98 |
| Average Grade | 77.04 | 8.3080 | 53.50 | 93.80 |

Gender: Number of Male = 29 cases; Number of Female = 31 cases

Table 5.3: Means, standard deviations, and ranges of values for input and output variables for the Accounting with MATH 100 & 101 option

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | 0.4674 | 0.3666 | 1.275 | 0.202 |
| Age | 0.4597 | 0.3098 | 1.484 | 0.138 |
| GPA1 | -0.0847 | 0.0810 | -1.045 | 0.296 |
| ECON100 | -0.0498 | 0.0368 | -1.353 | 0.176 |
| ENGL112 | -0.0063 | 0.0291 | -0.218 | 0.828 |
| ElecENGL | -0.0338 | 0.0313 | -1.081 | 0.280 |
| MATH100 | -0.0210 | 0.0285 | -0.736 | 0.462 |
| MATH101 | 0.0170 | 0.0237 | 0.715 | 0.475 |

Pseudo R² = 0.2527

Table 5.4: Regression coefficients, standard errors, and z-values for the ordered probit model of Accounting with MATH 100 & 101 option

For this track, none of the coefficients of the independent variables are significantly different from zero. The pseudo R² is 0.2527, which means that 25.27% of the variation in the predicted output is explained by the variation of the independent input variables.

• Finance specialization with MATH 140 & 141: There are 93 training samples for this track.

| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.57 | 0.9136 | 17 | 23 |
| First-Year GPA | 77.98 | 5.5794 | 55 | 89 |
| ECON 100 | 78.34 | 8.0629 | 54 | 95 |
| ENGL 112 | 72.26 | 8.3743 | 52 | 90 |
| Elective ENGL | 72.48 | 7.3892 | 55 | 91 |
| MATH 140 | 83.24 | 10.9648 | 50 | 99 |
| MATH 141 | 86.49 | 9.4451 | 50 | 99 |
| Average Grade | 78.50 | 6.0007 | 63.33 | 92.80 |

Gender: Number of Male = 52 cases; Number of Female = 41 cases

Table 5.5: Means, standard deviations, and ranges of values for input and output variables for the Finance with MATH 140 & 141 option

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | -0.2609 | 0.2851 | -0.915 | 0.360 |
| Age | 0.1400 | 0.1466 | 0.952 | 0.341 |
| GPA1 | -0.0084 | 0.0550 | -0.152 | 0.879 |
| ECON100 | -0.0594 | 0.0258 | -2.299* | 0.021 |
| ENGL112 | 0.0176 | 0.0205 | 0.856 | 0.392 |
| ElecENGL | -0.0130 | 0.0226 | -0.575 | 0.565 |
| MATH140 | 0.0098 | 0.0177 | 0.554 | 0.579 |
| MATH141 | -0.0046 | 0.0210 | -0.217 | 0.828 |

Pseudo R² = 0.1084 (* reject H0 at p < 0.05)

Table 5.6: Regression coefficients, standard errors, and z-values for the ordered probit model of Finance with MATH 140 & 141 option

The coefficient of ECON100 is the only value that is significantly different from zero. Given the pseudo R² of 0.1084, only a small portion of the variation in output is explainable by the variation of the input variables.

• Finance specialization with MATH 100 & 101: There are 52 training samples for this track.
| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.48 | 0.8282 | 17 | 21 |
| First-Year GPA | 78.29 | 4.7665 | 71 | 90 |
| ECON 100 | 80.17 | 7.4326 | 66 | 97 |
| ENGL 112 | 70.87 | 6.1867 | 59 | 84 |
| Elective ENGL | 68.06 | 7.0944 | 50 | 85 |
| MATH 100 | 82.94 | 8.8260 | 63 | 98 |
| MATH 101 | 82.60 | 8.9624 | 64 | 99 |
| Average Grade | 78.78 | 5.7117 | 65.17 | 89.33 |

Gender: Number of Male = 22 cases; Number of Female = 30 cases

Table 5.7: Means, standard deviations, and ranges of values for input and output variables for the Finance with MATH 100 & 101 option

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | 0.4132 | 0.4105 | 1.007 | 0.314 |
| Age | 0.3324 | 0.2486 | 1.337 | 0.181 |
| GPA1 | -0.3039 | 0.1160 | -2.620* | 0.009 |
| ECON100 | 0.0040 | 0.0447 | 0.090 | 0.928 |
| ENGL112 | 0.0016 | 0.0390 | 0.041 | 0.967 |
| ElecENGL | 0.0813 | 0.0380 | 2.139* | 0.032 |
| MATH100 | -0.0253 | 0.0333 | -0.760 | 0.447 |
| MATH101 | 0.0877 | 0.0344 | 2.546* | 0.011 |

Pseudo R² = 0.2871 (* reject H0 at p < 0.05)

Table 5.8: Regression coefficients, standard errors, and z-values for the ordered probit model of Finance with MATH 100 & 101 option

The coefficient of the first-year GPA is, again, a significant factor for this track. Further, the coefficients of both Elective English and MATH 101 are also significantly different from zero. The pseudo R² of 0.2871 is considerably higher than that of the previous Finance track.

• Marketing specialization with MATH 140 & 141: There are 79 training samples for this track.
| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.48 | 0.7656 | 18 | 23 |
| First-Year GPA | 75.11 | 3.9775 | 64 | 86 |
| ECON 100 | 72.15 | 6.8726 | 55 | 89 |
| ENGL 112 | 71.47 | 6.2652 | 57 | 85 |
| Elective ENGL | 72.44 | 7.3480 | 53 | 94 |
| MATH 140 | 78.52 | 9.8928 | 51 | 97 |
| MATH 141 | 81.23 | 9.9999 | 50 | 98 |
| Average Grade | 74.68 | 4.0416 | 60.40 | 86.83 |

Gender: Number of Male = 27 cases; Number of Female = 52 cases

Table 5.9: Means, standard deviations, and ranges of values for input and output variables for the Marketing with MATH 140 & 141 option

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | 0.1503 | 0.4561 | 0.330 | 0.742 |
| Age | 0.2170 | 0.2416 | 0.898 | 0.369 |
| GPA1 | -0.1306 | 0.1012 | -1.290 | 0.197 |
| ECON100 | -0.0549 | 0.0406 | -1.352 | 0.176 |
| ENGL112 | 0.0430 | 0.0428 | 1.006 | 0.314 |
| ElecENGL | -0.0214 | 0.0340 | -0.630 | 0.529 |
| MATH140 | 0.0428 | 0.0249 | 1.715 | 0.086 |
| MATH141 | 0.0109 | 0.0274 | 0.396 | 0.692 |

Pseudo R² = 0.2520

Table 5.10: Regression coefficients, standard errors, and z-values for the ordered probit model of Marketing with MATH 140 & 141 option

For this track, no coefficient is significantly different from zero. The pseudo R² is 0.2520, which is also somewhat low. In other words, only 25.2% of the variation in output is explainable by the variation of the input variables.

• Marketing specialization with MATH 100 & 101: There are only 15 records available as the training samples for this track.
| Variable | Mean | Standard Deviation | Min Value | Max Value |
|---|---|---|---|---|
| Age | 18.38 | 0.6450 | 18 | 20 |
| First-Year GPA | 74.32 | 4.9846 | 66 | 81 |
| ECON 100 | 75.87 | 7.7447 | 63 | 95 |
| ENGL 112 | 70.07 | 4.5586 | 62 | 78 |
| Elective ENGL | 69.80 | 6.5814 | 60 | 81 |
| MATH 100 | 77.60 | 8.2445 | 65 | 96 |
| MATH 101 | 72.33 | 8.5329 | 57 | 85 |
| Average Grade | 74.62 | 3.9423 | 68.20 | 82.80 |

Gender: Number of Male = 8 cases; Number of Female = 7 cases

Table 5.11: Means, standard deviations, and ranges of values for input and output variables for the Marketing with MATH 100 & 101 option

| Independent Variable | Regression Coefficient | Standard Error | z-Value (H0: βi = 0) | Probability Level |
|---|---|---|---|---|
| Gender | 3.5221 | 0.000 | 0.000 | 1.000 |
| Age | -1.7657 | 0.000 | -0.000 | 1.000 |
| GPA1 | 0.0986 | 0.000 | 0.000 | 1.000 |
| ECON100 | -0.1333 | 0.000 | -0.000 | 1.000 |
| ENGL112 | -1.8121 | 0.000 | -0.000 | 1.000 |
| ElecENGL | 0.0320 | 8821857 | 0.000 | 1.000 |
| MATH100 | 0.3278 | 0.000 | 0.000 | 1.000 |
| MATH101 | 0.3766 | 0.000 | 0.000 | 1.000 |

Pseudo R² = 1.0000

Table 5.12: Regression coefficients, standard errors, and z-values for the ordered probit model of Marketing with MATH 100 & 101 option

The ordered probit model has a problem running with the data of this Marketing track. Since there are only 15 observations within the data set, the standard errors are very close to zero, except for that of ElecENGL. None of the coefficients are significantly different from zero. Moreover, this small sample size makes the pseudo R² a perfect 1.00. Due to the small number of both training and validation samples, the ordered probit model could not generate any prediction of output from these samples. There will be no classification and prediction results for this Marketing track. Thus, the performance comparison of the neural networks and the ordered probit model will cover only the first five specialization tracks.
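The z-values and probability levels in the coefficient tables of this part are related to the coefficients and standard errors in the usual way: z is the coefficient divided by its standard error, and the probability level is the two-sided tail area of the standard normal distribution. The following sketch reproduces the Gender row of Table 5.2 under that assumption; the function name is hypothetical, and the text does not state explicitly that Stata's probability levels are two-sided normal tails.

```python
import math


def z_and_probability(coefficient, standard_error):
    """Return the z-value and the two-sided normal probability level."""
    z = coefficient / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Gender row of Table 5.2: coefficient -0.3090, standard error 0.2309.
z_gender, p_gender = z_and_probability(-0.3090, 0.2309)
print(round(z_gender, 3), round(p_gender, 3))
```

The computed values agree with the reported z-value of -1.338 and probability level of 0.181 to within rounding. The same relation explains the Marketing with MATH 100 & 101 results: a z-value of zero corresponds to a probability level of 1.000.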
Classification and Prediction Capabilities between Backpropagation and Learning Vector Quantization

Correct Classification Performance: Each neural network model, with its distinct number of hidden neurons and learning rate, is run five times. The results from those runs, measured in terms of both the number and the percentage of correctly classified cases, are then averaged. The following table reports only the average correct classification rates, in percentages, for each specialization track. The detailed results are given in the tables within appendix I.

Specialization Track D a t a Set Neural Network Paradigm Correct Classification Rate Group 1 Group 3 Group 2 (%) Total 92 50 78 BP 68 Training 41 52 25 Accounting with LVQ 29 52 12 MATH 140 & 141 33 UP •73 ' Validation 41 23 LVQ 25 53 " 52 83 BP 80 91 Training 43 52 15 LVQ 39 Accounting with N/A 63 MATH 100 & 101 64 BP 63 Validation . 37 N/A LVQ 39 35 80 86 10 BP 78 Training 45 41 8 LVQ 51 Finance with 49 MATH 140 & 141 2 59 BP 61 Validation 42 54 ', 13 ' LVQ 40 42 83 BP 85 83 Training 43 37 17 LVQ 51 Finance with • -N/A 49 MATH 100 & 101 BP 65 33 Validation 44- • N/A 42 LVQ 40 93 99 40 BP 50 Training 7 76 4 86 LVQ Marketing with " 81 MATH 140 & 141 . 88 BP , N/A'• 1 Validation 80 .87- • - • N / A LVQ 3 N/A 95 99 BP 69 Training 88 N/A 97 LVQ 31 Marketing with 98 N/A N/A 98 MATH 100 & 101 BP Validation 92 92 N/A .. ', N/A LVQ

Table 5.13: Average correct classification rates, by group and in total, of all developed neural network models

Test of a significant difference in performance: This section reports the results of testing for a significant difference between the performance levels of the two neural network paradigms. The ANOVA test has been performed on both the training and validation sets of each specialization track. There are 21 different classification results for each paradigm within the Accounting and Finance tracks.
There are also 21 different classification results for the backpropagation paradigm within both Marketing tracks. However, due to the extremely unequal group proportions within those two Marketing tracks, the LVQ paradigm could generate only 12 and 15 different correct classification rates for the MATH 140 & 141 and MATH 100 & 101 options, respectively.

The following table shows, for every instance, the mean number of correctly classified cases in total, with the corresponding percentage in parentheses. It should be noted that the gap between the performance levels of the two network paradigms is quite wide on the training set. However, this performance gap is greatly reduced when the validation set is applied. This reduction is mainly due to the decrease in performance of the backpropagation neural network from the training to the validation set. The performance of the LVQ neural network seems to be relatively stable when moving from the training to the validation set.

| Specialization | Training Set: BP | Training Set: LVQ | Validation Set: BP | Validation Set: LVQ |
|---|---|---|---|---|
| Accounting w/ MATH 140 & 141 | 106.44 (78%) | 55.58 (41%) | 9.89 (52%) | 7.80 (41%) |
| Accounting w/ MATH 100 & 101 | 49.62 (83%) | 25.89 (43%) | 3.78 (63%) | 2.23 (37%) |
| Finance w/ MATH 140 & 141 | 74.80 (80%) | 42.23 (45%) | 7.84 (49%) | 6.70 (42%) |
| Finance w/ MATH 100 & 101 | 42.97 (83%) | 22.45 (43%) | 1.96 (49%) | 1.68 (42%) |
| Marketing w/ MATH 140 & 141 | 73.24 (93%) | 60.38 (76%) | 9.71 (81%) | 9.58 (80%) |
| Marketing w/ MATH 100 & 101 | 14.30 (95%) | 13.16 (88%) | 0.98 (98%) | 0.92 (92%) |

Table 5.14: Means of aggregate performance levels, measured in terms of the number of correctly classified cases and the corresponding percentage, of each network paradigm within each specialization track

The following table reports the F-Ratios from the ANOVA tests. The F-Ratio identifies whether the difference between the performance levels of the backpropagation and those of the LVQ is significant. The table also reports at which confidence level the significance exists.
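For reference, the F-Ratio of a one-way ANOVA is the between-groups mean square divided by the within-groups mean square. The sketch below computes it from raw observations. It is a generic textbook calculation, not the statistical package used in the study, and the two sets of classification rates are hypothetical numbers chosen only to keep the arithmetic easy to follow.

```python
def one_way_anova_f(*groups):
    """Return (F-ratio, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical total correct classification rates (%) of two paradigms.
rates_a = [70.0, 75.0, 80.0]
rates_b = [50.0, 55.0, 60.0]
f_ratio, df_between, df_within = one_way_anova_f(rates_a, rates_b)
```

The probability level is then the upper-tail area of the F distribution with (df_between, df_within) degrees of freedom; it is omitted here because the Python standard library has no F distribution.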
In most instances, the difference is significant at both the 0.05 and the 0.01 alpha levels. In one instance, the Finance with MATH 100 & 101 option, the difference between the performance levels on the validation samples is significant at the alpha level of 0.05 but not at the level of 0.01. Further, for both Math options of the Marketing specialization, the differences between the performance levels on the validation samples are not significant at the level of 0.05. All detailed figures corresponding to the ANOVA tests are given in the tables within appendix I.

| Specialization | Math Option | Data Samples | F-Ratio | Probability Level |
|---|---|---|---|---|
| Accounting | MATH 140 & 141 | Training | 634.33 | 0.000* |
| Accounting | MATH 140 & 141 | Validation | 70.15 | 0.000* |
| Accounting | MATH 100 & 101 | Training | 373.45 | 0.000* |
| Accounting | MATH 100 & 101 | Validation | 64.82 | 0.000* |
| Finance | MATH 140 & 141 | Training | 348.38 | 0.000* |
| Finance | MATH 140 & 141 | Validation | 9.87 | 0.003* |
| Finance | MATH 100 & 101 | Training | 429.81 | 0.000* |
| Finance | MATH 100 & 101 | Validation | 4.32 | 0.044* |
| Marketing | MATH 140 & 141 | Training | 133.60 | 0.000* |
| Marketing | MATH 140 & 141 | Validation | 0.22 | 0.640 |
| Marketing | MATH 100 & 101 | Training | 25.38 | 0.000* |
| Marketing | MATH 100 & 101 | Validation | 2.94 | 0.096 |

(* Significant at alpha = 0.05)

Table 5.15: F-Ratios and their probability levels resulting from the ANOVA test of a significant difference between performance levels of the backpropagation paradigm and those of the LVQ paradigm

Classification and Prediction Capabilities between Ordered Probit Model and Neural Networks

The Best Scenario Correct Classification Performance: This section compares the classification performance of the three prediction approaches: ordered probit model, backpropagation, and learning vector quantization. Figures in the following tables are the classification results of the best scenario of each approach. Among the various levels of performance of both neural network paradigms, the highest correct classification rate is selected. It will then be compared with the correct classification rate of the ordered probit model.
Classification results on the training samples are presented in the first of the following tables (table 5.16), while classification results on the validation samples are in the second table (table 5.17).

| Method | Specialization Track | Group 1 | Group 2 | Group 3 | Total |
|---|---|---|---|---|---|
| Ordered Probit Model | Accounting w/ MATH 140 & 141 | 25 (63%) | 62 (84%) | 2 (9%) | 88 (64%) |
| Ordered Probit Model | Accounting w/ MATH 100 & 101 | 16 (67%) | 25 (83%) | 1 (17%) | 42 (70%) |
| Ordered Probit Model | Finance w/ MATH 140 & 141 | 25 (60%) | 38 (78%) | 0 (0%) | 63 (68%) |
| Ordered Probit Model | Finance w/ MATH 100 & 101 | 19 (73%) | 18 (75%) | 0 (0%) | 37 (71%) |
| Ordered Probit Model | Marketing w/ MATH 140 & 141 | 1 (17%) | 70 (100%) | 0 (0%) | 71 (90%) |
| Ordered Probit Model | Marketing w/ MATH 100 & 101 | N/A | N/A | N/A | N/A |
| Backpropagation | Accounting w/ MATH 140 & 141 | 38 (95%) | 74 (100%) | 22 (96%) | 134 (98%) |
| Backpropagation | Accounting w/ MATH 100 & 101 | 24 (100%) | 30 (100%) | 6 (100%) | 60 (100%) |
| Backpropagation | Finance w/ MATH 140 & 141 | 42 (100%) | 49 (100%) | 1 (50%) | 92 (99%) |
| Backpropagation | Finance w/ MATH 100 & 101 | 26 (100%) | 24 (100%) | 2 (100%) | 52 (100%) |
| Backpropagation | Marketing w/ MATH 140 & 141 | 6 (100%) | 70 (100%) | 3 (100%) | 79 (100%) |
| Backpropagation | Marketing w/ MATH 100 & 101 | 2 (100%) | 13 (100%) | N/A | 15 (100%) |
| Learning Vector Quantization | Accounting w/ MATH 140 & 141 | 31 (78%) | 49 (66%) | 1 (4%) | 81 (59%) |
| Learning Vector Quantization | Accounting w/ MATH 100 & 101 | 17 (71%) | 17 (57%) | 1 (17%) | 35 (58%) |
| Learning Vector Quantization | Finance w/ MATH 140 & 141 | 29 (69%) | 28 (57%) | 0 (0%) | 57 (61%) |
| Learning Vector Quantization | Finance w/ MATH 100 & 101 | 19 (73%) | 13 (54%) | 1 (50%) | 33 (64%) |
| Learning Vector Quantization | Marketing w/ MATH 140 & 141 | 1 (17%) | 69 (99%) | 0 (0%) | 70 (89%) |
| Learning Vector Quantization | Marketing w/ MATH 100 & 101 | 2 (100%) | 13 (100%) | N/A | 15 (100%) |

Table 5.16: The correctly classified cases and their corresponding percentages, by group and in total, of the training data set among the three methods (best scenario)

| Method | Specialization Track | Group 1 | Group 2 | Group 3 | Total |
|---|---|---|---|---|---|
| Ordered Probit Model | Accounting w/ MATH 140 & 141 | 1 (25%) | 9 (82%) | 2 (50%) | 12 (63%) |
| Ordered Probit Model | Accounting w/ MATH 100 & 101 | 3 (75%) | 2 (100%) | N/A | 5 (83%) |
| Ordered Probit Model | Finance w/ MATH 140 & 141 | 4 (80%) | 4 (50%) | 0 (0%) | 8 (50%) |
| Ordered Probit Model | Finance w/ MATH 100 & 101 | 1 (50%) | 0 (0%) | N/A | 1 (25%) |
| Ordered Probit Model | Marketing w/ MATH 140 & 141 | 0 (0%) | 11 (100%) | N/A | 11 (92%) |
| Ordered Probit Model | Marketing w/ MATH 100 & 101 | N/A | N/A | N/A | N/A |
| Backpropagation | Accounting w/ MATH 140 & 141 | 2 (50%) | 10 (91%) | 1 (25%) | 13 (68%) |
| Backpropagation | Accounting w/ MATH 100 & 101 | 4 (100%) | 2 (100%) | N/A | 6 (100%) |
| Backpropagation | Finance w/ MATH 140 & 141 | 5 (100%) | 6 (75%) | 0 (0%) | 11 (69%) |
| Backpropagation | Finance w/ MATH 100 & 101 | 2 (100%) | 2 (100%) | N/A | 4 (100%) |
| Backpropagation | Marketing w/ MATH 140 & 141 | 0 (0%) | 11 (100%) | N/A | 11 (92%) |
| Backpropagation | Marketing w/ MATH 100 & 101 | N/A | 1 (100%) | N/A | 1 (100%) |
| Learning Vector Quantization | Accounting w/ MATH 140 & 141 | 3 (75%) | 7 (64%) | 3 (75%) | 13 (68%) |
| Learning Vector Quantization | Accounting w/ MATH 100 & 101 | 4 (100%) | 2 (100%) | N/A | 6 (100%) |
| Learning Vector Quantization | Finance w/ MATH 140 & 141 | 3 (60%) | 7 (88%) | 1 (33%) | 11 (69%) |
| Learning Vector Quantization | Finance w/ MATH 100 & 101 | 2 (100%) | 2 (100%) | N/A | 4 (100%) |
| Learning Vector Quantization | Marketing w/ MATH 140 & 141 | 1 (100%) | 11 (100%) | N/A | 12 (100%) |
| Learning Vector Quantization | Marketing w/ MATH 100 & 101 | N/A | 1 (100%) | N/A | 1 (100%) |

Table 5.17: The correctly classified cases and their corresponding percentages, by group and in total, of the validation data set among the three methods (best scenario)

Table 5.17 shows the numbers and percentages of correctly classified validation samples generated by the three approaches. It can be seen that both backpropagation and LVQ models outperform ordered probit models in almost every instance. Moreover, the aggregate correct classification rates of both backpropagation and LVQ models are higher than two-thirds (66%), while some of the classification rates of ordered probit models are below this level.

Test of a significant difference in performance: This section reports the results of testing for a significant difference between the best performance levels of each neural network paradigm and the single set of performance levels of the ordered probit model. The test has been performed solely on the validation sets, since the author would like to know how well each trained method performs on unseen observations.
The ordered probit model could not predict results for the Marketing with MATH 100 & 101 track; thus, the classification rates produced by the other methods for this track are also taken out of the comparison. Further, since both the correct classification rates in each group and the aggregate correct classification rates are included in the ANOVA test, there are ultimately 17 different correct classification observations from each method. Only the classification results measured in percentages are used as the observations for the comparison. The actual, or head-counted, numbers of correctly classified cases cannot be used for comparison since each number is based on a different sample size. Identifying any significant difference in performance using these head-counted numbers would be incorrect and would mislead the interpretation of results and conclusions.

Table 5.18 shows the means of the 17 observed correct classification percentages of each classification method, as mentioned above. It is quite interesting to see that while the average performance level of the best backpropagation decreases from the training set to the validation set, the average performance level of the best LVQ instead increases. Further, on the training sets, the backpropagation has the highest average performance among the methods, while, on the validation sets, the LVQ has the highest performance. Moreover, it should be noted that the mean figures of both backpropagation and LVQ in table 5.18 differ from those in table 5.14. The mean figures in table 5.18 are calculated from only the classification rates, both by group and aggregate, of the best performing neural network models from the six specialization tracks, not from the classification rates of all developed neural network models.
The reason for this different calculation is stated in chapter 4, regarding the fairness and appropriateness of the performance comparison between each neural network paradigm and the ordered probit model.

| Method | Mean: Training Set | Mean: Validation Set |
|---|---|---|
| Ordered Probit Model | 54.45% | 54.41% |
| Backpropagation (BP) | 96.90% | 74.71% |
| Learning Vector Quantization (LVQ) | 52.15% | 84.24% |

Table 5.18: Means of performance levels, in terms of the correct classification percentage, of each classification method

The following ANOVA tables report the results of the performance comparison between each neural network paradigm and the ordered probit model. The tables report both F-ratios and probability levels. They also specify whether the differences in means are significant at an alpha level of 0.05 or 0.10.

| Source Term | Sum of Squares | df | Mean Square | F-Ratio | Probability Level | Power (Alpha = 0.05) |
|---|---|---|---|---|---|---|
| Between Groups | 3500.735 | 1 | 3500.735 | 2.87 | 0.099796** | 0.376218 |
| Within Groups | 38993.65 | 32 | 1218.552 | | | |
| Total (Adjusted) | 42494.38 | 33 | | | | |

(** Significant at alpha = 0.10)

Table 5.19: Analysis of Variance for testing the significant difference between the mean of correct classification rates of ordered probit model and that of backpropagation

| Source Term | Sum of Squares | df | Mean Square | F-Ratio | Probability Level | Power (Alpha = 0.05) |
|---|---|---|---|---|---|---|
| Between Groups | 7560.265 | 1 | 7560.265 | 9.48 | 0.004248* | 0.847285 |
| Within Groups | 25529.18 | 32 | 797.7867 | | | |
| Total (Adjusted) | 33089.44 | 33 | | | | |

(* Significant at alpha = 0.05)

Table 5.20: Analysis of Variance for testing the significant difference between the mean of correct classification rates of ordered probit model and that of learning vector quantization (LVQ)

The performance comparison between the ordered probit model and backpropagation shows that the difference in their classification performance is significant at a 90% confidence level.
At the same time, the difference between the classification performance of the ordered probit model and that of the LVQ model is significant at the higher confidence level of 95%.

Chapter Six

Analysis and Interpretation of Results

The main focus of this chapter is to answer the questions implied by the objectives of this research, as addressed in the previous chapter. To do so, the results are first thoroughly analyzed, and the findings from the analysis are then interpreted. This chapter also discusses the knowledge gained from the findings, as well as other unexpected but interesting outcomes.

Classification Power of Backpropagation and Learning Vector Quantization

The application of neural networks to classification tasks is still, to some extent, a matter of trial and error. There are no accepted rules or standards stating which combination of parameters and values will generate the optimal neural network model for the task under study. The general practice is to try several models with different configurations and then determine which model produces the best results. Within this study, the author has developed a number of models. Each model has a particular number of hidden neurons, ranging from 4 to 16, and one of three learning rates: 0.1, 0.5, or 0.9. The author implemented all possible configurations, where applicable, for both the backpropagation and LVQ paradigms. The classification results from these models are reported in the previous chapter, as well as in appendix I. In the next two sections, the author discusses some interesting findings concerning the number of hidden neurons and the learning rate. The following table identifies the model configurations that produce the highest correct classification rates.
Specialization Track | Data Set | Neural Network Paradigm | # of Hidden Neurons | Learning Rate | Correct Classification Rate (%): Group 1, Group 2, Group 3, Total

94 86 BP 10 0.1 90 54 Training 33 47 LVQ 6 0.1 38 56 Accounting with MATH 140 & 141 BP 14 0.5 30 85 59p; 15 Validation 25 ' 71 15 ,', 50 LVQ 10 .0.1. 67 95 BP 14 0.1 97 99 Training 54 14 0.1 63 56 7 LVQ Accounting with N/A 80 MATH 100 & 101 0.5 75 90 BP 16 Validation N/A 60 ' 4 . . 0.5 .50 80 LVQ 92 14 93 95 20 BP 0.1 Training 58 48 0 LVQ 16 0.1 71 Finance with MATH 140 & 141 7 12 0.1,0.5 64 75 59 BP . Validation 7 54 LVQ 0.1 60, 68 16 60 97 14 0.1 98 99 BP Training 50 56 46 10 LVQ 16 0.5 Finance with 70 60 N/A 65 3 MATH 100 & 101 BP 12 0.1,0.5 Validation N/A .70 14 0.1 50 90 LVQ 97 77 100 73 BP 16 0.1 Training 95 0 85 16 0.5 7 LVQ Marketing with 92 MATH 140 & 141 100 N/A' BP 14 0.5, 0.9 0 Validation N/A 88 95 LVQ 14 0.1 20 N/A 100 4, 14 100 100 BP 0.1,0.9 Training N/A 97 0.1 80 100 LVQ 16 Marketing with 100 N/A 100 MATH 100 & 101 0.1 -0.9 N/A BP 4- 16 Validation N/A 100 100 LVQ 8- 16.. 0.1 -0.9 "N/A

Table 6.1: Neural network configurations that produce the best total classification results

Optimal Number(s) of Hidden Neurons:

According to table 6.1, most of the best performing backpropagation models have 12, 14, or 16 hidden neurons. Models with 4, 6, or 8 hidden neurons rarely produce the best performance (only in the models for the Marketing with MATH 100 & 101 option). This finding is contrary to those of some previous studies, which argued that the optimal number of hidden neurons would be around 75% of the number of input neurons (Salchenberger et al., 1992; Jain & Nag, 1995; Lenard et al., 1995; Gupta et al., 1997). In this study, by contrast, the number is around 150 to 200% of the number of input neurons.
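The percentages quoted above follow from simple arithmetic. The short sketch below assumes the eight input variables reported for this study; the function name is hypothetical and is used only for illustration.

```python
N_INPUTS = 8  # number of input variables reported for this study

def hidden_to_input_pct(n_hidden, n_inputs=N_INPUTS):
    """Hidden-layer size expressed as a percentage of the input-layer size."""
    return 100.0 * n_hidden / n_inputs

# Size suggested by the 75% rule of thumb cited from earlier studies.
rule_of_thumb_hidden = 0.75 * N_INPUTS
print("75% rule suggests", rule_of_thumb_hidden, "hidden neurons")

# Sizes that most often produced the best backpropagation models here.
for n_hidden in (12, 14, 16):
    print(n_hidden, "hidden neurons =", hidden_to_input_pct(n_hidden), "% of inputs")
```

With eight inputs, the 75% rule corresponds to six hidden neurons, whereas 12, 14, and 16 hidden neurons correspond to 150%, 175%, and 200% of the input count.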
Even though some researchers argue that so many neurons in the hidden layer could jeopardize the generalization ability of the network (Patuwo et al., 1993; Subramanian et al., 1993; Lenard et al., 1995; Zhang & Hu, 1998), this seems not to be the case in this study.

The pattern of the optimal numbers of Kohonen (hidden) neurons of the LVQ networks is quite similar to that of the backpropagation networks on both training and validation sets. The best performing LVQ networks could have any number of hidden neurons, but most of them have 14 or 16 (175% or 200% of the number of input neurons).

The classification performance of the neural network models, both backpropagation and LVQ, within the Marketing with MATH 100 & 101 option is exceptional. Almost every model performs nearly perfectly on the training set and perfectly on the validation set. The main reason for this impressive performance is the rather small number of observations. The training set consists of 15 observations: two classified into the first group and 13 into the second group. Only one observation, classified into the second group, is available for validating the models. It should not be difficult for any neural network model to classify these observations correctly. Even simply assigning every observation to the second group would not reduce the correct classification rate by much.

According to the above findings, the backpropagation neural network tends to produce more accurate predictions when it has a large number of hidden neurons, more than 100% of the number of input neurons. The likely interpretation of this phenomenon is that there are many abstract patterns to be detected within this domain of academic performance. These patterns might result from complex non-linear interactions among the input data items.
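The remark about assigning every observation to the second group can be made concrete with a majority-class baseline. The sketch below is illustrative only: the function name is hypothetical, and the group counts are those stated above for the Marketing with MATH 100 & 101 option.

```python
def majority_class_rate(train_counts, eval_counts):
    """Correct classification rate (%) obtained by always predicting the
    group that is largest in the training set.

    Both arguments are lists of observation counts per group, in the same
    group order.
    """
    majority = max(range(len(train_counts)), key=lambda g: train_counts[g])
    return 100.0 * eval_counts[majority] / sum(eval_counts)

train_counts = [2, 13]   # 15 training observations: group 1, group 2
valid_counts = [0, 1]    # single validation observation, in group 2

train_rate = majority_class_rate(train_counts, train_counts)
valid_rate = majority_class_rate(train_counts, valid_counts)
print(f"training baseline:   {train_rate:.1f}%")
print(f"validation baseline: {valid_rate:.1f}%")
```

A rule that uses no input data at all would therefore classify 13 of the 15 training observations and the single validation observation correctly, which is why the near-perfect rates in this track say little about the models themselves.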
In his tutorial article, Hiotis explained the role of hidden neurons within backpropagation neural networks (Hiotis, 1993). He stated that an individual hidden neuron behaves like a feature or pattern detector by concentrating on and comprehending "particular information containing in the input signals." The distinct combination of these features identifies a possible output signal that is associated with a given input vector. In other words, a particular output signal could be generated by various combinations of possible features of the input data.[1] Since there are eight input variables in this study, the interaction among some or all of them could create a large number of features or patterns, which subsequently influence ultimate academic performance. As mentioned earlier, within this domain, students with similar academic backgrounds and demographics often end up with very different levels of academic achievement in their later studies. To recognize these distinct features effectively, and to predict outcomes accurately, a large number of hidden neurons is therefore necessary for the backpropagation models.

It might not be worth trying to interpret the findings about the optimal numbers of Kohonen neurons. To the author's knowledge, there has been no indication of the impact of too many or too few Kohonen neurons on the ultimate classification and prediction performance of an LVQ neural network. Moreover, Kohonen argued that it is the right number of Kohonen neurons assigned to each class that has a real impact on prediction accuracy (Kohonen, 1995).

[1] It might be helpful for readers to understand the feature detection and interaction aspects by examining the general architecture of neurons and their connected weights within a backpropagation neural network, as shown in chapter 3 or in other materials.
According to the competitive learning algorithm utilized within the MATLAB package, the codebook vectors are initially assigned random values. They are then randomly selected to represent the classes of input data at the beginning of the training phase. The values of the codebook vectors are adjusted during the training process to make the vectors represent the correct classes. Having too many or too few Kohonen neurons should not create any significant difference in prediction. However, since the assignment of particular Kohonen neurons to a particular class cannot be changed, the performance of LVQ networks depends on whether or not the algorithm generates the optimal assignment.

Optimal Learning Rate(s):

It has been suggested that a learning rate that is neither too high nor too low should be used to train a neural network. A high learning rate makes the neural network learn data patterns quickly. At the same time, however, a high learning rate makes the learning process fluctuate greatly and makes it difficult for the neural network to converge. A small rate, on the other hand, enables the neural network to converge at the lowest point of the error curve, but takes longer to do so (Green & Choi, 1997; Demuth & Beale, 1998).

The actual results tend to be consistent with the above assertion. Most of the best performing neural network models use a learning rate of either 0.1 or 0.5. Only two backpropagation models with a learning rate of 0.9 achieved the best performance, and in both cases models in the same track with a learning rate of 0.1 or 0.5 performed equally well. Finally, because of the very small validation set within the Marketing with MATH 100 & 101 option, the various learning rates do not show distinct effects on classification performance in that track.
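The trade-off between fast and stable learning can be illustrated on a toy problem. The sketch below is not one of the networks trained in this study: it assumes a one-dimensional error curve E(w) = w^2, chosen only because its gradient 2w makes the effect of the step size easy to follow, and applies plain gradient descent with the three learning rates used in the experiments.

```python
def gradient_descent_path(learning_rate, w0=1.0, steps=6):
    """Successive weights produced by gradient descent on E(w) = w**2."""
    path = [w0]
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # dE/dw for E(w) = w**2
        w = w - learning_rate * grad    # standard gradient descent update
        path.append(w)
    return path

for rate in (0.1, 0.5, 0.9):
    formatted = ", ".join(f"{w:+.3f}" for w in gradient_descent_path(rate))
    print(f"rate {rate}: {formatted}")
```

On this particular curve, the rate of 0.1 approaches the minimum steadily from one side, while the rate of 0.9 overshoots the minimum at every step, so the weight alternates in sign. Real error surfaces are far less regular, so the rates that behave well here should not be read as recommendations.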
Significant Difference in Correct Classification Performance:

The ANOVA test is used to determine which network paradigm performs better in classifying and predicting academic success. The test identifies whether there is a significant difference between the correct classification results of the two neural network paradigms. Although the author reports both the total and each group's correct classification rates, the comparison of performance focuses only on the total correct classification rates.

Table 5.14 in the previous chapter shows that, in every instance, the average aggregate correct classification rate of the backpropagation models is higher than that of the LVQ models. The ANOVA test confirms the superiority of backpropagation models over LVQ models by showing that, in most instances, the differences between the correct classification rates of the two paradigms are significant at the 95% confidence level. From the test results, we could argue that the backpropagation algorithm is more powerful than the LVQ algorithm in recognizing the patterns of the given data and in predicting the right patterns of unseen data within this domain. However, on the validation samples within both Marketing tracks, the differences between the classification performance levels of the two paradigms are not statistically significant. These exceptional test results are discussed further in the following sections.

On the training data sets, the backpropagation models are able to fit most or all of the possible data patterns, while the LVQ models recognize only some of them. A backpropagation model uses the gradient descent method to adjust its parameters so as to minimize output errors. Once the backpropagation model has reached the global minimum of the error curve, it can effectively recognize all possible data patterns and produce outcomes accordingly.
The only problem with the gradient descent method is that the model could become stuck in one of several local minima. This situation usually occurs when the set of initial random weights is unsuitable. The local minima problem generally prevents the model from learning data patterns and makes it produce quite low correct classification rates. Running the same backpropagation model several times could help find a set of initial weights that does not lead the network into a local minimum (Zahedi, 1993).

An LVQ model adopts the Euclidean distance method in adjusting its weights. The ultimate goal of this method is to make one or more codebook vectors, i.e., sets of connected weights, closely resemble input vectors of a similar pattern. The trained LVQ model has particular sets of interconnected weights and middle neurons that represent each distinct class of similar input data. Within this study, LVQ models seem unable to learn most or all data patterns of the training samples. The main concept behind the LVQ learning algorithm is to cluster data with similar patterns into the same class. Unfortunately, this algorithm cannot work effectively when it encounters complicated data with inconsistent and fluctuating patterns. For example, the data set might consist of several observations that are categorized into the same group but differ from one another in most or all variable dimensions. It is then nearly impossible for the LVQ model first to learn these data patterns in an unsupervised manner and then to be trained in a supervised manner to classify the pre-categorized data into the designated groups.

To further address this issue, the author has run all training sets through the unsupervised part of the LVQ model to see how the training observations are clustered in a two-dimensional plane.
Since there is no direct way to perform this task, the author had to adopt the self-organizing map (SOM) model, which is usually used for clustering data without any supervision. The results from running the SOM neural network with the training sets are illustrated in the figures below.

Figure 6.1: Self-organizing maps of the training data set, separated by categorized groups, for the Accounting with MATH 140 & 141 option

Figure 6.2: Self-organizing maps of the training data set, separated by categorized groups, for the Accounting with MATH 100 & 101 option

Figure 6.3: Self-organizing maps of the training data set, separated by categorized groups, for the Finance with MATH 140 & 141 option

Figure 6.4: Self-organizing maps of the training data set, separated by categorized groups, for the Finance with MATH 100 & 101 option

Figure 6.5: Self-organizing maps of the training data set, separated by categorized groups, for the Marketing with MATH 140 & 141 option

Figure 6.6: Self-organizing maps of the training data set, separated by categorized groups, for the Marketing with MATH 100 & 101 option

It can be seen that, for every Accounting and Marketing track, the observations in the first group cluster in particular areas, which are quite distinct from those occupied by the observations in the third group. The observations in the second group, however, scatter all over the plane, including most areas occupied by either the first or the third group. In the Finance tracks, the observations in both the first and the second groups scatter and occupy most of the plane, including areas taken by the observations in the third group.
This haphazard distribution on the plane makes it difficult for the supervised part of the LVQ models to classify all observations into the right groups. Since the observations in the second group densely populate the plane for all Accounting and Marketing tracks, the LVQ models tend to classify most observations into this group. This makes the correct classification rate of the second group the highest among the groups. A similar situation holds for the Finance tracks, where both the first and the second groups densely populate the plane. The correct classification rates of these two groups are roughly the same, and are much higher than the correct classification rate of the third group.

On the validation samples, the aggregate performance of backpropagation models is significantly higher than that of LVQ models in most, but not all, instances. The exceptions are the two Marketing tracks. There, the backpropagation models perform slightly better than the LVQ models, but their performance levels are not significantly different at the 95% confidence level. A possible explanation is that the proportion of observations in each group is very uneven, as mentioned previously. There is only one validation case, which falls into the second group, for the MATH 100 & 101 option. Within the set of 12 validation cases for the MATH 140 & 141 option, eleven are in the second group. It is thus not difficult for both paradigms, trained on data sets concentrated in the second group, to predict correctly the results for validation observations that are also mostly in the second group. The author believes that the classification and prediction results of these Marketing tracks would be similar to those of the other specialization tracks if more samples of the first and the third groups could be found.
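For readers unfamiliar with how codebook vectors are adjusted, the following is a minimal sketch of the basic LVQ1 update rule: the codebook vector nearest to an input (in Euclidean distance) is moved toward the input when its class matches the input's label, and away from it otherwise. This is a generic illustration under stated assumptions, not the MATLAB routine used in the study; the function names, the toy data, and the initial codebook are all hypothetical.

```python
import math

def nearest(codebook, x):
    """Index of the codebook vector closest to x in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i][0], x))

def lvq1_step(codebook, x, label, lr):
    """Apply one LVQ1 update in place and return the index of the winner.

    codebook is a list of (vector, class) pairs; the class attached to each
    vector stays fixed during training.
    """
    i = nearest(codebook, x)
    vec, cls = codebook[i]
    sign = 1.0 if cls == label else -1.0   # attract if correct, repel if not
    codebook[i] = ([m + sign * lr * (xi - m) for m, xi in zip(vec, x)], cls)
    return i

def train(codebook, data, lr=0.1, epochs=20):
    """Return a trained copy of the codebook; data is a list of (x, label)."""
    trained = list(codebook)
    for _ in range(epochs):
        for x, label in data:
            lvq1_step(trained, x, label, lr)
    return trained

def predict(codebook, x):
    """Class of the codebook vector nearest to x."""
    return codebook[nearest(codebook, x)][1]

# Hypothetical, well-separated toy data: two groups in two dimensions.
toy_data = [([0.0, 0.1], 1), ([0.1, 0.0], 1), ([0.9, 1.0], 2), ([1.0, 0.9], 2)]
initial_codebook = [([0.4, 0.4], 1), ([0.6, 0.6], 2)]
trained_codebook = train(initial_codebook, toy_data)
print(predict(trained_codebook, [0.0, 0.0]), predict(trained_codebook, [1.0, 1.0]))
```

On well-separated data such as this, each codebook vector settles near its own group. When observations of different groups occupy the same region, as the self-organizing maps suggest for the second group, the nearest codebook vector is repeatedly attracted and repelled, which is consistent with the difficulty described above.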
Performance Degradation:

Comparing the aggregate performance of the two network paradigms shows that backpropagation models are far superior to LVQ models in learning the patterns of the training data. On average, most backpropagation models produce correct classification rates of about 80 to 90% in the training phase. Most LVQ models, on the other hand, produce rates of only about 40%, except for the Marketing tracks, where the rates rise to around 80%.

When both neural network paradigms are applied to the validation sets, the gaps between their performance levels become smaller than in the training phase. The average performance of backpropagation models fluctuates considerably across the specialization tracks, ranging from 50% to almost 100%. The average performance of LVQ models fluctuates less, clustering at about 40% for both Accounting and Finance and at 80 to 90% for Marketing.

Considering each paradigm individually, the performance of backpropagation models degrades substantially from the training to the validation sets, while the performance of LVQ models is rather stable. The performance degradation of the backpropagation models could be explained by the problem of overfitting (Shaun, 1995; Demuth & Beale, 1998). The high classification rates of backpropagation models on the training sets suggest overtraining. Not surprisingly, the classification rates drop significantly when the models are run on the validation sets, because of their insufficient generalization capability.

The other possible reason for the performance degradation of backpropagation models is the small size of the validation samples compared to the training samples. The number of validation samples is about 10% of the number of training samples.
The validation set might consist mostly of observations with patterns that are not well recognized and learnt by the trained backpropagation models. Within such a small validation set, misclassifying even one or a few observations could cause a severe drop in the correct classification rate.

It is quite difficult, though, to find a good explanation for the consistent performance of LVQ models from the training to the validation sets. The most viable explanation is that the LVQ models are not, in most instances, overtrained. This should enable the LVQ models to maintain their generalization power when applied to the validation sets. The following figure shows the comparative classification rates of the two paradigms.

Figure 6.7: Bar chart comparing the correct classification rates, measured in percentages, of both neural network paradigms when applied to either the training or the validation data set (series: BP-Training, BP-Validation, LVQ-Training, LVQ-Validation; horizontal axis: specialization track, with the MATH 140 & 141 and MATH 100 & 101 options of Accounting, Finance, and Marketing)

Observations from Descriptive Statistics

Within the set of 426 training records, there are particular data patterns and characteristics that are quite interesting. Considering performance in the first-year courses, Commerce students in all three disciplines tend, on average, to perform well in the quantitative courses (Mathematics and Economics). Their performance in the qualitative (English) courses is somewhat lower. The average first-year GPA, which is currently used as the major admission factor, is 78.96% for all students in the training samples. The average performance of these students in the five core courses is 78.20%, which is nearly identical to their first-year performance.
This finding could support the past argument that the high school or first-year college GPA substantially reflects future academic performance, especially in terms of cumulative GPA or third- and fourth-year GPAs (Fowler & Glorfeld, 1981; Touron, 1983; Shaughnessy & Evans, 1986; Eskew & Faley, 1988).

We should also expect consistent patterns between the input and output data across the three categorized groups. In other words, we should expect students who are classified into the first group to possess better past academic records than their cohorts in the second group, and we should expect the same pattern between the students in the second group and those in the third group. Considering the average grades of the first-year courses, as well as the first-year GPA, the author found that this expectation held for almost every specialization track. There is, however, an exception among the students within the Finance with MATH 140 & 141 option. Within this track, the students classified in the third group have higher average grades in all first-year courses and higher first-year GPAs than their counterparts in the first or second group. This oddity might result from the fact that there are only two samples in this third group, making the average figures irregularly high. The same paradox also occurs for the elective English course within the Finance with MATH 100 & 101 option, the MATH 140 course within the Marketing with MATH 140 & 141 option, and the Economics and MATH 100 courses within the Marketing with MATH 100 & 101 option. The unequal proportion of students in each group seems to be to blame for this paradox, but we cannot firmly argue so, since the paradox does not always appear in other tracks that also have unequal proportions.
Performance Comparison between Neural Networks and Ordered Probit Model

Unlike neural networks, the ordered probit model produces only one set of results for each specialization track. Prior studies comparing the performance of neural networks and traditional classification methods adopted different procedures for arriving at a particular number of observed results for the comparison. The studies by Wilson and Hardgrave, and by Gorr, Nagin, and Szczypula, implemented quite similar procedures in testing and comparing the classification performance of neural networks and traditional methods. Those studies selected only one particular configuration of a backpropagation neural network model at the beginning. The selected model was then run with several sets of re-sampled data. It is, however, clear from the results in the previous section of this study that the best performing neural network of each specialization track has a configuration different from those of the best performers in other tracks. In other words, a neural network with a particular configuration performs optimally on only one or some data sets, not on every data set. Running the same network model on all the different data sets would therefore not produce the best result for each of them, and would level off the overall performance. Further, the researchers in both studies argued that the neural network configurations they selected might not represent the fullest prediction capability. Their conclusions were accordingly restricted to a lower bound on a neural network's potential performance. The above explanation and argument justify the procedures adopted in this study regarding the selection of the best performing neural network for each data set and the comparison between the performance levels of the best performing neural networks and ordered probit models.
All three classification approaches processed both training and validation sets. However, the author considers only their performance on the validation sets. The correct classification rates produced by the individual approaches on the validation samples show how powerful their generalization capabilities are. Performance on the training samples would not reveal much about the differences among the prediction capabilities of the three approaches. Since we can continue training a neural network model until it fits the entire training data set perfectly, neural networks would always outperform the ordered probit model on the training samples.

According to table 5.18 in the previous chapter, the average performance of the best performing backpropagation models is 74.71%, while the average performance of the ordered probit models is 54.41%. The ANOVA test reveals that, although the average performance of backpropagation models is higher than that of ordered probit models, the difference is not significant at the 95% confidence level. However, the corresponding F-ratio of 2.87 is significant at the alpha level of 0.10. We can thus still argue that the backpropagation models significantly outperform the ordered probit models at a confidence level of 90%.

Unlike the backpropagation models, the best LVQ models outperform the ordered probit models at a much higher confidence level. The average performance of the best performing LVQ models is 84.24%, which is also higher than the average of the ordered probit models. The corresponding F-ratio of 9.48, which is significant at the alpha level of 0.05, indicates that the LVQ models strongly and significantly outperform the ordered probit models.

The final remarks on the performance comparisons within this study are twofold. First, it is possible that the developed neural network models of both paradigms are still at low or medium levels of their potential capacities.
As mentioned earlier, there are no standards or rules of thumb that identify which configurations of these neural network paradigms will provide the best results. Although the author has tried developing neural networks with a wide range of configurations, he cannot be sure that the best performing models found are ultimately the best models that can be developed. The best models found in this study might possess capabilities that are close to, or still far from, the greatest capabilities of neural networks for the application under study. Second, the number of validation samples used to test the prediction performance might be too small. This small sample size could increase the chance of misclassification for each method, probably to different degrees. Thus, the results of the performance comparisons within this study might not reflect the ultimate picture of the superiority of one approach over another.

Chapter Seven

Conclusion

This chapter summarizes what has been found in this research and how the findings add to the knowledge of neural network applications. It also discusses some limitations that, to some extent, prevent us from making ultimate statements about neural network applications. All factors and conditions that need to be considered before evaluating and generalizing the results are explained. Finally, further investigation within this area that could lead to future research studies is addressed.

Potential Contributions to Academic Circles

Neural networks have great potential for enhancing decision making in which a decision-maker is dealing with complicated phenomena. Neural networks are superior in recognizing and handling complex data patterns, which usually violate the assumptions generally required by traditional statistical techniques (Wilson & Hardgrave, 1995).
The task of accurately predicting the academic performance of students has been shown to be a difficult one. The ordered probit model utilized in this study could not establish strong relationships between the set of explanatory variables and the predicted variable. All pseudo-R² values produced by the probit models are less than 30%.

The results from running several configurations of backpropagation and LVQ neural networks indicate that the backpropagation paradigm is better than the LVQ paradigm in classifying and predicting academic performance. The author first hypothesized that the LVQ networks would outperform the backpropagation networks in classifying complex data, and could be used as an alternative prediction technique. However, it is evident from this study that, for the task of classifying the data of a given set into the right groups, the backpropagation network is the more suitable technique. The hypothesis is also rejected for the prediction of unseen observations. The performance levels of backpropagation models are still higher than those of LVQ models, although some of them are not significantly different at the 95% confidence level. Since only a few studies have applied the LVQ neural network to classification tasks, the full potential of the LVQ neural network as a predictive model might not have been uncovered yet. Moreover, the small performance gap between backpropagation and LVQ on the validation samples implies that the LVQ approach has promising potential as an alternative for predicting complex data.

Prior studies applying neural networks to predict academic success showed that neural networks did not significantly outperform traditional techniques. The results from this study, however, tell quite a different story.
Both backpropagation and LVQ neural networks have a higher average correct classification rate than the ordered probit model in every specialization track. Although the difference between the performance of backpropagation and the ordered probit model is not significant at the 95% confidence level, we can still conclude, with some reservations, that the backpropagation neural network has significantly higher performance than the ordered probit model at the somewhat lower confidence level of 90%.

The author has also proposed another procedure for testing the performance of neural networks and comparing it with that of the ordered probit model. Instead of first deciding which network configurations would be applied to the different data sets, the author experimented with numerous network configurations on every data set, and later determined which ones performed best. This practice created a much greater chance of finding the best performing neural network model for each individual data set. The author believes that, by comparing the best performing models of one approach with those of the other, we can produce a more accurate and appropriate interpretation of the results.

Limitations and Conditions

Like other research studies, this study has several limitations that need to be addressed before making any ultimate arguments or conclusions. These limitations concern the applicability of the results and findings to other settings (external validity or generalization), the number of samples in the experiments (internal validity), and the neural network configurations (internal validity).

Prior studies of neural network applications indicated that neural networks generally outperform traditional techniques in various tasks within various domains. The results from this study seem to support those findings of the superiority of neural networks over traditional techniques.
As mentioned above, the findings from this study contradict those of other similar studies in the academic success prediction area, and they should therefore be carefully considered before being generalized to other settings. The findings might be applicable to settings where the correlated factors and environments are relatively similar to those of UBC's B.Com. Program. For example, they might be suitable for undergraduate business programs at other institutions in B.C. or in Canada. Moreover, since this study covers only the three specializations of Accounting, Finance, and Marketing, it would not be justifiable to generalize the study's results to other specializations.

Since this study was conducted on rather small sample sizes, especially the sets of validation samples, we might not be able to make conclusive statements about the findings. If we could collect more observations to be used for validation, we could make stronger arguments about the classification and prediction capabilities of those approaches. At this point, what we can conclude is that there is a trend that neural networks are better than the ordered probit model in classifying and predicting complex data. Further, we can state with confidence that backpropagation models are superior to LVQ models in recognizing patterns of a given set of data, thanks to the sufficient numbers of training cases. However, due to the small numbers of validation cases, the conclusion that backpropagation models have greater generalization power than LVQ models should be made with some caution.

It is possible that none of the neural network configurations employed within this study represents the full classification and prediction capabilities of neural networks. Gorr and his colleagues believed that there could be more complex models that would improve the prediction performance within their study (Gorr et al., 1994).
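The search over configurations can be pictured as a small grid. The sketch below is written in Python, which is not the package used in this study. It enumerates the hidden-node counts and learning rates that appear in the result tables of Appendix I (4 to 16 hidden nodes in steps of 2; learning rates 0.1, 0.5, and 0.9) and keeps the best-scoring configuration. The `evaluate` callable and the `toy_evaluate` function are hypothetical stand-ins for training and scoring a network; the scores they produce are not results from this study.

```python
from itertools import product

# Grid values as they appear in the Appendix I result tables.
HIDDEN_NODES = [4, 6, 8, 10, 12, 14, 16]
LEARNING_RATES = [0.1, 0.5, 0.9]


def best_configuration(evaluate):
    """Score every (hidden_nodes, learning_rate) pair and return the best.

    `evaluate` is a hypothetical callable that trains and scores one
    network, returning a correct classification rate between 0 and 1.
    """
    grid = list(product(HIDDEN_NODES, LEARNING_RATES))
    scored = [(config, evaluate(*config)) for config in grid]
    return max(scored, key=lambda item: item[1])


def toy_evaluate(hidden_nodes, learning_rate):
    # Stand-in scoring function for illustration only; NOT real results.
    return 0.5 - abs(hidden_nodes - 12) * 0.01 - abs(learning_rate - 0.5) * 0.1


best_cfg, best_rate = best_configuration(toy_evaluate)
print(best_cfg, best_rate)
```

With three learning rates and seven hidden-node settings, the grid holds 21 configurations per data set. Adding a further parameter multiplies that count, which is one reason exhaustive trial and error becomes expensive quickly.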
In this study, the author varies only two parameters that would influence the performance of neural networks, i.e., the number of hidden nodes and the learning rate. Perhaps there exist other factors, which the author is not aware of or does not focus on, that could improve performance. Moreover, beyond identifying which factors really influence performance, selecting the right values for those factors is also important but, at the same time, hard to achieve. Kohonen argued that achieving the most accurate prediction in any task to which neural network models are applied depends upon several factors, and that appropriate values for them can only be found by trial and error, as well as through extensive experience (Kohonen, 1995). There are no cookbooks that help find the right values of those factors easily.

There are ongoing experiments and research attempting to improve the learning algorithm of each neural network paradigm. The learning algorithms implemented in this study are just the earlier versions available within the neural network application package. Newly improved versions, as well as new related techniques, that could increase the prediction accuracy of neural networks are still to come.

Deliverables to the B.Com. Program

It has been shown, within this study and elsewhere, that neural networks are more powerful than traditional methods. It is thus justifiable to utilize neural networks within the student recruitment process of the B.Com. Program. Neural networks would be an effective and powerful technique for predicting the ultimate academic performance of UBC Commerce students. By applying neural networks to recognize patterns in past academic records in association with some demographic factors, and to predict future academic success, users (the B.Com. Program's committees) need not be greatly concerned about the nature of the data set they are dealing with.
The data set could violate some of the statistical assumptions generally required by traditional statistical techniques, or could contain missing items or noise. However, these data defects do not seem to significantly degrade the classification and prediction power of neural networks.

Given the set of input variables currently used as admission criteria by the B.Com. Program, neural networks can predict academic performance with a rather high degree of accuracy compared to the traditional probit model. These input variables, along with the two additional demographic variables adopted in this study, might not be the most influential explanatory factors for the variation in academic performance. However, neural networks do not seem to have any difficulty in finding the appropriate relationships among them and in producing substantially favorable and accurate results. Fundamentally, these data variables are readily available within UBC's database and are conveniently accessible. There is no need for the B.Com. Program to attempt to acquire other explanatory variables that are more closely related to the variation in academic success. The costs and efforts that would have to be spent in finding the right set of explanatory factors could outweigh the marginal benefits the B.Com. Program might receive from having more accurate results.

After the neural network models have been trained, they can be used to predict the academic performance of students. For each specialization track, the best performing model will be selected. This set of models will then be included in the decision support system developed for the undergraduate program. The B.Com. Program can query this system to predict the likely performance of a particular student in any particular specialization, given the student's past academic and demographic data.
For example, the system will predict how well a student will perform in Accounting and, using the same set of data, how well he or she will do in Marketing.

Suggestion for Future Research

This research study could be repeated in the future when more complete data become available. It would be interesting to see whether the results will be similar to, or share the same trend as, the current results. Another study should also be conducted with the remaining specializations (MIS, Urban Land Economics, Industrial Relations, etc.), again when the number of complete records becomes more substantial. The results from these future studies might strengthen or weaken what the author has found from the first three specializations within this study. This would enable us to make general arguments regarding the prediction of academic performance within the Business Administration discipline using neural networks.

Only directly entered (non-transfer) students who have just finished their first year at UBC are the subjects of this research study. The results from this study should provide some guidelines for conducting further investigations that include second-year and third-year transfer students from other colleges or universities, and mature students with work experience. The admission requirements for these groups of students are quite different from those for first-year UBC applicants (UBC Calendar, 1998). These students might be able to substitute equivalent courses taken at their prior institutions for some of the prerequisite courses required in the first year at UBC. This substitutability could explain why some prerequisite course grades are missing from the incomplete records. We can utilize neural networks to predict the academic performance of these groups of students. However, we would have to consider appropriate and valid ways to deal with the missing values before applying the neural networks.
For example, the missing grades for first-year prerequisite courses could be replaced with the grades of the corresponding substituted courses. Otherwise, different sets of input variables that correspond to the different natures of these students' academic records might have to be implemented.

Other than classifying student academic performance into one of three academic standing groups, backpropagation neural networks can be used to predict performance in terms of actual grades (continuous percentage values). A possible focus of a further study could be the prediction of student performance in the various disciplines of the second-year core Commerce courses, given the input data of past academic records and demographic factors used in this study. In turn, the academic performance of students in those core Commerce courses can be treated as independent variables to predict the average grade of five core specialization courses. It should be very useful to determine how well backpropagation neural networks will perform when the output variable is continuous rather than categorical.

Furthermore, although the data items used in the study are basically from the student database, it would be interesting to investigate other potential explanatory factors currently not available in the database. For example, related training, work experience, extra-curricular activities, or even psychological factors could be relevant indicators of academic success. To collect these data items, a questionnaire could be distributed to students who are about to enter the program.

Finally, other than attempting to predict academic success, the undergraduate program, in cooperation with the career center, can investigate the possibility of using these data to predict the future job prospects of students. The models from this further study could help the B.Com. Program and the career center guide their students on which career path they should pursue.
Students would have a chance to be fully equipped with the skills necessary for their prospective careers before they enter the job market.

Concluding Remarks

This research study was conducted because of two anticipated benefits. The first and foremost benefit is that the study would provide more insight and knowledge about the application of neural networks to the classification and prediction of data within the management area. The author would like to show that, other than traditional techniques, there are newly emerging methods and technologies that effectively enhance the managerial decision making process.

The second benefit is that the findings and results from this study would be useful for the development of a decision support system for the admissions process. This support system can help the B.Com. Program in its student recruitment efforts by predicting the future academic performance of individual applicants. Having this information, the B.Com. Program can determine whether it should accept particular applicants. At the same time, the system would also help the B.Com. Program to better advise entering students on which specialization might be the most suitable for them.

There are still many aspects of neural networks and their applications that are not fully understood. The current findings, explanations, and evaluations of the characteristics and capabilities of neural networks might prove wrong in the future when more insights about them are discovered. For example, despite being recognized for their satisfactory performance in classification and prediction, backpropagation neural networks have been consistently questioned by some researchers in terms of their traditional learning algorithms (Wang, 1995; Curry & Morgan, 1997). These researchers believe that the gradient descent learning algorithms have some weaknesses and might not be able to produce optimal performance.
They have suggested new methods to improve the learning algorithms, which, in turn, should increase their performance. From this example, we can see that new effective techniques and methods regarding neural networks and their applications are continually being unveiled. Consequently, even at this point, it might not be easy for anyone to draw ultimate conclusions about particular aspects of neural networks, given the current knowledge we have about them.

Bibliography

Aleksander, Igor. Neural Computing Architectures: The Design of Brain-Like Machines. London, UK: North Oxford Academic Publishers Ltd., 1989.
Alspaugh, Carol Ann. "Identification of Some Components of Computer Programming Aptitude," Journal for Research in Mathematics Education. 3(2) (Mar 1972): 89-98.
Arlin, Marshall. Analysis of Variance for Educational Research. Unpublished Manuscript, Vancouver, BC: Faculty of Education, University of British Columbia, 1997.
Bansal, Arun, Robert J. Kauffman, and Rob R. Weitz. "Comparing the Modeling Performance of Regression and Neural Networks as Data Quality Varies: A Business Value Approach," Journal of Management Information Systems. 10(1) (Summer 1993): 11-32.
Brancheau, James C. "Completing Your Masters Thesis in Information Systems: Guidelines and Suggestions," (1995-97), http://www.colorado.edu/infs/icb/isthesis.html.
Burke, Laura Ignizio. "Introduction to Artificial Neural Systems for Pattern Recognition," Computers and Operations Research. 18(2) (1991): 211-220.
Butcher, D. F., and W. A. Muth. "Predicting Performance in an Introductory Computer Science Course," Communications of the ACM. 28(3) (Mar 1985): 263-268.
Campbell, Patricia F., and George P. McCabe. "Predicting the Success of Freshmen in a Computer Science Major," Communications of the ACM. 27(11) (Nov 1984): 1108-1113.
Caudill, Maureen. "Using Neural Nets: Representing Knowledge," AI Expert. 4(12) (Dec 1989): 34-41.
Caudill, Maureen. "Neural Network Training Tips and Techniques," AI Expert. 6(1) (Jan 1991): 56-62.
Chen, S. K., P. Mangiameli, and D. West. "The Comparative Ability of Self-Organizing Neural Networks to Define Cluster Structure," Omega, International Journal of Management Science. 23(3) (1995): 271-279.
Cohen, Jacob, and Patricia Cohen. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1983.
Cronan, Timothy P., Phillip R. Embry, and Steven D. White. "Identifying Factors that Influence Performance of Non-Computing Majors in the Business Computer Information Systems Course," Journal of Research on Computing in Education. 21(4) (Summer 1989): 431-443.
Curry, B., and P. Morgan. "Neural Networks: A Need for Caution," Omega, International Journal of Management Science. 25(1) (1997): 123-133.
Daniel, Wayne W., and James C. Terrell. Business Statistics: For Management and Economics. Boston, MA: Houghton Mifflin Company, 1992.
Davis, Gordon Bitter. Writing the Doctoral Dissertation: A Systematic Approach. Hauppauge, NY: Barron's, 1997.
Demuth, Howard, and Mark Beale. Neural Network Toolbox User's Guide. Natick, MA: The MathWorks, 1998.
De Wilde, Philippe. Neural Network Models: Theory and Projects. London, UK: Springer-Verlag, 1997.
Domer, D. E., and A. E. Johnson, Jr. "Selective Admissions and Academic Success: An Admissions Model for Architecture Students," College and University. 58(1) (Fall 1982): 19-30.
Doumas, Anastasia, Konstantinos Mavroudakis, Dimitris Gritzalis, and Sokratis Katsikas. "Design of a Neural Network for Recognition and Classification of Computer Viruses," Computers and Security. 14(5) (1995): 435-448.
Eskew, Robert K., and Robert H. Faley. "Some Determinants of Student Performance in the First College-Level Financial Accounting Course," The Accounting Review. 63(1) (January 1988): 137-147.
Evans, Gerald E., and Mark G. Simkin. "What Best Predicts Computer Proficiency?," Communications of the ACM. 32(11) (Nov 1989): 1322.
Fowler, George C., and Louis W. Glorfeld. "Predicting Aptitude in Introductory Computing: A Classification Model," AEDS Journal. 14(2) (Winter 1981): 96-109.
Fox, John. Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage, 1997.
Gorr, Wilpen L., Daniel Nagin, and Janusz Szczypula. "Comparative Study of Artificial Neural Network and Statistical Models for Predicting Student Grade Point Averages," International Journal of Forecasting. 10 (1994): 17-34.
Gramet, Pamela, and Lorraine Terracina. "Qualitative and Quantitative Variables in a Selective Admissions Process," College and University. 63(4) (Summer 1988): 368-373.
Green, Brian Patrick, and Jae Hwa Choi. "Assessing the Risk of Management Fraud through Neural Network Technology," Auditing: A Journal of Practice & Theory. 16(1) (Spring 1997): 14-28.
Greene, William H. Econometric Analysis. Upper Saddle River, NJ: Prentice Hall, 1997.
Gupta, V. K., J. G. Chen, and M. B. Murtaza. "A Learning Vector Quantization Neural Network Model for the Classification of Industrial Construction Projects," Omega, International Journal of Management Science. 25(6) (1997): 715-727.
Gurney, Kevin. An Introduction to Neural Networks. London, UK: UCL Press, 1997.
Hart, Anna. "Using Neural Networks for Classification Tasks - Some Experiments on Datasets and Practical Advice," Journal of the Operational Research Society. 43(3) (1992): 215-226.
Hiotis, Andre. "Inside a Self-Organizing Map: Two-Dimensional Map for Experimenting with Neural-Network Paradigm," AI Expert. 8(4) (Apr 1993): 38-41.
Ho, David Y. F., and John A. Spinks. "Multivariate Prediction of Academic Performance by Hong Kong University Students," Contemporary Educational Psychology. 10 (1985): 249-259.
Huang, Zezhen, and Anthony Kuh. "A Combined Self-Organizing Feature Map and Multilayer Perceptron for Isolated Word Recognition," IEEE Transactions on Signal Processing. 40(11) (Nov 1992): 2651-2657.
Hruschka, Harald. "Determining Market Response Functions by Neural Network Modeling: A Comparison to Econometric Techniques," European Journal of Operational Research. 66 (1993): 27-35.
Jain, Bharat A., and Barin N. Nag. "Artificial Neural Network Models for Pricing Initial Public Offerings," Decision Sciences. 26(3) (1995): 283-299.
Jain, Bharat A., and Barin N. Nag. "Performance Evaluation of Neural Network Decision Models," Journal of Management Information Systems. 14(2) (Fall 1997): 201-216.
Kiang, Melody Y., Uday R. Kulkarni, and Kar Yan Tam. "Self-Organizing Map Network as an Interactive Clustering Tool - An Application to Group Technology," Decision Support Systems. 15(4) (1995): 351-374.
Knight, Kevin. "Connectionist Ideas and Algorithms," Communications of the ACM. 33(11) (Nov 1990): 59-74.
Kohonen, Teuvo. "The Self-Organizing Map," Proceedings of the IEEE. 78(9) (Sep 1990): 1464-1480.
Kohonen, Teuvo. Self-Organizing Maps. Berlin Heidelberg: Springer-Verlag, 1995.
Konvalina, John, Larry Stephens, and Stanley Wileman. "Identifying Factors Influencing Computer Science Aptitude and Achievement," AEDS Journal. 16(2) (Winter 1983): 106-112.
Lenard, Mary Jane, Pervaiz Alam, and Gregory R. Madey. "The Application of Neural Networks and a Qualitative Response Model to the Auditor's Going Concern Uncertainty Decision," Decision Sciences. 26(2) (1995): 209-226.
Li, Eldon Y. "Artificial Neural Networks and their Business Applications," Information and Management. 27 (1994): 303-313.
Lippmann, Richard P. "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine. 4(3) (Apr 1987): 4-22.
Markham, Ina S., and Cliff T. Ragsdale. "Combining Neural Networks and Statistical Predictions to Solve the Classification Problem in Discriminant Analysis," Decision Sciences. 26(2) (1995): 229-241.
Mazanec, Josef A. "Positioning Analysis with Self-Organizing Maps," Cornell Hotel and Restaurant Administration Quarterly. 36(6) (Dec 1995): 80-97.
Mazlack, Lawrence J. "Identifying Potential to Acquire Programming Skill," Communications of the ACM. 23(1) (Jan 1980): 14-17.
Mohammad, Yousuf H. J., and Mohammad A. H. Almahmeed. "An Evaluation of Traditional Admission Standards in Predicting Kuwait University Students' Academic Performance," Higher Education. 17(2) (1988): 203-217.
Nisbet, Janice, Virgil E. Ruble, and K. Terry Schurr. "Predictors of Academic Success with High Risk College Students," Journal of College Student Personnel. 23(3) (May 1982): 227-235.
Oman, Paul W. "Identifying Student Characteristics Influencing Success in Introductory Computer Science Courses," AEDS Journal. 19(2) (Winter/Spring 1986): 226-233.
Orwig, Richard E., Hsinchun Chen, and Jay F. Nunamaker, Jr. "A Graphical, Self-Organizing Approach to Classifying Electronic Meeting Output," Journal of the American Society for Information Science. 48(2) (Feb 1997): 157-170.
Pascarella, Ernest T., Paul B. Duby, Vernon A. Miller, and Sue P. Rasher. "Preenrollment Variables and Academic Performance as Predictors of Freshman Year Persistence, Early Withdrawal, and Stopout Behavior in an Urban, Nonresidential University," Research in Higher Education. 15(4) (1981): 329-349.
Patuwo, Eddy, Michael Y. Hu, and Ming S. Hung. "Two-Group Classification Using Neural Networks," Decision Sciences. 24(4) (1993): 825-845.
Rao, Valluru B., and Hayagriva V. Rao. C++ Neural Networks and Fuzzy Logic. New York, NY: MIS:Press, 1995.
Ritter, H., and T. Kohonen. "Self-Organizing Semantic Maps," Biological Cybernetics. 61 (1989): 241-254.
Ritter, Helge, Thomas Martinetz, and Klaus Schulten. Neural Computation and Self-Organizing Maps. Don Mills, ON: Addison-Wesley, 1992.
Rogers, Joey. Object-Oriented Neural Networks in C++. Chestnut Hill, MA: Academic Press, Inc., 1997.
Rumelhart, David E., Bernard Widrow, and Michael A. Lehr. "The Basic Ideas in Neural Networks," Communications of the ACM. 37(3) (Mar 1994): 87-92.
Salchenberger, Linda M., E. Mine Cinar, and Nicholas A. Lash. "Neural Networks: A New Tool for Predicting Thrift Failures," Decision Sciences. 23(4) (1992): 899-916.
Sexton, Randall S., Robert E. Dorsey, and John D. Johnson. "Toward Global Optimization of Neural Networks: A Comparison of the Genetic Algorithm and Backpropagation," Decision Support Systems. 22 (1998): 171-185.
Shanker, M., Michael Y. Hu, and Ming S. Hung. "Effect of Data Standardization on Neural Network Training," Omega, International Journal of Management Science. 24(4) (1996): 385-397.
Shaughnessy, Michael F., and Robert Evans. "Word/World Knowledge: Prediction of College GPA," Psychological Reports. 59 (1986): 1147-1150.
Stanbury, W. T., Dan Gardiner, S. W. Hamilton, Erica Mills, Craig Pinder, Bernhard Schwab, Dan Simunic, James Kwong, and James Nevison. Interim Report of the Faculty of Commerce Undergraduate Program Review Committee. Unpublished Manuscript, Vancouver, BC: Faculty of Commerce and Business Administration, University of British Columbia, 1998.
StataCorp. Stata Reference Manual: Release 5 Volume 2. College Station, TX: Stata Press, 1997.
Subramanian, Venkat, Ming S. Hung, and Michael Y. Hu. "An Experimental Evaluation of Neural Networks for Classification," Computers and Operations Research. 20(7) (1993): 769-782.
Tam, Kar Yan, and Melody Y. Kiang. "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions," Management Science. 38(7) (Jul 1992): 926-947.
Tracey, Terence J., William E. Sedlacek, and Russell D. Miars. "Applying Ridge Regression to Admissions Data by Race and Sex," College and University. 58(3) (Spring 1983): 313-317.
Touron, Javier. "The Determination of Factors Related to Academic Achievement in the University: Implications for the Selection and Counselling of Students," Higher Education. 12 (1983): 399-410.
Venugopal, V., and W. Baets. "Neural Networks and Statistical Techniques in Marketing Research: A Conceptual Comparison," Marketing Intelligence and Planning. 12(7) (1994): 30-38.
Wang, DeLiang. "Pattern Recognition: Neural Networks in Perspective," IEEE Expert. 8 (August 1993): 52-60.
Wang, Shouhong. "The Unpredictability of Standard Back-Propagation Neural Networks in Classification Applications," Management Science. 41(3) (1995): 555-559.
Wang, Shouhong. "An Insight into the Standard Back-Propagation Neural Network Model for Regression Analysis," Omega, International Journal of Management Science. 26(1) (1998): 133-140.
Wasserman, Philip D. Neural Computing: Theory and Practice. New York, NY: Van Nostrand Reinhold, 1989.
Wilson, Rick L., and Bill C. Hardgrave. "Predicting Graduate Student Success in an MBA Program: Regression versus Classification," Educational and Psychological Measurement. 55(2) (Apr 1995): 186-195.
Wong, Bo K., Thomas A. Bodnovich, and Yakup Selvi. "Neural Network Applications in Business: A Review and Analysis of the Literature (1988-95)," Decision Support Systems. 19 (1997): 301-320.
Yale, Karia. "Preparing the Right Data Diet for Training Neural Networks," IEEE Spectrum. 34(3) (Mar 1997): 64-66.
Yoon, Youngohc, George Swales, Jr., and Thomas M. Margavio. "A Comparison of Discriminant Analysis versus Artificial Neural Networks," Journal of the Operational Research Society. 44(1) (1993): 51-60.
Young, Abimbola S. "Pre-enrollment Factors and Academic Performance of First-Year Science Students at a Nigerian University: A Multivariate Analysis," Higher Education. 18 (1989): 321-339.
Young, John W. "Differential Prediction of College Grades by Gender and by Ethnicity: A Replication Study," Educational and Psychological Measurement. 54(4) (Winter 1994): 1022-1029.
Zahedi, Fatemeh. Intelligent Systems for Business: Expert Systems with Neural Networks. Belmont, CA: Wadsworth, 1993.
Zhang, Gioqinang, and Michael Y. Hu. "Neural Network Forecasting of the British Pound/US Dollar Exchange Rate," Omega, International Journal of Management Science. 26(4) (1998): 495-506.
, Shaun, "Neural Networks," (1995), http://www.soton.ac.uk/~sni/neuralnet.html.
The University of British Columbia: 1998/99 Calendar. Vancouver, BC: University of British Columbia, 1998.

Appendix I

Tables of Results in Detail

This appendix presents the research results in detail. It has three sections. The first section consists of tables giving the mean and standard deviation of each input variable for each categorized group. The second section is composed of tables reporting the correct classification rates produced by all configurations of both neural network paradigms. The final section provides the detailed figures for the ANOVA tests of the differences between the classification and prediction performance of the backpropagation models and that of the LVQ models.

The following tables, Table 1.1 to Table 1.6, show the means and standard deviations of the input variables, separated by categorized group. There are a total of 6 tables in this section.
Group 3 Group 1 Group 2 (50 - 67 %) (80 -100 %) (68 - 79 %) Variable N = 23 N = 40 N = 74 Mean SD Mean SD Mean SD 18.21 0.5184 Age 18.25 0.5884 18.32 0.5993 4.4363 73.78 3.8489 First-Year GPA 81.45 4.5739 75.93 5.4657 ECON 100 81.65 6.8931 76.58 7.1539 71.65 6.7444 70.18 6.9650 70.39 6.4437 ENGL 112 73.52 7.4994 71.17 Selective ENGL 73.33 6.2936 71.03 6.2329 8.0534 7.0924 82.04 8.4860 80.30 MATH 140 87.83 7.4994 MATH 141 92.23 6.3468 83.81 6.2329 83.78 0.304 0.4883 0.4705 Gender 0.425 0.5006 0.378 Table 1.1: Means and standard deviations of input variables within each categorized group, for the Accounting with Math 140 & 141 option Group 3 Group 1 Group 2 (50 - 67 %) (80 -100 %) (68 - 79 %) Variable N =6 N = 24 N = 30 SD SD Mean Mean SD Mean 0.4083 0.4661 18.83 18.41 0.6539 18.30 Age 3.5024 4.8162 72.33 4.5675 76.67 First-Year GPA 81.58 4.0208 72.17 ECON 100 84.08 7.0890 77.87 6.9368 6.0470 6.6132 69.17 ENGL 112 72.38 6.4391 70.70 67.00 2.6833 6.5120 69.50 6.4260 Selective ENGL 72.33 9.9683 8.5167 74.83 MATH 100 86.67 8.8252 81.50 8.9536 10.8810 69.83 MATH 101 10.4082 78.87 83.38 0.500 0.5477 0.600 0.4983 Gender 0.333 0.4815 Table 1.2: Means and standard deviations of input variables within each categorized group, for the Accounting with Math 100 & 101 option 132 Variable Group 1 (80 -100 %) N = 42 Mean SD Group 2 (68 - 79 %) N = 49 Mean SD Group 3 (50 - 67 %) N =2 Mean SD 0.7392 18.69 18.00 0.0000 Age 18.45 1.0449 1.4142 First-Year GPA 79.86 4.4314 76.20 5.9895 82.00 ECON 100 81.81 7.1984 75.37 7.7800 78.50 3.5355 ENGL 112 72.55 7.5327 71.98 9.1503 73.00 9.8995 75.00 1.4142 Selective ENGL 73.86 7.4263 71.20 7.3427 2.8284 MATH 140 81.55 11.6709 90.50 84.88 10.0855 MATH 141 9.5241 90.50 3.5355 88.05 9.3677 85.00 Gender 0.551 0.000 0.0000 0.595 0.4968 0.5025 Table 1.3: Means and standard deviations of input variables within each categorized group, for the Finance with Math 140 & 141 option Variable Group 1 (80 -100 %) N = 26 Mean SD Group 2 (68 - 79 %) N = 24 Mean SD 
Group 3 (50 - 67 %) N =2 Mean SD 0.0000 18.50 0.9055 18.50 0.7802 18.00 Age 72.50 0.7071 First-Year GPA 80.35 5.1064 76.54 3.3360 0.0000 77.83 5.9247 75.00 ECON 100 82.73 8.1366 4.9498 71.42 6.1560 66.50 ENGL 112 6.3509 70.63 72.00 4.2426 68.62 7.0941 67.13 7.3147 Selective ENGL 7.3752 81.50 9.5871 72.50 10.6066 MATH 100 85.08 3.5355 MATH 101 83.54 9.9488 81.92 8.1876 78.50 0.0000 0.4852 0.5090 1.000 Gender 0.346 0.45.8 Table 1.4: Means and standard deviations of input variables within each categorized group, for the Finance with Math 100 & 101 option Variable Group 1 (80 -100 %) N =6 Mean SD Group 2 (68 - 79 %) N = 70 Mean SD Group 3 (50 - 67 %) N =3 SD Mean 1.0000 0.5164 18.47 0.7750 19.00 Age 18.33 0.5774 3.5024 3.9619 73.33 First-Year GPA 78.67 74.89 7.7675 71.97 6.5739 64.33 ECON 100 78.17 5.9805 2.0817 6.4194 69.33 ENGL 112 72.67 6.1210 71.46 1.1547 72.26 7.0929 67.33 Selective ENGL 77.17 10.2258 9.5762 83.00 6.2450 MATH 140 71.67 13.2313 78.91 9.6437 81.00 MATH 141 82.67 7.1181 81.11 10.3189 0.5774 0.5164 0.4781 0.333 Gender 0.333 0.343 Table 1.5: Means and standard deviations of input variables within each categorized group, for the Marketing with Math 140 & 141 option 133 Group 2 Group 3 (68 - 79 % ) (50 - 67 % ) Group 1 (80 - 1 0 0 Variable %) N = 2 Mean N = 0 N = 13 SD Mean SD Mean SD 18.32 0.5872 N/A N/A 18.80 1.1314 Age 5.2705 N/A N/A First-Year GPA 76.05 2.7577 74.05 N/A 76.54 8.0789 N/A ECON 100 71.50 3.5355 3.4844 N/A N/A ENGL 112 78.00 0.0000 68.85 N/A N/A Selective ENGL 77.00 1.1412 68.69 6.3559 8.5327 N/A N/A MATH 100 76.00 8.4853 77.85 N/A N/A 8.5230 MATH 101 73.50 12.0208 72.15 0.5064 N/A N/A 0.615 Gender 0.000 0.0000 Table 1.6: Means and standard deviations of input variables within each categorized group, for the Marketing with Math 100 & 101 option The following tables (table 1.7 to table 1.30) report the correct classification rates of all possible neural network configurations. 
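As a reading aid for the classification tables, the percentages can be reproduced from the counts with a calculation like the sketch below. The counts are taken from Table 1.8 (backpropagation, 4 hidden nodes, learning rate 0.1). The group sizes of 4, 11, and 4 are not stated in the text; they are an assumption inferred from the reported percentages. The fractional counts suggest averages over repeated runs, which is likewise an assumption.

```python
def classification_rates(correct, group_sizes):
    """Return per-group and overall correct classification rates in percent."""
    rates = {g: 100.0 * correct[g] / group_sizes[g] for g in group_sizes}
    rates["Total"] = 100.0 * sum(correct.values()) / sum(group_sizes.values())
    return rates


# Counts of correctly classified cases, from Table 1.8
# (backpropagation, 4 hidden nodes, learning rate 0.1).
correct = {"G1": 1.40, "G2": 7.60, "G3": 0.00}

# ASSUMPTION: validation group sizes inferred from the reported
# percentages; they are not given explicitly in the text.
group_sizes = {"G1": 4, "G2": 11, "G3": 4}

for group, rate in classification_rates(correct, group_sizes).items():
    print(f"{group}: {rate:.0f}%")
```

Under these assumed group sizes, the rounded rates agree with the 35%, 69%, 0%, and 47% reported for that cell of Table 1.8.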
In each cell, the top figure is the number of correctly classified cases and the bottom figure is the corresponding percentage of correct classification. The tables report both aggregate performance and performance in each group. There are a total of 24 tables in this section.

Learning Rate Hidden Nodes G1 G2 0.9 0.5 0.1 G3 Total G1 G2 G3 Total G1 G2 G3 Total
24.40 63.80 11.20 99.40 24.40 63.80 11.20 99.40 24.60 67.00 12.20 103.80 4 76% 73% 62% 91% 53% 73% 61% 86% 49% 61% 86% 49%
23.40 61.40 10.80 95.60 26.20 69.40 10.40 106.00 31.60 72.00 13.80 117.40 6 86% 77% 79% 97% 60% 70% 66% 94% 45% 59% 83% 47%
12.00 112.80 24.60 67.80 13.00 105.40 31.80 68.80 15.00 115.60 28.40 72.40 8 82% 84% 71% 98% 52% 77% 80% 93% 65% 62% 92% 57%
35.80 69.60 12.40 117.80 36.00 56.20 13.00 105.20 31.20 67.80 12.40 111.40 10 81% 78% 92% 54% 77% 86% 90% 76% 57% 90% 94% 54%
28.80 68.80 13.40 111.00 27.00 71.00 11.60 109.60 26.60 69.40 12.40 108.40 12 79% 80% 67% 94% 54% 72% 93% 58% 81% 68% 96% 50%
21.80 70.40 9.60 101.80 21.60 73.00 10.20 104.80 23.20 63.60 7.80 94.60 14 69% 76% 58% 86% 34% 74% 54% 99% 44% 55% 95% 42%
29.00 71.40 11.00 111.40 18.40 64.60 4.40 87.40 29.60 72.00 14.80 116.40 16 85% 74% 97% 64% 73% 96% 48% 81% 46% 87% 19% 64%
Table 1.7: Classification and prediction performance of the backpropagation models on the training data set of the Accounting with Math 140 & 141 option

Learning Rate Hidden Nodes 0.1 G1 G2 G3 Total G1 G2 0.5 G3 0.9 Total G1 G2 G3 Total
1.40 7.60 0.00 9.00 1.40 7.60
0:00 9.00 1.40 7.20 1.00 9.60 4 35% 69% 0% 47% 35% 69% 0% 47% 35% 65% 25% 51% 1.40 7.60 0.80 9.80 1.60 7.60 0.60 9.80 1.40 8.00 0.00 9.40 6 35% 69% 20% 52% 40% 69% 15% 52% 35% 73% 0% 49% 1.00 9.00 0.60 10.60 1.40 8.00 0.80 10.20 1.00 8.00 0.60 9.60 8 25% 82% 15% 56% 35% 73% 20% 54% 25% 73% 15% 51% 2:00 7.40 0.60 10.00 1.80 7.40 0.60 9.80 2.00 7.20 0.20 9.40 10 50% 67% 15% 53% 45% 67% 15% 52% 50% 65% 5% 49% 1.00 7.20 0.80 9.00 1.00 9.00 1.00 11.00 1.40 8.60 0.40 10.40 12 25% 65% 20% 47% 25% 82% 25% 58% 35% 78% 10% 55% 1.00 8.80 0.40 10.20 1.20 9.40 0.60 11.20 1.20 7.60 0.20 9.00 14 25% 80% 10% 54% 30% 85% 15% 59% 30% 69% 5% 47% 1.20 8.80 0.60 10.60 1.20 8.60 0.00 9.80 0.80 9.00 0.40 10.20 16 30% 80% 15% 56% 30% 78% 0% 52% 20% 82% 10% 54% Table 1.8: Classification and prediction performance of the backpropagation models on the validation data set of the Accounting with Math 140 & 141 option Learning Rate Hidden Nodes Gl G2 0 .1 G3 Total Gl G2 0 .5 G3 Total Gl G2 0.9 G3 Total 17.60 36.40 8.00 62.00 13.40 36.20 6.80 56.40 11.40 36.40 5.80 53.60 4 44% 49% 35% 45% 34% 49% 30% 41% 29% 49% 25% 39% 15.20 41.40 7.60 64.20 10.00 34.00 9.80 53.80 11.40 38.40 7.40 57.20 6 38% 56% 33% 47% 25% 46% 43% 39% 29% 52% 32% 42% 14.60 37.00 6.80 58.40 6.20 35.60 5.80 47.60 14.80 30.40 4.60 49.80 8 37% 50% 30% 43% 16% 48% 25% 35% 37% 41% 20% 36% 8.60 51.60 3.80 64.00 10.00 40.40 5.00 55.40 4.80 48.20 4.80 57.80 10 22% 70% 17% 47% 25% 55% 22% 40% 12% 65% 21% 42% 15.20 37.60 5.40 58.20 9.80 37.00 4.00 50.80 10.40 34.80 10.00 55.20 12 38% 51% 23% 42% 25% 50% 17% 37% 26% 47% 43% 40% 13.00 40.20 2.00 55.20 10.40 40.20 6.00 56.60 14.40 33.20 5.00 52.60 14 33% 54% 9% 40% 26% 54% 26% 41% 36% 45% 22% 38% 11.60 42.00 2.20 55.80 8.20 35.80 4.20 48.20 11.80 36.40 6.20 . 
54.40 16 29% 57% 10% 41% 21% 48% 18% 35% 30% 49% 27% 40% Table 1.9: Classification and prediction performance of the learning vector quantization models on the training data set of the Accounting with Math 140 & 141 option 135 Hidden Learning Rate 0 .5 0.9 G2 G3 T o t a l G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l G l 1.40 5.80 0.40 7.60 1.20 5.40 1.20 7.80 1.60 5.20 0.60 7.40 4 35% 53% 10% 40% 30% 49% 30% 41% 40% 47% 15% 39% 0.80 4.80 1.60 7.20 1.20 5.60 1.20 8.00 1.20 4.60 1.20 7.00 6 20% 44% 40% 38% 30% 51% 30% 42% 30% 42% 30% 37% 1.20 5.80 1.60 8.60 0.80 6.40 1.40 8.60 1.00 4.80 0.60 6.40 8 30% 53% 40% 45% 20% 58% 35% 45% 25% 44% 15% 34% 1.00 7.80 0.60 9.40 0.80 7.00 1.00 8.80 0.60 6.80 0.80 8.20 10 25% 71% 15% 49% 20% 64% 25% 46% 15% 62% 20% 43% 0.80 4.40 0.60 5.80 0.20 6.80 1.40 8.40 0.60 4.80 0.80 6.20 12 5% 62% 35% 44% 15% 44% 20% 33% 20% 40% 15% 31% 1.00 5.40 0.60 7.00 1.20 6.00 0.80 8.00 1.40 6.40 1.20 9.00 14 25% 49% 15% 37% 30% 55% 20% 42% 35% 58% 30% 47% 0.80 7.60 0.00 8.40 0.80 6.20 1.00 8.00 1.20 5.80 1.00 8.00 16 20% 69% 0% 44% 20% 56% 25% 42% 30% 53% 25% 42% Table 1.1 0: Classification and prediction performance of the learning vector quantization models (m the validation data set of the Accounting with Math 140 & 141 option Nodes Hidden Nodes 0.1 Learning Rate 0.1 0.5 0.9 G2 G3 T o t a l G2 G3 T o t a l G l G 1 G2 G3 T o t a l G l 1.40 45.80 45.80 15.60 28.80 15.60 28.80 1.40 45.80 15.60 28.80 1.40 4 65% 96% 23% 76% 65% 96% 23% 76% 65% 96% 23% 76% 16.60 28.40 2.40 47.40 16.80 19.20 0.80 36.80 18.60 27.80 4.60 51.00 6 69% 95% 40% 79% 70% 64% 13% 61% 78% 93% 77% 85% 22.00 28.00 3.60 53.60 17.40 26.80 2.60 46.80 17.00 27.00 3.00 47.00 8 92% 93% 60% 89% 73% 89% 43% 78% 71% 90% 50% 78% 15.20 27.60 2.40 45.20 17.80 28.60 4.40 50.80 21.60 28.60 4.60 54.80 10 63% 92% 40% 75% 74% 95% 73% 85% 90% 95% 77% 91% 21.20 27.40 3.20 51.80 21.20 27.40 3.20 51.80 19.20 27.40 3.80 50.40 12 88% 91% 53% 86% 88% 91% 53% 86% 80% 91% 63% 84% 23.20 29.60 4.00 56.80 23.00 29.20 
3.80 56.00 20.00 29.20 4.60 53.80 14 97% 99% 67% 95% 96% 97% 63% 93% 83% 97% 77% 90% 22.80 28.00 3.20 54.00 21.40 22.80 2.80 47.00 20.40 25.00 4.20 49.60 16 95% 93% 53% 90% 89% 76% 47% 78% 85% 83% 70% 83% Table 1.11: Classification and prediction performance of the backpropagation models on the training data set of the Accounting with Math 100 & 101 option 136 Hidden Learning Rate (1.5 0 .9 G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l 2.40 1.80 N / A 4.20 2.40 1.80 N / A 4.20 2.40 1.80 N / A 4.20 4 60% 90% N / A 70% 60% 90% N / A 70% 60% 90% N / A 70% 2.20 1.20 N / A 3.40 2.60 0.60 N / A 3.20 2.60 1.00 N / A 3.60 6 55% 60% N / A 57% 65% 30% N / A 53% 65% 50% N / A 60% 2.40 1.40 N / A 3.80 2.40 1.40 N / A 3.80 2.00 1.40 N / A 3.40 8 60% 70% N / A 63% 60% 70% N / A 63% 50% 70% N / A 57% 1.80 1.60 N / A 3.40 2.40 1.40 N / A 3.80 2.80 0.40 N / A 3.20 10 45% 80% N / A 57% 60% 70% N / A 63% 70% 20% N / A 53% 3.20 0.80 N / A 4.00 3.20 0.80 N / A 4.00 2.40 1.60 N / A 4.00 12 80% 40% N / A 67% 80% 40% N / A 67% 60% 80% N / A 67% 2.80 1.40 N / A 4.20 2.60 1.40 N / A 4.00 2.00 1.60 N / A 3.60 14 70% 70% N / A 70% 65% 70% N / A 67% 50% 80% N / A 60% 2.80 0.80 N / A 3.60 3.00 1.80 N / A 4.80 2.20 0.80 N / A 3.00 16 70% 40% N / A 60% 75% 90% N / A 80% 55% 40% N / A 50% Table 1.12: Classification and prediction performance of the backpropagation models on the validation data set of the Accounting with Math 100 & 101 option Nodes Hidden Nodes 0 .1 Learning Rate 0.1 0.9 0.5 G2 G3 Gl G2 G3 T o t a l G l 5.60 16.60 1.60 23.80 8.20 16.20 1.20 4 23% 55% 27% 40% 34% 54% 20% 10.00 14.40 1.60 26.00 7.40 14.00 1.00 6 42% 48% 27% 43% 31% 47% 17% 11.00 16.20 1.20 28.40 9.80 16.80 1.20 8 46% 54% 20% 47% 41% 56% 20% 10.00 18.00 0.40 28.40 10.40 13.60 0.60 10 42% 60% 7% 47% 43% 45% 10% 11.80 16.20 1.00 29.00 5.00 17.60 0.20 12 49% 54% 17% 48% 21% 59% 3% 15.00 16.80 0.40 32.20 8.00 17.00 0.40 14 63% 56% 7% 54% 33% 57% 7% 12.20 18.80 0.00 31.00 9.60 16.40 0.60 16 51% 63% 0% 52% 40% 
55% 10% Table 1.13: Classification and prediction performance models on the training data set of the Accounting with G2 G3 T o t a l Gl 25.60 5.80 13.60 2.80 22.20 43% 24% 45% 47% 37% 22.40 7.80 18.20 1.40 27.40 37% 33% 61% 23% 46% 27.80 10.80 17.20 0.80 28.80 46% 45% 57% 13% 48% 24.60 9.80 11.40 0.00 21.20 41% 41% 38% 0% 35% 22.80 9.80 11.80 0.80 22.40 38% 41% 39% 13% 37% 25.40 8.80 12.00 1.20 22.00 42% 37% 40% 20% 37% 26.60 8.00 16.60 1.00 25.60 44% 33% 55% 17% 43% of the learning vector quantization Math 100 & 101 option Total 137 Learning Rate Hidden 1).5 0 .9 1 Total G 1 G2 G3 Total G 1 G2 G3 Total 0.60 0.40 N/A 1.00 2.00 1.60 N/A 3.60 0.80 0.20 N/A 1.00 4 15% 20% N/A 17% 50% 80% N/A 60% 20% 10% N/A 17% 1.80 0.80 N/A 2.60 1.00 0.60 N/A 1.60 2.20 1.00 N/A 3.20 6 45% 40% N/A 43% 25% 30% N/A 27% 55% 50% N/A 53% 1.40 0.80 N/A 2.20 1.00 0.40 N/A 1.40 2.00 1.40 N/A 3.40 8 35% 40% N/A 37% 25% 20% N/A 23% 50% 70% N/A 57% 1.80 0.40 N/A 2.20 1.40 1.40 N/A 2.80 1.40 0.40 N/A 1.80 10 45% 20% N/A 37% 35% 70% N/A 47% 35% 20% N/A 30% 2.20 0.00 N/A 2.20 1.00 0.00 N/A 1.00 1.40 1.00 N/A 2.40 12 55% 0% N/A 37% 25% 0% N/A 17% 35% 50% N/A 40% 2.20 0.40 N/A 2.60 1.60 0.80 N/A 2.40 1.60 1.60 N/A 3.20 14 55% 20% N/A 43% 40% 40% N/A 40% 40% 80% N/A 53% 1.80 0.00 N/A 1.80 1.60 0.80 N/A 2.40 1.60 0.40 N/A 2.00 16 45% 0% N/A 30% 40% 40% â€¢N/A 40% 40% 20% N/A 33% Table 1.14: Classification and prediction performance of the learning vector quantization models on the validation data set of the Accounting with Math 100 & 101 option Nodes G 0 .1 G2 G3 Learning Rate Hidden Nodes (1.1 G l G2 G3 Total G l G2 0 .5 G3 C.9 Total G l G2 G3 Total 34.60 41.20 0.20 76.00 34.60 41.20 0.20 76.00 34.60 41.20 0.20 76.00 82% 84% 10% 82% 82% 84% 10% 82% 82% 84% 10% 82% 30.00 38.60 0.20 68.80 19.60 42.40 0.20 62.20 30.00 38.60 0.20 68.80 6 71% 79% 10% 74% 47% 87% 10% 67% 71% 79% 10% 74% 27.40 44.60 0.20 72.20 28.00 41.40 0.20 69.60 27.40 44.60 0.20 72.20 8 65% 91% 10% 78% 67% 84% 10% 75% 65% 91% 10% 78% 
34.00 36.80 0.00 70.80 37.80 45.60 0.00 83.40 34.00 36.80 0.00 70.80 10 81% 75% 0% 76% 90% 93% 0% 90% 81% 75% 0% 76% 32.60 47.20 0.20 80.00 32.60 47.20 0.20 80.00 36.80 40.00 0.40 77.20 12 78% 96% 10% 86% 78% 96% 10% 86% 88% 82% 20% 83% 39.00 46.40 0.40 85.80 36.40 42.60 0.40 79.40 36.00 44.20 0.60 80.80 14 93% 95% 20% 92% 87% 87% 20% 85% 86% 90% 30% 87% 37.40 45.80 0.00 83.20 29.40 35.00 0.00 64.40 33.20 40.00 0.00 73.20 16 89% 93%. 0% 89% 70% 71% 0% 69% 79% 82% 0% 79% Table 1.15: Classification and prediction performance of the backpropagation models on the training data set of the Finance with Math 140 & 141 option 4 138 Hidden Learning Rate 0.1 G.5 C.9 G2 G3 T o t a l G l G2 G3 T o t a l G 1 G2 G3 T o t a l Gl 4.40 4.00 0.00 8.40 4.40 4.00 0.00 8.40 4.40 4.00 0.00 8.40 4 88% 50% 0% 53% 88% 50% 0% 53% 88% 50% 0% 53% 2.00 4.00 0.00 6.00 1.20 5.00 0.20 6.40 2.00 4.00 0.00 6.00 6 40% 50% 0% 38% 24% 63% 7% 40% 40% 50% 0% 38% 2.00 4.00 0.20 6.20 3.00 5.20 0.00 8.20 2.00 4.00 0.20 6.20 8 40% 50% 7% 39% 60% 65% 0% 51% 40% 50% 7% 39% 3.00 4.00 0.00 7.00 3.20 5.60 0.00 8.80 3.00 4.00 0.00 7.00 10 60% 50% 0% 44% 64% 70% 0% 55% 60% 50% 0% 44% 3.20 6.00 0.20 9.40 3.20 6.00 0.20 9.40 3.60 3.40 0.00 7.00 12 64% 75% 7% 59% 64% 75% 7% 59% 72% 43% 0% 44% 3.40 5.80 0.00 9.20 2.80 6.00 0.20 9.00 4.00 4.60 0.00 8.60 14 68% 73% 0% 58% 56% 75% 7% 56% 80% 58% 0% 54% 3.60 5.20 0.00 8.80 2.40 5.40 0.00 7.80 3.60 4.80 0.00 8.40 16 72% 65% 0% 55% 48% 68% 0% 49% 72% 60% 0% 53% Table 1.16: Classification and prediction performance of the backpropagation models on the validation data set of the Finance with Math 140 & 141 option Nodes Hidden Learning Rate 0.9 0.5 0.1 G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l 18.60 23.80 0.20 42.60 6.00 28.20 0.40 34.60 13.20 26.60 0.40 40.20 4 44% 49% 10% 46% 14% 58% 20% 37% 31% 54% 20% 43% 16.60 22.60 0.00 39.20 9.20 32.60 0.20 42.00 9.20 24.60 0.60 34.40 6 40% 46% 0% 42% 22% 67% 10% 45% 22% 50% 30% 37% 15.00 23.00 0.00 38.00 17.80 23.60 
0.00 41.40 19.80 20.20 0.00 40.00 8 36% 47% 0% 41% 42% 48% 0% 45% 47% 41% 0% 43% 24.20 20.00 0.00 44.20 13.40 25.00 0.40 38.80 15.80 27.40 0.00 43.20 10 58% 41% 0% 48% 32% 51% 20% 42% 38% 56% 0% 46% 26.00 23.80 0.00 49.80 17.20 21.80 0.00 39.00 18.60 22.40 0.40 41.40 12 62% 49% 0% 54% 41% 44% 0% 42% 44% 46% 20% 45% 26.80 24.80 0.00 51.60 15.40 27.80 0.00 43.20 17.80 24.00 0.20 42.00 14 64% 51% 0% 55% 37% 57% 0% 46% 42% 49% 10% 45% 30.00 23.60 0.00 53.60 14.00 27.00 0.20 41.20 18.60 27.60 0.20 46.40 16 71% 48% 0% 58% 33% 55% 10% 44% 44% 56% 10% 50% Table 1.17: Classification and prediction performance of the learning vector quantization models on the training data set of the Finance with Math 140 & 141 option Nodes 139 Learning Rate Hidden Nodes 0.1 Gl G2 0 .5 G3 Total Gl G2 G3 0.9 Total Gl G2 G3 Total 2.40 4.20 1.00 7.60 0.80 3.60 1.40 5.80 1.20 4.60 0.20 6.00 4 48% 53% 33% 48% 16% 45% 47% 36% 24% 58% 7% 38% 2.00 3.40 0.60 6.00 1.00 5.60 0.20 6.80 1.00 3.40 0.60 5.00 6 40% 43% 20% 38% 20% 70% 7% 43% 20% 43% 20% 31% 2.40 3.60 0.60 6.60 1.80 5.60 0.20 7.60 1.80 2.00 0.40 4.20 8 48% 45% 20% 41% 36% 70% 7% 48% 36% 25% 13% 26% 2.80 4.80 0.60 8.20 1.80 4.60 0.20 6.60 1.80 4.20 0.00 6.00 10 56% 60% 20% 51% 36% 58% 7% 41% 36% 53% 0% 38% 3.00 4.60 0.40 8.00 2.00 4.00 0.60 6.60 2.80 3.20 0.40 6.40 12 60% 58% 13% 50% 40% 50% 20% 41% 56% 40% 13% 40% 2.80 5.40 0.20 8.40 1.20 5.20 0.00 6.40 2.00 3.80 0.40 6.20 14 56% 68% 7% 53% 24% 65% 0% 40% 40% 48% 13% 39% 3.00 5.40 0.20 8.60 1.80 4.00 0.00 5.80 2.80 5.00 0.20 8.00 16 60% 68% 7% 54% 36% 50% 0% 36% 56% 63% 7% 50% Table IJ 8: Classification and prediction performance of the learning vector quantization models ()n the validation data set of the Finance with Math 140 & 141 option Learning Rate Hidden Nodes 0 .5 0.1 Gl G2 G3 Total G1 G2 G3 0.9 Total G1 G2 G3 Total 23.60 18.00 0.80 42.40 23.60 18.00 0.80 42.40 23.60 18.00 0.80 42.40 4 91% 75% 40% 82% 91% 75% 40% 82% 91% 75% 40% 82% 24.00 12.80 0.80 37.60 18.40 17.20 0.40 36.00 24.00 
12.80 0.80 37.60 6 92% 53% 40% 72% 71% 72% 20% 69% 92% 53% 40% 72% 19.40 21.20 0.40 41.00 18.60 21.20 1.00 41.60 19.40 21.20 0.40 41.00 8 75% 88% 20% 79% 72% 88% 50% 80% 75% 88% 20% 79% 25.00 19.20 1.00 45.20 23.80 16.20 0.60 40.60 25.00 19.20 1.00 45.20 10 96% 80% 50% 87% 92% 68% 30% 78% 96% 80% 50% 87% 17.00 22.80 0.40 40.20 17.00 22.80 0.40 40.20 19.60 23.20 1.00 43.80 12 65% 95% 20% 77% 65% 95% 20% 77% 75% 97% 50% 84% 25.60 23.80 1.20 50.60 24.20 23.20 1.60 49.00 20.60 23.80 1.60 46.00 14 98% 99% 60% 97% 93% 97% 80% 94% 79% 99% 80% 88% 24.80 23.60 0.60 49.00 24.80 15.80 1.00 41.60 24.60 23.20 1.20 49.00 16 95% 98% 30% 94% 95% 66% 50% 80% 95% 97% 60% 94% Table 1.19: Classification and prediction performance of the backpropagation models on the training data set of the Finance with Math 100 & 101 option 140 Learning Rate Hidden Nodes 0 .1 G 1 G 2 0.9 0 .5 G 3 Total G l G 2 G 3 Total G l G 2 G 3 Total 1.80 1.80 1.60 0.20 N/A 1.80 1.60 0.20 N/A 1.60 0.20 N/A 4 80% 10% N/A 45% 80% 10% N/A 45% 80% 10% N/A 45% 1.40 0.60 N/A 2.00 1.20 1.00 N/A 2.20 1.40 0.60 N/A 2.00 6 70% 30% N/A 50% 60% 50% N/A 55% 70% 30% N/A 50% 1.60 1.60 0.80 N/A 2.40 1.60 0.80 N/A 2.40 1.00 0.60 N/A 8 80% 40% N/A 60% 50% 30% N/A 40% 80% 40% N/A 60% 1.20 1.20 1.20 0.60 N/A 1.80 0.80 0.40 N/A 0.80 0.40 N/A 10 40% 20% N/A 30% 60% 30% N/A 45% 40% 20% N/A 30% 1.40 1.20 N/A 2.60 1.40 1.20 N/A 2.60 1.20 0.80 N/A 2.00 12 70% 60% N/A 65% 70% 60% N/A 65% 60% 40% N/A 50% 1.60 1.60 1.40 0.80 N/A 2.20 1.00 0.60 N/A 1.40 0.20 N/A 14 70% 10% N/A 40% 70% 40% N/A 55% 50% 30% N/A 40% 1.80 1.20 1.00 N/A 2.20 1.60 0.80 N/A 2.40 1.00 0.80 N/A 16 60% 50% N/A 55% 80% 40% N/A 60% 50% 40% N/A 45% Table 1.20: Classification and prediction performance of the backpropagation models on the validation data set of the Finance with Math 100 & 101 option Learning Rate Hidden Nodes G2 G 3 0.9 0 .5 0 .1 G l Total G l G 2 G 3 Total G 1 G 2 G 3 Total 14.20 4.80 0.60 19.60 15.80 5.60 0.40 21.80 17.00 3.00 0.60 20.60 4 55% 20% 30% 38% 
61% 23% 20% 42% 65% 13% 30% 40% 12.80 5.60 0.40 18.80 13.40 10.40 0.60 24.40 12.20 8.20 0.40 20.80 6 49% 23% 20% 36% 52% 43% 30% 47% 47% 34% 20% 40% 11.00 9.20 0.80 21.00 13.20 8.00 0.20 21.40 9.00 10.20 0.00 19.20 8 42% 38% 40% 40% 51% 33% 10% 41% 35% 43% 0% 37% 13.60 11.00 0.60 25.20 13.60 9.40 0.20 23.20 11.60 9.40 0.00 21.00 10 52% 46% 30% 48% 52% 39% 10% 45% 45% 39% 0% 40% 14.00 9.80 0.00 23.80 12.00 11.60 0.40 24.00 12.40 9.00 0.20 21.60 12 54% 41% 0% 46% 46% 48% 20% 46% 48% 38% 10% 42% 13.40 10.20 0.20 23.80 11.80 12.00 0.40 24.20 14.20 8.00 0.20 22.40 14 52% 43% 10% 46% 45% 50% 20% 47% 55% 33% 10% 43% 15.00 9.00 0.00 24.00 14.60 11.00 0.20 25.80 13.60 10.60 0.60 24.80 16 58% 38% 0% 46% 56% 46% 10% 50% 52% 44% 30% 48% Table 1.21: Classification and prediction performance of the learning vector quantization models on the training data set of the Finance with Math 100 & 101 option 141 Learning Rate Hidden Nodes 4 6 8 10 12 14 16 0.1 Gl G2 1.20 60% 0.40 20% 0.60 30% 0.60 30% 0.60 30% 1.00 50% 0.60 30% 0.40 20% 1.20 60% 0.5 G3 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 0.40 20% 1.00 50% 1.20 60% 1.40 70% 1.80 90% Total Gl G2 1.80 45% 0.80 20% 1.60 40% 1.80 45% 2.00 50% 1.20 60% 0.20 10% 0.80 40% 0.80 40% 0.80 40% 0.80 40% 1.00 50% 0.80 40% 1.00 50% 1.40 70% 2.80 70% 1.60 40% 0.40 20% 0.60 30% 1.00 50% 1.20 60% 0.9 G3 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A Total 1.40 35% 1.80 45% 1.60 40% 1.80 45% 2.20 55% 1.40 35% 1.80 45% Gl 1.20 60% 0.40 20% 0.20 10% 1.00 50% 1.00 50% 1.40 70% 1.40 70% G2 G3 Total 0.60 30% 0.80 40% 0.20 10% 0.80 40% 0.80 40% 0.60 30% 0.40 20% N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A 1.80 45% 1.20 30% 0.40 10% 1.80 45% 1.80 45% 2.00 50% N/A N/A 1.80 45% Table 1.22: Classification and prediction performance of the learning vector quantization models on the validation data set of the Finance with Math 100 & 101 option Learning Rate Hidden Nodes 0.1 Gl 12 2.80 47% 14 3.80 63% 16 4.60 77% 70.00 100% 6 8 10 0.9 
0.5 G3 69.80 0.80 100% 2 7 % 68.40 0.40 9 8 % 13% 70.00 1.40 100% 4 7 % 70.00 0.80 100% 2 7 % 69.60 1.40 99% 47% 70.00 1.20 100% 4 0 % 4 1.20 20% 1.20 20% 4.60 77% 2.00 33% G2 2.20 73% Total G 71.80 91% 70.00 89% 76.00 96% 72.80 92% 76.80 97% 1.20 20% 2.20 37% 3.20 53% 4.00 67% 2.60 43% 3.80 63% 3.60 60% 73.80 93% 75.00 95% 1 G2 G3 Total G 69.80 100% 69.20 99% 70.00 100% 70.00 100% 0.80 27% 0.80 27% 2.00 67% 1.60 53% 70.00 100% 70.00 100% 1.20 40% 0.80 27% 71.80 91% 72.20 91% 75.20 95% 75.60 96% 73.80 93% 74.60 94% 1.20 20% 1.60 27% 4.00 67% 4.20 70% 2.80 47% 3.80 63% 70.00 100% 1.80 60% 75.40 95% 4.00 67% 1 G2 G3 69.80 0.80 100% 2 7 % 64.00 1.00 9 1 % 33% 60.40 0.60 86% 2 0 % 70.00 1.80 100% 6 0 % 69.60 1.40 99% 47% 70.00 0.80 100% 2 7 % 69.80 1.60 100% 53% Total 71.80 91% 66.60 84% 65.00 82% 76.00 96% 73.80 93% 74.60 94% 75.40 95% Table 1.23: Classification and prediction performance of the backpropagation models on the training data set of the Marketing with Math 140 & 141 option 142 Learning Rate Hidden Nodes (1.9 C.1 0.5 G2 G3 Total G2 G3 Total G l G2 G3 Total G l Gl 0.00 9.80 N/A 9.80 0.00 9.80 N/A 9.80 0.00 9.80 N/A 9.80 4 0% 89% N/A 82% 0% 89% N/A 82% 0% 89% N/A 82% 0.00 10.20 N/A 10.20 0.00 9.60 N/A 9.60 0.00 9.00 N/A 9.00 6 0% 93% N/A 85% 0% 87% N/A 80% 0% 82% N/A 75% 0.00 9.80 N/A 9.80 0.00 8.40 N/A 8.40 0.00 8.60 N/A 8.60 8 0% 89% N/A 82% 0% 76% N/A 70% 0% 78% N/A 72% 0.00 10.80 N/A 10.80 0.00 9.40 N/A 9.40 0.00 9.80 N/A 9.80 10 0% 98% N/A 90% 0% 85% N/A 78% 0% 89% N/A 82% 0.00 8.60 N/A 8.60 0.00 9.40 N/A 9.40 0.00 8.60 N/A 8.60 12 0% 78% N/A 72% 0% 85% N/A 78% 0% 78% N/A 72% 0.00 9.80 N/A 9.80 0.00 11.00 N/A 11.00 0.00 11.00 N/A 11.00 14 0% 89% N/A 82% 0% 100% N/A 92% 0% 100% N/A 92% 0.20 10.40 N/A 10.60 0.00 9.80 N/A 9.80 0.00 10.00 N/A 10.00 16 20% 95% N/A 88% 0% 89% N/A 82% 0% 91% N/A 83% Table 1.24: Classification and prediction performance of the backpropagation models on the validation data set of the Marketing with Math 140 & 141 option Learning 
Rate Hidden Nodes 0.9 0.1 0.5 G2 G3 Total G 1 G2 G3 Total G 1 G2 G3 Total G l 0.40 54.60 0.80 55.80 0.40 58.60 0.40 59.40 0.00 59.40 0.20 59.60 13 7% 78% 27% 71% 7% 84% 13% 75% 0% 85% 7% 75% 0.20 54.60 0.20 55.00 0.20 61.20 0.00 61.40 0.20 62.80 0.00 63.00 14 3% 78% 7% 70% 3% 87% 0% 78% 3% 90% 0% 80% 0.40 59.60 0.00 60.00 0.20 59.20 0.00 59.40 0.20 61.20 0.20 61.60 15 7% 85% 0% 76% 3% 85% 0% 75% 3% 87% 7% 78% 0.00 60.20 0.20 60.40 0.40 66.60 0.00 67.00 0.20 61.40 0.40 62.00 16 0% 86% 7% 76% 7% 95% 0% 85% 3% 88% 13% 78% Table 1.25: Classification and prediction performance of the learning vector quantization models on the training data set of the Marketing with Math 140 & 141 option Hidden Nodes 0.1 Learning Rate 0.5 0.9 G2 G3 Total G 1 G2 G3 Total Gl G2 G3 Total G l 0.20 8.80 N/A 9.00 0.00 9.60 N/A 9.60 0.00 8.80 N/A 8.80 13 20% 80% N/A 75% 0% 87% N/A 80% 0% 80% N/A 73% 0.20 10.40 N/A 10.60 0.00 9.80 N/A 9.80 0.00 9.40 N/A 9.40 14 20% 95% N/A 88% 0% 89% N/A 82% 0% 85% N/A 78% 0.00 9.20 N/A 9.20 0.00 9.00 N/A 9.00 0.00 9.20 N/A 9.20 15 0% 84% N/A 77% 0% 82% N/A 75% 0% 84% N/A 77% 0.00 10.20 N/A 10.20 0.00 10.40 N/A 10.40 0.00 9.80 N/A 9.80 16 0% 93% N/A 85% 0% 95% N/A 87% 0% 89% N/A 82% Table 1.26: Classification and prediction performance of the learning vector quantization models on the validation data set of the Marketing with Math 140 & 141 option 143 Hidden Learning Rate 0.1 0.5 0.9 G2 Gl G3 T o t a l G l G2 G3 T o t a l G 1 G2 G3 T o t a l 1.40 13.00 N/A 14.40 1.40 13.00 N/A 14.40 2.00 13.00 N/A 15.00 4 70% 100% N/A 96% 70% 100% N/A 96% 100% 100% N/A 100% 0.80 13.00 N/A 13.80 0.00 12.80 N/A 12.80 1.00 13.00 N/A 14.00 6 40% 100% N/A 92% 0% 98% N/A 85% 50% 100% N/A 93% 1.20 13.00 N/A 14.20 1.40 13.00 N/A 14.40 1.20 12.60 N/A 13.80 8 60% 100% N/A 95% 70% 100% N/A 96% 60% 97% N/A 92% 1.80 12.80 N/A 14.60 1.20 12.60 N/A 13.80 1.60 13.00 N/A 14.60 10 90% 98% N/A 97% 60% 97% N/A 92% 80% 100% N/A 97% 1.80 12.80 N/A 14.60 1.60 13.00 N/A 14.60 1.00 13.00 N/A 14.00 12 
90% 98% N/A 97% 80% 100% N/A 97% 50% 100% N/A 93% 2.00 13.00 N/A 15.00 1.20 13.00 N/A 14.20 1.20 13.00 N/A 14.20 14 100% 100% N/A 100% 60% 100% N/A 95% 60% 100% N/A 95% 2.00 12.80 N/A 14.80 1.20 13.00 N/A 14.20 1.80 13.00 N/A 14.80 16 100% 98% N/A 99% 60% 100% N/A 95% 90% 100% N/A 99% Table 1.27: Classification and prediction performance of the backpropagation models on the training data set of the Marketing with Math 100 & 101 option Nodes Hidden Learning Rate 0.5 0.1 G2 G3 Gl G3 T o t a l G 1 G2 N/A 1.00 N/A 1.00 N/A 1.00 N/A 4 N/A 100% N/A 100% N/A 100% N/A N/A 0.80 N/A 0.80 N/A 1.00 N/A 6 N/A 80% N/A 80% N/A 100% N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 8 N/A 100% N/A 100% N/A 100% N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 10 N/A 100% N/A 100% N/A 100% N/A N/A 1.00 N/A 1.00 N/A 0.80 N/A 12 N/A 100% N/A 100% N/A 80% N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 14 N/A 100% N/A 100% N/A 100% N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 16 N/A 100% N/A 100% N/A 100% N/A Table 1.28: Classification and prediction performance the validation data set of the Marketing with Math 100 Nodes 0.9 G 1 G2 G3 T o t a l 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% 1.00 0.80 N/A 1.00 N/A N/A 100% N/A 100% 80% 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% 1.00 1.00 N/A 1.00 N/A 100% N/A 100% N/A 100% of the backpropagation models on & 101 option Total 144 L e a r n i n g Rate II.5 Hidden Nodes 0 .9 0 .1 G2 G3 T o t a l G 1 G2 G3 T o t a l G 1 G2 G3 T o t a l Gl 1.00 13.00 N/A 14.00 0.60 13.00 N/A 13.60 0.40 12.80 N/A 13.20 8 50% 100% N/A 93% 30% 100% N/A 91% 20% 98% N/A 88% 1.00 13.00 N/A 14.00 0.20 12.40 N/A 12.60 0.20 11.80 N/A 12.00 10 50% 100% N/A 93% 10% 95% N/A 84% 10% 91% N/A 80% 1.00 13.00 N/A 14.00 0.40 13.00 N/A 13.40 0.20 12.40 N/A 12.60 12 50% 100% N/A 93% 20% 100% N/A 89% 10% 95% N/A 84% 0.80 13.00 N/A 13.80 0.40 12.80 N/A 13.20 0.20 12.00 N/A 12.20 14 40% 100% 
N/A 92% 20% 98% N/A 88% 10% 92% N/A 81%
1.60 13.00 N/A 14.60 0.80 11.40 N/A 12.20 0.40 11.60 N/A 12.00 16 80% 100% N/A 97% 40% 88% N/A 81% 20% 89% N/A 80%
Table 1.29: Classification and prediction performance of the learning vector quantization models on the training data set of the Marketing with Math 100 & 101 option
Learning Rate Hidden Nodes 0.9 0.1 0.5 G2 G3 Total G1 G2 G3 Total G1 G2 G3 Total G1
1.00 1.00 N/A 1.00 N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 8 N/A 100% N/A 100% N/A N/A 100% N/A 100% N/A 100% 100%
1.00 N/A 0.80 N/A 0.80 1.00 N/A 1.00 N/A N/A 1.00 N/A 10 N/A 100% N/A 100% N/A 100% N/A 100% N/A 80% N/A 80%
1.00 1.00 N/A 1.00 N/A N/A 1.00 N/A 1.00 N/A 1.00 N/A 12 N/A 100% N/A 100% N/A 100% N/A 100% N/A 100% N/A 100%
1.00 N/A 0.60 N/A 0.60 N/A 1.00 N/A 1.00 N/A 1.00 N/A 14 N/A 100% N/A 100% N/A 100% N/A 100% N/A 60% N/A 60%
N/A 1.00 N/A 1.00 N/A 0.60 N/A 0.60 N/A 0.80 N/A 0.80 16 N/A 100% N/A 100% N/A 60% N/A 60% N/A 80% N/A 80%
Table 1.30: Classification and prediction performance of the learning vector quantization models on the validation data set of the Marketing with Math 100 & 101 option
The following tables report the detailed results of the ANOVA tests for significant differences between the correct classification performance of the backpropagation models and that of the LVQ models. Each table corresponds to one specialization track with either the training or the validation data set. There are 12 tables in this section.
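The columns of the ANOVA tables are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and the F-ratio is the between-groups mean square divided by the within-groups mean square. The following is a minimal Python sketch of that arithmetic, using the sums of squares and degrees of freedom reported in Table 1.31. The function name `f_ratio_from_sums` is hypothetical, and the probability level and power are not recomputed here because they require the F distribution.

```python
def f_ratio_from_sums(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F-ratio) for a one-way ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares and degrees of freedom as reported in Table 1.31
# (Accounting with Math 140 & 141, training data set).
ms_b, ms_w, f = f_ratio_from_sums(27157.71, 1, 1712.522, 40)

print(f"MS between = {ms_b:.2f}, MS within = {ms_w:.5f}, F = {f:.2f}")
```

One plausible reading of the degrees of freedom, offered as an inference rather than a statement from the thesis, is that a total of 41 corresponds to 42 observations: 21 configurations (7 hidden-node settings times 3 learning rates) for each of the two paradigms.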
Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     27157.71         1    27157.71      634.33    0.000000*           1.000000
Within Groups      1712.522         40   42.81305
Total (Adjusted)   28870.24         41
(* Significant at alpha = 0.05)
Table 1.31: Analysis of Variance for the training data set of the Accounting with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     45.67714         1    45.67714      70.15     0.000000*           1.000000
Within Groups      26.04571         40   0.6511428
Total (Adjusted)   71.72285         41
(* Significant at alpha = 0.05)
Table 1.32: Analysis of Variance for the validation data set of the Accounting with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     5914.347         1    5914.347      373.45    0.000000*           1.000000
Within Groups      633.4781         40   15.83695
Total (Adjusted)   6547.825         41
(* Significant at alpha = 0.05)
Table 1.33: Analysis of Variance for the training data set of the Accounting with Math 100 & 101 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     25.30381         1    25.30381      64.82     0.000000*           1.000000
Within Groups      15.61524         40   0.3903809
Total (Adjusted)   40.91905         41
(* Significant at alpha = 0.05)
Table 1.34: Analysis of Variance for the validation data set of the Accounting with Math 100 & 101 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     11139.43         1    11139.43      348.38    0.000000*           1.000000
Within Groups      1278.983         40   31.97457
Total (Adjusted)   12418.41         41
(* Significant at alpha = 0.05)
Table 1.35: Analysis of Variance for the training data set of the Finance with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     13.48667         1    13.48667      9.87      0.003158*           0.865502
Within Groups      54.65905         40   1.366476
Total (Adjusted)   68.14571         41
(* Significant at alpha = 0.05)
Table 1.36: Analysis of Variance for the validation data set of the Finance with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     4422.881         1    4422.881      429.81    0.000000*           1.000000
Within Groups      411.6152         40   10.29038
Total (Adjusted)   4834.496         41
(* Significant at alpha = 0.05)
Table 1.37: Analysis of Variance for the training data set of the Finance with Math 100 & 101 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     0.8571429        1    0.8571429     4.32      0.044013*           0.527778
Within Groups      7.927619         40   0.1981905
Total (Adjusted)   8.784761         41
(* Significant at alpha = 0.05)
Table 1.38: Analysis of Variance for the validation data set of the Finance with Math 100 & 101 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     1261.87          1    1261.87       133.60    0.000000*           1.000000
Within Groups      292.8062         31   9.445361
Total (Adjusted)   1554.676         32
(* Significant at alpha = 0.05)
Table 1.39: Analysis of Variance for the training data set of the Marketing with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     0.1125974        1    0.1125974     0.22      0.639576            0.074405
Within Groups      15.60619         31   0.5034255
Total (Adjusted)   15.71879         32
Table 1.40: Analysis of Variance for the validation data set of the Marketing with Math 140 & 141 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     11.2767          1    11.2767       25.38     0.000015*           0.998313
Within Groups      15.10552         34   0.4442801
Total (Adjusted)   26.38222         35
(* Significant at alpha = 0.05)
Table 1.41: Analysis of Variance for the training data set of the Marketing with Math 100 & 101 option

Source Term        Sum of Squares   df   Mean Square   F-Ratio   Probability Level   Power (Alpha = 0.05)
Between Groups     0.0325079        1    0.0325079     2.94      0.095699**          0.384255
Within Groups      0.376381         34   0.01107003
Total (Adjusted)   0.4088889        35
(** Significant at alpha = 0.10)
Table 1.42: Analysis of Variance for the validation data set of the Marketing with Math 100 & 101 option
Classification capabilities of neural networks : a comparative study using student academic performance Prompibalcheep, Sansern Art 1999
Item Metadata
Title | Classification capabilities of neural networks : a comparative study using student academic performance |
Creator |
Prompibalcheep, Sansern Art |
Date Issued | 1999 |
Description | Among the emerging information technologies, neural networks have been increasingly recognized as a powerful method for classifying and predicting complex data. There have been a number of neural network paradigms being developed. Each paradigm has its own specific features that are applicable to particular tasks. The most popular neural network paradigm among users in the management area is the backpropagation. This paradigm has been extensively tested and proven to outperform traditional techniques on several classification tasks. However, there have been only a few studies determining the comparative capabilities of the backpropagation paradigm to other paradigms that are potentially applicable to the same task. The main purpose of this thesis research is to investigate capabilities and performance of two neural network paradigms: backpropagation and learning vector quantization (LVQ). The other purpose is to prove that neural networks outperform a traditional technique of ordered probit model, which is used as a performance benchmark. In this study, the two neural network paradigms and the ordered probit model are utilized to classify and predict the academic performance of UBC Commerce students. For each paradigm, a number of neural network models with distinct configurations are developed. The first investigation determines how well each paradigm performs in classifying and predicting academic success. The results from running those models on both training and validation samples show that the backpropagation paradigm significantly performs better than the LVQ paradigm in most instances. The second investigation compares the best performance of those paradigms with the performance of ordered probit model. After utilizing the ANOVA to test the statistical significance of difference in prediction performance, the findings show that both backpropagation and LVQ paradigms have higher performance levels than ordered probit model. 
However, the difference between the performance of backpropagation and ordered probit model is significant at only the 90% confidence level. On the other hand, the difference between performances of LVQ and ordered probit model is significant at the much higher level of 95%. Essentially, the study has shown that the backpropagation paradigm, on the average, still outperforms the LVQ paradigm in classifying and predicting complex data. The study has also proven that both backpropagation and LVQ are significantly better prediction techniques than the ordered probit approach. |
Extent | 7179990 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-06-15 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0089017 |
URI | http://hdl.handle.net/2429/9188 |
Degree |
Master of Science - MSc |
Program |
Business Administration - Management Information Systems |
Affiliation |
Business, Sauder School of Management Information Systems, Division of |
Degree Grantor | University of British Columbia |
Graduation Date | 1999-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |