User Modeling and Data Mining in Intelligent Educational Games: Prime Climb a Case Study

by

Alireza Davoodi

M.Sc., Simon Fraser University, 2011
M.Sc., Amirkabir University of Technology (Tehran Polytechnic), 2008
B.Sc., Amirkabir University of Technology (Tehran Polytechnic), 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

October 2013

© Alireza Davoodi, 2013

Abstract

Educational games are designed to leverage students' motivation and engagement in playing games to deliver pedagogical concepts to the players during game play. Adaptive educational games, in addition, use models of student learning to personalize the learning experience according to each student's educational needs. A student model must be able to assess the student's mastery of the target skills and provide a reliable basis for generating interventions tailored to the student's needs. Prime Climb, an adaptive educational game in which students in grades 5 and 6 practice number-factorization skills, provides a test-bed for research on user modeling and personalization in the domain of educational games. Prime Climb uses a student model based on a Dynamic Bayesian Network to personalize its support while students practice number factorization during game play. This thesis presents research conducted to improve the student model in Prime Climb by detecting and resolving the issue of degeneracy in the model. Degeneracy refers to a situation in which the model's accuracy is at its global maximum yet the model violates conceptual assumptions about the process being modeled. Several criteria for evaluating the student model are introduced. Furthermore, using educational data mining techniques, different patterns of students'
interactions with Prime Climb were investigated to understand how students with higher prior knowledge or higher learning gain behave differently from students with lower prior knowledge and lower learning gain.

Preface

This thesis is based on research studies conducted under the supervision of Dr. Cristina Conati in UBC's Intelligent User Interface Laboratory at the Department of Computer Science. A version of Chapter 4 was published and presented at the Educational Data Mining 2013 conference:

Davoodi, A. & Conati, C. (2013). Degeneracy in Student Modeling with Dynamic Bayesian Networks in Intelligent Edu-Games. EDM 2013 (to appear).

A version of Chapter 5 was published and presented at two conferences:

Davoodi, A., Kardan, S. & Conati, C. (2013). Mining User's Behaviors in Intelligent Educational Games: Prime Climb a Case Study. EDM 2013 (to appear).

Davoodi, A., Kardan, S. & Conati, C. (2013). Understanding User's Interaction Behavior with an Intelligent Educational Game: Prime Climb. AIED Workshops 2013 (to appear).

In all of the above papers, I was responsible for processing all data, generating all figures and tables, and writing the manuscripts.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Chapter: Introduction
2 Chapter: Review on Prime Climb's Student Model and User Studies
  2.1 Prime Climb
  2.2 Magnifying Glass Tool in Prime Climb
  2.3 Student Modeling and Pedagogical Agent in Prime Climb
    2.3.1 Prime Climb Model's Structure
    2.3.2 Prime Climb Model's Parameters
    2.3.3 Original Rollup Procedure in PC-DBN
  2.4 Hinting Strategy in Prime Climb
  2.5 Experiments and Data Collection
    2.5.1 Experiment 1: Old Prime Climb (Computer-based Version)
    2.5.2 Experiment 2: New Prime Climb (Web-based Version)
3 Chapter: Literature Review
  3.1 A Review on User Modeling by Bayesian Knowledge Tracing
    3.1.1 Standard Bayesian Knowledge Tracing
    3.1.2 Evaluation Techniques of Bayesian Knowledge Tracing Model
    3.1.3 Evaluation of the Model's Parameters in Knowledge Tracing
    3.1.4 Issues with Bayesian Knowledge Tracing
    3.1.5 Knowledge Tracing Model's Parameters Estimation
    3.1.6 Summary and Conclusion
  3.2 Review on Behaviors Discovery in Educational Systems
4 Chapter: User Modeling in Dynamic Bayesian Networks: Prime Climb a Case Study
  4.1 Evaluation Measures of the PC-DBN Student Model
    4.1.1 External Validity of Student Model: End Accuracy of PC-DBN Student Model
    4.1.2 Internal Validity of Student Model: Real-Time Accuracy of PC-DBN Student Model
    4.1.3 Other Measures to Evaluate External Validity of Prime Climb's Model
  4.2 Evaluation of the Student Model's Parameters in Prime Climb
    4.2.1 Applicability of BKT Model's Parameters Plausibility for Prime Climb
    4.2.2 Model's Parameters Plausibility for Prime Climb
  4.3 Evaluation of the Hinting Strategy in Prime Climb
    4.3.1 Simulation of the Intervention Mechanism Using the Original Threshold Setting
  4.4 Issues with Prime Climb DBN Student Model
    4.4.1 Identifiability in Prime Climb DBN Student Model
    4.4.2 Degeneracy in Prime Climb DBN Student Model
  4.5 Prime Climb DBN Model's Parameters Fitting
    4.5.1 Original Prime Climb DBN Model
    4.5.2 Bounded Prime Climb DBN Model
5 Chapter: Mining User Interactions Behaviors in Prime Climb
  5.1 Behavior Discovery in Prime Climb
  5.2 Class Association Mining
  5.3 Features, Measures, Data Points and Datasets
    5.3.1 Interaction Logs and Datasets
    5.3.2 Features Definitions
    5.3.3 Feature Set Definition
    5.3.4 Measures
  5.4 Behavior Discovery with Full-Features Sets
  5.5 Behavior Discovery with Truncated-Features Sets
  5.6 Behavior Discovery with Mixed Datasets
  5.7 Conclusion and Future Works
6 Chapter: Conclusion and Future Work
Bibliography
Appendices
  Appendix A Clustering Results for Different Features Sets
    A.1 Understanding Patterns of Interactions in Groups with Different Prior Knowledge Using Full-Features Sets
    A.2 Understanding Patterns of Interactions in Groups with Different Learning Gain Using Full-Features Sets
    A.3 Understanding Patterns of Interactions in Groups with Different Prior Knowledge Using Truncated-Features Sets
    A.4 Understanding Patterns of Interactions in Groups with Different Learning Gain Using Truncated-Features Sets
  Appendix B Pre-Test
  Appendix C Post-Test
  Appendix D Implementation and Setup in Prime Climb
    D.1 New Rollup Strategy in Prime Climb's Student Model
    D.2 Game Architecture and Design
    D.3 Starting a Prime Climb Game and Playing the Game
    D.4 Setting up a New Prime Climb
    D.5 Issues and Troubleshooting

List of Tables

Table 2-1: Dependency Performance Parameters. The Conditional Probability Table of the ClickXY node
Table 2-2: Max: Dependency Learning Parameter in Prime Climb
Table 2-3: Values of the thresholds used in the hinting strategy in Prime Climb
Table 3-1: Three hypothetical KT models (Knowledge model, Guess model and Reading Tutor model)
Table 3-2: Degeneration in Dirichlet prior, baseline and bounded models [5]
Table 3-3: Set of features extracted from interaction log data
Table 4-1: Confusion matrix for hinting strategy
Table 4-2: Identifiability in Prime Climb DBN Student Model
Table 4-3: CPT of Click Node
Table 4-4: Set of optimal model's parameters for original Prime Climb DBN model
Table 4-5: Original Prime Climb DBN model's End Accuracy
Table 4-6: Real-time Accuracy in Original Prime Climb model
Table 4-7: PostTest-Model Assessment Correlation in Original Prime Climb
Table 4-8: SSE in Original PC-DBN model
Table 4-9: Evaluation of original Prime Climb student model for degeneracy
Table 4-10: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy
Table 4-11: Average number of patterns of degeneracy across students in original Prime Climb DBN model for Population prior probability
Table 4-12: Average degeneracy results across students when Generic prior probability is used
Table 4-13: Average number of patterns of degeneracy in original Prime Climb DBN model when User-specific prior probability is used
Table 4-14: Confusion matrix for hinting mechanism for Original PC-DBN model with Population prior
Table 4-15: Confusion matrix for hinting mechanism for Original Prime Climb model with Generic prior
Table 4-16: Confusion matrix for hinting mechanism for Original PC-DBN model with User-specific prior
Table 4-17: Skill analysis table for all numbers on the pre-test and post-test for the population model
Table 4-18: Correlation results regarding plausibility based on correlation between within-game performance and learning in population student model
Table 4-19: Optimal dependency model parameter in bounded PC-DBN student model
Table 4-20: Accuracy results for the bounded Prime Climb DBN model
Table 4-21: Average/STD of accuracy results across students for original and bounded model with population prior
Table 4-22: Average/STD of accuracy results across students for original and bounded model with generic prior
Table 4-23: Average/STD of accuracy results across students for original and bounded model with user-specific prior
Table 4-24: Area Under the Curve of ROC-Curves of the bounded PC-DBN model
Table 4-25: Model's real-time accuracy for bounded PC-DBN Model
Table 4-26: PostTest-Model correlation assessment in Bounded Prime Climb
Table 4-27: SSE for Bounded PC-DBN model
Table 4-28: Evaluation of Bounded Prime Climb student model for degeneracy
Table 4-29: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy
Table 4-30: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy
Table 4-31: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy
Table 4-32: Confusion matrix for hinting mechanism for Bounded PC-DBN model with Population prior
Table 4-33: Confusion matrix for hinting mechanism for Bounded PC-DBN model with Generic prior
Table 4-34: Confusion matrix for hinting mechanism for Bounded PC model with User-specific prior
Table 5-1: Movements-related features extracted from interaction logs
Table 5-2: Magnifying glass (MG) related features
Table 5-3: Descriptive statistics on the pre-test and post-test scores of the 43 subjects
Table 5-4: Clustering results on Full-Mountains-Generic-Movements Dataset
Table 5-5: Associative rules extracted from 2 clusters on full-mountains-generic-movements dataset
Table 5-6: Selected features for full-mountains-generic-movements
Table 5-7: Clustering results on Full-Mountains-Generic+Specific-Movements Dataset
Table 5-8: Associative rules extracted from 2 clusters on full-mountains-generic+specific-movements dataset
Table 5-9: Selected features for full-mountains-generic+specific-movements
Table 5-10: Clustering results on Truncated-Mountains[2]-Generic+Specific-Movements Dataset
Table 5-11: Associative rules extracted from 2 clusters on full-mountains[2]-generic+specific-movements dataset
Table 5-12: Selected features for Truncated-Mountains[2]-Generic+Specific-Movements
Table 5-13: Clustering results on Truncated-Mountains[4]-Generic+Specific-Movements Dataset
Table 5-14: Associative rules extracted from 2 clusters on full-mountains[4]-generic+specific-movements dataset
Table 5-15: Clustering results on Truncated-Mountains[2]-Generic+Specific-Movements Dataset
Table 5-16: Associative rules extracted from 2 clusters on truncated-mountains[2]-generic-movements dataset
Table 5-17: Selected features for Truncated-Mountains[2]-Generic+Specific-Movements
Table 5-18: Clustering results on Full-Mountains-Generic+Truncated-Mountains[1-2]-Specific Dataset
Table 5-19: Associative rules extracted from 2 clusters on truncated-mountains-generic+mountains[1-2]-specific dataset
Table 5-20: 38 selected features for full-mountain-generic+truncated-mountains[1-2]-specific
Table 5-21: Clustering and comparison results on clusters' prior knowledge on Full-Mountains-Generic+Truncated-Mountains[1-2 and 8-9]-Specific Dataset
Table 5-22: Associative rules extracted from 2 clusters on truncated-mountains-generic+mountains[1-2 and 8-9]-specific dataset
Table 5-23: Clustering and comparison results on clusters' PLG on Full-Mountains-Generic+Truncated-Mountains[1-2 and 8-9]-Specific Dataset
Table 5-24: Associative rules extracted from 2 clusters on truncated-mountains-generic+mountains[1-2 and 8-9]-specific dataset
Table 5-25: 41 selected features for full-mountain-generic+truncated-mountains[1-2 and 8-9]-specific

List of Figures

Figure 1-1: Framework of user modeling and personalization in Prime Climb as an adaptive system
Figure 2-1: Snapshot of a mountain, hint, MG tool and the two players in Prime Climb
Figure 2-2: Structure of the Click node
Figure 2-3: Population and Generic prior probabilities for 16 factorization skills (numbers)
Figure 2-4: Original rollup procedure in Prime Climb model
Figure 2-5: Hinting strategy in Prime Climb
Figure 2-6: Ranges for number of movements and number of students for each range
Figure 2-7: Number of levels played by students
Figure 3-1: The performance curve (left) and learning curve (right) of the three hypothetical KT models for a skill [18]
Figure 4-1: Knowledge transfer from one skill to another skill
Figure 4-2: Click node structure
Figure 4-3: Degeneracy in Prime Climb DBN student model when Population prior parameter is used
Figure 4-4: Average number of different patterns of degeneration across students when Population prior is used
Figure 4-5: Average percentage of different patterns of degeneration across students when Population prior is used
Figure 4-6: Degeneracy in Prime Climb DBN student model when Generic prior parameter is used
Figure 4-7: Average number of degeneracy cases when Generic prior probability is used
Figure 4-8: Average percentage of degeneracy cases when Generic prior probability is used
Figure 4-9: Degeneracy in Prime Climb DBN student model when User-specific prior parameter is used
Figure 4-10: Average number of degeneracy cases when User-specific prior probability is used
Figure 4-11: Average percentage of degeneracy cases when User-specific prior probability is used
Figure 4-12: Plausibility based on correlation between within-game performance and learning in Population student model
Figure 4-13: Original and Bounded Prime Climb Student Model with Population Prior
Figure 4-14: Original and Bounded Prime Climb Student Model with Generic Prior
Figure 4-15: Original and Bounded Prime Climb Student Model with User-specific Prior
Figure 4-16: AUC of ROC curve for the population PC-DBN models (red: Original, blue: Bounded)
Figure 4-17: AUC of ROC curve for the generic PC-DBN models (red: Original, blue: Bounded)
Figure 4-18: AUC of ROC curve for the user-specific PC-DBN models (red: Original, blue: Bounded)
Figure 4-19: Comparison of the models' correlation between post-test and model's assessment
Figure 4-20: Comparison of the models' SSE
Figure 4-21: Degeneration in Bounded PC-DBN model with population prior setting
Figure 4-22: Average number of degeneration cases across students in population bounded PC-DBN model
Figure 4-23: Degeneration in Bounded PC-DBN model with generic prior setting
Figure 4-24: Average number of degeneracy cases in Bounded PC-DBN model with generic prior setting
Figure 4-25: Degeneration in Bounded PC-DBN model with user-specific prior setting
Figure 4-26: Average cases of degeneracy in Bounded PC-DBN model with user-specific prior setting
Figure 4-27: Comparison of degeneracy in Original and Bounded PC-DBN model
Figure 4-28: Comparing performance of the hinting mechanism in Original and Bounded models with population prior probability setting
Figure 4-29: Comparing performance of the hinting mechanism in Original and Bounded models with user-specific prior probability setting
Figure 4-30: Comparing performance of the hinting mechanism in Original and Bounded models with generic prior probability setting
Figure 5-1: Behavior Discovery Framework in Prime Climb
Figure 5-2: Pre-test scores distribution across the subjects
Figure 5-3: Post-test scores distribution across the subjects
Figure 5-4: All combinations of different feature types (pre: prior knowledge, PLG: percentage of learning gain)

Acknowledgements

"Experience is what you get when you didn't get what you wanted. And experience is often the most valuable thing you have to offer."
Randy Pausch, The Last Lecture

I would like to convey my gratitude to my supervisor, Professor Cristina Conati, for her advice and supervision, and to Professor David Poole for his feedback on the thesis.
I was fortunate to collaborate with several individuals from the Department of Computer Science at the University of British Columbia: Samad Kardan, Mary Muir, and my other friends in the Computational Intelligence and Intelligent User Interface laboratories. I would also like to thank the GRAND NCE, through the DiGLT project, for funding this research, as well as the Department of Computer Science at the University of British Columbia for providing me with academic and financial support during my studies. Last but not least, I extend my deepest appreciation to my wife, Elham, my family, and all my friends for their encouragement, support and patience.

Dedication

I dedicate this thesis in honor of my best friend forever, my lovely wife, Elham, who has always stood with me and patiently dealt with difficult and challenging situations to help our dreams come true.

1 Chapter: Introduction

Assisting people to acquire desired knowledge and skills while engaging in a game distinguishes digital educational games [54, 55, 56, 57, 58] from traditional video games. Digital educational games (henceforth referred to as edu-games) integrate game design concepts with pedagogical techniques in order to better address the learning needs of a new generation that values "doing rather than knowing". The effectiveness of the game aspects of an edu-game depends on how well the game keeps the player in affective states known to be influential in learning, such as motivation [59, 60]. From a pedagogical perspective, on the other hand, an educational game needs to embed the pedagogical content in the game's scenario and narrative in such a way that the player's interactions with the game eventually result in learning gains.
While there is promising evidence of the effectiveness of edu-games in keeping players in affective states such as motivation and engagement [61], there is still no reliable evidence of the usefulness of edu-games in helping players learn the target knowledge and skills [26]. Adaptive edu-games, a sub-domain of edu-games, aim to support tailored interactions with the player during game play and have been proposed as an alternative to the one-size-fits-all approach used in designing non-adaptive edu-games [62]. Prime Climb is an adaptive edu-game which helps students practice number factorization (factorizing a number into its factors) while participating in a 2-player game. This game provides a test-bed for conducting research on adaptation in edu-games. Like any other adaptive system, an adaptive edu-game follows the general framework of adaptive systems shown in Figure 1-1. In Prime Climb, a student interacts with the game by playing it, using the available tools and attending to messages from an embedded pedagogical agent (more on Prime Climb in Chapter 2). Prime Climb uses a Dynamic Bayesian Network (DBN) to build a user model. The interaction data collected from the user is used to train the model's parameters and structure. The user model maintains and provides an assessment of the student's knowledge of the target skills (number factorization related skills) during and at the end of the interaction. The model's assessment of the student's mastery of the desired skills during game play is leveraged by a pedagogical agent to provide the student with personalized support. To facilitate learning during game play, Prime Climb employs a heuristic strategy based on the user model to provide personalized support in the form of varying types of hints at incremental levels of detail. Similarly, the model's assessment of the student's knowledge of the target skills (e.g.
number factorization skills in PC) at the end of the game predicts the student's performance on similar problems outside the game environment, in a post-test.

Figure 1-1: Framework of user modeling and personalization in Prime Climb as an adaptive system

An accurate user model is the main component of a system which adapts to the user. Understanding the limitations and issues of, and determining different ways of, learning the parameters of a Bayesian Network student model from interaction data is of significant importance. This thesis presents work on improving the user model in Prime Climb by capturing the limitations and problems of user modeling in Prime Climb and proposing solutions to deal with them.

Contributions and Motivations

The main contributions of the thesis are twofold:

1. Detecting and resolving the issue of degeneracy in the student model in Prime Climb: The issue of degeneracy is defined as a situation in which the parameters of a parametric student model are estimated such that the model has the highest performance (is at its global maximum) with respect to some standard measure of accuracy, yet it violates the conceptual assumptions of the process being modeled [5]. Degeneracy happens, for instance, if a student model does not properly update its assessment of how much a student has mastered a skill given the student's observable performance on that skill. In this thesis, we describe the issue of degeneracy in Prime Climb and how it affects student modeling in Prime Climb. An approach to significantly decreasing the extent of degeneracy is proposed and the resulting model is compared to the degenerate model using several metrics. In addition, a comprehensive review of a common student modeling approach, called Bayesian Knowledge Tracing, is presented with the purpose of providing a comparison between student modeling in Bayesian Knowledge Tracing and student modeling using Dynamic Bayesian Networks in Prime Climb.

2.
Applying educational data mining approaches to understand different users' behavior patterns in Prime Climb: Understanding how users with different attributes (e.g. higher learning gain, higher prior knowledge) behave differently with Prime Climb could further support personalization in an adaptive educational game. To this end, we used data mining approaches such as clustering and rule mining to understand how users with higher prior knowledge of the domain on which the game has been built behave differently from students with lower prior knowledge of the domain. In addition, we looked for clusters of users who learned more or less from Prime Climb and their distinguishing interaction patterns. The results of this research can be utilized in two ways: 1) one could encourage users to follow interaction patterns which could result in higher learning, and discourage them from performing actions that do not contribute enough to learning the target skills; 2) the patterns of interaction can be utilized to improve the student model and intervention mechanism. For instance, if we know that a student is acting more similarly to those with lower prior knowledge, or to those who will learn less from playing Prime Climb, it could be possible to leverage such information to adjust the parameters of the student model or the adaptive intervention mechanism, making both more individualized.

Structure of the Thesis

As shown in Figure 1-1, the rest of the thesis is organized as follows. Chapter 2 describes the Prime Climb game. Chapter 3 provides an overview of the user studies conducted to collect data from students. Chapter 4 presents a comprehensive review of Bayesian Knowledge Tracing, its evaluation, issues and variations. Chapter 5 discusses user modeling in Prime Climb, its evaluation, issues and variations.
Chapter 6 presents the work on behavior discovery in Prime Climb using educational data mining approaches, and Chapter 7 provides conclusions and proposes some future work.

2 Chapter: Review on Prime Climb's Student Model and User Studies

2.1 Prime Climb

Prime Climb is a 2-player game designed for kids in grades 5 or 6 to practice factorization concepts such as number factorization and common factors. The game comprises a series of mountains of numbered hexagons. Currently the game contains 12 mountains, starting from smaller mountains of numbers that are simpler to factorize up to bigger mountains of more difficult-to-factorize numbers. Once a player reaches the top of a mountain, both players proceed to the next level (mountain) of the game. In Prime Climb, the players do not compete to reach the top of the mountains but cooperate to climb each mountain together. The game is not turn-based, and a player can make more than one move at a time. As shown in Figure 2-1 (Snapshot of a mountain, hint, MG tool and the two players in Prime Climb), each mountain contains a set of numbered hexagons on which the players can click to move. The two players are represented by simple iconic characters (the green and red players in Figure 2-1) and are connected to each other with a rope which restricts the players from going too far away (more than two hexagons) from each other. The players climb the mountains by moving to numbers which do not share a common factor. A player's movement to a numbered hexagon is considered a wrong movement if the number shares at least one factor with the number the partner is on. Once a wrong movement is made, the player who made it falls down and starts swinging from the rope. The swinging player then has three hexagons to start from (hexagons in the lower layers of the mountain). If the player is less than two levels from the bottom, both players restart at the bottom of the mountain.
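As a minimal illustration (our own sketch, not the game's actual implementation), the movement rule just described reduces to a greatest-common-divisor test between the number the player moves to and the number the partner is on:

```python
from math import gcd

def is_valid_move(target_number: int, partner_number: int) -> bool:
    """A move in Prime Climb is valid only if the number the player moves to
    shares no common factor (other than 1) with the partner's number."""
    return gcd(target_number, partner_number) == 1
```

For example, moving to 25 while the partner sits on 14 is valid (no shared factor), whereas moving to 9 while the partner is on 15 is a wrong movement (shared factor 3) and would send the player swinging from the rope.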
To guide the player in selecting which numbers are available to move to, the possible numbered hexagons are highlighted. The height of the mountains ranges from 5 to 10 hexagons. Within the first 6 mountains the size of the mountains doubles, and these mountains contain only one- or two-digit numbers. Higher level mountains (from level 7 on) also contain 3-digit numbers.

2.2 Magnifying Glass Tool in Prime Climb

Prime Climb contains a tool called the Magnifying Glass (MG for brevity) which helps the students see the factor tree of a number. The MG tool is available throughout the game and the players can use it without limitation. In Figure 2-1, the MG tool is shown in the upper right corner of the figure. In order to use the MG tool, a player clicks on the magnifying glass symbol on the device and then clicks on the number she wants to view. Figure 2-1 shows a use of the MG tool on the number 27. This number (27) is first decomposed into 3 and 9 (27 = 3*9), and then each composite factor (here, 9) is recursively decomposed into its own factors. There are different ways to build a factor tree. The most balanced factor tree is selected to be presented by the MG tool, to simplify the tree visualization on the MG device.

2.3 Student Modeling and Pedagogical Agent in Prime Climb

Each player in Prime Climb has a pedagogical agent which receives data from the user's interaction with the game and updates the student model based on the given evidence. It then uses the student model to provide unsolicited, tailored support in the form of hints to the player as she plays the game. In the current version of Prime Climb, the pedagogical agent is represented by a static monkey figure in the top center of the game. Once the agent generates a message (a hint), it provides the user with the opportunity to ask for more hints. In Prime Climb, the pedagogical agent uses a heuristic approach, based on the assessment of the student model, to decide on what and when to provide hints to the player.
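The "most balanced" factor tree displayed by the MG tool (Section 2.2) can be sketched by always splitting a number at the pair of factors closest to its square root. This is an illustrative reconstruction under that assumption, not the game's actual code:

```python
def balanced_factor_tree(n: int):
    """Return a nested tuple (n, left_subtree, right_subtree) that splits n
    into the two factors closest to sqrt(n); primes become leaves (n,).
    Mirrors the MG display described above, e.g. 27 -> 3 * 9, then 9 -> 3 * 3."""
    for d in range(int(n ** 0.5), 1, -1):
        if n % d == 0:  # largest divisor <= sqrt(n) gives the most balanced split
            return (n, balanced_factor_tree(d), balanced_factor_tree(n // d))
    return (n,)  # no divisor in (1, sqrt(n)]: n is prime, so it is a leaf
```

Called on 27, this yields the tree 27 -> (3, 9) with 9 further split into (3, 3), matching the MG example in the text.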
Generally, the pedagogical agent intervenes once it believes that the player needs to receive a hint on a skill, because the student model gives a low probability that the skill is known. Hints might be given after a wrong or correct move, as long as the model's assessment of the student's knowledge of the skill practiced in the move shows that the student has not yet mastered the skill. A hint is always given on the most recent movement, so it does not provide information on the next movement to be made. As previously mentioned, Prime Climb is equipped with a probabilistic student model which infers students' factorization knowledge from the interactions and actions made by the students while playing the game. One challenge in connecting observable student performance (e.g. a movement from one number on the mountain to another) to latent factorization knowledge is that the actions and interactions made by the students are not unambiguous, certain evidence of knowledge. For example, if a student makes a correct action, it could be evidence of her knowledge or it could simply be a lucky guess. Similarly, an incorrect action could either indicate that the student did not have the knowledge required to make a correct action, or it could simply be an error due to distraction. To deal with this uncertainty, Prime Climb uses a Dynamic Bayesian Network (DBN) to model the students' latent level of factorization knowledge. In Prime Climb, each mountain is assigned a DBN student model, so there are 11 DBNs which share several nodes. The initial prior belief of a node in the DBN of mountain [i] is affected by the final posterior belief of the node in the DBN of mountain [i-1] if the node is shared between the two DBN models. From now on we refer to these models as the Prime Climb DBN student model (PC-DBN student model).
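The carry-over of beliefs between consecutive mountains just described can be sketched as follows (the function name and dictionary representation are our assumptions): for every node shared between the two DBNs, the final posterior from mountain i-1 replaces the default prior in mountain i.

```python
def carry_over_priors(posteriors_prev: dict, default_priors_next: dict) -> dict:
    """Initialize the priors of mountain i's DBN: nodes shared with the DBN
    of mountain i-1 inherit that mountain's final posterior belief, while
    all other nodes keep their default prior."""
    return {node: posteriors_prev.get(node, prior)
            for node, prior in default_priors_next.items()}
```

For example, if mountain i-1 ends with a posterior of 0.8 on factorization node F9, the DBN of mountain i starts with 0.8 on F9 rather than its default prior.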
2.3.1 Prime Climb Model's Structure

A PC-DBN models the evolution of a student's factorization knowledge while the student is interacting with Prime Climb. In order to model such evolution over a period of time, a DBN consists of time slices representing relevant temporal states in the process being modeled. A new time slice is created each time a student makes a movement (climbs a mountain). As previously mentioned, in Prime Climb there is a DBN for each mountain. At each given time, the model represents its assessment of the factorization knowledge of the student up to that specific time; therefore such a DBN is called a short-term student model. Each short-term PC-DBN model contains (but is not limited to) the following binary random variables [26]:

Factorization Nodes (F): Each factorization node, Fx, represents whether the student has mastered the factorization of number x down to its prime factors. The PC-DBN student model contains factorization nodes for each number on the mountain as well as some of the numbers' factors.

Common Factor Node (CF): There is only one CF node, representing whether the student has mastered the concept of a common factor between two numbers.

Click Nodes (Click): A click node is associated with each movement the student makes and models the probability of the correctness of a student's click on number x on the mountain.

Magnification Nodes (Mag): A Magx node denotes using the magnifying glass on number x.

Such a model aims at assessing the probability that the student knows the factorization of a number on the mountain, as well as the probability that the student understands the concept of common factors. A number factorization skill is defined as the student's ability to factorize a number into all its factors. Similarly, the common factor skill is defined as the student's knowledge of the concept of common factors of two numbers. The Prime Climb DBN model applies a causal structure over click nodes.
Once the student makes an action, a click node is added as a child of three nodes: 1) Fx, the factorization node corresponding to the number the player is on; 2) Fy, the factorization node corresponding to the number the partner is on; and 3) CF, the common factor node. Therefore, these three nodes are conditionally dependent on each other given evidence on the click node. This is important because such a causal structure makes it possible to implement apportioning of blame when the student makes an incorrect movement. Figure 2-2 shows this structure.

Figure 2-2: Structure of the Click node

As mentioned, the short-term PC-DBN uses time slices to model the evolution of the student's factorization knowledge. Since it is computationally expensive to keep a large number of time slices, at most two time slices in the DBN are maintained at any given time. In order to keep track of the time slices which are removed, a process known as rollup or filtering is applied. During a rollup procedure the posterior probabilities of the time slices which are being removed are saved and transferred to the current time slice [26]. The rollup procedure is briefly described in Section 2.3.3.

2.3.2 Prime Climb Model's Parameters

There are two types of parameters in the student model in Prime Climb: 1) prior probability parameters and 2) dependency parameters.

Dependency Parameters: The student model in Prime Climb is a parametric model. The model comprises four parameters referred to as dependency parameters, as described below. The dependency parameters are of two types: 1) performance dependency parameters and 2) a learning dependency parameter. The performance dependency parameters specify how the student model updates its assessment of the student's latent level of knowledge as a result of a performance (a movement). The performance dependency parameters are as follows:

- Slip: Defined similarly to the Slip parameter in Bayesian Knowledge Tracing [1].
The Slip parameter is the probability of making a wrong action on a problem step when the student knows the corresponding skill.

- Guess: Defined similarly to the Guess parameter in Knowledge Tracing. The Guess parameter is the probability of making a correct action on a problem step when the student does not know the corresponding skill.

- Edu-Guess: This parameter is defined to account for the possibility that when the student knows the factorization of one of the two numbers involved in the click, it may be easier for her to guess correctly.

The following table represents the conditional probability table (CPT) of the ClickXY node.

Table 2-1: Dependency performance parameters: the conditional probability table of the ClickXY node

P(ClickXY = Correct):
CF = Known,   Fx = Known,   Fy = Known:    1 - Slip
CF = Known,   Fx = Known,   Fy = Unknown:  Edu-Guess
CF = Known,   Fx = Unknown, Fy = Known:    Edu-Guess
CF = Known,   Fx = Unknown, Fy = Unknown:  Guess
CF = Unknown, any Fx, Fy:                  Guess

The learning dependency parameter shows how knowledge of a skill is transmitted to other skills. There is only one learning dependency parameter in the student model, called Max.

- Max: As shown in Table 2-2, Max is a coefficient between 0 and 1 in the formula Max * (PK / P), used to calculate the probability that the student knows FX in proportion to the number of its known parents (P and PK are the numbers of parents and known parents of FX, respectively). See Figure 2-3 for the model's graph.

Table 2-2: Max: the dependency learning parameter in Prime Climb

FZ        PriorX     P(FX = Known)
Known     Known      1
Known     Unknown    Max * (PK / P)
Unknown   Known      1
Unknown   Unknown    0

Prior Probability Parameters: The initial probability of a student knowing the number factorization and common factor skills is modeled in terms of the prior probabilities of the corresponding factorization and common factor nodes in the network. Prime Climb uses three possible types of prior probabilities to model students' prior knowledge, as follows:
- Generic: In the generic prior probability setting, the probability of a student knowing a factorization skill is initially set to 0.5. The generic prior probability is used when there is no ground truth on students' initial level of factorization knowledge.

- Population: The population prior probability is calculated based on the performance of a group of students on a pre-test. The pre-test examines students' knowledge of specific number factorization skills. In Prime Climb, there are 16 questions in the pre-test related to number factorization skills that were used to calculate the population prior probabilities. To calculate the population prior probabilities, the pre-test scores of 45 students who played Prime Climb were used. Figure 2-3 shows the population prior probabilities for the 16 factorization skills.

- Userspecific: The userspecific prior probability is set individually for each student based on her performance in a pre-test session. If a student has correctly responded to a number factorization question in the pre-test, the prior probability that the student knows the corresponding factorization skill is set to 0.9; otherwise it is set to 0.1. The assumption is that a correct answer to a factorization question in the pre-test indicates a high probability (e.g. 0.9) that the student has the corresponding factorization skill. We did not set this probability to 1 because it should remain open to updates based on the student's performance as she plays the game.

2.3.3 Original Rollup Procedure in PC-DBN

Figure 2-4: Original rollup procedure in Prime Climb model. (The figure shows slices Slice(ti-1) and Slice(ti) and the CPT of a non-root factorization node FX: P(FX = Known) is 1 when PriorX is Known, Max * (PK / P) when PriorX is Unknown and FZ is Known, and 0 when both are Unknown.)

Every factorization node in the PC-DBN model is either a root or a non-root node. To support the rollup procedure, a new node called a prior node is introduced. Basically, there is a prior node associated with every non-root factorization node.
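The three prior-probability settings described in Section 2.3.2 can be summarized in code. This is a sketch with assumed names; the values 0.5, 0.9 and 0.1 come from the text, and the population rate is supplied by the caller:

```python
def skill_prior(setting: str, pretest_correct: bool = False,
                population_rate: float = 0.5) -> float:
    """Initial P(skill = Known) under the three settings described above:
    generic = 0.5; userspecific = 0.9 or 0.1 depending on the student's own
    pre-test answer; population = fraction of the reference group of students
    who answered the corresponding pre-test question correctly."""
    if setting == "generic":
        return 0.5
    if setting == "userspecific":
        return 0.9 if pretest_correct else 0.1
    if setting == "population":
        return population_rate
    raise ValueError(f"unknown prior setting: {setting}")
```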
The rollup procedure is as follows:

Slice(ti-1): The student makes a movement and a Click node is added to the network. Based on whether the movement is valid or invalid, corresponding evidence is observed on the ClickXY node. Due to the causal structure of the Click node, the network's beliefs about the student knowing the factorization of numbers X and Y and the concept of common factor (CF) change. Since adding and keeping a Click node for each movement, without removing it immediately, would make the model too big to be used in real-time scenarios (such as during game play), the Click node is removed once its impact has been tracked and maintained.

Figure 2-3: Population and Generic prior probabilities for the 16 factorization skills (the numbers on the pre- and post-tests)

Slice(ti): Before a Click node (added in Slice(ti-1)) is removed, the posterior probabilities of the FX, FY and CF nodes must be transferred to time Slice(ti). To this end, if a node (e.g. CF, FX or FY) is a root node (like CF and FY), the CPT of the node is directly altered to reflect the posterior probability of the node. If the node is a non-root node (for instance node FX), the CPT of the node in the new slice is set up such that knowing the factorization in the previous time slice implies knowing the factorization in the current slice (i.e. forgetting is not modeled). To this end, the CPT of the corresponding prior node is changed in Slice(ti) to reflect the posterior probability of its child node (a factorization node) in Slice(ti-1).
As shown in Figure 2-4, the CPT of a non-root factorization node FX indicates that the probability of the node being known is 0 when all its parent nodes are unknown, and increases proportionally with the number of known parents up to a maximum of Max, the probability that the student can infer the factorization of x by knowing the factorization of its parent nodes. The Max parameter is defined in Section 2.3.2.

2.4 Hinting Strategy in Prime Climb

A good intervention strategy in an adaptive educational game ensures pedagogical effectiveness by providing appropriate, tailored support when required, while not intervening so much that the user's engagement in the game is negatively affected. The intervention mechanism in Prime Climb, which was developed prior to this thesis, provides different types of hints during the interaction of the student with the game. The hinting strategy in Prime Climb utilizes the student model's assessment of the student's number factorization and common factor knowledge during game play to provide adaptive support in the form of hints on unknown skills. To decide when to intervene, the hinting strategy uses four thresholds, namely: 1) Fact-CorrectMove, 2) Fact-WrongMove, 3) CF-CorrectMove and 4) CF-WrongMove. The first two thresholds determine the values used to evaluate a number factorization (Fact) skill as known after a correct and a wrong movement, respectively. Similarly, the last two thresholds are used to assess the common factor (CF) skill as known or unknown immediately after a correct or wrong movement. A human-adjusted approach was applied to find an original setting for the four aforementioned thresholds in the intervention strategy in Prime Climb [63].
To this end, after choosing some initial values for each of the thresholds, some graduate students played the game and their reports on the timing of the hints were used to adjust the initial values [63]. Table 2-3 shows the final values selected for each of the thresholds.

Table 2-3: Values of the thresholds used in the hinting strategy in Prime Climb

Threshold           Final value
Fact-CorrectMove    0.5
Fact-WrongMove      0.8
CF-CorrectMove      0.1
CF-WrongMove        0.5

Figure 2-5 shows how these thresholds are used in the intervention mechanism in Prime Climb to decide when and on what skill to provide hints.

Figure 2-5: Hinting strategy in Prime Climb

2.5 Experiments and Data Collection

This section focuses on the experiments conducted to collect data from students' interactions with Prime Climb and discusses the issues with the collected data. The data used for the studies presented in this thesis was collected from two series of experiments. For each experiment, kids in grades 5 or 6 were recruited. Before each experiment, the subjects' parents signed a consent form and the subjects took a pre-test. The pre-test examines students' knowledge of number factorization concepts and contains questions regarding the factorization of 16 numbers: 9, 11, 14, 15, 25, 27, 30, 31, 33, 36, 42, 49, 81, 88, 89 and 97. At the end of the students' interaction with the game, they took a post-test containing exactly the same questions as the pre-test. The post-test was followed by a post-survey which asked the students about their experience interacting with Prime Climb. During our data analysis, we found that the number 30 never appeared on any mountain in Prime Climb. Therefore, we excluded this number and used the remaining fifteen numbers in the pre-test and post-test for all evaluations presented in this thesis.
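As a sketch of how the four thresholds in Table 2-3 (Section 2.4) gate hinting, a hint on a skill is warranted when the model's probability that the skill is known falls below the threshold matching the skill and the move's outcome. The names below are our assumptions, and the full decision logic in Figure 2-5 is richer than this check:

```python
# Final threshold values from Table 2-3.
HINT_THRESHOLDS = {
    ("Fact", "correct"): 0.5,
    ("Fact", "wrong"):   0.8,
    ("CF",   "correct"): 0.1,
    ("CF",   "wrong"):   0.5,
}

def needs_hint(skill: str, move_outcome: str, p_known: float) -> bool:
    """Hint on `skill` after a move when the model's belief that the skill
    is known is below the threshold for that skill and move outcome."""
    return p_known < HINT_THRESHOLDS[(skill, move_outcome)]
```

Note the asymmetry the table encodes: after a wrong move the model hints on factorization up to a belief of 0.8, but after a correct move only below 0.5.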
Also, as mentioned earlier, there are 12 mountains of numbers (levels) implemented in Prime Climb, yet the last mountain is not currently used because it was found to be very time consuming for students to climb. Therefore, the game currently uses only 11 mountains.

2.5.1 Experiment 1: Old Prime Climb (Computer-based Version)

This experiment was conducted over 3 days on site at a local school in Vancouver, British Columbia. Overall, 46 male and female students in grade 5 or 6 played Prime Climb for about half an hour (or until they finished the game). During this experiment, each student was matched with an investigator who played the role of the partner. A computer-based (not web-based) version of Prime Climb was used for the experiment. While the subjects were playing the game, the interaction data was logged in text files. Each subject's log file contains the following information:

- Movements: For each movement the following data was collected: the timestamp of the movement, the player's number, the partner's number, whether the movement was a correct or wrong action, and the level (mountain) in which the move was made.

- Magnifying Glass (MG) usages: Each time the player used the MG tool, the time of usage and the number on which the tool was used were logged in the file.

- Hints information: Each time a hint was given, the time of presenting the hint, the type of the hint and its closing time were collected.

Issues with the Collected Data

There are several issues with the data collected in this experiment:

1. Due to technical issues, the Prime Climb game crashed during game play for 13 (out of 46) students. There was no regular pattern to the crashes, so the problem could not be resolved during the limited days of the experiment.
For these 13 subjects, the game terminated abnormally more than once; as a result, these students could not reach the higher levels of the game and repeated a few levels more than once. We excluded these subjects from all evaluations throughout the thesis, so data from 33 subjects from this experiment was used for analysis. Each student is assigned an identifier (id) in our data file. The ids of the students with valid interaction logs collected from the first experiment are as follows, for future reference: 1, 3, 8, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46.

2. The other main issue pertains to the hinting during game play in this experiment. In the old version of Prime Climb, the closing times of the hints were not correctly recorded for all given hints. The information for hints given in the first 3-4 mountains was recorded, while the information for the other hints was not correctly recorded in the log files.

2.5.2 Experiment 2: New Prime Climb (Web-based Version)

The second series of experiments was conducted in the Intelligent User Interface (IUI) laboratory in the Computer Science department at the University of British Columbia. Similar to the first experiment, students in grades 5 and 6 were recruited. Because this series of experiments also involved collecting eye gaze data from the users during the interaction, using a Tobii T120 eye tracker, the sessions were all conducted in the IUI lab and not in a local school. The eye tracking data gives some information on how the students attended to different areas of interest (mountains, MG tool, hints) as they played the game. Similar to the first experiment, the students played the game with an investigator. For this series of experiments, a web-based version of Prime Climb was used.
In order to make the game accessible via the internet, Prime Climb was re-implemented as a web-based game. The new game uses the same interface, mountains (levels) and MG tool as the old game. In the new Prime Climb, the student's interaction with the game is logged in an SQL-based database and no text log is generated. Similar to experiment 1, for each subject the movements, tool and hints data were logged in the database. In addition, eye gaze information was also collected; in this thesis, however, we do not work with the eye gaze information. In total, 12 students played Prime Climb through the new version of the game. The ids of these subjects as recorded in our data file are as follows: 53, 54, 55, 56, 57, 58, 84, 85, 86, 87, 89 and 90. In all analyses throughout this thesis we used data from all 45 subjects (except where explicitly mentioned) who played Prime Climb, of whom 33 played the old version of Prime Climb and 12 played the new version. The pre-test and post-test exams and the mountains were exactly the same in the old and new Prime Climb. The same hinting strategy was used in both experiments and the MG tool was available in both versions. Therefore we integrated the data collected from the two experiments. Figure 2-6 shows the number of movements the subjects made and Figure 2-7 shows the number of levels they went through. Figure 2-6 shows that only a few students (3 in total) made over 200 movements, and all students made over 110 movements. Figure 2-7 shows that 43 students managed to complete 9 or more mountains (levels).

Figure 2-6: Ranges for number of movements and number of students for each range

Figure 2-7: Number of levels played by students

3 Chapter: Literature Review

This chapter describes previous studies related to the work presented in this thesis. The literature review comprises two main sections: 3.1) a review of user modeling with Bayesian Knowledge Tracing (BKT) and 3.2) a review of behavior discovery in educational systems. The first part briefly describes the fundamentals of Bayesian Knowledge Tracing, learning BKT from data, evaluation methods for BKT, and issues with student modeling in Bayesian Knowledge Tracing, as well as the approaches proposed to resolve those issues. The reason for providing this review is that in Chapter 4 we will follow a similar structure to describe student modeling in Prime Climb using Dynamic Bayesian Networks, which can be considered a version of Bayesian Knowledge Tracing: learning the model from data, evaluation methods for the model, the issue with student modeling in Prime Climb, and an approach to resolving the issue. In Chapter 4, it will be discussed whether the evaluation methods in BKT are also applicable to student modeling in Prime Climb and whether or not they have been used in Prime Climb. In addition, it will be investigated whether student modeling in Prime Climb suffers from issues similar to those from which student modeling with BKT suffers. The second part reviews work on mining users' interactions with educational systems, and in Chapter 5 we will focus on behavior discovery in Prime Climb.

3.1 A Review on User Modeling by Bayesian Knowledge Tracing

An intelligent educational system leverages student interactions with the system to draw conclusions about the state of the domain knowledge being practiced and learned by the student. To come to such conclusions, an intelligent educational system utilizes a student model which maps the performance data to knowledge states (for instance, whether the student has acquired the knowledge).
One of the most commonly used approaches to student modeling is Corbett and Anderson's Bayesian Knowledge Tracing (BKT, or KT) model [1]. The BKT model has been applied to a variety of educational applications such as a tutor for mathematics [23], computer programming [1] and a reading-skill tutor [19]. Bayesian Knowledge Tracing provides a real-time assessment of the student's knowledge of each domain skill while the student is interacting with the educational system. In the following sections, BKT, its evaluation, its variants and its issues will be discussed.
3.1.1 Standard Bayesian Knowledge Tracing
Bayesian Knowledge Tracing assumes a two-state learning model with no support for forgetting a previously learned skill [1]. In this two-state model, a skill is either in the learned or the unlearned state. An unlearned skill may transfer to the learned state at each opportunity the student has to practice the skill. As no forgetting is modeled in BKT, there is no transition of a rule from the learned to the unlearned state. In BKT, it is also assumed that the student's correct/wrong performance in applying a skill is the direct consequence of the skill being in the learned/unlearned state; yet there is always the possibility of a student correctly applying a rule without knowing the corresponding skill. This is referred to as the probability of guessing. Similarly, the likelihood of a student showing wrong performance in applying a rule while knowing the underlying skill is called the probability of slipping. The main objective in the framework of Bayesian Knowledge Tracing is maintaining an estimate of the probability of a skill being in the learned state. Following each opportunity to apply a skill, the probability of the rule being in the learned state is re-estimated using two learning and two performance parameters. This mechanism follows a variant of the Bayesian computational procedure of Atkinson [24].
According to Atkinson, the probability of transition from an unlearned state to a learned state is independent of whether the student applies the skill correctly or not. In Bayesian Knowledge Tracing the learning and performance parameters are defined as follows:
- Definition 3-1: Prior Knowledge, P(L0): The probability that a skill is in the learned state prior to the first opportunity to apply the skill.
- Definition 3-2: Learning rate, P(T): The probability that a rule will make the transition from an unlearned to a learned state following an opportunity to apply the skill.
- Definition 3-3: Guess, P(G): The probability that a student will guess correctly if a skill is in the unlearned state.
- Definition 3-4: Slip, P(S): The probability that a student will make a mistake in applying a skill when the skill is in the learned state.
Bayesian Knowledge Tracing uses the above parameters to compute the probability of a correct response (the predicted accuracy, P(Correct_n)) at each problem step g (associated with a specific skill r of the student s) in a tutor exercise using the following formula:

P(Correct_n) = P(L_n) * (1 - P(S)) + (1 - P(L_n)) * P(G)   (Equation 3-1)

Equation 3-1 shows that the probability P(Correct_n) that the student s will perform correctly at goal g is the sum of two products:
1. P(L_n) * (1 - P(S)): the probability that the skill r is in the learned state at the nth opportunity for student s to practice it, times the probability of a correct response (not slipping, 1 - P(S)).
2. (1 - P(L_n)) * P(G): the probability that the skill r is not in the learned state at the nth opportunity, times the probability of a correct guess, P(G), when the skill is not in the learned state (guessing).
Bayesian Knowledge Tracing also keeps updating the probability of the student's mastery of skills as the student practices them during interaction with the educational system.
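As an illustration (our own sketch, not code from the thesis), Equation 3-1 can be written as a small Python function; all names are ours:

```python
def predicted_accuracy(p_learned, p_guess, p_slip):
    """Equation 3-1: P(Correct_n) = P(L_n)*(1 - P(S)) + (1 - P(L_n))*P(G).

    p_learned -- current estimate that the skill is in the learned state
    p_guess   -- probability of a correct answer without the skill
    p_slip    -- probability of a wrong answer despite knowing the skill
    """
    return p_learned * (1.0 - p_slip) + (1.0 - p_learned) * p_guess

# A skill believed mastered with probability 0.7, guess 0.2, slip 0.1:
# 0.7 * 0.9 + 0.3 * 0.2 = 0.69
```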
The updating procedure is as follows. First, given the evidence at time n, BKT re-estimates the probability that the student knew the corresponding skill at time n-1. In other words, the posterior probability P(L_{n-1} | Action_n) is calculated using the following equations (note that Action_n is either correct or wrong):

P(L_{n-1} | Correct_n) = P(L_{n-1}) * (1 - P(S)) / [P(L_{n-1}) * (1 - P(S)) + (1 - P(L_{n-1})) * P(G)]   (Equation 3-2)

P(L_{n-1} | Wrong_n) = P(L_{n-1}) * P(S) / [P(L_{n-1}) * P(S) + (1 - P(L_{n-1})) * (1 - P(G))]   (Equation 3-3)

Once there is an updated estimate of P(L_{n-1}) for a specific skill, the current probability of knowing the skill is calculated using the following equation:

P(L_n | Action_n) = P(L_{n-1} | Action_n) + (1 - P(L_{n-1} | Action_n)) * P(T)   (Equation 3-4)

In the above equations:
- P(L_n): the probability that a skill is in the learned state following the nth opportunity to apply the skill.
- P(L_{n-1} | Action_n): the posterior probability that the rule (skill) at time n-1 was in the learned state given the evidence pertaining to the correctness of the nth action.
- P(T): the probability of transferring from an unlearned state to a learned state following an opportunity to practice the rule (skill).
As mentioned before, according to Atkinson (1972), P(T) is independent of the student's performance at the nth opportunity. In other words, the probability of transferring from an unlearned state to a learned state is the same whether the rule is applied correctly or incorrectly at the nth opportunity to practice the skill.
3.1.2 Evaluation Techniques of Bayesian Knowledge Tracing Model
The Bayesian Knowledge Tracing model is evaluated based on its predictive accuracy. The predictive accuracy of the KT model is quantified based on how well the model fits the performance data within the tutor (a.k.a. internal validity of BKT) and how well the model is capable of predicting the student's performance after the tutoring, when the student is working on her own in a post-test session (a.k.a. external validity of BKT).
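The update step (Equations 3-2 to 3-4) can be sketched as one Python function; this is our own illustration under the standard BKT assumptions, not the thesis's implementation:

```python
def bkt_update(p_prev, correct, p_trans, p_guess, p_slip):
    """One BKT update: posterior on knowing the skill at time n-1 given
    the nth action (Eq. 3-2 if correct, Eq. 3-3 if wrong), followed by
    the learning transition P(T) (Eq. 3-4)."""
    if correct:
        # Equation 3-2: evidence of a correct answer
        post = p_prev * (1 - p_slip) / (
            p_prev * (1 - p_slip) + (1 - p_prev) * p_guess)
    else:
        # Equation 3-3: evidence of a wrong answer
        post = p_prev * p_slip / (
            p_prev * p_slip + (1 - p_prev) * (1 - p_guess))
    # Equation 3-4: apply the transition P(T), regardless of correctness
    return post + (1 - post) * p_trans
```

For example, with P(L_{n-1}) = 0.5, P(T) = 0.1, P(G) = 0.2 and P(S) = 0.1, a correct answer yields a posterior of 0.45/0.55 and then 9.2/11 ≈ 0.836 after the transition.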
The following sections describe various measures used in the Knowledge Tracing literature to evaluate the internal and external validity of BKT-based models.
3.1.2.1 Predictive Accuracy Within Tutor: Internal Validity of Knowledge Tracing
The internal validity of a Knowledge Tracing model refers to how well the model fits the within-tutor performance data. In other words, the internal validity corresponds to how well the predicted accuracy (Equation 3-1) matches the actual accuracy (whether the student actually gave a correct or wrong answer at the problem step). In KT, each problem step is associated with a skill and a student either applies the skill correctly or incorrectly. On the other hand, at each problem step the Knowledge Tracing model keeps an updated estimate of how likely it is that the student has mastered the corresponding skill (using Equation 3-4). Therefore, at each problem step, two values are available:
1. Actual Accuracy: The actual probability of a correct response. The actual accuracy is 1 when the action is correct and 0 when the action is wrong.
2. Expected Accuracy: The Knowledge Tracing model's predicted probability of a correct response, calculated using Equation 3-1.
Mean expected accuracy and mean actual accuracy are, respectively, the means of expected accuracy and actual accuracy across students for each skill. If the Knowledge Tracing model is successful at predicting the student's performance at each step, there should be a strong correlation between the actual and expected accuracies. A successful Knowledge Tracing model also avoids systematic overestimation or underestimation of the student's knowledge. The following measures have been used in the BKT literature to evaluate the internal validity of BKT.
Summed Squared Error (SSE)
The Summed Squared Error (SSE) in Knowledge Tracing is computed using the following formula [12]:

SSE = Σ (actual accuracy - P(Correct))²
(Equation 3-5)
In this equation, the actual accuracy is the actual performance of the student on a problem step: whether the student gave a correct answer (actual accuracy = 1) or a wrong answer (actual accuracy = 0) at the given problem step. P(Correct) is the predicted accuracy, calculated using Equation 3-1.
Area under the Curve (AUC) of the ROC Curve
A ROC (Receiver Operating Characteristic) curve plots the performance of a binary classifier as its discrimination threshold varies. It plots the True Positive rate (a.k.a. Sensitivity) vs. the False Positive rate, and evaluates how well a binary classifier can distinguish between positive and negative data. The Area under the Curve of a ROC curve is equal to the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. The AUC of the ROC curve can be used to evaluate how well the model fits the performance data, and the predictive accuracy of two models can be compared using the AUC of their corresponding ROC curves.
Expected and Actual Accuracy Correlation
The correlation between the expected and actual accuracy is a measure that can be used to evaluate how well the model fits the performance data within and outside the tutor. To calculate it, the expected accuracy at each problem step and the actual performance are computed and the correlation between them is calculated. If the correlation is statistically significant and strong, the Knowledge Tracing model fits the data well; otherwise the model fails to make a highly accurate prediction of the student's performance.
R-Squared (R²)
R² is a measure of how well a model performs in terms of predictive accuracy for each test data point, where the baseline is predicting the data points by just guessing the mean of the sample [20]. The following equation is used to calculate R²:

R² = 1 - Σ (actual accuracy - predicted accuracy)² / Σ (actual accuracy - mean actual accuracy)²   (Equation 3-6)
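A minimal sketch (our own illustration, not the thesis's code) of three of the internal-validity measures discussed above, written against parallel lists of actual (0/1) and predicted accuracies:

```python
def sse(actual, predicted):
    """Equation 3-5: summed squared error over problem steps."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def r_squared(actual, predicted):
    """Equation 3-6: 1 minus SSE over the variability around the mean."""
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse(actual, predicted) / ss_tot

def pearson_r(xs, ys):
    """Pearson correlation between expected and actual accuracy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

AUC is omitted here since it requires ranking over the whole data set; in practice a library routine would typically be used for it.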
The predicted and actual accuracies are calculated as described in the previous section.
3.1.2.2 Predictive Accuracy Outside Tutor: External Validity of Knowledge Tracing
The external validity of Knowledge Tracing is defined as the accuracy of the model in predicting the student's performance outside the tutoring environment, after reaching the mastery level. Note that in Knowledge Tracing it is expected that at the end of an interaction session the student has mastered all the skills involved. When the Knowledge Tracing model assesses a probability of 95% or higher for a student knowing a skill, the student is said to have mastered the skill. The main difference between exercises within the tutor and exercises outside the tutor is that exercises within the tutor are split into problem steps and each problem step is associated with one skill. In addition, within the tutor, the solution path from the first to the last problem step is often pre-determined. Like within-tutor exercises, outside-tutor exercises also comprise several problem steps, but the student is free to follow her own solution path. Since the solution path is not recognizable for outside-tutor exercises, the expected and actual accuracies for a complete exercise (including several problem steps) have to be considered in verifying the external validity of a Knowledge Tracing model. Therefore, the expected and actual accuracy of whole exercises must be calculated. As in the internal validity study, the following statistics are calculated:
1. Exercise actual accuracy: the actual observation of whether the student completed an exercise correctly; that is, whether the student completed the whole exercise correctly or incorrectly. This value is 1 if the student completed the exercise correctly and 0 if the answer to the exercise is wrong.
Therefore, this does not count individual steps but only the final answer to the problem.
2. Exercise predicted accuracy: the predicted probability of completing an exercise correctly. If an exercise comprises multiple skills Ci, the exercise predicted accuracy, i.e. the probability of giving a correct answer to the exercise, is given by the following equation:

P(Exercise Correct) = Π_{i=1..n} P(L_{Ci})   (Equation 3-7)

In other words, the exercise predicted accuracy for an exercise comprising n skills Ci is calculated as the product of the probabilities of knowing the n skills. As in the internal validity study, three statistics are then calculated:
1. The correlation between exercise actual and expected accuracy across students.
2. The mean error in prediction: the difference between the exercise predicted accuracy and the exercise actual accuracy across students.
3. The mean absolute error: the absolute difference between the exercise predicted accuracy and the exercise actual accuracy across students.
To verify the external validity, a post-test session was administered following a final tutoring session in which no individual remedial exercises were provided to the students. (Note that in ordinary BKT each student is provided with as many remedial exercises as needed to help the student reach a specific level of mastery. In evaluating the external validity of BKT, the students interacted with a BKT-based tutor during 5 sessions in such a way that they received remedial exercises in the first 4 sessions but not in the final session.) As a consequence of not presenting the students with tailored exercises, the students did not necessarily master all the skills involved in that tutoring session. The mean exercise actual accuracy and mean exercise predicted accuracy were calculated to be 0.55 and 0.64 respectively.
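Equation 3-7 and the mean-absolute-error statistic above can be sketched as follows (an illustrative fragment of ours, with hypothetical names):

```python
from math import prod

def exercise_predicted_accuracy(p_know):
    """Equation 3-7: probability of a fully correct exercise as the
    product of the mastery probabilities of the n skills it involves."""
    return prod(p_know)

def mean_absolute_error(actual, predicted):
    """Mean absolute difference between exercise actual and predicted
    accuracy across students."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# An exercise involving three skills with mastery 0.9, 0.8 and 0.5
# gets a predicted accuracy of 0.9 * 0.8 * 0.5 = 0.36.
```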
The mean absolute error across students was 0.18 and the correlation between the exercise actual and predicted accuracy was a statistically significant 0.69. The study showed that the Knowledge Tracing procedure is also capable of predicting the outside-tutor performance of students after they interacted with the tutoring system.
3.1.3 Evaluation of the Model's Parameters in Knowledge Tracing
In addition to evaluating the Knowledge Tracing model on the basis of its predictive accuracy, it is also possible to directly evaluate the model's parameters themselves. There are two ways of directly evaluating the model's parameters in the Knowledge Tracing literature. The first approach applies known facts about the domain for which the tutoring system has been developed. The second approach deals with how reasonably the model's parameters predict the performance and learning of a student interacting with the system; this criterion is referred to as Parameter Plausibility. The following sections discuss how Knowledge Tracing model parameters are directly evaluated.
3.1.3.1 Domain-Specific Facts
Domain-specific facts can be used to evaluate Knowledge Tracing model parameters. For instance, a domain-specific fact in the Reading Tutor is that a student should have higher prior knowledge of the words which appear in texts more frequently [8]. One issue with domain-specific facts is that known facts in one domain cannot necessarily be generalized to other domains. In the work on the Reading Tutor [8], one domain-specific fact was the expected correlation between Knowledge Tracing's prior-knowledge parameter, K0, and the frequency of a word in the text [4].
In the domain of text reading, it was generally known and accepted that a student's knowledge of reading a word which is more frequent in a text is expected to be higher than for words which appear rarely in the text.
3.1.3.2 Model's Parameters Plausibility
As previously mentioned, model parameter plausibility refers to how reasonably the model's parameters predict the performance and learning of a student interacting with the system. The plausibility of the model's parameters cannot be defined purely objectively. Instead, it has to incorporate some subjective measures which might be domain independent or domain specific. In the following sections, some criteria used in the literature [18, 19, 4, 11, 12, 14] to evaluate parameter plausibility in BKT are described.
Plausibility Based on Skills Mastery
The plausibility of the model's parameters on the basis of the number of practice opportunities to skill mastery has been investigated in Knowledge Tracing using the learning curve [18, 19]. As previously mentioned, if the probability of a student knowing a skill is greater than 0.95, the student is believed to have reached the level of mastery for the skill. In standard BKT, a student is presented with as many remedial exercises as needed on each skill to help the student reach the level of mastery of the skill. The number of remedial exercises given to each user on each skill depends on the student's predicted knowledge (current probability of mastery) of the skill. In Knowledge Tracing, it is possible to apply Equations 3-2 to 3-4 to simulate the number of trials required for a student to reach the level of mastery. The criterion of parameter plausibility based on skill mastery can be used for Knowledge Tracing models in which the parameter values are constant across skills as well as those in which the parameter values vary across skills. Note that in Knowledge Tracing it is assumed that the skills are independent of each other.
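The simulation mentioned above can be sketched as follows. This is our own illustration under one simple assumption, namely that the simulated student answers correctly at every opportunity; other simulation variants are possible:

```python
def trials_to_mastery(p_l0, p_trans, p_guess, p_slip,
                      threshold=0.95, max_trials=1000):
    """Count the opportunities needed for P(L) to reach the mastery
    threshold, assuming a correct answer at every opportunity
    (Equations 3-2 and 3-4). Returns None if never reached."""
    p = p_l0
    for n in range(1, max_trials + 1):
        # Posterior after a correct answer (Equation 3-2)
        post = p * (1 - p_slip) / (p * (1 - p_slip) + (1 - p) * p_guess)
        # Learning transition (Equation 3-4)
        p = post + (1 - post) * p_trans
        if p >= threshold:
            return n
    return None
```

A model whose parameters imply an implausibly small (or enormous) number of trials to mastery would fail this plausibility criterion.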
To date, there is no study applying this criterion for evaluating the plausibility of BKT model parameters when the parameter values are constant across skills.
Plausibility Based on Number of Extreme Parameter Values
The values of the four Knowledge Tracing model parameters, and in particular the values of skill-specific K0 (prior knowledge) and T (learning rate), can also reflect the plausibility of the parameter values. Depending on domain-specific and domain-independent facts, values of 0 and 1 can be considered extreme values for the model's parameters [12]. A value of 0 for the learning-rate parameter means there is no transition from the unlearned to the learned state for a skill in a Knowledge Tracing model; in other words, a value of 0 for the learning rate means there is no improvement in the student's knowledge of the skill. A value of 1 for the learning rate means full knowledge of a skill after only one opportunity to practice it, regardless of the correctness or incorrectness of the attempt. Similarly, a value of 1 for the prior knowledge means full prior knowledge of a skill, and 0 means no prior knowledge of the skill at all. It is generally more plausible that a model uses reasonable learning-rate and prior-knowledge values for the students' skills. As with the two previously discussed criteria, this criterion for parameter plausibility can be used both for skill-specific and skill-independent parameters. Nevertheless, it is more reliable to apply this criterion when the parameter values vary across skills, because if there is only one constant set of model parameters across skills, extreme parameter values in that single set make the model's parameters almost completely implausible.
For instance, if there is one single set of model parameters in which the learning rate is 1, it means all the skills are at exactly the same level of difficulty and all the skills can be mastered after only one opportunity to practice each skill.
Plausibility Based on Correlation between Prior Knowledge and Skill Difficulty across Skills
In order to evaluate model parameters based on the correlation between the prior knowledge (K0) and skill difficulty, a level of difficulty needs to be assigned to each skill [12]. One way to do this is an expert-driven approach in which at least two domain experts assign a level of difficulty to each skill; the correlation between the assigned difficulty levels then specifies whether it is meaningful to apply this plausibility criterion. If the skill difficulty levels rated by different experts are strongly and significantly correlated, we can proceed with this criterion; otherwise the approach should be abandoned. Once a level of difficulty is assigned to each skill, the correlation between the estimated value of the prior knowledge and the assigned level of difficulty is calculated. A strong and statistically significant correlation represents a plausible estimated value for the prior-knowledge parameter. This criterion is only applicable when the Knowledge Tracing model's parameters vary across skills.
Plausibility Based on Correlation between Student-specific Prior Knowledge and a Pre-test Score
This criterion is similar to the "Plausibility based on Correlation between K0 and skill difficulty across skills" discussed in the previous section, except that here, instead of estimating the model parameters for each skill across students, a model is trained for each student across her skills. In other words, in this approach all skills for one student have the same values for each of the parameters, but these values can vary from one student to another.
In this approach, the student-specific estimated value of the prior-knowledge parameter is correlated across students with a score available a priori from some source such as a pre-test. As described, this criterion is applicable when the model's parameter values are constant across skills but vary from one student to another [12].
3.1.4 Issues with Bayesian Knowledge Tracing
Knowledge Tracing student modeling aims at mapping observable student performance to the latent knowledge states corresponding to that performance. Any issue in the quality of this mapping can negatively affect the reliability of the model's assessment of knowledge. Knowledge Tracing has been applied to student modeling for almost three decades, yet the issues with inference in Knowledge Tracing remained unaddressed until recent years. Beck [18] addressed the issue of multiple global maxima in Knowledge Tracing, known as the Identifiability problem. Identifiability in Knowledge Tracing refers to the existence of multiple equally good mappings from observable student performance to the corresponding latent level of knowledge. To address the issue, Beck introduced the Dirichlet prior approach [18], in which a Dirichlet probability distribution is defined over the model's parameters in a Knowledge Tracing model to bias the estimation of the model parameters toward the mean of the distribution. The Dirichlet prior approach was then extended, and the Multiple Dirichlet Prior approach [11] and the Weighted Dirichlet Prior approach [12] have been proposed. The Identifiability problem is further discussed in the next sections. Another problem with Knowledge Tracing was addressed by Baker et al. [5], who discussed that Knowledge Tracing models can suffer from the problem of degeneracy. The degeneracy problem is associated with updating the probability of knowing a skill in Knowledge Tracing.
A KT model is degenerate if it updates the probability of a student knowing a skill in such a way that it violates the conceptual assumptions underlying the process being modeled by Knowledge Tracing (such as a student being more likely to give a correct answer if she has the corresponding knowledge than if she does not). It was also shown that the Dirichlet prior Knowledge Tracing model (which was proposed to address the Identifiability problem in BKT) also suffers from the degeneracy problem [5]. Two approaches, Bounded Parameters in Knowledge Tracing and Contextual Guess and Slip in Knowledge Tracing [5], have been defined and were shown to be less degenerate than the standard Knowledge Tracing and Dirichlet Prior approaches.
3.1.4.1 Identifiability Problem in Knowledge Tracing
As already discussed, Knowledge Tracing contains four model parameters: prior knowledge (K0), learning rate (T), guess and slip. The task of parameter estimation in Knowledge Tracing is very important, as the capability of the KT model to assess the latent state of student knowledge is directly affected by the values of the parameters. Beck et al. [18, 12] showed that the process of estimating the Knowledge Tracing model parameters is subject to finding multiple combinations of model parameters which fit the student's performance data and reduce the error equally well. To illustrate the problem of Identifiability, Beck [18] plotted the learning and performance curves for three hypothetical models. The three models (Knowledge, Guess and Reading Tutor) are shown in Table 3-1 [18].
Table 3-1: Three hypothetical KT models (Knowledge model, Guess model and Reading Tutor model)
Model Name     K0    T    Guess  Slip
Knowledge      0.56  0.1  0.00   0.05
Guess          0.36  0.1  0.30   0.05
Reading Tutor  0.01  0.1  0.53   0.05
In order to plot the learning and performance curves, we need to know p(known) and p(correctn).
p(known) gives the Knowledge Tracing model's assessment of the level of mastery of a specific skill at each opportunity the student has to practice the skill. On the other hand, p(correctn) gives the Knowledge Tracing model's prediction of the correctness or incorrectness of the student's answer at problem step n. Note that here the assumption is that the Knowledge Tracing model's parameters are calculated per skill but are the same across students. Suppose that Table 3-1 represents the Knowledge Tracing model parameters for one specific skill s for three different BKT models. Given this assumption, we have:
- P(know_is): the probability that the student has learned the skill s at the ith opportunity to practice it.
- P(correct_is): the probability that the student will give a correct response at the problem step associated with the skill s at the ith opportunity to practice it.
The learning curve is plotted in a 2-D space with the number of opportunities on the x-axis and p(know_is) on the y-axis. Similarly, the performance curve has the number of opportunities on the x-axis and p(correct_is) on the y-axis. p(know_is) and p(correct_is) are updated using Equations 3-2 to 3-4. Figure 3-1 [18] shows the learning and performance curves for the three hypothetical models described in Table 3-1. As can be seen in Figure 3-1, the three models have similar performance curves but three different learning curves. This means that there is more than one set of KT model parameters which fit the performance data equally well. In the Knowledge model, after 10 practice opportunities the skill has a likelihood of 0.82 of being mastered (note the level of mastery is a probability of 0.95), while the other models are far from this level (0.75 for the Guess model and 0.61 for the Reading Tutor model).
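The phenomenon can be reproduced numerically. The sketch below (ours, not from the thesis) generates the two curves for the Table 3-1 parameter sets, applying only the learning transition P(T) at each opportunity, i.e. without conditioning on observed answers, which is one common way to draw such expected curves; under that assumption P(know) at opportunity n is 1 - (1 - K0)(1 - T)^(n-1):

```python
def learning_curve(p_l0, p_trans, n_opportunities):
    """Expected P(know) at each opportunity, transition-only update."""
    return [1 - (1 - p_l0) * (1 - p_trans) ** (n - 1)
            for n in range(1, n_opportunities + 1)]

def performance_curve(p_l0, p_trans, p_guess, p_slip, n_opportunities):
    """Expected P(correct) at each opportunity via Equation 3-1."""
    return [l * (1 - p_slip) + (1 - l) * p_guess
            for l in learning_curve(p_l0, p_trans, n_opportunities)]

# Knowledge model vs. Reading Tutor model from Table 3-1: the
# performance curves nearly coincide while the learning curves diverge.
perf_knowledge = performance_curve(0.56, 0.1, 0.00, 0.05, 10)
perf_reading   = performance_curve(0.01, 0.1, 0.53, 0.05, 10)
```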
Figure 3-1: The performance curve (left) and learning curve (right) of the three hypothetical KT models for a skill [18].
One of the main objectives of student modeling is providing an accurate assessment of the level of knowledge during the student's interaction with the educational system as well as at the end of the interaction. The different assessments of the level of skill mastery shown in Figure 3-1 raise the question of which assessment should be used for providing tailored support on the skill s for which the curves have been generated. Notice that it is not possible to evaluate and compare the estimated model parameters based on how well the model fits the within-tutor performance data, because the different sets of model parameters fit the performance data equally well. Therefore, other criteria for evaluating the model's parameters are required. Such criteria are referred to as model parameter plausibility criteria, as previously discussed. To address the problem of Identifiability, Beck [18] proposed a Knowledge Tracing model which uses a Dirichlet prior probability distribution to bias the process of parameter fitting. Subsequent to this work, Rai et al. [12] improved the Dirichlet prior approach by incorporating a weighted average in calculating the parameters of the Dirichlet distribution. Another extension, by Gong et al. [11], proposes using multiple Dirichlet prior distributions for estimating the Knowledge Tracing model parameters. These approaches will be discussed later in this chapter.
3.1.4.2 Model Degeneracy in Knowledge Tracing
The conceptual assumption behind the Bayesian Knowledge Tracing approach is that knowing a skill generally leads to correct performance on the skill's corresponding problem steps, and correct performance implies the student's knowledge of the skill.
In a degenerate model, the parameter values lead to paradoxical behavior, such as the probability of the student knowing a skill dropping after several (e.g., three) correct answers in a row. Baker et al. [5] investigated the problem of degeneracy in Knowledge Tracing from both theoretical and empirical points of view. In the following sections, the problem of degeneracy in Knowledge Tracing is discussed.
Theoretical Degeneration in Knowledge Tracing
If a BKT model's parameters violate the conceptual assumption underlying Knowledge Tracing, it is said to be theoretically degenerate. Generally, when the slip and guess parameters are greater than 0.5, the model is theoretically degenerate. A slip value greater than 0.5 for a skill means that a student who knows the skill is more likely to give a wrong answer at the corresponding problem step than a correct one. Similarly, a guess value greater than 0.5 indicates that, for a student who does not know the skill, the probability of a correct performance is higher than the probability of a wrong performance. In addition, the existence of certain patterns in the performance data is also an indicator of degeneracy in the model. These patterns, too, relate to a violation of the linkage between knowledge and performance (the conceptual assumption behind Knowledge Tracing). If such patterns exist for a KT model, the model is said to be empirically degenerate. A model which is not theoretically degenerate could still be empirically degenerate; in other words, a BKT model whose estimated guess and slip parameter values are not greater than 0.5 might still violate the conceptual assumptions behind Knowledge Tracing. Such patterns of degeneracy are referred to as Empirical Degeneration in Bayesian Knowledge Tracing.
Empirical Degeneration in Knowledge Tracing
As mentioned in the previous section, when the interaction data associated with a Knowledge Tracing model shows a violation of the conceptual linkage between knowledge and performance, the model is said to be empirically degenerate. Two tests have been proposed by Baker et al. [5] to investigate empirical degeneration in a Knowledge Tracing student model. If a model fails either of the following tests, the model is empirically degenerate. The tests are as follows:
(Definition 3-5) Test 1 of empirical degeneration in BKT: If a student's first N actions in the tutor are correct, the model's estimated probability that the student knows the corresponding skill should be higher than before these N actions. N is arbitrarily defined.
(Definition 3-6) Test 2 of empirical degeneration in BKT: If a student makes M correct actions in a row, the model should assess that the student has mastered the skill (the probability of knowing the skill is greater than a certain threshold). M is arbitrarily defined.
Pardos et al. [24] have also investigated the problem of degeneracy in Knowledge Tracing student modeling. They analyzed the convergence of an Expectation-Maximization approach to estimating KT model parameters, visualized the error surface, and found that the initial parameter values which lead to degeneration in the KT model are not randomly scattered throughout the parameter space but restricted to a surface with specific boundaries. They showed that if the initial values used by the Expectation-Maximization approach for the two performance parameters, guess and slip, sum to greater than one (slip + guess > 1), the EM approach is likely to converge to a degenerate state in the parameter space. One straightforward approach to avoiding theoretical degeneration is bounding the Knowledge Tracing model's performance parameters (guess and slip) to take values less than 0.5.
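The theoretical check and the two empirical tests above can be sketched as follows (our own illustration of Definitions 3-5 and 3-6, with example values of N and M; the thresholds are the ones discussed in the text):

```python
def theoretically_degenerate(p_guess, p_slip):
    """Guess or slip above 0.5 inverts the knowledge-performance link."""
    return p_guess > 0.5 or p_slip > 0.5

def _update(p, correct, t, g, s):
    # Posterior on knowing the skill given the action (Eq. 3-2 / 3-3),
    # then the learning transition (Eq. 3-4).
    if correct:
        post = p * (1 - s) / (p * (1 - s) + (1 - p) * g)
    else:
        post = p * s / (p * s + (1 - p) * (1 - g))
    return post + (1 - post) * t

def passes_test1(p_l0, t, g, s, n=3):
    """Definition 3-5: after N initial correct actions, the estimated
    P(L) should exceed its value before those actions."""
    p = p_l0
    for _ in range(n):
        p = _update(p, True, t, g, s)
    return p > p_l0

def passes_test2(p_l0, t, g, s, m=5, threshold=0.95):
    """Definition 3-6: M correct actions in a row should push P(L)
    past the mastery threshold."""
    p = p_l0
    for _ in range(m):
        p = _update(p, True, t, g, s)
    return p >= threshold
```

For instance, a parameter set with guess 0.8 and slip 0.7 fails Test 1: the mastery estimate drops after a run of correct answers.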
This bounding approach is called the Bounded model and is investigated further later in the thesis. Baker et al. [5, 6, 7] proposed an approach called Contextual Guess and Slip for Knowledge Tracing, which estimates guess and slip contextually, and showed that such a model is less degenerate than standard Knowledge Tracing.

3.1.5 Knowledge Tracing Model's Parameters Estimation

The objective of parameter fitting is to estimate the model's parameters (prior knowledge, learning rate, guess, slip) so that the predictive accuracy of the model is maximized and the parameters are plausible. The predictive accuracy of a Knowledge Tracing model is directly affected by its parameters. Various approaches have been proposed to estimate the Knowledge Tracing model's parameters. In standard Knowledge Tracing, a set of model parameters is estimated for each skill, and the parameters may take any value in the range (0, 1) with no bias toward any value in the range. The parameters are also estimated with no attention to the context in which they are used. In other words, a skill-specific value for each parameter is estimated, and this value remains constant throughout the student's interaction with the tutoring system. Such estimation also does not take into account characteristics of the students interacting with the system. Three approaches have been used to estimate the model parameters in standard Knowledge Tracing: 1) Exhaustive search, 2) Expectation-Maximization, and 3) Curve fitting. Other approaches to estimating the model's parameters have also been proposed which utilize additional sources of information such as students' characteristics [13, 10, 17] and the context [5, 6, 7]. These approaches are discussed in the following sections.
3.1.5.1 Standard Bayesian Knowledge Tracing

A baseline approach to fitting the model's parameters in Knowledge Tracing allows any value between 0 and 1 for the two learning parameters (prior knowledge and learning rate) and the two performance parameters (guess and slip). Curve fitting, Expectation-Maximization and Exhaustive search can be used for this purpose.

Curve fitting: Conjugate Gradient Descent

The curve fitting is carried out using conjugate gradient descent to minimize the distance between expected accuracy and actual accuracy at each problem step. Each problem step is associated with one skill. In Knowledge Tracing, a set of four model parameters (prior knowledge, learning rate, guess, slip) is estimated for each skill. To prepare the input for the conjugate gradient descent to estimate the model's parameters for a specific skill s, the interaction log files of the students are parsed and the performance of the student at every opportunity to practice the skill s is recorded. For instance, a student might have 5 opportunities to practice the skill s, with actual performance: "correct, correct, wrong, correct, correct". If the student has shown a correct performance at a problem step, the actual accuracy at that step is 1; otherwise it is 0. The predicted accuracy at each problem step, in turn, can be calculated using Equation 4-1:

P(Correct_n) = P(L_n-1) * (1 - P(S)) + (1 - P(L_n-1)) * P(G)

Using the actual accuracy and the predicted accuracy, the model's parameters are calculated so that the error of the fitting process is minimized. To use conjugate gradient descent, we can form a linear system with one row per problem step:

[ P(L_0)    1 - P(L_0)   ]                  [ Act_Perf_1(1) ]
[ P(L_1)    1 - P(L_1)   ]   [ 1 - P(S) ]   [ Act_Perf_1(2) ]
[   ...         ...      ] * [   P(G)   ] = [      ...      ]
[ P(L_n-1)  1 - P(L_n-1) ]                  [ Act_Perf_1(n) ]

in which Act_Perf_1(n) is the actual performance of student 1 at problem step n.
The gradient descent approach starts with initial random values for the parameter matrix (the matrix including 1 - P(S) and P(G)) and updates its estimate repeatedly until the fit error falls below an expected threshold or a specific number of iterations has been performed.

Expectation Maximization

The Expectation-Maximization approach can also be used for skill-specific parameter fitting in Knowledge Tracing. Assume that for each student we have an interaction log containing the student's performance when applying all the different skills. If we had a ground truth telling us whether the student had mastered a skill at each occurrence of performance in the interaction log, it would be possible to use maximum likelihood to estimate the parameters. But because the knowledge level of a student on a skill is not observable, we use Expectation-Maximization to estimate this hidden variable. Expectation-Maximization comprises two steps:

1) Expectation: at this step the current estimates of the parameters are used to calculate whether or not the student has mastered the skill at each time the student has the opportunity to practice it, i.e. (P(L_n), 1 - P(L_n)). This is done by repeated use of Equations 4-2 to 4-4.

2) Maximization: at this step, new estimates of the parameters are calculated. To calculate a new estimate of the slip parameter for a specific skill s, P(S), we take into account all problem steps related to the skill s (denoted by "skill's problem steps" in the following equation) at which the student gave a wrong answer. The following equation gives the new estimate of the slip parameter for a skill [25] (c: correct and w: wrong):

P(S) = [ Sum over skill's problem steps of d_n * P(L_n) ] / [ Sum over skill's problem steps of P(L_n) ]   (Equation 3-8)

where d_n = 1 if the student's answer at problem step n is wrong, and 0 otherwise.

Similarly, the guess parameter is re-estimated using the following equation:
P(G) = [ Sum over skill's problem steps of d'_n * (1 - P(L_n)) ] / [ Sum over skill's problem steps of (1 - P(L_n)) ]   (Equation 3-9)

where d'_n = 1 if the student's answer at problem step n is correct, and 0 otherwise.

The main limitation of the EM approach is that it is not guaranteed to return the global maximum and might end up in a local maximum [25]. The other possible problem with the EM approach is that the parameter estimates might be implausible and/or make the BKT model empirically or practically degenerate. Beck [18] showed that, despite the possible convergence to a local maximum, the EM approach can result in a Knowledge Tracing model with significantly higher predictive accuracy than the curve fitting approach. To compare the two approaches, after estimating the Knowledge Tracing model's parameters, the estimated values were used to predict the student's performance at each problem solving step. The predicted performance and the actual performance (whether the student gave a correct or wrong answer) were compared across students using the AUC. The curve fitting approach had an AUC of 0.652 ± 0.01, while the Expectation-Maximization approach had an AUC of 0.696 ± 0.01. In another study, in Project LISTEN's Reading Tutor [19], the Expectation-Maximization approach (AUC of 0.61) significantly (p < 0.01) outperformed the curve fitting approach (AUC of 0.56). The results of the two aforementioned studies provided evidence that estimating the Knowledge Tracing model's parameters with Expectation-Maximization yields higher predictive performance than curve fitting.

Exhaustive (Brute Force) Search

In the brute force approach, we try all possible combinations of the four parameters of Bayesian Knowledge Tracing. In this search, parameters are estimated for each skill.
To find the best fitting parameter estimates for each skill s_i, all the potential parameter combinations at a grain size of 0.01 are tried. For each parameter combination, the summed squared residuals (SSR) is calculated by iterating through all of the students' actions related to the skill s_i. The combination which gives the lowest SSR provides the best fitting parameter estimates for the skill s_i. To calculate the SSR for a combination of prior knowledge (P(L0)), learning rate (P(T)), guess and slip, the likelihood of a correct action is calculated for each action, using the following steps (for each skill):

0. n = 1, SSR = 0
1. Repeat steps 2, 3, 4, 5 and 6 until n = number of actions applying skill s_i
2. Use the following equation to calculate the likelihood of a correct action, P(Action_n = correct), using the probability that the student knows the skill before the action, P(L_n-1):

P(Action_n = correct) = P(L_n-1) * (1 - P(S)) + (1 - P(L_n-1)) * P(G)   (Equation 3-10)

3. Depending on the student's performance on Action_n (1 if correct, 0 if wrong), calculate the squared residual SR using the following equation:

SR = (Action_n - P(Action_n = correct))^2   (Equation 3-11)

4. Update the SSR: SSR += SR
5. Calculate P(L_n) using Equations 4-2 to 4-4, depending on whether the student's Action_n was correct or wrong.
6. n = n + 1

The SSR is calculated for each combination of parameter values, and the combination which produces the lowest SSR is selected for the skill.

3.1.5.2 Bounded Knowledge Tracing

Bounded Knowledge Tracing is a KT model whose parameters are bounded to take values only from specific ranges. For instance, in accordance with recommendations by Corbett & Anderson, guess and slip are bounded to be below 0.3 and 0.1 respectively in the LISP tutor [1]. The bounds should be defined in a way that supports mastery learning within a reasonable number of opportunities to practice the skills.
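A sketch of this bounded exhaustive search, under stated assumptions: the 0.01 grid of the thesis is coarsened to 0.05 to keep the example fast, the bounds are Corbett & Anderson's 0.3/0.1, and the per-action update is the standard BKT posterior-plus-learning step:

```python
import itertools

def ssr_for_params(actions, p0, learn, guess, slip):
    """Summed squared residuals of predicted vs. actual correctness
    (Equations 3-10 and 3-11), updating P(L) after every action."""
    p, ssr = p0, 0.0
    for correct in actions:  # 1 = correct, 0 = wrong
        pred = p * (1 - slip) + (1 - p) * guess  # Equation 3-10
        ssr += (correct - pred) ** 2             # Equation 3-11
        # Posterior given the observed action, then the learning step.
        post = p * (1 - slip) / pred if correct else p * slip / (1 - pred)
        p = post + (1 - post) * learn
    return ssr

def bounded_grid_search(actions, step=0.05, max_guess=0.3, max_slip=0.1):
    """Try every parameter combination on the grid, skipping combinations
    outside the bounded ranges; return (best SSR, best parameters)."""
    grid = [round(i * step, 2) for i in range(1, int(1 / step))]
    best = None
    for p0, learn, guess, slip in itertools.product(grid, repeat=4):
        if guess > max_guess or slip > max_slip:
            continue
        ssr = ssr_for_params(actions, p0, learn, guess, slip)
        if best is None or ssr < best[0]:
            best = (ssr, (p0, learn, guess, slip))
    return best
```

Dropping the `continue` line recovers the unbounded exhaustive search of the previous section.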
For instance, a high value for guess does not give much credit to a correct performance, and consequently the probability that the student knows the skill does not increase much. Similarly, a high value for slip does not heavily penalize the student's knowledge of a skill after a wrong performance in the tutoring system [1]. Baker et al. [5] also proposed that, in order to avoid theoretical degeneration, the guess and slip values should not take any value greater than 0.5. A straightforward approach for estimating the bounded parameters is an Exhaustive Search similar to the one previously discussed for the standard BKT; we just need to modify the Exhaustive Search so that it skips any combination of model parameters containing values outside the defined ranges. Baker et al. [5] showed that although Bounded Knowledge Tracing does not suffer from theoretical degeneration, there is some evidence of limited empirical degeneration in this model. It was also shown that Bounded KT resulted in significantly fewer cases of empirical degeneracy than standard Knowledge Tracing, which allows any value between 0 and 1 for its parameters.

3.1.5.3 Dirichlet Prior Knowledge Tracing

The Dirichlet prior Knowledge Tracing model was developed to address the problem of identifiability in the Knowledge Tracing model [11, 12, 18]. A Dirichlet distribution allows for specifying not only the most likely value (mean) for a parameter, but also the confidence in the estimate. The Dirichlet prior approach proposed by Beck [18] applies a Dirichlet distribution with two parameters, α and β, to each of the model's parameters across skills. The values of α and β directly affect the mean and standard deviation of the Dirichlet distribution, so they should be selected wisely. Domain-specific facts and knowledge can be incorporated when selecting the distribution's parameters (α and β).
For instance, a Dirichlet distribution of Dir(9, 6) was used in estimating the prior knowledge (K0) parameter in Project LISTEN's Reading Tutor [19]. A mean of 9 / (9 + 6) = 0.6 for Dir(9, 6) suggests that most of the students have a moderate probability of initially knowing most of the skills, because the most likely value in the distribution is 0.6. In LISTEN's Reading Tutor, the Dirichlet parameter values (9 and 6) were selected by a domain expert. The main intuition behind using the Dirichlet distribution is that it is more plausible to have a model with similar parameter estimates across the skills, provided there is no significant decay in the predictive power of the model. Given this intuition, the Dirichlet prior approach to estimating the model's parameters in Knowledge Tracing was proposed to bias the parameter fitting process toward estimating the most probable value for a parameter. This is carried out by defining a Dirichlet distribution over the parameter (one distribution for each parameter, across skills). It should be noted that the more training data available for estimating the value of a parameter, the more the estimate is allowed to deviate from its initial estimate, the mean of the Dirichlet distribution. Suppose that there is a Knowledge Tracing model for each skill in a set of skills; in other words, each skill has its own set of model parameters (initial prior, learning rate, guess, slip). The following procedure is used to estimate the Knowledge Tracing model's parameters using a Dirichlet distribution:

1- Use the Expectation-Maximization mechanism to estimate the KT model's parameters (guess, slip, K0, T) for each of the skills.
2- For each of the four parameters (guess, slip, K0, T), calculate Dir(α, β) such that:
   - The mean (μ) and variance (σ²) of the parameter estimates across skills are computed.
   - α and β are calculated so that μ = α / (α + β) and σ² = αβ / ((α + β)² (α + β + 1)).
3- Re-estimate the value of all parameters using their associated Dirichlet priors.

The Dirichlet prior approach is not meant to generate the optimal set of parameters with the maximum fit to the performance data: the first step of the above procedure already generates, for each skill, a set of parameters that fits the performance data as well as possible. Re-estimating the model's parameters using a Dirichlet prior distribution moves the model's parameters toward the mean of the Dirichlet distribution associated with each parameter. Consequently, it should come as no surprise if the model whose parameters have been re-estimated using a Dirichlet prior fits the training data less well than the model whose parameters were generated by the EM approach (after step 1 of the above procedure). The Dirichlet prior Knowledge Tracing model was evaluated and compared to the EM model (the model generated after the first step of the above procedure) [18]. It was shown that there is no statistically significant difference between the AUCs of the ROC curves of the Dirichlet prior model and the EM model on either the training or the test data set. It is worth mentioning that a human-adjusted Dirichlet distribution used in LISTEN's Reading Tutor showed a statistically significant improvement in AUC over the EM model; this result is not consistent with the data-driven Dirichlet prior approach described above [19]. The Dirichlet prior model's parameters were also compared to the EM model's with respect to parameter plausibility. With respect to plausibility based on skill mastery, the Dirichlet prior model suggests fewer cases of an unreasonable number of practice opportunities to skill mastery than the EM model. So, according to the plausibility-based-on-skill-mastery criterion, the Dirichlet prior model's parameter values are more plausible than the EM model's. With respect to the number of extreme parameter
values, the Dirichlet prior generated no extreme values for the model parameters across skills. The Dirichlet prior Knowledge Tracing model was also compared to the bounded and baseline models with respect to model degeneracy [5]. To this end, the interaction data of the Middle School Tutor were used, and theoretical and empirical degeneration were analyzed in each of the models. 76% of skills in the Dirichlet prior model and 75% of skills in the baseline model were theoretically degenerate.

Table 3-2: Degeneration in Dirichlet prior, baseline and bounded models [5]

Model           | Theoretical degeneration | Failed test 1 of empirical degeneration (N=3) | Failed test 2 of empirical degeneration (M=10)
Baseline        | 75%                      | 2%                                            | 23%
Bounded         | 0%                       | 0%                                            | 5%
Dirichlet Prior | 76%                      | 2%                                            | 23%

The bounded approach showed no theoretical degeneration, as a result of bounding the parameter values, but did show some level of empirical degeneration, as shown in Table 3-2 [5]. With respect to the number of failures of test 1 of empirical degeneracy, the Dirichlet prior and baseline models were not statistically significantly different from each other. Cases of failure of test 1 of empirical degeneracy are also rare compared to cases of failure of test 2. Finally, the Dirichlet prior and baseline models resulted in significantly more failures of test 2 of empirical degeneration than the bounded approach.

3.1.5.4 Weighted Dirichlet-Prior Knowledge Tracing

In the Knowledge Tracing model, a set of parameters is estimated for each skill, because not all skills are learnt in a similar way by a student. Despite the advantage of skill-specific model parameters, a potential problem arises when there is not much training data on a skill: there is then less constraint on the estimates for that skill. In other words, there is less confidence in the values assigned to the parameters.
This can result in multiple combinations of model parameters which fit the performance data equally well. In such situations it is preferable for such skills' parameters to take values similar to those of the other, better-estimated skills' parameters. As already mentioned, the Dirichlet distribution suggests a solution to this issue, originally proposed by Beck [18]. In the original setting of the Dirichlet prior approach, it was implicitly assumed that all skills are equally practiced during the tutoring sessions. This implicit assumption is not necessarily correct, as some skills are practiced more than others. Rai et al. [12] extended the work by Beck [18] to take into account the number of times each skill is practiced. The following procedure shows how to incorporate the number of times each skill is practiced (recall that K0 is the prior knowledge and T is the learning rate):

1- Use EM to estimate the KT model's parameters (guess, slip, K0, T) for each of the skills.
2- For each Knowledge Tracing model parameter P (guess, slip, K0 and T), compute Dir(α_P, β_P) as follows, where P_i is the skill-specific value of parameter P for skill i and n_i is the number of cases for skill i:
   - Compute the weighted mean: μ'_P = Sum_i (P_i * n_i) / Sum_i n_i
   - Compute the weighted variance: σ'²_P = Sum_i (n_i * (P_i - μ'_P)²) / Sum_i n_i
   - Then calculate α_P and β_P so that:
     α_P = μ'_P * ((μ'_P * (1 - μ'_P)) / σ'²_P - 1)
     β_P = α_P * (1 / μ'_P - 1)
3- Re-estimate the Knowledge Tracing model's parameters using the Dirichlet prior distributions.

To evaluate the effectiveness of the weighted Dirichlet prior, two models were developed: 1- a fixed prior model, which uses the weighted mean μ'_P (calculated above) of each parameter as the estimate for that parameter, and 2- the weighted Dirichlet prior model, which uses Dir(α, β)
as the prior distributions to estimate the parameters, as described in the above procedure. The predictive accuracy of the two models was compared using summed squared error (SSE) and AUC. The results showed that the weighted Dirichlet prior model did slightly (but not statistically significantly) better than the fixed prior model under both SSE and AUC. The two models were also compared on the basis of the plausibility of the model parameters. For the first plausibility criterion, the two models were compared with respect to plausibility based on skill mastery. The assumption in the study was that no skill should require more than 50, or fewer than 3, opportunities to master; values outside the range [3, 50] are considered out of range. The results showed that the weighted Dirichlet prior model produced fewer out-of-range cases than the fixed prior model. In addition, the skills found implausible in the weighted Dirichlet prior model were a subset of the skills identified as implausible in the fixed prior model, meaning that the weighted Dirichlet prior does not introduce any new parameter implausibility of its own. The correlation between student-specific prior knowledge and a pre-test score was used as the second plausibility criterion to compare the two models. The results showed that in both models there is a strong correlation between prior knowledge and the pre-test scores. The correlation in the weighted Dirichlet prior model was slightly, but not significantly, stronger than in the fixed prior model.

3.1.5.5 Multiple Dirichlet-Prior Knowledge Tracing

In previous sections it was discussed how the Dirichlet distribution has been used to address the issue of identifiability in Knowledge Tracing. One assumption behind the two aforementioned approaches is that, for each model parameter, there is a single Dirichlet distribution shared across skills.
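The moment-matching step shared by Beck's and Rai et al.'s approaches can be sketched as follows (a sketch, not the authors' code: one prior is fitted per parameter from the per-skill EM estimates, weighted by the number of cases n_i; unit weights recover the unweighted version of Section 3.1.5.3):

```python
def fit_dirichlet_prior(estimates, counts):
    """Fit Dir(alpha, beta) to per-skill estimates of one BKT parameter by
    matching the weighted mean and variance (method of moments)."""
    total = sum(counts)
    mean = sum(p * n for p, n in zip(estimates, counts)) / total
    var = sum(n * (p - mean) ** 2 for p, n in zip(estimates, counts)) / total
    alpha = mean * (mean * (1 - mean) / var - 1)
    beta = alpha * (1 / mean - 1)  # ensures mean = alpha / (alpha + beta)
    return alpha, beta
```

For example, per-skill slip estimates of 0.20, 0.30 and 0.25 with 10 cases each yield a prior whose mean is 0.25.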
As already mentioned, a single Dirichlet prior distribution for each of the parameters across skills biases the corresponding parameter's estimate toward the mean of the distribution. One extension of the previous Dirichlet prior approaches was proposed by Gone et al. [12]. The main idea of their work is that some skills may be more similar to each other than others. It may therefore be possible to cluster similar skills into groups and then learn a Dirichlet distribution for each group of skills. To this end, the Expectation-Maximization approach is first used to calculate the model's parameters for each of the skills. Each skill's parameter set is then treated as a feature vector for clustering; in other words, the clustering features are simply the model's four parameters (prior knowledge, learning rate, guess and slip) estimated by EM for each skill. The K-Means clustering approach is then used to cluster the skills on the basis of these feature vectors. The number of clusters was not specified a priori but was chosen to maximize the predictive accuracy on unseen data, so over-fitting to the test data set is a limitation of the experiment discussed in this study [12]. Once the skill clusters are specified, a Dirichlet prior distribution is calculated for each cluster using the weighted Dirichlet prior approach previously described. To evaluate the effectiveness of using multiple Dirichlet distributions, three types of Knowledge Tracing models were trained on the same data set: 1) a fixed prior model, which simply uses the set of model parameters estimated by the EM approach for each skill; 2) a single Dirichlet prior model, which uses a single Dirichlet distribution for each of the model's parameters across the skills; and 3) the multiple Dirichlet distributions approach, which makes use of the skill clusters.
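The skill-clustering step can be sketched with a plain k-means over the per-skill parameter vectors (prior knowledge, learning rate, guess, slip). This is a simplified stdlib sketch that seeds the centers with the first k vectors; in practice a library implementation such as scikit-learn's KMeans would typically be used:

```python
import math

def kmeans(vectors, k, iters=20):
    """Cluster 4-dimensional skill parameter vectors; returns the final
    centers and the clusters. Centers are seeded with the first k vectors
    (a simplification of random initialization with restarts)."""
    centers = list(vectors[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep empty ones.
        centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters
```

A Dirichlet prior is then fitted per cluster, as in the weighted approach above.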
The three models were compared on the basis of predictive accuracy as well as the plausibility of the model parameters. To compare the models on predictive accuracy, two measures, AUC and R², were considered. The results showed no statistically significant difference between any of the three models with respect to AUC and R². The three models were also compared with respect to parameter plausibility. No statistically significant difference was observed between the three models with respect to plausibility based on skill mastery. The other comparison was carried out on the basis of the correlation between the pre-test and the student-specific prior knowledge. With respect to this criterion, the multiple Dirichlet approach generally resulted in more plausibility, and for some specific numbers of clusters (for instance, 6 clusters) the multiple-Dirichlet model resulted in significantly higher correlation than the fixed prior model.

3.1.5.6 Contextual Guess and Slip Model of Knowledge Tracing

In standard Bayesian Knowledge Tracing, the model's four parameters (prior knowledge, learning rate, guess and slip) are estimated for each skill, and the estimated values are invariant across contexts. In other words, the estimated parameter values for a skill are the same for all students and remain static during the student's entire session of interaction with the tutor. One limitation of such models is that no attention is paid to the context in which the skill is practiced. For instance, the model never takes into account how many errors the student has recently made or how many help requests the student has issued so far. To address this limitation, Baker et al. [5] proposed the Contextual Guess and Slip model of Knowledge Tracing. The main idea behind contextually estimating the Knowledge Tracing model's parameters is that the values of the parameters are no longer context-free but context-sensitive.
In other words, the values of the parameters are re-estimated from contextual measures in real time. In the Contextual Guess and Slip model of Knowledge Tracing, the two learning parameters (prior knowledge and learning rate) are estimated for each skill and remain fixed during the entire interaction, while the two performance parameters (guess and slip) are initially estimated for each skill and then re-estimated during the interaction, varying according to the context. The steps for contextually estimating the Knowledge Tracing performance parameters are as follows:

1. Action extraction: A subset of all actions made by the students during their interaction with the tutoring system is extracted and labeled to build the dataset. There are two constraints on an action being extracted and labeled. 1) For each student, the actions to be labeled are drawn from the set of first actions on each problem step. (Note that a student might make more than one wrong action on a problem step before eventually producing a correct answer, and the student has to answer a problem step correctly before proceeding to the next one; only the first performance on each problem step is extracted from the set of all actions.) 2) Only first actions which practice skills whose Dirichlet prior estimates of guess and slip are not degenerate are extracted, to avoid training a model which is degenerate.

2. Feature extraction from interaction data: For each action a set of 23 features is extracted.
Table 3-3 shows the features:

Table 3-3: Set of features extracted from interaction log data

- Action is a help request
- Percent of past opportunities where the student has requested help on this skill
- Percent of past opportunities where the student has made errors on this skill
- Response is a string
- Time taken
- Time taken relative to the average across all students
- Time taken in the last 5 actions (calculated in SD off the average across students)
- Total number of times the student has gotten this skill wrong on the first try
- Total time taken on this skill so far (across all problems)
- Number of the last 5 actions which involved the same interface element
- Number of the last 8 actions which involved the same interface element
- Number of the last 5 actions which were wrong
- At least 3 of the last 5 actions involved the same interface element and were wrong
- Number of opportunities the student has already had to practice the current skill

3. Labeling the student actions: Each extracted action is labeled with the probability that the action was due to a guess or a slip. In labeling the actions with the probability of guessing or slipping on a skill, the Dirichlet prior estimates of the skill's parameters are used. Each student action at time N (represented by An) is labeled with the probability that it represents a guess or slip, given the evidence from the next two performed actions (An+1 and An+2).
In general we know that:

P(An is a guess | An is correct) = 1 - P(Ln)
P(An is a slip | An is incorrect) = P(Ln)

We would like to calculate:

P(Ln | An+1, An+2) = P(An+1, An+2 | Ln) * P(Ln) / P(An+1, An+2)

By marginalizing over Ln:

P(An+1, An+2) = P(An+1, An+2 | Ln) * P(Ln) + P(An+1, An+2 | ~Ln) * P(~Ln)

in which P(~Ln) = 1 - P(Ln). So, to calculate P(An+1, An+2), we need P(An+1, An+2 | Ln) and P(An+1, An+2 | ~Ln). If C denotes a correct performance and ~C an incorrect response, then (since a student who knows the skill answers incorrectly only by slipping):

P(An+1 = C, An+2 = C | Ln) = P(~S) * P(~S) = P(~S)²
P(An+1 = ~C, An+2 = C | Ln) = P(S) * P(~S)
P(An+1 = C, An+2 = ~C | Ln) = P(~S) * P(S)
P(An+1 = ~C, An+2 = ~C | Ln) = P(S)²

Similarly, we need to calculate P(An+1, An+2 | ~Ln), which is a function of the probability that the student learned the skill between An and An+1 (that is, after performing action An) or between An+1 and An+2 (after performing action An+1). Therefore:

P(An+1 = C, An+2 = C | ~Ln) = P(T)P(~S)² + P(~T)P(G)P(T)P(~S) + P(~T)²P(G)²
P(An+1 = C, An+2 = ~C | ~Ln) = P(T)P(~S)P(S) + P(~T)P(G)P(T)P(S) + P(~T)²P(G)P(~G)
P(An+1 = ~C, An+2 = C | ~Ln) = P(T)P(S)P(~S) + P(~T)P(~G)P(T)P(~S) + P(~T)²P(~G)P(G)
P(An+1 = ~C, An+2 = ~C | ~Ln) = P(T)P(S)² + P(~T)P(~G)P(T)P(S) + P(~T)²P(~G)²

4. Developing a linear regression model: Using the features and the guess and slip labels for each action, a linear regression model, a support vector machine and a multilayer perceptron were trained to predict the probability of a guess or slip for unseen student actions. The linear regression model showed slightly better performance than the other two models in a 10-fold cross-validation evaluation, so it was used for further evaluation and comparison.
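The labeling computation above can be sketched as a single function that returns P(Ln | An+1, An+2); the guess label for a correct An is then 1 minus this value, and the slip label for an incorrect An is the value itself. A sketch under the formulas above (function and argument names are illustrative):

```python
def p_know_given_next_two(p_ln, guess, slip, learn, a1, a2):
    """P(Ln | An+1, An+2): probability the student knew the skill at step n,
    given the correctness (True/False) of the next two actions."""
    s, g, t = slip, guess, learn

    def perf(known, correct):
        # P(answer | knowledge state): slips when known, guesses when not.
        if known:
            return (1 - s) if correct else s
        return g if correct else (1 - g)

    # Conditional on Ln: no forgetting, so both answers are governed by slip.
    lik_known = perf(True, a1) * perf(True, a2)
    # Conditional on ~Ln: the skill may be learned after An (prob t),
    # after An+1 (prob (1-t)*t), or not at all ((1-t)^2).
    lik_unknown = (
        t * perf(True, a1) * perf(True, a2)
        + (1 - t) * perf(False, a1) * t * perf(True, a2)
        + (1 - t) ** 2 * perf(False, a1) * perf(False, a2)
    )
    evidence = lik_known * p_ln + lik_unknown * (1 - p_ln)
    return lik_known * p_ln / evidence
```

Two correct follow-up actions raise the estimate that the student already knew the skill at step n; two wrong ones lower it.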
The accuracy of the Contextual Guess and Slip model was compared to several models, including a Baseline (standard Knowledge Tracing in which the model's parameters can take any value between 0 and 1), Bounded (standard Knowledge Tracing whose parameters can take values only from specific ranges) and the Dirichlet prior approach. For the comparison, the AUC of the ROC curve and R², the correlation between the actual and predicted performance, were used. The Contextual Guess and Slip model showed significantly higher AUC and R² than the other approaches; it therefore achieved significantly higher accuracy than the Baseline, Dirichlet prior and Bounded approaches. The results of the degeneracy evaluation of the Contextual Guess and Slip model showed some evidence of degeneracy. The proportion of failures of test 1 of empirical degeneracy (0.0004%) in the Contextual Guess and Slip model was significantly lower than in the Baseline and Dirichlet approaches, but not significantly higher than the proportion of failures of test 1 in the Bounded model. The Contextual Guess and Slip Knowledge Tracing model also failed test 2 of empirical degeneracy (M=10) in 1.7% of cases; this proportion is significantly lower than in the Baseline, Dirichlet prior and Bounded models.

3.1.6 Summary and Conclusion

This chapter provides a comprehensive overview of Bayesian Knowledge Tracing (BKT) as a common user modeling approach in tutoring systems. User modeling in BKT was discussed, and it was explained how the approach has been internally and externally validated. Next, different measures for evaluating a BKT-based student model were described, such as AUC, R² and SSE. It was also discussed how the BKT model's parameters can be evaluated through plausibility criteria.
Then different approaches to estimating parameters in BKT were briefly described, such as standard BKT, Bounded BKT, Dirichlet prior BKT and Contextual Guess and Slip BKT. This chapter also explained the two main issues with Knowledge Tracing, 1) identifiability and 2) degeneracy, and briefly discussed the solutions proposed to deal with them. In the next chapter we discuss user modeling using a Dynamic Bayesian Network in Prime Climb. When describing user modeling in Prime Climb, we follow an organization and structure similar to this chapter's, in order to provide a comparison with Bayesian Knowledge Tracing.

3.2 Review of Behavior Discovery in Educational Systems

Keshtkar et al. [35] describe an approach to distinguishing the roles of players and mentors in a multi-chat environment within the Urban Science epistemic game. To this end, each participant of the game is represented by a feature vector which is a bag of n-gram (unigram and bigram) utterances. The constructed feature vectors are then used to train several classifiers, such as Naïve Bayes, Support Vector Machines and Decision Trees. The results showed that the utterance-based features can be used to detect the different roles (players and mentors) within a serious game like Urban Science. That work only uses features from the entire interaction. In contrast, our work contains not only features extracted from the whole interaction but also features extracted from parts of the interaction, which can be used for building online classifiers. Other work, by McCuaig et al. [36], discusses using interaction behaviors to distinguish students who will fail or pass a course in a Learning Management System (LMS). To this end, the learners' activities recorded within the LMS are used to build a decision tree which can predict the final letter grade of students.
The assumption underlying their work is that higher-achieving students behave differently in mastering the course materials compared to those having difficulty mastering the materials. In this work, they used learners' interactions with the course materials (such as total number of days of activity in the LMS and average time to complete weekly problem sets) as well as the learners' self-reported confidence in the subject. This work not only uses features from entire interactions but also makes use of the learner's self-reported confidence in the subject. In our work, we do not use any feature related to self-reported measures. Similarly, the work by Lopez et al. [37] shows that a university student's final mark can be predicted from their activities in the Moodle [38] forum of the course. To this end, the correlation between involvement in the forum and the final mark of the student is calculated and used for predicting the final mark. The existence of different patterns of interaction has also been investigated in the domain of reading comprehension tasks [39]. A multidimensional K-Means clustering approach is used to determine positive and negative cognitive skill sets and strategies which are influential in reading comprehension tasks in an Intelligent Tutoring System (ITS). The K-Means clustering is used to cluster features related to text reading, scanning and scrolling actions. Identifying distinguishing behavioral patterns has also been investigated in an agent-based ITS which supports Self-Regulated Learning (SRL) about a complicated science topic [40]. To this end, an Expectation-Maximization clustering approach was used to find 3 clusters of students which are distinguished by their test scores. Then a differential sequence mining approach was employed to identify frequent interaction patterns for each group of students.
The results of this study show that the higher-performing students are more capable of identifying the relevance of materials to the goal and follow a more methodical approach in their interaction with the educational contents. A similar approach was also used by Perera et al. to cluster students who collaborated in a software development task, based on attributes extracted from their interactions with the system. A modified version of Generalized Sequential Pattern mining was used to find the frequent sequences of actions which distinguish the most successful groups [41]. A sequence mining approach has also been used to differentiate patterns of interaction among students interacting with Betty's Brain, a learning-by-teaching environment. Frequent reading patterns in the users were identified and extracted. The results showed that most of the extracted patterns are common to high-performing and low-performing students [42]. The work by Bousbia et al. discusses utilizing navigation behaviors (visited URLs, clicks, etc.) to identify learning styles of learners in an Educational Hypermedia System (EHS). Mining users' interactions has also been used to classify patterns indicating two types of gaming behaviors which may or may not have negative effects on the post-test result [44]. To this end, a model of the student's interaction with the system, consisting of the student's interaction with the tutor, response time and probabilistic information about the student's mastery level on some skills, is constructed and used. Other related work is presented by Romero et al. [45]. While most of the work on behavior mining is related to finding the more frequent patterns in interaction, this work explores the extraction of rare patterns in the interaction with the Moodle system. To this end, Rare Association Rule Mining (RARM) is applied to extract meaningful rare associative rules.
The motivation behind this work is that such rules can inform the teacher about a minority of students who might need special attention and support during their interactions with an educational system. The results show that RARM can be used to discover infrequent students' behavior in an e-learning environment like Moodle. The work by D'Mello and Graesser [46] describes posture pattern mining to detect patterns associated with the affective states of boredom, flow/engagement, confusion, frustration, and delight in AutoTutor, a dialogue-based intelligent tutoring system. To this end, 16 posture features have been extracted, and the affective state of the learner is rated by the learner, a peer and two trained judges. A binary Logistic Regression model is used to map the posture patterns to affective states in the users. Gerben et al. discussed that some performance-related features collected from Electrical Engineering (EE) students can be used to train decision trees to predict whether or not an EE student will drop out after the first or second term of study [47]. Another work, by Bravo and Ortigosa [48], discusses detecting patterns that imply low performance in an e-learning course in QuizGuide, a web-based service that provides individualized access to self-assessment quizzes for the C programming language [49]. To this end, the C4.5 algorithm in Weka is used to generate production rules, and then the most comprehensive rules indicating low performance are filtered and identified. Mathews and Mitrovic present work which focuses on finding a relationship between the number of constraints seen and learning in SQL-Tutor, a constraint-based ITS. To this end, each student's log was scanned to extract the constraints seen and experienced by the student. A measure of constraint mastery is also defined.
The results of the study showed that there is a strong correlation between the number of constraints a student sees and the number of constraints she learns [50]. Kardan and Conati [30] introduce a user modeling framework which uses class association rule mining for automatic detection of users' behavioral patterns within an exploratory learning environment, and utilizes the patterns to improve the intervention mechanism within AIspace [51]. This is done through building an online classifier which can classify the users early in the interaction and guide them to follow the path which is more likely to result in higher learning gain.

4 Chapter: User Modeling in Dynamic Bayesian Networks: Prime Climb a Case Study

In Chapter 2, the student model in Prime Climb was briefly described. As previously mentioned, Prime Climb is equipped with a probabilistic student model which infers students' factorization knowledge from the interactions and actions made by the students when playing the game. One challenge in connecting observable student performance (e.g., a movement from one number on the mountain to another number) to latent student factorization knowledge is that the actions and interactions made by the students are not unambiguous, certain evidence of knowledge. For example, if a student makes a correct action, it could be evidence of her knowledge, or it could simply be due to a lucky guess. Similarly, an incorrect action could indicate either that the student did not have the knowledge required to make a correct action, or that the incorrect action was simply due to an error caused by distraction. To deal with this uncertainty, Prime Climb uses a Dynamic Bayesian Network (DBN) to model the students' latent level of factorization knowledge. In Prime Climb, each mountain is assigned a DBN student model, so there are 11 DBNs. Moreover, in Chapter 3, a comprehensive review of Bayesian Knowledge Tracing was presented.
That review discussed evaluation methods in BKT, estimation methods for the BKT model's parameters, the issues of identifiability and degeneracy, and finally solutions for dealing with these issues. In the current chapter (Chapter 4), the measures for evaluating the student model in Prime Climb, methods for estimating the model's parameters, and the issues of identifiability and degeneracy in Prime Climb will be discussed. Throughout this chapter, and wherever possible, student modeling in Prime Climb and Bayesian Knowledge Tracing will be compared.

4.1 Evaluation Measures of the PC-DBN Student Model

As already mentioned, several measures have been used in the literature for evaluating the internal and external validity of a Knowledge Tracing model. These measures include Summed Squared Error (SSE), the Area Under the Curve (AUC) of the ROC curve, the correlation between expected and actual performance, and R2 (R-squared). In the related literature, Knowledge Tracing models have mostly been evaluated based on their predictive accuracy within the tutoring system (internal validity), meaning how well the Knowledge Tracing model is capable of predicting the actual performance of the user within the tutor. In contrast, in Prime Climb the student model is evaluated both on its accuracy in predicting the student's performance on a post-test at the end of the game (external validity of the student model in Prime Climb) and on its accuracy in discriminating known and unknown skills during the interaction (internal validity of the student model in Prime Climb). The measure used to evaluate the model's predictive accuracy after the student completes the interaction with the game is called End Accuracy. The measure which shows the model's within-game accuracy is called Real-time Accuracy. End accuracy and real-time accuracy are discussed in the next sections.
4.1.1 External Validity of Student Model: End Accuracy of PC-DBN Student Model

In the literature previously published on Prime Climb, only the AUC of the ROC curve has been applied as a measure of external validity of Prime Climb's DBN student model; the other measures (Summed Squared Error (SSE), R-squared, and the correlation between expected and actual performance) have not been used. While the external validity in BKT is calculated based on actual and predicted performance, the external validity in Prime Climb can be calculated using the end accuracy of the student model. End accuracy is defined as the model's performance in accurately assessing the student's factorization knowledge at the end of the interaction with the game. Because it is not possible to assess the student's factorization knowledge on all numbers in the game, only a subset of them is selected. To measure end accuracy, a post-test session is administered after the student finishes interacting with the game. The student's answers to the factorization questions on the numbers are used as ground truth in calculating the end accuracy. The end accuracy is calculated using the following formula:

End Accuracy = (Known Accuracy + Unknown Accuracy) / 2    Equation 4-1

where

Known Accuracy = (number of known numbers assessed as known by the model) / (number of known numbers)
Unknown Accuracy = (number of unknown numbers assessed as unknown by the model) / (number of unknown numbers)

Known Accuracy, also known as Sensitivity, is defined as the proportion of actual known numbers (the numbers whose factors are known to the student based on her performance on the post-test) which are correctly identified as such (based on the model's assessment). Similarly, Unknown Accuracy, also known as Specificity, is defined as the proportion of actual unknown numbers (the numbers whose factors are not known to the student based on her answers to the corresponding questions on the post-test) which are correctly identified as such.
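End accuracy can be sketched as follows, assuming (per the definitions above) that it is the average of known accuracy (sensitivity) and unknown accuracy (specificity), and assuming two illustrative dictionaries mapping each post-tested number to a boolean "knows its factors" flag, one from the post-test ground truth and one from the model's final binary assessment. All names and data are illustrative, not from the thesis.

```python
# Minimal sketch of Equation 4-1: end accuracy as the average of
# sensitivity (known accuracy) and specificity (unknown accuracy).
# post_test: number -> True if the student knows its factors (ground truth)
# assessed:  number -> True if the model assesses the factors as known

def end_accuracy(post_test, assessed):
    known = [n for n, k in post_test.items() if k]
    unknown = [n for n, k in post_test.items() if not k]
    known_acc = sum(assessed[n] for n in known) / len(known)          # sensitivity
    unknown_acc = sum(not assessed[n] for n in unknown) / len(unknown)  # specificity
    return (known_acc + unknown_acc) / 2

post_test = {6: True, 9: True, 25: False, 49: False}   # illustrative ground truth
assessed = {6: True, 9: False, 25: False, 49: False}   # illustrative model output
print(end_accuracy(post_test, assessed))
```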
4.1.2 Internal Validity of Student Model: Real-Time Accuracy of PC-DBN Student Model

Compared to the model's accuracy at the end of the interaction, the model's real-time accuracy is a more informative measure for understanding the model's impact on learning, as it is influenced by how quickly the model stabilizes its assessment of the student's factorization knowledge. The model's real-time accuracy also has an immediate impact on the effectiveness of the pedagogical agent in providing individualized hints, because the hinting strategy in Prime Climb (described later) is directly influenced by the model's assessment of the student's factorization skills during the game. In order to compute the model's real-time accuracy, we need to know at any given time whether a skill is known to the student or not. Therefore, we could not determine the real-time accuracy for all the target factorization skills, because we lack a ground-truth assessment of how the related knowledge evolves during game playing. To deal with this limitation, we restricted the analysis to the skills on which the student's performance did not change from the pre-test to the post-test, i.e., the related knowledge was constant throughout the interaction. Based on this assumption, a skill which is known to the student during the interaction should always be assessed as known by the model and, vice versa, a skill supposed to be unknown to the student during game play needs to be assessed as unknown by the model. As a criterion for measuring the real-time accuracy, we used the average of two other measures, named Positive-FMeasure and Negative-FMeasure, as defined in the following equation:

Real-time Accuracy = (Positive_FMeasure + Negative_FMeasure) / 2    Equation 4-2

where

Positive_FMeasure = (2 × Positive Precision × Sensitivity) / (Positive Precision + Sensitivity)
Negative_FMeasure = (2 × Negative Precision × Specificity) / (Negative Precision + Specificity)
Positive Precision = #TrueKnown / (#TrueKnown + #FalseKnown)
Negative Precision = #TrueUnknown / (#TrueUnknown + #FalseUnknown)
Sensitivity = #TrueKnown / (#TrueKnown + #FalseUnknown)
Specificity = #TrueUnknown / (#TrueUnknown + #FalseKnown)

In the above formulas, positive precision is the fraction of assessed-as-known skills (the student has the skills) which are actually known, and negative precision is the fraction of assessed-as-unknown skills (the student does not have the skills) which are actually unknown. On the other hand, sensitivity gives the fraction of known skills which are correctly assessed as such, and specificity is the fraction of unknown skills which are correctly assessed as such. Then we calculate two F-Measures, one related to known skills (Positive-FMeasure) and the other related to unknown skills (Negative-FMeasure). Finally, the average of the positive and negative F-Measures is used as the real-time accuracy of the Prime Climb student model.

4.1.3 Other Measures to Evaluate External Validity of Prime Climb's Model

End accuracy and AUC have been applied to quantify the external validity of the student model in Prime Climb. As previously mentioned, the internal and external validity of a Bayesian Knowledge Tracing model is evaluated based on measures such as Summed Squared Error (SSE), AUC of the ROC curve, the correlation between expected and actual performance, and R2. In this section, we discuss whether and how the other measures used in BKT are applicable to the student model in Prime Climb. In Knowledge Tracing, SSE is calculated based on expected and actual performance. The current implementation of the student model in Prime Climb does not allow easy calculation of expected performance, so it is not straightforward to apply the same SSE formula of Knowledge Tracing to evaluate the student model in Prime Climb. For the same reason, it is not possible to simply use the correlation between expected and actual performance as is done in BKT.
Instead, we can assume that correct performance on a post-test question shows the student's knowledge of the corresponding skill, and define some other measures for evaluating the student model in Prime Climb. Remember that, in BKT, the actual performance refers to the correctness of the student's action on an opportunity to practice a skill, and the predicted performance refers to the probability that the student will make a correct action on an opportunity to practice a skill. In Prime Climb's student model, instead, the actual knowledge refers to whether or not the student has knowledge of a skill, and the predicted knowledge refers to the probability that the student has mastered the skill.

Correlation between Post-Test Accuracy and Model's Assessment

This measure is based on the correlation between the student's post-test performance and the student model's final assessment of the student's level of mastery. As previously mentioned, the student takes a post-test at the end of the interaction with Prime Climb. Also, at the end of the game, the student model provides its own assessment of the skills' level of mastery, using the final posterior probabilities of the skills' corresponding nodes in the network. A model which can better predict the actual knowledge of a student on the post-test will also show a stronger correlation between post-test performance and the model's final assessment of the factorization skills.

Summed Squared Error between Post-Test Performance and Model's Assessment

This measure is based on the difference between the actual knowledge of the student on a post-test after the game and the final model assessment of the student's knowledge. The actual knowledge of a student on a skill is 1 if the student provides a correct answer to the skill's corresponding question in the post-test, and 0 otherwise:

SSE = Σ_i (Post_Test(skill_i) − Model_Assessment(skill_i))²
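The two post-test-based measures just described can be sketched as follows, using a plain Pearson correlation and a direct sum of squared differences. The parallel lists of 0/1 post-test outcomes and final posteriors are illustrative, as are all names.

```python
# Sketch of the two external-validity measures: the correlation between
# post-test correctness (0/1) and the model's final posterior, and the
# summed squared error (SSE) between them. Data is illustrative.

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def sse(post_test, assessment):
    """Summed squared error between 0/1 post-test outcomes and posteriors."""
    return sum((p - a) ** 2 for p, a in zip(post_test, assessment))

post_test = [1, 1, 0, 1, 0]             # correct (1) / incorrect (0) per skill question
assessment = [0.9, 0.8, 0.3, 0.6, 0.4]  # final posterior P(skill known) per skill
print(correlation(post_test, assessment), sse(post_test, assessment))
```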
4.2 Evaluation of the Student Model's Parameters in Prime Climb

In the previous section, the methods for evaluating the student model were discussed. As already mentioned, the parameters of the Knowledge Tracing model can also be directly evaluated using plausibility criteria. Similar to Knowledge Tracing, as described in Chapter 2, the Prime Climb DBN student model has model parameters including slip, guess, edu-guess and max, and these parameters can also be directly evaluated using plausibility criteria, as discussed next.

4.2.1 Applicability of BKT Model's Parameters Plausibility for Prime Climb

Several plausibility criteria have been defined for the Knowledge Tracing student model. In this section we discuss whether these criteria are applicable to the Prime Climb student model. It is worth mentioning that none of the following criteria has been used in previous publications on Prime Climb to evaluate the plausibility of the parameters of the Prime Climb DBN model.

- Plausibility based on Skills Mastery: As already discussed, this criterion of plausibility works with the number of practice opportunities required to master a skill. There is a limitation in applying this criterion to the Prime Climb DBN student model: in Knowledge Tracing the assumption is that the skills are practiced independently, but in the Prime Climb DBN model, at each opportunity for making an action, the student practices more than one skill at the same time (2 factorization skills and one common factor skill). Also, as already discussed, the current implementation of the Prime Climb DBN student model uses a causal structure for the Click node in the model. Thus, the skills' corresponding random variables are not independent of each other given evidence (the student's action). Therefore, this plausibility criterion cannot be applied in the current implementation of the student model in Prime Climb.

- Plausibility based on Number of Extreme Parameters'
Values: As opposed to Knowledge Tracing, which uses skill-specific values for the model's parameters, the current implementation of the Prime Climb DBN model uses a single set of model parameters across all skills and students. The plausibility based on the number of extreme parameter values requires skill-specific model parameters. Consequently, such a criterion is not applicable to the current version of the Prime Climb DBN student model, which uses a unique set of model parameters across all skills.

- Plausibility based on Correlation between Prior Knowledge and Skill Difficulty across Skills: In Knowledge Tracing, the prior knowledge is a learned parameter of the Knowledge Tracing model; it is not defined a priori and must be estimated. In contrast, in Prime Climb the prior knowledge is already defined based on the type of prior probabilities. Remember that there are three types of prior parameters in Prime Climb: 1) Population, 2) Generic, 3) User-specific. So in Prime Climb, this parameter is not estimated based on students' performance within the game. Therefore, this plausibility criterion is not applicable to the Prime Climb DBN student model.

- Plausibility based on Correlation between Student-specific Prior Knowledge and a Pre-test Score: As mentioned before, there are three types of prior knowledge parameters in Prime Climb: 1) Population, 2) Generic, 3) User-specific. In Prime Climb, a student takes a pre-test before starting the game. Since the prior knowledge is not estimated in the Prime Climb DBN during the interaction, this criterion is not applicable for evaluating the model's parameter values in Prime Climb.

4.2.2 Model's Parameters Plausibility for Prime Climb

As discussed in the previous section, none of the parameter plausibility criteria in Bayesian Knowledge Tracing can be used in Prime Climb.
In this section, we introduce several plausibility criteria which can be used in Prime Climb.

Plausibility Based on Correlation between Within-Game Performance and Learning

It is generally assumed that a correct performance is more likely to indicate a student's knowledge than a lack of knowledge. Similarly, a wrong performance is more likely due to a lack of knowledge than to having the knowledge. Given this assumption, it is expected that when a student makes significantly more correct than incorrect actions on opportunities to practice a skill, the final posterior probability of the student knowing the skill is relatively higher than the prior probability of the skill. To investigate this assumption, we can use the correlation between two variables: 1) the difference between the number of correct and incorrect actions on a skill, and 2) the difference between the final probability of knowing the skill and the prior knowledge of the skill. The final probability of knowing a skill is given by the final posterior probability of the student knowing the skill (assessed by the student model).

Plausibility Based on Efficiency of the Hinting Mechanism in Prime Climb

The pedagogical agent in Prime Climb provides personalized interventions in the form of adaptive hints during the student's interaction with the game. One way to evaluate the efficiency of an intelligent intervention mechanism is by quantifying the capability of the hinting mechanism to provide hints when needed. While educational games intend to leverage the engagement aspect of the game to enhance students' motivation to learn the target material, it is also important to ensure that an intervention mechanism does not diminish engagement. We leverage this assumption and define a measure of plausibility based on the efficiency of the hinting mechanism in Prime Climb. According to this criterion, a set of model parameters which produces a reasonable number of hints during game play is more plausible.
For instance, if a student makes 200 movements in total during game play, it is not plausible for her to receive over 100 hints (one hint for every two movements on average). The main limitation of such a measure is that it cannot be defined objectively, because there is no ground truth on the optimal number of given hints. To cope with this issue, we will define a measure to compute the accuracy of the hinting mechanism in Prime Climb. A set of model parameters which produces higher accuracy is more plausible. Note that, as described in Chapter 2, the hinting strategy in Prime Climb is based on the student model's assessment of the student's factorization knowledge, and the student model's assessment is affected by the values of the model's parameters. The hinting mechanism and its evaluation criteria will be discussed in the following section.

4.3 Evaluation of the Hinting Strategy in Prime Climb

From a pedagogical perspective, it is essential to provide the student with "correct" support when she needs it. A "correct" support is given on a correct skill (a skill which is unknown or not completely known to the student), when required, and presented in a helpful context, in a way that encourages the student to attend to the support. As the intervention mechanism in Prime Climb uses the real-time assessment of the student's knowledge to determine when and on what to provide help, the effectiveness of the mechanism is influenced by how accurately the student model tracks and assesses the evolution of the desired skills. To investigate how well the hinting strategy and the student model provide tailored support to the student during the interaction, the accuracy of the hinting mechanism is defined based on the F-Measure of two measures: 1) Hint Precision and 2) Hint Recall. Generally, precision is defined as the fraction of retrieved instances which are relevant, while recall is the fraction of relevant instances that are retrieved.
Similarly, hint precision is defined as the fraction of given hints which are justified, and hint recall is defined as the fraction of justified hints which are retrieved and given to the student. An intervention (hint) provided to the user is called justified if it is given at the correct time and on the right skill. On the contrary, an unjustified intervention is presented to the student when it is not required or expected by the student. Similarly, if the intervention strategy fails to provide a justified intervention, it is said that a justified intervention has been missed. Finally, when no intervention is given and none is required, the intervention mechanism has "correctly not given" the hint. Hint precision and hint recall are defined using the following equations; Section 4.3.1 describes how these measures are estimated.

Hint Precision = (number of justified hints) / (number of justified hints + number of unjustified hints)
Hint Recall = (number of justified hints) / (number of justified hints + number of missed hints)
F_Measure = (2 × Hint Precision × Hint Recall) / (Hint Precision + Hint Recall)

It is worth mentioning that, although the real-time accuracy (the measure of internal validity of the student model) in Prime Climb has a direct impact on the hinting strategy, in the current version of the hint strategy the thresholds used to discriminate known and unknown skills are fixed, as defined in Table 2-3 in Chapter 2, while for real-time accuracy the threshold to discriminate known and unknown skills is calculated using cross-validation, as previously discussed. The next section describes how the accuracy of the hinting strategy is calculated.
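The hint-accuracy F-measure can be sketched from raw counts of the hint outcomes. The counts below are made up for illustration, and the function name is ours, not the thesis's.

```python
# Sketch of hint precision, hint recall and their F-measure, computed
# from counts of justified, unjustified and missed hints. Counts are
# illustrative (correctly-not-given hints do not enter these formulas).

def hint_accuracy(justified, unjustified, missed):
    precision = justified / (justified + unjustified)  # hint precision
    recall = justified / (justified + missed)          # hint recall
    return 2 * precision * recall / (precision + recall)

# e.g. 30 justified, 10 unjustified and 20 missed hints in a simulation
print(hint_accuracy(30, 10, 20))
```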
4.3.1 Simulation of the Intervention Mechanism Using the Original Threshold Setting

In order to calculate the hint precision and hint recall in Prime Climb, the data from the interactions of 45 students in grades 5 and 6 (as described in Chapter 2) with Prime Climb was used to simulate the hinting strategy using the original parameter settings (see Table 2-3 in Chapter 2). To this end, we initialized the student model with each of the settings of prior probabilities (population, generic, user-specific). Since there is no ground truth on how the student's number factorization and common factor knowledge evolve during the student's interaction with Prime Climb, in the process of calculating the hint precision and hint recall we only considered the movements in which the player's number, the partner's number, or both kept the same score from the pre-test to the post-test. (Remember that the students took a pre-test before playing the game and a post-test after playing the game.) In each movement made by the student, there are two numbers involved: 1) the player's number and 2) the partner's number. The player's number is the number to which the player has just moved, while the partner had moved to the partner's number on the mountain. All the numbers the students ever moved to during game play were assigned a label based on the performance of the student on that specific number in the pre- and post-tests. We used 5 labels to represent the status of the numbers from the pre-test to the post-test, as follows:

- KK: Stands for Known-Known and shows that the number has been known to the student in both the pre-test and post-test (the student answered the number's corresponding question correctly on both tests).
- UU: Stands for Unknown-Unknown and shows that the number has been unknown to the student in both the pre-test and post-test.
- NAP: The number does not appear on the tests.
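The labeling step can be sketched as follows, assuming the pre- and post-test results are illustrative dictionaries mapping each tested number to a boolean. The "CHANGED" label is a placeholder of ours for numbers whose status changed between the tests (which the analysis above excludes); it is not one of the thesis's labels.

```python
# Sketch of assigning pre/post-test status labels (KK, UU, NAP as defined
# above) to the numbers a student moved to. "CHANGED" is our placeholder
# for numbers whose status changed between tests; such movements are
# excluded from the hint precision/recall calculation.

def label_number(n, pre_test, post_test):
    if n not in pre_test or n not in post_test:
        return "NAP"                      # number does not appear on the tests
    if pre_test[n] and post_test[n]:
        return "KK"                       # known on both tests
    if not pre_test[n] and not post_test[n]:
        return "UU"                       # unknown on both tests
    return "CHANGED"                      # status changed: excluded (hypothetical label)

pre = {6: True, 25: False, 49: False}     # illustrative pre-test results
post = {6: True, 25: False, 49: True}     # illustrative post-test results
print([label_number(n, pre, post) for n in (6, 25, 49, 81)])
```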
Given the above terminology, the types of hints are defined based on the status of the numbers on which the hints are given, as follows:

- Justified hint: A hint which is given on a number with status UU.
- Unjustified hint: A hint which is given on a number with status KK.
- Missed hint: When the hinting mechanism fails to provide a hint on a number with status UU.
- CorrectlyNotGiven hint: When the hinting mechanism correctly decides not to provide a hint on a number with status KK.

In the calculation of hint precision and hint recall, it has been assumed that a student should receive a hint following a movement which involves at least one number with status UU, and should never receive a hint on a number with status KK. For each set of prior probabilities, the total number of given hints was calculated and the confusion matrix was constructed. Table 4-1 shows the structure of the confusion matrix used for the intervention mechanism in Prime Climb. For instance, in this confusion matrix, an unjustified hint is a hint given on a number which is known to the student according to the student's pre-test and post-test scores, but unknown on the basis of the student model's assessment.

Table 4-1 Confusion matrix for hinting strategy

                             Model assessment of student knowledge
                             Unknown                  Known
Pre-Post Test   Known        Unjustified hint (UJ)    Correctly Not Given (CN)
                Unknown      Justified hint (J)       Missed hint (M)

4.4 Issues with Prime Climb DBN Student Model

In Chapter 3, we discussed two issues with Bayesian Knowledge Tracing, called Identifiability and Degeneracy. The objective of the current section is to investigate the possibility of similar issues in the Prime Climb DBN student model. First, the issue of Identifiability in the Prime Climb DBN model is discussed, followed by the Degeneracy issue in the PC-DBN student model.
4.4.1 Identifiability in Prime Climb DBN Student Model

In Knowledge Tracing, the problem of Identifiability is defined as the issue of the existence of multiple local maxima. The existence of multiple local maxima means that there are multiple sets of model parameters which fit the performance data equally well and maximize the predictive accuracy of the Knowledge Tracing model. Note that the predictive accuracy in Bayesian Knowledge Tracing is defined as the power of the BKT model to accurately predict the within-tutor performance of the student (predicting whether the student will give a correct or wrong answer at each opportunity to practice a skill). In the Prime Climb DBN student model, the model's parameters are estimated such that the end accuracy of the model is maximized. A similar issue of Identifiability has also been observed in the Prime Climb DBN model. Formally, Identifiability in Prime Climb is defined as follows:

Identifiability in the Prime Climb DBN student model: There might exist multiple sets of model parameters (slip, guess, edu-guess and max) which maximize the end accuracy equally well. In other words, there could be multiple sets of model parameters which predict the student's performance on a post-test equally well.

Table 4-2 shows an instance of the Identifiability problem in the student model in Prime Climb. As can be seen in the table, all sets of parameters resulted in the same end accuracy. In an intelligent educational system, a student model should not only be capable of accurately evaluating the latent states of the student (e.g., level of knowledge on skills) and predicting the student's performance within and outside the system, but should also make consistent claims about the student. The different sets of parameters shown in Table 4-2 make different claims about how learning has evolved in the student and how the student has approached the learning opportunities.
Table 4-2: Identifiability in Prime Climb DBN Student Model

  Model   Prior Knowledge   Slip   Guess   Edu-Guess   Max   End-Accuracy
  M1      Population        0.6    0.4     0.4         0.2   0.96
  M2      Population        0.4    0.6     0.6         0.4   0.96
  M3      Population        0.2    0.7     0.7         0.2   0.96
  M4      Population        0.6    0.3     0.4         0.8   0.96

Model M3 shows a probability of 0.2 for the slip parameter, while models M1 and M4 show a probability of 0.6. This means that models M1 and M4 do not penalize students for wrong actions as much as model M3 does, because M1 and M4 assign a much higher probability to a student making an error of distraction. Similarly, models M2 and M3 do not give as much credit to correct actions as models M1 and M4, since their higher guess probabilities make a correct action less diagnostic of knowledge.

4.4.2 Degeneracy in Prime Climb DBN Student Model

Model Degeneracy in Bayesian Knowledge Tracing is defined as a violation of the conceptual assumptions behind the BKT model. Similarly, in Prime Climb we define some conceptual assumptions, and any violation of these assumptions is referred to as Degeneracy in the Prime Climb DBN model. The conceptual assumptions in the Prime Climb DBN model are as follows:

1) A correct action (evidence) on a skill should increase (and must not decrease) the probability of the student knowing the skill.

2) An incorrect action on a skill should decrease (and must not increase) the probability of the student knowing the skill.

Similar to Degeneracy in Bayesian Knowledge Tracing, any case or pattern in the student model violating the aforementioned assumptions is marked as model degeneration. As already described, two tests of empirical degeneration have been suggested for BKT:

1) Test 1 of empirical degeneration in BKT: if a student's first N actions in the tutor are correct, the model's estimated probability that the student knows the corresponding skill should be higher than before these N actions. N is arbitrarily defined.
2) Test 2 of empirical degeneration in BKT: if a student makes M correct actions in a row, the model should assess that the student has mastered the skill. M is arbitrarily defined.

While Test 1 of empirical degeneration in BKT is also applicable for checking degeneracy in the Prime Climb DBN student model, a more extended test is more appropriate; we call it Test 1 of empirical degeneration in PC-DBN (see Definition 4-1). Test 2 of empirical degeneration in BKT is not applicable to the Prime Climb DBN model: one limitation of Prime Climb is that no remedial exercises are given to the students during game play, and, as previously mentioned, the skills in the current implementation of Prime Climb are not necessarily independent from each other; at each opportunity to practice a skill, at least three skills are practiced at the same time.

(Definition 4-1) Test 1 of empirical degeneration in PC-DBN: if a student makes a correct/incorrect action on an opportunity to practice a skill, the probability of the student knowing the skill should not be less/greater than the probability of knowing the skill before making the action.

As opposed to Test 1 of empirical degeneration in BKT, Test 1 of empirical degeneration in PC-DBN takes into account both correct and incorrect movements. The other possible source of degeneracy in the Prime Climb DBN model is the dependency parameter Max. Unlike standard Bayesian Knowledge Tracing, in Prime Climb not all skills are independent from each other: if the dependency parameter Max is not zero, knowledge on a skill is transferred to some other relevant skills.
Figure 4-1: Knowledge transfer from one skill to another skill

For instance, as shown in Figure 4-1, given that Max is not zero, knowledge on the factorization skill of number 42 is transferred to knowledge on the factorization skills of numbers 6 and 7. Test 2 of empirical degeneration in PC-DBN is related to this assumption in Prime Climb.

(Definition 4-2) Test 2 of empirical degeneration in PC-DBN: suppose there is a dependency relationship between two skills S1 and S2 such that knowledge on S1 implies knowledge on S2. If a student performs correctly/incorrectly on an opportunity to practice skill S1, the probability that the student knows skill S2 should not be less/greater than its value before the action.

If a Prime Climb DBN model fails either of these two tests, the model is said to be degenerate. As previously described, two types of skills are practiced in Prime Climb: 1) number factorization skills and 2) the common factor skill. In Prime Climb, each time the student has an opportunity to make an action (climbing the mountains of numbers), both skill types are required simultaneously. Therefore, degeneracy might occur pertaining to both skill types. Specifically, several patterns of degeneracy can be identified in the Prime Climb DBN student model, and each pattern relates to a failure in one of the two tests of empirical degeneration. For ease of presentation, the following notation is used:

1) Patterns causing a failure of Test 1 of empirical degeneracy in the PC-DBN model:

- FACT_NOT_DECREASE: as a consequence of an incorrect action on a factorization skill, the probability of the student knowing the skill increases while it should have decreased or stayed the same.
- FACT_NOT_INCREASE: as a consequence of a correct action on a factorization skill, the probability of the student knowing the skill decreases while it should have increased or stayed the same.
- CF_NOT_DECREASE: as a consequence of an incorrect action on the common factor skill, the probability of the student knowing the skill increases while it should have decreased or stayed the same.
- CF_NOT_INCREASE: as a consequence of a correct action on the common factor skill, the probability of the student knowing the skill decreases while it should have increased or stayed the same.

2) Patterns causing a failure of Test 2 of empirical degeneracy in the PC-DBN model:

- INCREASE_DECREASE_MAX: suppose knowledge on a factorization skill S1 implies knowledge on another factorization skill S2. This pattern of degeneracy occurs when, as a consequence of an incorrect action on skill S1, the probability of the student knowing skill S2 increases while it should not have.
- DECREASE_INCREASE_MAX: suppose knowledge on a factorization skill S1 implies knowledge on another factorization skill S2. This pattern of degeneracy occurs when, as a consequence of a correct action on skill S1, the probability of the student knowing skill S2 decreases while it should not have.

To further investigate the sources causing degeneracy, we take a deeper look at the conditional probability table (CPT) of the Click node in the Prime Climb DBN model, shown in Table 4-3.

Table 4-3: CPT of the Click node

  Common Factor (CF)    Known                                          Unknown
  FY                    Known                 Unknown                  Known                Unknown
  FX                    Known    Unknown      Known       Unknown      Known     Unknown    Known     Unknown
  P(Click=Correct)      1-Slip   Eduguess     Eduguess    Guess        Guess     Guess      Guess     Guess
  1-P(Click=Correct)    Slip     1-Eduguess   1-Eduguess  1-Guess      1-Guess   1-Guess    1-Guess   1-Guess

Based on the CPT of the Click node, the following conditions could be sources of degeneracy in the PC-DBN model.
Note that K and U stand for Known and Unknown, and C and W stand for Correct and Wrong, respectively.

Figure 4-2: Click node structure

- Eduguess < Guess. Consider the following cases:
  - From the CPT of the Click node: P(Click=Correct | CF=K, FX=K, FY=U) = Eduguess and P(Click=Correct | CF=U, FX=K, FY=U) = Guess. In this situation, a correct performance on skills CF, FX and FY could cause P(CF=K) to decrease; similarly, a wrong action could cause P(CF=K) to increase. Given the CPT of the Click node, when FX=K, FY=U and Guess > Eduguess, a correct action provides evidence that the common factor skill (CF) is more likely unknown than known to the student, so the model might decrease the probability of the student knowing the common factor skill. This situation corresponds to the CF columns with FX=K and FY=U in the CPT of the Click node.
  - P(Click=Correct | CF=K, FX=U, FY=K) = Eduguess and P(Click=Correct | CF=U, FX=U, FY=K) = Guess. In this situation, a correct action on a movement practicing the common factor and factorization skills (of numbers X and Y) could cause P(CF=K) to decrease; similarly, a wrong action could cause P(CF=K) to increase.
- 1-Slip < Guess:
  - P(Click=Correct | CF=K, FX=K, FY=K) = 1-Slip and P(Click=Correct | CF=U, FX=K, FY=K) = Guess. A correct movement involving the factorization skills of numbers X and Y and the common factor skill could cause P(CF=K) to decrease; similarly, a wrong action could cause P(CF=K) to increase.
- 1-Slip < Eduguess:
  - P(Click=Correct | CF=K, FX=K, FY=K) = 1-Slip and P(Click=Correct | CF=K, FX=K, FY=U) = Eduguess. A correct action could cause P(FY=Known) to decrease, and a wrong action could cause P(FY=Known) to increase.
  - P(Click=Correct | CF=K, FX=K, FY=K) = 1-Slip and P(Click=Correct | CF=K, FX=U, FY=K) = Eduguess. A correct action could cause P(FX=Known) to decrease, and a wrong action could cause P(FX=Known) to increase.
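The three conditions above carve out subspaces of the parameter space; a small helper (an illustrative sketch, with an assumed function name) can flag which subspaces a candidate parameter setting falls into before it is ever used for fitting:

```python
# Flag which of the three degeneracy subspaces (derived from the Click-node
# CPT) a given parameter setting falls into. Returns an empty list for a
# setting outside all three subspaces.

def degenerate_subspaces(slip, guess, eduguess):
    flags = []
    if eduguess < guess:
        flags.append("Eduguess < Guess")
    if 1 - slip < guess:
        flags.append("1-Slip < Guess")
    if 1 - slip < eduguess:
        flags.append("1-Slip < Eduguess")
    return flags

# Example: Slip=0.4, Guess=0.5, Eduguess=0.3 hits the first subspace only.
print(degenerate_subspaces(0.4, 0.5, 0.3))  # ['Eduguess < Guess']
```

An empty result does not guarantee the model passes the empirical tests on real traces; it only rules out these three analytically identified sources.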
4.5 Prime Climb DBN Model's Parameter Fitting

As previously described, the Prime Climb DBN model is a parametric model with two types of parameters: 1) prior probability parameters and 2) dependency parameters. The prior probability parameters are calculated on the basis of the students' performance on a pre-test exam (for the Population and Userspecific types); for the Generic type, the value is 0.5 for every random variable in the PC-DBN model. Several approaches have been used in Prime Climb to estimate the dependency parameters; this section describes them.

4.5.1 Original Prime Climb DBN Model

The Original Prime Climb DBN model allows any value between 0 and 1 for the DBN model's parameters (Slip, Guess, Edu-Guess, Max). As already mentioned, the values for the parameters are selected such that the model's end accuracy is maximized. To find such a set of values, an exhaustive search is applied: the procedure examines values between 0 and 1, starting at 0 and ending at 1 in steps of 0.1, and eventually selects the parameter combination which maximizes the DBN model's end accuracy. To this end, a leave-one-out cross validation approach was applied across students. The end accuracy for each prior parameter setting was calculated and the optimal sets of parameters were computed. In Table 4-4, the parameter "threshold" is the cut-off point which discriminates between known and unknown factorization skills at the end of the interaction with Prime Climb. This cut-off point is determined using a ROC curve or an exhaustive search over a comprehensive range of thresholds between 0 and 1.
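The exhaustive search just described can be sketched as follows. Here `end_accuracy` stands in for the leave-one-out evaluation of one parameter combination; it is a hypothetical callable used only to illustrate the search loop, not the thesis code.

```python
import itertools

# Grid search over Slip, Guess, Edu-Guess and Max in steps of 0.1,
# keeping the combination that yields the highest end accuracy.
def grid_search(end_accuracy, step=0.1):
    grid = [round(i * step, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0
    best_params, best_acc = None, -1.0
    for slip, guess, eduguess, max_ in itertools.product(grid, repeat=4):
        acc = end_accuracy(slip, guess, eduguess, max_)
        if acc > best_acc:
            best_params, best_acc = (slip, guess, eduguess, max_), acc
    return best_params, best_acc

# Toy evaluation function peaking at slip=0.4, guess=0.5:
params, acc = grid_search(lambda s, g, e, m: 1 - abs(s - 0.4) - abs(g - 0.5))
print(params[:2])  # (0.4, 0.5)
```

With a 0.1 step the grid has 11^4 = 14,641 combinations, small enough for exhaustive evaluation even when each evaluation runs a full leave-one-out pass.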
Table 4-4: Set of optimal model parameters for the original Prime Climb DBN model

  Prior Probabilities   Guess   Edu-Guess   Max   Slip   Threshold
  Population            0.5     0.3         0.2   0.4    0.68
  Generic               0.2     0.6         0.8   0.6    0.58
  User-specific         0.6     0.1         0.6   0.6    0.9

PC-DBN Model's End Accuracy

Using the above estimated set of parameters, the average end accuracy across students of the PC-DBN model for each prior probability setting is given in Table 4-5.

Table 4-5: Original Prime Climb DBN model's end accuracy

  Prior Probabilities   End Accuracy        Known Accuracy      Unknown Accuracy
  Population            M=0.77, SD=0.14     M=0.88, SD=0.07     M=0.64, SD=0.29
  Generic               M=0.70, SD=0.15     M=0.69, SD=0.11     M=0.72, SD=0.29
  User-specific         M=0.72, SD=0.2      M=0.76, SD=0.22     M=0.64, SD=0.35

Note that in Table 4-5, the end accuracy is the average end accuracy across all student models (one model for each student). Also, it is possible that either the known or the unknown accuracy is undefined for a student model. If the known accuracy is undefined, the end accuracy equals the unknown accuracy of the model; similarly, if the unknown accuracy is undefined, the end accuracy equals the known accuracy.

PC-DBN Model's Real-Time Accuracy

We also calculated the PC-DBN model's real-time accuracy for the Original Prime Climb DBN model, as previously described. Table 4-6 presents the real-time accuracy results for each prior probability setting. Note that when calculating the model's real-time accuracy, the model's dependency parameters were set to their values in Table 4-4.

Table 4-6: Real-time accuracy in the Original Prime Climb model

  Prior Probabilities   Real-Time Accuracy
  Population            0.75
  Generic               0.62
  User-specific         1

The real-time accuracy results show a perfect accuracy (=1) for the model with userspecific prior probability, while the end accuracy of that model is 0.72.
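The end-accuracy aggregation used above, including the fallback when one of the two component accuracies is undefined, can be sketched in a few lines (function and argument names are assumptions for this illustration):

```python
# End accuracy as the unweighted mean of known and unknown accuracy;
# if one side is undefined (None), the end accuracy falls back to the other.
def end_accuracy(known_acc, unknown_acc):
    if known_acc is None:
        return unknown_acc
    if unknown_acc is None:
        return known_acc
    return (known_acc + unknown_acc) / 2

print(end_accuracy(1.0, 0.0))   # 0.5
print(end_accuracy(None, 0.72)) # 0.72
```

The unweighted mean treats known and unknown skills as equally important regardless of how many of each a student has, a property discussed further below.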
As already described, the real-time accuracy is limited to those skills on which the students showed the same performance on the pre-test and post-test. In other words, if, based on the pre-test and post-test results, a student knows a specific skill, the model is always capable of evaluating the skill as known to the student, and vice versa. The reason is that if a student knows a skill (based on the pre-test result), the skill's corresponding node in the network (student model) is assigned a high prior probability of 0.9, and if the student does not know the skill, a low prior probability of 0.1 is assigned. Consequently, the model can clearly distinguish between these two types of skills during the interaction.

In this section, the end accuracy (a measure of external validity) and real-time accuracy (a measure of internal validity) of the PC-DBN model were reported. When calculating the end accuracy of the student model, it was assumed that the known and unknown accuracy carry the same weight, and therefore the end accuracy was defined as the average of the known and unknown accuracy. This means that if the known accuracy of a model is 1 and the unknown accuracy is 0, the end accuracy is 0.5 (=[1+0]/2). While such an end accuracy measure is important and useful, in cases where the number of skills known (unknown) to students is significantly higher than the number of unknown (known) skills, the end accuracy may not be a very informative or intuitive measure of the model's accuracy. For instance, imagine there are 10 skills known to the student and only 1 skill unknown. If the model successfully identifies all the known skills, the known accuracy of the model equals 1; but if the model fails to identify the only unknown skill, the unknown accuracy of the model is zero.
In such a model, the end accuracy would be 0.5, even though the model correctly identified the status of 10 out of 11 skills. To address this limitation, two other measures of external validity have been defined: 1) the correlation between post-test performance and the model's assessment, and 2) the summed squared error between post-test performance and the model's assessment. The next two sections report on these measures.

Correlation between Post-Test Performance and Model's Assessment

The correlation between the students' post-test performance on factorization and common factor skills and the student model's final assessment of how well the students know these skills (based on the final posterior probability of the skills' corresponding nodes in the PC-DBN) was also calculated. For the correlation between students' performance on factorization questions and the model's assessment of factorization skills, we calculated, for each student, the correlation between the student's responses to the 15 factorization questions on the post-test (score=1 for a correct answer and score=0 otherwise) and the model's final assessment of the corresponding skills, and then averaged across students. Table 4-7 presents the results.

Table 4-7: Post-test/model-assessment correlation in Original Prime Climb

  Prior Probabilities   Correlation
  Population            M=0.44, SD=0.24
  Generic               M=0.28, SD=0.21
  User-specific         M=0.31, SD=0.3

Summed Squared Error of Post-Test Performance and Model's Assessment

The Sum of Squared Errors (SSE) of the Original Prime Climb model was also calculated, based on the students' performance on the 15 factorization skills and the model's assessment of the students' knowledge of the corresponding skills. For each student, the average SSE over all factorization skills is calculated, and the model's average SSE is then computed across students. Table 4-8 presents the results for the Original Prime Climb student model for each prior probability setting.
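Both external-validity measures can be illustrated for a single student as follows; the data values and function names are assumptions for this sketch, not figures from the study:

```python
from statistics import mean

# Pearson correlation between binary post-test scores and the model's
# final P(Known) for each skill.
def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Average squared error between post-test scores and model assessments.
def avg_sse(scores, probs):
    return mean((s - p) ** 2 for s, p in zip(scores, probs))

scores = [0, 1, 1, 0]           # post-test: 0 = wrong, 1 = correct
probs = [0.1, 0.9, 0.8, 0.2]    # model's final P(skill = Known)
print(round(pearson_r(scores, probs), 2))  # 0.99
print(avg_sse(scores, probs))              # ~0.025
```

Unlike the end accuracy, neither measure requires a threshold to binarize the model's posterior probabilities, which is why they sidestep the imbalance problem described above.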
Table 4-8: SSE in the Original PC-DBN model

  Prior Probabilities   SSE
  Population            M=0.15, SD=0.11
  Generic               M=0.19, SD=0.06
  User-specific         M=0.16, SD=0.11

Analysis of Model Degeneracy in the Original Prime Climb DBN Model

It has already been discussed that the Prime Climb DBN model is vulnerable to the problem of degeneracy. In this section we investigate whether the optimal parameters could lead to cases of degeneration in the model. Previously, it was discussed that the following subspaces of the PC-DBN model's parameter space could result in degeneration:

- Eduguess < Guess
- 1-Slip < Guess
- 1-Slip < Eduguess

We can check whether the estimated model parameters fall inside these subspaces.

Table 4-9: Evaluation of the original Prime Climb student model for degeneracy

  Prior Probabilities   Guess   Edu-Guess   Slip   Eduguess<Guess   1-Slip<Guess   1-Slip<Eduguess
  Population            0.5     0.3         0.4    yes              no             no
  Generic               0.2     0.6         0.6    no               no             yes
  User-specific         0.6     0.1         0.6    yes              yes            no

As shown in Table 4-9, all models are vulnerable to degeneracy. For further investigation, the models were checked for degeneracy using Tests 1 and 2 of empirical degeneration in PC-DBN. Table 4-10 summarizes the average number of failures per student model in each test, based on the data of the 43 students discussed in Chapter 2. In the next section, we investigate the issue of degeneracy in Prime Climb in more detail.

Table 4-10: Average and standard deviation of failures in Tests 1 and 2 of empirical degeneracy

  Prior Probabilities   Failures in Test 1        Failures in Test 2
  Population            M = 268.91, SD = 64.26    M = 1.71, SD = 2.85
  Generic               M = 101.84, SD = 28.73    M = 258.35, SD = 80.48
  User-specific         M = 339.17, SD = 75.11    M = 138.86, SD = 62.04

Model Degeneracy in the PC-DBN Model with Population Prior Probability

Figure 4-3 and Table 4-11 show the average and standard deviation of the number of patterns of degeneracy in the original PC-DBN model when the population prior probability is used.
When simulating the student model on the students' interaction logs, the PC-DBN model's dependency and learning parameters were set as described in Table 4-4. These parameter settings were selected for analyzing PC-DBN degeneracy because they resulted in the highest end accuracy for the student model.

Figure 4-3: Degeneracy in the Prime Climb DBN student model when the population prior parameter is used.

Table 4-11: Average number of patterns of degeneracy across students in the original Prime Climb DBN model for the Population prior probability.

  Statistics   Total Degeneration   Factorization Degeneration   Common Factor Degeneration   #Moves
  Average      270.62               167.33 (61.8%)               103.29 (32.2%)               494.5
  STD          64.59                48.14                        22.5                         94.5

Figure 4-4: Average number of different patterns of degeneration across students when the Population prior is used (values: 28.96, 136.67, 28.76, 74.53, 0.53, 1.18).

Figure 4-5: Average percentage of different patterns of degeneration across students when the Population prior is used (values: 10.70%, 50.50%, 10.63%, 27.54%, 0.20%, 0.44%).

As shown in Table 4-11, the average number of opportunities to practice factorization and common factor skills across students is 494.5, which equals the average number of movements students made (=164.5) times the number of skills practiced (or examined) at each movement (3 skills = 2 factorization skills + 1 common factor skill).
Also, as previously mentioned, two patterns of degeneracy (INCREASE_DECREASE_MAX and DECREASE_INCREASE_MAX) are related to the Max parameter. These cases of degeneracy concern skills which are not directly practiced in a movement (see Test 2 of empirical degeneracy in PC-DBN). Since less than 1% of the degenerate cases are of these types (see Figure 4-5), we can conclude that over half of the opportunities students had to practice factorization and common factor skills (i.e., movements) involved degeneracy related to skills directly practiced in the movement. Figure 4-4 and Figure 4-5 show the average number of each pattern of degeneration when the population prior probability is used.

Model Degeneracy in the PC-DBN Model with Generic Prior Probability

Figure 4-6 shows the degeneration results when the generic prior probability is used.

Table 4-12: Average degeneracy results across students when the generic prior probability is used.

  Statistics   Total Degeneration   Factorization Degeneration   Common Factor Degeneration   #Opportunities of Practice
  Average      360.20               358.82                       1.37                         494.5
  STD          103.48               103.78                       1.48                         94.5

Figure 4-6: Degeneracy in the Prime Climb DBN student model when the Generic prior parameter is used.

Figure 4-7: Average number of degeneracy patterns when the Generic prior probability is used.

Figure 4-8: Average percentage of degeneracy patterns when the Generic prior probability is used.

The high percentages related to INCREASE_DECREASE_MAX and DECREASE_INCREASE_MAX in the generic PC-DBN model could be due to the high value of the Max parameter in this model. As shown in Table 4-4, the value of the Max parameter in
the generic PC-DBN model is 0.8, which is greater than the value of Max in the population and user-specific models.

Model Degeneracy in the PC-DBN Model with User-specific Prior Probability

Similarly, model degeneration has been investigated in the Original Prime Climb model with the userspecific prior probability setting. Table 4-13 shows the average total number of cases of degeneration on factorization skills as well as the common factor skill.

Table 4-13: Average number of patterns of degeneracy in the original Prime Climb DBN model when the Userspecific prior probability is used.

  Statistics   Total Degeneration   Factorization Degeneration   Common Factor Degeneration   #Opportunities of Practice
  Average      478.04               411.0                        67.04                        494.5
  STD          131.8                57.9                         20                           94.5

Figure 4-10 and Figure 4-11 respectively show the average number and percentage of different types of degeneracy in the Original Prime Climb student model with the userspecific prior probability setting. Compared to the PC-DBN model with population prior probability, there are more cases of degeneracy in the PC-DBN model with userspecific prior probability. In addition, there are more cases of degeneration related to the Max parameter (29% in userspecific vs. less than 1% in population). This is because the Max parameter is higher in the userspecific model (0.6) than in the population model (0.2), which can cause more degeneracy propagation in the network. As discussed above, different patterns of model degeneracy have been observed in the Prime Climb DBN student model with varying prior parameters.

Figure 4-9: Degeneracy in the Prime Climb DBN student model when the Userspecific prior parameter is used.

Figure 4-10: Average number of degeneracy patterns when the Userspecific prior probability is used.
Figure 4-11: Average percentage of degeneracy patterns when the Userspecific prior probability is used.

Model Parameter Plausibility in the Original Prime Climb DBN Model

Parameter plausibility is another way of evaluating the model's parameters. This section discusses the plausibility of the estimated dependency parameters in the Original Prime Climb model.

Plausibility Based on the Performance of the Hinting Mechanism

The hinting strategy in Prime Climb is explained in Chapter 2. It is vital that a student model be capable of providing a reliable basis for an adaptive intervention mechanism. Therefore, we investigate the efficiency of the hinting mechanism in Prime Climb when the Original Prime Climb model is used. Note that in this analysis, the hinting mechanism and its threshold settings are exactly as described in Chapter 2. The efficiency of the hinting mechanism with each prior probability setting is shown in the following confusion matrices. Note that UJ, J, M and CN respectively stand for unjustified, justified, missed and correctly-not-given hints.
Table 4-14: Confusion matrix of the hinting mechanism for the Original PC-DBN model with Population prior

  Model assessment of student knowledge (Population)
                            Unknown    Known      Total
  Pre-Post Test   Known     6 (UJ)     246 (CN)   252
                  Unknown   84 (J)     340 (M)    424
  Total                     90         586        676

  Hint strategy: Precision 0.74, Recall 0.2, F-Measure 0.33, Average Hints/Moves 112.74/164.5

Table 4-15: Confusion matrix of the hinting mechanism for the Original Prime Climb model with Generic prior

  Model assessment of student knowledge (Generic)
                            Unknown    Known      Total
  Pre-Post Test   Known     75 (UJ)    177 (CN)   252
                  Unknown   244 (J)    180 (M)    424
  Total                     319        357        676

  Hint strategy: Precision 0.77, Recall 0.58, F-Measure 0.66, Average Hints/Moves 82.96/164.5

Table 4-16: Confusion matrix of the hinting mechanism for the Original PC-DBN model with Userspecific prior

  Model assessment of student knowledge (Userspecific)
                            Unknown    Known      Total
  Pre-Post Test   Known     0 (UJ)     252 (CN)   252
                  Unknown   424 (J)    0 (M)      424
  Total                     424        252        676

  Hint strategy: Precision 1, Recall 1, F-Measure 1, Average Hints/Moves 139/164.5

The results in the above tables show that the hinting strategy based on the Original student model in Prime Climb intervenes too much, which could negatively affect the student's engagement in the game. In the population model, on average 2 hints are given for almost every three movements the student makes; in the generic model, 1 hint is given for every 2 movements; and in the userspecific model, a hint is given for every 1.2 movements. These results show that the parameters of the Original student model in Prime Climb are not plausible.

Plausibility Based on Correlation between Within-Game Performance and Learning

This section discusses the correlation between a student's performance within the game and the evolution of learning in the student. The student's within-game performance is defined by the number of correct and wrong actions at opportunities to practice number factorization skills.
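The hint-strategy metrics follow from the four hint categories as sketched below. Note that aggregating raw counts over all students, as done here with the Population totals, need not reproduce the per-student averages reported in the tables; the function name is an assumption for this illustration.

```python
# Precision/recall/F-measure of the hinting mechanism: a justified hint (J)
# is a true positive, an unjustified hint (UJ) a false positive, and a
# missed hint (M) a false negative; correctly-not-given (CN) is a true
# negative and does not enter precision or recall.
def hint_metrics(j, uj, m):
    precision = j / (j + uj)
    recall = j / (j + m)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Aggregate counts from the Population confusion matrix (Table 4-14):
p, r, f = hint_metrics(j=84, uj=6, m=340)
print(round(r, 2))  # 0.2
```

The low recall with the Population prior reflects the 340 missed hints: the model rarely detects the unknown skills for which a hint was warranted.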
For the factorization skills tested on the pre-test and post-test, the following statistics have been calculated:

- Average and standard deviation, across students, of the number of correct Prime Climb movements involving the number (denoted #Correct(Mean) and #Correct(STD)).
- Average and standard deviation, across students, of the number of wrong Prime Climb movements involving the number (denoted #Incorrect(Mean) and #Incorrect(STD)).
- Paired t-test comparison of the number of correct and wrong Prime Climb movements involving the number across students (p-value).
- Average and standard deviation of the difference between the number of correct and wrong Prime Climb movements involving the number across students (Difference(Mean) and Difference(STD)).

Table 4-17 shows the above measures for the 16 numbers on the pre-post test when the population prior probability is used. When a paired t-test result is statistically significant, it is denoted by "*" in the p-value column. As the table shows, in all cases in which the difference between the average number of correct and wrong moves is statistically significant, the average number of correct moves is higher than the average number of wrong moves. For some numbers, such as 25, 27, 31, 81, 89 and 97, the difference between correct and wrong movements is very large; students made at least 5 times more correct movements than wrong movements. For such numbers, a large difference between the final posterior probability of students knowing the corresponding skills and their initial prior probability is expected. It is also expected that a higher number of correct movements consistently results in a larger difference between posterior and prior probabilities.
Table 4-17: Skill analysis for all numbers on the pre-test and post-test for the population model

  Number   #Correct (Mean)   #Wrong (Mean)   #Correct (STD)   #Wrong (STD)   P-Value
  9        3.69              1.09            2.25             1.72           8E-11*
  11       4.15              0.24            3.37             0.64           1.8E-10*
  14       0.67              0.22            1.19             0.47           0.03*
  15       2.09              1.51            1.72             1.85           0.06
  25       10.44             0.98            3.73             1.01           3.2E-21*
  27       8.04              1.35            5.41             1.73           8.8E-12*
  31       5.11              0               6.14             0              1.4E-06*
  33       2.6               1.49            2.47             1.32           0.00063*
  36       1.91              2.11            2.17             2.57           0.66
  42       2.71              1.27            1.75             1.59           0.00019*
  49       2.37              0.02            2.42             0.14           5.3E-08*
  81       8.35              1.04            3.59             1.54           4.3E-19*
  88       1.6               0.82            1.52             1.28           0.0003*
  89       7.71              0               3.33             0              1.9E-19*
  97       5.67              0               4.07             0              5.4E-12*

To investigate this relation, a plausibility criterion is defined, called plausibility based on the correlation between within-game performance and learning. Figure 4-12 shows the correlation between the difference of prior and final posterior probability (y-axis) and the difference between the number of correct and wrong movements (x-axis) for some numbers on the pre-test and post-test; these are the numbers for which the difference between the number of correct and wrong movements is very high compared to the other numbers. Table 4-18 shows the correlation results and whether or not they are statistically significant.
Table 4-18: Correlation results for plausibility based on correlation between within-game performance and learning in the population student model

  Number   Average/STD (Correct-Wrong)   Average/STD (Posterior-Prior)   Correlation (Pearson r)   P-Value (2-tailed)
  9        2.6/2.04                      0.024/0.0078                    0.61                      7E-06*
  11       3.91/3.18                     0.048/0.015                     0.41                      0.004*
  14       0.44/1.34                     0.016/0.003                     0.385                     0.0089*
  15       0.58/2.02                     0.013/0.009                     0.355                     0.015*
  25       9.46/3.67                     0.03/0.012                      0.5                       0.004*
  27       6.68/4.88                     0.014/0.015                     0.33                      0.024*
  31       5.11/6.14                     0.027/0.014                     0.14                      0.34
  33       1.11/2.02                     0.00072/0.0045                  0.47                      0.00091*
  36       -0.2/3.086                    0.011/0.0096                    0.23                      0.11
  42       1.44/2.37                     0.0116/0.034                    0.65                      1.2E-06*
  49       2.35/2.41                     0.00087/0.003                   0.55                      8.45E-05*
  81       7.3/3.32                      0.01/0.012                      0.43                      0.0025*
  88       0.77/1.32                     0.005/0.01                      0.526                     0.0002*
  89       7.71/3.33                     0.023/0.018                     0.39                      0.0089*
  97       5.66/4.07                     0.015/0.016                     0.61                      0.08

A Pearson r correlation is considered very strong if higher than 0.7, strong if between 0.4 and 0.69, moderate if between 0.3 and 0.39, weak if between 0.2 and 0.29, and no correlation if lower than 0.2. As can be seen in Table 4-18, the correlation is significant for all numbers except 31, 36 and 97. In addition, in almost all cases the correlation is moderate or stronger. Despite this, the average difference between the prior probability and the final posterior probability is very small. This indicates that the evidence (correct and wrong movements) is not propagated properly in the Original PC-DBN model, so the model may not have an accurate assessment of the student's knowledge on the skills.

[Figure 4-12 scatterplots, posterior-prior difference vs. #correct - #wrong movements: R² = 0.255 for number 25, 0.112 for 27, 0.021 for 31.]
[Figure 4-12, panels for numbers 81, 97 and 89: scatter plots of (Posterior - Prior) against (#correct - #wrong movements), with R² = 0.193, 0.378 and 0.149 respectively.]

Figure 4-12: Plausibility based on correlation between within-game performance and learning in the population student model

In sum, this section described the measures of internal and external validity for the original PC-DBN model. The estimated parameters for the original PC-DBN model were presented and evaluated using model plausibility criteria. It was shown that the original PC-DBN model is vulnerable to model degeneracy. It was also shown that the hinting strategy based on the original PC-DBN model intervenes too much, which could negatively affect the student's engagement in Prime Climb. The results further showed that, due to the degeneracy issue, the evidence (correct and wrong movements) is not propagated properly in the student model, so the model may not have an accurate assessment of the students' skills. Therefore, it can be concluded that the parameters of the original PC-DBN model are not plausible.

4.5.2 Bounded Prime Climb DBN Model

As discussed in the previous section, the original Prime Climb student model is vulnerable to the degeneracy problem. Three subspaces in the space of the model's parameters have been identified that enclose parameter values which might make the student model degenerate. One approach to addressing this problem is to avoid selecting parameter values located inside these subspaces. This approach is called the Bounded Prime Climb DBN model.
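One way to realize such a bounded search is sketched below. The `end_accuracy` callback is a stand-in for evaluating the PC-DBN on held-out students (the thesis uses leave-one-out cross-validation over students); the grid and names are illustrative, not the thesis code.

```python
from itertools import product

def in_degenerate_subspace(guess, edu_guess, slip):
    """The three degeneracy subspaces: edu-guess < guess,
    1 - slip < guess, and 1 - slip < edu-guess."""
    return edu_guess < guess or (1 - slip) < guess or (1 - slip) < edu_guess

def bounded_search(end_accuracy, grid):
    """Exhaustive search over (guess, edu-guess, slip) combinations,
    skipping any combination inside a degeneracy subspace, and keeping
    the combination with the best end accuracy."""
    best, best_acc = None, -1.0
    for guess, edu_guess, slip in product(grid, repeat=3):
        if in_degenerate_subspace(guess, edu_guess, slip):
            continue  # bounded model: never pick degenerate parameters
        acc = end_accuracy(guess, edu_guess, slip)
        if acc > best_acc:
            best, best_acc = (guess, edu_guess, slip), acc
    return best, best_acc
```

Note that the population parameters reported later (guess 0.7, edu-guess 0.7, slip 0.2) pass this check, since edu-guess >= guess and 1 - slip = 0.8 exceeds both.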
In this approach, the model's dependency parameters do not take values from the degeneracy subspaces. As for the original PC-DBN model, an exhaustive search is used to find the set of dependency parameters that maximizes the model's end accuracy, with leave-one-out cross-validation at the level of individual subjects (students). Table 4-19 shows the estimated model parameters for each prior probability type for the bounded PC-DBN model.

Table 4-19: Optimal dependency model parameters in the bounded PC-DBN student model

Prior Probability | Guess | Edu-guess | Slip | Max | Threshold
Population        | 0.7   | 0.7       | 0.2  | 0.4 | 0.8
Generic           | 0.5   | 0.6       | 0.4  | 0.8 | 0.63
User-specific     | 0.3   | 0.3       | 0.6  | 0.8 | 0.92

Bounded PC-DBN Model's End Accuracy

Using the above estimated parameters, the end accuracy of the bounded PC-DBN model for each prior probability setting is given in Table 4-20. The end, known and unknown accuracies of the bounded PC-DBN model are defined exactly as in the original PC-DBN model.

Table 4-20: Accuracy results for the bounded Prime Climb DBN model

Prior Probabilities | End Accuracy    | Known Accuracy  | Unknown Accuracy
Population          | M=0.76, SD=0.15 | M=0.77, SD=0.12 | M=0.75, SD=0.27
Generic             | M=0.68, SD=0.15 | M=0.60, SD=0.13 | M=0.79, SD=0.22
User-specific       | M=0.69, SD=0.18 | M=0.74, SD=0.21 | M=0.60, SD=0.35

Comparison of the Models' End Accuracy

The Original and Bounded PC-DBN models were compared based on their end accuracy, true known and true unknown. Figure 4-13 shows the comparison of the student models when the population prior is used; Figure 4-14 and Figure 4-15, respectively, show the comparison when the generic and user-specific priors are used.

Figure 4-13: Original and Bounded Prime Climb Student Model with Population Prior [bar chart comparing Accuracy, TrueKnown and TrueUnknown for the Degenerated (Original) and UnDegenerated (Bounded) models]

Figure 4-14: Original and Bounded Prime Climb Student Model with Generic Prior
[Figure 4-14 bar chart comparing Accuracy, TrueKnown and TrueUnknown for the Degenerated (Original) and UnDegenerated (Bounded) models under the generic prior.]

Figure 4-15: Original and Bounded Prime Climb Student Model with User-specific Prior

Table 4-21, Table 4-22 and Table 4-23 summarize the average end accuracy results across students for the Original and Bounded Prime Climb DBN student models under the different prior probability settings. The results of a t-test showed no statistically significant difference in end accuracy between the Original (M=0.77, SD=0.14) and Bounded (M=0.76, SD=0.15) PC-DBN models, p=0.57, when the population prior setting is used. Similarly, a t-test showed no statistically significant difference between the end accuracy of the Original (M=0.7, SD=0.16) and Bounded (M=0.68, SD=0.15) PC-DBN models, p=0.4, when the generic prior setting is used. Likewise, no statistically significant difference was found between the end accuracy of the Original (M=0.72, SD=0.2) and Bounded (M=0.69, SD=0.19) PC-DBN models, p=0.37, when the user-specific prior setting is used.

Table 4-21: Average/STD of accuracy results across students for the original and bounded models with population prior

Model (Population) | True Known Mean/SD | True Unknown Mean/SD | End Accuracy Mean/SD
Original           | 0.88 / 0.07        | 0.63 / 0.29          | 0.77 / 0.14
Bounded            | 0.77 / 0.12        | 0.75 / 0.27          | 0.76 / 0.15
Statistics         | p-value = 0.003    | p-value = 7.6E-09    | p-value = 0.57

[Figure 4-15 bar chart comparing Accuracy, TrueKnown and TrueUnknown for the Degenerated (Original) and UnDegenerated (Bounded) models under the user-specific prior.]
Table 4-22: Average/STD of accuracy results across students for the original and bounded models with generic prior

Model (Generic) | True Known Mean/SD | True Unknown Mean/SD | End Accuracy Mean/SD
Original        | 0.69 / 0.12        | 0.72 / 0.29          | 0.7 / 0.16
Bounded         | 0.6 / 0.13         | 0.8 / 0.22           | 0.68 / 0.15
Statistics      | p-value = 5.18E-07 | p-value = 0.09       | p-value = 0.4

Table 4-23: Average/STD of accuracy results across students for the original and bounded models with user-specific prior

Model (User-specific) | True Known Mean/SD | True Unknown Mean/SD | End Accuracy Mean/SD
Original              | 0.79 / 0.22        | 0.64 / 0.35          | 0.72 / 0.2
Bounded               | 0.74 / 0.22        | 0.6 / 0.35           | 0.69 / 0.19
Statistics            | p-value = 0.27     | p-value = 0.5        | p-value = 0.37

In addition, Table 4-24 summarizes the AUC (area under the ROC curve) for the Original and Bounded models for all three types of prior probabilities. Figure 4-16, Figure 4-17 and Figure 4-18 show the ROC curves for the two models with the population, generic and user-specific prior probabilities, respectively. Note that the red curve is for the Original model and the blue curve for the Bounded model.

Table 4-24: Area under the ROC curves of the Original and Bounded PC-DBN models

Prior probability | Population | Generic | User-specific
AUC for original  | 0.7345     | 0.6762  | 0.7860
AUC for bounded   | 0.7375     | 0.6643  | 0.7449

Figure 4-16: ROC curves for the population PC-DBN models (red: Original, blue: Bounded)

Figure 4-17: ROC curves for the generic PC-DBN models (red: Original, blue: Bounded)

Figure 4-18: ROC curves for the user-specific PC-DBN models (red: Original, blue: Bounded)

Bounded PC-DBN Model's Real-Time Accuracy

The real-time accuracy of the Bounded Prime Climb DBN model was also calculated. Table 4-25 presents the real-time accuracy results for each prior probability setting. Note that the real-time accuracy is limited to only those skills on which the student's knowledge remained unchanged from the pre-test to the post-test.
Table 4-25: Model's real-time accuracy for the bounded PC-DBN model

Prior Probabilities | Real-Time Accuracy
Population          | 0.72
Generic             | 0.57
User-specific       | 1

Correlation between Post-test Performance and Learning

The correlation between the students' post-test performance on the factorization and common-factor skills and the student model's final assessment of how well the students know these skills (based on the final posterior probability of the skills' corresponding nodes in the PC-DBN) was also calculated. Table 4-26 shows the results.

Table 4-26: PostTest-Model correlation assessment in the Bounded Prime Climb model

Prior Probabilities | Correlation
Population          | M=0.44, SD=0.22
Generic             | M=0.29, SD=0.18
User-specific       | M=0.32, SD=0.3

Comparison of the Models' Correlation between Post-test Performance and Model Assessment

The results of paired t-tests showed no statistically significant difference between the Original and Bounded models with respect to the correlation between the post-test performance on factorization numbers and the model's assessment of factorization knowledge. Figure 4-19 shows the results.

Figure 4-19: Comparison of the models' correlation between post-test and model assessment [bar chart; Original: 0.46, 0.29, 0.32 and Bounded: 0.44, 0.29, 0.32 for the population, generic and user-specific priors]

Sum of Squared Error (SSE) Using Post-test Performance and Model Assessment

Table 4-27 presents the SSE results for the Bounded Prime Climb student model for each prior probability setting.

Table 4-27: SSE for the Bounded PC-DBN model

Prior Probabilities | SSE
Population          | M=0.15, SD=0.12
Generic             | M=0.18, SD=0.04
User-specific       | M=0.15, SD=0.11

Comparison of the Models' Summed Squared Error (SSE)

The plausibility of the estimated parameters in the Original and Bounded PC-DBN models was also analyzed based on the average error in estimating the students' performance on the post-test.
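The SSE measure itself is straightforward. A minimal sketch, assuming each student's skill posteriors and post-test scores are both normalized to [0, 1] (illustrative names, not the thesis code):

```python
def sse(posteriors, post_test):
    """Sum of squared differences between the model's final posterior
    for each skill and the student's normalized post-test score."""
    return sum((p - t) ** 2 for p, t in zip(posteriors, post_test))

def mean_sse(students):
    """Average SSE across students, as reported per prior setting.
    `students` is a list of (posteriors, post_test) pairs."""
    errors = [sse(p, t) for p, t in students]
    return sum(errors) / len(errors)
```

A lower average SSE means the model's final beliefs track the post-test performance more closely, which is why it serves as a plausibility measure here.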
The results of paired t-tests showed no statistically significant difference between the average SSE of the Original and Bounded models. Figure 4-20 shows the results of the comparison.

Figure 4-20: Comparison of the models' SSE [bar chart; average SSE for Original: 0.15, 0.19, 0.16 and Bounded: 0.15, 0.18, 0.15 for the population, generic and user-specific priors]

Analysis of Model Degeneracy in the Bounded Prime Climb DBN Model

As discussed, the parameters of the bounded PC-DBN model are estimated in such a way that the parameter estimation procedure avoids selecting parameters located inside the degeneracy subspaces. Table 4-28 checks the estimated parameters.

Table 4-28: Evaluation of the Bounded Prime Climb student model for degeneracy

Prior Probabilities | Guess | Edu-Guess | Slip | EduGuess<Guess | 1-Slip<Guess | 1-Slip<EduGuess
Population          | 0.7   | 0.7       | 0.2  | no             | no           | no
Generic             | 0.5   | 0.6       | 0.4  | no             | no           | no
User-specific       | 0.3   | 0.3       | 0.6  | no             | no           | no

Although all of the models have been assigned parameters that are not contained in the degeneracy subspaces, we would still like to check the bounded models for degeneracy using tests 1 and 2 of empirical degeneration in the PC-DBN.

Model Degeneracy in the Bounded PC-DBN Model with Population Prior Probability

Figure 4-21: Degeneration in the Bounded PC-DBN model with population prior setting [bar chart; average number of degeneration cases: TotalDegeneration 10.62, Factorization Skills 7.42, Common Factor Skill 3.20]
Table 4-29: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy

Prior Probabilities | Failures in Test 1  | Failures in Test 2
Population          | M = 9.17, SD = 7.45 | M = 1.44, SD = 2.0

Figure 4-22: Average number of degeneration cases across students in the population bounded PC-DBN model [bar chart of average counts per degeneration type]

Model Degeneracy in the Bounded PC-DBN Model with Generic Prior Probability

Figure 4-23: Degeneration in the Bounded PC-DBN model with generic prior setting [bar chart; TotalDegeneration 9.04, Factorization Skills 6.36, Common Factor Skill 2.69]

Table 4-30: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy

Prior Probabilities | Failures in Test 1 | Failures in Test 2
Generic             | M = 8.8, SD = 7.29 | M = 0.24, SD = 0.48

Figure 4-24: Average number of degeneracy cases in the Bounded PC-DBN model with generic prior setting [bar chart of average counts per degeneration type]

Model Degeneracy in the Bounded PC-DBN Model with User-specific Prior Probability

Figure 4-25: Degeneration in the Bounded PC-DBN model with user-specific prior setting [bar chart; TotalDegeneration 10.78, Factorization Skills 8.44, Common Factor Skills 2.33]

Table 4-31: Average and standard deviation of failures in tests 1 and 2 of empirical degeneracy

Prior Probabilities | Failures in Test 1    | Failures in Test 2
User-specific       | M = 10.24, SD = 10.96 | M = 0.533, SD = 1.2

Figure 4-26: Average cases of degeneracy in the Bounded PC-DBN model with user-specific prior setting

Comparison of Degeneracy in the Original and Bounded Models

Two tests of empirical degeneracy have been previously defined for the PC-DBN model.
Compared to the Original Prime Climb model, the parameters of the Bounded Prime Climb model were estimated so as to minimize the cases of model degeneracy. To check degeneracy in the two models, they were compared based on the average number of failures in the two tests of empirical degeneracy. Figure 4-27 shows the results. The result of a paired t-test shows that the average number of failures in test 1 of degeneracy in the Bounded model (M=8.8, SD=7.3) is statistically significantly lower than in the Original model (M=101.84, SD=28.73), p<0.01. Similarly, there is a statistically significant difference in the average number of failures in test 2 of empirical degeneracy between the Original model (M=258.35, SD=80.48) and the Bounded model (M=0.24, SD=0.48), p<0.01.

Figure 4-27: Comparison of degeneracy in the Original and Bounded PC-DBN models [bar chart of average failures in tests 1 and 2 of empirical degeneracy]

Model Parameter Plausibility in the Bounded Prime Climb DBN Model

This section discusses the plausibility of the estimated dependency parameters in the Bounded Prime Climb model.

Plausibility Based on the Performance of the Hinting Mechanism

The following confusion matrices present the hinting strategy's efficiency when the bounded Prime Climb model is used as the basis for providing the students with adaptive hints. Recall that in the confusion matrices, UJ, CN, J and M respectively stand for Unjustified, Correctly Not given, Justified and Missed hints.
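Treating J as true positives, UJ as false positives and M as false negatives, the hint-strategy measures reported with each confusion matrix follow from the standard precision/recall definitions. A sketch (the counts in the test below are those of the population-prior matrix, which it reproduces to within rounding):

```python
def hint_measures(j, uj, m):
    """Precision, recall and F-measure of the hinting strategy, where
    j = justified hints (given and needed), uj = unjustified hints
    (given but not needed) and m = missed hints (needed but not given)."""
    precision = j / (j + uj)
    recall = j / (j + m)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, with the population prior the strategy gives 132 hints of which 114 are justified, but misses 310 needed hints, which is why its recall (and hence F-measure) is low despite high precision.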
Table 4-32: Confusion matrix for the hinting mechanism for the Bounded PC-DBN model with population prior

Model assessment of student knowledge (Population):

Pre/Post Test | Unknown | Known    | Total
Known         | 18 (UJ) | 234 (CN) | 252
Unknown       | 114 (J) | 310 (M)  | 424
Total         | 132     | 544      | 676

Hint strategy: Precision 0.87, Recall 0.27, F-Measure 0.41, Average Hints/Moves 55.23/164.5

Table 4-33: Confusion matrix for the hinting mechanism for the Bounded PC-DBN model with generic prior

Model assessment of student knowledge (Generic):

Pre/Post Test | Unknown | Known    | Total
Known         | 42 (UJ) | 210 (CN) | 252
Unknown       | 124 (J) | 300 (M)  | 424
Total         | 166     | 510      | 676

Hint strategy: Precision 0.75, Recall 0.3, F-Measure 0.43, Average Hints/Moves 48.54/164.5

Table 4-34: Confusion matrix for the hinting mechanism for the Bounded PC-DBN model with user-specific prior

Model assessment of student knowledge (User-specific):

Pre/Post Test | Unknown | Known    | Total
Known         | 0 (UJ)  | 252 (CN) | 252
Unknown       | 371 (J) | 53 (M)   | 424
Total         | 371     | 305      | 676

Hint strategy: Precision 1, Recall 0.88, F-Measure 0.94, Average Hints/Moves 41.94/164.5

The plausibility of the estimated parameters for the Original and Bounded models is also compared.

Plausibility Based on the Performance of the Hinting Mechanism

We first compare the performance of the hinting mechanism in the two models based on the hinting accuracy values for each model. To this end, the accuracy of the hinting mechanism is calculated for each student, and a paired t-test is then used to compare the two groups of accuracies. The independent variable in this t-test is the student model, Original or Bounded. Figure 4-28 shows the results of comparing the performance of the hinting mechanism for the two models.
Figure 4-28: Comparing performance of the hinting mechanism in the Original and Bounded models with the population prior probability setting [bar charts; average F-measure 0.24 vs. 0.29 and average number of given hints 112.51 vs. 55.22]

The result of a paired t-test shows a statistically significant difference between the accuracy of the hinting mechanism in the Original (M=0.24, SD=0.2) and Bounded (M=0.29, SD=0.22) models, p=0.001. There is also a statistically significant difference between the total number of hints given in the Original (M=112.5, SD=56.62) and Bounded (M=55.2, SD=19.84) models, p<0.01. Based on these results, the hinting mechanism provides twice as many hints under the Original PC-DBN model as under the Bounded PC-DBN model, while the hinting accuracy of the Bounded model is higher. This shows that the Bounded PC-DBN model is more accurate and more applicable than the Original model when the population prior setting is used. Figure 4-29 shows the results of comparing the hinting mechanism's accuracy in the Original and Bounded models with the user-specific prior probability setting.

Figure 4-29: Comparing performance of the hinting mechanism in the Original and Bounded models with the user-specific prior probability setting

A paired t-test shows that the hinting accuracy in the Original model (M=1, SD=0) is higher than in the Bounded model (M=0.95, SD=0.08) when the user-specific prior is used, p=0.005. Despite this, the hinting strategy gives three times as many hints in the Original model as in the Bounded model. As already mentioned, the average number of movements made by a student is 164.5, while on average 139 hints are given in the Original model.
In other words, a hint is given for every 1.18 movements, which could be considered too much intervention for the students. By contrast, one hint is given for every 3.9 (~4) movements in the Bounded PC-DBN model. A similar comparison was made when the models use the generic prior probability setting. A paired t-test shows that the accuracy of the hinting mechanism in the Original model (M=0.55, SD=0.32) is statistically significantly higher than in the Bounded model (M=0.3, SD=0.24), p<0.01. Similarly, the hinting strategy in the Original model provided a significantly higher number of hints to the students (M=82.95, SD=27.37) than the Bounded model (M=48.53, SD=19.1), p<0.01.

Figure 4-30: Comparing performance of the hinting mechanism in the Original and Bounded models with the generic prior probability setting

In conclusion, when the population prior is used, the estimated parameters of the Bounded model are more plausible than those of the Original model. When the user-specific and generic prior probability settings are used, the hinting strategy provides significantly more hints under the Original model, and the game intervenes too much. In fact, although the hinting strategy is more accurate in the Original model with the generic and user-specific priors, providing too many hints makes the hinting mechanism inapplicable. In sum, this section described the bounded PC-DBN model in Prime Climb, which was introduced to deal with the issue of degeneracy. The section also reported varying measures of internal and external validity for the bounded PC-DBN model.
The degeneracy issue was also investigated in the bounded PC-DBN model. The bounded and original models were compared, and it was shown that the bounded model has a significantly lower number of cases of degeneracy than the original PC-DBN model while maintaining comparable end accuracy. The plausibility of the bounded PC-DBN model's parameters was also investigated and compared to that of the original PC-DBN model's parameters. The comparison showed that, in general, the estimated parameters of the bounded model are more plausible than those of the original model.

Chapter 5: Mining User Interaction Behaviors in Prime Climb

Adaptive educational systems have been designed to answer the need to understand and take into account individuals' varying learning styles, capabilities and preferences when developing their knowledge and skills. To this end, many educational systems apply data mining approaches to students' recorded interaction logs to extract abstract, high-level information about the students. Kardan et al. leveraged behavior discovery to propose a general framework for distinguishing users' interaction styles in exploratory learning environments [30]. McCuaig et al. [31] exploited data mining techniques to distinguish between successful and unsuccessful learners in a Learning Management System (LMS). Lopez et al. [32] described an approach to predicting students' final marks from their participation in a forum. In addition, students' interaction data has also been modeled with the aim of understanding the evolution of learning in Interactive Learning Environments (ILE) [33]. Therefore, users'
interaction behaviors, as observable components of educational systems, have been utilized extensively for capturing higher-level behavioral patterns in students. Along this line of research, we would also like to understand how students interact with Prime Climb as an adaptive educational game, and whether there is a connection between behavioral patterns and student attributes (for instance, higher versus lower learners). The ultimate goal of an adaptive edu-game like Prime Climb is to provide an environment in which more students can learn the desired skills through more personalized interaction with the game. This requires that the game maintain an accurate understanding of individual differences between users and provide tailored interventions aimed at guiding learners along the right path. For instance, if Prime Climb's pedagogical agent were capable of discovering the behavioral patterns shown by students who learn well from the game, it could encourage such patterns in less efficient students through feedback and hints. Similarly, if the agent could identify whether one group of students has higher domain knowledge than another, such information could be leveraged to devise a more accurate user model and intervention mechanism. Users' interaction behaviours can also be revealed to developers as a guide for improving the system, e.g. [34]. Behavior discovery has been used widely in educational systems, but it has seen limited application in educational games like Prime Climb, in which educational concepts are embedded in the game with minimal technical notation to maximize the game aspects (such as engagement) of the system. In Prime Climb, students do not explicitly practice approaches to number factorization but implicitly follow a self-regulated mechanism to explore, understand and practice these methods. This chapter describes the first step toward leveraging data mining of students'
behavioral patterns to build a more effective adaptive edu-game. To this end, we use the recorded data from students' interactions with Prime Climb to understand the different patterns of interacting with the game that characterize groups of students. The ultimate goal is to devise mechanisms for abstracting high-level behavior from raw interaction data and to leverage such understanding for real-time identification of characteristic interaction styles, in order to enhance the user modeling and intervention mechanisms in an adaptive edu-game like Prime Climb.

5.1 Behavior Discovery in Prime Climb

The long-term objectives of behavior discovery in Prime Climb are: 1) developing a more personalized user model, and 2) constructing a more effective and tailored adaptive intervention mechanism. Figure 5-1 shows the general framework of behavior discovery in Prime Climb. This framework comprises four major modules: 1) Data Collection, 2) User Representation, 3) Clustering and 4) Association Rule Mining. Each module is discussed in more detail in the next sections.

Figure 5-1: Behavior Discovery Framework in Prime Climb

Data Collection: The objective of the data collection module is to use the interaction data of the students who played Prime Climb (see Chapter 2 for more details) to prepare the data required by the user representation component. This process involves parsing the interaction data, cleaning it and constructing data structures to represent it.

User Representation: The data structures constructed from the interaction logs are used for extracting higher-level information (features) about the subjects. Each user is represented by a vector of features, which are statistical measures calculated for each student from her interaction data.

Clustering: The objective of this component is to identify groups of students who showed similar interaction behaviors.
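To make the clustering step concrete, the toy sketch below groups student feature vectors with a minimal k-means; the thesis does not prescribe a particular clustering algorithm, and the vectors here are purely illustrative stand-ins for the real feature vectors.

```python
import math
import random

def kmeans(vectors, k, iters=50, seed=0):
    """Tiny k-means: partitions feature vectors (tuples of equal length)
    into k clusters of behaviorally similar students."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    clusters = []
    for _ in range(iters):
        # Assign each vector to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters
```

In the actual framework each vector would hold the statistical interaction features described in Section 5.3, and the resulting clusters would then be examined for differences in measures such as average learning gain.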
One main requirement for developing adaptive educational systems is building a more personalized model of the user and providing individualized support. In Prime Climb, a parametric Dynamic Bayesian Network is applied for modeling students, and a heuristic parametric hinting mechanism is used to provide adaptive real-time hints. As mentioned above, we aim at building a more personalized student model and intervention mechanism; the current version of the student model and hinting mechanism in Prime Climb uses the same model and mechanism for all students. We use each student's interaction data to represent the student as a vector of features and apply a clustering mechanism to find students who behave similarly. As a result of clustering, two or more clusters are discovered. The members of each cluster have behaved similarly, which opens the possibility of developing a model and hinting strategy for each cluster of students. In particular, we are most interested in finding clusters that differ not only in interaction behavior but also significantly on certain measures, such as the cluster's average learning gain. We discuss these measures in the next sections.

Association Rule Mining: As previously discussed, the result of clustering is several clusters of students, within each of which students have interacted similarly with Prime Climb. One step further toward building a more adaptive edu-game is understanding which interaction patterns are frequent in each cluster. The final objectives of extracting the frequent patterns for each cluster are as follows: 1) Understanding which frequent patterns are observed in the interaction behavior of students with certain characteristics (e.g. higher/lower prior knowledge, lower/higher achievers). We can then leverage such information to design and develop a more effective hinting strategy.
2) Investigating which interaction behaviors are more informative, so that they could be modeled in the student model. To this end, the HotSpot algorithm is used for association rule mining. The next sections describe Prime Climb's behavior discovery framework in more detail.

5.2 Class Association Mining

In order to identify interaction behaviors in students who play Prime Climb, an association rule mining approach is used: the HotSpot algorithm [52] is applied to the students' interaction data. HotSpot generates rules that are associated with a class label. An association rule has the form X→Y, where X and Y are disjoint subsets (they have no item in common). In our case, X is a subset of binary variables representing the features extracted from the interaction data (such as "number of moves", "number of correct/wrong moves", "time spent on each movement" and so on; see Section 5.2.2 for more details), where the value of each variable can be either Low or High. Y, instead, is a one-member subset of the set {Cluster1, Cluster2, ..., Clustern}. A generic rule in our case is:

{Action A frequency = High} → {Cluster c}

Here X = {Action A frequency = High} and Y = {Cluster c}, meaning that X is a pattern detected in some or all members of cluster c. Support and confidence are the two traditional measures in association rule mining. The confidence of an association rule X→Y is the proportion of data containing X that also contain Y. Since in our case Y is a class label, confidence gives the fraction of the cluster's members showing the pattern X; a highly frequent rule in a cluster is one whose confidence is close to 100%. The support of a rule, by contrast, is the fraction of all data points that contain both X and Y [40]. Two parameters in association rule mining influence the type and number of rules:
1) the tree branching factor, which controls how many new rules can be generated from an existing rule by adding a new condition; and 2) the minimum confidence required for creating a new tree branch [52]. As previously mentioned, we use the HotSpot algorithm for behavior discovery in Prime Climb; the maximum branching factor has been set to 2 and the minimum confidence to 5%.

5.3 Features, Measures, Data Points and Datasets

The first step in understanding how users interact with an educational game (Prime Climb) is parsing the students' interaction logs with the aim of describing each user by a feature vector. When a user is interacting with Prime Climb, all interactions are recorded in a time-stamped log file. There are several types of actions a user can make while playing Prime Climb:

- Moving: Given the current position of the partner on the mountain, the user moves by clicking on a numbered hexagon on the mountain.
- Using the magnifying glass tool: The user can use the magnifying glass (MG) tool, located on the game's screen, to see the factor tree of a number by clicking it on a number on the mountain (see Chapter 2 for more details).
- Asking for hints: Once an adaptive hint is presented, the user can ask for further help by pressing a designated button on the presented hint.

Therefore, in general, there are three types of interactions: 1) making movements, 2) using the magnifying glass and 3) asking for hints. Due to technical issues in the interaction logs, it was not possible to extract reliable data pertaining to hints; therefore only features related to making movements and using the MG were extracted from the interaction logs.
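Before turning to the datasets, the confidence and support measures defined in Section 5.2 can be made concrete. In this sketch, each data point is a hypothetical (pattern, cluster label) pair, where the pattern is a set of "feature=Low/High" items; the data and names are illustrative, not the thesis data.

```python
def confidence(x_items, cluster, data):
    """Fraction of the data points containing pattern X that also carry
    the cluster label (rule X -> {cluster})."""
    with_x = [(items, label) for items, label in data if x_items <= items]
    if not with_x:
        return 0.0
    return sum(label == cluster for _, label in with_x) / len(with_x)

def support(x_items, cluster, data):
    """Fraction of all data points containing both X and the label."""
    hits = sum(x_items <= items and label == cluster for items, label in data)
    return hits / len(data)
```

Since Y is a class label here, a rule with confidence near 100% describes a pattern shown by nearly all members of that cluster, which is exactly the kind of rule the HotSpot search is tuned to find.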
5.3.1 Interaction Logs and Datasets

To extract the features, the interaction logs of the 45 students who played the game were parsed for movement- and MG-related features. Out of these 45 students, 12 played the web-based version of the game and 33 played the older version (see Chapter 2 for more details on the user studies). Since the interactions in both versions of the game (web-based and old) are similar, the two groups of students were merged despite this difference. In addition, although Prime Climb consists of 11 levels (mountains), not all students managed to reach the last level. Out of the 45 students, 43 completed 9 or more levels and 2 completed fewer than 9, so the latter 2 students were excluded. Furthermore, when extracting features from the logs, only the interactions with the first 9 mountains were included in the feature calculations, because 9 is the minimum number of levels that all 43 remaining students completed.

5.3.2 Feature Definitions

The extracted features comprise statistical measures describing a user's interaction with the game. Overall, there are two types of features:

1) Mountain-Generic Features [m-n] (m>=1 and n<=9): mountain-independent measures calculated from the user's interactions with mountains m through n inclusively. For instance, the feature correct-movements[1-9] represents the total number of correct movements made by the user on mountains 1 through 9.

2) Mountain-Specific Features [k] (1<=k<=9): mountain-dependent measures calculated from interactions with mountain k only. For instance, correct-movements[7] represents the total number of correct movements made by the user on mountain 7.

Table 5-1 and Table 5-2 respectively show all movement- and MG-related features extracted from the interaction logs, together with their descriptions.
Each user who played the game is represented as a feature vector containing all or some of the following features.

Table 5-1: Movements-related features extracted from interaction logs

General notations:
• move = a wrong or correct move; wrong_move = a wrong move; correct_move = a correct move
• t(X) = time spent on making a movement X; t_i(X) = time spent on making a movement X on mountain i
• #X = number of X; #X_i = number of X on mountain i
• Mean(X) = mean (average) of X; STD(X) = standard deviation of X
• X_i = movement X (move, correct_move or wrong_move) on mountain i; M_i = mountain i

For each movement type (move, correct_move, wrong_move) the total, mean and standard deviation of the times follow the same pattern; for all moves, for example:

Time-Spent-on-Movements = Σ_move t(move)
Mean-Time-On-Movements = Time-Spent-on-Movements / #move
STD-Time-On-Movements = sqrt( Σ_move (t(move) − Mean-Time-On-Movements)² / (#move − 1) )

Features and descriptions:
• Time-Spent-on-Movements: total time spent on movements
• Mean-Time-On-Movements: average time per movement
• STD-Time-On-Movements: standard deviation of the times spent on movements
• Mean-Time-Spent-On-Mountain: average time per mountain
• STD-Time-Spent-on-Mountain: standard deviation of the times spent on mountains
• Time-Spent-on-Correct-Movements: total time spent on correct movements
• Mean-Time-on-Correct-Movements: average time per correct movement
• STD-Time-on-Correct-Movements: standard deviation of the times spent on correct movements
• Time-Spent-on-Wrong-Movements: total time spent on wrong moves
• Mean-Time-on-Wrong-Movements: mean time per wrong movement
• Mean-Time-Spent-On-Correct-Movements-On-Mountains: mean time per mountain spent on correct movements
• STD-Time-on-Correct-Movements-on-Mountains: standard deviation of the per-mountain times spent on correct movements
• Mean-Time-Spent-On-Wrong-Movements-On-Mountains: mean time per mountain spent on wrong movements
• STD-Time-on-Wrong-Movements-on-Mountains: standard deviation of the per-mountain times spent on wrong movements
• Mean-Time-Between-Correct-Moves: mean time spent between two correct moves
• STD-Time-Between-Correct-Moves: standard deviation of the times spent between two correct moves
• Mean-Time-Between-Wrong-Moves: mean time spent between two wrong moves
• STD-Time-Between-Wrong-Moves: standard deviation of the times spent between two wrong moves
• Mean-Time-Between-Correct-Wrong-Moves: mean time spent on a wrong move after a correct move
• STD-Time-Between-Correct-Wrong-Moves: standard deviation of the times spent on a wrong move after a correct move
• Mean-Time-Between-Wrong-Correct-Moves: mean time spent on a correct move after a wrong move
• STD-Time-Between-Wrong-Correct-Moves: standard deviation of the times spent on a correct move after a wrong move
• Mean-Time-Between-Consecutive-Correct-Movements: mean time spent per sequence of correct moves (total time making correct moves in a row)
• STD-Time-Between-Consecutive-Correct-Movements: standard deviation of the times spent per sequence of correct moves
• Mean-Time-Between-Consecutive-Wrong-Movements: mean time spent per sequence of wrong moves
• STD-Time-Between-Consecutive-Wrong-Movements: standard deviation of the times spent per sequence of wrong moves
• Mean-Number-of-Correct-Moves-InARow: mean length of a sequence of correct moves; for example, ccwwcccwww gives (2+3)/2 (c: correct, w: wrong)
• STD-Number-of-Correct-Moves-InARow: standard deviation of the lengths of sequences of correct moves
• Mean-Number-of-Wrong-Moves-InARow: average length of a sequence of wrong moves
• STD-Number-of-Wrong-Moves-InARow: standard deviation of the lengths of sequences of wrong moves
• Total-Number-Of-Movements: total number of movements made by the student
• Total-Number-Of-Correct-Movements: total number of correct moves made by the student
• Total-Number-Of-Wrong-Movements: total number of wrong moves made by the student
• Std-Number-Of-Movements-Per-Mountain: standard deviation of the number of movements on each mountain
• Std-Number-Of-Correct-Movements-Per-Mountain: standard deviation of the number of correct movements on each mountain
• Std-Number-Of-Wrong-Movements-Per-Mountain: standard deviation of the number of wrong movements on each mountain

Table 5-2: Magnifying glass (MG) related features

• Number-of-MG-Usage: total number of magnifying glass usages
• Mean-Number-Of-MG-Usage-Per-Mountain: mean number of MG usages per mountain
• STD-Number-Of-MG-Usage-Per-Mountain: standard deviation of the number of MG usages on each mountain
• Number-of-Movement-Per-Use-of-MG: average number of movements per MG usage
• Number-of-Correct-Movement-Per-Use-of-MG: average number of correct movements per MG usage
• Number-of-Wrong-Movement-Per-Use-of-MG: average number of wrong movements per MG usage
• STD-Number-Of-Movements-Before-Usage-Of-MG: standard deviation of the number of movements before each MG usage
• STD-Number-Of-Correct-Movements-Before-Usage-Of-MG: standard deviation of the number of correct movements before each MG usage
• STD-Number-Of-Wrong-Movements-Before-Usage-Of-MG: standard deviation of the number of wrong movements before each MG usage

5.3.3 Feature Set Definition

A dataset contains one or more data points; a data point is a representation of one user's interactions with Prime Climb. Using the features defined in the previous subsection, various datasets can be defined, and each user in a dataset is represented by a vector of features. Overall, two types of feature sets are defined in this study:

1) Full-feature set: a dataset whose data points (students) are represented by feature vectors containing features from all 9 mountains.

2) Truncated-feature set: a dataset whose data points (students) are represented by feature vectors containing features from only some of the mountains. A truncated-feature set can be useful for building online classifiers that identify users using only a fraction of the interaction data.

Based on the above definitions, the following 12 datasets have been defined.
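The distinction between the two feature-set types can be sketched as follows. The per-mountain counts and the single illustrative feature (correct-movements) are hypothetical, not the thesis extraction code:

```python
def feature_vector(per_mountain_counts, k):
    """Build a feature vector from mountains 1..k.
    per_mountain_counts: correct-move counts for mountains 1..9 (hypothetical).
    Returns one mountain-generic feature over mountains 1..k plus the
    mountain-specific feature for each of those mountains."""
    window = per_mountain_counts[:k]
    generic = {f"correct-movements[1-{k}]": sum(window)}
    specific = {f"correct-movements[{i + 1}]": v for i, v in enumerate(window)}
    return {**generic, **specific}

counts = [5, 7, 6, 8, 4, 9, 6, 7, 5]       # correct moves on mountains 1..9
full = feature_vector(counts, 9)           # full-feature set (all 9 mountains)
truncated = feature_vector(counts, 2)      # truncated set (mountains 1..2 only)
# full["correct-movements[1-9]"] == 57; truncated["correct-movements[1-2]"] == 12
```

The truncated vector is what an online classifier would see early in the interaction, before the remaining mountains have been played.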
The name of each dataset follows this format:

[Full | Truncated]-[Mountain-Generic | Mountain-Generic+Specific]-[Movement | MG | Movement+MG]

The first part, [Full | Truncated], indicates whether the data points are represented by features from all 9 mountains (Full) or from only some of them (Truncated). The second part, [Mountain-Generic | Mountain-Generic+Specific], indicates whether the features are only mountain-generic or a combination of mountain-generic and mountain-specific features. The last part, [Movement | MG | MG+Movement], specifies whether the features are movement-related, MG-related, or a combination of the two. Given this notation, the following datasets are defined:

1- Full-Mountains-Generic-Movements: data points (subjects) are represented by feature vectors including only mountain-generic features[1-9], and the features are only movement-related features over the first 9 mountains.

2- Full-Mountains-Generic+Specific-Movements: data points are represented by feature vectors including mountain-generic features[1-9] and mountain-specific features[k] (1<=k<=9), and the features are only movement-related features over the first 9 mountains.

3- Full-Mountains-Generic-MG: data points are represented by feature vectors including only mountain-generic features[1-9], and the features are only magnifying glass (MG) related features.

4- Full-Mountains-Generic+Specific-MG: data points are represented by feature vectors including mountain-generic features[1-9] and mountain-specific features[k] (1<=k<=9), and the features are only MG-related features.

5- Full-Mountains-Generic-MG+Movements: data points are represented by feature vectors including only mountain-generic features[1-9], and the features combine MG- and movement-related features.

6- Full-Mountains-Generic+Specific-MG+Movements: data points are represented by feature vectors including mountain-generic features[1-9] and mountain-specific features[k] (1<=k<=9), and the features combine MG- and movement-related features.

7- Truncated-Mountains-Generic-Movements: data points are represented by feature vectors including only mountain-generic features[1-k] (1<=k<9), and the features are only movement-related. There are therefore eight such datasets, denoted Truncated-Mountains-Generic-Movements[k] (1<=k<9).

8- Truncated-Mountains-Generic+Specific-Movements: data points are represented by feature vectors including mountain-generic features[1-k] and mountain-specific features[1-k] (1<=k<9), and the features are only movement-related. There are eight such datasets, denoted Truncated-Mountains-Generic+Specific-Movements[k] (1<=k<9).

9- Truncated-Mountains-Generic-MG: data points are represented by feature vectors including only mountain-generic features[1-k] (1<=k<9), and the features are only MG-related. There are eight such datasets, denoted Truncated-Mountains-Generic-MG[k] (1<=k<9).

10- Truncated-Mountains-Generic+Specific-MG: data points are represented by feature vectors including mountain-generic features[1-k] and mountain-specific features[1-k] (1<=k<9), and the features are only MG-related. There are eight such datasets, denoted Truncated-Mountains-Generic+Specific-MG[k] (1<=k<9).

11- Truncated-Mountains-Generic-MG+Movements: data points are represented by feature vectors including only mountain-generic features[1-k] (1<=k<9), and the features combine MG- and movement-related features. There are eight such datasets, denoted Truncated-Mountains-Generic-MG+Movements[k] (1<=k<9).

12- Truncated-Mountains-Generic+Specific-MG+Movements: data points are represented by feature vectors including mountain-generic features[1-k] and mountain-specific features[1-k] (1<=k<9), and the features combine MG- and movement-related features. There are eight such datasets, denoted Truncated-Mountains-Generic+Specific-MG+Movements[k] (1<=k<9).

5.3.4 Measures

Prior to clustering the datasets, two measures are defined on which the resulting clusters can be compared:

1) Prior knowledge: a cluster's average prior knowledge is calculated as the average of the raw pre-test scores of the cluster's members (the students who played the game):

PriorKnowledge(cluster) = ( Σ_{student ∈ cluster} pre_test(student) ) / |cluster|     (Equation 5-1)

where pre_test(student) is the student's pre-test score and |cluster| is the number of students in the cluster.

2) Percentage of Learning Gain (PLG): PLG is the standard way of calculating the percentage of improvement from a pre-test score to a post-test score. In this study, prior to starting the game, each student takes a pre-test, and after finishing the game takes the same test as a post-test (see Appendix B and Appendix C). The PLG of a cluster is defined as the average of the PLGs of the cluster's members.
A cluster member's PLG is calculated using the following formula:

PLG(student) = ( post_test(student) − pre_test(student) ) / ( A − pre_test(student) ) × 100%     (Equation 5-2)

where A is the maximum possible pre-test score (in our case 15). Figure 5-2 and Figure 5-3 respectively show the frequency distributions of the pre-test and post-test scores across all subjects, and Table 5-3 shows descriptive statistics on the scores.

Table 5-3: Descriptive statistics on the pre-test and post-test scores of the 43 subjects

Test | Mean | Standard Deviation
Pre-test | 11.697 | 3.291
Post-test | 11.813 | 3.156

Figure 5-2: pre-test scores distribution across the subjects
Figure 5-3: post-test scores distribution across the subjects

As can be seen in Figure 5-2, 7 students received a full mark of 15 on the pre-test. If such students received the same score on the post-test, we could assign them a PLG of zero. But as Figure 5-3 shows, only 6 students received a full mark of 15 on the post-test, and a quick analysis of the data showed that 4 students received a full mark on the pre-test but not on the post-test. For these 4 students PLG cannot be defined: with pre_test = A, Equation 5-2 is a non-zero value divided by zero. Note that the prior knowledge measure is still defined for these subjects. Because PLG is undefined for 4 of the 43 subjects, two separate datasets are used: 1) a dataset of 43 subjects when dealing with the prior knowledge measure, and 2) a dataset of 39 subjects when working with PLG.

For clustering we used the GA K-Means algorithm (K-Means for short) [30], a modified version of K-Means [53]. The K-Means clustering method was applied to each of the datasets previously defined. Prior to clustering, a feature selection mechanism [64] is applied to filter out irrelevant features.
Then K-Means is used to cluster the students into the optimal number of clusters [65]. Once the clusters are built, we analyze whether the resulting clusters differ statistically significantly in prior knowledge or in PLG, and association rule mining is then used to extract rules for each cluster. As shown in Figure 5-4, there are 108 datasets in total: 16 full and 92 truncated. We applied the feature selection process [64] followed by clustering to all 108 datasets, compared the resulting clusters with respect to prior knowledge and PLG, and then applied association rule mining to the clusters. Because there are 108 datasets, there are 108 clustering and cluster comparison results; to avoid clutter, not all of them are included in this chapter. Appendix A presents the results of all clusterings and cluster comparisons, as well as the rules extracted in the cases where the clusters are statistically significantly different; for each dataset, a description (size, possible outliers) is also provided. This chapter presents only some of the more interesting results.

Figure 5-4: All combinations of different features types (pre: prior knowledge, PLG: percentage of learning gain)

5.4 Behavior Discovery with Full-Features Sets

As previously described, full-features datasets use features extracted from all 9 mountains. Behavior discovery on such datasets is most useful for understanding how users interact with Prime Climb and which behavioral patterns distinguish groups of students. From the 12 dataset types mentioned above, we selected two datasets whose clusters are statistically significantly different with respect to prior knowledge and PLG.
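The two cluster measures of Section 5.3.4 (Equations 5-1 and 5-2) can be sketched as follows. The scores are hypothetical, not the study data; only the constant A = 15 comes from the text:

```python
A = 15.0  # maximum possible pre-test score (from Section 5.3.4)

def cluster_prior_knowledge(pre_scores):
    """Equation 5-1: average raw pre-test score of the cluster's members."""
    return sum(pre_scores) / len(pre_scores)

def plg(pre, post):
    """Equation 5-2: percentage of learning gain for one student.
    Undefined (None) for a full pre-test mark, since A - pre would be 0."""
    if pre == A:
        return None
    return (post - pre) / (A - pre) * 100.0

def cluster_plg(score_pairs):
    """Average PLG over the members for whom PLG is defined."""
    defined = [plg(pre, post) for pre, post in score_pairs if pre != A]
    return sum(defined) / len(defined)

members = [(10, 13), (12, 12), (15, 14)]                  # (pre, post) pairs
pk = cluster_prior_knowledge([pre for pre, _ in members])  # (10+12+15)/3 ≈ 12.33
gain = cluster_plg(members)                                # (60.0 + 0.0) / 2 = 30.0
```

The third member illustrates the undefined case discussed above: a full pre-test mark yields a zero denominator in Equation 5-2, so that student is excluded from the cluster's PLG average while still contributing to its prior knowledge.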
Since the number of available data points is at most 43 per dataset, we prefer datasets whose data points are represented by a smaller number of features.

Behavior Discovery on the Full-Mountains-Generic-Movements Feature Set

In the Full-Mountains-Generic-Movements dataset, the data points are represented by vectors of mountain-generic movement features. The feature selection mechanism [64] selected 18 of the original 30 features. The optimal number of clusters was then found to be 2 [65], and using K-Means the dataset was clustered into two groups of students, as shown in Table 5-4.

Table 5-4: Clustering results on the Full-Mountains-Generic-Movements dataset

Full-Features Dataset, Mountain-Generic, Movement Features; #Cluster Outliers: 0; #data points = 43; #Total Features = 30; #Total Selected Features = 18

Measure | Cluster1 (HPK) Mean (SD) | Cluster2 (LPK) Mean (SD) | P-value | Cohen's d
Prior Knowledge | 13.0 (2.0) | 11.3 (3.4508) | 0.036 | 0.535

A t-test was used to compare the clusters' prior knowledge. The result showed a statistically significant difference between cluster 1 (higher prior knowledge (HPK) group; M=13.0, SD=2.0) and cluster 2 (lower prior knowledge (LPK) group; M=11.3, SD=3.45), p=0.03, d=0.53. Next, class association rule mining was applied to the clusters. Table 5-5 shows the rules extracted from each discovered cluster and Table 5-6 shows the selected features.

Some information in Table 5-5 (and in the similar tables of extracted rules in this chapter) is important for interpreting the rule extraction results. The confidence of each rule is shown in brackets in front of the rule. For instance, [6/6=100%] in front of the first rule for cluster 1 in Table 5-5 shows that the rule applies to 6 students in total (the denominator) out of 43, and that all 6 of them (the numerator) belong to cluster 1; the confidence of this rule is therefore 100%. In addition, since the size of cluster 1 is 10 and the rule applies to 6 of its members, the rule covers 60% (6/10) of cluster 1.

Table 5-5: Association rules extracted from the 2 clusters on the Full-Mountains-Generic-Movements dataset

Cluster 1 [HPK] (size: 10/43 = 23.26%), rules:
• Mean-Time-On-Movements = Higher (100% = [6/6])
• Mean-Time-Spent-On-Correct-Movements-On-Mountains = Higher (100% = [5/5])

Cluster 2 [LPK] (size: 33/43 = 76.74%), rules:
• Mean-Time-On-Movements = Lower (89.19% = [33/37])
  o STD-Time-On-Wrong-Correct-Moves = Lower (94.29% = [33/35])
• Mean-Time-On-Consecutive-Wrong-Movements = Lower (88.57% = [31/35])
  o STD-Time-On-Movements = Lower (93.94% = [31/33])
  o STD-Time-On-Correct-Movements = Lower (93.94% = [31/33])

Table 5-6: Selected features for Full-Mountains-Generic-Movements: Mean-Time-Spent-On-Mountain, Mean-Time-Spent-On-Wrong-Movements-Mountains, Mean-Number-of-Wrong-Moves-InARow, STD-Time-On-Movements, STD-Number-of-Wrong-Moves-InARow, STD-Time-Spent-On-Correct-Movements, Mean-Number-Of-Wrong-Movements-Per-Mountain, STD-Time-On-Consecutive-Correct-Movements, STD-Number-Of-Wrong-Movements-Per-Mountain, STD-Time-On-Correct-Movements, STD-Number-of-Correct-Moves-InARow, Mean-Time-On-Wrong-Correct-Moves, Std-Number-Of-Correct-Movements-Per-Mountain, STD-Time-On-Wrong-Correct-Moves, Mean-Time-Spent-On-Correct-Movements, Mean-Time-On-Consecutive-Wrong-Movements, Mean-Time-On-Movements, STD-Time-Spent-On-Mountain

Interpretation and Discussion

The extracted rules show that students in the cluster with higher prior knowledge of factorization skills (cluster 1) spent more time on movements and on correct movements. This could indicate that students with higher prior knowledge were more involved in the game and deliberated longer before making a movement. Since the time spent on correct movements is higher for this group, a correct move by these students is less likely to be a guess compared to the total population. Recall that each time a student has an opportunity to move, she has a probability of at least one third of making a correct movement, because the movement options are limited to the three numbers located near the player's current position.

On the other hand, the group of students with lower prior knowledge spent less time on movements as well as on wrong movements, as shown in Table 5-5. This could indicate less involvement in the game, and suggests that a correct movement by this group is more likely to be a guess. Once the pedagogical agent detects such patterns, it could prompt these students to spend more time on their movements. The remaining frequent patterns for the lower prior knowledge group show a lower standard deviation of the times spent on movements and correct movements, indicating that this group exhibited a consistent pattern of disengagement. We can therefore conclude that the students with higher prior knowledge showed more engagement in the game than the students with lower prior knowledge.

Interactions Discovery on the Full-Mountains-Generic+Specific-Movements Feature Set

Feature selection and K-Means clustering were also performed on the Full-Mountains-Generic+Specific-Movements dataset.
76 features were selected out of 149. There were originally 39 students in this dataset; as a result of clustering, one generated cluster contained only a single student, so that student was excluded from the dataset. The optimal number of clusters was then calculated to be 2 [65] and clustering was applied to the remaining data. The result is shown in Table 5-7.

Table 5-7: Clustering results on the Full-Mountains-Generic+Specific-Movements dataset

Full-Features Dataset, Mountain-Generic+Specific, Movement Features; #Cluster Outliers: 1; #data points = 38; #Total Features = 149; #Total Selected Features = 76

Measure | Cluster1 (HLG) Mean (SD) | Cluster2 (LLG) Mean (SD) | P-value | Cohen's d
PLG | 0.7641% (58.7927%) | -43.5178% (70.4686%) | 0.066 | 0.717

The t-test comparing the PLG of the two clusters showed a marginally significant difference between cluster 1 (higher learning gain (HLG) group; M=0.76%, SD=58.79%) and cluster 2 (lower learning gain (LLG) group; M=-43.51%, SD=70.46%), p=0.06, d=0.72. The large effect size shows that the difference is also practically meaningful. Table 5-8 shows the association rules extracted from the clusters.

Table 5-8: Association rules extracted from the 2 clusters on the Full-Mountains-Generic+Specific-Movements dataset

Cluster 1 [HLG] (size: 29/38 = 76.32%), rules:
• Mean-Time-On-Movements[5] = Lower (93.55% = [29/31])
  o STD-Time-On-Wrong-Correct-Moves = Lower (100% = [29/29])
• Mean-Time-On-Correct-Movements[5] = Lower (93.55% = [29/31])
  o STD-Time-On-Wrong-Correct-Moves = Lower (100% = [29/29])

Cluster 2 [LLG] (size: 9/38 = 23.68%), rules:
• Mean-Time-On-Movements[5] = Higher (100% = [7/7])
• Mean-Time-On-Correct-Movements[5] = Higher (100% = [7/7])

Table 5-9: Selected features for Full-Mountains-Generic+Specific-Movements: Time-Spent-On-Mountains[3], Mean-Time-On-Correct-Movements[4], Time-Spent-On-Correct-Movements-On-Mountains[3], Mean-Time-On-Movements[5], Time-Spent-On-Wrong-Movements-On-Mountains[1], Mean-Time-On-Correct-Movements[5], STD-Time-On-Consecutive-Wrong-Movements, Mean-Time-On-Correct-Movements, Mean-Time-On-Correct-Movements[1], Total-Number-Of-Movements, Mean-Time-On-Consecutive-Correct-Movements[9], Total-Number-Of-Correct-Movements, Mean-Time-On-Consecutive-Correct-Movements[1], Number-Of-Movements[6], STD-Time-On-Consecutive-Correct-Movements[9], Number-Of-Movements[1], Mean-Time-On-Wrong-Movements, Number-Of-Correct-Movements[1], Mean-Time-On-Movements[1], Total-Number-Of-Wrong-Movements, Mean-Time-On-Consecutive-Wrong-Movements, Number-Of-Correct-Movements[6], Mean-Time-On-Movements[9], Number-Of-Wrong-Movements[3], total-STD-Time-On-Wrong-Correct-Moves, Number-Of-Wrong-Movements[6], STD-Time-On-Consecutive-Correct-Movements[4], Mean-Number-Of-Movements-Per-Mountain, STD-Time-On-Movements[2], Mean-Number-Of-Correct-Movements-Per-Mountain, STD-Time-On-Consecutive-Correct-Movements[2], Std-Number-Of-Wrong-Movements-Per-Mountain, STD-Time-On-Correct-Movements[3], Number-Of-Wrong-Movements[1], STD-Time-On-Movements[9], Mean-Number-Of-Wrong-Movements-Per-Mountain, STD-Time-Spent-On-Correct-Moves-On-Mountains, Mean-Number-of-Correct-Movements-InARow[9], STD-Time-On-Movements[3], STD-Number-of-Wrong-Moves-InARow, Mean-Time-On-Consecutive-Correct-Movements[2], Mean-Number-of-Wrong-Moves-InARow, STD-Time-On-Consecutive-Correct-Movements, Mean-Number-of-Correct-Movements-InARow[8], STD-Time-On-Movements, Mean-Number-of-Correct-Movements-InARow[2], STD-Time-On-Movements[4], Std-Number-Of-Correct-Movements-Per-Mountain, STD-Time-Spent-On-Wrong-Movements-On-Mountains, Std-Number-Of-Movements-Per-Mountain, Mean-Time-On-Consecutive-Correct-Movements[4], Number-Of-Wrong-Movements[9], Mean-Time-On-Movements[2], Number-Of-Correct-Movements[9], STD-Time-On-Correct-Movements[2], Number-Of-Movements[9], Mean-Time-On-Correct-Movements[2], Mean-Time-On-Movements, Mean-Time-On-Correct-Movements[8], Mean-Time-Spent-On-Correct-Movements-On-Mountains, Mean-Time-On-Consecutive-Correct-Movements[5], Mean-Time-On-Correct-Movements[6], STD-Time-On-Consecutive-Correct-Movements[6], STD-Time-On-Movements[5], Mean-Time-On-Movements[4], Mean-Time-On-Consecutive-Correct-Movements, Mean-Time-On-Correct-Movements[9], STD-Time-On-Correct-Wrong-Moves, STD-Time-On-Correct-Movements[9], Mean-Time-On-Movements[6], Mean-Time-On-Movements[8], Mean-Time-On-Correct-Wrong-Moves, STD-Time-On-Correct-Movements[5], Mean-Time-On-Correct-Movements[7], Mean-Time-Spent-On-Wrong-Movements-Mountains, Mean-Time-On-Consecutive-Correct-Movements[6]

Interpretation and Discussion

Overall, the students did not significantly learn factorization skills by playing the game, as the statistical results in Table 5-7 show. The average percentage of learning gain in the higher PLG group is 0.7% (less than 1%), and the average PLG in the lower PLG group is about -43%. About 76% of the students belong to the higher PLG group, while around 24% showed negative PLG; thus, on average there is almost zero PLG for 76% of the students. A further analysis showed a statistically significant difference between cluster 1's prior knowledge (M=10.65, SD=3.36) and cluster 2's prior knowledge (M=13.67, SD=1.33), p=0.0003, d=1.0001. This indicates that the group with higher PLG is the group with lower prior knowledge and, conversely, the group with lower PLG is the group with higher prior knowledge. The extracted patterns are consistent with the results discussed in the previous section (see Table 5-4).
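The effect sizes reported with these comparisons (Cohen's d with a pooled standard deviation) can be sketched as follows; the two score lists are hypothetical, not the study data:

```python
import math

def cohens_d(a, b):
    """Cohen's d between two groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variance, group a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)   # sample variance, group b
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

hpk = [13, 14, 15, 12, 14]   # hypothetical pre-test scores, cluster 1
lpk = [10, 11, 9, 12, 10]    # hypothetical pre-test scores, cluster 2
d = cohens_d(hpk, lpk)       # well above 0.8, i.e. a large effect
```

By Cohen's convention, d of about 0.2 is a small effect, 0.5 medium and 0.8 large, which is why d values above 0.7 in this chapter are treated as practically meaningful.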
The students in the higher PLG group (the lower prior knowledge group) showed a consistent pattern of spending less time on movements and correct movements. This could explain why this group (76% of the students) did not learn from the interaction with Prime Climb (PLG < 1%), despite having room for improvement (average pre-test score = 10.65). Conversely, the students with lower PLG (the higher prior knowledge group, 24% of the students) showed more engagement in the game and spent more time on their movements, yet showed negative PLG on average. Recall that this group had a very high average prior knowledge score (13.67 out of 15), leaving little room for improvement from pre-test to post-test; apparently this group on average did worse on the post-test, which could be due to frustration with playing the game or a lack of motivation to pay enough attention to the post-test.

5.5 Behavior Discovery with Truncated-Features Sets

Truncated-features datasets do not use interaction data from all 9 mountains to extract the features. Such datasets are mainly important for constructing online classifiers that assign students to clusters, with the aim of providing a more accurate student model or intervention mechanism. For instance, if the classifier can identify a student as lower or higher knowledgeable early on, this information could be leveraged for early adjustment of the student model's parameters as well as of the intervention mechanism. From all 96 truncated-features datasets, we selected 2 datasets that resulted in clusters that are statistically significantly different with respect to prior knowledge and PLG.
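One simple way to realize such an online classifier can be sketched as follows. This is a sketch under assumptions, not the thesis implementation: the data is random, and a nearest-centroid rule stands in for whatever classifier would actually be trained on the truncated features:

```python
import numpy as np

# Hypothetical truncated feature vectors (e.g. mountains 1..2) for 42 students,
# with their offline cluster assignments.
rng = np.random.default_rng(1)
X_trunc = rng.normal(size=(42, 25))    # 42 students x 25 selected features
labels = np.array([0, 1] * 21)         # cluster labels from offline clustering

# Centroid of each cluster in the truncated feature space.
centroids = np.stack([X_trunc[labels == c].mean(axis=0) for c in (0, 1)])

def classify(x):
    """Assign a new student's truncated feature vector to the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

new_student = rng.normal(size=25)      # a new student's early-interaction features
cluster = classify(new_student)        # 0 or 1: early cluster assignment
```

Once a new student is assigned to a cluster after only the first mountains, the adaptive interventions associated with that cluster's frequent patterns could be triggered without waiting for the full game.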
Interactions Discovery on the Truncated-Mountains[2]-Generic+Specific-MG+Movements Feature Set

As previously mentioned, behavior discovery on truncated datasets can be useful for identifying frequent interaction patterns from a fraction of the data; such patterns could then be used to build online classifiers that characterize users early in the interaction. To this end, K-Means clustering was applied to the Truncated-Mountains[2]-Generic+Specific-MG+Movements dataset. There were originally 43 data points (students) in this dataset, from which 1 student was excluded because the clustering method generated a cluster containing only that student; 42 data points were therefore used for the analysis. As defined earlier, in this dataset the data points are represented by a combination of mountain-generic and mountain-specific features from magnifying glass and movement-related interactions, and only interaction data from the first two mountains was used for clustering. Table 5-10 shows the clustering results.

Table 5-10: Clustering results on the Truncated-Mountains[2]-Generic+Specific-MG+Movements dataset

Truncated-Features Dataset, Mountain[2]-Generic+Specific, MG+Movement Features; #Cluster Outliers: 1; #data points = 42; #Total Features = 51; #Total Selected Features = 25

Measure | Cluster1 (HPK) Mean (SD) | Cluster2 (LPK) Mean (SD) | P-value | Cohen's d
Prior Knowledge | 12.4545 (2.6639) | 9.2222 (3.9378) | 0.0264 | 1.0836

A feature selection process [64] was applied to the dataset and 25 of the 51 features were selected; the optimal number of clusters was calculated to be 2 [65]. Student's t-test was used to compare the prior knowledge of the two clusters, showing a statistically significant difference between cluster 1 (M=12.45, SD=2.66) and cluster 2 (M=9.22, SD=3.93), p=.02, d=1.08. Next, class association rule mining was applied to the 2 clusters.
The extracted rules are shown in Table 5-11.

Table 5-11: Association rules extracted from the 2 clusters on the truncated-mountains[2]-generic+specific-movements dataset
  Cluster 1 [HPK] (size: 33/42 = 78.57%):
  - Mean-Time-On-Movements[1] = Lower (96.77% = [30/31])
  - Mean-Time-On-Movements = Lower (96.67% = [29/30])
  Cluster 2 [LPK] (size: 9/42 = 21.43%):
  - Mean-Time-Spent-On-Mountain = Higher (100% = [7/7])
  - Total-Time-On-Mountain[1] = Higher (100% = [5/5])

Table 5-12: Selected features for Truncated-Mountains[2]-Generic+Specific-Movements
  Mean-Time-Spent-On-Mountain, Number-Of-Movements[1], Time-Spent-On-Mountains[1], Total-Number-Of-Correct-Movements, Number-Of-MG-Usage, Mean-Number-of-Correct-Moves-InARow, Number-Of-MG-Usage[1], Time-Spent-On-Wrong-Movements-On-Mountains[1], Mean-Number-Of-MG-Usage-Per-Mountain, Mean-Time-On-Correct-Movements[2], Number-Of-MG-Usage[2], Mean-Time-On-Movements, STD-Number-Of-MG-Usage-Per-Mountain, Mean-Time-Spent-On-Correct-Movements-On-Mountains, Std-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-On-Correct-Movements, Mean-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-On-Movements[2], Number-Of-Wrong-Movements[2], Mean-Time-On-Movements[1], Mean-Number-of-Correct-Movements-InARow[2], Mean-Time-On-Correct-Movements[1], Number-Of-Correct-Movements[1], Time-Spent-On-Correct-Movements-On-Mountains[1], Mean-Number-of-Correct-Movements-InARow[1]

Interpretation and Discussion

As discussed in the two previous subsections, interaction discovery on the full datasets showed that the students with very high prior knowledge (average pre-test score >= 13 out of 15) were more engaged in the game, spending more time on making movements and correct movements, whereas the students with lower prior knowledge (average score of around 11 out of 15) were less engaged, showing a consistent pattern of spending less time on making their movements.
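The percentages attached to each rule above (e.g., "96.77% = [30/31]") are rule confidences: of all students exhibiting a pattern, the fraction that fall in the cluster the rule describes. A minimal sketch of that computation, with hypothetical student records (the feature schema and counts here are illustrative, not the study data):

```python
# Confidence of a class association rule: among all students matching
# the antecedent pattern, how many belong to the target cluster.
def rule_confidence(students, pattern, cluster):
    """students: list of (id, feature_dict, cluster_label) tuples (hypothetical schema)."""
    matching = [s for s in students if pattern(s[1])]
    in_cluster = [s for s in matching if s[2] == cluster]
    return len(in_cluster), len(matching)

# Hypothetical rows: 30 HPK students and 1 LPK student show the "Lower"
# pattern; 1 LPK student shows "Higher".
students = [(i, {"mean_time_on_movements": "Lower"}, "HPK") for i in range(30)]
students += [
    (30, {"mean_time_on_movements": "Lower"}, "LPK"),
    (31, {"mean_time_on_movements": "Higher"}, "LPK"),
]
hit, total = rule_confidence(
    students, lambda f: f["mean_time_on_movements"] == "Lower", "HPK")
print(f"{100 * hit / total:.2f}% = [{hit}/{total}]")  # 96.77% = [30/31]
```

Rules with higher confidence (and sufficient support, i.e., enough matching students) are the ones reported as frequent patterns for a cluster.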
As opposed to the frequent patterns identified using the full datasets, behavior discovery on the truncated dataset shows seemingly contradictory results, as presented in Table 5-10 and Table 5-11. Behavior discovery on interaction data from only the first two mountains showed that the students with higher prior knowledge (M = 12.45, SD = 2.66), who constitute around 79% of all students, spent less time on making movements. While this is the frequent pattern in this group, the complementary (opposite) pattern (spending more time on movements) is not a frequent pattern in the other group (the lower prior knowledge group). The extracted patterns could indicate that, on the first few mountains, most of the students (around 79%) interact similarly with respect to making movements, but as they progress in the game, the students with very high prior knowledge behave increasingly differently from the students with lower prior knowledge. To this end, we also investigated the frequent patterns when interaction data from more of the upper mountains is included (see Appendix A for more details). When the same type of interaction data from the first 3 mountains is included in pattern mining, 2 clusters are identified that are not significantly different in their prior knowledge. Interestingly, when interaction data from the first four mountains is included, we observe patterns similar to those identified using the full dataset. Table 5-13 and Table 5-14 respectively show the clustering results and the extracted rules on the Truncated-Mountains[4]-Generic+Specific-Movements dataset. Similar patterns can be seen when interaction data from more of the upper mountains is included in the analysis.
Table 5-13: Clustering results on the Truncated-Mountains[4]-Generic+Specific-Movements dataset
  Dataset: Truncated-Features, Mountain[4]-Generic+Specific, MG+Movement features
  Cluster outliers: 0; data points: 43
  Prior Knowledge — Cluster 1 (HPK): Mean = 13.2857, SD = 1.5779; Cluster 2 (LPK): Mean = 11.3889, SD = 3.4016; p = 0.0209, Cohen's d = 0.5971

Table 5-14: Association rules extracted from the 2 clusters on the truncated-mountains[4]-generic+specific-movements dataset
  Cluster 1 [HPK] (size: 7/43 = 16.28%):
  - Mean-Time-On-Movements[4] = Higher (100% = [5/5])
  - Mean-Time-On-Correct-Movements[3] = Higher (100% = [3/3])
  Cluster 2 [LPK] (size: 36/43 = 83.72%):
  - Mean-Time-On-Correct-Movements = Lower (100% = [35/35])
  - Mean-Time-On-Movements = Lower (100% = [34/34])

Interactions Discovery on the Truncated-Mountains[2]-Generic-Movements Set

Similarly, the Truncated-Mountains[2]-Generic-Movements dataset was used for interaction discovery. This dataset contains only movement mountain-generic features from the first two mountains (levels) in Prime Climb. There are 39 students in this dataset. Feature selection was applied to the dataset and 9 of the 20 features were selected. The optimal number of clusters for this dataset was found to be 2. A Student's t-test was used to compare the PLG of the two clusters. The result showed a statistically significant difference between cluster 1's PLG (M = 4.94%, SD = 60.55%) and cluster 2's PLG (M = -33.43%, SD = 62.51%). Table 5-15 shows the clustering and statistical results.
Table 5-15: Clustering results on the Truncated-Mountains[2]-Generic-Movements dataset
  Dataset: Truncated-Features, Mountain[2]-Generic, Movement features
  Cluster outliers: 0; data points: 39; total features: 20; selected features: 9
  PLG — Cluster 1 (HLG): Mean = 4.9432, SD = 60.5463; Cluster 2 (LLG): Mean = -33.4343, SD = 62.5143; p = 0.0416, Cohen's d = 0.6265

Table 5-16: Association rules extracted from the 2 clusters on the truncated-mountains[2]-generic-movements dataset
  Cluster 1 [HLG] (size: 25/39 = 64.1%):
  - Mean-Time-Spent-On-Mountain = Lower (80.65% = [25/31])
    - Mean-Number-Of-Wrong-Movements-Per-Mountain = Lower (86.21% = [25/29])
    - Total-Number-Of-Wrong-Movements = Lower (86.21% = [25/29])
  - STD-Time-Spent-On-Mountain = Lower (78.13% = [25/32])
    - Mean-Time-Spent-On-Mountain = Lower (83.33% = [25/30])
  Cluster 2 [LLG] (size: 14/39 = 35.9%):
  - Mean-Time-Spent-On-Mountain = Higher (100% = [8/8])
  - STD-Time-Spent-On-Mountain = Higher (100% = [7/7])

Table 5-17: Selected features for Truncated-Mountains[2]-Generic-Movements
  Mean-Time-Spent-On-Mountain, Mean-Time-On-Correct-Movements, Std-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-Spent-On-Correct-Movements-On-Mountains, Mean-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-On-Movements, Total-Number-Of-Wrong-Movements, STD-Time-Spent-On-Mountain, Mean-Number-of-Correct-Moves-InARow

Interpretation and Discussion

Table 5-16 presents the rules extracted on a truncated dataset of mountain-generic movement features from the first two mountains. The first cluster, covering about 64% of all students, is the group with a positive PLG of about 5%. The second cluster, covering about 36% of all students, is the group with a negative PLG. It is worth mentioning that there is no statistically significant difference between the prior knowledge of the two clusters (p = .09, d = .5).
The extracted rules indicate that the higher-PLG group spent less time on the mountains and made fewer wrong movements, both in total and on average. Since the patterns identified using the interaction data from the first 2 mountains are different (but not necessarily inconsistent) from the patterns identified when interaction data from all 9 mountains is included, we took a further look at the frequent patterns when interaction data from more mountains is included (see Appendix A for more details). It was observed that, when interaction data from the upper mountains is included, the clustering mechanism did not result in clusters with significantly different PLG.

5.6 Behavior Discovery with Mixed Datasets

In the previous two sections, behavior discovery on the full and truncated sets was discussed. In the full datasets, interaction data from all mountains was included in the behavior analysis, while in the truncated datasets interaction data from only some mountains was included. In this section, we introduce two other datasets which use some features computed over all 9 mountains and some features computed over only a subset of mountains. As previously described, the results of clustering and rule mining showed that the full and truncated feature sets could be used to discover two clusters of students that were significantly different in their prior knowledge and learning gain. Therefore, we were interested in investigating whether feature sets defined on interaction data from only the first few and the last few mountains could also produce similar clustering and rule mining results. The next two sections present the work on identifying frequent patterns in different groups of students using the 2 mentioned datasets.

Interactions Discovery on the Full-Mountains-Generic-MG+Movement and Truncated-Mountains[1-2]-Specific-MG+Movement Set

We refer to this dataset as Full-Mountains-Generic+Truncated-Mountains[1-2]-Specific for short.
This dataset contains all mountain-generic features related to movement and MG interactions, as well as mountain-specific features related to movement and MG from only the first two mountains. For this dataset the optimal number of clusters was calculated to be 2. A feature selection mechanism [64] was applied, which selected 38 of the 63 features. Of the 38 selected features, 4 are MG-related and the rest are movement-related. Table 5-18 shows the results of clustering and of comparing the two clusters using a t-test.

Table 5-18: Clustering results on the Full-Mountains-Generic+Truncated-Mountains[1-2]-Specific dataset
  Cluster outliers: 0; data points: 43; total features: 63; selected features: 38
  Prior Knowledge — Cluster 1 (HPK): Mean = 13.0, SD = 1.9; Cluster 2 (LPK): Mean = 11.25, SD = 3.49; p = 0.026, Cohen's d = 0.55

As shown in Table 5-18, the result of a t-test shows that cluster 1's prior knowledge (M = 13.0, SD = 1.9) is significantly higher than cluster 2's prior knowledge (M = 11.25, SD = 3.49), p = 0.026, d = 0.55. Then, the frequent rules for each cluster were extracted. The two clusters did not differ significantly in PLG.

Table 5-19: Association rules extracted from the 2 clusters on the full-mountains-generic+truncated-mountains[1-2]-specific dataset
  Cluster 1 [HPK] (size: 11/43 = 25.58%):
  - Mean-Time-On-Consecutive-Correct-Movements = Higher (100% = [6/6])
  - Mean-Time-On-Movements = Higher (100% = [6/6])
  Cluster 2 [LPK] (size: 32/43 = 74.42%):
  - Mean-Time-On-Correct-Movements[2] = Lower (90.32% = [28/31])
    - STD-Number-of-Correct-Moves-InARow = Lower (95.45% = [21/22])
  - Mean-Time-On-Consecutive-Correct-Movements = Lower (86.49% = [32/37])
    - Time-Spent-On-Mountains[1] = Lower (93.75% = [30/32])
    - Mean-Number-of-Correct-Moves-InARow = Lower (93.75% = [30/32])

Table 5-20 gives the selected features.

Table 5-20: 38 selected features for full-mountains-generic+truncated-mountains[1-2]-specific
  Time-Spent-On-Mountains[1], Mean-Number-of-Correct-Movements-InARow[1], Mean-Time-Spent-On-Mountain, Total-Number-Of-Wrong-Movements, Mean-Time-Between-Correct-Wrong-Moves, Total-Number-Of-Correct-Movements, Number-Of-MG-Usage, Total-Number-Of-Movements, Number-Of-MG-Usage[1], STD-Time-Between-Correct-WrongMoves, Mean-Number-Of-MG-Usage-Per-Mountain, Mean-Time-Between-Correct-Movements, Mean-Number-of-Wrong-Moves-InARow, Mean-Time-Between-Consecutive-Correct-Movements, STD-Number-of-Wrong-Moves-InARow, Mean-Time-Spent-On-Correct-Movements-On-Mountains, STD-Number-Of-MG-Usage-Per-Mountain, Mean-Time-Between-Movements, Mean-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-Between-Correct-Movements[2], Std-Number-Of-Wrong-Movements-Per-Mountain, Mean-Time-Spent-On-Wrong-Movements-Mountains, STD-Number-of-Correct-Moves-InARow, STD-Time-Between-Correct-Movements, Std-Number-Of-Correct-Movements-Per-Mountain, STD-Time-Spent-On-Correct-Movements-On-Mountains, Mean-Number-of-Correct-Moves-InARow, STD-Time-Between-Movements, Mean-Number-Of-Correct-Movements-Per-Mountain, STD-Time-Between-Consecutive-Correct-Movements, Mean-Number-of-Correct-Movements-InARow[2], STD-Time-Between-Wrong-Correct-Moves, Number-Of-Movements[1], Mean-Time-Between-Wrong-Correct-Moves, Number-Of-Correct-Movements[1], Mean-Time-Between-Consecutive-Wrong-Movements, Mean-Number-Of-Movements-Per-Mountain, STD-Time-Spent-On-Mountain

Interpretation and Discussion

The dataset used for behavior discovery contains the full mountain-generic features and the truncated mountain-specific features from the first two mountains; all features are movement- and MG-related.
The frequent patterns extracted for the higher prior knowledge group do not include any mountain-specific feature. The two patterns found for this cluster are similar to those identified when only the full mountain-generic features were used: the students with higher prior knowledge spent more time on making movements and correct movements. The patterns extracted for the lower prior knowledge group likewise showed that these students spent less time on making correct moves. Therefore, adding mountain-specific features from the first 2 mountains apparently did not change the frequent patterns significantly.

Interactions Discovery on the Full-Mountains-Generic-MG+Movement and Truncated-Mountains[1-2, 8-9]-Specific-MG+Movement Set

This dataset is referred to as Full-Mountains-Generic+Truncated-Mountains[1-2 and 8-9]-Specific for short. Similar to the dataset discussed in the previous section, this dataset includes the MG and movement mountain-generic features calculated from students' interactions with all 9 mountains. In addition, the movement and MG mountain-specific features from the first two and the last two mountains are included. In total, 41 of the 91 features were selected by the feature selection process. No MG-related feature was selected; 6 of the mountain-specific features are from the first two mountains and 10 are from the last two mountains. The optimal number of clusters was calculated to be 2. Table 5-21 shows the results of clustering and of comparing the 2 clusters using a t-test.

Table 5-21: Clustering and comparison results on clusters' prior knowledge on the Full-Mountains-Generic+Truncated-Mountains[1-2 and 8-9]-Specific dataset
  Cluster outliers: 0; data points: 43; total features: 91; selected features: 41
  Prior Knowledge — Cluster 1 (HPK): Mean = 13.0, SD = 1.41; Cluster 2 (LPK): Mean = 11.4, SD = 3.47; p = 0.0279, Cohen's d = 0.5

As shown in Table 5-21, the result of a t-test shows that cluster 1's prior knowledge (M = 13.0, SD = 1.41) is statistically significantly higher than cluster 2's prior knowledge (M = 11.4, SD = 3.47), p = 0.0279, d = 0.5. Then, the frequent rules for each cluster were extracted. Table 5-22 gives the extracted rules.

Table 5-22: Association rules extracted from the 2 clusters on the full-mountains-generic+truncated-mountains[1-2 and 8-9]-specific dataset
  Cluster 1 [HPK] (size: 8/43 = 18.6%):
  - Mean-Time-On-Consecutive-Correct-Movements = Higher (100% = [6/6])
  - Mean-Time-On-Movements = Higher (100% = [6/6])
  Cluster 2 [LPK] (size: 35/43 = 81.4%):
  - Mean-Time-On-Correct-Movements[2] = Lower (96.77% = [30/31])
  - Mean-Time-On-Consecutive-Correct-Movements = Lower (94.59% = [35/37])
    - STD-Time-On-Correct-Movements[8] = Lower (100% = [34/34])
    - Time-Spent-On-Correct-Movements-On-Mountains[1] = Lower (100% = [32/32])

Behavior discovery on the same dataset also resulted in two optimal clusters with a marginally significant difference (with a high effect size) in the clusters' PLG. Recall that PLG cannot be defined for 4 of the data points in this dataset, so there are 39 data points. Table 5-23 shows the results of clustering and of comparing the 2 clusters.

Table 5-23: Clustering and comparison results on clusters' PLG on the Full-Mountains-Generic+Truncated-Mountains[1-2 and 8-9]-Specific dataset
  Cluster outliers: 0; data points: 39; total features: 91; selected features: 41
  PLG — Cluster 1 (HLG): Mean = 2.07%, SD = 56.9; Cluster 2 (LLG): Mean = -40.47%, SD = 72.24; p = 0.065, Cohen's d = 0.695

Table 5-24: Association rules extracted from the 2 clusters on the full-mountains-generic+truncated-mountains[1-2 and 8-9]-specific dataset
  Cluster 1 [HLG] (size: 29/39 = 74.36%):
  - Mean-Time-On-Correct-Movements[2] = Lower (96.43% = [27/28])
  - Mean-Time-On-Movements[8] = Lower (87.88% = [29/33])
    - Time-Spent-On-Correct-Movements-On-Mountains[1] = Lower (96.55% = [28/29])
    - Mean-Time-On-Correct-Movements[2] = Lower (96.43% = [27/28])
  Cluster 2 [LLG] (size: 10/39 = 25.64%):
  - Mean-Time-On-Movements[8] = Higher (100% = [6/6])
  - Mean-Time-Spent-On-Wrong-Movements = Higher (100% = [5/5])

Table 5-25: 41 selected features for full-mountains-generic+truncated-mountains[1-2 and 8-9]-specific
  Time-Spent-On-Correct-Movements-On-Mountains[1], Mean-Number-of-Correct-Movements-InARow[9], Mean-Time-Spent-On-Mountain, Mean-Number-Of-Correct-Movements-Per-Mountain, Mean-Time-On-Consecutive-Wrong-Movements, Mean-Number-of-Correct-Moves-InARow, Mean-Time-On-Correct-Wrong-Moves, Number-Of-Correct-Movements[1], Mean-Time-Spent-On-Wrong-Movements-Mountains, Number-Of-Movements[1], STD-Time-Between-Consecutive-Correct-Movements, Mean-Number-of-Correct-Movements-InARow[1], Mean-Time-On-Consecutive-Correct-Movements[8], Mean-Number-of-Correct-Movements-InARow[2], Mean-Time-On-Consecutive-Correct-Movements, Mean-Number-Of-Movements-Per-Mountain, Mean-Time-On-Movements[8], Std-Number-Of-Movements-Per-Mountain, Mean-Time-On-Correct-Movements[8], Total-Number-Of-Correct-Movements, STD-Time-On-Correct-Movements, Total-Number-Of-Movements, STD-Time-On-Movements, STD-Time-On-Correct-WrongMoves, Mean-Time-On-Movements, Mean-Time-On-Correct-Movements[2], STD-Time-Spent-On-Correct-Movements-On-Mountains, STD-Time-On-Movements[8], Mean-Time-Spent-On-Correct-Movements-On-Mountains, STD-Time-On-Consecutive-Correct-Movements[8], Mean-Time-On-Correct-Movements, STD-Time-On-Wrong-Correct-Moves, STD-Time-On-Correct-Movements[8], Mean-Time-On-Wrong-Correct-Moves, Mean-Number-of-Wrong-Moves-InARow, Time-Spent-On-Wrong-Movements-On-Mountains[8], Mean-Number-of-Correct-Movements-InARow[8], STD-Time-Spent-On-Mountain, STD-Number-of-Correct-Moves-InARow, Time-Spent-On-Mountains, Std-Number-Of-Correct-Movements-Per-Mountain

Interpretation and Discussion

As given in Table 5-24, the rules extracted for the two clusters are mostly mountain-specific. For instance, the students in the higher learners group spent less time on making correct movements on mountain 2 and less time on making movements on mountain 8. They also spent less time on correct movements on mountain 1. We already know that the students in the higher learners group spent less time on movements overall. Although the current dataset only includes specific features from the first 2 and last 2 mountains, we see similar patterns on these mountains: these students spent less time on making movements and correct movements on mountains 1, 2 and 8. This could indicate that these students did not change their strategy across the mountains as they played the game. In the lower learners group, the students generally spent more time on movements and more time on wrong movements on mountain 8. These patterns are likewise consistent with the patterns we already found for that group.

5.7 Conclusion and Future Work

This chapter discussed behavior discovery in Prime Climb. To this end, different datasets with varying types of features were defined.
The features are extracted from students' interactions with Prime Climb, in the form of movements from one numbered hexagon to another and usages of the MG tool. In order to identify frequent patterns of interaction in groups of students, we first applied a feature selection mechanism to select the more relevant features from the set of all features. Then K-Means clustering was applied to partition the dataset into the optimal number of clusters. Once the clusters were built, the Hotspot association rule mining algorithm was applied to the clusters to extract frequent interaction patterns. Finally, the clusters were compared to each other on their average prior knowledge and PLG. When interaction data from all 9 mountains was included in behavior discovery, it was found that the students with higher prior knowledge were more engaged in the game and spent more time on making movements. These students also showed lower PLG, probably due to carelessness in taking the post-test. In contrast, the students with lower prior knowledge spent less time on making movements, indicating that they were less involved in the game. Behavior discovery was also conducted on truncated datasets, in which only a fraction of the interaction data was included. The results showed that using the interaction data from the first two mountains resulted in groups of students that differ statistically on their prior knowledge and PLG. This chapter presents the first step toward more personalized user modeling and a more adaptive intervention mechanism in Prime Climb. The results can be leveraged to build an online classifier which can assign students to different groups early in the interaction. The results of pattern discovery could also be utilized in developing a more personalized intervention mechanism.

6 Chapter: Conclusion and Future Work

The contribution presented in this work is two-fold:
1. Improving the user modeling in Prime Climb
2. Mining behavioral interactions of students with Prime Climb

Improving the user modeling in Prime Climb

This thesis presents the work on improving the student modeling in Prime Climb by identifying the issue of degeneracy in the model and proposing a solution to deal with it. It was shown that the original Prime Climb student's model is vulnerable to degeneracy. Following the literature, degeneracy was defined as violating the conceptual assumptions underlying student modeling despite the degenerate model having optimal predictive accuracy. Two tests to identify empirical degeneracy in Prime Climb were suggested, and the Bounded Prime Climb student's model was introduced to address the issue. It was shown that the Bounded model produces significantly fewer failures on the two tests of degeneracy. In addition, several measures for evaluating the student's model were discussed, including (1) end accuracy, (2) real-time accuracy, (3) summed squared error, and (4) the correlation between students' post-test performance and the model's assessment of factorization knowledge. Beyond these evaluation metrics, several criteria known as plausibility criteria were also introduced for Prime Climb. The two models, Bounded and Original, were compared with respect to model accuracy and parameter plausibility. It was shown that there is no statistically significant difference in accuracy between the Bounded and Original models; however, the Bounded student's model results in a more plausible set of parameters when population prior probabilities are used. Furthermore, this thesis provides a comprehensive review of Bayesian Knowledge Tracing (BKT), a common method for student modeling. The review of BKT was used to present a comparison between student modeling in Prime Climb and BKT.
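For reference, the classic BKT update that the comparison builds on can be sketched as follows. The parameter values below are illustrative, not the fitted Prime Climb values; bounding the guess and slip parameters in such an update is one standard way to guard against degenerate fits.

```python
# One step of Bayesian Knowledge Tracing: condition the knowledge
# estimate on the observed answer, then apply the learning transition.
def bkt_update(p_known, correct, guess, slip, transit):
    """Return P(known) after observing one correct/incorrect answer."""
    if correct:
        cond = p_known * (1 - slip) / (
            p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        cond = p_known * slip / (
            p_known * slip + (1 - p_known) * (1 - guess))
    # learning opportunity: an unknown skill may become known
    return cond + (1 - cond) * transit

p = 0.4  # P(L0), an illustrative prior
for obs in [True, True, False, True]:
    p = bkt_update(p, obs, guess=0.2, slip=0.1, transit=0.15)
print(round(p, 3))
```

A degenerate fit would be one where, e.g., guess or slip exceeds 0.5, so that the model treats correct answers as evidence of not knowing the skill; a bounded variant simply constrains those parameters to plausible ranges during fitting.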
Mining behavioral interactions of students with Prime Climb

This thesis also presents the work on behavior discovery in Prime Climb. The aim of this study was to detect frequent patterns that distinguish groups of students interacting with Prime Climb. To this end, several features were extracted from users' interactions with Prime Climb. The extracted features were of two general types: (1) mountain-generic and (2) mountain-specific. The mountain-generic features are measured over the students' interactions with all 9 mountains in Prime Climb, while the mountain-specific features are measured over interaction data from a subset of the mountains. Overall, two types of interaction data were collected: (1) movements and (2) usage of the magnifying glass tool. Using these feature types, several feature sets were generated, and each student was represented by a feature vector. A feature selection mechanism was applied to each feature set to filter out the less relevant features. A modified version of the K-Means algorithm, developed in the Intelligent User Interface group at UBC, was then used to cluster the students into the optimal number of clusters. Applying K-Means clustering to the students resulted in groups that differ significantly with respect to prior knowledge of the domain skills and percentage of learning gain. We then applied the Hotspot algorithm, an association rule mining approach, to find the frequent patterns of interaction in the different groups of students. In addition, we found that using only a subset of the interaction data can result in clusters with significantly different characteristics. This can be used to build an online classifier, which can in turn classify a student as a higher or lower knowledgeable student, or as a higher or lower achieving (learning) student.
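The core of the pipeline summarized above (feature vectors clustered with K-Means, clusters then compared) can be sketched in a few lines. The feature values here are hypothetical two-dimensional vectors; the thesis used a modified K-Means with feature selection and automatic selection of the number of clusters, which this minimal Lloyd's-algorithm sketch does not reproduce.

```python
# Minimal K-Means (k=2, Lloyd's algorithm) over hypothetical
# per-student feature vectors.
import random

def kmeans2(points, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two distinct points as initial centers
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            # assign each point to its nearest center (squared Euclidean)
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # recompute each center as the mean of its group
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups

# hypothetical (mean-time-on-movements, wrong-moves-per-mountain) vectors
students = [(2.1, 1.0), (2.3, 0.8), (2.0, 1.2),
            (0.6, 3.1), (0.7, 2.8), (0.5, 3.3)]
g1, g2 = kmeans2(students)
print(sorted(map(len, (g1, g2))))
```

In the full pipeline, each resulting cluster would then be compared on prior knowledge or PLG (t-test and Cohen's d) and mined for class association rules.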
This classification can be used in the student's model to provide a model with possibly stronger predictive accuracy.

Future Works

There are several interesting follow-ups to the work presented in this thesis:

- Contextual estimation of the model's parameters in Prime Climb: Currently the parameters estimated for the student's model in Prime Climb remain static during the interaction with the game. Contextual model parameters, on the contrary, might change according to the context in which they are used.

- Individualizing the model's parameters in Prime Climb: Currently the model's parameters are estimated for a group of students, and all students use the same parameter values. A more individualized user model could take into account other sources of information, such as a student's pattern of interaction with the game, to refine the model's parameters. One possible approach is to learn a linear or non-linear regression model that uses a student's interaction patterns to estimate an optimal set of individual model parameters. An individual model is a model trained for only one student; such a model can have higher predictive accuracy than a group model.

- Extending Prime Climb to support an adaptive story line: The current version of Prime Climb contains a fixed number of mountains of numbers, presented to the students in the same order regardless of how they interact with the system. Extending Prime Climb to support presenting remedial exercises could provide students with more opportunities to practice the desired skills. It could also help the student's model collect more evidence on the student's interaction and performance, with the aim of providing more accurate and better justified adaptive interventions.

- Changing the structure of the student's model: According to the results on behavior discovery, we found some patterns of interaction that are stronger indicators than others of certain student characteristics (e.g., higher learner vs. lower learner). Such patterns could be leveraged in the structure of the student's model in Prime Climb, with the aim of providing the model with more information about the users during the interaction.

- Association rule mining on other interaction data: The work on behavior discovery presented in this thesis only uses interaction data related to movements and usage of the MG tool. There are other sources of information that could be utilized for clustering and classifying students, including students' attention to hints, which can be quantified based on eye-gaze data.

Bibliography

[1] Corbett, A.T. and Anderson, J.R., 1995, Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, Volume 4, Number 4, 253-278
[2] Corbett, A.T., Anderson, J.R., O'Brien, A.T., 1995, Student modeling in the ACT Programming Tutor. In P. Nichols, S. Chipman and B. Brennan (eds.), Cognitively Diagnostic Assessment, 19-41. Hillsdale, NJ: Erlbaum
[3] Corbett, A.T., Bhatnagar, A., 1997, Student Modeling in the ACT Programming Tutor: Adjusting a Procedural Learning Model With Declarative Knowledge. In Anthony Jameson, Cécile Paris, and Carlo Tasso (Eds.), User Modeling: Proceedings of the Sixth International Conference, UM97, 243-254. Vienna, New York: Springer Wien New York, 1997
[4] Beck, J.E., Sison, J., 2004, Using Knowledge Tracing to Measure Student Reading Proficiencies. Intelligent Tutoring Systems 2004, 624-634
[5] Baker, R.S.J.d., Corbett, A.T., Aleven, V., 2008, More Accurate Student Modeling Through Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing. Human-Computer Interaction Institute, Paper 6. http://repository.cmu.edu/hcii/6
[6] Baker, R.S.J.d., Corbett, A.T., Aleven, V., 2008, Improving Contextual Models of Guessing and Slipping with a Truncated Training Set. EDM 2008: 67-76
[7] Baker, R.S.J.d., Corbett, A.T., Gowda, S.M., Wagner, A.Z., MacLaren, B.A., Kauffman, L.R., Mitchell, A.P., Giguere, S., 2010, Contextual Slip and Prediction of Student Performance after Use of an Intelligent Tutor. UMAP 2010: 52-63
[8] San Pedro, M.O.C.Z., Baker, R.S.J.d., Rodrigo, M.M.T., 2011, Detecting Carelessness through Contextual Estimation of Slip Probabilities among Students Using an Intelligent Tutor for Mathematics. AIED 2011: 304-311
[9] Gowda, S.M., Rowe, J.P., Baker, R.S.J.d., Chi, M., Koedinger, K.R., 2011, Improving Models of Slipping, Guessing, and Moment-By-Moment Learning with Estimates of Skill Difficulty. EDM 2011: 199-208
[10] Pardos, Z.A., Heffernan, N.T., 2010, Modeling Individualization in a Bayesian Networks Implementation of Knowledge Tracing. UMAP 2010: 255-266
[11] Gong, Y., Beck, J.E., Ruiz, C., 2012, Modeling Multiple Distributions of Student Performances to Improve Predictive Accuracy. UMAP 2012: 102-113
[12] Rai, D., Gong, Y., Beck, J., 2009, Using Dirichlet priors to improve model parameter plausibility. EDM 2009: 141-150
[13] Pardos, Z.A., Trivedi, S., Heffernan, N.T., Sárközy, G.N., 2012, Clustered Knowledge Tracing. ITS 2012
[14] Beck, J.E., Sison, J., 2006, Using Knowledge Tracing in a Noisy Environment to Measure Student Reading Proficiencies. I. J. Artificial Intelligence in Education 16(2): 129-143
[15] Brielmann, M., 2009, A Learner Model based on a Bayesian Network for Garp3. Master's Thesis in Mathematics & Science Education, 2009
[16] Beck, J.E., Chang, K., Mostow, J., Corbett, A.T., 2008, Does Help Help? Introducing the Bayesian Evaluation and Assessment Methodology. Intelligent Tutoring Systems 2008: 383-394
[17] Corbett, A.T., Anderson, J.R., Carver, V.H., Brancolini, S.A., 1994, Individual differences and predictive validity in student modeling. In A. Ram & K. Eiselt (eds.), The Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum
[18] Beck, J.E., 2007, Difficulties in inferring student knowledge from observations (and why you should care). Educational Data Mining [19] Beck, J.E, Chang, K., 2007, Identifiability: A Fundamental Problem of Student Modeling. User Modeling, 137-146 [20] Gong, Y., Beck, J. E., Heffernan, N. T., 2011, How to Construct More Accurate Student Models: Comparing and Optimizing Knowledge Tracing and Performance Factor Analysis. I. J. Artificial Intelligence in Education 21(1-2): 27-46 [21] Chang, K., Beck, J. E., Mostow, J., Corbett, A. T., 2006, A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems. Intelligent Tutoring Systems 2006: 104-113 [22] Pardos, Z. A., Heffernan, N. T., 2010, Navigating the parameter space of Bayesian Knowledge Tracing models: Visualizations of the convergence of the Expectation Maximization algorithm. EDM 2010: 161-170 [23] Koedinger, K. R, 2002: Toward evidence for instructional design principle: Examples from Cognitive Tutor Math 6. Proceedings of PME-NA XXXIII (the North American Chapter of the International Group for the Psychology of Mathematics Education) [24] Atkinson, R.C., 1972, Ingredients for a theory of instruction, American Psychologist 27, 921-931. 168 [25] Lee, J. I., Brunskill, E., 2010, The impact of individualizing Student Models on Necessary Practice Opportunities, EDM 2012 [26] Manske M., Conati, C., 2005, Modeling Learning in an Educational Game. AIED 2005: 411-418 [27] Conati, C., Manske, M., Adaptive Feedback in an Educational Game for Number Factorization. AIED 2009: 581-583 [28] Muir, M., Conati, C., 2012, An Analysis of Attention to Student - Adaptive Hints in an Educational Game. ITS 2012: 112-122 [29] Muir, M., Conati, C., 2011, Understanding Student Attention to Adaptive Hints with Eye-Tracking. UMAP Workshops 2011: 148-160 [30] Kardan, S., and C. Conati., 2011, "A Framework for Capturing Distinguishing User Interaction Behaviours in Novel Interfaces." 
Proceedings of the 4th International Conference on Educational Data Mining, Eindhoven, The Netherlands
[31] McCuaig, J., Baldwin, J., 2012, Identifying Successful Learners from Interaction Behaviour. Proceedings of the 5th International Conference on Educational Data Mining, Chania, Greece, 160-163
[32] López, M. I., Romero, C., Ventura, S., Luna, J. M., 2012, Classification via clustering for predicting final marks starting from the student participation in forums. Proceedings of the 5th International Conference on Educational Data Mining, 148-152, Chania, Greece
[33] Mavrikis, M., 2008, Data-driven modelling of students' interactions in an ILE. In The 1st International Conference on Educational Data Mining, pp. 87-96, Quebec, Canada
[34] Hunt, E., Madhyastha, T., 2005, Data Mining Patterns of Thought. In Proc. the AAAI Workshop on Educational Data Mining
[35] Keshtkar, F., Morgan, B., Graesser, A., 2012, Automated Detection of Mentors and Players in an Educational Game. EDM 2012
[36] McCuaig, J., Baldwin, J., 2012, Identifying Successful Learners from Interaction Behaviour. EDM 2012
[37] López, M. I., Luna, J. M., Romero, C., Ventura, S., Molina, M. M., 2012, Classification via clustering for predicting final marks based on student participation in forums. In Proceedings of the 5th International Conference on Educational Data Mining, EDM 2012, vol. 42, pp. 649-656
[38] Cole, J., Foster, H., 2007, Using Moodle: Teaching with the popular open source course management system. O'Reilly Media, Inc.
[39] Peckham, T., McCalla, G., 2012, Mining Student Behavior Patterns in Reading Comprehension Tasks. Journal of Educational Data Mining 2012
[40] Bouchet, F., Azevedo, R., Kinnebrew, J. S., Biswas, G., 2012, Identifying Students' Characteristic Learning Behaviors in an Intelligent Tutoring System Fostering Self-Regulated Learning. EDM 2012
[41] Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaïane, O.
R., 2009, Clustering and sequential pattern mining of online collaborative learning data. IEEE Trans. on Knowledge and Data Eng., 21(6): 759-772, June 2009
[42] Kinnebrew, J. S., Biswas, G., 2012, Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution. EDM 2012
[43] Bousbia, N., Labat, J-M., Balla, A., Rebai, I., 2010, Analyzing Learning Styles using Behavioral Indicators in Web based Learning Environments. EDM 2010
[44] Baker, R. S., Corbett, A. T., Koedinger, K. R., Roll, I., 2005, Detecting when students game the system, across tutor subjects and classroom cohorts. In Ardissono, L., Brna, P., Mitrovic, A. (Eds.) UM 2005. LNCS (LNAI), vol. 3538, pp. 220-224. Heidelberg: Springer
[45] Romero, C., Romero, J. R., Luna, J. M., Ventura, S., 2010, Mining rare association rules from e-learning data. In Proceedings of the International Conference on Educational Data Mining, Pittsburgh, pp. 171-180
[46] D'Mello, S., Graesser, A., 2010, Mining Bodily Patterns of Affective Experience during Learning. In Proceedings of the 3rd International Conference on Educational Data Mining, pp. 31-40
[47] Dekker, G. W., Pechenizkiy, M., Vleeshouwers, J. M., 2009, Predicting students drop out: A case study. In Proceedings of the 2nd International Conference on Educational Data Mining, EDM, vol. 9, pp. 41-50
[48] Bravo, J., Ortigosa, A., 2009, Detecting symptoms of low performance using production rules. In Int. Conf. Educ. Data Mining, Cordoba, Spain
[49] Brusilovsky, P., Sosnovsky, S., Shcherbinina, O., 2004, QuizGuide: Increasing the Educational Value of Individualized Self-Assessment Quizzes with Adaptive Navigation Support. In J. Nall and R. Robson (Eds.), Proceedings of E-Learn, pp. 1806-1813
[50] Moffat, M., Mitrovic, A., 2008, Do Students Who See More Concepts in an ITS Learn More?
[51] Amershi, S., Carenini, G., Conati, C., Mackworth, A., Poole, D., 2008,
Pedagogy and Usability in Interactive Algorithm Visualizations - Designing and Evaluating CIspace. Interact. Comput. 20(1), 64-96
[52] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H., 2009, The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10-18
[53] Kim, K., Ahn, H., 2008, A recommender system using GA K-means clustering in an online shopping market. Expert Syst. Appl. 34(2), 1200-1209
[54] de Castell, S., Jenson, J., 2003, Serious Play. Journal of Curriculum Studies, 35(6), 649-665
[55] de Castell, S., Jenson, J., 2007, Digital Games for Education: When Meanings Play. Intermedialities, 9, 45-54
[56] Gee, J. P., 2003, What video games have to teach us about learning and literacy. New York: Palgrave Macmillan
[57] Rieber, L., 1996, Seriously considering play: Designing interactive learning environments based on the blending of microworlds, simulations and games. Educational Technology Research and Development, 44(2), 43-58
[58] Ruben, B. D., 1999, Simulations, games, and experience-based learning: The quest for a new paradigm for teaching and learning. Health Education Research, Theory and Practice, 30(4): 498-505
[59] Alessi, S. M., Trollip, S. R., 2001, Multimedia for Learning: Methods and Development, 3rd ed. Allyn & Bacon, Needham Heights
[60] Lee, J., Luchini, K., Michael, B., Norris, C., Soloway, E., 2004, More than just fun and games: Assessing the value of educational video games in the classroom. Proceedings of ACM SIGCHI 2004, Vienna, Austria, pp. 1375-1378
[61] Van Eck, R., 2007, Building Artificially Intelligent Learning Games. In Games and Simulations in Online Learning: Research and Development Frameworks. D. Gibson, C. Aldrich, and M. Prensky (Eds.), Information Science Pub., 271-307
[62] Conati, C., Klawe, M., 2002, Socially Intelligent Agents in Educational Games. In Socially Intelligent Agents - Creating Relationships with Computers and Robots. K.
Dautenhahn, et al. (Eds.), Kluwer Academic Publishers
[63] Manske, M., 2006, A model and adaptive support for learning in an educational game. University of British Columbia (Thesis)
[64] Guyon, I., Elisseeff, A., 2003, An introduction to variable and feature selection. J. of Machine Learning Research, 3, 1157-1182
[65] Milligan, G. W., Cooper, M. C., 1985, An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159-179

Appendices

Appendix A Clustering Results for Different Features Sets

A.1 Understanding Patterns of Interactions in Groups with Different Prior Knowledge Using Full-Features Sets

This section describes the work conducted to identify frequent behavioral patterns in a group of 43 students, for each of whom a pre-test score is available. To this end, K-Means clustering is applied to different datasets representing the 43 students, and different clusters of students are constructed. The generated clusters are then compared with respect to their prior knowledge. If there is a statistically significant difference between two clusters, class association rule mining is applied to the clusters to identify frequent patterns of interaction in each cluster.

Behavior Discovery on Full-Mountains-Generic-Movements Dataset

Full-Features Dataset, Mountain-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.0     2.0       11.3     3.4508    0.036     0.535

The result shows a statistically significant difference between the prior knowledge of cluster one (higher prior knowledge (HPK) group; M=13.0, SD=2.0) and cluster two (lower prior knowledge (LPK) group; M=11.3, SD=3.45), p=0.03, d=0.53. The following table shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HPK]: (Size: 10/43) Rules for Cluster [HPK] (Cov=23.26%):
- Mean-Time-Between-Movements = Higher (100% [6/6])
- Mean-Time-Spent-On-Correct-Movements-On-Mountains = Higher (100% [5/5])

Cluster 2 [LPK]: (Size: 33/43) Rules for Cluster [LPK] (Cov=76.74%):
- Mean-Time-Between-Movements = Lower (89.19% [33/37])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (94.29% [33/35])
- Mean-Time-Between-Consecutive-Wrong-Movements = Lower (88.57% [31/35])
  - STD-Time-Between-Movements = Lower (93.94% [31/33])
  - STD-Time-Between-Correct-Movements = Lower (93.94% [31/33])

Behavior Discovery on Full-Mountains-Generic-MG Dataset

Full-Features Dataset, Mountain-Generic, MG Features, #Cluster Outliers: 1, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.9706  3.3209    10.375   2.7811    0.1045    0.4947

The results of clustering show no statistically significant difference between the prior knowledge of the 2 clusters, p>0.05.

Behavior Discovery on Full-Mountains-Generic-MG+Movement Dataset

Full-Features Dataset, Mountain-Generic, Movement+MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.0     2.0       11.303   3.4508    0.036     0.5348

The result shows a statistically significant difference between the prior knowledge of cluster one (HPK group; M=13.0, SD=2.0) and cluster two (LPK group; M=11.3, SD=3.45), p=0.03, d=0.53. The following table shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HPK]: (Size: 10/43) Rules for Cluster [HPK] (Cov=23.26%):
- Mean-Time-Between-Movements = Higher (100% [6/6])
- Mean-Time-Spent-On-Correct-Movements-On-Mountains = Higher (100% [5/5])

Cluster 2 [LPK]: (Size: 33/43) Rules for Cluster [LPK] (Cov=76.74%):
- Mean-Time-Between-Movements = Lower (89.19% [33/37])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (94.29% [33/35])
- Mean-Time-Between-Consecutive-Wrong-Movements = Lower (88.57% [31/35])
  - STD-Time-Between-Movements = Lower (93.94% [31/33])
  - STD-Time-Between-Correct-Movements = Lower (93.94% [31/33])

Interactions Discovery on Full-Mountains-Generic+Specific-Movements Dataset

Full-Features Dataset, Mountain-Generic+Specific, Movement Features, #Cluster Outliers: 1, #data points = 42

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.6667  1.3333    11.1818  3.4593    0.0014    0.7944

The result shows a statistically significant difference between the prior knowledge of cluster one (HPK group; M=13.66, SD=1.33) and cluster two (LPK group; M=11.18, SD=3.45), p=0.001, d=0.79. The following table shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HPK]: (Size: 9/42) Rules for Cluster [HPK] (Cov=21.43%):
- Mean-Time-Between-Movements[5] = Higher (100% [7/7])
- Mean-Time-Between-Correct-Movements[5] = Higher (100% [7/7])

Cluster 2 [LPK]: (Size: 33/42) Rules for Cluster [LPK] (Cov=78.57%):
- Mean-Time-Between-Movements[5] = Lower (94.29% [33/35])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [33/33])
  - STD-Time-Between-Consecutive-Correct-Movements[3] = Lower (100% [33/33])
- Mean-Time-Between-Correct-Movements[5] = Lower (94.29% [33/35])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [33/33])
  - STD-Time-Between-Consecutive-Correct-Movements[3] = Lower (100% [33/33])

Interactions Discovery on Full-Mountains-Generic+Specific-MG Dataset

Full-Features Dataset, Mountain-Generic+Specific, MG Features, #Cluster Outliers: 1, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.7838  3.4573    10.80    1.1662    0.1249    0.3009

The results of clustering show no statistically significant difference between the two clusters of students, p>0.05.
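The cluster comparisons throughout this appendix report a p-value and a Cohen's d effect size for the difference in mean prior knowledge. The sketch below shows how these two statistics can be computed; it is illustrative only (the score lists and variable names are hypothetical, and the thesis' analyses were run with its own tooling rather than this code).

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d: standardized mean difference with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / na + stdev(b) ** 2 / nb)

# Hypothetical pre-test scores for two clusters of students
hpk_scores = [13, 15, 11, 14, 12]
lpk_scores = [10, 12, 8, 13, 11, 9]
d = cohens_d(hpk_scores, lpk_scores)   # effect size of the cluster difference
t = welch_t(hpk_scores, lpk_scores)    # test statistic; compare against a t table
```

A p-value then follows from the t statistic and the Welch degrees of freedom; a statistics package would normally supply that step.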
Interactions Discovery on Full-Mountains-Generic+Specific-MG+Movements Dataset

Full-Features Dataset, Mountain-Generic+Specific, Movement+MG Features, #Cluster Outliers: 1, #data points = 42, #Total Features = 161, #Total Selected Features = 82

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.6667  1.3333    11.1818  3.4593    0.0014    0.7944

The result shows a significant difference between the prior knowledge of cluster one (HPK group; M=13.66, SD=1.33) and cluster two (LPK group; M=11.18, SD=3.45), p=0.001, d=0.79. The following table shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HPK]: (Size: 9/42) Rules for Cluster [HPK] (Cov=21.43%):
- Mean-Time-Between-Movements[5] = Higher (100% [7/7])
- Mean-Time-Between-Correct-Movements[5] = Higher (100% [7/7])

Cluster 2 [LPK]: (Size: 33/42) Rules for Cluster [LPK] (Cov=78.57%):
- Mean-Time-Between-Movements[5] = Lower (94.29% [33/35])
  - total-STD-Time-Between-Wrong-Correct-Moves = Lower (100% [33/33])
  - STD-Time-Between-Consecutive-Correct-Movements[3] = Lower (100% [33/33])
- Mean-Time-Between-Correct-Movements[5] = Lower (94.29% [33/35])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [33/33])
  - STD-Time-Between-Consecutive-Correct-Movements[3] = Lower (100% [33/33])

A.2 Understanding Patterns of Interactions in Groups with Different Learning Gain Using Full-Features Sets

This section describes the work conducted to identify frequent behavioral patterns in a group of 39 students, for each of whom pre-test and post-test scores are available. To this end, K-Means clustering is applied to different datasets representing the 39 students, and different clusters of students are constructed. The generated clusters are then compared with respect to their percentage of learning gain (PLG).
If there is a statistically significant difference between two clusters, class association rule mining is applied to the clusters to identify frequent patterns of interaction in each cluster.

Interactions Discovery on Full-Mountains-Generic-Movements Dataset

Full-Features Dataset, Mountain-Generic, Movement Features, #Cluster Outliers: 0, #data points = 39

Measure  Cluster 1              Cluster 2              Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      -3.8338%   61.2288%    -23.3320%  69.3032%    0.2316    0.3076

The results of clustering show no statistically significant difference between the percentage of learning gain of the 2 clusters of students, p>0.05.

Interactions Discovery on Full-Mountains-Generic-MG Dataset

Full-Features Dataset, Mountain-Generic, MG Features, #Cluster Outliers: 1, #data points = 39

Measure  Cluster 1              Cluster 2              Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      22.8413%   56.5861%    -14.241%   62.3821%    0.0772    0.6058

The results of clustering show no statistically significant difference between the percentage of learning gain of the 2 clusters.

Interactions Discovery on Full-Mountains-Generic-MG+Movement Dataset

Full-Features Dataset, Mountain-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 39

Measure  Cluster 1              Cluster 2              Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      -3.8338%   61.2288%    -23.332%   69.3032%    0.2316    0.3076

The results of clustering show no statistically significant difference between the percentage of learning gain of the 2 groups of students.
Interactions Discovery on Full-Mountains-Generic+Specific-Movements Dataset

Full-Features Dataset, Mountain-Generic+Specific, Movement Features, #Cluster Outliers: 1, #data points = 38

Measure  Cluster 1 (HLG)        Cluster 2 (LLG)        Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      0.7641%    58.7927%    -43.5178%  70.4686%    0.066     0.717

The result shows a marginally statistically significant and practically significant difference between the PLG of cluster one (higher learning gain (HLG) group; M=0.76%, SD=58.79%) and cluster two (lower learning gain (LLG) group; M=-43.51%, SD=70.46%), p=0.06, d=0.72. The following table shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HLG]: (Size: 29/38) Rules for Cluster [HLG] (Cov=76.32%):
- Mean-Time-Between-Movements[5] = Lower (93.55% [29/31])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [29/29])
- Mean-Time-Between-Correct-Movements[5] = Lower (93.55% [29/31])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [29/29])

Cluster 2 [LLG]: (Size: 9/38) Rules for Cluster [LLG] (Cov=23.68%):
- Mean-Time-Between-Movements[5] = Higher (100% [7/7])
- Mean-Time-Between-Correct-Movements[5] = Higher (100% [7/7])

Interactions Discovery on Full-Mountains-Generic+Specific-MG Dataset

Full-Features Dataset, Mountain-Generic+Specific, MG Features, #Cluster Outliers: 1, #data points = 38

Measure  Cluster 1              Cluster 2              Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      -5.7903%   52.8098%    -9.2857%   96.007%     0.4668    0.0555

There is no statistically significant difference between the 2 clusters with respect to percentage of learning gain, p>0.05.
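The rule statistics reported in these tables can be read as follows: an entry such as "Lower (93.55% [29/31])" gives the rule's confidence (29 of the 31 students satisfying the condition belong to the target cluster), while "Cov=76.32%" gives the coverage of the cluster itself (e.g. 29 of 38 students). A minimal sketch of both quantities (function names are illustrative, not from the thesis' tooling):

```python
def rule_confidence(matching_in_cluster, matching_total):
    """Confidence of a class association rule: the fraction of students
    satisfying the rule's condition who belong to the target cluster."""
    return matching_in_cluster / matching_total

def cluster_coverage(cluster_size, n_students):
    """Coverage: the fraction of all students that fall in the cluster."""
    return cluster_size / n_students

# '(93.55% [29/31])' -> 29 of 31 matching students are in the cluster
conf = round(100 * rule_confidence(29, 31), 2)
# 'Cov=76.32%' -> the cluster holds 29 of the 38 students
cov = round(100 * cluster_coverage(29, 38), 2)
```

With these definitions, a rule is strong when both its confidence and its cluster's coverage are high; sub-rules (the indented entries) refine a parent rule with an extra condition.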
Interactions Discovery on Full-Mountains-Generic+Specific-MG+Movements Dataset

Full-Features Dataset, Mountain-Generic+Specific, MG+Movement Features, #Cluster Outliers: 1, #data points = 38

Measure  Cluster 1 (HLG)        Cluster 2 (LLG)        Statistics
         Mean       SD          Mean       SD          P-value   Cohen's d
PLG      0.7641%    58.7927%    -43.5178%  70.4686%    0.066     0.717

The result shows a marginally statistically significant and practically significant difference between the PLG of cluster one (HLG group; M=0.76%, SD=58.79%) and cluster two (LLG group; M=-43.51%, SD=70.46%), p=0.06, d=0.72. In total, there are X features, from which X features were selected by the feature selection mechanism. Figure ? shows the rules extracted for each cluster.

Extracted Rules
Cluster 1 [HLG]: (Size: 29/38) Rules for Cluster [HLG] (Cov=76.32%):
- Mean-Time-Between-Movements[5] = Lower (93.55% [29/31])
  - total-STD-Time-Between-Wrong-Correct-Moves = Lower (100% [29/29])
- Mean-Time-Between-Correct-Movements[5] = Lower (93.55% [29/31])
  - STD-Time-Between-Wrong-Correct-Moves = Lower (100% [29/29])

Cluster 2 [LLG]: (Size: 9/38) Rules for Cluster [LLG] (Cov=23.68%):
- Mean-Time-Between-Movements[5] = Higher (100% [7/7])
- Mean-Time-Between-Correct-Movements[5] = Higher (100% [7/7])

A.3 Understanding Patterns of Interactions in Groups with Different Prior Knowledge Using Truncated-Features Sets

In the two previous sections, we investigated users' interactions using the Full-Features datasets. Such an investigation is valuable as it provides a concrete understanding of how users interact with the educational game (Prime Climb), by observing the users' interactive behaviors over almost the entire game (through the end of mountain 9).
While these results are valuable, they are limited in the sense that they cannot be utilized to detect a user's characteristics (higher/lower prior knowledge or higher/lower learning gain) during game play in real time. To address this limitation, the following 2 subsections discuss whether using only a fraction of the interaction data can help differentiate users' interactive behaviors.

Interactions Discovery on Truncated-Mountains-Generic-Movements Datasets

There is no statistically significant difference in prior knowledge between the 2 clusters when the mountain-generic movement features of the first, first 2, 3, 4 and 5 mountains are included (p>0.05). On the contrary, there is a statistically significant difference in prior knowledge between cluster 1 (HPK) and cluster 2 (LPK) when the mountain-generic movement features of the first 6, 7 and 8 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.9032  3.2957    11.1667  3.0777    0.2564    0.2276

Truncated-Features Dataset, Mountain[2]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.3478  2.8224    10.95    3.5422    0.0877    0.4399

Truncated-Features Dataset, Mountain[3]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.7073  3.2852    11.50    2.5       0.4738    0.0637

Truncated-Features Dataset, Mountain[4]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.8636  3.4284    11.5238  3.0491    0.3694    0.1046

Truncated-Features Dataset, Mountain[5]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.0     2.1794    11.4     3.3823    0.0666    0.5011

Truncated-Features Dataset, Mountain[6]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.4286  1.5908    11.3611  3.3842    0.0143    0.6538

The result of clustering shows a statistically significant difference in prior knowledge between cluster 1 (HPK; M=13.42, SD=1.59) and cluster 2 (LPK; M=11.36, SD=3.38), p=0.01, d=0.65, when the movement features from the first 6 levels are included.

Extracted Rules
Cluster 1 [HPK]: (Size: 7/43) Rules for Cluster [HPK] (Cov=16.28%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- Mean-Time-Spent-On-Mountain = Higher (100% [4/4])

Cluster 2 [LPK]: (Size: 36/43) Rules for Cluster [LPK] (Cov=83.72%):
- Mean-Time-Between-Consecutive-Correct-Movements = '(-inf-11730.165]' (97.3% [36/37])
- Mean-Time-Between-Movements = '(-inf-10259.515]' (97.22% [35/36])

Truncated-Features Dataset, Mountain[7]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.8333  1.3437    11.3514  3.3387    0.0041    0.7911

The result of clustering shows a statistically significant difference in prior knowledge between cluster 1 (M=13.83, SD=1.34) and cluster 2 (M=11.35, SD=3.33), p=0.004, d=0.79, when the movement features from the first 7 levels are included.

Extracted Rules
Cluster 1 [HPK]: (Size: 6/43) Rules for Cluster [HPK] (Cov=13.95%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- Mean-Time-Between-Movements = Higher (100% [6/6])

Cluster 2 [LPK]: (Size: 37/43) Rules for Cluster [LPK] (Cov=86.05%):
- Mean-Time-Between-Consecutive-Correct-Movements = Lower (100% [37/37])
- Mean-Time-Between-Movements = Lower (100% [37/37])

Truncated-Features Dataset, Mountain[8]-Generic, Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.6     1.3565    11.4474  3.3458    0.0162    0.6771

Extracted Rules
Cluster 1 [HPK]: (Size: 5/43) Rules for Cluster [HPK] (Cov=11.63%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [5/5])
- Mean-Time-Between-Movements = Higher (100% [5/5])

Cluster 2 [LPK]: (Size: 38/43) Rules for Cluster [LPK] (Cov=88.37%):
- Mean-Time-Between-Consecutive-Correct-Movements = Lower (100% [38/38])
- Mean-Time-Between-Movements = Lower (100% [38/38])

The result of clustering shows a statistically significant difference in prior knowledge between cluster 1 (M=13.6, SD=1.35) and cluster 2 (M=11.44, SD=3.34), p=0.01, d=0.67, when the movement features from the first 8 levels are included.

Interactions Discovery on Truncated-Mountains-Generic-MG Dataset

As can be seen in the following tables, the results of clustering show no statistically significant difference in prior knowledge between cluster 1 and cluster 2 when the MG features of the first, first 2, 3, 4, 6, 7 and 8 mountains are included. The results show a statistically significant difference between the two clusters only when the MG measures of the first 5 levels are included.
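The truncated-features analysis above repeats the same pipeline on growing prefixes of the game: cluster the students using only features computed from the first k mountains, then compare the clusters' mean pre-test scores. The sketch below illustrates that loop under simplifying assumptions (one feature column per mountain, a minimal 2-means implementation, hypothetical function names); it is not the thesis' actual pipeline, which used WEKA-style tooling and a full feature-selection step.

```python
import random
from math import dist
from statistics import mean

def kmeans2(xs, iters=20, seed=0):
    """Minimal 2-means over feature vectors; returns one 0/1 label per student."""
    rng = random.Random(seed)
    centroids = rng.sample(xs, 2)          # two students as initial centroids
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [0 if dist(x, centroids[0]) <= dist(x, centroids[1]) else 1
                  for x in xs]
        for j in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:                    # keep old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels

def truncated_eval(feature_rows, pretest, k_mountains):
    """Cluster on the feature prefix covering the first k mountains only,
    then report each cluster's mean pre-test score."""
    truncated = [row[:k_mountains] for row in feature_rows]   # prefix features
    labels = kmeans2(truncated)
    return [mean(p for p, lab in zip(pretest, labels) if lab == j)
            for j in (0, 1)]
```

Running `truncated_eval` for k = 1..8 mirrors the tables in this subsection: at small k the cluster means barely differ, while at larger k a significant gap in prior knowledge can emerge.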
Truncated-Features Dataset, Mountain[1]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0769  2.9211    8.0      3.937     0.0849    1.3455

Truncated-Features Dataset, Mountain[2]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.8462  3.3246    10.25    1.9203    0.1285    0.4957

Truncated-Features Dataset, Mountain[3]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0     3.1365    9.2      3.3106    0.0848    0.8867

Truncated-Features Dataset, Mountain[4]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.725   3.354     11.3333  1.2472    0.3625    0.1205

Truncated-Features Dataset, Mountain[5]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.1515  3.2578    9.8889   2.7262    0.0316    0.718

The result of clustering shows a statistically significant difference between the prior knowledge of cluster 1 (HPK; M=12.15, SD=3.25) and cluster 2 (LPK; M=9.88, SD=2.72), p=0.03, d=0.71.

Extracted Rules
Cluster 1 [HPK]: (Size: 33) Rules for Cluster [HPK] (Cov=78.57%):
- Number-Of-MG-Usage = Lower (86.84% [33/38])

Cluster 2 [LPK]: (Size: 9) Rules for Cluster [LPK] (Cov=21.43%):
No rule is generated!
Truncated-Features Dataset, Mountain[6]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.8333  1.5723    11.6757  3.4489    0.4324    0.0485

Truncated-Features Dataset, Mountain[7]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0     1.4142    11.675   3.3495    0.3958    0.099

Truncated-Features Dataset, Mountain[8]-Generic, MG Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.7368  3.4235    11.4     1.3565    0.3548    0.1036

Interactions Discovery on Truncated-Mountains-Generic-MG+Movement Dataset

The results of clustering and rule mining show no statistically significant difference between the prior knowledge of the 2 clusters when the combined MG and movement measures of the first, first 2, 3, 4 and 5 mountains are included. There is a statistically significant difference between the prior knowledge of the groups of students when the mountain-generic features of the first 6, 7 and 8 mountains are included.
Truncated-Features Dataset, Mountain[1]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0303  3.1955    10.6     3.2       0.1276    0.4475

Truncated-Features Dataset, Mountain[2]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0294  3.1668    10.4444  3.2698    0.1199    0.4971

Truncated-Features Dataset, Mountain[3]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  12.0     1.0       11.6829  3.3235    0.4053    0.0975

Truncated-Features Dataset, Mountain[4]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  11.8636  3.4284    11.5238  3.0491    0.3694    0.1046

Truncated-Features Dataset, Mountain[5]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1          Cluster 2          Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.0     2.1794    11.40    3.3823    0.0666    0.5011

Truncated-Features Dataset, Mountain[6]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.4286  1.5908    11.3611  3.3842    0.0143    0.6538

There is a statistically significant difference between the prior knowledge of cluster 1 (HPK; M=13.42, SD=1.59) and cluster 2 (LPK; M=11.36, SD=3.38), p=0.01, d=0.65, when the movement and MG features of the first 6 mountains are included.

Extracted Rules
Cluster 1 [HPK]: (Size: 7/43) Rules for Cluster [HPK] (Cov=16.28%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- Mean-Time-Spent-On-Mountain = Higher (100% [4/4])

Cluster 2 [LPK]: (Size: 36/43) Rules for Cluster [LPK] (Cov=83.72%):
- Mean-Time-Between-Consecutive-Correct-Movements = Lower (97.3% [36/37])
- Mean-Time-Between-Movements = Lower (97.22% [35/36])

Truncated-Features Dataset, Mountain[7]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.8333  1.3437    11.3514  3.3387    0.0041    0.7911

There is a statistically significant difference between the prior knowledge of cluster 1 (HPK; M=13.83, SD=1.34) and cluster 2 (LPK; M=11.35, SD=3.33), p=0.004, d=0.79, when the movement and MG features of the first 7 mountains are included.

Extracted Rules
Cluster 1 [HPK]: (Size: 6/43) Rules for Cluster [HPK] (Cov=13.95%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- Mean-Time-Between-Movements = Higher (100% [6/6])

Cluster 2 [LPK]: (Size: 37/43) Rules for Cluster [LPK] (Cov=86.05%):
- Mean-Time-Between-Consecutive-Correct-Movements = Lower (100% [37/37])
- Mean-Time-Between-Movements = Lower (100% [37/37])

Truncated-Features Dataset, Mountain[8]-Generic, MG+Movement Features, #Cluster Outliers: 0, #data points = 43

Measure          Cluster 1 (HPK)    Cluster 2 (LPK)    Statistics
                 Mean     SD        Mean     SD        P-value   Cohen's d
Prior Knowledge  13.8333  1.3437    11.3514  3.3387    0.0041    0.7911

There is a statistically significant difference between the prior knowledge of cluster 1 (HPK; M=13.83, SD=1.34) and cluster 2 (LPK; M=11.35, SD=3.33), p=0.004, d=0.79, when the movement and MG features of the first 8 mountains are included.
192 Extracted Rules Cluster 1 Cluster 1[HPK]: (Size: 6/43) Rules for Cluster [HPK] (Cov=13.95%) Mean-Time-Spent-On-Correct-Movements-On-Mountains = Higher (100% [6/6]) Mean-Time-Between-Correct-Movements = Higher (100% [6/6]) Cluster 2 Cluster 2[LPK]: (Size: 37/43) Rules for Cluster [LPK] (Cov=86.05%) Mean-Time-Spent-On-Correct-Movements-On-Mountains = Lower (100% [37/37]) Mean-Time-Between-Correct-Movements = Lower (100% [37/37]) Interactions Discovery on Truncated-Mountains-Generic+Specific-Movements Dataset There is no statistically significant difference on prior knowledge of the 2 clusters when the mountain generic and mountain specific movement features of the first, the first 1,2,3,4 and 5 are included. On the contrary, there exists a statistically significant difference on prior knowledge of cluster 1 (HPK) of students and cluster 2 (LPK) of students when the mountain-generic and mountain-specific movement features of the first 6, 7 and 8 mountains are included. Truncated-Features Dataset, Mountain[1]-Generic+Specific, Movement Features, #Cluster Outlier:0, #data points=43 Cluster1 Cluster2 Statistics Measure Mean SD Mean SD P-value Cohen-d Prior Knowledge 12.2174 2.7180 11.2105 3.7498 0.1739 0.3121 193 Truncated-Features Dataset, Mountain[2]-Generic+Specific, Movement Features, #Cluster Outlier:0, #data points=43 Cluster1 Cluster2 Statistics Measure Mean SD Mean SD P-value Cohen-d Prior Knowledge 12.3226 2.6927 10.1818 4.1080 0.0736 0.6848 Truncated-Features Dataset, Mountain[3]-Generic+Specific, Movement Features, #Cluster Outlier:0, #data points=43 Cluster1 Cluster2 Statistics Measure Mean SD Mean SD P-value Cohen-d Prior Knowledge 12.0741 2.8665 11.0625 3.7328 0.1866 0.3145 Truncated-Features Dataset, Mountain[4]-Generic+Specific, Movement Features, #Cluster Outlier:0, #data points=43 Cluster1 Cluster2 Statistics Measure Mean SD Mean SD P-value Cohen-d Prior Knowledge 12.1818 3.4593 11.5313 3.1621 0.3025 0.2007 Truncated-Features Dataset, 
Mountain[5]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.1667, SD = 3.3124; Cluster 2 Mean = 11.5161, SD = 3.2116; p = 0.2904, Cohen's d = 0.2008

Truncated-Features Dataset, Mountain[6]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 13.2500, SD = 1.5612; Cluster 2 (LPK) Mean = 11.3429, SD = 3.4305; p = 0.0158, Cohen's d = 0.6021

Extracted Rules
Cluster 1 [HPK] (Size: 8/43, Cov = 18.6%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- STD-Time-Between-Consecutive-Wrong-Movements = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 35/43, Cov = 81.4%):
- Mean-Time-Between-Movements[5] = Lower (97.14% [34/35])
- Mean-Time-Between-Correct-Movements[5] = Lower (97.14% [34/35])

Truncated-Features Dataset, Mountain[7]-Generic+Specific, Movement Features, #Cluster Outlier: 0
Prior Knowledge: Cluster 1 (HPK) Mean = 13.5000, SD = 1.5; Cluster 2 (LPK) Mean = 11.5128, SD = 3.3272; p = 0.05, Cohen's d = 0.6207

Extracted Rules
Cluster 1 [HPK] (Size: 4/43, Cov = 9.3%):
- Mean-Time-Spent-On-Mountain = Higher (100% [4/4])
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 39/43, Cov = 90.7%):
- Mean-Time-Spent-On-Mountain = Lower (100% [39/39])
- Mean-Time-Between-Correct-Wrong-Moves = Lower (100% [39/39])

Truncated-Features Dataset, Mountain[8]-Generic+Specific, Movement Features, #Cluster Outlier: 0
Prior Knowledge: Cluster 1 (HPK) Mean = 13.5, SD = 1.5; Cluster 2 (LPK) Mean = 11.5128, SD = 3.3272; p = 0.05, Cohen's d = 0.6207

Extracted Rules
Cluster 1 [HPK] (Size: 4/43, Cov = 9.3%):
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Higher (100% [4/4])
- Mean-Time-Between-Correct-Wrong-Moves = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 39/43, Cov = 90.7%):
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Lower (100% [39/39])
- Mean-Time-Between-Correct-Wrong-Moves = Lower (100% [39/39])

Interactions Discovery on the Truncated-Mountains-Generic+Specific-MG Dataset
There is no statistically significant difference in the prior knowledge of the two clusters when the mountain-generic and mountain-specific MG features of the first 1, 2, 3, 4, 6, 7 and 8 mountains are included. In contrast, there is a statistically significant difference between the prior knowledge of cluster 1 (HPK) and cluster 2 (LPK) when the mountain-generic and mountain-specific MG features of the first 5 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic+Specific, MG Features, #Cluster Outlier: 0
Prior Knowledge: Cluster 1 Mean = 12.076, SD = 2.9211; Cluster 2 Mean = 8.0, SD = 3.9370; p = 0.0849, Cohen's d = 1.3455

Truncated-Features Dataset, Mountain[2]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 11.8462, SD = 3.3246; Cluster 2 Mean = 10.2500, SD = 1.9203; p = 0.1285, Cohen's d = 0.4957

Truncated-Features Dataset, Mountain[3]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.0, SD = 1.0; Cluster 2 Mean = 11.6829, SD = 3.3235; p = 0.4053, Cohen's d = 0.0975

Truncated-Features Dataset, Mountain[4]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.0, SD = 1.0; Cluster 2 Mean = 11.6829, SD = 3.3235; p = 0.4053, Cohen's d = 0.0975

Truncated-Features Dataset, Mountain[5]-Generic+Specific, MG Features, #Cluster Outlier: 0
Prior Knowledge: Cluster 1 (HPK) Mean = 13.0, SD = 0; Cluster 2 (LPK) Mean = 11.6341, SD = 3.3185; p = 0.0064, Cohen's d = 0.4215

Extracted Rules
Cluster 1 [HPK]:
(Size: 2/43, Cov = 4.65%):
- Number-Of-MG-Usage[5] = Higher (100% [2/2])
Cluster 2 [LPK] (Size: 41/43, Cov = 95.35%): no rule is generated

Truncated-Features Dataset, Mountain[6]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.0, SD = 1.4142; Cluster 2 Mean = 11.675, SD = 3.3495; p = 0.3958, Cohen's d = 0.0999

Truncated-Features Dataset, Mountain[7]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.0, SD = 1.4142; Cluster 2 Mean = 11.6750, SD = 3.3495; p = 0.3958, Cohen's d = 0.0999

Truncated-Features Dataset, Mountain[8]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 11.8333, SD = 1.5723; Cluster 2 Mean = 11.6757, SD = 3.4489; p = 0.4324, Cohen's d = 0.0485

Interactions Discovery on the Truncated-Mountains-Generic+Specific-MG+Movements Dataset
There is no statistically significant difference in the prior knowledge of the two clusters when the mountain-generic and mountain-specific movement and MG features of the first 1, 3 and 5 mountains are included. In contrast, there is a statistically significant difference between the prior knowledge of cluster 1 (HPK) and cluster 2 (LPK) when the mountain-generic and mountain-specific movement and MG features of the first 2, 4, 6, 7 and 8 mountains are included.
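The effect sizes reported throughout these tables can be reproduced from the summary statistics alone. A sketch using the pooled-standard-deviation form of Cohen's d, which matches the reported values to within rounding (the cluster sizes 6 and 37 are taken from the rule listings above; the p-values would come from the corresponding t-test, whose exact variant is not restated here):

```python
import math

def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# HPK vs. LPK prior knowledge for the Mountain[7]-Generic, MG+Movement table:
d = cohens_d(13.8333, 1.3437, 6, 11.3514, 3.3387, 37)  # about 0.78
```

Because the pooled variance weights each group by its degrees of freedom, the small HPK cluster (n = 6) contributes little to the denominator, so d is driven mostly by the spread of the larger LPK cluster.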
Truncated-Features Dataset, Mountain[1]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.2174, SD = 2.718; Cluster 2 Mean = 11.2105, SD = 3.7498; p = 0.1739, Cohen's d = 0.3121

Truncated-Features Dataset, Mountain[2]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 12.4545, SD = 2.6639; Cluster 2 (LPK) Mean = 9.2222, SD = 3.9378; p = 0.0264, Cohen's d = 1.0836

Extracted Rules
Cluster 1 [HPK] (Size: 33/42, Cov = 78.57%):
- Mean-Time-Between-Movements[1] = Lower (96.77% [30/31])
- Mean-Time-Between-Movements = Lower (96.67% [29/30])
Cluster 2 [LPK] (Size: 9/42, Cov = 21.43%):
- Mean-Time-Spent-On-Mountain = Higher (100% [7/7])
- Total-Time-On-Mountain[1] = Higher (100% [5/5])

Truncated-Features Dataset, Mountain[3]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 12.2069, SD = 2.7839; Cluster 2 Mean = 10.6429, SD = 3.8472; p = 0.1019, Cohen's d = 0.4935

Truncated-Features Dataset, Mountain[4]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 13.2857, SD = 1.5779; Cluster 2 (LPK) Mean = 11.3889, SD = 3.4016; p = 0.0209, Cohen's d = 0.5971

Extracted Rules
Cluster 1 [HPK] (Size: 7/43, Cov = 16.28%):
- Mean-Time-Between-Movements[4] = Higher (100% [5/5])
- Mean-Time-Between-Correct-Movements[3] = Higher (100% [3/3])
Cluster 2 [LPK] (Size: 36/43, Cov = 83.72%):
- Mean-Time-Between-Correct-Movements = Lower (100% [35/35])
- Mean-Time-Between-Movements = Lower (100% [34/34])

Truncated-Features Dataset, Mountain[5]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 Mean = 11.9091, SD = 3.3427; Cluster 2 Mean = 11.6250, SD = 3.2186; p = 0.4083, Cohen's d = 0.0874

Truncated-Features Dataset, Mountain[6]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 13.2500, SD = 1.5612; Cluster 2 (LPK) Mean = 11.3429, SD = 3.4305; p = 0.0158, Cohen's d = 0.6021

Extracted Rules
Cluster 1 [HPK] (Size: 8/43, Cov = 18.6%):
- Mean-Time-Between-Consecutive-Correct-Movements = Higher (100% [6/6])
- STD-Time-Between-Consecutive-Wrong-Movements = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 35/43, Cov = 81.4%):
- Mean-Time-Between-Movements[5] = Lower (97.14% [34/35])
- Mean-Time-Between-Correct-Movements[5] = Lower (97.14% [34/35])

Truncated-Features Dataset, Mountain[7]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 13.5, SD = 1.5; Cluster 2 (LPK) Mean = 11.5128, SD = 3.3272; p = 0.0509, Cohen's d = 0.6207

Extracted Rules
Cluster 1 [HPK] (Size: 4/43, Cov = 9.3%):
- Mean-Time-Spent-On-Mountain = Higher (100% [4/4])
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 39/43, Cov = 90.7%):
- Mean-Time-Spent-On-Mountain = Lower (100% [39/39])
- Mean-Time-Between-Correct-Wrong-Moves = Lower (100% [39/39])

Truncated-Features Dataset, Mountain[8]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 43
Prior Knowledge: Cluster 1 (HPK) Mean = 13.5, SD = 1.5; Cluster 2 (LPK) Mean = 11.5128, SD = 3.3272; p = 0.0509, Cohen's d = 0.6207

Extracted Rules
Cluster 1 [HPK] (Size: 4/43, Cov = 9.3%):
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Higher (100% [4/4])
- Mean-Time-Between-Correct-Wrong-Moves = Higher (100% [4/4])
Cluster 2 [LPK] (Size: 39/43, Cov = 90.7%):
- Mean-Time-Spent-On-Wrong-Movements-Mountains = Lower (100% [39/39])
- Mean-Time-Between-Correct-Wrong-Moves = Lower (100% [39/39])

A.4 Understanding Patterns of Interactions in Groups with Different Learning Gain Using Truncated-Features Sets

While the previous subsection focused on patterns of behavior in groups of users with different prior knowledge, this subsection discusses interaction patterns in groups of users with different learning gain.

Interactions Discovery on the Truncated-Mountains-Generic-Movements Dataset
There is no statistically significant difference in the percentage of learning gain of the two clusters when the mountain-generic movement features of the first 1, 3, 4, 5, 6, 7 and 8 mountains are included. In contrast, there is a statistically significant difference between the learning gain of cluster 1 (HLG) and cluster 2 (LLG) when the mountain-generic movement features of the first 2 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 4.4164, SD = 61.4494; Cluster 2 Mean = -14.0386, SD = 64.1846; p = 0.2165, Cohen's d = 0.2910

Truncated-Features Dataset, Mountain[2]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 (HLG) Mean = 4.9432, SD = 60.5463; Cluster 2 (LLG) Mean = -33.4343, SD = 62.5143; p = 0.0416, Cohen's d = 0.6265

Extracted Rules
Cluster 1 [HLG] (Size: 25/39, Cov = 64.1%):
- Mean-Time-Spent-On-Mountain = Lower (80.65% [25/31])
  - Mean-Number-Of-Wrong-Movements-Per-Mountain = Lower (86.21% [25/29])
  - Total-Number-Of-Wrong-Movements = Lower (86.21% [25/29])
- STD-Time-Spent-On-Mountain = Lower (78.13% [25/32])
  - Mean-Time-Spent-On-Mountain = Lower (83.33% [25/30])
Cluster 2 [LLG] (Size: 14/39, Cov = 35.9%):
- Mean-Time-Spent-On-Mountain = Higher (100% [8/8])
- STD-Time-Spent-On-Mountain = Higher (100% [7/7])

Truncated-Features Dataset, Mountain[3]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 16.67, SD = 16.670; Cluster 2 Mean = -10.2119, SD = 65.2748; p = 0.1541, Cohen's d = 0.4221

Truncated-Features Dataset, Mountain[4]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -6.8365, SD = 59.2583; Cluster 2 Mean = -11.7037, SD = 70.0836; p = 0.4135, Cohen's d = 0.0761

Truncated-Features Dataset, Mountain[5]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.7368, SD = 62.0192; Cluster 2 Mean = -20.8325, SD = 69.7223; p = 0.3052, Cohen's d = 0.2371

Truncated-Features Dataset, Mountain[6]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.3906, SD = 58.9776; Cluster 2 Mean = -42.8571, SD = 74.0587; p = 0.1169, Cohen's d = 0.6693

Truncated-Features Dataset, Mountain[7]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.3906, SD = 58.9776; Cluster 2 Mean = -42.8571, SD = 74.0587; p = 0.1169, Cohen's d = 0.6693

Truncated-Features Dataset, Mountain[8]-Generic, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -2.0441, SD = 57.3566; Cluster 2 Mean = -55.0, SD = 84.2615; p = 0.1410, Cohen's d = 0.8615

Interactions Discovery on the Truncated-Mountains-Generic-MG Dataset
There is no statistically significant difference in the percentage of learning gain of the two clusters when the mountain-generic MG features of the first 1, 2, 3, 4, 6, 7 and 8 mountains are included.
In contrast, there is a statistically significant difference between the learning gain of cluster 1 (HLG) and cluster 2 (LLG) when the mountain-generic MG features of the first 5 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.8977, SD = 64.4071; Cluster 2 Mean = -34.52, SD = 53.4926; p = 0.2172, Cohen's d = 0.4516

Truncated-Features Dataset, Mountain[2]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.8977, SD = 64.4071; Cluster 2 Mean = -34.52, SD = 53.4926; p = 0.2172, Cohen's d = 0.4516

Truncated-Features Dataset, Mountain[3]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 6.93, SD = 47.4592; Cluster 2 Mean = 8.4591, SD = 64.8439; p = 0.2902, Cohen's d = 0.2449

Truncated-Features Dataset, Mountain[4]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -7.3472, SD = 64.6091; Cluster 2 Mean = -26.6667, SD = 52.4934; p = 0.3301, Cohen's d = 0.303

Truncated-Features Dataset, Mountain[5]-Generic, MG Features, #Cluster Outlier: 1, #data points: 39
PLG: Cluster 1 (HLG) Mean = 36.546, SD = 24.9464; Cluster 2 (LLG) Mean = -12.9464, SD = 64.5025; p = 0.0061, Cohen's d = 0.8142

Extracted Rules
Cluster 1 [HLG] (Size: 5/38, Cov = 13.16%):
- Mean-Number-Of-MG-Usage-Per-Mountain = Higher (100% [4/4])
- STD-Number-Of-MG-Usage-Per-Mountain = Higher (100% [2/2])
Cluster 2 [LLG] (Size: 33/38, Cov = 86.84%):
- Mean-Number-Of-MG-Usage-Per-Mountain = Lower (97.06% [33/34])
- STD-Number-Of-MG-Usage-Per-Mountain = Lower (91.67% [33/36])
  - Mean-Number-Of-MG-Usage-Per-Mountain = Lower (97.06% [33/34])

Truncated-Features Dataset, Mountain[6]-Generic, MG Features, #Cluster Outlier: 0
PLG: Cluster 1 Mean = -7.50, SD = 56.2917; Cluster 2 Mean = -8.9857, SD = 64.7839; p = 0.4839, Cohen's d = 0.0232

Truncated-Features Dataset, Mountain[7]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -8.7361, SD = 63.8949; Cluster 2 Mean = -10.0, SD = 64.8074; p = 0.4904, Cohen's d = 0.0198

Truncated-Features Dataset, Mountain[8]-Generic, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 14.0, SD = 66.2118; Cluster 2 Mean = -12.1912, SD = 62.9347; p = 0.2435, Cohen's d = 0.4133

Interactions Discovery on the Truncated-Mountains-Generic-MG+Movement Dataset
There is no statistically significant difference in the percentage of learning gain of the two clusters when the mountain-generic movement and MG features of the first 1, 3, 4, 5, 6, 7 and 8 mountains are included. In contrast, there is a statistically significant difference between the learning gain of cluster 1 (HLG) and cluster 2 (LLG) when the mountain-generic movement and MG features of the first 2 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 6.0725, SD = 67.5054; Cluster 2 Mean = -12.68, SD = 62.446; p = 0.2587, Cohen's d = 0.2952

Truncated-Features Dataset, Mountain[2]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 (HLG) Mean = 5.8131, SD = 56.4166; Cluster 2 (LLG) Mean = -51.308, SD = 65.5837; p = 0.0173, Cohen's d = 0.9697

Extracted Rules
Cluster 1 [HLG] (Size: 29/39, Cov = 74.36%):
- STD-Time-Spent-On-Mountain = Lower (90.63% [29/32])
- Mean-Time-Spent-On-Mountain = Lower (90.32% [28/31])
  - Mean-Number-Of-Wrong-Movements-Per-Mountain = Lower (96.55% [28/29])
  - Total-Number-Of-Wrong-Movements = Lower (96.55% [28/29])
Cluster 2 [LLG] (Size: 10/39, Cov = 25.64%):
- STD-Time-Spent-On-Mountain = Higher (100% [7/7])
- Mean-Time-Spent-On-Mountain = Higher (87.5% [7/8])
  - STD-Time-Spent-On-Mountain = Higher (100% [6/6])

Truncated-Features Dataset, Mountain[3]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -6.6081, SD = 63.8841; Cluster 2 Mean = -50.0, SD = 50.0; p = 0.2710, Cohen's d = 0.6861

Truncated-Features Dataset, Mountain[4]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -6.8365, SD = 59.2583; Cluster 2 Mean = -11.7037, SD = 70.0836; p = 0.4135, Cohen's d = 0.0761

Truncated-Features Dataset, Mountain[5]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.7368, SD = 62.0192; Cluster 2 Mean = -20.8325, SD = 69.7223; p = 0.3052, Cohen's d = 0.2371

Truncated-Features Dataset, Mountain[6]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.3906, SD = 58.9776; Cluster 2 Mean = -42.8571, SD = 74.0587; p = 0.1169, Cohen's d = 0.6693

Truncated-Features Dataset, Mountain[7]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.3906, SD = 58.9776; Cluster 2 Mean = -42.8571, SD = 74.0587; p = 0.1169, Cohen's d = 0.6693

Truncated-Features Dataset, Mountain[8]-Generic, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -2.1061, SD = 58.2180; Cluster 2 Mean = -45.8333, SD = 79.6041; p = 0.1418, Cohen's d = 0.7054

Interactions Discovery on the Truncated-Mountains-Generic+Specific-Movements Dataset
There is no statistically significant difference in the percentage of learning gain of the two clusters when the mountain-generic and mountain
specific movement features of the first 1, 3, 4, 5, 6, 7 and 8 mountains are included. In contrast, there is a statistically significant difference between the learning gain of cluster 1 (HLG) and cluster 2 (LLG) when the mountain-generic and mountain-specific movement features of the first 2 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -3.6390, SD = 60.5448; Cluster 2 Mean = -17.7306, SD = 68.1290; p = 0.2609, Cohen's d = 0.22

Truncated-Features Dataset, Mountain[2]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 (HLG) Mean = 8.5992, SD = 55.5859; Cluster 2 (LLG) Mean = -50.1183, SD = 63.9718; p = 0.0082, Cohen's d = 1.0061

Truncated-Features Dataset, Mountain[3]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -3.3233, SD = 59.8854; Cluster 2 Mean = -17.6493, SD = 69.0949; p = 0.263, Cohen's d = 0.2253

Truncated-Features Dataset, Mountain[4]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -0.4141, SD = 60.5963; Cluster 2 Mean = -27.7767, SD = 67.2152; p = 0.1293, Cohen's d = 0.4364

Truncated-Features Dataset, Mountain[5]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 0.7387, SD = 57.8047; Cluster 2 Mean = -40.74, SD = 72.59; p = 0.0821, Cohen's d = 0.6741

Truncated-Features Dataset, Mountain[6]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 0.7387, SD = 57.8047; Cluster 2 Mean = -40.74, SD = 72.59; p = 0.0821, Cohen's d = 0.6741

Truncated-Features Dataset, Mountain[7]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.9857, SD = 56.5323; Cluster 2 Mean = -68.75, SD = 89.0488; p = 0.1431, Cohen's d = 1.1004
Truncated-Features Dataset, Mountain[8]-Generic+Specific, Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.9857, SD = 56.5323; Cluster 2 Mean = -68.75, SD = 89.0488; p = 0.1431, Cohen's d = 1.1004

Interactions Discovery on the Truncated-Mountains-Generic+Specific-MG Dataset
There is no statistically significant difference in the learning gain of the two clusters when the mountain-generic and mountain-specific MG features of the first 1, 2, 3, 4, 5, 6, 7 and 8 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.8977, SD = 64.4071; Cluster 2 Mean = -34.52, SD = 53.4926; p = 0.2172, Cohen's d = 0.4516

Truncated-Features Dataset, Mountain[2]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -5.8977, SD = 64.4071; Cluster 2 Mean = -34.52, SD = 53.4926; p = 0.2172, Cohen's d = 0.4516

Truncated-Features Dataset, Mountain[3]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -6.6081, SD = 63.8841; Cluster 2 Mean = -50.0, SD = 50.0; p = 0.271, Cohen's d = 0.6861

Truncated-Features Dataset, Mountain[4]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -6.6081, SD = 63.8841; Cluster 2 Mean = -50.0, SD = 50.0; p = 0.271, Cohen's d = 0.6861

Truncated-Features Dataset, Mountain[5]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -7.9595, SD = 63.1976; Cluster 2 Mean = -25.0, SD = 75.0; p = 0.429, Cohen's d = 0.2669

Truncated-Features Dataset, Mountain[6]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -8.7361, SD = 63.8949; Cluster 2 Mean = -10.0, SD = 64.8074; p = 0.4904, Cohen's d = 0.0198

Truncated-Features Dataset, Mountain[7]-Generic+Specific, MG Features, #Cluster Outlier: 0, #data points: 39
PLG:
Cluster 1 Mean = -8.7361, SD = 63.8949; Cluster 2 Mean = -10.0, SD = 64.8074; p = 0.4904, Cohen's d = 0.0198

Truncated-Features Dataset, Mountain[8]-Generic+Specific, MG Features, #Cluster Outlier: 2, #data points: 37
PLG: Cluster 1 Mean = 9.59, SD = 62.0822; Cluster 2 Mean = -11.8146, SD = 63.5822; p = 0.2047, Cohen's d = 0.3386

Interactions Discovery on the Truncated-Mountains-Generic+Specific-MG+Movements Dataset
There is no statistically significant difference in the learning gain of the two clusters when the mountain-generic and mountain-specific movement and MG features of the first 1, 4, 5, 6, 7 and 8 mountains are included. In contrast, there is a statistically significant difference between the learning gain of cluster 1 (HLG) and cluster 2 (LLG) when the mountain-generic and mountain-specific movement and MG features of the first 2 and 3 mountains are included.

Truncated-Features Dataset, Mountain[1]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -3.639, SD = 60.5448; Cluster 2 Mean = -17.7306, SD = 68.129; p = 0.2609, Cohen's d = 0.22

Truncated-Features Dataset, Mountain[2]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 1, #data points: 38
PLG: Cluster 1 (HLG) Mean = 3.1193, SD = 56.2332; Cluster 2 (LLG) Mean = -58.9275, SD = 69.4363; p = 0.0273, Cohen's d = 1.0471

Extracted Rules
Cluster 1 [HLG] (Size: 30/38, Cov = 78.95%):
- STD-Time-Between-Consecutive-Correct-Movements = Lower (93.1% [27/29])
  - "TlTMnt1" = Lower (100% [26/26])
  - STD-Time-Between-Correct-Movements[2] = Lower (100% [26/26])
- Mean-Time-Between-Consecutive-Correct-Movements = Lower (92.59% [25/27])
  - Mean-Number-Of-Wrong-Movements-Per-Mountain = Lower (100% [25/25])
  - Total-Number-Of-Wrong-Movements = Lower (100% [25/25])
Cluster 2 [LLG] (Size: 8/38, Cov = 21.05%):
- "TlTMnt1" = Higher (80% [4/5])
- Std-Number-Of-Movements-Per-Mountain = Higher (80% [4/5])
  - STD-Time-Between-Consecutive-Correct-Movements = Higher (100% [4/4])
  - STD-Time-Between-Consecutive-Correct-Movements[2] = Higher (100% [4/4])

Truncated-Features Dataset, Mountain[3]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 (HLG) Mean = 8.2713, SD = 58.8188; Cluster 2 (LLG) Mean = -33.4213, SD = 63.0487; p = 0.0256, Cohen's d = 0.6881

Extracted Rules
Cluster 1 [HLG] (Size: 23/39, Cov = 58.97%):
- Mean-Time-Between-Consecutive-Correct-Movements[2] = Lower (82.14% [23/28])
  - Mean-Time-Between-Correct-Movements[2] = Lower (95.65% [22/23])
  - Number-Of-Movements[1] = Lower (87.5% [21/24])
    - Mean-Time-Between-Correct-Movements[2] = Lower (100% [20/20])
- Mean-Time-Between-Correct-Movements[2] = Lower (78.57% [22/28])
  - "TlTMnt2" = Lower (88% [22/25])
    - Mean-Time-Between-Consecutive-Correct-Movements[2] = Lower (100% [22/22])
    - Mean-Time-Between-Movements[2] = Lower (95.45% [21/22])
  - Mean-Time-Between-Movements[2] = Lower (87.5% [21/24])
    - Mean-Time-Between-Consecutive-Correct-Movements[2] = Lower (95.45% [21/22])
    - Std-Number-Of-Correct-Movements-Per-Mountain = Lower (95% [19/20])
- "TlTMnt2" = Lower (100% [19/19])
- Mean-Time-Between-Consecutive-Correct-Movements[2] = Lower (100% [19/19])
Cluster 2 [LLG] (Size: 16/39, Cov = 41.03%):
- Mean-Time-Between-Consecutive-Correct-Movements[2] = Higher (100% [11/11])
- Mean-Time-Between-Correct-Movements[2] = Higher (90.91% [10/11])

Truncated-Features Dataset, Mountain[4]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -0.4141, SD = 60.5963; Cluster 2 Mean = -27.7767, SD = 67.2152; p = 0.1293, Cohen's d = 0.4364

Truncated-Features Dataset, Mountain[5]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 0.7387, SD = 57.8047; Cluster 2 Mean = -40.74, SD = 72.59; p = 0.0821, Cohen's d = 0.6741

Truncated-Features Dataset, Mountain[6]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = 0.7387, SD = 57.8047; Cluster 2 Mean = -40.74, SD = 72.59; p = 0.0821, Cohen's d = 0.6741

Truncated-Features Dataset, Mountain[7]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.9857, SD = 56.5323; Cluster 2 Mean = -68.75, SD = 89.0488; p = 0.1431, Cohen's d = 1.1004

Truncated-Features Dataset, Mountain[8]-Generic+Specific, MG+Movement Features, #Cluster Outlier: 0, #data points: 39
PLG: Cluster 1 Mean = -1.9857, SD = 56.5323; Cluster 2 Mean = -68.75, SD = 89.0488; p = 0.1431, Cohen's d = 1.1004

Appendix B Pre-Test

Appendix C Post-Test

Appendix D Implementation and Setup in Prime Climb

D.1 New Rollup Strategy in Prime Climb's Student Model
Student modeling in Prime Climb is described in Chapter 2 and in more detail in [26]. The student models in the old (non-web-based) version and the new (web-based) version use exactly the same strategy and mechanism for modeling the student; therefore, the two user studies described in Chapter 2 of this thesis used the same student modeling approach. In this appendix, we refer to this model as the Original student model, for clarity. The rollup procedure in the original student model has been described in Chapter 2.
In analyzing the belief propagation in the original student model in Prime Climb, some issues with the rollup procedure were identified. Overall, there were two main issues:

1) Within-Levels Rollup Issue: the rollup performed every time a student makes a movement and a new time slice is generated in the model.
2) Between-Levels Rollup Issue: the rollup performed every time the student completes a mountain and switches to the next mountain.

These two issues are explained in more detail below.

Within-Levels Rollup Issue
The following figures represent the roll-up procedure in the student's model in Prime Climb. After each movement, during the rollup process, the CPT of PriorX changes to reflect the revised belief on FX. In other words, P(PriorX[ti+1]=Known) is set to P(FX[ti]=Known | Move[ti]): the probability that PriorX is known at time ti+1 equals the posterior probability that FX is known given the correctness of the movement at ti. When the Max parameter is zero (as it is in Micheline's thesis [63]), there is no issue and rollup works well, since we have:

P[ti+1](FX=Known) = P(PriorX[ti+1]=Known) = P(FX[ti]=Known | Move[ti])

But when Max is not zero, P[ti+1](FX=Known) is not equal to P(PriorX[ti+1]=Known); instead, the change in the probability of PriorX is propagated down to FX and changes the belief on FX. This creates a self-feedback loop: P(FX=Known) keeps changing even though there is no new evidence on FX. The following example illustrates the within-level rollup issue; the figures below represent the steps of roll-up in the student's model. Suppose that the movement involves numbers 17 and 150. Before the Click node is added to the network, we have P[ti](N17=Known) = 0.5 and P[ti](N150=Known) = 0.5.

1- The Click node is added as the child of the N17, N150 and CF nodes.
Since 17 and 150 do not have a common factor, the movement is valid; therefore the evidence on the Click node is set accordingly.
2- As a result of setting evidence on the Click node, the N17, N150 and CF nodes, and the other nodes in the network that are not d-separated from N17, N150 and CF, are updated.
3- The rollup is carried out.
4- As shown in the following figures, before and after setting the evidence on the Click node we have: P(N50=Known) = 0.65 and P(N50=Known | Click=Correct) = 0.66.
5- Since N50 is a non-root node, the CPT of the PriorN50 node changes to reflect the new belief on N50 before the Click node is removed. The new CPT of PriorN50 is:

P[ti+1](PriorN50=Known) = P[ti](N50=Known | evidence on Click node)
P[ti+1](PriorN50=Unknown) = 1 - P[ti](N50=Known | evidence on Click node)

6- As a result of changing the CPT of the PriorN50 node, the belief on N50 also changes, to 0.77, as shown in the following figure, even though the student has not made a new movement and no other observation has been made. Such self-feedback may happen at any non-root node in the network.

To resolve this issue, the roll-up procedure was changed to neutralize the impact of self-feedback as much as possible. To this end, in step 5 above, the new CPT of PriorN50 is calculated in such a way that the following difference is minimized (the best case is when the difference is zero):

P[ti+1](N50=Known) - P[ti](N50=Known | evidence on Click node)

Between-Levels Rollup Issue
The rollup procedure happens every time the student makes a movement and every time the student switches to the next mountain. The previous section described the issue with rollup within a level (when the student makes movements on a mountain). The rollup process that happens when switching from one mountain to the next is called the Between-Levels rollup process.
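The within-levels self-feedback, and the corrected rollup that neutralizes it, can be illustrated with a small numerical sketch. The CPT below is a hypothetical simplification (a single PriorX parent plus a slip-style Max parameter); the actual Prime Climb CPTs have more parents, but the drift mechanism is the same:

```python
def belief(prior_known: float, max_param: float) -> float:
    """Hypothetical CPT: P(FX=Known) = P(PriorX=Known) + P(PriorX=Unknown) * Max.
    With Max = 0, FX simply mirrors PriorX, so rollup is a fixed point."""
    return prior_known + (1.0 - prior_known) * max_param

def naive_rollup(posterior: float, max_param: float) -> float:
    """Original rollup: copy the posterior straight into PriorX, then
    recompute FX. When Max > 0 this feeds the belief back into itself."""
    return belief(posterior, max_param)

def corrected_rollup(posterior: float, max_param: float) -> float:
    """Revised rollup: choose PriorX so that the recomputed belief matches
    the posterior as closely as possible (zero difference when feasible)."""
    prior = (posterior - max_param) / (1.0 - max_param) if max_param < 1.0 else 1.0
    prior = min(1.0, max(0.0, prior))  # clamp to a valid probability
    return belief(prior, max_param)

p = 0.66              # posterior belief after observing the Click node
for _ in range(3):    # three rollups with no new evidence
    p = naive_rollup(p, max_param=0.2)
# p has drifted upward even though nothing new was observed

q = corrected_rollup(0.66, max_param=0.2)  # belief stays at 0.66
```

Under this toy CPT the naive rollup pushes the belief toward 1 on every time slice, which is exactly the degeneracy-style drift described above, while the corrected rollup keeps the belief fixed until real evidence arrives.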
When switching from one mountain to the next, the final posterior probabilities of the nodes in the previous mountain's network are transferred to the next network as the prior probabilities of the corresponding nodes. This can cause the same issue as the within-level rollup issue described above. However, it is not always possible to apply the same solution to the between-levels rollup issue, because node dependencies are not consistent across the models (mountains). To deal with this issue, the following situations are taken into account; each is explained with an example.

1) Suppose that in model (mountain) [i], node N7 has two parents (PriorN7, N42), while in model [i+1] node N7 has different parents (PriorN7, N35, N42, N21, N14, N28). Here the set of parents of N7 in mountain [i] is a subset of the set of parents of N7 in mountain [i+1], and PriorN7 is shared between them. In this situation, the rollup procedure simply copies the CPT of PriorN7 in model [i] into the CPT of PriorN7 in model [i+1].

2) Suppose that the parents of node N11 in model [i] are denoted Parents[i](N11), and:
- Parents[i](N11) = {PriorN11, N22, N33}
- Parents[i+1](N11) = {PriorN11, N22, N33, N44, N99}
- Parents[i+1](N11) is not a subset of Parents[i](N11), and the intersection of the two sets is not empty.
In this case the rollup procedure is similar to the within-levels rollup solution: P[i+1](PriorN11=Known) should be adjusted such that P[i+1](N11=Known) = P[i](N11=Known).

Since situation 2 occurs frequently in the model and its rollup procedure can be very time consuming, we decided to change the models (networks) so that for each node NX in the network, Parents[i](NX) is a subset of Parents[i+1](NX). Given this change, the fast rollup process explained in situation 1 can always be used.
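The dispatch between the two between-levels cases can be sketched as follows, under the simplifying assumptions that a prior CPT is reduced to a single number P(Prior=Known) and that the slow path is supplied as a callback implementing the same belief-matching adjustment as the within-levels fix (all names here are illustrative, not the actual implementation):

```python
from typing import Callable, Set

def transfer_prior(parents_prev: Set[str], parents_next: Set[str],
                   prior_known_prev: float,
                   adjust: Callable[[float], float]) -> float:
    """Between-levels rollup for one node when moving from mountain i to i+1.

    parents_prev / parents_next: the node's parent sets in the two networks.
    prior_known_prev: P(Prior=Known) carried over from mountain i.
    adjust: slow path that tunes the prior so the new network reproduces
            the final belief of the old one.
    """
    if parents_prev <= parents_next:
        # Situation 1: the parent set only grows, so the prior CPT is
        # copied over unchanged (the fast path).
        return prior_known_prev
    # Situation 2: inconsistent parent sets; fall back to belief matching.
    return adjust(prior_known_prev)

# Situation 1, using the N7 example from the text:
copied = transfer_prior(
    {"PriorN7", "N42"},
    {"PriorN7", "N35", "N42", "N21", "N14", "N28"},
    0.77,
    adjust=lambda p: p,
)
```

This also makes the effect of the model change explicit: once every node satisfies Parents[i](NX) ⊆ Parents[i+1](NX), the subset test always succeeds and only the cheap copy branch is ever taken.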
D.2 Game Architecture and Design

This section is a copy of Chapter 2 of [66].

"Game Architecture

The new implementation consists of three parts: the server, the client(s), and the database. A fourth component, an administrative application, was planned but has not been implemented at this point in time. The following figure illustrates the interactions between the various components and describes the technologies and/or languages used to build each component. [The figure shows: a Database (Microsoft SQL); an Admin application (ASP.NET); a Server with an Application Logic Layer and a Communication Layer (C#, Windows Server 2008, Entity Framework, Windows Communication Foundation); and multiple Player Clients (Silverlight, C#).] In the next sections we describe each component in more detail.

Server

This is the backbone of the entire Prime Climb game. It consists of two layers: the Communication Layer and the Application Layer.

The Communication Layer manages all the communication from the server to the other components (client, database, and all other future components). We used Windows Communication Foundation and C# to implement the communication features, and Entity Framework components to simplify the communication between the server and the database. This layer takes messages coming in from the clients and passes the information to the Application Layer to be processed and acted upon as needed.

The Application Layer handles the processing of game play, state, logging, and user models. It was written in C# and uses Entity Framework to communicate with the database. The user model uses SMILE and its C# interface to implement the Bayesian network.
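The two-layer split described above can be sketched as follows. This is a Python sketch for illustration only; the real server is C# with WCF, and the class and method names here are invented:

```python
class ApplicationLayer:
    """Processes game play, state, logging, and user-model updates."""
    def __init__(self):
        self.log = []

    def handle(self, message):
        # In the real server this would update game state, write the log
        # tables, and query the SMILE Bayesian network; here we only
        # record the message and acknowledge it.
        self.log.append(message)
        return {"status": "ok", "echo": message}

class CommunicationLayer:
    """Receives client messages and forwards them to the application
    layer (the role WCF plays in the real implementation)."""
    def __init__(self, app):
        self.app = app

    def on_client_message(self, client_id, message):
        reply = self.app.handle(message)
        return client_id, reply
```

The design point this illustrates is that the Communication Layer knows nothing about game rules; swapping the transport (e.g. WCF for something else) would not touch the Application Layer.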
The game server has several states it can be in during game play:

START: In this state, two players have joined a game instance and both send ready signals to the server. The server then goes into the READY state.

READY: In this state, the server builds the game state for a mountain and sends the information to both players. Once completed, the server goes into the IDLE state.

IDLE: In this state, the server is waiting for a client to make a move. Either player can make a move; whoever sends a move first gets to make the next move. There are four possible states the game can move to: MOVING, FALLING, RESET, and GOAL.

MOVING: In this state, a player has made a correct move. The server stops accepting any new requests until the move is done. Once the move is made, both clients get an update due to the move and the server state returns to IDLE.

FALLING: In this state, a player has made an incorrect move but there is at least one valid move upon making a fall (i.e., at least one number the player is swinging over does not share a common factor with the number the partner is on). The server sends the falling message for the client to animate and waits for the player to pick a hexagon to continue. In this state, the only action available is to choose a hexagon. Once a hexagon is chosen, the state returns to IDLE.

RESET: In this state, a player has made an incorrect move but there is no valid move upon falling. This could be due to being too close to the bottom of the mountain (which would cause the player to fall off the mountain), or to the case where all three hexagons the person is swinging across contain a common factor with the hexagon the partner is on. The server resets both users' positions to the bottom of the mountain at one of the possible valid starting points of that mountain. The server then returns to the IDLE state.

GOAL: In this state a player reaches the top of the mountain.
The server then sends an update to each client to show that one player reached the top of the mountain, and then goes into the READY state for the next mountain. If there are no more mountains left, it exits the game and goes to the EXIT state.

EXIT: In this state, the game is finished. The signal for the game-completion animation is given to both players. The server sends a message back to both clients to go to the lobby screen.

When a correct or incorrect move is made, the server updates the game state to reflect that move. It also sends an update to the user model and makes a decision on whether to provide a hint.

Client

Users of the Prime Climb game interact directly with the client application. The client is the graphical front end of the system. It consists of three different screens:

1. Log in / Registration Screen: This display gives the user an opportunity to log in with a pre-existing user name or to create a new user by filling in the required information (user name, name, grade, and gender).

2. Game Room Screen: This display provides information on the player and the other players logged into the game, and a space for players to "seat" themselves at a game. Currently there are two types of game rooms: a 2-player game and a prototype computer game room which allows a player to play against a computer.

3. Game Screen: This displays the Prime Climb game. User actions are sent back to the server, where they are used to advance game play and/or get recorded into log files for future analysis. The users are able to do the following actions:
a. Move their character to other hexes.
b. Use the magnifying tool to display the factorization of any number on the mountain.
c. Choose to close the hint window or request another hint.

The client was written using C# with Silverlight components. Silverlight was chosen due to its integration with Windows Communication Foundation, which was chosen for the server application.
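The server states and transitions described above can be summarized as a small state machine. This is a Python sketch; the real server is C#, and the table below only encodes the transitions named in the text:

```python
from enum import Enum

class ServerState(Enum):
    START = "START"
    READY = "READY"
    IDLE = "IDLE"
    MOVING = "MOVING"
    FALLING = "FALLING"
    RESET = "RESET"
    GOAL = "GOAL"
    EXIT = "EXIT"

# Legal transitions, as described in the text.
TRANSITIONS = {
    ServerState.START:   {ServerState.READY},
    ServerState.READY:   {ServerState.IDLE},
    ServerState.IDLE:    {ServerState.MOVING, ServerState.FALLING,
                          ServerState.RESET, ServerState.GOAL},
    ServerState.MOVING:  {ServerState.IDLE},
    ServerState.FALLING: {ServerState.IDLE},
    ServerState.RESET:   {ServerState.IDLE},
    ServerState.GOAL:    {ServerState.READY, ServerState.EXIT},
    ServerState.EXIT:    set(),
}

def transition(state, next_state):
    """Advance the server state, rejecting transitions not in the table."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.value} -> {next_state.value}")
    return next_state
```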
Database

The database is used to store logs from the clients, user model data, and authentication data. It also stores the mountain information (used by the client to build a mountain), the hints which can be provided to the player, and various parameters which can be modified by researchers to change the way the model works and hints are provided. Storing this information in a database rather than in the server code allows researchers to modify the parameters dynamically and removes the need to recompile the code when a change is made, as you only need to restart the server for these changes to become active.

The server uses Microsoft SQL, which was chosen as it provides simple integration with the server technology through the use of Entity Framework. This framework reduces the need to write code for the interactions between the server and the database.

The database itself consists of 15 tables that work together to store the relevant data for the Prime Climb game. The tables and their relationships with other tables are shown in the following figure. We also created a table "debug" which allows us to more easily debug the web application. In the next part, I will describe each table and its function in relation to the game.

Database Tables

- Mountain: This table describes all the information needed to build a mountain. In addition to the size of the mountain, we store, as an array of integers, the numbers that go in each hexagon, ordered from the bottom left corner and going across each row before moving to the next layer. We also store the id of the picture which should be used as the background for that mountain, as well as the x, y coordinates of where the mountain should be located on the screen.

- Config: This table allows us to store various parameters of the game which we may want to modify on the fly. This includes the number of game rooms we want to display, as well as the number of hinting strategies available.
These two pieces of information allow the server to set up the correct number of rooms and also vary the hinting strategy of each room. For example, with two hinting strategies, the odd game rooms would run one hinting strategy while the even game rooms would run the other. This allows us to test more than one hinting strategy at a time, which can facilitate larger-scale trials of the game. This table also directs what type of prior knowledge values we should use (user specific, generic, or population based), as well as the probability values we should use for educated guesses (e_guess), guesses, and slips. We also store the threshold values of when to give a hint. There is also a setting for whether both players will receive hints or just the first player, which was implemented to reduce distraction (caused when the second player receives a hint) to a single subject when they played with an investigator.

- Hints: This table stores the text of the various hints, whether the hint needs to substitute the numbers related to the previous move, and whether it requires the calculation of factors (which have to be calculated based on the numbers related to the previous move). It also has a flag which allows us to set a hint as active, so that only active hints will be given during game play. It also keeps its association to the hintType and hintCategory.

- HintType: This table allows us to set different hint types. Currently this allows us to distinguish between correct and wrong hints, as hints differ based on whether the move was made correctly or incorrectly.

- HintCategory: This table refers to the category that the hint falls into. Currently there are the following types of hints: Focus, two types of Definition (Factorization or Common Factor), Tool, and Bottom-Out.

- Users: This table includes the demographic information on the subject (grade, gender), as well as the player's name and a username which allows them to log into the game server.
- PriorKnowledge: This table allows us to set the prior knowledge nodes of the Bayesian network to three possible settings. We can set the priors to a generic value (0.5), to population values based on the knowledge we have collected from previous students' pretest scores, or to user-specific values which we can set based on the results of the player's pretest.

- LogGames: This table ties the many components of the game together. It stores the user ids of the two players playing the game, as well as the time the game started and the hinting strategy being used for that game. The game id is used by other tables to identify the unique game that the information is associated with.

- LogHints: This table keeps track of the hints that the player receives while playing the game. It records which hint it was, the time the hint was first displayed, and the time it was closed. It also keeps track of whether the user requested another hint.

- LogAgentModels: This table records each move: the numbers each player is on, which player made the move, and the time the move was made. It also records the model's belief, at the time of the move, about the numbers the player and his partner are on, and the belief about whether the player understands the common factor concept. It also records whether the move was correct.

- LogFactors: This table stores the last value calculated for the model's belief about the user's knowledge of each number they encounter.

- LogModels: This table stores the last value calculated for the model's belief about the common factor node and the KFT (knows factor tree) node.

- LogMovesTypes: This table stores the different possible types of moves that a player can make.
They are Restart (when they start or restart a mountain), Position (used at the start of a mountain to show where each player is), Move_Success (when the move they make is valid and occurs), Move_Fail (when the player tries to move somewhere invalid, like an obstacle, and as a result fails to make a move), Fall (when a user makes a wrong move and falls; it records where the user ends up), and Mag_Used (when a user uses the magnifying glass).

- LogMoves: This table records all the moves made by both players."

D.3 Starting a Prime Climb Game and Playing the Game

If the game has not been started successfully for a while (more than 10 days), make sure that you restart the database service and IIS before running a new game. To start a new Prime Climb game, follow these instructions:

1. You will need the Silverlight plug-in; if you don't currently have it, download the plug-in from http://www.silverlight.net (available for Windows / Mac).

2. Go to the website http://XXXX.cs.ubc.ca:6030/client (change XXXX to the name of the machine on which Prime Climb's server is running, for instance http://rome.cs.ubc.ca:6030/client).

3. Create an account; you will need a unique username. Fill in the rest of the items (make up your grade) and click "Register".

4. If the account is created successfully, your user name will be listed on the left side and you can click "Login" to enter the game.

5. The right side of the lobby screen will give your details and a list of users logged in. If the upper part is blank then you are not fully logged in; just refresh the browser, fill in your username, and click "Login".

6. To start a game, you will need to find a partner. The left side will show a series of rooms (1-30); choose a seat by clicking the "Sit" button.

7. When another player joins your room, the "Ready" button will become available. Each player needs to click it, and your game will start.

8. Play the game.
8.1: The goal of this game is to work together with your partner to climb the 12 mountains.

8.2: Each player is a mountain climber (your climber has a yellow glow around him) who is sitting on a hexagon with a number on it.

8.3: You move around the mountain by moving to a green highlighted hexagon. You will want to select hexagons whose numbers do not share any common factors with the hexagon your partner is on. You do not have to take turns.

8.4: If you choose a number which shares a common factor with your partner's number, you will fall and start swinging. To stop swinging, click on one of the hexagons your climber is swinging across; if it also shares a common factor, you will continue swinging.

8.5: You cannot climb on obstacles or move more than 2 spaces away from your partner.

8.6: You can use the magnifying glass to look at the factor tree for a number. You can do this by clicking on the magnifying glass (at the top right of the screen).

8.7: You will be provided with some hints and advice as you play the game.

D.4 Setting up a New Prime Climb Server

Follow these instructions to set up a new Prime Climb server:

1) Install Windows: Since the .NET Framework was used for the development of Prime Climb, the server cannot be set up on a Unix-based machine; you have to install Prime Climb's server on a Windows machine. It is recommended that you install Prime Climb's server on a Windows Server operating system, although other versions of Windows should also work (they have not been tested). To install Windows Server 2008, put the installation disk into the DVD drive, boot from the DVD drive, and follow the installation steps.

2) Install Microsoft .NET Framework 3.5: On Windows Server:
- Go to Server Manager > Features > Add Features
- Check the check box ".NET Framework 3.5.1 Features"
- Click Add Required Features
- Click Next
- Click Install

3) Install Microsoft Visual Studio 2008: After installing Microsoft Visual Studio 2008, upgrade to SP1 (Service Pack 1) if this has not already been done.

4) Install Microsoft SQL Server 2008: During the SQL Server 2008 setup, an error message may show on the Setup Support Rules window when the "Restart computer" operation fails. To solve the "Restart Computer Failed" error when installing or uninstalling MSSQL, follow the instructions below:
- Open the registry editor (regedit)
- Open HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
- Double-click "PendingFileRenameOperations"
- Delete everything
- Press OK and close regedit

5) Install Microsoft Silverlight

6) Get the Prime Climb project's files: Prime Climb's files are located under the conati folder on cs-smb. The full address is cs-smb/research/lci/project/conati/PrimeClimb
- The original Prime Climb student model is under the NewPrimeClimb folder
- The modified (rollup-modified + new models) Prime Climb student model is under the ModifiedPrimeClimb folder

7) Set up IIS 7: Newer versions of IIS should also work; Windows 7 has been tested and works properly. After installing IIS, if not already done, you need to set up IIS for Prime Climb:
- Open IIS (Internet Information Services) from the Control Panel
- Under <Machine-name> > Sites (for example: ROMA > Sites), right-click "Default Web Site" and select "Add Application"
- Enter "service" in the Alias field, and set the physical path to the location of PrimeClimbServiceSite. Note that PrimeClimbServiceSite is a folder under the Prime Climb main (root) directory.
- Create another application named "client" and set the physical path to the location of PrimeClimbClient.Web. Note that PrimeClimbClient.Web is located under PrimeClimbClient, in the root Prime Climb folder.

8) About the WCF service: In the PrimeClimbService project, the Services folder contains the code that defines the communication interfaces between the service and the client.
So every time the interfaces get changed, we have to go to the PrimeClimbClient project and update the service references. To update the service references, under the PrimeClimbClient project, in the Service References folder, right-click each element and select "Update Service Reference".

9) Set up the database:
- The current database in use is called "NewPrimeClimb", in SQL Server 2008, and is located on the ROMA machine.

D.5 Issues and Troubleshooting

1) Load testing of the new Prime Climb (web-based):
- During summer 2011 we load-tested the web-based Prime Climb. During this test about 8 instances of the game ran properly. Despite the successful load test, during a real experiment in which 10 students played the game simultaneously, the game crashed and the experiment could not be completed. The probable issues had not been resolved as of the date of this thesis (March 2013).

2) If Prime Climb does not start from a browser:
- If you cannot log into the game, cannot create a new account, or the game does not start, restart the SQL service and the IIS service from the Control Panel.