UBC Theses and Dissertations
On the Use of Eye-tracking in the Assessment of Self-Explanation. Merten, Christina, 2006.


On the Use of Eye-tracking in the Assessment of Self-Explanation

by Christina Merten
B.Sc., The University of Texas at Dallas, 2002

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science)

The University of British Columbia
December, 2005
© Christina Merten 2005

Abstract

The accuracy of a user model typically depends on the amount and quality of information available on the user's states of interest. An eye-tracker provides data indicating where a user is looking during interaction with the system. In this thesis we present research on the use of eye-tracking data for on-line assessment of user meta-cognitive behavior during the interaction with an open learning environment. We describe the design of a probabilistic model that processes this information, and its formal evaluation. We demonstrate that adding eye-tracking information significantly improves the model's accuracy in assessing user exploration and self-explanation behaviors.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
1.1 Open Learning Environments
1.2 Self-Explanation
1.3 Thesis Approach: Using Eye-tracking as a Predictor of Self-Explanation
1.4 Thesis Goals and Contributions
1.5 Outline
2 Related Work
2.1 Eye-tracking Research
2.1.1 Retrospective Analysis of Eye Movements
2.1.2 The Online Use of Eye-tracking in Interface Operation
2.1.3 Eye-tracking for Online Adaptation of System Interaction
2.2 Research on Supporting Self-Explanation
2.3 Open Learning Environments
3 ACE and the Addition of Eye-tracking to Assess Self-Explanation
3.1 The ACE Learning Environment
3.2 Previous Versions of the ACE Student Model
3.2.1 The Original ACE Model
3.2.2 Extending ACE to Monitor and Support Self-Explanation
4 Preliminary User Study
4.1 Using Eye-tracking Data to Assess Implicit Self-Explanation
4.2 Study Goal
4.3 Participants
4.4 Experiment Design
4.5 Data Analysis
4.5.1 Setting the "Gold Standard"
4.5.2 Time as a Predictor of Self-Explanation
4.5.3 Gaze Shifts as a Predictor of Self-Explanation
4.6 Results
4.7 Discussion
5 The New ACE Student Model
5.1 Defining "SE-related Behavior"
5.2 Adding "SE-related Behavior" to the Model
5.2.1 The Naive Bayesian Classifier
5.2.2 Setting the Conditional Probabilities
6 Testing the Student Model
6.1 Initial Tests of the Student Model
6.2 Additional Training of the Student Model
6.3 New User Study
6.4 Testing the Student Model
6.4.1 Testing the Accuracy of the New Model with Different Priors
6.5 Cross-validation Analysis
6.5.1 Customizing Prior Probabilities
6.5.2 Comparing the Different Models
6.5.3 Calculating Optimal Thresholds
6.6 Testing the Performance of the Models Using Different Evidence of Self-Explanation
7 Pupil Dilation as a Predictor of Self-Explanation
7.1 Data Collection
7.2 Results and Discussion
8 Conclusions and Future Work
8.1 Satisfaction of Thesis Goals
8.1.1 User Study to Assess Time and Gaze Shifts as Predictors of Self-Explanation
8.1.2 Design of Revised Student Model
8.1.3 Evaluation of Performance of New Student Model
8.2 Summary of Conclusions
8.3 Limitations
8.4 Future Work
Bibliography
A Pretest
B Posttest

List of Tables

4.1 Experiment schedule
4.2 Classification scheme for coding episodes of positive and negative self-explanation
4.3 Time elapsed after actions with and without self-explanation
4.4 Raw experiment data
4.5 Classification accuracy of different predictors
6.1 Values of implicitSE nodes corresponding to actions in study data
6.2 ImplicitSE accuracies with generic prior probabilities using training data
6.3 Exploration node accuracies with generic prior probabilities using training data
6.4 Accuracies of implicitSE nodes for different prior probabilities
6.5 Areas of ROC curves for implicitSE nodes
6.6 Accuracies of exploration nodes for different prior probabilities
6.7 Areas of ROC curves for exploration nodes
6.8 Cross-validation results for implicitSE nodes using various prior probabilities
6.9 Cross-validation results for exploration nodes using various prior probabilities
6.10 Cross-validation results for implicitSE nodes using various student models
6.11 Cross-validation results for exploration nodes using various student models
6.12 Mean optimal thresholds and standard deviations for implicitSE node assessments across folds of leave-one-out cross-validation
6.13 Mean optimal thresholds and standard deviations for exploration node assessments across folds of leave-one-out cross-validation
6.14 ImplicitSE accuracies for the ACE model using different predictors as evidence of implicit self-explanation
6.15 Exploration accuracies for the ACE model using different predictors as evidence of implicit self-explanation
7.1 Sample fragment of pupil size data
7.2 Mean normalized pupil sizes with and without self-explanation
7.3 Sample fragment of pupil size data using Z-score normalization
7.4 Mean Z-score normalized pupil sizes with and without self-explanation
7.5 Mean Z-score normalized pupil sizes with and without self-explanation over only the middle 60% of each time interval

List of Figures

3.1 The Machine Unit and Arrow Unit
3.2 The Plot Unit
3.3 High-level structure of ACE's student model
3.4 Original ACE model with self-explanation
4.1 The Eyelink I eye-tracker
4.2 Coding of negative and positive self-explanation and an utterance that has yet to be associated with an action
4.3 ROC curve for time as a filter for self-explanation
4.4 Sample gaze shift
4.5 Excerpt of ACE log file with added gaze shift tags
4.6 Excerpt of user data file
4.7 Dual histogram of experiment data
4.8 ROC curve for time and gaze shifts as a combined predictor
5.1 The ACE student model with a naive Bayesian classifier component
5.2 The ACE student model for two time slices
6.1 ROC curves for models as predictors of implicit self-explanation
6.2 ROC curves for models as predictors of sufficient exploration
6.3 Two time slices of the adjusted ACE student model
6.4 ROC curves for models as predictors of implicit self-explanation over testing data with generic prior probabilities
6.5 ROC curves for models as predictors of sufficient exploration over testing data with generic prior probabilities
7.1 ROC curve for pupil size as a predictor of self-explanation
7.2 ROC curve for pupil size, normalized with Z-scores, as a predictor of self-explanation
7.3 Pupil size plotted over the time interval when it is largest at the beginning (a) or at the beginning or the end (b).
In (b), dotted lines indicate the middle 60% of the time interval
7.4 ROC curve for pupil size as a predictor of self-explanation, using the middle 60% of the time interval after each action

Acknowledgements

First, I would like to thank my supervisor, Cristina Conati, for her helpful guidance over the last few years. Special thanks to Kasia Muldner and Andrea Bunt, both for their help with ACE and for their friendship. Giuseppe Carenini graciously agreed to be my second reader. Finally, I would like to thank my mother and stepfather, Susan and Don Cundieff, and my father, Jay Merten, for their support and encouragement over the years.

Chapter 1

Introduction

Eye movements have long been used as a source of evidence revealing cognitive state. The active theory of visual form perception relates fixation duration and location to internal events within the mind [21]. Studies in this field have shown that gaze data can be used to infer such higher mental processes as attention, understanding and memory [21]. Metacognitive skills are high-level, domain-independent skills which can greatly influence learning [6]. These include self-explanation, analogical reasoning, and the ability to learn through exploration. The metacognitive skill of self-explanation is particularly important to the use of open learning environments, which rely on unconstrained exploration of the interface to teach domain material. The research described in this thesis was carried out to explore the use of eye movements as a predictor of self-explanation in an open learning environment.

1.1 Open Learning Environments

An Intelligent Tutoring System (ITS) is a computer-based tutor which includes knowledge of the domain material being taught (an expert model) and knowledge of the learner (a student model) [27]. With a traditional ITS, students solve problems chosen to serve as effective learning experiences.
Throughout this interaction, the student model maintains an assessment of the learner's knowledge of the domain material, which the system may use to provide tailored instruction. Open learning environments are intelligent tutoring systems which depend on free exploration of the interface by the user rather than structured, explicit instruction. By exploring the system interface, students can acquire a deeper, more structured understanding of the domain material [27]. However, not all students are able to explore effectively in these environments. In many cases it is because they lack the prerequisite knowledge necessary to understand the domain material. Other students lack the meta-cognitive skills needed to initiate exploratory actions or choose actions which cover all concepts in the target domain. Furthermore, even those who can explore appropriately are sometimes unable to interpret and generalize the results of their actions. In particular, many students are unable or unwilling to self-explain, i.e. to clarify and elaborate the information presented with respect to underlying domain theory [7, 10]. For these students it is necessary for the system to provide on-line support to encourage effective exploration. While such support can be useful when needed, it can also undermine the unrestrictedness of the open learning environment. Thus it is necessary to model student exploratory tendencies so that intervention occurs only when needed.
1.2 Self-Explanation

Self-explanation is the metacognitive skill of spontaneously explaining to oneself instructional material in terms of the underlying domain theory [7]. Studies have shown that self-explanation can greatly influence learning [6, 23]. For example, students who carefully consider the effects of their actions in terms of the domain material are more likely to learn effectively than those who merely play with the interface in an intelligent tutoring system. It has been shown, however, that many students fail to spontaneously self-explain when interacting with an open learning environment. Thus computer-based tools which support self-explanation may be used to enhance exploration in an open learning environment. Deciding when to hint for self-explanation is a challenging issue in an open learning environment. The hints should interfere as little as possible with the exploratory nature of the interaction, but should also be timely so that even the more reluctant self-explainers can appreciate their relevance.
Thus, for an interface to effectively support and encourage self-explanation, it is crucial that the student model be as accurate as possible in its assessment of whether a student is actually engaging in this process. There may be two types of self-explanations which need to be detected: (i) explicit self-explanations, i.e., self-explanations that the student generates using menu-based tools possibly available in the interface; (ii) implicit self-explanations, which students generate in their head. The latter are the most difficult to detect, due to the lack of hard evidence of their occurrence. Model bandwidth refers to the amount and quality of information available to a user model to assess the user's states of interest [32]. The greater the bandwidth, the less ambiguity exists in mapping observable behaviors onto user internal states, and the more accurate the model's assessment can be. Time may be used as evidence of self-explanation behavior, under the assumption that students who self-explain take longer between exploratory actions. However, time as a predictor can overestimate self-explanation behavior, as users may get distracted by outside factors such as other students.
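The idea of time as a predictor can be made concrete with a small sketch. This is illustrative only, not the thesis's actual implementation: the pause before each next exploratory action is compared against a threshold, and predictions are scored against hand-labeled "gold standard" episodes using true and false positive rates, as in an ROC analysis.

```python
# Illustrative time-threshold predictor of self-explanation (not the
# thesis's actual code): a long pause after an action is taken as
# evidence that the student self-explained its outcome.

def latencies(action_times):
    """Seconds elapsed between consecutive exploratory actions."""
    return [t2 - t1 for t1, t2 in zip(action_times, action_times[1:])]

def predict_se(action_times, threshold):
    """Predict self-explanation after each action (except the last)
    whenever the pause before the next action exceeds the threshold."""
    return [dt > threshold for dt in latencies(action_times)]

def tpr_fpr(predictions, gold):
    """True/false positive rates against hand-labeled episodes."""
    tp = sum(p and g for p, g in zip(predictions, gold))
    fp = sum(p and not g for p, g in zip(predictions, gold))
    pos = sum(gold)
    neg = len(gold) - pos
    return tp / pos if pos else 0.0, fp / neg if neg else 0.0
```

Sweeping the threshold over a range of values and plotting the resulting (FPR, TPR) pairs yields the kind of ROC curve used later in the thesis to compare predictors. The overestimation problem noted above shows up as false positives: a distracted student produces a long pause with no self-explanation.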
The research presented in this thesis focuses on the use of eye-tracking to increase the bandwidth of a user model which assesses self-explanation in an open learning environment.

1.3 Thesis Approach: Using Eye-tracking as a Predictor of Self-Explanation

The research presented in this thesis is aimed at exploring if and how eye-tracking data can be used to detect self-explanation in an open learning environment. In an open learning environment, an essential part of self-explanation involves observing and considering the outcomes of one's actions. Previous studies [21] have shown that a user's gaze may be used as an indication of which parts of an interface hold the user's attention. Thus it is reasonable to assume that the presence or absence of certain gaze patterns may be used to assess self-explanation behavior. In this thesis, gaze patterns are first determined which indicate that the user is observing the outcomes of her actions. These patterns are then tested as evidence of self-explanation in a student model.

1.4 Thesis Goals and Contributions

The main objective of this thesis is to explore if and how eye-tracking may be used as a detector of implicit self-explanation. To meet this objective, this thesis has the following goals:

1. To create a student model which uses certain gaze patterns as evidence of implicit self-explanation. This involves:
(a) Performing a user study to test assumptions concerning which gaze patterns indicate implicit self-explanation and to collect quantitative data regarding the reliability of these patterns as predictors
(b) Designing and building a new student model based on the results of this study.

2. To evaluate the performance of the new model as a predictor of implicit self-explanation and sufficient exploration, particularly in comparison with a previously devised student model which does not use eye-tracking data.
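A gaze pattern indicating that the user is observing the outcomes of her actions can be sketched as follows. This is a hypothetical illustration, not ACE's implementation: the region names and coordinates are invented, standing in for the area where an action was taken (e.g., editing a function equation) and the area displaying its effect (e.g., the plotted curve).

```python
# Hypothetical sketch of gaze-shift detection: did the fixation sequence
# move from the region where the action occurred to the region showing
# its outcome? Region names and pixel coordinates are invented.

# Rectangular regions of interest: name -> (x_min, y_min, x_max, y_max)
REGIONS = {
    "equation": (0, 400, 600, 500),  # area where the user acts
    "plot":     (0, 0, 600, 380),    # area showing the action's outcome
}

def region_of(fixation):
    """Map a fixation's (x, y) location to a named region, or None."""
    x, y = fixation
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def has_gaze_shift(fixations, src="equation", dst="plot"):
    """True if the fixation sequence ever moves directly from src to dst,
    suggesting the user looked at the outcome of her action."""
    named = [r for r in map(region_of, fixations) if r is not None]
    return any(a == src and b == dst for a, b in zip(named, named[1:]))
```

The presence or absence of such a shift after each action is the kind of binary evidence that can be fed to a student model, alongside the time-based evidence discussed in Chapter 1.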
The student model discussed in this thesis uses on-line processing of eye movement data in its assessment of whether or not implicit self-explanation has occurred. Most eye-tracking research to date has involved either the offline processing of gaze data for a greater understanding of the link between eye movement and user cognition, or the online use of eye-tracking data for interface evaluation or manipulation by the user. The research described in this thesis involves the design of a student model which uses online processing of eye gaze data to assess self-explanation behaviour, and an evaluation of the performance of the new model.

1.5 Outline

Chapter 2 reviews related work. Chapter 3 presents a review of the design and development of the ACE system and its student model for assessing sufficient exploration. The previous addition of self-explanation is also discussed, as well as the potential for eye-tracking data as a predictor of self-explanation. Chapter 4 describes a preliminary user study carried out to determine the correlation between certain gaze patterns and implicit self-explanation and thus establish the value of these gaze patterns as predictors of self-explanation. The results of this study were applied to the design of a new student model which includes eye-tracking data, as explained in Chapter 5. Chapter 6 describes an evaluation of the predictive performance of the new model, as compared to that of the previous models reviewed in Chapter 3. Chapter 7 explores pupil dilation as a predictor of self-explanation. Finally, Chapter 8 contains a summary and conclusions.

Chapter 2

Related Work

This chapter presents a review of previous work related to the research described in this thesis. In section 2.1, earlier eye-tracking research is discussed. Section 2.2 provides a review of previous work on self-explanation. Finally, section 2.3 describes past research involving open learning environments and briefly introduces the ACE system.
2.1 Eye-tracking Research

2.1.1 Retrospective analysis of eye movements

Many areas of science have long utilized the recording and subsequent analysis of eye movement data. For example, eye-tracking has been used in cognitive psychology as a tool to help understand both motor and cognitive processes [18, 21]. In [18], Just and Carpenter analyze the fixations of readers of various fragments of text. They show that better readers directly fixate on fewer words and spend less time on each. Comprehension was also improved for readers who fixated more on content words (e.g., nouns and verbs) than function words (e.g., conjunctions and prepositions). Further, skilled readers were found to fixate on words in the order in which they appear in the text, rather than regress to earlier words. Williams [33] applied retrospective analysis of eye-tracking data to his study of how people look at pictures and why they choose to fixate on certain areas. He found that color greatly influenced subjects' gaze patterns, while size and shape did not. In addition to fixating on some colors more than others, subjects' gazes were more likely to follow patterns defined by color than by size or shape. Thus color-coding is an effective tool for helping users locate items efficiently. In [29], Sodhi et al. used an eye-tracker to record the gaze patterns of drivers to assess the distraction caused by in-vehicle devices such as cell phones and compact disc players. They performed an experiment in which subjects drove for twenty-two miles along a predetermined road while listening to an audio recording of instructions for which devices to use. After collecting the eye movement data, they were able to classify the use of each device as a single-glance or multiple-glance task. In the field of HCI, eye movement data has been studied post hoc to evaluate usability issues and understand human performance.
For instance, Schiessl et al. [25] used an eye-tracker to investigate differences in attentional behavior between genders for textual information and pictorial stimuli on websites. They found that women tend to focus more carefully on textual information, while men pay more attention to pictures on websites. It should also be mentioned that when the participants of these studies were asked where in the interface they thought they looked, their perceptions often differed from reality; accurate eye movement data could only be obtained with an eye-tracker. In [11], offline processing of eye-tracking data was used in a study designed to improve the efficient generation of non-photorealistic images in a graphical interface. Here participants were initially asked to stare at different images for five seconds each. Eye fixation data was analyzed to determine which parts of the picture the study subjects found to be most meaningful. Algorithms were then designed to draw the most "important" parts of the picture first. While the research described in this thesis includes the use of retrospective analysis of eye movements to design and test a student model, it differs from these efforts because the new model uses online processing of eye-tracking data to assess and support student learning.

2.1.2 The online use of eye-tracking in interface operation

There has also been fairly extensive research in using eye gaze as an alternative form of input that allows a user to explicitly operate an interface. In [17], Jakob explores issues surrounding the real-time processing of eye data, such as efficient noise reduction and the organization of gaze information into tokens from which relevant data may be extracted. He then discusses the potential of eye-tracking as a tool in several forms of interface manipulation, including selecting and moving an object, scrolling text, and navigating menus.
Salvucci and Anderson [24] applied these ideas to the design and evaluation of a sample operating-system interface which allows users to execute commands with their eyes. They developed IGO (Intelligent Gaze-added Operating-system), with which users could perform such operations as opening, closing, and dragging windows using their eyes. Experiments showed that after little practice, users were able to perform tasks with their gaze almost as correctly and efficiently as with a mouse. Participants in the study also showed a preference for the gaze modality over the mouse. Majaranta et al. [19] devised a system which allowed users to type with their eyes via an eye-tracker. A picture of a keyboard appeared on the screen for users to look at, and short fixations were interpreted as key presses. When these occurred, the system played an audio recording of the name of the character typed. A user study revealed that requiring shorter fixations for key presses resulted in slower typing and more double-letter errors. In [14], Hornof et al. describe the EyeDraw system, which enables children with severe motor impairments to draw pictures by just moving their eyes. In EyeDraw, longer fixations which exceed a dwell threshold are necessary to "lower" and "lift" a drawing pen which moves across the screen. Three out of four of the children who participated in a small user study were able to quickly and easily draw the requested shapes and recognizable scenes, including houses, stick figures, a car, and a butterfly. Unlike IGO and EyeDraw, the system discussed in this thesis uses real-time processing of a user's gaze to interpret the user's non-explicit metacognitive behaviours, enabling online adaptation of the interaction.
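The dwell-threshold mechanism shared by the eye-typing system and EyeDraw can be sketched as follows. This is a minimal illustration under invented parameters, not the cited systems' implementations: a "selection" fires once the gaze has stayed within a small radius for longer than the dwell threshold.

```python
# Minimal sketch of dwell-based selection as described for eye typing and
# EyeDraw: a stream of gaze samples triggers a selection once the gaze
# stays within a small radius for longer than the dwell threshold.
# The threshold and radius values are illustrative, not from the cited work.
import math

def dwell_selections(samples, dwell_ms=500, radius_px=30):
    """samples: list of (timestamp_ms, x, y). Returns the (x, y) points
    at which the gaze dwelled for at least dwell_ms."""
    selections = []
    anchor = None  # (t, x, y) where the current dwell period started
    for t, x, y in samples:
        if anchor and math.hypot(x - anchor[1], y - anchor[2]) <= radius_px:
            if t - anchor[0] >= dwell_ms:
                selections.append((anchor[1], anchor[2]))
                anchor = None  # restart the timer so one dwell fires once
        else:
            anchor = (t, x, y)  # gaze moved away: start a new dwell period
    return selections
```

The trade-off the Majaranta et al. study measured falls directly out of the `dwell_ms` parameter: a shorter threshold makes selection faster but raises the risk of unintended double triggers.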
2.1.3 Eye-tracking for online adaptation of system interaction

There is a much smaller body of work on on-line processing of eye movement data to interpret a user's behavior beyond interface operation, allowing real-time adaptation of the interaction. Some of this work uses gaze tracking to assess user mental states, e.g., whether the student is having difficulty in a reading task [32]. For example, in [28], Sibert et al. describe the development and testing of the Reading Assistant, a system for automated reading remediation. Users of the Reading Assistant read text from the screen while an eye-tracker monitors their gaze. Visual and auditory cues are given depending on where the student is looking or whether changes in gaze patterns indicate that the student is having difficulty reading a word. For example, if a student stares at a word for a certain amount of time, an audio recording of the word being spoken will play. Another example of the use of an eye-tracker in the assessment of mental states appears in [15]. Here Iqbal and Bailey perform a study which shows how eye-tracking can be used to determine which type of task the user is performing (e.g., reading email vs. reading a web page). This information can be applied to the design of an attention manager which balances the user's need for minimal disruption with an application's need to deliver necessary information. It also has important implications for interface design in general. Research has also been carried out on the use of gaze information to assess user mental states such as interest or problem-solving strategies. For example, in [30], Starker and Bolt describe a system which uses eye-tracking to determine the part of a graphical interface in which the user is interested. More information is then given about this area via visual zooming or synthesized speech.
In [22], Qu and Johnson use eye-tracking in an Attention Tracking Model to infer the learner's focus of attention while using the Virtual Factory Teaching System (VFTS), an online factory system for teaching engineering concepts and skills. The VFTS system includes a student model which assesses the motivational states of confidence, confusion and effort. This motivational model uses the time that the user spends after reading something but before taking action to assign a level of confusion (high, medium or low) to the user. In addition, the time the user spends reading, making a decision, and then taking an action is used to calculate an effort value for each task. A study carried out in [22] showed that this model was able to make accurate assessments of the learner's motivational states. These assessments could then be used by the system to provide tailored, proactive help to the user to motivate learning. Gluck and Anderson [13] studied the use of eye-tracking in the assessment of problem-solving strategies in the PAT Algebra I tutor. They used gaze patterns to detect cognitive behaviors such as shifts of attention, disambiguation of the problem statement and disambiguation of an error. Other fixation patterns were used to identify when the user failed to read or process error messages or other information critical to problem solving. This information about the user's cognitive behavior could then be used by the student model to adapt to the user's needs and thus create a better interaction. The work described in this thesis extends this body of research by exploring if and how eye-tracking can help assess mental states related to the meta-cognitive, domain-independent skill of self-explanation.

2.2 Research on Supporting Self-Explanation

Several computer-based tutors have been created to provide explicit support for self-explanation.
However, each of these systems aims to support and encourage students' self-explanations during highly structured activities which target problem-solving skills. For example, the Geometry Explanation Tutor [1] supports self-explanation as students solve problems in geometry theorem proving. This system prompts students to solve problems and requires that they enter an explanation for each of their solution steps. A user study [1] was also performed which compared this system to a similar version with the self-explanation prompts removed. Students using the tutor which required self-explanation were found to score considerably higher on post-tests than students who used the version which only included problem solving. Normit-SE [20] was devised as a problem-solving environment for learning data normalization. Students using this system are asked to self-explain while they solve problems. However, in contrast to the Geometry Explanation Tutor, Normit-SE only requires self-explanation when an action is performed for the first time or is performed incorrectly. A user study revealed that when students used this system, both their problem-solving performance and their ability to explain their actions improved. Neither the Geometry Explanation Tutor nor Normit-SE considers the student who spontaneously self-explains without the need for prompts or other tools. These students are needlessly interrupted, making the interface restrictive and cumbersome. The SE-Coach [8, 9] addresses this problem by modeling the user's self-explanation behavior. This model uses information on both knowledge and reading patterns to assess self-explanation during example studying in the domain of Newtonian physics. Reading patterns are tracked via a "poor man's eye-tracker" which requires that students move the mouse to explicitly uncover various parts of the studied example.
Thus the SE-Coach only prompts for self-explanation when its student model assesses that the student actually requires it. In open learning environments, minimizing unnecessary prompts to self-explain is of particular importance. At the same time, the unrestricted nature of the interaction in such an environment makes it difficult to model the user's cognitive state. The research discussed in this thesis involves a system which, like the SE-Coach, includes an assessment of self-explanation. However, while the SE-Coach mainly relies on time spent on interface actions to detect spontaneous self-explanation, this system also uses a cumulative assessment of the student's tendency to self-explain. Thus, it possesses a student model which has a broader set of information to use in determining when a user needs help with self-explanation.

2.3 Open Learning Environments

Open learning environments are designed to promote free exploration of the instructional material rather than structured, explicit instruction for student learning. By promoting unconstrained exploration of the interface, they force learners to take a more active role in their instruction. In theory, this type of learning should provide students with a deeper and more structured understanding of the domain material [27, 31]. However, some studies have shown that not all students possess the meta-cognitive skills necessary for effective exploration of open learning environments [6, 23]. These students require guidance from the system in order to initiate and perform useful experiments and correctly interpret and generalize the results. At the same time, unnecessary interruption of the student must be minimized to preserve the unrestricted nature of open learning environments. Thus the user's exploratory skills should be modeled by the system so that tailored support can be offered.
Smithtown [27] is an open learning environment designed to both improve a student's scientific inquiry skills and provide an environment in which users may discover concepts of macroeconomics. In Smithtown, a distinction is made between the "exploration phase" and the "experiment phase." In the exploration phase, students gather information and make observations about variables in an example economy. In the experiment phase, students perform experiments by changing variables concerning supply and demand and draw conclusions from the outcomes of their actions. While they use the system, their actions are constantly monitored for signs of good or bad behavior, and students who perform poorly are coached to become more effective problem solvers. Similarly, users who show evidence of good behavior are congratulated by the coach. However, Smithtown does not assist users in the exploration phase or address the needs of students unable to perform experiments in the first place. ACE [4] is an open learning environment for the domain of mathematical functions. It uses a student model to assess the student's ability to explore effectively, offering tailored hints when needed. Unlike Smithtown, ACE assists the learner throughout the exploration and provides support to students who appear unable to choose an appropriate action. In addition, ACE monitors the user's tendency to self-explain and supports this self-explanation through the use of customized prompts and other interface tools. This system will be described in detail in the next chapter.

Chapter 3

ACE and the Addition of Eye-tracking to Assess Self-Explanation

This chapter describes the ACE system and the benefits of adding eye-tracking data to the ACE student model. Sections 3.1 and 3.2 provide a review of the ACE interface and previous versions of the student model, respectively.
In section 3.3, the need to add gaze data to the assessment of implicit self-explanation is discussed.

3.1 The ACE Learning Environment

ACE [4] is an adaptive learning environment for the domain of mathematical functions designed to support learning through exploration. ACE's activities are divided into units and exercises. Units are collections of exercises whose material is presented with a common theme and mode of interaction. Exercises within units differ in function type and interaction. For completeness, all three units of ACE will be briefly described in this section. However, the third unit, the Plot Unit, will be the focus of all research described in this thesis.

Figure 3.1 The Machine Unit (left) and Arrow Unit (right)

The first unit, the Machine Unit, allows the user to input numbers into a function (represented in the interface by an arrow) by dragging them behind the tail of the arrow. The Machine Unit interface appears in the left hand side of figure 3.1. This type of exploration gives the user a sense of how functions transform an input value into an output value. This unit features four exercises for different types of functions: a constant function exercise, a linear function exercise, a quadratic monomial exercise and a quadratic polynomial exercise. For functions whose solution involves more than one calculation step (all but the constant function), users must press the step button to cause each operation to occur. By watching the calculations occur one step at a time, the user may acquire a greater understanding of the low-level process.
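To make the Machine Unit's step-by-step behaviour concrete, the sketch below decomposes a function into its elementary operations and yields one intermediate value per press of the step button. This is an illustration of the idea only, not ACE code; the decomposition and names are our own.

```python
# Illustrative sketch (not ACE code): a function such as f(x) = 2x + 3
# decomposed into elementary operations, applied one "step" at a time.

def machine_steps(x, operations):
    """Yield (label, intermediate value) after each elementary operation."""
    value = x
    for label, op in operations:
        value = op(value)
        yield label, value

# f(x) = 2x + 3 as two steps: multiply by 2, then add 3
f = [("multiply by 2", lambda v: 2 * v),
     ("add 3", lambda v: v + 3)]

for label, value in machine_steps(5, f):
    print(label, "->", value)
# multiply by 2 -> 10
# add 3 -> 13
```

A constant function involves a single step, which is why the step button is only needed for the other three exercise types.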
In the second unit, the Arrow Unit, users may experiment by drawing lines from input values to output values, as shown in the right hand side of figure 3.1. If the correct output is chosen for the input and the function, the line turns green; otherwise the line is red. Like the Machine Unit, the Arrow Unit consists of a constant function exercise, a linear function exercise, a quadratic monomial exercise and a quadratic polynomial exercise. The Arrow Unit is the only unit in ACE for which an action may be classified as "correct" or "incorrect"; thus it is also the only unit that allows students to explicitly demonstrate mastery of a skill or concept. Here students have the opportunity to test their understanding of the material presented in the Machine Unit.

The third and final unit of ACE is the Plot Unit. It will be the focus of all research described in this thesis. In the Plot Unit, students are presented with a function and its curve plotted in the Cartesian plane, as shown in figure 3.2. Here the student may drag the curve around the plane, causing the function below to change to match the curve in its new location. Alternatively, the user may change values in the function equation, causing the curve to change in the plane. This type of exploration is intended to give the users a sense of what plotted curves look like for different functions and what effect each parameter in the function has on the graph. The Plot Unit includes a constant function exercise, a linear function exercise, and a monomial function exercise.

To support the exploration process, ACE also includes a coaching component that provides tailored hints when ACE's student model predicts that students have difficulties exploring effectively [3]. In the next section we describe previous versions of the ACE student model.

3.2 Previous Versions of the ACE Student Model

In this section, previously developed versions of the ACE student model are reviewed.
The original ACE model was based entirely on student actions [4]. It used exploratory behavior alone to assess sufficient exploration. This model was later extended to predict whether or not each of these exploratory actions was self-explained by the student, allowing both for support for self-explanation via prompts and other tools and a more informed assessment of sufficient exploration [5].

Figure 3.2. The Plot Unit

3.2.1 The Original ACE Model

ACE's student model uses a Dynamic Bayesian network to manage the uncertainty in assessing students' exploratory behavior [5]. The main cause of this uncertainty is that neither exploratory behavior nor the related meta-cognitive skills are easily observable unless students are required to make them explicit. However, forcing students to articulate their exploration steps would clash with the unconstrained nature of open learning environments. The general structure of ACE's student model was derived from an iterative design process [4] which yielded a better understanding of what defines effective exploration. Figure 3.3 shows a high-level description of this structure, which includes several types of nodes used to assess exploratory behaviour at different levels of granularity:

• Relevant Exploration Cases: the exploration of individual exploration cases in an exercise (e.g., dragging the -7 square to the left of the arrow in the Machine Unit or changing the slope of a line to 3 in the Plot Unit)
• Exploration of Exercises: the exploration of individual exercises
• Exploration of Units: the exploration of individual units
• Exploration of Categories: the exploration of groups of relevant exploration cases that appear across multiple exercises (e.g.
all the exploration cases involving a positive slope in the Plot Unit)

Figure 3.3 High Level Structure of ACE's Student Model

The links among the different types of exploration nodes represent how they interact to define effective exploration [4]. Exploration nodes have binary values which represent the probability that the learner has sufficiently explored the items associated with the node. ACE's student model also includes binary nodes representing the probability that the learner understands the relevant pieces of knowledge (summarized by the node Knowledge in figure 3.3). The links between knowledge and exploration nodes represent the fact that the degree of exploration necessary to understand a concept depends on how much knowledge of that concept a learner already has. Knowledge nodes are updated only through actions for which there is a clear definition of correctness (i.e., actions in the Arrow Unit). Prior probabilities for these nodes can be estimated from existing information on a student's prior knowledge of the related topic (see Chapter 6 for more details of this process). These nodes are never updated within the Plot Unit since it only supports purely exploratory activities.

Early studies on ACE yielded encouraging evidence that the system based on the model in figure 3.3 could help students learn better from exploration [4]. However, these studies also showed that the ACE student model sometimes overestimated students' exploratory behavior because it regarded interface actions as sufficient evidence of good exploration, without considering whether or not a student was self-explaining the outcomes of these actions.
For instance, a student who quickly drags the curve around the screen in the Plot Unit - but never reflects on how these movements change the function equation - performs many exploratory actions but can hardly learn from them because she is not reflecting on (self-explaining) their outcomes. This exact behaviour was apparent in a number of subjects who tended to spend little time on their exploratory actions, and who did not learn the associated concepts.

3.2.2 Extending ACE to Monitor and Support Self-explanation

To address the model limitation described above, ACE's interface and student model were extended to track and support self-explanation [5]. The original version of ACE only generated hints indicating that a student should further explore some elements of a given exercise. Augmenting ACE with the means to track self-explanation allows ACE not only to detect when a student's exploration is sub-optimal, but also to recognize if the cause is a lack of self-explanation and provide tailored hints to correct this behavior.

The ACE student model was augmented to assess implicit self-explanation using only the time spent on each exploratory action as evidence [5]. The relation between time and self-explanation in the model was based upon the assumptions that (1) no self-explanation can happen if a student switches too rapidly from one exploration case to the next; (2) the longer a student dwells on a case, the more likely it is that she is trying to self-explain it. Figure 3.4 shows two time slices capturing implicit self-explanation (similar slices capture the occurrence of explicit self-explanation). Nodes representing the assessment of self-explanation are shaded grey. All nodes in the model are binary, except for time, which has values Low/Med/High.
In this figure, the learner is currently exploring exercise 0 (node e0) in the Plot Unit, which has two relevant exploration cases (e0case1 and e0case2 in fig 3.4). Each exploration case influences one or more exploration categories (positive and negative intercepts in the figure). In the first slice, the learner performed an action that corresponds to the exploration of e0case2. In the second slice, the learner performed an action corresponding to e0case1. In this version of the model, the probability that a learner's action implies effective exploration of a given case depends on both the probability that the student self-explained the action and the probability that she knows the corresponding concept, as assessed by the set of knowledge nodes in the model (summarized in figure 3.3 by the node Knowledge).

Factors influencing the probability that implicit self-explanation occurs include the time spent exploring the case and the stimuli that the learner has to self-explain. Low time is always taken as negative evidence for implicit self-explanation. The probability that self-explanation happened with longer time on action depends on whether there is a stimulus to self-explain. The probability that a stimulus exists depends on the learner's general tendency to self-explain and on whether the system generated an explicit hint to self-explain.

Figure 3.4. Original ACE model with self-explanation

Time, however, can be an ambiguous predictor of self-explanation. First, it is hard to define for different learners what constitutes insufficient time for self-explanation. Furthermore, a student may be completely distracted during a long interval between exploration cases.
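To make the role of time and stimulus concrete, the sketch below encodes the two assumptions above as a small conditional-probability lookup. The probability values and time thresholds are invented for illustration; they are not the CPTs of the actual ACE model.

```python
# Hypothetical sketch of P(implicit SE | time, stimulus), encoding the
# assumptions that low time rules out self-explanation and that longer
# dwell time is stronger evidence when a stimulus to self-explain exists.
# All numbers are invented for illustration, not taken from the model.

P_SE = {
    ("low", True): 0.05,  ("low", False): 0.05,   # too fast: SE unlikely
    ("med", True): 0.60,  ("med", False): 0.30,
    ("high", True): 0.85, ("high", False): 0.50,  # long dwell + stimulus
}

def bucket(seconds):
    """Discretize time on a case into Low/Med/High (thresholds invented)."""
    if seconds < 5:
        return "low"
    return "med" if seconds < 16 else "high"

print(P_SE[(bucket(20), True)])  # 0.85
```

Fixed numeric thresholds of this kind are precisely what makes time ambiguous: the same dwell time can mean reflection for one learner and distraction for another.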
Thus, we chose to explore an additional source of evidence of self-explanation behavior, i.e., the student's attention patterns during the exploration of a given case, as described in the next chapters.

Chapter 4
Preliminary User Study

This chapter describes the design and results of a user study carried out with ACE during the summer and autumn of 2004. The study demonstrates the potential for detection of implicit self-explanation through observation of user attention patterns. In addition, the study serves to compare the performance of latency and gaze patterns as predictors of implicit self-explanation. This experiment focuses entirely on the Plot Unit of ACE: while data was collected for all three units, only the students' explorations within the Plot Unit are analyzed for evidence of implicit self-explanation.

4.1 Using Eye-tracking Data to Assess Implicit Self-explanation

Self-explanation is more likely to occur if the student actually attends to the parts of the interface showing the effects of a specific exploratory action. Studies have shown that users tend to look at the region of the interface to which they are paying attention [21]. This naturally leads to the assumption that in an interface like ACE's, eye-tracking data on student attention patterns may be considered as a source of evidence of implicit self-explanation. As mentioned earlier, the student exploring the Plot Unit may either (i) drag the curve across the graph region, causing the function equation to change, or (ii) type new parameters into the function line, leading to a change in the curve's position or orientation. Thus, it seems reasonable to expect that in order for a student to implicitly self-explain an action, she must make a change in one region (the plane or function line) and then look at the other to observe the outcome.
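This expectation can be phrased as a simple rule: an action in one region followed by a fixation in the complementary region counts as evidence that the student attended to the outcome. A minimal sketch, with illustrative region names of our own:

```python
# Sketch of the hypothesized evidence rule (region names are ours):
# after changing one region (graph or equation), a fixation in the
# other region suggests the student attended to the action's outcome.

COMPLEMENT = {"graph": "equation", "equation": "graph"}

def attended_to_outcome(action_region, later_fixation_regions):
    """True if any fixation after the action lands in the other region."""
    return COMPLEMENT[action_region] in later_fixation_regions

print(attended_to_outcome("graph", ["graph", "equation"]))    # True
print(attended_to_outcome("equation", ["equation", "other"]))  # False
```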
A user study was then carried out to test these assumptions and collect quantitative data on the relationship between eye movements and implicit self-explanation. This chapter presents an analysis intended to compare the performance of the following predictors of self-explanation: (i) time, (ii) gaze shifts and (iii) a combination of the two.

4.2 Study Goal

The purpose of this study was to compare the performance of various predictors of implicit self-explanation. The study was carried out to test assumptions concerning which gaze patterns may be used as evidence of implicit self-explanation. It was further intended to yield data on the correlation between the presence of self-explanation and the time taken between exploratory actions.

4.3 Participants

Since participants in the study were going to be using ACE and the interaction was going to be scrutinized for signs of self-explanation, it was essential that the domain knowledge be as new to them as possible. The ideal participants were those with a minimal knowledge of mathematical functions and their graphs but with some initial understanding of the topic. Since ACE does not offer any background instruction, total novices would not know where to start. The 19 participants were students at the University of British Columbia who had not yet taken any college level math courses. They varied in the amount and level of mathematics that they recalled from high school. While it was never explicitly stated that users had to be non-science students, advertisements were placed on bulletin boards meant exclusively for psychology students. As a result, participants were all arts and humanities students. Participants were paid $10 per hour for their time.
Table 4.1 Experiment Schedule

Activity               Time taken
Pre-test               15 minutes
Calibration            10 minutes
Interaction with ACE   40 minutes
Post-test              15 minutes
Total                  80 minutes

4.4 Experiment Design

The study consisted of individual sessions lasting approximately 80 minutes each. Table 4.1 summarizes the structure of each session. At the beginning of each session, the participant would complete a pre-test intended to determine her initial knowledge of mathematical functions and their graphs (see Appendix A). This usually took about 15 minutes.

During interaction with ACE, a student's eye movements were recorded by an Eyelink I eye-tracker, developed by SR Research Ltd., Canada. This is a head-mounted eye-tracker which weighs 600 grams. It is fairly uncomfortable and intrusive to use, due both to its weight and to the presence of a small camera which hangs in the user's peripheral vision, as shown in figure 4.1. This particular eye-tracker was used because it was readily available through the psychology department at the University of British Columbia. However, the same data could be obtained through a completely non-intrusive remote eye-tracker, consisting of a small camera which sits on top of the monitor or on some other flat surface (e.g. IView X Red from SensoMotoric Instruments, USA).

Figure 4.1. The Eyelink I eye-tracker

After the eye-tracker was placed on the head of the participant, it had to be calibrated to correctly track her eye movements. For this to occur, the student had to watch a white dot appear at different locations across a blank screen, following it as closely as possible. The process often had to be repeated more than once to achieve acceptable calibration. This phase of the experiment usually lasted less than 10 minutes. Once calibration was achieved, ACE was started and the participant went through each of the units at her own pace. Each student was told to "think aloud" during the interaction. In fact, the following script was read to each student before opening ACE: "As you use this program, we would like you to speak aloud and explain what each action means to you. That is, why did you do it and what new information, if any, about functions, does it provide? Tell us whatever is going through your mind - even if it seems unimportant." This part of the experiment usually lasted about 40 minutes. After exiting ACE, each participant completed a post-test which was very similar to the pre-test and took about 15 minutes (see Appendix B). The post-test only differed from the pre-test in the constants used in the functions and the ordering of different questions. Due to the use of the eye-tracker, only one subject could participate in the experiment at a time.

In addition to the paper pre-test and post-test, each session yielded a log file from the ACE system. This contained each exploratory action as well as the time at which the action was taken. The eye-tracker also generated a log file containing the coordinates and duration of each fixation. In addition, video and audio recordings of the interaction were created showing the ACE screen and allowing for later analysis of the user's speech.

4.5 Data Analysis

In order to assess the performance of latency and gaze patterns as predictors of implicit self-explanation, it was necessary to first analyze the audio and video data for each session in order to classify students' self-explanation behavior. In particular, we needed to isolate exploratory actions accompanied by implicit self-explanation and those that were not. Then these classifications could be tested for correlations with those predicted by time and gaze shift data.
4.5.1 Setting the "gold standard"

To understand how attention patterns and time per exploration case relate to implicit self-explanation, it was necessary to obtain from the study data points on actual positive and negative self-explanation episodes. Here, "negative self-explanation" indicates situations in which the students did not self-explain, not situations in which students self-explained incorrectly, consistent with the original definition of self-explanation [7].

The audio recordings of each interaction were found to be the most useful for detecting the presence or absence of implicit self-explanation. It should be noted that the data collected for the purpose of classifying self-explanation behaviors (both positive and negative) consisted of the subjects' explicit verbalizations of their thoughts. As in other studies on self-explanation [e.g., 6, 18], using subjects' verbalizations is acceptable since the participants were instructed to share all of their thoughts and were not told anything about the data analysis process or the actual purpose of the study. Thus the episodes we related to the presence or absence of self-explanation can accurately be described as "internal," since they reflect the subjects' thoughts, which are unknown to the ACE system. Further, with existing technology, this is as close as we could come to reading the students' thoughts in our search for evidence of implicit self-explanation or lack thereof.

To maximize the objectivity in the analysis of the audio data, two observers (the author and another graduate research assistant) independently analyzed the audio data and then created links between the verbal episodes and the corresponding exploration cases in the log files. This turned out to be a much more laborious process than expected, due to two factors.

Table 4.2 Classification scheme for coding episodes of positive and negative self-explanation
Evidence of positive self-explanation:
• Verbalized conclusions about domain-specific principles related to the exploration process (regardless of correctness)
• Prediction of an action just before it occurred

Evidence of negative self-explanation:
• Simple narration of the interaction
• Isolated statements of confusion
• Expressions of inattentiveness

First, we needed to devise a detailed coding scheme in order to objectively convert fragments of audio data into isolated episodes of positive or negative self-explanation. While coding schemes exist for studying self-explanation during problem solving, this is the first attempt to evaluate self-explanation during independent exploration. This problem was addressed by having the two observers independently label a subset of the audio data, then compare their classifications, reconcile them where necessary, and devise the coding scheme based on this discussion. In the coding scheme, student utterances were classified as self-explanation if they expressed a conclusion about a domain-specific principle related to the exploration process (e.g., "when I increase the coefficient here, the line gets steeper"), regardless of correctness, or if they predicted the result of an action just before it occurred (e.g., "putting a negative sign here will turn the curve upside-down"). Simply narrating the outcome of each action once it happened (e.g., "this number just changed to a 3") or isolated statements of confusion (e.g., "I don't understand what's happening") were not considered self-explanation. However, tentative explanations followed by expressions of confusion were coded as self-explanation. Obvious statements of inattentiveness (e.g., "I'm just playing") were also coded as evidence of lack of self-explanation. This classification scheme is summarized in table 4.2.
It should also be noted that whenever an exploratory action was followed by evidence of both positive and negative self-explanation, the action was considered self-explained. The coded data for three episodes appears in figure 4.2. The following types of tags were used to describe each episode:

• ACE_TIME tags give the system time when the action occurs
• VID_TIME is an optional tag giving the time as kept by the video recorder
• ACTION tags describe the exploratory action that occurred (e.g. "Moved linear function" or "typed new exponent")
• SE_TYPE tags tell whether or not implicit self-explanation occurred as determined by the observers. Here a Y (yes), N (no) or ? (inconclusive) always appears
• SE_DESCRIPTION tags give the student's statement which led the observers to conclude that self-explanation did or did not occur. If the observers were unable to decide or agree upon whether or not it had occurred, this tag was left blank
• LOG_PTR tags give the action's line number in the ACE log file for quick reference

In some cases, it was very difficult to tell from the video recording which action was occurring when certain utterances were made by the student. This was a particular problem when the student was talking and moving the curve around very quickly. The VID_TIME tags were created to solve this problem as objectively as possible. When an utterance was made and it was not possible to tell which action was occurring at the time, this utterance was recorded in an SE_DESCRIPTION tag as in figure 4.2c. The VID_TIME tag gave the time as kept by the video recorder when the speech occurred and the SE_TYPE tag gave the observer's determination of the presence or absence of implicit self-explanation. The ACE_TIME, ACTION, and LOG_PTR tags were left blank as the information they were intended to contain was not yet known.
Next, using the time synchronization line appearing at the start of each coded file (see the top of figure 4.2), which gave the time at which ACE was started in both time formats, a post-processing program filled in the ACE_TIME tags for each of these utterances. This program then used the ACE_TIME information and the ACE log file to determine which action was occurring at the time and fill out the ACTION and LOG_PTR tags accordingly.

<ACE_TIME:16:59:23> <VID_TIME:02:38:34>

(a) <ACE_TIME:17:07:08>
    <VID_TIME:>
    <ACTION: Moved constant function>
    <SE_TYPE:N>
    <SE_DESCRIPTION: "I'm not sure what's going on">
    <LOG_PTR: 1057>

(b) <ACE_TIME:17:07:27>
    <VID_TIME:>
    <ACTION: Moved linear function>
    <SE_TYPE:Y>
    <SE_DESCRIPTION: "moving the line changes the y intercept in the equation">
    <LOG_PTR: 1127>

(c) <ACE_TIME:>
    <VID_TIME:02:43:65>
    <ACTION:>
    <SE_TYPE:Y>
    <SE_DESCRIPTION: "the exponent doesn't change when the curve is moved">
    <LOG_PTR:>

Figure 4.2 Coding of (a) negative and (b) positive self-explanation and (c) an utterance that has yet to be associated with an action

Each of the observers individually applied this scheme to code the audio data and then their results were compared. The intercoder reliability was 93% in this phase, which suggests a high level of objectivity in the classification scheme. Only episodes on which the coders fully agreed were used in the rest of the analysis.

A second factor which increased the complexity of data analysis was the difficulty in determining which action each coded utterance corresponded to. The observers at first assumed that subjects' utterances always pertained to whatever exploratory action they had just taken. Thus, associating each utterance with an action was as simple as matching up the times at which each occurred.
However, subsequent analysis of the video data showed that this was not always the case, particularly for users who showed great reluctance to think aloud. These learners had to be repeatedly prompted by the observers to speak, so some of the conclusions they shared were not reached when they spoke, but related to self-explanation that had occurred a few minutes earlier. The observers solved this problem by studying every coded episode and using its content to match it to its corresponding action. For example, if a user made a comment about even exponents, it was assumed that this self-explanation behavior pertained to an exploratory action which involved even exponents, even if such an action occurred slightly earlier. If a user made a verbal prediction about the outcome of the next action, this utterance had to be associated with the action that immediately followed. Thirteen coded episodes were discarded because the match was ambiguous.

While both parts of the above coding process resulted in the elimination of data points, the factor that had the greatest impact on the amount of data that could be obtained from the study was students' willingness to verbalize their thoughts. A number of students were incapable or unwilling to think aloud, even when periodically reminded to do so. Without such verbalization, the coders could not tell whether a student had self-explained or not. Thus, of the 567 exploration cases recorded in the log files for all students, only 149 could be classified in terms of associated self-explanation. Once positive and negative self-explanation episodes were identified and mapped onto specific exploration cases, an analysis could be carried out of the correspondence between these episodes, gaze information, and the time students devoted to each case.
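The synchronization step described in this section (converting an utterance's video time into ACE time and looking up the action in progress) can be sketched as below. This is a reconstruction of the described procedure, not the study's actual post-processing program; all names and times are illustrative.

```python
from bisect import bisect_right

# Reconstruction (not the study's program) of the tag-filling step:
# add the synchronization offset to an utterance's video time, then
# find the exploratory action in progress at that ACE time.

def fill_tags(vid_time, sync_offset, log):
    """log: list of (ace_time_in_seconds, action), sorted by time."""
    ace_time = vid_time + sync_offset
    times = [t for t, _ in log]
    i = bisect_right(times, ace_time) - 1  # last action at or before ace_time
    action = log[i][1] if i >= 0 else None
    return ace_time, action

log = [(100, "Moved constant function"),
       (140, "Typed new slope"),
       (190, "Moved linear function")]

print(fill_tags(55, 90, log))  # (145, 'Typed new slope')
```

With each utterance's ACE time filled in this way, the corresponding ACTION and LOG_PTR tags can be completed by the same lookup.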
4.5.2 Time as a predictor of self-explanation

To analyze the relationship between time per exploration case and self-explanation in the Plot Unit, the average times spent on exploration cases with and without self-explanation in this unit were computed and compared. When an exploratory action was accompanied by self-explanation, an average of 24.7 seconds elapsed before the next action. Exploratory actions without self-explanation were only followed by an average of 11.6 seconds of idle time. These results appear in table 4.3. The difference is statistically significant at the 0.05 level, suggesting that time per exploration case is actually a fairly reliable indicator of self-explanation.

Table 4.3 Time elapsed after actions with and without self-explanation

                       Self-explanation   No self-explanation
Average time elapsed   24.7 s             11.6 s
Standard deviation     13.89 s            9.59 s

To use time as a predictor of self-explanation, a threshold T had to be determined so that an action could be classified as self-explained if and only if the student spent more than T seconds on it. To choose the optimal threshold, a Receiver Operating Characteristic (ROC) curve (figure 4.3) was built. The ROC curve is a standard technique used in machine learning to evaluate the extent to which an information filtering system can successfully distinguish between relevant data (episodes the filter correctly classifies as positive, or true positives) and noise (episodes the filter incorrectly classifies as positive, or false positives), given a choice of different filtering thresholds. Figure 4.3 shows the ROC curve obtained for time, where each point on the curve represents a different threshold value. Points corresponding to even-numbered thresholds appear as small asterisks on the curve and several of these thresholds are labeled in the figure.
As is standard practice, the final threshold was chosen as the point on the curve which corresponds to a reasonable tradeoff between creating too many false positives and creating too few true positives. In this case, this threshold is 16 seconds, as labeled by the larger asterisk on the curve in figure 4.3. The reader should recall that both positive and negative self-explanations are verbalized, so the higher time for positive self-explanation is not an artifact of verbalization.

Figure 4.3 ROC curve for time as a filter for self-explanation. Asterisks indicate points corresponding to even-numbered thresholds

4.5.3 Gaze Shifts as a predictor of self-explanation

Raw eye-tracker data was parsed by a pattern detection algorithm developed to detect switches of attention ("gaze shifts") among the graph panel, the equation area, and any other non-conspicuous areas in the Plot Unit interface.* As mentioned in the previous chapter, these are the gaze patterns hypothesized to be associated with self-explanation in the Plot Unit. A sample gaze shift appears in figure 4.4. Here a student's eye gaze (shown as the dotted line) starts in some untracked area below the screen, moves to the equation region and then hovers around the graph region above. The data-parsing algorithm uses fixation coordinates from the eye-tracker and matches them to the appropriate ACE interface region. Next it searches the data for the pattern of looking at one region and then another, i.e. a gaze shift. When this pattern is found, a tag is placed in the ACE log file to synchronize the shift with the appropriate exploration case.

Figure 4.4. Sample gaze shift

An excerpt of such a synchronized log file appears in figure 4.5. Here the user begins by typing a new value for the slope into the equation (line 1 in the figure). Then several gaze shifts occur (lines 2-4).
An indirect gaze shift happens when the user looks from the graph (or equation) region in the interface to an irrelevant region (e.g. the help menu on the screen or the keyboard on the table) and then to the equation (or graph) region. A direct gaze shift involves looking directly from one tracked region to another. Next the user moves the curve without looking down at the function region (line 5). Finally the user moves the curve again and then shifts her gaze down to the function region and back (lines 6-8).

1 <EXPLORE><TEXT BOX>*17:11:22*new_slope 8
2 <INDIRECT FIX CHANGE>*17:11:23*Previous Region: Equation
3 <DIRECT FIX CHANGE>*17:11:24*Previous Region: Graph
4 <DIRECT FIX CHANGE>*17:11:24*Previous Region: Equation
5 <EXPLORE>*17:11:26*Moved power function to: newCoeff: 8.0 new yInt: 2.02 xInts: 1.3533333333333333
6 <EXPLORE>*17:12:19*Moved power function to: newCoeff: 8.0 new yInt: 2.02 xInts: -0.819999992847464
7 <DIRECT FIX CHANGE>*17:12:24*Previous Region: Graph
8 <DIRECT FIX CHANGE>*17:12:25*Previous Region: Equation

Figure 4.5 Excerpt of ACE log file with added gaze shift tags

¹ This algorithm was designed and implemented by Dave Ternes, an undergraduate research assistant in the computer science department at UBC.

Finally, a program is run which merges the synchronized log file with the coded data from the observers to create a file containing all data for one user in a concise form. An excerpt of such a file appears in Figure 4.6 below.

Action                 SE?   Gaze shift   Time>16s?
Typed new slope        Y     Y            N
moved power function   Y     N            Y
moved power function   N     N            N

Figure 4.6 Excerpt of user data file

4.6 Results

The dual histogram in figure 4.7 categorizes the 149 data points into episodes with and without self-explanation (99 circles and 50 triangles, respectively). The vertical line separates the points into those with and without a gaze shift between graph and equation pane in the plot unit.
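The region-matching and direct/indirect shift detection just described can be sketched as follows. The region rectangles, the fixation format, and the function names are invented for illustration; the actual parser (by Dave Ternes) operates on real eye-tracker fixation logs.

```python
# Minimal sketch of gaze-shift detection over a fixation stream.
# Region boundaries here are placeholders, not the real ACE layout.

REGIONS = {"graph": (0, 0, 800, 400), "equation": (0, 400, 800, 500)}
TRACKED = {"graph", "equation"}

def region_of(x, y):
    """Map a fixation coordinate to an interface region, or 'other'."""
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return "other"

def gaze_shifts(fixations):
    """Return a list of ('direct'|'indirect', from_region, to_region)
    events. A direct shift moves straight between the two tracked
    regions; an indirect one passes through untracked areas first."""
    shifts, last_tracked, saw_other = [], None, False
    for x, y in fixations:
        r = region_of(x, y)
        if r in TRACKED:
            if last_tracked and r != last_tracked:
                shifts.append(("indirect" if saw_other else "direct",
                               last_tracked, r))
            last_tracked, saw_other = r, False
        else:
            saw_other = True
    return shifts

print(gaze_shifts([(100, 450), (200, 100), (900, 600), (150, 420)]))
```

Each detected event would then be written back into the ACE log, time-stamped, as in figure 4.5.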
The horizontal line separates points with elapsed time above or below 16 seconds. The raw data also appears in table 4.4. In order to determine whether or not the addition of gaze shift data caused a change to the optimal threshold set for the time analysis, a new ROC curve was constructed for time and gaze shifts as a combined predictor. Here an exploratory action is classified as self-explained if it is accompanied by either 16 seconds of idle time or a gaze shift. The ROC curve, appearing in figure 4.8, shows that 16 seconds continues to be a reasonable threshold.

Figure 4.7. Dual histogram of experiment data (circles: cases with implicit self-explanation; triangles: cases without; columns: no gaze shift vs. gaze shift; rows: time below, at, and above 16 s)

Table 4.4 Raw experiment data

                               SE: Yes   No    Total
Gaze shift                     61        12    73
No gaze shift                  38        38    76
Time > 16                      71        16    87
Time <= 16                     28        34    62
Gaze shift or Time > 16        85        19    104
No gaze shift and Time <= 16   14        31    45
Total                          99        50    149

Table 4.5 shows different measures of self-explanation classification accuracy if the predictor used is: (i) the eye-tracker to detect gaze shifts; (ii) time per exploration case; (iii) a combination of the two. Accuracy is reported in terms of true positive rate (i.e. percentage of positive self-explanation cases correctly classified as such, or the sensitivity of the predictor) and true negative rate (i.e. percentage of negative self-explanation cases correctly classified as such, or the specificity of the predictor). A combined measure also appears, which is the average of the two accuracies. This average could also be weighted should it be decided that predicting the occurrence of self-explanation is more important for ACE pedagogical purposes than predicting the lack thereof (or vice versa).
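The sensitivity and specificity figures in table 4.5 follow directly from the counts in table 4.4. A quick check in Python, using the counts from the table above (results match table 4.5 up to rounding):

```python
# Recomputing the table 4.5 accuracies from the raw counts in table 4.4.
# There are 99 self-explained episodes and 50 without self-explanation.

def accuracy(tp, fn, tn, fp):
    """Sensitivity, specificity, and their unweighted average."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, (sens + spec) / 2

gaze     = accuracy(tp=61, fn=38, tn=38, fp=12)   # gaze shift alone
time     = accuracy(tp=71, fn=28, tn=34, fp=16)   # time > 16 s alone
combined = accuracy(tp=85, fn=14, tn=31, fp=19)   # either predictor fires

for name, (sens, spec, comb) in [("gaze", gaze), ("time", time),
                                 ("combined", combined)]:
    print(f"{name:9s} sens={sens:.1%}  spec={spec:.1%}  combined={comb:.1%}")
```

The weighted variant mentioned above would simply replace the unweighted average with `w * sens + (1 - w) * spec` for a chosen weight `w`.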
Figure 4.8. ROC curve for time and gaze shifts as a combined predictor. Here asterisks indicate points corresponding to even-numbered thresholds

As the table shows, time alone has a higher sensitivity than gaze shift, i.e. the episodes involving self-explanation were more likely to take over 16 seconds than to include a gaze shift. However, the eye-tracker alone has comparatively higher specificity, i.e. the cases without self-explanation were more likely to involve the absence of a gaze shift than a shorter time per exploration case. The two predictors have comparable combined accuracy.

An alternate analysis was performed to check if multiple gaze shifts would serve as a good predictor, with or without time. When two gaze shifts were required to indicate self-explanation, the sensitivity of the eye-tracker alone dropped to 51.7% and the specificity only rose to 79.5%, resulting in a combined accuracy of 60.6%. Adding time raised the combined accuracy to 67.9%, but this is still lower than the results for a single gaze shift. Requiring more than two gaze shifts lowered the sensitivity further, to unacceptable levels.

Table 4.5 Classification Accuracy of different predictors

                                   Eye-tracker   Time     Eye-tracker + Time
True Positive Rate (sensitivity)   61.6%         71.7%    85.8%
True Negative Rate (specificity)   76.0%         68.0%    62.0%
Combined Accuracy                  68.8%         69.85%   73.9%

4.7 Discussion

These results seem to suggest at first that the gain of using an eye-tracker is not worth the cost of adding this information to the ACE model. However, there are a few counterarguments to this conclusion. First, it should be noted that the accuracy of time is artificially high here. One of the drawbacks of using time as a predictor of self-explanation is that the amount of time elapsed tells the model nothing about a student's behavior between actions.
During a long time spent on a given case, a student may be doing or thinking of something completely unrelated to ACE. This seldom occurs in this data, but it should be kept in mind that the students were in a laboratory setting with little available distraction, in the presence of an observer, and wearing a rather intrusive device on their heads. All of these factors are likely to have made it more difficult for the students' thoughts to wander from the task at hand, resulting in time being a more reliable indicator of self-explanation than it would be in actual practice.

Second, the sensitivity of the eye-tracker as a predictor may be artificially low due to errors in the eye-tracking data. For some students, calibration of the eye-tracker proved very difficult. This was especially the case for participants with heavy eyelashes or thick glasses. The eye-tracker would function with a reading of "GOOD CALIBRATION" or "POOR CALIBRATION" but not "CALIBRATION FAILED." For several subjects, "POOR CALIBRATION" was the best that could be achieved. Also, calibration could sometimes be compromised when a student sneezed or touched her face. Problems with calibration would make it more difficult for the eye-tracker to detect eye movements, and thus some gaze shifts could go unrecorded. In fact, one student's data was entirely discarded immediately after her participation since, according to the eye-tracking data, her eyes stayed fixed on the same spot on the screen throughout her interaction with ACE. During her session with ACE, she performed many exploratory actions and made many comments consistent with self-explanation. Thus it was assumed that the eye-tracker must have been bumped or otherwise shifted, causing a complete loss of calibration. These calibration problems are specific to a head-mounted device and would likely be less of an issue with a desk-mounted one.
It should be noted that when we tried to recompute these accuracies using only the data points associated with the eleven students (out of nineteen) for whom "GOOD CALIBRATION" was achieved, sensitivity increased to 63.8%. Although this does not seem like a substantial increase, the reader should bear in mind that it is based on only 79 episodes (about 50% of the available data points) and possibly included students whose calibration was compromised during the interaction with ACE.

In episodes where the student appeared to self-explain without making a gaze shift, pre-test performance showed that the student already understood the related concept before using ACE 73% of the time. Thus most of these students may have been merely telling the observer which mathematical phenomenon was relevant to the exploration rather than learning something new through observation and self-explanation.

Finally, combining gaze shift and time into one predictor substantially improves sensitivity. That is, if an action is classified as self-explained when there is either a gaze shift or more than 16 seconds of elapsed time, most of the self-explanation episodes (85.8%) are correctly recognized. This increase also causes the combined accuracy to improve. However, as sensitivity increases, specificity is reduced and only 62% of the episodes that lack self-explanation are discovered by the model. This situation is shown in figure 4.7. With the combined model, all data points to the right of the vertical line or above the horizontal time threshold line are classified as self-explained. As a result, most of the episodes with self-explanation are found but many of those without self-explanation are incorrectly classified. Here a tradeoff appears between sensitivity and specificity. Depending on how the system is used, it may be more important to correctly classify self-explanation when it occurs than to detect the lack thereof.
This is the situation when letting natural self-explainers explore without interruption is given highest priority. Here, using the combination of eye-tracker and time data is best. Alternatively, it may be more important to make sure that the system intervenes wherever it is necessary. Then failing to identify a lack of self-explanation is a bigger problem than failing to detect it when it occurs. In this case, the eye-tracker alone is a more appropriate predictor because students who need help will be more likely to get it. However, it also means that more self-explanation episodes will be misclassified and spontaneous self-explainers will be more likely to be needlessly interrupted.

Given the above arguments, we felt that it was worthwhile adding eye-tracker data to the ACE model, and in such a way that allows for flexibility in deciding which predictor (or combination of predictors) to use. In the next chapter we describe this stage of the research.

Chapter 5 The New ACE Student Model

This chapter describes the design and implementation of a revised ACE student model which includes the use of eye-tracking data to detect implicit self-explanation. The design of this model is based largely on the results of the user study discussed in the previous chapter.

5.1 Defining "SE-related Behavior"

As discussed in the previous chapter, each of the predictors of self-explanation we investigated has advantages and disadvantages. Depending on the circumstances surrounding the use of ACE, one predictor may be more appropriate than the others. Time was shown in the last chapter to yield a reasonably high combined accuracy, slightly higher than that of the eye-tracker alone but not as high as that of the combined predictor. Both the specificity and the sensitivity of time are also high, suggesting that it is well-suited to situations where detecting positive and negative self-explanation are given equal importance.
When it is more important to catch the absence of self-explanation and intervene, the eye-tracker alone has proven to be a better predictor. In other settings, it may be most important to detect self-explanation when it occurs so that natural self-explainers are not needlessly interrupted. Then the combined predictor is most appropriate. Time is always simple and efficient to measure. In some situations an eye-tracker might not be available due to its cost or other factors. Then time would be the only predictor available. However, there may also be settings in which ACE users are surrounded by such high levels of distraction (e.g. a noisy classroom) that time would perform exceptionally poorly. Then an eye-tracker would be needed to present a much clearer picture of the focus of the student's attention. These results suggest the need for a high degree of flexibility in the structure of the part of the ACE student model which assesses implicit self-explanation.

5.2 Adding "SE-related behavior" to the model

This section describes the structure of the portion of the student model which assesses student implicit self-explanation behavior. The choice of structure is based on the study results described earlier. The conditional probabilities associated with this part of the model are also determined from the results of the user study.

5.2.1 The naive Bayesian classifier

The revised student model appears in figure 5.1 with a dashed triangle drawn around the part of the student model which was modified to include evidence of self-explanation. The gaze shift node has a binary value which indicates whether or not a gaze shift has occurred. Time is also a binary node indicating whether a time longer than the threshold T identified from the study data has elapsed, indicating sufficient time for self-explanation. All other nodes remain the same as shown in figure 3.4.
Nodes representing the assessment of self-explanation are shaded grey. The main advantage of this approach is that it is highly modular, allowing the gaze shift and time nodes to be included in the model or left out as needed. The disadvantage is that it assumes independence between time and the presence of gaze shifts, which is not necessarily true. In fact, our data actually suggests a slight positive correlation between the two. However, similar assumptions in pure naive Bayesian classifiers have been shown in practice to perform surprisingly well even when this independence cannot be guaranteed.

5.2.2 Setting the Conditional Probabilities

Another advantage of using a naive Bayesian classifier is the ease with which the associated conditional probabilities can be obtained from the data. In this part of the model, each node has only one parent, reducing the number of conditional probabilities that need to be determined. These can easily be calculated as the frequencies in the data from the user study described in the previous chapter:

P(time > 16 seconds | implicit SE) = 0.71
P(time > 16 seconds | no implicit SE) = 0.32
P(gaze shift | implicit SE) = 0.61
P(gaze shift | no implicit SE) = 0.24.

Further, these probabilities remain the same even as the gaze shift and time nodes are added and removed.

Figure 5.1 The ACE student model with a naive Bayesian classifier component

Figure 5.2 shows the revised student model over two time slices. In this model, after an exploratory action (e.g. the action represented by the node e0Casei in slice T) occurs, time is kept and eye movements are monitored until the next action.
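To make the update concrete, the following is a hand-rolled sketch of how these four conditional probabilities combine under the naive independence assumption. The prior of 0.5 and the function name are placeholders; in the full model the prior on implicitSE comes from the rest of the network (e.g. the SE tendency node).

```python
# Naive Bayes combination of the two evidence nodes, using the
# conditional probabilities listed in section 5.2.2. The prior is a
# placeholder for what the surrounding network would supply.

def posterior_se(prior, long_time, gaze_shift):
    """P(implicit SE | time evidence, gaze evidence), assuming the
    two observations are independent given implicitSE."""
    # (P(evidence | SE), P(evidence | no SE)) for each observation
    p_t = {True: (0.71, 0.32), False: (0.29, 0.68)}[long_time]
    p_g = {True: (0.61, 0.24), False: (0.39, 0.76)}[gaze_shift]
    like_se    = p_t[0] * p_g[0] * prior
    like_no_se = p_t[1] * p_g[1] * (1 - prior)
    return like_se / (like_se + like_no_se)

print(round(posterior_se(0.5, long_time=True, gaze_shift=True), 3))
print(round(posterior_se(0.5, long_time=False, gaze_shift=False), 3))
```

With a flat prior, observing both long idle time and a gaze shift pushes the posterior well above 0.8, while observing neither pushes it below 0.2, which is the qualitative behavior the classifier needs.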
When the next action is carried out, a new slice is created; in parallel, if the new action is not an action indicating implicit self-explanation (using a selection of predefined self-explanations in an interface menu), the implicitSE node for the previous action is updated according to the stimuli to SE, time and gaze shift nodes. An implicitSE node is created for the previous action in slice T, along with time and gaze shift nodes with the appropriate values. These new nodes are used to assess the effectiveness of the exploration case, updating the corresponding node, e0Casei, in figure 5.2.

Figure 5.2 The ACE student model for two time slices

Chapter 6 Testing the Student Model

This chapter discusses data collection and analysis carried out to test the performance of the revised ACE student model. First, section 6.1 presents early tests done to check the performance and consistency of the new student model. Section 6.2 describes the setting of additional conditional probabilities in the model using the data from the previous user study. Then section 6.3 describes the collection of further user data for use in tests of the new model. This was largely a continuation of the previous user study with some slight changes. Section 6.4 discusses early tests of the accuracy of the new student model. Finally, section 6.5 gives the results of cross-validation tests performed to check the stability of the new model and get a better estimate of the model's accuracy.

6.1 Initial tests of the new model

This section discusses early tests of the new model to check its performance and consistency. These tests were done primarily to make sure that the model was correctly implemented and that it behaved as expected. They were also intended to assess whether or not the revised structure was appropriate. Here the data from the previous user study was used to evaluate the model.
This training data was used to test basic model performance. The interaction data from each student (including eye-tracking data) was given to the model as input using a simulated student program. Then the resulting implicitSE node and exploration node probabilities were compared to the previously used self-explanation data points and post-test scores. To get a better idea of the value of adding eye-tracking to the ACE student model, the two other previously developed models are also tested and compared in this chapter: (i) the one which does not assess self-explanation at all (model 1 from now on) [4] and (ii) the one which uses time only to detect implicit self-explanation (model 2 from now on) [5]. For purposes of consistency, the latter of these models was also assigned a time threshold of 16 seconds. In order to test the new model (model 3 from now on), the SE tendency node and each of the knowledge nodes were assigned a generic prior probability of 0.5. Thus the model began with an identical assessment of the knowledge and behavior of each student. Generic prior probabilities were also used in models 1 and 2.

To test model accuracy in assessing implicit self-explanation, we needed a threshold probability to use in deciding when an implicitSE node predicts the occurrence of self-explanation. This was derived from the data from the previous study as follows. First the interaction data for each student was run through each of the models which assess self-explanation (i.e., models 2 and 3) using a simulated student program. The probabilities of the implicitSE nodes were then compared against the coded data points from the first study. Each data point corresponds to a user action which the experts determined was or was not self-explained. The implicitSE node in each model (see Figures 3.4 and 5.2) also yielded probabilities that self-explanation occurred at the time of this action.
These probabilities were compared to the expert assessments to test the predictive performance of each model. For purposes of clarification, a small fragment of this data is given in Table 6.1 below.

Table 6.1 Values of implicitSE nodes corresponding to actions in study data

Action   Experts' SE assessment   Model 2 (time only)   Model 3 (time and gaze shifts)
1        Y                        0.698                 0.723
2        N                        0.287                 0.180
3        Y                        0.409                 0.645

For each model, an optimal threshold M had to be determined such that the model showed that a student self-explained an action if and only if the implicitSE node associated with the action returned a probability over M. To find the best threshold, ROC curves were constructed. These appear in figure 6.1. Each of these curves yielded an optimal threshold which was used to find the accuracies of each model. For the model which only uses time to detect self-explanation, this threshold was found to be 0.56, and for the new model which adds gaze shifts as evidence, the optimal threshold was 0.45. Table 6.2 shows the sensitivity and specificity in each case. A combined measure, the average of the two accuracies, is also reported. Here the addition of the eye-tracker causes an increase in each of the measures. It should also be noted that the largest increase is in specificity. This is consistent with the assumption that time as a predictor causes too many false positives by assuming that the user spends all of the time between her actions considering the outcomes of those actions.
Figure 6.1 ROC curves for models as predictors of implicit self-explanation

Table 6.2 ImplicitSE accuracies with generic prior probabilities using training data

                                   Model with time only for SE   Model with time and gaze shifts for SE
True Positive rate (sensitivity)   69.4%                         73.2%
True Negative rate (specificity)   66.3%                         77.4%
Combined Accuracy                  67.9%                         75.3%

All three models were then evaluated as predictors of sufficient exploration over the training set. Recall that in the preliminary user study, participants completed a post-test of mathematical concepts immediately after interacting with ACE. A correspondence was then created between questions on this post-test and domain concepts. These questions were then used to determine each student's aptitude in each of these concepts (e.g., positive intercepts and even exponents) at the end of the experiment. In addition, when a student's log file was run through any of the three models, the final probabilities of the exploration-of-categories nodes (e.g., the nodes positive intercepts and negative intercepts in figure 3.4) represent the model's assessment that the student understands these concepts at the end of the interaction (i.e., that the student had adequately explored this material). These assessments could then be compared with the corresponding post-test scores to evaluate model accuracy over effective exploration. That is, if the model returned a high probability that the user understood a concept and the user correctly answered problems on the post-test that dealt with the concept, then the model was assumed to be a good predictor of sufficient exploration of the concept. For each of the three models, a ROC curve was used to determine the best threshold at which an exploration node could be found to indicate adequate exploration (and thus knowledge) of the material.
All three of these ROC curves are given in figure 6.2. The optimal thresholds found for the model without self-explanation, the model which only uses time to assess it, and the model which uses eye-tracking data as well are 0.66, 0.58 and 0.44, respectively.

Figure 6.2 ROC curves for models as predictors of sufficient exploration

Using these thresholds, the accuracies of the exploration nodes in each of the models could be computed. These appear in table 6.3. These accuracies show that the new model is well-suited to assess adequate exploration in the training data. Also, each of the accuracies increased with each successive model, suggesting that the addition of self-explanation and gaze shift data were improvements to the model. These results also suggest that an increase in the accuracy of implicit self-explanation detection does in fact cause an increase in the accuracy of exploration assessment.

Table 6.3 Exploration node accuracies with generic prior probabilities using training data

                                   Model without SE   Model with time only for SE   Model with time and gaze shifts for SE
True Positive rate (sensitivity)   68.8%              74.3%                         77.4%
True Negative rate (specificity)   59.4%              76.3%                         78.3%
Combined Accuracy                  63.7%              75.3%                         77.9%

It should also be noted that the increase in accuracy caused by the addition of the eye-tracker is higher for the implicitSE nodes than for the exploration nodes. This is due to the difference in the way each is measured. Each implicitSE probability is taken at the time that the action occurs, while only the probabilities of the exploration nodes at the end of the interaction are used in the analysis.
Thus, while the implicitSE nodes represent the state of the user at a certain time and are strongly affected by the presence or absence of a gaze shift, the exploration nodes' final probabilities are the result of many actions throughout the interaction and are influenced by other factors. Given these results, it can be concluded that the primary benefit of adding eye-tracking versus time only is the more accurate assessment of implicit self-explanation, which allows ACE to generate more precise real-time interventions during student interaction with the system.

6.2 Additional Training of the Student Model

After the revised model was tested for consistency, the data from the previous user study was used to set additional conditional probabilities in the student model. When the user study was carried out, the ACE coach hints were turned off. Therefore, no information was gathered about the effectiveness of coach hints as a stimulus to self-explain. Further, no data was available concerning whether or not a stimulus to self-explain actually caused the student to self-explain. In order to better fit this part of the model to the available data, the coach hint and stimuli to SE nodes were removed and the direct relationship between tendency to self-explain and implicit self-explanation was modeled instead. Two time slices of the resulting model appear in figure 6.3. The 18 study participants were then divided into self-explainers (those who self-explained at least 20% of the time) and non-self-explainers (those who did not). This resulted in 14 self-explainers and 4 non-self-explainers. It was then found from the data that the self-explainers self-explained 79.8% of the time and the non-self-explainers self-explained 13.3% of the time. These frequencies were then used to set the following conditional probabilities in the model:

P(implicitSE | SE Tendency) = 0.80
P(implicitSE | no SE Tendency) = 0.13.
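The same frequency-counting recipe can be sketched in a few lines. The per-student rates below are invented; only the 20% cutoff mirrors the text, and averaging within each group is one plausible way to obtain the group rates (the thesis derives 0.80 and 0.13 from the 18 real participants).

```python
# Estimating the implicitSE CPT from per-student self-explanation
# rates. The rates here are made up for illustration.

def cpt_from_rates(rates, cutoff=0.20):
    """Split students into self-explainers (rate >= cutoff) and
    non-self-explainers, then average the rates within each group."""
    se  = [r for r in rates if r >= cutoff]
    non = [r for r in rates if r < cutoff]
    return sum(se) / len(se), sum(non) / len(non)

p_se, p_non = cpt_from_rates([0.9, 0.85, 0.7, 0.75, 0.1, 0.15])
print(f"P(implicitSE | SE tendency) is about {p_se:.2f}")
print(f"P(implicitSE | no SE tendency) is about {p_non:.2f}")
```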
Figure 6.3 Two time slices of the adjusted ACE student model

It should also be mentioned here that the pre-test and post-test scores were compared for each student, and a mean increase of 24% from the pre-test to the post-test was found for the self-explainers while the non-self-explainers achieved a mean increase of only 5.7%. The difference between the self-explainers' improvement and the non-self-explainers' improvement was also determined to be statistically significant at the p<0.05 level, confirming that self-explanation has a significant effect on overall learning.

6.3 New User Study

Since the training data was used to derive the conditional probabilities used in the student model, new user data was needed to reliably test the accuracy of the model. Thus, we conducted a second study during which we collected data from 18 new students interacting with ACE as before, and this study yielded the same form of data as the previous study. The new participants were university students who had not taken any college-level math, also as before. However, two changes were made in an effort to solve earlier problems and improve the data overall.

First, the previous experiment showed that users' reluctance to think aloud made it difficult to glean episodes of positive and negative self-explanation from the study data. We thought this could be due in part to the fact that by the time these users got to the plot unit, they rushed through it, speaking as little as possible, because they were tired and eager to have the heavy and intrusive eye-tracker taken off. In order to get more data points, new study participants were asked to skip the arrow unit. Since the arrow unit serves as more of a test of the student's knowledge than as a chance for the student to explore and observe, it contributes little to a study of self-explanation behavior during exploration anyway.
It was hoped that if students got to the plot unit sooner, they would stay longer and be more willing to think aloud. The machine unit was left in the experiment as it gave the users a chance to get used to thinking aloud before arriving at the plot unit. Second, participants in the new study were asked to turn off the monitor and stare at a blank screen for several seconds at the end of the experiment. This was done in order to collect a baseline pupil size against which pupil size measurements throughout the interaction could be compared. This work is described in greater detail in Chapter 7.

6.4 Testing the New Student Model

This section discusses research carried out to evaluate the performance of the new student model with the revised structure shown in Figure 6.3. Here analysis is performed to test the sensitivity of the model assessments to the prior probabilities of the knowledge and SEtendency nodes. For simplicity, all tests are carried out over the new user data (the testing set).

6.4.1 Testing the Accuracy of the New Model with Different Priors

This section describes different tests of the accuracy of the student model using the new user study data (the testing set). First the SEtendency node and each of the knowledge nodes are assigned a prior probability of 0.5. Then tailored prior probabilities are used for the knowledge nodes, and in later tests, the prior probability of the SEtendency node is customized as well. Throughout this subsection, all "training" of the model was done using the user data from the preliminary study of Chapter 4. As described in Chapter 5 and section 6.2, frequencies in this data were used to set conditional probabilities in the implicit self-explanation part of the new model.
In addition, each accuracy presented here was based on a probability threshold (i.e., a value M for which a node was said to return a YES if it had a probability greater than M) which was derived from a ROC curve over the preliminary user data. As in section 6.1, the exploration and SEtendency nodes were assigned generic prior probabilities of 0.5. Thus each model began with no initial information about each particular student. Generic prior probabilities were also used in the model which did not assess self-explanation and that which only used time. Next the accuracy of the implicitSE nodes over the testing set was calculated for each model. The first two rows of Table 6.4 show the sensitivity and specificity in each case. A combined measure, the average of the two accuracies, is reported as well. Here the addition of the eye-tracker causes a substantial increase in each of the measures. As in section 6.1, the largest increase is in specificity. This is consistent with the assumption that the use of eye-tracking data will catch many of the false positives inherent in the use of time as a predictor. In order to gain a more complete picture of the model's implicitSE assessment over the testing data, a ROC curve was created for each model over this data. These curves appear in Figure 6.4. As discussed in [12], the area under a ROC curve is equal to the probability that a randomly selected positive case will be given a higher probability by the model than a randomly selected negative case. Thus ROC curves with larger areas correspond to better predictors over the data.
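This rank interpretation gives a direct way to compute the area without tracing the curve: count, over all (positive, negative) pairs of model outputs, how often the positive case receives the higher probability, with ties counting half. A sketch with made-up scores:

```python
# Area under the ROC curve via the rank (Mann-Whitney) interpretation:
# the fraction of (positive, negative) pairs ranked correctly.

def auc_by_ranking(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Synthetic model outputs, not study data: 7 of the 9 pairs are
# ranked correctly, so the area is 7/9.
print(auc_by_ranking([0.9, 0.7, 0.6], [0.8, 0.4, 0.3]))
```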
Table 6.4 ImplicitSE node accuracies over testing data for different prior probabilities

Generic prior probabilities for knowledge nodes
                                   Model with time only for SE   Model with time and gaze shifts for SE
True Positive rate (sensitivity)   65.1%                         71.6%
True Negative rate (specificity)   62.6%                         74.3%
Combined Accuracy                  63.9%                         73.0%

Customized prior probabilities for knowledge nodes
True Positive rate (sensitivity)   65.6%                         72.2%
True Negative rate (specificity)   62.8%                         74.6%
Combined Accuracy                  64.2%                         73.4%

Customized prior probabilities for knowledge nodes and the SE tendency node
True Positive rate (sensitivity)   68.8%                         74.2%
True Negative rate (specificity)   65.2%                         77.6%
Combined Accuracy                  67.0%                         75.9%

Figure 6.4 ROC curves for models as predictors of implicit self-explanation over testing data with generic prior probabilities

The first two rows of table 6.5 give the area of each curve as well as the standard error for each area. These values show improved performance with the addition of the eye-tracker. This improvement was found to be statistically significant at the z > 1.96 level, as indicated by the dotted line in the table.

Table 6.5 Areas of ROC curves for implicitSE nodes. Here the dotted lines indicate statistically significant differences

Generic prior probabilities for knowledge nodes
                 Model with time only for SE   Model with time and gaze shifts for SE
ROC curve area   0.62                          [illegible]
Standard Error   0.042                         0.036

Customized prior probabilities for knowledge nodes
ROC curve area   [illegible]                   0.84
Standard Error   0.037                         0.029

Customized prior probabilities for knowledge nodes and the SE tendency node
ROC curve area   [illegible]                   [illegible]
Standard Error   0.035                         0.025

For each model, the accuracies of the exploration nodes were then computed. These appear in the first three rows of table 6.6. Each of the accuracies increased with each successive model, suggesting that the addition of self-explanation and gaze shift data were in fact improvements.
It also confirms that an increase in the accuracy of implicit self-explanation detection does in fact cause an increase in the accuracy of exploration assessment. For each model, ROC curves were then created for the exploration node assessments over the testing data. These appear in Figure 6.5. The areas under the curves and the standard error for each area are displayed in the first two rows of table 6.7. The differences between each of the areas were also found to be statistically significant at the z > 1.96 level, as indicated by the dotted lines in the table. Before each student in the user study used ACE, she completed a pre-test which covered all of the relevant mathematical concepts. For the second test of the new model, these results were used to pick prior probabilities for the knowledge nodes for each student. If the student answered the corresponding pre-test items correctly, a prior probability of 0.85 was assigned to the corresponding knowledge node. Otherwise a probability of 0.15 was given. These values were chosen as reasonable estimates since they are close to 1 and 0 but still allow for students who guess correctly or make errors even though they understand the concept. Early analysis also showed that small changes in the prior probabilities (e.g., using 0.9 and 0.1 instead) have very little effect on the resulting model assessments.
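The mapping from pre-test results to customized knowledge-node priors described above can be sketched as (function name is mine, not from the thesis):

```python
def knowledge_prior(answered_correctly):
    """Customized prior for a knowledge node based on the corresponding
    pre-test item. 0.85/0.15 rather than 1/0 leaves room for lucky
    guesses and for slips by students who do understand the concept."""
    return 0.85 if answered_correctly else 0.15
```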
Table 6.6 Accuracies of exploration nodes for different prior probabilities

                                      Model        Model with time   Model with time and
                                      without SE   only for SE       gaze shifts for SE
Generic prior probabilities for knowledge nodes
  True positive rate (sensitivity)    62.7%        70.4%             73.9%
  True negative rate (specificity)    55.2%        71.5%             76.3%
  Combined accuracy                   59.0%        71.0%             75.1%
Customized prior probabilities for knowledge nodes
  True positive rate (sensitivity)    64.3%        72.2%             76.2%
  True negative rate (specificity)    58.8%        74.9%             78.4%
  Combined accuracy                   61.6%        73.6%             77.1%
Customized prior probabilities for knowledge nodes and the SEtendency node
  True positive rate (sensitivity)    n/a          74.5%             78.1%
  True negative rate (specificity)    n/a          76.7%             79.6%
  Combined accuracy                   n/a          75.6%             78.9%

With these customized prior probabilities, the accuracies of the implicitSE and exploration nodes were computed for each model. These accuracies appear in the middle three rows of Tables 6.4 and 6.6. Each set of accuracies increases with the customization of the prior probabilities of the knowledge nodes. However, the increases for the implicitSE nodes are smaller than for the exploration nodes, as knowledge affects the assessment of sufficient exploration more directly than it affects the presence of implicit self-explanation in the model. As with the case for generic prior probabilities, each new model causes an increase in accuracy. In addition, ROC curves were created for the implicitSE and exploration assessments of each model over the testing data. The middle two rows of tables 6.5 and 6.7 give the areas under each curve as well as the standard errors for each of these areas. For each model, customizing the knowledge prior probabilities leads to an increase in area which is statistically significant at the z > 1.96 level, as indicated by the dotted lines in the tables.
Figure 6.5 ROC curves for models as predictors of sufficient exploration over testing data with generic prior probabilities (curves shown: no SE; time only for implicitSE; time and eye-tracking for implicitSE; x-axis: rate of false positives)

As was done in section 6.2 for the original study data, the new study participants were divided into self-explainers - those who self-explained at least 20% of the time - and non-self-explainers - those who did not. This classification was used to customize the SEtendency node prior probabilities for each student. For self-explainers, this node was assigned a prior probability of 0.85 and for non-self-explainers, a value of 0.15 was used. As with the knowledge nodes, these values were chosen as they are close to 1 and 0, and tests revealed that the overall model has little sensitivity to small changes in these prior probabilities. With the tailored SEtendency nodes, the implicitSE and exploration node accuracies were determined for each model. These values appear in the lower three rows of Tables 6.4 and 6.6. In each of the models which included self-explanation, tailoring the prior probabilities of the SEtendency node caused an improvement of the performance in assessing both implicit self-explanation and sufficient exploration. It should also be noted that the accuracies were further improved by the addition of the eye-tracker.

Table 6.7 Areas of ROC curves for exploration nodes. Dotted lines indicate statistically significant differences

                                      Model        Model with time   Model with time and
                                      without SE   only for SE       gaze shifts for SE
Generic prior probabilities for knowledge nodes
  ROC curve area                      0.56         0.76              (illegible)
  Standard error                      0.036        0.033             0.029
Customized prior probabilities for knowledge nodes
  ROC curve area                      0.64         (illegible)       0.84
  Standard error                      0.034        0.031             0.024
Customized prior probabilities for knowledge nodes and the SEtendency node
  ROC curve area                      n/a          (illegible)       (illegible)
  Standard error                      n/a          0.027             0.021

For each model, ROC curves were then constructed for the implicitSE and exploration assessments over the testing data. The last two rows of tables 6.5 and 6.7 give the areas under each curve as well as the standard errors for each of these areas. For each model, the increase in area which resulted from customizing the SEtendency prior probabilities failed to achieve statistical significance. This is perhaps due to the fact that while SEtendency is maintained in the model as a probability that changes over time and always depends on previous values, the dependence between the current tendency to self-explain and the presence or absence of self-explanation is very strong (recall that P(implicitSE | SE tendency) = 0.80 and P(implicitSE | no SE tendency) = 0.13). Thus, for each action, SEtendency is greatly affected by such factors as time and gaze shifts, and the effects of the prior probability can dissipate over time. It should also be mentioned that it is difficult to select an appropriate value for this prior probability, as many students seem to self-explain about half the time; for them, assigning a prior probability of 0.85 is actually less accurate than the originally used value of 0.5. In summary, the addition of the eye-tracker causes a significant improvement in the performance of the model as a predictor of implicit self-explanation and sufficient exploration over the testing data. Customizing the prior probabilities of the knowledge nodes also significantly increases model accuracy. However, customizing the prior probabilities of the SEtendency node fails to produce a significant improvement.
6.5 Cross-validation Analysis

This section describes cross-validation analysis carried out to test the overall stability of the new model and to get a better sense of the strength of the accuracy results of the last section. Three different types of cross-validation were performed in each of the experiments described in this section. The first was leave-one-out, in which the model was trained - i.e., the conditional probabilities were set from frequencies - on the data from all but one student. The model was then tested over the data from the remaining student. This was done for each of the 36 students and the results were averaged. This type of cross-validation is the most important because it gives the best sense of how the model behaves for one student. The second type of cross-validation was leave-half-out, in which the students were randomly divided in half into a training set and a testing set. Frequencies from the first set were used to set conditional probabilities in each model, which was then tested over the latter set. This was performed 40 times and the average accuracy over the students was calculated. This type of cross-validation shows the mean behavior of the model across a group of students. Finally, leave-half-points-out involved treating all of the data points as belonging to one student and then randomly dividing them into a testing set and a training set. This was done because some students had more data points than others, and this ensures that the data is truly divided in half. In addition, it gives an alternate view of the performance of the models over the data independent of which students are being used. As with leave-half-out, 40 folds of the cross-validation were performed and the accuracies were averaged.

6.5.1 Customizing prior probabilities

The first set of tests was performed to determine if and how the prior probabilities for knowledge nodes should be assigned.
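The three splitting schemes used in the experiments of this section can be sketched as follows (generator names are mine, not from the thesis):

```python
import random

def leave_one_out(students):
    """Yield (train, test) splits, holding out one student at a time."""
    for i in range(len(students)):
        yield students[:i] + students[i + 1:], [students[i]]

def leave_half_out(students, folds=40, seed=0):
    """Randomly split the students in half, `folds` times."""
    rng = random.Random(seed)
    for _ in range(folds):
        shuffled = students[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        yield shuffled[:half], shuffled[half:]

def leave_half_points_out(points, folds=40, seed=0):
    """Pool all data points, ignoring student identity, and halve them,
    so the data is truly divided in half on each fold."""
    rng = random.Random(seed)
    for _ in range(folds):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        yield shuffled[:half], shuffled[half:]
```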
The decision was made at this point to always use generic prior probabilities for SEtendency nodes, both because it is unrealistic in practice to assume that a student's self-explanation tendencies have been tested before using ACE and because the previous tests showed that the custom priors do not significantly improve accuracy. Mathematical competence, however, can easily be tested through the use of pre-tests or some other means. Here we ran model 3 with three types of prior probabilities for knowledge nodes: (i) the generic setting of 0.5, (ii) a population prior probability based on the average pre-test performance of the test set, and (iii) a probability (0.15 or 0.85) based solely on the pre-test score of the individual student.

Table 6.8 Cross-validation results for implicitSE nodes using various prior probabilities

                      Generic priors        Population priors     Custom priors
                      Mean acc.  Std dev    Mean acc.  Std dev    Mean acc.  Std dev
Leave-one-out         71.6%      7.9%       72.5%      7.8%       75.2%      7.2%
Leave-half-out        70.4%      9.7%       73.6%      11.3%      76.4%      6.4%
Leave-half-pts-out    71.7%      8.5%       73.8%      8.8%       76.1%      7.9%

Each type of cross-validation was carried out and the accuracy results for the implicitSE nodes are shown in table 6.8. For brevity, only the combined accuracies are given. Each type of cross-validation shows that performance improves as the prior probabilities of the knowledge nodes are increasingly customized. ANOVA tests showed statistically significant differences within each set of results, but T-tests revealed that, in each case, only the increase in accuracy from generic priors to custom priors is statistically significant at the p < 0.05 level.
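The pairwise comparisons above can be sketched with a Welch two-sample t statistic in pure Python (the thesis does not specify which t-test variant was used, so this is illustrative; the sample values are the custom- and generic-prior accuracies from Table 6.8):

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's two-sample t statistic for the difference in means of
    two samples with possibly unequal variances."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

a = [75.2, 76.4, 76.1]   # custom-prior combined accuracies (Table 6.8)
b = [71.6, 70.4, 71.7]   # generic-prior combined accuracies
t = welch_t(a, b)
```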
Customizing prior probabilities to individual students also improves model stability, as shown by the decreases in standard deviation. It should also be noted that these results are very similar across the different types of cross-validation. This is because each value is the average of many accuracy values based on model assessments from the same group of students.

Table 6.9 Cross-validation results for exploration nodes using various prior probabilities

                      Generic priors        Population priors     Custom priors
                      Mean acc.  Std dev    Mean acc.  Std dev    Mean acc.  Std dev
Leave-one-out         71.6%      11.6%      72.6%      10.7%      76.8%      8.7%
Leave-half-out        72.7%      9.4%       73.2%      9.8%       75.7%      7.8%
Leave-half-pts-out    70.7%      10.6%      72.9%      9.4%       76.2%      8.3%

The results for the exploration nodes appear in table 6.9. As with the implicitSE nodes, accuracy increases as prior probabilities go from generic to population to custom. However, only the increase caused by changing the priors from generic to custom is statistically significant. Stability also increases as prior probabilities are customized.

6.5.2 Comparing the different models

Here the student model without self-explanation, the model which only uses time to assess it, and the new model which uses gaze shifts as well are tested and compared. In each case, individual prior probabilities are used for the knowledge nodes in each model.

Table 6.10 Cross-validation results for implicitSE nodes using various student models

                      Model with time       Model with time
                      only for SE           and gaze shifts
                      Mean acc.  Std dev    Mean acc.  Std dev
Leave-one-out         65.8%      7.4%       75.2%      7.2%
Leave-half-out        66.3%      8.3%       74.7%      7.6%
Leave-half-pts-out    65.7%      7.2%       75.8%      6.7%

Each type of cross-validation was carried out and the accuracy results for the implicitSE nodes and exploration nodes are shown in tables 6.10 and 6.11, respectively.
For each type of node, the addition of self-explanation and then eye-tracking increased model accuracy and stability. ANOVA tests also showed statistical significance within each set, and pairwise T-tests confirmed that each increase was significant at the p < 0.05 level.

Table 6.11 Cross-validation results for exploration nodes using various student models

                      Model                 Model with time       Model with time
                      without SE            only for SE           and gaze shifts
                      Mean acc.  Std dev    Mean acc.  Std dev    Mean acc.  Std dev
Leave-one-out         57.3%      11.6%      65.3%      9.4%       71.6%      8.7%
Leave-half-out        56.8%      12.6%      66.3%      10.5%      71.2%      8.6%
Leave-half-pts-out    57.8%      11.9%      67.3%      8.4%       73.5%      7.3%

6.5.3 Calculating Optimal Thresholds

As previously noted, the leave-one-out cross-validation was essential because it gave the best sense of how the model would perform for a single user. In this process, one student would first be chosen as the one to be "left out." Then all necessary training would be carried out using the remaining data. This included setting frequencies in the model and determining optimal thresholds via ROC curves. Thus, in addition to the stability results just described, this analysis yielded an optimal threshold for each student and each model. This information was used to choose a "final" threshold for each type of model. Since a training set would not be appropriate or available in practical use of the ACE system, it was necessary to make such a selection. This was carried out as follows: for each model and type of prior, leave-one-out cross-validation resulted in 36 optimal thresholds (one per student). These thresholds were then averaged, and the means appear in tables 6.12 and 6.13 with standard deviations.
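Per-fold threshold selection from a ROC curve can be sketched as maximizing sensitivity plus specificity over candidate thresholds (one common criterion; the thesis does not state the exact optimality criterion used):

```python
def optimal_threshold(probs, labels):
    """Return the probability threshold that maximizes
    sensitivity + specificity over the training data, scanning each
    observed probability as a candidate threshold."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        sens = sum(p > t for p in pos) / len(pos)
        spec = sum(p <= t for p in neg) / len(neg)
        if sens + spec > best_score:
            best_score, best_t = sens + spec, t
    return best_t
```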
Table 6.12 Mean optimal thresholds and standard deviations for implicitSE node assessments across folds of leave-one-out cross-validation

                               Model with time       Model with time
                               only for SE           and gaze shifts
                               Mean thr.  Std dev    Mean thr.  Std dev
Generic knowledge node priors  0.49       0.0048     0.45       0.0077
Custom knowledge node priors   0.52       0.0055     0.49       0.0101

As the low standard deviations in the tables show, the optimal thresholds chosen in each fold of the cross-validation are quite similar. This is because each set of thresholds results from ROC curves formed using model assessments over almost identical data sets (i.e., each data set differs by one student). The low standard deviation also confirms that for new users of the system, these mean thresholds are appropriate to use in interpreting model assessments.

Table 6.13 Mean optimal thresholds and standard deviations for exploration node assessments across folds of leave-one-out cross-validation

                               Model                 Model with time       Model with time
                               without SE            only for SE           and gaze shifts
                               Mean thr.  Std dev    Mean thr.  Std dev    Mean thr.  Std dev
Generic knowledge node priors  0.59       0.0084     0.52       0.0180     0.46       0.0123
Custom knowledge node priors   0.65       0.0077     0.56       0.0170     0.48       0.0044

6.6 Testing the Performance of the Models using Different Evidence of Self-explanation

As previously discussed, the main advantage of the structure of the new model is that it is highly modular, allowing the time and gaze shift nodes to be used or ignored as needed. This section presents tests performed to assess the accuracy of the model using time alone and using gaze shifts alone to detect implicit self-explanation. First, the ACE student model was adjusted to ignore the gaze shift node. Note that while this model uses time alone for implicit self-explanation, it differs in structure from model 2 (figure 3.4).
Another version of the model was then similarly created which only uses gaze shifts as evidence of implicit self-explanation. As previously done, the data from the preliminary user study was sent into each model using a simulated student program. ROC curves were then created for the implicitSE nodes of each model over this data. These curves yielded optimal thresholds which could then be used to test the performance of each model over the new user data. These results appear in table 6.14 below. For purposes of comparison, the accuracies for the model which uses both time and gaze shifts as evidence are also presented.

Table 6.14 ImplicitSE accuracies for the ACE model using different predictors as evidence of implicit self-explanation

                                      Model with   Model with eye-   Model with time
                                      time only    tracking only     and eye-tracking
True positive rate (sensitivity)      67.9%        62.3%             71.6%
True negative rate (specificity)      64.8%        67.8%             74.3%
Combined accuracy                     66.3%        65.1%             73.0%

As shown in the table, the model which uses time alone has a higher sensitivity than the model with gaze shifts alone, while the latter model returns a higher specificity. These findings match those of the original user study. They are also consistent with the assumption that time overestimates self-explanation behavior by assuming that the user spends all idle time considering the exploration. For each measure, the combined predictor outperforms either predictor on its own. As before, ROC curves were created for the exploration nodes of each model over the original study data. The resulting thresholds were then used to calculate the accuracies of each model over the new data. These results are given in table 6.15. As with the implicitSE nodes, the model with time alone has a higher sensitivity than the model which only uses gaze shifts. However, the model with gaze shifts alone achieves higher specificity. These predictors combine to yield the highest accuracy for each measure.
This is consistent with the expectation that accuracy improves as more evidence is used. It should also be noted that each single predictor tends to succeed where the other fails, so this complementary behavior likely contributes to the high accuracy of the combined predictor.

Table 6.15 Exploration accuracies for the ACE model using different predictors as evidence of implicit self-explanation

                                      Model with   Model with eye-   Model with time
                                      time only    tracking only     and eye-tracking
True positive rate (sensitivity)      71.2%        69.8%             73.9%
True negative rate (specificity)      72.9%        73.4%             76.3%
Combined accuracy                     72.1%        71.6%             75.1%

It should also be noted that the accuracies generated by the new model when only time information is used are comparable to (although slightly higher than) the accuracies of model 2, despite the differences in structure and method of CPT definition (data-based for model 3 and expert-based for model 2).

Chapter 7 Pupil Dilation as a Predictor of Self-Explanation

Previous research has demonstrated a positive correlation between a person's pupil dilation and cognitive load in a wide variety of tasks [2]. People performing tasks requiring more attention have been shown to have greater pupil size. Recent work has evaluated the performance of pupil dilation as an indicator of the cognitive load of users of adaptive systems, with mixed results. Schultheis and Jameson [26] conducted a study to analyze the pupil sizes of users of adaptive hypermedia systems. Participants were asked to read text samples of varying difficulty and their pupil sizes were examined throughout the task. The difference in text difficulty - and thus cognitive load - was not apparent in pupil diameter, so in this study pupil size failed to be a reliable indicator of cognitive load. Similarly, Iqbal et al. [16] examined the sensitivity of pupil size to cognitive load as users performed different tasks, including file manipulation and the reading of text.
As in [26], pupil size failed to be an accurate indication of cognitive load during the reading task. However, it was also found that pupil size was sensitive to task difficulty only during certain subtasks of the file management task. Unlike the file management task, the reading of text could not be decomposed into smaller, independent subtasks. Thus, while pupil size may be sensitive to cognitive load during some tasks, it was concluded that it is not a suitable indicator of difficulty during the reading of text. This chapter describes research performed to determine whether pupil size may be used as an additional predictor of self-explanation behavior in users of the ACE system. Section 7.1 details the process of trying to collect reliable pupil dilation data during the user studies, and in section 7.2 the analyses and results are presented.

7.1 Data Collection

In the user studies described in Chapters 4 and 6, each session yielded a log file containing data from the eye-tracker. In addition to gaze data such as the time and location of each of the user's fixations, these log files included the diameter of the user's pupil throughout the interaction. However, as the preliminary study in Chapter 4 was focused entirely on testing the merit of gaze patterns as predictors of self-explanation, no effort was taken to attend to differences in pupil behavior between study participants or environmental changes between sessions. It was thus impossible to perform any reliable analysis of this data. Several factors have long been known to influence pupil size [2]. For example, pupils tend to dilate in darker rooms and contract in the presence of more light. Pupil size also naturally varies with the size of the eye itself from one person to another. Even nonvisual stimuli such as sound can affect pupil dilation.
During the study, pupil size readings were given by the eye-tracker in units of the number of pixels in the pupil image. Thus the pupil size readings for each student were also influenced by the placement of the eye-tracker itself: if the camera was placed closer to the participant's eye, all pupil size readings increased. While it is not possible to keep environmental conditions adequately constant from one study session to the next or to ensure that users have similar pupils, steps may be taken to normalize against these differences and isolate the correlation between pupil size and cognition across all study participants. One method which has been used and accepted [26] involves taking a baseline measurement - i.e., measuring the pupil dilation at a time in the session when all users should be in the same cognitive state. This baseline encapsulates all information regarding the pupil attributes of the user and environmental conditions such as lighting. For each user, the baseline may be subtracted from all other pupil size values to yield a normalized measure which may then be compared to those of users from other sessions. As mentioned in section 6.3, a second user study was performed in order to gather additional data for testing the ACE student model and to address problems that appeared in the preliminary study (e.g., users' reluctance to think aloud). Efforts were also taken during this study to collect reliable pupil size data using the baseline normalization method described above. To get a baseline pupil size value, participants were asked to turn off the computer monitor and stare at the blank screen for a few seconds after completing their interaction with ACE. This baseline was chosen for several reasons. First, staring at a blank screen would ensure that each user had the same visual stimuli at this point in the session. It was also assumed that, while sitting idle at the close of the ACE interaction, the participants shared the same cognitive state.
Finally, turning off the eye-tracker while the user was staring at the blank screen caused the baseline value to be the last pupil size measurement recorded by the eye-tracker. While not strictly necessary, this eliminated any error that might occur in trying to find the moment of baseline measurement in the eye-tracking data.

With a baseline value obtained for each of the 18 participants in the new study, analysis could be carried out to explore the correlation between pupil dilation and self-explanation. This is discussed in the next section.

7.2 Results and Discussion

Recall from Chapter 6 that the second user study resulted in a set of exploration cases in which observers determined whether self-explanation was definitely present or absent. For each of these data points, the user's average pupil size after the action but before the next action was calculated. This was carried out as follows: during the time interval between actions, the learner made a series of eye fixations, each of which was recorded by the eye-tracker. In addition to location coordinates, the data for each fixation included the length of the fixation in milliseconds and the user's pupil size (given as image area in pixels) when it occurred. For each of these fixations, the pupil size and duration were multiplied. These values were then summed and divided by the total elapsed time, resulting in a weighted average pupil size between actions. Finally, the baseline measurement was subtracted, resulting in a normalized value. A sample of this data appears in Table 7.1. Note that each normalized value in the table is negative. While one might expect that users had low cognitive load - and thus small pupil size - at the moment of the baseline recording, this could not be guaranteed. For the purposes of a baseline it was only necessary that students be in the same cognitive state.
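The duration-weighted, baseline-normalized mean described above can be sketched as follows (the fixation data structure is assumed for illustration, not the eye-tracker's actual log format):

```python
def mean_pupil_between_actions(fixations, baseline):
    """Duration-weighted mean pupil size over the fixations between two
    actions, normalized by subtracting the session baseline.
    `fixations` is a list of (duration_ms, pupil_px) pairs."""
    total_time = sum(d for d, _ in fixations)
    weighted = sum(d * p for d, p in fixations)
    return weighted / total_time - baseline
```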
Further, students would naturally have larger pupils when staring at a blank monitor than when looking at a lit screen.

Table 7.1 Sample fragment of pupil size data

User #   Self-Explanation?   Mean pupil size after action   User baseline value   Normalized
7        Y                   1321                           1391                  -70
7        Y                   1359                           1391                  -32
8        N                   550                            606                   -56
8        Y                   584                            606                   -22

After determining a normalized average pupil size value for each of these exploration cases, calculations were done to determine whether there was a significant difference in the pupil size of users who were self-explaining and those who were not. These results are shown in Table 7.2. Here we see that users had a mean normalized pupil size of -56 when they self-explained and -59 when they did not. However, due to the large standard deviations, this difference fails to achieve statistical significance (as measured by a two-tailed T-test).

Table 7.2 Mean normalized pupil sizes with and without self-explanation

Self-Explanation?   Mean normalized pupil size   Standard deviation
Yes                 -56                          14.4
No                  -59                          10.5

A ROC curve was also created for these points to give a graphical representation of the performance of pupil size as a predictor of self-explanation for different normalized pupil size thresholds. This curve appears in figure 7.1 below. The area under this curve is 0.43, less than the area of 0.5 given by random chance. Thus, over this data, pupil size is definitely not an acceptable predictor of self-explanation. There are several possible causes for the failure of pupil size as a reliable detector of self-explanation in this data. It was assumed earlier that all users who stared at the blank screen after interacting with ACE were in approximately the same cognitive state. This might not have been a correct assumption, in which case the baseline used may not have been appropriate. To test this theory, Z-scores were used to normalize pupil sizes between participants.
Each mean pupil size was normalized according to the following formula:

Xk' = (Xk - m) / s

where Xk' is the normalized mean pupil size after action k, Xk is the mean pupil size after action k (these values appear in the third column of table 7.1), m is the average value of Xk over all actions by the user, and s is the standard deviation corresponding to m. While the mean pupil size m encapsulates all information which would affect the user's pupil size (i.e., lighting conditions and eye size), it is not specific to any single moment during the experiment and thus does not require identifying a point at which all users are in the same cognitive state, as is required for the baseline method. A sample fragment of this data appears in table 7.3.

Figure 7.1. ROC curve for pupil size, normalized with the baseline, as a predictor of self-explanation (x-axis: rate of false positives)

As before, calculations were done to determine whether there was a significant difference in the pupil size of users who were self-explaining and those who were not. These results appear in Table 7.4. Here we see that users had a mean normalized pupil size of 0.928 when they self-explained and 0.812 when they did not. However, due to the large standard deviations, this difference also fails to achieve statistical significance.

Table 7.3 Sample fragment of pupil size data using Z-score normalization

User #   SE?   Mean pupil size after action   Overall mean pupil size for user   Standard deviation   Normalized value
7        Y     1321                           1318                               26.4                 0.114
7        Y     1359                           1318                               26.4                 1.553
8        N     550                            562                                17.3                 -0.693
8        Y     584                            562                                17.3                 1.272

A ROC curve was also constructed for these points to give a graphical representation of the performance of pupil size as a predictor of self-explanation for different Z-score normalized pupil size thresholds.
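The per-user Z-score normalization above can be sketched as:

```python
from statistics import mean, stdev

def zscore_normalize(values):
    """Normalize one user's mean pupil sizes: (Xk - m) / s, with m and
    s the mean and standard deviation over all of that user's actions.
    Note: stdev() is the sample standard deviation; the thesis does not
    say whether the sample or population deviation was used."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]
```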
This curve appears in figure 7.2. The area under this curve is 0.49, about the same as that given by random chance. Thus, even applying the Z-score normalization, pupil size is still not an acceptable predictor of self-explanation.

Table 7.4 Mean Z-score normalized pupil sizes with and without self-explanation

Self-Explanation?   Mean normalized pupil size   Standard deviation
Yes                 0.928                        0.37
No                  0.812                        0.29

Figure 7.2. ROC curve for pupil size, normalized with Z-scores, as a predictor of self-explanation (x-axis: rate of false positives)

It was then hypothesized that perhaps positive self-explanation caused the greatest increase in cognitive load - and thus pupil size - immediately after the outcome of the corresponding exploratory action. Here the user would have just noticed the change in the interface and would not yet be planning the next action. In this case, examining the weighted mean pupil size over the whole time interval between actions would fail to capture this behavior, as the pupil size increase would be dissipated by the smaller pupil size values at the end of the time interval. Figure 7.3(a) shows a plot of the pupil size in this case. Another possibility considered was that the greatest increase in cognitive load occurred at the end of the time interval, after the user had noticed the action outcome and had time to consider its meaning. These two cases appear together in figure 7.3(b). In either case, taking the mean pupil size over only the middle portion of the interval, as shown in the figure, would best capture the pupil size increase.

Figure 7.3. Pupil size plotted over the time interval (from action k to action k+1) when it is largest at the beginning (a) or at the beginning or the end (b).
In (b), dotted lines indicate the middle 60% of the time interval.

To test this theory, normalized weighted pupil sizes were calculated (using Z-score normalization) over only the middle 60% of each time interval. Analysis was then performed to test for differences in pupil sizes in the presence or absence of self-explanation. These results appear in Table 7.5.

Table 7.5. Mean Z-score normalized pupil sizes with and without self-explanation over only the middle 60% of each time interval

Self-Explanation?  Mean normalized pupil size  Standard deviation
Yes                0.861                       0.41
No                 0.794                       0.29

Here users had a mean normalized pupil size of 0.861 when self-explaining and a mean pupil size of 0.794 otherwise. However, t-tests failed to establish statistical significance for this difference given the large standard deviations. Figure 7.4 shows a ROC curve for this data. The area under this curve was found to be 0.41, again less than what would have resulted from random chance. Ambient sounds (e.g., people walking in the hall outside of the research laboratory) were not something that could be controlled and may have skewed the results. It is also possible that there were cases when a user was not self-explaining but their cognitive load was relatively high because they were thinking about something other than the domain material. A third explanation is that for some of the participants, the spontaneous self-explainers in particular, self-explanation comes quite naturally and thus does not involve an increase in cognitive load at all.

Figure 7.4. ROC curve for pupil size as a predictor of self-explanation, using the middle 60% of the time interval after each action

Chapter 8 Conclusions and Future Work

8.1 Satisfaction of Thesis Goals

The ultimate goal of this thesis is to improve the monitoring and support of effective exploration in an open learning environment.
By using eye-tracking to gain a more accurate assessment of implicit self-explanation, we improve the system's ability to provide tailored support for self-explanation in the form of gentle prompts and intervention. In addition, an improved assessment of self-explanation brings about a better assessment of sufficient exploration, which improves the system's ability to support effective exploration. Together these lead to a richer, more adaptive exploratory experience.

Recall from section 1.4 that this research began with a user study to test the reliability of elapsed time and gaze shifts as predictors of self-explanation. A new student model was then designed and built around these results, and analysis was performed to assess the accuracy of this new model.

8.1.1 User Study to Assess Time and Gaze Shifts as Predictors of Self-Explanation

The research described in this thesis began with a preliminary user study to test whether elapsed time and gaze shifts could be used as reliable indicators of self-explanation behavior. Participants in this experiment interacted with the ACE system, an exploratory learning environment for mathematical functions, while their exploratory behavior and eye movements were captured in log files. In addition, video recordings were made which included the ACE interface screen and the learner's speech throughout the interaction. Data analysis was then performed to classify each utterance, if possible, as evidence of the presence or absence of implicit self-explanation. This was done using a detailed coding scheme developed for this purpose. While coding schemes exist for studying self-explanation during example studying and problem solving, this was the first attempt to evaluate self-explanation during independent exploration. Those utterances that were definitively classified were then matched to the exploratory actions to which they pertained. Each of these actions was then examined for the time taken before the next action and the
presence or absence of a gaze shift. This allowed an assessment of the reliability of these measures as predictors of implicit self-explanation. Accuracies were then calculated, showing that gaze shifts alone had the highest specificity, i.e., they served as the better detector of the absence of self-explanation. On the other hand, time and gaze shifts as a combined predictor had higher sensitivity, i.e., they performed best together at identifying self-explanation when it occurred. Thus it would be better to use time and gaze shifts together when priority is given to minimizing unnecessary interruption by the system. Similarly, gaze shifts alone are the best predictor when it is more important to catch the absence of self-explanation and intervene as needed.

8.1.2 Design of Revised Student Model

As suggested by the results of the preliminary study, gaze shifts and time combine to serve as the best predictor of implicit self-explanation in some situations, while in others it is preferable to use gaze shifts on their own. In addition, certain environmental factors may influence the choice of predictor. For example, in situations where an eye-tracker is not available, due to cost or other reasons, it is necessary to use time on its own. There may also exist situations (e.g., a busy office) where distractions are great enough to render time useless as a predictor. Then gaze shifts would need to be used alone. Thus some structural flexibility is necessary in the part of the student model which assesses implicit self-explanation. To address these concerns, the original ACE student model was modified to include a highly modular component very similar to a naive Bayesian classifier. This revised structure allows time and gaze shifts to be easily added to or excluded from the model. In addition, the conditional probabilities were easily derived from frequencies in the user data.
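The naive-Bayes-style combination of evidence described above can be sketched as follows. The evidence names, prior, and conditional probability values below are illustrative assumptions for the sketch, not the trained values from the ACE model; the point is only that each evidence source can be included or dropped independently.

```python
def p_self_explanation(prior, evidence, cpts):
    """Naive-Bayes-style update of P(self-explanation | evidence).

    evidence: dict such as {"long_time": True, "gaze_shift": False};
    omitting a key excludes that evidence source entirely (e.g. no
    eye-tracker available -> leave out "gaze_shift").
    cpts: for each source, (P(obs | SE), P(obs | not SE)).
    """
    p_se, p_not = prior, 1 - prior
    for name, observed in evidence.items():
        p_obs_se, p_obs_not = cpts[name]
        p_se *= p_obs_se if observed else (1 - p_obs_se)
        p_not *= p_obs_not if observed else (1 - p_obs_not)
    return p_se / (p_se + p_not)  # renormalize over the two hypotheses

# Illustrative conditional probability tables (made-up values, which in
# the thesis's approach would come from frequencies in the user data):
CPTS = {
    "long_time": (0.8, 0.3),   # P(time > threshold | SE), P(... | not SE)
    "gaze_shift": (0.7, 0.2),  # P(gaze shift | SE),       P(... | not SE)
}
```

With both sources observed positive, the posterior rises well above the prior; excluding the gaze-shift key reproduces the time-only configuration, which is the modularity the revised model structure is designed to provide.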
8.1.3 Evaluation of Performance of New Student Model

In order to get an initial assessment of the behavior and consistency of the new ACE student model, this model was first tested using the user data from the original user study. This data was run through the new model (model 3) as well as an earlier model which did not include self-explanation (model 1) and one which used only time to assess it (model 2). Accuracy results showed that the addition of self-explanation and then eye-tracking caused improvements in model performance. The improvement was particularly high for specificity, the measure of the models' ability to detect the absence of implicit self-explanation. This is consistent with the theory that time as a predictor overestimates self-explanation behavior by assuming that the user spends all the time between actions considering their outcomes.

After these initial tests, the student model was further refined to allow training of the part which assesses a user's tendency to self-explain. The original version of this part [5] was based on subjective intuition and reasonable estimates rather than user data. Frequencies in the data were then used to determine conditional probabilities in the revised model. An additional user study - almost identical to the first - was carried out to collect test data to assess the performance of the revised ACE model. Here each of the three models was tested on its ability to accurately assess implicit self-explanation (when applicable) and sufficient exploration. These models were tested over the new data using two methods. First, the data was run through each model using a simulated student program, and model assessments were compared to observed cases of self-explanation and post-test performance. This resulted in accuracy rates for each model. Second, ROC curves were constructed for each student model over the data.
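The ROC analysis used throughout this evaluation reduces to sweeping a decision threshold over a scalar model output and integrating the true-positive rate against the false-positive rate. A minimal sketch of that computation (my own, not the thesis's implementation; it assumes both classes are present and that tied scores, which would need grouping, do not occur):

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for a scalar predictor.

    scores: predictor values (e.g. a model's assessed probability of
    self-explanation); labels: True where self-explanation was
    observed. Sweeps every threshold implicitly by walking the
    examples in descending score order and accumulating trapezoids.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    pos = sum(1 for _, y in pairs if y)
    neg = len(pairs) - pos  # assumes pos > 0 and neg > 0
    tp = fp = 0
    area = 0.0
    prev_fpr = prev_tpr = 0.0
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / neg, tp / pos
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid rule
        prev_fpr, prev_tpr = fpr, tpr
    return area
```

An area of 1.0 indicates a perfect predictor and 0.5 chance performance, which is how the 0.49 and 0.41 areas in Chapter 7 were judged unacceptable.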
These curves provided a graphical picture of the behavior of each model, and the area under each curve served as a measure of model performance. Each type of analysis showed that accuracy increased with the addition of self-explanation and then eye-tracking. Adding the eye-tracker was found to significantly improve the accuracy of self-explanation assessment over using time only. In addition, the improved accuracy of self-explanation prediction resulted in a significant improvement in the assessment of student exploratory behavior. Customizing the prior probabilities of the knowledge nodes was also shown to significantly improve model performance. While customizing the prior probabilities of the SEtendency nodes increased model accuracy, this improvement failed to achieve statistical significance.

In order to test the stability and reliability of these results, cross-validation was performed for each model. Here the leave-one-student-out, leave-half-students-out and leave-half-data-points-out methods were each used. Each type of analysis showed that model accuracy increased with the addition of self-explanation and the eye-tracking data. In addition, t-tests revealed that these increases were statistically significant at the p<0.05 level.

8.2 Summary of Contributions

In this thesis, research was presented on the use of real-time eye-tracking data for the on-line modeling of user meta-cognitive behaviors during interaction with an open learning environment. It was shown that the addition of eye-tracking caused a significant improvement in the model's ability to assess implicit self-explanation and thus sufficient exploration. A more accurate student model allows the environment to better provide adaptive support to improve these behaviors and consequent student learning.
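The leave-one-student-out cross-validation mentioned above can be sketched generically. Here `train` and `evaluate` are hypothetical callables standing in for whatever model-fitting and accuracy computation are used; this is not the ACE implementation, only the resampling scheme:

```python
def leave_one_student_out(data_by_student, train, evaluate):
    """Leave-one-student-out cross-validation.

    data_by_student: {student_id: [data points]}. `train` builds a
    model from a flat list of data points; `evaluate` returns an
    accuracy for a model on held-out points. Returns one accuracy
    per held-out student, so means and t-tests can be run on them.
    """
    accuracies = []
    for held_out in data_by_student:
        # Pool every other student's data for training.
        train_points = [p for s, pts in data_by_student.items()
                        if s != held_out for p in pts]
        model = train(train_points)
        accuracies.append(evaluate(model, data_by_student[held_out]))
    return accuracies
```

Holding out whole students (rather than individual data points) is what makes the result an estimate of performance on an unseen user, which is the case that matters for a deployed student model.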
The main contribution of this thesis is a formal evaluation demonstrating that the student model which includes eye-tracking data provides a more accurate assessment than a model using only time as evidence of implicit self-explanation. The evaluation also confirms that the inclusion of self-explanation in the model leads to a more accurate assessment of sufficient exploration.

8.3 Limitations

First, there is the obvious limitation of the need for the eye-tracker itself. While eye-trackers are decreasing in cost, they are still prohibitively expensive, especially for individual use. While monitor-mounted eye-trackers (e.g., the iView X RED from SensoMotoric Instruments, USA) lack the intrusiveness of their head-mounted counterparts, it can still be difficult to achieve and maintain calibration. Many factors, such as the presence of eye-makeup and the color and thickness of the user's eyelashes, can complicate the calibration process by making it more difficult for the eye-tracker to isolate the user's pupil. Later, during interaction with the system, the user must continue to face the monitor or risk a loss of calibration and a resulting break in the data stream.

Second, as mentioned above, any intelligent system which uses gaze shifts as evidence of implicit self-explanation must have interfaces with distinct, well-defined regions where actions occur and effects appear. For example, suppose the Plot Unit were changed so that when the user drops the curve, the appropriate equation appears in the plane with the curve. Then it would no longer be possible to define disjoint regions between which gaze shifts could occur.

Third, recall from Chapter 4 that the time threshold of 16 seconds was established using data collected from user study participants who expressed their self-explanation behavior, or lack thereof, by vocalizing their thoughts as they explored.
Another problem is how to transfer this threshold to one more appropriate for real usage, when users do not need to speak to self-explain.

8.4 Future Work

While eye-tracking has been shown to enhance the ability of the ACE student model to assess implicit self-explanation and sufficient exploration, it remains to be determined whether this leads to an improvement in user performance. The next step of this research is to collect this empirical evidence. A variety of interface tools have been designed and implemented which allow ACE to provide different levels of prompting for both exploration and self-explanation by relying on the assessment of the student model described in this thesis. A user study to test the effectiveness of these adaptive tools is currently being designed.

A long-term research step is to consider the inclusion in the model of other meta-cognitive skills involved in effective exploration, such as the ability to form hypotheses and to monitor one's own progress in the learning task. While all work in this thesis pertains to the Plot Unit in ACE, the idea of using gaze shifts as evidence of implicit self-explanation could be applied to any system in which the interface is divided into distinct regions for actions and outcomes. It would simply be a matter of deciding, for each type of action, where the action would occur and where the user would have to look to observe its effects. Other approaches include the use of unsupervised learning to detect relevant gaze patterns.

Bibliography

1. Aleven, V. and K.R. Koedinger, An Effective Meta-cognitive Strategy: Learning by Doing and Explaining with a Computer-Based Cognitive Tutor. Cognitive Science, 2002. 26(2): p. 147-179.
2. Bunt, A., On Creating a Student Model to Assess Effective Exploratory Behaviour in an Open Learning Environment. Master's Thesis in Computer Science. 2001, University of British Columbia.
3. Bunt, A. and C.
Conati, Probabilistic Student Modeling to Improve Exploratory Behaviour. Journal of User Modeling and User-Adapted Interaction, 2003. 13(3): p. 269-309.
4. Bunt, A., C. Conati, and K. Muldner, Scaffolding Self-explanation to Improve Learning in Exploratory Learning Environments. 7th International Conference on Intelligent Tutoring Systems, 2004.
5. Chi, M.T.H., Constructing self-explanations and scaffolded explanations in tutoring. Applied Cognitive Psychology, 1996. 10: p. S33-S49.
6. Chi, M.T.H., M. Bassok, M.W. Lewis, P. Reimann, and R. Glaser, Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 1989. 13: p. 145-182.
7. Conati, C., J. Larkin, and K. VanLehn. A Computer Framework to Support Self-Explanation. Eighth World Conference of Artificial Intelligence in Education. 1997.
8. Conati, C. and K. VanLehn, Toward Computer-based Support of Meta-cognitive Skills: A Computational Framework to Coach Self-Explanation. International Journal of Artificial Intelligence in Education, 2000. 11.
9. de Jong, T. and R. van Joolingen, Scientific Discovery Learning With Computer Simulations of Conceptual Domains. Review of Educational Research, 1998. 68: p. 179-201.
10. DeCarlo, D. and A. Santella. Stylization and Abstraction of Photographs. 29th Annual Conference on Computer Graphics and Interactive Techniques. 2002. New York: ACM Press.
11. Fogarty, J., R. Baker, and S. Hudson. Case Studies in the Use of ROC Curve Analysis for Sensor-Based Estimates in Human-Computer Interaction. Graphics Interface. 2005.
12. Gluck, K.A. and J.R. Anderson, What Role do Cognitive Architectures Play in Intelligent Tutoring Systems?, in Cognition and Instruction: Twenty-five Years of Progress, D. Klahr and S.M. Carver, Editors. 2001: Erlbaum: p. 227-262.
13. Hornof, A.J., A. Cavender, and R. Hoselton. EyeDraw: A System for Drawing Pictures with Eye Movements.
ASSETS 2004: The Sixth International ACM SIGACCESS Conference on Computers and Accessibility. 2004. Atlanta, Georgia.
14. Iqbal, S.T. and B.P. Bailey, Using Eye Gaze Patterns to Identify User Tasks. The Grace Hopper Celebration of Women in Computing, 2004.
15. Jacob, R. The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look at is What You Get. 1998: Morgan Kaufmann Press: San Francisco.
16. Just, M. and P. Carpenter, The Psychology of Reading and Language Comprehension. 1986, Boston: Allyn and Bacon.
17. Majaranta, P., A. Aula, and K.-J. Raiha. Effects of Feedback on Eye Typing with a Short Dwell Time. Symposium on Eye-tracking Research and Applications. 2004. San Antonio, TX: ACM Press: New York.
18. Mitrovic, T., Supporting Self-Explanation in a Data Normalization Tutor. Supplementary Proceedings of AIED 2003, 2003.
19. Monty, R.A. and J.W. Senders, eds. Eye Movements and Psychological Processes. 1976, Lawrence Erlbaum Associates: Hillsdale, New Jersey.
20. Qu, L. and L. Johnson. Detecting the Learner's Motivational States in an Interactive Learning Environment. 12th International Conference on Artificial Intelligence in Education. 2005. Amsterdam, The Netherlands.
21. Renkl, A., Learning Mathematics from Worked-Out Examples: Analyzing and Fostering Self-Explanation. European Journal of Psychology and Education, 1999. In press.
22. Salvucci, D. and J. Anderson. Intelligent Gaze-Added Interfaces. SIGCHI Conference on Human Factors in Computing Systems. 2000. The Hague, The Netherlands.
23. Schiessl, M., S. Duda, A. Tholke, and R. Fischer, Eye Tracking and its Application in Usability and Media Research. "Sonderheft: Blickbewegung" in MMI-interaktiv Journal, 2003. 6.
24. Shute, V.J. and R. Glaser, A large-scale evaluation of an intelligent discovery world. Interactive Learning Environments, 1990. 1: p. 51-76.
25. Sibert, J.L., M. GokTurk, and R.A. Lavine.
The Reading Assistant: Eye Gaze Triggered Auditory Prompting for Reading Remediation. 13th Annual ACM Symposium on User Interface Software and Technology. 2000. San Diego, California: ACM Press.
26. Sodhi, M., B. Reimer, and I. Llamazares, Glance Analysis of Driver Eye Movements to Evaluate Distraction. Behavior Research Methods, Instruments and Computing, 2002. 34(4): p. 529-538.
27. Starker, I. and R.A. Bolt, A Gaze-Responsive Self-Disclosing Display. CHI: Human Factors in Computing Systems, 1990.
28. van Joolingen, W. Designing for Collaborative Discovery Learning. Fifth International Conference on Intelligent Tutoring Systems. 2000. Berlin, Germany.
29. VanLehn, K., Student Modeling, in Foundations of Intelligent Tutoring Systems, M.C. Polson and J.J. Richardson, Editors. 1988, Lawrence Erlbaum Associates.
30. Williams, L.G., The Effects of Target Specification of Objects Fixated During Visual Search. Acta Psychologica, 1967. 27: p. 355-360.

Appendix A Pretest

Part 1: Questionnaire

A package gets dropped off at your doorstep, containing a brand new scooter. The only problem is that assembly is required.
You
a) read the manual carefully, and then commence assembly
b) skim the manual quickly, and start to put the thing together
c) ask an expert to help you do it
d) trash the manual and start right away - you will figure it out as you go

When working with others, you
a) are very vocal about your ideas - you often have a good idea of how to do things
b) contribute some ideas, but don't like to be the main person in charge
c) are mainly quiet - you prefer for others to take charge

When learning how to use a new computer application for an assignment, you
a) only figure out how to use those things that are totally necessary to get the assignment done
b) figure out how to use it to get the assignment done, plus learn a few extra things that caught your attention
c) fully explore the application - you are very curious about the various things you may discover in there

When it comes time to decide on what and how to do a school project, you
a) prefer to have the teacher tell you
b) prefer to make one up yourself
c) a mix of a and b: the teacher initially helps, but you have the final say

When learning something, you like to
a) get an overall picture of the thing you are learning, and then figure out the details
b) figure out the details first, and then get an overall picture
c) a mix of a and b
d) other

You are in a strange new city. You
a) get a map right away and use it frequently to plan your routes
b) get a map but use it only when absolutely necessary
c) don't bother with a map - if lost, you'll ask someone

You play computer games
a) almost every day
b) once or twice a week
c) less often than once a week

You use the Web to find information that you need (like a phone number, shopping, etc.)
a) almost every day
b) once or twice a week
c) less often than once a week
On a scale of 1-5, where 1 means very much, you like to surf the web:
1) Very much 2) So-so 3) Neutral 4) Not very much 5) Not at all

On a scale of 1-5, where 1 means strongly like, your feelings about math are at:
1) Strongly like 2) Like 3) Neutral 4) Dislike 5) Strongly dislike

Part 2: Recognizing the output of a function.

For the following function equations specify the output of the function with the given inputs. Please show your work in the spaces provided.

1) f(x) = 3
What is the output of f(0)? a) 1 b) 0 c) 3 d) -3
What is the output of f(4)? a) 4 b) 3 c) -4 d) 0

2) f(x) = 3x + 2
What is the output of f(4)? a) 4 b) 5 c) 2 d) 14
What is the output of f(-4)? a) -4 b) 5 c) -10 d) 2

3) f(x) = 5x^2
What is the output of f(-1)? a) -1 b) -5 c) 5 d) 1
What is the output of f(1)? a) -1 b) -5 c) 5 d) 1

4) f(x) = 3x^3 + 2x + 1
What is the output of f(0)? a) 0 b) 1 c) -1 d) 6
What is the output of f(2)? a) 2 b) 29 c) 23 d) 6

Part 3: Graph Properties

For each question, circle the appropriate response.

1) The y-intercept of the graph in this picture is: Positive / Negative / Zero
2) The y-intercept of the graph in this picture is: Positive / Negative / Zero
3) The slope of the graph in this picture is: Positive / Negative / Zero
4) The slope of the graph in this picture is: Positive / Negative / Zero
5) The slope of the graph in this picture is: Positive / Negative / Zero
6) The exponent in the function equation graphed in this picture is: Even / Odd
7) The exponent of the function equation graphed in this picture is: Even / Odd
8) The graph has been scaled by: a Positive Number / a Negative Number / Zero
9) Which graph has been scaled by a larger number?: Graph A / Graph B
10) This graph has been shifted by a: positive number / negative number
11) This graph has been shifted by a: positive number / negative number
Part 4: Graph/Equation Properties

The function f(x) = 2x + 4 may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = x - 7 may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = 3 may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = 1x may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = -2x + 5 may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = 4x - 5 may be best described by the graph:
a) b) c) d) None of these graphs

The function f(x) = 1x may be best described by the graph:
a) b) c) d) None of these graphs

Part 5: Equation Properties

The function f(x) = 3 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = 3x + 7 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = -1x + 10 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = -2x + 5 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = 3x - 6 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = 5 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = -7 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

Appendix B Post-test

Part 1: Questionnaire

Comments can be added in the spaces provided.
I found the information in the hints helpful (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I understood why the computer suggested I stay in an exercise (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I found it helpful when the computer told me to explore an exercise more than I had (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I found it helpful when the computer prompted me to answer multiple-choice questions (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I found it annoying when the computer prompted me to answer multiple-choice questions (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

The amount of guidance provided by the computer was (please circle one)
a) Not enough b) OK, but I would have liked more c) Just right d) A bit too much e) Way too much
The computer gave me enough freedom to move around between activities (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I found the information in the help pages useful (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I prefer to learn this way over using a text book (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

My favorite unit was: Machine (unit 1) / Switchboard (unit 2) / Plot (unit 3)

I feel that I learned a lot using ACE (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I feel that wearing the eye tracker made the experience less enjoyable. (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

I feel that wearing the eye tracker detracted from my ability to learn. (please circle one)
a) Strongly agree b) Agree c) Neutral d) Disagree e) Strongly disagree

The thing I liked the most about ACE:

The thing I liked the least about ACE:

What I would like to see added to ACE:

Part 2: Recognizing the output of a function.

For the following function equations specify the output of the function with the given inputs. Please show your work in the spaces provided.

1) f(x) = 8
What is the output of f(0)? a) 5 b) 0 c) 8 d) -8
What is the output of f(2)? a) 2 b) 8 c) -2 d) 0

2) f(x) = 3x + 3
What is the output of f(5)? a) 18 b) 5 c) 6 d) 14
What is the output of f(-5)? a) -5 b) 3 c) -12 d) -15

3) f(x) = 2x^2
What is the output of f(-2)? a) -1 b) -4 c) 8 d) -8
What is the output of f(2)? a) 2 b) 4 c) 1 d) 8

4) f(x) = 2x^3 + 3x - 1
What is the output of f(0)? a) 0 b) 1 c) -1 d) 4
What is the output of f(1)? a) 1 b) 4 c) 5 d) 6

Part 3: Graph Properties

For each question, circle the appropriate response.
1) The y-intercept of the graph in this picture is: Positive / Negative / Zero
2) The y-intercept of the graph in this picture is: Positive / Negative / Zero
3) The slope of the graph in this picture is: Positive / Negative / Zero
4) The slope of the graph in this picture is: Positive / Negative / Zero
5) The slope of the graph in this picture is: Positive / Negative / Zero
6) The exponent in the function equation graphed in this picture is: Even / Odd
7) The exponent in the function equation graphed in this picture is: Even / Odd
8) The graph has been scaled by: A Positive Number / A Negative Number / Zero
9) Which graph has been scaled by a larger number? Graph A / Graph B
10) This graph has been shifted by a: Positive Number / Negative Number
11) This graph has been shifted by a: Positive Number / Negative Number

Part 4: Graph/Equation Properties

Which graph does the function f(x) = 2x - 6 best describe?
a) b) c) d) None of these graphs

Which graph does the function f(x) = x + 7 best describe?
a) b) c) d) None of these graphs

Which graph does the function f(x) = 5 best describe?
a) b) c) d) None of these graphs

Which graph does the function f(x) = 5x best describe?
a) b) c) d) None of these graphs

Which graph does the function f(x) = -2x + 6 best describe?
a) b) c) d) None of these graphs

Which graph does the function f(x) = 3x + 5 best describe?
a) b) c) d) None of these graphs
Part 5: Equation Properties

The function f(x) = -4 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = -5x + 2 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = 4x + 6 has
a) a positive slope b) a negative slope c) a zero slope

The function f(x) = -4x + 5 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = 3x - 4 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = -7 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept

The function f(x) = 3 has
a) a positive y-intercept b) a negative y-intercept c) a zero y-intercept
