UBC Theses and Dissertations

Investigation of a quick tagging mechanism to help enhance the video learning experience
Zhang, Xueqin, 2017

Investigation of a Quick Tagging Mechanism to Help Enhance the Video Learning Experience

by

Xueqin Zhang

B.E., The University of Electronic Science and Technology of China, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

December 2017

© Xueqin Zhang 2017

Abstract

Video continues to be used extensively as an instructional aid within modern educational contexts, such as in blended (flipped) courses, self-learning with MOOCs (Massive Open Online Courses), informal learning through online tutorials, and so on. One challenge is providing mechanisms for students to efficiently bookmark video content and quickly recall and review their video collection. We have run a background study to understand how students annotate video content, focusing especially on what words they would use most to bookmark video content. From this study, we proposed to leverage a quick tagging mechanism in an educational video interface comprised of a video filmstrip and transcript, both presented adjacent to a video player. The ‘quick’ tagging is defined as an easy and fast way to mark course video parts with predefined semantic tags. We use the metaphor of marking and highlighting a textbook to achieve our quick tagging interaction. This mechanism was evaluated in a controlled study with handwritten notes. We found that participants using our quick tagging interface spent around 10% longer watching and learning from video on average than when taking notes on paper.
Our participants also reported that tagging is a useful addition to instructional videos that helps them recall video content and finish learning tasks.

Lay Summary

Video is widely used as an instructional aid within educational contexts such as blended (flipped) courses, self-learning with MOOCs (Massive Open Online Courses), informal learning through online tutorials, and so on. One challenge we face is to support students with efficient video content bookmarking and to improve their recall and review of their video collection. We have run a background study to understand how students use text to summarize or record video content, especially focusing on what words they use most to bookmark video content. From this study, we proposed to integrate functions in an educational video interface, such as adding tags, deleting tags, and using tags to jump around while watching the video. Here, tags are a set of key words. The interface is comprised of a video filmstrip and transcript, both presented adjacent to a video player. A video filmstrip is a set of thumbnails from the video arranged side by side, each representing a portion of the video. Our interface, including the aforementioned tagging functions, was compared with taking notes on paper in a lab study. Tagging refers to the actions users take to manipulate tags, such as adding tags, deleting tags and editing tags. We found that participants using our interface spent around 10% longer watching and learning from video on average than when taking notes on paper. Our participants also reported that such tagging functions are useful additions to instructional videos to help them quickly recall video content and finish learning tasks.

Preface

All of the research work presented in this thesis was conducted in the Human Communication Technologies Laboratory (HCT) at the University of British Columbia, Point Grey campus.
All user studies and associated methods were approved by the University of British Columbia Behavioural Research Ethics Board [certificates #: H13-01589].

All of the implementation and experiments henceforth were conducted by myself. Concepts and design decisions were discussed among myself, Matthew Fong, Gregor Miller and Sidney Fels.

A version of Chapter 4 and Chapter 5 was submitted as Zhang, X.Q., Miller, G, Fong, M, Roll, I, Fels, S (2018) at CHI 2018.

The screenshots from Figures 3.1 and 3.4 are © copyright 2017, Berklee Online.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Research Question
  1.2 Contributions
  1.3 Publications
2 Related Work
  2.1 Tags/Tagging
  2.2 Video Annotation
  2.3 Video Interfaces for Education
  2.4 Summary and Influence on Design
3 General Interface Design for Quickly Tagging Educational Videos
  3.1 Groundwork
  3.2 Use Case
  3.3 Overview of Quick Tagging Interface
    3.3.1 Pre-defined Tags
    3.3.2 State Diagram of Quick Tagging Mode
    3.3.3 Filmstrip
    3.3.4 Transcript
  3.4 Design Strategies
  3.5 Summary
4 Background Study
  4.1 Datasets Analysis Methodology and Evaluation
    4.1.1 A Taxonomy Method of Tag Words
    4.1.2 Datasets Description
    4.1.3 Procedure
    4.1.4 Results
  4.2 Validation of Top Collected Tags
    4.2.1 Apparatus
    4.2.2 Participants
    4.2.3 Procedure
    4.2.4 Results
  4.3 Discussion
  4.4 Summary
5 Lab Study
  5.1 Experiment
    5.1.1 Apparatus
    5.1.2 Participants
    5.1.3 Design and Procedure
  5.2 Results and Discussion
    5.2.1 Pre-defined Tags
    5.2.2 Quick Tagging Modes and Interaction
    5.2.3 Further Improvement to the Tagging Interface
  5.3 Summary
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography
Appendix
  A Online Survey Questionnaire
  B Lab Study Questionnaire
  C Intermediate Results of Background Study
  D Aggregation Process of Background Study

List of Tables

4.1 Affect categories by Klaus.
4.2 Definition of each category from function and scope.
4.3 Details of CLAS data.
4.4 An example process of choosing representative word from a word set.
4.5 Five words were moved after cross-validation study.
4.6 The top five collected subject feature words in six subjects, respectively.
5.1 The aggregated results of our questionnaire
5.2 Pre-defined tags in our interface and aggregated use frequency by 14 participants.
5.3 Multiple tags and aggregated use frequency by 6 participants

List of Figures

1.1 Mental model in a video learning scenario
2.1 ISEE interface overview, by Mu
2.2 Videotater interface overview, by Diakopoulos et al.
2.3 LEAN interface overview, by Ramos et al.
2.4 The Family Video Archive interface overview, by Abowd et al.
2.5 Data-Driven Interaction Techniques, by Kim et al.
2.6 Textbook-style Highlighting for Video, by Fong et al.
2.7 Video Digests creation and editing interface, by Pavel et al.
2.8 Facilitating Navigation of Blackboard-style Lecture Video, by Monserrat et al.
3.1 Overview of the quick tagging interface
3.2 State diagram of quick tagging mode.
3.3 Overview of tagging field
3.4 Overview of filmstrip
3.5 Overview of transcript
4.1 Comparison results between annotations and comments in math courses from CLAS.
4.2 Aggregated results from Normalized YouTube and CLAS data.
4.3 Content descriptive words results.
4.4 Opinion word results.
4.5 Results of Content Descriptive word use percentage
4.6 Results of Opinion word use percentage
5.1 Descriptive Statistics for quick tagging vs plain video interface.
C.1 Comparison results between annotations and comments in math courses from CLAS.
C.2 Comparison results between annotations and comments in library courses from CLAS.

Acknowledgements

Firstly, I would like to thank my loving family. Without them, I may have given up. Of course, I also want to thank my supervisor, Dr. Sidney Fels, for overseeing this work, as well as Dr. Gregor Miller and Matthew Fong, who worked with me on much of this work.
Finally, I would like to thank my best friends, Qingqing and Marco, who have always accompanied me when I needed support most.

Chapter 1
Introduction

Video has a long history as a vivid and entertaining way of stimulating a learner’s interest and conveying desired knowledge, and it has long been used as an educational tool, for example through documentaries and visual tutorials. In the past decade, free online video hosting services, such as YouTube, have led to a significant rise of varied educational videos across many topics, contributed by individuals and well-known universities alike. Videos have become central to the student learning experience in the current generation of Massive Open Online Courses (MOOCs) from providers such as Coursera, edX and Udacity. In 2014, at least 20,000,000 learners registered for at least one MOOC (see footnote 1). These online education frameworks offer complete courses delivered over the web, and video has become a core aspect of self-learning. For learners who seek information on specific topics, sources such as Khan Academy offer more specifically targeted videos.

An empirical study of MOOC videos by Guo et al. [12] indicated that video production style often affects student engagement. They summarized four typical styles: classroom lectures, the “talking head” shot of an instructor, the digital tablet drawing format, and PowerPoint slide presentations. Their findings revealed that shorter videos are much more engaging and that students engage in different ways with lecture and tutorial videos. In their paper, video engagement was measured by how long students spent watching each video and whether they attempted to answer post-video assessment problems. Lecture videos usually present conceptual (declarative) knowledge, whereas tutorials present how-to (procedural) knowledge (e.g., a problem-solving walkthrough). The findings of Guo et al.
suggest that students expect a lecture to be a continuous stream of information, so instructors should provide a good experience for first-time watchers. For tutorials, re-watching and skimming should also be supported. While these issues can be addressed by investing heavily in pre-production lesson planning, providing an efficient mechanism that helps learners through their video learning process and keeps them engaged can further enhance the solution.

Footnote 1: https://www.edsurge.com/news/2014-12-26-moocs-in-2014-breaking-down-the-numbers

Kim et al. [15] summarized five typical video watching scenarios in learning contexts: re-watching to better understand a concept, textual searching for a specific phrase mentioned by the instructor, visually searching for a code example scene, returning to a specific slide, and skimming a trivial lecture. To support learners, the researchers fed patterns of interaction data back into the video navigation interface. Although the collective interaction data might indicate points of learners’ interest, confusion, or boredom in videos, the data-driven techniques might also ignore other potentially important points. Specifically, the interaction data in their work is limited to clickstream logging; other types of interaction data, such as active bookmarking and content streams (such as voice), should also be explored to obtain comprehensive results.

Other works also approach the issue of supporting learners’ navigation and recall of video content. To facilitate navigation of blackboard-style lecture videos, Monserrat et al. [21] developed and evaluated NoteVideo and its improved version, NoteVideo+, systems that support in-scene navigation and directly jumping to a video frame instead of navigating linearly through time.
More specifically, the systems automatically identify the conceptual ‘objects’ of a blackboard-based video, create a summarized image of the video, and use it as an in-scene navigation interface that allows users to jump directly to the video frame where an object first appeared. However, this solution is limited to a specific video format and may not work well for video formats with less visual information, such as “talking head” videos. Further, video digests were proposed by Pavel et al. [25] to help viewers browse and skim long informational talks, lectures, and distance-learning videos online. The key insight of their approach was that much of the information in lecture videos is conveyed through speech. Therefore, they presented a set of tools to help video authors create digests by segmenting videos into a chapter/section structure and providing short text summaries and thumbnails for each section using a time-aligned transcript of the speech. However, creating a video digest is a time-consuming process for authors. They also provided algorithmic tools for automatically segmenting the video and a crowdsourcing pipeline for summarizing the resulting segments, so that authors can further refine these auto-generated digests if necessary. One limitation of this work is that crowdworkers may not have the necessary background to write summaries for highly technical content.

Figure 1.1: Mental model in a video learning scenario

Figure 1.1 indicates users’ mental models in a video learning scenario. Learning goals or tasks motivate users to manipulate video content (such as skimming, re-watching, etc.). Meanwhile, video content affects users’ watching behaviors, such as engagement. Based on this mental model, we propose a quick tagging mechanism, integrated into a video interface, to help students manipulate video content (searching, chaptering and recalling) as they learn, with the aim of enhancing their video learning experience.
The “quick” here is an absolute concept instead of a relative one. We are not focused on comparing our quick tagging mechanism with other tagging systems. We believe that every tagging system is well designed to satisfy its own system requirements. As for video learning applications, the “quick” tagging is designed to facilitate the process of manipulating video content and recording useful learning information. It is a way to help students focus on learning while efficiently recording learning content. The quickness is a determining element that is amplified from three perspectives. First, we provide pre-defined tags to remove the hindrance of users being unable to think of any tag. Second, the process of tagging video content adopts the familiar metaphor of marking textbooks. Third, the tagging interaction is adapted from users’ habits of highlighting PDF files. We measure the “quick” with two metrics: efficiency and usefulness. Efficiency means that users can easily use our quick tagging interface with one-click effort. Usefulness means that our quick tagging mechanism helps to qualitatively improve their learning experience and performance.

Searching, chaptering and recalling video content is especially difficult with educational videos, as such videos are generally visually similar. Therefore, it is important to find a way to help learners recognize and remember major topics in video clips. In the learning process, it is common to come across interesting or confusing content, and it is thus helpful for learners to record their current mental state for later review.

Tagging content using keywords is widely used on the web, in social media and in applications such as photograph organization, taking shape in various forms such as tag clouds and clickable search terms. Tags are used for search and recall of content, and as a method for giving feedback on content (often used in social media).
Within educational contexts, tags can be used by students to help them search for their content at a later date during their studies, aid them in recall (e.g., why they viewed this video) and provide feedback to the instructor.

Tagging systems for expressing preference typically take the form of a unary mode (“like”/“dislike”), multi-valued (star ratings), or text-based. Each system must make a tradeoff between the amount of cognitive load required and the ability to express a full range of reactions. More specifically, for text-based tagging systems, there is a tradeoff between a wide tag vocabulary and a restricted tag vocabulary. In other words, a balance must be found between flexibility and usability. Although a restricted tag vocabulary sacrifices flexibility to some degree, it can be more easily summarized, filtered, and searched by automated methods, and it has a lower cognitive load. A restricted vocabulary can be best if a strong method for defining tags is used, such as maximizing the tag concept range and limiting the number of tags.

Brame [4] presented the Cognitive Theory of Multimedia Learning, which gives rise to several recommendations about educational videos. One of the recommendations is to highlight important information (called Signaling), which can reduce extraneous load and enhance germane load. In Cognitive Load Theory, extraneous load refers to cognitive effort that does not help the learner toward the desired learning outcome. In contrast, germane load refers to the level of cognitive activity necessary to reach the desired learning outcome, e.g., to make necessary comparisons, complete analysis, and elucidate the steps necessary to master a lesson. Establishing predefined tags that are representative of active learners’ reactions can be useful for video learning. Such predefined tags can be brief out-of-video text which can help explain purpose and context for the respective video.
In addition, representative tags can maximize tag concept range, motivate students to tag, and alleviate the tagging hindrance. Sen et al. claim that 68% of non-taggers in their study [33] did not tag because they simply could not think of any tags. If we can provide some useful tags in the tagging interface, this tagging hindrance may be alleviated. Two different approaches to creating a useful collection of tags were suggested by Velsen et al. in their study on incorporating user motivations into the design of video tagging [37]. One was to motivate all users to tag resources from the system launch onward. The other was to let professionals tag resources until the audience at large is familiar with tagging and more interested in tagging resources. The latter approach is costly, since it requires professionals for every course subject. In addition, we cannot ensure that the word choices of professionals lead to interesting resources or are easy to understand for the ‘average’ user.

Meanwhile, an optimal number of tags should be decided in order to build a well-restricted tag vocabulary. If the number of default tags is too large, it will require a larger cognitive load for taggers to determine useful tags. If the number of tags is too small, it will not cover the full range of reactions; the set is then not fully representative, which can cause users not to use the predefined tags.

Reactions to video content can also be highly sensitive to studying contexts. In social and voluntary posting contexts such as YouTube, learners are free to react however they like to video material. But in a context where students are encouraged to tag (i.e., under instructor supervision), students may hide their real reactions from their instructors. In light of this, in our work, we focus on collecting one-word tags from two datasets, namely YouTube and the Collaborative Learning Annotation System (CLAS) [29], to determine how best to help learners recall video content.
CLAS is a media player used to record, share, and comment on videos. Our main goal is to establish useful predefined tags representative of the reactions of active learners, and to decide which default tag number is optimal. Further details are described in our background study in Chapter 4. Besides predefined tags, the “quick tagging” of our tagging mechanism is also amplified through video part selection and applying tags. More details about interface design strategies for quick tagging educational videos are described in Chapter 3. To explore the design space of the quick tagging interface, we build a web prototype and evaluate it in a controlled study, which is described in Chapter 5.

1.1 Research Question

The purpose of this work is to apply the quick tagging mechanism to a video learning context while keeping it simple enough that learners can efficiently complete common learning tasks. To direct this work, we propose the following questions:

1) Do users feel it is efficient and useful enough to perform quick tagging on video content while finishing their learning tasks?
2) From the perspective of usefulness, does the quick tagging mechanism help students recall video content?

1.2 Contributions

We focus our efforts on answering the aforementioned research questions in a learning context. We developed a new video viewing and navigation interface designed to integrate the quick tagging mechanism for efficiently bookmarking video content. We ran one background study, two pilot studies, and one controlled lab study to collect predefined tags, test the design, and verify its efficiency and usefulness. Thus, the main contribution of this work is to explore and investigate how a quick tagging mechanism enhances the user learning experience from the perspective of efficiency and usefulness in recalling video content.

In this work, there are two levels of contribution. The first is a background study comprised of content analysis and an online survey.
This study gave us insights into what tags are commonly used by learners to describe video content and how the polarity of reaction words can be biased by learning contexts. The second contribution is investigating user preference and watching patterns on our quick tagging interface, comparing user performance with the quick tagging interface against hand-written notes when finishing a quiz related to the video content. Through the user study, we demonstrate the effectiveness of the quick tagging interface for bookmarking video content as well as its usefulness in improving video content recall. We also found implications for further improving the quick tagging mechanism with respect to the predefined tags, the quick tagging interaction, and the interface visualization.

1.3 Publications

At the Graphics Interface Conference (2016), we published one paper. The paper [10] designed an interface which uses textbook-style highlighting on a video filmstrip and transcript, both presented adjacent to a video player. The qualitative results indicated that the familiar interaction of highlighting text was preferred, with the filmstrip used for intervals with more visual stimuli. My main contribution was to help conduct a preliminary investigation and qualitative user study together with G. Miller and M. Fong. I also contributed to improving the highlighting interface by taking part in the design meeting.

Together we submitted a paper for CHI 2018 which covered the work of Chapter 5 and part of Chapter 4.

Chapter 2
Related Work

In this chapter, we first introduce the definition of tags and tagging in relation to their functions, advantages, and applications. Our mechanism of using predefined tags for efficiently bookmarking educational video content relies on tag taxonomy methods, video annotation, and video interface design for education.
The overview of taxonomy looks at methods for synonym-based word frequency analysis and various verb classifications. We take methodology cues from the presented works to integrate into our own methods for tag taxonomy, which considers the learning context factor. There are currently several video annotation tools being developed, allowing users to mark up video in personalized or collaborative ways, and forming a spectrum from manual to fully automatic that involves different methods of annotation, such as text, ink, and link. Lastly, the way we choose the video interface features and elements is based on the respective context of educational video content and learning.

2.1 Tags/Tagging

Figure 2.1: The user interface of an Interactive Shared Education Environment, by Mu [23]

There is a rich thread of research studying the functions and advantages of tags and tagging. Scott et al. [11] identify seven functions that tags perform for bookmarks. The authors mention that most tags identify the topics of bookmarked items, and also indicate that adjectives such as “scary”, “funny” and “stupid” were used to tag bookmarks according to the tagger’s opinion of the content. Collaborative tagging [19] is a practice whereby users assign uncontrolled keywords to information resources. Social tagging [36] is defined as the collaborative activity of marking shared online content with keywords, or tags, as a way to organize content for future navigation, filtering or searching. Storey et al. [34] investigated whether combining waypointing and social tagging is a useful metaphor to support navigation in the software space of source code and related artifacts. In software space, the authors expected waypoints to be locations of software model elements, or locations that correspond to a file name and line number for any type of file, or for any version of a file. Here, waypoints are indexed through a set of tags supplied by programmers.
Further, in a tagging survey [33], it was found that most users who use tags think that tagging features help them express opinions and organize movies.

Figure 2.2: The user interface of Videotater, by Diakopoulos et al.

Tags have been widely used in learning tools, social document annotation systems and social media. In the data-driven interaction design for educational videos [15], word clouds were used to help learners recognize and remember major topics in a video clip. In a social document annotation system study [40], Zyto et al. argued that some comments used lots of screen real estate to convey small bits of information, sometimes obscuring more substantive information. In examining the comments of 5 words or fewer, they found that 2.7% of the total comments could have been replaced by 8 tags without loss of meaning. Tagged and Hashtagged are two of the verbs in a unified naming schema in the Connected Learning Analytics (CLA) toolkit [16], which enables data to be extracted from social media to facilitate a set of learning activities. Niemann [24] presented a new way to automatically assign tags and classifications to learning objects offered by educational web portals, to address the issue that data sets in the educational domain often suffer from sparsity. Han et al. [13] aimed to understand the formation of a social network based on two dimensions, Tags and Likes. In the network, the authors used tags to find people with the same interest, and Likes to find connections among those people.

Pina et al. [26] used synonym-based word frequency analysis to develop a taxonomy of public health quality improvement concepts. They analyzed public health-related documents for word frequency, identified the most frequently recurring word-meaning clusters, and created high-level categories based on the ranked synonym clusters. In this thesis, our goal is to find the most frequently used words, with three high-level categories in mind.
So, to suit our goal, we have used the last two steps of the method presented by Pina et al. Korhonen et al. [17] presented a substantial extension to Levin’s taxonomy which incorporates 57 novel classes for verbs. Lexical-semantic verb classifications have proved useful in supporting various natural language processing (NLP) tasks. Sabine [31] investigated whether human associations to verbs, i.e., the words that are called to mind by stimulus verbs, as collected in a web experiment, can help to identify salient features for semantic verb classes. These methods only target verbs and classify words based on human associations. In our work, we target a wider range of words, such as verbs, nouns, adjectives, etc.

Figure 2.3: The user interface of LEAN, by Ramos et al.

2.2 Video Annotation

It is difficult to translate the characteristics of annotations, such as highlighting, context-based notes, and organization, from the traditional paper-based medium to a time-based digital format. A video annotation tool called Interactive Shared Education Environment (ISEE) [23], which automatically generates hyperlinked timestamps to associate notes with video content, was developed to explore issues in video annotation. Figure 2.1 illustrates the overview of ISEE’s graphical user interface. The system is composed of four components to support different video-based collaborative learning scenarios. Here we focus on the Interactive Chat/Annotation Room (ICR), which is located on the bottom right. The ICR records a clickable link (Smartlink) for each annotation and automatically associates it with a video segment. Clicking one of these Smartlinks navigates the video forward or backward to that specific timestamp and starts playing the video from there.

To examine users’ note-taking behaviors in both individual and collaborative distance learning environments, the usability of ISEE was tested by an empirical comparison study.
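
The Smartlink behaviour described above (each note is stored with the playback time at which it was written, and following the link seeks the player back to that time) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the names `Smartlink`, `Player` and `AnnotationRoom` are hypothetical and are not taken from ISEE, whose implementation is not described in this thesis.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Smartlink:
    """Hypothetical record: a note bound to a playback time (in seconds)."""
    timestamp: float
    note: str = field(compare=False)  # ordering uses the timestamp only


class Player:
    """Stand-in for a video player; it only tracks position and play state."""

    def __init__(self) -> None:
        self.position = 0.0
        self.playing = False

    def seek_and_play(self, seconds: float) -> None:
        self.position = seconds
        self.playing = True


class AnnotationRoom:
    """Hypothetical annotation list that keeps its links sorted by time."""

    def __init__(self, player: Player) -> None:
        self.player = player
        self.links: list[Smartlink] = []

    def annotate(self, note: str) -> Smartlink:
        # Capture the current playback position automatically.
        link = Smartlink(self.player.position, note)
        insort(self.links, link)  # keep links in timeline order
        return link

    def follow(self, link: Smartlink) -> None:
        # Jump forward or backward to the linked time and resume playback.
        self.player.seek_and_play(link.timestamp)
```

Keeping the links sorted by timestamp is a design choice of this sketch, made so that the note list mirrors the video timeline regardless of the order in which notes were written.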
The results showed that the tool facilitated users’ in situ video annotation by allowing users to directly connect their notes with video context. The results also indicated that annotation depends on the content of information and cues from the lecturer as to what information is important. These findings have motivated us to help users connect their tags with video context so that they can quickly recall video content by referring to key words such as “difficult”, “confusing” and “interesting”, as well as topic tags. Indeed, the principles of note-taking style change over time, and we believe that our quick tagging mechanism can be considered complementary to traditional note-taking styles.

Figure 2.4: The user interface of the Family Video Archive, by Abowd et al.

The evaluation method of an early paper by Bargeron et al. [2] further inspired us. Specifically, their first study of annotation creation on streaming video explored the use of the Microsoft Research Annotation System (MRAS) for taking personal notes compared with hand-written notes. MRAS is a web-based client/server application which supports video annotation creation and threaded discussions. That work introduced video annotation at an early stage. Similarly, our work introduces tagging as an annotation tool for educational video at an early stage.

Figure 2.5: The user interface overview, by Kim et al.

A think-aloud test on the usefulness of VideoJot [28] indicated that both a text annotation tool and a spatial annotation tool are necessary to fulfill the complete information needs of video annotation. The authors also compared still annotations with moving ones to explore the issue of annotating complex scenes in the video. In addition, LikeLines in VideoJot, a one-dimensional heatmap, was recommended to bookmark interesting parts of long videos.
Although their evaluation focused on live and recorded video streams of people playing video games, this work motivated us to think about how to adopt these available video annotation features into our quick tagging interface, especially when educational videos include complex scenes with dense information, such as formula explanations.

Videotater [7], an experimental tool for a Tablet PC, supports the efficient and intuitive navigation, selection, segmentation and tagging of video. To facilitate rapid manual segmentation and tagging, the visual scent of the underlying pixel colors and their evolution on the timeline immediately signals to the user where appropriate segment boundaries should be placed. Rapid review and refinement of manually or automatically generated segments is also supported. The tool also explored a distribution of modalities in the interface by using multiple timeline representations, pressure sensing, and a tag painting/erasing pen metaphor. As shown in Figure 2.2, the timeline displays two different views of the underlying video, namely timeline segments (b) and timeline strip images (c). The tagging paint is selected from the tagging view (a) and drawn over the timeline wherever it should be applied. When drawn on a segment (b), the tag is applied to the entire segment, whereas drawing on the strip image (c) applies the tag only to the frames touched. This paper is a good example of how actions such as video navigation, selection, segmentation and tagging are coupled to support a better user experience of video annotation.

Figure 2.6: The highlighting user interface, by Fong et al.

Another paper, by Ramos et al. [27], shares the same design principle. A variety of fluid interaction and visualization techniques for navigating, segmenting, linking, and annotating digital videos using a pressure-sensitive pen-based interface are demonstrated within a concept prototype called LEAN.
An overview of the system can be seen in Figure 2.3; it is composed of Video, Video Segment, Annotation, and TLSlider.

On the opposite end of the spectrum is the fully automatic approach. The thesis by Morris [22] explores the discovery and effective use of automatic semantic tags for navigating through video key frames in the unstructured video presentation domain. Machine learning is used to automatically generate classifiers for selected visual concepts. Meanwhile, Morris created a new video browser called VastMM-Tag to provide users with data generated by the semantic tagger.

Figure 2.7: Video Digests, by Pavel et al.

As automatic solutions do not always suit the criteria, we investigate a semi-automatic video annotation tool. The informal nature of home movies makes it difficult to use fully automated techniques for scene detection and annotation. The Family Video Archive [1] explores the symbiosis between automated and manual techniques for annotation, as well as the use of a zooming interaction paradigm for browsing and filtering large collections of video scenes. A screenshot of the annotation interface is shown in Figure 2.4. The interface mainly supports the annotation of the currently viewed scene. The upper-middle panel of Figure 2.4 shows all metadata associated with the current scene. Three kinds of annotation are possible: date, freeform text, and a metadata tag.

Instead of a personalized solution, an aggregated method also makes sense in certain scenarios. Tang et al. [35] proposed an approach that uses crowd-sourced information streaming to semantically annotate live broadcast sports games and selects video highlights from the game with these annotations. The selected highlights are specific to fans of each team, and these clips reflect the emotions of a fan during a game.

The underlying concept of the Collaborative Lecture Annotation System (CLAS) [29] is that CLAS moves away from user-defined annotation tools (such as note-taking tools) and toward a system-defined annotation strategy; i.e., the annotation has the same meaning for all users. It is a medium through which students and instructors can engage in meaning-making around course content, record important moments, and, once finished, navigate these important moments. CLAS relies on semantically constrained annotation, post-annotation data aggregation, and transparent display of this aggregated data. Our system is not designed as an alternative to CLAS, but can instead be considered complementary to CLAS’s semantically constrained annotation. Indeed, our system also helps students record important moments, manage their video content, and navigate these important moments, while also providing varied approaches to facilitate user tagging behaviors while they watch and learn from videos. Generated tags can be further aggregated in a collaborative annotation system.

Figure 2.8: NoteVideo+, by Monserrat et al.

The Rich Interactive Multimedia Exercise System (RIMES) [14] is a system for authoring, recording and reviewing interactive multimedia exercises embedded in video lectures. In their study of the RIMES creation experience, the researchers coded all RIMES exercises by subject area, by knowledge type, and by the level of cognitive process, using the revised version of Bloom’s Taxonomy [18]. The original Bloom’s Taxonomy [5], as well as the revised version [18], are common frameworks used to evaluate the level of learning assessed in exams and tests [39].
In our work, we used guitar video content and a quiz to simulate the learning process from the remember level to the analyze level, based on the original Bloom’s Taxonomy [5].

2.3 Video Interfaces for Education

Free online educational video platforms such as Khan Academy, Coursera, edX, Udacity, MIT OpenCourseWare, and YouTube are used by millions of people. With this unprecedented scale of educational video consumption, there are many challenges and opportunities to explore in the design space of educational video interfaces.

Data-driven interaction techniques for educational video navigation are explored by Kim et al. [15]. Their study summarizes a few typical video watching scenarios in learning contexts: rewatch, textual search, visual search, return, and skim. It also presents a set of techniques that augment existing video interface widgets. As shown in Figure 2.5, three sets of novel interaction techniques are presented. The dynamic timelines comprise the Rollercoaster Timeline, Interaction Peaks, and tracing of personal watching. In-video search is enhanced by keyword search and an interactive transcript. Highlights are made with the Word Cloud, Personal Bookmarks and the Highlight Storyboard.

Traditional playback controls (play, pause, seek) are not well suited to supporting recall, history, or interval bookmarking. Fong et al. [10] designed an interface that uses textbook-style highlighting on a video filmstrip and transcript, both presented adjacent to a video player, for students to manage their video collection and quickly review or search for content. The main video player view is shown in Figure 2.6. According to their experimental results, including transcripts with the videos offers more utility for users and allows them to highlight, search, and review videos more easily.
This work inspired us in the design of our quick tagging interface.

It is difficult to browse and skim the content of long informational lectures using current timeline-based video players. In response, video digests [25] are a new format for informational videos that allows browsing and skimming by segmenting videos into a chapter/section structure and providing short text summaries and thumbnails for each section. A set of tools has been presented to help authors create digests using transcript-based interactions. As shown in Figure 2.7, there are two panes in the interface. One is an aligned transcript pane on the right for navigating, segmenting and summarizing the talk. The other is a WYSIWYG editor pane on the left for adding chapter titles, summaries, and keyframes for each section. Additionally, a progress overview scrollbar in the transcript pane allows authors to view their segmentation progress and return to areas for refinement.

The typical goals of students, like quickly finding a particular concept in a blackboard-style lecture video, are not adequately supported by current video navigation tools. NoteVideo+ [21], an improvement on NoteVideo, is a system for identifying the conceptual ‘object’ of a blackboard-based video and then creating a summarized image of the video to be used as an in-scene navigation interface that allows users to jump directly to the video frame where that object first appeared, rather than navigating linearly through time. In Figure 2.8, NoteVideo+ adds a search box for the transcript (a), transcript text that hovers over the visual elements (b), and a scrubber (c) to address two limitations of the previous interface.

2.4 Summary and Influence on Design

In this chapter, we have covered tags/tagging, video annotation, and educational video interfaces.
In the tags/tagging section, we reviewed the following tags/tagging functions and advantages: identifying topics of items or expressing taggers’ opinions [11], [33], [15], organizing content for future navigation, and filtering or searching [36], [34], [33], [24]. We saw that tags can be extracted from a large quantity of comments [40] or social media data [16], [13] to support learning activities or help understand social network relationships. We also reviewed a word frequency analysis taxonomy method [26] and semantic verb classifications that support various natural language processing (NLP) tasks [17], [31]. In the video annotation section, we gained insights on how video navigation, segmentation, and annotation are associated with one another. We also learned that users’ note-taking behaviors differ between personal and collaborative learning environments. Finally, we investigated different types of educational video interfaces, which had direct implications for our choices of video interface features and elements.

Related work shows the promise of integrating tagging features into educational video interfaces to support learning activities. For varied formats of course video content (such as talking heads, long informative videos, and blackboard-style lectures), tags can be useful for identifying course topics, expressing learners’ opinions, and helping recall video content. Additionally, it is important to consider tagging valence (such as having all positive tags or having a range of tags) for different learning environments.
Similar to early work on video annotation [23], [2], we compared our quick tagging mechanism with traditional note-taking methods as a starting point to investigate the usefulness and efficiency of our quick tagging mechanism in educational video interfaces.

Chapter 3
General Interface Design for Quickly Tagging Educational Videos

Based on our pilot study with eight student participants at the University of British Columbia, we developed an interface that supplies users with four basic functions: watching video, navigating video in multiple ways, tagging video parts in two orders, and deleting applied tags in “un-tagging” mode. The important components in our design are the quick tagging mechanism (including pre-defined tags, video part selection, and applying tags, or vice versa) and a video player adjacent to a video filmstrip and transcript. In this chapter, we describe the complete video interface that integrates our components with features from typical online educational platforms. We also rely on some of the design decisions found by Fong et al. [10].

3.1 Groundwork

The preliminary investigation by Fong et al. [10] informs us that being able to select intervals of video for manipulation is useful for users managing their video collections. As found in that study, users wanted more control over the ability to emphasize or de-emphasize certain video parts without having to watch the video over and over again. It was stated that being able to see which parts of the video needed more attention would be beneficial in the user’s reviewing process.

As shown in Figure 2.6, we decided to use Fong et al.’s three major elements: the player, the filmstrip, and the subtitle viewer. For our study we shared their design idea of allowing users to manipulate emphasis on certain video parts by tagging and to use these tags to review video content. We also shared their metaphor of textbook-style tagging for educational video material.
More specifically, we supported user tagging on both the filmstrip and transcript. We note that our work is not an alternative to the highlighting system of Fong et al., as tagging can be integrated into their highlighting system. From one perspective, tags can provide more information than highlighting colors; in general, highlighting colors can be integrated into one “important” tag. However, highlighting does afford greater visual stimuli than tags. From another perspective, our quick tagging shares the same manipulation interaction with highlighting. In the study of Fong et al., the results showed that highlighting on a transcript was preferable to highlighting on a filmstrip. Thus, we wanted to explore whether this finding also applies to our quick tagging method.

3.2 Use Case

Alice is a first-year college student in Electrical and Computer Engineering. This term, she starts an entry-level electronic circuits course. One day, her instructor asks her to preview the video lecture about the pn junction. In the beginning, Alice just watches the video linearly. Although she knows that there are some differences between P material and N material in a pn junction, the underlying difference between them as explained in this lecture is quite new and interesting to her. Thus, she marks the video part as “important” and “pn junction” so that she can rewatch it later. After a while, Alice feels that this lecture seems somewhat trivial. She decides to skim it to see if there is something she shouldn’t miss. Soon, she finds that there is a tutorial and starts watching the corresponding video part. She gets stuck on one key step of the tutorial exercise. She decides to tag it as “difficult” and go back to digest it once she finishes watching the whole tutorial. She still has several video parts tagged as “difficult” and “confusing” when she finishes watching the whole video.
As she has other assignments to do, she plans to discuss those parts with her classmates and instructor next week in the classroom.

3.3 Overview of Quick Tagging Interface

A screenshot of the quick tagging interface can be seen in Figure 3.1. In the middle, the large picture is the main viewer, similar to most video players. Clicking on the main viewer toggles between playing and pausing the video. The same functionality can be found in YouTube’s player as well as in various video viewing applications in the mobile space. On the bottom is a toolbar that houses video controls, allowing the user to play or pause the video and view the current playing time. Users can also skip forward or backward five seconds by using the arrow keys.

Figure 3.1: Overview of the quick tagging interface. Here, we see the transcript (green), the filmstrip (red), and a tagging field (blue).

The rest of the interface, such as the filmstrip (red, Section 3.3.3) and the transcript (green, Section 3.3.4), will be described in detail below. We first describe our pre-defined tags.

3.3.1 Pre-defined Tags

We focus on two classes of pre-defined tags that help recall video content in learning contexts: reaction tags, which express viewer feelings or opinions, and topics, which describe video content. As reaction tags are subjective, one of our assumptions is that the negative and positive valence and the intensity (e.g. like and love) of such tag words will be biased under different learning contexts. To maintain low cognitive load and save interface space, the optimal default number of tags, with the highest preference and the least tagging effort, should also be considered. Topics, on the other hand, are objective and strongly related to course subjects and content.
In Chapter 4, we discuss how we obtained the most used tags and the optimal number of tags to display in our quick tagging interface, as well as other design implications.

In Figure 3.3, we can see the ten pre-defined tags used in our lab study in Chapter 5. The five reaction tags were chosen from the study results in Chapter 4 according to the metrics of high use frequency and word valence and intensity under learning contexts. The five topic tags were extracted from the video content, and a field is available for users to input their own tags. We made this decision in the hope that our participants could focus on the quick tagging mechanism instead of experiencing a lack of available tags. To uphold consistency throughout the interface, the tagging field is used in playhead tagging, transcript tagging, and filmstrip tagging.

3.3.2 State Diagram of Quick Tagging Mode

As shown in Figure 3.2, there are two modes in our quick tagging interface, namely tagging and untagging mode. In tagging mode, there are three tagging methods: playhead tagging, transcript tagging, and filmstrip tagging. For playhead tagging, users can use one hot key (Ctrl + z) to perform one-click tagging. We used a hot key to achieve the one-click interaction on desktop. Our study did not focus on a particular hot key, so it was selected for convenience. No undo mechanism was implemented; thus, as there was only one hot key that the participants had to learn, overloading is not an issue.

Figure 3.2: Tree state diagram of three levels of quick tagging mode. From left to right, the first level has two modes: tagging and untagging. The second level shows two branches of three tagging actions each, for the allowed tagging actions in the corresponding mode. The third level shows two tagging orders and other ways of interaction.

For transcript tagging in tagging mode, users choose tags first and then select video content (tag region). The same order applies to filmstrip tagging in tagging mode. In untagging mode, however, there is one tagging deletion action and two tagging methods, transcript tagging and filmstrip tagging. Contrary to the order in tagging mode, both transcript tagging and filmstrip tagging in untagging mode require users to select video content first and then choose tags (region tag). In general, the Verb-Noun order (tag region) is best for users who prefer to use the same tags to tag all video content, while the Noun-Verb order (region tag) works well for users who frequently change tags. In untagging mode, users can delete their taggings with one click. Further descriptions are provided below.

Figure 3.3: Tagging field with user input tag (green) and ten pre-defined tags comprised of five reaction tags (red) and five topics (blue). The tagging mode (“tagging” and “untagging”) is switched by a toggle button (purple). In this figure, “tagging mode” is displayed.

3.3.3 Filmstrip

The filmstrip is the interface element outlined in red in Figure 3.1. The filmstrip allows users to seek to any part of the video, and provides a preview to allow better seeking accuracy, as well as timestamps to help judge the location of the potential seek.

The filmstrip is shown as a set of thumbnails from the video arranged side by side, each representing a portion of the video. As the width of the filmstrip represents the entire length of the video, each of the n thumbnails represents 1/n of the video. Granularity should depend on video length. Moving the cursor over a thumbnail changes it to show the frame represented by the horizontal location of the cursor, along with the corresponding timestamp.
The initial image is the first frame of the represented interval. The red bar acts like a playhead.

Figure 3.4: The filmstrip splits into multiple rows, each representing a portion of the video. The black diagram (blue) displayed on the top of each thumbnail represents the tagging history.

Applying Tags on Filmstrip

There are two ways to apply tags on a filmstrip. One is playhead tagging, where users use one hot key (Ctrl+z) to add tags. To perform playhead tagging, users need to be in tagging mode and select pre-defined tags or input a tag in the tagging field, shown in blue at the top of Figure 3.1. The tagged video part is the sentence currently being spoken; more specifically, the tagged video interval is the sentence around the playhead time at which the tagging action occurs. The duration of the sentence being spoken is known from the audio transcript.

The other method is filmstrip tagging, shown in Figure 3.4. Two orders are available. One is to select thumbnails first and then choose tags (or input one in the tagging field) to apply tags on the filmstrip (region tag). The other is to choose tags (or input one in the tagging field) and then select thumbnails (tag region). Figure 3.4 shows the region tag order in untagging mode. The tag region order can be achieved in tagging mode by choosing tags from the drop-down menus in Figure 3.3 and then selecting thumbnails to apply the tags.

Tags Empowering Navigation on Filmstrip

The black diagrams (highlighted in blue) in Figure 3.4 are displayed on the top side of a thumbnail to indicate the tags made in the corresponding video part. If there are multiple tagged video parts for a thumbnail, the overlapping black diagrams appear darker. The user can click a black diagram to navigate the video.
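The darkening of overlapping diagrams can be sketched as follows. This is an illustration only: the thesis does not state how opacity is computed, so the rule below (stacking identical semi-transparent layers) and the base opacity of 0.3 are assumptions, and the function names are hypothetical.

```python
def overlap_counts(tagged_parts, duration, n):
    """Count, for each of n equal thumbnails, how many tagged intervals overlap it.

    tagged_parts is a list of (start, end) times in seconds.
    """
    part = duration / n
    counts = [0] * n
    for start, end in tagged_parts:
        for i in range(n):
            thumb_start, thumb_end = i * part, (i + 1) * part
            # Two intervals overlap when each starts before the other ends.
            if start < thumb_end and end > thumb_start:
                counts[i] += 1
    return counts


def marker_opacity(count, base=0.3):
    """Combined opacity of `count` stacked layers that each have opacity `base`.

    Assumed rule: standard alpha stacking, so more overlapping tags look darker.
    """
    return 1.0 - (1.0 - base) ** count


# Example: two tagged parts on a 600-second video shown as 10 thumbnails.
counts = overlap_counts([(10, 50), (40, 130)], 600, 10)
print(counts)
print([round(marker_opacity(c), 2) for c in counts])
```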
To provide more text information, the corresponding tags pop up when the cursor hovers over a black diagram. Correspondingly, the temporary tagging history (highlighted in green) shows up beside the red bar when the tagged video part starts, and disappears after it ends.

3.3.4 Transcript

The transcript, shown on the left in Figure 3.1 in green, contains everything said in the video. It provides an overview of the video in textual form, allowing users to quickly scan through the spoken content. As the video plays, the sentence currently being spoken turns red and acts like a playhead. On the left of each caption is a timestamp, which is clickable to allow users to jump around the video.

Applying Tags on Transcript

As with the filmstrip, the user can perform both playhead tagging and transcript tagging to apply tags on a transcript. Similarly, the two orders can be used for transcript tagging in the two tagging modes. The bottom of Figure 3.5 (in blue) shows the order of selecting text first and then choosing tags (or inputting a tag) to apply tags on the transcript in untagging mode.

Tags Empowering Navigation on Transcript

After the user tags a video, the tagging history is shown in the transcript. The tagging history, shown at the top of Figure 3.5 in red, comprises a box around the selected text, an arrow, and a vertical tag bubble. The user can click this tag bubble, causing the player to seek to the start of the video part. This navigation function can be quite helpful when users browse through the transcript and tagging history, especially for longer videos. In addition, users can refer to the tag history and use the timestamp to navigate.

3.4 Design Strategies

In general, quick tagging here refers to a function that provides an easy and fast way for users to mark course video parts with predefined semantic keywords. More specifically, users can easily and quickly choose tags and apply them to course video content.
The “quickness” of this design is amplified by three aspects:

Predefined tags: In a study by Sen [33], the author reports that 68% of non-taggers in the study did not tag because they could not think of any tags. We suppose this might also be an issue for users who are not native English speakers. In light of this, we provide pre-existing tags for users to choose from, hoping to ease this dilemma. These tag words are extracted from comments and annotations on course videos in YouTube and CLAS and are the words most commonly used by people when describing video content. These pre-existing tags can save time even for native English speakers. In IBM’s Efficient Video Annotation (EVA) tool [38], all annotations apply terms from a small controlled-term vocabulary, and no free text annotations are allowed. This promotes consistency, simplicity, and speed of annotation.

Video part selection: In [8] and [20], the appropriate granularity of video segmentation is an important issue to consider for quick video annotation and tagging. According to the interview results from [8], the most useful granularity of segmentation in general is the shot level, namely a video segment with in and out points. Since video represents a special case of a segmented continuous variable [8], and video parts are used as the navigation points inside educational videos [15], we chose the video part as the unit to which tags are applied. Here, the video part shares the same granularity of segmentation as the shot level. However, unlike Videotater [8], precision of segmentation is not our priority. So, in order to make the video part selection process fast and simple, we use an auto-selection method. More specifically, once users find video points they want to mark, they can easily generate video parts with one click.
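As an illustration of this auto-selection, the sketch below turns a playhead time into a tagged video part by looking up the transcript sentence around that time, as Section 3.3.3 describes for playhead tagging. The `Sentence` record, the function names, and the choice to fall back to the most recently started sentence when the playhead sits in a pause are assumptions made for the example, not details of the implemented system.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Sentence:
    start: float  # seconds
    end: float    # seconds
    text: str


def sentence_at(transcript, playhead):
    """Return the last sentence that started at or before the playhead time.

    Assumes the transcript is sorted by start time. Returns None if the
    playhead is before the first sentence.
    """
    starts = [s.start for s in transcript]
    i = bisect_right(starts, playhead) - 1
    return transcript[i] if i >= 0 else None


def tag_at_playhead(transcript, playhead, tag):
    """Create a tagged video part covering the sentence around the playhead."""
    sentence = sentence_at(transcript, playhead)
    if sentence is None:
        return None
    return {"tag": tag, "start": sentence.start, "end": sentence.end}


# Example with a made-up three-sentence transcript.
transcript = [
    Sentence(0.0, 4.2, "Welcome to the lecture."),
    Sentence(4.2, 9.8, "Today we look at the pn junction."),
    Sentence(11.0, 15.5, "Let us start with P material."),
]
print(tag_at_playhead(transcript, 5.0, "important"))
```

One click (or one hot key press) therefore needs only the playhead time and the chosen tag; the interval boundaries come from the transcript.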
According to [10], the familiar interaction of highlighting text was preferred, which inspired us to use text selection and filmstrip selection as two other ways of selecting video parts.

Applying tags: The evaluation results of [38] tell us that the tagging task can be more efficient when tagging multiple frames for only one concept at a time. This conclusion was based on the assumption that users can clearly recognize and easily locate the corresponding frames. In Baudisch’s work, which used a painting metaphor to rate large numbers of objects [3], the order of selecting items and methods was categorized as noun-verb and verb-noun. Considering the diversity of user behaviors in tagging videos, one of our design decisions was to integrate video part selection and tag application into one click. We also decided to support both noun-verb and verb-noun orders. In other words, users are able to select text or filmstrip thumbnails first and then apply tags, and vice versa.

3.5 Summary

In this chapter, we introduced our quick tagging interface, which is composed of the main viewer, transcript, filmstrip and tagging field. The “quick” is enhanced from three perspectives: pre-defined tags, video part selection, and tag application. Our video part selection methods can be applied to a wide variety of video content (such as videos with strong transcription but poor visualization, videos with good visualization, and so on). We support two orders for applying tags: tag region and region tag. Region tag is best for users who frequently change tags, while tag region works well for users who prefer to use the same tags consistently. We then used two modes to help users switch between the two orders, as well as other interaction methods (like tagging deletion). In Chapter 4, we will discuss how we chose pre-defined tags for educational videos, and we will determine the optimal number of tags in our interface.
Then, in Chapter 5, we will compare our quick tagging mechanism with hand-written notes in a controlled lab study. In the note-taking condition, the plain video interface will comprise a main viewer, transcript, and filmstrip. In the quick tagging condition, these three components (main viewer, transcript, and filmstrip), together with the tagging field, will be used.

Figure 3.5: The transcript shows the tagging history and a tagging field.

Chapter 4
Background Study

In this chapter, we discuss the background study we conducted to choose pre-defined tags for learning contexts. The study was divided into two parts. First, we evaluated a taxonomy method for tag words on two datasets and collected the most widely used tags in three categories. Second, an online survey was conducted to validate the aggregated results from the evaluation and develop the design implications for our quick tagging interface.

4.1 Datasets Analysis Methodology and Evaluation

In this section, we first introduce the taxonomy methodology for analyzing video comments. We then describe the two datasets separately: one from YouTube educational video channels and one from the Collaborative Lecture Annotation System (CLAS)², a video platform for learning. Because of availability, we were only able to access the YouTube and CLAS datasets. In our content analysis, we used only the philosophy subject on YouTube but many other subjects on CLAS. Notably, we found that more comments were available for philosophy courses than for other subjects on YouTube. In our definition of the three categories of tags, the general tags apply to all subjects and only the topic tags depend on the subjects themselves.
In light of this, we believe this distribution will not bias our results. Finally, we provide the analysis procedure and final results in this section.

4.1.1 A Taxonomy Method of Tag Words

Our analysis aims to find the words users most commonly use to express their feelings and opinions while watching videos, as well as the words they use to describe video content. We manually perform a colour-coded word taxonomy on the words extracted from video comments. There are three main steps in this process.

² http://clas.sites.olt.ubc.ca/

Admiration/Awe    Desperation        Happiness              Lust
Amusement         Disappointment     Hatred                 Pleasure/Enjoyment
Anger             Disgust            Hope                   Pride
Anxiety           Dissatisfaction    Humility               Relaxation/Serenity
Being touched     Envy               Interest/Enthusiasm    Relief
Boredom           Fear               Irritation             Sadness
Compassion        Feeling            Jealousy               Shame
Contempt          Gratitude          Joy                    Surprise
Contentment       Guilt              Longing                Tension/Stress

Table 4.1: Affect categories by Klaus. There are 36 categories in total.

Choosing Appropriate Tag Words

We call this first step the “word cleaning step”. We filter out many irrelevant words, mainly prepositions, articles, and personal pronouns, as well as some nouns such as “video” and verbs such as “think” or “have”. We do not consider words of estimative probability, such as “probably” or “certainly”, as these depend strongly on the context of a sentence and our research aims for one-word selections. Emotion words such as “wow” or “haha” are not semantic words, so we also remove them from this study. Although many typos are encountered, such as “ndesirable”, they are easy to recognize, notably words with a stray ‘n’ in the comment text.

Categorization

Second, we categorize the cleaned words into three categories: opinion words, content descriptive words, and subject feature words. Klaus [32] calls words that express emotions toward an object “discrete emotion words”.
In his paper, people’s affections are classified into 36 categories, and a list of pertinent words for each category is provided. All of these categories are shown in Table 4.1. We found that most of the words in Klaus’ work can be applied to express opinions in the context of tagging educational video content, as well as to express feelings while watching videos. Therefore, to classify opinion words, we refer directly to these 36 affect categories. For our other two categories, there are no existing lists we can refer to. We only have a general sense that content descriptive words will mainly be subjective adjectives, and that subject feature words will mainly be objective nouns plus some adjectives used only in courses of a specific subject. The border between the two categories is still vague for some adjectives. Analyzing a broad range of subjects to validate the words in the content descriptive and subject feature categories is one way to resolve this issue. More details are described in our evaluation section. The definition of each category is shown in Table 4.2.

Table 4.2: Definition of each category by function and scope. Words in the first two categories can be applied to general course videos. Words in the last two categories can both be applied to describe video content, but the latter is course context-based.

Category                     Function                                       Scope
Opinion words                Express opinions and watching feelings         General
Content descriptive words    Describe video content and express opinions    General
Subject feature words        Describe video content                         Context-based

Word Frequency Analysis

We get the big picture after this second step. But there is still a large number of words in each category, and we find that many words have different forms but share the same meaning. For example, “interesting” has other forms such as “interest”, “interested” and “interests”. 
What’s more, words like “happy”, “glad”, and “joyful” share a similar meaning. So we combine words with different forms and similar meanings into one word set, and choose the highest-frequency word as the representative word of that set. We apply this inductive process to the three categories. For opinion words, we decide which words are synonyms by referring to the listed word stems of each affect category from Klaus (Table 4.1). As for the other two categories, three researchers in Human Computer Interaction (HCI) worked together to validate the synonym groupings. The main researcher (myself) categorized the first version. Then another researcher validated and modified the results of the first version. Finally, the third researcher did the last round of validation and modification.

4.1.2 Datasets Description

We analyzed video comments from two datasets: YouTube and CLAS. There are two goals of this evaluation. The first is to analyze a broad range of subjects to validate words in the content descriptive and subject feature categories. More specifically, if words are categorized as subject feature words but also appear in other subjects, we categorize them as content descriptive words. The second goal is to output an educational video tag form sorted by frequency, and to collect the most used words in each category for the subsequent online study. The reason we use comment words as keyword tags is twofold. First, we are not aware of any keywords for tagging educational videos in the current literature. Second, many comments on educational videos clearly include reactions, opinions and topics directly related to the video content itself.

YouTube Dataset

In the YouTube dataset, we analyzed 4870 comments from philosophy courses; 39 philosophy course videos are used, 14 from the CrashCourse channel³ and 25 from The School of Life⁴ channel. 
We scraped all comments from these videos using the YouTube API, and counted how many times each word was used. Finally, we input all words with appropriate frequency to Microsoft Excel for analysis.

³ https://www.youtube.com/user/crashcourse
⁴ https://www.youtube.com/user/schooloflifechannel

CLAS Dataset

The CLAS dataset consisted of data from students in Music, Math, Library and Information Studies, Political Science, and Art History. Usage in each class differed depending on how strongly the instructor encouraged students to use the system: students were either told that viewing and posting counted for participation marks, or viewing and posting was completely voluntary.

Table 4.3: Details of CLAS data. For example, in the first year music course there are 3 videos with 46 comments and 321 annotations, with encouraged posting.

Subjects     Video #    Comment #    Annotation #    Posting Type    Year Level
Music        3          46           321             Encouraged      1st year(1)
Math         18         154          3522            Encouraged      1st year(2)
Library      33         67           794             Encouraged      Graduate(1)
Political    25         4            28              Voluntary       1st year(2), 3rd year(1)
Art          98         0            420             Voluntary       1st year(1), 2nd year(1), 3rd year(2)

Table 4.4: An example process of choosing representative words from a word set. Here we take the Interest/Enthusiasm affect category as an example word set. This comprises a group of pertinent words with similar meanings. “Interesting” has the highest frequency in the word set, and is thus chosen as the representative word.

Affect category:        Interest/Enthusiasm
Pertinent words:        curious (15), enthusiastic (1), interesting (176), entertaining (8), involved (13), engaging (4), disengaged (1), attractive (9), appealling (6), compelling (2)
Representative word:    interesting

Within each course, the postings comprised two parts: aggregate annotation content (a comment on a moment of the video) and aggregate comment content (a comment on the whole video). 
Details about the metrics (such as annotation type, posting type, and year level) of each course subject are shown in Table 4.3.

4.1.3 Procedure

We first used the three-step word taxonomy method (Word Cleaning, Categorization, and Word Frequency Analysis) to analyze the comments on philosophy courses from YouTube. These word taxonomy steps are common in the literature and in current applications. We then normalized the representative words in the opinion words category. We did not normalize words in the other two categories, because further validation first needed to be conducted.

To clarify, each category is composed of word sets. A group of words with similar meaning makes up one word set, and the word with the highest frequency is chosen as the representative word of that word set. An example of this process is shown in Table 4.4. The normalized use percentage of each representative word is the total word count of its word set divided by the total word count of its word category. The normalized results were then compared with results from the CLAS dataset. As there was initially only one subject in the YouTube dataset, the validation was completed after finishing the analysis of the five course subjects in the CLAS data. For each subject, we followed the same analysis procedure as for our YouTube data. However, as the data metrics among these five subjects are slightly different, our analysis procedure needed to adapt to the corresponding data features. In other words, the three-step word taxonomy method was applied at the subject level in the YouTube dataset, and at the metric level within each subject in the CLAS dataset.

Our analysis and comparison were completed within each subject in this step. 
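As a rough illustration of the word-set step and the normalization described above, the following sketch picks a representative word and computes normalized use percentages. The Interest/Enthusiasm counts are taken from Table 4.4; the second word set, its counts, and the function names are hypothetical and only serve to make the division concrete.

```python
from typing import Dict, Tuple

# One word set per affect category: word -> frequency.
# Interest/Enthusiasm counts come from Table 4.4; the "Happiness" set and
# its counts are hypothetical, added so the category has more than one set.
OPINION_WORD_SETS: Dict[str, Dict[str, int]] = {
    "Interest/Enthusiasm": {
        "curious": 15, "enthusiastic": 1, "interesting": 176,
        "entertaining": 8, "involved": 13, "engaging": 4,
        "disengaged": 1, "attractive": 9, "appealling": 6, "compelling": 2,
    },
    "Happiness": {"happy": 40, "glad": 15, "joyful": 10},  # hypothetical
}


def representative(word_set: Dict[str, int]) -> Tuple[str, int]:
    """Return the highest-frequency word and the total count of the set."""
    word = max(word_set, key=lambda w: word_set[w])
    return word, sum(word_set.values())


def normalized_use(word_sets: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Map each representative word to (word set total) / (category total)."""
    reps = [representative(ws) for ws in word_sets.values()]
    category_total = sum(total for _, total in reps)
    return {word: total / category_total for word, total in reps}


for word, share in normalized_use(OPINION_WORD_SETS).items():
    print(f"{word}: {share:.1%}")
```

Because every word set contributes its full count, the normalized percentages within one category always sum to 100%, which is what makes the YouTube and CLAS results comparable despite their different sizes.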
For the courses in music, math, and library, there is only one year level for both annotations and comments, so we first completed a word taxonomy separately for annotations and comments within each subject, and then compared the results. For the political science courses, there are two year levels for both annotations and comments. Here, we first performed word taxonomy separately for annotations and comments within each year level, and then compared the results between the two annotation types and year levels. For art history courses, we see three year levels with just annotations, so we first performed word taxonomy on annotations within each year level. We then compared the results across the three year levels. As shown in Figure 4.1, the top five words in annotations are almost the same as those in comments in each of the three word categories. Other results of this intermediate step are attached in Appendix C.

Table 4.5: Five words were moved after cross-validation study. Check mark means the word originally existed in this subject.

Columns: Music, Math, Art history, Political science, Library, Philosophy
true       X X X X X
new        X X X X X
wrong      X X X X
limited    X X X X
pretty     X X X X

From the results of our last step, we concluded that data metrics such as year level and annotation type had no significant influence on the taxonomy results. Based on this finding, we integrated the results into three categories for each subject separately, and performed analysis and comparison across subjects. We compared words in the opinion, content descriptive, and subject feature categories, respectively, across the five subjects. We then manually verified the generality of words in both the opinion and content descriptive categories, and the context-based property of words in the subject feature category. Thus, we integrated words in the opinion category across the five subjects into an aggregated opinion category for CLAS. 
We applied the same procedure for words in the content descriptive category. Following our YouTube routine, we only normalized words in the opinion category. Finally, we obtained opinion words and content descriptive words from the YouTube and CLAS data respectively.

Figure 4.1: Comparison results between annotations and comments in math courses from CLAS. The blue bubbles show normalized percentages. The comparison was run among three categories: opinion words, content descriptive words and subject feature words.

Table 4.6: Top five collected subject feature words in six subjects, respectively.

Subjects      Feature Words
Music         dynamic, confident, rhythmic, loud, rush
Math          question, function, integral, answer, actual
Art           ritual, authentic, social, modern, aesthetics
Political     deal, constitutional, works, vulnerability, unanimous
Library       instructional, topic, question, handout, active
Philosophy    factual, stupid, absurd, popular, logical

We also had subject feature words from six course subjects (five from CLAS). Feature words that exist in more than three of the course subjects were reclassified as content descriptive words. After this validation step, we finalized words in both the content descriptive and subject feature categories. We then separately normalized words in the content descriptive category in each dataset. Comparing normalized opinion words from YouTube with those in CLAS by use percentage, we formed the aggregated opinion words category, and applied the same procedure to words in the content descriptive category. The aggregation process is attached in Appendix D. Lastly, we produced a form that contains one aggregated opinion words category, one aggregated content descriptive words category, and six subject feature categories. 
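The reclassification rule above (a feature word found in more than three course subjects becomes a content descriptive word) can be sketched as follows. The per-subject word lists are abbreviated, hypothetical stand-ins rather than the study data, and `reclassify` is an illustrative name, not part of the original analysis, which was done manually.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def reclassify(
    feature_words: Dict[str, Set[str]], max_subjects: int = 3
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split candidate feature words into general and subject-specific words.

    A word that occurs in more than `max_subjects` subjects is treated as a
    content descriptive (general) word and removed from every subject list.
    """
    subject_counts = Counter(
        word for words in feature_words.values() for word in words
    )
    general = {w for w, n in subject_counts.items() if n > max_subjects}
    remaining = {s: words - general for s, words in feature_words.items()}
    return general, remaining


# Hypothetical, abbreviated input: subject -> candidate feature words.
candidates = {
    "Music": {"rhythmic", "loud", "true", "new"},
    "Math": {"function", "question", "true", "new"},
    "Art history": {"ritual", "modern", "true", "new"},
    "Library": {"handout", "question", "true", "new"},
    "Philosophy": {"absurd", "logical", "true"},
    "Political science": {"constitutional", "unanimous"},
}

general, remaining = reclassify(candidates)
print(sorted(general))            # words moved to the content descriptive category
print(sorted(remaining["Math"]))  # words that stay subject-specific for Math
```

In this toy input, "question" appears in only two subjects, so it stays a subject feature word, which mirrors how the same word is treated in the results.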
We selected the top collected words in the two general categories to perform our next study.

4.1.4 Results

For the CLAS data, results within each course subject showed that in all music, math and library courses, the top used words in the three categories are almost the same between annotations and comments. In other words, annotation types do not influence the top used words across our three categories. This conclusion also applies to year levels. For courses with voluntary postings, words mainly belonged to the subject feature category, and there were few in our other two categories.

Table 4.5 lists the five words that were moved from the subject feature category to the content descriptive category. These words had very high use percentages. We can see that “true” and “new” existed in five subjects, while the other three words existed in four subjects. We also observe that the political science feature words do not contain any of these five words. Voluntary postings and the smaller number of annotations may contribute to this, but art history courses also have voluntary postings. In any case, all five words were contained in the feature words category. Thus, from another perspective, we suppose that the subject feature words in political science are completely distinct from those in other subjects, which supports our decision to classify subject feature words and verifies our definition for this category.

Figure 4.2: Aggregated results from normalized YouTube and CLAS data.

In Table 4.6, the top five feature words in six subjects are given. We can see that the words in each subject are completely different and are context-based. The word “question” exists in math and library with different frequencies, but it does not exist in more than three subjects, so we still classify it as a subject feature word. 
We can see that these results are made up of nouns such as “function” and “topic”, and adjectives such as “rhythmic” and “absurd”. Further, the distribution of nouns and adjectives depends on the subject. Math consists mainly of nouns, while philosophy and music are made up of adjectives.

In Figure 4.2, the top nine words in the opinion and content descriptive categories are given, respectively. As mentioned previously, these words will be used in our next study. We observe that most words are positive and that few words are negative, such as “bad”, “limited” and “wrong”. Of the negative words, “bad” and “wrong” are from the YouTube dataset. In general, more negative words were collected in the YouTube dataset than in CLAS. This conclusion is supported by the results in Appendix D. We also observed that words like “good”, “like” and “true” had significantly higher use percentages than the others in the list.

4.2 Validation of Top Collected Tags

An online survey was conducted in order to assess the usability of the collected tags and to determine an optimal tag number. With different numbers (3, 5, 7, 9) of given default tags, participants were asked to tag five educational video clips and were encouraged to input their own tags.

4.2.1 Apparatus

Five educational video clips from YouTube were embedded in the survey. The video clips were roughly 20 seconds in length and were completely different from the videos in the first study (Section 4.1.2). Five subjects were covered: music, art history, engineering, math, and information management.

4.2.2 Participants

Thirty-five subjects participated in the study and we compensated them with free coffee. They were randomly invited to take part in the study by our four researchers in the Kaiser building at UBC. Two laptops and two iPads

Figure 4.3: Content descriptive word results in the survey compared with normalization results in YouTube and CLAS data. 
There are three groups: words input by the participant (gray), percentage decrease (orange) and percentage increase (blue).

were separately put on four tables. Each device was used to run one tag group (four groups in total). Two researchers were responsible for instructing the participants to complete the online survey on the available devices. As the four devices were not always occupied at the same time, these two researchers were also responsible for assigning participants to a device so that all participants were randomly and evenly divided into the four between-subjects groups. Another two researchers were responsible for inviting students to take part in the study by introducing the study purpose, time length, and reimbursement. As it took participants at most five minutes to finish the study, many students who were on a class break or waiting for friends were willing to take part and received free coffee. All thirty-five participants took part in the online survey within one day. Three participants did not complete the survey and were excluded from the analysis. The other thirty-two participants (8 female, 24 male) were all university students (aged 19-40, with one male student younger than 19). The thirty-two students (14 native English speakers and 18 non-native English speakers) were from four faculties: Applied Science (N = 24), Graduate and Postdoctoral Studies (N = 5), Land and Food Systems (N = 2), and Pharmaceutical Sciences (N = 1). Nine students had no experience taking online courses taught with videos.

4.2.3 Procedure

A 4x2 factorial study was designed for this survey. Four between-subjects groups were assigned 3, 5, 7 and 9 default tags, respectively. We followed the routine for designing Likert and semantic differential scales [30], where 3, 5, 7 and 9 are often used. 
Tags in both the opinion word and content descriptive word categories were provided to each group.

Participants were asked to watch an educational video clip first and then to select the tags that best expressed their feelings while watching and the tags that best described the video content. After watching and tagging five video clips, participants were asked to choose the default number of tags they would like to have. Finally, questions about tagging task difficulty and tag usefulness were asked.

As the online survey itself is within group, we made four survey versions to control the four levels of default tag numbers. Each participant was randomly assigned to one version. To avoid order effects of the given tags, all the tags were presented in random order for each video clip. The video clips were named “video1-5” to avoid any bias from the real names of the videos.

Figure 4.4: Opinion word results in the survey compared with normalization results in YouTube and CLAS data. There are three groups: words input by the participant (gray), percentage decrease (orange), and percentage increase (blue).

4.2.4 Results

We used data and theory triangulation to analyze the results and answer the two research questions below.

Q1: Are the given words really useful and representative of reaction range for tagging?

Survey questions asked whether the provided tags were useful to describe video content and to express watching feelings and opinions, respectively, and participants were asked to rate each on a five-point Likert scale. A Kruskal-Wallis H test was then run to determine if there were differences in usefulness to describe video content among the four groups of participants given different numbers of tag words: 3 words (n=8), 5 words (n=7), 7 words (n=8), and 9 words (n=7). Here, n refers to the group size. 
Values are mean ranks unless otherwise stated.

Distributions of usefulness to describe video content were not similar for all groups, as assessed by visual inspection of a boxplot. The usefulness to describe video content increased from 9 tags (10.43), to 5 tags (14.71), to 3 tags (16.25), to 7 tags (19.88), but the differences were not statistically significant, χ²(3) = 5.227, p = .156.

The same test was run to determine if there were differences in usefulness to express watching feelings and opinions among the four groups given different numbers of tag words: 3 words (n=8), 5 words (n=8), 7 words (n=7), and 9 words (n=7). The mean rank of usefulness to express feelings differed between groups, and the difference was statistically significant: χ²(3) = 9.369, p = .025. Both p-values agree. According to [6], the asymptotic p-value (2-sided test) is considered good enough when there are five or more participants in each group. Subsequently, pairwise comparisons were performed using Dunn’s (1964) procedure with a Bonferroni correction for multiple comparisons. This post hoc analysis revealed statistically significant differences in usefulness to express watching feelings and opinions between 5 tags (10.38) and 7 tags (23.14) (p = .016), but not between 9 tags (14.43) or any other group combination.

The group given 3 words (“clear”, “helpful” and “true”) rated the given words as quite useful to describe video content, though the difference was not significant. In Figure 4.5, we can see that the sum of these three words’ use percentages is above 50% in all four groups. This suggests that the three given words are quite useful. The group with 7 words had the highest rating on usefulness to describe video content, which may be caused by the provided negative word, “wrong”. We can see in Figure 4.5 that the 3 words group and the 5 words group were given only positive words. Our

Figure 4.5: Content descriptive word use percentage in each group. 
The listed nine words are default words. “Others” means words input by the participant.

argument is supported by the results in Figure 4.3, where we see that the negative word “unclear” was input by users and is in the top four. Although the 9 words group was also given the negative words “wrong” and “limited”, that group had the lowest rating. This can be explained by the fact that participants thought only some of the given words were useful, such as the negative words and the top three words.

This analysis can also be applied to the ratings of usefulness to express watching feelings and opinions. We can see from Figure 4.6 that the negative word “bad” was given in both the 7 words and 9 words groups. Thus, the group with 7 tags rated significantly higher than the 5 tag group. The usefulness of negative tags can also be seen in Figure 4.4, where the negative word “boring” was input by users and placed in the top three.

Q2: Which is the optimal default tag number?

Participants were asked which number of default words they would prefer to have displayed in a paper prototype. The options 1, 3, 5 and 7 were provided as default selections, but manual input was also supported. Of the 32 participants recruited to the study, 4 preferred to have 1 word, 7 preferred to have 3 words, 14 preferred to have 5 words, 5 preferred to have 7 words, and 2 preferred to have 4 words. A chi-square goodness-of-fit test was conducted to determine whether these tag numbers are equally preferable. The minimum expected frequency was 6.4. The chi-square goodness-of-fit test indicated that the default selections were not equally preferred by the participants (χ²(4) = 13.313, p = .010), with 3 words and 5 words preferred by students.

Participants were also asked to score the amount of effort it took them to add tags for the video clips on a five-point Likert scale. 
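The goodness-of-fit figures reported above for the preferred tag numbers can be checked by hand. The sketch below recomputes the statistic from the five observed preference counts (4, 7, 14, 5, 2) under equal expected frequencies of 32/5 = 6.4. The closed-form p-value is a standard identity that holds only for even degrees of freedom; the function names are illustrative, and this is not necessarily the software used in the study.

```python
import math
from typing import Sequence


def chi_square_gof(observed: Sequence[int]) -> float:
    """Chi-square statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


# Preferences for 1, 3, 5, 7 and 4 default words (N = 32).
observed = [4, 7, 14, 5, 2]
df = len(observed) - 1
stat = chi_square_gof(observed)
p_value = chi_square_sf_even_df(stat, df)
print(f"chi2({df}) = {stat:.4f}, p = {p_value:.4f}")
```

The statistic works out to 85.2 / 6.4 = 13.3125 with a p-value just under .01, in line with the reported χ²(4) = 13.313, p = .010.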
A Kruskal-Wallis H test was then run to determine if there were differences in the effort of adding tags among the four participant groups given different numbers of tag words: 3 words (n=8), 5 words (n=8), 7 words (n=8), and 9 words (n=7). Values are mean ranks unless otherwise stated. The distributions of effort to add tags were not similar for all groups, as assessed by visual inspection of a boxplot. The effort to add tags increased from 5 tags (13.75), to 3 tags (14.06), to 9 tags (16.21), to 7 tags (20.00), but the differences were not statistically significant: χ²(3) = 2.707, p = .439. The 5 tag group had the lowest effort scores, which may suggest that they felt most comfortable with 5 tags.

In both Figure 4.5 and Figure 4.6, we can see that the 5 tag group also had the lowest percentage (0.14 and 0.17) of input words compared with the other three groups. To some degree, this suggests that 5 default words is the most suitable number, given the various trade-offs found in the analysis. Additionally, the five default tags should include both positive and negative valence words.

Figure 4.6: Opinion word use percentage in each group. The listed nine words are default words. “Others” means words input by the participant.

4.3 Discussion

A tagging interface for students to efficiently mark up video content can benefit from a restricted tag vocabulary. These tags can also play an important role as an alternative mechanism to commenting, using less screen real estate. When choosing pre-defined tags for online video study, the learning context should be considered (such as personal vs. public, mid-term, quiz, etc.). In addition, the factors of tag reaction range and word intensity should be coupled with each other. More specifically, future research can explore how to expand the reaction range of tagging in a more fine-grained way, beyond simply including positive and negative tags. 
Another future direction is associating learning actions with reaction tags. For example, “interesting” may lead to sharing or re-watching the tagged video content, while “bad” may cause others to skip the video content entirely. In addition, “important” may lead to more reviewing of the video content in the future.

As for our tag taxonomy method, we only consider the frequency factor. Because our work focused more on the design of the quick tagging interface, we chose not to consider other factors. The choice of representative tags could certainly be more comprehensive, for example by distinguishing “love” from “like”. Specifically, more factors such as association and intensity can be taken into account.

According to Hick’s law [9], the more options there are in an interface, the more time it takes to operate it. Designing the survey to time the tagging tasks in each group would be a future direction, allowing the Tagging Tasks Finished Time Difference (TTFTD) to be calculated as a function of design.

4.4 Summary

This chapter introduced a tag taxonomy method, which extracted the most commonly used tags from video comments. We evaluated this method separately on two datasets and aggregated the results from normalized YouTube and CLAS data. The top collected tags were further validated by an online survey. According to our results, five is the optimal tag number for a one-click tagging interface. Additionally, the tag reaction range should cover both positive and negative polarity. Learning contexts should also be considered when choosing the pre-defined tags for educational video interfaces. Based on these design implications, we recommend five reaction tags, “like”, “interesting”, “unclear”, “difficult”, and “important”, for a quiz preparation learning context; these will be used in our lab study prototype in Chapter 5. 
Other directions can be explored, such as how to extend the current taxonomy method, associating learning actions with reaction tags, and utilizing Hick’s law.

Chapter 5
Lab Study

In this chapter, we discuss the experiment to evaluate our interface. This evaluation was performed following pilot studies, which were used to polish the design and to receive initial feedback from users about the usage of a quick tagging mechanism for video learning tasks. The pilot studies helped us make design decisions for the quick tagging interaction and the evaluation methods of our controlled study. In previous iterations, our quick tagging interface focused on supporting playhead tagging and only one tagging order, but the pilot studies showed that varied methods are needed to support users’ tagging actions efficiently and that two tagging orders are required to satisfy different user behaviors. Thus, we introduce our quick tagging mechanism integrating playhead tagging, transcript tagging, and filmstrip tagging, with pre-defined tag words.

5.1 Experiment

A user study was carried out to evaluate the design and performance of our interface and the usefulness of the quick tagging mechanism in helping users recall video content. We developed an evaluation protocol to encourage users to use our quick tagging method while keeping the experiment short and maintaining a bias-free evaluation. Likewise, for comparing with current practices in quick tagging, we needed to ensure that our interface mimicked current approaches as well as their logical extensions to provide a fair comparison. Using our protocol, we investigated whether integrating the quick tagging mechanism into the current video player would make recalling video content more efficient in a learning context. We conducted a comparative user study, comparing the performance of participants using a quick tagging mechanism to bookmark video content against watching the plain video interface and taking notes on paper, in both cases to finish a quiz corresponding to the course video.
5.1.1 Apparatus

The experimental interface was implemented using HTML5 and JavaScript. The study was performed on a 13.3-inch Acer S7 laptop with a screen resolution of 1920x1080 pixels. Participants used a regular Microsoft mouse to manipulate the contents of the screen. The web browser was Firefox (version 53.0.2, zoomed to 100%). We also used a screencast tool, Camtasia Studio 8, to record participants’ interface operations, as shown in Figure

5.1.2 Participants

Fifteen volunteers, 6 males and 9 females, participated in this experiment. They were compensated with a $20 Starbucks gift card for their time. Participants ranged in age from 19 to 40. Each participant worked on this task individually. Twelve participants had taken a video course previously, either from UBC or from other online services. Three participants had never taken any video course. Participants were from diverse fields (7 Applied Science, 1 Business, 1 Arts, 2 Education, 1 Graduate and Postdoctoral Studies, 1 Land and Food Systems, and 2 Science). None of the participants had a strong background in guitar, which was the topic of the video in the experiment. Participants were aware they could opt out of the experiment at any time, which implied that they did not have to complete tasks as prescribed. Two participants gave up on passing the quiz for the second half of the video. Therefore, their corresponding data was removed from our quantitative analysis. According to our evaluation protocol, if participants could not pass the quiz (4 out of 5 questions right) within 20 minutes, they were noted as failing the quiz. This experiment was time-limited, as we needed to control the length of the user study to avoid bias due to tiredness from lengthy learning sessions. One participant failed both of the quizzes. Another participant failed the quiz for the first half of the video; thus, their corresponding data was removed from our quantitative analysis.

5.1.3 Design and Procedure

We separated participants into two groups. 
One group took handwritten notes during the first half of the video and switched to the quick tagging interface for the second half (paper-first condition). The other group did the opposite (tagging-first condition). The thirty-minute video was chosen from an introductory guitar course from Coursera⁵. We chose the guitar video for three reasons:

• Tagging motivation and preference: We piloted three kinds of video lessons: Guitar, History and Chemistry. Participants showed interest in the history and music videos, but disliked chemistry. In our piloting, we found that people’s interest in the video topic influenced their motivation to tag video content while learning. In addition, the history video had a good transcript, so participants preferred tagging just on the transcript. The other two videos had strong transcripts and filmstrips, so participants liked to tag both. However, the guitar lesson also had many visual demonstrations to complement the audio.

• Quiz completion: We modified the quiz from Coursera. Each quiz comprises five multiple choice questions, and we observed that participants tended to complete the history quiz by simply searching for key words in the video transcript. The chemistry quiz also involved heavy calculations, which may have taken significant amounts of time to learn, slowing down completion time. The guitar quiz struck a good balance between these two issues. As we discussed in our related work in Section 2.2, the guitar video content and quiz were best able to simulate the learning process from memory retention to analysis.

• Video length: We learned from our piloting that short videos were not able to motivate participants to use our tagging functions for learning in the lab experiment. Participants mainly relied on their memory instead of using the tagging functions we provided to finish the quiz, so we were not able to verify the usefulness of our tagging interface. 
In other words, longer videos can reduce this learning effect to some degree, which the guitar video (15 minutes per half) can accommodate for our study.

Each participant was asked to play the role of a distance learning student. Specifically, they were told that their instructor had assigned a video for them to watch and a quiz for them to finish. With the quick tagging interface, participants were encouraged to use the tagging functions to bookmark video content. We hoped that participants would develop their own use and learning strategies for the tagging interface. For comparison, participants were also instructed to use a plain video interface that did not support tagging, where they were allowed to take notes on paper. The guitar video was divided equally into two halves, so that each participant learned one half with the tagging interface and the other half with the plain video interface. The order of the interfaces was randomly assigned to reduce order effects. Participants were told that they could use the video and their tags or written notes as a cheat sheet to help them finish the quiz later. The time each participant spent learning and watching each video was recorded, and a maximum of 30 minutes was given for learning and watching the 15-minute video.

Footnote 5: https://www.coursera.org/learn/guitar/home/week/4

Figure 5.1: Descriptive statistics for the quick tagging interface vs. the plain video interface. Mean times are shown for each task and interface. Numbers are in seconds.

The participants were asked to complete the quiz as soon as possible. They were required to get four out of the five questions right to pass the quiz. The time taken to pass each quiz was recorded, and the timing began when the participant opened the quiz. We checked the answers when participants finished the quiz, and the timing stopped only when they passed. We indicated their wrong answers and allowed them to continue attempting the quiz until they passed, with a time limit of 20 minutes.

Question                                    Median   p value
Overall usefulness of recall                4        .003
Overall ease of use                         4        .071
Overall efficiency of use                   4        .065
Overall helpfulness for learning tasks      4        .003
Pre-defined tags are helpful to recall      4        .048
Pre-defined tags are useful to mark down    4        .005
Liking transcript tagging                   5        .001
Liking filmstrip tagging                    3        .782
Liking playhead tagging                     3        .317
Tagging efforts on transcript               2        .034
Tagging efforts on filmstrip                2        .031
Tagging efforts by playhead                 2        .003

Table 5.1: The aggregated results of our questionnaire, with the median score (Likert scale, 1 to 5) and p value (significant difference from a neutral reaction). Here, 1 represents strongly disagree and 5 represents strongly agree. Participants’ overall reaction to our quick tagging mechanism was highly positive.

The experiment proceeded as follows:

1. The participants started with a pre-study questionnaire to fill in their demographic information and usage history of online course videos.

2. The evaluation started with a proficiency test for each technique. Participants were shown how to use each of the interface elements and, most importantly, how to tag in both tagging mode and untagging mode. They were also shown how to navigate the video content through the tagging history, transcript, and filmstrip, and how to access tagged video parts with the visual cues (black diagram) on the filmstrip. The video used in this proficiency test was a popular 30-minute course video from a UBC professor on Field Effect Transistors (FET) (https://www.youtube.com/watch?v=SjeK1nkiFvI&t=141s). After this, they were required to finish each instruction of the proficiency test independently. After passing the test, they were allowed to proceed to the formal study section.

3.
After the proficiency test, a trial began by asking participants to watch and learn from the first half of the given video (the guitar video). Participants were able to make a cheat sheet using the assigned learning method. Once they finished, they were asked to complete a quiz related to the video content.

4. After completing this first quiz, participants advanced to the second half of the video and were instructed to use the alternate learning method. Once they finished, they were asked to complete a quiz related to the video content. Upon completion of the whole video, participants were given a questionnaire asking for their reactions to, and comments on, the quick tagging interface.

5. Finally, participants were given time to experiment with the quick tagging interface in a less structured environment, and a follow-up interview was conducted to further explore the tagging design space as well as other usage scenarios.

Each experiment took between 1.5 and 2 hours.

5.2 Results and Discussion

We compared the time spent finishing the quiz when using the quick tagging method to mark down video content with the time spent when using hand-written notes with the plain video interface. The times to finish the quiz with the quick tagging interface (M=643.91, SD=229.74) and with the plain video interface (M=652.27, SD=280.60) are shown in Figure 5.1. We used descriptive data analysis instead of a paired t-test because the data was not normally distributed and had a large standard deviation, which violated the requirements for a robust paired t-test. As mentioned previously, four participants did not pass the quiz and their data were excluded from the descriptive data analysis. We also found that bookmarking video content with the quick tagging interface took participants 10% longer on average than taking notes on paper.

There are many factors which can influence the time participants take to pass a quiz, which we consider here.
As discussed in our related work in Section 2.2, our quiz required participants to apply both memory-related and analytical learning skills, which simulates real-life learning scenarios. In other words, tagging video content does not mean that participants fully understand the content or are freely able to apply the knowledge. In contrast, written notes may in fact be a better method for helping participants understand the content, to some degree. In addition, our quiz comprised multiple-choice questions, where participants had the opportunity to choose an answer at random. In other words, they could have simply used quick trial and error until they answered enough questions correctly. All of the above factors may contribute to the lack of a significant difference in finishing times between the two conditions. In our observations, participants using the tagging interface could also review the tagged content before moving on to finish the quiz. However, when participants took notes on paper, they went directly to the quiz when the video ended. In addition, some participants commented that some design decisions in the quick tagging interface were inefficient; more details are discussed in Section 5.2.2. These findings may explain why it took participants more time to use the quick tagging interface than to take notes.

The questionnaire results shown in Table 5.1 demonstrate the positive reaction of participants to our quick tagging interface and mechanism. They especially agreed that the quick tagging interface was significantly useful for recalling video content (Median=4, p=.003) and that the quick tagging mechanism was significantly helpful for finishing learning tasks (Median=4, p=.003). Participants generally thought the interface was easy to use (Median=4, p=.071) and efficient (Median=4, p=.065), though these results were not significant.
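As an illustration of how a Likert median can be compared against a neutral reaction, the sketch below runs an exact two-sided sign test against the scale midpoint (3). This is an assumption for illustration: the text does not state which test produced the p values in Table 5.1, and the function name and the sample responses are hypothetical, not the study's data.

```python
from math import comb
from statistics import median

NEUTRAL = 3  # midpoint of the 1-5 Likert scale


def sign_test_vs_neutral(responses, neutral=NEUTRAL):
    """Exact two-sided sign test of Likert responses against the midpoint.

    Returns (median, p_value). Responses equal to the midpoint are
    dropped, as is usual for a sign test.
    """
    above = sum(1 for r in responses if r > neutral)
    below = sum(1 for r in responses if r < neutral)
    n = above + below
    if n == 0:
        return median(responses), 1.0
    k = min(above, below)
    # Probability of a split at least this lopsided under a fair coin.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return median(responses), min(1.0, 2 * one_tail)


# Hypothetical responses from 15 participants (NOT the study's data).
example = [4, 4, 5, 3, 4, 5, 4, 2, 4, 5, 4, 3, 4, 5, 4]
med, p = sign_test_vs_neutral(example)
print(f"Median={med}, p={p:.3f}")
```

A sign test only uses the direction of each response relative to neutral; a Wilcoxon signed-rank test would also use the magnitudes and is another common choice for this kind of data.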
Here we provide a quotation from P13:

“I think the video tagging function is really helpful for students especially during lectures, as professors tend to speak fast and long, and students can only capture a point or two and jot them down in a hurry. When they re-read their notes, there’s missing pieces in connecting the dots and it doesn’t make sense.” [P13]

In the later sections of this thesis, we further present and discuss our findings about how the participants used our quick tagging interface to facilitate their learning process while watching the video. Specifically, we will focus on:

1. Pre-defined tags (such as the number and variety of the tags, whether they were satisfying or not, the most used tags for learning, polarity of the tags, participants’ need to input their own personal tags, and the number of tags they apply at one time).

2. Quick tagging modes and interaction (such as preference, suggestions, use patterns, etc.).

3. Further improvements for the tagging interface (design and visualization, suggestions on adding new functions, aesthetics).

We used a triangulation method to analyze our data, drawing on observation of participants’ use patterns, their questionnaire reactions, and the time they took to finish the tasks.

Reaction       Frequency     Topics            Frequency
Important      168           First-Position    13
Interesting    34            Half-Step         9
Difficult      7             Descending        8
Unclear        4             Ascending         7
Like           1             Open-String       6
                             Key               3
                             Sequence          2
                             Single-String     0

Table 5.2: Pre-defined tags in our interface and aggregated use frequency by 14 participants. The total use frequency was 214 for general (reaction) tags and 48 for topic tags.

5.2.1 Pre-defined Tags

The pre-defined tags are one of the most important parts of our proposed quick tagging mechanism.
We present the results and discussion in three aspects: use frequency and usefulness of pre-defined tags, satisfaction with pre-defined tag quantity and diversity, and requirements for input tags.

Are the default tags used frequently and are they useful enough for learning?

In terms of the pre-defined tags, we provided five reaction tags to express watching opinions and eight topic tags related to video content to help participants bookmark video content, as shown in Table 5.2. The total use frequency of reaction tags (214) was around 4.5 times that of topic tags (48). Among reaction tags, words with positive meaning, such as “important” and “interesting”, were used substantially more often than negative words like “difficult” and “unclear”. Three participants (P2, P3 and P9) used both “important” and “interesting”. Four participants (P11, P12, P13 and P14) indicated that they used “important” as if they were highlighting. Specifically, P12 mentioned that because the study context was a quiz, “important” was useful enough, but that more tags would be used in other learning contexts. Three participants (P1, P3 and P11) used negative tags in the study. Another three participants (P12, P13 and P14) indicated that they would use negative tags such as “difficult” and “unclear” when learning their own courses, so that they could review the corresponding video content later. However, one participant (P9) would never use “Unclear” and “Difficult”, explaining that they would try to work out difficult content until it became clear. Most participants thought “Like” was improper in the quiz preparation context.
Participants did not input any reaction tags of their own in the study, so we could not find an appropriate alternative to “Like” for a time-constrained personalized learning context. As “Like” had a high use frequency in Chapter 4, we still recommend using “Like” as a default reaction tag for general collaborative learning contexts.

When asked whether the given tags were satisfactory in terms of quantity and variety, both P3 and P6 thought they were good enough. P3 also mentioned that the given tags helped him focus on video learning and imposed a lower cognitive load than thinking up his own tags. Notably, P12 only used reaction tags. Others did use topic tags, but the ones provided were quite limited, resulting in five participants (P2, P5, P8, P9 and P15) inputting their own. Correspondingly, according to our experiment logs, all the tags input by participants were topic words.

In general, participants agreed that the topic tags would help them gain an overview of the video content. But when it came to tagging video content, some participants would use reaction tags while others would input their own tags instead. No participants used the tag “Single-String”. On the one hand, some participants may have used other tags to bookmark the related content. On the other hand, we observed that some participants struggled on the quiz question related to the single-string topic, suggesting that they did not pay attention to the corresponding content while watching the guitar video. Accordingly, both P3 and P10 suggested that it would be helpful to display topic tags physically next to the corresponding video content instead of in a menu, which loses spatial information, especially when people are not familiar with the video topic.

As shown in Table 5.1, participants agreed that the pre-defined tags were significantly helpful for recall (Median=4, p=.048) and useful for marking down video content (Median=4, p=.005).
According to our interviews, most participants liked the provided tags. However, P7 thought the tags did not contain enough information and preferred to take notes. To some degree, this explains why P7 did not tag at all in the experiment. Similarly, P5 took many notes on paper and preferred to input their own tags.

Multiple tags                                                 #
Interesting,Important                                         11
Important,movable fingering                                   2
Difficult,Important                                           1
Interesting,Important,First-Position                          1
Important,second position                                     1
Important,First-Position,open string                          1
Important,octaves                                             1
Important,first position                                      1
Important,Chromatic scale                                     1
Interesting,Key                                               1
Interesting,movable fingering                                 1
Interesting,alternate picking                                 1
Interesting,legato                                            1
Ascending,flats                                               2
Descending,flats                                              2
Descending,alternate picking                                  1
Ascending,Descending,Open-String                              1
Ascending,sharps                                              1
chromatic scale,left hand relax                               1
Ascending,Descending                                          1
First-Position,Open-String,g chromatic good for fingering     1
First-Position,Open-String,two octaves apart                  1

Table 5.3: Multiple tags and aggregated use frequency (#) by 6 participants. The number of tagging applications from 14 participants was 262 in total. Multiple tagging was around 13% (35/262) of the total tagging.

Is there a need for using multiple tags? If yes, what should they look like?

From Table 5.3, we can see that six participants used more than one tag in the experiment. According to our interviews, four participants indicated that they would only use one tag. Six participants would like to use more than one. However, one participant added that they would not use more than two. P4 used only one tag in the experiment, but indicated that more than one would be used for more complex and longer videos.

In Table 5.3, there were three types of multiple tags, namely reaction tags only (12), reaction and topic tags (12), and topic tags only (11). These were almost equally distributed.
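The three-way breakdown above can be re-derived from the rows of Table 5.3. The sketch below is only a consistency check: the row data and the total of 262 are transcribed from Table 5.3 and its caption, the five reaction tags come from Table 5.2, and treating every other tag (including participant-entered ones) as a topic tag is an assumption that follows the text's remark that all input tags were topic words. The names used are illustrative.

```python
from collections import Counter

# The five pre-defined reaction tags from Table 5.2 (lower-cased).
REACTION_TAGS = {"important", "interesting", "difficult", "unclear", "like"}
TOTAL_TAGGING_APPLICATIONS = 262  # from the caption of Table 5.3

# (tag combination, use frequency) rows transcribed from Table 5.3.
MULTI_TAG_ROWS = [
    ("Interesting,Important", 11),
    ("Important,movable fingering", 2),
    ("Difficult,Important", 1),
    ("Interesting,Important,First-Position", 1),
    ("Important,second position", 1),
    ("Important,First-Position,open string", 1),
    ("Important,octaves", 1),
    ("Important,first position", 1),
    ("Important,Chromatic scale", 1),
    ("Interesting,Key", 1),
    ("Interesting,movable fingering", 1),
    ("Interesting,alternate picking", 1),
    ("Interesting,legato", 1),
    ("Ascending,flats", 2),
    ("Descending,flats", 2),
    ("Descending,alternate picking", 1),
    ("Ascending,Descending,Open-String", 1),
    ("Ascending,sharps", 1),
    ("chromatic scale,left hand relax", 1),
    ("Ascending,Descending", 1),
    ("First-Position,Open-String,g chromatic good for fingering", 1),
    ("First-Position,Open-String,two octaves apart", 1),
]


def combination_type(combo):
    """Classify a comma-separated tag combination by the kinds of tags in it."""
    is_reaction = [t.strip().lower() in REACTION_TAGS for t in combo.split(",")]
    if all(is_reaction):
        return "reaction only"
    if not any(is_reaction):
        return "topic only"
    return "reaction and topic"


breakdown = Counter()
for combo, count in MULTI_TAG_ROWS:
    breakdown[combination_type(combo)] += count

multi_total = sum(breakdown.values())
share = multi_total / TOTAL_TAGGING_APPLICATIONS
print(dict(breakdown))
print(f"{multi_total}/{TOTAL_TAGGING_APPLICATIONS} = {share:.1%}")
```

Under these assumptions the counts come out as 12, 12, and 11, and the multiple-tagging share as 35/262, matching the figures reported in the text and caption.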
In our interview, P14 mentioned that two reaction tags and one topic tag would be used, and P15 preferred to input multiple tags.

Discussion

From our findings, we can see that the pre-defined tags are indeed helpful for recalling video content. There are some design implications from using both reaction tags and topic tags in future tagging interfaces. For reaction tags, people tended to use the pre-defined tags without inputting their own. Based on this finding, transforming the reaction tags into emoji or icons could be a promising future research direction. Emoji or icons can save design space on the video interface and are less cluttered than text. In addition, both positive and negative tags should be considered to satisfy users’ needs.

Topic tags have two main functions: providing an overview of video content and bookmarking video parts. According to our findings, people need a greater variety and a larger number of topic tags than were provided. Thus, a crowd-sourced tag cloud could be a solution for providing an overview of video content. Dynamically chaptering the topics can also be a promising solution for closely relating topics to video content. Learners who are unfamiliar with the video content can potentially benefit from this solution.

Support for inputting a user’s own tags is necessary for a complete tagging mechanism. There are a few arguments to support this conclusion. First, users would like to input their own phrases or words together with pre-defined tags. Second, some users have strong opinions on typing their own topics. Finally, this input can be a source for a tag cloud. Many factors, such as saving the tags input by users and how to better support tagging actions in the case of multiple tags, can be considered in future design.

5.2.2 Quick Tagging Modes and Interaction

As described in Chapter 3, there are two modes: tagging and untagging. The quick tagging interaction involves video part selection followed by applying tags, or vice versa.
Three methods are supported for quick tagging, namely transcript tagging, filmstrip tagging, and playhead tagging. For spontaneous tagging, video part selection can be achieved with a single hot-key interaction.

For transcript tagging, shown at the bottom of Figure 3.5 in blue, the interaction involves selecting text in the transcript first and then choosing or inputting tags, or vice versa. Similarly, filmstrip tagging (Figure 3.4) can be achieved in two different orders: selecting on the filmstrip first and then choosing or inputting tags, or vice versa. The results and discussion in this section cover the preferences, suggestions, and use patterns of the quick tagging modes and interaction.

Comparison of three quick tagging methods

Results from the questionnaire showed that participants significantly liked transcript tagging (Median=5, p=.001) and felt neutral about filmstrip tagging (Median=3, p=.782) and playhead tagging (Median=3, p=.317). However, they reported that tagging the video content took significantly little effort in all three ways: transcript (Median=2, p=.034), filmstrip (Median=2, p=.031), and playhead (Median=2, p=.003).

According to our interviews, one participant liked playhead tagging best, as it helped mark down video content while focusing on watching. However, 12 participants preferred transcript tagging, for the following three reasons. First, participants were generally not familiar with the video topic, and reading text helped them learn faster. Second, some participants were used to reading and highlighting on PDF, so they naturally selected text on the transcript and applied tags correspondingly. Third, participants liked to select exact words or sentences when they performed tagging, which explains why they disliked the filmstrip.
More specifically, they thought that the filmstrip frames looked quite similar to one another and that selecting with the filmstrip gave them information that was too general and inaccurate. However, four participants mentioned that it would be very handy to tag charts or graphs on the filmstrip once they were familiar with the video topic.

Suggestions on improving the quick tagging modes and interaction

Four participants felt that it was confusing to have two modes. They suggested instead using a context menu with hot keys to commit all the actions: adding tags, editing and deleting tags/tagging, and undoing. Here is a quotation from P10:

“It could be useful to add functionality to the right button of the mouse, to open a menu box for creating or deleting tags.” [P10]

The other eleven participants thought the two modes were acceptable, but six of them suggested that the tagging mode should only allow adding tags, while the untagging mode was best for deleting or editing tags/tagging. As for the two different orders of quick tagging (selecting video content first or applying tags first), three participants liked both orders. Four participants used the same tags frequently, so they preferred the second order, while three other participants frequently changed tags, so they preferred the first order.

In our design, the drop-down menu does not automatically close after one tag is chosen, in order to support the selection of multiple tags. Three participants complained that this design was annoying, and two participants thought changing tags in the drop-down menu was time-consuming. Keyboard shortcuts were suggested as a way to change tags efficiently. Below are two representative quotations that clarify these two issues:

“Not have to close the tag/untag drop-down menu before the tag is saved because sometimes the user forgets to do that and the tag does not refresh.” [P2]

“It would be nice to be able to change the tags more quickly.
It takes time away from watching the video when you have to go back and forth to select different tags.” [P8]

Use patterns of the quick tagging modes and interaction

According to our observations, almost all participants (13) chose tags first and then selected video content. Six of them only used transcript tagging, four used both transcript tagging and playhead tagging, two only used playhead tagging, and three occasionally used filmstrip tagging. One typical scenario for filmstrip tagging was when there was only music and no transcript, where participants would select on the filmstrip. Two participants, by contrast, would first select text on the transcript and then apply tags. Six participants edited their tagging/tags by adding a new tag, changing tags, or modifying the selected text in the transcript. One participant performed editing after finishing the video, while the others edited while watching. Although we did not specifically provide an editing mode in our interface, participants developed their own strategies. For example, five participants moved to the untagging mode and manipulated the tagging application (trimmed the selected text, interacted with the tag bubbles, deleted the tagging application, and selected again to add new tags). However, one participant tagged the same video content twice to add new tags in tagging mode. In addition, we found that one participant tried to change the tags of the current tagging application by changing tags in the drop-down menu.

We also observed that most participants (10) watched the video linearly and performed tagging the first time they watched, while four participants used the transcript time code to jump around the video. One participant watched a section of the video and then went back to tag.
Eight participants rewatched the video and reviewed their tagging applications; four of them first used the filmstrip to navigate to a tagging application and then browsed the content on the transcript. Two used the transcript time code to browse tagged content. One participant searched for key words in the transcript or used the time code to browse tagged content. P7 did not tag at all but used the filmstrip to review video content.

We allowed participants to use the quick tagging interface as a cheat sheet during the quiz. We observed that seven participants browsed the transcript or used the time code to jump around the transcript, and sometimes even referred to their tagging applications. Four participants used the filmstrip to navigate the tagging applications and browse content on the transcript. Three participants browsed the transcript or searched using key words.

Other Findings

In our interviews, two participants preferred taking notes while watching educational videos. One suggested that a notepad could be attached to the video interface, where the notes themselves could be tagged. One participant liked both highlighting and taking notes to learn with videos. Six participants indicated that they like to highlight on PDF while learning, so they preferred highlighting when watching educational videos. Three of them did mention that colour-coded tags were acceptable. However, two participants indicated that they like to tag the video content while learning.

Discussion

From the results, we can see that participants do not agree on whether to use the modes or a context menu to add tags and edit their tagging applications. Each design has both advantages and disadvantages. With regard to the tagging modes, we suggest using the tagging mode to add tags in either of the two orders, and using the untagging mode to edit or delete tagging applications.
The advantage of using these modes is that they can be integrated with other annotation methods, such as highlighting and commenting, to create a comprehensive video annotation system. However, the disadvantage is that it may take extra effort for users to change between the two modes, especially when they are new to both the video topic and the video interface.

In terms of the context menu, users can benefit from performing all tagging actions in one menu, without adding extra cognitive load. But this approach may not scale when other annotation methods are involved, which may cause the menu to become cluttered.

When learners are new to the video topic or when they are in a time-constrained context, we see that they prefer transcript tagging. But if the transcript is poor, or if there are graphs or charts in the filmstrip, they prefer filmstrip tagging so that they can quickly find and recall content. Playhead tagging requires very little click effort, but it is not so useful when people need highly accurate information, such as in a quiz or exam situation. However, it may be useful for students previewing video content.

In summary, tagging can be useful but is not the only effective way for students to learn with video annotation. Highlighting and taking notes can also be useful for video annotation, to adapt to the different learning habits of different learners. For people who strongly prefer highlighting, combining tagging with colour can be a solution, whereas for people who strongly prefer notes, tagging notes would be beneficial.

5.2.3 Further Improvement to the Tagging Interface

Eight participants mentioned that they would like to save their tag history and filter their tagged content by tags. There are two suggestions from P4 and P5 on this filtering function:

“A filtering option to go through, looking at only sections of the video tags as “difficult”, for example, would be very helpful.” [P4]

“It would be nice to have an “overall” view of the tags I used without having to manually search them in the transcript, playhead, filmstrip, etc.. Maybe use linked highlighting like if I hover over a tag anywhere on the UI, it will highlight wherever else that tag might exist (eg. filmstrip, in transcript, etc.).” [P5]

Two participants suggested that it would be better to have a tag cloud around the corresponding video content so that they could have a better understanding of each video section. Three participants mentioned that the drop-down menu slowed down their process of changing tags and blocked their view of the video. Therefore, they suggested a horizontal layout. Another two participants liked the idea of customizing the drop-down menu. More specifically, they wanted the menu to simply show their own most-used tags. With regard to the transcript, two participants mentioned that it would be time-consuming to browse tagging applications on the transcript for long videos. Another participant commented that the transcript should have a larger font size. In terms of the filmstrip, one participant suggested that it should be segmented into small sections of real video instead of a number of frames. Finally, one participant would prefer to tag on the mainviewer in a real scenario, for example by right-clicking and choosing tags.

Discussion

As our experiment mainly focuses on adding tags, other functions like filtering and reviewing tagging history can be further improved. The suggestions on these functions from participants are quite useful. In an iterative design, the drop-down menu could be horizontal and customizable, so that users can input and customize their own tags. Topic tags could be visualized dynamically in a tag cloud. In addition, reaction tags could be transformed into emojis.
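The filtering and tag-history functions that participants asked for can be sketched as follows. This is a minimal illustration and not the thesis interface (which was built with HTML5 and JavaScript): the `TaggingApplication` record, the function names, and the sample taggings are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggingApplication:
    """One tagged video part: a time interval plus the tags applied to it."""
    start: float      # seconds from the start of the video
    end: float        # seconds from the start of the video
    tags: frozenset   # tag labels, e.g. {"Important", "Half-Step"}


def filter_by_tag(applications, tag):
    """Return the tagged parts carrying `tag` (case-insensitive), in video order."""
    wanted = tag.casefold()
    hits = [a for a in applications
            if any(t.casefold() == wanted for t in a.tags)]
    return sorted(hits, key=lambda a: a.start)


def tag_history(applications):
    """Count how often each tag was applied, most frequent first."""
    counts = Counter(t for a in applications for t in a.tags)
    return counts.most_common()


# Hypothetical taggings for illustration only (NOT data from the study).
taggings = [
    TaggingApplication(310.0, 342.5, frozenset({"Difficult", "Half-Step"})),
    TaggingApplication(12.0, 30.0, frozenset({"Important"})),
    TaggingApplication(95.0, 120.0, frozenset({"Important", "Difficult"})),
]

# P4's scenario: review only the sections tagged as "difficult".
for part in filter_by_tag(taggings, "difficult"):
    print(f"{part.start:>6.1f}s to {part.end:>6.1f}s  {sorted(part.tags)}")
```

Storing each tagging as an interval with a set of tags keeps multiple tagging and per-tag filtering in one structure; the same counts returned by `tag_history` could also drive a "most-used tags" menu of the kind two participants requested.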
We also need to explore further visualization methods for the filmstrip itself. Finally, tagging on the mainviewer is yet another future research goal.

5.3 Summary

This chapter introduced a quick tagging mechanism which helped users to recall video content. We performed a controlled study to quantitatively compare time spent using the quick tagging interface with time spent using hand-written notes. We found no significant difference in quiz finishing time between the two methods, but we did find that participants using the quick tagging interface spent around 10% longer on average watching and learning from the video than when using hand-written notes. In terms of reaction to our quick tagging mechanism, users significantly agreed that it was helpful for recalling content and finishing their learning tasks. We also qualitatively collected a substantial number of suggestions on how to improve the quick tagging mechanism, with regard to the pre-defined tags, the quick tagging interaction and modes, the tagging interface visualization, and future designs.

Chapter 6

Conclusion and Future Work

In this chapter, we present the conclusions of our two studies, which validate that our proposed quick tagging mechanism can indeed enhance the video learning experience. We also discuss the significance and contributions of this thesis, as well as a proposal for future research.

6.1 Conclusion

In this thesis, we presented answers to our research questions:

1) Do users feel it’s efficient and useful enough to perform quick tagging on video content when finishing their learning tasks?

2) From the perspective of usefulness, does the quick tagging mechanism help students recall video content?

Overall, participants had a positive reaction to our design. They were able to efficiently bookmark and recall the video content of this study. We completed a content analysis of course video comments from YouTube and CLAS material, and validated the results in an online survey.
We obtained implications for how to choose optimal pre-defined tags in learning contexts, as well as a list of commonly used tags. We then leveraged our quick tagging mechanism, which comprises pre-defined tags, video part selection, and tag application on a video interface. The tagging interface was evaluated both qualitatively and quantitatively in a controlled study. Participants using our quick tagging interface spent around 10% longer on average learning from and watching the video than when taking notes on paper. In terms of quiz completion time, participants took almost the same time on average with either method. One explanation is that taking notes engaged them at a different cognitive level to synthesize the content, while quick tagging helped with other aspects of Bloom’s taxonomy using a different type of effort. In addition, there were some tradeoffs involved in using tagging. Tagging took users less time than note-taking when they wanted to quickly mark learning content. However, notes provide more information than tags when users review their recorded video content. A combination of the two methods is therefore likely to be useful and beneficial.

In summary, our observations with our tagging interface reinforce the idea that there is no single best way for students to annotate video content while learning. Rather, students learn in different ways at different stages for different content and types of information. Thus, providing a suite of different tools for video annotation, matched to different styles and types of information for learning, seems appropriate. Our results suggest that our quick tagging interface for video can be used effectively as one of these tools.

6.2 Future Work

In this thesis, the tagging interface was tested in a controlled lab study with a limited number of participants.
The next step in evaluating this quick tagging mechanism is to expand on this controlled study and to build a full-fledged application so that the tagging function can be used in real learning scenarios. To perform a field study of this magnitude, we plan to integrate this quick tagging feature into our personalized video learning tool, ViDex. This will allow us to more easily test our quick tagging feature with students in their learning contexts.

Although our work is focused on exploring personalized learning with video, it can be a good foundation for building aggregated video learning systems. Besides traditional interaction data such as play, pause, and seek, tags and tagged intervals can be fed back into the interface design to show the wisdom of the crowd. For example, a tag cloud for specific video parts, or for the whole video, can help students pay attention to certain important parts of the course and help instructors better understand student learning feedback. In addition, our current tags are in text format, which can be a good foundation for expanding to other, more sophisticated content, such as emoji. Other media such as voice can also be explored using our current quick tagging mechanism in the future.

As for our interface features, we included the mainviewer, transcript, and filmstrip together. But these features may not adapt well to every educational video production style. For example, the filmstrip may not be useful for a video that mostly shows an instructor talking. How to adapt the quick tagging mechanism to other interface features can thus be further explored. In addition, our current tagging applications are synchronized between the transcript and the filmstrip. Synchronizing them with more interface elements may well be a challenge.

The contributions in this work have shown that providing users with the quick tagging mechanism is beneficial for effective video learning. Providing optimal pre-defined tags, and proper methods of video part selection and tag application, is an essential part of bookmarking video content.

Bibliography

[1] Gregory D. Abowd, Matthias Gauger, and Andreas Lachenmann. The family video archive: An annotation and browsing environment for home movies. In Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, MIR ’03, pages 1–8, New York, NY, USA, 2003. ACM.

[2] David Bargeron, Jonathan Grudin, Anoop Gupta, and Elizabeth Sanocki. Annotations for streaming video on the web: System design and usage studies. Technical report, March 1999.

[3] Patrick Baudisch. Using a painting metaphor to rate large numbers of objects. In Proceedings of HCI International (the 8th International Conference on Human-Computer Interaction) on Human-Computer Interaction: Ergonomics and User Interfaces, pages 266–270. ACM, August 1999.

[4] CJ Brame. Effective educational videos. Vanderbilt University Center for Teaching, 2015.

[5] Bloom B.S. and Krathwohl D.R. Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. 1956.

[6] W. J. Conover. Practical Nonparametric Statistics, 3rd Edition. John Wiley & Sons, Inc, 1999.

[7] Nicholas Diakopoulos and Irfan Essa. I.: Videotater: an approach for pen-based digital video segmentation and tagging. In UIST 06: Proceedings of the 19th annual ACM symposium on User interface software and technology (New, pages 221–224. ACM Press.

[8] Nicholas Diakopoulos and Irfan Essa. Videotater: an approach for pen-based digital video segmentation and tagging. In UIST ’06 Proceedings of the 19th annual ACM symposium on User interface software and technology, pages 221–224. ACM, October 2006.

[9] W E. Hick. On the rate of information gain. 4:11–26, 03 1952.

[10] Matthew Fong, Gregor Miller, Xueqin Zhang, Ido Roll, Christina Hendricks, and Sidney Fels. An investigation of textbook-style highlighting for video.
In Proceedings of Graphics Interface 2016, GI 2016, pages 201–208. Canadian Human-Computer Communications Society / Société canadienne du dialogue humain-machine, 2016.
[11] Scott Golder and Bernardo A. Huberman. The Structure of Collaborative Tagging Systems. Journal of Information Science, 32(2):198–208, 2006.
[12] Philip J. Guo, Juho Kim, and Rob Rubin. How video production affects student engagement: An empirical study of MOOC videos. In Proceedings of the First ACM Conference on Learning @ Scale Conference, L@S ’14, pages 41–50, New York, NY, USA, 2014. ACM.
[13] Kyungsik Han, Jin Y. Jang, and Dong W. Lee. Exploring tag-based like networks. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA ’15, pages 1941–1946. ACM, 2015.
[14] Juho Kim, Elena L. Glassman, Andrés Monroy-Hernández, and Meredith Ringel Morris. Rimes: Embedding interactive multimedia exercises in lecture videos. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 1535–1544, New York, NY, USA, 2015. ACM.
[15] Juho Kim, Philip J. Guo, Carrie J. Cai, Shang-Wen (Daniel) Li, Krzysztof Z. Gajos, and Robert C. Miller. Data-driven interaction techniques for improving navigation of educational videos. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, UIST ’14, pages 563–572, New York, NY, USA, 2014. ACM.
[16] Kirsty Kitto, Sebastian Cross, Zak Waters, and Mandy Lupton. Learning analytics beyond the LMS: the connected learning analytics toolkit. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, LAK ’15, pages 11–15. ACM, 2015.
[17] Anna Korhonen and Ted Briscoe. Extended lexical-semantic classification of English verbs. In Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics, CLS ’04, pages 38–45. ACM, 2004.
[18] David R. Krathwohl.
A revision of Bloom’s taxonomy: An overview. Theory Into Practice, 41(4):212–218, 2002.
[19] George Macgregor and Emma McCulloch. Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review, 55(5):291–300, 2006.
[20] W. E. Mackay. EVA: an experimental video annotator for symbolic analysis of video data. In ACM SIGCHI Bulletin, pages 68–71. ACM, October 1989.
[21] Toni-Jan Keith Palma Monserrat, Shengdong Zhao, Kevin McGee, and Anshul Vikram Pandey. Notevideo: Facilitating navigation of blackboard-style lecture videos. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 1139–1148, New York, NY, USA, 2013. ACM.
[22] Mitchell J. Morris. VastMM-Tag: Semantic Indexing and Browsing of Videos for E-Learning. PhD thesis, Columbia University, New York, United States, 2012.
[23] Xiangming Mu. Towards effective video annotation: An approach to automatically link notes with video content. Computers & Education, 55(4):1752–1763, 2010.
[24] Katja Niemann. Increasing the accessibility of learning objects by automatic tagging. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, LAK ’15, pages 414–415. ACM, 2015.
[25] Amy Pavel, Colorado Reed, Björn Hartmann, and Maneesh Agrawala. Video digests: A browsable, skimmable format for informational lecture videos. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, UIST ’14, pages 573–582, New York, NY, USA, 2014. ACM.
[26] Jamie Pina, Kelley Chester, Diana Danoff, and Mark Koyanagi. Synonym-based word frequency analysis to support the development and presentation of a public health quality improvement taxonomy in an online exchange. Stud Health Technol Inform, 192:1128–1128, 2013.
[27] Gonzalo Ramos and Ravin Balakrishnan. Fluid interaction techniques for the control and annotation of digital video.
In Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology, UIST ’03, pages 105–114, New York, NY, USA, 2003. ACM.
[28] Michael Riegler, Mathias Lux, Vincent Charvillat, Axel Carlier, Raynor Vliegendhart, and Martha Larson. Videojot: A multifunctional video annotation tool. In Proceedings of International Conference on Multimedia Retrieval, ICMR ’14, pages 534:534–534:537, New York, NY, USA, 2014. ACM.
[29] Evan F. Risko, Tom Foulsham, Shane Dawson, and Alan Kingstone. The collaborative lecture annotation system (CLAS): A new tool for distributed learning. IEEE Transactions on Learning Technologies, 6(1):4–13, 2013.
[30] Yvonne Rogers, Helen Sharp, and Jenny Preece. Interaction Design: Beyond Human-Computer Interaction, 3rd Edition. Wiley Publishing, 2011.
[31] Sabine Schulte im Walde. Human Associations and the Choice of Features for Semantic Verb Classification. Res on Lang and Comput, 6(1):79–111, 2008.
[32] Klaus R. Scherer. What are emotions? And how can they be measured? Social Science Information, 44(4):695–729, 2005.
[33] Shilad Sen, Shyong K. Lam, Al Mamunur Rashid, Dan Cosley, Dan Frankowski, Jeremy Osterhouse, F. Maxwell Harper, and John Riedl. Tagging, communities, vocabulary, evolution. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, CSCW ’06, pages 181–190, New York, NY, USA, 2006. ACM.
[34] Margaret A. Storey, Li T. Cheng, Ian Bull, and Peter Rigby. Waypointing and social tagging to support program navigation. In Ext. Abstracts CHI 2006, pages 1367–1372. ACM Press, 2006.
[35] Anthony Tang and Sebastian Boring. #epicplay: Crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1569–1572, New York, NY, USA, 2012. ACM.
[36] J. Trant. Exploring the potential for social tagging and folksonomy in art museums: Proof of concept. New Review in Hypermedia and Multimedia, 12(1):83–105, 2006.
[37] Lex V.
Velsen and Mark Melenhorst. Incorporating user motivations to design for video tagging. Elsevier B.V., 21(3):221–232, 2009.
[38] Timo Volkmer, John R. Smith, and Apostol (Paul) Natsev. A web-based system for collaborative annotation of large image and video collections: an evaluation and user study. In MULTIMEDIA ’05 Proceedings of the 13th annual ACM international conference on Multimedia, pages 892–901. ACM, November 2005.
[39] Alex Y. Zheng, Janessa K. Lawhorn, Thomas Lumley, and Scott Freeman. Application of Bloom’s taxonomy debunks the “MCAT myth”. Science, 319(5862):414–415, 2008.
[40] Sacha Zyto, David Karger, Mark Ackerman, and Sanjoy Mahajan. Successful classroom deployment of a social document annotation system. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1883–1892. ACM, 2012.

Appendix A
Online Survey Questionnaire

Appendix B
Lab Study Questionnaire

Appendix C
Intermediate Results of Background Study

Figure C.1: Comparison results between annotations and comments in music courses from CLAS. The blue bubbles show normalized percentages. The comparison was run among three categories: opinion words, content descriptive words and subject feature words.

Figure C.2: Comparison results between annotations and comments in library courses from CLAS. The blue bubbles show normalized percentages.
The comparison was run among three categories: opinion words, content descriptive words and subject feature words.

Appendix D
Aggregation Process of Background Study

Opinion Words

CLAS                                 YouTube
word           count  percentage     word           count  percentage
good             486  0.465517241    good            1234  0.330299786
like             177  0.16954023     like            1020  0.273019272
easy             128  0.122605364    interesting      234  0.062633833
interesting       88  0.084291188    happy            207  0.055406852
enjoy             20  0.019157088    bad              203  0.054336188
comfortable       18  0.017241379    amazing          172  0.046038544
nervous           18  0.017241379    awesome          141  0.037740899
awesome           15  0.014367816    simple            72  0.019271949
bad               13  0.012452107    funny             60  0.016059957
intimidated       13  0.012452107    hate              51  0.013650964
funny             12  0.011494253    fantastic         41  0.010974304
amazing           11  0.010536398    depressed         34  0.009100642
worried           10  0.009578544    satisfied         31  0.008297645
pleased           10  0.009578544    horrible          31  0.008297645
moved              8  0.007662835    boring            31  0.008297645
fantastic          7  0.006704981    dislike           27  0.007226981
boring             2  0.001915709    sad               27  0.007226981
awkward            2  0.001915709    disappointed      26  0.006959315
hopeful            1  0.000957854    moving            17  0.004550321
doomed             1  0.000957854    joyful            17  0.004550321
revolt             1  0.000957854    pleasant          17  0.004550321
disappointed       1  0.000957854    hopeful            9  0.002408994
freaking           1  0.000957854    uncomfortable      9  0.002408994
embarrassment      1  0.000957854    angry              7  0.001873662
                                     awkward            5  0.00133833
                                     annoying           5  0.00133833
                                     embarrassing       5  0.00133833
                                     worrying           3  0.000802998

word           percentage
good           0.465517241
like           0.273019272
easy           0.122605364
interesting    0.084291188
enjoy          0.055406852
bad            0.054336188
amazing        0.046038544
awesome        0.037740899
comfortable    0.017241379

Content Descriptive Words

CLAS                                 YouTube
word           count  percentage     word           count  percentage
clear            126  0.12962963     true             583  0.24352548
helpful          110  0.113168724    important        145  0.060568087
right             97  0.099794239    wrong            144  0.060150376
new               86  0.088477366    pretty           125  0.052213868
important         68  0.069958848    clear            103  0.043024227
precise           56  0.057613169    helpful          102  0.042606516
limited           51  0.052469136    hard              94  0.039264829
difficult         37  0.038065844    original          83  0.034670008
pretty            37  0.038065844    intelligent       82  0.034252297
agree             35  0.03600823     favorite          80  0.033416876
appropriate       28  0.028806584    nonsense          69  0.028822055
mark              27  0.027777778    deep              64  0.0267335
primary           27  0.027777778    believe           58  0.024227235
understandable    18  0.018518519    cool              54  0.022556391
mistake           17  0.017489712    reasonable        48  0.020050125
confusing         16  0.016460905    proper            48  0.020050125
effective         16  0.016460905    accurate          46  0.019214703
excited           15  0.015432099    crazy             44  0.018379282
complex           15  0.015432099    inspiring         42  0.01754386
lovely            11  0.011316872    oppose            41  0.017126149
comprehensive     10  0.010288066    confusing         36  0.015037594
decent            10  0.010288066    appreciate        35  0.014619883
attentive          8  0.008230453    evil              35  0.014619883
brief              7  0.007201646    skeptical         31  0.012949039
benefited          7  0.007201646    complex           29  0.012113617
wise               7  0.007201646    inaccurate        25  0.010442774
tricky             5  0.005144033    limited           24  0.010025063
crazy              5  0.005144033    support           23  0.009607352
primitive          5  0.005144033    insightful        19  0.007936508
convincing         4  0.004115226    excited           17  0.007101086
common             2  0.002057613    impressive        11  0.00459482
concrete           2  0.002057613    effective         11  0.00459482
careful            2  0.002057613    understandable     8  0.003341688
expressive         1  0.001028807    concrete           5  0.002088555
unsure             1  0.001028807    special            5  0.002088555
unreality          1  0.001028807    ugly               5  0.002088555
unproblematic      1  0.001028807    tricky             4  0.001670844
unexpressive       1  0.001028807    exhausting         4  0.001670844
unproblematic      1  0.001028807    unclear            2  0.000835422
unexpressive       1  0.001028807    discouraged        2  0.000835422
                                     shortsighted       2  0.000835422
                                     unintelligent      1  0.000417711
                                     disbelieve         1  0.000417711
                                     insignificant      1  0.000417711
                                     unreasonable       1  0.000417711
                                     unappreciated      1  0.000417711
                                     unimpressive       1  0.000417711

word           percentage
true           0.24352548
clear          0.12962963
helpful        0.113168724
new            0.088477366
important      0.069958848
wrong          0.060150376
precise        0.057613169
limited        0.052469136
pretty         0.052213868
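
The percentage columns in this appendix appear to be each word's count divided by the total count of its list. For example, the CLAS opinion-word counts sum to 1044, and 486/1044 is approximately 0.4655, which matches the value listed for "good". The short Python sketch below reproduces that normalization under this assumption. The function and variable names are illustrative and do not come from the thesis, and the sketch does not attempt to reproduce how the CLAS and YouTube lists were merged into the aggregated lists.

```python
# CLAS opinion-word counts, copied from the Opinion Words table in this appendix.
clas_opinion_counts = {
    "good": 486, "like": 177, "easy": 128, "interesting": 88, "enjoy": 20,
    "comfortable": 18, "nervous": 18, "awesome": 15, "bad": 13,
    "intimidated": 13, "funny": 12, "amazing": 11, "worried": 10,
    "pleased": 10, "moved": 8, "fantastic": 7, "boring": 2, "awkward": 2,
    "hopeful": 1, "doomed": 1, "revolt": 1, "disappointed": 1,
    "freaking": 1, "embarrassment": 1,
}


def normalized_shares(counts):
    """Return each word's count divided by the total count of the list.

    Assumption: this is how the 'percentage' column was computed; the
    thesis text in this section does not state the formula explicitly.
    """
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


clas_shares = normalized_shares(clas_opinion_counts)

print(sum(clas_opinion_counts.values()))  # 1044
print(round(clas_shares["good"], 9))      # 0.465517241
```

The same function can be applied to the YouTube column or to the Content Descriptive Words counts to check the remaining values.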

