Open Collections

UBC Theses and Dissertations

Video annotations in helping locate in-video information for revisitation Min, Li 2019

Video Annotations in Helping Locate In-Video Information for Revisitation

by

Min Li

B.Eng., University of Science and Technology Beijing, 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

June 2019

© Min Li, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Video Annotations in Helping Locate In-Video Information for Revisitation

submitted by Min Li in partial fulfillment of the requirements for the degree of MASTER OF APPLIED SCIENCE in Electrical and Computer Engineering.

Examining Committee:

Dr. Sidney Fels, Electrical and Computer Engineering
Supervisor

Dr. Dongwook Yoon, Computer Science
Supervisory Committee Member

Dr. Konstantin Beznosov, Electrical and Computer Engineering
Supervisory Committee Member

Abstract

Rewatching video segments is common in video-based learning, and video segments of interest need to be located first for this rewatching. However, learners are not well supported in the process of locating in-video information. To fill this gap, the presented work explores whether video annotations are effective in helping learners locate previously seen in-video information. A novel interface design consisting of two components for learning with videos is proposed and tested in the task of locating in-video information: an annotating mechanism based on an integration of text with video, and an annotation manager which enables the learner to see all annotations he/she has made on a video and provides quick access to video segments. A controlled lab experiment with 16 undergraduate students as subjects was carried out.
Experiment results suggested that the use of video annotations significantly reduced time spent on searching for previously seen video segments by about 5 seconds (p < 0.05), and subjects spontaneously used the proposed annotation manager 3 times more often in the information-seeking process than the traditional method of finding video segments by sliding through the video timeline (9:3). Qualitative data from surveys and interviews indicated that both the annotation process and annotations on the proposed interface were perceived as helpful for learning with videos. Thus, video annotations in the proposed interface are effective in reducing search time for in-video information; the annotation process, annotations created by the learner, and the proposed annotation manager play important roles in the information-seeking process.

Lay Summary

Although online video learning is increasingly popular among students, tools to facilitate such learning experiences are not keeping pace. When learning by reading a book, there are many devices available to improve the learning experience: highlighters, pens for note-taking, bookmarks, tables of contents, and indexes. However, when learning by watching a video, the learner can do little aside from clicking on the video timeline and pressing the very limited number of buttons provided by the video player. Rewatching a video is common, especially in a learning context, much like rereading textbooks. However, there are also very few features designed to support learners in this rewatching process.

Thus, a novel interface for video learning with richer features is proposed and tested. The interface has been found to be highly preferred by learners and effective in helping learners find previously seen video segments to rewatch.

Preface

I designed and implemented the novel interface introduced in this thesis on top of the ViDex platform [23], with permission.
The research question, experiment designs, and data analysis came from my own efforts with the help of Dr. Ido Roll, Matthew Fong, Dr. Sidney Fels, and Dr. Yan Liu. The Certificate Number of the Ethics Certificate was H13-01589.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Research Question
  1.2 Contributions
2 Background and Related Work
  2.1 Locating Information in Video
  2.2 Annotations
    2.2.1 Text Annotations
    2.2.2 Video Annotations
  2.3 General Discussion
3 Video Annotations in Information-Seeking
  3.1 Experiment Design and Procedure
    3.1.1 Participants
    3.1.2 Design
    3.1.3 Task
    3.1.4 Materials
    3.1.5 Procedure
  3.2 Interface Design
    3.2.1 Overview
    3.2.2 Video Player Page
    3.2.3 Annotation Manager
  3.3 Measures
    3.3.1 Task Performance
    3.3.2 Surveys
    3.3.3 Interviews
4 Results and Discussion
  4.1 Task Performance
    4.1.1 Training Session
    4.1.2 Formal Session 1
    4.1.3 Formal Session 2
  4.2 Annotations
  4.3 Surveys
    4.3.1 Pre-Experiment Survey
    4.3.2 Session Survey
    4.3.3 Post-Experiment Survey
  4.4 Interviews
    4.4.1 Annotations
    4.4.2 The Annotation Process
    4.4.3 Annotating Video vs. Annotating Text
    4.4.4 The Use of The AM
  4.5 Summary
5 General Discussion
  5.1 Interpreting the Results
  5.2 Limitations
6 Conclusion and Future Work
Bibliography
A Surveys
  A.1 Pre-Experiment Survey
  A.2 Session Survey
  A.3 Post-Experiment Survey
B Searching Tasks
  B.1 Quantum Computation
  B.2 Fusion Power
  B.3 The Brain
  B.4 No AM Condition
C User Study Data
D R Code
E Power Analysis

List of Tables

Table 1.1 Feature summary of popular online learning platforms.
Table 3.1 Demographics summary for subjects.
Table 3.2 Video watched and AM conditions for each subject in the formal session 1. Video 1 was about the brain, and video 2 was about fusion power. For AM, “O” means the same tasks as in Appendix B.2 for video 2, and Appendix B.3 for video 1; “S” means the reversed AM conditions compared to “O”, i.e., items not assigned with the AM in Appendix B.2 or B.3 were assigned with the AM.
Table 3.3 Video watched by each subject in the formal session 2. Video 1 was about the brain, and video 2 was about fusion power. The searching items were the same as in the formal session 1, but without assigned AM conditions, as exemplified in Appendix B.4.
Table 4.1 Numbers of items searched with the AM, annotated items, and annotations by subjects in session 2.
Table 4.2 Average numbers of annotations made by 16 subjects.
Table 4.3 Subjects’ responses to pre-experiment survey questions.
Table 4.4 Subjects’ responses to session survey questions (Likert scale, 1 to 5) of session 1 and 2.
Table 4.5 Subjects’ responses to post-experiment survey questions (Likert scale, 1 to 5).
Table 4.6 Summary of subjects’ interview responses regarding annotations and the annotation process.
Table 4.7 Numbers of subjects’ interview responses regarding annotating video vs. annotating text and the use of the AM.
Table E.1 Means of subjects’ task performances in session 1 in the pilot study (measured in seconds).
Table E.2 Parameters for power analysis on G*Power.

List of Figures

Figure 1.1 The video player page of Coursera.
Figure 1.2 The course page of Coursera.
Figure 2.1 TEMPO [50], by Nathan Prestopnik et al.
Figure 2.2 Hitchcock [26], by Andreas Girgensohn et al.
Figure 2.3 Micro-level and macro-level scaffolding [12], by Salome Cojean et al. (1) control, (2) microscaffolding, (3) macroscaffolding, and (4) two-level scaffolding conditions.
Figure 2.4 Artificial landmarks [59], by Md. Sami Uddin et al. Media player (a, b, c), PDF viewer (d, e, f). a, d: standard, with no landmarks; b, e: icon, augmented with abstract icons; c, f: thumbnail, augmented with extracted content as thumbnails.
Figure 2.5 Types and applications of text annotations [49], by Ilia Ovsiannikov et al.
Figure 2.6 Perceived importance of annotation features [49], by Ilia Ovsiannikov et al.
Figure 2.7 Video annotations anchors and content [5], by Olivier Aubert et al.
Figure 2.8 ToolScape [36], by Juho Kim.
Figure 2.9 MediaNotes developed at Brigham Young University.
Figure 2.10 CLAS [44], by Negin Mirriahi et al.
Figure 2.11 Anvil [37], by Michael Kipp.
Figure 2.12 VITAL [46], by Frank Moretti et al.
Figure 2.13 Video Traces [54], by Amit Saxena et al.
Figure 2.14 DynamicSlides [34], by Hyeungshik et al.
Figure 3.1 Experiment design.
Figure 3.2 Video player page.
Figure 3.3 Course page with annotation manager.
Figure 3.4 The window to add annotations.
Figure 3.5 Select a tag or create a new tag.
Figure 3.6 Hover over an annotation icon in the transcript section to see the details of annotations.
Figure 3.7 The filmstrip section.
Figure 3.8 The timeline of the video player section.
Figure 3.9 Play highlighted parts only.
Figure 3.10 Annotation manager.
Figure 4.1 Subjects’ task performances in the training session.
Figure 4.2 Subjects’ task performances in formal session 1.
Figure 4.3 Comparing numbers of items searched with the AM, annotated items, and annotations made by subjects.

Glossary

AM annotation manager
ANOVA analysis of variance
LMEM linear mixed effects model
M mean
SD standard deviation

Acknowledgments

I’d like to thank everyone who has helped me academically during the process of finishing this research project. Thanks to my supervisor, Dr. Sidney Fels, for overseeing this work, and Dr. Ido Roll, Dr. Yan Liu, Kyoungwon, Matthew, and Samuel for sharing with me their expertise and invaluable advice. Thanks to colleagues in the Human Communication Technologies Lab and the ViDex team for all of the stimulating discussions and generous help.

Chapter 1
Introduction

Today, millions of learners enjoy online learning on a number of platforms [3], and these platforms continue to evolve to improve learner experiences and learning outcomes. Despite all the advantages of online video-based learning, it still faces many design challenges, including the highly important question of how to engage learners [16]. Learning engagement, defined as “the student’s psychological investment in and effort directed toward learning, understanding, or mastering the knowledge, skills, or crafts that academic work is intended to promote” [47], has been found to be positively correlated with the use of learning technology and learning outcomes [22]. Learner-content interactions are also found to be critical in ensuring satisfying educational experiences [4]. However, as summarized in Table 1.1, currently most major video-based online learning platforms support few learner-video interactions and create a highly passive learning environment.

For example, Coursera provides both the video and its transcript to learners on its video player page, as shown in Figure 1.1. Moreover, the interface provides annotation features and a list of annotated transcript segments along with associated notes as well, which make Coursera the most interactive learning environment compared to other platforms listed in Table 1.1.
However, the interface supports limited annotation features, only including highlighting and adding notes to transcript segments, and there is only one color available for these highlights; while there is a list of annotated transcript segments and annotations (notes) on the video player page, no indication of annotations is shown outside of this page, as shown on its course page in Figure 1.2. This point is especially important, as rewatching a video is common, and the first step in rewatching is to select a video. Showing learners their annotations of a video outside of the video player page has the potential to help them select a video to rewatch, so that they do not need to click and watch every video to find the correct one for review.

Table 1.1: Feature summary of popular online learning platforms.
Columns: Transcript, Transcript navigation, Bookmark, Bookmark list, Note, Note list
Coursera: √ √ √ √ √ √
edX: √ √ √
iversity: √
Khan Academy: √ √ √
Lynda: √ √ √ √ √
Udemy: √ √ √
YouTube: √

Though videos are used pervasively for learning both in schools and at home, very few learner-content interactions are provided to learners. A typical video mainly includes visual, audio, and textual elements, among which the audio and textual elements overlap heavily. Currently, in most video players such as QuickTime Player or YouTube, the learner is not well supported in using these three representations of information: to examine different parts of the visual, the only thing the learner can do is click on the timeline or press the “fast forward” or “play back” button; the learner can access the textual only if the transcript or captioning of the video is provided, and the only interactions available are clicking on the timeline to see captions and scrolling up and down to read the transcript.

Compared to reading books, watching videos for the purpose of learning is not well supported.
While reading a book, the learner has a table of contents to jump to chapters or sections of interest, an index to find specific details, a highlighter to mark sentences, a pen or pencil to write notes, and a bookmark to keep their place so that they can return and continue reading easily.

Thus, richer learner-content interactions and a better use of video annotations for rewatching videos need to be investigated to improve online learning experiences.

Figure 1.1: The video player page of Coursera.

Among the interactions with learning materials, annotations are particularly useful as they can be used by learners to think, to remember, and to clarify [49]. Annotating text has long been practiced by learners and has been explored extensively in previous research [49, 52, 57]. Though a number of studies have explored video interface designs with annotation features, few of them have examined the effectiveness of these features, and none of them have investigated the use of video annotations in locating in-video information such as video segments.

Rewatching is common in video-based learning, just as rereading textbooks is in classroom-based learning [8, 14]. To revisit a previously seen video segment, the first step is to find the location of the segment in the video. A textbook has a table of contents and an index to support the learner in the information-seeking process; in a video, clicking on the timeline is the only way to find a previously seen segment. Though speeding up and fast forwarding are provided by most video players, the searching process still requires tedious trial and error, especially for long videos.

The presented work explores learner-created annotations on the integration of video and text, and aims to determine whether video annotations can help learners locate previously seen video segments. In the proposed design, both the video and its transcript are provided and synchronized. This design is expected to improve the video-based learning experience and further support learners in the process of finding segments to review.

Figure 1.2: The course page of Coursera.

1.1 Research Question

This work explores the use of learner-created video annotations in locating previously seen information in a video. A research question that directs the entire research process is proposed:

How effective are video annotations, which are created by learners while learning with both visual representation and textual representation of a video, in helping learners locate in-video information for revisitation?

Two hypotheses are also proposed:

1. Annotated information can be located faster than non-annotated information.
2. The more annotations the learner has made, the more likely they would rely on annotations to locate information.

1.2 Contributions

The contributions of the presented work consist of the design of a novel annotation interface which provides quick access to annotated video segments, and the results of a controlled lab experiment. In summary, evidence has been collected from subjects on:

1. the effectiveness of utilizing video annotations to locate previously seen video segments;
2. the effectiveness of the novel design of annotation manager in helping learners find previously seen video segments;
3. the effectiveness and limitations of the integration of text with video for video-based learning.

There are two main components in the research question: locating in-video information for revisitation, and learner-created video annotations based on the integration of a video’s visual representation with its textual representation. To answer the question, a novel interface for video annotations was proposed and a controlled lab experiment investigating in-video information-seeking was carried out. The proposed interface consists of two components: an annotation mechanism based on an integration of text with video, and an annotation manager which enables the learner to see all annotations he/she has made on a video and provides quick access to annotated video segments. In the controlled lab experiment, tasks of locating in-video information were completed after the subject finished learning with and annotating a video using the proposed interface.

Various data were collected to answer the research question and understand related issues. Time spent on locating in-video information was analyzed to find whether annotated information was located faster; annotations made by subjects were also collected and analyzed to understand subjects’ preferences regarding different types of annotations; qualitative data were collected and analyzed to better understand how subjects perceived video annotations, the annotation process, the proposed annotation manager, and the difference between annotating video and annotating text. These data were later found to complement each other and provide a richer answer to the research question.

Previous work on locating in-video information and on video annotations is reviewed in Chapter 2. Chapter 3 describes the experiment design and the interface designed for the experiment. Experiment results are presented and discussed by category in Chapter 4. A general discussion which integrates the results of all categories, along with the limitations of the presented work, is given in Chapter 5.
Finally, Chapter 6 gives conclusions of the presented work and proposes directions for future work.

Chapter 2
Background and Related Work

As the presented work explores how in-video information can be better located, this chapter reviews previous work on in-video information-seeking in Section 2.1, to further explore the problem and identify gaps that the presented work can fill. The presented work utilizes video annotations, inspired by text annotations, to help learners locate in-video information for revisitation, and proposes an interface for creating and managing video annotations. Thus, related work on text annotations and interface designs for video annotations is reviewed in Section 2.2. Finally, Section 2.3 gives a discussion that emphasizes the gaps in the literature that the proposed work can fill.

2.1 Locating Information in Video

Unlike traditional classroom-based learning, learning with videos provides easier ways to learn with the same material over and over again. Rewatching has been found to be common in video watching experiences, with 92% of viewers rewatching some type of video every month, and rewatching tutorial videos several times was perceived as necessary for learners to finish related tasks [8]. Online learners were found to rewatch parts of video lectures frequently [14], and rewatching segments instead of the whole video was also found to be more helpful in learning [42].

To rewatch a video segment, the learner needs to first find the respective segment, which has been found to be difficult. Undergraduate students were found to search for in-video information using low-level strategies and to focus more on local information both temporally and spatially, lacking attention to global patterns [39]. In that experiment, Richard Lowe asked subjects to learn with an interactive animation and apply what they learnt to predict and draw the pattern of meteorological markings. Video-like controls were provided in the animation.
Results showed that subjects revisited parts of the animation frequently, but used low-level strategies that addressed limited temporal and spatial scopes. The searching process was more a simple off-loading of information than a conceptual grouping of various parts. It is worth noting that all subjects of this experiment were novices in meteorology, so learners with more expertise may perform differently in the tasks. The animation used in the experiment had no audio; including audio may have made the searching process more complicated, but also more similar to the real-life case of searching for information in videos.

The results of Richard Lowe’s experiment are valuable in understanding the process of learners assimilating and applying new knowledge from videos, and they indicate the need to better equip learners to revisit in-video information. Recently, a more advanced interface was designed to better visualize the temporal relationships among events happening in the same locations. As shown in Figure 2.1, the Temporally Enabled Map for Presentation (TEMPO) interface indicated critical events on the timeline, after the designers found that viewers weighed moments with various importance [50]. The indications may serve as an implicit summary or table of contents of the animation, which helps the learner locate points of key information quickly without tedious examination of points to find the one of interest. However, how those indications actually help in the information-seeking process has yet to be tested experimentally.

Locating video segments in a large video database can be taxing and time-consuming using low-level strategies such as browsing through, as the user will have to watch videos in their entirety. To address this problem, many video summary techniques have been invented to extract key video frames or create previews automatically.
For example, the Hitchcock system shown in Figure 2.2 can show the user a preview with key frames, and also enables the user to search video segments based on text information of the video, such as its transcript [26], though its effectiveness has not been examined. The Visual Transcript combines the visuals and the transcript segments of a lecture video to create a document-style summary, and the text component of the summary was found to be helpful to learners in finding information [55].

Figure 2.1: TEMPO [50], by Nathan Prestopnik et al.

Figure 2.2: Hitchcock [26], by Andreas Girgensohn et al.

Another stream of research explores interface designs to scaffold users in the process of locating video segments on their own. Recently, Cojean et al. found positive effects of both micro-level and macro-level scaffolding on information-seeking in video-based environments [12, 13]. In their experiments, macro-scaffolding was provided by a table of contents of the video, and micro-scaffolding was realized by segmentation markers on the timeline, as shown in Figure 2.3. Those scaffoldings increased the saliency of key information and also created external mental models of the video content. Thus learners could find information faster with this external mental model provided on the interface.

Figure 2.3: Micro-level and macro-level scaffolding [12], by Salome Cojean et al. (1) control, (2) microscaffolding, (3) macroscaffolding, and (4) two-level scaffolding conditions.

Aside from video content, spatial memory has also been considered important in locating information in video for revisitation.
For example, artificial landmarks, as shown in Figure 2.4, were found to be helpful in building up spatial knowledge of a video so that learners could navigate back to previously visited locations more easily, and arbitrary icons were found to be less effective than thumbnails customized for each video [45, 58, 59].

All of the recent efforts on locating in-video information summarized above investigated the use of provided information instead of users’ own inputs, such as annotations. Though interactive features such as annotations have been found effective in improving learning outcomes generally [17, 66], as discussed later, their effectiveness in the information-seeking process has not been examined. To fill this gap, the proposed work explores how effective learners’ self-entered annotations are in helping locate in-video information.

Figure 2.4: Artificial landmarks [59], by Md. Sami Uddin et al. Media player (a, b, c), PDF viewer (d, e, f). a, d: standard, with no landmarks; b, e: icon, augmented with abstract icons; c, f: thumbnail, augmented with extracted content as thumbnails.

2.2 Annotations

This section takes a closer look at annotations. Text annotations are discussed first, followed by an examination of interface designs for video annotations.

2.2.1 Text Annotations

As the proposed design for video annotations was inspired by text annotations, it is worthwhile to gain an understanding of text annotations. Discussions of the results of the proposed experiment will refer back to studies examined in this section.

Active reading is a process of assimilating and reusing the reading material as part of the reader’s knowledge network [62], and has been considered a combination of reading, thinking, and adding annotations [1].
With the development of technology, annotating documents is better supported and more popular than before. Today’s document management software, such as EndNote and RefWorks, provides annotation tools for PDF documents, but these tools rely heavily on metadata and require a high level of proficiency. Interestingly, there were more discussions regarding annotations for electronic files in the 1990s. For example, DynaText supports three types of annotations: bookmarks, notes, and hyperlinks [56]. The DynaText annotation manager supports sorting, viewing, and deleting. Directing users from annotations to corresponding pages was implemented by another annotation system, Re:mark. Comment sharing and collaborative commenting have also been developed [15, 38].

Figure 2.5: Types and applications of text annotations [49], by Ilia Ovsiannikov et al.

Figure 2.6: Perceived importance of annotation features [49], by Ilia Ovsiannikov et al.

Though advocates of e-reading claim that e-reading provides more advanced features to support learning and improve learning outcomes [52], many students still prefer to read on paper, and navigation and annotation functions of e-reading have been found to be inferior to paper [57].

Ilia Ovsiannikov et al. asked readers 3 sets of questions targeting primarily the research and academic environment, based on which they identified main annotation types, major annotation applications, and the perceived importance of annotation features [49]. Those readers were undergraduate students, graduate students, professors, and professionals. Regarding locations of annotations, readers usually mark up text segments, write on margins, write at the top, write in a separate document, or write between lines. The frequency of use of each is shown in Figure 2.5. These annotations were used mainly for three purposes: to remember, to think, and to clarify.

Figure 2.7: Video annotations anchors and content [5], by Olivier Aubert et al.
Regarding electronic annotation features, the perceived importance of features varied, as shown in Figure 2.6.

2.2.2 Video Annotations

As with text annotations, two main components of video annotations are usually of interest: the content and the anchor.

For a text annotation, the anchor is in the spatial dimension of the text; however, for a video annotation, both temporal and spatial dimensions are included, as illustrated in Figure 2.7. For example, the video timeline is usually used to address the temporal dimension of video annotations. The spatial dimension is usually simplified as the whole video frame: the video frame at the time point when an annotation is created is associated with the annotation. Few designs have utilized specific spatial regions of video frames to make annotations [55, 61]. Video Annotation Learning System allows users to add graphical annotations such as arrows onto videos, and lists the types of annotations with timestamps [11].

The integration of both temporal and spatial dimensions of video annotations can make it difficult to display information. As shown in Figure 2.8, the limited region around the video timeline provides a very crowded space to display thumbnails with simple annotations as marks [36]. In the proposed interface, only the temporal dimension is addressed by displaying marks of annotations in corresponding locations of the timeline, the transcript of the video, and the filmstrip.

Figure 2.8: ToolScape [36], by Juho Kim.

When users are allowed to type in their own notes, there is usually a note list provided [32, 51, 63]. As shown in Figure 2.9, MediaNotes, which was developed at Brigham Young University, has a highly comprehensive set of annotation features, including naming (section A), segmenting (section B), commenting (section C), and tagging (section D). Annotations are grouped based on timeline positions of annotated segments in a panel alongside the video player.
Annotations can be used for complex data mining such as filtering, analysis across time, space, tag set, or person. In the proposed design, annotations are grouped into lists as well; in addition, the proposed design enables linking to specific video segments from annotations and allows learners to review their annotations outside of the video where there may be more than one video, increasing convenience for the purpose of reviewing as the learner does not need to play each video again.

Figure 2.9: MediaNotes developed at Brigham Young University.

Aside from a list, video annotations can be well visualized in the video timeline. In the Collaborative Lecture Annotation System (CLAS) [44], users can comment on a specific point of the video or the whole video, and annotations of the video are mapped to the timeline in a separate panel below the video player. Annotations of the users who watched the same lecture video are displayed and analyzed collectively in this system, and the instructor of the course can see an analysis of student video watching activities, as shown in Figure 2.10.

There have been many video annotation systems designed for data analysis, and these systems display annotations in a similar way [21, 35, 37, 68]. For example, Anvil displays annotations in parallel tracks based on themes used in the data coding process by the user [37], as shown in Figure 2.11. Observer XT used a similar visualizing technique, but with multimodal data such as behavioural data, sound data, and physiological data, all in the form of text [68]. These systems usually emphasize annotations themselves, for example codes, and the integration of annotations with the original material (such as the transcript) is not well supported.

Video annotations along with corresponding video clips can also be used to write essays, and they appear as hyperlinks in this case [7, 46].
In the Video Interactions for Teaching and Learning (VITAL) system, users can add notes to video segments as they watch videos, and embed annotated video segments as video-based evidence in essays. As shown in Figure 2.12, annotated video segments are listed in the left panel, from which users can embed them into the essay on the right. Readers of the essay can view the video-based evidence by clicking on the hyperlink while reading.

Figure 2.10: CLAS [44], by Negin Mirriahi et al.
Figure 2.11: Anvil [37], by Michael Kipp.
Figure 2.12: VITAL [46], by Frank Moretti et al.

Though text is the most common form of video annotations, audio and visual annotations have been used in some systems as well. VideoPaper offers users the option to add images as video annotations [7]. In Video Traces [54], the definition of annotations is broader than in the other systems discussed above: users can annotate images and audio-video files with voice, pointing, and drawing, as shown in Figure 2.13. Annotations in this system are better suited to stimulating collaborative learning, student-student interaction, and learner-teacher interaction, as users can annotate the same video collaboratively and can also respond to annotations by creating a “threaded discussion”. The threaded discussion is similar to the annotation manager in the presented work, except that the threaded discussion aggregates annotations of different learners and displays different types of annotations in one place.

Recently, researchers have tried to create annotations automatically for learners based on eye-gaze tracking. In GazeNoter, which was developed in 2016, notes are taken automatically based on the viewer’s eye-gaze patterns. The video content relevant to the current notetaking is highlighted, so the user is fully aware of the notetaking process.
When the user types in notes manually, video playback is slowed down; the video is paused if it is about to change to another slide while the user is taking notes manually. This design was reported by all participants as helpful for them to focus and think while writing [48].

Figure 2.13: Video Traces [54], by Amit Saxena et al.

While the annotation process of most video annotation systems relies on visual information, video transcripts have been utilized at different levels, among which systems for qualitative data analysis account for a large proportion because of the data coding process [21, 37, 41, 60]. The audio channel of educational videos contains as much information as, if not more than, teachers talking in classrooms. Recently, DynamicSlide [34] has displayed the transcript of a video on the right side of the video player as individual segments which are synchronized with the video, and in-video objects were linked to corresponding transcript segments. As shown in Figure 2.14, the system enables users to make annotations on in-video objects, and notes are listed below the video player where learners can play the corresponding video segment.

Figure 2.14: DynamicSlide [34], by Hyeungshik et al.

In summary, a number of video annotation systems have been designed for various applications, but none of them provide access to annotations of multiple videos in the same location. In the proposed design of the annotation manager, the learner is able to see annotations of multiple videos on the same page (the course page), and these annotations act as hyperlinks that direct the learner to corresponding video segments, offering a more convenient way to locate and rewatch video segments.

2.3 General Discussion

Electronic devices provide more options for interactivity in processing both text and video information, and interactivity has been found helpful for both types of information processing.
Though some video-based learning systems provide a transcript along with the video, the full potential of video transcripts in learning is yet to be unlocked.

Video annotations, as an important type of interactivity in video, are distinct from text annotations due to the unique nature of video; though their effectiveness in improving learning outcomes has been confirmed, there has been little evidence for how they can support learners in the process of locating in-video information, which is critical for revisiting video segments. Previous studies have suggested that learners did not make many video annotations [18–20, 30], but it is worth noting that subjects in those studies tended to annotate video content that they wanted to revisit in the future. Further, platforms used in these studies did not have features supporting learners in this revisitation through video annotations. Thus, it is possible that learners did not annotate in larger quantities due to the absence of a mechanism supporting the use of video annotations in revisiting video content of interest. Moreover, though video annotations were found to be used as bookmarks for future rewatching, how much they can help in locating video content for this rewatching remains unexplored.

The two major parts of the presented work are video annotations and their use for in-video information searching. The presented work creates a closed-loop video annotation process, from creating annotations to using them after creation, and thus fills present gaps in both video annotations and locating in-video information.

Chapter 3

Video Annotations in Information-Seeking

This chapter presents the methods used to answer the research question posed by this study.
In the presented work, an interface was designed and implemented as the apparatus, and an experiment was designed and carried out to collect data based on several measures.

In the proposed interface, video annotations are displayed in two places: the annotation manager (AM), which collects all annotations made on the video, and the video player page, where video annotations are displayed along with the original materials, i.e., the video, the transcript, and the filmstrip. Thus, there were two ways of using video annotations to locate information: through the AM, and on the video player page. To investigate whether the AM helped in the information-locating process, subjects were asked to finish searching tasks under both the AM condition and the non-AM condition; locating annotated vs. non-annotated information was also compared to answer the research question. Searching time was analyzed to determine whether the AM helped in reducing searching time, and whether annotated information was located faster. To provide a richer answer to the research question, and to better understand the components of the research question, annotations made by subjects and responses to survey and interview questions were also collected.

3.1 Experiment Design and Procedure

3.1.1 Participants

The interface targeted undergraduate students, and the presented work includes both a quantitative study and a qualitative study. The quality of the quantitative study was ensured by randomly assigning subjects to experiment conditions, which will be introduced shortly in this chapter, and by a power analysis based on data from a pilot study.
To ensure the quality of the qualitative study, UBC undergraduate students with diverse academic backgrounds were recruited as subjects of the presented experiment through an online posting; the number of subjects was also chosen so that data saturation could be reached.

Sixteen subjects were recruited. This number was in line with previous studies on similar topics [2], and was also much higher than the 4 subjects suggested by the power analysis using data from a pilot study, a simplified version of the formal experiment. Details of the power analysis can be found in Appendix E.

Subjects were between 18 and 27 years old, and the mean age was 20.63. They were from diverse academic backgrounds, including arts, science, engineering, and commerce; the numbers of participants in each academic year were balanced, with 5 in the first year, 3 in the second year, 3 in the third year, and 5 in the final year. Subjects’ basic information can be found in Table 3.1. Subjects’ prior experiences with the ViDex platform, video watching, video rewatching, note-taking for videos, and annotating texts are summarized in Section 4.3.1 based on survey responses.

3.1.2 Design

The experiment design and the procedure for each subject are shown in Figure 3.1. The mini session was used to familiarize subjects with the interface and its features. Following the mini session, there were three sessions in the experiment: the training session, the formal session 1, and the formal session 2.
Subjects in early pilot studies reported that they found their strategies after finishing one session, so the training session was used for subjects to develop their strategies for learning, annotating, and information-seeking.

Subject #  Gender  Age  Major                        Year
1          F       19   Math                         2
2          F       19   Geological engineering       3
3          M       23   Psychology                   4
4          F       19   Food, nutrition, and health  2
5          M       21   Physiology                   4
6          M       27   Life science                 4
7          F       19   Arts                         1
8          F       24   Law                          1
9          M       19   Mining engineering           1
10         F       21   Physics                      3
11         M       19   Commerce                     1
12         M       18   Science                      1
13         F       20   Food, nutrition, and health  3
14         M       21   Math                         4
15         F       20   Anthropology                 2
16         F       21   Commerce                     4

Table 3.1: Demographics summary for subjects.

Figure 3.1: Experiment design.

The formal session 1 investigated how the AM affected subjects’ task performance and whether annotated information was located faster than non-annotated information. Thus, subjects were asked to finish half of the searching tasks with the annotation manager (AM), and the other half without the AM. The formal session 2 was designed to find out whether subjects would use the AM and annotation features spontaneously, so subjects were not required to use the AM for any tasks; whether to use the AM or not was their own choice.

In the training session and the two formal sessions, subjects watched a video at their own pace, and they were free to use any features on the video player page. They then filled out a survey, which was the same for all sessions; finally, they completed a set of 12 searching tasks, which will be described shortly. The number of tasks in each session was determined by the time constraint, as the experiment aimed at finishing within 1.5 hours.

Subject  S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13 S14 S15 S16
Video    1   1   2   2   1   1   2   1   1   2   1   2   2   2   1   2
AM       O   S   O   S   S   O   O   S   O   S   S   O   S   O   O   S

Table 3.2: Video watched and AM conditions for each subject in the formal session 1. Video 1 was about the brain, and video 2 was about fusion power. For AM, “O” means the AM conditions are the same as in Appendix B.2 for video 2 and Appendix B.3 for video 1; “S” means the AM conditions are reversed relative to “O”, i.e., items not assigned the AM in Appendix B.2 or B.3 were assigned the AM.

Subject  S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13 S14 S15 S16
Video    2   2   1   1   2   2   1   2   2   1   2   1   1   1   2   1

Table 3.3: Video watched by each subject in the formal session 2. Video 1 was about the brain, and video 2 was about fusion power. The searching items were the same as in the formal session 1, but without assigned AM conditions, as exemplified in Appendix B.4.

All subjects watched the same video and finished the same set of searching tasks in the training session. To mitigate the effect of the videos on subjects’ task performance, the order of the other two videos across the two formal sessions was counterbalanced, so each video was watched by half of the subjects in each formal session. Though the order of tasks for each video was fixed, the AM conditions associated with tasks were counterbalanced, as some tasks may be intrinsically easier to locate with the AM. Due to this experiment design, simple within-subjects or between-subjects analyses were not applicable to the data obtained in the presented work. Thus, the linear mixed effects model (LMEM) was used for analysis, which will be introduced in Section 4.1.

Counterbalancing for videos and AM conditions (described in the following section) in the formal session 1 is summarized in Table 3.2. Counterbalancing for videos in the formal session 2 is shown in Table 3.3.

3.1.3 Task

There were twelve searching tasks in each session.
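The counterbalancing summarized in Tables 3.2 and 3.3 can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the lists transcribe the two tables, while the variable and function names are hypothetical and introduced only for illustration.

```python
from collections import Counter

# Video watched in formal session 1 (Table 3.2) and formal session 2 (Table 3.3),
# listed in subject order S1..S16.
SESSION1_VIDEO = [1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2]
SESSION2_VIDEO = [2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 1]
# AM condition set in formal session 1 (Table 3.2): "O" = original, "S" = reversed.
SESSION1_AM = list("OSOSSOOSOSSOSOOS")


def check_counterbalancing():
    """Return (subjects per video, subjects per (video, AM set), videos swapped?)."""
    video_counts = Counter(SESSION1_VIDEO)
    am_by_video = Counter(zip(SESSION1_VIDEO, SESSION1_AM))
    # Every subject should see the other video in formal session 2.
    swapped = all(a != b for a, b in zip(SESSION1_VIDEO, SESSION2_VIDEO))
    return video_counts, am_by_video, swapped


video_counts, am_by_video, swapped = check_counterbalancing()
print(dict(video_counts), dict(am_by_video), swapped)
```

Under this transcription, each video is watched by eight subjects in the formal session 1, each video is paired with the "O" and "S" condition sets four times each, and every subject watches the other video in the formal session 2.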
In the training session and the first formal session, half of the searching tasks asked the subject to finish the task with the AM, and the other half without the AM; the order of searching tasks for a video remained unchanged, but the AM conditions assigned to tasks were counterbalanced in the formal session 1, as shown in Table 3.2. In the formal session 2, the subject was free to use the AM at his/her own discretion, i.e., no AM condition was assigned to any task.

The goal of a searching task is to locate a piece of information in the original material, a video in the case of the presented experiment. For each searching task, the subject started from the course page and completed the task either with or without the AM. When using the AM, the subject searched through the AM of the video to find the annotation that he/she thought would be the most helpful for finding the to-be-located information, and entered the video player page by clicking on that annotation. When not using the AM, the subject entered the video player page by clicking on the cover frame of the video on the course page. After entering the video player page, the subject was free to use any features on the page to find the point of the target information. Once the subject found the correct point, the experimenter asked the subject to go back to the course page, indicating the completion of the task.
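The completion rule and the timing of a task can be sketched as follows. This is an illustrative sketch only: it uses the completion criterion given in Section 3.1.4 (the located point falls anywhere within the item's time range) and the task time definition in Section 3.3.1. The function names are hypothetical, and treating the range boundaries as inclusive is an assumption.

```python
def is_task_complete(playhead_s, target_start_s, target_end_s):
    """True if the located point lies inside the target segment (inclusive bounds assumed)."""
    if target_end_s < target_start_s:
        raise ValueError("target segment ends before it starts")
    return target_start_s <= playhead_s <= target_end_s


def task_time_s(entry_s, confirmed_s):
    """Seconds from opening the AM (or clicking the cover frame) to the
    experimenter's confirmation that the task was completed."""
    if confirmed_s < entry_s:
        raise ValueError("confirmation precedes task entry")
    return confirmed_s - entry_s


# Hypothetical item occupying 60.0 s to 72.5 s of the video.
print(is_task_complete(65.0, 60.0, 72.5))
print(task_time_s(10.0, 33.5))
```

In the experiment itself, completion was judged by the experimenter and times were read from screen recordings, so the functions above only formalize the stated rules.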
There were no time constraints for tasks, but subjects were asked to try their best to finish each task as quickly as possible.

3.1.4 Materials

Videos

Early small-scale pilot studies showed the following: subjects became bored or tired partway through a 13-minute video; if the video was too easy, subjects learned passively, ignoring the video transcript and not making any annotations; if the video content was too difficult, subjects became lost, especially when they did not have enough prior knowledge; if the visuals or the transcript of a video contained little information, subjects tended to ignore them and focus on the other presentations of video content. In light of the above, each of the three videos used in the experiment (a) was 5 to 7 minutes long, (b) was at a medium to high difficulty level (rated as 3-4 on a 1-5 difficulty scale in pilot studies), (c) conveyed dense information both visually and verbally (the speaker in the video spoke continuously, and the video frame changed for the most part within every 10 seconds), and (d) was an introduction to the topic that did not require much prior knowledge (each video has been viewed more than 2 million times and has received various comments on YouTube, which indicates that viewers with various backgrounds were able to understand the video in some way).

Videos used in the experiment were 3 segments of popular YouTube videos designed to educate the public. The video used in the training session was about quantum computation¹, and the other two videos were about the brain² and fusion power³, respectively.

To-be-located Information

A piece of to-be-located information was considered as a searching item. Searching items for all three videos are shown in Appendix B. Appendix B.1 is the set of searching items for the training session, which was the same for all subjects.
Appendix B.2 and B.3 show what the searching items for the two videos looked like in the formal session 1 (the counterbalancing of AM conditions has been explained in Section 3.1.2). Appendix B.4 is an example of how subjects were reminded that there was no AM condition assigned, and that they were free to either use or not use the AM.

For the searching items of each video, half were summarizations of segments of the video content; thus both remembering and understanding the video content were required for the learner to locate them, which simulated the real-life case. Each topic was ensured to appear in the video only once. To simulate cases in which various types of information remind the learner which content to revisit, the other half of the items consisted of three original transcript segments and three video frames. To prevent subjects from predicting the location of the to-be-located information, and to ensure the independence of each task, the order of these information pieces was randomized so that they did not follow the same chronological order as in the original material.

¹ Accessed on February 13, 2018.
² Accessed on February 13, 2018.
³ Accessed on February 13, 2018.

Each piece of to-be-located information lasted for a short period of time in the video, thus a task was considered complete if the subject located the information at any point within the corresponding time range.

3.1.5 Procedure

After a short greeting, the subject was asked to sign a consent form, then to complete the pre-experiment survey. Next, a verbal introduction as well as demonstrations of how each feature worked were given by the experimenter, followed by a mini session to allow the subject to fully experience the interface. In the mini session, the subject watched the first two minutes of the training session video, and was asked to add six annotations (two for each type) and perform six searching tasks including all possible task types.
Then the subject went through 3 sessions: the training session (also referred to as session 0), the formal session 1, and the formal session 2.

The procedure for each session was the same, as shown in Figure 3.1. In each session, the subject watched the video first, then filled out a survey, and finally performed twelve searching tasks. While watching the video, subjects were free to watch as many times as they wanted, and they could use any feature on the video player page. The survey used in each session was the same. After each session, the subject was offered the option to take a three-minute break.

After all three sessions, the subject was asked to complete the post-experiment survey, followed by a semi-structured interview.

The entire process lasted for about one and a half hours. The screen was recorded while subjects performed their searching tasks, and the recordings were used to analyze searching time.

3.2 Interface Design

3.2.1 Overview

The proposed interface consists of two main components: the video player page for learning and annotating, and the course page for course management and annotation management. The video player page aims at creating a seamless integration between video and text to fully utilize both the visual and audio information of the video. The course page collects all videos of a course and provides an annotation manager for each video.

Figure 3.2: Video player page.
Figure 3.3: Course page with annotation manager.
Figure 3.4: The window to add annotations.

3.2.2 Video Player Page

The video player page was adapted from the ViDex platform [23]. As shown in Figure 3.2, the video player page consists of three sections: a video player, a transcript, and a filmstrip. Contents of these three sections are synchronized and are three presentations of the same learning material. Learners watch a video and make annotations on this page, and they can navigate the video by interacting with any of these three sections.
Annotations and the progress of video watching are also displayed in all three sections, which are discussed briefly in the following paragraphs. Annotations are synchronized with the video and are mapped to corresponding locations in all three sections.

There are three types of annotations that the learner can make: highlight, tag, and note. The learner can add annotations in the transcript section and the filmstrip section following the same procedure: select a video segment by clicking and dragging the mouse, and then select or create an annotation in the pop-up window. As shown in Figure 3.4, the window for adding annotations contains five icons: one for highlight, one for tag, one for note, one for sharing, and one to close the window. To add a highlight, the learner just needs to select a color; the learner can also delete a selected highlight by clicking the eraser button below the color selection region. To add a tag, the learner needs to select a tag first, and then click on the selected tag to associate it with the selected segment. As shown in Figure 3.5, a list of existing tags is shown if the learner clicks on the arrow sign; a window for creating a new tag pops up if the learner clicks the “Add New Tag” button. To add a note, the learner simply fills in the blank by typing, and then clicks the “Add” button. If the learner hovers over any annotation icon in any of the three sections, he/she will see the details of the corresponding annotations, which will be described shortly in the following paragraphs.

The transcript section displays both the literal transcript of the video and annotations.
The transcript is divided into segments with time stamps, and each segment is clickable so that the learner can jump to a specific video segment at will. The current segment being played is indicated by the shadow around the transcript segment. Tags and notes made for each segment are shown by icons above the segment; the number following each icon indicates the number of annotations of that type made for the segment. If the learner hovers over any icon above a transcript segment, he/she will see the details of the corresponding annotations, as shown in Figure 3.6. The learner can also delete tags and notes, or modify notes here.

Figure 3.5: Select a tag or create a new tag.
Figure 3.6: Hover over an annotation icon in the transcript section to see the details of annotations.
Figure 3.7: The filmstrip section.
Figure 3.8: The timeline of the video player section.

The filmstrip, as shown in Figure 3.7, is a collection of video frames with equal time intervals between any two adjacent frames, along with indicators for annotations, a viewing history heat-map, and a vertical bar representing the current frame being played. The heat-map can be disabled by clicking the switch at the top-left corner. Highlights are indicated by color bars at the top; tags and notes are indicated by corresponding icons in the middle of the filmstrip. Icons for tags are placed on the same horizontal line, above icons for notes, and adjacent tags or notes are grouped into one icon to avoid overlapping. If the learner hovers over an annotation icon, he/she will see the details of the annotations and can modify or delete annotations, just as in the transcript section.
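The grouping of adjacent tags or notes into a single icon can be illustrated with a simple one-dimensional clustering of annotation time points. This is a hedged sketch, not the interface's actual implementation: the text does not state how adjacency is decided, so the `min_gap_s` threshold and the function name `group_adjacent` are assumptions made for illustration.

```python
def group_adjacent(times_s, min_gap_s):
    """Group annotation time points so that icons closer than min_gap_s share one icon.

    times_s: annotation positions on the filmstrip, in seconds.
    min_gap_s: hypothetical minimum spacing below which two icons would overlap.
    Returns a list of groups; each group is a list of time points in ascending order.
    """
    groups = []
    for t in sorted(times_s):
        # Join the current group if this point is too close to its last member.
        if groups and t - groups[-1][-1] < min_gap_s:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Hypothetical note positions; groups with more than one member would be drawn as one icon.
print(group_adjacent([30, 5, 7, 31, 90], min_gap_s=5))
```

A group with one member corresponds to a single-annotation icon, and a group with several members corresponds to a grouped icon.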
The learner can also click on any place of the filmstrip to play the video from a specific point at will.

In the video player section, video control options are provided, including a play/stop button, speed up and slow down buttons, a button to switch caption options, a button for volume adjusting, and a button to enter full-screen mode, as shown in Figure 3.8. Moreover, the learner is able to add annotations and play highlighted parts only. To add an annotation in this section, the learner simply clicks any of the three annotation-adding buttons below the timeline bar: “Highlight”, “Tag”, “Note”. To play only the highlighted parts, the learner can hover over the “Play HL” button and then select a highlight color to play in a pop-up window, as shown in Figure 3.9. Annotations are displayed around the timeline bar at different horizontal lines depending on their types. Tags are displayed underneath the timeline bar as circles and squares representing a single tag and grouped tags, respectively; the colors of circles representing single tags are the same as the tags, while all squares representing grouped tags are green. Highlights are displayed as colored bars above the timeline bar, and the colors are the same as corresponding highlights. All notes are orange with single notes as circles and group notes as squares. The learner can also hover over indicators of tags or notes to see the details.

Figure 3.9: Play highlighted parts only.
Figure 3.10: Annotation manager.

3.2.3 Annotation Manager

In the course page, each lesson/video is equipped with an annotation manager (AM) which manifests as a blue rectangle at the top right corner of the lesson card, as shown in Figure 3.3. If the learner clicks the blue rectangle, a window will pop up beside the rectangle.
This pop-up window displays all annotations of the video by category in three separate tabs: “Highlights”, “Tags”, and “Notes”. As Figure 3.10 shows, there is a list of annotations of the same category in each tab.

For each item in the “Highlights” tab, the highlighted transcript segment is displayed in the same way as in the video player page, as text with a colored background; moreover, the preceding and following ten words (if any) are also displayed to give the highlight a context. If the learner clicks on text with a colored background, he/she will be led to the video player page with the video playing from the beginning of the highlighted segment.

In the “Tags” tab, each item consists of a tag sign showing the tag name and the transcript segment that the tag is attached to. The tagged transcript segment is displayed with a grey background and with context, similar to the “Highlights” tab. The color of the rectangle is the same as the color of the tag icon in the video player page. If the learner clicks on text with a grey background, he/she will be led to the video player page with the video playing from the beginning of the transcript segment that is associated with the tag.

In the “Notes” tab, each note is displayed with a grey background.
Clicking on any note will lead the learner to the video player page with the video playing from the beginning of the transcript segment that the note is added to.

3.3 Measures

3.3.1 Task Performance

Task time

The time the subject spent on each searching task was defined as the duration between the moment the subject opened the AM or clicked on the cover frame of the video and the moment the subject was informed by the experimenter of the successful completion of the task.

Annotations

Annotations the subjects made for each video were collected and analyzed.

3.3.2 Surveys

Survey questions and interview questions mainly covered 3 topics: video learning, video annotations, and the AM.

The pre-experiment survey focused on participants’ prior experiences with video watching, video learning, and annotating both text and video.

The session survey was used to solicit subjects’ immediate reflections on the session, i.e., the video watching process, the annotation process, annotations made, and the experiment material itself (the video).

The post-experiment survey was for participants to report their experiences in all sessions and general perceptions, including perceptions of the annotation process, annotations, and the AM.

3.3.3 Interviews

The interviews were semi-structured. All subjects were free to express their experiences with the interface or the experiment, but a fixed set of ten questions was asked at some point in the interview. These ten questions focused on how subjects created their video annotations, how they searched for to-be-located information, and their experiences with the AM.

Chapter 4

Results and Discussion

This chapter presents and discusses data collected on the measures described in the previous chapter.
A more integrative discussion of data on these measures is provided in Chapter 5.

4.1 Task Performance

Subjects’ performance on the searching tasks was analyzed to investigate whether annotations helped learners locate previously seen information in video, and whether the use of the AM saved time in the process. Time is measured in seconds in this section. A fraction of the user study data can be found in Appendix C. Appendix D shows the R code for the data analysis using linear mixed effects models, which will be introduced shortly.

4.1.1 Training Session

In the training session, all subjects watched the same video and finished the same set of searching tasks. Thus, individual differences, if any, are expected to be salient in this session.

The performance of all 16 subjects on the 12 tasks of the training session is shown in Figure 4.1. There was not much difference among the 16 subjects, but S7 took longer to finish the tasks, giving reason to consider individual differences in the present analysis.

Figure 4.1: Subjects’ task performances in the training session.

Subjects performed almost equally on the 12 tasks. Tasks finished using the AM (mean (M) = 23.43, standard deviation (SD) = 20.23) took less time than tasks finished without the AM (M = 44.65, SD = 47.55). Locating annotated items (M = 29.15, SD = 33.96) also took less time than locating non-annotated items (M = 42.97, SD = 43.21). Non-annotated items that were searched without the AM took the longest time (M = 53.12, SD = 51.69), while searching for annotated items using the AM took the least amount of time (M = 21.82, SD = 21.25). Thus, whether the AM was used and whether the item was annotated might have affected searching time, and there may be an interaction between the two factors.

To better explore the data, a linear mixed effects model (LMEM) was fitted, as explained at the end of this section.
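Descriptive statistics of the kind reported above (M and SD of searching time per combination of AM use and annotation status) could be computed along the following lines. This is a minimal sketch with made-up records rather than the study data; the record layout and the function name `summarize` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was reported.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Return {(used_am, annotated): (mean_seconds, sd_seconds)}.

    records: iterable of (used_am: bool, annotated: bool, seconds: float).
    The sample standard deviation is used; a cell with one record gets SD 0.0.
    """
    cells = defaultdict(list)
    for used_am, annotated, seconds in records:
        cells[(used_am, annotated)].append(seconds)
    return {
        key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for key, values in cells.items()
    }


# Hypothetical searching times in seconds, not taken from the experiment.
example = [
    (True, True, 10.0),
    (True, True, 20.0),
    (False, False, 40.0),
    (False, False, 60.0),
]
for cell, (m, sd) in summarize(example).items():
    print(cell, round(m, 2), round(sd, 2))
```

Such cell summaries only describe the data; the inferential conclusions in this section rest on the fitted model rather than on these means.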
The LMEMs for all sessions are similar, so the one for this session is explained here as background for the following sessions. An LMEM was used because it is able to deal with unbalanced designs; in the proposed experiment, the number of annotated items and the number of non-annotated items were unequal, and the number of cases when the AM was used was not the same as when the AM was not used. Thus, an LMEM is preferable to traditional analysis of variance (ANOVA) models. Kenward-Roger’s approximation was used to obtain p values because it produced acceptable Type I error rates for small samples.

In the LMEM used in this session, between-subject issues were addressed by random effects as they were expected to be generalized. For example, the result should be robust for other participants and for other videos or searching items. Within-subject issues were addressed by fixed effects, including AM conditions, whether annotated or not, task order, and interactions among those factors.

The analysis results indicated that using the AM significantly reduced searching time (F(1, 9.13) = 5.52, p = 0.043), and searching for annotated items took significantly less time than for non-annotated items (F(1, 130.62) = 5.67, p = 0.019). The effect of the interaction between the AM condition and whether the item was annotated was not significant. The order of the item did not have a significant effect on task performance, nor did its interaction with the AM condition, indicating that the learning effect did not affect subjects’ task performances significantly.

Model Explanation

The LMEM used for this session was as follows:

    time = AM + annotated + AM:annotated + task order + AM:task order + (1|subject) + (1|item) + ε

where there were five fixed effects followed by two random effects and an error term on the right side.
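For readers more used to conventional statistical notation, the formula above can be written out as below. This is a sketch under stated assumptions rather than the author's own notation: the indices i (subject) and j (item), the coefficient symbols, the explicit intercept, the 0/1 coding of AM and annotated, and the normality of the random terms are all standard conventions assumed here, not details given in the thesis.

```latex
\begin{align*}
\text{time}_{ij} &= \beta_0 + \beta_1\,\text{AM}_{ij} + \beta_2\,\text{annotated}_{ij}
  + \beta_3\,(\text{AM}_{ij} \times \text{annotated}_{ij}) \\
 &\quad + \beta_4\,\text{order}_{ij} + \beta_5\,(\text{AM}_{ij} \times \text{order}_{ij})
  + u_i + v_j + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Here the coefficients β1 to β5 correspond to the five fixed effects, u_i and v_j to the random intercepts for subject and item, and ε_ij to the error term.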
AM was the AM condition associated with the item; annotated was whether the item was annotated by the subject while watching the video; AM:annotated was the interaction between AM and annotated, because using the AM for annotated items may be more efficient than for non-annotated items; task order was the order of the task in the set of twelve searching tasks for the video; AM:task order was the interaction between AM and task order, because the more the subject used the AM, the more comfortable he/she may feel about using it, potentially resulting in the AM being a more useful tool; (1|subject) was the random intercept for each subject, because individual differences among subjects may play a role in how they perform on searching tasks; (1|item) was the random intercept for each item, because some items might be easier to find than others; the error term ε represented the deviations from the predictions due to “random” factors that were out of the purview of the experiment.

4.1.2 Formal Session 1

Figure 4.2: Subjects’ task performances in formal session 1.

Subjects’ performances in this session are shown in Figure 4.2. As seen in the training session, there was no significant interaction between the AM condition and whether the item was annotated or the order of the item. Whether the item was annotated had a significant effect on searching time (F(1, 169.3) = 5.09, p = 0.025). Searching for annotated items (M = 20.67, SD = 20.03) took significantly less time than for non-annotated items (M = 26.87, SD = 25.12).
Using the AM (M = 22.59, SD = 22.61) took about 1.5 seconds longer to finish the searching task compared with not using the AM (M = 21.07, SD = 19.63), but the effect was not significant.

Model Explanation

The LMEM used for this session was the following:

    time = AM + annotated + AM:annotated + task order + AM:task order + (1|subject) + (1|video/item) + ε

where (1|video/item) represented the random intercept for each item of a video. The factor video had 2 levels, because there were two videos in this session; the factor item had 12 levels and was nested in the factor video.

4.1.3 Formal Session 2

The LMEM used to analyze session 1 data was used for this session. Results suggested that the interaction between using the AM and whether the item was annotated was not significant (F(1, 182.69) = 3.20, p = 0.075). There was no significant difference in searching time between using the AM and not using the AM; using the AM (M = 15.79, SD = 17.39) took almost the same time as not using the AM (M = 15.61, SD = 10.71) to locate information in video. Searching for annotated items (M = 14.79, SD = 16.14) took less time than non-annotated items (M = 19.87, SD = 14.47), and the effect was significant (F(1, 148.63) = 4.69, p = 0.032).

This session was also designed to find subjects’ preferences regarding whether to use the annotation manager to finish searching tasks when no AM conditions were assigned to tasks. For the training session and session 1, subjects were required to use the AM for half the tasks and not to use the AM for the other half, regardless of their preferences and their memories about whether they had annotated related parts or not. So the two sessions were designed to find out how much using the AM could improve subjects’ performances on the searching tasks. In session 2, the subject could decide whether to use the AM or not.
As shown in Table 4.1, 12 subjects used the AM for more than half of the searching tasks, and the average number of AM cases was 9 while the average number of non-AM cases was 3. This indicates that using the AM was highly preferred over the traditional way of searching for watched video segments by playing the video from the very beginning and sliding through the timeline.

Figure 4.3: Comparing numbers of items searched with the AM, annotated items, and annotations made by subjects.

Table 4.1: Numbers of items searched with the AM, annotated items, and annotations by subjects in session 2.

| Subject | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 | S16 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AM | 2 | 6 | 12 | 7 | 12 | 10 | 12 | 11 | 1 | 11 | 9 | 12 | 10 | 10 | 6 | 12 | 8.93 |
| annotated | 7 | 10 | 12 | 12 | 9 | 10 | 12 | 11 | 2 | 11 | 8 | 11 | 12 | 8 | 9 | 12 | 9.75 |
| annotations | 9 | 14 | 19 | 78 | 26 | 18 | 19 | 27 | 3 | 43 | 18 | 16 | 48 | 6 | 22 | 44 | 25.63 |

Hypothesis 2, which focuses on the relationship between subjects’ preferences for using the AM and the annotations they made on the video, is proposed: the more annotations subjects have made, the more likely they are to use the AM to find video segments. As shown in Table 4.1, 15 subjects annotated more than half of the items in session 2, and the average number of annotated items across the 16 subjects was 10 while the average number of non-annotated items was 2. Comparing the two rows counting the number of annotated items and items searched using the AM (shown in Figure 4.3), it is not difficult to see that the subject who used the AM the least (1) also annotated the least number of items (2); subjects who used the AM most were also among those who annotated the most. A possible explanation for this phenomenon would be: subjects might remember what and how much they had annotated, thus they may have been able to predict whether they would find helpful information in the AM to help them finish the task, and then made decisions on whether or not to use it based on the prediction or judgement.
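The counts quoted above can be re-derived from the per-subject rows of Table 4.1. The short Python sketch below does so and, as an additional descriptive check that the thesis itself does not report, computes a Pearson correlation between AM use and the number of annotated items. The variable names and the choice of Pearson's r are assumptions made for illustration only.

```python
from math import sqrt

# Per-subject counts for session 2, copied from Table 4.1 (S1..S16).
am_items = [2, 6, 12, 7, 12, 10, 12, 11, 1, 11, 9, 12, 10, 10, 6, 12]
annotated_items = [7, 10, 12, 12, 9, 10, 12, 11, 2, 11, 8, 11, 12, 8, 9, 12]
TASKS_PER_SUBJECT = 12  # twelve searching tasks per video


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


mean_am = sum(am_items) / len(am_items)
mean_annotated = sum(annotated_items) / len(annotated_items)
# "More than half" means strictly more than 6 of the 12 tasks or items.
am_majority = sum(1 for a in am_items if a > TASKS_PER_SUBJECT / 2)
annotated_majority = sum(1 for a in annotated_items if a > TASKS_PER_SUBJECT / 2)
r = pearson_r(am_items, annotated_items)

print(f"mean AM items: {mean_am:.2f}, mean annotated items: {mean_annotated:.2f}")
print(f"subjects using the AM for more than half the tasks: {am_majority}")
print(f"subjects annotating more than half the items: {annotated_majority}")
print(f"Pearson r (AM use vs. annotated items): {r:.2f}")
```

With only 16 subjects, and one subject (S9) sitting far from the rest on both measures, any such coefficient is descriptive at best and should not be read as a test of Hypothesis 2.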
This hypothesis will be checked in Section 5.1 of Chapter 5 using data from surveys and interviews.

4.2 Annotations

Previous research has shown that learners are not active in making video annotations [19]. This may be due to the lack of a mechanism for later use of these annotations [30]. As the AM supports the use of video annotations for locating information, learners’ annotating behaviors may also be changed. Thus, annotations that learners created while learning with videos are analyzed here.

Table 4.2: Average numbers of annotations made by 16 subjects.

| Session | Highlights | Tags | Notes | Highlight colors | Tag names | Total |
|---|---|---|---|---|---|---|
| Session 0 | 14.25 | 2.88 | 3.94 | 2.31 | 1.38 | 21.06 |
| Session 1 | 16.56 | 1.44 | 4.00 | 2.44 | 1.00 | 22.00 |
| Session 2 | 20.25 | 1.44 | 3.94 | 2.38 | 1.19 | 25.63 |
| Average | 17.02 | 1.92 | 3.96 | 2.38 | 1.19 | 22.90 |

As Table 4.2 shows, subjects made a considerable number of annotations for each video, even though all videos were only around 6 minutes long. On average, subjects made 23 annotations per video, of which 17 were highlights, 2 were tags, and 4 were notes; these proportions of annotations were consistent across all three experiment sessions. Slightly more annotations were made in session 2, and this was a result of the increased number of highlights.

Each subject’s preference of annotation types and number of annotations made were consistent across all three sessions, including the training session. Highlights were considerably more popular than the other types of annotations, which was in line with the previous study about text annotations [49]; 10 subjects made more highlights than tags or notes, 4 subjects made more notes than tags or highlights; no subject made more tags than highlights or notes.
The average number of colors selected for highlights was 2 for each of the three sessions, and the average number of tag types used was only 1 for each of the three sessions.

4.3 Surveys

4.3.1 Pre-Experiment Survey

As shown in Appendix A.2, this survey asked about prior experiences with the ViDex platform, video watching, video rewatching, note-taking for videos, and annotating textbooks. Likert-scale questions were the majority, complemented by some questions asking subjects to write down their answers. Not all subjects answered every question. Subject responses to the Likert-scale questions are summarized in Table 4.3.

Table 4.3: Subjects’ responses to pre-experiment survey questions.

| Question | Scale | Mean | Median | SD | n |
|---|---|---|---|---|---|
| Familiarity with ViDex | 1-5 | 1.81 | 1 | 1.22 | 16 |
| Frequency of video watching | 1-7 | 6.18 | 7 | 1.25 | 11 |
| Frequency of video watching for learning | 1-7 | 3.7 | 4 | 1.95 | 10 |
| Frequency of video rewatching for learning | 1-5 | 1.9 | 1.5 | 0.99 | 10 |
| Perceived difficulty of finding previously seen video segments for learning | 1-7 | 1.83 | 2 | 0.75 | 6 |
| Frequency of note taking while watching videos for learning | 1-5 | 2.78 | 3 | 1.09 | 9 |
| Frequency of reviewing notes taken while watching videos for learning | 1-5 | 2.3 | 3 | 0.95 | 10 |
| Frequency of note taking while reading books for learning | 1-6 | 4.3 | 4.5 | 1.42 | 10 |
| Frequency of highlighting while reading books for learning | 1-6 | 3.1 | 3 | 1.85 | 10 |
| Frequency of reviewing annotations (highlights and notes) made while reading books for learning | 1-5 | 2.7 | 3 | 0.95 | 10 |

Generally, subjects were unfamiliar with the ViDex project. 6 subjects were somewhat familiar with the project, but it is worth noting that the interface used in the presented experiment was new, thus being familiar with the project did not necessarily mean that they were familiar with the interface. Subjects can be seen as novices of the proposed interface in this experiment.

Subjects watched a variety of videos daily, but only watched videos for learning half the week.
Not only were course-related videos included, but videos of personal interest, such as DIY tutorials and music videos, were also watched for learning. Most videos were watched on YouTube. Thus, the subjects can be seen as experienced video consumers and intermediate video learners.

Rewatching a video for learning was common but not frequent, and subjects explained that this depended on the difficulty of the video content. 6 subjects provided more detailed reflections on their rewatching habits. The two main purposes for rewatching a video were to prepare for an exam and to review contents that were not understood. Rewatching usually happened the day before an exam, and rewatching specific segments and rewatching the whole video were both common. To find a specific segment to rewatch, subjects commonly scrolled through the video, and subtitles as well as thumbnails were thought to be helpful in the process.

Writing notes while watching videos for learning was common, especially for parts that were difficult to understand or follow. Those notes, or parts of them, were usually reviewed to consolidate memory or to prepare for an exam or assignments.

In comparison, writing notes while reading books was performed more frequently, and notes were usually written on main points or information that was not already known. Highlights were made less often than notes for books, and part of the reason was that highlighting would reduce the book’s resale value. Though highlighting helped to bring attention to specific parts of the text so that the learner did not need to read or review all of the text, highlighting was considered more passive than writing notes.
Subjects reviewed either all of their annotations or part of their annotations on books slightly more often than notes on videos for exams or assignments.

In summary, responses to this survey suggested that (1) subjects could be considered novice users of the proposed interface in this experiment, (2) subjects were experienced in video watching and had some experience with video learning, (3) subjects usually used the low-level strategy of scrolling through the video timeline to find specific video segments for revisitation, and (4) most subjects annotated both videos and books, while more subjects reviewed annotations on books than notes for videos. Discussions of annotating video vs. text and how the proposed interface affected subjects’ video reviewing patterns will be given in Section 5.1 of Chapter 5, with evidence collected from interviews and other surveys.

4.3.2 Session Survey

Session surveys, filled in right after subjects finished watching the video, solicited immediate reflections on learning experiences, as shown in Appendix A.3. Subject responses are summarized in Table 4.4. All questions of this survey were Likert questions with a 1-5 scale.

Table 4.4: Subjects’ responses to session survey questions (Likert scale, 1 to 5) of session 1 and 2.

| Question | Mean | Median | SD | n |
|---|---|---|---|---|
| Prior knowledge about the topic | 2.88 | 3 | 1.36 | 16 |
| Interest in the topic | 3.95 | 4 | 0.79 | 11 |
| Interest in the video | 3.91 | 4 | 0.92 | 11 |
| Perceived effectiveness of the video in teaching the topic | 4.18 | 4 | 0.8 | 11 |
| Focused more on learning than on making annotations | 3.5 | 3.5 | 0.95 | 16 |
| Learning took more efforts than making annotations | 2.64 | 2 | 1 | 11 |
| Perceived helpfulness of making annotations in learning | 3.81 | 4 | 0.82 | 16 |
| Perceived quality of learning | 3.69 | 4 | 0.69 | 16 |
| Perceived familiarity of the video content after learning | 3.64 | 5 | 0.85 | 11 |
| Perceived possibility of reviewing annotations made in the session for exams of the course | 3.97 | 4 | 1.06 | 16 |

Generally, subjects had some, but not ample, prior knowledge about the topic covered in the video (M = 2.88, SD = 1.36), which simulates the situation in the classroom when an instructor gives a lecture. Subjects were interested in the videos (M = 3.91, SD = 0.92) and topics they covered (M = 3.95, SD = 0.79), and the videos were thought to be well-produced and effective in teaching the topics (M = 4.18, SD = 0.8).

In terms of annotating while learning, subjects focused more on learning than annotating (M = 3.5, SD = 0.95), but this depended on how difficult the video content was. The more difficult the video was, the more effort was devoted to learning. Considering that the three videos used in the experiment were rated at a medium to high level of difficulty in early pilot studies, and that subjects generally did not have much prior knowledge about the topics covered by those videos, the annotation process should not have significantly diverted the learner’s attention from learning the video content. Generally, subjects thought the annotation process was helpful for learning (M = 3.81, SD = 0.82), and would like to review their annotations for potential exams of the course (M = 3.97, SD = 1.06). Thus, subjects may be willing to see their annotations again either in the AM or on the video player page.

4.3.3 Post-Experiment Survey

The post-experiment survey used in the presented experiment can be found in Appendix A.4.
Subjects’ responses to the post-experiment survey are shown in Table 4.5. This set of Likert questions used a scale of 1-5.

Table 4.5: Subjects’ responses to post-experiment survey questions (Likert scale, 1 to 5).

| Question | Mean | Median | SD | n |
|---|---|---|---|---|
| Ease of use of the AM | 4.19 | 4 | 0.54 | 16 |
| Like using the AM | 4.19 | 4 | 0.75 | 16 |
| Using the AM saves searching time | 4.44 | 5 | 0.73 | 16 |
| Will use the AM to rewatch video segments for learning | 4.19 | 5 | 1.22 | 16 |
| Making annotations while watching videos for learning is easy | 3.94 | 4 | 0.68 | 16 |
| Getting used to making annotations while watching videos for learning is easy | 4.3 | 4.5 | 0.95 | 10 |
| Will review annotations made on videos for academic goals such as to pass exams | 4.38 | 4.5 | 0.81 | 16 |
| Annotations were made more for learning than for the experiment tasks | 3.56 | 4 | 1.09 | 16 |
| Will make annotations for course videos in a similar way as in the experiment | 4.45 | 4 | 0.52 | 11 |

The AM was considered easy to use and effective in saving searching time. The AM was also liked by subjects, and was projected to be used for future rewatching. Generally, the annotation process was also thought of as easy to become familiar with.

During the annotation process, most subjects (13 out of 16) were making more annotations for learning than for the tasks, and they felt strongly that they would make annotations on the proposed interface the same way for learning with course videos.
Subjects also reported a strong preference to review annotations they had made for academic goals, such as passing an exam.

4.4 Interviews

Interviews were carried out to gain a better understanding of subjects’ annotating patterns, perceptions of different types of video annotations and the annotation process, video rewatching patterns, and usage patterns and perceptions of the AM.

Table 4.6: Summary of subjects’ interview responses regarding annotations and the annotation process.

| | Positive | Negative | Neutral |
|---|---|---|---|
| Highlights | 10 | 2 | 0 |
| Tags | 2 | 4 | 0 |
| Notes | 5 | 6 | 1 |
| The annotation process | 7 | 6 | 1 |

4.4.1 Annotations

Two topics were discussed by subjects: the content of their annotations, and their perceptions of the three types of video annotations. Subjects’ responses are summarized in Table 4.6.

Regarding the content of annotations, 13 out of 16 subjects said that they mainly annotated key points or important points of the video. Subjects also reported that they had annotated more detailed information such as examples, definitions, or descriptions (n=4); one subject annotated things that she liked; two subjects annotated words that were not understood.

Subjects had mixed opinions toward the three types of annotations.

For highlights, advocates (n=10) thought they were easy and quick to make while watching the video, and that they placed more emphasis on the original material so that learners knew what information they were annotating. Highlights were also thought to be useful to label or bookmark main ideas of the video, and they reminded the learner of information following or preceding them as well, which helped in recalling and finding information in the video. For other subjects (n=2), highlights were random and not useful because they were too broad, not specific enough, and unable to stimulate thinking.

Tags did not receive as much positive feedback as highlights. One subject referred to them as interesting, but did not think they would be helpful.
One subject referred to tags as more personalized than highlights but not as useful as notes in getting information, and this comment is in line with previous studies comparing tagging and handwritten notes [67]. Another piece of positive feedback came from a subject who thought that tags were helpful in providing the context of an annotation, just as highlights were. In addition to being considered unhelpful and cumbersome, slowing down the video watching process, tags also caused some confusion. S12 commented on tags as follows:

“Tags will be a bit harder, because it’s hard to categorize it. So if I want to flag it as important, I would have to probably create a new tag, (and then define) what type of ‘important’ it is.”

Similar to notes, tags are learners’ comments on the original material. Even though learners add their own input by adding either notes or tags, the difference between these two types of annotations is that a tag is reusable once created. Tags are usually shorter than notes as well and are usually words, while notes are usually full sentences.

Subjects’ opinions on notes varied. In the opinion of supporters (n=5), notes enabled them to reinterpret information in a way that made more sense to them and made the information more straightforward; notes also helped them organize information, such as grouping separated information, so that they did not need to make two or more highlights; notes were used as precursors or place marks as well. Opposing opinions (n=6) focused on the process of creating notes and the contents of notes. The subject who referred to tag-making as a cumbersome process also made the same comment for notes. Another subject preferred handwritten notes over taking notes on computers.
Regarding the contents of notes, two subjects felt that there was no need to add notes because all information was already in the transcript, and one of them would rather add notes from other sources found online for future reference if necessary; one subject thought that notes were “other things” added to the original material, and thus preferred highlights; one subject made mainly notes and highlights because notes alone could not remind the subject what they referred to.

4.4.2 The Annotation Process

In this part of the interview, subjects were asked about their perceptions of the process of making annotations while watching a video for learning. 14 subjects answered this set of questions.

Half of the subjects thought making video annotations facilitated learning (n=7). Annotating on videos was thought to be natural and easy instead of being a disruptive process. The interaction with the text and the video also made the video watching process more engaging rather than a passive information receiving process, and annotation “footprints” of this process helped to find segments for future rewatching, like pinpoints. Subjects also reported that annotations helped them remember information in a clearer and more organized way. For example, S15 commented as follows:

“I think it helped, like, making sure that I understood the information, because then I would go back to the point that I thought was important, and reread it. So it’s like double the information.”

The annotation process was also reported to have an effect on how subjects found information in video, and S6 commented as follows:

“The behaviour of annotating, highlighting the key points or concepts build that road map in your brain.
So when you have to go back and find certain things, the active highlighting that builds that structure in your head so you know roughly where to look.”

S8 also reflected that she was well aware of what annotations to make by going back and forth and structuring her annotations, thus she was able to recall whether she had annotated the to-be-located information or not.

The annotation process for video watching was also thought to lead to active learning because subjects needed to make annotations physically, and this process as well as annotations themselves helped in remembering video content and keeping up with the progression of the video (n=5). S8 reflected as follows:

“In a video, he (the speaker) goes through things in a certain order. He talks about certain things. If I don’t make notes, then when I am trying to remember what was going on, I can’t remember the sequence of the topics, I can’t remember exactly what details he described, and I can’t remember the exact location of, for example, the function of hypothalamus.”

However, the multi-sensory and transient nature of video as well as the highly interactive process posed challenges to subjects in focusing on the content. 3 subjects said that they would prefer watching the video for the first time without doing anything. Then they would go back and make annotations during their second viewing, followed by a third viewing without doing anything again. The 3 presentations of the information (the transcript, the filmstrip, and the video) competed for subjects’ attention, thus it was difficult for some subjects to focus on all of them. The intrinsic pace of a video and subjects’ pace of making annotations posed an added difficulty as well. For example, S12 commented as follows:

“Well, while I was making annotations, I was focusing on writing it, so at that point I am just hearing things, or focusing on what I’m writing, or like the dialogue outside, as opposed to focusing on what’s on the screen.
So with the case of the first, there was a lot more information displayed on the screen, I couldn’t focus on it while I was writing annotations, but I could still hear everything that is being said. ...... It’s extremely hard to take notes and pay full attention to the video at the same time. If it was just audio, that would be different, because you still hear everything. But if you’re writing things down, you kind of fall behind at some point, because you can’t write faster than they speak.”

When unbalanced attention was paid to the three presentations of information, the video watching experience would be completely changed. For example, S11 commented as follows:

“Right now I know I wasn’t paying attention to the visuals, I was just looking at the text, so I was just like reading a book than really watching a video. That’s what it felt like to me.”

S14 commented on his annotation style as “out of an instinct” and there was no specific plan on how to annotate. He also reflected that he played back frequently as a result of a disrupted stream of thoughts caused by the annotation process.

4.4.3 Annotating Video vs. Annotating Text

Subjects were asked about their perceptions of the difference between the annotation processes for video and text. 12 subjects answered this question, and a summary is given in Table 4.7.

Table 4.7: Numbers of subjects’ interview responses regarding annotating video vs. annotating text and the use of the AM.

| | Support | Not Support |
|---|---|---|
| Annotating text was easier | 5 | 3 |
| Annotating video was more engaging | 5 | 3 |
| The AM was helpful for navigating to video segments of interest | 14 | 0 |
| Annotations affected the use of the AM | 8 | – |

Subjects’ opinions were mixed for either type of annotating. For some subjects (n=5), annotating text was easier and less disruptive than annotating videos, and they were able to read and annotate at their own pace while annotating text rather than trying to keep up with the progression of the video.
While annotating text, subjects also needed to pause reading, annotate, and resume reading, just as they needed to pause watching, annotate, and resume watching while annotating video, but pausing and resuming text reading was perceived as more comfortable because subjects did not need to slow down or play back, so the whole reading and annotating process was more integrated. However, S3 thought about it in the opposite way: annotating while reading took more time as it was impossible to read and annotate text at the same time. Several subjects (n=3) wrote notes in a separate notebook while reading books, thus it took more time for them to link their notes back to the original text segments to review; they had to search through the book more intensely as they could not skim through the book like skimming through a video in the experiment. For example, S13 commented as follows:

“When I read, I usually take notes on a separate piece of paper, so it’s not on the book itself, which takes longer to match my notes to the section or the paragraph or whatever.”

Besides providing quicker links from annotations back to the original video segments or transcript segments, making video annotations was also thought of as a more engaging learning process than annotating text because of the added audio and visuals (n=5). S15 commented as follows:

“I like the visual component of having a video while also making annotations, rather than just reading the same words, then reread it a bunch of times but with only the annotations.”

Though some subjects thought annotating, especially highlighting, videos was quick and easy, 3 subjects regarded this process as more disruptive and distracting than annotating text.
More specifically, frequent pausing and playing back the video disrupted subjects’ flow of video watching and learning.

4.4.4 The Use of the AM

This section of interview questions asked subjects about their strategies using the AM in session 2, when subjects made their own decisions regarding whether to use or not use the AM. General perceptions of the AM were also asked. 14 subjects talked about their use of the AM in the interview. A summary is given in Table 4.7.

All subjects (n=14) thought that the AM was helpful because it took them directly to video segments of interest so that they did not need to scan through the whole video again, including the two subjects who used the AM only once and twice respectively in session 2.

Annotations and the annotation process seem to be closely associated with the use of the AM, and the quality and amount of annotations were reported to impact the effectiveness of the AM. Subjects mainly annotated key points or concepts of the video content, and they were aware of what information they had annotated (n=7), thus looking at annotations gave them clues about the structure of the video content, and also the location of the to-be-located information if it was not already in the AM. For example, S12 reflected on the process of finishing tasks in session 2 as follows:

“I wanted to find just the four main parts of the brain, so I annotated it, you know. I went over brainstem, thalamus, cerebellum, and cerebrum. Those were like the four main topics in the video, right, which made it way easier to jump. Everything that could be said can be classified under these four. So you already know which quarter it is gonna be in. I feel like most of my times were pretty fast in the last one.”

The AM was used extensively even when subjects were not confident about their familiarity with the video content.
For example, S16 annotated parts that were not understood, and she went to the AM every time she was presented with a searching item that she did not understand. So, the AM was used like a placeholder for certain types of information. When subjects did not know much about the topic, they generally could not identify which parts were important, thus they tended to annotate non-understood parts; however, annotating too frequently on non-understood parts could make the AM ineffective. S15 reported being “swamped in annotations” for a video that she did not know much about.

Half of the subjects (n=8) did not use the AM to search for pictures, but S8 used it for pictures because her annotations reminded her of the visuals of the video. S6 only used the AM for the key points and did not use the AM for extra details, which the subject referred to as “filled-in stuff”.

Regarding future use of the AM, most subjects thought that it would be useful in directing them to video segments and reminding them of the structure and contents of the video (n=9). S12 thought that ideally he would not need to revisit video segments if he had made good enough annotations. S7 imagined that she would use the AM to rewatch video segments if asked questions that she was unsure about. Both S6 and S9 thought that the AM would be used more often if they were unfamiliar with the topic of the video, because the AM provided convenient access to parts of the video that they were struggling with. S16 would like to see screenshots of video frames along with text in the AM.

4.5 Summary

To answer the research question and understand related issues, both quantitative data and qualitative data were collected and analyzed. Quantitative data includes time spent on tasks and responses to most of the survey questions. Qualitative data consists of responses to interview questions and some survey questions.
These different types of data complement each other and jointly answer the research question in a richer way.

Each session of the experiment and the corresponding data collected served its own purpose. The training session was used for the subject to develop their own strategy, and the data analysis for this session was used to reveal possible interesting patterns. Both formal sessions 1 and 2 were used to compare the AM condition vs. the non-AM condition, and to compare searching for annotated information vs. non-annotated information. Results suggested that the use of the AM did not have a significant effect on searching time, but searching for annotated information took significantly less time than for non-annotated information. Formal session 2 was also used to find out whether subjects would use the AM spontaneously, and results indicated that the AM was highly preferred and used frequently by subjects; in addition, the number of annotations made by a subject might lead to the frequent use of the AM, which will be discussed further in the following chapter.

Subjects’ responses to survey questions and interview questions provided in-depth data on video annotations, the annotation process, and the AM. The pre-experiment survey outlined the profiles of subjects and their habits of rewatching videos and reviewing annotations for both books and videos. Session surveys solicited prior knowledge of the topic and perceptions of the video, and immediate reflections on the annotation process and video annotations made for the video. Responses to post-experiment survey questions indicated that the AM was rated high in usability and usefulness, and was projected to be used in the future. Data from subjects’ responses to interview questions revealed that opinions on the three types of annotations and the annotation process were mixed.
Interview data also suggested that both the annotation process and video annotations were closely related to how subjects located in-video information.

Chapter 5
General Discussion

To answer the research question, a novel interface has been designed and implemented, based on which both quantitative data and qualitative data have been collected and analyzed. Main findings are summarized as follows:

1. Video annotations seem to be effective in helping learners find previously seen video segments;
2. The annotation manager is highly preferred by learners over the traditional way of locating information in video;
3. Integrating text with video and the annotation process based on integration have the potential to improve both the video learning experience and learning outcomes.

The proposed interface is novel as it presents user-created annotations of multiple videos on the same page, the course page, and provides quick access to annotated video segments.

The rest of this chapter checks the two hypotheses and discusses results from task performances, survey questions, and interview questions in a more integrated way. These discussions offer richer explanations of the results, because session surveys, which were filled in right after subjects finished learning with videos, recorded subjects’ immediate reflections on their learning and annotating experiences. These responses will partially explain subjects’ task performances; responses to the post-experiment survey and interview questions solicited general perceptions of the proposed interface and the experiment, and will provide valuable data on issues related to the research question, while also partially explaining subjects’ task performances.

5.1 Interpreting the Results

Annotations Helped in Locating Information

Searching for annotated information was significantly faster than for non-annotated information in all three experiment sessions. So how is it that annotated information was located faster?
A closer look at the searching process may give an explanation. To search for a given piece of information, the subject started from the course page and entered the video player page to locate the information either through the AM or by clicking on the cover frame of the video. In the case of using the AM, the subject started with annotations he/she had made on the video, and then entered the video player page; in the case of not using the AM, the subject entered the video player page directly. The subject may have been able to locate the information right after entering the video player page if he/she found the corresponding annotations in the AM; otherwise, he/she needed to search for the information on the video player page. On the video player page there were indications of annotations in all three sections, thus the subject was reminded of where annotations were and what had been annotated, no matter which section he/she focused on in the searching process. All subjects made a considerable number of annotations, mainly on key points, which were also what most of the searching items were about; annotations thus made key points more salient, which might have made annotated information easier to locate.

According to Guthrie’s cognitive model, locating information in text requires 5 steps: (1) formulation of a goal, (2) inspection of appropriate categories of information, (3) planning the inspections of information, (4) extraction of details, and (5) recycling to obtain information [29]. Subjects made significantly more highlights than tags or notes, and highlights were integrated with the original material better than the other types of annotations, thus indications of annotations on the player page may have supported subjects in step (2) and step (4), which made the searching process smoother and quicker.
The results are also in line with those from Salome’s experiment in which the table of contents of the video and indications of chapters on the video timeline were found to be helpful in locating information in the video [13, 59]. Moreover, the process of making annotations stimulated subjects to think about the video contents and process them at a deeper level, thus reinforcing memory and accelerating the searching process. This was revealed by subjects’ reflections on the annotation process. Thus, Hypothesis 1 is validated.

The AM Was Highly Preferred

The AM was highly preferred, but only data from the training session revealed its effectiveness in reducing searching time. In the training session, which was used for subjects to become familiar with the interface and develop their learning strategies with it, the AM conditions assigned to tasks were the same for all subjects. So, it is possible that some tasks were inherently easier to complete with the AM. In formal session 1, each half of the subjects watched one of the two videos, and the AM conditions assigned to the tasks of each video were counterbalanced. That is, of the 8 subjects who watched the same video, 4 finished the searching tasks with one assignment of AM conditions and the other 4 with the reversed assignment, i.e., a task was assigned the AM condition for one group of 4 if it was assigned the non-AM condition for the other group.

The case was different for formal session 2, in which subjects used all features at will. Subjects took a longer time to finish searching tasks when using the AM, though the result was not significant. The subjects’ motivations for using the AM in formal session 2 may shed light on this. It is possible that subjects relied on the AM as an extension of their internal mental model of the video content, and as a result had difficulty finding information without the help of the AM on the video player page [13].
For example, S16 knew that she had annotated parts that she did not understand, so she went to the AM whenever she needed to search for a piece of information that she did not understand. It is also possible that the searching process would have taken even longer if the AM had not been used. Results from pre-experiment surveys suggested that subjects relied on low-level strategies such as scrolling the timeline to find a specific video segment; however, the spontaneous use of the AM reflected subjects’ needs to process video information at a macro-level instead of solely at the micro-level. This finding is in line with a number of previous studies [43, 53, 64].

Annotations and the Use of AM

Hypothesis 2 highlights the effect of annotations on the use of the AM. Results from session 2 revealed that subjects who annotated the fewest items were also among those who used the AM the least. A possible explanation for this phenomenon is that subjects remembered what and how much they had annotated, thus had some prediction about whether they would find helpful information in the AM for finishing the task, and then decided whether to use it based on that prediction or judgement. Subjects’ responses to related interview questions (summarized in Section 4.4.4) provided evidence for this explanation. Subjects did remember what they had annotated, and the process they went through to make these annotations built a “roadmap” of the video content in their mind. Looking at annotations in the AM reminded subjects of the structure of the video content and the locations of the to-be-located information in the original video.
Therefore, Hypothesis 2 is confirmed.

The Annotation Process

The annotation process, as a type of macro-level processing which enables the learner to temporarily step away from the current point of the video being played and to look at the video from a more global and personal perspective, has the potential to improve the video learning experience and learning outcomes by overcoming the intrinsic challenges of learning with videos. This has been shown in subjects’ reflections on their learning experiences in the experiment. Though the incorporation of the annotation process into video learning may pose challenges to the continuity of the video viewing process, the added interactivity gives more learner control, and thus has the potential to help learners with low working memory capacity to better assimilate the video contents [25, 27, 31, 40]. The annotation process also creates a more engaging video learning environment where learners focus more intensely on learning and can more easily avoid distractions, which can promote self-regulated learning [10].

The Integration of Text and Video

The integration of text in the video-based learning environment has also probably facilitated both the learning process and the in-video information-seeking process. Learners have been found to both read texts and watch videos non-linearly, where parts of the text or the video are visited repeatedly [33, 65]. This calls for a mechanism to support this revisitation. It is easy to locate information while reading text non-linearly, as the entirety of the text, as well as devices such as tables of contents or indexes, are exposed to the reader. However, locating video segments while watching poses more challenges if the viewer is not supported by more advanced functions than the traditional VCR controls.
For example, while watching a video, if the viewer wants to rewatch a segment, he/she will need to slide through the timeline of the video to find the segment, while it is easier and faster to locate a previously read paragraph while reading a book. In a video-based learning environment, a timeline and filmstrip can provide continuous progression, whereas a transcript provides discrete progression, enabling learners to jump to video segments easily and thus to navigate the video more conveniently. For videos that have limited text, it is possible that fewer highlights will be made, and more notes or tags may be made by learners to add their own inputs as text. In this case, locating information on the video player page will require more effort, as the learner will have to check what the notes or tags are by clicking on icons. However, the proposed AM should still be able to help in the information-seeking process and provide quick access to specific video segments. Therefore, incorporating various annotation features is critical to accommodate different types of learning, as suggested by a previous study [67].

5.2 Limitations

Though the proposed interface design and experiment answered the research question, there are some intrinsic limitations for both of them.

There are four main limitations regarding the experiment materials. First of all, the videos subjects watched were relatively short, being only 5-7 minutes long. This was due to the time constraints of the experiment. In practice, videos for online learning are usually longer [28]. Videos chosen for the proposed experiment were also dense in information, which may have put extra pressure on subjects. Second, the presented experiment used specific types of video containing dense information both visually and textually, so it is uncertain whether the experiment results can be generalized to other videos that are, for example, solely visually rich or textually rich.
Third, the subjects were asked to locate specific pieces of information rather than information they wanted to find; while this design makes sure that subjects were exposed to the same tasks, it may still have brought in irrelevant factors which could have affected the results. For example, some subjects might have had difficulty understanding the items, and may have spent a lot of time wandering in the AM or on the video player page as a result, rather than putting effort into locating the information. Finally, though the number of subjects recruited was in line with similar research studies [2], and the data from the 16 subjects who participated in the experiment gave meaningful results, having more subjects would potentially give more generalizable results.

Three limitations are about the methodology of the presented experiment. First, subjects were asked to finish the searching tasks right after finishing learning with the video, whereas in practice learners revisit materials over a longer period of time after their first visit [6]. Thus, longitudinal studies or delayed tests may bring better results [9]. Second, as reading on paper has been found to be preferred over reading on screen, despite all of the advanced interactions and functions provided by electronic readers [57], the annotations subjects made on the proposed interface may have been more for the text than for the video. Third, though the presented controlled lab experiment ruled out confounding factors, such as distractions from surrounding environments, the experiment is more artificial than deploying the learning system in classrooms or on Amazon Mechanical Turk. As a result, subjects may have performed tasks unnaturally in the experiment, potentially not reflecting real life.

Another limitation is on the interface.
In the proposed interface, the window of the AM is not of a fixed size, which was reported by some subjects as troublesome because they had to scroll to the bottom of the website to examine all their annotations if there were too many. This inconvenience may have reduced the efficiency of the AM in helping learners locate in-video information, and a more structured organization of annotations on each tab of the AM window could potentially improve user experience with the interface.

Chapter 6
Conclusion and Future Work

The presented work explores the use of video annotations in in-video information-seeking. To answer the research question, a novel video-based learning and annotating environment has been developed to enable learners to learn with and annotate both the video and the text, by integrating the visual and the textual material of the video; a novel interface of the annotation manager was also designed to collect annotations that the learner has made on a video and to provide quick access to annotated video segments.

A controlled lab experiment with 16 undergraduate students as subjects has been carried out. Experiment results showed that video annotations were effective in reducing time spent on locating previously seen information in video, and the novel design of the annotation manager was highly preferred by subjects and was used spontaneously and frequently in the information-seeking process. The proposed interface design of integrating text with video was perceived as helpful in video-based learning, and has the potential to improve both the learning experience and learning outcomes.
Another contribution of the presented work is the exploration and analysis of different types of video annotations.

Future Work

Though the proposed interface design and experiment design were able to answer the research question and provide meaningful data, a number of future investigations are needed to further explore video annotations and their uses.

The annotation manager, which enables the learner to see all annotations they have made on a video and provides quick access to video segments, was only provided on the course page. Thus, the learner was not able to use it while learning on the video player page. Providing an annotation manager on the video player page has the potential to facilitate information organization and management in the process of learning, which can be a topic for future exploration.

In the proposed annotation manager, there were only pieces of textual information, so incorporating pictures such as video frames can potentially improve its functionality and further improve the annotating experience.

Regarding the annotation process, it is worthwhile to explore a faster and simpler way of creating annotations, so that learners do not need to go through multiple steps to add an annotation. Though having the AM encouraged learners to add more annotations for later use, having a less artificial procedure to add annotations can potentially further encourage learners to make more annotations, as they would not need to divert their attention away from understanding video content.

Subjects in the presented experiment exhibited various patterns of video learning, and it will be valuable to identify types of video learners, in a similar way to identifying types of readers who exhibit distinct reading patterns [33].
Moreover, reading proficiency has been found to have a significant effect on information assimilated from both video and text [24], so it will also be interesting to explore “viewing proficiency” and how this may be related to video-based learning.

Though subjects suggested that the annotation process took more effort than learning the video content, they also perceived the annotation process as helpful for their learning. Thus, it will be beneficial to investigate the role of the annotation process in video-based learning, for example, by investigating what portion of learners’ attention is devoted to the annotation process and how this affects learning experiences and learning outcomes.

Finally, the integration of video and text in the proposed design consists of the video and its literal transcript, which is structured more loosely than an article or a book. Therefore, exploring how the video transcript can be more effective and better utilized for learning will be beneficial to further unlock the potential of a video’s transcript and annotations.

Bibliography

[1] M. J. Adler and C. Van Doren. How to read a book: The classic guide to intelligent reading. Simon and Schuster, 2014.
[2] A. Al Hajri. Shaping video experiences with new interface affordances. PhD thesis, University of British Columbia, 2014.
[3] I. E. Allen and J. Seaman. Online Report Card: Tracking Online Education in the United States. ERIC, 2016.
[4] T. Anderson and D. R. Garrison. Learning in a networked world: New roles and responsibilities. In Distance Learners in Higher Education: Institutional responses for quality outcomes. Madison, Wi.: Atwood. 1998.
[5] O. Aubert, Y. Prié, and C. Canellas. Leveraging video annotations in video-based e-learning. arXiv preprint arXiv:1404.4607, 2014.
[6] C. Bazerman. Physicists reading physics: Schema-laden purposes and purpose-laden schema. Written communication, 2(1):3–23, 1985.
[7] L. Beardsley, D.
Cogan-Drew, and F. Olivero. Videopaper: Bridging research and practice for pre-service and experienced teachers. Video research in the learning sciences, pages 479–493, 2007.
[8] F. Bentley and J. Murray. Understanding video rewatching experiences. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, pages 69–75. ACM, 2016.
[9] L. W. Brooks, D. F. Dansereau, J. E. Spurlin, and C. D. Holley. Effects of headings on text processing. Journal of Educational Psychology, 75(2):292, 1983.
[10] K. S. Bull, P. Shuler, R. Overton, S. Kimball, C. Boykin, and J. Griffin. Processes for developing scaffolding in a computer mediated learning environment. 1999.
[11] P.-S. Chiu, H.-C. Chen, Y.-M. Huang, C.-J. Liu, M.-C. Liu, and M.-H. Shen. A video annotation learning approach to improve the effects of video learning. Innovations in Education and Teaching International, 55(4):459–469, 2018.
[12] S. Cojean and E. Jamet. Facilitating information-seeking activity in instructional videos: The combined effects of micro- and macroscaffolding. Computers in Human Behavior, 74:294–302, 2017.
[13] S. Cojean and E. Jamet. The role of scaffolding in improving information seeking in videos. Journal of Computer Assisted Learning, 34(6):960–969, 2018.
[14] J. Costley, M. Fanguy, M. Baldwin, C. Lange, and S. Han. The role of motivation in the use of lecture behaviors in the online classroom. Journal of Information Technology Education: Research, 17(1):471–484, 2018.
[15] J. Davis and D. Huttenlocher. Conote annotation homepage. Available on the World, 1994.
[16] S. I. De Freitas, J. Morgan, and D. Gibson. Will moocs transform learning and teaching in higher education? engagement and course retention in online learning provision. British Journal of Educational Technology, 46(3):455–471, 2015.
[17] E. Delen, J. Liew, and V. Willson.
Effects of interactivity and instructional scaffolding on learning: Self-regulation in online video-based environments. Computers & Education, 78:312–320, 2014.
[18] S. Dodson, L. Freund, D. Yoon, M. Fong, R. Kopak, and S. Fels. Video-based consensus annotations for learning: A feasibility study. Proceedings of the Association for Information Science and Technology, 55(1):792–793, 2018.
[19] S. Dodson, I. Roll, M. Fong, D. Yoon, N. M. Harandi, and S. Fels. An active viewing framework for video-based learning. In Proceedings of the Fifth Annual ACM Conference on Learning at Scale, page 24. ACM, 2018.
[20] S. Dodson, I. Roll, M. Fong, D. Yoon, N. M. Harandi, and S. Fels. Active viewing: A study of video highlighting in the classroom. In Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, pages 237–240. ACM, 2018.
[21] B. R. Dye. Reliability of pre-service teachers’ coding of teaching videos using video-annotation tools. 2007.
[22] J. D. Finn and K. S. Zimmer. Student engagement: What is it? why does it matter? In Handbook of research on student engagement, pages 97–131. Springer, 2012.
[23] M. Fong, S. Dodson, X. Zhang, I. Roll, and S. Fels. Videx: A platform for personalizing educational videos. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, pages 331–332. ACM, 2018.
[24] A. Furnham, S. De Siena, and B. Gunter. Children’s and adults’ recall of children’s news stories in both print and audio-visual presentation modalities. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 16(2):191–210, 2002.
[25] R. Garner. Strategies for reading and studying expository text. Educational Psychologist, 22(3-4):299–312, 1987.
[26] A. Girgensohn, F. Shipman, J. Adcock, M. Cooper, and L. Wilcox. Locating information in video by browsing and searching. In Interactive Video, pages 207–224. Springer, 2006.
[27] S. M. Glynn and F. J. Di Vesta. Control of prose processing via instructional and typographical cues. Journal of Educational Psychology, 71(5):595, 1979.
[28] P. J. Guo, J. Kim, and R. Rubin. How video production affects student engagement: an empirical study of mooc videos. In Proceedings of the first ACM conference on Learning @ scale conference, pages 41–50. ACM, 2014.
[29] J. T. Guthrie and P. Mosenthal. Literacy as multidimensional: Locating information and reading comprehension. Educational Psychologist, 22(3-4):279–297, 1987.
[30] N. M. Harandi, F. Agharebparast, L. Linares, S. Dodson, I. Roll, M. Fong, D. Yoon, and S. Fels. Student video-usage in introductory engineering courses. Proceedings of the Canadian Engineering Education Association (CEEA), 2018.
[31] B. S. Hasler, B. Kersten, and J. Sweller. Learner control, cognitive load and instructional animation. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 21(6):713–729, 2007.
[32] B. Hosack. Videoant: Extending online video annotation beyond content delivery. TechTrends, 54(3):45–49, 2010.
[33] J. Hyönä and A.-M. Nurminen. Do adult readers know how they read? evidence from eye movement patterns and verbal reports. British Journal of Psychology, 97(1):31–50, 2006.
[34] H. Jung, H. V. Shin, and J. Kim. Dynamicslide: Exploring the design space of reference-based interaction techniques for slide-based lecture videos. In Proceedings of the 2018 Workshop on Multimedia for Accessible Human Computer Interface, pages 33–41. ACM, 2018.
[35] M. Ketterl, R. Mertens, and O. Vornberger. Vorlesungsaufzeichnungen 2.0. In Lernen–Organisation–Gesellschaft. eCampus-Symposium der Osnabrücker Hochschulen, pages 2–5, 2008.
[36] J. Kim. Toolscape: enhancing the learning experience of how-to videos.
In CHI’13 Extended Abstracts on Human Factors in Computing Systems, pages 2707–2712. ACM, 2013.
[37] M. Kipp. Anvil - a generic annotation tool for multimodal dialogue. In Seventh European Conference on Speech Communication and Technology, 2001.
[38] D. LaLiberte. Hypernews homepage, 1997.
[39] R. Lowe. Interrogation of a dynamic visualization during learning. Learning and instruction, 14(3):257–274, 2004.
[40] D. L. Lusk. The effects of seductive details and segmentation on interest, recall and transfer in a multimedia learning environment. PhD thesis, Virginia Tech, 2008.
[41] M. Mavrikis and E. Geraniou. Using qualitative data analysis software to analyse students’ computer-mediated interactions: the case of migen and transana. International Journal of Social Research Methodology, 14(3):245–252, 2011.
[42] R. E. Mayer and P. Chandler. When learning is just a click away: Does simple user interaction foster deeper understanding of multimedia messages? Journal of educational psychology, 93(2):390, 2001.
[43] M. Merkt, S. Weigand, A. Heier, and S. Schwan. Learning with videos vs. learning with print: The role of interactive features. Learning and Instruction, 21(6):687–704, 2011.
[44] N. Mirriahi, D. Liaqat, S. Dawson, and D. Gašević. Uncovering student learning profiles with a video annotation tool: reflective learning with and without instructional norms. Educational Technology Research and Development, 64(6):1083–1106, 2016.
[45] E. S. Mollashahi, M. S. Uddin, and C. Gutwin. Improving revisitation in long documents with two-level artificial-landmark scrollbars. In Proceedings of the International Conference on Advanced Visual Interfaces-AVI, volume 18, pages 1–9, 2018.
[46] F. A. Moretti and H. P. Ginsburg. Video interactions for teaching and learning (vital): A learning environment for courses in early childhood mathematics education. 2009.
[47] F. M. Newmann. Student engagement and achievement in American secondary schools. ERIC, 1992.
[48] C. Nguyen and F. Liu. Gaze-based notetaking for learning from lecture videos. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 2093–2097. ACM, 2016.
[49] I. A. Ovsiannikov, M. A. Arbib, and T. H. McNeill. Annotation technology. International journal of human computer studies, 50(4):329, 1999.
[50] N. Prestopnik and A. R. Foley. Visualizing the past: The design of a temporally enabled map for presentation (tempo). International Journal of Designs for Learning, 3(1), 2012.
[51] P. Rich, A. Recesso, M. Allexsaht-Snider, and M. Hannafin. The use of video-based evidence to analyze, act on, and adapt preservice teacher practice. In annual meeting of the American Educational Research Association, Chicago, 2007.
[52] A. J. Rockinson-Szapkiw, J. Courduff, K. Carter, and D. Bennett. Electronic versus traditional print textbooks: A comparison study on the influence of university students’ learning. Computers & Education, 63:259–266, 2013.
[53] J.-F. Rouet and B. Coutelet. The acquisition of document search strategies in grade school students. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 22(3):389–406, 2008.
[54] A. Saxena and R. Stevens. Video traces: creating common space between university and public schools for preparing new teachers. In Proceedings of the 8th international conference on Computer supported collaborative learning, pages 643–645. International Society of the Learning Sciences, 2007.
[55] H. V. Shin, F. Berthouzoz, W. Li, and F. Durand. Visual transcripts: lecture notes from blackboard-style lecture videos. ACM Transactions on Graphics (TOG), 34(6):240, 2015.
[56] M. Smith. Dynatext: An electronic publishing system, 1993.
[57] J.
Stoop, P. Kreutzer, and J. Kircz. Reading and learning from screens versus print: a study in changing habits: Part 1–reading long information rich texts. New Library World, 114(7/8):284–300, 2013.
[58] M. S. Uddin, C. Gutwin, and A. Goguey. Artificial landmarks augmented media player for video revisitation.
[59] M. S. Uddin, C. Gutwin, and A. Goguey. Using artificial landmarks to improve revisitation performance and spatial learning in linear control widgets. In Proceedings of the 5th Symposium on Spatial User Interaction, pages 48–57. ACM, 2017.
[60] E. A. Van Es and M. G. Sherin. Learning to notice: Scaffolding new teachers’ interpretations of classroom interactions. Journal of Technology and Teacher Education, 10(4):571–596, 2002.
[61] F. Vohle. Relevanz und referenz: Zur didaktischen bedeutung situationsgenauer videokommentare im hochschulkontext. Vorwort zur Idee und zum Thema, page 165, 2013.
[62] R. Waller. Functionality in digital annotation: Imitating and supporting real-world annotation. Ariadne, (35), 2003.
[63] G. A. Wright. How does video analysis impact teacher reflection-for-action? 2008.
[64] S. R. Yussen, A. D. Stright, and B. Payne. Where is it? searching for information in a college textbook. Contemporary Educational Psychology, 18(2):240–257, 1993.
[65] C. Zahn, B. Barquero, and S. Schwan. Learning with hyperlinked videos: design criteria and efficient strategies for using audiovisual hypermedia. Learning and Instruction, 14(3):275–291, 2004.
[66] D. Zhang, L. Zhou, R. O. Briggs, and J. F. Nunamaker Jr. Instructional video in e-learning: Assessing the impact of interactive video on learning effectiveness. Information & management, 43(1):15–27, 2006.
[67] X. Zhang. Investigation of a quick tagging mechanism to help enhance the video learning experience. PhD thesis, University of British Columbia, 2017.
[68] P. H. Zimmerman, J. E. Bolhuis, A.
Willemsen, E. S. Meyer, and L. P. Noldus. The observer xt: A tool for the integration and synchronization of multimodal signals. Behavior research methods, 41(3):731–735, 2009.

Appendix A
Surveys

This chapter contains the pre-experiment survey, the session survey, and the post-experiment survey used in the presented experiment.

A.1 Pre-Experiment Survey

A.2 Session Survey

A.3 Post-Experiment Survey

Appendix B
Searching Tasks

B.1 Quantum Computation

B.2 Fusion Power

B.3 The Brain

B.4 No AM Condition

Appendix C
User Study Data

Appendix D
R Code

Appendix E
Power Analysis

The power analysis was performed using the software G*Power, based on a pilot study with 4 subjects.

The pilot study was a simplified version of the formal experiment, in which all 4 subjects watched the 3 videos in the same order across 3 sessions (Quantum Computation, The Brain, Fusion Power), and finished the same set of searching tasks with identical AM conditions in session 1.

Within-subjects analysis was performed to obtain the partial η², which was needed to determine the effect size f. Means of subjects’ task performances in session 1 are shown in Table E.1. The AM condition was treated as the within-subjects factor, and the within-subjects partial η² was 0.67. The effect size f was then calculated by G*Power.

Parameters used in the power analysis are shown in Table E.2, and a total sample size of 4 was recommended by the power analysis.

Subject #   AM      Non AM   Annotated   Non Annotated
1           33.60   47.73    32.63       64.78
2           13.42   20.92    15.07       27.70
3           15.70   16.56    15.08       27.75
4           19.54   41.20    29.10       36.76

Table E.1: Means of subjects’ task performances in session 1 in the pilot study (measured in seconds).

Parameter                   Value
Test family                 F tests
Statistical test            ANOVA: Repeated measures, within factors
Effect size f               1.428
α                           0.05
Power                       0.95
Number of groups            2
Corr among rep measures     0.5
Nonsphericity correction    1

Table E.2: Parameters for power analysis on G*Power.
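
The step from the pilot data to the effect size can be illustrated with a short sketch. It is not the analysis code used in the thesis (that is listed as R code in Appendix D and is not reproduced here); it is a minimal Python illustration under two assumptions: that the partial eta squared (partial η²) for the AM factor was derived from the AM and Non AM columns of Table E.1, and that the effect size follows the standard conversion f = sqrt(η² / (1 − η²)). The function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Session-1 mean task times (seconds) per pilot subject, copied from Table E.1.
am = [33.60, 13.42, 15.70, 19.54]
non_am = [47.73, 20.92, 16.56, 41.20]


def partial_eta_squared_two_levels(a, b):
    """Partial eta squared for a within-subjects factor with two levels.

    With only two levels, the repeated-measures F statistic equals the
    squared paired t statistic, so eta_p^2 = t^2 / (t^2 + df), df = n - 1.
    """
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t / (t * t + (n - 1))


def cohens_f(eta_p2):
    """Convert partial eta squared to Cohen's effect size f."""
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


eta_p2 = partial_eta_squared_two_levels(am, non_am)
print(f"partial eta squared from Table E.1: {eta_p2:.3f}")
print(f"f from Table E.1 data:              {cohens_f(eta_p2):.3f}")
print(f"f from the rounded value 0.67:      {cohens_f(0.67):.3f}")
```

Under these assumptions the partial η² computed from the table rounds to the reported 0.67, and both conversions give an f within about 0.005 of the 1.428 listed in Table E.2. The small gap is plausibly due to rounding of the table entries, though this cannot be confirmed from the text alone.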

