UBC Theses and Dissertations

Shaping video experiences with new interface affordances Al Hajri, Abir 2014

Shaping Video Experiences with New Interface Affordances
by
Abir Al Hajri
B.Sc., Sultan Qaboos University, 2003
M.Sc., University of Wollongong, 2005
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)
The University of British Columbia
(Vancouver)
December 2014
© Abir Al Hajri, 2014

Abstract

Watching and creating videos have become predominant parts of our daily lives. Video is becoming the norm for a wide range of purposes, from entertainment to training and education, marketing, and communication. Users go beyond just watching videos. They want to experience and interact with content across the different types of videos. As they do so, significant digital traces accumulate on the viewed videos, which provide an important source of information for designing and developing tools for video viewing interfaces.

This dissertation proposes the next generation video management interface, which creates video experiences that go beyond just pushing the play button. It uses how people view and interact with contemporary video to design strategies for future video interfaces. This has allowed the development of new tools for navigating and managing videos that can be easily integrated into existing systems.

To help define some design guidelines for the video interface, a behavioural analysis of users’ video viewing actions (n = 19) was performed. The results demonstrate that participants actively watch videos and most participants tend to skip parts of videos and re-watch specific portions from a video multiple times. Based on the findings, new fast navigation and management strategies are developed and validated in search tasks using a single-video history (n = 12), a video viewing summary (n = 10) and multiple-videos history (n = 10). Evaluation of results of the proposed tools shows significant performance improvements over the state-of-the-practice methods.
This indicates the value of users’ video viewing actions.

Navigating other forms of videos, such as interactive videos, introduces another issue with the selection of interactive objects within videos to direct users to different portions of the video. Due to the time-based nature of the videos, these interactive objects are only visible for a certain duration of the video, which makes their activation difficult. To alleviate this problem a novel acquisition technique (Hold) is created, which temporally pauses the objects while the user interacts with the target. This technique has been integrated into a rich media interface (MediaDiver) which made such interaction possible for users.

Preface

All of the research work presented in this dissertation was conducted in the Human Communication Technologies Laboratory (HCT) at the University of British Columbia, Point Grey campus. All user studies and associated methods were approved by the University of British Columbia Behavioural Research Ethics Board [certificates #: H10-01897, H08-03006, and H13-01589].

Parts of this dissertation have been published elsewhere. Earlier versions of Sections 4.4, 4.5, 4.6 and 5.1 have been published in A. Al Hajri, G. Miller, S. Fels and M. Fong [5]. I was the lead investigator, responsible for all major areas of concept formation, literature review, interfaces (pilot and main) design and implementation, experimental design, data collection and analysis, as well as manuscript composition. G. Miller was involved in the discussion and testing of the interface, and assisted with writing of the manuscript. S. Fels provided feedback on the design and manuscript. M. Fong helped in the implementation of some parts of the main interface.

An earlier version of Section 5.2 has been published in A. Al Hajri, M. Fong, G. Miller and S. Fels [6]. I developed and implemented the interface jointly with M. Fong.
I was responsible for literature review, experimental design, data collection and analysis, as well as manuscript composition. M. Fong implemented the VCR algorithm, was involved in the discussion of the interface and provided feedback on the manuscript. G. Miller assisted with the interface design and testing, and writing the manuscript. S. Fels provided editorial feedback on the manuscript.

A version of Chapter 6 has been published in A. Al Hajri, G. Miller, M. Fong and S. Fels [8]. I was the lead investigator, responsible for all major areas of concept formation, literature review, interfaces (pilot and main) design and implementation, experimental design, data collection and analysis, as well as manuscript composition. G. Miller was involved in the discussion and testing of the interface, and assisted with writing. M. Fong helped in the implementation of some parts of the interface and provided feedback on the manuscript. S. Fels provided feedback on the design and manuscript.

An overview of the visualizations described in Sections 5.2.2 and 6.3 has been presented in a poster and published in A. Al Hajri, M. Fong, G. Miller and S. Fels [7]. I designed the poster and wrote the manuscript. G. Miller assisted with the revisions of the manuscript and poster. S. Fels was the supervisory author on this project.

Earlier versions of Sections 7.1, 7.2 and 7.3 have been published in A. Al Hajri, S. Fels, G. Miller and M. Ilich [4]. I formulated the mathematical model, performed and evaluated the model, was responsible for literature review, experimental design, data collection and analysis, as well as manuscript composition. G. Miller helped with writing the manuscript and provided assistance on simplifying the model using a vector notation. I have adapted M. Ilich’s experimental interface to evaluate the model and test the Hold technique. M. Ilich originally proposed the Hold technique which I adapted, modeled and evaluated, and he provided feedback on the manuscript. S.
Fels provided assistance on the manuscript revisions.

A version of Section 7.4 has been published in G. Miller, S. Fels, A. Al Hajri, M. Ilich, Z. Foley-Fisher, M. Fernandez and D. Jang [85]. I designed and implemented the interface jointly with Z. Foley-Fisher, M. Fernandez and M. Fong. I wrote the manuscript. M. Ilich and D. Jang implemented earlier versions of some aspects of the interface that were adapted in the produced system. G. Miller and S. Fels were the project leaders. G. Miller was involved in the discussion and design of the interface, and helped in the manuscript writing and revisions. S. Fels provided feedback on the design and manuscript.

An earlier version of Chapter 8 has been presented as an interactive demonstration in A. Al Hajri, G. Miller, M. Fong and S. Fels [7, 8] and GRAND ’14. I designed, implemented and tested the application jointly with M. Fong. G. Miller and S. Fels were involved in the discussion of the design of the application. I wrote the manuscript for GRAND ’14 and G. Miller provided assistance on the manuscript revisions. M. Fong was responsible for the experimental design and data collection. G. Miller and I were involved in the discussion of the study design and helped in the data collection.

The video content of the screenshots in Figures 1.2, 1.4, 1.6, 4.2, 4.3, 4.4, 4.5, 5.7, 5.8, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.8, 6.9, 8.1, 8.3, 8.4, 8.5, 8.6, 8.8, and 8.9 is © copyright 2008, Blender Foundation.

The figures in Chapter 2 are used under permission as follows:
• Figure 2.1 from © [28] page 11. Used with permission from Chorianopoulos, K.
• Figure 2.2 from © [82] page 42. Used under permission from Brusilovsky, P.
• Figure 2.3 from © [69] page 566. Used under permission from Kim, J.
• Figure 2.4 from © [22] page 202. Used under permission from Ooi, W. T.
• Figure 2.6 from © [115] page 123. Used under permission from Kaasalainen, J.
• Figure 2.7 from © Aderhold, L., Kao, K. & Hilden, J. (2009). Tabviz.
Retrieved from By permission from Aderhold, L.
• Figure 2.8 from © [83] page 692. Used under permission from Milic-Frayling, N.
• Figure 2.9 from © [120] page 317. Used under permission from Shintani, T.
• Figures 2.11 and 2.12 from © [79] pages 1159-1160. Used under permission from Matejka, J.
• Figure 2.13 from © [92] page 1222. Used under permission from Huber, M. & Olivier, P.
• Figure 2.14 from © [25] page 789. Used under permission from Chen, B. Y.
• Figure 2.15 from © [71] page 202. Used under permission from Girgensohn, A.
• Figure 2.16 from © [105] page 278. Used under permission from Shaw, R. & Liu, Y.
• Figure 2.17 from © [13] page 59. Used under permission from Robbins, D., Czerwinski, M., Bederson, B. & Cutrell, E.
• Figure 2.18 from © [52] page 3331. Used under permission from Anderson, J.
• Figure 2.19 from © [47] page 281. Used under permission from Balakrishnan, R.
• Figure 2.20 from © [53] page 845. Used under permission from Hasan, K.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
  1.1 Contributions
  1.2 Dissertation Outline
2 Related Work
  2.1 Studying Video Viewing Behaviour
    2.1.1 Explicit-User Metadata
    2.1.2 Implicit-User Metadata
  2.2 Leveraging Implicit-user Interactions
  2.3 Viewing History Visualization
    2.3.1 Timeline-based Visualization
    2.3.2 Graph-based Visualization
    2.3.3 3D Visualization
  2.4 Video Navigation
    2.4.1 Navigation using Representative Previews
    2.4.2 Navigation by Direct Manipulation of Content
    2.4.3 Navigation Applying Users’ Wisdom
  2.5 Object Selection in Videos
    2.5.1 Static Object Selection
    2.5.2 Moving Object Selection
  2.6 Summary
3 Video Viewing Behaviour
  3.1 Logged Video Interactions
  3.2 Types of Viewing Behaviours
  3.3 Study Design
    3.3.1 Logging Users’ Viewing Behaviour
    3.3.2 Participation Procedure
    3.3.3 Participants
  3.4 Collected Video Interaction Dataset
    3.4.1 Data Clustering
  3.5 Analysis 1: Watched Categories
  3.6 Analysis 2: Participants’ Viewing Patterns
  3.7 Analysis 3: Viewing Behaviour
    3.7.1 Skip Behaviour
    3.7.2 Re-watch Behaviour
    3.7.3 Replay Behaviour
    3.7.4 Revisit Behaviour
    3.7.5 Drop-off Behaviour
    3.7.6 Interrupted Viewing
  3.8 Analysis 4: Impact of Video’s Popularity on Users’ Behaviour
  3.9 Analysis 5: Impact of Video’s Length on Users’ Behaviour
  3.10 Analysis 6: Correlation between Users’ Behaviours
  3.11 Design Guidelines for a Video Interface
  3.12 Study Limitations
  3.13 Directions
  3.14 Summary
4 Recording and Utilizing Video Viewing Behaviour
  4.1 Video Viewing History
    4.1.1 How Viewing History Is Captured
  4.2 Use Cases of Video Viewing History
    4.2.1 Navigation
    4.2.2 Scene Search
    4.2.3 Video Mashup
    4.2.4 Sharing
  4.3 Common Properties of Video Navigation Interfaces
  4.4 Video Navigation Interface
    4.4.1 Visualization Components
  4.5 Investigating the Feasibility of Video Viewing History
    4.5.1 Apparatus
    4.5.2 Participants
    4.5.3 Design
    4.5.4 Procedure
    4.5.5 Results and Discussions
  4.6 Lessons Learned
    4.6.1 Design Guidelines
  4.7 Directions
  4.8 Summary
5 Single Video Viewing History Visualizations
  5.1 Visualize History Using a List of Records
    5.1.1 History Timeline as a Vertical List of Thumbnails
    5.1.2 List of Thumbnails in a Video Viewing Interface
    5.1.3 Investigating the List of Thumbnails Visualization
  5.2 Visualize History Using Consumption Frequencies
    5.2.1 Footprints Visualization Using Colour Intensity
    5.2.2 Footprints Visualization Using Variable-sized Thumbnails
    5.2.3 Investigating View Count Record (VCR) Visualization
  5.3 Directions
  5.4 Summary
6 Multiple-Videos History Visualizations
  6.1 History Visualization Considerations
  6.2 Multiple-Video Navigation Interface
    6.2.1 Video History System (VHS)
  6.3 History Visualization Designs
    6.3.1 Video Timeline
    6.3.2 Video Tiles
  6.4 Visualization Scalability
  6.5 Investigating Multiple-Video History Visualizations
    6.5.1 Apparatus
    6.5.2 Design
    6.5.3 Procedure
    6.5.4 Pilot Test
    6.5.5 Results and Lessons Learned from Pilot Test
    6.5.6 Participants
    6.5.7 Results and Discussions
  6.6 Directions
  6.7 Summary
7 Object Selection in Videos
  7.1 Modelling Target Acquisition
  7.2 Moving Target Selection Technique
  7.3 Empirical Validation of Moving Target Models
    7.3.1 Apparatus
    7.3.2 Participants
    7.3.3 Procedure
    7.3.4 Design
    7.3.5 Performance Measures
    7.3.6 Results
    7.3.7 Discussion
  7.4 Hold in a Rich Media Interface
    7.4.1 MediaDiver Interface
    7.4.2 MediaDiver Applications
  7.5 Directions
  7.6 Summary
8 Video Viewing History in a Mobile Application
  8.1 Mevie: History-based Mobile Application
  8.2 Mevie Components
    8.2.1 Home Screen
    8.2.2 Viewer
    8.2.3 Viewing History Timeline
    8.2.4 Summary and Detailed Viewing History
  8.3 Mevie Workflow
  8.4 Mevie in a User Study
    8.4.1 Apparatus
    8.4.2 Participants
    8.4.3 Design and Procedure
    8.4.4 Tested Videos
    8.4.5 Results and Discussions
  8.5 Directions
  8.6 Summary
9 Conclusions
  9.1 Dissertation Contributions
    9.1.1 Behavioural Analysis of Users’ Video Viewing Actions
    9.1.2 Video Navigation Techniques
    9.1.3 Moving Object Selection Model
  9.2 Limitations
  9.3 Directions
  9.4 Concluding Remarks
Bibliography
Appendices
A List of Publications
  A.1 Conference Publications
  A.2 Interactive Demonstrations
  A.3 Research Talks
  A.4 Additional Publications
B User Studies Questionnaire
  B.1 User Study on Video Viewing Behaviour
  B.2 User Study on the Feasibility of Video Viewing History
  B.3 List of Thumbnails User Study
  B.4 VCR User Study
  B.5 Multiple Video Viewing History Visualization User Study
  B.6 Object Selection User Study
  B.7 Mevie User Study
C Additional Experiments’ Data
  C.1 Behavioural User Study
  C.2 Mevie User Study

List of Tables

Table 3.1 Demographics summary for participants in the video viewing behaviour study. Watching frequencies and videos per session are the values reported by participants. (Note: duration in days)
Table 3.2 A Pearson product-moment correlation coefficient between each behaviour and the video popularity (i.e. No. of views obtained from YouTube) and video length. There is no correlation between any pair. (Note: ∗ Correlation is significant at the 0.01 level (2-tailed).)
Table 3.3 Welch test comparing the average number of actions that occurred per video in each duration group: short, medium and long. A significant difference was found between the three groups for each behaviour. Longer videos had significantly more skips and re-watches but fewer replays. (Notes: ( ) is Standard Deviation; ∗p < .001)
Table 3.4 A Pearson product-moment correlation coefficient between each pair of behaviours. There is a moderate positive correlation between re-watch and skip behaviour, which was also seen in the individuals’ behavioural grouping (Figure 3.7) where most re-watchers were also skippers. ∗ Correlation is significant at the 0.001 level (2-tailed).
Table 4.1 Demographics summary for participants in the study investigating the feasibility of video viewing history.
Table 4.2 Means (and standard deviations) for number of times video Grid used, number of times History Timeline used, completion time in minutes, number of clip previews, number of clip deletions, number of clip modifications and number of trailer plays for the different modes. When given the hybrid mode, participants created trailers significantly faster than using the two modes separately, which illustrates the utility of navigation history in video authoring. (∗p < 0.005)
Table 4.3 Means of number of times video Grid used, number of times History Timeline used, completion time in minutes, number of clip previews, number of clip deletions, number of clip modifications and number of trailer plays for themes per mode. For ‘Serene animals’ and ‘Two objects’ trailers, participants took more time creating them using the Grid mode.
Table 4.4 Clips agreement percentage per trailer theme. Background laughter trailer showed the highest agreement while ‘two objects’ trailer showed the lowest.
Table 4.5 Videos agreement percentage per trailer theme. Background laughter and serene animals trailers showed 100% agreement of videos among participants.
Table 5.1 Demographics summary for participants in the investigation of the list of thumbnails visualization study.
Table 5.2 Results of the comparative study between list of thumbnails and Filmstrip for the answer search task, showing a significant advantage using History Timeline in % of answered questions, time needed to answer a question, previews, and interval accuracy. Note: SD = standard deviation; ns = not significant; average time per question measured in seconds. * p < 0.025
Table 5.3 Demographics summary for participants in the investigation of the VCR visualization study.
(Note: WMP = Windows Media Player, QT = QuickTime, VLC = VLC Media Player, RP = Real Player, KMP = KMP Player, Gom = Gom player, M = Mplayer, YT = YouTube)
Table 5.4 Results of the comparative study for the interval retrieval task, showing a significant advantage using our method (VCR) in terms of completion time. Note: SD = standard deviation; completion time measured in seconds. * p < 0.03
Table 5.5 Agreement between events participants listed for each video. There are at least 4 (out of 5) events that were listed by at least 50% of the participants. Note: V2: One Man Band, V3: Partly Cloudy, V4: Day & Night, V5: For The Birds, and V6: Presto
Table 6.1 Pilot study means of the completion time (in seconds), the number of previews, and the number of scrolling events for each mode. The results demonstrate that history-based search for personal affective intervals is better than search using Filmstrip and the video library.
Table 6.2 Demographics summary for participants in the investigation of the multiple-video history visualization study. (Note: WMP = Windows Media Player, QT = QuickTime, VLC = VLC Media Player, RP = Real Player, KMP = KMP Player, Gom = Gom player, M = Mplayer, YT = YouTube)
Table 6.3 Performance comparisons for the three multiple-video history methods, using the F-test for equality of means. The results demonstrate that history-based search for personal affective intervals is more efficient than search using Filmstrip and the video library. Notes: SD is Standard Deviation; Completion time is measured in seconds; ∗p < 0.01.
Table 7.1 Demographics summary for participants in the investigation of the target selection technique study. (Note: D: dimension)
Table 7.2 Summary of coefficient estimates, corresponding standard errors, and R² values for the regression of the 2D moving targets models, where a and b correspond to coefficients of Fitts’ Law in (Equation 7.2)
Table 8.1 Demographics summary for participants in the Mevie user study. (Note: Sci: Science, Eng: Engineering, FT/week: A few times a week)
Table C.1 Participants’ response when asked what they think about the history-based mobile application Mevie. Most participants found it useful and interesting.
Table C.2 Participants’ response when asked whether they would use the history-based mobile application Mevie. Most participants said yes, and those who said no think it could be used if it were integrated into video websites such as YouTube and Vimeo.
Table C.3 Participants’ response when asked if they would actively use the history in Mevie. Most participants answered yes, and those who said no think others might use it but not themselves.
Table C.4 Participants’ response when asked about any comments or suggestions to improve the design of the history-based mobile application Mevie.

List of Figures

Figure 1.1 An illustration of a future video interface with the contributions made by this dissertation. The interface allows users to view and navigate videos while keeping track of their viewing history and providing access to this data. Users’ viewed content can be accessed using the history component. Navigating the playing video can be achieved using the VCR and the Hold technique. (Note: This figure lays down the elements and interactions defined in this dissertation.)
Figure 1.2 The history component of the video interface (a).
Clicking on one of the thumbnails from history (b) brings up a detailed history in another screen. Three different designs were developed and tested for the detailed history as shown in (c).
Figure 1.3 Viewing history visualizations where each interval is represented by a quadrilateral shape: the bigger the size the more that interval is watched. Four designs are proposed and evaluated in this dissertation: (c) History Timeline and (b) View Count Record (VCR) for single-video history, and (d) Video Timeline and (e) Video Tiles for multiple-video history. VCR is compared against the state-of-the-practice navigation method filmstrip (a).
Figure 1.4 The VCR component in the video interface is used for navigating the playing video.
Figure 1.5 The Hold technique in the video interface is used for selecting and activating interactive objects for navigation.
Figure 1.6 The history-based video viewer application with some content.
Figure 2.1 Chorianopoulos [28] results of the matching between users’ interactions (Replay30) time series and ground truth interest points within videos. Approximately 70% of the interesting segments were observed within 30 seconds before a re-watching local maximum.
Figure 2.2 Mertens [82] footprint bar visualizing viewing history of a user.
Figure 2.3 Kim et al. [69] Rollercoaster timeline visualizing viewing history of multiple users (collective wisdom). The height of the timeline at each point shows the amount of navigation activity by learners at that point. The magenta sections are automatically-detected interaction peaks.
Figure 2.4 Carlier et al. [22] employed users’ zooming and scrolling actions while watching videos on small screens to re-target a high-resolution video for display on small screens.
Their approach overview is shown: four frames and a few viewports (first row), heatmaps and detected regions of interest (ROIs) (second row), and re-targeted frames including re-framing techniques (last row).
Figure 2.5 YouTube’s history is a Timeline-based visualization. It consists of a list of thumbnails ordered chronologically.
Figure 2.6 Rolling History [115] is another Timeline-based visualization, which used four directions of navigation control to cover the history of multiple tabs or browsers opened at the same time.
Figure 2.7 TabViz is a Timeline-based visualization that employed a fan-shaped hierarchical visualization to show the history of multiple tabs.
Figure 2.8 Tree or directed graph [83] are Graph-based visualizations which visualize each visited page as a node and links as the edges between nodes.
Figure 2.9 3D visualization: (a) WebBook [120] represented each web page as a traditional book page; (b) Circle mode [120] placed thumbnails of the visited web pages at the circumference of a circle; (c) Cube mode [120] is a 3D visualization which placed thumbnails of pages on the faces of a cube.
Figure 2.10 The standard navigation tools used in most video systems are the video controls (play, pause, seek, fast forward, and rewind).
Figure 2.11 Video navigation using representative previews in (a) Netflix, and (b) YouTube [79]
Figure 2.12 Swifter [79]: a video scrubbing technique that displays a grid of pre-cached thumbnails during scrubbing actions
Figure 2.13 Panopticon [63, 92]: a video surrogate system that displays multiple sub-sequences in parallel to present a rapid overview of the entire sequence to the user.
Figure 2.14 Cheng et al.
[25] Smartplayer helps users rapidly skim through uninteresting content and watch interesting parts at a normal pace.
Figure 2.15 Kimber et al. [71] Trailblazer allows users to navigate a video by directly controlling (i.e. dragging) objects in the video or on the floor plan.
Figure 2.16 Shamma et al. [105] HackDay TV system applies users’ footprints on the video timeline to visualize what has been consumed from a video to help users navigate the video. Colour intensity indicates how often each portion has been used in remixes.
Figure 2.17 Baudisch et al. [13] Drag-and-Pick selection technique.
Figure 2.18 Gunn et al. [52] Comet Tail and Target lock selection techniques.
Figure 2.19 Grossman et al. [47] Bubble cursor selection technique. (a) Area cursors ease selection with larger hotspots than point cursors. (b) Isolating the intended target is difficult when the area cursor encompasses multiple possible targets. (c) The bubble cursor solves the problem in (b) by changing its size dynamically such that only the target closest to the cursor center is selected. (d) The bubble cursor morphs to encompass a target when the basic circular cursor cannot completely do so without intersecting a neighboring target.
Figure 2.20 Hasan et al. [53] Comet and Target Ghost moving target selection techniques.
Figure 3.1 Illustration of video viewing behaviours: the white circles represent the start position of a user’s viewing interval; grey lines represent the video timeline; red lines indicate the duration and temporal location watched; and the blue arrows indicate an action taken by the user to seek to another time.
The numbers indicate the order of user actions.
Figure 3.2 Frequency of each inactivity period between records. 80% of records have less than a 10-minute gap from their consecutive record. Beyond 50 minutes the frequency at each time point is less than 8 records.
Figure 3.3 Number of unique visited videos per participant. Participants 7, 10, 14 and 15 had the most videos while participants 12, 13 and 18 had the least.
Figure 3.4 Number of collected records per participant. Participants 7, 10 and 11 had the most records while participants 12, 13 and 18 had the least.
Figure 3.5 Average number of visited videos and records per group. Medium viewers watched more videos on average, while heavy viewers had the most records, which indicates more activity among these users.
Figure 3.6 Number of visited videos per category. Music had the most videos (681), followed by Science & Technology, which had 289 videos.
Figure 3.7 Participants’ viewing patterns based on their own normalized behaviour and actions frequency.
Figure 3.8 Percentage of skipped videos per category. Gaming (45%) showed a high percentage of videos being skipped while Science & Technology (13%) had a significantly lower percentage than the overall average.
Figure 3.9 Percentage of skipped videos for each participant. Participants 4, 5, 7, 8, 9 and 10 had more than 30% of their viewed videos being skipped.
Figure 3.10 An illustration of how participant 8 viewed one of the videos. It is a clear example of how he skipped the video multiple times trying to find a position of interest and watch for an extended period of time.
Figure 3.11 Percentage of re-watched videos per category. Each category had at least 20% of its videos encountering a re-watch behaviour. Film & Animation exhibited the highest percentage (37%) of videos containing a re-watch activity, which can be due to the mix of comedy and emotion these videos contain.
Figure 3.12 Percentage of re-watched videos for each participant. Participants 3, 5, 8, 9, and 18 had a significantly higher percentage of videos that exhibited re-watch actions (more than 33%).
Figure 3.13 Percentage of replayed videos per category. Some categories had no replayed videos while the others had only a few replayed videos. The Music category had the highest percentage of videos being replayed (∼11%) while Sports and Gaming videos were never replayed in the same session.
Figure 3.14 Percentages of replayed videos per participant. Participants 2, 3, 11, and 15 had more than 10% of their videos being replayed while nine participants (4, 5, 6, 8, 9, 12, 13, 17, and 18) did not replay any video in the same session.
Figure 3.15 Percentage of revisited videos per category. More than 14% of the viewed videos in the Music, Education, Science & Technology and Sports categories were revisited, whereas only 7% of videos from the How-to & Style, Comedy and Entertainment categories were accessed again in multiple sessions.
Figure 3.16 Percentages of revisited videos per participant. Participants 2, 3, 11 and 19 had 20% or more of their videos being accessed in multiple sessions while six participants (4, 5, 12, 13, 16, and 18) did not revisit any video in multiple sessions.
Figure 3.17 Number of videos being left before t seconds from the start of the video. 10% of abandoned videos were left within the first 10 seconds of the video.
Figure 3.18 Number of videos per viewed percentage.
10% of abandoned videos had 3% of their content being watched.
Figure 3.19 Percentage of abandoned videos per category. 78% of the viewed videos in Gaming were not watched entirely while around 50% of Science & Technology videos were abandoned before they finished playing.
Figure 3.20 Percentage of videos that were not completely watched for each participant. Aside from participant 12, at least 33% of each participant’s viewed videos were dropped off. Participants 5 and 18 had all their videos abandoned.
Figure 3.21 Percentage of interrupted videos per category. At least 50% of the videos in each category were actively viewed, where Gaming (85%) showed the highest percentage of videos being interrupted while watching.
Figure 3.22 Percentage of videos that were interrupted for each participant. Aside from participant 12, at least 33% of each participant’s viewed videos were actively watched. Participants 5 and 18 had all their videos interrupted.
Figure 3.23 An example of participant 9’s video viewing interactivity. As shown from the view count per second in the video, he performed 2 skips, 3 re-watches, and a drop-off at the end, indicating high user engagement while watching.
Figure 4.1 An illustration of a video history record (9:42 AM 26 Sept 2014, video 1, 1:24, 3:12, 12:31 AM 26 Sept 2014, 1:45 PM 26 Sept 2014).
Figure 4.2 The common video player component used in all interfaces presented in this dissertation. It is used for a direct control of the selected video using the play/pause button (highlighted in green) or via seeking in the timeline (highlighted in blue). The current playhead time is highlighted in red.
Figure 4.3 Thumbnail used to represent a video segment with its different properties: seek-able, drag-able, delete-able, and favourite-able.
Figure 4.4 A play-able Thumbnail: on mouse over, a play/pause overlay is displayed over the Thumbnail. Clicking on the overlay plays the corresponding video interval within this Thumbnail.
Figure 4.5 View count visualization attached to the left of a Thumbnail indicating how often it was watched in relation to other intervals in the video.
Figure 4.6 Our Video Navigation Interface introducing users’ viewing history in a video interface. The video Player (yellow) is adjacent to Video Grid (green), which approximates the results of a search tool. The History Timeline (bottom in blue) provides the history based on the user’s viewing time, in a scroll-able one-dimensional filmstrip interface. The vertical red bars on each thumbnail represent how often this particular segment has been viewed. The Video Mash-up (red) represents a video edit list by visualizing the videos and the intervals a user may combine into a summary.
Figure 4.7 The video player used for a direct control of the selected video. When the cursor is over the time bar a thumbnail for that instant is shown to help the user navigate the video.
Figure 4.8 The video grid, which approximates the results of a search tool. Clicking on a video from a grid will start playing the corresponding video in the player component.
Figure 4.9 The History Timeline provides a visualization of users’ viewing history based on their viewing time, in a scroll-able one-dimensional filmstrip interface. The vertical red bars on each thumbnail represent how often this particular segment within the video has been viewed.
Figure 4.10 The Video Mash-up component illustrates a user edit list which consists of user-defined Video Segments with the options to delete, re-order, play, and modify a segment’s start and end times.
Figure 4.11 The Summary Preview allows users to watch the video summary they have created. Users can re-play, modify, or export their edit list.
Figure 4.12 A Video Timeline is combined with navigation footprints visualization, where the more an interval within a video is watched the brighter that region becomes. Blue: a single user history (Personal Navigation Footprints), Red: the combined histories of multiple users (Crowd-Sourced Navigation Footprints).
Figure 4.13 Participants’ clip used in the background laughter trailer. All participants used the same clip with some tolerance at the start and end times of the clip except participant 2 who used a different interval.
Figure 4.14 Percentage of clips extracted from each video. Most of the clips in the trailers came from V2 (Baby elephant sneezes and scares himself) while V6 (Charlie bit my finger) and V7 (Emerson - Mommy’s Nose is Scary) were the least used in the trailers.
Figure 5.1 The History Timeline represented as a video history (a), made up of individually search-able video segments (b)
Figure 5.2 Our video navigation interface: the majority of space is devoted to the currently playing video (top left) with a seek-bar preview; below is a horizontal array of Video Segments arranged by video-time (the Filmstrip), and a vertical array of Video Segments to the right (the History Timeline) ordered top-down by user-time, i.e. the order in which the intervals were viewed.
Figure 5.3 The Filmstrip component visualizes the entire video as n equal-length segments.
Figure 5.4 The Experiment Interface illustrates the familiarity phase.
Figure 5.5 The performance of List of thumbnails and Filmstrip in terms of: (a) average percentage of answered questions, (b) average time per question, (c) average number of previews, (d) average answer accuracy per question, for each tested video. List of thumbnails had significantly more questions answered, less time to answer each question, fewer previews, and more accurate answers.
Figure 5.6 A Coloured Timeline visualizes a user’s navigation footprints over a video Timeline using colour intensity. The more an interval within a video is watched the brighter that region becomes.
Figure 5.7 The View Count Record (VCR) component visualizes the video viewing statistics. When no viewing history is available, the VCR presents a familiar Filmstrip (a). When a history is available, our view count manipulation algorithms can be applied to visualize popular intervals within the video, leading to fast personal navigation and social navigation applications. Each thumbnail can be searched via seeking in the popup red time bar when hovering the cursor over the preview image.
Figure 5.8 Each video thumbnail in the VCR is visualized as small video segments. Each segment is seek-able and play-able on mouse events. The red/gray portion at the bottom of the widget indicates the temporal location of its interval within the complete video. The yellow line illustrates the current seeking point within the thumbnail, within the zoomed interval for higher-resolution seeking.
Figure 5.9 The crowd-sourced data of the “One Man Band” video.
The highest four peaks used for the shortened video are highlighted in yellow. The View Count Record (VCR) visualizing these crowd-sourced viewing statistics is illustrated on top of the graph.
Figure 5.10 The accumulative view count for each video: (a) One Man Band, (b) Partly Cloudy, (c) Day & Night, (d) For The Birds, and (e) Presto. The segments used for each shortened video are highlighted in yellow. There is a high agreement between view count peaks and selected crowd-sourced segments.
Figure 5.11 The crowd-sourced data of the “One Man Band” video along with the new cumulative view count. A similar trend appears in both graphs but the newly collected data is cleaner, with distinctive peaks.
Figure 5.12 Difference between individual behaviours and crowd-sourced data for “Partly Cloudy”. All participants had more than one peak aligned with crowd-sourced highest peaks with the exception of participants 4 and 6 who had only one. Highest peaks are highlighted in yellow.
Figure 5.13 Participants’ viewing heuristics for each tested video. Intervals that are re-watched by more than five participants are highlighted in yellow. A high agreement between participants’ re-watched segments can be seen for each video where there are at least five matched segments.
Figure 6.1 Videos Library Mode from which users select or open a video they would like to view and navigate. Each video is represented as a small video segment that can be dragged to its top right corner (white square) to start playing in the viewer mode.
Figure 6.2 Video playback is performed and controlled within the viewer. The video player occupies the majority of the space; the video can be played/paused using the dedicated button below the player (on the left) or by clicking on the video itself; seeking is controlled via the white circle playhead or simply by clicking/dragging on the red/gray video timeline. The filmstrip below the player provides real-time previews based on the cursor position, allowing faster navigation of the video.
Figure 6.3 Each history visualization (presented in the History mode) displays the user’s navigation history, and provides top-level access to all previously viewed videos from which the user may zoom into any history entry for more detail. The history can be filtered by date, sorted by time or popularity, and the type of visualization used can be chosen. The thumbnails’ size is based on the view count within each video.
Figure 6.4 The user’s history is visualized as a set of small video segments. An intra-video segment (left) is used to visualize an aggregated history of a single video as one thumbnail, where the union of its viewed intervals is visualized in the thumbnail timeline (red/gray bar) and the combined segment has a single seek bar. The single video segment (right) represents a single interval from the history, and it is the furthest possible zoom level.
Figure 6.5 Video Timeline is a multiple-video history visualization design that attaches history segments to a user’s vertical navigation timeline using two columns based on their view count and chosen order.
Figure 6.6 Video Tiles is a multiple-video history visualization design that displays history segments based on template patterns following Algorithm 4 based on their view count and chosen order.
Figure 6.7 Template patterns are used for the Video Tiles visualization, to provide a clean set of thumbnails with an implicit order (top-to-bottom, left-to-right, based on a single pattern). Patterns 1, 2, 4, and 5 have alternatives where either the entire pattern is reflected or just the portion containing medium and small tiles: Pattern 1 has 8; Pattern 2 has 4; Pattern 4 has 6; Pattern 5 has 5. These are applied using Algorithm 4.
Figure 6.8 A thumbnail stack is used when the number of segments to be visualized exceeds the limit - this addresses scalability of the history visualization. The user may zoom into or out of the stack via a mouse wheel gesture to obtain control over the presented level-of-detail.
Figure 6.9 The piloted visualization designs for multiple-video history: List (top) and Grid (bottom). In List design, history segments are displayed in one vertical scroll-able column, while in Grid design, they are visualized on an n×m matrix going from left to right top to bottom.
Figure 7.1 Analysis of a moving target in 1D
Figure 7.2 State Transition Diagram for methods of interaction.
Figure 7.3 Experiment Acquisition Types, Red Potion/Chase (a,b), Blue Potion/Hold (c,d) and Green Potion/Hybrid (e,f)
Figure 7.4 The effect of size, speed, angle and direction on mean acquisition time for Hold and Chase in (i) 1D and (ii) 2D.
Figure 7.5 The 2D mean time by size and speed for C: Chase and H: Hold.
Figure 7.6 Mean Acquisition Time Vs. Index of difficulty for Chase in 2D using (a) IDVWtW ′Θ, and (b) IDC2 models.
Figure 7.7 Technique chosen by size, speed, angle, and direction in (i) 1D and (ii) 2D.
Figure 7.8 The ratio distribution of Hold and Chase using Hybrid in (i) 1D and (ii) 2D space
Figure 7.9 Overview of MediaDiver in view mode. Select-able players are highlighted when rolling over the player showing the ability to select them. Selection enables highlighting, information retrieval, pinning and tagging. Pinned players are shown to the left in the players’ drawer while a player’s information can be retrieved and shown on the right.
Figure 7.10 Overview of MediaDiver in annotation mode. Each rectangular border represents an annotated object. Users can edit each annotation by moving or deleting it. They can also add a new annotation by clicking on the object they want to annotate.
Figure 7.11 MediaDiver’s hiding feature allows users to focus on a specific player while watching. The selected player is shown in the right image and after pressing the ‘D’ key, the player is removed from the video as shown on the left.
Figure 7.12 Switching between views in MediaDiver using (a) absolute view and (b) relative view.
Figure 8.1 Mevie’s Home screen illustrates lists of videos and clips that are accessible to a user. Each video or clip is represented by a small video preview; tapping on any preview transitions to the viewer and starts playing the video. More videos can be accessed by tapping on ‘See All’ for each corresponding list.
Figure 8.2 A grid of all videos available in the device library. Each video is represented by a thumbnail with a video name and duration. Tapping on any of the thumbnails transitions to the Viewer and starts playing the video.
Figure 8.3 Mevie’s Viewer screen consists of the main video player and the Filmstrip at the bottom. Tapping or dragging on the filmstrip allows users to seek the video being played.
Figure 8.4 Dragging on the Viewer’s main player brings up a filmstrip (as shown in the middle of the player) that viewers can use to navigate the current video.
Figure 8.5 Sharing is possible from the filmstrip in the Viewer Screen using manual selection, most watched, or last watched clips. In the sharing mode, users can manually select any interval to be shared using gestures: two fingers or a panning gesture. The start and end of the selected interval can be adjusted by dragging the needed edge.
Figure 8.6 Mevie’s Viewing History Timeline represents a sequential log of how a user viewed videos. It consists of two scroll-able columns where the first column shows the dates when there was at least one video being accessed and the second column shows all accessed videos. Each viewed video is represented by a seek-able and favourite-able thumbnail.
Figure 8.7 Summary Viewing History illustrates a collection of all videos accessed within a specific date. Each video is represented by a thumbnail that can be seeked, favourited, played, deleted and opened in the Viewer. The small timeline at the bottom of a thumbnail represents the entire video, with the viewed intervals highlighted in red.
Figure 8.8 Any viewed interval or video in the user’s history can be easily shared by tapping on [icon], which brings up a list of all available social networks.
Figure 8.9 Detailed Viewing History illustrates how a single video has been viewed, showing all viewed intervals from this video in the same order in which they were viewed.
Figure C.1 Average number of visited videos per day for each participant. Participant 11 watched the most videos on average per day, while participants 9, 12, 16 and 18 watched only one video on average per day.
Figure C.2 Visited videos in each category grouped into short, medium, and long videos. Most of the viewed videos from Science & Technology were short videos (78%), while most Music videos were medium videos (70%). Education had half of its viewed videos categorized as long videos.
Figure C.3 The distribution of each participant’s viewed videos among the three duration groups (short, medium, and long). Participants 12 and 19 watched mostly short videos, while half of the videos participant 5 watched were long videos. Medium length videos were the most viewed videos for participants 1, 3, 11, 13, and 15, who had more than half of their viewed videos categorized as medium.
Figure C.4 Videos containing skips distributed per category, per participant, and per user group. Most of the skipped videos came from the Music category (29%), participants 7 (27%) and 10 (25%), and the medium viewers (52%).
Figure C.5 For skipped videos, the average number of skip actions that occurred per video within each category. An educational skipped video contained on average 5 skip actions, indicating a search for specific information within a video.
Figure C.6 Number of skips that occurred per skipped video for each participant. Participants 8 and 13 had the highest average number of skips per video, while participant 12 did not skip any video.
Figure C.7 Videos containing re-watch behaviour distributed per category, per participant, and per user group. Most of the videos that had re-watched portions came from the Music category (30%), participants 7 (24%) and 10 (23%), and the medium viewers (51%).
Figure C.8 Number of re-watch actions that occurred per re-watched video within each category.
An educational re-watched video had the highest average number of re-watch actions (seven), which confirms the findings of [70].
Figure C.9 Number of re-watch actions that occurred per re-watched video for each participant. Participant 9’s re-watched videos had eight re-watch actions on average, most of which were performed in Science & Technology videos.
Figure C.10 Percentage of videos being replayed and whether they are in a playlist. 4.5% of videos had replay actions, where only 32% of the replayed videos came from playlists while 68% of these were intentionally replayed.
Figure C.11 Videos containing replay behaviour distributed per category, per participant, and per user group. Most of the replayed videos came from the Music category (77%), participant 15 (42%), and from heavy and medium viewers (97%).
Figure C.12 Number of replay actions that occurred per replayed video within each category. When a music video was replayed, it was replayed 9 times on average.
Figure C.13 Number of replay actions that occurred per replayed video. When participant 11 replayed a video, it was replayed 7 times on average.
Figure C.14 Videos containing revisit behaviour distributed per category, per participant, and per user group. Most of the videos that were accessed again in a different session came from the Music category (41%), participants 7 (21%) and 15 (21%), and the medium viewers (50%).
Figure C.15 Number of revisit actions that occurred per revisited video within each category. A revisited How-to & Style video was visited 2.6 times on average, indicating the resumption of the video content in multiple sessions.
Figure C.16 Number of revisit actions that occurred per revisited video for each participant.
When participant 11 revisited a video, it was revisited on average 4 times, while a revisited video for participant 17 was accessed twice on average
Figure C.17 Videos containing drop-off behaviour distributed per category, per participant, and per user group. Most of the abandoned videos came from the Music category (28%), participants 14 (25%), 7 (22%) and 10 (19%), and the medium viewers (62%).
Figure C.18 Interrupted videos distributed over categories, participants, and viewer groups. Most of the interrupted videos came from the Music category (29%), participants 14 (24%), 7 (22%) and 10 (19%), and the medium viewers (61%).
Figure C.19 Average number of interactions per video for each category. At least one interaction (i.e. either skip or re-watch) per video occurred in each category, with Education videos the most interacted with, at four interactions on average per video.
Figure C.20 Average number of interactions per video for each participant. Participants interacted at least once (i.e. either skip or re-watch) per video while viewing. Participants 9 and 11 were very active while watching, with five interactions on average per video.
Glossary

1D One Dimensional
2D Two Dimensional
3D Three Dimensional
ANOVA Analysis of Variance, a set of statistical techniques to identify sources of variability between groups
GLMMs Generalized Linear Mixed Models, a set of statistical techniques to identify sources of variability between groups for non-normally distributed dependent variables
K-means A clustering method that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean
URL Uniform Resource Locator, used to describe a means for obtaining some resource on the World Wide Web
VCR View Count Record, a video navigation tool based on users’ viewing statistics
VVB Video Viewing Behaviour, a Google Chrome extension used to track users’ viewing activity on YouTube
XML Extensible Markup Language

Acknowledgments

First and foremost, I must acknowledge my praises and limitless thanks to God Almighty, the most gracious and merciful, for His help and blessing. I am certain that this work would not have materialized without His guidance.

Most of all, I would like to express my deepest appreciation to my supervisor Professor Sidney Fels, for his excellent guidance and advice throughout the course of my PhD research. I have learned so much, and without him, this work would not have been possible. Thank you so much for believing in my abilities, supporting me, and making this journey such a rewarding experience.

I wish also to express my heartfelt gratitude to Dr. Gregor Miller for his precious guidance, continuous encouragement and support that motivated me to move forward. I thank him for his constructive criticism, personal involvement in all stages of my work, and belief in my capabilities throughout this research. His comments and suggestions have been invaluable.

I would like also to thank Dr. Ali Mesbah, Dr. Karthik Pattabiraman and Dr. Konstantin Beznosov for agreeing to serve on my supervisory committee and Dr. Luanne Freund, Dr. Jane Wang and Dr.
Ravin Balakrishnan for being on my dissertation examination committee.

I owe a deep debt of gratitude to my university (Sultan Qaboos University, Oman) for giving me this opportunity to pursue my PhD.

I must acknowledge as well my fellow colleagues at the HCT lab, for their friendship and support, with special thanks to Matthew Fong who worked with me closely on some designs used in this dissertation. The last few years have been quite an experience and you have all made it a memorable time of my life.

I would like to take this opportunity to also thank all the participants who took part in the different user studies. Without their contribution, none of this research would have been possible.

Last but not least, my wholehearted thanks go to my family and friends for their love and support during this phase of my life. Special thanks to my parents without whom none of this would have been possible. I remain indebted to them for their generous support throughout my professional career and particularly during the process of pursuing this PhD degree. Because of their unconditional love, patience, encouragement, and prayers, I have had the chance to complete this dissertation.

Dedication

To my parents, Aisha and Mohammed, who prayed constantly for me to reach this point in life. Thank you for your support, unconditional and endless love, guidance, encouragement and patience while I was thousands of miles away pursuing my Ph.D. research.

Chapter 1

Introduction

Video has become a predominant part of a user’s daily life, with the emergence of online video providers such as YouTube™, Vimeo™, and Dailymotion™. The rapid growth rate of video on these providers is due to the recent proliferation of mobile devices with cameras and to faster data speeds, which have changed the concept of users from just being consumers of content to content creators.
This has led to more and more content becoming available online that billions of consumers can experience and enjoy with just a simple click.

Consuming video online, on mobile devices or on home computers is now a well-accepted form of communication and entertainment. The use of video is not limited to the entertainment media; it is being used in various ways to drive sales, entertain, communicate and educate. This variety in the types and purposes of videos has revolutionized how consumers access, view and interact with media content. Managing, navigating, accessing and sharing specific information from such content is not trivial and often imposes high cognitive workload and physical navigation burdens on the user. Different approaches and designs have been proposed by researchers to tackle some of these aspects, which are discussed in Chapter 2. However, due to the great increase in the quantity of video now available, the variability of content, the evolving nature of the use and consumption of video, and the change of users’ viewing behaviour, we need better management and navigation tools to access what we want more efficiently and to provide us with mechanisms to find previously seen content for us to use or share.

To illustrate the motivation behind this research and how these requirements manifest in a real-world setting, we consider a scenario from an entertainment context, since entertainment makes up a quarter of the videos viewed online, as shown in Chapter 3. Let us look at Tom, who spends 25% of his online time watching entertainment videos.

Last week, as usual, Tom watched comedy videos on YouTube. Tom watched a funny one-hour cat video mix and laughed at many parts. He thought, “I’d love to share the funny bits with Sally.” He clicked on the seek bar, scrubbed around finding the funny bits and wrote down the time codes.
He emailed Sally, “Hey, Sally, check these out: http://youtube/funnycat; 00:00:10:37 - 00:00:13:21; 00:03:23:10 - 00:04:10:21; 00:13:42:07 - 00:15:07:22; 00:28:10:11 - 00:28:47:00; 00:39:01:29 - 00:39:56:19; 00:51:19:10 - 00:52:10:21.”

Tom noticed that it was already 9:00 PM. So he started watching his favourite show, “The Amazing Race.” His favourite team did well through most of the challenges. They almost won, but they missed one trick in the last challenge and the cowboys won. Tom remembered that this team had participated in a previous season but he could not remember which season. So he tried to click on one of the team members to retrieve more information about them. It was hard to select them since they were moving so fast. Tom paused the video and clicked on one of them, which gave him more details about that participant. “Oh, yeah, here it is. They were in the 2010 season,” he said.

The next day, Tom went to school and he talked with his friend Sally about the show and his favourite team’s performance in the episode. He brought up the webpage on his phone and looked for the episode. He jumped around the video trying to find when his team was struggling in one of the challenges. “Here it is. Check it,” Tom said. “It is easy. How did they not get it?” Sally asked. “I missed this episode, but you know what, it reminds me of a challenge from last season,” Sally said. He said, “Oh really. Which episode was that?” “Let me check. I think I have it in my shows list. Let me bring that up for you,” Sally said. They searched for the video, and jumped around the video trying to find the shot. “No, no, I think it happened sometime when one of the sisters team jumped from the cliff,” she said. They were looking around trying to find it. “Oh! Here it is, here it is!” Sally said. “That is really hard. Can you send it to me? I am creating a list of hard challenges as I am planning to participate in the next season,” Tom said. “Yeah sure, here it is. Shared the video,” Sally said.
“Oh, no, I just want that challenge part,” he said. Sally said, “Sorry, current viewers don’t enable that.”

In order to offer Tom and others the accessibility and functionality emphasized in this scenario, we need to design a video interface that allows users to (1) watch videos, (2) easily navigate videos, (3) access previously seen content within videos, (4) quickly find previously seen content, (5) share specific portions from videos, (6) generate summaries of previously seen videos, and (7) easily select interactive objects within videos. Some strategies, features and design guidelines need to be developed and added to the current video interfaces to overcome some of the challenges of meeting these requirements. Figure 1.1 illustrates what we imagine the future video viewing interface will look like and the different features that need to be added to the current video viewers.

Creating a next generation video interface that meets the above requirements brings up some challenges, including knowing how people navigate and interact with the new types of content, as well as matching it to the cognitive mechanism that people have when dealing with time-based media. As users view and navigate various videos, their viewing patterns, annotations, comments and so on can be thought of as digital “footprints” left on the videos. Some of these footprints are generated explicitly when users intentionally make a specific action around the media while viewing the content. This may include, for example, rating a video, commenting, adding annotation, tweeting, or voting. However, the majority of the footprints are generated unintentionally (i.e. implicit-user metadata) by the nature of simply interacting with the media without requiring any additional actions from the user. Examples include a user’s physiological response, facial expression, eye movements, visiting a video, viewing, and video interaction clickstreams, such as play, pause, skip, replay, or seek/scrub.
This data can be examined to characterize video watching and navigation patterns that can be employed in designing new tools and interfaces for personalized video viewers. Moreover, the content being viewed from each video and how frequently it has been viewed can signify important meaning for such content. This information can then be used to assist the design of new tools to navigate and search video content.

Figure 1.1: An illustration of a future video interface with the contributions made by this dissertation. The interface allows users to view and navigate videos while keeping track of their viewing history and providing access to this data. Users’ viewed content can be accessed using the history component. Navigating the playing video can be achieved using the VCR and the Hold technique. (Note: This figure lays down the elements and interactions defined in this dissertation.)

Viewing pattern statistics are important to facilitate interface design to match how people watch videos. There is a significant difference between linearly watching a feature length movie (such as found on Netflix) and short educational modules or comedy videos, such as on edX or YouTube, for example. Thus, characterizing the contemporary watching patterns would enable developing personalized tools that could satisfy users’ needs. In the first part of this dissertation, we study users’ viewing behaviour on YouTube through web browsers on a desktop platform, as discussed in more detail in Chapter 3. We explore whether users’ activity while watching videos can provide some insights that can be turned into new tools for navigating video content and searching previously seen content. Through this study we show that the way we view and experience videos has changed: it goes beyond just sitting and watching videos passively from start to end without any interaction during the playback of the content. We have been able to explore how people interact with videos from different categories.
And, in contrast to previous research, we have looked at the behaviour of each individual user and how often they perform the different interactions while viewing. This enables us to determine seven different behaviours a user may exhibit while viewing videos, which in turn helped us define some design guidelines for a more personalized video interface tailored to these behaviours, as shown in Figure 1.1.

Researchers [44, 69, 111] have put substantial effort into extracting meaning from users’ digital footprints and turning it into targeted applications. Many applications have been developed in the literature by mining this kind of data, as discussed in more detail in Chapter 2. These applications indicate that users’ digital footprints provide a rich resource that can be leveraged for viewers. Nonetheless, this data is not accessible to users and only researchers use it to define a set of tools for consumers. We are interested in providing this data to the users themselves to create more personalized experiences, and investigate how they will use it, what other applications may emerge, and whether this data is going to change the way people view videos.

Our study (Chapter 3) has shown a high revisitation of videos to access previously seen content, which implies that users very often go back to videos they have seen to search for specific portions or information. Providing users with what they have seen from each video can help them to easily find what they are looking for. We call this kind of data Video Viewing History, which is simply an archive of each interval a user has watched from any video. History of users’ actions has been widely investigated for multiple purposes in different domains including web browsers [21, 60, 75], document editing [9, 56], workflow [49, 91], tutorial generation [15] and information visualizations [51, 54]. Researchers have introduced and developed different tools that keep and visualize records of users’ actions for later use.
In comparison to the user’s actions history in these domains, video viewing history is more difficult since we are not only dealing with the temporal nature of users’ experience, we are also dealing with the temporal nature of the media itself; whereas history for web browsing, for example, has a user time and no media time. This additional complication with video history, therefore, requires more sophisticated representation and visualizations to communicate such data than those proposed for the history in the other domains.

Figure 1.2: The history component of the video interface (a). Clicking on one of the thumbnails from history (b) brings a detailed history in another screen. Three different designs were developed and tested for the detailed history as shown in (c).

Figure 1.3: Viewing history visualizations where each interval is represented by a quadrilateral shape: the bigger the size, the more that interval is watched. Four designs are proposed and evaluated in this dissertation: (c) History Timeline and (b) View Count Record (VCR) for single-video history, and (d) Video Timeline and (e) Video Tiles for multiple-video history. VCR is compared against the state-of-the-practice navigation method filmstrip (a).

To provide users (e.g. Tom and Sally in the scenario) access to their previously seen content that can be used for different purposes, we need to design an interface that allows users to watch videos, keeps track of their viewing history, and provides access to such data. This will allow us to explore the usability of viewing history and investigate how it can improve users’ task performance such as search and navigation. To achieve this, a history component (Figure 1.2(b)) is added to a video interface (Figure 1.2(a)). Through testing and learning from users, a series of modifications (Figure 1.2(c)) and experiments are applied to the design of this component as detailed in Chapters 4, 5, 6 and 8.
This leads to two designs for a single-video history: History Timeline (Figure 1.3(c)) and View Count Record (VCR) (Figure 1.3(b)); and two designs for multiple-video history: Video Timeline (Figure 1.3(d)) and Video Tiles (Figure 1.3(e)). Each quadrilateral shape in these designs refers to an interval that a user has watched that is represented by a seek-able and play-able thumbnail (discussed in detail in Section 4.3). These designs try to present and communicate the viewing history in a clear and comprehensible way to users so that they do not hinder the navigation and search tasks. We have shown that our designs outperformed the state-of-the-practice approaches in a search task for previously seen events in videos. Using these designs enables us to check how they can help utilize and manage users’ viewing histories, and evaluate the benefits this brings. They help us to satisfy requirements 2, 3, 4, 5, and 6 of the interface specified earlier.

Figure 1.4: The VCR component in the video interface is used for navigating the playing video.

Going back to the scenario about Tom, video navigation is another issue that needs to be tackled. Navigating a video using its timeline to find specific content can be demanding and time consuming. Researchers have proposed different interaction techniques to alleviate this problem, allowing users to quickly navigate and search video content, as discussed in Chapter 2. Our interface adopts users’ viewing heuristics to improve video navigation. This leads to the design of the VCR component shown in Figure 1.4, which is detailed in Chapter 5. Thus, viewers can use either the history or VCR components to navigate, find previously seen content and summarize the video based on their viewing heuristics; hence, it fulfills requirements 3, 4, 5 and 6.
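The idea of weighting each part of a video by how often it has been watched can be sketched in a few lines. The Python snippet below is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes a viewing history stored as (start, end) pairs in seconds for a single video, and it counts how many watched intervals overlap each fixed-length segment. The function name `segment_view_counts` and the `segment_s` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one watched interval


def segment_view_counts(history: List[Interval], duration_s: float,
                        segment_s: float = 1.0) -> List[int]:
    """Count, per fixed-length segment, how many watched intervals overlap it.

    Hypothetical helper for illustration only; the segment length and the
    interval format are assumptions, not taken from the dissertation.
    """
    n_segments = math.ceil(duration_s / segment_s)
    counts = [0] * n_segments
    for start, end in history:
        # Clamp the interval to the video and skip empty or reversed ones.
        start, end = max(0.0, start), min(duration_s, end)
        if end <= start:
            continue
        first = int(start // segment_s)
        last = min(math.ceil(end / segment_s), n_segments)  # exclusive bound
        for i in range(first, last):
            counts[i] += 1
    return counts


# Toy history: the viewer watched 0-10 s, then 5-15 s, then 5-8 s again.
history = [(0, 10), (5, 15), (5, 8)]
counts = segment_view_counts(history, duration_s=20, segment_s=5)
print(counts)  # [1, 3, 1, 0]
```

A histogram-style view could then scale each segment's thumbnail by its count, so that the most re-watched parts of the video stand out.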
Our evaluation results have shown that a user’s viewing history provides quick navigation and fast search tools for previously seen content.

Figure 1.5: The Hold technique in the video interface is used for selecting and activating interactive objects.

Figure 1.6: The history-based video viewer application with some content.

With the diverse selection of videos now available for users, navigation extends beyond the use of the controls available in a standard video player. Other forms of navigation have emerged that have introduced new challenges for users. For example, one of the forms of video that has emerged and is widely used in marketing and education is interactive video, where interactive spots or annotations within videos are introduced. Clicking on these spots or annotations directs viewers to another video or piece of information where they can engage and spend more time on that specific information. However, due to the time-based nature of these videos, the embedded clickable objects are visible or active only for a certain duration of the video, in contrast to web pages, in which hyperlinks are present at all times. Therefore, the activation and selection of these hotspots becomes difficult. This can also be seen from Tom’s scenario where he tried to click on the team members to retrieve more information. To alleviate the navigation problem in this kind of video, we have looked at the factors that affect the selection of these hotspots to allow users to easily navigate the video content and hence satisfy requirements 3 and 7. Therefore, as our third main contribution in this dissertation, in Chapter 7 we derive and validate a new mathematical model that estimates the time needed to select a moving object based on its size, speed, direction, and angle of movement. Based on this model, a novel target acquisition technique called Hold is developed to ease the problem of selecting moving objects.
We have verified that Hold outperforms the traditional technique of chasing the object to be selected and showed how it can be used as a context switch that enables multiple functions in a multi-stream rich-media sport video interface (MediaDiver), as shown in Figure 1.5.

Based on the guidelines we have created and the various interface elements we have evaluated, this dissertation has proposed a version of the next generation video interfaces that overcomes some of the challenges discussed earlier, as shown in Figure 1.1. Our version of a video interface that has brought together all the design elements except for the selection technique Hold is called Mevie (Figure 1.6). Chapter 8 describes Mevie in detail, including how the designs were integrated into a mobile platform. It helped us to evaluate the components when put together in one interface and to explore people’s reactions when these elements are introduced on a video interface. The results revealed that users welcomed the changes brought into the interface and started thinking differently about videos and the way they could be used. We started to see how a change in the video interface introduces various potential applications and usages of videos.

1.1 Contributions

This dissertation provides three main contributions to the research domain of Human Computer Interaction (HCI). These contributions include studying users’ video viewing behaviour (Chapter 3), creating and validating some design strategies for future video interfaces to match how users view and interact with video (Chapters 4, 5, 6, and 8), and developing a novel selection technique for a moving target (Chapter 7). Publications resulting from this work are listed in Appendix A and the main contributions are summarized here.

Contribution 1: Behavioural analysis and characterization of how users interact with online videos.
We developed a Google Chrome extension that allowed us to collect users’ viewing behaviour while watching videos on YouTube on a desktop platform. A five-month collection of traces was analyzed to characterize how people navigate and view videos and to determine their viewing behaviour. Based on our sample (n = 19), we showed that people are actively watching videos and demonstrated that people re-watch all video types equally and they do it often. The data revealed that when a user accesses a video a second time, it is mostly to refer back or re-watch something that has been previously seen and not to resume from where it was left. We also showed that the drop-off has little to no correlation with the video length and popularity, indicating the subjectivity of users’ interest in the content. We found that most participants were re-watchers (i.e. watch portions of a video multiple times), skippers (i.e. jump around a video to find specific information or to pass over irrelevant and uninteresting parts) or both.

Contribution 2: Creating and validating some design strategies for a history-based video viewing interface to enable users to more quickly navigate and find moments from previously seen videos. We designed and implemented a video viewer interface that keeps track of users’ viewed portions from each video and provides access to this data. In order for users to be able to access this data, different visualizations for history were designed and evaluated. By applying users’ viewing history we were able to provide a fast navigation and search tool. We have solved it in these contexts:

i. For single video history. We used a chronologically ordered list of thumbnails (Figure 1.3(c)) as a visualization of the viewed portions within a video, illustrated in Section 5.1. Using this visualization we showed that participants were faster in finding answers to specific questions from previously seen content.

ii. For multiple video history.
We designed two different visualizations for history: Video Timeline (Figure 1.3(d)) and Video Tiles (Figure 1.3(e)), detailed in Chapter 6. Searching for previously seen events was faster using both designs in comparison to the state-of-the-practice method.

iii. For visualization of the history in a filmstrip. We employed the viewing statistics of a video to construct a histogram-based filmstrip we call VCR (Section 5.2). We showed that searching for specific events in a previously seen video using the VCR (Figure 1.3(b)) outperformed the state-of-the-practice filmstrip (Figure 1.3(a)).

Contribution 3: Developing a novel selection technique and formulating a mathematical model for moving target acquisition. To alleviate the burden of selecting interactive objects in the new form of videos, we modeled the time needed to select a moving object, and based on this model a novel acquisition technique was developed.

i. Formulating a mathematical model. We derived a new model to estimate the time needed to select a target moving in a 2D environment based on the target size, speed, direction and angle of movement.

ii. Validating the model. An experiment was designed to validate the proposed model for a moving target. The data showed a very good correlation with the model, verifying its validity.

iii. Applying the selection technique in a working interface. To test the feasibility of our proposed selection technique (Hold), we designed and implemented an interactive interface, called MediaDiver, for experiencing, viewing and annotating complex video domains and their associated metadata content. It enables viewers to interact, browse, explore and annotate (i.e. tag) a multi-stream sport video.

1.2 Dissertation Outline

This dissertation establishes the research needed to meet many of the requirements for the next generation of video interfaces. Thus, it is structured around the main contributions made in this dissertation.
We begin in Chapter 2 by reviewing what has been done in the literature and how researchers tackled the different issues. Chapter 3 details our behavioural analysis on how users view and interact with online videos. Our first prototype of a history-based interface and the evaluation of the usability of the interface for creating short trailers are described in detail in Chapter 4. Based on the findings from the evaluation, some modifications have been applied to the interface, which are described in Chapter 5. Chapter 5 also presents two different visualizations for a single video viewing history and details the studies conducted to evaluate these visualizations in a search task in comparison to the state-of-the-practice method. Chapter 6 then describes new modifications to the interface, details two new visualizations for multiple-video history, and presents the evaluation of the performance of these two designs in a search task. In an effort to reach a large audience, we have integrated our concept and designs into a mobile application. A detailed description of this application with its evaluation is presented in Chapter 8. In Chapter 7, we describe another form of video navigation and tackle the selection problem of moving objects in interactive videos. A new mathematical model is derived in Chapter 7, validated and used to propose a new selection technique. Finally in Chapter 9, we summarize the dissertation contributions, describe directions for future work, and provide some concluding remarks. The appendices provide some additional material. Appendix A lists the publications and interactive demonstrations associated with the dissertation. Appendix B provides a list of questionnaires that were used in each user study in this research.

Chapter 2

Related Work

Most people have experienced videos on different platforms and various social media websites.
They use their devices trying to find or explore some videos and they end up viewing, navigating, re-watching, and sharing several videos. Further, navigating between different videos has been made even easier on social websites such as YouTube. When a person watches a video on these websites, they are offered other videos that they can navigate to. These are offered either through a list of recommended videos, or through hyperlinks and hotspots within a video that a user can click to be directed to that video and start viewing it. This new kind of interactivity changes the concept of watching a video sequentially and passively to what is known as interactive videos. In interactive videos, a video is connected to multiple types of media, allowing the viewer to have access to additional information when requested. Just as the World Wide Web (WWW) changed the reading and publishing of web pages through the introduction of hyperlinks, video linking changes the linearity of video, offering viewers richer information and more immersive experiences.

Even with a single interactive video, various people end up with different experiences because they could navigate around it using the embedded hotspots. Recording users’ navigation experiences using their digital footprints gives them the possibility to navigate back through the video sequences they had already selected, allowing them to re-experience specific sections multiple times, reuse portions of it, save, search or even share it with others. People can have a rewarding way to enjoy their experience all over again.
To offer such a system, we need to understand (1) how people navigate and view videos, (2) how this navigation behaviour and interaction history can be used, (3) how to present this information to users, and (4) how users’ experience and task performance can be improved. In this chapter we look at what has been done in the literature and how researchers tackled these issues. We start by looking at what has been discovered about users’ navigation and interaction behaviour while viewing videos (Section 2.1). We then survey the different applications of recording and utilizing users’ viewing behaviour in Section 2.2. For users to be able to use their viewing history, the interface needs to provide a visualization that reflects what they have seen and allows them to easily access and use it. Hence, in Section 2.3 we check different visualizations proposed by researchers in the literature. To offer users pleasant experiences while viewing videos, in Section 2.4 we look at how the navigation within videos can be made quick and effortless. We also look at how to ease the selection of interactive objects within videos.

2.1 Studying Video Viewing Behaviour

There is a growing body of research and interest in understanding video viewing behaviour, which has been largely motivated by the popularity of online videos, interest in the social web, sharing, and the use of videos online. Every second, hundreds of hours of user-generated content are uploaded and millions of people are enjoying this content on different platforms (e.g. TV, mobile, desktop, tablet). A large amount of metadata is left on these videos every time users view them, which provides a rich resource that can be leveraged for viewers.
This data can be mined, aggregated and analyzed to understand people’s consumption practices, how they experience media, and to develop models and tools that improve users’ task performance.

Two sources of metadata have been used in the literature to extract meaning by analysis of the user activity on the video: explicit-user interactions and implicit-user interactions. Explicit-user metadata are collected by asking users to make a specific action around their points of interest such as rating a video, commenting, adding annotation, tweeting, and voting. Users intentionally provide this information during/after viewing or experiencing the media. However, implicit-user metadata is any information that is generated by the nature of simply interacting with the media without requiring any additional action from the user. Examples include visiting a video, viewing, and video interaction clickstreams, such as play, pause, skip, replay, or seek/scrub. This interaction data is automatically gathered by the application while users naturally interact with videos.

2.1.1 Explicit-User Metadata

Some researchers have focused on the meaning of activities and behaviours surrounding the social experience (e.g. users’ comments, ratings, annotations, remixes and micro-blogs) to understand media semantics, to get some insight into users’ behaviour and how to utilize this data. For example, Shaw and Davis [107] showed that users’ annotation and retrieval requests can be analyzed and used to develop a better video representation structure. They also demonstrated that the analysis of annotations and the re-use of video segments in re-mixes [108] can be used to understand media semantics. Shamma et al. [106] explored the benefit of the micro-blogs (Twitter activity) in structuring a TV broadcast.
They found that the analysis of the comments from the Twitter stream can be used to predict changes in topics in the media event and that comments reflect the topics of discussion in the video. Hence, this can be used to create summaries of broadcasts. Olsen and Moon [94] also used explicit user data to generate summaries; however, they applied users’ ratings to select segments instead of users’ tags or comments. Users’ voting was also used by Risko et al. [98] to visualize the important parts within a video. They developed a lecture video annotating tool with which students indicate the importance of a segment from a video by clicking a button. The responses were then aggregated from all students to highlight important parts of the video. Yew et al. [122] also used users’ ratings but for a different application. They used the users’ rating score of a video to train a Naive Bayes classifier to determine the genre category of a YouTube video. Their classifier achieved a 75.5% category prediction accuracy, which indicates the importance of explicit-user metadata in reflecting the properties of the video content. In this dissertation, we are more interested in the implicit interactions rather than the explicit ones.

2.1.2 Implicit-User Metadata

Implicit-user data can be collected from user navigation behaviour and interaction with the video or from user unconscious physical actions such as eye movements, heart rate and brain neuron reactions that can be gathered with electroencephalography (EEG). Both approaches have been applied in the literature to explore video viewing patterns and usage opportunities.

Users’ Physiological Response

Money and Agius [87] looked at users’ physiological responses, including electro-dermal response (EDR), respiration amplitude (RA), respiration rate (RR), blood volume pulse (BVP), and heart rate (HR), while viewing a video to develop a video summarization technique.
Analysis of viewers’ facial expressions was also used to provide a promising video summarization tool [65]. Peng et al. [96] analyzed users’ behaviour, including eye movements and facial expression, while watching videos to estimate users’ interest in the content. Using the interest estimation model, the authors were able to identify interesting parts within a video which could then be easily compiled into video summaries. This approach showed an effective utilization of users’ physiological behaviour in video summarization. However, it is not practical since it requires a video camera and additional sensors, which need to be worn while viewing videos.

Users’ Interactions

Users’ natural interactions with a video, such as play, pause, skip, seek, replay, and revisit, provide a large amount of metadata that can be examined to understand video watching and navigation patterns. This information can then be used to improve users’ task performance and provide insights on how interfaces should be designed. Different researchers have looked at users’ interactions within a video to determine the interesting or important parts of the video that can be used later to create representative thumbnails, as a tool for navigation, or as a summarization tool. Despite the excitement, relatively few attempts have been made to explicitly analyze users’ implicit interactions to understand how users interact with videos while viewing. Yu et al. [126] analyzed users’ logs from a video-on-demand system to understand users’ behaviour, content access patterns and their implications on the design of media streaming systems. They looked at users’ daily and weekly access patterns, users’ arrival rate, and streaming session length. For the access patterns, they found a correlation between users’ work habits and the peaks of the number of users accessing the system. Daily data reached its peaks at noon breaks and after work, while the weekly peaks were reached on Sundays.
The average session length was found to be quite short, where 52.55% of the sessions were terminated within the first 10 minutes. The authors explained that this was due to users being uninterested in the content and scanning the beginning of the videos to quickly determine their interest. This result suggests a design guideline for a streaming system: in order to serve a minimum of 50% of users’ sessions, the system needs to cache the first 10 minutes of the videos. They also found a negative correlation between session length and video popularity, whereby more popular videos were watched in shorter session times. Work done by Yu et al. demonstrated how often people watch videos, how long they watch, and the relation between popularity and session length. They did not look at the actual interactions within the videos such as play, pause, etc.

Furthermore, Huang et al. [59], Hwang et al. [61] and Yin et al. [124] examined users’ actions while viewing videos from a video-on-demand system. However, in contrast to Yu et al. [126], they also studied in-video actions such as play, pause, unpause, seek and stop events. Huang et al. [59] analyzed a nine-month trace of data from MSN video service. They studied users’ interactions such as pause, resume, fast-forward, and fast-backward. They found that users generally watch large portions of short videos, while fewer than 20% of users stayed on long videos (i.e. duration more than 30 minutes) and watched more than 60% of these videos. This indicates that most users do not watch videos in their entirety. Moreover, they discovered that 80% of the sessions were watched without any interaction from users. Similar results were also found by Yin et al. [124]. For long videos (i.e. more than 30 minutes long), approximately 40% of sessions had some interactivity from users. However, as the authors mentioned, their data was mostly for videos between 5 and 15 minutes. They did not have any videos of length between 30 and 48 minutes.
Likewise, Hwang et al. [61] found that most users only watched a fraction of a video, and they quit the videos before the end. For long videos, around 50% of videos were abandoned before reaching 40% of the video length, while for shorter videos only 20% of the videos were quit before viewing 40% of the video. Hwang et al. also looked at the correlation between a video’s popularity and how much of it was viewed. They discovered that there was little to no correlation between them. On the other hand, Yin et al. [124] studied users’ interactions with a video-on-demand system during the 2008 Beijing Olympics. Similar to Yu et al. [126], the peak hours of access were observed after work and, more interestingly, the peak days were the opening ceremony day of the Olympics and the day when a popular Chinese athlete withdrew due to an injury. Yin et al. also looked at how the popularity of videos (i.e. number of views) changes over time and the results showed that in such a multi-day event, popular content changes frequently, whereby the top 5 videos were completely new every day, which was due to the real-time, event-driven nature of the content. The top 5 videos were discovered to be strongly related, whereby these segments belonged to the same logical event. For the video length, they found an inverse correlation between a video’s length and the viewed percentage of the video, where longer videos were dropped earlier than shorter ones. This result coincides with the findings of Huang et al. [59]. Yin et al.’s data also showed a strong correlation between video length and session length for short videos. Overall, viewing time (i.e. session length) was under 600 seconds irrespective of the actual video length. In terms of activity while watching, 80% of the sessions had no user actions (e.g. play, pause, seek), similar to the findings of Huang et al. [59].
However, these findings contradict the observations made by Gkonela [44], Chorianopoulos [28], Kim [70], and our reported results in Chapter 3.

Gkonela and Chorianopoulos [28, 44] analyzed users’ interactions in a controlled lab experiment for three different types of video: Documentary, How-to, and Lecture. To control video playback, they employed three custom buttons: play/pause, GoForward (i.e. skip forward 30 seconds), and GoBackward (i.e. rewind or jump backward 30 seconds). They found that users used the skip forward button most of the time (812 out of 1,258 interactions), which they attributed to the time pressure of the experiment, in which users had to answer questions within 5 minutes for a 10-minute video. When trying to find the answers to the provided questions, users were allowed to skip different portions of the video before settling down on a region that contained the intended answer. Looking at the re-watching behaviour, for all videos, they found a high match between the peaks (i.e. local maxima) in the users’ re-watching time series graph and the established ground truth. Approximately 70% of the interesting segments were observed within 30 seconds before a re-watching local maximum as shown in Figure 2.1. This work shows interesting observations where users tend to skip videos a lot when trying to find specific information, and they re-watched parts of interest multiple times. However, these results could not be generalized since 1) custom buttons were used that do not allow the authors to capture users’ real, natural navigation behaviour, 2) the experimental design forces users to exhibit some specific behaviour since they were tasked to find answers for some questions, which does not give users the freedom to navigate videos as they would naturally do, and 3) the three types of videos have a similar purpose, namely to gain knowledge, which does not reflect general users’ behaviour because users may exhibit different behaviour when they view videos not for a learning purpose.

Figure 2.1: Chorianopoulos [28] results of the matching between users’ interactions (Replay30) time series and ground truth interest points within videos. Approximately 70% of the interesting segments were observed within 30 seconds before a re-watching local maximum.

Li et al. [74] also investigated viewing patterns in a laboratory setting. They investigated the usability of different features of an enhanced video interface and how often participants used these features across six different video contents. Their enhanced interface contained different features to navigate and view videos, such as traditional video controls (e.g. play/pause, fast forward, and rewind), rich indexes for navigation (e.g. table of contents, video shot boundaries, and personal marks), and speeded-up playbacks (e.g. time compression and pause removal). Participants were asked to watch long videos (40-60 minutes) in a limited amount of time (30 minutes) using the different features. The results showed that participants applied these features differently based on the video content. For informational audio-centric videos, video indexing tools such as a table of contents or a list of personal notes provide a valuable tool for fast viewing of the content. However, for informational video-centric videos, the list of thumbnails and shot boundary frames helped participants to quickly view videos. For narrative-entertainment videos, participants preferred to have a fast playback of the video and to jump around commercials. Similar to Gkonela and Chorianopoulos [28, 44], Li et al.
[74] used a task that might have forced participants to apply specific patterns while viewing rather than the normal behaviours participants would exhibit naturally.

Another work that explicitly analyzed users’ interactions was done by Kim et al. [70], who looked at specific interactions in one category of videos. Kim et al. analyzed students’ interactions in educational videos to understand how learners use video content and how that affects their learning experience. They looked at in-video drop-off rate and users’ interactions including play and re-watch. They discovered that the drop-off rate increases as the video gets longer. Moreover, they found that students returned and re-watched parts from videos and most of these re-watch actions occurred around parts that are confusing or important (similar to [28, 44]). 61% of the re-watch peaks accompany a visual transition in a video. Based on the observations around the peaks, Kim et al. identified five categories of users’ activity: starting of new material, returning to missed content, following a tutorial step, replaying a brief segment, and repeating a non-visual explanation. Peaks that occur due to following a tutorial step were found to be significantly higher than those occurring for starting new material and non-visual explanation. Kim et al.’s work has shown that users actively view videos and that they do re-watch parts of a video multiple times. Their work provides a good start in understanding video viewing behaviour; however, they have not analyzed other interactions such as skip, replay, and revisit. Moreover, they only looked at educational videos that have a similar purpose as in [28, 44], and their results were based on a specific population (i.e. students).
In Chapter 3, we report the results of analyzing the different behaviours while navigating non-specific categories of videos with no specific purpose of viewing and using a different sample of users than just students. We demonstrate how these behaviours occur in different video categories and for each user, not only for the collective users’ behaviour. We also show how the re-watching behaviour exists across all other categories. This contradicts other researchers’ [28, 44, 69] claim that this behaviour occurs in educational and how-to videos only.

2.2 Leveraging Implicit-user Interactions

A large body of work has analyzed users’ behaviour logs for use in a targeted application. This work includes looking at users’ interactions and using the wisdom of the crowd to propose new tools for video navigation, summarization, creating representative thumbnails, and design guidelines for viewing interfaces and content producers. Martin and Holtzman [77] emphasized the importance of implicit interactions in developing models to filter media delivered to a user so that the content is more personally relevant and socially grounded. They used the percentage of a video watched, the amount of time spent viewing, and the amount of time spent interacting with a video to assess the popularity and the relevance of the content being viewed to the users’ social networks. These data can then be used to filter and propose other related content to the user based on their own behaviour, preference, and their social peers. Martin and Holtzman focused on news content and did not explore their model’s applicability to other types of content that could exhibit different users’ behaviour. Yew et al. [123] recognized the importance of using the number of play, pause, fast forward, and rewind events to identify the genre of comedy videos.
Feeding this information to their Naive Bayes classifier showed a 6.8% accuracy increase in category prediction when compared to their previous work [122], which used explicit-user metadata. They were able to achieve 82% accuracy using only the collective wisdom (i.e. collective user interactions). This means that predictive models can be made more accurate via the examination of users’ viewing behaviour. Yu et al. [126] and Yin et al. [124] discovered the importance of using the amount of video being watched for network providers. They found that the session length for most users was roughly 10 minutes, which they suggested be used as a guideline for caching mechanisms. Thus, the initial 600 seconds of a video (or of the most viewed segments) would be cached instead of caching large videos entirely.

Some researchers have used the number of views (i.e. revisit actions) a video received to identify quality of content [36], and to predict video popularity [112]. Crane and Sornette [36] used the number of views a video received to identify high quality content or videos that attract attention and keep their appeal longer over time. They analyzed 5 million videos posted on YouTube and found that using this view count they could distinguish high quality videos from junk videos amongst the viral videos within YouTube. Similarly, Szabo and Huberman [112] were able to predict a video’s popularity thirty days ahead using view counts on YouTube.

Another group of researchers has looked at the number of re-watch actions within videos to infer video segments of interest or importance, which can then be used for generating a representative video thumbnail [28, 44, 73, 108], to place content on network providers [61], and to make mash-up video summaries [108, 111, 125]. Leftheriotis, Gkonela and Chorianopoulos [28, 44, 73] proposed a thumbnail generation method that is based on the peaks of the re-watching view count. They used the three most popular scenes in a video (i.e.
most re-watched) as the proposed thumbnails to represent a video being played. They found that these representative frames matched the segments of interest in a video. Shaw and Schmitz [108] proposed using the number of reuse of segments in re-mixes to select a representative thumbnail. They used the top local maxima of these reuse numbers to select segments that can be combined in a video summary.

Figure 2.2: Mertens [82] footprint bar visualizing viewing history of a user.

Hwang et al. [61] looked at how viewing patterns can be applied to place the most viewed video content in provider networks rather than using the complete video to reduce their storage and network utilization. Their approach showed substantial savings in storage and network bandwidth when compared to a simple caching scheme. Moreover, according to Yu et al. [125], using the peaks in the number of re-watch graphs to generate a video summary can offer a shorter path in each video. They found that there are segments of a video clip that are commonly interesting to most users. Syeda-Mahmood and Ponceleon [111] used the number of re-watch actions along with some explicit user interactions where they asked users to provide their sentiment (e.g. bored or interested). Both sets of data were used as states in a Hidden Markov Model to determine the interesting segments from a video. To generate a preview using these segments, Syeda-Mahmood and Ponceleon analyzed the audio track to precisely decide the length of each segment and their start and end time around the peaks, after which they can be automatically combined into a meaningful video preview. When these previews were tested as teasers, with other users rating which videos they would watch in their entirety, they found that these ratings changed after the previews were watched. This indicates the effect of the method for creating video previews.
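The peak-based selection described above can be illustrated with a minimal sketch. It assumes a hypothetical list of per-second re-watch counts and is not the code or data of any of the cited works; the function names and the rule for handling flat peaks are illustrative assumptions. The local maxima of the re-watch series are ranked by height and the three highest are kept as candidate thumbnail positions.

```python
def find_peaks(counts):
    """Return indices that are local maxima of a re-watch count series.

    Assumption: on a flat peak (equal neighbouring values) only the
    first index of the plateau is reported.
    """
    peaks = []
    n = len(counts)
    for i in range(n):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < n - 1 else float("-inf")
        if counts[i] > left and counts[i] >= right:
            peaks.append(i)
    return peaks


def top_peaks(counts, k=3):
    """Return the k highest local maxima, in timeline order."""
    ranked = sorted(find_peaks(counts), key=lambda i: counts[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical re-watch counts, one value per second of video.
rewatch_counts = [0, 1, 3, 1, 0, 2, 5, 2, 0, 1, 4, 4, 1, 2, 1]
thumbnail_seconds = top_peaks(rewatch_counts, k=3)
print(thumbnail_seconds)
```

In this toy series the candidate positions are seconds 2, 6 and 10; the smaller peak at second 13 is dropped because only three positions are kept.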
All the aforementioned work uses collective data to generate these summaries or previews. Furthermore, we show in Chapter 5 how personal re-watching behaviour can also be used as a tool for generating video previews.

Some have used the viewing history by employing the number of times each frame or second in a video has been viewed (i.e. view count) as a navigation tool for a video [69, 82]. Mertens et al. [82] used users’ traces or footprints as an overlay over a video timeline as shown in Figure 2.2. They used different brightnesses of a colour to indicate how often each part of a video was viewed, which reflects how a video was consumed. Clicking on any of these highlighted portions on the timeline seeks the video to the corresponding time. Kim et al. [69] applied Mertens’ approach, but they visualized the view count as a histogram (or what they call Rollercoaster) as illustrated in Figure 2.3. The height at any point in the timeline indicates how often that part was viewed. Using the height instead of colour brightness makes it easier for users to spot the peaks or where the important parts are located within the video. However, neither visualization tells users the content of these parts without the need to navigate to each specific point. In Chapter 5, we propose a better visualization, which applies the view count (i.e. viewing heuristics) as the basis similar to Mertens et al. [82] and Kim et al. [69].

Figure 2.3: Kim et al. [69] Rollercoaster timeline visualizing viewing history of multiple users (collective wisdom). The height of the timeline at each point shows the amount of navigation activity by learners at that point. The magenta sections are automatically-detected interaction peaks.

Carlier et al. [22] and Shamma et al. [104] used another form of implicit interaction. Carlier et al. [22] looked at users’ zooming and scrolling actions while watching videos on small screens.
By exploiting collective users’ wisdom, they re-targeted a high-resolution video for display on small screens as shown in Figure 2.4. They used the selected regions after applying Gaussian Mixture Models, Minimum Covariance Determinant and a re-framing technique to find and stabilize regions of interest from the different frames. Their experimental re-targeted videos automatically produced using the crowd-sourced data were only slightly worse than those produced by hand by an expert. This indicates that even zooming and scrolling interactions provide a potential application based on the trust of the crowd. However, these need a custom interface that offers these features, which is still not available on the commonly known video social websites (e.g. YouTube).

Figure 2.4: Carlier et al. [22] employed users’ zooming and scrolling actions while watching videos on small screens to re-target a high-resolution video for display on small screens. Their approach overview is shown: four frames and a few viewports (first row), heatmaps and detected regions of interest (ROIs) (second row), and re-targeted frames including re-framing techniques (last row).

On the other hand, Shamma et al. [104] looked at the implicit social sharing activity (e.g. the number of pauses, rewinds, fast forwards and session length) that occurs while sharing a video in a synchronous sharing tool to predict videos’ popularity. Their prediction model reached a 95.8% accuracy using a small training set to predict whether a video would have more than 10 million views. Even though 100 of their total 1,580 videos had over 10 million views, their model was able to correctly predict 81 videos. Their data showed no correlation between YouTube view count and number of times a video was revisited, nor between video length and session duration.
However, there was a correlation between session duration and YouTube view count, which might be a dominant feature of the predictive model.

In this dissertation, we are looking at how viewing history can enable fast navigation, quick search, easy sharing, and effortless video authoring. Even though most of the aforementioned research has tackled some of these activities, no one has provided users access to their own viewing history. The data is only available to the researchers. Our approach is to give users access to their viewing history and explore how they can utilize this data and what benefits it brings.

2.3 Viewing History Visualization

The human visual system can perceive graphical information such as pictures, videos and charts in parallel; however, text can only be perceived sequentially [55], since the human brain processes visual input earlier than textual input [113]. Nadeem et al. showed that the use of visual aids in history mechanisms is more effective than the use of only textual data [90]; thus, having a visualization of the video navigation history can help extensively when searching for information. They also demonstrated that the use of history mechanisms may have a significant effect on user satisfaction and performance when revisiting previously viewed content. These findings reveal that it is important to develop and enhance history visualization mechanisms. However, since there is little work on historical video navigation, we are going to explore this from the perspective of web browsing history.

Figure 2.5: YouTube’s history is a Timeline-based visualization. It consists of a list of thumbnails ordered chronologically.

In most web browsers, history is represented as a list of the visited web pages’ titles sorted by date, popularity or aggregated by some time period. The history menu opens in a new window where pages are visualized as titles or with thumbnail images.
Researchers have tried various visualizations to simplify searching within the history, which can be divided into three categories: timeline-based, graph-based and Three Dimensional (3D) visualizations.

2.3.1 Timeline-based Visualization

In the timeline-based visualization, history consists of a linear scrollable list of thumbnails, which appear in reverse chronological order: the most recently visited page is at the top of the list, and clicking on any of these thumbnails (or icons) redirects the user to the corresponding web page. Most web browsers, YouTube, Netflix, Hodgkinson [57] and Girgensohn et al. [43] use this visualization to represent the history of users’ visited content as shown in Figure 2.5. Li et al. [75] and Hupp et al. [60] used this approach for their detailed histories with the addition of a list of coloured icons next to each thumbnail describing the users’ performed actions. However, this visualization faces problems when multiple tabs or multiple browsers are opened at the same time. Vartiainen et al. [115] developed the Rolling History for mobile devices where they proposed four directions of navigation control. Instead of having one reel of thumbnails, they used two: horizontal and vertical, shown in Figure 2.6. All opened browsers are in the vertical reel; the currently active browser appears in the middle, and its history is visualized in the horizontal reel. Blankenship et al. proposed TabViz, which uses a fan-shaped browser tab visualization where concurrent active tabs are represented in a radial hierarchical structure, visualizing the parent of each opened tab (Figure 2.7). Khaksari proposed a Grid as another solution for multiple tabs opened at the same time [68]. The Grid consists of a number of labeled tabs, where each tab corresponds to the relevant tab in the browser.
Each vertical column of thumbnails is mapped to the history of corresponding tabs in the background. According to the article, this visualization reduces cognitive workload, increases enjoyability and reduces user frustration.

Figure 2.6: Rolling History [115] is another Timeline-based visualization, which used four directions of navigation control to cover the history of multiple tabs or browsers opened at the same time.

Figure 2.7: TabViz is a Timeline-based visualization that employed a fan-shaped hierarchical visualization to show the history of multiple tabs.

2.3.2 Graph-based Visualization

Milic-Frayling et al. [83] deduced that high effectiveness during search requires users to have a mental map of both the hierarchical structure and the access sequence of web pages. Using the timeline-based visualization, back-tracking or visiting new content from the currently viewed history item would affect the structure of the history, creating confusion and affecting the searching task. This can be solved using a 2D graph-based visualization where the history is presented as a horizontal tree (e.g. [83, 84]). A new branch is generated from the parent node whenever a user back-traces and visits a new page, as shown in Figure 2.8. Mayer [80] used a directed graph where each visited page is a node and the edge between them is the link. Pages that are visited multiple times are visualized using a single node to avoid repetition. Mayer used the size of the node to represent the time spent on the corresponding web page.

Figure 2.8: Tree or directed graph [83] are Graph-based visualizations which visualize each visited page as a node and links as the edges between nodes.

2.3.3 3D Visualization

The final category is the use of 3D visualizations for browsing history. Frecon et al. [41] developed WEBPATH, which visualizes the graph of the users’ browsing history using a 3D representation.
Each visited page is a cube labeled with the page title on top and the surface of the cube shows an extracted image from the HTML description of the page. Users have control over which image can be used to represent the page. This visualization has not been evaluated to check its usability and performance. Card et al. proposed WebBook [20], which used a book metaphor to aggregate web pages in virtual 3D books. A web page is represented as a traditional page in a WebBook as shown in Figure 2.9(a). However, in terms of search speed, this design might not perform well due to the need for flipping or visiting most of the pages that precede the desired page. Yamaguchi et al. [120] also proposed two other visualizations: a circle mode and a cube mode as shown in Figure 2.9(b) and (c) respectively. In the circle layout, the thumbnails of web pages are placed around the circumference of a circle, while in the cube layout thumbnails are put on the surfaces of a cube. However, it is not clear how scalable the cube is, since the surface area is limited.

Figure 2.9: 3D visualization: (a) WebBook [120] represented each web page as a traditional book page; (b) Circle mode [120] placed thumbnails of the visited web pages at the circumference of a circle; (c) Cube mode [120] is a 3D visualization which placed thumbnails of pages on the faces of a cube.

None of the approaches proposed to date have been evaluated or used to visualize a detailed video navigation history aside from our work described in Chapter 5 and Chapter 6. Research within video history (e.g. [43, 84] and YouTube) used the aforementioned approaches only to visualize the previously watched videos but not how these videos were navigated (i.e. intervals). Other researchers [69, 82] visualized the viewing heuristics as footprints over the video timeline.
Mertens et al. [82] used the video timeline itself to visualize users’ viewing ‘footprints’ using different brightness levels, where the more a portion gets viewed the darker that area gets in the timeline (Figure 2.2). It is analogous to more footprints on that area. Using this visualization, the user can easily see which portion of the video has been watched the most. Similarly, Kim et al. [69] visualized users’ viewing heuristics in the timeline, but they used a histogram graph (which they call Rollercoaster, shown in Figure 2.3) instead of colour brightness, where height represents how often each portion has been viewed. Their visualization shows the intensity and range of each peak, making it easier for users to spot the commonly revisited parts in the video. However, using Kim’s or Mertens’ visualization, it is impossible to tell the content of these peaks without seeking to them and checking their content. In Chapter 5 we describe how to visualize a single video history, and in Chapter 6 we demonstrate how this can be extended to visualize a detailed multiple-videos navigation history.

2.4 Video Navigation

Navigating a video space or even long videos can be demanding and time consuming. There have been many interaction techniques proposed in the literature to alleviate this problem in order to quickly navigate and search video content, simplify access and improve efficiency; here, we will only mention a few. Search in videos falls into two categories: unknown-item search and known-item search. Unknown-item search is when users simply explore a video to check what it contains without having a specific goal in mind. Known-item search, in contrast, is when a user watches or navigates a video to find specific scenes or information. Known-item search can be subdivided into two subcategories: unseen scenes, and previously seen scenes. The search for previously seen moments needs some knowledge about which parts the user already saw.
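Knowledge of which parts a user already saw can be derived from logged playback intervals. The following is a minimal sketch under stated assumptions: the intervals are hypothetical, half-open (start, end) pairs in whole seconds, and the function names are illustrative rather than taken from any cited system. It accumulates a per-second view count, which is the quantity that footprint-style and histogram-style timelines visualize, and then lists the previously seen segments.

```python
def view_counts(intervals, duration):
    """Count how many times each second of a video was played.

    intervals: iterable of (start, end) pairs in seconds, end exclusive
               (an assumed logging format).
    duration:  video length in whole seconds.
    """
    counts = [0] * duration
    for start, end in intervals:
        # Clamp each interval to the bounds of the video.
        for t in range(max(0, start), min(duration, end)):
            counts[t] += 1
    return counts


def seen_segments(counts):
    """Merge consecutive seconds with a non-zero count into segments."""
    segments = []
    start = None
    for t, c in enumerate(counts):
        if c > 0 and start is None:
            start = t                      # a seen segment begins
        elif c == 0 and start is not None:
            segments.append((start, t))    # the segment ended at t
            start = None
    if start is not None:
        segments.append((start, len(counts)))
    return segments


# Hypothetical log: the user watched 0-4 s, re-watched 2-6 s, then 8-10 s.
log = [(0, 4), (2, 6), (8, 10)]
counts = view_counts(log, duration=12)
print(counts)
print(seen_segments(counts))
```

Seconds 2 and 3 receive a count of 2 because they fall inside both of the first two intervals, so a brightness or height encoding would emphasize them, while seconds 6 to 7 and 10 to 11 remain unseen.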
In this dissertation we are interested in known-item search and more specifically in search for previously seen parts of a video.

Figure 2.10: The standard navigation tools used in most video systems are the video controls (play, pause, seek, fast forward, and rewind).

Using simple video controls (play, pause, seek, fast forward, and rewind) (Figure 2.10) to navigate and search a video is time consuming and inefficient since they require users to continuously interact with the video timeline and buttons (e.g. scrub/seek, fast forward) to find a certain portion or information from the video (i.e. known-item search). For fast search, users need to remember the approximate temporal location of each event. There are many video systems proposed and discussed in the literature that have adopted a variety of interaction tools to improve video navigation. We are going to mention a few of these that fall within three categories: the use of representative thumbnails for navigation, direct manipulation of the video content, and the utilization of a user’s interaction behaviour in a navigation tool.

Figure 2.11: Video navigation using representative previews in (a) Netflix, and (b) YouTube [79].

2.4.1 Navigation using Representative Previews

A simple improvement is to modify the standard video timeline with some visualization cues. Netflix and Hulu address this problem by using a preview thumbnail at the location of the cursor on the video timeline (i.e. temporal location) to ease the search task (see Figure 2.11(a)). The previous problem still exists since the preview appears based on the cursor location and users need to keep hovering over the timeline while monitoring the displayed thumbnail. Simple linear video navigation can be accomplished through representative thumbnails, such that selecting a thumbnail directly positions the main video at the specific time corresponding to that thumbnail.
Some researchers [31, 38, 97, 102, 121] applied this approach in DVD systems as well. Filmstrip (e.g. used in some videos in YouTube (Figure 2.11(b))) also applies this approach, which consists of a strip of equally time-spaced thumbnails. This approach helps users spot the scene they are looking for. Ramos et al. [97] proposed a navigation tool called a Twist-Lens Slider, which applies a filmstrip with a fish-eye view. They used the pressure on the pen as input to zoom in the filmstrip, which provides a better view of the thumbnails under the pen position and reduces the overlapping between thumbnails. Nevertheless, looking for target scenes of interest, which are not at or nearby these defined thumbnails, could leave users with the same problem as with the traditional video controls if they do not remember its relative temporal location to the available thumbnails.

Figure 2.12: Swifter [79]: a video scrubbing technique that displays a grid of pre-cached thumbnails during scrubbing actions.

Matejka et al. [79] tried to solve the continuous scrubbing problem in the search by displaying more thumbnails using a grid instead of a single thumbnail when scrubbing a video (called Swifter, shown in Figure 2.12). Comparing their approach to previous ones applied in Netflix and YouTube showed significantly better performance, where users were 48% faster in locating a scene. Certainly, increasing the number of thumbnails available for users to search from will increase the probability of finding the thumbnail of interest in the grid, but also the time needed to search the grid. Moreover, having a fixed number of thumbnails in the interface might affect the searching performance based on the video length. In Matejka’s approach, users still need to continuously scrub the video timeline to reveal the grid of thumbnails; however, it is less frequent than with a single thumbnail as in Netflix. Thus, it still needs users to somehow remember roughly where the intended scene is located or they will just have to scrub the entire video timeline. Jackson et al. [63] and Nicholson et al. [92] applied a similar concept to Matejka [79] where they displayed a grid of thumbnails for the complete video without excluding any frames from the video (see Figure 2.13). They achieved this by using video surrogates instead of static thumbnails where each surrogate represents approximately ten seconds (depending on the video length). These surrogates play in parallel and each surrogate plays in a loop, to provide users with a rapid overview of the video without the need to seek/scrub. When they compared their navigation tool, Panopticon, against YouTube and a tool similar to Matejka’s tool (Swifter), Panopticon was significantly faster in finding answers from unseen videos. However, having multiple surrogates playing at the same time overloaded participants with information and is distracting if the goal is not to search the video but rather to enjoy its content. Their results showed that users actually spent significant time looking at the navigation tool rather than the video player itself, which reflects the distraction from the video viewing. The authors tried to remove the video player and used Panopticon as a viewer and navigation tool at the same time; however, the parallel playing of the surrogates was still overwhelming and made it hard to focus on one specific surrogate. Panopticon was found efficient for searching video but not practical for watching videos.

Figure 2.13: Panopticon [63, 92]: a video surrogate system that displays multiple sub-sequences in parallel to present a rapid overview of the entire sequence to the user.

Figure 2.14: Cheng et al. [25] SmartPlayer helps users rapidly skim through uninteresting content and watch interesting parts at a normal pace.
A similar concept, a rapid preview of the video content, was proposed by Cheng et al. [25]. They proposed Smartplayer (Figure 2.14), a playback mechanism to help users rapidly skim through uninteresting content and watch interesting parts at a normal pace. However, the importance or interest of the content is determined by the content producer, which might not match that of the consumers of the video. Smartplayer provides an interesting technique to quickly obtain an overview of a video’s content, but does not necessarily improve performance for finding tasks.

2.4.2 Navigation by Direct Manipulation of Content

Figure 2.15: Kimber et al. [71] Trailblazer allows users to navigate a video by directly controlling (i.e. dragging) objects in the video or on the floor

Since the video content changes as time passes, the separation between the video content and the controls used to manipulate or navigate the video introduces an acquisition problem. Thus, some researchers have focused on introducing new techniques to manipulate the video timeline rather than using the traditional video controls (e.g. play, pause, fast forward, rewind, and seek). They allow users to navigate the video by directly manipulating the video content along its natural movement path rather than using the timeline slider [37, 45, 67, 71, 97]. Objects in a video, which are detected based on their history of movement, are dragged backward and forward in time to manipulate the video timeline. Kimber et al. [71] developed a direct manipulation technique by graphing a visualization path for the movement of each tagged object, and enabled the user to scrub the video by dragging the object in the video window (Figure 2.15). Similar to Kimber’s technique, Karrer et al. introduced the DRAGON interface [67], which provided a mechanism that matches the direction and movement amplitude of an object of interest to the direction and amplitude of a human interface device.
It used the optical flow of the video to create the movement trajectory of objects of interest. DRAGON allowed users to have frame-accurate navigation through the video, which is not provided by the timeline slider. Dragicevic et al. [37] and Goldman et al. [45] also allowed users to directly manipulate video content to go back and forth through the video in a similar way to Kimber [71]. Nevertheless, this approach does not help users to quickly find certain information or scenes in the video unless they remember their relation to the currently viewed content. Monserrat et al. [88] extended this approach by allowing both dragging and selection of objects in blackboard-style lecture videos. Clicking on an object seeks the video to the starting timestamp of that object and starts playing from that moment, while dragging the object to the left or right performs a rewind or fast-forward action. Their experiment showed that this was significantly faster than a video player with a single thumbnail pop-up based on cursor location (e.g. similar to Netflix). However, this approach still has the same problem if the intended scene is not temporally close to the currently displayed content or objects.

2.4.3 Navigation Applying Users’ Wisdom

People in real scenarios re-watch parts of videos that are important, interesting, affective, or hard to understand [70, 100]. This can be seen most simply in YouTube’s or Vimeo’s feature for sharing a video from a specific start time, providing an exact use case of what we propose. This watching behaviour leaves digital footprints on the video frames, creating a non-flat video histogram that emphasizes the interest of each part of the video. Thus it provides a potential tool that can be used to facilitate fast navigation and scene search. Using this metadata to enhance the tools discussed in Section 2.4.1 can improve users’ performance while searching, as well as personalize the tool based on users’ own behaviour.
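
As an illustration of the non-flat viewing histogram described above, the following minimal Python sketch counts how often each second of a video was watched and extracts the most-viewed ranges. It is only a sketch under stated assumptions: the function names, the one-second bin size and the `min_views` threshold are hypothetical and are not taken from any of the cited systems.

```python
from typing import List, Tuple

# One watched stretch of a video as (start_second, end_second).
# Hypothetical representation chosen for this sketch.
Interval = Tuple[float, float]


def view_histogram(duration_s: int, intervals: List[Interval]) -> List[int]:
    """Count how many times each one-second bin of the video was watched.

    Bins only partially covered at the end of an interval are ignored
    (the end time is truncated to a whole second).
    """
    counts = [0] * duration_s
    for start, end in intervals:
        first = max(0, int(start))
        last = min(duration_s, int(end))
        for second in range(first, last):
            counts[second] += 1
    return counts


def peaks(counts: List[int], min_views: int = 2) -> List[Tuple[int, int]]:
    """Return [start, end) ranges of seconds watched at least min_views times."""
    ranges = []
    run_start = None
    for second, count in enumerate(counts):
        if count >= min_views and run_start is None:
            run_start = second           # a peak begins here
        elif count < min_views and run_start is not None:
            ranges.append((run_start, second))  # the peak just ended
            run_start = None
    if run_start is not None:            # peak runs to the end of the video
        ranges.append((run_start, len(counts)))
    return ranges


# Example: a 12-second video watched once from 0-10 s, then partly re-watched.
histogram = view_histogram(12, [(0, 10), (4, 8), (5, 7)])
most_viewed = peaks(histogram, min_views=2)
```

A timeline visualization could map each count in `histogram` to a brightness level, and the ranges in `most_viewed` could be used as jump targets for navigation.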
Instead of using video previews that are generated systematically, personal or collective wisdom can be applied. In Chapter 5 we demonstrate how using personal wisdom significantly improved the search task within videos.

Figure 2.16: Shamma et al. [105] HackDay TV system applies users’ footprints on the video timeline to visualize what has been consumed from a video to help users navigate the video. Colour intensity indicates how often each portion has been used in remixes.

According to Shamma et al. [105], the more affective a scene is, the more the corresponding interval of the video is viewed or consumed. Thus, the cumulative seamless users’ interaction history could be leveraged for the benefit of future viewers (i.e. social navigation). Yu et al. [125] used low-level feature extraction along with users’ footprints (i.e. view count) to rank each scene and offer users other scenes that have some correlation with what they are watching. This provides users with a quick navigation method to similar scenes. Clicking on any of the provided scenes jumps the user to the corresponding time in the video. Mertens et al. [82] have used the video timeline itself to visualize users’ viewing footprints using different brightness levels, which lets users quickly navigate to the most viewed scenes (Figure 2.2). Shamma et al. [105] have also applied a similar approach in their HackDay TV system, as shown in Figure 2.16. However, this does not supply a visualization of the video, which inhibits search (when searching for a previously seen event, users need to remember its approximate location). Likewise, Kim et al. [69] used Rollercoaster, shown in Figure 2.3, to visualize users’ viewing heuristics in the timeline. Their visualization shows the intensity and range of each peak, making it easier for users to spot the commonly revisited parts in the video.
They also applied non-linear scrubbing by extending the rubber band analogy to control scrubbing speed and support precise navigation [78, 98]. It is similar to the Smartplayer [25] concept, where the important parts get longer exposure than other parts. They applied friction while scrubbing around peaks to slow down scrubbing in these areas, allowing users to get a more comprehensive view of these portions of the video. Kim’s tool faces the same problem as Mertens’ [82]: there is no preview of the peaks’ content. We show in Chapter 5 how extending this approach by including previews along with heuristics improved the search task.

2.5 Object Selection in Videos

Most commercial brands today are using the growth in popularity of online video sites to get their name and message out to a wider population of consumers. However, their video marketing is not what it used to be. Simple video commercial pop-ups are being replaced by interactive spots or annotations within videos. Clicking on these spots directs viewers to the commercial ad, where they can engage and spend more time on the brand’s marketing video. Many online video sites offer companies a platform for creating this kind of interactivity within videos. For example, YouTube annotations give brands the opportunity to make their own interactive advertisements.

One of the main issues with this form of video is the selection of the interactive links or objects within these videos to traverse to the next piece of information. Due to the time-based nature of these videos, the embedded clickable anchors or annotations are visible or active for only a certain duration of the video, in contrast to web pages, in which hyperlinks are present at all times.
Therefore, the activation and selection of these hotspots become difficult and are affected by the shape, size and location of the object at the time of the user’s selection. For some users, selecting even a stationary object or a graphical element can be difficult. Due to the movement of objects in videos and camera panning or zooming, the selection becomes challenging. In some application domains, such as sports or racing, the difficulty is exacerbated because objects or regions move rapidly, making the selection hard to achieve. It can become frustrating when the user tries to chase the moving object or when the selection results in mistakenly activating a hyperlink associated with another object. In order to correct this mistake, the user may need to stop the video clip, rewind it to a previous time point, and try to select the object again. Thus, there is a need for methods and techniques that help the user to select and activate a hotspot associated with objects in interactive videos.

2.5.1 Static Object Selection

Figure 2.17: Baudisch et al. [13] Drag-and-Pick selection technique.

Researchers have proposed interaction techniques to help users select targets by reducing the index of difficulty as implied by Fitts’ extended models [39]. One approach consists of decreasing the distance from the cursor to the target (D): this was applied in Drag-and-Pick [13], Object Pointing [50] and Delphian Desktop [11]. Drag-and-Pick [13] moves potential targets closer to the cursor depending on the cursor’s directional movement, as shown in Figure 2.17. Guiard et al. designed another technique called Object Pointing [50], where the cursor skips empty space between targets by jumping from one target to another depending on directional movement and the closest target, thus considerably reducing D.
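
To make the role of D and W concrete, the following minimal Python sketch computes the index of difficulty in the widely used Shannon formulation of Fitts' law, ID = log2(D/W + 1). Which variant the cited extended models use is not stated here, so the formulation is an assumption, and the constants `a` and `b` in `movement_time` are hypothetical placeholders for values that would normally be fitted from user data.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' index of difficulty in bits (Shannon formulation, assumed)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1)


def movement_time(distance: float, width: float,
                  a: float = 0.0, b: float = 0.1) -> float:
    """Predicted movement time MT = a + b * ID.

    a and b are hypothetical regression constants used for illustration only.
    """
    return a + b * index_of_difficulty(distance, width)


# Baseline target: 480 px away, 32 px wide.
baseline = index_of_difficulty(480, 32)
# Reducing D (e.g. bringing the target closer) lowers the difficulty.
closer = index_of_difficulty(224, 32)
# Increasing W (e.g. expanding the target) lowers it as well.
wider = index_of_difficulty(480, 160)
```

Under this formulation, both families of techniques discussed in this section act on the same ratio: shrinking D or enlarging the effective W reduces D/W and therefore the predicted movement time.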
Asano et al. extended Object Pointing into the Delphian Desktop [11] by adding prediction of the target using the peak velocity and trajectory of the cursor. These techniques are affected by the layout of objects on screen and tend to work best when targets are sparse on the screen. Baudisch et al. [14] took another approach in their Starburst method to reduce D, using empty screen space to increase the effective width and decrease the distance. This technique optimizes movement time but prevents cursor interaction with the empty space.

Several other methods also focused on modifying the effective width, either by increasing the target width or the cursor area. McGuffin and Balakrishnan [81] investigated the effectiveness of dynamically increasing the target size as the cursor nears the target. They found users could benefit from the expansion even when it occurred in the last 10% of the movement time towards targets. This effort was furthered by Zhai et al. [127], who showed that users had similar performance even when they could not predict the target expansion. Bubble targets [32], Comet Tail [52] and Target Lock [52] enhanced targets with a bubble or a comet tail as the cursor reached the target instead of expanding the target’s actual size (Figure 2.18). Expanding target size outperforms the regular technique for the selection of single isolated targets, but these techniques do not perform well with clustered or dense areas of targets, as selection ambiguity and visual distraction arise.

Figure 2.18: Gunn et al. [52] Comet Tail and Target Lock selection techniques.

Kabbash and Buxton [66] increased the cursor area to increase the effective width. This technique introduced ambiguity when multiple targets were present on screen. Worden et al. [118] furthered Kabbash and Buxton’s approach by proposing an additional single hotspot at the center of the cursor that is activated when multiple targets are present under the cursor to alleviate ambiguity. Other techniques such as Bubble cursor [47] and Starburst [14] tried to maximize the activation area (effective width) of each target by partitioning the empty space effectively. The Bubble cursor, shown in Figure 2.19, could cause visual distraction as the cursor expands/shrinks unpredictably, especially with moving targets. Chapuis et al. applied the expansion of the cursor area in relation to the cursor moving speed in a technique called DynaSpot [24]; however, this technique does not address the speed of the targets.

Figure 2.19: Grossman et al. [47] Bubble cursor selection technique. (a) Area cursors ease selection with larger hotspots than point cursors. (b) Isolating the intended target is difficult when the area cursor encompasses multiple possible targets. (c) The bubble cursor solves the problem in (b) by changing its size dynamically such that only the target closest to the cursor center is selected. (d) The bubble cursor morphs to encompass a target when the basic circular cursor cannot completely do so without intersecting a neighboring target.

There have also been some efforts [16, 32, 117, 118] to improve target acquisition time by changing the control-display (CD) gain (the ratio between distance moved by the physical input device and distance moved by the visual cursor). By increasing the CD gain in the empty space and decreasing it when the cursor is reaching or over targets, the motor space D/W ratio is decreased. Semantic pointing [16] and Sticky icons [32, 118] used this technique and showed that they outperform regular selection. However, they faced problems when distractors are present, as this slows cursor movement, degrading performance compared to regular pointing.
The angle mouse [117] handled a dense cluster of targets and was able to avoid distractors. However, as stated in the paper, it only improved the performance of people with motor impairments.

2.5.2 Moving Object Selection

There has been little research in this area to investigate and propose new techniques that allow users to easily select moving objects. Mould and Gutwin [89] suggested using target feedback to assist users in targeting moving objects. In their proposal, the object is highlighted once the cursor is over it, indicating that by clicking, the user can successfully select that object. Their results showed that this method helped in reducing the error rate. Nonetheless, the feedback from their participants indicated that it causes visual distraction, especially when objects are close to each other, leading to continual highlighting changes. Furthermore, the time needed to select the object did not improve with this technique, as illustrated in their experimental results.

Comet Tails and Target Lock, shown in Figure 2.18, are two other techniques that were introduced by Gunn et al. [52] to alleviate the difficulty of selecting moving objects. The Comet Tails technique works by providing a larger sensitive area behind the object, on which the user can click to select that object. In the Target Lock technique, when the cursor moves or is placed over an object, the object is highlighted and remains highlighted even when the cursor moves outside the object region (triggering the lock effect). Clicking anywhere then completes the selection of that object. In order to select another object using the Target Lock technique, the user needs to move the cursor over the new object, which then becomes locked. These techniques showed promising results in terms of time and error rate when compared to the chase technique.
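
The Target Lock behaviour described above can be sketched as a small state holder. This is a hedged illustration written from the prose description only, not the implementation of Gunn et al. [52]; the class and method names, the circular hit test and the rule that the first object found under the cursor takes the lock are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MovingObject:
    """A circular moving object (hypothetical representation)."""
    name: str
    x: float
    y: float
    radius: float

    def is_under(self, cx: float, cy: float) -> bool:
        """True if the point (cx, cy) lies inside the object's circle."""
        return (cx - self.x) ** 2 + (cy - self.y) ** 2 <= self.radius ** 2


class TargetLock:
    """Keeps the most recently hovered object locked until another is hovered."""

    def __init__(self) -> None:
        self.locked: Optional[MovingObject] = None

    def update(self, cx: float, cy: float, objects: List[MovingObject]) -> None:
        """Call on every cursor move or animation frame.

        If any object is under the cursor it takes the lock; otherwise the
        previous lock is kept, even though the cursor has left the object.
        """
        for obj in objects:
            if obj.is_under(cx, cy):
                self.locked = obj
                break

    def click(self) -> Optional[MovingObject]:
        """A click anywhere selects whichever object is currently locked."""
        return self.locked


# Example: lock object "a", move away into empty space, then click.
a = MovingObject("a", 0.0, 0.0, 5.0)
b = MovingObject("b", 20.0, 0.0, 5.0)
lock = TargetLock()
lock.update(1.0, 1.0, [a, b])    # cursor over a: a becomes locked
lock.update(10.0, 0.0, [a, b])   # cursor over empty space: lock is kept
selected = lock.click()          # selects a
```

Because `update` is also meant to run as objects move, any object that passes under a stationary cursor takes over the lock in this sketch.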
However, as discussed by the authors, Target Lock resulted in erroneous selections due to the movement of non-target objects under the cursor. This could become even worse when objects are moving too fast, because whenever the cursor passes over any of these objects, they get highlighted. This continual alternation of highlighting without users’ control can cause frustration and visual distraction.

In relation to Hypervideo, to avoid disturbing the video being played, some researchers tried to present the links or hotspots outside the video player area. In HyperCafe [101, 102], hyperlinks were displayed as pictures or text outside the video window, enabling users to select them without the struggle of chasing those hyperlinks. Hyper-Hitchcock [43] took the same approach as HyperCafe, with the links placed on the timeline. Even though watching the video was not disturbed by the links, the user’s attention was continually drawn away from the story presented in the Hypervideo because they had to observe the area where the links are shown.

Similar to the Target Lock technique [52] described earlier, Sundstrom [110] proposed an approach where the cursor is automatically linked to a hotspot in a Hypervideo stream. When the user places the cursor within a sensitive region, the cursor is linked or locked to that hotspot and will automatically track it. The decision to activate the object associated with the hotspot is left to the user, who does not need to manually chase that object over time. The user is assured of activating the intended hotspot even when it is completely or partially overlapped by other objects or is out of the scene. In the case of overlapping hotspots, the system provides three methods to determine which object is to be linked to the cursor: the cursor can be linked to the object whose center is closest to the cursor, linked to the one on top, or not linked.
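
The three ways of resolving overlapping hotspots listed above can be sketched as a single resolution function. This is an illustrative sketch only and not Sundstrom's implementation: the `Hotspot` fields, the rectangular hit test, the `z` stacking order and the policy names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hotspot:
    """Axis-aligned rectangular hotspot (hypothetical representation)."""
    name: str
    x: float       # left edge
    y: float       # top edge
    w: float       # width
    h: float       # height
    z: int         # stacking order; larger means drawn on top (assumption)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def resolve(px: float, py: float, hotspots: List[Hotspot],
            policy: str) -> Optional[Hotspot]:
    """Pick the hotspot to link to a cursor at (px, py).

    policy is one of "closest_center", "topmost" or "none"; it only matters
    when more than one hotspot lies under the cursor.
    """
    hits = [h for h in hotspots if h.contains(px, py)]
    if not hits:
        return None
    if len(hits) == 1:
        return hits[0]
    if policy == "closest_center":
        return min(hits, key=lambda h: math.dist((px, py), h.center()))
    if policy == "topmost":
        return max(hits, key=lambda h: h.z)
    if policy == "none":
        return None
    raise ValueError(f"unknown policy: {policy}")


# Two overlapping hotspots; the cursor at (6, 6) lies inside both.
back = Hotspot("back", 0, 0, 10, 10, z=0)
front = Hotspot("front", 4, 4, 10, 10, z=1)
by_center = resolve(6, 6, [back, front], "closest_center")  # back
by_order = resolve(6, 6, [back, front], "topmost")          # front
```

The example shows that the two linking policies can disagree for the same cursor position, which is why the choice of policy affects what the user ends up activating.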
This approach may have some potential applications, as described by the author; however, the time and precision needed to position the cursor over the intended object remain challenges. Moreover, the ability to activate the hotspot even when it is out of the scene could cause some confusion.

In order to mitigate the challenge of selecting moving objects without introducing any distraction for the user, we proposed a novel selection technique called Hold (described in Chapter 7), which temporarily pauses the content while selection is in progress to provide a static target. It helps the user to select objects while watching a video without the need to use a separate pause button each time or to chase a moving object. The Hold technique works as follows: when a user presses the mouse button down, the moving objects temporarily pause while the user interacts with them. When the user releases the button, the objects start moving again. This method was evaluated against the classical chase technique, and the results have shown that it outperforms the latter for small and fast-moving objects in terms of the time needed to select the object and the error rate.

Figure 2.20: Hasan et al. [53] Comet and Target Ghost moving target selection techniques.

Hasan et al. [53] proposed two techniques for the selection of moving objects: Comet and Target Ghost (Figure 2.20). Their Comet technique is similar to the one proposed by Gunn [52], but they changed the model so that the size of the comet is determined by the speed and the size of the target. Target Ghost, however, uses a static-object selection concept similar to our Hold technique described in Chapter 7. By pressing the shift key, a stationary ghost or proxy of the objects is displayed while the video continues playing. This allows the user to make a selection without missing any information from the video.
The results on one-dimensional (1D) targeting showed that Comet outperformed other techniques in selecting moving objects, while Target Ghost did not improve the selection. In two-dimensional (2D) targeting, Target Ghost outperformed all the selection techniques that had been tested. The authors mentioned that even with these good results, the two techniques have limitations that need to be taken into consideration. With a large number of objects in the scene, both Comet and Target Ghost could increase clutter, resulting in visual distraction and selection degradation.

Our Hold technique showed promising results in reducing the time needed to select moving objects. It introduced a variety of metaphors for selection when applying the Hold metaphor that can be left for future work. Hold is one of the major contributions of this research work and to the HCI community.

2.6 Summary

To summarize, few researchers have examined users’ behaviour while viewing videos [28, 44, 59, 61, 70, 124, 126]. They have looked at this from a collective perspective, not at the individual level. Most of this research is either category specific [28, 44, 70], population specific [28, 44, 59, 70], or event specific [124]. The different aspects that have been analyzed are drop-off rate [59, 61, 70, 124, 126], access peaks [124, 126], session length [124, 126], re-watching peaks [28, 44, 70], number of skips [28, 44], and whether any interactions occurred while viewing [59, 61]. None of the research has examined all of these aspects together, which can give different insights into users’ behaviour. The majority of the research on video viewing has looked at how to utilize this metadata. Researchers have considered utilizing users’ viewing history for filtering [77], caching [124], categorization [123], summarization [108, 111, 125], navigation [69, 82], and popularity prediction [104, 112]. However, most of these tools are not available for consumers or individuals.
Researchers investigated these areas to provide content producers with new tools to define, place and deliver content to users. To be able to give users access to this information, mechanisms to visualize and manage this data need to be defined. To our knowledge, there is no research in this area aside from overlaying the time series graph of the re-watching view count on the video timeline [69, 82].

The related works show the promise of utilizing what users have seen in the past for the design of new tools and features within the video space. We have identified three areas in which to contribute to this thread of research: (1) a thorough analysis of viewing behaviour, considering the different aspects mentioned in the literature and introducing others, to comprehensively understand users’ behaviour and how these behaviours correlate; (2) a history-based video viewing interface that allows users to view and quickly navigate videos and offers users access to and management of their metadata; (3) a new technique to ease the selection of objects within videos for faster activation or manipulation.

Chapter 3

Video Viewing Behaviour

Nowadays users can easily access hundreds of videos with just a simple click. They can enjoy and experience videos from different platforms and on various devices. As users view and navigate various videos, a substantial number of digital footprints are left on these videos, which can be turned into useful information [29] and provide a rich resource that can be leveraged for viewers. Nonetheless, this data is not accessible because online video platforms do not share it.
Researchers [29, 43, 70] have designed their own plug-ins or systems to capture users’ interactions with a video player, which are then mined, aggregated and analyzed to understand people’s consumption practices and how they experience media, and to develop models and tools that improve users’ task performance.

Although previous studies [28, 44, 59, 61, 70, 124, 126] have analyzed different sets of interactions within videos, to the best of our knowledge, nobody has studied general (i.e. any category or domain-independent) videos, looking at how users’ behaviour has changed from traditional watching, where users watch videos once, passively and sequentially. For example, Kim et al. [70] studied learners’ (i.e. students’) behaviour in an education scenario, and Chorianopoulos et al. [29, 44] examined instructional videos. While they found certain behaviours (for example, that re-watching exists) and showed how to apply them later, they did not look at other types of video (e.g. comedy, sports, etc.) or at domain-independent users. In contrast, we are looking at the different categories available in YouTube. We are looking at as many of the possible behaviours as we can just from tracking users’ interactions. We are not actively changing anyone’s behaviour; we are just looking at what they are doing on YouTube using a very simple plug-in (described in Section 3.3.1) on a desktop platform. YouTube was chosen to study users’ viewing behaviour since it is the most popular video streaming site (60% of all online videos are watched through YouTube), it hosts videos from various categories, and more than 4 billion videos are viewed daily on it.

In [69, 70], it has been suggested that re-watching exists only in the category of education and instructional videos. Our data suggests otherwise. Our participants exhibit these behaviours across all video types, and the analysis in Section 3.7.2 confirms this.
Others [59, 124] claimed that users mostly watch videos passively and do not interact with video content; however, in Section 3.7.6 we show that more than 65% of our collected videos were actively watched with frequent interaction from participants. Moreover, none of the previous work has looked at individual users’ behaviour; it mostly looked at the aggregation of all users’ data. In this chapter we show the behaviours exhibited by each participant and across all video types.

In comparison with previous work, our study (1) analyzes general videos; (2) analyzes behaviours at an individual level; (3) analyzes behaviours in each video category; and (4) provides personal viewing patterns based on exhibited behaviours. The important results came from: (1) the analysis in Section 3.7.6, which shows inclusively that participants are actively watching videos; (2) the analysis in Section 3.7.2, which shows that participants re-watch all video types equally and do so often; (3) the analysis in Section 3.7.4, which demonstrates that when participants accessed videos again, it was mostly to refer to or re-watch something previously seen and not to resume from where they left off; (4) the analyses in Sections 3.8 and 3.9, which show that drop-off has little to no correlation with video length and popularity, indicating that it is subjective, with users being uninterested in the content; and (5) the characterization of individual personal behaviours in Section 3.6.

This chapter starts by describing the different interactions logged in our user study in Section 3.1. The types of behaviour are then described in Section 3.2, followed by our user study design in Section 3.3. For the study findings, Section 3.6 discusses the different categorizations identified for individual viewing behaviours. Then, Sections 3.7 to 3.10 provide a detailed analysis of each behaviour, which helped in identifying these sets of behavioural groupings.
Finally, Section 3.12 and Section 3.13 discuss the limitations and the directions for future research based on the observed findings.

3.1 Logged Video Interactions

There is a defined set of functions users can apply to interact with video content in a familiar video player. These interactions allow users to control the playback of a video. We are interested in analyzing how users watch any general video by recording the following actions:

Play: When a user starts playback in any part of a video.
Pause: Every time a user pauses or stops the video playback.
Seek/Scrub: Every time a user jumps to any specific time in the video and continues the playback.
Change Video: Each time a different video is being played.

These actions allow us to capture which videos a user has watched and, more precisely, which parts of a certain video have been viewed. They also show how a user has navigated each video, which can then be used to characterize their viewing pattern.

3.2 Types of Viewing Behaviours

The goal of this chapter is to present the results of users’ watching patterns and how often they occur so that we can characterize their viewing activity. Various viewing behaviours may emerge from users while navigating and watching videos, as indicated in Chapter 2. In this section, we define six behaviours and interactions that we are going to use and analyze in this chapter. These behaviours are determined based on the video interactions defined in Section 3.1. The six behaviours analyzed are as follows:

Skip: Any time a user seeks and starts watching from a point in a video that has not been seen before, as shown in steps 2 and 3 of the skip behaviour in Figure 3.1.
Re-watch: Any time a user watches a portion of a video that has been seen before. It can be either an explicit or an implicit re-watch.
Explicit Re-watch: Any time a user seeks and starts watching from a point that has been seen before (e.g. rewind) (step 2 of the explicit re-watch in Figure 3.1).
Implicit Re-watch: Any time a user seeks and starts watching from a point that has not been seen before but continues watching and encounters some parts that have been seen before, as illustrated in step 3 of the implicit re-watch in Figure 3.1.
Drop-off: Within a single session, if a video is abandoned before the end or is not watched entirely.
Uninterrupted: If a video is viewed sequentially and entirely in a single session without any actions from the user (e.g. skip, rewind in the middle, or drop-off), as shown in the uninterrupted behaviour in Figure 3.1.
Replay: If a video is watched entirely without interruption for the nth time in the same session, where n > 1 (step 2 of the replay behaviour in Figure 3.1).
Revisit: If a video is visited again in another session.

Figure 3.1: Illustration of video viewing behaviours (panels: skip, explicit re-watch, implicit re-watch, replay, and watch sequentially (uninterrupted)): the white circles represent the start position of a user’s viewing interval; grey lines represent the video timeline; red lines indicate the duration and temporal location watched; and the blue arrows indicate an action taken by the user to seek to another time. The numbers indicate the order of user actions.

Some of these behaviours require a definition of a session, specifically how long a gap is required between activity records before a new session is started. Session length varies between users, as it depends on how long a user spends watching videos, the number of videos watched, the length of the viewed videos, and when a user opens and closes the YouTube page. Yin et al. [124] define a session length as the time from when the user hits the play button for a single video until they click on the stop button, minus any paused time. Almeida et al. [10] treat all requests for the same media file as a single session, while Hwang et al. [61] consider consecutive requests for the same video that are less than T apart to be in the same session; otherwise, a request is treated as a new session. Also, if a different video is requested then it is a new session. T was determined based on the normalized length viewed (NLV) of a video, and they found that the change in NLV decreases when T = 1 hour and T = 4 hours. They used T = 4 hours since the number of sessions beyond that was negligible in their data. In web browsing, Catledge and Pitkow [23] measured the time between all events for all users and found that a lapse of 25.5 minutes or greater can be used to indicate the beginning of a new session. Krishnan et al. [72] used 30 minutes of inactivity to indicate the end of one session and the beginning of the next, following the standard notion of a visit/session in web analytics. Our definition of a session is similar to [23, 61, 72]: we measured the inactivity period between consecutive records, for all records across users, to determine session boundaries. Based on the collected data we found that 80% of the records had less than 10 minutes of inactivity from their consecutive records, as shown in Figure 3.2. Thus, in this chapter we consider two consecutive records to be in the same session if the inactive time between them is within 10 minutes; otherwise, these two records belong to different sessions.

Figure 3.2: Frequency of each inactivity period between records. 80% of records have less than 10 minutes gap from their consecutive record. Beyond 50 minutes the frequency at each time point is less than 8 records.

3.3 Study Design

In order to characterize how people navigate and view online videos, we need to obtain information about users and their interactions with viewed videos.
To do so, we developed an extension for Google Chrome, described in Section 3.3.1, that gathers and keeps track of users’ interactions in YouTube, such as play, pause or seek actions, the videos being viewed, and the intervals watched from each video. We wanted to characterize users’ watching patterns without influencing their behaviour, so users watched videos as they normally would, in their own time and at their own pace, while our extension recorded their interactions in the background. These records were then sent to our server to be used for analysis. Using this data, we investigated the following hypotheses and questions:

H1. People do not watch videos passively from start to end.
    Q1. Do people watch videos uninterrupted?
    Q2. Do people skip parts of video?
    Q3. Do people re-watch videos?
        (i) in a single session or multiple sessions
        (ii) entire video or intervals
    Q4. How often do they interact with videos while viewing?
    Q5. Is there any correlation between different behaviours (e.g. skip and drop-off)?
H2. Shorter and popular videos will have more user engagement (i.e. interactions).
    Q1. How often were videos abandoned before the end (i.e. drop-off)?
        (i) Where in a video does drop-off occur the most?
    Q2. Does video length have any correlation with the user’s watching behaviour?
    Q3. Does popularity influence users’ watching behaviour?
H3. Music videos will exhibit more replay actions while Education and How-to videos will have more re-watching actions.
    Q1. Does the type of category affect users’ watching behaviour?
    Q2. Do replayed videos come from playlists?

3.3.1 Logging Users’ Viewing Behaviour

To log users’ viewing behaviour, we developed the Video Viewing Behaviour extension (VVB), an extension to the Google Chrome browser built using HTML and JavaScript. The extension ran in the background while the user browsed the web using Google Chrome. It started recording once the user started watching any video in YouTube. Each time the user watched or interacted (i.e. play, pause, seek) with the video being played, VVB kept a record of the URL, video ID, video title, video duration, video category, number of views, start time of the interval being watched, end time of the interval, and the user’s local time to identify when the interval was watched. The end time of the interval was updated every 10 ms while the user was watching, to avoid losing information in case of an unexpected browser failure. It was also recorded when the user paused, sought, navigated to another video, navigated away from YouTube, or simply closed the browser. These records were kept in the browser’s local storage on the user’s machine and were sent, along with the user’s identifier (ID), to our server the next day the user opened the Google Chrome browser. The records were stored on our server for later analysis.

3.3.2 Participation Procedure

For participant recruitment, a webpage was designed to describe the purpose of the study, how the extension works, how to install it, and the ethics approval, and to provide the extension files to be installed, or verified by users who wanted to check the code before installation. Emails asking people to participate, with a link to the extension’s webpage, were sent to friends, friends of friends, and student lists at the university. Once a person had installed the extension, they were asked to sign a consent form electronically before the extension could start recording their behaviour. If they consented, they were asked to fill out a form with some details about their gender, age category, how often they watch online videos, and how many videos they watch per session, as shown in Section B.1. This data was then sent to our server, which in return sent an ID to the extension in the participant’s browser to be used as a unique identifier for future data. At this point VVB was ready to record the user’s behaviour.
Participants had the option to opt out of the study at any time by stopping the extension from recording or simply by deleting the extension using chrome://extensions/. This user study was approved by the University of British Columbia Behavioural Research Ethics Board [certificate #: H13-01589].

3.3.3 Participants

Nineteen online volunteers, 11 male and 8 female, participated in this study. Participants ranged in age from 19 to 40: nine participants were 19-25, six were 26-30, and four were 31-40 years old. Participants reported watching videos either on a daily basis (eleven participants), 3-5 times a week (six participants), or once a week (two participants). Thirteen participants reported watching 1-3 videos per session, three participants 4-6 videos, one participant 7-10 videos, and two participants more than 10 videos per session. The duration of the experiment and the number of active days (i.e. days on which the participant actually watched videos in YouTube) were measured for each participant, as shown in Table 3.1. The duration of the experiment is the time from when the user installed the extension until the time of the last record sent to our server, while the number of active days is measured by counting the days on which a user had at least one record.

Table 3.1: Demographics summary for participants in the video viewing behaviour study. Watching frequencies and videos per session are the values reported by participants. (Note: duration in days)

P    Gender   Age Group   Watching Frequency   Videos/Session   Experiment Duration   Active Days
1    Female   19-25       3-5 / week           1-3              16.08                 8
2    Female   31-40       Daily                1-3              41.96                 6
3    Female   19-25       3-5 / week           1-3              6.04                  4
4    Male     19-25       Daily                1-3              27.46                 14
5    Female   26-30       Once a week          1-3              11.99                 4
6    Male     26-30       Daily                > 10             79.14                 29
7    Male     26-30       Daily                1-3              77.42                 38
8    Male     19-25       Daily                1-3              10.40                 7
9    Male     31-40       3-5 / week           4-6              59.22                 10
10   Female   19-25       Daily                1-3              75.53                 52
11   Male     19-25       3-5 / week           7-10             21.06                 13
12   Male     19-25       Daily                1-3              22.15                 3
13   Female   19-25       3-5 / week           4-6              1.04                  2
14   Male     19-25       Daily                1-3              76.07                 70
15   Female   31-40       3-5 / week           1-3              68.85                 49
16   Male     31-40       Daily                1-3              69.50                 20
17   Male     26-30       Daily                > 10             7.59                  7
18   Male     26-30       Daily                1-3              15.24                 4
19   Female   26-30       Once a week          4-6              19.79                 6

3.4 Collected Video Interaction Dataset

Our dataset consists of interaction logs from YouTube videos for nineteen users over the period from December 25th, 2013 to May 27th, 2014. Each participant had a different experiment duration, based on when they installed the extension and when they stopped or deleted it, as shown in Table 3.1. Each log entry consists of user ID, time of access, video URL, video ID, video title, video length, video category, number of views, start time of the interval being watched, and end time of the interval. We gathered a total of 4,786 interaction logs, or records, from the nineteen participants, which came from 2,129 unique videos. The distribution of the number of unique videos among participants is shown in Figure 3.3. Figure 3.4 illustrates the number of collected records per participant, which shows a similar trend to the number of videos, with the exception of participant 11, who seemed to be more active with fewer videos in comparison to participants 14 and 15, who had a higher number of videos.

Figure 3.3: Number of unique visited videos per participant. Participants 7, 10, 14 and 15 had the most videos while participants 12, 13 and 18 had the fewest.

Figure 3.4: Number of collected records per participant. Participants 7, 10 and 11 had the most records while participants 12, 13 and 18 had the fewest.

3.4.1 Data Clustering

In order to look for differences between groups as well as between individual videos or participants, we clustered the data in two ways: by the activity level of participants and by the length of videos.

Grouping Participants

To explore the differences between heavy, medium and light viewers, we used k-means clustering with k = 3 to divide participants into groups based on their total number of records, number of viewed videos and number of active days in the experiment. Based on this clustering, heavy viewers include participants 10 and 11, medium viewers are participants 7, 14 and 15, and the rest of the participants are light viewers. Heavy viewers were more active while viewing videos, as can be seen in Figure 3.5: they watched fewer videos on average than medium viewers but had more interactions on average, as indicated by the number of records.

[Figure 3.5 panels: Average no. of videos; Average no. of records]
Figure 3.5: Average number of visited videos and records per group. Medium viewers watched more videos on average, while heavy viewers had the most records, which indicates more activity among these users.

Video Duration Clustering

To examine the effect of video duration on users’ interactions, k-means clustering with k = 3 was performed to divide videos into three categories based on their length: short, medium, and long. The clustering based on the collected data showed that short videos are videos within 193 seconds, medium videos have a duration between 193 seconds and 446 seconds, while long videos are longer than 449 seconds. Applying these groups, there were 774 short videos, 914 medium videos and 441 long videos in total.

3.5 Analysis 1: Watched Categories

The dataset was analyzed to examine each video’s category and showed that the watched videos came from 16 different categories as defined by YouTube.
These categories, with the number of videos in each, are: Music (681 videos), Science & Technology (289), Entertainment (238), Education (196), People & Blogs (163), Comedy (158), Gaming (112), Sports (98), Film & Animation (68), How-to & Style (59), News & Politics (25), Pets & Animals (19), Travel & Events (9), Nonprofits & Activism (8), Autos & Vehicles (5), and Trailers (1). Categories that had less than 2% of the total number of videos were grouped together in a category called Other. Figure 3.6 illustrates the distribution of videos among the new categories.

Figure 3.6: Number of visited videos per category. Music had the most videos (681), followed by Science & Technology, which had 289 videos.

Similar to the Sysomos Inc. statistics on YouTube and [26, 42, 123], Music and Entertainment were among the most watched categories. Moreover, the percentage of Music videos is close to what is reported by Sysomos Inc. (∼31%). However, in our data, Science & Technology was the second most watched category, whereas it was in 11th place according to Sysomos Inc. and was not mentioned in [26, 42, 123]. Education was also highly watched among our participants, which was not the case in previous research: it came 7th in the Sysomos Inc. analysis. This might be because many of our participants are students, since we sent invitations for participation through the university mailing lists. Nevertheless, we cannot verify this explanation, as we did not collect any data about participants’ occupations or areas of interest. The other categories showed similar popularity among users to that presented in previous work [26, 42].

3.6 Analysis 2: Participants’ Viewing Patterns

The goal of this study was to characterize each individual user’s watching patterns and how often they occur, to help define some design guidelines for a future video interface.
Thus, we analyzed the dataset for each participant based on the six behaviours defined in Section 3.2. Participants showed large variations across the different behaviours. Moreover, each participant performed each behaviour with a different frequency. This indicates that individuals have a defined way of watching videos and that it changes based on how much they actually watch. Thus, in this section we categorize each participant’s viewing pattern based on the behaviours they exhibited and how frequently these occurred. A detailed analysis of each behaviour is discussed in the following sections. Figure 3.7 shows the different shapes of participants’ behaviours. The data for each behaviour was computed as the number of times that behaviour occurred divided by the number of videos that contained the behaviour. These values were then normalized by the maximum behaviour value the participant exhibited.

[Figure 3.7 panels: Participant 1 through Participant 19]
Figure 3.7: Participants’ viewing patterns based on their own normalized behaviour and action frequencies.

Based on the analysis, four main behavioural groupings appeared for most participants: skipper, re-watcher, replayer, and revisiter. The skipper category includes participants 3, 4, 5, 6, 7, 8, 10, 13, 14, 15, 16 and 19, the revisiter group has participants 6, 7, 14, 15, 17 and 19, while the replayer category contains participants 2, 7, 11, 15, 16 and 19. All participants except four (2, 8, 11, and 12) are re-watchers.
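The per-participant normalization described for Figure 3.7 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the analysis code used in the study: the function name `behaviour_profile`, the dictionary layout and the example counts are all hypothetical.

```python
def behaviour_profile(action_counts, video_counts):
    """Scale one participant's behaviour frequencies to their own maximum.

    action_counts: behaviour name -> number of times the behaviour occurred.
    video_counts:  behaviour name -> number of videos containing the behaviour.
    Both inputs are hypothetical structures chosen for this sketch.
    """
    rates = {}
    for behaviour, actions in action_counts.items():
        videos = video_counts.get(behaviour, 0)
        # Average number of actions per video that contained the behaviour.
        rates[behaviour] = actions / videos if videos else 0.0

    peak = max(rates.values(), default=0.0)
    if peak == 0:
        # The participant showed none of the behaviours.
        return {behaviour: 0.0 for behaviour in rates}
    # Normalize by the participant's most frequent behaviour.
    return {behaviour: rate / peak for behaviour, rate in rates.items()}


# Made-up counts for a single participant.
profile = behaviour_profile(
    {"skip": 12, "re-watch": 10, "replay": 0, "revisit": 3},
    {"skip": 4, "re-watch": 5, "replay": 0, "revisit": 3},
)
print(profile)
```

With these made-up counts the skip behaviour has the highest per-video rate, so it is scaled to 1.0 and the remaining behaviours are expressed relative to it. Because each participant is normalized against their own maximum, such a profile describes the shape of a viewing pattern rather than its absolute level.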
Detailed participant viewing patterns are as follows:

Participant 1: She can be categorized as a re-watcher since it is the most frequent behaviour pattern for her.
Participant 2: She seems to be a replayer.
Participant 3: She is an active re-watcher and skipper.
Participant 4: He is a re-watcher and skipper similar to participant 3 but he tended to leave many videos before they finished.
Participant 5: She is another re-watcher and skipper.
Participant 6: He is similar to the previous participants but he is also a revisiter. Participant 6 seems to go back and watch the same videos in different sessions.
Participant 7: He appears to be very active while viewing videos. He is a re-watcher, skipper, revisiter, and replayer.
Participant 8: He is a skipper viewer with some re-watching and revisiting.
Participant 9: Similar to participant 1, he is a re-watcher with few skips.
Participant 10: She is a re-watcher and skipper with some replaying and revisiting similar to participant 3.
Participant 11: He is a heavy replayer similar to participant 2.
Participant 12: He is a passive viewer who never interacts with the videos while watching.
Participant 13: Another skipper with few re-watching.
Participant 14: Similar to participant 7, he is an active viewer who interacts so often while watching either by re-watching, skipping, replaying, or revisiting videos again.
Participant 15: She is another active viewer similar to 7 and 14.
Participant 16: He shows a slightly different shape of viewing pattern. He is a heavy re-watcher, skipper, and replayer but never revisits videos.
Participant 17: He is the opposite of participant 16 in that he is a heavy revisiter but never replays videos in the same session.
Participant 18: He seems to get bored so easily with videos, which leads him to leave many videos before they finish, otherwise he is similar to participant 5.
Participant 19: Similar to participant 7, she is a heavy re-watcher, skipper, revisiter and replayer.
However, participant 19 tends to watch more videos passively.

3.7 Analysis 3: Viewing Behaviour

To be able to categorize the participants’ behaviour described earlier, we needed to take a closer look at each behaviour. Thus a detailed analysis of each behaviour defined in Section 3.2 was performed at three different levels. First, to get a sense of the overall trend, the entire dataset was analyzed for each behaviour. Second, to explore how frequently the behaviour emerged in each category, the data was analyzed per category. Lastly, the data was analyzed separately for each participant to verify that these behaviours occur in individual data and not only collectively (additional data is available in the Appendix, Section C.1).

3.7.1 Skip Behaviour

A user may skip parts of a video to search for known items or portions of the video (e.g. a specific step in an instructional video), to check whether it is the targeted video, to check whether it is interesting to watch, or to pass over uninteresting portions [12]. This is also clear from the development of the Wadsworth constant on YouTube, where users can skip, for example, the first 30% of a video because it contains no worthwhile or interesting information. Users may have more specific information needs and selectively watch a video. Any time a user breaks the continuous viewing of a video by jumping to a part of the video that they have not seen before, the number of skips for this video is increased by one. If a video contains at least one skip, it is marked as containing the skip behaviour.

Overall

The data revealed that 20% of the videos contained at least one skip action. Most of the skipped videos came from the Music category (29%), participants 7 (27%) and 10 (25%), and the medium viewers (52%). This indicates that participants do actually interact with a video to skip parts of it. This confirms our hypothesis (H1 and H1. Q2) and coincides with [46, 61, 70].
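The seek-based definitions from Section 3.2 (skip, explicit re-watch and implicit re-watch) can be sketched in Python as follows. This is a minimal illustration rather than the study's implementation, and it rests on assumptions that are not stated in the text: one video's watched intervals are given as (start, end) pairs in seconds in viewing order, an interval that begins where the previous one ended is treated as a resume after a pause rather than a seek, the first interval is never counted as a seek, and an implicit re-watch also counts as a skip because it starts at an unseen point. All function names and the `tolerance` parameter are hypothetical.

```python
def is_seen(seen, t):
    """True if playback position t (seconds) lies inside a watched interval."""
    return any(start <= t < end for start, end in seen)


def overlaps_seen(seen, start, end):
    """True if [start, end) shares any time with a watched interval."""
    return any(s < end and start < e for s, e in seen)


def classify_intervals(intervals, tolerance=1.0):
    """Label each watched interval of one video with the behaviours it shows.

    intervals: list of (start, end) pairs in seconds, in viewing order.
    tolerance: assumed slack (seconds) when deciding whether a new interval
               simply resumes from where the previous one stopped.
    Returns one set of labels per interval.
    """
    seen = []
    labels = []
    previous_end = None
    for start, end in intervals:
        tags = set()
        # Assumption: a jump away from the previous end position is a seek.
        is_seek = previous_end is not None and abs(start - previous_end) > tolerance
        if is_seek:
            if is_seen(seen, start):
                tags.add("explicit re-watch")
            else:
                tags.add("skip")
                if overlaps_seen(seen, start, end):
                    # Started at an unseen point, then ran into seen content.
                    tags.add("implicit re-watch")
        seen.append((start, end))
        labels.append(tags)
        previous_end = end
    return labels


# Hypothetical viewing history for one video.
history = [(0, 30), (60, 90), (10, 20), (40, 70)]
for interval, tags in zip(history, classify_intervals(history)):
    print(interval, sorted(tags))
```

In this made-up history the second interval is a skip, the third is an explicit re-watch, and the fourth is a skip that turns into an implicit re-watch once playback reaches the portion from 60 to 70 seconds. Counting the intervals labelled "skip" per video gives the number of skip actions used in this section.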
Looking at how often this behaviour occurs on average per video, we found that when a video was skipped there were on average around 3 skip actions on that video. This indicates that skipping does not happen by chance and that participants intentionally skipped parts of videos for different purposes. This was also found in [12]. It would be useful to relate these skips to users’ intentions; however, as we mentioned earlier, this might reveal participants’ identity. We could have asked participants to give feedback on each action, but this would have been restrictive.

Per Category

The skip behaviour was seen significantly more often in videos from the Gaming category, as illustrated in Figure 3.8. This might be because users watch these videos looking for tricks they can apply, which explains the selective watching behaviour where participants skip parts that they already know. All other categories had results similar to the overall trend, in which around one-fifth of the videos in each category contained skips, with the exception of the Science & Technology category, which was significantly below the average at only 13%. Since most of the Science & Technology videos were short (i.e. less than a minute), participants might not have needed to skip any content, which may explain the low percentage of skipped videos in this category.

Figure 3.8: Percentage of skipped videos per category. Gaming (45%) showed a high percentage of videos being skipped while Science & Technology (13%) had a significantly lower percentage than the overall average.

A skipped video in every category had at least two skips on average. Again, this suggests that these skips were made on purpose. A skipped educational video had the highest number of skip actions, with around 5 skips on average per video. It could be that participants were searching for specific information within the video, which causes a lot of skips as a result of misses.
This was also seen in [70], where learners skipped often while watching lecture videos.

Per Participant

In terms of individual participants, six participants (4, 5, 7, 8, 9, and 10) had significantly more videos being skipped than average. More than 30% of their videos contained skip actions, as shown in Figure 3.9, while five participants (2, 12, 15, 17, and 19) had significantly fewer videos skipped than the overall average. Participant 12 had no video with skip actions, which can be explained by the fact that we did not have enough data on him, since he had only 3 records for 3 different videos on 3 different days. Participant 12 might have had some privacy concerns with VVB being on, which caused him to stop the extension, or he might have watched more videos on other devices that did not have the extension installed, since he reported watching videos on a daily basis.

Figure 3.9: Percentage of skipped videos for each participant. Participants 4, 5, 7, 8, 9 and 10 had more than 30% of their viewed videos being skipped.

Regarding the frequency of the skip behaviour per video, we found that apart from participant 12, who did not have any skipped videos, each participant had at least one skip action on average per skipped video. Participants 8 and 13 seemed to perform many skips when they skipped any video: they had 4 to 5 skip actions on average per skipped video. Most of participant 8’s skips occurred in videos from the Entertainment category (8 skips on average per skipped Entertainment video), while participant 13 had most of the skips in Science & Technology videos. An example of one of the videos viewed by participant 8 is shown in Figure 3.10, which illustrates how he navigated and jumped around the video trying to find a part of interest and, once it was found, watched that part for an extended period of time.

Figure 3.10: An illustration of how participant 8 viewed one of the videos. It is a clear example of how he skipped through the video multiple times trying to find a position of interest and then watched for an extended period of time.

In summary, participants tend to skip videos intentionally to find specific information within videos. More than one-fifth of the videos were skipped, and the Gaming videos showed the highest percentage of videos being skipped by participants. As expected, heavy viewers skip videos more often in comparison to the other two groups (i.e. medium and light viewers). These results indicate that users tend to watch videos actively (i.e. interacting with the playback) when they have particular information to look for. These observations confirm that users do not watch videos passively (H1).

3.7.2 Re-watch Behaviour

A user may go back or rewind to watch some parts that they have seen before because they are interested in that part, to clarify any confusion, or to return to missed content [30, 70]. For example, in an instructional or tutorial video a user may re-watch some steps to make sure that they follow the steps correctly, especially if they are applying them while watching. Any time a user watches any part of a video that has been seen previously, the number of re-watch actions is incremented, and any video that has a number of re-watches larger than one is considered to contain a re-watch behaviour.

Overall

The data revealed that 25% of the collected videos had a re-watch action, either implicit, where a person seeks to a point they have not seen but, while continuing to watch, encounters a part that has been seen before, or explicit, where a person goes back in a video to deliberately watch a part that has been seen before. 24% of videos had explicit re-watch actions while only 1% had implicit re-watch behaviour. The overall observations show that participants do actually watch parts of videos again and that this most often happens intentionally, which concurs with [17, 46]. Looking at the number of re-watch actions that occurred on each re-watched video revealed that there were on average 3 portions being re-watched per re-watched video. These results answer H1. Q3 and once more confirm that people do interact with videos while watching and do not just sit and watch a video from beginning to end.

Per Category

Re-watch behaviour was seen significantly more often in the How-to & Style, Film & Animation, Sports, and Gaming categories, where more than 30% of the videos contained a re-watch behaviour (Figure 3.11). On the other hand, Comedy, Science & Technology, and People & Blogs had significantly lower percentages of videos that contained this behaviour. Film & Animation exhibited the highest percentage (∼37%) of videos being re-watched, which may be due to the fact that most animated movies contain a mix of comedy, action and emotion, which are some of the key factors [100] prompting users to go back and watch the related parts again and again.

Figure 3.11: Percentage of re-watched videos per category. Each category had at least 20% of its videos containing a re-watch behaviour. Film & Animation exhibited the highest percentage (37%) of videos containing a re-watch activity, which may be due to the mix of comedy and emotion these videos contain.

In terms of how frequently this happened per video, a re-watched educational video showed the highest average number of re-watch actions, containing seven re-watched portions on average, which confirms the second part of hypothesis H3. This can be due to missing content, confusion, or the need to follow some steps within the video. Similar behaviour in educational videos was also found in [17, 70].
How-to & Style and Entertainment re-watched videos had three re-watch actions, while all other categories exhibited two re-watch actions on average per re-watched video. Having at least 20% of each category’s videos contain a re-watch behaviour, with each re-watched video containing at least two re-watch actions, shows that this behaviour can emerge in any category, which contradicts the claim of other researchers [28, 44, 69] that this behaviour occurs in Educational and How-to videos only.

Per Participant

In regard to individual analysis, participants 3, 5, 8, 9, and 18 had a significantly higher percentage of videos that were re-watched (more than 33%), as illustrated in Figure 3.12. However, these participants, aside from participant 8, watched few videos over the duration of the experiment. Around 37% of participant 8’s videos contained re-watched portions. Looking at the frequency of this behaviour per participant showed that participant 9 tended to perform this action multiple times per video: he had eight actions on average per re-watched video. Participants 1, 5, 10, and 11 re-watched four to five portions on average per re-watched video. These results confirm that the re-watch behaviour occurred not only in the crowd data but also for each individual.

Figure 3.12: Percentage of re-watched videos for each participant. Participants 3, 5, 8, 9, and 18 had a significantly higher percentage of videos that exhibited re-watch actions (more than 33%).

To summarize, the re-watch behaviour is common in any type of video and among participants. More than one-fourth of the viewed videos contained a re-watch behaviour, which confirms users’ high engagement with videos. This data provides a potential approach for detecting which parts within videos are appealing, which can be used to recommend clips for others to watch or to generate a video’s abstraction and summarization.

3.7.3 Replay Behaviour

The data was analyzed to examine whether people watch videos in their entirety multiple times in the same session, and whether they do this explicitly per video or as part of repeating a playlist. If a video is played in its entirety another time in the same session, then the number of replay actions for this video is incremented. We anticipated that few videos would be replayed and that most of these would be Music videos coming from playlists, since users may play a playlist in the background while they are doing other tasks on their computers. We also predicted that Comedy and animated short videos would be replayed, since they are funny, appealing and not too long to be re-watched entirely.

Overall

The results showed that only 4.5% of the videos were replayed entirely and only one-third of these were from a playlist, which answers our question H3. Q2. This indicates that when people replay a video, most of the time they deliberately watch the entire video again and not only because it is part of a repeated playlist, which contradicts what we expected. As we expected, most of the replayed videos were Music videos (77%), followed by videos from the People & Blogs category (9%). Most of the replayed videos from the People & Blogs category were short videos (75%), which may be the reason for re-watching the entire video rather than only small portions of it. Some videos that are categorized as People & Blogs on YouTube contain funny clips, which may explain the replay behaviour on these videos.

Investigating how frequently this behaviour occurred per video revealed that when a video is replayed, it has a high number of replay actions, 7 on average. Going back to H1.
Q3, people do re-watch the entire video in the same session multiple times but not so frequently.

Per Category

Based on video categories, shown in Figure 3.13, the percentage of the videos being replayed was very low. The Music category had a significantly higher percentage, with ∼11% of its videos being replayed, confirming the first half of our hypothesis H3. We think most of these replayed music videos were run in the background while users were working on other tasks; however, our data cannot confirm this claim. In contrast, Sports and Gaming videos were never replayed in the same session, though they had many re-watched parts, as mentioned in the previous section (Section 3.7.2).

Figure 3.13: Percentage of replayed videos per category. Some categories had no replayed videos while the others had only a few. The Music category had the highest percentage of videos being replayed (∼11%) while Sports and Gaming videos were never replayed in the same session.

A result close to the overall frequency of actions per video was also seen in the replayed videos from the Music category, where a replayed music video had around 9 replay actions. This high frequency of replay actions per video in a single session is consistent with the video being played in the background. Replayed videos in other categories, apart from How-to & Style, had only one replay action per replayed video, which is what we predicted. The How-to & Style category had only one replayed video, which had 4 replay actions.

Per Participant

Participants also showed a similar variation in the percentages of videos that contained a replay action (Figure 3.14). For example, participants 2, 3, 11, and 15 had more than 10% of their videos being replayed, while nine participants (4, 5, 6, 8, 9, 12, 13, 17, and 18) did not replay any video in the same session. Participant 11 had nineteen videos replayed out of his total 103 videos.
The number of actions per replayed video for each participant showed that each participant had only one replay action per replayed video, with the exception of participants 2 and 11, who had more than five replay actions on average per video. A replayed video for participant 11 had seven replay actions on average.

To sum up, replay behaviour was rare: only 4.5% of videos were replayed, and this was also observed across categories and participants. This infrequent behaviour aligns with passive viewing of a video, since the replay behaviour does not require any interaction from the user while watching the video content. Users only need to interact with the video once it finishes playing in order to replay it; thus, the viewing of this particular video was passive. Another justification for it being passive viewing is the replayed videos from playlists, where a video is automatically replayed without any action from the user. This tiny percentage of replayed videos again confirms that users do interact with videos while watching.

Figure 3.14: Percentages of replayed videos per participant. Participants 2, 3, 11, and 15 had more than 10% of their videos being replayed while nine participants (4, 5, 6, 8, 9, 12, 13, 17, and 18) did not replay any video in the same session.

3.7.4 Revisit Behaviour

We have seen that people re-watch parts of videos multiple times and that they rarely re-watch the entire video in the same session. What about different sessions? Do people return to view a video they have seen before? And if they do, how often does this happen? In order to answer these questions, the data was analyzed for revisit behaviour, where a video is accessed again but in a different session, either to re-watch it entirely or just to find and enjoy some parts of its content.
Each time a video is accessed again but in another session, the number of revisits for this video is incremented and the video is marked as a revisited video.

Overall

Around 13% of the total videos (269 videos) were revisited, and around half of the re-watch behaviour occurred over multiple sessions. Thus, participants re-watch portions of videos not only in the same session; they also come back to these videos to enjoy the content they liked again or to refer back to some parts of them. 54% of videos were revisited to be resumed while the rest were accessed to refer back to some content that had been seen previously. These findings answer H1. Q3 and verify that activity within a single video does not stop within one session but reoccurs when the video is accessed the next time. Having this data recorded and accessible to users offers the potential for an easy way to find what a user has previously seen.

The majority of revisited videos were from Music (41%), followed by Science & Technology (16%) and then Education (10%). Users may go back to educational videos to refer back to some content or simply to resume from where they left off [46, 61]. The data showed that most revisited educational videos were in the long videos group, which may confirm the resume action. Most revisited videos from the Music category were in the medium length group, where 96% of them were revisited to re-watch and enjoy some of their content again. These results show that people resume watching videos in multiple sessions, not only within a single session as shown earlier in Section 3.7.3.
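The revisit counting described in this section can be sketched as follows. This is a minimal illustration that assumes a hypothetical access log of (session id, video id) pairs; the log format and the function names are assumptions made for this example and do not describe the plug-in used in the study. Repeated accesses within the same session are deliberately not counted as revisits here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log entry: (session_id, video_id), one per access of a video.
Access = Tuple[str, str]


def count_revisits(accesses: Iterable[Access]) -> Dict[str, int]:
    """Revisits per video: one for each distinct session after the first."""
    sessions_per_video: Dict[str, Set[str]] = defaultdict(set)
    for session_id, video_id in accesses:
        sessions_per_video[video_id].add(session_id)
    return {video: len(sessions) - 1
            for video, sessions in sessions_per_video.items()}


def revisited_share(accesses: Iterable[Access]) -> float:
    """Fraction of distinct videos that were accessed in more than one session."""
    revisits = count_revisits(accesses)
    if not revisits:
        return 0.0
    return sum(1 for n in revisits.values() if n > 0) / len(revisits)


# Made-up log: video "a" is seen in three sessions, video "b" in one.
log = [("s1", "a"), ("s1", "b"), ("s2", "a"), ("s3", "a"), ("s1", "a")]
revisits = count_revisits(log)
share = revisited_share(log)
```

Keeping a set of sessions per video, rather than a raw counter, is one simple way to make sure that replays or re-watches inside a single session do not inflate the revisit count.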
The data was also examined for how often this behaviour occurred per video, and the results revealed that on average a revisited video was accessed in two different sessions.

Per Category

Analyzing each category individually showed that the Music, Education, Science & Technology and Sports categories had more than 14% of their videos being revisited (Figure 3.15). Most of the Music videos were revisited to be re-experienced, while the videos from the Science & Technology and Sports categories were accessed again mostly to be resumed from where they were left. Education videos were revisited both to resume the playback and to refer to previous content. On the other hand, the How-to & Style, Comedy and Entertainment categories had less than 7% of their videos revisited. This may be because these categories showed high re-watching activity (Section 3.7.2), where participants enjoyed and experienced the content within a single session. Videos that were revisited from the Comedy and Entertainment categories were resumed and re-experienced over multiple sessions, whilst How-to & Style videos were revisited to refer back to some previously seen content. Similar activity to the overall trend was seen for categories, where a revisited video from each category had between one and three revisit actions on average.

Figure 3.15: Percentage of revisited videos per category. More than 14% of the viewed videos in Music, Education, Science & Technology and Sports categories were revisited, whereas only 7% of videos from How-to & Style, Comedy and Entertainment categories were accessed again in multiple sessions.

Per Participant

With regard to individuals, illustrated in Figure 3.16, participants 2, 3, 11 and 19 had more than 20% of their videos revisited, while participants 4, 5, 12, 13, 16, and 18 did not revisit any video.
Participants 2 and 3 viewed only a few videos, 10 and 23 videos respectively, and having one-fifth of these videos accessed multiple times shows high revisitation activity. Participant 3 mostly revisited Music videos to enjoy them again. However, participant 19 went back to watch Science & Technology videos in different sessions, where 75% of the time the purpose was to continue watching these videos from where they were left and to re-experience some of their content. The lack of this behaviour for some participants could be explained by the small number of videos they watched over the duration of the experiment (e.g. participant 18 watched only 4 videos over 4 days).

The analysis of how frequently this happened per video for each participant showed that all participants who revisited a video, except participant 11, had one or two revisit actions on average for that video. When participant 11 revisited a video, he tended to go back to it four times on average. Most of these came from Music videos, where he re-experienced the same video again.

Figure 3.16: Percentages of revisited videos per participant. Participants 2, 3, 11 and 19 had 20% or more of their videos being accessed in multiple sessions while six participants (4, 5, 12, 13, 16, and 18) did not revisit any video in multiple sessions.

In short, revisit behaviour appeared in 13% of the videos watched by participants and across all categories, where 46% of these were accessed multiple times to be re-experienced. Considered against the number of videos uploaded to YouTube, this means that a large number of videos are actually being accessed multiple times by users. Hence, having this data available to users would help them easily access these videos when needed. Furthermore, having access to every part that was watched would be much appreciated, since users would not have to look for the parts that they would like to refer to or to enjoy.
The data showed that a revisited video is accessed twice on average, which indicates the necessity of having access to such information.

3.7.5 Drop-off Behaviour

One of the known issues with online videos is that viewers tend to abandon videos before reaching the end and do not watch the entire video [59, 61, 124, 126]. To investigate how often this happened in our dataset, we looked at any video that was not watched entirely, any video that was left before the end, and when a video was left or abandoned. A viewed percentage of each video was computed as the ratio of the watched content to the video duration. Videos that had a ratio of less than one were considered abandoned videos. Some researchers (e.g. [61, 124]) found that the length of the video has an effect on the drop-off ratio. We would like to examine how this compares with our data.

Overall

The results revealed that around 60% of the videos were not watched entirely, either because the starting part was skipped or because they were abandoned before they reached the end. This confirms the findings of [59, 61, 124, 126] and answers our question in H2. Q1. This may be because participants found that the video was not the one they were looking for, because they found what they were looking for, or because they simply lost interest in the content. Most of the dropped videos came from the Music category (28%), where two-thirds of these videos were medium length videos. 5% of these music videos were shared videos where the starting part is skipped (i.e. shared using the share-from-a-specific-time feature on YouTube). This suggests a use case where recorded data can help users to easily share the parts they want from a video by simply using their own viewing history. 13% of the Music videos that were not entirely watched were skipped from the start.
This signifies that users are applying the Wadsworth constant by themselves, skipping to the interesting information in the video.

To determine how much of a video was watched, we looked into when a video was abandoned and how much of it was actually watched. Figure 3.17 shows how long videos were watched before being abandoned by illustrating how many videos were abandoned after t seconds from the start of the video. 50% of the abandoned videos were left within 2 minutes from the start while 10% were left within the first 10 seconds of the video which, according to YouTube’s Creator Playbook, is the time needed to hook viewers into watching the rest of the video. Looking into each group of video durations (i.e. short, medium, and long), the data revealed that 50% of the short videos were only watched for 48 seconds, 50% of medium videos were left after 156 seconds, while 50% of the long videos were abandoned after 237 seconds. Participants tended to watch long videos for much longer before abandoning them, in contrast to medium and short videos. This might be because short videos are more concise, so a short portion of a short video can give a hint about its content, and it thus gets abandoned earlier than medium and long videos.

Figure 3.17: Number of videos being left before t seconds from the start of the video. 10% of abandoned videos were left within the first 10 seconds of the video.

Examining the data on how much of a video is watched before stopping it, we found that 10% of the abandoned videos had just 3% of their content watched, as illustrated in Figure 3.18. 47% of the videos were abandoned before reaching half of their content, which indicates that large portions of videos went unwatched. Looking more closely into the duration groups, we found that 49% of short videos were abandoned before reaching 50% of the video length and only 6% of the short videos were watched entirely.
In the medium length videos, 53% of the videos were left before reaching 50% of their duration and again only 6% were watched completely. However, 60% of long videos were abandoned before reaching halfway through their content and 8% were watched completely. In comparison to medium and short videos, shorter portions are watched from the majority of long videos, which matches the results of [61, 124]. Hwang et al. [61] found that around 55% of the long videos were stopped before reaching 40% of the video length, while for the shortest videos fewer than 20% were dropped before viewing 40% of the video. These findings can be used for the benefit of users, who could filter their viewing history based on how much of each video they have watched.

Figure 3.18: Number of videos per viewed percentage. 10% of abandoned videos had 3% of their content being watched.

Per Category

Looking at the percentage of videos that were not completely watched within each category revealed that Gaming had a very high percentage of videos being left before the end, with around 78% of its videos abandoned or not watched entirely. Usually people watch these kinds of videos looking for some tricks, and once these are found and mastered, users stop playing the rest of the video. This can also be justified by the high percentage of videos in this category that were skipped so often, as shown in Section 3.7.1, which confirms the search approach. Other categories had at least half of their videos not completely watched, as shown in Figure 3.19.

Figure 3.19: Percentage of abandoned videos per category. 78% of the viewed videos in Gaming were not watched entirely while around 50% of Science & Technology videos were abandoned before they finished playing.
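The viewed percentage used throughout this section can be illustrated with a minimal sketch. It assumes, hypothetically, that the viewing of a video is logged as a list of (start, end) intervals in seconds; the interval format and the function names are illustrative assumptions, not the data model of the plug-in used in the study. Overlapping intervals are merged so that re-watched content is counted only once, which is one possible reading of "watched content".

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) pair in seconds


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so content is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def viewed_ratio(intervals: List[Interval], duration: float) -> float:
    """Ratio of distinct watched content to the video duration (0.0 to 1.0)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    watched = 0.0
    for start, end in merge_intervals(intervals):
        # Clamp to the video bounds before measuring the interval.
        start, end = max(start, 0.0), min(end, duration)
        if end > start:
            watched += end - start
    return watched / duration


def is_abandoned(intervals: List[Interval], duration: float) -> bool:
    """A video counts as abandoned when its viewed ratio is below one."""
    return viewed_ratio(intervals, duration) < 1.0
```

Under this sketch, a video whose first 30 seconds were skipped and a video that was left 30 seconds before its end are both classified as abandoned, which matches the way skipped starts and early exits are grouped together in this section.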
Putting the abandonment figures into the perspective of videos viewed on YouTube, around 5.5 billion videos (based on comScore.com, 11 billion unique videos were viewed on YouTube as of March 2014) are not viewed entirely, which is a large number of videos. This data can be used to shorten videos, as it could help to generate abstractions and summaries of videos.

Per Participant

With regard to analysis of individuals, six participants (4, 5, 7, 9, 14, and 18) left more than 70% of their videos before the end. Participants 5 and 18 actually did not watch any video till the end. It is worth mentioning here that these two participants had only a few videos in total; participant 5 had eight unique videos and participant 18 had only three videos. As shown in Figure 3.20, all other participants aside from participant 12 had at least 33% of their videos dropped before being played in their entirety. These results show the high tendency to abandon videos before they finish among all participants.

Figure 3.20: Percentage of videos that were not completely watched for each participant. Aside from participant 12, at least 33% of each participant’s viewed videos were dropped off. Participants 5 and 18 had all their videos abandoned.

To wrap up, abandonment of videos was widely seen across categories and participants, where more than two thirds of viewed videos were not watched completely. The data showed that shorter videos were dropped earlier than longer ones; however, larger portions of short videos were watched in comparison to longer videos. At least half of the videos in each category were not entirely viewed and each participant abandoned at least one third of their viewed videos. Having this data available to users can offer a potential feature to filter videos and to generate video abstractions.

3.7.6 Interrupted Viewing

The results discussed earlier showed that people do interact with videos while viewing by applying the different actions (e.g.
skip, re-watch, or drop-off the video). In this section, we investigate how often these interruptions appear, regardless of which action is used, in order to assess users’ engagement while viewing. We achieve this by examining whether participants in a single run watch videos entirely (i.e. from start to end) without any interruption and are passive viewers, or whether they watch videos actively, interrupting the video by skipping parts, re-watching other parts, or leaving the video before the end. We defined a video as uninterrupted if it was viewed entirely without any action from the user (e.g. skip, rewind in the middle, or leave the video); otherwise it is considered interrupted.

Overall

The results revealed a high percentage of interrupted videos: 1404 videos (∼66%) were not watched passively. This is clear evidence that most of the time people do not watch videos uninterrupted, which confirms our hypothesis H1. 12% of these interrupted videos encountered all three actions (skip, re-watch, and drop-off), which indicates high engagement with their content. Our findings contradict [59, 124], who found that 80% of their collected videos had no user interactions. This might be because Yin et al. [124] collected data during the 2008 Beijing Olympics, and the viewed videos were related to the Olympic Games and hence were from a more specific category. However, if we compare their results to the Sports category in our data, our data still shows high interactivity, where more than 60% of videos from this category had user interactions.

Per Category

To explore how often this behaviour emerges in each category, the data was analyzed per category as illustrated in Figure 3.21. The Gaming, Film & Animation, Education, Comedy, Entertainment and How-to & Style categories had more than 70% of their videos being interrupted.
This can be seen from the large number of skip, rewind, and drop-off behaviours present in these videos, as shown in previous sections. For example, in Gaming, How-to & Style, and Education videos, people may skip to specific steps within a video as needed or rewind some parts to understand or follow them [70]. Our result for the Education category agrees with [10], who found that 83% of the educational videos were actively viewed, with between one and four interactions. All other categories had at least 50% of their videos being actively interacted with by participants. This high interactivity with videos provides evidence that users do not just sit down and watch a video from start to end without performing any action on the video while it is playing.

Figure 3.21: Percentage of interrupted videos per category. At least 50% of the videos in each category were actively viewed, where Gaming (85%) showed the highest percentage of videos being interrupted while watching.

Analyzing the number of interactions (i.e. either skip or re-watch) per video in each category showed that each category had at least one interaction per video on average. Education showed the highest average number of interactions, exhibiting four interactions per video, confirming the findings of [10]. These results demonstrate that users are likely to interact with any type of video, which contradicts the claim of Kim et al. [70] and Gkonela et al. [44] that interactivity exists in educational and instructional videos only.

Per Participant

In terms of participants, as demonstrated in Figure 3.22, each participant had at least 33% of their videos interrupted while watching, except for participant 12, who watched all videos entirely and passively. In contrast to participant 12, participants 5 and 18 were active with every video they watched, showing high percentages of skips and re-watches. Looking at the average number of interactions a participant performed per video, we found that each participant interacted at least once on average per video, with the exception of participant 12. Participants 9 and 11 showed high activity while watching videos, performing five interactions (i.e. either skip or re-watch) on average per video (e.g. Figure 3.23). These results show that each participant is likely to interact with the videos they are viewing, which offers evidence against the view that people do not watch videos actively.

Figure 3.22: Percentage of videos that were interrupted for each participant. Aside from participant 12, at least 33% of each participant’s viewed videos were actively watched. Participants 5 and 18 had all their videos interrupted.

To conclude, participants are most likely to interact with any video they are playing, and at least one-third of the videos a user watches are actively viewed or interrupted while playing. Our data contradicts the finding of Kim et al. [70] and Gkonela et al. [44] that this behaviour occurs only in educational and instructional videos. We showed that this activity occurred in every type of video and for each participant. Recording this interactivity and giving users access to it can enable different tools, for example a search tool for previously seen clips from videos, a summarization and authoring tool, and a sharing tool.

Figure 3.23: An example of the video viewing interactivity of participant 9. As shown by the view count per second in the video, he performed 2 skips, 3 re-watches, and a drop-off at the end, indicating high user engagement while watching.

3.8 Analysis 4: Impact of Video’s Popularity on Users’ Behaviour

To investigate whether a video’s popularity plays any role in users’ watching behaviour, the data was analyzed to test whether more actions appeared in popular videos or whether popularity makes no difference.
Popularity was judged based on the number of views per video pulled from YouTube’s metadata. A Pearson product-moment correlation coefficient was computed to assess the relationship between each behaviour and the video’s popularity. There was no correlation between any of the behaviours and the number of views per video, as shown in Table 3.2, which matches the findings of Hwang et al. [61]. This indicates that popularity does not affect the user’s behaviour within a video, which rejects H2. Q3.

Table 3.2: A Pearson product-moment correlation coefficient between each behaviour and the video popularity (i.e. No. of views obtained from YouTube) and video length. There is no correlation between any pair. (Note: ∗ Correlation is significant at the 0.01 level (2-tailed).)

Behaviour            No. of views    Video length
Skip                 -0.025          0.196∗
Re-watch             -0.013          0.103∗
Replay               -0.007          -0.014
Revisit              -0.006          0.059∗
Drop-off             0.004           0.094∗
Interrupt            0.019           0.107∗
Viewed Percentage    -0.041          -0.137∗

3.9 Analysis 5: Impact of Video’s Length on Users’ Behaviour

A video’s length is known to be a key measure of a viewer’s engagement. It has been suggested that short videos will get more eyes than longer videos and that the longer the video, the less people remain engaged. To investigate whether this applies to our collected data, a Pearson product-moment correlation coefficient was computed to assess the relationship between each behaviour and the video length. The results revealed that there was no or only a negligible relationship between any of the behaviours and the video length, as presented in Table 3.2. This indicates that video length does not have any notable correlation with the user’s behaviour within a video, and this rejects H2. Q3. The finding for the viewed percentage of a video does not match Yin et al. [124], who found an inverse correlation between viewed percentage and video length.
It seems that the content of the video, not its length, affects the activity within videos, where videos that have great appeal are most likely to be interacted with more.

Table 3.3: Welch test comparing the average number of actions that occurred per video in each duration group: short, medium and long. A significant difference was found between the three groups for each behaviour. Longer videos had significantly more skips and re-watches but fewer replays. (Notes: ( ) is Standard Deviation; ∗p < .001)

Behaviour    Short          Medium          Long           Welch test
Skip         1.55 (0.95)    1.95 (1.82)     4.15 (5.47)    20.66∗
Re-watch     2.33 (5.71)    3.92 (10.03)    4.95 (7.27)    6.72∗
Replay       5.95 (2.61)    7.84 (5.08)     1 (.095)       7.65∗
Revisit      1.55 (1.72)    1.98 (2.11)     1.88 (1.64)    1.38∗

We also analyzed the data based on the different duration groups (short, medium, and long) to check for any differences in the frequency of each behaviour per video when videos are grouped. Since the assumption of homogeneity of variance was not met for the data of each behaviour, we ran the Welch test to investigate the difference between the three groups in terms of skip, re-watch, replay, and revisit. The results revealed that there was a significant difference between the three durations with respect to each behaviour, as shown in Table 3.3. Longer videos had a significantly larger number of skips than medium and shorter ones, which could be due to the presence of unimportant parts in these long videos and users not being willing to spend a long time on them. They also had more re-watch actions than the other two groups; however, this was only significant in comparison to shorter ones. On the other hand, longer videos had a significantly smaller number of replays in comparison to medium videos.
This was expected since users will re-watch parts of longer videos rather than replaying the entire video again due to its duration.

3.10 Analysis 6: Correlation between Users’ Behaviours

We anticipated that some behaviours would have some correlation with other behaviours. For instance, more revisited videos will have more re-watch actions. Accordingly, we examined correlations between each pair of behaviours using the Pearson product-moment correlation coefficient, as presented in Table 3.4.

Table 3.4: A Pearson product-moment correlation coefficient between each pair of behaviours. There is a moderate positive correlation between re-watch and skip behaviour, which was also seen in the individuals’ behavioural grouping (Figure 3.7) where most re-watchers were also skippers. ∗ Correlation is significant at the 0.001 level (2-tailed).

             Skip      Re-watch    Replay     Revisit    Drop-off    Interrupt
Skip         1         0.653∗      -0.005     0.254∗     0.100∗      0.301∗
Re-watch     0.653∗    1           0.006∗     0.365∗     -0.028      0.149∗
Replay       -0.005    0.006       1          0.620∗     -0.184∗     -0.135∗
Revisit      0.254∗    0.365∗      0.620∗     1          -0.074∗     -0.031
Drop-off     0.100∗    -0.028      -0.184∗    -0.074∗    1           0.854∗
Interrupt    0.301∗    0.149∗      -0.135∗    -0.031     0.854∗      1

As predicted, there is a low positive correlation between re-watch and revisit, suggesting that people mostly access a video again to re-watch some parts of it rather than to resume the playback. There is a moderate positive correlation between re-watch and skip behaviours, which is explained by the number of implicit re-watches which, by definition, contain a skip operation and then a re-watch action, as described in Section 3.2. This can also explain why most re-watcher participants were also skippers, as discussed in Section 3.6. A high positive correlation also occurred between the drop-off and interruption behaviours, and this is again because we considered any video that was dropped off as an interrupted video.
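As a minimal sketch of the statistic reported in Tables 3.2 and 3.4, the Pearson product-moment coefficient between two per-video action counts can be computed as below. The sample counts are made up for illustration and are not taken from the study’s dataset, `pearson_r` is a hypothetical helper name, and the significance testing reported in the tables is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) per-video counts of skip and re-watch actions.
skips = [0, 1, 2, 4, 5, 7]
rewatches = [0, 0, 1, 3, 2, 6]
r = pearson_r(skips, rewatches)
```

Because the coefficient is symmetric in its two arguments, a table of pairwise coefficients such as Table 3.4 only needs the upper or lower triangle to be computed.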
Revisit and replay showed a moderate positive correlation, indicating that the more a video is replayed, the more likely it is to be revisited in future sessions, either to be played entirely or to visit only some parts of it. We expected that videos with more skips would be more likely to be dropped; however, the data showed a very low positive correlation between them. A moderate correlation was found between skip and interruption, and this was observed in Section 3.7.6, where most of the interruptions were caused by skip actions. All other pairs had little to no correlation.

3.11 Design Guidelines for a Video Interface

Based on the characterizations of the users’ viewing and navigation patterns, some guidelines are proposed for a video interface to account for each behaviour. These guidelines are:
• The interface should allow viewing videos similar to any typical video player, where users can play a video and control its playback. This will allow users who like to watch their videos passively from start to end to use the interface as they would normally do.
• Having a replay functionality within the interface would be helpful for users who like to play their videos over and over again.
• Including a filmstrip in a video interface would allow users to easily get a glance of the video’s content, which could help them to quickly decide whether to abandon the video or not. This would also help users to navigate to specific content within videos that may require a few skips from the user.
• The interface needs to provide users with access to previously seen videos if they wish to revisit and view them again. It should also provide access to each interval a user has watched from any video.
This would allow users to easily go back to what they have seen and refer to or enjoy it again.
• Identifying previously seen intervals within a video interface would be helpful for skipping, searching and sharing, which could be accomplished by highlighting re-watched segments, known drop-off regions or mostly-skipped parts.
• Allowing the crowd-sourcing of user interactions with video could lead to more effective recommendations, automatic summary generation or social navigation tools.

3.12 Study Limitations

This section describes some limitations of our study and collected data. Some of these are addressed in Section 9.3. The first limitation is that our plug-in was developed for a desktop platform and, more specifically, worked only in the Google Chrome browser, which limited the size of the data collected. Thus, our results did not account for users who watch videos on platforms other than desktop or in browsers other than Chrome. Another limitation is the number of people who participated in our study and their demographics. We did not have any users younger than 19 years or older than 40 years. This limits the discovered results, which need to be extended with more users covering the different age groups and user types. This limitation might be caused by our recruiting procedure, since we only sent the invitation email for participation using university mailing lists and through our friends. Collaborating with YouTube would help to collect more data, which would address most of these limitations, for example platform, number of users, age groups, and user types. Nevertheless, even with such a narrow and small sample, we clearly saw a change in how these people watch videos. This is consistent with the literature on video viewing characteristics [28, 44, 61, 70, 126]. One of our main objectives was to confirm that there exist non-linear video viewing behaviours beyond linearly watching movies, allowing us to build interfaces tailored to the behaviours we found.
Thus it can also benefit those people who share similar viewing characteristics with our tested group. Establishing viewing demographics and behaviours for different video genres would be facilitated by a partnership with an online video provider such as YouTube. We leave this to future work as it is outside the scope of this research.

3.13 Directions

Our analysis of users’ viewing behaviour showed high interactivity while viewing videos, which has led to some insights into how this data can be useful and how it shows the potential for different tools that can be offered in any video viewer interface. However, as discussed in Section 3.12, our data is limited due to the platform used for data collection. This encourages us to develop a cross-platform plug-in to collect as much data as possible to reflect the general population. This will allow us to investigate how these behaviours and categories persist with a larger group of viewers. Moreover, it will enable us to explore other viewing patterns that can be translated into new features. These features can then be integrated into video interfaces to enrich users’ experiences.

In future studies, we plan to connect these findings with users’ intentions by running a large-scale study where viewing history is recorded along with the purpose of each behaviour or action. This can be achieved by collecting explicit data from users through tags, questionnaires or interviews. It will allow us to better understand the purpose of each interaction, and to come up with meaningful viewing patterns and more defined behavioural groupings. Moreover, more actions can also be added to the captured behaviour, such as fast-forward and fast-rewind. These can introduce new behaviours or simply signify similar behaviour to skip, where users skim the content in order to find something specific or just to check whether it is the intended video.

3.14 Summary

In this chapter, using traces collected in a 5-month period from YouTube, we presented a detailed investigation of users’ video viewing behaviour. We have demonstrated that users are actively watching videos and that they do re-watch different parts of a video no matter the type of video. This has changed the concept of a video being watched sequentially and passively to what is known as active viewing. Most importantly, the dataset revealed that when users revisit videos for the second time, they are mostly coming back to refer to some of its previously seen content or to enjoy these parts again, not just to resume the playback. Having many revisited videos with a high percentage of the re-watch behaviour indicates that users are accessing videos looking for more precise information that they have already seen before. This motivates us to explore the usability of such data when users have access to it. Thus, we plan to design an interface that is tailored to the behaviours we discovered by keeping track of users’ viewing and navigation practices as well as giving them access to this data. We aim to test the feasibility of having personal viewing history accessible to users and to explore different visualizations that can be applied to communicate the viewing history in a clear and comprehensible way to users. We want to check how these visualizations can help utilize and manage users’ viewing behaviour.

Additionally, we think having access to portions of videos a user has previously seen can help in finding and locating specific content from certain watched videos. This encourages us to investigate whether having access to such data can speed up and improve the search for previously seen content. Furthermore, the fact that more than one-fifth of the videos contained skip actions and one-fourth contained re-watch behaviour can be used to signify the interest or popularity of each part within a video.
This information can then be used to easily navigate to the most or least trending content of the video to be referred back to, enjoyed again or simply shared with others to have a similar experience. It also offers an effortless technique to generate video abstractions or summaries by, for example, stitching together the most popular portions of a video. Thus, using our interface, we intend to study these applications when users have access to their detailed viewing history and whether that can improve their performance.

Users’ classification based on their viewing patterns helped in defining some design guidelines for a video interface. Through examining each participant’s dataset in correspondence to their interactions, we were able to categorize and identify each user’s viewing patterns. Four main behavioural groupings appeared for most participants: skipper, re-watcher, replayer, and revisiter. These findings introduce the potential for new approaches and techniques for a video viewer system and help in formulating some design guidelines for such a system and type of users. Thus, in the next chapter we look at how to develop an interface that records users’ viewing history and provides them with access to this information. We want to investigate whether this data will be useful for these types of users and how it can be utilized.

Chapter 4

Recording and Utilizing Video Viewing Behaviour

The dramatic increase in the quantity of video now available, either in the online form from services such as YouTube and Vimeo or the more personal form of home video, provides new challenges such as finding specific videos (or intervals within video), authoring new videos from existing content (e.g. video summarization or home movie editing), sharing video playlists (or even intervals) of online video and annotating dynamic content.
The increased volume and short average duration of online video have led to new use cases: users are generally free to skip ahead to find intervals which interest them; users may re-watch parts of the video they enjoyed; users may employ temporal video links to provide instant access to a specific time within a video, skipping irrelevant sections; playlists and temporal links provide opportunities for customization and sharing of video consumption among users. In short, users can easily view personally interesting intervals and are not required to view the entire video. These use cases are supported within current interfaces; however, there are no existing mechanisms to support new use cases that arise as a natural consequence, such as navigation of previously viewed intervals.

As found in Chapter 3 and [100], users often re-watch segments of video as part of the contemporary browsing and navigation experience. A historical record of video viewing would allow users (e.g. re-watchers) to quickly find, share and comment on popular intervals. Retrieving and reusing information from the past is a common activity for users [33]. It has attracted researchers’ attention in the domain of web browsers, where they have introduced and developed different tools that keep records of users’ browsing experiences for later use. This line of research has recently started to attract researchers’ interest in the field of video, where they have found various applications of such information and metadata. In Chapter 2, we reviewed how researchers have used this information for various applications and tools. However, most of these tools do not provide users access to such metadata to understand how they are going to utilize it. Hence, in this chapter we explore how to get this data in a video interface and how to present it to users.
We want to see if using users’ viewing history facilitates better video navigation and mashup tools. To answer this question we developed a video interface and made sure that this interface looked like a normal interface, leveraging users’ existing knowledge as recommended by Don Norman [93].

In this chapter, we introduce a new video navigation interface, which captures users’ viewing history (analogous to a web browser history) and offers users access to their viewing history, in order to investigate the significance of having access to such data and how it can improve users’ tasks. Section 4.1 introduces video viewing history, how it is captured and a description of a history representation for video viewing or navigation. Section 4.2 presents the different usage applications for video viewing history. The video navigation interface is described in Section 4.4, and Section 4.5 presents an evaluation looking at the significance of the viewing history and users’ impressions of using a personal video history to create mash-ups. Section 4.6 and Section 4.7 discuss the implications of the interface and directions for future refinement.

4.1 Video Viewing History

The video viewing history refers to the list of every video interval a user has watched, associated with data such as video identifier (ID), video location, start and end times of the segment, and time of visit. It is captured to provide efficient access to what users have seen and allow them to go back to their previously viewed videos, and in particular the intervals viewed within those videos, without relying on their memory of what they have watched. This is accomplished by recording a continuous video history for the user as they view a video.

4.1.1 How Viewing History Is Captured

Using the video navigation interface described in Section 4.4, navigation-level events are captured by recording high-level user actions such as seek, play, pause, and changing video.
Internally, the user’s video watching history is represented by a very simple data structure organized by linear user time. Whenever a segment of video is played, a new log is added to the history record. Each log consists of a sequence of basic elements (access time, video ID, start time, end time, favourite time, share time).

Figure 4.1: An illustration of a video history record (9:42 AM 26 Sept 2014, video 1, 1:24, 3:12, 12:31 AM 26 Sept 2014, 1:45 PM 26 Sept 2014).

Access Time: The user’s local time when they viewed this video interval.

Video ID: A unique ID for each video to identify the source of the content of the current interval.

Start Time: The start timestamp of the current interval from the source video. This time is relative to the beginning of the video.

End Time: The end timestamp of the current interval relative to the beginning of the video.

Favourite Time: The most recent date and time when this interval was favourited. Nil if it has not been favourited yet.

Share Time: The most recent date and time when this interval was shared. Nil if it has not been shared yet.

An accumulated view count is maintained for every instant of time in all viewed videos, which is derived from the overlap of all viewed intervals for each video. This provides the user with a better understanding of how a video was consumed and the importance of each interval based on its viewing frequency. This can be used later to facilitate fast navigation, search, and video mash-up.

4.2 Use Cases of Video Viewing History

As mentioned in Section 2.2, researchers have identified different applications of the collective users’ viewing history. In this section, we present the main usages from an individual perspective.
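As a minimal sketch of the log record and the accumulated view count described in Section 4.1.1, the following Python fragment may help. It is an illustration only, not the implementation used in this work: the names HistoryLog and view_counts, the use of seconds for video times, and the one-second bin resolution are all assumptions introduced here.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class HistoryLog:
    """One history entry; field names are hypothetical and mirror the six elements."""
    access_time: datetime                      # user's local time of viewing
    video_id: str                              # identifies the source video
    start_time: float                          # interval start, seconds from video start
    end_time: float                            # interval end, seconds from video start
    favourite_time: Optional[datetime] = None  # None (nil) if never favourited
    share_time: Optional[datetime] = None      # None (nil) if never shared


def view_counts(history: List[HistoryLog], video_id: str,
                duration: float, resolution: float = 1.0) -> List[int]:
    """Count, per time bin, how many logged intervals of one video cover that bin."""
    n_bins = math.ceil(duration / resolution)
    counts = [0] * n_bins
    for log in history:
        if log.video_id != video_id:
            continue
        first = max(0, int(log.start_time // resolution))
        last = min(n_bins, math.ceil(log.end_time / resolution))
        for i in range(first, last):
            counts[i] += 1
    return counts


# The interval of Figure 4.1 (1:24 to 3:12, i.e. 84 s to 192 s) plus an assumed re-watch.
history = [
    HistoryLog(datetime(2014, 9, 26, 9, 42), "video 1", 84, 192),
    HistoryLog(datetime(2014, 9, 26, 9, 50), "video 1", 100, 120),
]
counts = view_counts(history, "video 1", duration=200)
print(counts[90], counts[110], counts[195])  # 1 2 0
```

Bins covered by both logged intervals receive a count of two, which is the kind of non-flat profile that the use cases below rely on.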
Video viewing histories and social navigation provide potential interaction techniques for fast navigation, event search, video mash-up, and segment and experience sharing.

4.2.1 Navigation

As seen in Chapter 3, people in real scenarios re-watch different parts of videos while they skip others. This behaviour may indicate that these parts are important, interesting, affective, or hard to understand [100]. This watching behaviour leaves digital footprints on the video frames, creating a non-flat video histogram emphasizing the interest of each part of the video. Visualizing this non-flat histogram lets users easily identify how this video was consumed and quickly navigate to the different parts based on their level of importance [69, 82]. It provides a contextual map of how a video was viewed, which subsequently defines a new quick way of navigating that video. Interestingly, crowd-sourced data (the collective wisdom) can be leveraged for the benefit of future viewers who have not watched the corresponding video. First-time viewers of a video can employ the collective wisdom to guide their navigation through this video and save time by watching what others found interesting rather than watching the entire video. Thus, video viewing history can offer a faster navigation tool for skippers, re-watchers and droppers.

4.2.2 Scene Search

Looking for something that we have previously seen is sometimes crucial. We are not looking for just that video, but for that specific scene or segment from that video. Most online video sites (e.g. YouTube) provide users with a list of videos they have visited with no mechanism to find a specific scene aside from continuously scrubbing the video to find the scene you want.
Having access to the detailed viewing history alleviates this problem by presenting not only the different videos but also the different scenes or segments within videos a user has watched. For the scenes a user has watched the most, applying the non-flat histogram in the visualization will make the search even faster. Providing filters based on the viewing history can speed up the search task. For example, filters can identify the segments that were watched more than once, the segments that were watched from a specific video, and the segments that were watched at specific times. In Section 5.1.3, Section 5.2.3 and Section 6.5.7 we will show how having access to the viewing history significantly speeds up the search task.

4.2.3 Video Mashup

To produce content that can be enjoyed or even shared with others, the source videos normally already exist and need to be combined into a unified video. These source videos usually have a lot of content that is uninteresting or not worth including in the final production video. Thus, it becomes necessary to determine how to shorten these videos to emphasize the important content and reduce the time needed for viewing. It is a known issue for which researchers (e.g. [95]) have proposed different approaches to ease the task, varying from manual editing to fully automatic mechanisms. An extensive survey of these techniques is presented in [3, 86, 114]; however, in general, these methods do not take advantage of any implicit information gathered as users consume video, which may be used to personalize the user experience.

As we mentioned earlier, viewing history can emphasize the interesting or important parts of each video, which can be used to filter the content of these videos and help in deciding what to include in a video mash-up.
It can be used to summarize a single video as shown in Section 5.2.3 and [108, 111, 125], or to combine segments from different videos in a single video as shown in Section 4.5 and [40]. Viewing history helps in reducing the time needed for creating video mash-ups and produces good-quality video summarization.

4.2.4 Sharing

Over the last few years, social media sites have become the most popular visited websites, where 74% of all internet users are now active¹. The key attraction of these social sites is sharing. People have become more interested in creating, publishing and sharing their own experiences with their families, friends, and fans. On average 68% of viewers share a video they have watched. Some social media sites (e.g. YouTube and Vimeo) have even tried to make it easier to share a video from a specific starting time rather than the beginning of the video. Video viewing history makes it even easier and faster, since users can share a specific interval from a video rather than just specifying the starting time. It is just a simple click on the share button for that specific segment, as will be shown in Chapter 8. Moreover, as described in the previous section, a user can share video mash-ups or summaries that are generated using their own personal viewing history or the collective wisdom. Users can even share how they experienced a video by sharing their consumption history of that video. Having this ability allows users to compare their viewing behaviour with friends or even the collective wisdom.

4.3 Common Properties of Video Navigation Interfaces

In this section, we define the common visuals and their related properties that will be used throughout this dissertation.

Figure 4.2: The common video player component used in all interfaces presented in this dissertation.
It is used for direct control of the selected video using the play/pause button (highlighted in green) or via seeking in the timeline (highlighted in blue). The current playhead time is highlighted in red.

Player: A familiar video player (similar to YouTube, QuickTime, etc.), shown in Figure 4.2, contains: play/pause, applied via a button (bottom left, highlighted in green) or by clicking directly on the video; seeking via the playhead (white circle on the timeline) or a mouse click on the timeline (red/grey bar, highlighted in blue); and a current playhead time label (white text, bottom right, highlighted in red). This component is used for direct control of the selected video.

Thumbnail: A frame preview that represents the starting frame of an interval watched from a video (Figure 4.3). We applied this design since thumbnails are an accepted form of preview in nearly all digital video retrieval interfaces. A Thumbnail can have as many of the following properties as needed.

Temporal Visualization: A gray timeline attached to the bottom of a Thumbnail with a red highlight to indicate the location of the interval within the entire video, as shown in Figure 4.3. It helps users spatially contextualize the temporal location of intervals within the complete video they came from.

Figure 4.3: Thumbnail used to represent a video segment with its different properties: seek-able, drag-able, delete-able, and favourite-able.

Seek-able: Moving the cursor over the bottom third of a Thumbnail pops up a zoomed-in view of the Temporal Visualization, as illustrated in Figure 4.3, which can be used to seek within the video interval using mouse motion as a cue.
The seeking point is visualized by a yellow line and the Thumbnail updates to reflect the current seek position.

Drag-able: To play the represented interval of the Thumbnail in the main video player, users can drag the entire Thumbnail (or the seek location in the Thumbnail) to the main video player (or click the top-right arrow, Figure 4.3): dragging the entire Thumbnail plays the corresponding interval from the beginning, while dragging the seek bar location plays from that specific time.

Favourite-able: The interval represented by the Thumbnail can be favourited/unfavourited by clicking on its corresponding button. If a Thumbnail is favourited then the corresponding interval is marked, which can be used for filtering video intervals.

Delete-able: An unwanted interval or Thumbnail can be removed from the user’s viewing history by clicking on its corresponding button.

Play-able: The interval represented by a Thumbnail can be played within the Thumbnail itself (i.e. a small player). On mouse over, a play/pause overlay is displayed over the Thumbnail, as shown in Figure 4.4. Clicking on the overlay plays the corresponding video interval within this small Thumbnail only.

Figure 4.4: A play-able Thumbnail: on mouse over, a play/pause overlay is displayed over the Thumbnail. Clicking on the overlay plays the corresponding video interval within this Thumbnail.

Size-able: A Thumbnail has a variable size determined by a weight factor that depends on how often an interval has been viewed and how long it is. The size reflects the relative popularity of intervals.

View Count Visualization: A vertical red bar attached to the left of a Thumbnail visualizing how often the corresponding interval has been viewed in relation to other intervals in the video (Figure 4.5).
The taller the bar, the higher the view count for that interval.

Figure 4.5: View count visualization attached to the left of a Thumbnail indicating how often it was watched in relation to other intervals in the video.

4.4 Video Navigation Interface

Given the volume of available video data, methods to effectively navigate this space are needed to provide users with more enjoyable video watching experiences. The principal goal of our interface is to provide efficient access to previously viewed video, and in particular, intervals within those videos. We accomplish this by recording a continuous video history for the user as they consume video content. We chose video authoring and sharing as the context of our interface since they require bookmarking of intended events. We wanted to investigate whether users having their viewing history would ease the video authoring task and whether it would change users’ authoring and viewing behaviour.

Figure 4.6: Our Video Navigation Interface introducing users’ viewing history in a video interface. The video Player (yellow) is adjacent to the Video Grid (green), which approximates the results of a search tool. The History Timeline (bottom, in blue) provides the history based on the user’s viewing time, in a scroll-able one-dimensional filmstrip interface. The vertical red bars on each thumbnail represent how often this particular segment has been viewed. The Video Mash-up (red) represents a video edit list by visualizing the videos and the intervals a user may combine into a summary.

4.4.1 Visualization Components

Our video navigation interface, shown in Figure 4.6, was developed in Flash CS4 using ActionScript 3.0. It is based on five separate components: two provide the user with a familiar video setting (player and video selector); a third visualizes the personal viewing history using a filmstrip metaphor; and two facilitate creating video summaries and edit lists, and sharing video histories.

Player

A familiar video player, as described in Section 4.3, facilitates direct control of the selected video. This player has an extra property, a frame preview (i.e. Thumbnail) triggered by moving the mouse over the timeline (including when seeking interactively, i.e. with the mouse button held down over the time slider), as shown in Figure 4.7. This preview offers users a visual cue of the content at a specific temporal location and hence eases a search task.

Figure 4.7: The video player used for direct control of the selected video. When the cursor is over the time bar a thumbnail for that instant is shown to help the user navigate the video.

Video Grid

The Video Grid, shown in Figure 4.8, is a very basic component, which displays the available videos to the user. We use this as an example of what could be returned from search results, video suggestions, or any other method of delivering videos to users. Users may select a video from this grid by clicking on it, and the video will start playing in the player component.

Figure 4.8: The video grid, which approximates the results of a search tool. Clicking on a video from the grid will start playing the corresponding video in the player component.

History Timeline

The history is the central component for visualizing a user’s video viewing history. We use a filmstrip metaphor for visualization: the History Timeline consists of a set of thumbnails (similar to interactive storyboards) with view count visualization, as shown in Figure 4.9. As the user consumes video, new thumbnails are added on the right as older items are pushed off the display to the left. The History Timeline is scroll-able, to provide navigation of the entire viewing history, and allows the user to find a specific interval.
Each interval is represented by a different rounded rectangle and each video has a different colour to make it easier for the user to differentiate the videos. Every time the user watches a video clip, a new rounded rectangle is added to the history, encoding the start and end of that video interval in the history. Thus, every seek or video change results in a new record added to the history.

The video history is similar conceptually to webpage bookmarks that are used to refer back to visited pages; however, it is more complicated, as each Video Segment in the history view is a video interval and also includes a representation of the time the user watched it. The history is unique in that it records in piecewise linear user-time, not video time (we only display the history of intervals watched and do not represent the complete video from which they came in the History view). The timecodes of user time are displayed above the thumbnails; each thumbnail represents a small portion of time (typically around two seconds).

Figure 4.9: The History Timeline provides a visualization of users’ viewing history based on their viewing time, in a scroll-able one-dimensional filmstrip interface. The vertical red bars on each thumbnail represent how often this particular segment within the video has been viewed.

The History Timeline also provides the user with the most-viewed intervals of each video (a temporally tracked frequency). A view count is maintained for all points of time for each viewed video. The view counts are summed and normalized for each video and shown in the interface as a vertical red bar to the left of each thumbnail. The height of the bar corresponds to the view count of that interval: the taller the bar, the higher the view count for that segment.
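The text does not spell out the exact normalization behind the red bars, so the following Python fragment is only one plausible sketch. It assumes per-second view counts for a single video, averages them over each thumbnail’s interval, and scales by the peak count of that video; the function name bar_heights and the choice of peak normalization are assumptions made here for illustration.

```python
from typing import List, Tuple


def bar_heights(counts: List[int],
                intervals: List[Tuple[int, int]],
                max_height: float = 1.0) -> List[float]:
    """Relative bar height for each (start, end) thumbnail interval of one video.

    counts[i] is the view count of second i; the interval end is exclusive.
    """
    peak = max(counts, default=0)
    heights = []
    for start, end in intervals:
        window = counts[start:end]
        if peak == 0 or not window:
            heights.append(0.0)  # nothing viewed yet, or an empty interval
        else:
            mean = sum(window) / len(window)
            heights.append(max_height * mean / peak)
    return heights


# Three two-second thumbnails over a six-second video.
print(bar_heights([1, 1, 2, 2, 4, 4], [(0, 2), (2, 4), (4, 6)]))  # [0.25, 0.5, 1.0]
```

Because the heights are relative to the peak of the same video, re-watching one segment lowers the bars of the others, which is consistent with the view count information propagating back to earlier thumbnails.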
Alternatively, these bars can be made to show different things, for example by letting users change the ranking based on time spent, likes, number of actions made, number of re-watches, etc.

As the user watches the video, the view count will dynamically update. This information propagates back to previously generated thumbnails. As the user reviews their personal video history, they can see the most popular intervals based on the view count. They can click on any thumbnail to begin playback at the corresponding time within the video; the video will play until the ending time of the last thumbnail in the selected interval.

Video Mash-Up

The Video Mash-up is used for short video composition such as summaries, previews, or trailers (analogous to video editing software, e.g. iMovie, Movie Maker, etc.). We included this component in our prototype to evaluate one of the use cases we identified that takes advantage of the potential strength of having a personal viewing history. The video mash-up, shown in Figure 4.10, represents a video edit list that shows the videos, and intervals within each video, that a user may combine into a summary. In summary creation mode, when the user subsequently plays a video (either through opening a new video, a segment from history, or seeking within the current video) a new record is added to the edit list. If the video already exists in the edit list, a new interval is added (unless there is an existing identical interval). If the video does not exist, a new video record with the interval is added to the edit list. Thus, adding segments or intervals to the edit list is simply done by watching these intervals, in comparison to other video editing software where users need to add a video clip, watch it, and then trim it.

Figure 4.10: The Video Mash-up component illustrates a user edit list which consists of user-defined Video Segments with the options of delete, re-order, play, and modify a segment’s start and end times.
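The edit list update rule described for summary creation mode can be sketched as follows. This is a hedged Python illustration rather than the original code: the dictionary layout (video ID mapped to a list of start/end pairs) and the function name add_to_edit_list are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: video ID -> list of (start, end) intervals in seconds.
EditList = Dict[str, List[Tuple[float, float]]]


def add_to_edit_list(edit_list: EditList, video_id: str,
                     start: float, end: float) -> bool:
    """Record a played interval; return False if an identical interval exists."""
    # A video that is not yet in the edit list gets a new, empty video record.
    intervals = edit_list.setdefault(video_id, [])
    if (start, end) in intervals:
        return False  # identical interval already present, nothing is added
    intervals.append((start, end))
    return True


edit_list: EditList = {}
add_to_edit_list(edit_list, "V1", 2.0, 5.5)   # new video record with one interval
add_to_edit_list(edit_list, "V1", 2.0, 5.5)   # ignored: identical interval
add_to_edit_list(edit_list, "V1", 8.0, 10.0)  # second interval of the same video
print(edit_list)  # {'V1': [(2.0, 5.5), (8.0, 10.0)]}
```

Keeping the intervals grouped per video mirrors the way the component lists videos and the intervals within each video.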
Using this component makes video editing simpler for novice users, as we discovered upon evaluation.

The video mash-up component allows the user to preview a specific interval by using the play button for that interval. Users can also modify their edit list by directly updating interval start or end times, delete an interval, delete an entire record, or reorder video records. Previewing the entire summary is provided by clicking on the “Preview” button.

Figure 4.11: The Summary Preview allows users to watch the video summary they have created. Users can re-play, modify, or export their edit list.

Summary Preview

After creating a video mash-up, the user can preview it using the Summary Preview mode, shown in Figure 4.11, where a new video player window pops up, and the rest of the interface is disabled and darkened. The summary preview takes the user’s edit list as input and plays the intervals in order using this content. It allows users to replay the summary, modify it (going back to the main interface to modify the edit list), or export it if they wish to save it; the edit list is exported as an XML file that can be shared or modified later.

Personal Navigation Footprints

The recorded video history offers additional benefits, such as within-video history and crowd-sourced user histories (neither of these was included in our study). The personal history or footprints offer users a visualization of the view count of every point of time in the video: the blue bar in Figure 4.12 becomes brighter for the most-viewed intervals, and each one can be selected to play. It provides the user with information on the most-viewed intervals of each video (a temporally tracked frequency). As mentioned previously, a view count is maintained for all points of time for each viewed video.
As the user views a video the view count will dynamically change, thus the count is continuously updated and the information propagated backwards through the history. The view counts are then summed and normalized within each video.

Figure 4.12: A Video Timeline is combined with navigation footprints visualization, where the more an interval within a video is watched the brighter that region becomes. Blue: a single user history (Personal Navigation Footprints). Red: the combined histories of multiple users (Crowd-Sourced Navigation Footprints).

Crowd-Sourced Navigation Footprints

The personal histories from multiple users are combined to form the most popular intervals for a video (i.e. social navigation; these are shown in the red bar in Figure 4.12). This provides users who are watching a video for the first time with a mechanism to watch what others have found to be most interesting. This is an efficient navigation technique, predicated on trusting the crowd to be “correct” on interesting video content.

4.5 Investigating the Feasibility of Video Viewing History

In order to investigate the feasibility of video viewing history, we designed the video navigation interface, described earlier in Section 4.4, which keeps and visualizes users’ personal navigation history. We wanted to evaluate the utility of the video viewing history using our interface; thus, we designed a formative study (approval certificate #: H08-03006) inspired by the studies performed by Grossman et al. [49] and Leftheriotis et al. [73]. The study evaluated the design aspects and performance of using personal video navigation histories, and investigated whether having a viewing history would ease and speed up the creation of a short trailer from previously navigated videos. The experiment tasked participants to create a short trailer from a set of videos based on a predefined theme.

We hypothesized that:

H1: Participants will create the short trailers faster when they have access to their personal viewing history.

H2: Less trailer editing will be required when using viewing history to select clips for trailers.

H2.1: Fewer clip deletions will occur when clips are added from viewing history.

H2.2: Fewer clip plays or previews will occur for trailers created using viewing history.

H2.3: More timestamp modifications will be needed for clips not added from viewing history.

H3: Large variation will exist between participants for the clips and videos used in each trailer.

H4: Participants will need time to understand viewing history: what the visualization means and how it works.

4.5.1 Apparatus

The user study was conducted on a Toshiba laptop with a 2.10 GHz Core 2 Duo CPU and 2 GB of RAM running Windows XP Professional. The laptop LCD display was used at a resolution of 1280×800 pixels at a refresh rate of 60 Hz. A Microsoft optical wheel mouse was used as the input pointing device and the Adobe AIR environment was set to 1260×430 pixels while running the Flash program.

4.5.2 Participants

Twelve paid university student volunteers, 6 female and 6 male, participated in the experiment. Participants ranged in age from 23 to 35; all were experienced computer users and had either normal or corrected-to-normal vision. Participants reported watching videos either on a daily basis (ten participants) or 3-5 times a week (two participants). Only six of the participants had previously used video editing software (e.g. iMovie, Movie Maker, Flash, etc.), and all stated “rarely used”. Each participant worked on the task individually. Table 4.1 shows participants’ demographics.

This project targets heavy video users, and according to Purcel², video is a growing trend especially amongst those aged 18 to 29.
Thus, students are a primary target audience as they are representative of the current/near-future heavy users of video interfaces. We used students across the university to capture this demographic.

4.5.3 Design

Using our interface described in Section 4.4, participants were asked to create six trailers based on different themes using eight different videos, trying three different modes. The different modes, trailer themes, and the set of videos used are described below.

Tested Modes

Three different interface strategies were tested:

• List selection (Grid) mode: participants were presented with the Video Player, Video Grid, and Video Mash-up components to construct a trailer. This mode represented the case where there is no access to the user’s viewing history. This was the control method, which represented the case where a user has access only to the viewed videos, similar to what they have in a social video website (e.g. YouTube).

• Personal Navigation History (History) mode: participants were presented with the Video Player, History Timeline, and Video Mash-up components to create a trailer. In this mode, users had access only to the video intervals they had watched. The History mode provided access to every interval or segment within videos a user had watched (i.e. detailed viewing), unlike the Grid mode where users only have access to the videos. This mode was included to investigate the effect of history separately.

• A combination of the above (Hybrid) mode: both the Video Grid and the History Timeline from the two previous modes were given to the user in this mode, as shown in Figure 4.6, along with the Video Mash-up to construct a trailer. Users had access to their viewing histories and the entire videos were provided, i.e. they had access to watched and unwatched video intervals. This mode represented our proposed interface.

Table 4.1: Demographics summary for participants in the investigating the feasibility of video viewing history study.

| P | Gender | Age Group | Experienced Computer User | Watching Frequency | Have Used Video Editing Tool | Used Editing Tool | Editing Usage Frequency |
|---|--------|-----------|---------------------------|--------------------|------------------------------|-------------------|-------------------------|
| 1 | Male | 19-24 | Yes | Daily | Yes | Movie Maker, Flash | Rarely |
| 2 | Female | 25-30 | Yes | Daily | No | - | Never |
| 3 | Female | 25-30 | Yes | Daily | Yes | Movie Maker | Once a month |
| 4 | Male | 31-35 | Yes | Daily | Yes | Final Cut Pro, iMovie, Motion, Flash, Adobe | Rarely |
| 5 | Male | 19-24 | Yes | Daily | Yes | iMovie | Rarely |
| 6 | Male | 25-30 | Yes | Daily | No | - | Never |
| 7 | Male | 31-35 | Yes | Daily | No | - | Never |
| 8 | Female | 25-30 | Yes | Rarely | Yes | Movie Maker | Rarely |
| 9 | Male | 25-30 | Yes | Daily | Yes | Movie Maker | Rarely |
| 10 | Female | 25-30 | Yes | Once a week | No | - | Never |
| 11 | Female | 25-30 | Yes | Daily | No | - | Never |
| 12 | Female | 25-30 | Yes | Daily | No | - | Never |

For all modes, participants used the Summary Preview component to preview their trailers before exporting.

Each participant tried each of the three modes (Grid, History and Hybrid) to create a 15-second trailer using four different clips for each trailer. Participants were divided into two groups of six, where each group had a different order of the first two modes (Grid and History). We kept the Hybrid mode last for both groups since it is a combination of the other two modes.

Trailer Themes

Six different trailer themes were tested:

1. serene animals;
2. loud noises (sneeze, cry, laugh, etc.);
3. funny parts;
4. two objects (either two humans or two animals);
5. the child or animal gets scared;
6. background laughter.

Both groups worked through the six trailer themes (two for each mode) in the same order, but the order of the two contrasting modes was swapped between groups to eliminate the mode order effect. Thus, group 1 created the serene animals trailer using Grid mode while group 2 used History mode to create the same trailer.

Experimented Videos

A set of eight different short videos of length between 14 seconds and one minute (sourced from YouTube, chosen based on total number of views) were used in this experiment. These videos are:

V1: Baby scared by his own fart!³
V2: Baby elephant sneezes and scares himself⁴
V3: Funny baby reaction to light⁵
V4: Cat gets shocked by electric fence⁶
V5: The sneezing baby panda⁷
V6: Charlie bit my finger - again!⁸
V7: Emerson - mommy’s nose is scary!⁹
V8: Evil penguin¹⁰

Once all the videos were viewed, the participants were instructed to create a trailer based on a given theme. They created six different trailers (3 modes × 2 trailers per mode). The participants were asked to construct the trailers as quickly and accurately as possible. For each trailer, the navigation heuristics, the completion time, the final edit list of the trailer, the number of deletion tasks, and the number of modifications were recorded. The completion time was measured from when the participant clicked on the ‘Start’ button (after reading the task) until the moment the participant clicked on the ‘Export’ button after previewing the trailer. The participants were also asked to fill out the post-experiment questionnaire shown in Section B.2 to give their comments on the interface, feelings on usability, and suggestions for improvement.
They were asked to rate different interface features and tools on a 7-point Likert scale against the statements “I found it easy to use” and “I think it would be useful” (1 = strongly disagree, 7 = strongly agree). The experiment lasted approximately one hour per participant.

4.5.4 Procedure

The experiment proceeded as follows:

1. The researchers gave the participants a walk-through of the interface, explaining the functionality of each feature and tool within the interface and their effects when applied to a video. The participants were shown how to use the features to create a short trailer. This stage took approximately five minutes on average. The participants were allowed to ask any questions during this stage.

2. Participants created a trailer using the same theme and interface to become familiar and comfortable with its features and tools. They were allowed to freely view, apply features and control the playback of eight different videos. These eight videos were not included in the actual experiment. Participants were also allowed to freely ask any questions during this familiarity stage, which took about 10 minutes.

3. The first step of the experiment asked the participant to completely watch the eight videos described earlier in Section 4.5.3 and to re-watch any parts they liked. A single video was played upon the participant’s selection (never more than one video playing simultaneously). Participants were allowed to freely select the order of the videos and re-watch any parts of them. However, they could not start creating trailers until they had watched all eight videos in full, to make sure that they had created enough viewing history for each video and that all participants had seen equivalent video. Participants were given extra time (about three minutes) to play back different segments, seek to different parts within a video and create a condensed navigation history based on their interest.
This navigation history was stored to be used for the History and Hybrid modes. At this stage of the experiment, the participant did not know anything about the theme of the trailers, in order to create a blind navigation history.

4. Once all videos were viewed, the participants started the tasks by clicking on a button to create a trailer. The theme of the trailer and the mode used were provided to the user in a popup message that included a “Start” button. When the participant clicked the start button, the interface changed to display the components related to the task mode. The Video Mash-up component was shown to the user to start adding different clips. Upon completing the trailer (by clicking the export button after previewing it), the participants advanced to the next trial to create another trailer using the same set of videos but with a different theme.

5. The experiment ended once the participants had submitted all six trailers using the different modes. Finally, the participants were asked to fill out the post-experiment questionnaire shown in Section B.

Results and Discussions

The study showed positive results for the features of the interface. Participants made overwhelmingly positive comments about it. Most participants commented that they enjoyed using the interface and could imagine seeing its features applied, especially in social networking websites like YouTube. They were impressed with how easy it was to use the history, find clips in it and create a trailer.

Task Performance

For the trailer creation, each participant was able to complete all six trailers in under 6 minutes per trailer. On average, participants were faster in creating trailers when they used their viewing history than in the Grid mode, where they did not have access to the history, as shown in Table 4.2. This result confirms our hypothesis H1. Providing participants with both modes (i.e. Hybrid) significantly outperformed the two individual modes. The mean completion time for each trailer theme using the different modes is given in Table 4.3. For ‘Serene animals’ and ‘Two objects’, participants took more time creating them using the Grid mode. As can be seen, when the Grid mode was faster, it was only a few seconds faster than the History mode; however, when it was slower, it was at least one minute slower. This shows that History was much better than Grid, which indicates that viewing history sped up video mash-up.

Table 4.2: Means (and standard deviations) for number of times video Grid used, number of times History Timeline used, completion time in minutes, number of clip previews, number of clip deletions, number of clip modifications and number of trailer plays for the different modes. When given the hybrid mode, participants created trailers significantly faster than using the two modes separately, which illustrates the utility of navigation history in video authoring. (∗p < 0.005)

                 Grid           History        Hybrid         F-test
Grid Usage       5.45 (3.06)    -              1.96 (2.05)    -
History Usage    -              5.33 (2.18)    2.79 (2.45)    -
Completion Time  4.08 (1.39)    3.44 (1.44)    2.67 (1.27)    6.44∗
Previews         1.88 (1.03)    1.63 (0.82)    1.71 (0.95)    0.44
Deletion         4.13 (2.94)    2.33 (3.57)    2.46 (2.34)    2.68
Modifications    17.83 (9.81)   15.63 (11.03)  11.13 (7.73)   3.03
Plays            6.96 (4.79)    5.58 (5.51)    3.71 (4.82)    2.51

In terms of trailer editing, as shown in Table 4.2, there was no significant difference between the modes in terms of the number of previews, deletions, modifications, and plays. The means per trailer and mode are given in Table 4.3. The lack of significant differences may be due to the small number of tested themes (only two per mode) and to participants’ unfamiliarity with the viewing history and its uses. Moreover, even though participants were asked to create these trailers as quickly as they could, they felt that it was a subjective task and they wanted to spend more time on it to get an almost perfect trailer out of the provided videos. This could also be seen from the significant positive correlation between the completion time and the time the participants spent previewing the trailers several times, playing each interval, deleting intervals, and tuning intervals to an accuracy of 0.1 seconds. One of the participants tried to extract clips related to the theme from almost all the videos, not just the four clips requested.

Table 4.3: Means of number of times video Grid used, number of times History Timeline used, completion time in minutes, number of clip previews, number of clip deletions, number of clip modifications and number of trailer plays for themes per mode. For ‘Serene animals’ and ‘Two objects’ trailers, participants took more time creating them using the Grid mode.

Theme             Mode     Grid   Timeline  Time  Previews  Deletion  Modifications  Plays
                           usage  usage
serene animals    Grid     6      -         5.47  2         3.83      23.17          6.5
                  History  -      6         3.83  2         0.83      17.83          10
loud noises       Grid     4.17   -         3.84  2.33      3.83      14             4.83
                  History  -      6.17      3.91  1.5       4.5       17.33          6.67
funny parts       Grid     4.83   -         3.58  1.5       5.33      23             10
                  History  -      4.67      3.74  2         0.8       17.17          4.33
two objects       Grid     6.83   -         3.44  1.67      3.5       11.17          6.5
                  History  -      4.5       2.26  1.17      0.5       10.17          1.33
gets scared       Hybrid   1.5    3         3.13  2         2.17      11.83          5.08
background laugh  Hybrid   2.42   2.58      2.21  1.42      2.75      10.42          2.33

Nevertheless, these results show that having access to the personal viewing history positively impacts the video mash-up task.
The access to viewing history also changed users’ authoring behaviour: in the Hybrid mode, participants used the History Timeline to select clips for the trailers more often than they opened each video and scrubbed it to select the intended clips. This shows the promise of providing the viewing history and its potential to change the way users view and author videos.

Modes’ Ranking

We asked participants to rank the different modes for ease and speed in creating trailers. They ranked the Hybrid mode as the easiest, while Grid and History had almost the same ranking (both came second). This might be because the Hybrid mode gave participants access to the components of the other two modes, which gave them two options for finding clips: navigating the entire video or searching their personal history. When it was easy to remember the location of the clip within the video, participants preferred to navigate the video (i.e. using the Grid mode’s components); otherwise they used the history. When asked which mode for creating trailers was the fastest, they ranked both Hybrid and History as the fastest modes with Grid coming last, which coincides with the quantitative results in Table 4.2, where a significant difference in completion time existed between modes (F(2,69) = 6.443, p < 0.005, effect size = 0.157). A post-hoc pairwise comparison showed that Hybrid and History were faster than Grid. This was because participants needed less time to find clips using these two modes than by navigating the entire video, since they had bookmarked the events. It could also be because most participants used the history component more frequently than the grid to select clips in the Hybrid mode, as shown in Table 4.2. Participants commented that they liked the interface when both modes were available (Hybrid), since it gave them more freedom to use any feature they liked.

Table 4.4: Clips agreement percentage per trailer theme. Background laughter trailer showed the highest agreement while ‘two objects’ trailer showed the lowest.

                     Clip1  Clip2  Clip3  Clip4
Serene animals       66.67  50     91.67  66.67
Loud noises          75     75     66.67  66.67
Funny parts          58.33  66.67  75     50
Two objects          41.67  50     41.67  75
Gets scared          66.67  66.67  66.67  91.67
Background laughter  91.67  100    100    100

Trailers’ Agreement

The generated trailers were analyzed to check the agreement between participants on the individual clips included and how often each video was used. Table 4.4 shows the percentage of agreement between participants for each clip used in each trailer, while Table 4.5 shows the percentage of agreement for each video used in each trailer. Surprisingly, a high percentage of agreement existed between participants, which contradicts hypothesis H3. ‘Background laughter’ and ‘Serene animals’ trailers had 100% agreement on the videos used, even though there are 6 videos that contain background laughter. The ‘Background laughter’ trailer also had 100% agreement on the clips used except for clip1, as shown in Figure 4.13, which was a result of one participant’s interval added from V1 that did not match the other participants’ intervals. ‘Funny parts’ and ‘Gets scared’ trailers showed large variations in the videos used, which can be explained by the variation in personal perception of which events are funny or scary. The lowest clip agreement was found in the ‘two objects’ trailer, because in the videos that contain this theme the two objects appear in several parts of the video, which caused the variation in the clips chosen.

Table 4.5: Videos agreement percentage per trailer theme. Background laughter and serene animals trailers showed 100% agreement of videos among participants.

                  V1     V2     V3     V4     V5     V6     V7     V8
Serene animals    0      100    0      100    100    0      0      100
Loud noises       75     75     75     25     41.67  66.67  50     0
Funny parts       25     58.33  66.67  16.67  75     41.67  58.33  66.67
Two objects       8.33   41.67  58.33  33.33  58.33  91.67  8.33   100
Gets scared       66.67  66.67  16.67  66.67  91.67  8.33   83.33  0
Background laugh  100    100    100    100    0      0      0      0

Figure 4.13: Participants’ clip used in the background laughter trailer. All participants used the same clip with some tolerance at the start and end times of the clip except participant 2 who used a different interval.

This high agreement between participants in the clips and videos they used shows the promise of using collective wisdom to generate video mash-ups. It would help reduce the time needed for someone to create them, since a mash-up could be generated automatically by applying some smoothing algorithm to the collective data.

Figure 4.14: Percentage of clips extracted from each video. Most of the clips in the trailers came from V2 (Baby elephant sneezes and scares himself) while V6 (Charlie bit my finger) and V7 (Emerson - Mommy’s Nose is Scary) were the least used in the trailers.

Videos’ Usage

Figure 4.14 illustrates how often each video was used to create the 6 trailers. Most of the clips used in the trailers came from V2 (Baby elephant sneezes and scares himself), since it has scenes that match all themes used for the trailers, while V6 (Charlie bit my finger) and V7 (Emerson - Mommy’s Nose is Scary) were the least used in the trailers because they only match specific themes. For example, V6 can be used for ‘two objects’, ‘loud noise’, and ‘funny parts’ trailers, while V7 can only be used for ‘funny parts’ and ‘gets scared’ trailers.
Moreover, variations in personal taste and perception are another factor in which clips and videos were used for each trailer.

Features’ Ranking

We found positive results from the questionnaire in relation to the ease of use and usefulness of the interface components and features. The average rating across all components and features was 6.2 for ‘would be useful’ and 5.8 for ‘easy to use’ (all scores are out of 7).

Least useful features: Two features scored less than 6; all others were graded as highly useful (more than 6 out of 7). The two which scored lowest were: the use of different colours to differentiate between videos in the history timeline (5.1 out of 7) and the different rectangles used for each segment in the history timeline (5.5 out of 7). This might be due to the clear differentiation between the tested videos, which made these features less useful.

Most useful features: In the Video Mash-up component: modify clips timecode (6.84) and remove clips (6.5). In the Summary Preview component: replay the trailer (6.84) and modify trailer (6.5). In the History Timeline component: play specific segment (6.4) and create trailer (6.3). In the Video Grid component: select a video to play (6.6). Finally, in the Video Player component: play/pause (7), seek (7) and frame preview (6.9). In terms of ease of use, almost all the features were easy to use except the editing of the clips’ timecodes in the trailer edit list component (4.5), since participants had to type these timecodes.

Participants’ Suggestions

Participants suggested that dragging intervals from the History Timeline to the Video Mash-up to add clips might be easier than re-watching the clips to be added to the trailer. They would also like a dragging interaction to order clips rather than using each clip’s up/down buttons.
Participants found it frustrating when new clips were added to the trailer edit list every time they used seek in the video. One of the participants said, “I would like to have control of when these clips are added rather than going back every time and delete the clips which I had not originally added”. Allowing the trailer edit list component to be enabled and disabled can mitigate the problem of clips being added to the list automatically.

For the history, at the beginning when viewing and navigating videos, it was difficult for participants to understand how the history is constructed and when thumbnails are added. Some participants found the addition of new segments to the history strip every time they used seek in the video a bit confusing, and it affected their understanding of their history. However, as reported in the results, by the time participants experimented with the hybrid of both modes, they tended to use the history more often. Moreover, some participants stated that the history is fun and easy to use once you get familiar with it. This confirms our hypothesis H4, which we anticipated since the history is a new concept and users have not had any earlier exposure to it.

In terms of the history visualization, some participants criticized the thumbnails as too small, which hampered content recognition. Most of the participants suggested using larger interface components. Some participants found having a horizontal component for the history confusing, since it represents a different timeline from the video. To separate the concepts, we need to apply a design guideline such as using a vertical visualization for the history and a horizontal one for the video. This may also make scrolling faster, since vertical scrolling is the norm in most long lists. Lastly, participants suggested having control over the vertical red view count bar attached to each thumbnail in the history.
They would like to decide on the metrics used for these bars instead of using just the frequency. This would enhance the ability to filter the different thumbnails. Some metrics that may be used are: time spent, liked segments, number of actions, user ranking, or most watched.

Participants’ Comments

Participants made overwhelmingly positive comments about the interface. One participant commented that, “I definitely see how this would be really helpful for long videos because I will not have to waste my time watching the whole video again to get to the important stuff. I could directly use my previous history to navigate to these intervals.” Others said, “It is really cool and easy to create different trailers that I could share with my friends”; “I need to have this. Could we have it in YouTube?”; and finally “I really do like this interface and I would love to have it to create wonderful clips from my home videos.”

Based on the results and the participants’ valuable comments and suggestions, we believe using implicit bookmarking through a personal viewing history is helpful for navigation of video spaces. It can be applied to different applications such as video highlights, video summaries, authoring using multiple videos, sharing interesting clips, quickly navigating and skimming previously watched videos, and watching new videos’ interesting parts using crowd-sourced viewing histories. Participants foresaw that our interface would be valuable in social networking contexts such as online video sites.

4.6 Lessons Learned

We have learned a great deal from this user study. Primarily, introducing the viewing history to users changes their behaviour in viewing videos and creating video mash-ups. More exposure to the concept will introduce a new way of watching, navigating, and enjoying videos from which future video viewing behaviour will emerge.
Many other lessons have been learned about the design of the interface, which we used to formulate the design guidelines presented in Section 4.6.1. For example, visual orientation can confuse some users; for instance, the horizontal user’s timeline (i.e. history timeline) was confused with the video timeline. Also, when the content of videos is distinguishable, there is no need to introduce extra cues to differentiate between videos, because they are perceived as clutter. The size of thumbnails plays an important role in recognizing their content, and users tend to like them bigger. However, there is a trade-off between size, organization of components and interface clutter, which can be used to assess an acceptable size. In terms of the interactions, they need to be simple and effortless. Examples of simplification include drag-and-drop interaction for adding clips to users’ edit lists instead of having to watch these clips, and for ordering clips instead of using specific buttons, as well as ‘+’ and ‘-’ buttons for adjusting clip timecodes instead of only being able to type them. Users want more control over how the data is filtered. For example, they would like to decide on the metrics used for the view count visualization instead of using just the frequency. In relation to the user study design, to illustrate the difference between methods, more themes need to be tested, more tasks need to be performed and more participants need to be recruited.

4.6.1 Design Guidelines

In this section we list some design guidelines formulated from the results and feedback of the user study presented earlier in Section 4.5.

• Use horizontal orientation to indicate the video timeline and vertical orientation for the user’s timeline (i.e. history).
• Following the previous guideline, use the width of components that represent videos to indicate the video length or the length of the represented interval.
For example, the width of a thumbnail represents the length of the interval being represented; thus, seeking within that thumbnail will only seek within the represented interval.
• Use thumbnails large enough for their content to be recognized.
• Never play more than one video at a time, to avoid distraction and to avoid overwhelming users with the interface.
• Use drag-and-drop interaction to move thumbnails or clips.
• Use interactive thumbnails instead of a set of inactive images to convey more information in a smaller space.

4.7 Directions

Our preliminary results point to a few promising directions. Since using history to create video mash-ups or abstractions showed encouraging results, we plan to investigate its performance when compared to other authoring tools. This motivates us to explore various history-based tools that could improve task performance. Thus, we are going to look at the different interactions that can be offered to easily access, manipulate and reuse the viewing history.

4.8 Summary

To summarize, we have developed a video-viewing interface that gives users access to their viewing history as well as a platform to navigate, play, and generate videos. Previously proposed interfaces did not give users access to what they had seen before, aside from just showing users their footprints in a video. Moreover, the interfaces that utilized video viewing history kept this data from their users. By offering transparent access to what users have viewed, our contribution is the ability to assess how it can change the way users view videos and what kind of applications can emerge based on users’ appropriation of this data.

The interface defines a new way to navigate a video space using a user’s own personal history, which provides a new mechanism for consuming media content. In this chapter we have presented how to structure a video navigation history to facilitate later reuse and sharing.
An interface was described with different visualization components that use the history representation structure. Using the interface and the structure, we provided four different applications where viewing history could be efficiently applied. However, there are different aspects that need to be taken into consideration in relation to the interface. The interface applied the interactive filmstrip metaphor, which we know sacrifices screen space as it gets longer and denser. Visualizing large sets of data is a well-known problem that has received a significant body of work in the literature. We need to explore visualizing the history with some of these mechanisms and weigh them against the usability of the history. The uses of the history are not limited to the provided applications. For instance, it could also be used to analyze users’ navigation behaviour, and to extract some entries of the history to be re-executed.

Evaluating our interface showed positive results and we received highly positive comments from participants. This encouraged us to look at how to improve the way a viewing history is presented and in which applications it excels. In the next chapter, we describe different visualization designs to represent a single video viewing history, illustrate additional interactions, and investigate how they perform.

Chapter 5

Single Video Viewing History Visualizations

Consuming video online, on mobile devices or on home computers is now a well-accepted form of communication and entertainment, shown by the rapid growth of various providers such as YouTube, Vimeo and Dailymotion. Despite the volume of video available, methods for efficient navigation and sharing of in-video content have not provided users with the ease of use or level of personalization required to accommodate their needs.
Constraints such as limiting the length of videos (e.g. six seconds on Vine and 15 seconds on Instagram) can simplify the problem; however, these do not address the challenges with unconstrained video. Part of the problem is that the spatio-temporal representation of video complicates relatively simple actions such as search or selection. Video search often taxes human memory by requiring memorization of a large quantity of previously seen content. In particular, finding and selecting interesting parts has poor navigation and search support. We propose that the addition of a single-video visualization mechanism using viewing statistics will overcome some of these difficulties.

We investigate the usefulness of visualizing prior viewing by either single or multiple users to support fast navigation (to popular or unseen parts), search and direct previewing of content, without interrupting normal playback. We envision users will watch videos differently when they have a visualization of their personal navigation: they can implicitly tag segments of video by re-viewing (thereby increasing the view count); it would also capture their natural behaviour, such as watching a funny section multiple times in a lengthy video. This non-linear viewing behaviour is already evident, as reported in Chapter 3 and as seen in YouTube audience retention graphs: videos have peaks in the graphs, implying users watch different content and likely seek to find interesting parts (unfortunately these graphs are not generally public, and require voluntary publication by video owners). Viewing graphs often show a shallow negative exponential curve (i.e. cold-start problem) from crowd-sourced data, which can be very simply filtered to highlight the most popular content.
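To make the view-count idea concrete, the following minimal sketch counts how often each second of a video was watched, given the intervals a viewer played, and then keeps only the seconds watched at least a given number of times. The one-second resolution, the threshold filter and all names are assumptions made for illustration; they are not the implementation used in this thesis.

```python
from collections import Counter


def view_counts(intervals, duration):
    """Per-second view counts for a video of `duration` whole seconds.

    `intervals` is an iterable of (start, end) pairs in whole seconds,
    with `end` exclusive.
    """
    counts = Counter()
    for start, end in intervals:
        for second in range(max(0, start), min(duration, end)):
            counts[second] += 1
    return [counts[s] for s in range(duration)]


def popular_seconds(counts, min_views):
    """Seconds that were watched at least `min_views` times."""
    return [s for s, c in enumerate(counts) if c >= min_views]


# A viewer watches 0-6 s once, then re-watches 2-5 s twice.
counts = view_counts([(0, 6), (2, 5), (2, 5)], duration=8)
print(counts)                      # [1, 1, 3, 3, 3, 1, 0, 0]
print(popular_seconds(counts, 2))  # [2, 3, 4]
```

Re-watching the 2-5 s range raises its count, so a simple threshold is enough to surface it as the implicitly tagged part of the video.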
Likewise, viewing statistics can be used to filter out videos where only the first few seconds are watched.

In this chapter, we focus on the design of visualizations for a single video viewing history that support fast in-video navigation and quick scene search. This chapter introduces two approaches for these visualizations: (1) using a list of every viewed record (presented in Section 5.1), and (2) using the viewing heuristics as a summary of how the video has been consumed (described in Section 5.2). Section 5.1.1 describes how to visualize each viewed interval in a list of thumbnails, whereas Section 5.2.2 explains how viewing heuristics can be presented as a summarized Filmstrip. The evaluation of each approach, looking at how each visualization performs in comparison to the state-of-the-art, is presented in Sections 5.1.3 and 5.2.3. These user studies and their associated methods were approved by the University of British Columbia Behavioural Research Ethics Board [certificate #: H08-03006]. Finally, Section 5.3 discusses the implications of each approach and directions for future refinement.

5.1 Visualize History Using a List of Records

One of the earliest approaches used to represent a browsing history was a linear scroll-able list of the user’s traversed components, with or without screenshots of the components (i.e. thumbnails), where the most recently visited component is at the top of the list.
Most web browsers, YouTube, and Netflix use this approach to visualize the history of users’ visited content as shown in Figure 2.5, where clicking on one of the thumbnails navigates to the corresponding content. Thus, due to the familiarity of this visualization among users, we employ it for visualizing a detailed in-video viewing history where every viewed interval from the video is presented as a separate thumbnail.

5.1.1 History Timeline as a Vertical List of Thumbnails

The History Timeline uses a Filmstrip metaphor for visualization: it consists of a list of thumbnails (similar to interactive storyboards) as shown in Figure 5.1. These thumbnails represent video intervals that the user watched. We represent each of them as a Video Segment component, as discussed in the next section. Every time the user seeks to a new temporal location, a new Video Segment is added to the end of the user’s history list and represents the interval the user watches in the main video player. These video segments are kept in chronological order so that any new video segments are always added to the end of the visualization. We applied this approach instead of the reverse order commonly used in web browsers and YouTube because the History Timeline is always visible along with the video player (as shown in Figure 5.2) and it automatically updates as the user watches or interacts with the video player. This avoids any confusion that could be caused by continuously changing the position of the previously seen segments as the user starts watching any new segment.
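The recording rule described here (a seek starts a new segment, and new segments are always appended at the end) can be sketched as a small data structure. This is a hypothetical illustration, not the thesis implementation: the class and method names are invented, and it assumes that starting playback is treated as a seek to the start position.

```python
from dataclasses import dataclass


@dataclass
class VideoSegment:
    start: float  # video time (seconds) where the watched interval begins
    end: float    # video time (seconds) reached so far in this interval


class HistoryTimeline:
    """Chronological (oldest-first) list of watched intervals."""

    def __init__(self):
        self.segments = []

    def seek(self, position):
        """A seek starts a new segment, appended at the end of the history."""
        self.segments.append(VideoSegment(start=position, end=position))

    def on_playback(self, position):
        """Extend the most recent segment as the player advances."""
        if self.segments and position > self.segments[-1].end:
            self.segments[-1].end = position


timeline = HistoryTimeline()
timeline.seek(0.0)          # start watching from the beginning
timeline.on_playback(12.0)
timeline.seek(40.0)         # jump ahead
timeline.on_playback(45.0)
timeline.seek(5.0)          # jump back and re-watch
timeline.on_playback(9.0)
print(timeline.segments)
```

Because entries are only ever appended, earlier segments keep their position in the list while the newest one grows, which matches the motivation for avoiding the reverse ordering.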
The History Timeline is a scroll-able box, to provide navigation of the entire viewing history and allow the user to find a specific interval.

The video history is conceptually similar to bookmarks that are used to refer back to visited pages; however, it is more complicated, as each Video Segment in the history view is a video interval and also includes a representation of the time the user watched it. Thus, representing both of these quantities requires additional mechanisms for visualization and interaction. Moreover, as seen in Chapter 4, there may be confusion between the currently viewed video timeline and the history view, since two timelines exist: when/what the user watched and the video’s timeline. We address this by showing the History Timeline vertically, while the video timeline remains horizontal, consistent with the user’s mental model of the video. The history is unique in that it records in piece-wise linear user-time, not video time (we only display the history of intervals watched and do not represent the complete video they came from in the History view). Each Video Segment is a small interactive video widget as described below.

Figure 5.1: The History Timeline represented as a video history (a), made up of individually search-able video segments (b).

Video Segment

Each interval a user watches is represented as a seek-able, play-able, and drag-able thumbnail with a temporal visualization as described in Section 4.3. Thus, each thumbnail contains the viewed interval only, visualized by the starting frame of that interval. The temporal visualization shows the location of the interval within the entire video, to help users contextualize where each interval sits within the complete video it came from.
Each thumbnail shown in Figure 5.1(b) is seek-able and play-able to allow users to easily search within the intervals and minimize the time needed to search for a previously viewed scene. On mouse over, a play/pause overlay is displayed over the widget. Clicking on the overlay plays the corresponding video interval within this small widget only. (Note: while technically this is a re-watching of video, we do not add this activity to the History View.) Moving the cursor over the bottom third of the widget pops up a zoomed-in visualization of the interval timeline, which can be used to seek within the video interval using the mouse motion as a cue. The seeking point is shown by a yellow line and the thumbnail updates to reflect the current seek position. To play a Video Segment in the main video player, users can drag the entire widget (or the seek location in the widget) to the main video player: dragging the entire widget plays the corresponding interval from the beginning, while dragging the seek bar location plays from that specific time. These interactions can be easily transferred to a touch-screen device supporting today’s most common online video consumption platforms.

Figure 5.2: Our video navigation interface: the majority of space is devoted to the currently playing video (top left) with a seek-bar preview; below is a horizontal array of Video Segments arranged by video-time (the Filmstrip), and to the right is a vertical array of Video Segments (the History Timeline) ordered top-down by user-time, i.e. the order in which the intervals were viewed.

Figure 5.3: The Filmstrip component visualizes the entire video into n equal length segments.

5.1.2 List of Thumbnails in a Video Viewing Interface

We designed our interface in a way that allows users to play, view and navigate videos similar to any video player, as we envision our history management to be used to augment it.
As described in Section 4.4, since our interface records the user’s navigation history, History Timeline was introduced to offer the users the flexibility of using this history for fast navigation. However, based on the results of our previous study explained in Chapter 4, a few modifications have been applied to this component as described earlier in Section 5.1.1.

Our video viewing interface, shown in Figure 5.2, is based on three separate components: a Player, which provides the user with a familiar video setting, a Filmstrip, which provides a navigation tool for the entire video content, and History Timeline, which visualizes the personal viewing history using a vertical list of thumbnails (described in Section 5.1.1). The main interface components are the Player and the History Timeline.

Player

A familiar video player as described in Section 4.3. This component is used for direct control of the selected video. To watch a specific video from disk, users click on the “Open” button and choose their video file.

Filmstrip

The Filmstrip component, shown in Figure 5.3, is the state-of-the-art video navigation tool. It provides a visualization aid to different parts of a video for faster navigation and supports access to the entire video content. The Filmstrip simply consists of a fixed number of thumbnails (n) from the playing video. The entire video is divided into n equal length intervals, where each interval is represented by a seek-able, play-able, and drag-able thumbnail with a temporal visualization described in Section 4.3. These intervals are created systematically based on the length of the video and the number of segments to be visualized. We applied this design since thumbnails are an accepted form of preview in nearly all digital video retrieval interfaces.
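The systematic division into n equal length intervals can be sketched as follows. This is only an illustration under stated assumptions: the function name `filmstrip_intervals` is hypothetical, times are taken to be seconds, and the thesis does not say how its implementation handles rounding at the final boundary, so pinning the last interval to the exact duration is a choice made here.

```python
from typing import List, Tuple


def filmstrip_intervals(duration: float, n: int) -> List[Tuple[float, float]]:
    """Split a video of `duration` seconds into n equal-length (start, end) intervals.

    Hypothetical helper, not the thesis implementation.
    """
    if duration <= 0 or n <= 0:
        raise ValueError("duration and n must be positive")
    step = duration / n
    intervals = [(i * step, (i + 1) * step) for i in range(n)]
    # Pin the last end point to the exact duration to avoid floating-point drift.
    intervals[-1] = (intervals[-1][0], duration)
    return intervals


# Example: a 70-second video shown as seven thumbnails.
segments = filmstrip_intervals(70.0, 7)
print(segments)
```

Unlike history segments, these intervals depend only on the video length and n, never on what the user has watched.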
Moreover, the Filmstrip metaphor is commonly used to present content of video as a navigation device, and is considered effective on desktop systems [31], while also providing a quick summary. As such, we chose to employ the Filmstrip metaphor to aid video navigation.

User Viewing History Timeline (History Timeline)

This is the central component for visualizing the personal viewing history of the user. As described in Section 5.1.1, every time the user seeks within a video, a new record is generated and hence a new Video Segment is added to the end of the user’s history list to represent the interval the user has watched. The History Timeline (Figure 5.1) updates its content as a user interacts with the video player, illustrating when/what parts of the video a user has watched.

5.1.3 Investigating the List of Thumbnails Visualization

An extensive set of pilot studies was performed (12 participants in total) to investigate a user’s preferred scenario for history, and to triangulate the use cases in which a history is beneficial for video viewing. Throughout this triangulation process it became more apparent to us how complex and intricate the video-viewing task can be. For tasks such as seeking to a specific time or finding a particular event from a previously watched video, using a history was found to be as good as using a Filmstrip methodology. Under more modern viewing patterns (e.g. non-linear viewing behavior) where users view only parts of the video (e.g. trailers, summaries, playlists or direct temporal links), using a history to find events and seek to them was found to be more efficient.

Using this result from the pilot studies, a full comparative user study was performed to evaluate the design and performance of our interface and to demonstrate the utility of personal video viewing histories.
We developed an evaluation protocol that satisfies the use cases previously defined (without biasing our design or the control) and provides the user with sufficient viewing history while keeping the experiment relatively short. For fair comparison, we ensured our interface mimics currently adopted approaches with logical extensions.

Using our protocol, we investigated whether visualizing and using a video navigation history would make searching for previously seen events more efficient. We conducted a user study comparing the performance of tasks employing a personal viewing history (History Timeline) against those with the state-of-the-art navigation method (Filmstrip) to find events within a previously seen video (i.e. finding answers to questions similar to [44, 69, 92, 125]).

Both methods have similar layouts and functionality, differing in the process of segment creation described earlier in the description of Filmstrip and History Timeline. In Filmstrip, the intervals are created systematically, while in the History Timeline, segments are constructed based on a personal navigation history. Each participant tried both methods, on different videos. In Filmstrip tasks, the participants were only presented with the main Video Player and a horizontal Filmstrip component. They only used these components and their features to find the answer to each question. In History Timeline tasks, the participants were presented with the main Video Player and a vertical History Timeline component.

Our experiment was divided into two phases: phase 1 was conducted to compare the two methods (Filmstrip and History), and phase 2 was performed to qualitatively analyze the entire proposed interface.
In phase 1, participants used the two components (Filmstrip and History Timeline) of the interface separately, while in the second phase they were exposed to all components and features.

Apparatus

The experiment application was developed in Flash CS4 and ActionScript 3.0. The experiment ran on an Intel dual-processor dual-core 3 GHz Mac Pro desktop with 8 GB RAM, equipped with a 24” Dell LCD monitor with a resolution of 1920×1200 pixels at a refresh rate of 60 Hz. A Microsoft optical wheel mouse was used as the input pointing device with default settings, and the Adobe AIR environment was set to 1920×900 pixels while running the Flash program.

Participants

Twelve paid university students (different from those in the pilot studies), 6 female and 6 male, participated in the experiment. Participants ranged in age from 19 to 30; all were experienced computer users and had either normal or corrected-to-normal vision. Participants reported watching videos either on a daily basis (ten participants) or 3-5 times a week (two participants). Each participant worked on the task individually.
Table 5.1 shows participants’ details.

Table 5.1: Demographics summary for participants in the investigation of the list of thumbnails visualization study.

P    Gender   Age Group   Major                         Watching Frequency
1    Male     26-30       Engineering                   Daily
2    Female   26-30       Forestry                      Daily
3    Male     26-30       Computer science              Daily
4    Female   19-25       Business / management         Daily
5    Female   26-30       Computer science              Daily
6    Male     26-30       Computer science              Daily
7    Male     19-25       Engineering                   3-5 times a week
8    Female   19-25       Science                       3-5 times a week
9    Female   26-30       Natural sciences / medicine   Daily
10   Male     26-30       Engineering                   Daily
11   Female   19-25       Natural sciences / medicine   Daily
12   Male     26-30       Education                     Daily

Design

To evaluate the efficiency of using the list of thumbnails as an aid for efficient search within a previously seen video, we gave participants a predefined history, which they had to watch and subsequently answer questions based on the content. We followed this procedure rather than allowing participants to create their own history. This decision was made based on the pilot studies where participants found creating a history based on an unknown list of questions was confusing. Some participants said, “What interests me might not be what you are looking for.” Thus, they tried to create many history segments at each point where they thought there was potential for a question. The result was a long list of short segments that required significant scrolling when searching. In short, the uncertainty of what should be included on the list led to a large number of segments and created some confusion that affected the performance while using the method.

In order to tackle this problem, the participants could be allowed to create their own history based on their interest after which they can be asked questions that exist within these segments. However, the variation between participants’ interests would make the comparison difficult.
Thus, instead of asking participants to create their own history, we decided to give them all the same history from which to answer the list of questions (analogous to watching a playlist created by a friend or a video summary). All participants watched the same clips from the video and were asked questions only from these clips. Thus, we ensured they had experienced the same clips and eliminated the personalization factor within the history. Our method was compared with the state-of-the-art method (Filmstrip). The History Timeline contained only the segments of the predefined history while Filmstrip contained seven evenly divided intervals over the entire video.

To run this evaluation, a set of videos needed to be chosen so that they meet certain criteria for our study. We looked for videos that:

• were salient enough for everybody so that they would pay attention and would not get bored, which could affect their performance;
• contained enough events of interest to accumulate history very quickly;
• were relatively short to fit a laboratory controlled study;
• had a narrative structure, which should favor and be fair to Filmstrip.

Accordingly, in this experiment, we used five different short videos that meet the above criteria (V1: Toy Story 3 trailer, V2: Geri’s Game, V3: Partly Cloudy, V4: Alma, V5: For the Birds). V1 was used to explain the interface’s components and features, and demonstrate how to use them. The other four videos were used for the actual experiment where each participant used a single method for each video. Each participant experienced each method in a different video to eliminate the learning effect. The number of questions per video was: V1: 20; V2: 38; V3: 25; V4: 38; and V5: 32.

The participants were divided into two groups, A and B. Both groups experienced the videos in the same sequence but with different method sequences.
They both started with the familiarity phase, where they tried a hybrid of the two methods to answer example questions and to become familiar with both methods. Group A was given Filmstrip for V2 and V4 and History Timeline for V3 and V5, while group B used the opposite. Thus, the results were compared between the two groups for each video.

To quantitatively compare the results of the two methods, we measured six variables for each video:

1. The percentage of questions answered for each video within a specified time range. The total time given is 15 seconds multiplied by the total number of questions (chosen based on the pilot studies);
2. The time needed to answer each question - this is the time measured between two subsequent correct submission clicks;
3. The number of wrong submissions (errors) for each question, which is measured by counting the number of submission clicks that did not result in a correct answer;
4. The number of seeks within a video for each question;
5. The number of previews performed in a video for each question, which is measured by counting the number of seeks and playbacks within thumbnails (either in Filmstrip or History Timeline);
6. The accuracy of the submitted answer, which is measured by comparing the submitted interval with the ground truth interval. It is measured using a weighted factor where the accuracy at the edges of the interval is given 20% while the inclusion of the interval is given 80%. The calculation is as follows:

\[ \text{accuracy} = 20 \times \text{edges accuracy} + 80 \times \text{interval intersection} \]

\[ \text{edges accuracy} = \frac{(g_e - g_s) - |g_s - t_s| - |g_e - t_e|}{g_e - g_s} \]

\[ \text{interval intersection} = \frac{g \cap t}{g_e - g_s} \]

where g is the ground truth interval, t is the submitted interval, s is the start time of the interval, e is the end time of the interval, and g ∩ t is the intersected duration between g and t.

These counts were tracked to analyze the user’s behaviour and to investigate the difference between the two methods.

The participants were also asked to fill out questionnaires (Section B.3) and give their comments on the entire interface, and suggestions for improvement. This was conducted to qualitatively compare the two methods and to provide an indication of the importance of correcting each particular aspect of the proposed interface. Participants rated different interface features and tools using a 7-point Likert scale. The experiment lasted approximately two hours per participant.

Procedure

Participants were given a description of the procedures to be employed in the study, informed of the goals and objectives of the study, and informed consent was requested to participate in the study. Two methods were evaluated in the study: Filmstrip and History Timeline.

The researcher started by showing the interface (shown in Figure 5.4), describing its components, explaining the functionality of the tools within the interface and how to answer a question in order to complete each task. Participants were then asked to try the same interface with the first video (V1) and task. They were given the freedom to play with the interface and ask any questions. This familiarity stage was intended to allow the participant to try all the interface components and become familiar with the available functionality. The participants started watching the video provided without knowing the questions they would be asked.

For each video, the task started by playing the segments from the predefined history in the main video player. Participants were asked to watch the playback and pay attention to the video content in order to answer questions later. Once all segments from the history were viewed, the video paused and a list of questions (compiled by researchers) was displayed to the left of the main video player, as shown in Figure 5.4. Participants were asked to read these questions before clicking on the “Start” button, to be able to ask for any clarification before starting the actual task. Once the “Start” button was clicked, the timer started, only the method’s corresponding components were shown, and participants could begin providing answers. Participants were advised to answer as many questions as they could within the given time by watching the clip that contains the answer.

Figure 5.4: The Experiment Interface illustrates the familiarity phase.

The questions were randomly ordered (i.e. they were not temporally ordered according to their occurrence in the video). Each participant had the freedom to choose the order that they would like to follow to answer these questions (e.g. temporally, linearly, difficulty, memorability, etc.). The participants started answering these questions by clicking on a “Start” button after reading the list of questions. In the familiarity phase, the participant could use either the Filmstrip or History Timeline to navigate the video in an attempt to find an answer. To answer a question, the participant needed to re-watch the clip that contains the answer in the main video player. They navigated Video Segments in the provided component (Filmstrip or History Timeline) to find a clip or a video frame that contains the answer. Once they found the frame, they dragged the Video Segment of a specific video frame using the seek thumb of the Video Segment to the main video player to start playing that clip.
After watching the interval that contains an answer for a single question, they clicked on a submit button beside the corresponding question to proceed to the next question.

Each submitted clip was automatically evaluated: it was considered correct if it contained at least one frame from the ground truth answer. If the answer was correct, the question was faded out from the list and a message was displayed to participants (e.g. “Great. You have 2:14 to answer the rest. Hurry up.”) to encourage them and maintain time pressure. However, if the answer was wrong, a message was shown asking the participant to try again (“Sorry. Try again”). Participants repeated the same actions until they answered all questions or the time elapsed. A new list of questions was then displayed and the participants continued applying the same procedure to answer these questions. The questions for each video were divided into two lists to avoid overloading participants with too many questions. A short break was provided between question sets.

After completing the second list of questions, the familiarity stage ended and the participants advanced to the experiment, where they watched a new video in similar settings to the watching phase of the familiarity stage. Upon the completion of the watching phase, a new list of questions was given based on the content of this video. In this stage, the user was provided with only one method (Filmstrip or History Timeline) to be used in finding the answers. Once all questions were answered or the time given elapsed, the participants proceeded to the next video, where they repeated the previous stage but with a different video and using a different method than the one used for the previous video. Participants continued tasks until they completed all four videos.

During the experiment, participants were asked to fill out three questionnaires, shown in Section B.3.
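The automatic evaluation of a submitted clip, together with the weighted accuracy measure defined in the Design subsection (20% for the edges, 80% for the intersection), can be sketched as below. This is an illustration only: the function names are hypothetical, intervals are assumed to be (start, end) pairs in seconds, "at least one shared frame" is approximated in continuous time as a strictly positive overlap, and the result is not clamped because the text does not state whether negative values were clamped.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, with end > start


def intersection_duration(g: Interval, t: Interval) -> float:
    """Duration shared by the ground truth g and the submitted interval t."""
    return max(0.0, min(g[1], t[1]) - max(g[0], t[0]))


def is_correct(g: Interval, t: Interval) -> bool:
    # Assumption: a shared frame is approximated as a positive overlap.
    return intersection_duration(g, t) > 0.0


def accuracy(g: Interval, t: Interval) -> float:
    """Weighted accuracy: 20 x edges accuracy + 80 x interval intersection."""
    g_len = g[1] - g[0]
    edges = (g_len - abs(g[0] - t[0]) - abs(g[1] - t[1])) / g_len
    inter = intersection_duration(g, t) / g_len
    # Not clamped: a submission far from the ground truth can score below 0.
    return 20.0 * edges + 80.0 * inter


ground_truth = (10.0, 20.0)
exact_score = accuracy(ground_truth, (10.0, 20.0))   # submission matches exactly
late_score = accuracy(ground_truth, (12.0, 20.0))    # submission starts 2 s late
print(exact_score, late_score, is_correct(ground_truth, (30.0, 40.0)))
```

Note that the correctness check and the accuracy score are separate: a submission can count as correct with only a small overlap and still receive a low accuracy.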
These questionnaires were designed using the standard Questionnaire for User Interface Satisfaction (QUIS version 6.0 [27]), modified to reflect the functionality and tools applied and the usability of our interface. Some sections were removed (e.g. Terminology and system information, and System capabilities), some questions were modified, and others were added. The first questionnaire was given after V4, in which participants evaluated the method they used for that video. The second questionnaire was given after V5, where participants rated the usability of the second method they used. After completing the second questionnaire, participants were given the full interface, where they had the freedom to create their own history and explore use cases that might be applicable to them. Every performed seek within the video or any watched interval in the main video player was recorded and visualized as a Video Segment in the History Timeline. This helped the participants to understand how history is created for later usage. Once they were satisfied playing with the interface, they were asked to fill out the last QUIS questionnaire, which evaluates the entire interface.

Results and Discussions

The study showed positive and promising results on the list of thumbnails visualization and its features. It showed a significant performance improvement over the state-of-the-art method, and participants preferred it to the Filmstrip.

Method’s Performance: T-test analysis results, illustrated in Table 5.2, showed that our method (History Timeline) was significantly faster than the state-of-the-art (Filmstrip), which allowed participants to answer more questions within the same time. Additionally, History Timeline had significantly fewer thumbnail previews compared to Filmstrip. Participants were significantly more precise in finding the answers using History Timeline.
However, both methods demonstrated similar behaviour in terms of average number of seeks and errors.

Table 5.2: Results of the comparative study between list of thumbnails and Filmstrip for the answer search task, showing a significant advantage using History Timeline in % of answered questions, time needed to answer a question, previews, and interval accuracy. Note: SD = standard deviation; ns = not significant; average time per question measured in seconds. * p < 0.025

Measure                         Filmstrip Mean (SD)   History Mean (SD)   t-test
Total % of answered questions   64.54                 83.21               -
Mean % of answered questions    62.12 (15.20)         73.26 (18.87)       11.23*
Average time per question       21.18 (14.28)         19.39 (12.35)       2.27*
Average no. of seeks            1.55 (1.29)           1.49 (1.14)         ns
Average no. of previews         27.79 (26.67)         24.00 (24.89)       2.49*
Average no. of errors           0.23 (0.63)           0.18 (0.56)         ns
Average accuracy                70.34 (23.70)         76.07 (20.09)       4.40*

For each video, some of the measured variables showed a significant difference between the two methods, illustrated in Figure 5.5. There was a significant difference in the percentage of questions answered for History Timeline and Filmstrip in V2 (i.e. Geri’s Game), t(10) = 14.1, p < 0.001; V3 (i.e. Partly Cloudy), t(10) = 20.41, p < 0.001; and V4 (i.e. Alma), t(10) = 18.4, p < 0.001. In terms of the average time needed to answer a question, with History Timeline, participants took significantly less time than with Filmstrip in V2, t(1592) = 4.81, p < 0.001, and V4, t(1592) = 4.06, p < 0.001. V2, t(1592) = 5.78, p < 0.001, and V4, t(1592) = 5.65, p < 0.001, also showed a significant difference between the two methods in terms of the average number of previews performed per question. With History Timeline, participants were significantly more accurate in selecting the required interval for all videos (V2: t(1592) = 4.07, p < 0.001, V3: t(1592) = 2.84, p < 0.005, V4: t(1592) = 6.35, p < 0.001, and V5: t(1592) = 4.75, p < 0.001).
And lastly, in terms of the average number of seeks, a significant difference between the methods only occurred in V4 (t(1592) = 5.11, p < 0.001), where History Timeline had fewer seeks. Participants tended to make more errors when using Filmstrip.

Figure 5.5: The performance of List of thumbnails and Filmstrip in terms of: (a) average percentage of answered questions, (b) average time per question, (c) average number of previews, (d) average answer accuracy per question, for each tested video. List of thumbnails had significantly more questions answered, less time to answer each question, fewer previews, and more accurate answers.

Method’s Features Ratings: Each participant answered two identical questionnaires (using questions 8-28 from Section B.3), one for each method, to compare the overall reaction, the time it takes to learn and general impressions. Each questionnaire contained thirty-seven 7-point Likert scale questions. The data collected for the two methods were analyzed and compared in terms of these 37 questions. A Wilcoxon Signed Rank Test revealed that the overall rating of the Filmstrip was significantly lower than that of the History Timeline, z = −1.73, p < .05. The History Timeline showed no significant difference from the Filmstrip in all questions except for the question “Learning to use the method features” (1:Difficult, 7:Easy), where History Timeline (Md = 6) was rated higher than Filmstrip (Md = 5), z = −1.82, p < 0.04. This also coincided with participants’ preference, where nine out of twelve participants stated that the History Timeline was faster and easier in finding the answers for the questions. Participants reported that the History Timeline was faster because it is based on a personal mental context map created for the video. It was easier for them to refer back to the corresponding segments when needed, which was not the case for Filmstrip.
Since Filmstrip segments were created systematically, participants needed to navigate at least one segment. If the answer was within the segment, they submitted it; otherwise they needed to navigate the preceding or subsequent segment. This was also demonstrated by the participants’ quantitative results, where they did not need to preview thumbnails so often and were able to answer more questions using History Timeline. All questions for both methods were rated above 5 out of 7 except for one question for the Filmstrip: “Using the method is effortless” (Md = 4.5).

Interface’s Features Ratings: After acquiring some familiarity with the overall interface in phase 2, participants were asked to complete a third questionnaire (Section B.3) to rate the entire interface. Responses were compiled for each of the 12 participants in the study, along with any written comments that the participants had. The overall mean rating of all sections of the QUIS was 5.73, on a 7-point scale, and all questions were rated above 5.

For the “Overall Reaction” section, the “Ease of use” overall rating was the only factor that was rated significantly different from the mean response (M = 6, SD = 0.48), where it was higher, indicating that users can easily learn and use the interface. The other seven ratings, Impressiveness (M = 5.58, SD = 0.54), Satisfaction (M = 5.17, SD = 0.51), Stimulation (M = 5.92, SD = 0.48), Perceived “powerful” (M = 5.83, SD = 0.56), Flexibility (M = 5.5, SD = 0.69), Helpfulness (M = 5.92, SD = 0.58), and Usability (M = 6.08, SD = 0.50), were not significantly different from the mean user response level. From these “Overall Reactions” to the interface, we can conclude that users found our interface easy to use, helpful, useful, and flexible.
In the “Learning” section, the overall ratings of all items were not significantly different from the mean response except for one item, “Steps to complete a task follow a logical sequence” (1:Never, 7:Always) (M = 5.83, SD = 0.42), which was rated significantly higher than the mean response. However, for the general impressions section, the overall ratings for “Screens are aesthetically pleasing” (M = 6, SD = 0.43), “Screen designs and layout are attractive” (M = 5.92, SD = 0.33), “Interface is impressive” (M = 5.75, SD = 0.61), “Interface can do a great deal” (M = 5.83, SD = 0.42), and “Interface is fun to use” (M = 5.75, SD = 0.57) were significantly better than the mean response. This indicates that users found our interface aesthetically pleasing and fun to use. Nevertheless, “Interface maintains one’s interest” was rated significantly lower than the mean response (M = 5.67, SD = 0.44), which might be due to the limited videos used in the experiment. The remaining items of this section were not significantly different from the mean response. For the “Satisfaction” section, all items were also not significantly different from the mean response.

Interface’s Negative & Positive Aspects: In addition to the QUIS item findings, participants were also asked to list the three most negative and positive aspects of the interface. Negative aspects reported by some participants were “I found it weird to have a vertical list of video pieces”, “Vertical scrolling”, “Not being able to delete segments from history”, and “Not being able to favorite some segments from history.” The comment about the vertical scrolling coincided with their responses to the location of the component, where the mean rating for the location of History Timeline (M = 5.1, SD = 0.79) was significantly lower than the location of the Filmstrip (M = 6, SD = 0.67), t(22) = 2.49, p = 0.021. However, some participants found that having two different orientations for the different components (i.e. horizontal Filmstrip and vertical History Timeline) made it easier for them to differentiate. We designed the component in this layout to eliminate the confusion between the History Timeline and the video timeline; this may need additional investigation. Regarding favouriting and deleting items from history, we are considering adding these features to the interface to help users manage their history and filter it using the favourite items. Participants also commented, “The main framework could be better if the vertical and horizontal list can be chosen to be disappeared”. We think having the History Timeline in a different mode (similar to web browsers and YouTube history) might help since it will reduce the number of thumbnails presented at the same time in the interface, as well as prevent the dynamic change in the History Timeline as the user interacts with a video.

Participants found the interface helpful and impressive in being able to dynamically create points or bookmarks, which would allow them to skip to a favourite clip. This was also seen in participants’ response to how segments were created, where “Having a control over the creation of the segments” in the History Timeline (M = 5.92, SD = 0.79) was rated significantly higher than “Systematically created intervals” in Filmstrip (M = 5.01, SD = 0.79). Participants appreciated having control of what they are watching, being able to go back to seen intervals, and the ability to create video segments. One participant said, “If I create my own bookmarks in the video, I may skip to a favourite song e.g. in a concert.” Most participants stated that the interface is fun to use once you get familiar with it.

Participants’ Vision of the Interface: In order to explore whether users foresee potential for the interface, we asked them “Where and how do they think video navigation history can be useful?” Participants provided us with valuable responses that gave us some insight as to how this interface can be further modified and tested. They foresee that it would be useful in the educational environment as well as for home usage. Some participants indicated its usefulness for sharing, where it can be used to pinpoint movies and clips for sharing with friends. Other participants pointed out its application for monitoring video consumption at home and at the office: “Home: parents can either set video clips that are allowed for their children or to monitor what they have watched. Office: employer can monitor if employees are watching videos during office hours.” Some indicated that it would be useful for chaptering long videos and emphasizing important parts. Others mentioned its application for educational lectures and tutorials, for example, “pinpoint important clips within educational videos” and “create a summary of an educational lecture.” Some participants foresee using it to create tutorials/demos for a technical or an educational presentation or creating song lists.

Participant’s Comments: There were also some very positive comments made by the participants about the interface. One participant commented, “I definitely see how this would save my time because I will never need to watch the entire video again to find good stuff.” Others said, “Pretty impressive framework and dynamic response”; “The interface is impressive and it has potential usage in comparison to other video player interfaces”; and finally, “I would love to see this implemented within social websites.
I could see how I would use it and definitely I will have more fun.”

Based on the participants’ valuable comments and suggestions, we believe using personal navigation history is helpful in navigating a video space. This could work for different applications such as creating highlights or summaries of videos, making a movie from home videos, sharing interesting clips, quickly navigating and skimming previously watched videos, and finally watching the interesting parts of new videos using crowd-sourced viewing histories. This positive feedback, and the way participants welcomed the idea of using their personal navigation history, motivated us to investigate other ways to visualize viewing history and how to visualize multiple-video history. In the next section, we present another approach to visualizing this viewing history and explore the usefulness of crowd-sourced metadata.

5.2 Visualize History Using Consumption Frequencies

Online video viewing has seen explosive growth, yet simple tools to facilitate navigation and sharing of the large video space have not kept pace. Our objective is to design a visualization that supports fast in-video navigation (play most popular or unseen parts), search (seek within intervals with prior knowledge e.g. ‘seen the event before’ or ‘never seen the event’), preview, and instant sharing (share a single interval directly). To accomplish this, we propose the use of single-video viewing statistics, generated from an individual’s video consumption or crowd-sourced from many people watching the same video, as the basis for visualizing a video’s footprints. Whenever a segment of video is played, the video ID and timestamps for the interval’s start and end are recorded. An accumulated view count is maintained for the video at a given resolution (e.g.
15 samples per second of video). This data can then be used to visualize how a video was consumed, to create a summary of a video viewing history, and to easily navigate through its clips.

There are two different approaches to how this data can be visualized: (1) using colour or a heatmap, or (2) using representative variable-sized thumbnails.

5.2.1 Footprints Visualization Using Colour Intensity

The viewing history, or footprints, offers users a visualization of the view count of every point of time in the video, that is, how often each part was viewed. This is similar in concept to timeline footprints [82], as shown in Figure 2.2. The blue bar becomes brighter for the most-viewed intervals, and each one can be selected to play. It provides the user with information on the most-viewed intervals of that video (a temporally tracked frequency). Using the user’s viewing statistics, a view count is maintained for all points of time for each viewed video. As the user views a video, the view count changes dynamically; the count is continuously updated and the information propagated backwards through the history. The view counts are then summed and normalized within each video. This visualization can also be used to visualize the viewing statistics from multiple users (i.e. crowd-sourced), which can be combined to form the most popular intervals for a video. This provides users who are watching a video for the first time with a mechanism to watch what others have found to be most interesting. This is a fast navigation technique, predicated on trusting the crowd to be “correct” on interesting video content.

Limitations of a Coloured Timeline

Applying colour intensity to visualize users’ footprints in a video faces several problems:
1. There is no in-place preview, which makes it impossible to determine the content from the footprints. For example, users cannot tell what the content of the most-viewed part is.
2. It requires users to navigate to any part to reveal its content.
3. There is no direct interval sharing.
4. Having multiple variations of the colour to indicate the intensity makes it hard for users to distinguish these frequencies.

Figure 5.6: A Coloured Timeline visualizes a user’s navigation footprints over a video timeline using colour intensity. The more an interval within a video is watched, the brighter that region becomes.

5.2.2 Footprints Visualization Using Variable-sized Thumbnails

This approach also uses viewing statistics (personal or crowd-sourced) as the basis for a modification to the well-known Filmstrip visualization [31], to create our proposed visualization, which we call View Count Record (VCR). The VCR employs a Filmstrip metaphor since thumbnails are accepted as a standard on which nearly all digital video retrieval interfaces are built. Moreover, the Filmstrip metaphor is commonly used to present the content of a video as a navigation device. It is a presentation scheme for abstracting information in a digital video segment, which is considered effective on desktop systems for video retrieval [31]. Furthermore, the Filmstrip communicates shot information all at once in a static form, which is a quick method for showing video content as opposed to playing it.

The VCR, shown in Figure 5.7, uses a variable thumbnail size (and variable interval length) to reflect the relative popularity of intervals. We used size instead of colour since we are representing intervals using thumbnails, where colour discrimination may be confused with the thumbnail content and would be difficult to differentiate for some videos. We also piloted the visualization using a coloured frequency bar attached to each thumbnail to indicate the popularity (Chapter 4); while the information was welcomed, the visualization was reported as cluttered.
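
As an illustration of the view-count statistics described at the start of Section 5.2, on which both the coloured timeline and the thumbnail-based visualization build, the following minimal Python sketch accumulates a per-sample view count from played intervals and normalizes it within a video. It is a sketch under stated assumptions, not the code of the interface itself: the function names (accumulate, normalize, combine) are hypothetical, interval boundaries are rounded to the nearest sample, and normalization divides by the maximum count, which is only one plausible reading of "normalized within each video".

```python
from typing import List, Sequence, Tuple

SAMPLES_PER_SECOND = 15  # example resolution quoted in the text


def accumulate(duration_s: float,
               played: Sequence[Tuple[float, float]],
               rate: int = SAMPLES_PER_SECOND) -> List[int]:
    """Count how often each sample of one video was played.

    `played` holds (start, end) timestamps in seconds, one pair per
    played interval, as recorded whenever a segment is watched.
    """
    n = int(round(duration_s * rate))
    counts = [0] * n
    for start, end in played:
        lo = max(0, int(round(start * rate)))   # assumption: nearest sample
        hi = min(n, int(round(end * rate)))
        for i in range(lo, hi):
            counts[i] += 1
    return counts


def normalize(counts: Sequence[int]) -> List[float]:
    """Scale counts to [0, 1] within one video (assumption: divide by max)."""
    peak = max(counts, default=0)
    return [c / peak if peak else 0.0 for c in counts]


def combine(per_user: Sequence[Sequence[int]]) -> List[int]:
    """Crowd-sourced count: sample-wise sum over users of the same video."""
    return [sum(column) for column in zip(*per_user)]
```

For example, normalize(accumulate(60.0, [(0, 20), (10, 30)])) yields a brightness value per sample that is highest where the two played intervals overlap; the same values could drive either colour intensity or thumbnail size.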
Figure 5.7: The View Count Record (VCR) component visualizes the video viewing statistics: (a) Filmstrip (equivalent to VCR without visualization of viewing statistics); (b) View Count Record (VCR). When no viewing history is available, the VCR presents a familiar Filmstrip (a). When a history is available, our view count manipulation algorithms can be applied to visualize popular intervals within the video, leading to fast personal navigation and social navigation applications. Each thumbnail can be searched via seeking in the popup red time bar when hovering the cursor over the preview image.

The VCR applies a histogram visualization similar to [69], but it uses thumbnails instead of a time-series graph, where the height of these thumbnails indicates the height of the histogram bars. It consists of a fixed number of size-able video segments (described in Section 5.1.1; see footnote 7). The duration and size of each segment are based on how often its corresponding interval has been viewed. If no viewing statistics are available, the VCR appears as a normal Filmstrip as shown in Figure 5.7(a). As the user consumes video, the VCR updates its segments accordingly.

Footnote 7: In our interface, we used 6 segments based on the width of the interface and the maximum width of a single VCR segment.

VCR Construction

The construction of the VCR, illustrated in Algorithm 1, starts by gathering intervals of time in which consecutive frames have equal view counts. While there are intervals shorter than a set threshold, the algorithm attempts to merge these intervals with one of their neighbouring intervals. The neighbour to merge with is determined by two criteria: first by the difference in view counts, and, if the differences in view counts are equal, then by the duration of the neighbouring intervals. The merging process chooses the smallest difference in view counts, or the smallest interval duration.
This process repeats until all intervals’ durations are greater than the preset threshold.

Algorithm 1 VCR construction: every VCR segment duration and view count (VC) is measured based on other existing segments.
  Retrieve segments S = {S[1], S[2], ..., S[n]} using Algorithm 2
  if n > no. of thumbnails in the VCR then
    repeat
      Retrieve peak segments P = {P[1], P[2], ..., P[k]}
      if k > no. of thumbnails in the VCR ÷ 2 then
        p ← P(index(min(P))).sIndex
      else
        p ← index(min(S.VC))
      end if
      if S[p−1].VC = S[p+1].VC then
        m ← index(min(S[p−1].dur, S[p+1].dur))
      else
        m ← index(min(abs(S[p].VC − S[p−1].VC), abs(S[p].VC − S[p+1].VC)))
      end if
      S[m].VC ← (S[m].VC × S[m].dur + S[p].VC × S[p].dur) ÷ (S[m].dur + S[p].dur)
      S[m].dur ← S[m].dur + S[p].dur
      remove S[p]
    until S.length ≤ no. of thumbnails in the VCR
  else
    while S.length < no. of thumbnails in the VCR do
      p ← index(max(S.dur))
      insert new segment s at p+1
      S[p+1].VC ← S[p].VC
      S[p].dur ← S[p].dur ÷ 2; S[p+1].dur ← S[p].dur
    end while
  end if
  draw {S[1], S[2], ..., S[S.length]}

Upon the completion of the merging process, the VCR contains a set of intervals with durations that are greater than the preset threshold. However, since the number of items in the visualization component is limited, we must reduce the set of intervals to match. Thus, we look at the peaks of the view count graph, keep the highest peaks, and merge the other intervals until we get to the desired resolution. Conversely, if we do not have enough intervals, we linearly sample and split intervals until we have enough.

Algorithm 2 Segments Merging: every segment with a duration less than a predefined threshold is merged with its closest neighbouring segment.
  Retrieve segments S = {S[1], S[2], ..., S[n]} using Algorithm 3
  for each s ∈ S do
    if S[s].dur < threshold then
      if S[s−1].VC = S[s+1].VC then
        m ← index(min(S[s−1].dur, S[s+1].dur))
      else
        m ← index(min(abs(S[s].VC − S[s−1].VC), abs(S[s].VC − S[s+1].VC)))
      end if
      S[m].VC ← (S[m].VC × S[m].dur + S[s].VC × S[s].dur) ÷ (S[m].dur + S[s].dur)
      S[m].dur ← S[m].dur + S[s].dur
      remove S[s]
    end if
  end for

Algorithm 3 Frames Gathering: consecutive frames with equal view counts are gathered into one segment.
  Retrieve frequencies of video frames from user’s heuristics F = {F[1], F[2], ..., F[n]}
  S = {}
  index ← 1; S[1].VC ← F[1]; S[1].dur ← 0
  for each f ∈ F do
    if F[f] ≠ S[index].VC then
      index ← index + 1
      S[index].VC ← F[f]; S[index].dur ← 1
    else
      S[index].dur ← S[index].dur + 1
    end if
  end for

We then create our visualization by using a size-able Video Segment component (described in Section 5.1.1), shown in Figure 5.8, for each interval. The size of each segment is based on the ratio of the current segment’s view count to the maximum view count for the video. The VCR updates automatically when the video is paused, based on the latest viewing statistics (it does not update while viewing so as not to distract from the main video). It illustrates to the user how they consumed the video and which parts were viewed the most and the least. This provides a simple mechanism to find or navigate back to these segments when needed.

Figure 5.8: Each video thumbnail in the VCR is visualized as a small video segment. Each segment is seek-able and play-able on mouse events. The red/gray portion at the bottom of the widget indicates the temporal location of its interval within the complete video. The yellow line illustrates the current seeking point within the thumbnail, within the zoomed interval for higher-resolution seeking.

VCR Scalability

The construction and visualization of the VCR is based on the video used, the number of peaks, and the interface size, which are independent of the platform used.
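
To make the construction concrete, the following Python sketch mirrors the gathering step (Algorithm 3), the merging of short intervals (Algorithm 2), and a simplified form of the fitting step (Algorithm 1), followed by the size ratio used for drawing. It is a minimal sketch under stated assumptions rather than the interface's own code. The Segment class and all function names are hypothetical. The listings do not say how the first and last segments are handled or in which order short segments are merged, so the sketch merges an edge segment into its only neighbour and merges the shortest segment first. Ties are broken as in the prose description (equal view-count differences, then shorter neighbour). The fitting step always absorbs the lowest-view-count segment and omits the peak-preserving branch of Algorithm 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    vc: float  # view count (duration-weighted average after merging)
    dur: int   # duration in samples


def gather(counts: List[int]) -> List[Segment]:
    """Algorithm 3: group consecutive samples with equal view count."""
    segs: List[Segment] = []
    for c in counts:
        if segs and segs[-1].vc == c:
            segs[-1].dur += 1
        else:
            segs.append(Segment(float(c), 1))
    return segs


def _neighbour(segs: List[Segment], i: int) -> int:
    """Neighbour with the closest view count; on a tie, the shorter one."""
    if i == 0:
        return 1                      # assumption: edge has one neighbour
    if i == len(segs) - 1:
        return i - 1
    d_left = abs(segs[i].vc - segs[i - 1].vc)
    d_right = abs(segs[i].vc - segs[i + 1].vc)
    if d_left != d_right:
        return i - 1 if d_left < d_right else i + 1
    return i - 1 if segs[i - 1].dur <= segs[i + 1].dur else i + 1


def _absorb(segs: List[Segment], i: int) -> None:
    """Merge segment i into a neighbour using the weighted-average rule."""
    keep, gone = segs[_neighbour(segs, i)], segs[i]
    total = keep.dur + gone.dur
    keep.vc = (keep.vc * keep.dur + gone.vc * gone.dur) / total
    keep.dur = total
    del segs[i]


def merge_short(segs: List[Segment], threshold: int) -> List[Segment]:
    """Algorithm 2: merge until no segment is shorter than the threshold."""
    segs = [Segment(s.vc, s.dur) for s in segs]
    while len(segs) > 1:
        short = [i for i, s in enumerate(segs) if s.dur < threshold]
        if not short:
            break
        _absorb(segs, min(short, key=lambda j: segs[j].dur))  # assumption
    return segs


def fit_to_count(segs: List[Segment], n: int) -> List[Segment]:
    """Simplified Algorithm 1: merge or split until there are n segments."""
    segs = [Segment(s.vc, s.dur) for s in segs]
    if not segs or n < 1:
        return segs
    while len(segs) > n:              # too many: absorb the least viewed
        _absorb(segs, min(range(len(segs)), key=lambda j: segs[j].vc))
    while len(segs) < n:              # too few: halve the longest segment
        p = max(range(len(segs)), key=lambda j: segs[j].dur)
        if segs[p].dur < 2:
            break
        half = segs[p].dur // 2
        segs.insert(p + 1, Segment(segs[p].vc, segs[p].dur - half))
        segs[p].dur = half
    return segs


def relative_sizes(segs: List[Segment]) -> List[float]:
    """Ratio of each view count to the maximum; all 1.0 without history."""
    peak = max(s.vc for s in segs)
    return [s.vc / peak if peak > 0 else 1.0 for s in segs]
```

A typical call chain would be fit_to_count(merge_short(gather(counts), threshold), 6), with 6 matching the number of segments used in our interface; the threshold value itself is left as a parameter because the text does not state it.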
It is not affected by the length or duration of the video being visualized since the algorithm, as described in Section 5.2.2, merges (or linearly samples and splits) intervals until the VCR gets to the desired resolution (i.e. the required number of segments). However, due to the limited space and the fixed number of video segments, some medium-height peaks may be diminished and not easily viewed in the VCR. To alleviate this problem, the interface supports a zoom feature (via the mouse wheel) where the selected video segment expands and is represented by its own VCR with the same number of segments. When a segment is zoomed in, the VCR updates to visualize segments within that zoomed segment only and hides any other segments. Thus the VCR always uses the same number of video segments.

Personal and Crowd-Sourced VCR

The VCR can be used to present either personal viewing statistics or crowd-sourced metadata. It uses the same algorithm described in Section 5.2.2. In a crowd-sourced VCR, the view count is gathered from multiple users and then used in the algorithm to construct the VCR. Having both options allows users to compare their viewing behaviour with the crowd, and for unseen videos the crowd-sourced VCR gives them suggestions on what to watch from the video in question.

Advantages over Coloured Timeline

Applying size-able thumbnails to visualize how users viewed a certain video resolves the issues presented by a coloured timeline. It allows users to easily identify the content of the most popular parts of the video. Overlapping peaks can be easily distinguished by just scrolling the mouse wheel over the intended segment, which reduces the need to navigate or seek to each peak. Moreover, having distinctive segments allows fast sharing of these segments.
Similar to the coloured timeline, the VCR provides quick navigation to different parts of the video, but the VCR also provides fast search of content without the burden of continuous seeking or clicking on different peaks.

5.2.3 Investigating View Count Record (VCR) Visualization

We designed a comparative study to evaluate our navigation tool and see how the history would help people to access their previously seen clips in order to share one of their liked segments. Our aim was to investigate 1) if our visualization of video navigation provides faster search for user-specific affective intervals, and if users prefer our visualization for this task; 2) if crowd-sourced histories provide good summaries of video. Participants were asked to find and share their favourite intervals using either the VCR or Filmstrip visualizations. In this study we went with a personalized approach for the search task to get a better insight into how this would work in real scenarios. We compared against the Filmstrip design instead of the footprints’ coloured timeline [82] for several reasons: footprints does not easily let a user directly select or share a full interval; video cannot be previewed inside the footprints visualization (VCR and Filmstrip can both directly preview without seeking); and VCR and footprints could be used together, so we believe a comparison against Filmstrip is more informative.

In this study, we hypothesized that users would perform better when using the VCR in terms of the time needed to find their previously seen liked segments, and that less navigation would be required. In terms of the agreement with the crowd-sourced history, we anticipated that users would appreciate having the VCR as a recommendation tool for unseen videos. However, in terms of participant agreement with each single segment from the crowd-sourced history, we expected a big variation between participants and videos.
For method preference, we expected that participants would prefer a VCR with history available over a visualization with no history.

Apparatus

The experiment application was developed in Flash CS4 and ActionScript 3.0. The experiment ran on an Intel dual-processor dual-core 3 GHz Mac Pro desktop with 8 GB RAM, equipped with a 24-inch Dell LCD monitor with a resolution of 1920 × 1200 pixels at a refresh rate of 60 Hz. A Microsoft optical wheel mouse was used as the input pointing device with default settings, and the Adobe AIR environment was set to 1920 × 900 pixels while running the Flash program.

Participants

Ten paid volunteers, 6 female and 4 male, participated in the experiment. Participants ranged in age from 19 to 40. Three of the participants were undergraduate students while the rest (i.e. seven participants) were from the general public (non-academic). Each participant worked on the task individually. All participants were experienced computer users and had normal or corrected-to-normal vision. Seven participants watch online videos on a daily basis, two watch videos 3-5 times a week, and one watches once a week. Five of the participants watch 1-3 videos on average per day, while three watch 4-6 videos per day and two watch more than 10 videos per day on average. Table 5.3 presents participants’ details.

Design

Two different navigation modes were tested: no-history (Filmstrip) and with-history (VCR). The case where no history was available was represented by the state-of-the-art Filmstrip, shown in Figure 5.7(a). Each participant tried both modes to navigate and share their preferred parts of the video. Participants were divided equally into two groups, where Group 1 used Filmstrip first and VCR second, and Group 2 had the order reversed. Participants freely watched a set of 5 different videos (Disney short animations) between 3 and 5 minutes long.
These videos were chosen based on the same criteria listed in the previous experiment, described in Section 5.1.3. The selected videos were: V1: One Man Band, V2: Partly Cloudy, V3: Day & Night, V4: For The Birds, and V5: Presto. Video length does not affect the VCR, as mentioned in Section 5.2.2; however, due to the time constraints of the experiment, short videos were tested.

Each participant performed a total of 14 search tasks (2 modes × 7 segments per mode); they were asked to perform as quickly as possible. For each task, the completion time, the number of previews, and the number of zoom events were recorded. The completion time was measured from when the participant clicked on a ‘Find’ button until the moment they found the correct interval (confirmed by the researcher). The navigation behaviour and statistics were recorded during the viewing phase. The participants were also asked to rank the best mode based on speed, ease and preference.

Upon the completion of the sharing tasks, participants started the second task, where they were shown a short version of each video, automatically created from crowd-sourced histories (described in Section 5.2.3). Participants were asked whether the shortened version was a good summarization and whether each segment in the shortened video matched their own affective segments. The final task was to fill out a questionnaire to rank the modes, and provide feedback on the interface (attached in Section B.4). The experiment lasted approximately one hour per participant.

Table 5.3: Demographics summary for participants in the investigation of the VCR visualization study. (Note: WMP = Windows Media Player, QT = QuickTime, VLC = VLC Media Player, RP = Real Player, KMP = KMP Player, Gom = Gom player, M = Mplayer, YT = YouTube)

P   Gender  Age Group  Vision     Watching Frequency  Videos per Day      Familiar Players              Frequent Player
1   Male    31-40      Normal     Daily               1-3 videos          iTunes, QT, RP, VLC, WMP      QT
2   Male    26-30      Corrected  Daily               more than 10 videos iTunes, M, QT, RP, VLC, WMP   YT, KMP
3   Male    19-25      Normal     Daily               4-6 videos          iTunes, M, QT, RP, VLC, WMP   iTunes
4   Female  19-25      Corrected  Daily               4-6 videos          QT, VLC, WMP                  WMP
5   Female  19-25      Corrected  3-5 times a week    1-3 videos          RP, VLC, WMP                  VLC, WMP
6   Female  19-25      Normal     3-5 times a week    1-3 videos          iTunes, VLC, WMP              VLC
7   Female  26-30      Normal     Once a week         1-3 videos          RP, VLC                       RP
8   Female  19-25      Normal     Daily               1-3 videos          iTunes, RP, VLC, WMP          VLC
9   Male    31-40      Normal     Daily               more than 10 videos RP                            YT, RP
10  Female  19-25      Normal     Daily               4-6 videos          VLC, WMP, Gom                 WMP

Procedure

The experiment proceeded as follows:
1. The researchers gave the participants a walkthrough of the interface explaining the functionality of each feature and tool, and their effects when applied to a video. This stage took approximately five minutes. The participants were allowed to try the interface and ask any questions during this stage.
2. Participants were then asked to completely watch the five different videos and re-watch any parts they wanted. Only one video was played at a time. Participants’ navigation behaviour was stored to be used for the searching task. To ensure all participants had seen equivalent video content, the tasks began after all videos were viewed.
3. Once each video was viewed, participants were asked to list five different intervals they would like to share; these were recorded by the researcher. Once the participants named the segments, they were advanced to the next video.
4. When all videos were completely viewed, the researcher chose an event from those provided by the participant that they had to find, one event at a time. Events were chosen from different videos so that no consecutive search tasks came from the same video. The search task began by clicking ‘Find’ and choosing a video from a grid of thumbnails, after which the navigation layout for the current mode was displayed; the participant used this to find an interval representing the event. The interval was submitted for consideration by playing it in the main player: if approved by the researcher as correct, the task was complete. Participants proceeded to find and share the next event. Upon the completion of finding the seven segments, the participants were advanced to the next mode to find another seven segments.
5. The first stage of the experiment ended once the participant had experimented in two modes where they found seven segments for each mode. After this, a short version of the most-viewed segments of each previously watched video, based on the crowd-sourced statistics, was played. The crowd-sourced heuristics were an aggregation of the navigation histories of 6 people, described in Section 5.2.3, who were different from the experiment’s participants.
6. Once a shortened video stopped playing, participants were asked if they thought the shortened version was a good summarization and whether each segment in the shortened version matched their own affective segments.
7. The experiment ended when participants had ranked the five different shortened videos.
Finally, the participants were asked to fill out a questionnaire where they ranked the modes and gave their feedback and comments about the interface, its features and their experience.

Crowd-Sourced Data Collection

Six graduate students (2 female, 4 male, aged 24 to 37), completely separate from the participants in this study, voluntarily participated in the crowd-sourced data collection. They were invited prior to the experiment to freely watch and navigate the same set of videos while their viewing statistics were recorded. Their data was then aggregated and visualized using the VCR. At least nine peaks existed for each video. However, due to the experiment time constraints (one hour), we decided to use only the highest four peaks of each video in the shortened videos that were tested. Figure 5.9 illustrates the crowd-sourced data for the “One Man Band” video, highlighting the segments chosen for the shortened video; the VCR of this data is shown on top of the graph.

Figure 5.9: The crowd-sourced data of the “One Man Band” video. The highest four peaks used for the shortened video are highlighted in yellow. The View Count Record (VCR) visualizing these crowd-sourced viewing statistics is illustrated on top of the graph.

Results and Discussions

Most participants commented that they enjoyed their time using the interface and they foresaw its applicability as a navigation aid for unwatched videos, where social navigation can be leveraged for the benefit of future viewers, as well as a summarization tool for their own videos. Participants were impressed by how closely the crowd-sourced popular intervals matched their own preferences for favourite intervals, confirming that in most cases this would provide an effective tool for navigating new video.

Search Task: The main task in the experiment was to search for previously seen preferred intervals: each participant was able to complete each search task in less than one minute (for all 14 trials). A paired-samples t-test analysis determined the significance of the results in terms of the average completion time per search and the average number of previews per search. The analysis, shown in Table 5.4, demonstrated that the search task using Filmstrip took significantly more time than with the VCR. Participants were asked to rank the different modes for preference, ease and speed: they ranked the VCR as the most liked, easiest and fastest mode, which coincides with the quantitative results. This indicates that having access to the user’s personal navigation record is useful for finding previously-seen content within video, and that our visualization cues (e.g. size) of the most-watched segments helped users to quickly and easily navigate to the correct intervals.

Table 5.4: Results of the comparative study for the interval retrieval task, showing a significant advantage using our method (VCR) in terms of completion time. Note: SD = standard deviation; completion time measured in seconds. * p < 0.03

                   Filmstrip        VCR
                   Mean    SD       Mean    SD      t-test
Completion Time    24.31   10.42    21.11   5.38    2.28*
No. of Previews    40.39   35.41    35.53   27.56   0.88

In terms of the average number of previews, the results revealed no significant difference between the two modes, which we did not anticipate.
This may be due to the fact that many view count peaks can exist within a single video segment of the VCR, and that some segments ended up much smaller in size, which made them harder to navigate. When analyzing the participants’ navigation history, we found that participants created 11 history segments on average per video. This means that when using heuristics some VCR segments had more than one peak, since there are only 6 segments in the VCR. However, as we mentioned in Section 5.2.2, we added the zoom functionality to mitigate this. Participants rarely used the zoom feature and preferred to navigate through these segments instead, which explains the large number of previews.

Agreement With Crowd-Sourced VCR: All participants agreed that the shortened video (created automatically using the crowd-sourced data) was an effective summary of the video content. Before using the interface, participants were asked whether they would use others’ recommendations as a tool for navigating unseen videos; we were most interested in discovering if participants’ views would change after using our interface. Most (9 out of 10) said they would not use recommendations; however, after using the interface and viewing the shortened video they expressed surprise at the quality of the summary. Participants mentioned that having the crowd-sourced VCR would save time, especially for long videos, since they can decide whether to watch the entire video or just the summary, or even just parts of the summary.

For each video, participants were asked to rank each segment derived from the crowd-sourced data. At least eight out of ten participants agreed that each segment represented something they liked or illustrated an affective clip. Out of a total of 20 segments, 7 segments were liked by eight participants, 8 segments were liked by nine participants, while the remaining 5 were liked by all participants.
Some segments were disliked by some participants either because of religious beliefs or because they were perceived as violent content, while other participants considered these segments to be funny. Looking at participants’ viewing heuristics for each video, as shown in Figure 5.10, also revealed high agreement with the crowd-sourced segments, which matches and justifies participants’ rankings. We expected the variation between participants; however, we did not predict the generally high level of agreement. This suggests implicit tagging of video from many users may serve as a valuable navigation tool for online video.

Figure 5.10: The cumulative view count for each video: (a) One Man Band, (b) Partly Cloudy, (c) Day & Night, (d) For The Birds, and (e) Presto. The segments used for each shortened video are highlighted in yellow. There is a high agreement between view count peaks and selected crowd-sourced segments.

Participants’ Viewing Heuristics: Participants’ viewing heuristics were gathered for each video to be compared with the crowd-sourced metadata and to explore the feasibility of using viewing heuristics as a recommendation tool. The new cumulative view count for each video showed a similar trend to the crowd-sourced metadata but with less variation. Figure 5.11 illustrates the new collective heuristics for “One Man Band” along with the crowd-sourced view count of the six volunteers used in this experiment. They have similar trends, with a cleaner graph for the newly collected view counts. Having more users viewing the same video should remove most of the noise in a graph, leading to a more representative summary of the video. New viewers can then use these summaries to judge whether or not to watch the video in question.

Figure 5.11: The crowd-sourced data of the “One Man Band” video along with the new cumulative view count. A similar trend appears in both graphs but the newly collected data shows cleaner data with distinctive peaks.

Comparing each participant’s viewing heuristics with the crowd’s showed similar behaviour in the video portions that participants watched more than once. Individual peaks almost aligned with the highest peaks of the crowd-sourced metadata. For example, for the video “Partly Cloudy”, all participants had more than one peak aligned with the crowd-sourced highest peaks, with the exception of participants 4 and 6 who had only one (Figure 5.12). This high alignment justifies the high agreement for the clips used in each shortened video. Moreover, comparing each individual’s viewing behaviour with the collective behaviour from all participants showed at least five matched re-watched segments per video for all participants, as shown in Figure 5.13. This coincides with what we saw in the crowd-sourced data, which indicates that crowd-sourced data can provide a potential video summarization tool.

Figure 5.12: Difference between individual behaviours and crowd-sourced data for “Partly Cloudy”. All participants had more than one peak aligned with the crowd-sourced highest peaks, with the exception of participants 4 and 6 who had only one. Highest peaks are highlighted in yellow.

We also looked at the list of the events participants named for each video that they would possibly share with others. The total number of different named events per video is shown in Table 5.5, with the number of participants per event in each video. There were at least four (out of 5) events that were listed by at least five participants, which indicates around 80% agreement per video on events to be shared. Looking at individual listings showed that almost all of the events the participants listed match the peaks in their personal viewing history.
This suggests that the re-watching behaviour coincides with affective parts in a video (similar to [12]), which can offer a simple tool for recommending clips within a video for sharing.

Features’ Ranking: From the aggregated results of the questionnaire (measuring ease-of-use and usefulness), the average ranking across all components and features was 5.82 out of 7. All features were ranked above 5 except for three items: getting started (M = 4.5), remembering how to use the interface (M = 4.6), and using the zoom (M = 4.3). The zoom scored slightly lower due to the mouse wheel sensitivity being reported as too high, which led to some participants becoming confused or frustrated. This could also explain the low usage of this feature while performing the tasks, where only two participants used it for 4 tasks out of 140 tasks (10 participants × 2 modes × 7 tasks) when searching for events. This has been taken into account for future versions of the interface.

Figure 5.13: Participants’ viewing heuristics for each tested video (panels: One Man Band, Partly Cloudy, Day & Night, For The Birds, Presto). Intervals that are re-watched by more than five participants are highlighted in yellow. A high agreement between participants’ re-watched segments can be seen for each video, where there are at least five matched segments.

Table 5.5: Agreement between events participants listed for each video. There are at least 4 (out of 5) events that were listed by at least 50% of the participants. Note: V2: One Man Band, V3: Partly Cloudy, V4: Day & Night, V5: For The Birds, and V6: Presto

Video  # of Events  # of Participants per Event
V2     19           8 7 6 5 5 3 3 3 2 2 1 1 1 1 1 1 1 1 1
V3     15           6 6 6 5 4 4 4 3 3 3 3 3 2 1 1
V4     19           7 7 6 5 5 3 3 3 3 2 2 2 2 1 1 1 1 1 1
V5     12           8 7 6 5 5 4 4 4 3 2 2 2
V6     19           8 7 5 5 4 4 3 3 2 2 2 1 1 1 1 1 1 1 1
Overall, participants appreciated the zoom since it enabled them to get a more detailed view of the video’s content.

Participants’ Feedback: Participants made several positive comments about the interface. One participant commented, “YouTube statistics has already a feature that shows you how others viewed your video. Why don’t you employ your tool there? It will really help me decide what to watch.” Others said, “Is this available in any online videos? Can we try it in YouTube or Vimeo?”; “It is really cool and easy to use. When are you going to apply this to online video websites?”; and finally “I didn’t expect others’ history would be useful, but, you showed me it is.”

5.3 Directions

Our current visualizations were limited to a single video viewing history, and they were tested in a controlled laboratory experiment for a short period of time. Thus, we aim to conduct a field study to check the validity and scalability of these visualizations and navigation mechanisms. We intend to explore how users respond to these mechanisms, in conjunction with the presented VCR, via a field study utilizing online video. Extensive data will help determine general users’ current viewing behaviour for all types of video, and how it changes when they are given a VCR and other methods based on viewing statistics. We plan to investigate the validity and acceptance of crowd-sourced data as a basis for video navigation, summarization and teaser generation.

5.4 Summary

Viewing heuristics were generated from individual video consumption, or crowd-sourced from many people watching the same video; both provide quick tools for navigating, searching and generating video summaries. In this chapter, we have presented the list of thumbnails and the VCR, two different approaches to visualizing single-video viewing heuristics that provide simple navigation, search, preview and sharing of video intervals.
They establish a new way to navigate and view a video space using a personal or crowd-sourced video history. We performed user studies testing these approaches, which yielded significant positive results and highly positive affect and comments from participants. Through these studies, we have demonstrated that applying users’ viewing history significantly improves search and navigation through videos. Moreover, using crowd-sourced data as a tool for recommending segments within videos (i.e. social navigation) was found to be appreciated, and we confirmed that the summaries generated from crowd-popular segments were effective at communicating the content of a video. The VCR and list of thumbnails were rated highly by users, who recommended integrating these mechanisms into online video websites.

In the next chapter, we look at how to visualize multiple-video history and how to extend the list of thumbnails approach, presented in this chapter, taking into consideration participants’ comments and feedback.

Chapter 6
Multiple-Videos History Visualizations

Video navigation histories are a simple archive that a person can use to easily find a previously viewed video interval. Users can navigate to the exact location within the original video by simply clicking on the references within their history. This provides the user with a record for historical navigation and removes much of the burden of relying on memory. However, finding previously viewed content in a navigation history is often a difficult task due to the design, organization, and volume of information to visualize. In the previous chapter, we proposed and evaluated two different approaches to visualizing users’ viewing history that showed better performance than the state-of-the-art methods. However, those visualizations are designed for a single-video viewing history and lack the capability of visualizing and managing users’ entire viewing history of a video space.
Thus, the goal of this chapter is to extend the work presented in Chapter 5 to multiple-video history by testing different design layouts that support user-centred management of history, and to evaluate the benefits this brings.

In this chapter, we describe a Video History System (VHS) framework in Section 6.2.1 that offers users a platform to track and manage their video viewing and navigation history. Section 6.3 presents our proposed visualizations of a detailed multiple-video navigation history: Video Tiles and Video Timeline. These are both part of the VHS framework, and utilize the same underlying representation. Section 6.5 reports the results of evaluating the history visualizations against the state-of-the-art method. Finally, Section 6.6 addresses the limitations, refinements and directions for future research.

6.1 History Visualization Considerations

Video viewing history visualization, as described in previous chapters, is more complex than web browsers’ history. Based on the results of our s