UBC Theses and Dissertations

It’s over there : can intelligent virtual agents point as accurately as humans? Wu, Fan 2020


It’s Over There
Can Intelligent Virtual Agents Point as Accurately as Humans?

by

Fan Wu

B.Eng., Zhengzhou University, 2017

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

October 2020

© Fan Wu 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

It’s Over There: Can Intelligent Virtual Agents Point as Accurately as Humans?

submitted by Fan Wu in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering.

Examining Committee:

Sidney Fels, Electrical and Computer Engineering
Supervisor

Tim Salcudean, Electrical and Computer Engineering
Supervisory Committee Member

Dongwook Yoon, Computer Science
Supervisory Committee Member

Abstract

To support effective pointing interactions with an intelligent virtual agent (IVA), the first question to answer is how accurately users can interpret the direction of IVAs’ pointing. In this thesis, we designed an IVA and investigated its capability to point to the real world as accurately as a real person. We used a spherical Fish Tank Virtual Reality (FTVR) display as it provides effective 3D depth cues and is situated in real-world coordinates, allowing IVAs to point to the real world. We first conducted an experiment to determine the pointing cue, a fundamental design factor of our IVA. Specifically, we evaluated the effect of head and hand cues on users’ perception of the IVA’s pointing. The findings provide design guidelines for selecting pointing cues in virtual environments. Following the guideline, we further determined our IVA’s other design factors, including appearance, subtleties on how it points, with rationales elaborated.
Using our designed IVA, we conducted an experiment to investigate the difference between the IVA and natural human pointing, measured by users’ accuracy of interpreting the pointing to a physical location. Results show that participants can interpret the IVA’s pointing to a physical location more accurately than the real person’s pointing. Specifically, the IVA outperformed the real person in the vertical dimension (5.2% less error) and yielded the same level of accuracy horizontally. Our IVA design mitigated the pointing ambiguity due to the eye-fingertip alignment commonly found in human pointing, which may account for the IVA’s higher pointing accuracy. Thus, our findings provide design guidelines for visual representations of IVAs with pointing gestures.

Lay Summary

It is believed that IVAs with pointing gestures enabled would enrich the verbal communicative channel and promote an efficient human-like interaction. However, designing an IVA that can point to locations in the real world as accurately as a real person is a challenge. In this thesis, we designed a 3D visual representation of an IVA and we aim to enable an IVA to point to the real world as accurately as a real person. With design factors determined and rationales elaborated, we conducted experiments and demonstrated that participants can interpret the IVA’s pointing to a physical location more accurately than the real person’s natural pointing. Specifically, our designed IVA outperformed the real person in the vertical dimension and yielded the same level of accuracy horizontally.
Our findings and design strategies provide guidelines for future studies on human-agent interactions with pointing involved.

Preface

All of the work presented in this thesis was conducted in the Human Communication Technologies Laboratory at the University of British Columbia. All experiments and associated methods were approved by the University of British Columbia’s Research Ethics Board (Certificate Number H08-03005-A022).

A version of Chapter 3 has been published in [84] as listed below. I was the lead investigator, responsible for the experimental design, data collection and analysis, as well as the manuscript composition. Mr. Kashiwaqi assisted with the rendering of an intelligent virtual agent (IVA) in a spherical display, on the basis of the calibration techniques [32, 78, 89]. Dr. Zhou was involved in the experimental design and contributed to manuscript edits. Dr. Seo provided editorial feedback on the manuscript. Prof. Fels contributed suggestions on the formulation of the research question.

[84] Fan Wu, Qian Zhou, Kyoungwon Seo, Toshiro Kashiwaqi, and Sidney Fels. I got your point: An investigation of pointing cues in a spherical fish tank virtual reality display. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1237–1238. IEEE, 2019.

A version of Chapter 4 has been submitted to a journal and is under peer review at the time of thesis submission. I formulated the research question, designed the IVA and experiment, analyzed the collected data and wrote the manuscript. Prof. Fels was involved throughout the project in the design iterations and manuscript edits. Dr. Zhou was involved in the discussion of the experimental design and data analysis. Dr. Stavness and Dr. Zhou provided help in manuscript edits.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
  1.1 Research Questions
  1.2 Contributions
  1.3 Thesis Structure
2 Related Work
  2.1 Intelligent Virtual Agents (IVAs)
    2.1.1 Embodiment in IVAs
    2.1.2 Pointing in IVAs
  2.2 Perception of Pointing
    2.2.1 Perception of Pointing in the Real World
    2.2.2 Perception of Pointing in Virtual Environments
  2.3 Summary
3 Evaluation of Pointing Cues in a Spherical FTVR Display
  3.1 Introduction
    3.1.1 Pointing Cues
    3.1.2 Spherical FTVR Display
    3.1.3 Research Goal
  3.2 Experiment
    3.2.1 Participants
    3.2.2 Design
    3.2.3 Apparatus
    3.2.4 Procedures
    3.2.5 Results
  3.3 Discussion
    3.3.1 Pointing Cues
    3.3.2 Degree
    3.3.3 Error Distribution
  3.4 Design Implications
  3.5 Limitations and Future Work
  3.6 Summary
4 Comparison of Pointing Accuracy Between IVA and Human Pointing
  4.1 Introduction
  4.2 Design Rationale for IVA
    4.2.1 IVA Appearance
    4.2.2 Pointing Gestures
  4.3 Experiment
    4.3.1 IVA Scale
    4.3.2 Participants
    4.3.3 Design
    4.3.4 Experimental Setup
    4.3.5 Apparatus
    4.3.6 Procedure
    4.3.7 Results
  4.4 Discussion
    4.4.1 Accuracy Difference Between IVA and RP
    4.4.2 Distance
    4.4.3 Viewing Condition
  4.5 Design Implications
  4.6 Limitations and Future Work
  4.7 Summary
5 Conclusions
  5.1 Contribution Summary
  5.2 Limitations
  5.3 Potential Applications
    5.3.1 Agents
    5.3.2 Avatars
  5.4 Future Work
    5.4.1 Identification of Each Design Factor’s Effect
    5.4.2 Comparison with Other Displays
    5.4.3 Pointing Interactions with Verbal Cues Added
  5.5 Concluding Remarks
Bibliography

Appendices
A Questionnaire
  A.1 15 Setting
  A.2 30 Setting
B Geometric Calculations
  B.1 Distance Adjustment
  B.2 Analysis of Estimated Error in Human Pointing
    B.2.1 Estimated Vertical Error in Human Pointing
    B.2.2 Estimated Horizontal Error in Human Pointing
  B.3 Eye-shoulder Distance
    B.3.1 Error Difference Between IVA and RP
    B.3.2 Horizontal and Vertical Error in RP
C Consent Forms

List of Tables

4.1 Interview Responses on the Comparison of IVA and RP
4.2 Interview Responses of Pointing Cues in IVA and RP

List of Figures

2.1 Put That There
2.2 Amazon Alexa
2.3 Jack Pointing on a Virtual Board
2.4 User Interacting with MACK
2.5 Pointing Gestures
3.1 Experimental Settings on Degree
3.2 IVA with Pointing Cues
3.3 Experimental Apparatus
3.4 Accuracy and Mean Time
3.5 Questionnaire Responses on the Confidence Level
3.6 Radial Heat Maps About Error Distribution
4.1 IVA in a Spherical FTVR Display
4.2 Eye-fingertip Alignment and Arm Vector Pointing
4.3 Top View of the Experimental Setup
4.4 Experimental Setup for IVA and RP
4.5 Distance Error with Means and 95% Confidence Intervals (CIs)
4.6 Vertical Error with Means and 95% CIs
4.7 Scatter Plot of Each Participant’s Average Error Bias
4.8 Side View and Top View of Human Pointing
4.9 Horizontal & Vertical Error with Means and 95% CIs
B.1 Distance Adjustment for Same Retinal Size
B.2 Estimated Vertical Error Analysis in Human Pointing
B.3 Estimated Horizontal Error Analysis in Human Pointing
B.4 Horizontal and Vertical Error Difference Between IVA and RP
B.5 Eye-shoulder Distance from Side View and Top View

List of Abbreviations

• 2D: 2-dimensional
• 3D: 3-dimensional
• ANCOVA: Analysis of Covariance
• ANOVA: Analysis of Variance
• AR: Augmented Reality
• CI: Confidence Interval
• CVE: Collaborative Virtual Environment
• FTVR: Fish Tank Virtual Reality
• HH: Head+Hand
• HMD: Head-mounted Display
• IVA: Intelligent Virtual Agent
• PC: Personal Computer
• RP: Real Person
• SameDis: Same Distance
• SameRet: Same Retinal Size
• VAC: Vergence-Accommodation Conflict
• VR: Virtual Reality

Acknowledgements

The research work presented in Chapter 3 was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and B-Con Engineering. The work in Chapter 4 was funded by NSERC.

I would like to thank my supervisor Prof. Sidney Fels for the guidance and supervision throughout my graduate studies. I truly appreciate his patience and insightful suggestions that helped me overcome challenges in my research journey. I would also like to thank Dr. Zhou for providing me with invaluable suggestions, both in research and life, and for her kind words to keep me sane and optimistic.

I am grateful to be given the opportunity to join the Human Communication Technologies (HCT) Lab. This work would not have been possible without the technical resources and our lovely lab mates. Their support as participants in my user studies and their words of encouragement are greatly appreciated.

My deepest thanks go to my loving mom and brother, for their love and continuous support that keep me believing in myself and going through the difficult times.

Dedication

To my mom and brother

Chapter 1
Introduction

There is much research targeting voice and gestures to improve natural human-agent interactions, as these are core components in interpersonal communication [11, 30, 57]. Among various gestures, pointing is a foundational building block in human communication [48].
Deictic pointing, pointing that complements the spoken indicated referent, is the most common method to indicate things and is frequently used as an extra-linguistic means for referent identification [49, 82]. One typical example is the speech “your key is over there” along with pointing that indicates there (where the key is) rather than describing the specific location. It simplifies verbal communications and avoids referential failures, especially when verbal descriptions fail to express intentions concisely (e.g., confusion was found in the description “the bright pink flat piece of hippopotamus face shape piece of plastic” in a conversation [37]).

Not only is deictic pointing important in the real world, but it is also an important research topic in interaction with virtual environments. For example, as far back as 1980, Bolt’s “Put that there” research [13] demonstrated how an intelligent virtual agent (IVA) can recognize and interpret a person’s pointing gestures to objects in a 2D virtual world to facilitate natural human-computer interactions. More recently, with the advances in voice-based IVAs, such as Amazon Alexa [1], the new viewer and context-aware display technologies provide opportunities for IVAs to be visualized in 3D and be able to point at objects in the real world. We believe that enabling IVAs with pointing gestures would enrich the verbal communication channel and promote efficient human-like interactions [46].

To equip IVAs with pointing interactions, the first question to answer is how accurately users can interpret the direction of IVAs’ pointing, which is a fundamental building block of pointing interactions. If users cannot effectively tell where an IVA is pointing, it would be difficult to use pointing gestures for social interactions with IVAs. However, it remains unclear whether it is possible to design an IVA that can point to real locations and have users accurately recognize where the IVA is pointing.
Optimally, users should be able to interpret an IVA’s pointing to the real world as well as, or even better than, a real person’s pointing.

In this thesis, we designed a 3D visual representation of an IVA with the following research goal: to enable an IVA to point to the real world as accurately as a real person. As the evaluation metric, we measured users’ accuracy in interpreting the pointing direction. We believe our findings can provide guidelines for future studies or industrial applications on human-agent interactions with pointing involved.

We used a spherical Fish Tank Virtual Reality (FTVR) display for our IVA design. Unlike immersive displays, our spherical FTVR display is calibrated to be viewer-aware in the real-world coordinate system, enabling the IVA to point from the virtual world to objects in the real world. It also offers effective 3D depth cues for pointing perception (i.e., stereoscopic cue and motion parallax) [47]. In addition, spherical displays have been found to provide better gaze [40, 66], size and depth [87] perception compared to flat displays.

To meet our research goal, we made deliberate decisions on the IVA’s design factors. One of the fundamental design factors is the pointing cue. We selected head and hand cues for the evaluation and disabled the eye gaze, due to the head cue’s dominant effect in the overall gaze perception [73] and the generalization to backward-pointing [81]. We will discuss the reasons in detail in Chapter 3. To answer the question of what pointing cues effectively promote a good perception, we conducted an experiment to investigate the effect of head and hand cues on users’ performance in interpreting the IVA’s pointing from a spherical FTVR display.
Building on the findings from this experiment, we determined the IVA’s other design factors, including its appearance, the subtleties of how it points, and differences from human pointing that can help improve users’ perception of the pointing direction.

Using our IVA design in the spherical FTVR display, we conducted an experiment to investigate the difference between the IVA and natural human pointing, measured by users’ accuracy of interpreting the pointing to a physical location. Our results demonstrate that an IVA can be designed to point to locations in the real world more accurately than a real person.

In this thesis, we report the design rationales, experiments as well as findings and discussions from our analysis. We believe our studies and IVA design help pave the way for other research about the design of IVAs’ pointing gestures as well as users’ perception of pointing either in the virtual environment or in the real world.

1.1 Research Questions

The main research question regarding the design of our IVA, addressed in this thesis, is:

• How accurately can an IVA point to the real world compared to a real person?

To answer this question, we first determined the pointing cue, which is a key aspect in our IVA design. Specifically, we evaluated head and hand cues by resolving the question:

• How do head and hand cues affect users’ performance in interpreting an IVA’s pointing to the real world?

1.2 Contributions

The main contribution of this thesis is:

We demonstrated that a visual 3D IVA can be designed to point to the real world more accurately than a real person. Specifically, the IVA outperformed the real person in the vertical dimension and yielded the same level of accuracy horizontally.

Regarding the design strategies of our IVA and the comparison with natural human pointing, this thesis provides the following contributions:

• Established design guidelines on selecting pointing cues in the virtual environment. We provided design guidelines on the selection of head and hand cues to give users a good perception in pointing scenarios of different difficulty levels.

• Established design strategies for IVA pointing. To explain our IVA’s higher pointing accuracy relative to the real person, we discussed potential design factors and suggested design implications for IVA pointing.

• Confirmed the robustness of previous human pointing results. Our experiment on human pointing replicated the results of previous studies [10, 24], which also enhances the validity of our experimental design.

1.3 Thesis Structure

The remainder of this thesis is structured as follows: Chapter 2 provides a literature review related to the topics of pointing in IVAs and the perception of pointing in different environments. Chapter 3 describes the methodology and experiment in evaluating the effect of head and hand cues on users’ perception of pointing. Chapter 4 presents the IVA design, methodology and experiment in investigating our main research question, to establish the capability of an IVA to point as accurately as a real person. Chapter 5 concludes with a summary of our contributions as well as limitations and suggestions on future research directions.

Chapter 2
Related Work

The research described in this thesis is motivated by facilitating effective pointing interactions with IVAs. Visual embodiment allows pointing gestures to be used in human-agent interactions. We therefore first review previous studies on intelligent virtual agents (IVAs), covering the visual embodiment and the deictic pointing. To promote effective pointing interactions, a fundamental problem we should answer first is users’ perception of pointing, that is, how accurately users can interpret the pointing direction. Our goal is to enable an IVA to point to the real world as accurately as a real person.
In the second part, we therefore review prior literature on the perception of pointing in the virtual environment and the real world.

2.1 Intelligent Virtual Agents (IVAs)

2.1.1 Embodiment in IVAs

Intelligent virtual agents (IVAs) are interactive digital characters that exhibit human-like qualities and can communicate with humans and each other using natural human modalities [75]. Considerable research has been devoted to this field. One of the first studies is “Put that there” [13], which allows users to interact with an IVA using deictic gestures as well as voice commands, to manipulate objects in a 2D wall-sized display (Figure 2.1). In 1997, Dragon Systems launched the first continuous dictation program, Dragon NaturallySpeaking [35], which can behave like an IVA to do dictation and complete users’ voice commands for the control of a PC. IVAs in these previous works, including most current state-of-the-art commercial IVAs, such as Amazon Alexa [1] (Figure 2.2) or Google Assistant [27], are mainly controlled by voice and lack the visual embodiment and non-verbal gesture cues, which are important in human social interactions [46].

Figure 2.1: Put that there [13]: users interact with an IVA using deictic pointing and voice commands. (© ACM (1980), by permission)

Although a literature review [26] on early studies shows mixed effects of the visual embodiment on human-agent interaction, a newer meta-analysis by Yee et al. [85] suggests that a visual representation of an agent leads to more positive social interactions than not having it. Moreover, they found that the effect size of having a face was larger than the degree of realism in the visual presentation or the behavior of the agent: the realism of the embodied agent may matter very little, and animating highly realistic faces might appear unnatural and induce negative feelings (i.e., the Uncanny Valley [61]).
Over the past decades, dozens of studies have demonstrated the psychological benefits of agent embodiment. Bente et al. [12] observed a stronger social presence in collaborative net-communications using embodied virtual representations. Demeure et al. [28] showed that virtual agents were perceived as being more believable if they were embodied, particularly related to a higher sense of competence and warmth. Kim et al. [46] investigated the effects of embodiment and social behaviors on the perception of IVAs during social interactions. They measured subject feelings in multiple dimensions and concluded that participants were socially closer to the IVA when it had a human-like visual embodiment than only having a disembodied voice.

Figure 2.2: Amazon Alexa [1] has a voice interface but lacks the visual embodiment and pointing gestures.

In this thesis, we design an embodied IVA whose appearance is deliberately not highly realistic, to avoid the uncanny valley. We believe a visual representation could enrich the communicative channels to convey the IVA’s status and intentions, and allow the provision of non-verbal cues, particularly pointing gestures.

2.1.2 Pointing in IVAs

Deictic Pointing

Previous research [5, 29, 43, 58] has reported many classifications of gestures. A common one, proposed by McNeil [58], divides gestures into four types: 1) Iconic gestures depict concrete objects or actions in terms of their physical aspects, e.g., tracing a square with the index finger to refer to a window or box. 2) Metaphoric gestures show abstract ideas or the vehicle of a metaphor, e.g., forming a V shape with the fingers to indicate victory. 3) Deictic gestures are pointing gestures that indicate concrete entities in the physical environment or abstract loci in space, e.g., pointing at the tree while saying “the tree is over there”. 4) Beats are rhythmic movements aligned with the speech pattern.
Among these gestures, deictic pointing, the main representative of deictic gestures, is the most common method to indicate things [49, 82].

Deictic pointing is typically used to complement or replace the spoken deixis [51]. Deixis, a term from Greek, is a form of referring that is tied to the context of utterance. It is defined as “pointing” via language [76]. Based on the type of referent that speakers want to indicate, deixis can be categorized into person deixis (“she”, “you”), spatial deixis (“there”, “here”) and temporal deixis (“today”, “now”) [86]. Deictic pointing is produced to indicate the deixis and direct the recipient’s attention towards the referent in the shared environment [23]. A typical example is replying to questions like “where is xxx” by pointing to the spatial deixis “there”, commonly accompanied by the response “it’s over there” rather than a verbal description of the location.

By circumscribing the referential domains [9], deictic pointing reduces the verbal effort to identify a target and hence facilitates more efficient communication, especially when it is difficult to rely only on verbal descriptions to convey intentions. Likewise, we believe IVAs with deictic pointing enabled would promote an efficient human-like interaction. In this thesis, our research focuses on deictic pointing. We will use “pointing” in our thesis to represent “deictic pointing” as they are often used interchangeably [56].

Figure 2.3: Jack, a virtual human presenter, can point at a particular position on a virtual board in interaction with users [64]. (© IEEE (2000), by permission)

IVAs with Pointing

Pointing is a natural and fundamental gesture for indicating a particular object. The ubiquity of pointing drives research on incorporating it for IVAs in virtual environments. Most prior studies on IVAs with pointing focused on its benefits in drawing users’ attention to some content in the virtual world where the IVA is situated.
For example, the Persona agent [2] could point to images on web pages, and Jack, as a virtual meteorologist, can give a weather report by pointing to the weather images [64] (Figure 2.3). Atkinson [6] showed an animated virtual agent serving as a tutor in a knowledge-based learning environment and demonstrated the benefits of pointing in directing the learners’ attention. Combining speech and context, BEAT was created to generate correlated gestures by extracting the linguistic and contextual information from the input text [18]. To achieve deictic believability, Lester et al. [52] designed the COSMO agent, using a deictic planner to determine the generation of pointing guided by the spatial deixis framework. Rather than pointing to the virtual environment, MACK [17], in mixed reality, could point, along with speech, to a physical paper map shared with users in the real world (Figure 2.4).

Figure 2.4: MACK can point to the physical environment shared with users [17]. (© IMAGINA (2002), by permission)

However, an unanswered question is how accurately an IVA can point to the real world, which is a fundamental building block of pointing interactions. Our research aims to design an IVA and enable it to point as accurately as a real person. Understanding the design strategies can provide foundations for effective human-agent interactions in a pointing scenario.

2.2 Perception of Pointing

During interpersonal interactions, the accuracy with which observers can detect the pointed targets based on another person’s pointing gestures has been a key issue, because if a person cannot accurately interpret the other’s pointing direction, it would be difficult to establish joint attention within a conversation [15].
Likewise, in pointing interactions with IVAs, we should first establish users’ accurate perception of IVAs’ pointing.

Pointing can be classified into proximal and distal pointing, depending on whether the pointer can touch the referents (targets) or not [69]. Proximal pointing is mainly used to identify what aspect of an object is referred to [10]. In contrast, the main concern for the perception of distal pointing is how accurately observers can interpret the pointing direction and locate the pointed target in a shared environment [14]. As our IVA is a situated virtual agent inside a spherical display without the capability to access the physical objects, we evaluate the perception of distal pointing in this thesis. By measuring users’ accuracy of interpreting the pointing direction, we compare IVA pointing with natural human pointing. The pointers (IVA and real person) are situated in different environments, so we review studies on the perception of pointing in the two environments: the real world and virtual environments.

2.2.1 Perception of Pointing in the Real World

Human Pointing Gestures

When humans perform pointing naturally, without instructions, instead of pointing using their arm vector, Bangerter & Oppenheimer [10] and Henriques & Crawford [42] observed that humans commonly orient their arm so that the fingertip intersects the line joining the target and their dominant eye while gazing at the target. This is called eye-fingertip alignment and is illustrated in Figure 2.5 (Top). This mechanism was further shown in the estimation of human pointing direction. Among the various methods proposed, the head-hand line [21, 53, 60] (also known as the eye-fingertip line) was found to be the most reliable (90%) compared to forearm direction and head orientation [63]. The eye-fingertip alignment is also called eye-finger ray cast (EFRC). Mayer et al. [55] demonstrated that it yielded the lowest offset among four other ray cast techniques.
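The difference between the eye-fingertip line and the arm vector can be made concrete with a little vector geometry. The following is a minimal sketch, not the method used in this thesis: the coordinate frame (x forward towards a wall, y up, z to the pointer's right), the body measurements, and all function names are assumptions introduced purely for illustration. It places the fingertip on the eye-target sightline, then extends both the eye-fingertip line and the shoulder-fingertip (arm) line to the wall holding the target.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v: Vec) -> float:
    """Euclidean length of v."""
    return math.sqrt(sum(c * c for c in v))


def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two direction vectors, in degrees."""
    cos_t = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def fingertip_on_sightline(eye: Vec, target: Vec, reach: float) -> Vec:
    """Eye-fingertip alignment: put the fingertip on the eye->target line.

    Simplification (assumed): `reach` is measured from the eye, in metres.
    """
    d = sub(target, eye)
    s = reach / norm(d)
    return (eye[0] + s * d[0], eye[1] + s * d[1], eye[2] + s * d[2])


def hit_on_wall(origin: Vec, through: Vec, wall_x: float) -> Vec:
    """Extend the line origin->through until it meets the plane x = wall_x."""
    d = sub(through, origin)
    if d[0] <= 0.0:
        raise ValueError("line does not travel towards the wall")
    s = (wall_x - origin[0]) / d[0]
    return (origin[0] + s * d[0], origin[1] + s * d[1], origin[2] + s * d[2])


# Hypothetical pointer and target positions, in metres.
EYE: Vec = (0.0, 1.60, 0.0)
SHOULDER: Vec = (0.0, 1.40, 0.18)
TARGET: Vec = (2.0, 1.00, 0.50)

fingertip = fingertip_on_sightline(EYE, TARGET, reach=0.6)
eye_hit = hit_on_wall(EYE, fingertip, TARGET[0])       # eye-fingertip line
arm_hit = hit_on_wall(SHOULDER, fingertip, TARGET[0])  # arm vector line
deviation = angle_deg(sub(fingertip, EYE), sub(fingertip, SHOULDER))

print("eye-fingertip line meets the wall at", eye_hit)
print("arm line meets the wall at", arm_hit)
print("angle between the two lines (deg):", deviation)
```

In this assumed configuration the eye-fingertip line passes through the target by construction, while the extended arm line meets the wall above it, because the shoulder sits below the eye. An observer who reads the arm line rather than the eye-fingertip line would therefore judge a different location from the one the pointer intends.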
To make the IVA point as accurately as a human, this alignment needs to be considered when designing the IVA’s pointing gestures.

(a) Eye-fingertip alignment (b) Arm vector pointing
Figure 2.5: (Top) Eye-fingertip alignment commonly found in human pointing: humans naturally align the dominant eye with the fingertip when pointing at a target. (Bottom) Arm vector pointing: arm vector, eyes and head all point at a target directly without coordination.

Evaluation of the Perception of Pointing in the Real World

During interpersonal communication, how observers perceive pointing behavior is an important research topic. Pointing behavior typically involves the movement of eye gaze, head and arm [42]. Considerable research has targeted gaze perception. People can accurately discern mutual gaze with another person [3, 36] and the direction of the other person’s gaze [34]. By contrast, research on the accuracy of pointing perception is scant. By evaluating the detection accuracy for different combinations of head, eye and hand pointing cues, Butterworth and Itakura [15] showed that pointing can improve spatial localization of targets compared to head and gaze cues, but suggested that pointing had limited accuracy. Bangerter & Oppenheimer [10] contested their findings with a more precise measurement technique. Their results revealed that the detection accuracy was comparable to the accuracy level for eye gaze and was unaffected by the exclusion of eye gaze and head orientation. Despite the good accuracy, they observed a perceptual bias towards the side of the pointer’s arm away from the guesser in the horizontal dimension and above the target in the vertical direction. They suggested that the ambiguity introduced by the deviation between the eye-fingertip line and the arm line might account for this bias. A study by Cooney et al.
[24] evaluated the pointing accuracy in the horizontal direction and replicated Bangerter & Oppenheimer’s results.

Considering the ambiguity shown in human pointing, and exploiting the fact that we have explicit control over the IVA’s head, eye and finger positioning, we designed our IVA to use arm vector pointing rather than eye-fingertip alignment as an approach to improve its pointing accuracy (Figure 2.5 Bottom). We expand on this design choice in the chapter that presents the IVA design.

Perception of Pointing in Virtual Environments

While pointing is ubiquitous in daily interactions within the real world, it is difficult for users to precisely interpret the pointing direction in virtual environments. Wong and Gutwin [83] compared users’ accuracy in a collaborative virtual environment (CVE) with the real world and observed worse performance in the CVE, although the difference was smaller than expected. Immersive head-mounted displays (HMDs) and virtual reality (VR) systems (e.g., CAVE) only support pointing within the virtual environment where the IVA is situated. By merging the real world with the virtual environment, FTVR [80] displays enable the IVA to point from the virtual world to the real world and provide a mixed reality experience. Our experiments used a spherical FTVR display because it has advantages over other VR/AR displays and planar displays, as we will discuss in Chapter 3.

Evaluation of the Perception of Pointing in FTVR

Regarding the evaluation of users’ perception of pointing in FTVR, previous research focused on the assessment of pointing cues. As suggested in Section 2.2.1, when pointing to objects beyond the arm’s reach, humans commonly point with an extended arm and index finger with a synchronized head and eye gaze movement [62, 74]. Pointing behavior typically involves the movement of three cues: head, hand (arm) and eye. Kim et al. [47] classified the pointing cue factor into three levels: gaze, hand, and gaze+hand.
They found no significant difference among the three levels in an experiment with a cylindrical 3D display. Using gaze to convey pointing direction within a spherical display has been shown to be effective [39, 65, 66]. However, there was no design guideline on how to select effective pointing cues in the virtual environment, which is an essential component of an IVA’s design. Among head, hand and eye gaze cues, we selected head and hand cues and disabled the eye gaze, due to the head cue’s dominant effect in overall gaze perception [73] and its generalization to backward-pointing [81]. We conducted a user study to evaluate how head and hand cues affect users’ perception of pointing, as we will present in Chapter 3.

The research listed above is mostly concerned with telepresence. That is, the remote person is represented by an avatar or captured using cameras to realize remote collaboration. By contrast, we use the IVA to perform pointing. In this context, the IVA is regarded as a social entity that mimics human intelligence [46] and works with a person. Unlike pointing in telepresence, designing the IVA’s pointing gestures offers more possibilities to improve users’ perception of pointing, as the pointing behaviours do not have to be exactly human-like. Thus, we have the opportunity to design IVA pointing gestures that differ from those of humans. This enables us to remove the eye-fingertip alignment in the IVA, as suggested in Section 2.2.1. The complete IVA design is discussed in Chapter 4.

2.3 Summary

To summarize, we first presented the benefits of IVAs’ visual embodiment in allowing the provision of pointing gestures to facilitate human-like interactions. We then introduced deictic pointing and surveyed prior IVAs with pointing gestures, uncovering the research gap and the importance of our research question.
With users’ perception of pointing as the evaluation metric, we aim to design an IVA and enable it to point to the real world as accurately as a real person. We therefore reviewed previous literature on users’ perception of pointing in the real world and virtual environments, which suggests ways to design an IVA with pointing accuracy comparable to that of a real person.

Chapter 3
Evaluation of Pointing Cues in a Spherical FTVR Display

3.1 Introduction

While pointing is ubiquitous in our daily interactions within the real world, it is difficult to accurately interpret an IVA’s pointing direction to the physical world, considering its complex and subtle gesture cues, such as the movement of the hand, gaze and head. One potential solution is to provide robust pointing cues for better perception. Using gaze to convey pointing direction in telepresence has been explored by many researchers [7, 39, 47, 65, 66]. Typically, pointing behavior involves the movement of three cues: head, hand (arm) and eye [42]. Kim et al. [47] classified the pointing cue factor into three levels: gaze, hand and gaze+hand, and conducted an experiment in a cylindrical 3D life-size display. Although they evaluated the effect of pointing cues on users’ accuracy of judgment, they provided no design guideline for cue selection, and the low-resolution display (~8.10 ppi) limits the external validity of the results. A design guideline for selecting effective pointing cues would benefit the design of IVAs with high pointing accuracy. In this chapter, we present an experiment that evaluates several pointing cues and provides guidelines on cue selection for designing pointing in the virtual environment.

3.1.1 Pointing Cues

Deciding what cues to evaluate is not trivial.
Previous work indicated that gaze pose is the sum of the head orientation and eye orientation, while head pose was reported to contribute 68.9% on average to the overall gaze direction, with an accuracy of 88.7% in a meeting scenario [73]. Wilson et al. [81] also showed that the internal features of the head or the outline head contour provide cues of equivalent strength in the discrimination of head orientation. This suggests focusing mainly on the head cue, not only for its robust effect on gaze pose perception, but also because it can support observers’ perception even when the other person faces backward. Thus, we revised Kim et al.’s classification [47] of the pointing cues to Hand, Head, and Hand+Head (HH), and included backward-pointing in addition to forward-pointing.

3.1.2 Spherical FTVR Display

We chose a spherical FTVR display for the IVA for the following reasons. First, FTVR displays are situated in the real world, which enables the IVA to point from the virtual environment to locations in the real world. Alternative approaches, such as immersive headset displays, only support pointing within the virtual environment where the IVA is situated. Though AR displays provide a see-through feature that can achieve similar effects, these systems lack the tangible nature of a volumetric display that is part of the real world. FTVR displays also provide motion parallax and stereoscopic cues, which are important in interpreting pointing gestures [47]. Regarding the screen shape, spherical FTVR displays have been found to provide better depth and size perception compared to a planar counterpart [87]. Spherical screens also showed better task performance in perceiving gaze direction compared to planar screens [40, 66]. As perceiving pointing gestures depends on multiple aspects of visual perception, such as depth and orientation perception, spherical FTVR displays are a promising choice for better task performance.

3.1.3 Research Goal

Besides the pointing cues, we also introduced difficulty level as an independent variable, with the 15° setting (Figure 3.1) defined as fine pointing and the 30° setting as coarse pointing. Our goal is to investigate how head and hand cues affect users’ perception of the IVA’s pointing from a spherical FTVR display, both in fine pointing (15°) and coarse pointing (30°). We are also interested in whether the combination of these two cues results in better performance in pointing perception. We believe the findings would provide guidelines on cue selection for good perception of pointing in the virtual environment.

3.2 Experiment

3.2.1 Participants

Twelve unpaid participants (5 females and 7 males) aged between 24 and 35 participated in the study. All had normal or corrected-to-normal vision.

3.2.2 Design

The experiment used a 2 × 3 within-subjects factorial design: degree has 2 conditions (15° and 30°) (Figure 3.1) and pointing cues have 3 conditions (Head, Hand, and Head+Hand (HH)) (Figure 3.2). The sequence of the degree and pointing cues was fully counterbalanced. We measured accuracy and time for quantitative analysis. We collected subjective data on participants’ ranking of confidence level for the pointing cues through a questionnaire.

(a) 15° setting (b) 30° setting
Figure 3.1: Experiment settings. (Left) 15° apart with 21 targets. Three targets (0, 1, 23) are excluded due to invisibility; (Right) 30° apart with 11 targets and one target (0) excluded.

3.2.3 Apparatus

Our IVA was rendered using Unity3D with a MikuMikuDance (MMD) [44] model from DeviantArt [68] and displayed in a 24-inch diameter spherical FTVR display (Figure 3.3). With four stereo projectors rear-projecting onto the spherical surface, we adopted an automated camera-based multi-projector calibration technique [89] to enable a 360° seamless image with 1-2 millimeter accuracy. The projectors are Optoma ML750ST with 1024 × 768 pixel resolution and a frame rate of 120 Hz.
With an NVIDIA Quadro K5200 graphics card, a host computer sends rendering content to the projectors. The OptiTrack (NaturalPoint Inc., Corvallis, OR) optical tracking system was used to track the passive markers attached to the shutter glasses for head tracking. We used a pattern-based viewpoint calibration [78] that can quickly compute a viewpoint registration with an average angular error of less than one degree. Viewers gain perspective-corrected images with stereo rendering coupled with the synchronized shutter glasses. The total latency is between 10-20 msec [32]. With a good resolution of ~34.58 ppi, the display provides various 3D depth cues such as motion parallax and stereoscopic cues [88].

Figure 3.2: IVA with pointing cues provided in the spherical FTVR display: (Left) Head; (Middle) Hand; (Right) Head+Hand (HH).

3.2.4 Procedures

Participants started with a verbal explanation of the experiment. Each degree setting includes three rounds of pointing tasks, covering the three pointing cue conditions. As illustrated in Figure 3.3 and Figure 3.1, participants were asked to sit on an 80 cm high chair to ensure that the line of sight is horizontal. Each participant was given two sheets of paper to record the target numbers based on their interpretation of the IVA’s pointing direction. One sheet was for the 15° condition with a 21 × 3 empty table (total trials × pointing cues), while the other had an 11 × 3 table for the 30° setting. Before the formal trials, there were 2 training trials for each pointing cue condition. Once the IVA received the finished response ‘ok’ from the participant, her head and hand returned to the initial position with no pointing for 0.5 seconds before performing the next pointing task, so that the previous target could not serve as a reference. The targets were randomly sorted with each one occurring only once. After the tasks for each degree condition, participants were asked to rank their confidence level for the pointing cues in the questionnaire (Appendix A).

Figure 3.3: (Left) Experimental apparatus showing an IVA pointing to a target with the head and hand pointing cues. The IVA in the spherical display was controlled by a keyboard to point to different targets, with eyes closed to disable the eye gaze. (Right) Spherical display setup: it provides 360° visibility with perspective-corrected rendering. With a good resolution of ~34.58 ppi, the system also offers motion parallax and stereoscopic cues. For the IVA’s rendering, we used Unity3D and an MMD model for setup and animation.

3.2.5 Results

Accuracy

A participant response is considered accurate if it identifies the correct target. The mean accuracy and 95% confidence intervals (CIs) are displayed in Figure 3.4 Top. With all assumptions met, we used a 2 (degree) by 3 (pointing cues) repeated measures ANOVA on the accuracy. We found that degree had a significant main effect on the accuracy (F(1, 11) = 102.731, p < .001), with lower accuracy in 15° (mean (M) = 0.731, standard error (SE) = 0.024) than in 30° (M = 0.937, SE = 0.016). The main effect of pointing cues was also significant (F(2, 22) = 10.384, p = .001). There was a significant interaction between degree and pointing cues (F(2, 22) = 9.648, p = .001).

A follow-up simple effects analysis indicated that in the 15° setting, Head (M = 0.587, SE = 0.032) achieved significantly lower accuracy than Hand (p < .05) as well as HH cues (p < .001). There was no significant difference (p > .1) in accuracy between Hand (M = 0.782, SE = 0.031) and HH cues (M = 0.825, SE = 0.021). No significant difference was found across cues in the 30° setting (p > .1).

Time

Time was calculated by averaging the time participants spent per trial to complete the task. The mean time taken and 95% confidence intervals (CIs) are displayed in Figure 3.4 Bottom.
With all assumptions satisfied, the ANOVA suggested that degree had a significant main effect on participants’ task completion time (F(1, 11) = 28.483, p < .001), with longer mean time in 15° (M = 4.586, SE = 0.266) compared to 30° (M = 3.492, SE = 0.185). Results also revealed a significant main effect of pointing cues (F(1.25, 13.73) = 10.789, p < .005) and an interaction between pointing cues and degree (F(2, 22) = 3.497, p < .05).

In the 15° setting, the time of Head (M = 5.405, SE = 0.590) was found to be significantly longer (p < .05) than Hand (M = 3.988, SE = 0.284) while not significantly different (p > .05) from the HH cue (M = 4.365, SE = 0.383), as illustrated in Figure 3.4 Bottom. There was no significant difference (p > .1) between Hand and HH cues. In the 30° setting, the time was not significantly different across the pointing cues (p > .05).

Confidence Level

In the 15° setting, all twelve participants rated Head as the cue they were least confident in. The HH cue was ranked by eight as having the highest confidence level. Four participants gave the same ranking for the HH and Hand cues.

(a) Accuracy (b) Mean Time
Figure 3.4: (Top) Accuracy with means and 95% confidence intervals (CIs): Hand cue improved the accuracy by 19.4% over Head cue in the 15° setting. The difference between Hand and HH is subtle for both settings. (Bottom) Completion mean time with means and 95% CIs: Hand cue achieved lower time (0.38 seconds less) than HH cue in the 15° setting. Significance values are reported in brackets for p < .05 (*), p < .01 (**), and p < .001 (***).

Figure 3.5: Participants’ confidence rank with means (circle), medians (cross) and 95% CIs, from 1 (“most confident”) to 3 (“least confident”), in response to the question of how confident they felt about the reported result with each pointing cue.

In the 30° setting, Head was rated as the least confident cue by eleven out of twelve participants.
Six participants were most confident in their judgment when they observed the HH cue, and five gave the same ranking for the HH and Hand cues. One participant found no difference in confidence level among the HH, Head, and Hand cues.

A Friedman rank sum test revealed a significant difference across conditions in confidence level (χ²(5) = 49.781, p < .001). Post hoc analysis with Wilcoxon signed-rank tests for multiple comparisons was conducted with a Bonferroni correction applied, resulting in a significance level set at p < 0.0056. There were significant differences between all pairs of pointing cues within each degree condition except for HH and Hand in 30° (Z = 2.449, p > .01). No significant difference was found for any pointing cue between degree settings. The means, medians and 95% CIs are shown in the figure of participants’ confidence ranks.

3.3 Discussion

3.3.1 Pointing Cues

In the 15° setting, it was not surprising to find that the Head cue performed worst in both Accuracy and Time. The high difficulty level of the 15° setting requires strong cues for accurate interpretation. From interview responses, all the participants ranked Head as the option with the lowest confidence level. Eleven participants commented that the Head cue was weak and not informative for a fine pointing task, which may account for its lowest accuracy. Their uncertainty is also likely to have resulted in a longer time before they made a decision. By contrast, 9 participants reported that the Hand cue was very straightforward, as they could extend the line of the IVA’s arm and index finger to the target. Regarding the comparison between Hand and HH, although participants showed significantly higher confidence in HH than Hand, we were surprised to find that the Hand cue achieved similar accuracy (only 4.4% difference) to the combination of hand and head cues (HH) (Figure 3.4 Top). Eight participants commented that the HH cue was more natural than Hand. Similar to their subtle difference in accuracy, there was also no significant difference between Hand and HH in Time.
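The Bonferroni-corrected significance level of p < 0.0056 reported for the confidence rankings is consistent with dividing α = 0.05 by nine pairwise comparisons. The text does not enumerate those comparisons, so the breakdown in the sketch below (three cue pairs within each of the two degree settings, plus each cue compared across the two settings) is an assumption chosen because it matches both the reported threshold and the comparisons described in the results.

```python
from itertools import combinations

cues = ["Head", "Hand", "HH"]
degrees = [15, 30]

# Assumed comparison set: every pair of cues within a degree setting ...
within_degree = [((d, a), (d, b)) for d in degrees for a, b in combinations(cues, 2)]
# ... plus each cue compared between the two degree settings.
between_degree = [((degrees[0], c), (degrees[1], c)) for c in cues]

comparisons = within_degree + between_degree

alpha = 0.05
threshold = alpha / len(comparisons)  # Bonferroni: divide alpha by the number of tests

print(len(comparisons), round(threshold, 4))
```

Under this assumed set of comparisons the corrected threshold rounds to the 0.0056 stated in the results.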
According to Figure 3.4 Bottom, the mean time of the HH cue is slightly higher than that of the Hand cue. One possible explanation is that participants took more time to observe and combine two cues for judgment. In contrast, they used only one cue as the reference in the Hand condition and thus possibly made faster decisions.

In the 30° setting, we found significant differences in confidence level for pairs of pointing cues except for HH and Hand. However, there was no significant difference across cues in either Accuracy or Time. This is understandable given the simplicity of the task, with fewer targets and a wider distance between targets. However, it was interesting to find that 11/12 participants reported that they were least confident in Head, whereas this did not lead to a pronouncedly worse result.

Figure 3.6: Radial heat maps showing the error distribution over the target position: for each target, we recorded the number of participants who misjudged it and represented the number using a color coding scheme. Participants performed worse in the left/right area compared to the front/behind area.

3.3.2 Degree

Not surprisingly, participants performed better in 30° compared to 15° according to the results in Accuracy and Time. In the 15° setting, targets are placed closer together, so there was a subtler angular change of pointing cues between targets. Not only would participants be prone to more misjudgments, but they might also need more time to discern the pointed target.

3.3.3 Error Distribution

An analysis of the heat maps in Figure 3.6 suggested that although the Hand and HH cues improved the overall accuracy to some extent, participants made more misjudgments in the left and right parts of the circle of targets, especially the left behind area (targets 3, 4, 5). This is aligned with their comments.
Five participants indicated difficulty in recognizing the pointing direction when the IVA was pointing to the left or right, and two attributed it to the closer distance between targets in left/right areas from the observer’s perspective compared to front/back areas. One participant mentioned that “I have more confidence in the right part (target 15-21) compared to the left part”. We speculate that the dominant hand might account for it, based on the fact that the participant is right-handed. The comparison of users’ performance in different areas of the targets and how to improve users’ accuracy in particular areas are research questions to explore in future work.

3.4 Design Implications

The findings summarized below could help the design of pointing in the virtual environment. When designing interactions for fine pointing (15°), the Hand cue is a highly accurate (78.2%; only 4.4% lower than HH cues) and the fastest indicator (0.38 seconds faster than HH), at the cost of some naturalness. In the scenario of coarse pointing (30°), the Head cue appears to be sufficient, with good accuracy (90.2%) and fast time (3.90 seconds), although participants perceived it with the least confidence.

3.5 Limitations and Future Work

There are several limitations to the experiment presented in this chapter. We disabled the eye gaze cue in our experiment due to the dominant effect of the head gaze cue and the generalization to backward-pointing. In future work, we plan to evaluate the effect of pointing cues with the eye gaze cue included, to provide guidelines on cue selection for forward-pointing. We placed targets horizontally and evaluated participants’ perception of the IVA’s pointing. It remains a question whether our findings would hold when vertical targets are included. Future work could investigate participants’ accuracy in interpreting the IVA’s pointing with objects placed both horizontally and vertically, to obtain a more solid finding on the effect of pointing cues. Another limitation is that we used degree with two levels (15° and 30°) as the difficulty level. These two levels may not be sufficient to represent a general fine pointing or coarse pointing task. Future studies should incorporate more levels to enhance the validity as well as the generalization of the results.

We used a spherical FTVR display as the virtual environment to evaluate the perception of pointing. Prior studies indicated that eye gaze direction was perceived better in a spherical display than in its planar counterpart [40, 66]. As we disabled the eye gaze in our experiment, it is an open question whether the better performance of the spherical FTVR display would still hold. In addition, a real-world condition with human pointing could be included as a baseline to give a more general discussion of how pointing cues affect users’ perception of pointing in different environments. Our results suggest that participants made more misjudgments in the left/right area compared to the front/back area in a circle of targets. In future work, we would like to identify the effect of target positions on users’ perception and propose ways to improve users’ accuracy in the areas where they perform poorly.

3.6 Summary

In this chapter, we reported an experiment evaluating the effect of head and hand cues on users’ perception of an IVA’s pointing direction. We measured their performance in interpreting the IVA’s pointing from a spherical FTVR display to real-world targets under pointing scenarios with two difficulty levels. With the collected accuracy and time data as well as subjective responses, we analyzed the performance and suggested design implications.
The findings not only help our design of an IVA with high pointing accuracy in the following chapter, but also provide guidelines for selecting cues when designing pointing in virtual environments.

Chapter 4
Comparison of Pointing Accuracy Between IVA and Human Pointing

4.1 Introduction

Designing an intelligent virtual agent that can point to locations in the real world as accurately as a real person is a challenge. In the previous chapter, we evaluated the effect of head and hand cues on users’ perception of pointing and summarized the findings that can serve as a guideline to help determine IVAs’ pointing cues. However, enabling an IVA to point as accurately as a real person involves more design decisions.

In this chapter, we propose an IVA design with rationales given in Section 4.2 and present an experiment using our designed IVA to investigate the difference in pointing accuracy between the IVA and natural human pointing. Prior research shows that the distance between users and targets can affect users’ interpretation of the pointing direction [7, 21, 83]. We therefore included distance as an independent variable to investigate how the accuracy changes at different distances. Through an analysis of the collected data and a discussion of the results, we demonstrated that our designed IVA can point to the real world more accurately than a real person. Specifically, the IVA outperformed the real person in the vertical dimension and yielded the same level of accuracy horizontally. We also discuss design factors that might contribute to the result and suggest design implications.
We believe our findings provide design guidelines for visual representations of IVAs with pointing gestures.

4.2 Design Rationale for IVA

This section provides the design rationale regarding the key aspects of our IVA design.

4.2.1 IVA Appearance

Figure 4.1: The Intelligent Virtual Agent (IVA) is situated in a spherical Fish Tank Virtual Reality (FTVR) display that provides 3D depth cues and enables the IVA to point from the virtual world to the real world.

As suggested in Section 2.1, the state of the art in photo-realistic representations for IVAs is subject to the Uncanny Valley [61]: a higher degree of realism does not necessarily lead to positive evaluations. Considering this effect, Schneider et al. [70] suggest using a non-human appearance with the ability to behave like a human. Following this suggestion, we chose a Japanese female cartoon character as our IVA to avoid the negative feelings caused by a human-like appearance while supporting human-like behaviors. Our IVA’s appearance is designed with large eyes and a small nose (Figure 4.1) to provide the characteristics of the baby schema [54], which could induce a pleasurable feeling as people perceive it as cute [41].

4.2.2 Pointing Gestures

(a) Eye-fingertip alignment (b) Arm vector pointing
Figure 4.2: (Left) Eye-fingertip alignment commonly found in human pointing: humans commonly align the dominant eye with the fingertip when pointing at a target. (Right) Arm vector pointing: arm vector, eyes and head all point at a target directly without coordination.

Our goal is to design an IVA so that users’ perception of the IVA’s pointing is as accurate as or more accurate than the perception of a real person’s natural pointing. Thus, our design is informed by how humans naturally point and how users perceive another person’s pointing.
As we discussed in Section 2.2.1, humans commonly point to where they are looking by aligning their fingertip with the gaze of their dominant eye [10, 42] (Figure 4.2 Left). However, when it comes to the perception of another person’s pointing, the perceived accuracy is not affected by the occlusion of the eyes [24] or head orientation [10]. This suggests that observers mainly rely on a person’s arm vector to detect their pointing direction. This might introduce ambiguity because the location indicated by the arm vector differs from the actual target location indicated by the eye-fingertip line (Figure 4.2). Bangerter & Oppenheimer [10] found that observers exhibited a perceptual bias towards the upper side of the target as well as the pointer’s side, potentially because of this ambiguity. Therefore, rather than designing the IVA to point the way humans commonly do (eye-fingertip alignment), we remove the eye-fingertip alignment from the IVA’s pointing gestures; that is, the arm vector points directly at targets (Figure 4.2 Right). We expect that this approach will eliminate the perceptual errors of eye-fingertip alignment and result in a perceptually accurate IVA pointing gesture.

For pointing cues, previous research has found that the orientations of the pointer’s eyes, head and hand are used as visual cues by an observer to interpret a pointing gesture [42, 47]. Findings from the user study presented in the previous chapter show that the hand cue alone provides accurate pointing perception but is less natural than head+hand cues. Thus, in this experiment, we included all the pointing cues, i.e., eyes, head and hand cues, to promote accurate and natural perception.

In summary, we designed our IVA to point with an outstretched arm, with eyes and head facing the target, without eye-fingertip alignment (Figure 4.2 Right). In other words, each of the three parts is directed at the target independently, without coordination.
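The ambiguity described above can be illustrated with a small geometric sketch. It is a simplified 2D side view, not the thesis’s model: the eye, shoulder and target coordinates are hypothetical values chosen for illustration, and only the 0.68 m arm length comes from the real person’s arm length reported in the experimental setup. The sketch places the fingertip either on the eye-target line (eye-fingertip alignment) or on the shoulder-target line (arm vector pointing) and then checks where the shoulder-fingertip line, which observers are assumed to follow, meets the target plane.

```python
import math

def fingertip_on_ray(origin, target, shoulder, arm_length):
    """Return the point on the ray origin->target that is arm_length from the shoulder."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    # Solve |origin + t*u - shoulder| = arm_length for t (forward root of the quadratic).
    ox, oy = origin[0] - shoulder[0], origin[1] - shoulder[1]
    b = ox * ux + oy * uy
    c = ox * ox + oy * oy - arm_length ** 2
    disc = b * b - c
    if disc < 0:
        raise ValueError("arm cannot reach the requested ray")
    t = -b + math.sqrt(disc)
    return (origin[0] + t * ux, origin[1] + t * uy)

def arm_ray_height_at(shoulder, fingertip, wall_x):
    """Height at which the shoulder->fingertip line meets the vertical plane x = wall_x."""
    slope = (fingertip[1] - shoulder[1]) / (fingertip[0] - shoulder[0])
    return shoulder[1] + slope * (wall_x - shoulder[0])

# Hypothetical side-view layout in metres: x is forward, y is up.
eye = (0.0, 1.60)
shoulder = (0.0, 1.40)
target = (2.0, 1.20)
ARM_LENGTH = 0.68  # real person's arm length from the experimental setup

f_eye = fingertip_on_ray(eye, target, shoulder, ARM_LENGTH)       # eye-fingertip alignment
f_arm = fingertip_on_ray(shoulder, target, shoulder, ARM_LENGTH)  # arm vector pointing

offset_eye = arm_ray_height_at(shoulder, f_eye, target[0]) - target[1]
offset_arm = arm_ray_height_at(shoulder, f_arm, target[0]) - target[1]
print(f"eye-fingertip alignment: arm line is offset {offset_eye:+.2f} m at the target plane")
print(f"arm vector pointing:     arm line is offset {offset_arm:+.2f} m at the target plane")
```

In this layout the extended arm line lands above the target under eye-fingertip alignment and on the target under arm vector pointing, which matches the direction of the vertical bias reported by Bangerter & Oppenheimer [10]. The size of the offset depends entirely on the assumed coordinates.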
As a baseline, an independent real person (RP) was hired to be the pointer. The dominant hand and eye of our RP are both on the right side. RP was instructed to point to the target as accurately as possible with the head, eyes and outstretched arm. Beyond that, RP was asked to point naturally, without further instructions on the specific manner of pointing, to capture their natural human pointing as the baseline. Both RP and IVA used the left arm to point to targets in the left region and the right arm for targets in the center or right region.

4.3 Experiment

The goal of our experiment is to investigate the difference between our designed IVA’s pointing and a real person’s natural pointing, measured by the accuracy of a human observer’s interpretation of their pointing to a physical location. In doing so, we establish the capability of IVAs to point to the real world when interacting with humans.

4.3.1 IVA Scale

While life-scale displays exist in previous prototypes [47], in practice IVA devices such as home assistant systems [1, 16, 27] are relatively small and unlikely to support life-size rendering. Therefore, the different scales of the real person (RP) and the IVA make it challenging to make a fair comparison of pointing perception. In deciding whether to keep the retinal image size of RP and IVA the same, there is a trade-off between keeping and changing the observation distance between the observer and the pointer (RP or IVA). Previous studies show contradictory findings. On the one hand, Chen’s study shows that visual reasoning task performance did not vary as long as the retinal image was unchanged, which they demonstrated through an experiment with a larger display placed farther away than a smaller display [19]. On the other hand, based on size constancy [72], participants might still perceive RP’s arm as longer even if the retinal sizes of RP and IVA are made the same by placing RP farther away.
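The idea of matching retinal size by changing distance follows from the visual angle relation: an object of size s viewed frontally at distance d subtends 2·atan(s / 2d), so two objects subtend the same angle when s/d is equal. The sketch below applies this to the arm lengths (30.5 cm and 68 cm) and the 56 cm offset reported later in the experimental setup. It treats each arm as a frontal segment, which is a simplifying assumption. The thesis’s own calculation is in Appendix B.1 and is not reproduced here, so the viewing distance this sketch derives is only what the simplified model implies, not a reported value. The function names are illustrative.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by a frontal segment of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def matched_distance(ref_size, ref_distance, other_size):
    """Distance at which other_size subtends the same visual angle as the reference."""
    return ref_distance * other_size / ref_size

def reference_distance_for_offset(ref_size, other_size, offset):
    """Reference distance for which the matched distance is exactly `offset` farther."""
    return offset * ref_size / (other_size - ref_size)

IVA_ARM_CM = 30.5   # from the experimental setup
RP_ARM_CM = 68.0    # from the experimental setup
OFFSET_CM = 56.0    # extra distance of the real person in SameRet

# Implied IVA viewing distance under the frontal-segment assumption only.
d_iva = reference_distance_for_offset(IVA_ARM_CM, RP_ARM_CM, OFFSET_CM)
d_rp = matched_distance(IVA_ARM_CM, d_iva, RP_ARM_CM)

print(f"model-implied IVA distance: {d_iva:.1f} cm, RP distance: {d_rp:.1f} cm")
print(f"visual angles: {visual_angle_deg(IVA_ARM_CM, d_iva):.2f} deg vs "
      f"{visual_angle_deg(RP_ARM_CM, d_rp):.2f} deg")
```

Equal visual angle does not guarantee equal perceived size, which is the size constancy concern raised above.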
To eliminate the effect of the size difference, we include both designs (same retinal size and same observation distance) in our experiment, as we discuss in detail in the following sections.

4.3.2 Participants

Thirty-six participants (19 females and 17 males) aged between 21 and 30 were recruited from a local university to participate in the study, with compensation of a $10 gift card. All had normal or corrected-to-normal vision.

4.3.3 Design

We followed a 2 × 2 × 2 mixed design with one between-subjects variable:

• C1 The viewing condition, which could be Same Retinal Size (SameRet) or Same Distance (SameDis). In SameRet, the human pointer was placed 56 cm farther from the participant to keep the same retinal size of the pointer's arm. In SameDis, the observation distance was the same, without adjustment.

and two within-subjects variables:

• C2 The pointer, which could be Intelligent Virtual Agent (IVA) or Real Person (RP).
• C3 The distance, which could be near or far. The distance between the participant and the target area is 70 cm in near and 210 cm in far.

The 36 participants were randomly and equally divided into two groups. One group went through SameRet and the other through SameDis. The sequence of C2 pointer and C3 distance was fully counterbalanced.

We measured the Distance Error for a general evaluation of accuracy. Prior research [10, 24] suggested that a bias in the horizontal and vertical directions exists in people's perception of another person's pointing. We therefore included Error Bias as a metric and further calculated the absolute error and error bias in the horizontal and vertical dimensions. We collected subjective data through a post-study interview.
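To make the quantitative metrics concrete before their formal definitions in the list that follows, here is a minimal sketch of how they could be computed for a single trial. It assumes a coordinate system of our own choosing (x to the right, y upward, both in cm); the names `PointingErrors` and `pointing_errors` are hypothetical and are not taken from the study software.

```python
import math
from typing import NamedTuple, Tuple

class PointingErrors(NamedTuple):
    distance_error: float    # Euclidean distance between target and click
    horizontal_error: float  # absolute difference along x
    vertical_error: float    # absolute difference along y
    horizontal_bias: float   # signed: positive = click right of the target
    vertical_bias: float     # signed: positive = click above the target

def pointing_errors(target: Tuple[float, float],
                    perceived: Tuple[float, float]) -> PointingErrors:
    """Compute absolute errors and signed biases for one trial (units: cm)."""
    h_bias = perceived[0] - target[0]  # perceived minus actual, horizontal
    v_bias = perceived[1] - target[1]  # perceived minus actual, vertical
    return PointingErrors(
        distance_error=math.hypot(h_bias, v_bias),
        horizontal_error=abs(h_bias),
        vertical_error=abs(v_bias),
        horizontal_bias=h_bias,
        vertical_bias=v_bias,
    )

if __name__ == "__main__":
    # Hypothetical trial: target at (40, 40) cm, participant clicked at (43, 44) cm.
    print(pointing_errors((40.0, 40.0), (43.0, 44.0)))
```

Keeping the signed biases separate from the absolute errors matters because over- and under-estimates cancel when biases are averaged but not when absolute errors are averaged.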
We defined the quantitative metrics as follows:

• Absolute Error
  – Distance Error, defined as the Euclidean distance between the correct target location and the participant's perceived location.
  – Horizontal & Vertical Error, the horizontal and vertical components of Distance Error, defined as the absolute difference between the correct target location and the participant's perceived location along the corresponding axis.
• Error Bias
  – Horizontal & Vertical Error Bias, computed by subtracting the actual position from the perceived location. Hence, a positive value indicates an estimate to the right of or above the true location, respectively.

4.3.4 Experimental Setup

Figure 4.3 shows the top view of the experimental setup, with the participant and the IVA/RP in the two viewing conditions (SameRet and SameDis). The arm lengths of the IVA and RP are 30.5 cm and 68 cm, respectively. The visual angle subtended by their arms from the participant's perspective would differ if the IVA and RP were at the same distance from the participant. In SameRet, to ensure the same retinal size for IVA and RP, we calculated the required adjustment and moved the RP 56 cm farther away from the participant so that their arms subtend the same visual angle. Please refer to Appendix B.1 for details of the calculation. In SameDis, the observation distance was the same, without adjustment.

Figure 4.3: Top view of the experimental setup. (a, top) Same Retinal Size (SameRet): the real person (RP) is 56 cm farther away from the participant to keep the same retinal image size. (b, bottom) Same Distance (SameDis): the observation distance between the participant and the pointer is the same for IVA and RP.

The rest of the setup was the same in SameRet and SameDis. The participant was seated by the right side of the pointer.
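The idea behind the SameRet adjustment can be sketched with the standard visual-angle formula, θ = 2·arctan(size / (2·distance)). The sketch below is a simplified illustration, not the calculation in Appendix B.1: it assumes the arm is viewed frontally as a segment of the stated length, and the IVA viewing distance it prints is back-calculated from the 56 cm offset under that assumption rather than a value reported in this chapter. All function names are hypothetical.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a frontal segment of given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def matched_distance(size_ref: float, dist_ref: float, size_other: float) -> float:
    """Distance at which size_other subtends the same angle as size_ref at dist_ref."""
    # Equal angles require equal size/distance ratios.
    return dist_ref * size_other / size_ref

def reference_distance_for_offset(offset: float, size_ref: float, size_other: float) -> float:
    """Reference distance for which the matched distance is `offset` farther away."""
    return offset / (size_other / size_ref - 1.0)

if __name__ == "__main__":
    IVA_ARM, RP_ARM = 30.5, 68.0  # cm, arm lengths given in the text
    OFFSET = 56.0                 # cm, extra RP distance in SameRet

    # Back-calculated under the simplified frontal-view assumption.
    iva_dist = reference_distance_for_offset(OFFSET, IVA_ARM, RP_ARM)
    rp_dist = matched_distance(IVA_ARM, iva_dist, RP_ARM)
    print("IVA distance (cm):", iva_dist)
    print("RP distance (cm):", rp_dist)
    print("IVA arm angle (deg):", visual_angle_deg(IVA_ARM, iva_dist))
    print("RP arm angle (deg):", visual_angle_deg(RP_ARM, rp_dist))
```

Because equal visual angles only require equal size-to-distance ratios, the required offset scales linearly with the IVA viewing distance in this model.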
The distance between the participant and the target area is 70 cm in near and 210 cm in far. The comprehension of pointing can be classified as proximal or distal, depending on whether the pointer can touch the referents (targets) or not [69]. Although our IVA is situated in a 3D display and cannot touch physical objects, we chose 70 cm, a fairly close distance between the pointer and target, to approximate proximal pointing and represent the near distance condition. We chose 210 cm as the far condition due to the limitation of the experimental room, as suggested by a pilot study. The pilot study also suggested that the height of the participant matters. We therefore ensured that, from the side view, the center of the target area, the participant's shoulder and the pointer's shoulder were horizontally aligned. We also ensured that the shoulder levels of IVA and RP were matched.

We created an 80 cm × 80 cm square projected onto a fabric projector screen as the target area. Instead of having physical objects, the targets in our design are locations within the area, which allows us to measure participants' accuracy independent of physical objects' spatial resolution. We piloted a blank background with a cross cursor for selection. However, participants in the pilot study commented that the task was too difficult due to the lack of reference in a blank background. Thus we provided a relatively dense 40 × 40 line grid as the background, as shown in the accompanying figure.

4.3.5 Apparatus

As shown in Figure 4.1, the apparatus of our spherical FTVR display is the same as described in Chapter 3. In addition, we used an Optoma ML750ST projector with 1024 × 768 pixel resolution to display the 80 cm × 80 cm target area on a flat fabric projector screen.
The grid content and target indicator were created with Unity3D.

Figure 4.4: (a, top) IVA pointing: experimental setup for the near distance condition with the IVA as the pointer (participants wear tracked shutter glasses to receive a perspective-corrected, stereo rendering on the spherical display). The IVA pointed with the arm, eyes and head all facing the target directly. (b, bottom) Human pointing: experimental setup of the Same Retinal Size (SameRet) condition at near distance with a real person (RP) as the pointer. The RP naturally aligned his right dominant eye with the fingertip to point to a location (Figure 4.2) and is farther away from the participant to keep the same retinal size as the IVA. Green lines are added for illustrative purposes to show the pointing gesture and targets.

4.3.6 Procedure

Participants started by filling out a consent form, followed by verbal explanations of the experiment (Appendix C). As illustrated in Figure 4.4, participants were asked to sit on an adjustable chair to ensure the horizontal alignment of their shoulder and the pointer's shoulder, as described in Section 4.3.4. The procedure was the same for both IVA and RP.

Each participant was provided with a mouse and a clipboard to hold it. They were instructed to click where they believed the IVA or RP was pointing and were asked to prioritize accuracy over speed. There were 4 conditions in total (IVA vs RP, near vs far), each containing 20 trials at different locations, with the first 5 marked as training. Participants were told the true location in the training trials for practice. The 5 locations in the training trials were left middle, right middle, top middle, bottom middle and center, to illustrate the entire region.
In the formal trials, the target locations were randomly generated and could be anywhere inside the target area. To prevent previous targets from serving as a reference, participants were instructed to close their eyes between trials.

To point, the IVA was controlled by a keyboard to point to random locations inside the target area, whereas the RP produced the pointing gesture using a visible random target while the participant had their eyes closed. The dominant hand and eye of our RP are both on the right side. Both IVA and RP used the left arm to point to targets located in the left region and the right arm for targets in the center or right region. When the gesture was ready, the reference target for the RP disappeared and participants were asked to open their eyes to perform the task. The IVA and RP held the gesture until the participant had finished the click and said "okay." Other than this, participants and the RP were instructed not to communicate. Once all conditions were completed, a follow-up interview was carried out with each participant to discuss the comparison of pointers and their preference of pointing cues. The study took approximately 30-40 min to complete.

4.3.7 Results

For ease of comparison, we normalized the absolute error so that the minimum possible error is 0 and the maximum is 1. We also normalized the error bias so that the minimum possible bias is -1 and the maximum is 1. As described in Section 4.3.4, the target area is 80 cm × 80 cm. The minimum and maximum possible values for the distance error, horizontal/vertical error, and error bias are (0, 80√2) cm, (0, 80) cm, and (-80, 80) cm, respectively.

We were aware that the IVA conditions for SameRet and SameDis are the same. There has been debate regarding which method to use to analyze this kind of data. Previous work [25, 38] suggests using a mixed ANOVA with C1 viewing condition as the between-subjects factor and C2 pointer as the within-subjects factor.
This is because our research question was whether the mean accuracy change between IVA and RP differed between SameRet and SameDis. Another option, ANCOVA, answers a different research question: whether the RP means, adjusted for IVA means, differed between SameDis and SameRet. ANCOVA has been suggested to be appropriate when the research question is not about gains or changes. We therefore used a mixed ANOVA for our data analysis.

Absolute Error

With all assumptions met, we ran a 2 (C1 viewing condition) by 2 (C2 pointer) by 2 (C3 distance) mixed-model ANOVA on the Distance Error, Horizontal Error and Vertical Error, respectively.

Distance Error: Results suggest that C2 pointer had a significant main effect on the distance error (F(1, 34) = 21.076, p < .001), with smaller error in IVA (Mean (M) = 0.153, Standard Error (SE) = 0.004) than in RP (M = 0.183, SE = 0.005). The main effect of C3 distance was also significant (F(1, 34) = 125.621, p < .001), with a higher level of accuracy for near (M = 0.147, SE = 0.004) compared to far (M = 0.189, SE = 0.005). No significant main effect was found for C1 viewing condition (SameRet vs SameDis) (F(1, 34) = 0.443, p > .5). There was also no interaction among C1, C2 and C3. The mean errors with 95% CIs for the two main effects are displayed in Figure 4.5.

Figure 4.5: Distance Error with means and 95% confidence intervals (CIs): (Left) Participants showed lower distance error in IVA (17.31 cm) than RP (20.70 cm). (Right) Distance errors were larger when participants were far (21.38 cm) compared to near (16.63 cm) from the target. Significance values are reported in brackets for p < .05 (*), p < .01 (**), and p < .001 (***).

Horizontal Error: The ANOVA suggested that the horizontal error in C3 near (M = 0.125, SE = 0.004) was significantly lower (F(1, 34) = 69.164, p < .001) than in far (M = 0.170, SE = 0.007). No significant main effect of C2 pointer (F(1, 34) = 0.435, p > .5) was found (IVA: M = 0.145, SE = 0.006; RP: M = 0.150, SE = 0.006).
There was also no significant difference between C1 SameRet and SameDis (F(1, 34) = 0.444, p > .5). No interaction effects were found among C1, C2 and C3.

Vertical Error: For the absolute vertical error, there was a significant main effect of C2 pointer (F(1, 34) = 29.416, p < .001), with smaller error in IVA (M = 0.128, SE = 0.005) than RP (M = 0.179, SE = 0.007). Error for C3 near (M = 0.139, SE = 0.006) was significantly lower (F(1, 34) = 31.744, p < .001) than for far (M = 0.169, SE = 0.007). There was no significant main effect of C1 viewing condition (F(1, 34) = 2.661, p > .1). An interaction effect was observed between C1 viewing condition and C2 pointer (F(1, 34) = 5.054, p < .05). Means and 95% CIs for the interaction effect are shown in Figure 4.6. A simple effects analysis revealed that in SameRet, vertical error in RP (M = 0.161, SE = 0.010) was significantly larger (p < .05) than in IVA (M = 0.131, SE = 0.006). The same holds (p < .001) for SameDis (IVA: M = 0.125, SE = 0.007; RP: M = 0.198, SE = 0.009). Vertical error was significantly lower (p < .05) in SameRet (M = 0.161, SE = 0.010) than in SameDis (M = 0.198, SE = 0.009) for RP's pointing, but not (p > .5) for IVA's.

Error Bias

The average error bias of each of the 36 participants is shown as a scatter plot in Figure 4.7. With all assumptions satisfied, a mixed-model ANOVA was conducted on the Horizontal and Vertical Error Bias, respectively.

Horizontal Error Bias: We found that C1 viewing condition, C2 pointer and C3 distance all showed no significant main effect on the horizontal error bias (F(1, 34) = 0.139, p > .5; F(1, 34) = 3.168, p > .05; F(1, 34) = 2.467, p > .1, respectively). No interaction effect was found either.

Vertical Error Bias: In contrast, vertical error bias was significantly different (F(1, 34) = 284.837, p < .001) in C2 RP (M = 0.168, SE = 0.008) than in IVA (M = 0.030, SE = 0.009), but not between C3 near and far (F(1, 34) = 1.660, p > .1).
We also found no significant difference between C1 SameDis and SameRet (F(1, 34) = 2.20, p > .1) and no interaction among C1, C2 and C3.

Figure 4.6: Vertical Error with means and 95% CIs: IVA yielded higher accuracy than RP in both SameRet and SameDis.

Interview Responses

We asked participants questions about the comparison of pointers and what cues they referred to in different conditions. We summarize the results in Table 4.1 and Table 4.2.

As shown in Table 4.1, for the question "which one is easier to judge the direction", IVA was chosen by the majority (55.6%), especially in SameRet (72.2%). With actual locations being told in the training session, 77.8% of participants (72.2% in SameRet and 83.3% in SameDis) found that RP had a larger deviation between the actual location and their expectation.

For the question "among hand, eyes and hand+eyes, what pointing cues did you mainly focus on to judge the pointing direction", we present the responses in Table 4.2. For IVA, 55.6% of participants selected hand as the main reference and 38.9% chose the combination of hand and eyes. As for RP, hand was reported as the main reference by most participants (61.1%), and 25% indicated that they depended on both hand and eyes.

Figure 4.7: Scatter plot of each participant's average error bias: data points above the zero horizontal line represent upward estimation; data points to the right of the zero vertical line show rightward estimation. Participants showed a systematic upward bias (Mean: 13.44 cm) in perceiving RP's pointing.

4.4 Discussion

In general, based on the results for distance error (Figure 4.5) in Section 4.3.7, we conclude that participants can interpret the IVA's pointing more accurately than the RP's pointing.
Specifically, based on the results in the horizontal and vertical dimensions, we summarize the following major findings:

• IVA yielded significantly higher accuracy vertically and achieved the same level of accuracy as RP horizontally.
• Participants showed a systematic upward bias regardless of the distance when judging RP's pointing direction.
• Errors were larger when participants were farther from the target area, both horizontally and vertically.
• The viewing condition (SameRet or SameDis) did not significantly affect the accuracy difference between IVA and RP.

Table 4.1: Interview Responses on the Comparison of IVA and RP

| Question | IVA: SameRet | IVA: SameDis | IVA: Total | RP: SameRet | RP: SameDis | RP: Total | IVA = RP: SameRet | IVA = RP: SameDis | IVA = RP: Total |
|---|---|---|---|---|---|---|---|---|---|
| Which one is easier to judge the direction? | 13 (72.2%) | 7 (38.9%) | 20 (55.6%) | 5 (27.8%) | 6 (33.3%) | 11 (30.6%) | 0 | 5 (27.8%) | 5 (13.9%) |
| Which one has larger deviation between real location and your expectation? | 4 (22.2%) | 3 (16.7%) | 7 (19.4%) | 13 (72.2%) | 15 (83.3%) | 28 (77.8%) | 1 (5.6%) | 0 | 1 (2.8%) |

4.4.1 Accuracy Difference Between IVA and RP

In this section, we discuss several design factors that may have contributed to the observed accuracy difference between IVA and RP.

Pointing Gestures

Most participants used the hand/arm cue to perceive the pointing direction, but RP's natural pointing followed the eye-fingertip alignment commonly found in human pointing. This might introduce ambiguity and result in the upward bias. Our design strategy for the IVA, not using eye-fingertip alignment, mitigated the ambiguity and therefore might help improve the IVA's accuracy to some extent. We discuss this point first for the vertical dimension, and then expand the discussion to the horizontal dimension and to the comparison of the horizontal and vertical dimensions.

Table 4.2: Interview responses of pointing cues in IVA and RP.

| Cue | IVA: SameRet | IVA: SameDis | IVA: Total | RP: SameRet | RP: SameDis | RP: Total |
|---|---|---|---|---|---|---|
| Hand | 9 (50%) | 11 (61.1%) | 20 (55.6%) | 10 (55.6%) | 12 (66.7%) | 22 (61.1%) |
| Eyes | 1 (5.6%) | 1 (5.6%) | 2 (5.6%) | 4 (22.2%) | 1 (5.6%) | 5 (13.9%) |
| Hand + Eyes | 8 (44.4%) | 6 (33.3%) | 14 (38.9%) | 4 (22.2%) | 5 (27.8%) | 9 (25%) |

Figure 4.8: (Left) Side view (vertical dimension) of human pointing: the arm-line location (red dot) is always higher than the actual target location (green dot) indicated by the eye-fingertip line. (Right) Top view (horizontal dimension) of human pointing: left-arm and right-arm pointing both result in a deviation between the arm line and the eye-fingertip line.

Vertical Dimension: Our finding that participants showed an upward bias in perceiving pointing is consistent with Bangerter & Oppenheimer's [10] results: the ambiguity found in human pointing might account for the bias. To illustrate this ambiguity, Figure 4.8 shows the horizontal (top view) and vertical (side view) dimensions when a pointer (eye icon) is pointing to a target (green dot).

As discussed in Section 4.2.2, prior research [10, 42] has revealed that humans commonly place their fingertip on the line joining the dominant eye and the target (Figure 4.2 Left). The RP in our experiment was asked to point naturally without instructions. Not surprisingly, we observed that the arm line of the RP pointed to a position above the true target. After the experiment, we asked the RP how they pointed and found that the RP aligned their dominant eye with their fingertip to point to targets. Thus, we believe the actual target could be found by extending the eye-fingertip line. As shown in Figure 4.8 Left, due to the eye-shoulder distance, the eye-fingertip line (green line) is not aligned with the arm line (red line), so the arm-pointing location (red dot) is higher than the actual target location (green dot).
If participants depended more on the hand/arm line as their reference, there would be error and upward bias reflected in the results.

According to the subjective data for RP, during training, 28/36 participants noticed a large deviation in RP between the real location and their expectations, with 16 commenting that it was confusing to find that RP pointed considerably higher than their estimation. This upward bias was further demonstrated in the quantitative data, as reported in Section 4.3.7 and Figure 4.7. Though most participants reported the deviation, they did not appear to reason about it or calibrate their responses in the formal trials, because they still showed a significant upward bias. By contrast, in IVA's training trials, fewer (8/36) reported deviation and confusion. This indicates that the IVA's pointing, arm-vector pointing rather than eye-fingertip alignment, is likely to be more aligned with how the majority interpreted the pointing direction. In other words, they appear to judge the pointing direction by following the arm vector (as in IVA) rather than the eye-fingertip line (as in RP). The interview data further supported this: a majority of participants (22/36 (61.1%) in RP and 20/36 (55.6%) in IVA) reported that they mainly focused on the hand/arm cue as the reference to find the pointing location. The finding replicates previous studies [10, 24] and, as illustrated before, might be the main reason for the upward bias (please see Appendix B.2.1 for more information on the analysis of estimated vertical error).

In terms of the absolute error, one of the most important findings of our experiment is the significantly lower vertical error in IVA compared to RP, as shown in Figure 4.9. With the eye-fingertip alignment removed in IVA, the correct target location would be reached directly by following the arm vector, which is how a majority of participants perceived the pointing, as discussed before. This suggests that having the IVA use arm-vector pointing rather than eye-fingertip alignment may account for the improved accuracy.

Figure 4.9: Horizontal & Vertical Error with means and 95% CIs: IVA yielded significantly lower error (5.2% less) in the vertical dimension (IVA: 10.24 cm; RP: 14.32 cm) and the same level of accuracy as RP horizontally (IVA: 11.60 cm; RP: 12.00 cm).

More interestingly, participants seem to rely unconsciously on arm-vector pointing and to be unaware of the eye-fingertip alignment, because they were surprised by it when told after the experiment. They understood it, however, shortly after trying to point to a target themselves. Based on this observation, we could infer that most participants naturally point with the fingertip aligned with the dominant eye when pointing themselves. However, they are not cognizant of this instinctive behavior and rely more on the arm vector when interpreting another person's pointing direction.

Horizontal Dimension: We did not find a significant error difference between IVA and RP (Figure 4.9) in the horizontal dimension. As illustrated in Figure 4.8 Right, we found that the smaller eye-shoulder distance from the top view compared to the side view might result in a smaller error, which we discuss in detail with equations and figures in Appendix B.3.1. We also provide a general analysis of the estimated horizontal error in human pointing in Appendix B.2.2.

Participants also did not show a significant difference in the horizontal error bias between IVA and RP. However, seven participants indicated difficulty, and five attributed it to the oblique viewing angle, as they were seated by the right side of the pointer.
The viewing angle might have some effect on their judgment and can be an independent variable to explore in the future.

Horizontal vs Vertical Dimension: As shown in Figure 4.9, in RP, participants were more accurate horizontally than vertically, which is consistent with previous findings [20, 22] that users have better horizontal than vertical visual acuity for perceiving gaze directions. We speculate that the smaller eye-shoulder distance in the top view compared to the side view might also account for this, which we discuss in detail in Appendix B.3.2. Another potential explanation is that the arm switch of the pointer in the horizontal dimension makes the task easier. The left/right arm implies the left/right region, which might reduce the difficulty to some extent.

Contrary to the result in RP, the horizontal error was slightly larger than the vertical error in IVA. The previous finding on human visual acuity cited above [20, 22] does not hold here, which might be because there was a gaze perception difference between IVA and RP related to the IVA's non-human appearance. The oblique viewing angle discussed before may also have some effect.

Another factor could be the switch of arms. We discussed that the arm switch might improve the accuracy by implying whether the target is in the left or right region. However, it might hinder participants' performance from the perspective of the arm pointing gestures. Most participants mainly refer to the hand/arm cue to find the pointed target, but the rotary centers of the left and right arms are different. Once the arm was changed from one to the other, participants had to mentally "rebuild" the arm vector with the new origin. In the vertical dimension, by contrast, the origins of the left and right arms are the same from the side view. Hence the change would be continuous, without the need for rebuilding vectors.
One participant confirmed this, commenting: "I mainly referred to the head cue for horizontal judgment, because the arm switch made it difficult to tell horizontally using the hand cue but it is effective for vertical direction". Two participants indicated the difficulty caused by the arm switch between trials. To explore these phenomena, further experiments are needed with controlled designs for these factors.

FTVR Display

Despite providing head tracking and depth cues, the visualization of the IVA on the FTVR display is not the same as reality. FTVR displays still have many technical and perceptual limitations, e.g., lower resolution and fewer depth cues than reality. These constraints could affect participants' accuracy. One participant pointed out that the IVA's eyes were like a 2D painting lacking stereo, making it hard to tell the gaze direction. We speculate that this might be because the cartoon appearance is not realistic and looks flat, even though a stereoscopic cue was provided throughout the study. Two participants indicated difficulty, saying that the IVA lacked depth information (e.g., shadows and background). However, the quantitative data of all these participants still suggest higher accuracy in IVA than in RP. This indicates that our display's constraints did not have a significant negative impact on participants' performance. The effect of display quality characteristics on the perception of pointing should be identified with further user studies.

Appearance

The different appearances of RP and IVA, such as gender, realism and eyes, may have some influence on participants' perception. Prior research on user preferences for agents' gender presents contradictory findings and trends, which may be due to user characteristics or context [67]. Regarding realism, RP was reported by four participants to be more familiar and common. Two perceived the IVA's bigger eyes as helpful for judging the direction.
In contrast, RP's eye gaze cue was reported to be subtle by three participants, with one indicating it was even harder to discern the change in the horizontal direction. Moreover, two participants said that they tried to avoid eye contact with the RP, while there was no such concern with the IVA. In addition, previous research showed that users exposed to images of cute animals were more physically tender in their motor behavior and performed better on a task that demanded extreme carefulness [71]. The cuteness of the IVA might have had some effect on participants' performance. Future studies could determine the extent to which each aspect contributes to pointing perception.

4.4.2 Distance

Not surprisingly, participants perceived pointing more accurately when targets were closer than when they were farther, in both SameRet and SameDis. With the same target area, a farther distance results in a subtler angular change for all the pointing cues (head, eyes and hand). Three participants also commented that it was hard to extend the arm line to locate the target when farther away. However, despite the higher level of difficulty at farther distances, our IVA can still point more accurately than the real person, indicating the effectiveness of our IVA design in pointing gestures and appearance. It also suggests that users are able to know where an IVA is pointing within a range of distances. Practically, for example, this implies that should an IVA be used as a home assistant or a virtual tutor, it can be situated in a single location and still be able to point to near and far objects while saying, "It's over there," to provide deictic indications to the user.

4.4.3 Viewing Condition

We introduced the viewing condition as a between-subjects factor to investigate its potential effect on the difference between IVA and RP. The results show that changes in the retinal image size (SameRet or SameDis) did not affect the accuracy difference between IVA and RP.
In other words, our main finding that IVA can point as well as or better than RP holds in both SameRet and SameDis. Conducting experiments in different viewing conditions also helps to validate our results and study design. A secondary finding was that participants interpreted RP's pointing more accurately in SameRet than in SameDis vertically, as shown in Figure 4.6. This suggests that, compared to SameDis, participants seem to perform better when perceiving a smaller retinal size in SameRet. This is an interesting topic that can be studied in the future.

However, participants' comments about the effect of the size difference are quite divergent. In SameDis, where the retinal size is not adjusted, 5 out of 18 participants commented that IVA's pointing was easier. They attributed this to the smaller size of the IVA, which let them perceive a more noticeable change in eye, hand and head orientation. Conversely, four participants who found RP easier commented that RP's life-size was more natural, with two reporting that the more extended arm reduced the difficulty. Though we kept the same retinal size in SameRet, comments there are also divergent, likely due to size constancy as introduced in Section 4.3.1. Four emphasized the benefits of IVA's smaller size whereas two preferred the life-size of RP. While future studies can quantify individuals' sensitivity to this factor, we also note that, from a practical perspective, our study shows that there is unlikely to be a one-size-fits-all solution for optimizing the size and visual representation of an IVA. Thus, allowing users to tailor their IVA's appearance would be advisable, even though accurate pointing will work with a generic IVA that can point to the real world.

4.5 Design Implications

Our study shows that an IVA can be designed to point at real-world locations more accurately than a real person. We enumerate the main design implications from this study:

1. Removing eye-fingertip alignment could potentially mitigate the known ambiguity found in human pointing, which may help improve the pointing accuracy, especially in the vertical dimension.

As suggested in Section 4.4.1, in order to design an IVA that can accurately point to physical locations, our study suggests that removing the eye-fingertip alignment can mitigate the ambiguity shown in human pointing, which may help participants perceive the pointing direction more accurately in the vertical dimension. Future studies would be needed to precisely quantify this potential benefit as well as to provide a methodology for understanding human pointing better.

2. The eye-fingertip alignment should be considered when it comes to the detection and estimation of the human pointing direction.

The results show that participants mainly rely on the hand to judge another person's pointing direction, and they were mostly unaware of the eye-fingertip alignment. When they want to point at something, they naturally point with the fingertip aligned with their dominant eye. While we were looking for design implications for IVAs' pointing, this also indicates that for human body posture estimation the alignment should be taken into consideration, because this is how humans commonly point in daily life even though most are not aware of it. It would be interesting to re-run Bolt's "Put that there" experiments with this factor taken into account [13] to see how to make IVAs accurately interpret users' pointing gestures.

3. More cues are required for perceiving the pointing to objects farther away.

According to our results, when participants were farther from the target, the accuracy of pointing perception decreased significantly. Visual cues such as the orientation of the head, hand and eye gaze might not be sufficient to accurately indicate the target.
Additional verbal cues, such as a description of the location or features, should be considered to convey the pointing direction efficiently, which better resembles human pointing behaviour. For example, a combination of a verbal description, e.g., "it's on the table over there", with a pointing gesture can be implemented with the IVA. A future study could investigate natural communication mechanisms combining voice and deictic gestures.

4.6 Limitations and Future Work

Our experiment has several limitations. The technical constraints of our display apparatus made moving it infeasible, so the experimental environments for IVA and RP were not controlled to be the same, as shown in Figure 4.4. Though we confirmed with the RP that his pointing behavior followed the eye-fingertip alignment, it would be more helpful and convincing to use a motion capture system to track the posture and provide data about the accuracy. The RP in the experiment was instructed to point at the same location until the participant said "okay". Although we observed that the RP did not keep the pointing gesture at the same location for longer than 15 seconds, human errors such as hand tremor and jitter were inevitable and may have negatively impacted participants' accuracy, while IVAs do not suffer from these errors. We hired a single RP as the pointer and, without explicit instructions, the RP pointed naturally, in the same way as most humans point [10, 42]. In other words, we were comparing our designed IVA's pointing with natural human pointing, and the natural pointing followed the common human pointing mechanism, as expected. However, besides this common mechanism, human pointing does have less common variations depending on different pointing habits.
Future studies with multiple RPs would test the robustness of our results and investigate how users’ perception changes under such variations, providing upper and lower bounds on IVA/RP differences.
The design of an IVA involves many factors. Though the removal of the eye-fingertip alignment was suggested to potentially mitigate the error shown in the vertical interpretation of the RP’s pointing, its effect cannot be determined without controlled experiments. As suggested in Section 4.4.1, the appearance and the FTVR display are likely to have some effect on the estimation of the pointing direction as well. Future work will focus on controlled experiments for each of the design factors to demonstrate their effects and the degree of individuals’ sensitivity to the cues that we observed. For example, to precisely quantify the effect of the eye-fingertip alignment, we can have an IVA point with eye-fingertip alignment and compare it with the current design. Through studying different pointing configurations, we can create a set of configurable IVA characters that individuals can personalize to optimize their interactions with the IVA.

4.7 Summary
In this chapter, we presented the design of a 3D visual IVA situated in a spherical FTVR display to point to the real world, with design factors deliberately decided. We conducted an experiment to investigate the difference in pointing accuracy between our designed IVA and natural human pointing, measured by participants’ accuracy of interpreting the pointing direction. Results demonstrated that, in general, the IVA can point more accurately than a real person. Specifically, the IVA outperformed the real person in the vertical dimension and yielded the same level of accuracy horizontally. Also, as expected, our human pointing baseline results replicate previous studies [10, 24], showing that participants have a perceptual bias when interpreting a real person’s pointing.
Specifically, a pronounced overestimation was found in the vertical dimension. With a discussion on participants’ accuracy as well as subjective responses, we illustrated our speculation that the bias might be attributed to the ambiguity that is related to the eye-fingertip alignment commonly found when humans point at objects (Figure 4.2). The removal of the alignment in our design of the IVA mitigated this known ambiguity, which may account for the IVA’s higher accuracy. Meanwhile, we indicated that other design factors might also have effects on the result, which we will explore in future work. We believe our study and IVA design provide guidelines on designing IVAs’ pointing gestures, and help pave the way for other research about users’ perception of pointing either in the virtual environment or in the real world.

Chapter 5
Conclusions
In this thesis, we aimed to enable an IVA to point to the real world as accurately as a real person. To meet this research goal, we deliberately made decisions on the IVA’s design factors. We first evaluated the effect of head and hand cues on users’ perception of pointing. Building on the findings from that experiment, we further determined other design factors and provided the design rationale. Using our designed IVA, we conducted an experiment to answer our main research question: How accurately can an IVA point to the real world compared to a real person? We investigated this question by measuring users’ accuracy of interpreting the pointing direction to a physical location. Results demonstrated the capability of an IVA to point to the real world more accurately than a real person. We believe the strategies we employed provide design guidelines for 3D visual representations of IVAs’ pointing gestures.
This chapter revisits and summarizes the contributions of this thesis. Limitations of our study are presented afterwards.
We conclude this chapter with suggestions of potential applications and promising future directions, along with final remarks.

5.1 Contribution Summary
• Demonstrated the capability of an IVA to point to the real world more accurately than a real person. Results from our study comparing the pointing accuracy of our designed IVA and a real person suggested that, in general, users can interpret the IVA’s pointing direction more accurately than a real person’s natural pointing. Specifically, the IVA outperformed the real person in the vertical dimension and yielded the same level of accuracy horizontally.
• Established design guidelines on selecting pointing cues in the virtual environment. We selected head and hand cues for the evaluation and disabled the eye gaze due to the head cue’s dominant effect in the overall gaze perception [73] and the generalization to backward-pointing [81]. Through an experiment evaluating the effect of head and hand cues on users’ perception of the IVA’s pointing, we suggested design guidelines applicable to pointing scenarios with different difficulty levels. Among the Head, Hand and Head+Hand (HH) cues, in fine pointing the Hand cue is a highly accurate and the fastest indicator, at the cost of some naturalness compared to the HH cue. In coarse pointing, the Head cue performs sufficiently well, though participants perceived it with the least confidence.
• Established design strategies for IVA pointing. We proposed a 3D visual representation of an IVA with design factors deliberately decided. Our designed IVA is capable of pointing accurately from a spherical FTVR display into the real world. To identify reasons for the IVA’s higher accuracy than the real person, we extensively discussed the potential design factors. We found that removing the eye-fingertip alignment likely improved the IVA’s accuracy in the vertical dimension.
• Confirmed the robustness of previous human pointing results.
Our results for the human pointing baseline were consistent with previous literature showing that users mainly focus on the hand to find the pointing direction and have a perceptual bias when interpreting another person’s pointing [10, 24]. In particular, we found that users had a systematic upward bias in the vertical dimension, very likely because of the ambiguity due to the eye-fingertip alignment commonly found in human pointing.

5.2 Limitations
Demographics of participants. The participants in our experiments were mostly drawn from a pool of engineering graduate students. They might have a higher exposure to VR or 3D displays and more experience in interacting with them, which may weaken the external validity of our results for the general public. Another limitation is the sample size. We referred to previous similar perception studies [10, 83] and ran an a priori power analysis to determine the sample size in our experiments. However, a larger number of participants would give more reliable results with greater precision and power.
Controlled laboratory experiments. We conducted controlled experiments in a lab environment. Participants were required to consciously interpret the pointers’ pointing direction as accurately as possible from a fixed position. However, in daily interactions, they may rarely be confronted with scenarios that require consciously estimating the pointed location, so the task might interfere with the automatic process of users’ perception of pointing.
Experimental design. Though we have put effort into designing our experiment and eliminating confounding variables, some limitations still exist. Some specific issues in our experimental design may hinder the generalization of our findings, for example, the placement of only horizontal targets as discussed in Chapter 3, the distance to the target area constrained by the lab space, and the comparison with a single real person’s natural pointing as discussed in Chapter 4.
There are also some systematic errors, such as the potential human pointing error (e.g., hand jitter) discussed in Chapter 4. These limitations may negatively affect the validity of our results.

5.3 Potential Applications
This thesis focuses on users’ perception of pointing, which is a foundational building block for pointing interactions with IVAs. The design strategies and findings can benefit potential applications that involve IVAs’ pointing interactions.
Though our experiments were conducted within the context of using an intelligent virtual agent, the design implications are also applicable to physically embodied agents (i.e., robots) [45] and avatars. Avatars are distinguished from agents by the element of control [33]. An agent is an autonomous entity driven by artificial intelligence [31], while an avatar is a virtual representation of a human completely controlled by the human [77]. We discuss the potential applications in these two categories.

5.3.1 Agents
Thanks to the advances in artificial intelligence, intelligent agents, including virtual agents and physical robots, are becoming part of our daily life [46]. Our findings and suggestions can serve as a guideline to help design agents that point with high accuracy, which would promote a more human-like interaction and benefit our life in many areas. For example, by introducing pointing gestures, smart home assistants (e.g., Amazon Alexa [1], Google Assistant [27], or Apple Siri [4]) can help users find their items by pointing to the location, complementing the voice interface. Virtual tutors or robots can teach students more efficiently and naturally by pointing to establish joint attention, as a real tutor usually does. On the other hand, we suggest taking into consideration how humans commonly point when designing agents to accurately interpret users’ pointing direction. In this manner, for example, medical robots would better understand the indicated
reference pointed to by patients during the interaction and thus deliver a more effective treatment.

5.3.2 Avatars
Typically, the behaviors of avatars reflect those executed by humans in real time [8]. However, our findings show that avoiding the eye-fingertip alignment, which is commonly found in human pointing behavior, is critical for observers to accurately interpret avatars’ pointing direction. We suggest modifying the reflection of human pointing behaviors to match the arm-vector direction (Figure 4.2 Right) in the design of avatars. Although this might not be human-like, most users are not aware of the alignment when perceiving others’ pointing, even though they naturally follow it when pointing themselves, as discussed in our study (Chapter 4). Practically, in telepresence, local users can better judge the pointing direction of remote users’ avatars, which facilitates effective remote collaboration. Avatars are also used in collaborative virtual environments (CVEs) that allow users to interact with one another in the virtual environment via their avatars. Providing effective pointing gestures would improve the richness of interactions in CVEs [83].

5.4 Future Work
5.4.1 Identification of Each Design Factor’s Effect
We proposed a 3D visual representation of an IVA and demonstrated the IVA’s capability to point to the real world more accurately than a real person. Although we suggest that removing the eye-fingertip alignment may account for the improved accuracy in the vertical dimension, we also discussed other design factors that might affect the IVA’s pointing accuracy. Our research paves the way for future controlled experiments to precisely quantify the effect of each design factor on users’ perception of the IVA’s pointing.

5.4.2 Comparison with Other Displays
Our studies were conducted using an IVA situated in a spherical FTVR display. It is an open question as to how the findings can be applied to other displays.
For 2D conventional flat displays, it has been shown that a flat display achieves worse performance in gaze perception than a spherical display [66]. Conventional flat displays like televisions lack motion parallax and stereoscopic cues, which are essential depth cues for pointing perception [47]. In addition, 2D flat displays exhibit a common phenomenon called the Mona Lisa effect: a character’s eyes seem to follow the user irrespective of the user’s position [59], which could negatively affect users’ perception of pointing.
For flat FTVR displays, gaze, depth and size perception have been shown to be better in a spherical FTVR display than in a flat counterpart [40, 87]. But pointing perception also includes the arm orientation, which has not been evaluated yet. Another related issue is the vergence-accommodation conflict (VAC): users’ eyes converge on virtual objects far away from the screen but focus on the screen surface, which can contribute to motion sickness and eyestrain [79]. Although participants in our experiments were required to sit on a fixed chair, we did not constrain their head motion. If they tried to see different sides of the IVA, they may naturally have moved their head along a curved path. With a spherical display, they could keep a relatively constant screen distance, whereas with a planar counterpart, users’ viewing distance to the screen surface would change while moving their head, which might result in a more pronounced conflict effect. Future studies are required to investigate these problems and evaluate IVAs’ pointing accuracy within these displays.
Likewise, for immersive systems like head-mounted displays (HMDs) or CAVE displays, it remains unanswered whether our design suggestions would help improve the pointing accuracy of IVAs in these displays. Compared to FTVR displays, the VAC is believed to be more severe in HMDs as FTVR displays render a smaller depth interval of the 3D space [50].
Comparing the performance of our spherical FTVR display with other 3D displays would also give us a better understanding of the unique features of our display.

5.4.3 Pointing Interactions with Verbal Cues Added
Gesture and language are highly integrated components in interpersonal conversation [11, 30, 57]. Our work provides a foundation for designing IVAs that can point accurately to the real world. However, during a conversation, people do not rely on pointing gestures exclusively because of the risk of misunderstanding [10]. Typically, they rely differently and flexibly on gestural or verbal means [9]. Thus, a future step will concentrate on the role of pointing gestures accompanied by verbal cues in establishing joint attention with the IVA. Furthermore, with users’ perception of pointing established, we would be able to apply deictic pointing to social interactions that involve pointing in research or industrial applications in future work.

5.5 Concluding Remarks
As the state of the art in the design of voice interfaces for home assistants and other digital assistants advances, an embodied IVA that can provide gesture cues is expected to draw more attention for human-like interaction. In this thesis, with design rationales elaborated, we proposed a 3D visual representation of an IVA and demonstrated the IVA’s capability to point to the real world more accurately than a real person. Our work shows how a 3D visual IVA within a spherical FTVR display can provide effective pointing gestures, which could be used in conjunction with a voice interface for natural communication bridging the virtual and the real world.

Bibliography
[1] Mark Alexa. Amazon Alexa: 2017 User Guide + 200 Ester Eggs. Independently published, 2017. (see pages: 1, 7, 8, 35, and 63)
[2] Elisabeth André, Thomas Rist, and Jochen Müller. Webpersona: a lifelike presentation agent for the world-wide web. Knowledge-Based Systems, 11(1):25–36, 1998. (see page: 10)
[3] Michael Argyle and Mark Cook.
Gaze and mutual gaze. 1976. (see page: 14)
[4] Jacob Aron. How innovative is apple’s new voice assistant, siri?, 2011. (see page: 63)
[5] Ferdinando Arzarello and Ornella Robutti. Approaching functions through motion experiments. Educational Studies in Mathematics, 57:305–308, 01 2004. (see page: 9)
[6] Robert K Atkinson. Optimizing learning from examples using animated pedagogical agents. Journal of Educational Psychology, 94(2):416, 2002. (see page: 10)
[7] Ignacio Avellino, Cédric Fleury, and Michel Beaudouin-Lafon. Accuracy of deictic gestures to support telepresence on wall-sized displays. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 2393–2396, 2015. (see pages: 17, 31)
[8] William Sims Bainbridge. Berkshire encyclopedia of human-computer interaction, volume 1. Berkshire Publishing Group LLC, 2004. (see page: 64)
[9] Adrian Bangerter. Using pointing and describing to achieve joint focus of attention in dialogue. Psychological Science, 15(6):415–419, 2004. (see pages: 9, 66)
[10] Adrian Bangerter and Daniel Oppenheimer. Accuracy in detecting referents of pointing gestures unaccompanied by language. Gesture, 6:85–102, 01 2006. (see pages: 4, 12, 14, 34, 36, 49, 50, 57, 59, 61, 62, and 66)
[11] Janet Beavin Bavelas and Nicole Chovil. Visible acts of meaning: An integrated message model of language in face-to-face dialogue. Journal of Language and social Psychology, 19(2):163–194, 2000. (see pages: 1, 66)
[12] Gary Bente, Sabine Rüggenberg, and Nicole C Krämer. Social presence and interpersonal trust in avatar-based, collaborative net-communications. In Proceedings of the Seventh Annual International Workshop on Presence, pages 54–61, 2004. (see page: 7)
[13] Richard A Bolt. “Put-that-there”: Voice and gesture at the graphics interface, volume 14. ACM, 1980. (see pages: 1, 6, 7, and 56)
[14] George Butterworth. Pointing is the royal road to language for babies. In Pointing, pages 17–42. Psychology Press, 2003.
(see page: 12)
[15] George Butterworth and Shoji Itakura. How the eyes, head and hand serve definite reference. British Journal of Developmental Psychology, 18(1):25–50, 2000. (see pages: 12, 14)
[16] Michael Calore. Facebook made you a smart-home device with a camera on it, Oct 2018. (see page: 35)
[17] Justine Cassell, Tom Stocky, Tim Bickmore, Yang Gao, Yukiko Nakano, Kimiko Ryokai, Dona Tversky, Catherine Vaucelle, and Hannes Vilhjálmsson. Mack: Media lab autonomous conversational kiosk. In Proc. of Imagina, volume 2, pages 12–15, 2002. (see page: 11)
[18] Justine Cassell, Hannes Högni Vilhjálmsson, and Timothy Bickmore. Beat: the behavior expression animation toolkit. In Life-Like Characters, pages 163–185. Springer, 2004. (see page: 10)
[19] Jian Chen, Haipeng Cai, Alexander P Auchus, and David H Laidlaw. Effects of stereo and screen size on the legibility of three-dimensional streamtube visualization. IEEE transactions on Visualization and Computer Graphics, 18(12):2130, 2012. (see page: 35)
[20] Milton Chen. Leveraging the asymmetric sensitivity of eye contact for videoconference. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 49–56. ACM, 2002. (see page: 52)
[21] Kelvin Cheng and Masahiro Takatsuka. Hand pointing accuracy for vision-based interactive systems. In IFIP Conference on Human-Computer Interaction, pages 13–16. Springer, 2009. (see pages: 12, 31)
[22] Marvin G. Cline. The perception of where a person is looking. The American Journal of Psychology, 80(1):41–50, 1967. (see page: 52)
[23] Hélène Cochet and Jacques Vauclair. Deictic gestures and symbolic gestures produced by adults in an experimental context: Hand shapes and hand preferences. Laterality, 19:278–301, 01 2014. (see page: 9)
[24] Sarah Cooney, Nuala Brady, and Ailbhe Mckinney. Pointing perception is precise. Cognition, 177, 04 2018. (see pages: 4, 14, 34, 36, 50, 59, and 61)
[25] Gerard Dallal.
The analysis of pre-test/post-test experiments, May 2012. (see page: 42)
[26] Doris M Dehn and Susanne Van Mulken. The impact of animated interface agents: a review of empirical research. International journal of human-computer studies, 52(1):1–22, 2000. (see page: 7)
[27] Devon Delfino. “what is a google home hub?”: Everything you need to know about the google smart device that can help you navigate daily life, Sep 2019. (see pages: 7, 35, and 63)
[28] Virginie Demeure, Radosław Niewiadomski, and Catherine Pelachaud. How is believability of a virtual agent related to warmth, competence, personification, and embodiment? Presence: teleoperators and virtual environments, 20(5):431–448, 2011. (see page: 8)
[29] Laurie Edwards. The role of gestures in mathematical discourse: Remembering and problem solving. In Proceedings of the 29th Conference of the International Group for the Psychology of Mathematics Education, volume 1, pages 135–138. University of Melbourne, Melbourne, 2005. (see page: 9)
[30] Randi A Engle. Not channels but composite signals: Speech, gesture, diagrams and object demonstrations are integrated in multimodal explanations. In Proceedings of the twentieth annual conference of the cognitive science society, pages 321–326, 1998. (see pages: 1, 66)
[31] Thomas Erickson. Designing agents as if people mattered. Software agents, pages 79–96, 1997. (see page: 63)
[32] Dylan Brodie Fafard, Qian Zhou, Chris Chamberlain, Georg Hagemann, Sidney Fels, and Ian Stavness. Design and implementation of a multi-person fish-tank virtual reality display. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, pages 1–9, 2018. (see pages: v, 20)
[33] Jesse Fox, Sun Joo Ahn, Joris H Janssen, Leo Yeykelis, Kathryn Y Segovia, and Jeremy N Bailenson. Avatars versus agents: a meta-analysis quantifying the effect of agency on social influence. Human–Computer Interaction, 30(5):401–432, 2015. (see page: 63)
[34] Caroline Gale and Andrew F Monk.
Where am I looking? the accuracy of video-mediated gaze awareness. Perception & psychophysics, 62(3):586–595, 2000. (see page: 14)
[35] Simson Garfinkel. Enter the dragon, Sep 1998. (see page: 6)
[36] James J Gibson and Anne D Pick. Perception of another person’s looking behavior. The American journal of psychology, 1963. (see page: 14)
[37] Bradley A Goodman. Repairing reference identification failures by relaxation. In Proceedings of the 23rd annual meeting on Association for Computational Linguistics, pages 204–217. Association for Computational Linguistics, 1985. (see page: 1)
[38] Karen Grace-Martin. Analyzing pre-post data with repeated measures or ancova, Jan 2013. (see page: 42)
[39] Georg Hagemann, Qian Zhou, Ian Stavness, Oky Dicky Ardiansyah Prima, and Sidney S. Fels. Here’s looking at you: A spherical ftvr display for realistic eye-contact. pages 357–362, 11 2018. (see pages: 15, 17)
[40] Georg Hagemann, Qian Zhou, Ian Stavness, and Sidney Fels. Investigating spherical fish tank virtual reality displays for establishing realistic eye-contact. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 950–951. IEEE, 2019. (see pages: 2, 18, 29, and 65)
[41] Yugo Hayashi and Moritz Marutschke. Designing Pedagogical Agents to Evoke Emotional States in Online Tutoring Investigating the Influence of Animated Characters, volume 9192, pages 372–383. 08 2015. (see page: 33)
[42] Denise Henriques and John Crawford. Role of eye, head, and shoulder geometry in the planning of accurate arm movements. Journal of neurophysiology, 87:1677–85, 05 2002. (see pages: 12, 13, 17, 34, 49, and 57)
[43] Sandra Herbert. Gesture types for functions. In MERGA 2012: The 35th Annual Conference of the Mathematics Education Research Group of Australasia, pages 322–329. MERGA, 2012. (see page: 9)
[44] Yu Higuchi. Mikumikudance, 2010. (see page: 19)
[45] Thomas Holz, Mauro Dragone, and Gregory MP O’Hare. Where robots and virtual agents meet.
International Journal of Social Robotics, 1(1):83–93, 2009. (see page: 63)
[46] Kangsoo Kim, Luke Boelling, Steffen Haesler, Jeremy Bailenson, Gerd Bruder, and Greg F Welch. Does a digital assistant need a body? the influence of visual embodiment and social behavior on the perception of intelligent virtual agents in ar. In 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 105–114. IEEE, 2018. (see pages: 1, 7, 8, 15, and 63)
[47] Kibum Kim, John Bolton, Audrey Girouard, Jeremy Cooperstock, and Roel Vertegaal. Telehuman: Effects of 3d perspective on gaze and pose estimation with a life-size cylindrical telepresence pod. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 2531–2540, New York, NY, USA, 2012. ACM. (see pages: 2, 15, 17, 18, 34, 35, and 65)
[48] Sotaro Kita. Pointing: Where language, culture, and cognition meet. Psychology Press, 2003. (see page: 1)
[49] Alfred Kobsa, Jurgen Allgayer, Carola Reddig, Norbert Reithinger, Dagmar Schmauks, Karin Harbusch, and Wolfgang Wahlster. Combining deictic gestures and natural language for referent identification. In Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics, 1986. (see pages: 1, 9)
[50] Gregory Kramida. Resolving the vergence-accommodation conflict in head-mounted displays. IEEE transactions on visualization and computer graphics, 22(7):1912–1931, 2015. (see page: 66)
[51] Robert M. Krauss, Yihsiu Chen, and Rebecca F. Gottesman. Lexical gestures and lexical access: a process model, page 261–283. Language Culture and Cognition. Cambridge University Press, 2000. (see page: 9)
[52] James C Lester, Jennifer L Voerman, Stuart G Towns, and Charles B Callaway. Deictic believability: Coordinated gesture, locomotion, and speech in lifelike pedagogical agents. Applied Artificial Intelligence, 13(4-5):383–414, 1999. (see page: 11)
[53] Zhi Li and Ray Jarvis.
Visual interpretation of natural pointing gestures in 3d space for human-robot interaction. In 2010 11th International Conference on Control Automation Robotics & Vision, pages 2513–2518. IEEE, 2010. (see page: 12)
[54] Konrad Lorenz. Die angeborenen formen möglicher erfahrung. Zeitschrift für Tierpsychologie, 5(2):235–409, 1943. (see page: 33)
[55] Sven Mayer, Valentin Schwind, Robin Schweigert, and Niels Henze. The effect of offset correction and cursor on mid-air pointing in real and virtual environments. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, New York, NY, USA, 2018. Association for Computing Machinery. (see page: 12)
[56] Tara McGowan. Abstract deictic gestures-in-interaction: a barometer of intersubjective knowledge development in small-group discussion. Working Papers in Educational Linguistics (WPEL), 25(2):4, 2010. (see page: 10)
[57] David McNeill. So you think gestures are nonverbal? Psychological review, 92(3):350, 1985. (see pages: 1, 66)
[58] David McNeill. Hand and mind: What gestures reveal about thought. University of Chicago press, 1992. (see page: 9)
[59] Hironori Mitake, Taro Ichii, Kazuya Tateishi, and Shoichi Hasegawa. Wide viewing angle fine planar image display without the mona lisa effect. (see page: 65)
[60] Thomas Moeslund, Moritz Störring, and Erik Granum. Vision-based user interface for interacting with a virtual environment. Proc. DANKOMB, 2000, 2000. (see page: 12)
[61] Masahiro Mori, Karl F MacDorman, and Norri Kageki. The uncanny valley [from the field]. IEEE Robotics & Automation Magazine, 19(2):98–100, 2012. (see pages: 7, 33)
[62] Sebastiaan FW Neggers and Harold Bekkering. Gaze anchoring to a pointing target is present during the entire pointing movement and is driven by a non-visual signal. Journal of Neurophysiology, 86(2):961–970, 2001. (see page: 15)
[63] Kai Nickel and Rainer Stiefelhagen. Pointing gesture recognition based on 3d-tracking of face, hands and head orientation.
In Proceedings of the 5th international conference on Multimodal interfaces, pages 140–146, 2003. (see page: 12)
[64] Tsukasa Noma, Liwei Zhao, and Norman I Badler. Design of a virtual human presenter. IEEE Computer Graphics and Applications, 20(4):79–85, 2000. (see page: 10)
[65] Oyewole Oyekoya, William Steptoe, and Anthony Steed. Sphereavatar: A situated display to represent a remote collaborator. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 2551–2560, 2012. (see pages: 15, 17)
[66] Ye Pan and Anthony Steed. Preserving gaze direction in teleconferencing using a camera array and a spherical display. In 2012 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), pages 1–4. IEEE, 2012. (see pages: 2, 15, 17, 18, 29, and 65)
[67] Jeunese Payne, Andrea Szymkowiak, Paul Robertson, and Graham Johnson. Gendering the machine: Preferred virtual assistant gender and realism in self-service. In International Workshop on Intelligent Virtual Agents, pages 106–115. Springer, 2013. (see page: 54)
[68] Dan Perkel. Share wars: Sharing, theft, and the everyday production of web 2.0 on deviantart. First Monday, 21(6), 2016. (see page: 19)
[69] Chris L Schmidt. Adult understanding of spontaneous attention-directing events: What does gesture contribute? Ecological Psychology, 11(2):139–174, 1999. (see pages: 12, 39)
[70] Edward Schneider, Yifan Wang, and Shanshan Yang. Exploring the uncanny valley with japanese video game characters. In DiGRA Conference, 2007. (see page: 33)
[71] Gary D Sherman, Jonathan Haidt, and James A Coan. Viewing cute images increases behavioral carefulness. Emotion, 9(2):282, 2009. (see page: 54)
[72] Alan Slater, Anne Mattock, and Elizabeth Brown. Size constancy at birth: Newborn infants’ responses to retinal and real size. Journal of experimental child psychology, 49(2):314–322, 1990. (see page: 35)
[73] Rainer Stiefelhagen and Jie Zhu.
Head orientation and gaze direction in meetings. In CHI ’02 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’02, pages 858–859, New York, NY, USA, 2002. ACM. (see pages: 2, 15, 18, and 61)
[74] J. Taylor and D. McCloskey. Pointing. Behavioural Brain Research, 29:1–5, 1988. (see page: 15)
[75] David Traum, William Swartout, Peter Khooshabeh, Stefan Kopp, Stefan Scherer, and Anton Leuski. Intelligent Virtual Agents: 16th International Conference, IVA 2016, Los Angeles, CA, USA, September 20–23, 2016, Proceedings, volume 10011. 01 2016. (see page: 6)
[76] Elkhas Vaysi and Leila Salehnejad. Spatial and temporal deixis in english and persian. International Journal of Humanities and Cultural Studies (IJHCS) ISSN 2356-5926, 3(1):1405–1414, 2016. (see page: 9)
[77] Astrid M Von der Puetten, Nicole C Krämer, Jonathan Gratch, and Sin-Hwa Kang. “it doesn’t matter what you are!” explaining social effects of agents and avatars. Computers in Human Behavior, 2010. (see page: 63)
[78] Andrew John Wagemakers, Dylan Brodie Fafard, and Ian Stavness. Interactive visual calibration of volumetric head-tracked 3d displays. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, pages 3943–3953, New York, NY, USA, 2017. ACM. (see pages: v, 20)
[79] Colin Ware. Information visualization: perception for design. Morgan Kaufmann, 2019. (see page: 65)
[80] Colin Ware, Kevin Arthur, and Kellogg S Booth. Fish tank virtual reality. In Proceedings of the INTERACT’93 and CHI’93 conference on Human factors in computing systems, pages 37–42, 1993. (see page: 15)
[81] Hugh R Wilson, Frances Wilkinson, Li-Ming Lin, and Maja Castillo. Perception of head orientation. Vision research, 40(5):459–472, 2000. (see pages: 2, 15, 18, and 61)
[82] Nelson Wong. Distant Pointing in Desktop Collaborative Virtual Environments. PhD thesis, University of Saskatchewan, 2013. (see pages: 1, 9)
[83] Nelson Wong and Carl Gutwin. Where are you pointing?
the accuracy of deictic pointing in cves. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1029–1038, 2010. (see pages: 14, 31, 62, and 64)
[84] Fan Wu, Qian Zhou, Kyoungwon Seo, Toshiro Kashiwaqi, and Sidney Fels. I got your point: An investigation of pointing cues in a spherical fish tank virtual reality display. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1237–1238. IEEE, 2019. (see page: v)
[85] Nick Yee, Jeremy N Bailenson, and Kathryn Rickertsen. A meta-analysis of the impact of the inclusion and realism of human-like faces on user experiences in interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 1–10, 2007. (see page: 7)
[86] George Yule. The study of language. Cambridge university press, 2016. (see page: 9)
[87] Qian Zhou, Georg Hagemann, Dylan Fafard, Ian Stavness, and Sidney Fels. An evaluation of depth and size perception on a spherical fish tank virtual reality display. IEEE transactions on visualization and computer graphics, 25(5):2040–2049, 2019. (see pages: 2, 18, and 65)
[88] Qian Zhou, Fan Wu, Sidney Fels, and Ian Stavness. Closer object looks smaller: Investigating the duality of size perception in a spherical fish tank vr display. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–9, 2020. (see page: 21)
[89] Qian Zhou, Kai Wu, Gregor Miller, Ian Stavness, and Sidney Fels. 3dps: An auto-calibrated three-dimensional perspective-corrected spherical display. In 2017 IEEE Virtual Reality (VR), pages 455–456. IEEE, 2017. (see pages: v, 20)

Appendix A
Questionnaire
A.1 15 Setting
In this Closer target pointing condition, please rank the pointing cues based on how confident you are about your decisions. (please use 1, 2, 3 where 1 denotes the most confidence.
You can use same number to indicate the same confidence level.)
• Head only:
• Hand only:
• Hand+Head:

A.2 30 Setting

In this Wider target pointing condition, please rank the pointing cues based on how confident you are about your decisions. (please use 1, 2, 3 where 1 denotes the most confidence. You can use same number to indicate the same confidence level.)
• Head only:
• Hand only:
• Hand+Head:

Appendix B
Geometric Calculations

B.1 Distance Adjustment

Figure B.1: RP was moved further away to ensure the same visual angle covering the arms of IVA and RP.

One of the most obvious scale differences between IVA and RP when pointing is the difference in their arm length. The absolute length of RP’s arm is longer than IVA’s arm. The visual angle covering their arms from the participant’s perspective would be different if IVA and RP were at the same distance from the participant. To ensure the same retinal size of IVA and RP, we moved RP further away from the participant so that their arms cover the same visual angle (α = β).

As shown in Figure B.1, we can set up an equation based on the law of cosines:

\[
\begin{aligned}
\cos\alpha &= \frac{40^2 + 59.5^2 + 40^2 + 90^2 - 30.5^2}{2\sqrt{40^2 + 59.5^2}\,\sqrt{40^2 + 90^2}} \\
\cos\beta &= \frac{40^2 + x^2 + 40^2 + (x + 68)^2 - 68^2}{2\sqrt{40^2 + x^2}\,\sqrt{40^2 + (x + 68)^2}} \\
\cos\alpha &= \cos\beta
\end{aligned}
\tag{B.1}
\]

By calculation, we get the result x = 78 cm. So the distance between RP and the participant is 68 cm + 78 cm = 146 cm, which is 146 cm - 90 cm = 56 cm further away than IVA is from the participant.
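As a rough illustration of the law-of-cosines step in Equation B.1, the following Python sketch computes the visual angle that an arm covers from the participant's viewpoint. The function name `visual_angle_deg` and its parameter names are hypothetical and do not come from the thesis. Reading 40 cm as a perpendicular offset and 59.5 cm and 90 cm as the near and far ends of IVA's arm is an assumption inferred from the terms of the equation.

```python
import math


def visual_angle_deg(near_cm, far_cm, offset_cm):
    """Angle (in degrees) subtended at the viewer by a segment that runs
    from near_cm to far_cm along a line offset_cm away from the viewer.

    Hypothetical helper: the geometry is an assumption read off Equation B.1.
    """
    a = math.hypot(offset_cm, near_cm)   # viewer to near end of the arm
    b = math.hypot(offset_cm, far_cm)    # viewer to far end of the arm
    c = far_cm - near_cm                 # arm length
    cos_angle = (a * a + b * b - c * c) / (2 * a * b)  # law of cosines
    cos_angle = max(-1.0, min(1.0, cos_angle))         # guard against rounding
    return math.degrees(math.acos(cos_angle))


# IVA values taken from the first line of Equation B.1.
alpha_deg = visual_angle_deg(near_cm=59.5, far_cm=90.0, offset_cm=40.0)
print(f"alpha = {alpha_deg:.2f} degrees")
```

Under the same assumed geometry, calling `visual_angle_deg(x, x + 68, 40)` would give the angle covered by RP's arm at a candidate distance x, which is the quantity matched against α in Equation B.1.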
B.2 Analysis of Estimated Error in Human Pointing

We provide a general analysis of how the estimated vertical and horizontal errors change with respect to the parameters, as a reference for future studies.

B.2.1 Estimated Vertical Error in Human Pointing

Figure B.2: When a human pointer points to targets from top to bottom, $err_v$ reaches its minimum when the pointer’s arm vector is perpendicular to the target area.

According to the side view of human pointing (Figure B.2, left), the estimated vertical error is:

\[
err_v = \frac{x \cdot dis_v}{arm}
\tag{B.2}
\]

When a human pointer points to targets from top to bottom, as shown in Figure B.2 (right), $x$ decreases until it reaches its minimum, where the pointer’s arm vector is perpendicular to the target area, and then increases. As $dis_v$ and $arm$ remain unchanged, $err_v$ changes proportionally with $x$.

B.2.2 Estimated Horizontal Error in Human Pointing

Figure B.3: $err_h$ reaches its minimum when the right arm vector is perpendicular to the target area.

According to the top view of human pointing (Figure B.3), the estimated horizontal error is:

\[
err_h = \frac{x \cdot dis_h}{arm}
\tag{B.3}
\]

When a human pointer points to targets using the left arm, $dis_h$ is $dis_{h1}$, and for right-arm pointing $dis_h$ is $dis_{h2}$. Figure B.3 (right) suggests that $x$ reaches its minimum when the left or right arm vector is perpendicular to the target area. The further the arm deviates from that position, the larger $x$ is. As $dis_h$ and $arm$ remain unchanged, $err_h$ changes proportionally with $x$. In particular, as $dis_{h2} < dis_{h1}$, $err_h$ is smallest when the human pointer uses the right arm for pointing and the arm vector is perpendicular to the target area from the top view.

B.3 Eye-shoulder Distance

B.3.1 Error Difference Between IVA and RP

We did not find a significant error difference between IVA and RP (Figure B.4) in the horizontal dimension.
As illustrated in Figure B.5 (top view), the deviation between the eye-fingertip line (green) and the arm line (red) also seems to introduce an error ($err_h$). Similar to the vertical dimension discussed in Section 4.4.1, we expected that the removal of alignment might improve the accuracy horizontally as well. However, our results show a much smaller difference (0.005) compared to the vertical one (0.052) (Figure B.4). One possible explanation is that the eye-shoulder distance is smaller from the top view ($dis_h$) than from the side view ($dis_v$).

Figure B.4: The error difference between IVA and RP in the horizontal dimension is 0.005, which is much smaller than the vertical error difference (0.052). RP had a smaller error horizontally (0.15) than vertically (0.179).

Figure B.5: The eye-shoulder distance from the side view is larger than that from the top view ($dis_v > dis_h$).

As discussed in Appendix B.2:

\[
\begin{aligned}
err_v &= \frac{x \cdot dis_v}{arm} \\
err_h &= \frac{x \cdot dis_h}{arm}
\end{aligned}
\tag{B.4}
\]

RP in our experiment used the left arm to point to targets in the left region and the right arm for targets in the center or right region. So $dis_h$ is $dis_{h1}$ (31 cm) for left-arm pointing and $dis_{h2}$ (25 cm) for right-arm pointing. As $dis_v$ (36 cm) > $dis_h$ (31 cm or 25 cm), $err_v > err_h$. We speculate that the removal may not make much difference in eliminating the smaller horizontal error ($err_h$).

B.3.2 Horizontal and Vertical Error in RP

As shown in Figure B.4, in RP, participants were more accurate horizontally than vertically. We speculate that the larger eye-shoulder distance in the side view ($dis_v$) compared to the top view ($dis_h$) might account for that. Based on Equations B.4, we averaged $dis_h$ and roughly estimated the ratio of horizontal error to vertical error:

\[
dis_h = (dis_{h1} + dis_{h2})/2 = (31\,\text{cm} + 25\,\text{cm})/2 = 28\,\text{cm}
\tag{B.5}
\]
\[
\frac{err_h}{err_v} = \frac{dis_h}{dis_v} = \frac{28\,\text{cm}}{36\,\text{cm}} \approx 0.78
\tag{B.6}
\]

We also calculated the actual ratio by dividing the mean horizontal error (0.15) by the mean vertical error (0.179):

\[
0.15 / 0.179 \approx 0.84
\tag{B.7}
\]

The error due to the eye-shoulder distance might contribute to the result.

Appendix C
Consent Forms

Ethics ID: H08-03005-A022

THE UNIVERSITY OF BRITISH COLUMBIA
Department of Electrical & Computer Engineering
2332 Main Mall
Vancouver, B.C., V6T 1Z4

Aug 8, 2019

Consent Form

Evaluation of pointing gesture on joint attention using a spherical perspective-corrected display

Principal Investigator
Dr. Sidney Fels, Professor, Department of Electrical and Computer Engineering, University of British Columbia (604) 822-5338

Co-Investigators
Fan Wu, Graduate Student, Department of Electrical and Computer Engineering, University of British Columbia (778) 883-8795

Purpose and Procedures
The purpose of this study is to investigate how well people could interpret where the virtual agent is pointing in a spherical FTVR display compared to real people pointing. You will be asked to perceive and identify the agent or real people’s pointing and determine the direction by clicking in a blank display. In these studies, your reactions and accuracy will be recorded. We will also ask you for your impressions of the interaction experience. The session should take about 30 minutes. You may stop your participation at any time.

Confidentiality
The identities of all people who participate will remain anonymous and will be kept confidential. Identifiable data will be stored securely in a locked metal filing cabinet or in a password protected computer account. All data from individual participants will be coded so that their anonymity will be protected in any project reports and presentations that result from this work.

Remuneration/Compensation
We are very grateful for your participation.
You will receive $10 for your participation whether you complete the experiment or not. (Not applicable for Pilot Subjects!!).

Contact Information About the Project
If you have any questions or require further information about the project you may contact Fan Wu at (778) 883-8795.

Contact for information about the rights of research subjects
Any questions about the details of the study or interview process can be directed to the investigators using the contact information listed above. If you have any concerns or complaints about your rights as a research participant and/or your experiences while participating in this study, contact the Research Participant Complaint Line in the UBC Office of Research Ethics at 604.822.8598, or if long distance email or call toll free 1.877.822.8598.

Consent
We intend for your participation in this project to be pleasant and stress-free. Your participation is entirely voluntary and you may refuse to participate or withdraw from the study at any time.

Your signature below indicates that you have received a copy of this consent form for your own records.

Your signature indicates that you consent to participate in this project. You do not waive any legal rights by signing this consent form.

I, ________________________________, agree to participate in the project as outlined above. My participation in this project is voluntary and I understand that I may withdraw at any time.

____________________________________________________
Participant’s Signature                                                           Date

____________________________________________________
Student Investigator’s Signature                                            Date

