UBC Theses and Dissertations


Perception of motion in virtual reality interception tasks. Rolin, Robert Adam, 2017.


Perception of Motion in Virtual Reality Interception Tasks

by

Robert Adam Rolin

B.Sc., McGill University, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies

(Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

August 2017

© Robert Adam Rolin 2017

Abstract

Virtual Reality (VR) and related 3D display technologies have recently experienced tremendous growth in promise and popularity, but they have significant limitations. Human vision, carefully tuned to integrating multiple cues from the real world, can incorrectly perceive the virtual world in these displays. In this thesis, we conduct a series of psychophysics experiments evaluating motion perception in VR, culminating in a user-adapted method that increases the interception accuracy of virtual objects by modifying motion-in-depth cues. Using a baseball hitting simulation in VR, we show that our modified motion-in-depth cues result in greater accuracy. Finally, we present implementations of 3D gaze analysis algorithms.

Lay Summary

Virtual Reality (VR) and related 3D display technologies have recently experienced tremendous growth in promise and popularity, but they have significant limitations. Human vision, carefully tuned to integrating multiple cues from the real world, can incorrectly perceive the virtual world in these displays. The key goals of this thesis are to evaluate the effects of these limitations on humans' ability to interpret the movement of virtual objects and to make accurate interceptive actions. We identify characteristics of virtual movement that cause people to make consistent errors, and we present methods to modify virtual objects so that their motion is more easily interpreted.
Using a realistic baseball hitting simulation in VR, we show that our modifications result in greater accuracy.

Preface

A version of Chapter 3 has been submitted for publication [52] and was presented as a poster at the Vision Sciences Society 17th Annual Meeting (2017). I implemented and ran the user studies, performed the statistical analysis, and wrote most of the paper manuscript.

The studies run for this thesis were approved by the UBC Behavioural Research Ethics Board (Certificate Number: H16-01651).

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
I Introduction
1 Related Work
1.1 Visual Perception in VR
1.1.1 Vergence-Accommodation Conflict
1.1.2 Depth Underestimation
1.2 Motion Perception
1.2.1 Time-to-Contact
II Methods
2 VR Time-to-Contact Experiments
2.1 General Methods
2.1.1 Visual Environment, Task and Procedure
2.1.2 Apparatus and Set-up
2.1.3 Data Analysis
2.2 Experiment 1: Lateral vs. Looming Motion
2.2.1 Purpose
2.2.2 Visual Environment, Task and Procedure
2.2.3 Participants
2.2.4 Results
2.3 Experiment 2: VR vs. Single Screen
2.3.1 Purpose
2.3.2 Visual Environment, Task and Procedure
2.3.3 Participants
2.3.4 Results
2.4 Experiment 3: Effect of Absolute Depth
2.4.1 Purpose
2.4.2 Visual Environment, Task and Procedure
2.4.3 Participants
2.4.4 Methods
2.4.5 Results
2.5 Experiment 4: Effect of Conflicting Cues
2.5.1 Purpose
2.5.2 Visual Environment, Task and Procedure
2.5.3 Participants
2.5.4 Methods
2.5.5 Results
3 User Specific Motion Correction
3.1 Methods
3.1.1 User Calibration
3.1.2 Motion Correction
3.2 Experiment 5: Motion Correction Tool Evaluation
3.2.1 Virtual Environment
3.2.2 Visual Stimuli, Task and Procedure
3.2.3 Data Analysis
3.2.4 Results
3.3 Discussion
4 Eye-Tracking
4.1 Data Filtering
4.1.1 Converting Eyelink Data to 3D
4.2 Saccade Detection
4.3 Pursuit Detection
4.4 CNN for 3D Gaze Localization
4.4.1 Data Collection
4.4.2 Preprocessing and Filtering
4.4.3 Model Architecture
4.4.4 Training
4.4.5 Results
III Conclusions
Bibliography

List of Tables

2.1 The ball speeds and corresponding times-to-contact used in Experiment 1.
2.2 The results of the repeated measures ANOVA performed for Experiment 1.
2.3 The ball speeds and corresponding times-to-contact used in Experiment 2.
2.4 The results of the repeated measures ANOVA performed for Experiment 2.
2.5 The ball speeds and corresponding times-to-contact used in Experiment 3.
2.6 The results of the repeated measures ANOVA performed for Experiment 3.
2.7 The ball speeds and modified speeds used in Experiment 4.
2.8 The results of the repeated measures ANOVA performed for Experiment 4.
3.1 Parameters for baseball pitch simulation.
4.1 The result of the network's predictions on unseen test data. Errors are calculated in normalized space.

List of Figures

2.1 The baseball stadium model used in the TTC experiments.
2.2 The physical set-up for the TTC experiments: chin-rest, keyboard, monitor, and Eyelink 1000. The monitor and Eyelink 1000 were used only for one section of Experiment 2.
2.3 The procedure for Experiment 1. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. (a) The first experimental block. (b) The second experimental block.
2.4 Judged vs. actual TTC (in seconds) for Experiment 1. Black line indicates perfect responses. Left panel shows the left-to-right trials, right panel shows the looming trials. Error bars show standard error.
2.5 The procedure for Experiment 2. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. Right-handed subjects were placed 0.75 m to the left of home plate and left-handed subjects were placed 0.75 m to the right of home plate. Subjects were rotated toward the cross placed above the pitcher's mound.
2.6 Judged vs. actual TTC (in seconds) for Experiment 2. Black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.
2.7 The procedure for Experiment 3. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. Experimental blocks varied in the distance of the starting cross.
2.8 Judged vs. actual TTC (in seconds) for Experiment 3. Black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.
2.9 The modified depth cues used in Experiment 4. (a) Baseline. (b) Monocular cues transferred to baseline ball with binocular cues maintained. (c) Binocular cues transferred to baseline ball with monocular cues maintained.
2.10 Judged vs. actual TTC (in seconds) for Experiment 4, novice sports players only. Black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.
2.11 Judged vs. actual TTC (in seconds) for Experiment 4, experienced sports players only. Black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.
3.1 Calibration data from one subject. (a) Response times for the calibration procedure, differentiated by display speed. The three speeds correspond to actual times-to-contact of 0.90, 0.62 and 0.47 seconds. (b) The model fit to the subject's perceived speeds.
3.2 (a) The position of a ball at four points in a pitch moving with velocity v with actual size shown. (b) The position of a ball moving with velocity v′ with perceived size shown. (c) The output of the modifications. The position of the ball with velocity v and perceived size of ball with velocity v′. After a certain distance, perceived sizes are extrapolated from previous values.
3.3 Virtual environments shown to subjects in Experiment 5. (a) Calibration section. (b) Testing section.
3.4 The collision bounds of the bat were extended vertically. The bounce angle of the ball was determined by the bat's rotation around the vertical axis plus a random vertical component. Image source: shutterstock.com
3.5 Relative hit angles (in degrees) as a function of trial number for two representative subjects.
Subject 106 completed block 1 (a) with the intervention (orange data points) and block 2 (b) without it (blue data points). Subject 110 completed the blocks in the reverse order: first the baseline condition (c), then the intervention (d). Ball speed is denoted by symbol type. Negative hit angles indicate late hits; positive hit angles indicate early hits. Misses are coded at a 90° hit angle by default, indicated by the shaded box at the top of each panel.
3.6 Hitting error (in degrees) across all observers (n = 9 per group) for three speeds. Lines are best-fit models. Each data point is the average hitting error per trial across observers; each panel shows 50 trials per speed (150 trials per block in total). Error bars denote standard errors.
4.1 Average eye direction during a 2.5 s left-to-right TTC experiment. (a) Raw data. (b) Filtered with 3-unit-wide median and Gaussian filters.
4.2 Average eye velocity during a 2.5 s left-to-right TTC experiment. (a) Computed from filtered gaze directions. (b) Further filtered with 3-unit-wide median and Gaussian filters, followed by an FIR filter.
4.3 Average eye acceleration during a 2.5 s left-to-right TTC experiment, computed via finite differences from the filtered velocities.
4.4 Angular distance between gaze and object during a 2.5 s left-to-right TTC experiment. (a) Absolute angular distance. (b) Angular distance along the horizontal and vertical axes. Negative values correspond to left and down.
4.5 Eye velocity (blue) and acceleration (orange) during a 2.5 s left-to-right TTC experiment. Green sections have been identified as saccades.
4.6 Eye velocity (blue) and angular distance between gaze and ball (orange) during a 2.5 s left-to-right TTC experiment.
Red sections have been identified as pursuit.
4.7 Sample images of each eye used for training. (a) Right eye. (b) Left eye.
4.8 CNN architecture. Each eye image is fed to one of two identical streams. The latent features of the two streams are flattened and merged before a fully connected layer.

Part I

Introduction

Virtual Reality (VR) and related 3D display technologies have recently experienced tremendous growth in promise and popularity, but they have significant limitations. Human vision, carefully tuned to integrating multiple cues from the real world such as disparity, vergence, and accommodation, can incorrectly perceive the virtual world in these displays (e.g., [21]). Yet accurate motion perception is essential for tasks such as interception and collision avoidance, which are central to many VR applications that simulate baseball and other ball sports, as well as to many video games.

Motion perception research investigates the process by which humans infer the direction and speed at which objects in a scene are moving. These abilities are not easily explained and have motivated a large body of research. This research has shed light both on the visual elements of a scene that cause movement to be perceived and on some of the neurophysiology involved in perceiving movement, although much of that neurophysiology is still unknown.

It is now thought that visual perception, including motion perception, is organized into two distinct streams, an idea known as the two-stream hypothesis. The first stream, dubbed the "ventral stream," is involved largely in perception-based tasks that involve a higher degree of conscious thought and slower actions.
The second stream, dubbed the "dorsal stream," is involved in action-based tasks that can be quicker and more instinctual [43].

Our desire to better understand motion perception in VR is aligned more closely with the action-based dorsal stream. While knowledge about both streams would be valuable, we are primarily concerned with user-experience issues within VR simulations and how they affect users' ability to interact naturally in virtual scenes. There are additional reasons to investigate motion perception in VR. The use of VR as a training tool and as a medium for psychophysics experiments has increased substantially in recent years, and incorrect motion perception could be problematic in both areas. As a training tool, VR may train people to conditions that differ from the real world, making the training ineffective or even counterproductive. In psychophysics experiments, researchers risk drawing conclusions that reflect only the impaired motion perception inside a VR display rather than human motion perception in general. While this thesis sheds light on using VR as a medium for psychophysics experiments, we focus mainly on the use of VR as a simulation tool involving active interactions from the user.

Our investigation is specifically limited to one main aspect of motion perception: the ability to judge time-to-contact (TTC). Judging time-to-contact involves estimating when a moving object will reach a specified point in space and is necessary for many natural interactions in VR, such as catching or hitting an object. Judging TTC is often thought of as a proxy for judging speed, but it has been shown that the brain is much more likely to estimate times-to-contact directly than to estimate speeds and infer a time-to-contact from them [53].
Our goals are to determine whether there are systematic weaknesses in people's ability to judge times-to-contact in VR, and further what the effects of speed and presentation duration (viewing time) are on these judgments. We also explore methods for improving people's estimates of time-to-contact in VR.

Because of our focus on VR applications, we use parameters (i.e., speeds, distances) that one would encounter in the real world, in contrast to many previous time-to-contact experiments. After a preliminary experiment comparing left-to-right to looming motion (Experiment 1), we chose to focus our experiments on looming and near-looming motion, with interactions occurring within reaching distance of the user. We limit ourselves to this context for two reasons. First, this is the context for many sports simulations (e.g., baseball, tennis, table tennis) that are popular VR use cases. Second, we find that looming and near-looming motion results in inaccurate estimates of time-to-contact and strong effects of speed and presentation duration on these estimates, indicating potential for novel techniques that improve estimates of time-to-contact.

Our investigation of time-to-contact estimates in VR proceeds over five experiments. In the first experiment, we examine the role played by direction of motion in TTC estimation in VR. In the second, we compare TTC estimation on a VR display to that on a conventional monitor. In the third, we examine the role played by depth in TTC estimation in VR. In the fourth, we examine the effect of selectively modifying certain depth cues on TTC estimation in VR. In the fifth, we test a method to improve a user's perception of motion in VR displays. In a calibration stage, we first learn user-specific parameters from a user's errors in TTC judgments to characterize their perception of virtual motion.
In the second stage, we modify motion-in-depth cues to facilitate more accurate judgments for that user. Using a baseball hitting simulation in VR, we show that our modified motion-in-depth cues result in more accurate and consistent interactions with moving objects in VR.

This thesis also documents some work on eye tracking. Although no research questions were answered using eye-tracking data, the analysis procedure may nonetheless be valuable for others doing similar analyses. We present implementations for filtering 60 Hz 3D eye-tracking data as well as for saccade and pursuit detection. A saccade is a rapid movement of the eye between two fixation points, and pursuit refers to the slower, smooth tracking of an object. We also present a Convolutional Neural Network (CNN) for predicting 3D gaze location. CNNs are a class of deep, feed-forward artificial neural networks that have recently seen tremendous growth in image-analysis tasks such as object detection and tracking.

Chapter 1

Related Work

This thesis deals with problems related to motion perception in VR. We review here previous results related to perception in VR as well as the relevant literature on motion perception.

1.1 Visual Perception in VR

1.1.1 Vergence-Accommodation Conflict

Most modern VR headsets use a fixed focal distance, providing stereo cues but not focal cues. The lack of focal cues causes perceptual differences between VR and the real world, resulting in the vergence-accommodation conflict: two depth cues, vergence (how much the eyes rotate inward, or cross) and accommodation (how much the eyes' lenses bend to focus), indicate different depths. This conflict occurs when a virtual object does not lie on the display's focal plane and is more pronounced the further the object is from the focal plane. The effects of this conflict, which include a reduced ability to perceive 3D structure and visual fatigue, are well documented and explained [21].
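To put a number on "more pronounced the further the object is from the focal plane," the conflict is commonly expressed in diopters: the difference between the focus demand of the virtual object (1/distance, which is what vergence signals) and that of the fixed focal plane (where the eye must actually focus). The Python sketch below assumes this diopter convention, a placeholder focal distance of 2.0 m and an interpupillary distance of 63 mm; both values are hypothetical illustrations, not specifications of any headset used in this thesis.

```python
import math

# Hypothetical values for illustration only; not Oculus Rift DK2 specifications.
FOCAL_DISTANCE_M = 2.0   # assumed fixed focal distance of the headset optics
IPD_M = 0.063            # assumed interpupillary distance


def vergence_angle_deg(distance_m: float, ipd_m: float = IPD_M) -> float:
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def accommodation_demand_d(distance_m: float) -> float:
    """Focus demand in diopters (1 / metres)."""
    return 1.0 / distance_m


def va_conflict_d(object_distance_m: float,
                  focal_distance_m: float = FOCAL_DISTANCE_M) -> float:
    """Diopter gap between the object's distance (signalled by vergence)
    and the fixed focal plane (where accommodation must stay)."""
    return abs(accommodation_demand_d(object_distance_m)
               - accommodation_demand_d(focal_distance_m))


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 10.0):
        print(f"{d:5.1f} m: vergence {vergence_angle_deg(d):5.2f} deg, "
              f"conflict {va_conflict_d(d):.2f} D")
```

Under this convention the mismatch is zero on the focal plane, grows without bound for near objects, and approaches 1/f diopters for objects far beyond the focal plane.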
There have been no documented effects of the vergence-accommodation conflict on motion perception.

Another direct result of having a fixed focal distance is the lack of blur cues induced by objects lying outside the focal plane. Normally, when we look at an object, other objects at different depths appear blurred, and the further they are from the focal plane, the more blurred they appear. The effectiveness of blur as a depth cue and its ability to affect visual perception have been explored thoroughly [20, 37–40, 46, 59, 61]. Because blur is important but absent from VR headsets, there have been attempts to introduce it in gaze-contingent stereoscopic displays, which have been shown to enhance realism, improve quantitative depth perception, and reduce discomfort [12, 42]. Additionally, blur has been heralded for enabling significant reductions in rendering cost [16, 47].

Multi-focal Displays

Because of the documented effects of the vergence-accommodation conflict, many attempts have been made to create stereoscopic displays without a single fixed focal length [4, 21, 23, 34]. The most promising of these for VR is the 4D Light Field Display, which uses two stacked Liquid Crystal Displays to recreate a light field that allows the user to accommodate [23]. Another solution to this problem was recently presented by Oculus Research [41]. Their technique places a phase-modulation element between the eyepiece and the display screen that warps the focus of the screen to conform to the 3D content. This creates a focal surface that follows the contours of the scene and feels natural.

1.1.2 Depth Underestimation

One potential consequence of the vergence-accommodation conflict is the documented underestimation of depth in VR [7, 24, 31, 51].
In these studies, subjects were shown real and virtual objects and asked to estimate their depth, either by verbally estimating a distance, walking to the objects, or triangulating by pointing at them from multiple positions. These studies have found underestimation of up to 50% for distances up to 35 m, whereas distances in the real world are estimated accurately.

Depth underestimation may be one reason why many sports simulations are reportedly sub-optimal. In a basketball simulation, Covaci et al. conclude that distance underestimation or other perceptual disturbances in VR make people adapt to the training conditions of the task: novice users only reach expert-level performance by finding a new way of throwing the ball [9]. In a downhill ski-jumping simulation, Staurset et al. observe that athletes consistently struggled with the timing of the jump, always taking off too late from the ramp [57].

While depth underestimation may or may not play a role in motion perception, it gives reason to believe that other aspects of visual perception could be similarly affected by a VR display.

1.2 Motion Perception

Behavioural studies in humans using visual psychophysics have provided a deep understanding of motion sensitivity, our ability to detect and discriminate moving objects (reviewed in [5, 45]). It is an area of interest because of its importance in everyday tasks such as collision avoidance and interception, two areas of applied VR research. For the purposes of this thesis, we review the relevant literature on motion perception in time-to-contact (TTC) paradigms.

1.2.1 Time-to-Contact

In a TTC experiment, people make judgments about when an object will collide with another object, when one or both of the objects are moving. In most TTC experiments, a combination of speed, distance, and presentation duration is varied; varying presentation duration means that the object disappears at some point along its collision trajectory.
The goal of such research is to determine what visual stimuli are needed to estimate TTC accurately ([15, 29, 30, 32, 35, 48, 50, 56, 60]; reviewed in [17, 19]).

Many past studies focused solely on the monocular predictor of TTC, the so-called tau:

$$\tau = \frac{\theta}{\partial \theta / \partial t} \tag{1.1}$$

This equation, originally derived in a science fiction novel by astronomer Fred Hoyle, describes when an object approaching an observer's head at constant velocity will collide with it [22]. Here θ is the visual angle subtended by the object. More recently, however, researchers have doubted that this variable provides sufficient information to account for human-level accuracy and have begun looking at binocular indicators of TTC [60]. The following equation was derived for TTC based only on binocular information, for objects moving directly toward the eye at constant speed [58]:

$$\mathrm{TTC} \approx \frac{I}{D\,(\partial \delta / \partial t)} \tag{1.2}$$

Here D is the distance from the object to the observer, ∂δ/∂t is the rate of change of relative disparity, and I is the interpupillary distance. It should be noted that this value can be approximated from retinal image variables alone, without relying on D, which has been shown to be difficult to encode.

It is well known that the effectiveness of static binocular disparity decreases sharply as viewing distance increases, since relative disparity is inversely proportional to the square of the viewing distance. This causes sensitivity to differences in depth to fall off quickly when depth is greater than 10 m. However, this is not the case for the rate of change of relative disparity. It has been shown that the ratio between the magnitudes of the monocular and binocular correlates of an object's motion-in-depth does not
depend on the object's distance, but is proportional to the object's absolute size and inversely proportional to the viewer's interpupillary distance [15]. The thresholds above which the rate of change of relative disparity or the rate of change of angular size produces a detectable percept of motion-in-depth differ from each other and vary between subjects, but can be assumed to be about 5 arcmin/s [49]. All the stimuli used in our studies surpass this threshold.

While both monocular and binocular signals are available to the brain, the accuracy and relative strength of these signals depend on a number of stimulus conditions, and this in turn affects how the brain integrates them. For example, it can be seen from Equation 1.2 that as distance increases, ∂δ/∂t rises above its detection threshold at progressively smaller values of TTC, indicating that faster objects produce a stronger binocular signal. Additionally, binocular information is relatively less effective in generating a sensation of motion-in-depth as viewing time decreases [49].

In an experiment to quantify the accuracy of these two signals, Gray and Regan measured people's estimates of TTC using a staircase procedure in conditions where only monocular information was present, only binocular information was present, and both were present. They confirmed that minimal errors are achieved when both signals are present, reporting errors as small as 1.3% of total TTC. This approaches the accuracy with which top sports players can judge the time to impact of an approaching ball, assuming that they only use visual information up to about 300 ms before the actual impact [15].

In experiments to determine the relative strength of these signals, Regan and Beverley reported that binocular cues are weighted more heavily, that is, produce a stronger sensation of motion-in-depth, at high speeds and longer viewing times [49]. López-Moliner et al.
showed that over the course of a single TTC task, a person's judgment depends increasingly on the monocular cue over time [33]. It is thought that the relative weight given to a signal indicates its reliability [28].

There has been some work on the perception of motion using VR simulations, usually focused on sports-related tasks [10, 13, 14, 63]. There is also growing interest in using VR for training athletes, with both academic research on implementing realistic sports simulations [25, 26, 62] and commercial products [1, 2] (reviewed in [6]). However, it is still unclear what effects VR has on motion perception itself, and what can be done to counteract those effects. Dealing with this issue is crucial for creating perceptually faithful simulations.

Part II

Methods

Chapter 2

VR Time-to-Contact Experiments

We conducted four time-to-contact (TTC) experiments. Although they tested different hypotheses, the experiments were very similar. The general methods employed in all of the experiments are documented first, followed by the specific hypotheses, implementation details, and results for each experiment.

2.1 General Methods

2.1.1 Visual Environment, Task and Procedure

Participants were placed in a virtual baseball stadium, simulated using Unity and a 3D model from the Unity Asset Store (Figure 2.1). In each experimental trial, participants viewed a ball flying between a red fixation cross and a target, and were instructed to indicate when they estimated it would hit the target by pressing an assigned button on a keyboard. At the start of each experimental trial, participants were instructed to fixate on the cross. Upon successful fixation, determined by eye tracking, the cross was replaced by a ball that moved at a constant, horizontal velocity toward the target.

Ball trajectories were shown either in full or only in part: the ball would disappear at some point during the pitch, after traveling 1/4, 1/2, 3/4, or the entire distance to the target.
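For the looming trials, Equations 1.1 and 1.2 can be made concrete by asking what each cue reports at the moment the ball vanishes. The Python sketch below is a simplified model, not the thesis software: it assumes the ball travels head-on toward the eye (treating the target as located at the eye), starts 30 m away (the speed × TTC product of every row of Table 2.1), has an assumed diameter of 0.074 m, and is viewed with an assumed interpupillary distance of 0.063 m. Rates of change are taken by central finite differences of the image quantities (angular size and vergence angle); all function names are hypothetical.

```python
import math

# Assumed scene parameters (illustrative, not taken from the thesis software).
BALL_DIAMETER_M = 0.074   # S: assumed baseball diameter
IPD_M = 0.063             # I: assumed interpupillary distance
START_DISTANCE_M = 30.0   # speed x TTC for every row of Table 2.1
DT = 1e-4                 # finite-difference step in seconds


def angular_size(distance_m: float) -> float:
    """Visual angle theta (radians) subtended by the ball."""
    return 2.0 * math.atan(BALL_DIAMETER_M / (2.0 * distance_m))


def vergence(distance_m: float) -> float:
    """Vergence angle (radians) for a ball on the midline. With fixation held on a
    static reference, its rate of change equals that of relative disparity."""
    return 2.0 * math.atan(IPD_M / (2.0 * distance_m))


def rate(image_fn, distance_m: float, speed_mps: float) -> float:
    """Central-difference d/dt of an image quantity for an approaching ball."""
    ahead = image_fn(distance_m - speed_mps * DT)   # ball is closer at t + DT
    behind = image_fn(distance_m + speed_mps * DT)  # ball is farther at t - DT
    return (ahead - behind) / (2.0 * DT)


def tau(distance_m: float, speed_mps: float) -> float:
    """Monocular estimate, Equation 1.1: theta / (d theta / dt)."""
    return angular_size(distance_m) / rate(angular_size, distance_m, speed_mps)


def binocular_ttc(distance_m: float, speed_mps: float) -> float:
    """Binocular estimate, Equation 1.2: I / (D * d delta / dt)."""
    return IPD_M / (distance_m * rate(vergence, distance_m, speed_mps))


def cues_at_disappearance(speed_mps: float, fraction_shown: float) -> dict:
    """Remaining TTC and both cue estimates when the ball vanishes after
    travelling fraction_shown of the path (requires fraction_shown < 1)."""
    remaining = START_DISTANCE_M * (1.0 - fraction_shown)
    return {
        "viewing_time_s": fraction_shown * START_DISTANCE_M / speed_mps,
        "true_remaining_ttc_s": remaining / speed_mps,
        "tau_s": tau(remaining, speed_mps),
        "binocular_ttc_s": binocular_ttc(remaining, speed_mps),
    }


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.75):
        print(fraction, cues_at_disappearance(20.0, fraction))
```

Under these assumptions both estimates fall within 0.1% of the true remaining TTC, and the ratio `rate(angular_size, ...) / rate(vergence, ...)` stays close to S/I at every distance, consistent with the distance-independence claim attributed to [15] in Section 1.2.1.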
Ball speeds were chosen to produce uniformly spaced times-to-contact so as to avoid biased responses, and they varied between experiments. Each experiment contained 20 trials per duration × speed condition (16 conditions in total) in randomized order, resulting in 320 trials per experimental block per participant, further split into two sub-blocks of 160 trials. Each experiment contained two or three blocks, and the block ordering was counterbalanced across participants. Each participant completed 10 demo trials at the beginning of each experiment, and each experiment took no more than 45 minutes to complete.

Figure 2.1: The baseball stadium model used in the TTC experiments.

2.1.2 Apparatus and Set-up

Stimuli were presented in an Oculus Rift DK2 (Oculus VR, Menlo Park, CA) with a built-in 60 Hz SMI eye tracker (SensoMotoric Instruments, Berlin, Germany). The eye tracker is accurate to within 1 degree of visual angle, and a 9-point calibration of the eye tracker was performed before each experimental block.

Participants were seated with their head stabilized using a chin rest (Figure 2.2). We ran the experiments in Unity® on a computer with an i7 860 processor, 6 GB RAM, and a GTX 780 graphics card. Although the GPU is below the recommended specifications for the Oculus Rift, no indication of inadequate resources (e.g., dropped frames) was observed.

In addition to stimulus presentation in VR, Experiment 2 also used a classic non-VR laboratory setting for visual stimulus presentation. Stimuli were presented at a distance of 51 cm on a CRT monitor (NEC MultiSync FP2141SB) with a resolution of 1600 × 1200 pixels, set to a frame rate of 85 Hz. Eye movements were recorded with a high-resolution desktop-mounted eye tracker (Eyelink 1000, SR Research, Ottawa, ON), and participants were seated with their head stabilized by a combined chin and forehead rest. In
In this setting we used the same Unity executable but changed the field of view to account for the greater distance between the participant and the display.

2.1.3 Data Analysis

Participants' performance in the TTC task was measured as reaction time (in seconds). For each subject and trial, we calculated the temporal response error by subtracting the actual TTC from the reaction time. Positive errors indicate a late response and negative errors an early response. Outliers were removed using a z-score analysis with a cut-off at 3 standard deviations. Repeated measures analysis of variance (ANOVA) was used to determine main effects of viewing time and speed, as well as of the manipulations specific to each experiment. Post-hoc t-tests with Holm-Bonferroni corrections were used to test for differences between specific conditions.

Figure 2.2: The physical set-up for the TTC experiments: chin rest, keyboard, monitor, and Eyelink 1000. The monitor and Eyelink 1000 were used only for one section of Experiment 2.

In all experiments we asked participants about their level of experience in different sports. We found that participants who had played an interceptive ball sport (e.g., baseball, badminton, tennis) competitively gave responses that were very different from those of the other participants. We therefore analyzed them separately.

2.2 Experiment 1: Lateral vs. Looming Motion

2.2.1 Purpose

The purpose of Experiment 1 was to examine the accuracy of human subjects in a TTC experiment in VR. Since no experiments of this nature had been conducted, we examined the simplest scenarios, as is common in psychophysics. We wanted to establish whether there were any systematic weaknesses in people's ability to estimate TTC. Knowing this would allow us to identify areas where new methods could be introduced to compensate for those weaknesses.
There is evidence from animal studies that different populations of neurons are sharply tuned either to motion within a frontoparallel plane with static disparity or to motion involving changing disparity [18]. It therefore makes sense to test accuracy in both of these conditions.

Figure 2.3: The procedure for Experiment 1. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. (a) The first experimental block. (b) The second experimental block.

Previous studies evaluating online absolute TTC estimates have found that TTC is underestimated when it is greater than some threshold and overestimated when it is less than that threshold. Typically this threshold is between one and two seconds [17].

We use a button press as our response technique. This distinguishes our work from some previous studies, which used relative or off-line estimates, but it allows us to make claims about actually taking actions at appropriate times in VR, which is a central focus of this work.

2.2.2 Visual Environment, Task and Procedure

Experiment 1 consisted of two experimental blocks. In both blocks participants were positioned in the virtual baseball stadium directly above home plate. In the first block the ball moved from left to right, and in the second block the ball moved directly toward the subject (Figure 2.3). The speeds and corresponding TTCs are given in Table 2.1.

Table 2.1: The ball speeds and corresponding times-to-contact used in Experiment 1.

Speed (m/s)   TTC (s)
30            1.0
20            1.5
15            2.0
12            2.5

2.2.3 Participants

We tested 8 subjects (mean age: 27 yrs, std = 6.1 yrs; 4 of them female), undergraduate or graduate students at the University of British Columbia. All participants had normal or corrected-to-normal visual acuity and normal stereo vision, determined by a stereo acuity test (Stereo Fly Test, Precision Vision Inc., Woodstock, IL).
All were unaware of the experimental hypotheses.

2.2.4 Results

A summary of the response data is given in Figure 2.4 and a summary of the ANOVA is given in Table 2.2. Many of the interaction effects involving direction were significant, so we examine the data for the two directions separately. For lateral motion, responses were nearly perfect. There was a significant effect of presentation duration, but not of speed. Most responses were slightly late, except for the slowest speed and shortest presentation duration, which produced slightly early responses. Viewing time was related to response variability, with the shortest viewing times producing the most variable responses.

For looming motion, there were significant effects of speed, presentation duration, and their interaction. Previous research on TTC has indicated that people tend to underestimate TTC when the actual TTC is over 1.5 s and to overestimate it when the actual TTC is under 1.5 s. Our results were consistent with this. The interaction between presentation duration and speed can be summarized by saying that errors were magnified by shorter viewing times: the shortest viewing time produced the latest responses for the shortest TTCs and the earliest responses for the longer TTCs. With short viewing times, people were unable to estimate speeds accurately and tended toward the average speed.

Figure 2.4: Judged vs. actual TTC (in seconds) for Experiment 1. The black line indicates perfect responses. The left panel shows the left-to-right trials and the right panel shows the looming trials. Error bars show standard error.

2.3 Experiment 2: VR vs. Single Screen

2.3.1 Purpose

The purpose of Experiment 2 was to test for differences in TTC estimation between a VR display and a conventional 2D display. This would allow us to determine the effect of the additional binocular stereo cues present in the VR display.
Knowing this would be valuable, as it might give insight into when a VR display should be used instead of a conventional monitor. For example, baseball players often use a television to watch pitches and train themselves to distinguish a ball from a strike. Knowing the effects of a VR display might influence whether it is worth investing in a new way to train.

We maintain much of the simplicity of the previous experiment (e.g., constant velocities) to avoid introducing variables that could confound our results.

Some practical considerations arise from the differences in experimental set-up inherent to the two displays. Since the 2D display is farther from the eye than the VR screen, the FOV of the virtual camera for the 2D section must be adjusted to compensate. Additionally, in VR subjects cannot track the ball all the way to the target, because the target is next to their head and they are head-stabilized facing forward. On the screen, less eccentric viewing angles are needed.

Table 2.2: The results of the repeated measures ANOVA performed for Experiment 1.

Effect                       Sum of Squares   df   Mean Square   F       p        η²
Presentation Duration (PD)   1.24             3    0.0847        4.88    0.01     0.41
Direction (D)                0.68             1    0.2680        2.54    0.16     0.27
Speed (S)                    3.21             3    0.0227        47.10   <0.001   0.87
PD × D                       0.37             3    0.0617        1.99    0.15     0.22
PD × S                       2.14             9    0.0039        60.49   <0.001   0.90
D × S                        1.27             3    0.0097        43.34   <0.001   0.86
PD × D × S                   1.13             9    0.0032        39.16   <0.001   0.84

2.3.2 Visual Environment, Task and Procedure

Participants were placed in the same virtual environment used in Experiment 1. Their position was 0.75 m to the left or right of home plate for right- or left-handed participants, respectively. A red fixation cross was placed at eye level above the pitcher's mound, 18.4 m from home plate (Figure 2.5). At the start of each experimental trial, participants were instructed to fixate on the cross.
Upon successful fixation, determined by eye tracking, the cross was replaced by a simulated baseball that moved at a constant, horizontal velocity toward home plate. Participants were instructed to estimate when it would cross home plate, i.e., collide with the fronto-parallel plane that contains the participant's eyes, by pressing an assigned button on a keyboard. The baseball was a to-scale, textured 3D model of a baseball that was animated with backspin to make its motion more realistic.

Experiment 2 consisted of two experimental blocks. One was completed in VR and the other on a single screen. The speeds and corresponding TTCs are given in Table 2.3.

Figure 2.5: The procedure for Experiment 2. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. Right-handed subjects were placed 0.75 m to the left of home plate and left-handed subjects 0.75 m to the right of home plate. Subjects were rotated toward the cross placed above the pitcher's mound.

Table 2.3: The ball speeds and corresponding times-to-contact used in Experiment 2.

Speed (m/s)   TTC (s)
36.8          0.5
18.4          1.0
12.3          1.5
9.2           2.0

2.3.3 Participants

We tested 10 subjects (mean age: 26 yrs, std = 2.6 yrs; 1 of them female), undergraduate or graduate students at the University of British Columbia. All participants had normal or corrected-to-normal visual acuity and normal stereo vision, determined by a stereo acuity test (Stereo Fly Test, Precision Vision Inc., Woodstock, IL). All were unaware of the experimental hypotheses.

2.3.4 Results

A summary of the response data is given in Figure 2.6 and a summary of the ANOVA is given in Table 2.4. We found significant effects of speed and viewing time but no significant effect of which display was used.
The effects of speed, viewing time, and their interaction were consistent with the looming section of the previous experiment and with previous research on TTC.

Although there is no significant effect of display, there seems to be a small, consistent difference between the responses given on each display. Since the η² values are >0.1, it is likely that the experiment simply did not have enough power for these differences to reach significance. For the two longer presentation durations, using VR caused earlier responses. One explanation is that the additional binocular cues present in the VR display made people more accurate. The fact that responses become too early for some of the fully viewed trials may be due to a bias to respond early when something is sensed very close, a defense mechanism to avoid getting hit. Since the ball is perceived to be much closer in VR, it may trigger this bias while the screen does not.

Figure 2.6: Judged vs. actual TTC (in seconds) for Experiment 2. The black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.

Previous work states that binocular cues to motion-in-depth are weighted more strongly for faster-moving objects and at shorter viewing times (the monocular cue is weighted more heavily as viewing time increases), but that the binocular signal becomes less effective at generating a sensation of motion-in-depth as viewing time decreases. Therefore, situations in which the binocular signal is given high weight but is ineffective at producing a reliable motion signal should result in larger errors. This would explain why the shortest viewing time produces larger errors in VR at three out of four speeds, while the second shortest viewing time produces larger errors in VR only at the fastest speed.

Table 2.4: The results of the repeated measures ANOVA performed for Experiment 2.

Effect                       Sum of Squares   df   Mean Square   F        p        η²
Presentation Duration (PD)   2.034            3    0.0811        8.36     <0.001   0.48
Display (D)                  0.219            1    0.0179        2.9162   0.12     0.24
Speed (S)                    2.187            3    0.0296        24.61    <0.001   0.73
PD × D                       0.011            3    0.0157        0.24     0.87     0.03
PD × S                       1.177            9    0.0136        9.61     <0.001   0.52
D × S                        0.004            3    0.0045        0.2692   0.85     0.03
PD × D × S                   0.018            9    0.0026        0.77     0.65     0.07

2.4 Experiment 3: Effect of Absolute Depth

2.4.1 Purpose

The purpose of Experiment 3 was to determine the effect of depth on TTC estimation in VR. Since depth does not affect the relative importance of the monocular and binocular predictors of TTC, adjusting the depth of stimuli allows us to isolate effects of the vergence-accommodation conflict: stimuli farther away are in a higher degree of conflict than stimuli closer to the focal plane (1.5 m). In addition to evaluating the effects of the vergence-accommodation conflict in this setting, this experiment also provides a method for reducing the degree of conflict for constant-velocity objects.

2.4.2 Visual Environment, Task and Procedure

Experiment 3 consisted of three experimental blocks. The first block was a baseline condition identical to the VR section of Experiment 2. In the second and third blocks, the stimuli were compressed into a limited depth range while maintaining monocular cues and TTC. The second block used a maximum depth of 9 m and the third a maximum depth of 2 m (Figure 2.7). The speeds and corresponding TTCs used in Experiment 3 are given in Table 2.5.

Figure 2.7: The procedure for Experiment 3. A ball moved from the cross to the target and subjects had to press a button when they thought the ball would hit the target. Experimental blocks varied in the distance of the starting cross.

Table 2.5: The ball speeds and corresponding times-to-contact used in Experiment 3.

Speed (m/s)   TTC (s)
36.8          0.5
24.5          0.75
18.4          1.0
14.7          1.25

2.4.3 Participants

We tested 10 subjects (mean age: 23 yrs, std = 3.0 yrs; 5 of them female), undergraduate or graduate students at the University of British Columbia. All participants had normal or corrected-to-normal visual acuity and normal stereo vision, determined by a stereo acuity test (Stereo Fly Test, Precision Vision Inc., Woodstock, IL). All were unaware of the experimental hypotheses.

2.4.4 Methods

To compress the stimuli into a smaller depth range, we first fix a maximum depth $d_{\max}$. When simulating the position of the original ball, we then convert each position in the original space to a position in the compressed space. Since our original ball travelled from 18.4 m to 0 m at a constant velocity, an original position $p_o$ is converted to a compressed position $p_c$ by

p_c = p_C + z \, \frac{p_o - p_C}{(p_o - p_C)_z}    (2.1)

z = \frac{d_{\max}}{18.4} \, (p_o - p_C)_z = d_{\max}\left(1 - \frac{t}{\mathrm{TTC}}\right)    (2.2)

where $p_C$ is the camera position, $(\cdot)_z$ denotes the depth component, $t$ is the time since the ball appeared, and the original depth is $(p_o - p_C)_z = 18.4\,(1 - t/\mathrm{TTC})$. This formulation can become numerically unstable as $(p_o)_z$ approaches $(p_C)_z$, but it was usable for our purposes because the object was out of view by that point.

The ball is then scaled to maintain the visual angle it had before being compressed. Thus monocular cues remain constant, and binocular cues, while reduced, indicate the same TTC and remain above detection threshold. It should be noted that although the ball is scaled to subtend the same visual angle before and after depth reduction, the closer ball will still appear smaller, since binocular vision allows the perception of curvature.

2.4.5 Results

A summary of the response data is given in Figure 2.8 and a summary of the ANOVA is given in Table 2.6. We found significant effects of speed and viewing time but no significant effect of the distance of the starting cross. However, there seem to be small, consistent differences between the responses given at different starting distances for the two shorter presentation durations.
Because the η² values are >0.1, it is likely that the experiment simply did not have enough power for these differences to reach significance. Both reductions in distance confer an advantage at the shortest presentation duration, but the larger reduction confers a greater advantage as presentation duration increases. This implies that part of the error in the baseline responses with short viewing times is due to the object being far away, not just to its having been viewed for a short period of time. Reducing the distance of trials with 1/2 viewing times to $d_{\max} = 2$ m reduced error by up to 70%. Since the 1/4 viewing-time trials were not improved as much, viewing time must still be a critical factor in predicting the TTC of looming objects.

Table 2.6: The results of the repeated measures ANOVA performed for Experiment 3.

Effect                       Sum of Squares   df   Mean Square   F       p        η²
Presentation Duration (PD)   2.48             3    0.0674        12.28   <0.001   0.58
Distance (D)                 0.08             1    0.0172        4.44    0.06     0.12
Speed (S)                    2.41             3    0.0353        22.74   <0.001   0.71
PD × D                       0.015            3    0.0035        1.43    0.26     0.13
PD × S                       0.347            9    0.0058        6.62    <0.001   0.42
D × S                        0.009            3    0.0032        0.92    0.45     0.07
PD × D × S                   0.008            9    0.0026        0.32    0.97     0.09

Figure 2.8: Judged vs. actual TTC (in seconds) for Experiment 3. The black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.

2.5 Experiment 4: Effect of Conflicting Cues

2.5.1 Purpose

The purpose of Experiment 4 was to test two methods for modifying depth cues with the goal of making users more accurate at speed estimation in VR. If successful, this would have broad implications for VR in general: it would provide a way to correct for the perceptual limitations introduced by the medium, giving greater realism and immersion. This experiment is quite similar to the study conducted by Rushton and Wann [54], who documented the effects of having the monocular cue indicate a TTC 10% before or after the binocular cue.
They found that responses were biased toward the cue that indicated the earlier TTC. In our experiment we specifically altered cues to compensate for the degree to which participants in previous studies misjudged TTC. Our approach here was to develop a one-size-fits-all motion correction method based on averaged data; individual differences between people were not addressed until the next experiment.

2.5.2 Visual Environment, Task and Procedure

Experiment 4 consisted of three experimental blocks. The first block was a baseline condition identical to the VR section of Experiment 2 and the baseline section of Experiment 3. In the second block, the size of the ball was adjusted to produce the monocular motion-in-depth cues of a faster ball while maintaining its binocular motion-in-depth cues. In the third block, the speed and size of the ball were adjusted to produce the binocular motion-in-depth cues of a faster ball while maintaining its monocular motion-in-depth cues. Ball speeds and corresponding TTCs are given in Table 2.7.

2.5.3 Participants

We tested 9 subjects (mean age: 25 yrs, std = 4.1 yrs; 4 of them female), undergraduate or graduate students at the University of British Columbia. Three participants had experience playing an interceptive ball sport (e.g., baseball, badminton) and were analyzed separately. All participants had normal or corrected-to-normal visual acuity and normal stereo vision, determined by a stereo acuity test (Stereo Fly Test, Precision Vision Inc., Woodstock, IL). All were unaware of the experimental hypotheses.

2.5.4 Methods

To determine the speed of the faster ball whose monocular or binocular cues are transferred onto the original ball, we created a model of how users perceive different speeds in VR from their TTC estimates. Using this model, we can choose a speed that will be perceived as some desired speed.
To create this model we used the response data from Experiment 2. We derive an estimate of perceived speed by dividing the distance the ball travelled by a subject's response time. We fit a line to all subjects' perceived speeds (in m/s) by minimizing squared error. The resulting model is

v_{\text{perceived}} = 0.76 \, v_{\text{actual}} + 2.9    (2.3)

from which we obtain a speed with the desired perception by substituting our original speed for $v_{\text{perceived}}$ and solving for $v_{\text{actual}}$. We call the original speed the display speed, and the new speed, which produces the perception of the original speed, the modified display speed. The modified display speeds are given in Table 2.7.

Table 2.7: The ball speeds and modified speeds used in Experiment 4.

Display Speed (m/s)   Modified Display Speed (m/s)   TTC (s)
36.8                  48.5                           0.5
24.5                  30.7                           0.75
18.4                  21.7                           1.0
14.7                  16.4                           1.25

To transfer the monocular cues from the modified display speed ball onto the display speed ball (Figure 2.9b), we adjust the size of the display speed ball so that it has the same visual angle as the modified display speed ball. The visual angle of a spherical object is given by

\theta = 2 \arctan\left(\frac{r}{d}\right)    (2.4)

where $r$ is the object's radius and $d$ is the object's distance. It follows that the original ball must be scaled by $d_o/d_m$, where $d_o$ is the distance of the original ball from the observer and $d_m$ is the distance of the modified ball from the observer. One issue with this technique is that once the modified display speed ball passes the subject, its apparent size becomes smaller than that of the original ball, and the original ball would shrink.
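The steps just described can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used in the thesis: the function names are hypothetical, `fit_line` is a plain ordinary least-squares fit of perceived speed against actual speed, and the default coefficients are the rounded values printed in Equation 2.3. Perceived speed is the distance travelled divided by the response time, the modified display speed inverts the fitted line, and the monocular transfer scales the displayed ball by $d_o/d_m$ so that, by Equation 2.4, it subtends the same visual angle as the modified ball.

```python
import math

# Hypothetical helpers illustrating Section 2.5.4; names are not from the thesis.


def perceived_speed(distance_m, response_time_s):
    """Speed at which a TTC response would have been correct: distance / response time."""
    return distance_m / response_time_s


def fit_line(actual, perceived):
    """Ordinary least-squares fit of perceived ~ slope * actual + intercept."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(perceived) / n
    s_aa = sum((a - mean_a) ** 2 for a in actual)
    s_ap = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, perceived))
    slope = s_ap / s_aa
    return slope, mean_p - slope * mean_a


def modified_display_speed(display_speed, slope=0.76, intercept=2.9):
    """Invert Eq. 2.3: the actual speed predicted to be perceived as display_speed."""
    return (display_speed - intercept) / slope


def visual_angle(radius, distance):
    """Eq. 2.4: visual angle (radians) of a sphere of radius r at distance d."""
    return 2.0 * math.atan(radius / distance)


def monocular_transfer_scale(d_original, d_modified):
    """Scale for the displayed ball so it subtends the modified ball's visual angle."""
    return d_original / d_modified


if __name__ == "__main__":
    for v in (36.8, 24.5, 18.4, 14.7):
        print(v, round(modified_display_speed(v), 1))
```

With the rounded coefficients of Equation 2.3, `modified_display_speed(36.8)` gives about 44.6 m/s rather than the 48.5 m/s listed in Table 2.7, so this sketch should not be expected to reproduce that table exactly.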
To prevent this shrinking, once the modified display speed ball is within a metre of the subject, the original ball simply continues to expand linearly at the rate determined from the two most recent frames.

To transfer the binocular cues from the modified display speed ball onto the display speed ball (Figure 2.9c), we place the original ball at the position of the modified ball and then scale it to maintain its original monocular cues. Here the scaling factor is $d_m/d_o$.

Although our model is based on speeds, this is simply a more intuitive way of looking at TTC estimates. If our stimuli started at a different distance, we would not expect our model of how users perceive speed to be accurate.

Figure 2.9: The modified depth cues used in Experiment 4. (a) Baseline. (b) Monocular cues transferred to the baseline ball with binocular cues maintained. (c) Binocular cues transferred to the baseline ball with monocular cues maintained.

2.5.5 Results

A summary of the response data for the novice sports players is given in Figure 2.10, and for the experienced sports players in Figure 2.11. A summary of the ANOVA run for all subjects is given in Table 2.8. The differences in the ANOVA for each group are discussed below.

We found significant effects of speed and viewing time, as well as of the motion-in-depth cue modifications we made. In the novices' data, manipulating either the monocular cue (size-adjusted) or the binocular cue (speed-adjusted) produced a significant change in response. Further, this change was consistent across speeds and viewing times, reducing responses by 85 to 110 ms. While both interventions reduced response times, they were not significantly different from each other.
This is consistent with the result of Rushton and Wann, who showed that humans are biased toward the cue that indicates the earlier TTC [54].

For the experts' data, we found only a significant effect of speed and a significant interaction between speed and presentation duration, but no effect of presentation duration or of the modified cues. Their response data make it clear that these subjects differed from the distribution of subjects seen previously: most of their responses were late. Another interesting finding from these subjects was that the cue adjustments seemed to have a considerably different effect. While the effect of the speed adjustment looks similar to that seen for the novices, the size adjustment had a much smaller, and sometimes negative, effect. It has been shown that level of experience or skill in a task affects how available information is integrated [8]. It would appear that these subjects focus more on the actual position of the ball than on its rate of apparent size increase.

Figure 2.10: Judged vs. actual TTC (in seconds) for Experiment 4, novice sports players only. The black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.

Figure 2.11: Judged vs. actual TTC (in seconds) for Experiment 4, experienced sports players only. The black line indicates perfect responses. Panels are split by presentation duration. Error bars show standard error.

Table 2.8: The results of the repeated measures ANOVA performed for Experiment 4.

Effect                       Sum of Squares   df   Mean Square   F       p        η²
Presentation Duration (PD)   0.70             3    0.0479        4.88    <0.001   0.37
Cue Adjustment (CA)          0.45             2    0.0327        6.83    0.007    0.46
Speed (S)                    2.45             3    0.0280        29.19   <0.001   0.78
PD × CA                      0.04             6    0.0041        1.48    0.21     0.15
PD × S                       0.58             9    0.0053        12.11   <0.001   0.60
CA × S                       0.02             6    0.0012        2.28    0.05     0.22
PD × CA × S                  0.08             18   0.0011        1.05    0.41     0.11

Chapter 3
User Specific Motion Correction

As seen in Experiment 4, it is possible to affect subjects' responses by modifying one motion-in-depth cue. One failing of Experiment 4, which made responses less accurate for some subjects, was that it did not take individual differences into account. We fit one model to one group of subjects and applied it to another group; the subjects who differed from those the model was fit to did not see improvement. It follows that a better solution is to develop a model for each subject, which is what our software tool does.

In this section we continue to conflate speed perception and TTC estimation, since we keep our distances fixed. "Perceived speed" should be interpreted as the speed at which a subject's TTC estimate would be accurate.

3.1 Methods

Our tool is composed of two main parts. In the first part, User Calibration, we determine the parameters of a model that characterizes how a user perceives motion-in-depth in VR. In the second part, Motion Correction, we use a novel strategy for modifying the motion-in-depth cues of an arbitrary VR object using the parameters determined from the calibration (the size adjustment seen in Experiment 4).
The modifications are meant to present a set of stimuli that increases a user's ability to intercept the original object, and thereby increases accuracy when interacting with virtual objects.

To be concrete, let $v = (\dot{x}\ \dot{y}\ \dot{z})^T$ be the velocity of a small object, expressed in a world-fixed coordinate frame with its origin at the nominal location of the observer's head and oriented using the typical convention in graphics ($-z$ is the viewing direction, $y$ points up). Experiment 1 suggested that horizontal motion can be perceived accurately across speeds and viewing times in VR, but that motion in depth (looming) is the main source of error. We therefore focus on modelling how an input motion at speed $\dot{z}$ is actually perceived, and on modifying other cues (e.g., the rate of change of size) so that the stimulus will be perceived as actually moving at speed $\dot{z}$.

In this thesis we use a virtual baseball environment to test our methods. The task of hitting a baseball provides a common, familiar scenario in which speed estimation plays a crucial role.

3.1.1 User Calibration

The goal of the calibration procedure is to relate the speed towards the user, $\dot{z}$, to the user's perception of that speed, $\dot{z}_p$. In other words, we construct an empirical model $\dot{z}_p = f(\dot{z})$ by estimating its parameters.

We derive an observer's perceived speed from their subjective estimates of time-to-contact. Time-to-contact is the relevant measure of speed when object interception or avoidance is the goal, and it is easy to report. In our method we ask the observer to indicate, by button press, when a virtual object moving with a constant velocity towards them will collide with their head. The object disappears after travelling three quarters of the distance to the user, to prevent any feedback on the actual time of contact. The response times from a representative subject can be seen in Figure 3.1a.
To determine a perceived speed for a given object speed, we take the median response time over all trials at that speed. Dividing the distance travelled by the median response time gives an estimate of perceived speed. The estimates of perceived speed for a subject are shown in Figure 3.1b.

After determining $(\dot{z}, \dot{z}_p)$ pairs, we fit a model to these data so that we can predict the perceived speed for any model speed. The data seem to be well approximated by an affine function

\dot{z}_p = f(\dot{z}) = w_0 + w_1 \dot{z}    (3.1)

The model parameters $w_0$ and $w_1$ are estimated using least-squares regression.

Without much a priori knowledge of how VR affects the perception of motion, we cannot justify using more complicated models. An alternative would be to map actual time-to-contact to user response time instead of mapping actual speed to perceived speed. Such a model would be more appropriate if the distance travelled by objects were not constant. Since our calibration and test environments had objects travelling the same distance, mappings based on time and on speed are equivalent.

Figure 3.1: Calibration data from one subject. (a) Response times for the calibration procedure, differentiated by display speed. The three speeds correspond to actual times-to-contact of 0.90, 0.62 and 0.47 seconds. (b) The model fit to the subject's perceived speeds.

3.1.2 Motion Correction

Once we have a model that can predict how a user will perceive a virtual speed in VR, we can modify depth cues to change the user's perception of virtual motion. Motion through depth is indicated primarily by two depth cues. The first is the monocular motion-in-depth cue, τ, which corresponds to the rate of change of the apparent size of an object as it moves through different depths. An object appears larger when it is close to a viewer and smaller when it is far away.
The second motion-in-depth cue, the rate of change of binocular disparity, comes from the difference between the images created by an object on a viewer's left and right retinas.

Binocular disparity is a function of the actual distance of the object and of vergence; thus binocular disparity can only be manipulated by manipulating the actual position of the object in the two images. Since we are attempting to facilitate more accurate interactions, adjusting the position of an object may be problematic: one would want to ensure that an object is where the user expects it to be at the moment they want to interact with it.

Adjusting the size of objects, by contrast, is easily accomplished in VR software. Thus, to facilitate more accurate perception, our technique presents the position of the object according to the actual model speed while giving the object the monocular depth cues of an object travelling at a speed that will be perceived as the actual model speed (see Figure 3.2).

Figure 3.2: (a) The position of a ball at four points in a pitch moving with velocity v, with actual size shown. (b) The position of a ball moving with velocity v′, with perceived size shown. (c) The output of the modifications: the position of the ball with velocity v and the perceived size of the ball with velocity v′. After a certain distance, perceived sizes are extrapolated from previous values.

Specifically, given a desired velocity to display, $v = (\dot{x}\ \dot{y}\ \dot{z})^T$, we compute a modified velocity

v' = \left(\dot{x}\ \ \dot{y}\ \ f^{-1}(\dot{z})\right)^T    (3.2)

where $f$ is the model described in Equation 3.1. We then determine the visual angle subtended by the object moving with velocity $v'$ and update the size of the original object so that it subtends the same visual angle. The visual angle of a spherical object is given by

\theta = 2 \arctan\left(\frac{r}{d}\right)    (3.3)

where $r$ is the radius of the object and $d$ is the object's distance to the viewer. It follows that the size of the original object should be scaled by $d/d'$, where $d$ is the distance between the original object and the viewer and $d'$ is the distance between the object moving with velocity $v'$ and the viewer.

For looming objects, the procedure above only produces desirable results during the time span in which both the actual and the simulated object are in front of the viewer (Figure 3.2). For example, consider a user who underestimates speed. When the original object is halfway to the user, the simulated object might be three quarters of the way there. At this point the simulated object subtends a larger visual angle, so the original object is made bigger. However, once the simulated object passes the viewer, it starts to subtend a smaller and smaller visual angle. We do not want the original object to start getting smaller, because that would not help make the object seem to move faster. To work around this, we monitor the size of the object and detect when it has stopped increasing (or decreasing). Once that is detected, there are a number of sensible options. We could keep the size constant from that point on. Alternatively, we could maintain the rate of expansion the object had before. We found that keeping the size constant caused an odd artifact in the motion of the object, making it seem to suddenly change speed or direction. We also found that maintaining the rate of expansion, which is geometric, could cause objects to become extremely large.
What worked best was continuing a linear rate of expansion, using the two most recent sizes to determine the rate.

3.2 Experiment 5: Motion Correction Tool Evaluation

We conducted an experimentally controlled user study to evaluate how effectively our motion-correction tool increases user accuracy in VR. Human subjects completed a calibration procedure and then played a baseball game in which our methods modified the motion-in-depth cues of the baseball and thus altered perception of the ball's movement. Subjects were randomly assigned to one of two groups. One group played the baseball game without depth cue modification in block 1 (baseline), then completed the same task with depth cue modification in block 2 (treatment). The other group completed the tasks in reverse order, first treatment and then baseline, to control for any effects of training.

This experiment is markedly different from the previous experiments because it involves large human actions, whereas the previous experiments used only button presses. Performance errors will therefore involve errors from motor action as well as from visual processing. Button presses also require motor action, but the swinging motion used here is considerably more complicated and will be less consistent. Although this may make the effect of the modification harder to measure, it puts the experiment firmly in the dorsal visual stream. It also permits us to draw conclusions about our technique in realistic interactive VR scenarios.

Figure 3.3: Virtual environments shown to subjects in Experiment 5. (a) Calibration section. (b) Testing section.

3.2.1 Virtual Environment

Visual stimuli were presented in an Oculus Rift headset (Oculus VR, Menlo Park, CA) connected to a computer with a 2.3 GHz processor, 448 GB RAM, and a GTX Titan X graphics card. The virtual visual environment was a custom-built application developed in Unity.
For the calibration task, the virtual environment was a large open field consisting of a ground plane tiled with an image of grass (Figure 3.3a). For the testing session, the virtual environment was a realistic baseball field (Figure 3.3b). This environment was captured as a 360° photo with a Ricoh Theta S camera. The image was then re-projected onto a hemisphere to provide a ground plane at the appropriate depth. A 3D model of a pitcher, purchased from the Unity Asset Store, was placed on the pitcher's mound.

3.2.2 Visual Stimuli, Task and Procedure

Calibration Task.
A red cross was placed 18 m in front of the subject and presented for 0.5 to 1 s. When the fixation cross disappeared, a textured cube (side length 30 cm) appeared and moved directly towards the subject at a speed of 45, 65, or 85 mph (corresponding to times to contact of 0.90, 0.62 and 0.47 s). The cube never reached the subject; it disappeared at a random distance of 4 to 5 m from the subject. The task was to press a button on the Oculus Touch controller at the estimated time of contact with the subject's head. Subjects completed 20 consecutive trials at each speed, for a total of 60 trials.

Baseball Task.
Subjects stood in the virtual environment at the batter's location, in a natural stance that depended on their handedness (e.g., right-handed subjects' left foot forward, and vice versa for left-handers). Subjects held an Oculus Touch controller that corresponded to a baseball bat in the virtual environment. They were asked to hit a looming, pitched ball as straight as possible back towards the pitcher. We developed a model of the natural flight of a baseball based on parameters (Table 3.1) derived from the literature [3, 55].
We used an enhanced Euler update with the following equations:

$$a = \frac{1}{m}\left(F_{gravity} + F_{lift} + F_{drag}\right) \qquad (3.4)$$

$$F_{gravity} = -g\,m\,(0\ 1\ 0)^T \qquad (3.5)$$

$$F_{lift} = \frac{1}{2}\rho\,C_l\,\sin(\theta)\,A\,\|v\|\,v \qquad (3.6)$$

$$F_{drag} = -\frac{1}{2}\rho\,C_d\,A\,\|v\|\,v \qquad (3.7)$$

The ball moved at 40, 60 or 80 mph, directly over home plate and through the strike zone. The order of pitch velocities was randomized from trial to trial but was consistent across participants. Subjects completed this task at a self-directed pace. Each trial started with a button press on the Oculus Touch controller, which initialized the pitching animation.

We instructed subjects to start each swing with the bat pointing behind them and to finish with the bat pointing in front. In real baseball, a batter can have perfect timing but still swing above or below the ball and miss it. Our focus was on the timing of interception rather than its spatial accuracy, so we extended the collision bounds of the bat to a large vertical plane (Figure 3.4).

Variable   Description                Value
m          Mass                       0.145 kg
r          Radius                     0.037 m
g          Acceleration of gravity    9.81 m/s²
v          Translational velocity     varies throughout pitch
ω          Rotational velocity        (40, 0, 0) Hz
ρ          Density of air             1.204 kg/m³
θ          Angle between v and ω      varies throughout pitch
A          Cross-sectional area       0.004 m²
s          Spin factor                r‖ω‖ / ‖v‖
C_l        Lift coefficient           1.5 s if s ≤ 0.1; 0.09 + 0.6 s otherwise
C_d        Drag coefficient           0.3

Table 3.1: Parameters for baseball pitch simulation.

When a subject hit the ball, they felt haptic feedback from the controller and saw the ball bounce off the bat in the direction determined by the bat's yaw angle (Figure 3.4). The vertical component of the direction was sampled uniformly at random between zero and the z (depth) component of the velocity. As a result, hits ranged from grounders to pop-flies.
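The pitch model of Equations 3.4 to 3.7 can be sketched as a small stdlib-only Python integrator that uses the Table 3.1 parameters. This is an illustrative reconstruction, not the thesis code, and several details are assumptions that the text does not state:

- The "enhanced Euler" scheme is taken to be Heun's (improved Euler) method.
- The 40 Hz spin is converted to rad/s before the spin factor s is computed.
- The lift force takes its magnitude from Equation 3.6 but points along ω × v, the usual Magnus direction. A force parallel to v would only reduce drag.
- The frame (batter at z = 0, pitch travelling in −z, y up), the release distance, and the 90 Hz step are illustrative choices.

```python
import math

# Table 3.1 parameters in SI units.
MASS = 0.145          # kg
RADIUS = 0.037        # m
GRAVITY = 9.81        # m/s^2
AIR_DENSITY = 1.204   # kg/m^3
AREA = 0.004          # m^2
DRAG_COEFF = 0.3
# Spin of (40, 0, 0) Hz, converted to rad/s (assumption about units).
OMEGA = (2.0 * math.pi * 40.0, 0.0, 0.0)
MPH_TO_MS = 0.44704


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, k):
    return tuple(x * k for x in a)


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lift_coefficient(s):
    """Piecewise C_l from Table 3.1 (continuous at s = 0.1)."""
    return 1.5 * s if s <= 0.1 else 0.09 + 0.6 * s


def acceleration(v):
    """Equation 3.4: a = (F_gravity + F_lift + F_drag) / m."""
    speed = norm(v)
    spin = norm(OMEGA)
    f_gravity = (0.0, -GRAVITY * MASS, 0.0)                              # Eq. 3.5
    f_drag = scale(v, -0.5 * AIR_DENSITY * DRAG_COEFF * AREA * speed)    # Eq. 3.7
    s = RADIUS * spin / speed                                            # spin factor
    axis = cross(OMEGA, v)          # length = |omega| |v| sin(theta)
    axis_len = norm(axis)
    if axis_len > 0.0:
        sin_theta = axis_len / (spin * speed)
        lift_mag = 0.5 * AIR_DENSITY * lift_coefficient(s) * sin_theta * AREA * speed ** 2
        f_lift = scale(axis, lift_mag / axis_len)  # Eq. 3.6 magnitude, assumed Magnus direction
    else:
        f_lift = (0.0, 0.0, 0.0)
    return scale(add(add(f_gravity, f_lift), f_drag), 1.0 / MASS)


def heun_step(p, v, dt):
    """One Heun (improved Euler) step for position p and velocity v."""
    a1 = acceleration(v)
    v_pred = add(v, scale(a1, dt))
    a2 = acceleration(v_pred)
    p_next = add(p, scale(add(v, v_pred), 0.5 * dt))
    v_next = add(v, scale(add(a1, a2), 0.5 * dt))
    return p_next, v_next


def simulate_pitch(speed_mph, distance_m, dt=1.0 / 90.0):
    """Fly a pitch from z = distance_m toward the batter at z = 0; return (time, position, velocity)."""
    p = (0.0, 0.0, distance_m)
    v = (0.0, 0.0, -speed_mph * MPH_TO_MS)
    t = 0.0
    while p[2] > 0.0:
        p, v = heun_step(p, v, dt)
        t += dt
    return t, p, v


if __name__ == "__main__":
    t, p, v = simulate_pitch(80.0, 18.0)
    print(f"arrival after {t:.3f} s at {norm(v):.1f} m/s, height change {p[1]:+.2f} m")
```

Under these assumptions, drag slows the ball throughout its flight, so an 80 mph pitch released 18 m away arrives slightly later than distance divided by release speed. At these speeds the lift force is smaller than gravity, so the ball still drops on its way to the plate.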
When a hit occurred, the total force applied to the ball was made proportional to the speed of the subject's hand during the swing.

Subjects completed 50 trials at each speed in randomized order, for a total of 150 trials per block. Half of the subjects were assigned to a block order with baseline first and then treatment; the other half completed treatment first and then baseline.

3.2.3 Data Analysis

The azimuth angle of the ball's bounce off the bat was the main variable of interest in our study. We examined the hitting error, defined as the absolute value of the hitting angle. A value of 0 is ideal and indicates a hit that went straight back towards the pitcher. We expected subjects to perform better in the treatment block than in the baseline block across all speeds. We also expected subjects to hit more accurately over time as they got more practice. We therefore also analyzed learning rate, i.e., how hit angle changed over the course of the experiment.

Figure 3.4: The collision bounds of the bat were extended vertically. The bounce angle of the ball was determined by the bat's rotation around the vertical axis plus a random vertical component. Image source: shutterstock.com

We used linear mixed models to analyze the hit angle data. ANOVA could not be used, because practice effects mean that the responses from a subject are not independent. Linear mixed models represent a response variable as a linear combination of predictors. These predictors include both fixed effects (non-random quantities) and random effects, which makes the models well suited to data with multiple correlated measurements per subject.

The overall time course of hit angles was modeled with a line (intercept and slope), a fixed effect of treatment, ball speed, and a fixed interaction between treatment and speed.
Following a Growth Curve Analysis procedure [44], these factors were added progressively, and only when they led to a model that explained the data significantly better (stepwise regression with forward selection). The same results were obtained regardless of the order in which factors were added. The model also included participant random effects on both the slope and intercept terms, corresponding to inter-subject variation in starting ability and learning rate on the task. Our final model was thus

$$y \sim (b_0 + b_s + b_t + b_{s\times t} + b_i) + (m_0 + m_s + m_i)\,x, \qquad (3.8)$$

where b are biases, m are slopes, subscripts s, t, and s×t capture the fixed effects, and subscript i captures the random effects. The dependent variable y represents the hitting error, and the independent variable x represents the trial number. Significance values were obtained from t-tests using the Satterthwaite approximation to degrees of freedom; significance levels were consistent with the Kenward-Roger approximation [27].

3.2.4 Results

Across all conditions, subjects performed well in the task. The mean hit angle was M = -3.7° (std = 33.5°), indicating that on average the ball was hit slightly late. Subjects missed the ball in 8.7% (std = 6.1%) of all trials on average. Hitting performance depended on speed: subjects tended to overestimate the speed of slow balls (hitting early) and to underestimate the speed of fast balls (hitting late). Subjects performed best at the medium speed (hit angle M = -1.8°, std = 30.1°), compared to the slow speed (M = 13.5°, std = 26.9°) and the fast speed (M = -23.3°, std = 32.8°).

Variability between subjects was high in terms of both accuracy and precision. Figure 3.5 shows hit angle as a function of trial number for two representative subjects with different block orders; each data point is the hit angle in a given trial. These two subjects performed at different overall levels.
Subject 106 rapidly improved in hitting accuracy in the first 50 trials and then saturated at a relatively high performance level (M = 6.1° across all speeds; Figure 3.5a). This subject maintained that level in block 2 (M = -2.4°; Figure 3.5b). Subject 110 performed at a lower level (block 1: M = -14.7°, block 2: M = -10.1°; Figure 3.5c, d). Results from these two representative subjects also reveal differences in response precision over time. Subject 110 improved less than subject 106, which suggests potential differences in learning rate.

Our statistical models address the main question: to what extent are performance differences due to the intervention? To this end, we first consider data from block 1 only. We disregard block 2, where performance improved due to training and repetition across all conditions. The models revealed a significant effect of treatment (i.e., a significant decrease in the intercept term) for the two fastest speeds (60 mph: estimate = -9.2°, SE = 1.81°, p < 0.001; 80 mph: estimate = -10.1°, SE = 1.85°, p < 0.001; estimates are given relative to the baseline). Increases in speed were found to correspond to higher intercept terms. Negative slopes of the model fits for the two faster speeds (Figure 3.6b, c) indicate improved hitting performance over time. The slowest speed showed neither a significant decrease in hit angle over time nor any effect of treatment (Figure 3.6a).

Taken together, these results show that subjects' performance improved after depth cues were modified, but this finding depended on speed. Whereas

Figure 3.5: Relative hit angles (in degrees) as a function of trial number for two representative subjects. Subject 106 completed block 1 (a) with intervention (orange data points), and block 2 (b) without (blue data points). Subject 110 completed blocks in the reverse order, first the baseline condition (c), then the intervention (d).
Ball speed denoted by symbol type. Negative hit angles indicate late hits; positive hit angles indicate early hits. Misses are coded at 90° hit angle by default, indicated by the shaded box at the top of each panel.

Figure 3.6: Hitting error (in degrees) across all observers (n = 9 per group) for three speeds. Lines are model fits. Each data point is the average hitting error per trial across observers; each panel shows 50 trials per speed (total of 150 trials per block). Error bars denote standard errors.

treatment had no effect at the slowest speed, performance improvements were significant at the two faster speeds.

We next considered all data and included block and group (treatment first or control first) as fixed effects in our model. Results show that performance improved significantly from block 1 to block 2 (fixed effect of group, estimate = -4.5°, SE = 0.51°, p < .001). Importantly, treatment effects were similar to those obtained with a model including only block 1 data. This indicates that treatment effects hold even for results obtained after training and experience (60 mph: estimate = -1.1°, SE = 1.23°, p < .001; 80 mph: estimate = -4.1°, SE = 1.25°, p < .002; estimates are given relative to the baseline).

3.3 Discussion

We have shown that we can significantly improve a user's perception of motion in VR displays by conducting only a short calibration. This method can easily be implemented in any VR display, analogous to the ubiquitous gamma correction used to correct for the non-linearity in light output of CRT displays. Our testing task used different speeds and directions than the calibration task, so our study showed that the method generalizes to some extent. This implies that many other experiences (e.g., other sports, games) could be improved in the same way using the same calibration data.

Future models could take into account size and disparity, tailored to different object sizes [54]. Moreover, future methods could be adaptive,
taking into account individual variations to determine how many trials would be needed for an accurate estimate of a given user's errors.

It is interesting that the slowest speed in our user study showed no significant improvement over time and no effect of the treatment. When speed is slow enough, the effect of depth underestimation in VR may be stronger than the effect of speed underestimation. This would explain why many users swung early for the slow pitches despite responding late for the slowest speed in the calibration.

The disappearance paradigm for TTC experiments has drawbacks, and other calibration procedures could be used. Firstly, it has been shown that occluding an object, rather than making it disappear, results in more accurate TTC estimates, potentially because occlusion is more realistic [36]. Secondly, it is somewhat difficult to explain to subjects that they should ignore the fact that the object disappears and try to extrapolate its motion. Lastly, the disappearance seems, especially at slower speeds, to force subjects to adopt cognitive strategies, which is not desirable. Occlusion, or perhaps fading the object out, should work better.

Chapter 4
Eye-Tracking

Most of the eye movement data used in this thesis came from an SMI 60 Hz eye tracker installed in an Oculus DK2. The data were processed and analyzed following the procedures of Diaz et al. [11]. The following features were recorded for all experiments: left/right/average points of regard (PORs), left/right/average pupil position and size, and left/right/average gaze direction and gaze base point. Here we document the filtering and analysis process and show sample data from the TTC experiments.

4.1 Data Filtering

Gaze directions are first filtered using a 3-sample-wide median filter followed by a 3-sample-wide Gaussian filter.
Filtered and unfiltered gaze directions for a left-to-right-moving object are given in Figure 4.1.

Angular velocities are then calculated from the filtered gaze directions by dividing the angular distance between gaze directions in adjacent frames by the time between the frames. The angular velocities are filtered with a 3-sample-wide median filter, then a 3-sample-wide Gaussian filter, and finally an FIR filter with the kernel [0 1 2 3 2 1 0]. Filtered and unfiltered angular velocities are given in Figure 4.2. Note that the filtering process decreases the peak value of saccades and increases their duration.

Gaze accelerations are calculated from the filtered angular velocities using finite differences (Figure 4.3).

The angular distance between gaze and an object can be calculated as an absolute distance or as a distance along the vertical or horizontal axis (Figure 4.4). Absolute distance is easy to calculate: at every frame, compute the angle between the gaze vector and the vector from the average gaze base point to the object. Angular distance along an axis can be computed using the algorithm presented in Diaz et al. [11].

Figure 4.1: Average eye direction during a 2.5 s left-to-right TTC experiment. (a) Raw data. (b) Filtered with 3-sample-wide median and Gaussian filters.

Figure 4.2: Average eye velocity during a 2.5 s left-to-right TTC experiment. (a) Computed from filtered gaze directions. (b) Further filtered with 3-sample-wide median and Gaussian filters, followed by an FIR filter.

Figure 4.3: Average eye acceleration during a 2.5 s left-to-right TTC experiment, computed via finite differences from the filtered velocities.

Figure 4.4: Angular distance between gaze and object during a 2.5 s left-to-right TTC experiment. (a) Absolute angular distance. (b) Angular distance along the horizontal and vertical axes. Negative values correspond to left and down.
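The filtering chain of this section can be sketched in stdlib-only Python for a list of gaze-direction vectors sampled at a fixed frame interval. This is an illustrative reconstruction under stated assumptions, not the code used in the thesis or in Diaz et al. The text does not specify the following, so they are assumptions:

- the [1, 2, 1]/4 weights of the 3-sample Gaussian;
- normalizing the FIR kernel by its sum;
- clamping at the sequence edges;
- the synthetic 20°/s sweep at 60 Hz used in the check below.

```python
import math

GAUSS3 = [1.0, 2.0, 1.0]                     # assumed 3-tap Gaussian (binomial) weights
FIR7 = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # FIR kernel from the text


def median3(xs):
    """3-sample median filter; the two end samples are copied unchanged."""
    out = list(xs)
    for i in range(1, len(xs) - 1):
        out[i] = sorted(xs[i - 1:i + 2])[1]
    return out


def smooth(xs, kernel):
    """Centred weighted average with an odd-length kernel normalized by its sum; edges clamped."""
    half = len(kernel) // 2
    total = sum(kernel)
    n = len(xs)
    return [sum(w * xs[min(max(i + j - half, 0), n - 1)] for j, w in enumerate(kernel)) / total
            for i in range(n)]


def filter_directions(dirs):
    """Median, then Gaussian filter each component of the gaze directions; renormalize."""
    comps = [smooth(median3([d[c] for d in dirs]), GAUSS3) for c in range(3)]
    out = []
    for x, y, z in zip(*comps):
        length = math.sqrt(x * x + y * y + z * z)
        out.append((x / length, y / length, z / length))
    return out


def angle_between_deg(a, b):
    """Angle between two 3D vectors in degrees (atan2 form stays accurate for small angles)."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.degrees(math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot))


def angular_velocity(dirs, dt):
    """Angle between adjacent directions over frame time (deg/s), then median, Gaussian, FIR."""
    raw = [angle_between_deg(dirs[i], dirs[i + 1]) / dt for i in range(len(dirs) - 1)]
    return smooth(smooth(median3(raw), GAUSS3), FIR7)


def finite_difference(xs, dt):
    """Forward differences, e.g. acceleration (deg/s^2) from angular velocity (deg/s)."""
    return [(xs[i + 1] - xs[i]) / dt for i in range(len(xs) - 1)]
```

For a smooth horizontal sweep, the interior of the filtered velocity trace stays at the sweep rate. The edge handling distorts only samples near the ends, so it may be worth ignoring a few frames at each end before applying saccade or pursuit thresholds.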
4.1.1 Converting Eyelink Data to 3D

After configuring an Eyelink to work from within Unity or another application, we need to convert the POR data given by the Eyelink into the 3D geometry of the virtual environment. To do this, we convert the pixel values to metres and use a measured head position relative to the screen to compute the gaze directions. Here we assume an origin just below the centre of the screen (on the table, for easy measuring):

$$m_x = p_x\,\frac{w_m}{w_p} - \frac{w_m}{2} \qquad (4.1)$$

$$m_y = (h_p - p_y)\,\frac{h_m}{h_p} + h_s \qquad (4.2)$$

where m = [m_x m_y] is the position in metres, p = [p_x p_y] is the pixel position, w_m × h_m is the size of the screen in metres, w_p × h_p is the size of the screen in pixels, and h_s is the height of the bottom of the screen from the ground. The term h_p − p_y is used because the Eyelink places the origin at the top left.

Once the PORs are in a measurable format (e.g., metres), measure the position of the person's eyes relative to the origin:

$$e_L = [-\mathrm{ipd}/2,\ h_y,\ h_z] \qquad (4.3)$$

$$e_R = [\mathrm{ipd}/2,\ h_y,\ h_z] \qquad (4.4)$$

if the origin is head-centred at screen depth, where h_y and h_z are the measured vertical and depth coordinates of the eyes. Interpupillary distance (ipd) can be calculated from the SMI data (distance between pupil positions) or assumed to be 6.4 cm, as is commonly done. Gaze directions are then given by subtracting eye position from POR.

4.2 Saccade Detection

Saccade detection is also outlined in Diaz et al. [11]. Saccade peaks are determined from points in the velocity trace that exceed a threshold. The exact value of the threshold depends on the scenario and can usually be determined by visually inspecting a few velocity plots. A threshold of 10°/s worked well for our near-looming experiments, and 50°/s worked for our left-to-right experiments. Saccade onset and offset were determined by acceleration maxima and minima, respectively. Figure 4.5 shows sample velocity and acceleration plots labeled with the times identified as saccades.

Figure 4.5: Eye velocity (blue) and acceleration (orange) during a 2.5 s left-to-right TTC experiment.
Green sections have been identified as saccades.

Figure 4.6: Eye velocity (blue) and angular distance between gaze and ball (orange) during a left-to-right 2.5 s TTC experiment. Red sections have been identified as pursuit.

4.3 Pursuit Detection

Following Diaz et al., we defined pursuit as more than 5 consecutive frames that meet two conditions: the pursuit gain (the ratio of eye velocity to the object's angular velocity) is between 0.3 and 1.2, and the angular distance from gaze to object is less than 4 degrees. Periods of pursuit lasting less than 100 ms were discarded, and periods of pursuit separated by less than 5 frames were merged. Figure 4.6 shows sample velocity and angular distance plots labeled with the times identified as pursuit.

Figure 4.7: Sample images of each eye used for training. (a) Right eye. (b) Left eye.

4.4 CNN for 3D Gaze Localization

We implemented a convolutional neural network to predict 3D gaze location from eye images.

4.4.1 Data Collection

Eye images were captured at 5 Hz using an SMI eye tracker installed in an Oculus DK2, which captures left and right eye images asynchronously. Each eye image is 320 × 240 pixels.

A red cube with side length 0.1 moved around a black background, rendered with OpenGL, while eye images were captured. Over the course of 2 seconds, the cube moved between two randomly chosen points within a collection volume. The collection volume had dimensions 0.2 × 0.2 × 1.8 and was centred 0.2 m in front of the head.

The data collection session lasted 10 minutes and produced 3016 left and right eye images (∼2.3 GB). Sample left and right eye images are shown in Figure 4.7.

4.4.2 Preprocessing and Filtering

The left and right eye images and the cube positions were captured asynchronously, so we used timestamps to create data points of the form (I_L, I_R, p). Here I_L and I_R are the left and right eye images, respectively, and p is the cube position, all taken at nearby points in time.
To account for saccades, we removed the first two data points that occurred after the cube started on a new path between two points in the collection volume. We scanned the images manually and removed those with blinks. The input data were normalized by subtracting the mean and dividing by the standard deviation for each pixel. Output data were normalized to [0, 1] in each dimension.

Figure 4.8: CNN architecture. Each eye image is fed to one of two identical streams. The latent features of the two streams are flattened and merged before a fully connected layer.

There were 2695 images after preprocessing (11% removed). These were randomly split into 1887 training, 404 validation, and 404 test data points (70:15:15).

4.4.3 Model Architecture

We learn a two-stream convolutional neural network. Each stream takes an image of either the left eye or the right eye as input. The latent features of the two images are merged and passed to a fully connected layer that outputs the x, y, and z coordinates of the cube position. All activation functions are ReLU except in the output layer, which uses a sigmoid activation function.

The final architecture, determined through a manual search, is given in Figure 4.8.

4.4.4 Training

The network was trained with the following properties, also determined through a manual search:

• Optimizer: Adadelta (standard parameters)
• Batch size: 1
• Epochs: 15
• Regularization: L2 on every layer (λ = 10⁻⁴)
• Loss: Mean Squared Error (MSE)

Training took less than 10 minutes on a GTX 980, and test error did not improve significantly on longer runs.

4.4.5 Results

Prediction errors are given in Table 4.1. Errors are calculated as the difference between the actual cube position and the network's prediction, in normalized space. Mean absolute errors per axis are between 0.025 and 0.042 of the normalized range. With errors this small, the network would be usable for gaze-based interaction modalities in VR/AR.
We attempted an online demo, but compiling TensorFlow with C++ on Windows, which the eye tracker required, caused complications.

       Mean Absolute Error   Mean Squared Error
x      0.025                 0.0041
y      0.028                 0.0059
z      0.042                 0.0048
All    0.054                 0.0066

Table 4.1: The result of the network's predictions on unseen test data. Errors are calculated in normalized space.

Part III
Conclusions

This thesis has investigated motion perception in VR. We have shown that speed estimation of looming virtual objects is often inaccurate, and we introduced a novel intervention for presenting moving stimuli in VR. Our user study demonstrates that a correction of depth cues significantly affects users' interception performance. Compared to a baseline condition with non-adjusted cues, it results in greater accuracy and faster learning at high stimulus speeds. Critically, this method is easy to implement in any VR display and requires only a brief user calibration.

We set out to determine whether there are systematic weaknesses in people's ability to judge times-to-contact in commonplace VR applications such as a baseball simulation. We found that lateral motion was judged accurately, with no significant effect of speed on TTC estimates. Presentation duration had a small effect: trials with short viewing times were estimated late and longer trials were estimated early. This is consistent with previous TTC research. Given the overall level of accuracy, lateral motion judgments appear not to be affected by the vergence-accommodation conflict. For looming and near-looming motion, people had much more difficulty giving accurate TTC estimates, and we saw strong effects of speed and presentation duration. When comparing VR to a conventional display, we found consistent differences in subjects' responses, which can be explained by the stereo cues introduced in the VR display. However, we lacked the power to make statistically justified claims.
We found that stimuli viewed only when they are far away result in poorer estimates of TTC than stimuli viewed closer, even when all other predictors are the same. Distance estimates are not needed to predict TTC, so this indicates that the vergence-accommodation conflict is to blame for the inaccuracy of estimates based on far objects. For subjects who were not experienced in the task, manipulating one of the monocular or binocular predictors of TTC to imply a shorter TTC effectively reduced their TTC estimates. For subjects who had previously been trained on similar tasks, adjusting the binocular cue reduced estimates of TTC, but adjusting only the monocular cue did not have a consistent effect. The result for novice users is consistent with previous research [54], but this conflicting result in experts has not been seen before. However, this dichotomy between expert and novice was not tested rigorously: only 3 experts, by happenstance, participated in this experiment.

A couple of our experiments lacked the power to support statistically justified conclusions. This reflects the large inter-subject variability in these tasks. It could have been remedied by choosing the number of subjects in a more principled way, such as by power analysis.

The convolutional neural network was quite successful. Although it does not achieve state-of-the-art performance, it would be usable as an interaction mechanism for Virtual Reality and Augmented Reality. Its use of eye images rather than other mechanisms such as infrared sensors makes it an attractive option for the growing number of VR/AR displays. However, the effectiveness of the eye images may have been due only to reflections of the infrared sensors that were visible in the images.
Further, our CNN is likely far from optimal in terms of model architecture or training parameters. The amount of data collected to train and test the network was fairly limited. It gives no indication of how well the network generalizes to users whose eye images were not in the training set.

This thesis makes two main contributions. Firstly, we documented effects of the vergence-accommodation conflict on human motion perception, and showed that reducing this conflict resulted in more accurate estimates of time-to-contact. Secondly, we introduced a method to model how users perceive speeds based on their estimates of time-to-contact. The method then adjusts monocular depth cues to improve those estimates. Our first experiments involved contrived situations and stimuli and only button presses. Our final validation experiment, however, was a very realistic baseball simulation similar to actual VR applications. The button-press paradigm can arguably belong to either the dorsal or the ventral visual stream. The final experiment, though, involved actually swinging at baseballs as one would in real life and is clearly in the dorsal stream. This makes the research relevant to other personal-space interactions.

Our method for modifying depth cues has many strengths. Most notably, it provides users with more accurate estimates of time-to-contact. This can make many VR applications less frustrating, because it reduces the time users need to learn how to interact accurately with an object. Ideally, users should interact with virtual stimuli as they would with real stimuli, without learning VR-specific adaptations for tasks they already do in the real world. Additionally, this method gives developers a way to control the level of help given to a user. The amount of modification can be varied from zero to the amount dictated by our methods. The method is very simple to implement and requires very little computational overhead.
This is particularly important for VR applications, where maintaining a minimum frame rate is crucial. The method requires only a short calibration and can then be deployed for a large set of tasks.

One limitation of our method is that our calibration and testing tasks started at the same distance. This lets us effectively conflate speed and TTC, but it reduces the generalizability of our method. Our method uses temporal response errors to estimate perceived speeds, and then determines which speed should be shown for an object to be perceived appropriately. This would not work if the starting distances varied between calibration and testing. For example, suppose a user responded early to a calibration trial that started 18 m away and took 1 s. We might conclude that a ball travelling 18 m/s should be shown at 25 m/s to elicit an accurate response. But it does not follow that a stimulus starting 5 m away and travelling at 25 m/s would produce a more accurate response; the user may have responded late to an 18 m/s object that started 5 m away. The proper extension of our method would thus seem to be a mapping from actual TTC to predicted TTC. The challenge would be knowing the actual TTC of an arbitrary stimulus, which could involve solving time-consuming differential equations.

Another limitation of our method for modifying depth cues is its ineffectiveness at slower speeds. There are a couple of possible reasons for this. One is that users' responses in the calibration procedure for the slowest speeds did not accurately represent their time-to-contact estimates. This could be due to the pitfalls of the disappearance paradigm in TTC experiments. Alternatively, the disappearance time may have been long enough to elicit cognitive strategies and displace a purely perception-based response.
The second reason is that the effect of depth underestimation may be larger than the effect of TTC underestimation for some subjects.

The applications of this method follow directly from our experiments. A VR baseball game that included these methods would allow users to be more accurate and to learn the game more effectively and quickly. However, one should be wary of claiming that modifying motion-in-depth cues in VR will help users train for a task they do in the real world. Our methods modify the visual stimuli of a scene, so people training in VR may become accustomed to relying on cues that do not exist, or are different, in the real world. However, if the point of training is to learn the proper timing of an action, our methods would help the user learn to give accurately timed responses more quickly.

This work was a first look at motion perception in VR, and there are many directions in which to extend it. We did not examine how people perceive objects moving away from themselves, or whether this perception could be manipulated. Similarly, we did not examine interactions that occur outside the space a user can physically reach. Many interception tasks, like throwing or shooting, fall into these categories of interaction and movement.

Additionally, as stated above, this work could move from relying on estimates of perceived speed to relying on estimates of predicted response time. This would allow the calibration procedure to generalize to tasks with different starting distances, but it would require a way to know the actual TTC of an object.

Direction discrimination is a core aspect of motion perception that we did not investigate. Our calibration procedure used a looming stimulus, and the method was tested on stimuli that approached at an angle of approximately 2°.
While this approach angle was sufficiently small for our modification to remain effective, it would be interesting to know how the effectiveness of our modified depth cues changes with approach angle.

It has been shown that the relative usefulness of monocular and binocular predictors of TTC depends on other properties of the stimuli that we did not investigate. We did most of our experiments on a baseball-sized object, and it remains to be seen whether the modifications would still be effective on objects of a different size. Similarly, the usefulness of monocular and binocular predictors of TTC will depend on an object's shape, since rotating non-spherical objects do not produce a consistent rate of change of angular size. Additionally, rotation rate and self-motion have been shown to affect TTC estimates and could be incorporated into these methods.

An ideal motion correction system would likely need to take into account many properties, such as object size, distance, speed, and movement direction. Another future direction of research would thus be a calibration procedure that could estimate this multidimensional function with reasonable accuracy. Such a procedure would avoid the need for a separate calibration for every task.

Lastly, the training effects of using VR with these modifications should be investigated. If one gets accustomed to seeing virtual objects in ways that do not exist in the real world, how will that affect the way the real world is perceived? Before using VR with modified depth cues to help, say, professional sports players, it is important that this question be answered.

In this thesis we have introduced a method for modifying virtual objects and shown that non-veridical implementations of virtual objects can result in more accurate perceptions and natural interactions.
At a high level, this is a very interesting result because it shows that the ideal Virtual Reality may not be one that perfectly emulates the real world. Rather, it may be one that accounts for and overcomes the limitations introduced by the medium in order to emulate how the real world is perceived.

Bibliography

[1] EON Sports VR. http://eonsportsvr.com/. Accessed on 2017-01-12.
[2] STRIVR - Immersive Performance Training in Virtual Reality. http://www.strivrlabs.com/. Accessed on 2017-01-12.
[3] Robert K Adair. The physics of baseball, 2002.
[4] Kurt Akeley, Simon J Watt, Ahna Reza Girshick, and Martin S Banks. A stereo display prototype with multiple focal distances. In ACM Transactions on Graphics, volume 23, pages 804–813. ACM, 2004.
[5] Thomas D Albright and Gene R Stoner. Visual motion perception. Proceedings of the National Academy of Sciences, 92(7):2433–2440, 1995.
[6] L Gregory Appelbaum and Graham Erickson. Sports vision training: A review of the state-of-the-art in digital training techniques. International Review of Sport and Exercise Psychology, pages 1–30, 2016.
[7] Claudia Armbrüster, Marc Wolter, Torsten Kuhlen, Will Spijkers, and Bruno Fimm. Depth perception in virtual reality: distance estimations in peri- and extrapersonal space. Cyberpsychology & Behavior, 11(1):9–15, 2008.
[8] Viola Cavallo and Michel Laurent. Visual information and skill level in time-to-collision estimation. Perception, 17(5):623–632, 1988.
[9] Alexandra Covaci, Anne-Hélène Olivier, and Franck Multon. Visual perspective and feedback guidance for VR free-throw training. IEEE Computer Graphics and Applications, 35(5):55–65, 2015.
[10] Cathy M Craig, Cédric Goulon, Eric Berton, Guillaume Rao, Laure Fernandez, and Reinoud J Bootsma. Optic variables used to judge future ball arrival position in expert and novice soccer players. Attention, Perception, & Psychophysics, 71(3):515–522, 2009.
[11] Gabriel Diaz, Joseph Cooper, Constantin Rothkopf, and Mary Hayhoe. Saccades to future ball location reveal memory-based prediction in a virtual-reality interception task. Journal of Vision, 13(1):20–20, 2013.
[12] Andrew T Duchowski, Donald H House, Jordan Gestring, Rui I Wang, Krzysztof Krejtz, Izabela Krejtz, Radosław Mantiuk, and Bartosz Bazyluk. Reducing visual discomfort of 3D stereoscopic displays with gaze-contingent depth-of-field. In Proceedings of the ACM Symposium on Applied Perception, pages 39–46. ACM, 2014.
[13] Philip W Fink, Patrick S Foo, and William H Warren. Catching fly balls in virtual reality: A critical test of the outfielder problem. Journal of Vision, 9(13):14–14, 2009.
[14] Rob Gray. Behavior of college baseball players in a virtual batting task. Journal of Experimental Psychology: Human Perception and Performance, 28(5):1131, 2002.
[15] Robert Gray and David Regan. Accuracy of estimating time to collision using binocular and monocular information. Vision Research, 38(4):499–512, 1998.
[16] Brian Guenter, Mark Finch, Steven Drucker, Desney Tan, and John Snyder. Foveated 3D graphics. ACM Transactions on Graphics, 31(6):164, 2012.
[17] Peter A Hancock and Michael P Manser. Time-to-contact. Occupational Injury: Risk Prevention and Intervention, pages 44–58, 1998.
[18] Heiko Hecht and Geert JP Savelsbergh. Theories of time-to-contact judgment. Advances in Psychology, 135:1–11, 2004.
[19] Heiko Hecht and Geert JP Savelsbergh. Time-to-Contact, volume 135. Elsevier, 2004.
[20] Robert T Held, Emily A Cooper, James F O'Brien, and Martin S Banks. Using blur to affect perceived distance and size. ACM Transactions on Graphics, 29(2), 2010.
[21] David M Hoffman, Ahna R Girshick, Kurt Akeley, and Martin S Banks. Vergence–accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, 8(3):33–33, 2008.
[22] Fred Hoyle. The Black Cloud, pages 26–27. London: Penguin, 1957.
[23] Fu-Chung Huang, David Luebke, and Gordon Wetzstein. The light field stereoscope. ACM SIGGRAPH Emerging Technologies, 24, 2015.
[24] J Adam Jones, J Edward Swan II, Gurjot Singh, Eric Kolstad, and Stephen R Ellis. The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. In Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization, pages 9–14. ACM, 2008.
[25] Benjamin Knoerlein, Gábor Székely, and Matthias Harders. Visuo-haptic collaborative augmented reality ping-pong. In Proceedings of the International Conference on Advances in Computer Entertainment Technology, pages 91–94. ACM, 2007.
[26] Taku Komura, Atsushi Kuroda, and Yoshihisa Shinagawa. NiceMeetVR: facing professional baseball pitchers in the virtual batting cage. In Proceedings of the 2002 ACM Symposium on Applied Computing, pages 1060–1065. ACM, 2002.
[27] Alexandra Kuznetsova, Per Bruun Brockhoff, and Rune Haubo Bojesen Christensen. Package lmerTest. R package version, 2(0), 2015.
[28] Martin Lages. Bayesian models of binocular 3-D motion perception. Journal of Vision, 6(4):14–14, 2006.
[29] Klaus Landwehr, Heiko Hecht, and Bernhard Both. Allocentric time-to-contact and the devastating effect of perspective. Vision Research, 105:53–60, 2014.
[30] David N Lee. A theory of visual control of braking based on information about time-to-collision. Perception, 5(4):437–459, 1976.
[31] Jack M Loomis and Joshua M Knapp. Visual perception of egocentric distance in real and virtual environments. Virtual and Adaptive Environments, 11:21–46, 2003.
[32] Joan López-Moliner and Claude Bonnet. Speed of response initiation in a time-to-contact discrimination task reflects the use of η. Vision Research, 42(21):2419–2430, 2002.
[33] Joan López-Moliner, Hans Supèr, and Matthias S Keil. The time course of estimating time-to-contact: Switching between sources of information. Vision Research, 92:53–58, 2013.
[34] Gordon D Love, David M Hoffman, Philip JW Hands, James Gao, Andrew K Kirby, and Martin S Banks. High-speed switchable lens enables the development of a volumetric stereoscopic display. Optics Express, 17(18):15716–15725, 2009.
[35] Arthur J Lugtigheid and Andrew E Welchman. Evaluating methods to measure time-to-contact. Vision Research, 51(20):2234–2241, 2011.
[36] Michael P Manser, Peter A Hancock, and Carolyn A Kinney. Research paradigm as a factor influencing estimates of time-to-contact. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 40, pages 1202–1206. SAGE Publications Sage CA: Los Angeles, CA, 1996.
[37] Jonathan A Marshall, Christina A Burbeck, Dan Ariely, Jannick P Rolland, and Kevin E Martin. Occlusion edge blur: a cue to relative visual depth. JOSA A, 13(4):681–688, 1996.
[38] George Mather. Image blur as a pictorial depth cue. Proceedings of the Royal Society of London B: Biological Sciences, 263(1367):169–172, 1996.
[39] George Mather. The use of image blur as a depth cue. Perception, 26(9):1147–1158, 1997.
[40] George Mather and David RR Smith. Blur discrimination and its relation to blur-mediated depth perception. Perception, 31(10):1211–1219, 2002.
[41] Nathan Matsuda, Alexander Fix, and Douglas Lanman. Focal surface displays. ACM Transactions on Graphics, 36(4):86:1–86:14, July 2017.
[42] Michael Mauderer, Simone Conte, Miguel A Nacenta, and Dhanraj Vishwanath. Depth perception with gaze-contingent depth of field. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 217–226. ACM, 2014.
[43] David Milner and Mel Goodale. The visual brain in action. Oxford University Press, 2006.
[44] Daniel Mirman. Growth curve analysis and visualization using R. CRC Press, 2016.
[45] Ken Nakayama. Biological image motion processing: a review. Vision Research, 25(5):625–660, 1985.
[46] Robert P O'Shea, Donovan G Govan, and Robert Sekuler. Blur and contrast as pictorial depth cues. Perception, 26(5):599–612, 1997.
[47] Anjul Patney, Joohwan Kim, Marco Salvi, Anton Kaplanyan, Chris Wyman, Nir Benty, Aaron Lefohn, and David Luebke. Perceptually-based foveated virtual reality. In ACM SIGGRAPH 2016 Emerging Technologies, page 17. ACM, 2016.
[48] Lieke Peper, Reinoud J Bootsma, Daniel R Mestre, and Frank C Bakker. Catching balls: How to get the hand to the right place at the right time. Journal of Experimental Psychology: Human Perception and Performance, 20(3):591, 1994.
[49] David Regan and Kenneth I Beverley. Binocular and monocular stimuli for motion in depth: changing-disparity and changing-size feed the same motion-in-depth stage. Vision Research, 19(12):1331–1342, 1979.
[50] David Regan and Stanley J Hamstra. Dissociation of discrimination thresholds for time to contact and for rate of angular expansion. Vision Research, 33(4):447–462, 1993.
[51] Rebekka S Renner, Boris M Velichkovsky, and Jens R Helmert. The perception of egocentric distances in virtual environments - a review. ACM Computing Surveys (CSUR), 46(2):23, 2013.
[52] Robert A Rolin, Jolande Fooken, Miriam Spering, and Dinesh K Pai. Perception of motion in virtual reality interception tasks. IEEE Transactions on Visualization and Computer Graphics, in review 2017.
[53] Simon K Rushton and Philip A Duke. Observers cannot accurately estimate the speed of an approaching object in flight. Vision Research, 49(15):1919–1928, 2009.
[54] Simon K Rushton and John P Wann. Weighted combination of size and disparity: a computational model for timing a ball catch. Nature Neuroscience, 2(2):186–190, 1999.
[55] Gregory S Sawicki, Mont Hubbard, and William J Stronge. How to hit home runs: Optimum baseball bat swing parameters for maximum range trajectories. American Journal of Physics, 71(11):1152–1162, 2003.
[56] Matthew RH Smith, John M Flach, Scott M Dittman, and Terry Stanard. Monocular optical constraints on collision control. Journal of Experimental Psychology: Human Perception and Performance, 27(2):395, 2001.
[57] Emil Moltu Staurset and Ekaterina Prasolova-Førland. Creating a smart virtual reality simulator for sports training and education. In Smart Education and e-Learning 2016, pages 423–433. Springer, 2016.
[58] Alex Vincent and David Regan. Parallel independent encoding of orientation, spatial frequency, and contrast. Perception, 24(5):491–499, 1995.
[59] Dhanraj Vishwanath and Erik Blaser. Retinal blur and the perception of egocentric distance. Journal of Vision, 10(10):26–26, 2010.
[60] John P Wann. Anticipating arrival: Is the tau margin a specious theory? Journal of Experimental Psychology: Human Perception and Performance, 22(4):1031, 1996.
[61] Simon J Watt, Kurt Akeley, Marc O Ernst, and Martin S Banks. Focus cues affect perceived depth. Journal of Vision, 5(10):7–7, 2005.
[62] Shuhong Xu, Peng Song, Ching Ling Chin, Gim Guan Chua, Zhiyong Huang, and Susanto Rahardja. Tennis space: an interactive and immersive environment for tennis simulation. In Image and Graphics, 2009. ICIG'09. Fifth International Conference on, pages 652–657. IEEE, 2009.
[63] Frank TJM Zaal and Reinoud J Bootsma. Virtual reality as a tool for the study of perception-action: The case of running to catch fly balls. Presence: Teleoperators and Virtual Environments, 20(1):93–103, 2011.

