Reconciling Pixels and Percept: Improving Spatial Visual Fidelity with a Fishbowl Virtual Reality Display. Zhou, Qian, 2020.


Reconciling Pixels and Percept: Improving Spatial Visual Fidelity with a Fishbowl Virtual Reality Display

by Qian Zhou

B.Sc., Tianjin University, 2011
M.Sc., Georgia Institute of Technology, 2014
M.Sc., Shanghai Jiao Tong University, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

June 2020

© Qian Zhou 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Reconciling Pixels and Percept: Improving Spatial Visual Fidelity with a Fishbowl Virtual Reality Display

submitted by Qian Zhou in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering.

Examining Committee:
Sidney Fels, Electrical and Computer Engineering (Supervisor)
Robert Rohling, Electrical and Computer Engineering (Supervisory Committee Member)
Alan Kingstone, Psychology (University Examiner)
Boris Stoeber, Mechanical Engineering (University Examiner)

Additional Supervisory Committee Members:
Septimiu E. Salcudean, Electrical and Computer Engineering (Supervisory Committee Member)
Panos Nasiopoulos, Electrical and Computer Engineering (Supervisory Committee Member)

Abstract

Virtual Reality (VR) has fundamentally changed how we can perceive three-dimensional (3D) objects in a virtual world by providing pictorial representations as 3D digital percepts rather than traditional 2D digital percepts. However, the way we perceive virtual objects is fundamentally different from the way we perceive the real objects that surround us every day; there exists a perceptual gap between the virtual and real world. The research described in this dissertation is driven by a desire to provide consistent perception between the two worlds.

Bridging the perceptual gap between the virtual and physical world is challenging because it requires not only solving technical problems such as modeling, rendering, calibration and sensing, but also understanding how humans perceive 3D space. We focus on a Fishbowl VR display to investigate the perceptual gap, introducing new techniques and conducting empirical studies to improve the visual fidelity of digital 3D displays.

To create a seamless high-resolution spherical display, we develop an automatic calibration approach that eliminates artifacts and blends multiple projections with sub-millimeter accuracy for a multiple-projector spherical display. We also perform an end-to-end error analysis of the 3D visualization, which provides guidelines and requirements for system components.

To understand human perception with the Fishbowl VR display, we conduct a user experiment (N = 16) comparing spatial perception on the Fishbowl VR display with a traditional flat VR display. Results show the spherical screen provides better depth and size perception, in a way closer to the real world. As virtual objects are depicted by pixels on 2D screens, a perceptual duality exists between the on-screen imagery and the 3D percept which potentially impairs perceptual consistency. We conduct two studies (N = 29) and show that the on-screen imagery causes perceptual bias in size perception. We show that adding stereopsis and using weak perspective projection can alleviate perceptual bias.
The explorations from this dissertation lay the groundwork for reconciling pixels with percept and pave the way for future studies, interactions and applications.

Lay Summary

Virtual Reality (VR) fundamentally changes our ways of viewing and interacting in a computer-generated 3D world. The underlying motivation of VR is to present digital content as if it exists in the real world, thus leveraging humans' built-in capabilities practiced every day. While users perceive a 3D object, they are actually looking at digital pixels on a 2D display. Unfortunately, there are still significant challenges in using pixels to generate realistic visualization, which prevents this technology from being used in applications such as training and product design. In this work, we addressed challenges in providing consistent perception from both technical and perceptual aspects, including: creating an accurate display calibration method, characterizing the visual error, and demonstrating the influence of the screen shape and the effect of on-screen pixels. By overcoming these major obstacles to a consistent visual experience, we will make VR more accessible and practical for future research and applications.

Preface

Much of this dissertation has been published elsewhere. A complete list of all publications can be found in Appendix A. The co-authorship of contributions is discussed here. All of the research work in this dissertation was conducted in the Human Communication Technologies Laboratory at the University of British Columbia, Point Grey campus. All user studies and associated analysis were approved by the University of British Columbia Behavioural Research Ethics Board with the original certificate (H08-03005) and post-approval activities (H08-03005-A017 and H08-03005-A019).

Chapter 3 and Appendix B have been published in [154] as listed below. I wrote the source code for the calibration, performed the evaluation, and wrote the text. Dr. Miller assisted with analysis and writing. Dr. Fels provided editorial feedback on the manuscript. Mr. Wu and Ms. Correa assisted with the implementation of the demo shown in [154] as well as the supplementary video.

[154] Qian Zhou, Gregor Miller, Kai Wu, Daniela Correa, and Sidney Fels. Automatic calibration of a multiple-projector spherical fish tank VR display. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1072–1081. IEEE, 2017. (video)

Versions of Chapter 4 have been published in [155] and [37]. The error analysis has been published in [155]. I formulated the visual error and performed the simulation and analysis in consultation with Dr. Fels. I wrote the manuscript and the source code of the error model. Mr. Wu assisted with the implementation of the demo shown in [155] as well as the supplementary video. The design and implementation of the spherical prototypes described in Section 4.1 has been published in [37]. Mr. Fafard, Mr. Chamberlain, Mr. Hagemann and I jointly developed the software and hardware of the system. Mr. Fafard wrote the manuscripts. I assisted with the data collection, analysis and writing. Dr. Miller, Dr. Stavness and Dr. Fels provided editorial feedback on the manuscript.

[155] Qian Zhou, Gregor Miller, Kai Wu, Ian Stavness, and Sidney Fels. Analysis and practical minimization of registration error in a spherical fish tank virtual reality system. In Asian Conference on Computer Vision (ACCV), pages 519–534. Springer, 2016. (video)
[37] Dylan Fafard, Qian Zhou, Chris Chamberlain, Georg Hagemann, Sidney Fels, and Ian Stavness. Design and implementation of a multi-person fish tank virtual reality display. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology (VRST), pages 1–9. ACM, 2018. (video)

A version of Chapter 5 has been published in [152]. I was responsible for the experimental design, data collection and analysis, as well as manuscript composition. Mr. Hagemann assisted with the implementation of the planar display prototype used in the experiment. Dr. Fels and Mr. Fafard were involved in the discussion of the experimental design. Dr. Stavness and Dr. Fels provided editorial feedback on the manuscript.

[152] Qian Zhou, Georg Hagemann, Dylan Fafard, Ian Stavness, and Sidney Fels. An evaluation of depth and size perception on a spherical fish tank virtual reality display. IEEE Transactions on Visualization and Computer Graphics (TVCG), 25(5): 2040–2049, 2019. (video)

An early version of Chapter 6 has been published in [156]. I was responsible for the experimental design, data collection and analysis, as well as manuscript composition. Dr. Fels and Ms. Wu were involved in the discussion of the experimental design. Dr. Stavness and Dr. Fels provided editorial feedback on the manuscript.

[156] Qian Zhou, Fan Wu, Sidney Fels, and Ian Stavness. Closer object looks smaller: Investigating the duality of size perception in a spherical fish tank VR display. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–9. ACM, 2020.

Early versions of Section 3.2 and Section 4.1 have been published as interactive demonstrations in [157] and [153]. I developed and tested the applications jointly with Mr. Fafard, Mr. Wagemakers, Mr. Chamberlain, Mr. Wu and Mr. Hagemann. Mr. Fafard, Mr. Wagemakers and Mr. Chamberlain developed the rendering infrastructure required to run the applications. I wrote the manuscripts with the assistance of Dr. Miller, Dr. Stavness and Dr. Fels. Dr. Stavness and Dr. Fels were involved in the discussion of the design of the applications and provided editorial feedback on the manuscript.

[153] Qian Zhou, Georg Hagemann, Sidney Fels, Dylan Fafard, Andrew Wagemakers, Chris Chamberlain, and Ian Stavness. CoGlobe: a co-located multi-person FTVR experience. In ACM SIGGRAPH 2018 Emerging Technologies, page 5. ACM, 2018.

[157] Qian Zhou, Kai Wu, Gregor Miller, Ian Stavness, and Sidney Fels. 3DPS: An auto-calibrated three-dimensional perspective-corrected spherical display. In Virtual Reality (VR), 2017 IEEE, pages 455–456. IEEE, 2017.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Fishbowl VR Display
  1.3 Contribution
  1.4 Outline
2 Related Work
  2.1 3D Visual Displays
    2.1.1 Fish Tank Virtual Reality Displays
    2.1.2 Projector-Based 3D Displays
    2.1.3 Head-Mounted Displays
    2.1.4 Static and Swept Volumetric Displays
    2.1.5 Spherical Displays
  2.2 Visual Perception Evaluation in the Virtual Environment
    2.2.1 Definition of Visual Fidelity
    2.2.2 Perceptual Duality
    2.2.3 Evaluation of Depth Cues
    2.2.4 Depth and Size Perception Evaluation
  2.3 Hardware and Software Display Techniques
    2.3.1 Multiple-Projector Display Calibration
    2.3.2 Head Tracking Techniques
    2.3.3 Error Analysis of 3D Displays
  2.4 Summary
3 Multiple-projector Spherical Display Calibration
  3.1 Introduction
  3.2 Calibration Approach
    3.2.1 Semi-Automatic Calibration Approach
    3.2.2 Automatic Calibration Approach
  3.3 Evaluation
    3.3.1 Metrics
    3.3.2 Error Measurement
    3.3.3 Implementation
    3.3.4 Result
  3.4 Discussion and Limitations
  3.5 Summary
4 Visual Error Analysis for Fishbowl VR Displays
  4.1 Fishbowl VR Display
  4.2 Rendering
  4.3 Visual Error Analysis
    4.3.1 Metric
    4.3.2 Viewpoint Error
    4.3.3 Display Error
    4.3.4 Result
  4.4 Discussions and Limitations
  4.5 Summary
5 Evaluation of Depth and Size Perception
  5.1 Introduction
  5.2 Depth and Size Perception Study
    5.2.1 Apparatus
    5.2.2 Tasks
    5.2.3 Participants
    5.2.4 Procedure
    5.2.5 Experiment Design
    5.2.6 Result
  5.3 Discussion and Limitations
    5.3.1 Size Constancy
    5.3.2 Visibility
    5.3.3 Head Movement
    5.3.4 Indications to FTVR Studies and Applications
    5.3.5 Limitations and Future Work
  5.4 Summary
6 Perceptual Duality of FTVR Displays
  6.1 Introduction
  6.2 User Study 1: Influence of the On-screen Imagery
    6.2.1 Projected Size and Movement
    6.2.2 Task
    6.2.3 Experimental Design
    6.2.4 Hypothesis
    6.2.5 Stimuli
    6.2.6 Participants
    6.2.7 Procedure
    6.2.8 Apparatus
    6.2.9 Result
    6.2.10 Discussion
  6.3 User Study 2: Influence of the Projection Matrix
    6.3.1 Task 1: Subjective Impression
    6.3.2 Task 2: Size Judgement
    6.3.3 Hypothesis
    6.3.4 Participants
    6.3.5 Procedure
    6.3.6 Result
    6.3.7 Discussion
  6.4 General Discussion
    6.4.1 Influence of the On-screen Imagery
    6.4.2 Size Perception Accuracy
    6.4.3 HeadMove vs ObjectMove
    6.4.4 Choice of Projection Matrix
    6.4.5 Design Recommendations for FTVR Displays
    6.4.6 Limitations and Future Work
  6.5 Summary
7 Conclusions
  7.1 Research Contributions
  7.2 Indications to FTVR Designs
  7.3 Discussion of Fishbowl VR Displays
  7.4 Limitations
  7.5 Directions
  7.6 Concluding Remarks
Bibliography
Appendices
A List of Publications
  A.1 Journal Publication
  A.2 Conference Publication
  A.3 Research Talk
  A.4 Additional Publication
B Multi-Projector Display Calibration Optimization
C User Study Questionnaires
  C.1 User Study on Size and Depth Perception
  C.2 User Study 1 on Perceptual Duality
  C.3 User Study 2 on Perceptual Duality
D User Study Result
  D.1 User Study on Size and Depth Perception
  D.2 User Study 1 on Perceptual Duality
  D.3 User Study 2 on Perceptual Duality

List of Tables

3.1 Calibration results of proposed approaches
4.1 Estimated visual error with three different tracking systems
5.1 Size perception on different 3D displays

List of Figures

1.1 Diagram of a Fishbowl VR display
1.2 Illustration of perceptual duality
2.1 Reality-virtuality schematic continuum
2.2 Reproduction fidelity dimension defined by Milgram [94]
2.3 List of depth cues
3.1 Goal of display calibration
3.2 Multiple-projector spherical display layout
3.3 Proposed calibration pipeline
3.4 Example of calibration result for a three-projector spherical display
3.5 Illustration of global and local point error
3.6 Calibration results of proposed approaches
3.7 Captured grid pattern before and after calibration
4.1 Design of the Fishbowl VR display
4.2 Spherical prototypes with 12 inch and 24 inch diameter
4.3 Viewpoint calibration approach
4.4 Overview of the rendering pipeline
4.5 Diagram of angular visual error
4.6 Visual angle affected by the viewpoint position
4.7 Angular error affected by the virtual point position
4.8 Display error varied by pixels
5.1 Experimental setup of a flat FTVR display
5.2 Illustration of experimental stimulus
5.3 Results of error magnitudes
5.4 Results of questionnaire
5.5 Illustration of two-way interaction effects
5.6 Linear regression of size-constancy
5.7 Illustration of the visibility for the flat and spherical screen
5.8 Diagram of head movement of all participants
6.1 Example of the perceptual duality on size perception
6.2 Illustration of head and object move
6.3 Diagram of perspective projection
6.4 Simulation result of the projected size of the on-screen imagery
6.5 Experimental setup of Study 1
6.6 Results of bias error with means and 95% confidence intervals in Study 1
6.7 Results of questionnaire in Study 1
6.8 Illustration of weak perspective projection
6.9 Diagram of subjective impression task
6.10 Study 2 results of bias error and questionnaire
6.11 Example of floating effect
6.12 Example of participant's view in Study 2
6.13 Result of bias error influenced by VR expertise
7.1 Illustrations of applications using the Fishbowl VR display
D.1 Results of two-way ANOVA of the absolute depth error
D.2 Results of three-way ANOVA of the absolute size error
D.3 Results of three-way ANOVA of the size ratio
D.4 Results of pairwise t-test of head movement in the depth task
D.5 Results of pairwise t-test of head movement in the size task
D.6 Results of questionnaire in the depth and size task
D.7 Results of the bias error in Study 1
D.8 Results of the confidence rating in Study 1
D.9 Results of the realism rating in Study 1
D.10 Results of the bias error in Study 2
D.11 Results of questionnaire in Study 2

Glossary

1D One Dimensional
2D Two Dimensional
3D Three Dimensional
ANCOVA Analysis of Covariance
ANOVA Analysis of Variance
AR Augmented Reality
CAD Computer-Aided Design
CI Confidence Interval
DOF Degree of Freedom
FTVR Fish Tank Virtual Reality
HMD Head-mounted Display
MR Mixed Reality
RE Real Environment
RMS Root Mean Square
SAR Spatial Augmented Reality
VE Virtual Environment
VR Virtual Reality

Acknowledgements

Much of this work has been supported by the University of British Columbia, the Natural Sciences and Engineering Research Council of Canada (NSERC) and B-Con Engineering.

This work would not have been possible without the invaluable guidance and supervision of my research supervisor, Professor Sidney Fels, who has been my role model in my development as a researcher. Thanks for providing an excellent environment for research and supporting me with perceptive insight and enthusiasm for research.

I am grateful to my committee members and examiners: Robert Rohling, Tim Salcudean, Panos Nasiopoulos, Alan Kingstone, Boris Stoeber and Kevin Ponto (my external examiner). Their expertise in various aspects has been very helpful in refining this work. Special thanks go to Kellogg Booth for taking the time to review the dissertation and providing constructive feedback through in-depth conversations.

I would like to thank Gregor Miller and Ian Stavness for their guidance and continuous support for the work in Chapters 3 and 4, and Chapters 5 and 6, respectively. I would also like to thank my fellow colleagues at the HCT lab for their support in user studies, and special thanks go to Fan Wu and Georg Hagemann for their friendship and support through the tough times.

Lastly, I would like to thank my parents and my husband Weipu for the endless supply of encouragement and generous support. Thanks for putting up with me through both the good and bad times. Because of their unconditional love and patience, I have had the opportunity to complete this dissertation.

Chapter 1
Introduction

Virtual Reality (VR) has fundamentally changed the way we perceive 3D objects in a virtual world by providing pictorial representations as 3D digital percepts rather than traditional 2D digital percepts. The underlying motivation of VR is to present virtual 3D content as if it exists in the real world, thus leveraging humans' built-in capabilities practiced and well developed in the 3D world that surrounds us every day. One of the most important factors supporting such an experience is visual fidelity, defined as "the degree to which scenes presented on a display system are perceived in the same manner as an equivalent real-world scene" [128].
A high level of visual fidelity provides users consistent perception between the real and digital world, which is crucial in VR.

For 3D displays, providing high visual fidelity is non-trivial, and often involves technical challenges in rendering, calibration and tracking, as well as perceptual challenges that require a deep understanding of how humans perceive 3D space. In this dissertation, we investigate visual fidelity by examining a spherical Fish Tank Virtual Reality display. Fish Tank Virtual Reality (FTVR) displays [30] enable head-coupled perspective on high-resolution desktop screens. In comparison to other 3D displays, the FTVR display is promising for providing consistent perception because it allows users to see virtual content situated in the real world. Using a spherical FTVR display, we propose approaches to improve visual fidelity from both technical and perceptual standpoints towards bridging the perceptual gap between the real and virtual world.

1.1 Motivation

The research described in this dissertation is motivated by a desire to visually provide perceptual consistency between the virtual and real world by identifying technical challenges and understanding the way humans perceive spatial structures in the real and virtual environment. Providing consistent perception enables a user to utilize real-world skills in a virtual environment. It also helps to transfer learning from the virtual environment back into the real world [129].

It is important to provide high visual fidelity because our cognition and actions depend on what we perceive. Every canonical manipulation task in 3D involves different aspects of spatial perception. A selection task, for example, is the task of selecting a target object from the entire set of objects available. The real-world counterpart of the selection task is picking up and grabbing an object with a hand. It requires accurate depth perception so that the user knows the position of the target object. As another example, a scaling task is the task of resizing an object until its size meets a certain criterion. It requires precise size perception of the scale of the object. Providing high fidelity of depth and size perception is important because users can leverage their real-world interaction skills when displays provide perception in a way consistent with the real world.

Despite their abstract forms, these canonical tasks are the building blocks of many 3D applications. For example, Computer-Aided Design (CAD) systems have an assembly task which involves selecting objects, placing them in the desired locations, and resizing them along various dimensions [145]. Misperception of the distance and size of virtual models would decrease the performance of these interactions. Providing accurate spatial perception goes beyond supporting 3D interactions in CAD. Most importantly, it helps preserve consistent shape perception between the design and the fabricated object, such that when a virtual object is fabricated in the real world, the user is confident that the fabricated object matches the look of the design. Medical visualization is another example that requires perceptual consistency for various tasks ranging from diagnosis to surgical training and planning. Accurate depth and size perception plays a critical role in identifying anatomic landmarks, performing resection of tissue, and closing the incision [26]. Effective surgical training allows trainees to progress partway along the learning curve before performing on a real patient.
To ensure that learning from the virtual environment can be transferred back into the real world, it is important to provide spatial perception consistently between the training and real environments, which requires high visual fidelity supported by the display.

Driven by these use cases, our work seeks to recognize the technical and perceptual challenges of providing perceptual consistency in the virtual environment. A virtual entity goes through a sequence of processes before it is ultimately perceived by the viewer. Problems may arise anywhere from image creation to percept formation, making the virtual entity deviate from the way it was intended to look. There are steps that need to be taken to ensure perceptual consistency. Our work is to identify and understand these steps to improve visual fidelity in the virtual environment.

1.2 Fishbowl VR Display

Recent advances in display technologies make it possible to present virtual content with high resolution, fast refresh rates, and stereoscopic capabilities. Depicting the virtual scene with high-definition pixels helps to build up the perceptual vividness of 3D space. However, it is still challenging to provide the same visualization as in the real world. While we perceive a virtual object, we are actually looking at pixels on a 2D screen. This is fundamentally different from the way we perceive a real object, because the light is not emitted from true 3D space, but rather from 2D pixels. Solutions are available to date in the form of volumetric displays, which produce volume-filling 3D imagery so that each voxel emits visible light from the region where it appears [40]. However, these displays often suffer from limited resolution and low brightness with a small viewing angle, which spoils the 3D experience and fails to meet the promise of high visual fidelity. Therefore, pixel-based 3D displays are still the mainstream of VR for providing a high-quality viewing experience.

To bridge the perceptual gap between the virtual and real world, there are both technical issues and perception issues. From the technical standpoint, display systems have registration error, tracker jitter, system latency and limited display resolution that can cause various artifacts in the virtual environment. These artifacts do not exist in the real world and hence impair the virtual experience. On the other hand, we perceive various forms of feedback in the real world. The visual information alone is formed by many depth cues, such as perspective, occlusion, shadows, motion parallax and stereopsis. In contrast, the virtual environment can only provide a limited collection of them. Furthermore, within this collection, cues can conflict with each other. The stereoscopic vergence-focus problem, for example, is a conflict between the disparity and vergence information caused by the 2D screen [142].

Confronted with so many obstacles, how can we improve the visual fidelity of the virtual environment? Fortunately, for visual information, research has found that the way we use spatial information depends greatly on the purpose of perceiving [27, 142]. It stands to reason that if we can understand and use the spatial information which is significant for critical tasks in an application, users may still have correct spatial perception in the absence of some depth cues.
Of all the depth cues, stereopsis has long been favored for providing acuity in relative depth judgements [142], positioning objects in 3D space [19], and improving size-constancy in the virtual environment [91]. In addition to stereopsis, motion parallax has been found to be an independent depth cue for both shape and relative depth perception in the absence of others [119]. It is believed that motion parallax is the cue most likely to enable users to see more information, particularly for certain cognitive tasks [142].

While most 3D displays today provide both stereopsis and motion parallax cues, few have met the promise of providing perceptual consistency between the real and virtual environment. Immersive 3D displays, such as head-mounted displays (HMDs) and CAVE displays, create deep engagement and strong immersion in the virtual environment. They surround and isolate users from the real world, making it challenging to provide consistent perception between the real and virtual world. Unlike immersive 3D displays, FTVR displays create a 3D illusion that is situated within the real world, allowing easy transition between real-world and virtual information on the FTVR display. See-through HMDs can create a similar effect, but presently support a limited field of view with reduced brightness caused by the additive blending [11].

Previous studies have compared task performance of FTVR displays with other VR forms like CAVE and HMD [31, 41, 106]. The results of these studies show mixed preferences, suggesting that user performance is task related. Users tend to perform better in immersive 3D displays on exploration tasks [41], while in FTVR displays they perform better on tasks that require understanding of spatial structures [31, 106]. Nevertheless, most of these user evaluations have relied on fixed, single-screen FTVR, where the user's head motion generates limited motion parallax cues and is generally within a viewing angle of 45 degrees off-axis. Limited head motion will favor non-head-coupled rendering because the user's perspective does not significantly deviate from a default fixed perspective. Multi-screen FTVR displays can create a larger viewing space than a single screen, increasing the head-coupled motion parallax. A previous study with a cubic FTVR display has shown potentially better performance with a 3D path-tracing task [126] and a mental rotation task [86]. However, the presence of seams between multiple displays discourages users from taking full advantage of the multi-screen aspect [86, 126].

Spherical FTVR displays have the advantage of generating seamless motion parallax with 360 degrees of visibility, creating a virtual "fishbowl" experience using head-tracked rendering. It maintains a metaphor of virtual objects contained within the bounds of the spherical display. This metaphor helps the virtual content to appear more naturally situated within the real world, as if it were real objects within a glass globe or display case. As a display situated in the real world, we believe the spherical FTVR display is promising for providing perceptual consistency between the real and virtual environment. By convention, a spherical fish tank is equivalent to the word "fishbowl".

Figure 1.1: Fishbowl VR display with multiple rear projectors. The stereoscopic imagery is perspective-corrected based on the viewer's left and right eye positions. The viewer perceives a 3D scene by looking at the projection on the spherical surface.
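To make the head-coupled rendering described in Figure 1.1 concrete, the sketch below shows one standard way to build a perspective-corrected, off-axis projection from a tracked eye position and a known screen rectangle (the "generalized perspective projection" commonly used for fish tank and CAVE rendering). It is a minimal illustration in Python with NumPy, not the dissertation's rendering code: the function names, the flat-rectangle screen model, and the frustum convention are assumptions for exposition.

```python
import numpy as np

def frustum(l, r, b, t, n, f):
    """OpenGL-style off-axis perspective frustum matrix."""
    return np.array([
        [2*n/(r-l), 0,          (r+l)/(r-l),  0],
        [0,         2*n/(t-b),  (t+b)/(t-b),  0],
        [0,         0,         -(f+n)/(f-n), -2*f*n/(f-n)],
        [0,         0,         -1,            0]])

def head_coupled_view_projection(eye, pa, pb, pc, near=0.01, far=100.0):
    """Off-axis projection for a tracked eye and a screen rectangle given by
    three corners: pa = lower-left, pb = lower-right, pc = upper-left (world units)."""
    vr = (pb - pa) / np.linalg.norm(pb - pa)          # screen right axis
    vu = (pc - pa) / np.linalg.norm(pc - pa)          # screen up axis
    vn = np.cross(vr, vu); vn /= np.linalg.norm(vn)   # screen normal (towards eye)
    va, vb, vc = pa - eye, pb - eye, pc - eye         # eye -> corner vectors
    d = -np.dot(va, vn)                               # eye-to-screen distance
    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d
    P = frustum(l, r, b, t, near, far)
    # View matrix: rotate the world into screen-aligned axes, then translate by -eye.
    M = np.eye(4); M[0, :3], M[1, :3], M[2, :3] = vr, vu, vn
    T = np.eye(4); T[:3, 3] = -eye
    return P @ M @ T

# Per frame and per eye: the left/right eye positions come from the head tracker
# (offsetting a tracked head pose by half the interpupillary distance is an
# assumption here, not a detail given in this chapter).
```

For the multi-projector spherical screen, this per-eye view still has to be mapped onto the calibrated sphere surface for each projector, which is what the calibration (Chapter 3) and rendering pipeline (Section 4.2) described later are responsible for; the sketch only captures the head-coupled perspective correction that Figure 1.1 refers to.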
We will use "Fishbowl VR display" in this dissertation to refer to a spherical Fish Tank VR display.

Our investigation starts with a spherical display using multiple mini-projectors, shown in Figure 1.1. Tiling multiple projectors on the spherical screen can increase the display resolution and make the system scalable. The challenge lies in stitching and blending multiple projections to create a seamless display. We present an automatic calibration approach that achieves sub-millimeter accuracy, as discussed in Chapter 3. With head tracking, the calibrated display renders the correct perspective based on the viewpoint. However, it is not clear how accurate the calibration and tracking need to be in order to create effective visualization. Even small inaccuracies induced by the display or tracker may break the 3D illusion and cause visual artifacts. Therefore, we conducted an end-to-end error analysis of a Fishbowl VR display, which provides detailed descriptions of the criteria for different components in the system, in Chapter 4. Together with the calibration approach, we established the technical foundation of a Fishbowl VR display that provides a 3D experience with high visual fidelity.

With the display system fidelity established, we conducted studies to understand the way humans perceive spatial structure using the Fishbowl VR display in Chapter 5. The spherical screen has several unique properties that can potentially improve spatial perception and enhance the 3D experience, such as the enclosing shape, consistent curved surface and borderless views from all angles. It is not clear whether these natural affordances can improve spatial perception in comparison to traditional flat FTVR displays. Therefore, we conducted an experiment to see whether users can perceive the depth and size of virtual objects better on a Fishbowl VR display compared to a flat FTVR display in Chapter 5.

Figure 1.2: Illustration of perceptual duality. (a) M. C. Escher's Waterfall. Copyright © The M.C. Escher Company B.V., Baarn, The Netherlands. (b) A user perceives a 3D cube by looking at a 2D image rendered in pixels on the surface of the Fishbowl VR display. The user knows it is a 2D picture but also knows it is a 3D structure. These two concurrent understandings represent the perceptual duality when perceiving 3D out of 2D.

Virtual objects are depicted by pixels on the screen. When we perceive a 3D object, we are actually looking at digital pixels (Figure 1.2(b)), which is fundamentally different from the way we see objects in the real world, where light comes directly from the object, not from pixels at an incorrect depth. As human beings, we had the ability to perceive 3D out of a 2D picture before 3D displays were invented. Escher's Waterfall (Figure 1.2(a)) is an example of our ability to perceive a 3D structure, even a physically impossible one, by looking at a 2D image. We know this is a 2D picture; we also know this is an impossible 3D structure. The duality between these two concurrent understandings may cause perceptual inconsistency when it mismatches the viewer's expectation. We conducted studies to demonstrate this perceptual duality and proposed methods, including weak perspective projection, to alleviate the perceptual inconsistency in Chapter 6.
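Since Chapter 6 contrasts perspective and weak perspective projection as one way to alleviate this duality, a small illustration may help. The following Python sketch is a simplified, assumed pinhole-camera formulation rather than the dissertation's rendering code: under full perspective every vertex is scaled by its own depth, so nearer parts of an object project larger, while under weak perspective the whole object shares a single reference depth, so its projected shape does not depend on depth variation within the object.

```python
import numpy as np

def perspective_project(pts_cam, focal=1.0):
    """Full perspective: each vertex is scaled by its own depth (focal / z),
    so nearer parts of an object project larger (foreshortening)."""
    return focal * pts_cam[:, :2] / pts_cam[:, 2:3]

def weak_perspective_project(pts_cam, focal=1.0):
    """Weak perspective: orthographic projection followed by one uniform scale
    taken at the object's reference depth, removing per-vertex depth scaling."""
    z_ref = pts_cam[:, 2].mean()              # single reference depth per object
    return focal * pts_cam[:, :2] / z_ref

# Tiny demo: a 10 cm cube centred 50 cm in front of the eye (camera space,
# +z away from the eye). Compare the projected widths of its near and far faces.
s, z0 = 0.05, 0.5
cube = np.array([[x, y, z0 + z] for x in (-s, s) for y in (-s, s) for z in (-s, s)])
for name, proj in (("perspective", perspective_project(cube)),
                   ("weak perspective", weak_perspective_project(cube))):
    near = proj[np.isclose(cube[:, 2], z0 - s)]   # vertices of the near face
    far = proj[np.isclose(cube[:, 2], z0 + s)]    # vertices of the far face
    print(f"{name}: near-face width {np.ptp(near[:, 0]):.3f}, "
          f"far-face width {np.ptp(far[:, 0]):.3f}")
```

Chapter 6's second study compares these two projections because the size of the on-screen imagery, which differs between them, was found to bias the perceived size of the 3D object.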
Providing high visual fidelity in the virtual environment is difficult and inherently involves considerable technical and perceptual challenges. To fully overcome these challenges, years of research are required, with the concerted effort of numerous researchers. We expect that our effort in exploiting some unique characteristics of such an unusual display can help explain the way people perceive objects, and build the perceptual bridge between the virtual and real world.

1.3 Contribution

This dissertation investigates visual fidelity with a Fishbowl VR display from both the technical aspect and the human factors aspect. We developed technical approaches to create a multi-projector Fishbowl VR display (Chapters 3 and 4). The contribution of Chapter 3 is based on computer vision techniques and that of Chapter 4 on computer graphics techniques. We conducted studies to understand the way humans perceive spatial structures using the Fishbowl VR display (Chapters 5 and 6). Publications resulting from this work are listed in Appendix A. The four main contributions are summarized here.

Automatic calibration of a multiple-projector spherical display

i. Created a novel automatic calibration approach for a multiple-projector spherical display. We developed an automatic calibration method to blend multiple projections into a seamless display for a multiple-projector spherical display. Using the correspondence between the projected pattern and the pattern observed by a single camera, we reconstruct the 3D position of each projected pixel on the display.

ii. Applied and evaluated the calibration approach on a Fishbowl VR prototype. We applied and evaluated the calibration approach with a Fishbowl VR prototype. The results achieved sub-millimeter calibration accuracy.

Error analysis of a Fishbowl VR display

i. Formulated the visual error of a Fishbowl VR display. We formulated the visual error of a Fishbowl VR display in terms of display calibration error and head-tracking error. Using this model, we analyzed the sensitivity of the user's visual error to each error source in the system.

ii. Established design guidelines for FTVR displays. Using this model, we provided guidelines and requirements for the tracking and display system to minimize visualization error for FTVR displays.

Evaluation of spatial perception in a Fishbowl VR display

i. Compared spatial perception on a Fishbowl VR display with a traditional flat FTVR display. We conducted an experiment and demonstrated that users perceived the depth and size of virtual objects better on a Fishbowl VR display than on a traditional flat FTVR display.

ii. Demonstrated the superiority of the spherical display in providing better size-constancy. We showed that the perception of size-constancy is stronger on the Fishbowl VR display than on the flat FTVR display.

Evaluation of perceptual duality in FTVR displays

i. Demonstrated the influence of on-screen imagery on size perception. We conducted two studies and found that the size of the on-screen imagery significantly influenced object size perception, indicating that there is a perceptual duality between the on-screen pixels and the 3D percept.

ii. Demonstrated the influence of stereopsis on size perception. We conducted a study and found that adding stereopsis can mitigate the perceptual bias in size perception caused by the on-screen imagery.

iii. Compared size perception under different projection matrices.
We conducted a study and found that weak perspective projection significantly reduced the perceptual bias in size perception and was strongly preferred by users compared to perspective projection.

1.4 Outline

This dissertation is structured around the four main research contributions. Chapter 2 is an overview of literature related to the topics of 3D displays and perception studies in the virtual environment. Chapter 3 describes the automatic calibration approach to blend multiple projections on the spherical display. Chapter 4 details the design and implementation of the Fishbowl VR prototypes with the visual error model, simulation results, and guidelines for FTVR displays. Chapter 5 describes the methodology and experiment in evaluating depth and size perception on the Fishbowl VR display. Chapter 6 describes the methodology and experiments in evaluating the influence of the on-screen imagery on perceived object size. Chapter 7 summarizes the dissertation contributions, describes directions for future work, and provides concluding remarks. The appendices provide additional background material. Appendix A lists the publications, research talks and demonstrations associated with the dissertation. Appendix B describes the optimization formulation of the calibration approach discussed in Chapter 3. Appendix C provides the subjective questionnaires of the user experiments described in Chapters 5 and 6. Appendix D provides the tables of results from the statistical analysis in Chapters 5 and 6.

Chapter 2
Related Work

Much effort has been made in researching and developing 3D display technologies. By providing additional depth information, 3D displays introduce the idea of reproducing in the digital world the visual experience we have in the real world. Recently, rapid advances in computer graphics and display technologies have boosted the development of 3D displays, making devices such as the Oculus Rift [54] and Hololens [61] accessible and affordable. These emerging high-quality implementations have made it possible to deliver a high level of visual experience in the virtual environment.

In this chapter, we review the visual display technologies and perception studies that are crucial for providing high visual fidelity in the virtual environment. The first part of the chapter surveys existing 3D displays to provide an overview of the strengths and weaknesses of each display technology in supporting perceptual consistency between the virtual and real world. As providing a high-fidelity visual experience requires an understanding of human visual perception, the second part reviews existing efforts in evaluating spatial perception in the virtual environment, focusing on depth and size perception. The last part reviews technical approaches such as calibration and tracking techniques to support a high-fidelity visual experience on 3D displays.

2.1 3D Visual Displays

Display devices present information to one or more modalities of the human perceptual system, with most displays focusing on stimulating the visual, auditory or haptic senses [17]. In this dissertation, we focus on 3D visual displays, which provide additional depth cues such as stereopsis and motion parallax compared to traditional 2D visual displays. Therefore, the term displays refers to visual displays in this dissertation. Among the different types of 3D visual displays, the head-mounted display is the most well known. Head-mounted displays (HMDs) can provide a strong sense of immersion and engagement with an "inside-out" 3D experience (looking outwards from inside the 3D volume).
As an alternative design to early immersive HMDs, the Fish Tank Virtual Reality display was developed to provide an "outside-in" 3D experience (looking inwards from outside the 3D volume) [106] with affordable and effective technologies for exploring 3D content and tasks. While being non-immersive, it renders virtual imagery situated in a real-world context, which is important for providing consistent perception between the virtual and real environment.

Figure 2.1: Milgram's reality-virtuality schematic continuum [94] with representative 3D displays discussed in Section 2.1. Real environments are shown at the left end of the continuum with virtual environments at the opposite extremum. The schematic continuum shows the variety of ways in which the Real (R) and Virtual (V) components could be mixed. Examples of 3D displays include: (a) FTVR display pCubee [126] © ACM (2010). (b) Perspecta volumetric display [40] © IEEE (2005). (c) CAVE Painting with physical props [72] © ACM (2001). (d) TurkDeck with physical replica [25] © ACM (2015). (e) Sparse Haptic Proxy [24] © ACM (2017). (f) Annexing Reality [50] © ACM (2016). (g) Spatial AR display pMomo [158] © ACM (2016). (h) Dyadic AR display [11] © ACM (2015). All figures adapted with permission.

Despite the implication of its name, the Fish Tank Virtual Reality (FTVR) display is perhaps better classified as an Augmented Reality display. In Virtual Reality (VR), computer-generated graphics fully replace the real world, while in Augmented Reality (AR), the virtual graphics are blended and situated in the real world. We include Milgram's taxonomy of Mixed Reality (MR) displays [94] to better understand the relation between different 3D displays and the environments in which they are situated.

As shown in Figure 2.1, the "reality-virtuality continuum" is an abstraction which can be visualized as a line connecting the virtual and real environment, where the real environment is shown at the left end of the continuum, and the virtual environment at the opposite extremum. In this continuum, both AR and VR displays are subsets of MR displays, with AR displays considered to be closer to the real environment. Milgram further expands the one-dimensional continuum into a schematic collection of composites represented as blocks between the virtual and real environment. As shown in Figure 2.1, the same left-to-right continuum demonstrates the variety of ways in which the Real (R) and Virtual (V) components could be mixed, ranging from fully real to fully virtual environments, with intermediate points on the continuum being mixed environments. The upper half of the scheme represents approaches creating real components in the virtual environment, and the bottom half approaches creating virtual components in the real environment.

To better understand the role of FTVR displays, existing representative 3D display systems are enumerated and placed in the blocks of this scheme. Detailed discussions of these approaches are provided later in this section. VR HMDs such as the Oculus Rift [54] and the surround-screen projector-based VR system (CAVE) [29] provide an immersive 3D experience fully replacing the real world. They are examples of Block 5. Some HMDs and CAVE systems [72, 124] use real-world objects to provide real haptic sensation in the virtual environment. They are examples of Blocks 4, 6 and 7. Other immersive HMDs with room-scale physical replicas such as physical walls [24, 25, 90] are examples of Blocks 2 and 3.
AR HMDs such as the Hololens [61] superimpose virtual images on top of the real world. Depending on the number of superimposed entities, they are examples of Blocks 8, 9 and 10. Spatial Augmented Reality (SAR) displays such as iLamp [114] and Dyadic Projected SAR [11, 158] use projectors to place the augmenting graphics over physical objects. Depending on the scale of projection coverage, they are examples of Blocks 11 and 12. Finally, volumetric displays [40] and FTVR displays [143] are desktop 3D displays situated in the real world. Therefore, they are examples of Block 8.

According to the reality-virtuality continuum, we believe the blocks closer to the real environment in Figure 2.1 are promising for providing perception consistent with the real world. The proximity of FTVR displays to the real environment shows promise for providing consistent perception. In this section we provide a background on the types of 3D displays. Each 3D display approach has its unique properties to support various depth cues. By examining and comparing the properties of FTVR displays to those of other 3D displays, we will gain an understanding of how to provide consistent perception and improve the 3D experience in the virtual environment.

2.1.1 Fish Tank Virtual Reality Displays

The Fish Tank Virtual Reality (FTVR) display was originally proposed as a single desktop display [143]. It is designed to be a small, bounded, fish-tank-sized artificial environment which corrects perspective to a tracked viewpoint. The important finding of the original FTVR display work was a comparison study of different depth cues, such as motion parallax and stereopsis. While motion parallax and stereopsis together worked best, motion parallax alone resulted in better performance than stereopsis alone [6, 143, 144]. Other research also found that motion parallax enhanced presence [81], improved comfort [88], and decreased rotation error [87].

A number of studies have been conducted to compare users' performance among FTVR, CAVE and HMD. Qi et al. compared the performance of HMD and FTVR using simulated volumetric data [106]. They found that FTVR provided better accuracy at judging the shape, density and connectivity of objects. Demiralp et al. compared CAVE and FTVR in a scientific visualization application and found users performed an abstract visual search task significantly more quickly and accurately on FTVR than on CAVE [31]. Using a similar experimental setup with a different task, Prabhat et al. found the opposite result: participants performed significantly better on CAVE and uniformly preferred CAVE [104]. The mixed results suggest that user performance is task related. However, most of these user evaluations have relied on fixed, single-screen FTVR displays, where the user's head motion generates a limited motion parallax cue and is generally within a viewing angle of 45 degrees off-axis. Limited head motion will favor non-head-coupled rendering because the user's perspective does not significantly deviate from a default fixed perspective.

An important extension of the FTVR concept used multiple screens to construct multi-sided FTVR displays. The first multi-screen FTVR displays used three flat Liquid Crystal Display panels arranged into a convex corner [71] or three projectors projecting into a concave corner [32]. The advantage of the multi-screen display is that it allows for a larger range of head movement around the screen and therefore enhances the use of the motion parallax cue.
The concept was further refined with screens on five sides of a box. Cubee and pCubee provided the illusion of an enclosed volumetric display, allowing a viewer to view all sides of the 3D objects "inside" the display [126, 127]. However, it has been reported [85] that the occlusion caused by seams between screens discouraged users from changing their view from one screen to another. This makes the shape of the display an important form factor. Among different display shapes, the spherical FTVR display is promising as it has no seams between screens and provides an unobstructed view from all angles. Spherical FTVR displays have advanced recently with improved calibration and rendering techniques [134, 139]. These advances have enhanced the FTVR experience in 3D.

2.1.2 Projector-Based 3D Displays

Video projectors have been utilized to build scalable and immersive 3D displays. As a pioneering work, CAVE provided room-scale immersive visualization by projecting content on the six surrounding walls of a cubic empty room [29]. Following CAVE, various works have extended this concept with different display shapes, such as domes [1, 121] and cylinders [122]. As CAVE requires an empty-room setup, other work also explored the possibility of projecting view-dependent content onto dynamic surfaces in a room with furniture and objects [68].

In contrast to CAVE, which completely surrounds and isolates the user from the real environment, Raskar first introduced the idea of augmenting the appearance of everyday objects with video projections [114, 116, 117]. This approach is called Spatial Augmented Reality (SAR) [13]. It augments and enhances physical objects and spaces in the real world by projecting images onto their visible surfaces. The idea of overlaying virtual content onto real objects has been followed by various works. Mine et al. [95] used projection-based augmentation to create spatially augmented 3D objects and spaces that enhance the theme park experience. Zhou et al. [158] extended projection-based augmentation to moveable objects. Benko et al. [9] demonstrated a working implementation of a projected SAR tabletop by accounting for the deformations caused by physical objects on the table. Later they extended this approach to the entire room and provided simultaneous perspective views to two users [11]. Similar to Benko's work, Jones et al. [69] proposed IllumiRoom to create peripheral projected illusions in the living room by augmenting the area surrounding a television with projected visualizations to enhance the viewing experience.

In summary, projector-based approaches can support both VR and AR experiences in a scalable way. The primary challenge for projector-based 3D displays is the multiple-projector calibration required to produce a single continuous image across different projections.

2.1.3 Head-Mounted Displays

The head-mounted display (HMD) maximizes field of view by placing a stereo pair of displays directly in front of the eyes. Although it is only in the last five years that commercial HMD products have become widely accessible, one of the most well-known HMDs can be traced back to as early as 1968 by Sutherland [131]. Based on the reality-virtuality continuum [94], HMDs can be classified as VR and AR HMDs.
VR HMDs, such as the Oculus [54], provide a compelling 3D experience by placing the user in a fully immersive virtual environment. The result is the absence of the user's physical surrounding context and potential co-located collaborators [6], making it challenging to provide consistent perception across the real and virtual world. AR HMDs allow users to see the real world, either by mounting cameras on the display, known as video-based see-through [58], or by reflecting projected images into the user's eyes, known as optical-based see-through [61]. Because an AR HMD augments the user's view by superimposing virtual objects on the real world, it requires highly accurate registration to align the virtual object in the real environment [7], and presently only supports a limited field of view with reduced brightness caused by the additive blending [11].

A known perceptual issue for HMDs is the accommodation-convergence conflict. The user focuses on a screen close to their eyes, while their eyes converge toward 3D objects located far from the screens. The failure to present focus information correctly, coupled with the convergence, may cause motion sickness and eyestrain [142]. While this problem also exists in other 3D displays such as SAR and FTVR displays, it is believed to be less severe than in HMDs, as the content and displays both fit certain constraints, with a smaller depth interval of the 3D space rendered by the displays [83].

2.1.4 Static and Swept Volumetric Displays

Unlike other 3D displays, static and swept volumetric displays produce volume-filling 3D imagery so that each voxel emits visible light from the region where it appears. Due to this property, static and swept volumetric displays do not require supplementary head-tracking hardware to present depth cues such as stereopsis and motion parallax. As 3D objects are rendered in place via voxels, there is no accommodation-convergence mismatch. These displays have been commercially available, with relatively high technical requirements, since the early 2000s [40]. Comprehensive reviews and classifications of volumetric displays can be found in [14, 40]. Despite their capability of illuminating 3D points in their physical spatial locations, volumetric displays are in general limited in viewing angle, resolution and brightness, with display artifacts compared to other 3D displays [45].

2.1.5 Spherical Displays

A number of interactive spherical display prototypes have been presented in various forms. The Perspecta Spatial 3D System from Actuality Systems [40] utilized an embedded stationary projector to project imagery onto a rotating screen. Although it can generate volume-filling imagery, this type of system is expensive to build and has limitations in resolution and scaling. As an alternative method, some systems use rear-projection directly onto a spherical screen. Sphere [10], Snowball [15], and commercial products like Pufferfish [65] use one projector for rear-projection onto the spherical screen. While simple and fairly effective, these systems offer low resolution and lack scalability. Spheree [133] extends this approach by mounting multiple pico-projectors under the spherical screen to increase the resolution and make the display scalable.

Besides the design of the display system, various works have investigated unique properties of spherical displays in comparison to traditional flat displays. As one of the early works, Benko et al. [10] identified several unique properties of spherical displays in comparison to flat displays.
The spherical shape provides easy 360-degree access for multiple users without occlusions. The continuous nature of the curved shape provides smooth transitions both horizontally and vertically. The enclosing nature of the sphere inherently defines the virtual space within the screen. In contrast to flat displays, spherical displays do not have a fixed "sweet spot" of viewing. Following this study, Bolton et al. [16] investigated how spherical displays can support differences in the sharing of information during competitive and cooperative tasks. They did not find superiority of spherical displays in competitive and cooperative tasks. Pan et al. [99] presented a tele-presence system using a spherical display. They found the spherical display allows more accurate judgment of gaze direction in comparison to a planar display. Teubl et al. [133] designed a perspective-corrected spherical display. They integrated the spherical display with 3D design software. They found the free viewpoint provides a highly immersive sense for users. Berard et al. [12] presented a hand-held spherical display and investigated its benefits for object examination. They found the object examination task did not benefit from the accurate and precise rotations offered by hand-held spherical displays compared to planar displays. The divergent results from these studies suggest that user performance with spherical displays can be task-dependent. Though considerable work has presented various interaction techniques to leverage the nature of spherical displays, previous literature reveals few conclusive findings on the fundamental spatial perception when using spherical displays.

2.2 Visual Perception Evaluation in the Virtual Environment

Living in a three-dimensional world, we are provided with abundant information every day which helps us to perceive 3D objects and interact with them. The information we receive is inherently multi-modal, in the forms of visual, auditory and haptic feedback. In some cases we also get olfactory feedback. To improve the sensory experience in the virtual environment, all the modalities need to be considered. Among them, visual information, which stimulates the user's visual system, is by far the most common modality studied by researchers to create a high fidelity experience in the virtual environment, due to its importance in the human sensory system [17]. In this dissertation, we focus on the visual modality to create a consistent perception between the real and digital world.

This section reviews existing literature which evaluates spatial perception in the virtual environment. We first review definitions of fidelity in the virtual environment from previous literature to help understand and distinguish the visual fidelity discussed in this dissertation. Then we review existing work in terms of perceptual duality, depth cues, and size perception.

2.2.1 Definition of Visual Fidelity

Several studies have examined the fidelity of the virtual environment. Milgram discussed "reproduction fidelity" [94] in the taxonomy of mixed reality displays. The term refers to "the relative quality with which the synthesising display is able to reproduce the actual or intended images of the objects being displayed". As indicated in the reproduction fidelity dimension illustrated in Figure 2.2, this term emphasizes the visual modality by examining the implementation quality of both the hardware and software used to reproduce an image.
It is a gross simplification of a complex concept, associated with several factors such as display hardware and graphic rendering techniques, comparing an intended image to the actual image being displayed. While related, it is different from our intention of providing consistent perception between the real and virtual world.

Figure 2.2: Milgram discussed "reproduction fidelity" [94] in the taxonomy of mixed reality displays © SPIE (1995). The term refers to "the relative quality with which the synthesising display is able to reproduce the actual or intended images of the objects being displayed".

Another body of literature defined "VR system fidelity", which consists of "display fidelity", "interaction fidelity", and "scenario fidelity" [84, 93, 108, 109]. McMahan et al. first proposed the term "display fidelity" [93]. The term refers to "the objective degree of exactness with which real-world sensory stimuli are reproduced by a display system". It is distinguished from "interaction fidelity", which refers to "the objective degree of exactness with which real-world interactions are reproduced in an interactive system". The definition of "display fidelity" is broader than Milgram's reproduction fidelity because the former goes beyond the visual modality and, by definition, deals with other modalities reproduced by display systems, such as haptic feedback, although McMahan's studies primarily examined the visual aspect. In their studies [93], they found that a high level of display fidelity significantly improved users' performance as well as subjective judgments of presence, engagement, and usability.

Laha [84] and Ragan [108] further expanded McMahan's definitions with "scenario fidelity". It refers to "the objective degree of exactness with which behaviors, rules, and object properties are reproduced in a simulation as compared to the real or intended experience". It focuses on the realism of the simulated scenario and the associated model data. As indicated by the name, it is a context-related concept associated with the realism provided by multiple modalities, with a desire to allow users to perceive and interact in a simulated scenario similar to the real or intended environment. Among the three definitions of fidelity in the work of McMahan [93], Laha [84] and Ragan [108], the term "display fidelity" is closest to the visual fidelity discussed in this dissertation. However, there are two differences between them. First, display fidelity includes multiple sensory modalities while visual fidelity only deals with the visual modality. The second difference is more subtle. Fidelity in the virtual environment can be either reproductional or operational. Being reproductional means exactly replicating the full sensory experience by reproducing equivalent stimuli in the virtual environment. Being operational means evoking the same behavior from users even if the sensory stimuli are different. McMahan's definition of display fidelity emphasizes reproducing real-world sensory stimuli, while our visual fidelity is operational, aiming to evoke the same behaviors from users.
Given the limitations of current display technologies, which use pixels to depict virtual entities, one can hardly reproduce the real-world sensory stimuli, as users directly see the pixels rather than the depicted object.

Among the previous work, our visual fidelity is perhaps closest to Stefanucci's definition. Stefanucci et al. [128] defined "perceptual fidelity" as "the degree to which scenes presented on a display system are perceived in the same manner as an equivalent real world scene". As indicated by the name, perceptual fidelity emphasizes providing equivalent perception rather than generating equivalent stimuli. Therefore, a high level of perceptual fidelity means a virtual object can be perceived and interpreted in the same manner as an equivalent real object. In this dissertation, we use Stefanucci's definition of perceptual fidelity. Since perceptual fidelity can be divided into visual, auditory and haptic fidelity, we focus on the visual modality, which is a subset of Stefanucci's perceptual fidelity.

2.2.2 Perceptual Duality

When users perceive 3D objects in screen-based 3D displays with various depth cues, they are actually looking at pixels on a 2D screen. A perceptual duality exists between the object's pixels and the 3D percept such that users can either perceive the object in a 3D space, or as a 2D representation on the screen. In human vision science, a similar perceptual duality has been studied in the real environment: the inherent dual reality in paintings or photographs enables viewers to perceive a scene as 3D while at the same time seeing the flat surface of the picture [44, 46, 136]. In the virtual environment, little work has investigated this duality. Ware [142] discussed the duality of size perception in the virtual environment as "a choice between accurately judging the size of a depicted object as though it exists in a 3D space and accurately judging its size on the picture plane". Benko et al. [11] mentioned this ambiguity as object presence. Using projector-based 3D displays, they investigated whether, when visualizing without the stereo cue, users could perceive the presence of virtual objects as spatial rather than as 2D projections on the screen surface. Perhaps the most closely related work is Elner's study on the phenomenal regression to the real object in the virtual environment [35]. When matching the perspective size of a virtual object to a real object at different distances, they found participants had a tendency to report the size towards the real size, indicating that spatial perception might be related to both the standard and the perspective stimulus. As there has been limited work investigating the perceptual duality, it remains unanswered how the on-screen imagery could impact spatial perception.

2.2.3 Evaluation of Depth Cues

Figure 2.3: List of depth cues discussed in Section 2.2.3. A comprehensive review of the depth cue theory can be found in [142].

Depth cues are the visual cues that help us to perceive objects and scenes in a 3D space. There has been a large body of research investigating the way the visual system processes depth cue information to provide an accurate perception of space. A list of important depth cues is shown in Figure 2.3, including monocular cues and binocular cues. Monocular cues require only one eye to be perceived, and can be further divided into static and dynamic monocular cues.
Static monocular cues, sometimes also known as pictorial cues, are classic depth cues widely presented in paintings, photographs and computer graphics, including but not limited to: shading, shadow, occlusion, texture and linear perspective. Dynamic monocular cues are the visual cues arising from optical flow when objects or viewers are moving, such as the kinetic depth effect and motion parallax. A classic example of the kinetic depth effect is the spinning dancer illusion resembling a pirouetting female dancer [146]. It is ambiguous whether to interpret the dancer's movement as one direction or the other due to the kinetic depth effect. In comparison to monocular cues, binocular cues require two eyes to perceive, and include stereoscopic depth and eye convergence. A comprehensive review of the depth cue theory can be found in Ware's Information Visualization [142].

These depth cues have been adequately simulated in the virtual environment. Computer generated images provide pictorial cues in various applications from 3D games to movies. Advanced 3D displays utilize head-tracking and shutter glasses to provide motion parallax and stereopsis. The relative importance of these depth cues in the virtual environment has been investigated by a number of works. As an early work, Wanger et al. [141] evaluated the influence of pictorial cues on perceived spatial relations in computer generated images by examining the accuracy of matching the position, orientation and scale of two virtual objects. They found the linear perspective and shadow cues are important in the positioning and orientation tasks, and the interaction between different pictorial cues is important in the scaling task. For non-pictorial cues, Arthur et al. [6] assessed the relative importance of stereopsis and motion parallax cues in a path-tracing task using a FTVR display. They reported that motion parallax is more important than stereopsis for understanding 3D structures. Ware [144] reported similar findings in a graph-tracing task and also noted that dynamic monocular cues such as motion parallax or the kinetic depth cue can help in understanding 3D graphs. However, Arsenault and Ware found the stereopsis cue is considerably more important than motion cues for guiding hand movements in a Fitts' Law tapping task [5]. The diverging results of previous studies indicate that the effectiveness of depth cues is task-dependent.

These early works identified important cues in different tasks and provided guidelines on how to effectively apply these cues in 3D designs [142]. Nowadays, due to the advances of display technologies, most depth cues such as stereopsis and motion parallax can be easily provided in 3D displays. CAVE and FTVR displays provide the stereopsis cue via shutter glasses and high-frequency screens. HMDs provide the motion parallax cue via head-tracking. However, presenting the necessary depth cues may still not be sufficient to provide consistent perception between the virtual and real environment. In particular, depth cues such as stereopsis, accommodation and motion parallax are rendered from the screen, not directly from the virtual entity in its physical spatial location.
The spatial perception of the screen itself and the graphical image projected on the screen are inherently conflated. Hence it is important to examine the role of the screens in 3D displays to provide perceptual consistency between the virtual and real world.

2.2.4 Depth and Size Perception Evaluation

3D displays provide extra depth information, which can potentially facilitate 3D understanding. A number of works have investigated depth perception with different 3D displays such as HMDs and FTVR displays. While the HMD has been classified as an "inside-out" display, which allows viewers to look outwards from inside the display volume, FTVR displays are "outside-in" displays, which support looking inwards from outside the display volume [106]. Due to the different visualization nature of HMD and FTVR displays, perception studies of these two categories of 3D displays have focused on different aspects of depth perception. Depth perception studies using an HMD mostly investigated the egocentric distance, defined as the interval between the viewer and the virtual object [28]. Much research used a verbal estimate task [4, 75, 96] as well as action-based tasks such as the blind walking or pointing task [20, 70, 74, 96] to evaluate absolute depth perception with HMDs. Egocentric depth has been measured mostly in the action space and frequently reported to be under-estimated in the VE [118]. In contrast to the extensive research on egocentric distance, there is limited work on exocentric distance, defined as the distance between two virtual objects. As one of the early works, Arthur et al. [6] evaluated exocentric depth using a path-tracing task with a FTVR display. They reported that depth cues, such as stereopsis and motion parallax, helped users to perceive relative depth and understand the 3D graph structure. Ware et al. [144] reported similar findings in a graph-tracing study and also noted that any structured motion cue, such as head-coupled perspective, can help in perceiving depth. Grossman et al. [45] evaluated exocentric depth on a volumetric display using a depth ranking task, a collision judgment task and a path-tracing task. They found volumetric displays enable significantly better user performance than other 3D display techniques. Geuss et al. [43] found both the egocentric and the exocentric depth are under-estimated in a turn-and-walk task using HMDs. Other researchers have used depth-ranking and path-tracing tasks to evaluate exocentric depth with other 3D displays [9, 12, 126]. Being simple and straightforward, the depth-ranking task has been found to be effective for evaluating exocentric depth at closer distances.

While there has been considerable work studying depth perception with different 3D displays, the number of size perception studies in the VE is quite limited. Most of the research investigated size perception with an HMD using size-matching and size-judgment tasks. Eggleston et al. [34] found size-constancy is weak in a VE compared to the real world using an HMD. Kenyon et al. [91] further found depth cues like stereopsis and familiar environmental objects can help establish size-constancy in the VE. They used a size-matching task to measure size perception in different viewing conditions. Ponto et al. [103] investigated size perception using a shape-matching task and found that accurate perceptual calibration significantly improves size perception.
Stefanucci et al. [128] used size-judgment tasks to assess the perceived size of virtual objects. They found the size in the VE is underestimated compared to the real world. Kelly [74] investigated the re-scaling effect caused by walking through the VE, also using a size-matching task. They found walking through a VE causes rescaling of the perceived space with an HMD. Benko et al. [11] evaluated size perception with Dyadic Projected Augmented Reality and found participants are able to deduce the size and distance of a virtual projected object in a size-judgment task. Elner and Wright [35] reported a direct measure of VE visual quality in a distance and size estimation task with an HMD. To summarize, size perception in the virtual environment is measured via size matching or judgment tasks, mostly using CAVE and HMD systems, with the results showing a trend of size underestimation [28].

Despite the different 3D displays utilized in previous studies, it is promising to use similar methodologies and tasks to evaluate spatial perception in the virtual environment due to the effectiveness demonstrated by their results. Yet most previous work focuses on HMD and CAVE; there is a clear gap in perception studies for FTVR displays.

2.3 Hardware and Software Display Techniques

To provide accurate perception in the virtual environment, it is important to ensure the display renders the correct perspective in real-time based on the viewpoint, which requires hardware and software support to create the 3D visual display. In this section, we review the technology used to implement 3D visual displays, with a particular focus on the tracking and calibration techniques that provide the fidelity necessary for FTVR displays.

2.3.1 Multiple-Projector Display Calibration

Multiple-projector systems tile multiple projectors to create scalable displays, typically for large-scale screens. The challenge for multiple-projector systems lies in the stitching and blending of images from different projectors to create seamless imagery. This requires geometric and photometric calibration. The geometric calibration of multiple-projector systems uses cameras to record correspondences from the known pattern to the observed projected pattern. We summarize a few approaches that are most closely related to our effort in calibrating the spherical FTVR display described in Chapter 3.

Despite substantial work on calibration techniques for planar screens [21, 23, 110, 112], the non-linearity of the curved screen is a challenge for multiple-projector system calibration. An early work [113] presents a calibration approach for non-planar surfaces using a stereo camera pair to recover intrinsic and extrinsic parameters and reconstruct the non-planar surface. This approach has been further improved by focusing on a subset of curved screens, quadric screens, to recover a quadric transformation [115]. Another approach [49] uses physical checkerboard patterns attached to the curved display to provide the camera with a composition of 2D-mesh-based mappings. Their approach targets a class of curved surfaces that can be bent or folded from a plane. However, the use of a physical marker on the display limits the application space.

Majumder et al. proposed a series of automatic calibration approaches for non-planar screens [120–122]. Using an uncalibrated camera, they computed a rational Bezier patch for a dome screen. This approach worked for various shapes such as extruded surfaces, swept surfaces, dome surfaces and CAVE-like surfaces.
The camera is mounted on a pan-tilt unit to cover the entire display. Their approaches aim at large-scale immersive displays.

Teubl et al. developed Fastfusion [134], which automatically calibrates multiple-projector systems with different shapes of screens. Fixed warping is utilized to register imagery from the projectors, avoiding a geometry reconstruction of the display. For a spherical shape, an implicit linear assumption is made using a homography transformation between the camera and the projector. This causes observable misalignment and distortions in the overlapping areas.

2.3.2 Head Tracking Techniques

The importance of head tracking has long been appreciated in the practice of Human Computer Interaction. There is a tremendous body of literature related to tracking technologies and sensor fusion. Comprehensive surveys on head tracking and human sensing technologies can be found in [125, 132, 149]. Sensor-based head tracking requires the use of sensors such as inertial measurement units or magnetic sensors [102] mounted on the head to capture the head's movement, while vision-based tracking such as Kinect [60] requires the acquisition of head images using cameras. Both approaches have their advantages and disadvantages. Sensor-based tracking can be uncomfortable for users, as sensors are mounted on the head with physical contact, while vision-based tracking is more user friendly but suffers from configuration complexity and occlusion problems [80]. As vision-based tracking performs best with low-frequency motion and sensor-based tracking such as inertial sensing performs better for measuring high-frequency rapid motion, a number of hybrid tracking approaches have been proposed to exploit their complementary nature [125]. The configuration complexity, such as the synchronisation between multiple sensors, has made hybrid tracking challenging to apply widely. Recent advances in optical marker-based motion capture provide high-fidelity tracking with low latency. A number of commercial motion capture systems have become available and accessible [62, 66].

2.3.3 Error Analysis of 3D Displays

Several studies have analyzed the error of different 3D displays [7, 51, 92, 150]. As a pioneering work, Holloway et al. [51] analyzed different error sources for an HMD using a set of parameters. They found that the system delay, as a combination of the tracking latency (11 ms), graphic computation cost (17 ms) and video sync delay (16.7 ms) at 60 Hz, caused a significant registration error of 6 cm. MacIntyre et al. [92] presented a statistical method to estimate the error and further used the estimated error to improve the AR interface of HMDs. Bauer provided detailed analysis and approaches to estimate the tracking error in augmented reality systems [8]. For non-HMD systems, Cruz et al. discussed tracking noise and delay as error sources in the CAVE [29], with comparisons to an HMD and a traditional monitor. They found the monitor (2.5°) caused more eye angular error than the HMD and CAVE (0.5°) for large viewing distances (> 100 cm). However, for small viewing distances (< 100 cm), a tracking error of 3 cm may cause up to 5° of eye angular error in the CAVE. Kindratenko investigated tracking error in the CAVE and summarized techniques to correct the location error and the orientation error [79]. Vorozcovs examined sources of error and discussed techniques to reduce these errors in CAVE-like systems [137].
While most of this work focuses on HMD and CAVE systems, limited consideration has been given to FTVR displays. The error analysis of FTVR displays, and how different sources of error contribute to the visual error, remains an open question. As technologies have advanced with better tracking and rendering support, we expect current 3D displays to have less visual error compared to the literature.

2.4 Summary

To summarize, we presented a review of 3D display techniques and discussed their strengths and weaknesses in support of high visual fidelity. Existing work discussed in Section 2.1.1 shows the promise of FTVR displays to bridge the perceptual gap between the real and virtual world. Combining FTVR with a spherical display is promising to further enhance the 3D experience with the unique properties of the spherical screen. Yet spherical FTVR displays have challenges from both technical and perceptual standpoints.

We reviewed the technical challenges (Sections 2.3.1 and 2.3.3) and the perceptual issues (Sections 2.2.2 and 2.2.4) in the virtual environment. The primary technical challenge for multi-screen FTVR displays is to stitch and blend display content from multiple screens to create a seamless display. Even small inaccuracies and errors induced by the display or tracker may break the 3D illusion and cause visual artifacts. The error analysis of FTVR displays, and how different sources of error contribute to the visual error, remains unanswered.

While considerable work has presented spherical display prototypes, previous literature reveals few conclusive findings on the fundamental spatial perception when using spherical displays. Like other screen-based 3D displays, FTVR displays create 3D illusions by rendering view-dependent imagery on 2D screens. When users perceive 3D objects, they are actually looking at pixels on a 2D screen. A perceptual duality exists between the on-screen pixels and the 3D percept such that users can either perceive the object in a 3D space, or as a 2D representation on the screen. Yet there has been very limited work investigating this perceptual duality and its potential influence on size perception.

To clear these obstacles in support of high visual fidelity, we have identified four areas in which to contribute to this thread of research, two as technical contributions and the other two as human factor contributions. They are: (1) an automatic calibration approach to blend multiple screens, (2) an error analysis of a spherical FTVR display to establish design guidelines, (3) an evaluation of spatial perception in a spherical FTVR display and (4) an evaluation of perceptual duality in FTVR displays.

Chapter 3

Multiple-projector Spherical Display Calibration

In the previous chapter, we outlined a number of interesting and unique properties of spherical displays identified by existing work, making them a promising platform to support high fidelity of visual experience in the virtual environment. In particular, the capability to provide an unobstructed view from all angles is ideal for creating FTVR visualizations. Providing high resolution, uniformly spaced pixel imagery on the spherical screen is important for constructing Fishbowl VR displays. One approach is to tile multiple projectors on the spherical screen to increase the resolution and make the system scalable. The challenge for this lies in the stitching and blending of images from different projectors to create seamless imagery.
This requires geometric and photometric calibration of the multiple-projector system.

3.1 Introduction

Geometric calibration of a multiple-projector system typically uses a camera to record correspondences from the known pattern to the observed pattern. Systems with a planar screen take advantage of 2D homography transformations to linearly establish the projector-to-display correspondences. While there has been substantial previous work on calibration techniques for planar screens [21, 23, 110, 112], automatic calibration for a curved screen has not received as much attention. A few have investigated approximate correspondence through 2D parameterization, either with a linear approximation [134] or with physical markers on curved screens [49]. Others have attempted to recover the 3D geometry of the display to establish the mapping [2, 113, 115, 155], but this usually requires a substantial amount of manual interaction.

In addition, previous work has primarily targeted large-scale immersive displays like domes to create a sense of immersion. These displays consist of multiple front-projecting projectors and cameras with pan-tilt units to cover the entire display. For a relatively small-scale desktop FTVR display, projectors are used in a rear-projection configuration through a small projection hole at the bottom of the spherical screen [10, 133, 155]. Existing calibration methods will not work because the camera's view is mostly blocked by the edge of the projection hole. Applications in FTVR displays require accurate geometry registration to support perspective-corrected viewpoints and subtle interactions in real-time.

Figure 3.1: Our goal is to calibrate a multiple-projector spherical display with a single camera to allow for a seamlessly blended image. For spherical rendering (left), blending is the most important issue. We target the application of a Fishbowl VR display (right), which uses single-person perspective-corrected rendering and requires a more accurate calibration method to provide a higher quality experience to the user.

Taken together, the configuration of the Fishbowl VR display makes the calibration challenging in the following aspects:

• Scale: The Fishbowl VR display is a desktop system with a small-scale screen compared to large-scale immersive displays. Space is quite limited for the camera and projectors beneath the screen.

• Visibility: The camera view will be occluded by the edge of the small projection hole at the bottom.

• View-dependent applications: The calibration results will support view-dependent applications by generating perspective-corrected imagery in real-time.

This chapter provides an automatic calibration method that meets these requirements and supports applications in a desktop Fishbowl VR display. We start with a semi-automatic approach that solves these problems. This approach begins with a pre-calibration step that outputs the intrinsic parameters of the camera and the projectors. Then each projector is paired with the same camera to form a stereo pair. For each pair, a pattern projected onto the display is captured by the camera to recover extrinsic parameters via the essential matrix. Using the intrinsic and extrinsic parameters, we triangulate the projected features and compute the sphere's pose via Weighted Least Squares. The parameters of the sphere (pose) and the camera/projector pairs (intrinsics and extrinsics) are further refined via a nonlinear optimization.
Finally, we recover the 3D position of each pixel on the display surface via ray-sphere intersection for each pixel per projector.

Up to this point the approach is semi-automatic because it requires additional work to calibrate the intrinsics of the projectors. We further improve this approach by avoiding the separate calibration of the projectors: by estimating the fundamental matrix using the projected pattern on the sphere, we recover the absolute dual quadric for each projector, which is then used to recover the intrinsic parameters of the projectors.

Figure 3.2: left: Multiple-projector spherical display layout with an example of two projectors (P1 and P2) showing overlapping back-projections with respect to the camera. right: Projected blob patterns on the spherical display surface observed by the camera, for each of two projectors.

Figure 3.3: Calibration pipeline of the semi-automatic and automatic approaches for a desktop Fishbowl VR display. For the semi-automatic approach, the projectors and camera are calibrated in the pre-calibration step. For the automatic approach, only the camera is calibrated in the pre-calibration step. An additional step of blending (intensity normalization), which adjusts the intensity of the overlapping area between adjacent projections, follows the ray-sphere intersection and is based on [113].

We also introduce a practical evaluation method using the camera to estimate the accuracy of our approach, using on-screen metrics instead of reprojection error. We can measure the misalignment by matching points and lines between the observed pattern and the expected pattern.

3.2 Calibration Approach

Our Fishbowl VR system consists of multiple projectors and a spherical screen. As shown in Figure 3.2 (left), the projectors work in rear-projection mode through a small projection hole at the bottom of the spherical screen. Perspective-corrected images are generated on the sphere based on the viewer's position.

The calibration approach we propose is presented in Figure 3.3. We start with a semi-automatic approach that supports applications in a desktop Fishbowl VR display. Then we enhance the method to make it automatic by avoiding the separate calibration of the projectors. This is realized by recovering the absolute dual quadric for each projector using the fundamental matrix.

3.2.1 Semi-Automatic Calibration Approach

Pre-Calibration

The camera and the projectors are pre-calibrated to determine the intrinsic parameters. The camera is calibrated using a standard checkerboard calibration approach [151] to estimate the nine intrinsic parameters, including focal length (2), principal point (2) and lens distortion (5). The checkerboard-based calibration approach [151] typically yields a re-projection error of one pixel. Each projector is calibrated using a plane-based calibration approach [39] with the help of the calibrated camera to estimate the four intrinsic parameters, including focal length (2) and principal point (2). The plane-based calibration approach [39] typically yields a re-projection error of 1-2 pixels. After this step, the intrinsic parameters of the camera and projectors are recovered.
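As a concrete illustration of the camera part of this pre-calibration step, the following is a minimal sketch using OpenCV's standard checkerboard routine. It is illustrative only and not the code used in this work; the board dimensions, square size and file names are assumptions.

```python
# Minimal sketch of checkerboard camera pre-calibration with OpenCV.
# Board size, square size and file names below are assumptions.
import glob
import cv2
import numpy as np

board_size = (9, 6)     # inner corners per row/column (assumed)
square_size = 25.0      # checkerboard square size in mm (assumed)

# 3D corner coordinates on the board plane (z = 0).
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for path in glob.glob("checkerboard_*.png"):        # hypothetical captures
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)

# Returns the RMS re-projection error, the 3x3 intrinsic matrix and the
# five distortion coefficients (plus per-view extrinsics).
rms, K_cam, dist_cam, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("camera RMS re-projection error (px):", rms)
```

The RMS value returned by cv2.calibrateCamera corresponds to the roughly one-pixel re-projection error quoted above; the plane-based projector pre-calibration [39] follows a similar pattern with the calibrated camera in the loop and is not shown here.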
Pair Calibration

Each projector Pi and the camera C are paired as a stereo pair Si, as shown in Figure 3.2 (left). Although we have the pre-calibrated intrinsics for each pair, the extrinsics are still unknown. In this step, we project blob patterns onto the spherical screen, detect them as blob features in the camera, and record feature correspondences for each pair, as shown in Figure 3.2 (right).

Using these correspondences, the essential matrix Ei can be recovered, since we know the intrinsics for the pair Si. We then extract the rotation Ri and translation Ti from the essential matrix Ei using Singular Value Decomposition. Although there are four possible solutions for the calibrated reconstruction from Ei, only one solution is the correct reconstruction that has the 3D points in front of both C and Pi. Thus, testing a single point to determine whether it lies in front of both C and Pi is sufficient to select the correct solution [48] for Si. The camera center is chosen to be the origin for all projectors and the camera, with all three axes aligned with the camera.

However, the translation vector Ti is recovered only up to scale, so the coordinates between pairs are still up-to-scale. To solve this problem, we choose one pair S0 as the "standard" pair, assumed to have a translation with norm 1, and then estimate scale factors for the other pairs with respect to S0. These scale factors are computed using the knowledge that points from all pairs lie on the same sphere. We first triangulate the blob features for each pair using the up-to-scale extrinsics. Then we fit a sphere for each pair, and compute each scale factor using Linear Least Squares based on the recovered sphere poses (center position and diameter) from S0 and Si. After this step, each pair has extrinsics in the same camera-centered coordinates.

Sphere Pose Estimation

With intrinsics, extrinsics and 3D points in the camera-centered coordinate system, the sphere pose can be recovered by fitting a sphere to these 3D points using Weighted Linear Least Squares [130]. The weighting comes from the re-projection error in the triangulation step, so that a large re-projection error results in a small weight in determining the sphere pose.

From these steps we have calculated a full set of parameters that affect this system: intrinsics, extrinsics and sphere pose. However, parameters like the extrinsics and sphere pose are only roughly estimated. Thus, using these parameters to compute a 3D position for each pixel on the display sphere may cause significant errors. So we use these results as an initial guess for a nonlinear optimization to refine them.
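The pair calibration and sphere fitting just described can be summarized by the following sketch, written with OpenCV and NumPy. It is a simplified illustration under stated assumptions (matched blob features, camera at the origin, no outlier handling), not the implementation used in this work; all names are placeholders.

```python
# Sketch of one camera/projector pair: recover up-to-scale extrinsics from
# the essential matrix, triangulate the blob features, then fit a sphere.
import cv2
import numpy as np

def to_normalized(pts, K, dist=None):
    # Undistort and map pixel coordinates to normalized camera coordinates.
    pts = np.asarray(pts, np.float64).reshape(-1, 1, 2)
    return cv2.undistortPoints(pts, K, dist).reshape(-1, 2)

def calibrate_pair(cam_pts, proj_pts, K_cam, dist_cam, K_proj):
    xc = to_normalized(cam_pts, K_cam, dist_cam)
    xp = to_normalized(proj_pts, K_proj)
    E, _ = cv2.findEssentialMat(xc, xp, np.eye(3), method=cv2.RANSAC, threshold=1e-3)
    # recoverPose applies the cheirality test to pick the valid (R, t) of the four.
    _, R, t, _ = cv2.recoverPose(E, xc, xp, np.eye(3))
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])     # camera at the origin
    P1 = np.hstack([R, t])                            # projector, up to scale
    Xh = cv2.triangulatePoints(P0, P1, xc.T, xp.T)    # 4 x N homogeneous points
    return R, t, (Xh[:3] / Xh[3]).T                   # N x 3 triangulated points

def fit_sphere(X, weights=None):
    # Algebraic fit: |x|^2 = 2 c.x + (r^2 - |c|^2), which is linear in (c, d).
    X = np.asarray(X, np.float64)
    w = np.ones(len(X)) if weights is None else np.sqrt(np.asarray(weights, np.float64))
    A = np.hstack([2.0 * X, np.ones((len(X), 1))]) * w[:, None]
    b = (X ** 2).sum(axis=1) * w
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:3], sol[3]
    return center, np.sqrt(d + center @ center)       # center, radius
```

Fitting one sphere per pair and comparing the recovered poses is one way to estimate the per-pair scale factors relative to the standard pair, and down-weighting points with a large triangulation re-projection error gives a weighted fit of the kind used for the final sphere pose.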
Nonlinear Optimization

Parameters are refined using a non-linear optimization with the previous result serving as an initial guess. We now describe the error function we use for the non-linear optimization.

Assume we have 1 camera and N projectors. The camera parameters \vec{p}_c have 9 degrees of freedom (DOF): 4 for the focal length and the principal point, and 5 for lens distortion [33]. Each projector has parameters \vec{p}_{p_i} with 10 DOF: 4 for the focal length and the principal point, 3 for rotation and 3 for translation. The sphere parameters \vec{p}_s have 4 DOF: 3 for the center position and 1 for the radius.

For each pixel \vec{x}_{p_{ij}} in projector P_i, with the subscript j denoting the j-th pixel, a ray is back-projected and intersects the sphere at the point \vec{X}_{ij}. The back-projection and ray-sphere intersection can be expressed as a function f of the variables \vec{p}_{p_i} and \vec{p}_s:

\vec{X}_{ij} = f(\vec{x}_{p_{ij}};\ \vec{p}_{p_i}, \vec{p}_s)    (3.1)

Then the 3D point \vec{X}_{ij} is observed by the camera at pixel \vec{x}_{c_{ij}} on the image plane. This can be expressed as a function g of the camera parameters \vec{p}_c:

\vec{x}_{c_{ij}} = g(\vec{X}_{ij};\ \vec{p}_c)    (3.2)

Substituting equation (3.1) into equation (3.2), we get a function F that models this whole process:

\vec{x}_{c_{ij}} = g(f(\vec{x}_{p_{ij}};\ \vec{p}_{p_i}, \vec{p}_s);\ \vec{p}_c) = F(\vec{x}_{p_{ij}};\ \vec{p}_c, \vec{p}_s, \vec{p}_{p_i})    (3.3)

Since we know exactly which pixel \vec{x}_{p_{ij}} has been projected from projector P_i, the error function is formulated as the re-projection error in the camera:

E = \sum_i \sum_j d\left(\vec{x}_{c_{ij}},\ F(\vec{x}_{p_{ij}};\ \hat{p}_c, \hat{p}_s, \hat{p}_{p_i})\right)^2    (3.4)

where \vec{x}_{c_{ij}} is the detected point in the camera and F(\vec{x}_{p_{ij}};\ \hat{p}_c, \hat{p}_s, \hat{p}_{p_i}) is the estimated point based on the parameters \hat{p}_c, \hat{p}_s and \hat{p}_{p_i}. The analytic expressions and derivations of equations 3.1, 3.2 and 3.4 can be found in Appendix B.

For a system with N projectors, there are 13 + 10N variables to refine. We use a Levenberg-Marquardt algorithm to solve this non-linear least squares problem. The solver is initialized using our previous results.

Ray-Sphere Intersection

After refining the parameters, we compute the 3D position for each pixel on the display via ray-sphere intersection, with rays coming from each projector, by solving the quadratic equation B.4. The geometric result for each pixel is stored in a look-up table.

When there are two real solutions of the quadratic equation B.4, indicating that the ray intersects the sphere at two points, we always choose the point farther from the ray origin, as we use rear-projection through a projection hole. When there is no real solution, indicating the ray does not intersect the sphere, we store a value of (0, 0, 0)^T, meaning the corresponding pixel is off the sphere and should be discarded in rendering. In the rare case that there is one real solution, indicating the ray is tangent to the sphere, we store that solution. It should be noted that the reconstruction does not determine the scale of the display. In practice, the geometric result is normalized and stored as RGB values in an image format for each projector, as shown in Figure 3.4.

Figure 3.4: Example of calibration results stored as images for a three-projector spherical display. Black values represent off-sphere pixels and the circle pattern shows the projection hole. The 3D positions are stored in the RGB channels of the image and the intensity weight for blending is stored in the alpha channel.
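The per-pixel look-up table construction can be illustrated with the short, unoptimized sketch below (a practical implementation would vectorize the loop). Here K, R and t denote one projector's refined intrinsics and extrinsics, and center and radius the refined sphere pose; the names are illustrative, not the actual implementation.

```python
# Sketch of the per-projector look-up table: back-project every pixel,
# intersect the ray with the fitted sphere, keep the farther intersection.
import numpy as np

def build_lookup(K, R, t, center, radius, width, height):
    K_inv = np.linalg.inv(K)
    origin = -R.T @ t.reshape(3)                # projector centre in world frame
    lut = np.zeros((height, width, 3))          # (0, 0, 0) marks off-sphere pixels
    for v in range(height):
        for u in range(width):
            d = R.T @ (K_inv @ np.array([u, v, 1.0]))   # world-space ray direction
            d /= np.linalg.norm(d)
            oc = origin - center
            b = 2.0 * (d @ oc)
            c = oc @ oc - radius ** 2
            disc = b * b - 4.0 * c
            if disc < 0.0:
                continue                        # ray misses the sphere
            # Rear projection: keep the intersection farther from the projector.
            s = (-b + np.sqrt(disc)) / 2.0
            lut[v, u] = origin + s * d
    return lut
```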
Alpha Blending

To create seamless imagery on the screen, the perceived intensity in the overlapping area between adjacent projections needs to be normalized. We implemented the alpha blending approach by Raskar [113] using the geometric result of the pixels from the previous step. We create an alpha mask for each projector with a weight factor between 0 and 1 for each pixel. For the same point on the screen, the weights from all projectors sum to unity, which yields a normalized intensity. The weight A_ij for the j-th pixel of projector i can be computed as:

A_{ij} = \frac{d_i(i,j)}{\sum_{k=1}^{N} d_k(i,j)\, b_k(i,j)}    (3.5)

where d_i(i,j) is the arc distance from the j-th pixel of projector i to the nearest edge of the projected image from projector i, and d_k(i,j) is the arc distance from the j-th pixel of projector i to the nearest edge of projector k. b_k(i,j) is a binary value of 0 or 1: it equals 1 when the j-th pixel of projector i is within the frustum of projector k, and 0 otherwise.

When the j-th pixel of projector i does not overlap with other projections, b_i(i,j) = 1 and b_k(i,j) = 0 for all k ≠ i, so the weight A_ij will be 1. When the j-th pixel of projector i overlaps with other projections, the weight A_ij will be a value between 0 and 1. A_ij approaches 0 when the j-th pixel is close to the edge of the projected image from its own projector i, so that more weight is given to the pixels from other projectors at the same point on the screen. The alpha blending result for each pixel is stored in the alpha channel of the calibration result.
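Once the distance and visibility maps are available, the weight of equation (3.5) is a simple normalization, as in the sketch below. Computing the arc distances d_k and frustum flags b_k requires mapping each pixel through the shared sphere geometry and is omitted here; the array names and shapes are illustrative assumptions, not the actual implementation.

```python
# Sketch of the blending weight in equation (3.5) for projector i, assuming
# precomputed per-pixel arc distances d[k] and frustum flags b[k].
import numpy as np

def blend_weights(d, b, i):
    """
    d: (N_proj, H, W) array; d[k] is the arc distance from each pixel of
       projector i to the nearest edge of projector k's projection.
    b: (N_proj, H, W) array of 0/1; b[k] is 1 where the pixel's 3D point
       falls inside projector k's frustum.
    Returns the (H, W) alpha mask for projector i.
    """
    denom = (d * b).sum(axis=0)
    denom[denom == 0.0] = 1.0                      # off-sphere or edge pixels
    return np.where(b[i] > 0, d[i] / denom, 0.0)   # per-point weights sum to 1
```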
To summarize, our semi-automatic method uses the projector and camera intrinsics as prior knowledge to estimate the extrinsic and display pose parameters. These estimates are then used as an initial guess in a nonlinear optimization to refine the result. In this process, the camera and projectors are calibrated once and can be reused afterwards. If these devices are disturbed, the system can be re-calibrated by projecting blob patterns and re-running the non-linear optimization. However, as described, this semi-automatic approach still requires manual work in the pre-calibration step to calibrate the intrinsics of the projectors. We now describe our automatic approach, which recovers the projector intrinsics together with the extrinsics directly from projected patterns on the spherical display.

3.2.2 Automatic Calibration Approach

In this section we revisit the calibration pipeline in Figure 3.3 and present techniques to make the workflow automatic.

Pre-Calibration

As illustrated in Figure 3.3, only the camera's intrinsics are determined for use as prior information, in the same way as described in Section 3.2.1.

Pair Calibration

In this step, we first determine the internal projector parameters directly from the uncalibrated images. This is essentially an auto-calibration problem described by Hartley and Zisserman [48]. For each pair Si, we project the same blob pattern as in the semi-automatic approach. The fundamental matrix F_i is recovered using these correspondences. We obtain a projective reconstruction for each pair by choosing the projection matrices as P_0 = [I | 0] and P_i = [[e'_i]_\times F_i | e'_i], where e'_i is the epipole in the projector view, P_0 belongs to the camera and P_i to the projector. The reconstruction {P_i, X_i} for each pair is defined only up to a projective transformation. Our goal is to find the projective transformation H_i such that {P_i H_i, H_i^{-1} X_i} is a metric reconstruction, defined up to a similarity transformation. As described by Hartley and Zisserman [48], H_i can be expressed in the form:

H_i = \begin{pmatrix} K_c & 0 \\ v^T & 1 \end{pmatrix}    (3.6)

where K_c is the intrinsic matrix of the camera. Since K_c is known, the only unknown is the vector v^T with 3 DOF. Because v^T = -p^T K_c, where p^T are the coordinates of the plane at infinity, this is essentially the problem of recovering the plane at infinity, p^T, for each pair.

With further information, such as vanishing points, p^T can be recovered. However, vanishing points or parallel lines can hardly be observed in our case because the screen is curved. In most projectors today the principal point is vertically offset at the bottom-center of the image, with zero skew between axes, so that the projection is not occluded by the table [97, 111]. We use the assumptions of zero skew and known principal points¹ to provide the additional constraints needed to solve for p^T.

Encoding the infinity plane p^T in a concise way, the absolute dual quadric Q^*_\infty for each projector under the transformation H_i can be expressed as:

Q^*_\infty = H_i \begin{pmatrix} I_{3\times3} & 0 \\ 0^T & 0 \end{pmatrix} H_i^T = \begin{pmatrix} \omega^*_c & -\omega^*_c p \\ -p^T \omega^*_c & p^T \omega^*_c p \end{pmatrix}    (3.7)

where \omega^*_c = K_c K_c^T is the dual image of the absolute conic (DIAC) of the camera [48]. Each Q^*_\infty is related to the projector intrinsics in the form:

\omega^*_{p_i} = P_i Q^*_\infty P_i^T    (3.8)

where \omega^*_{p_i} = K_{p_i} K_{p_i}^T is the DIAC of the projector and P_i is the reconstructed projection matrix of the projector obtained from the fundamental matrix.

In this way, constraints on the projector intrinsics can be transferred to constraints on Q^*_\infty. Our constraints are the known principal points and zero skew. This results in three linear constraints on Q^*_\infty for each projector:

\omega^*_{p_i}(1,3) = pp_x\, \omega^*_{p_i}(3,3)
\omega^*_{p_i}(2,3) = pp_y\, \omega^*_{p_i}(3,3)    (3.9)
\omega^*_{p_i}(1,2) = pp_x\, pp_y\, \omega^*_{p_i}(3,3)

where pp_x and pp_y are the known principal point coordinates. This provides a direct solution for p^T. Once we recover p^T, we can get the DIAC and hence the intrinsics of each projector.

¹ For cases in which the principal point is not at the bottom-center, a general camera matrix is necessary. This is handled by the semi-automatic method, in which the projector intrinsics are pre-calibrated.

The rest of the method is the same as in our semi-automatic approach: estimate the extrinsics using these intrinsics, then triangulate and fit a sphere to find the sphere pose, followed by a nonlinear optimization to refine these parameters.

By doing these steps, we avoid the manual work of calibrating the projectors, and the whole calibration can be carried out by projecting blob patterns and detecting the projected features automatically.

3.3 Evaluation

While various works have proposed calibration methods for curved screens, none of them have provided an evaluation of on-surface accuracy. Raskar et al. [115] evaluated their methods using the root mean square (RMS) re-projection error; others used simulation to estimate percentage errors of the estimated camera and display parameters [120–122]. In this work, we propose an evaluation method that estimates on-surface accuracy empirically. We use this to evaluate our calibration approach and compare variations of our approach. We also include a comparison with other work [115, 155] based on the RMS re-projection error.

3.3.1 Metrics

We use three metrics to evaluate the result of calibration: global point error, local point error and line error. These three metrics are chosen to characterize the visual artifacts and distortion caused by calibration inaccuracy, with the point errors describing the severity of the shadow effect and the line error representing the non-linear distortion caused by the curved display. The fidelity of calibration is encapsulated in these metrics, with the goal of creating a seamless and undistorted imagery on a curved screen.

Global point error describes the overall misalignment of the display. We define the global point error to be the displacement between the expected and the actual position of a projected point. There are two units that can be applied to point error: one is the RMS re-projection error in pixels; the other is the arc length, in millimeters for example, directly on the spherical screen. Because the arc length varies with the size of the spherical screen, we use radians on the sphere to describe on-surface misalignment.

Local point error describes the local misalignment between adjacent projectors. The effect of local point error is usually observed as a ghosting effect in the area where projections overlap. We define it as the displacement between a point from one projector and the same point from its adjacent projector. Similar to the global point error, we use both the RMS reprojection error in pixels and the on-surface error in radians to describe the local point error.

Line error is used to describe the distortion of the overall display.
The effect of line error is observed as distortion (i.e. straight lines appear to be curved). This is important for a FTVR system, because the distortion will cause perceptual discrepancies depending on the viewpoint. We define the line error to be the angular difference between the expected lines and the observed lines. Ideally the projected line should be collinear with the expected line.

Figure 3.5: (left) Camera's view of projected grid patterns from two projectors (red and blue). The expected grid pattern is drawn with black dashed lines and the observed projected pattern is drawn with solid white lines. Purple arrows illustrate the local point error in the overlapping area. (right) Back-projection of the pixel error to estimate the on-surface error.

3.3.2 Error Measurement

To measure the error, a camera is introduced into the system to observe projected patterns on the spherical screen. The camera is regarded as a virtual viewer with known pose (relative to the display) that observes certain patterns. For example, if a grid pattern is expected from the viewpoint of the camera, then the color value for each pixel in the projector can be determined by projecting its associated 3D position onto the image plane of the camera. Ideally, the camera will observe a grid pattern which is exactly the same as the expected pattern, regardless of the curvature of the spherical screen. We then use the expected grid pattern as ground truth and compare it with the actually observed pattern. The point error is computed based on the locations of crossing features in the grid pattern, while the line error is computed based on the angle between the projected line segment and the expected line segment at each crossing feature.

Ideally, a camera should be placed outside the display, representing a user who is viewing the display. In practice, we utilize the same camera used for calibrating the spherical display, since this camera has already been calibrated with known intrinsic and pose parameters relative to the display. As the camera is placed underneath the display, it should be noted that this setup differs from the configuration of the idealized measurement.

Figure 3.5 shows the global point error with dashed black lines as ground truth and solid white lines as the actual observations from two projectors, with their corner features marked by red and blue circles. Crosses and line segments in the image are detected using template matching. The global point error is computed based on the displacement in pixels between the black and white crosses. Shown as arrows, the local point error is computed based on the displacement in pixels between crosses from adjacent projectors in the overlapping area. Finally, the line error is computed based on the slope difference between the observed white lines and the expected black lines.

Point errors are evaluated in the form of RMS reprojection error in pixels. Despite being simple and effective, the computed pixel error does not directly predict the on-surface registration error [115]. To acquire an estimate of the on-surface error, we back-project the displacement in the image onto the spherical screen and compute the arc length on the screen, as shown in Figure 3.5. As the on-surface error varies with the size of the display, the arc length is computed in radians.
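The back-projection of a pixel displacement to an on-surface error can be sketched as follows, using the calibrated camera intrinsics and sphere pose. The choice of the intersection farther along the ray reflects the camera sitting inside the sphere in our rear-projection setup, which is an assumption of this sketch; all names are illustrative.

```python
# Sketch: convert an expected/observed pixel pair in the camera image into
# an on-surface error, as a central angle in radians and an arc length.
import cv2
import numpy as np

def backproject_to_sphere(pixel, K_cam, dist_cam, center, radius):
    # The camera is at the origin of the camera-centred calibration frame.
    xn = cv2.undistortPoints(np.array([[pixel]], np.float64), K_cam, dist_cam)
    d = np.array([xn[0, 0, 0], xn[0, 0, 1], 1.0])
    d /= np.linalg.norm(d)
    b = -2.0 * (d @ center)
    c = center @ center - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None                              # ray misses the sphere
    s = (-b + np.sqrt(disc)) / 2.0               # visible root for a camera inside the sphere
    return s * d

def on_surface_error(expected_px, observed_px, K_cam, dist_cam, center, radius):
    Pe = backproject_to_sphere(expected_px, K_cam, dist_cam, center, radius)
    Po = backproject_to_sphere(observed_px, K_cam, dist_cam, center, radius)
    ue, uo = (Pe - center) / radius, (Po - center) / radius
    angle = float(np.arccos(np.clip(ue @ uo, -1.0, 1.0)))
    return angle, angle * radius                 # radians, and arc length in scene units
```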
3.3.3 Implementation

We implement the Fishbowl VR system with two mini-projectors and an acrylic spherical screen. The spherical screen has a diameter of 30 cm and a projection hole of 21 cm diameter. The projectors are ASUS P2B [53] with a resolution of 1280 x 800. A host computer with an NVIDIA Quadro K5200 graphics card sends rendering content to the projectors. The rendering content is created using OpenGL. We employ head-tracking to generate perspective-corrected views for applications in FTVR. The viewer is tracked using a Polhemus Fastrak [64]. A two-pass rendering approach [113] based on head position is chosen due to the non-linearity of the curved screen. We used a single gray-scale camera from FLIR Systems (Flea3 FL3-U3-13Y3M-C) [55] with a camera lens (Fujinon 3.8-13mm F1.4) from Fujifilm [56].

The calibration approaches are implemented using OpenCV [98] and the nonlinear optimization is implemented using Alglib [105]. The semi-automatic approach takes about 20 minutes and the automatic approach takes about 3 minutes to calibrate a two-projector system. Our approach supports view-dependent and view-independent applications in Fishbowl VR displays. Figure 3.1 (left) shows a view-independent application after calibration and blending. The earth image is stitched seamlessly from different projectors. Blending is implemented using an alpha mask technique [113]. Figure 3.1 (right) shows a view-dependent application. The viewer is tracked and presented with perspective-corrected images using our calibration result.

3.3.4 Result

We compare the results of the semi-automatic and automatic approaches in Table 3.1. For both, we see substantial improvement after the nonlinear optimization. For the semi-automatic approach, the initial guess is computed using pre-calibrated projectors with more accurate intrinsic parameters; hence it has much less error than the initial guess of the automatic approach. This also explains a slightly smaller error after refinement compared with the automatic approach, although the difference is very small. For the automatic approach, although the error before optimization is quite large, the results are largely improved by the nonlinear optimization. As a result, the final result of the automatic approach is close to that of the semi-automatic approach but requires no manual interaction.

Figure 3.6 shows the comparison between algorithms with respect to the on-surface error. We convert the radian error in Table 3.1 to arc length in millimeters on a 30 cm diameter sphere. Both the semi-automatic and the automatic approach achieve accurate registration: the on-surface point error is less than 1 mm and the line distortion is less than 1°. Our approaches are appropriate for a Fishbowl VR display where viewers are usually 40 cm to 60 cm away from the screen, with an eye angular error of no more than 0.14° [155].

Approach            Projector     Global point error     Line error   Local point error
                                  pixel      radian       degree       pixel      radian
Semi-auto           Projector 1    3.9093    0.0107       1.2610        5.6768    0.0167
                    Projector 2    5.7653    0.0149       1.4643
Semi-auto with NLO  Projector 1    1.2556    0.0036       0.8024        1.7165    0.0052
                    Projector 2    1.4586    0.0053       0.8465
Automatic           Projector 1    7.1373    0.0241       1.5683       15.7708    0.0453
                    Projector 2   10.5067    0.0294       2.0423
Automatic with NLO  Projector 1    1.6159    0.0051       0.8268        1.8222    0.0056
                    Projector 2    1.7965    0.0064       0.9298

Table 3.1: Comparing the results of our semi-automatic and automatic approaches before and after nonlinear optimization (NLO) in terms of global point error, local point error and line error. The local point error is an inter-projector measure and is therefore reported once per approach. Measurements come from our implemented system with two mini-projectors. On-surface error is expressed in radians on the sphere.

Figure 3.6: Comparing results of the semi-automatic and automatic approaches before and after nonlinear optimization (NLO).
The global point error describes the distance between the expected point and the observed point for each projector, while the local point error describes the inter-projector pixel distance in the overlapping area. On-surface error is estimated as the Euclidean distance in millimeters on a 30 cm diameter sphere.

The result is also compared with previous work based on the RMS re-projection error in pixels. Existing works have used different metrics and setups to evaluate the accuracy of calibration, such as the percentage error of parameters [121] for large-scale screens [122] as well as in simulation [120]. We chose Raskar's work [115], which used a hemispherical dome with a diameter of 1.5 meters, to compare the RMS re-projection error in pixels. Our proposed automatic approach has an error of 1.706 pixels and the semi-automatic approach has an error of 1.357 pixels. Our error is higher than that of Raskar's work [115], with a re-projection error of 0.863 pixels, but lower than our previous work at 2.064 pixels [155], though all approaches are within the same scale. Comparisons are made in each group's own setup. In particular, the setup of Raskar's work [115] allowed the camera to observe the entire curved screen, which differs from our setup with its challenge of limited visibility from the camera's view.

Figure 3.7: Observed grid pattern by the camera using the semi-automatic approach (a) before nonlinear optimization and (b) after nonlinear optimization. Observed grid pattern using the automatic approach (c) before nonlinear optimization and (d) after nonlinear optimization.

3.4 Discussion and Limitations

In this section, we discuss factors which may affect the results as well as the limitations of the proposed calibration approaches.

Scalability: Though we evaluate the approach with only two projectors, the result generalizes to more than two projectors. Adding new projectors to the system requires adding another stereo pair, directly registered to the world coordinate system, for each projector. Because each projector is registered independently, this does not cause cascading error as the number of projectors increases. In practice, we applied this approach to a 12 inch three-projector system as well as a 24 inch four-projector system, as discussed in Chapter 4. The result of the rendering is provided in the accompanying video (https://youtu.be/mtPR57DEMY8).

Visibility: One of the challenges in calibrating the Fishbowl VR system is the visibility of the camera. While it is possible to use multiple cameras outside the sphere which can potentially see the entire spherical surface, the setup and calibration of such a multi-camera system would also introduce a lot of work. In practice we used a single grayscale camera with a camera lens, as described in Section 3.3.3, with a 97° horizontal view. We observed more error after calibration on the portion of the screen invisible to the camera compared to the visible portion. To maximize the visibility from the camera and potentially increase the accuracy, we suggest using a camera lens with strong distortion, such as a fish-eye lens, to calibrate the system.

Shape: While this work focuses on the spherical shape, the calibration approach can be applied to other shapes. For geometric primitives with prior knowledge of the shape, such as a sphere, plane or cylinder, we can replace the sphere equation B.4 with other primitive equations, which requires modifications in the steps of sphere pose estimation, nonlinear optimization and ray-sphere intersection in Figure 3.3.
For arbitrary shapes, in additionto the above modification, the display surface needs to be reconstructed inthe step of pair calibration after recovering the 3D coordinates of projectedpatterns via triangulation. However this can only recover visible portionof the display surface from the camera’s view without using any geometryassumptions. It is also worth noting that in practice a perfect sphericalscreen can be hardly found. To some extent the physical screen will deviatefrom the geometric assumption. Depending on the quality of the screen,1https://youtu.be/mtPR57DEMY8443.4. Discussion and Limitationswhen the deviation is large, the nonlinear optimization may converge toa local minimum with noticeable error. In that case, one may considerreconstructing the surface without using the geometric assumption. Futurework could consider using simulation to evaluate the calibration approachby introducing distortions of the spherical screen similar to [121].Uniform error: Among all the stereo pairs, there will always be a “stan-dard” pair that has theoretically minimum error. As shown in Figure 3.6,the error in Projector 1 is always smaller than Projector 2, regardless ofthe metric used. This is due to the scale ambiguity of projector extrinsicparameters. When recovering extrinsic parameters for each pair, we chooseone pair as the “standard” pair that has a translation with norm 1; then weestimate scale factors for other pairs with respect to the “standard” pair.The estimation of scale, however, is not accurate since we use the estimatedsphere pose in each pair to get a linear solution of the scale as the initialguess, while those estimated sphere poses already contain errors from theprevious step. For future work, we suggest an improved method will esti-mate the sphere pose together with all scale factors using a nonlinear leastsquare solver.Robustness: While the automatic approach can generate results veryclose to the semi-automatic one, the former is more sensitive to noise. In oursystem, if the fundamental matrix has not been estimated correctly due toincorrect feature correspondences or bad lighting, the automatic approach ismore likely to fail than the semi-automatic approach. So a trade-off has beenmade between robustness and priors, which should influence the decision onwhich to use.Uncalibrated camera: It is possible to calibrate the system without cal-ibrating the camera. The initial guess of camera intrinsics like focal lengthcan be acquired via EXIF tags of the capture image [121]. However, due tothe small projection hole in our system, we use a camera with strong lensdistortion to have a wider view on the spherical screen. The focal lengthis also adjusted to make sure that the unblocked portion is in focus. So inour case it is not appropriate to calibrate the system without having cameraintrinsic parameters as priors even with the help of nonlinear optimization.Lens distortion: For simplicity, we did not include the lens distortion ofprojectors in the calibration pipeline. In practice, we did not observe no-453.5. Summaryticeable error with the projectors we used (described in Section 4.1). 
Butadding the lens distortion as additional parameters in the nonlinear opti-mization would ideally increase the accuracy of calibration and should beconsidered in the future work.Evaluation metrics: Although direct comparison with existing work us-ing proposed metrics is preferable, limitations inherent in the type of displaywe are targeting differ from the situations covered by other work, which hasbeen identified as scale, visibility and perspective-correction in Section 3.1.These practical limitations of our multi-projector system are the primarymotivations to create the proposed approach. Meanwhile, we include ouradditional proposed metrics and evaluation method for use by future re-searchers that have fewer limitations in their systems.Color calibration: This work mainly focuses on the geometry calibrationof a multi-projector spherical display with the assumption that the samemodel of projectors would be employed in a single setup with negligible inter-projector color difference. Future work should consider this color differenceand propose additional color calibration steps to support a consistent viewwith considerations from both geometry and color aspects.3.5 SummaryIn this chapter, we presented an automatic calibration approach, as wellas a practical evaluation method for a spherical multiple-projector display.We identified practical problems in calibrating the display to support theapplication of Fishbowl VR displays. Our proposed approach solves theseproblems and achieves less than 1 mm on-surface point error and less than1◦ line error using a 30 cm diameter spherical screen. Although our cali-bration approach is for a spherical multiple-projector Fishbowl VR display,it can be applied to other multiple-projector systems. It should be notedthat the proposed calibration approach does not determine the scale of thedisplay. We see the estimation of the scale factor relative to the real worldas a separate problem which will be addressed and discussed as a viewpointcalibration procedure [139] in Chapter 4.46Chapter 4Visual Error Analysis forFishbowl VR DisplaysWith a display calibration approach as described in Chapter 3, it is possibleto reconstruct the display surface which can effectively support seamless ren-dering on the spherical screen. To render the perspective-corrected imageryin real-time, it further requires a fast and accurate tracking system to trackthe viewpoint. However, it is not clear what requirements and specificationsof the display and tracking system in order to create an effective FTVR sys-tem. Even small inaccuracies induced by the display or tracker may breakthe 3D illusion and cause visual artifacts. These artifacts include, but arenot limited to: distortion (straight lines appear to be curved), ghosting ef-fect (doubled image) and floating effect (virtual objects that are supposed tolocate at a fixed location appear to swim about as viewer moves head) [51].This requires the understanding of how error sources can cause inaccuraciesand visual artifacts.In this chapter, we present the design and implementation of FishbowlVR display with main results of an end-to-end error analysis. Previous chap-ter presents a display calibration approach to generate a seamless imagerywithout considering the viewpoint. In this chapter we introduce the view-point to the system and analyze the visual error from the viewpoint. Whilemany FTVR systems have been proposed, few work has included formalanalysis of the visual error. 
It is important to understanding the errors inFTVR displays for the following reasons. First, characterizing the natureand sensitivity of the errors helps to eliminate visual artifacts and providecorrect perspectives. Second, when building the system and deciding cali-bration approaches, one may expect different criteria for system componentsbased on the application. Error analysis provides guidelines when choosingthese components.474.1. Fishbowl VR Display4.1 Fishbowl VR DisplayThis section introduces in details the components of the Fishbowl VR dis-play including the hardware such as projectors and screens, as well as thesoftware, such as the graphics rendering approach and the display calibra-tion approach. A diagram of the Fishbowl VR display with its hardwarecomponents is illustrated in Figure 4.1(left). A photo of a Fishbowl VRprototype with virtual fruit models is shown in Figure 4.1(right). Furtherimplementation details can be found in [37].Figure 4.1: (left) Diagram of the Fishbowl VR display with its main hard-ware components. (right) Virtual 3D fruit models shown on a 12 inch spher-ical display prototype.Spherical screen: Our display uses multiple projectors rear-projectingonto a spherical surface. The inner surface of a plexiglass sphere is coatedwith a translucent projection paint (created by B Con Engineering, Inc.).The imagery is projected on the inner surface of the screen with a thicknessof 4 mm. A hole in the bottom of the sphere allows for rear-projection ontothe inner surface of the sphere from projectors mounted below as shown inFigure 4.2. The hole size is a trade-off between projector coverage, projectorplacement and roundness of the final display. We found that a hole size of75% of the diameter worked well for the projectors we use. We used twosurface configurations: a 12 inch diameter sphere with a 9 inch diameterhole (Figure 4.2, left), and a 24 inch sphere with an 18 inch diameter hole(Figure 4.2, right).484.1. Fishbowl VR DisplayFigure 4.2: A 12 inch diameter spherical prototype built with three projec-tors (left) and 24 inch diameter spherical display built with four projectorssetup (right).Projector and chassis placement: Projecting onto a curved surface canraise a few concerns. First, consumer projectors have a flat focal plane anda shallow depth-of-field; therefore the corners of the projector when pro-jected onto the sphere can be a bit out of focus. Second, the perceived lightintensity of the projectors is somewhat view dependent, which can causevariation in projector brightness as one moves around the display. In prac-tice, we have found that these optical issues are not noticeable, particularlyfor scenes that do not have a bright background. Chassis can be made froma table with a circular hole cut in the table top to allow rear-projection fromthe projectors mounted on a lower shelf below the tabletop. The placementof projectors should allow projectors to cover the spherical surface as muchas possible and each projector overlaps its neighbor by ∼10%.The optimal number of projectors depends on the size of the surface andthe desired surface resolution. Covering the surface of the sphere with moreprojectors means that each projectors’ visible patch on the surface will besmaller and therefore the effective resolution will be higher. We show twoeffective configurations: a 12 inch diameter sphere with three projectors(Asus P2B [53], 1024 x 768 resolution at 120Hz) at ∼63.49 pixels per inch494.1. 
Fishbowl VR Display(PPI), and a 24 inch diameter sphere with four projectors (Optoma ml750s[63], 1024 x 768 resolution at 120Hz) at ∼34.58 PPI, with PPI computed asin [37]. The multiple projector approach allows the surface resolution to bescaled up by adding more projectors.While in this work the placement of projectors relative to the screen isdetermined manually, future work can look for the optimal placement ob-tained automatically with a simulation of projection to avoid manual work,which can take parameters such as screen size and desired resolution asinput, and output the optimal placement as well as number of projectorsrequired.Stereo glasses and projectors synchronization: We use short-throwmini-projectors for a compact design. To render stereo views, the projectorsrequire a refresh rate of at least 120Hz. Fortunately, there are a few optionsfor small stereo projectors, such as the Optoma ml750s [63]. We used activeshutter glasses for our setup that are controlled wirelessly with a radio-frequency (RF) signal [67]. The NVIDIA Quadro graphics card generates ahardware synchronization signal controlling the stereo glasses. It is possibleto use polarized 3D glasses to provide stereo visualization. However, wefound the spherical screen interferes with the transition of the circularlypolarized light, causing left eye and right imagery incorrectly received withthe glasses. Special care on the lens of projectors and the coating of thescreen is required to ensure the transition of the circularly polarized light.Head tracking: FTVR requires tracking a user’s viewpoint relative tothe display screens. This could be accomplished by eye or face trackingfrom video, but for robust and low latency tracking it is often done usingmarker-based head tracking. We have explored two tracking systems inour prototypes. In the early stage of the prototype, we used Polhemusmotion tracking systems [64] with electromagnetic trackers attached to thestereo glasses in the 12 inch prototype. While the tracking system hasthe accuracy of 0.7 mm and the latency of 4 msec, the tracker needs tostay within 30 inches of the electromagnetic source to ensure the trackingaccuracy [102], which constrains the movement of users. In the later stageof the prototype, we used the OptiTrack [62] optical tracking system withpassive markers attached to the stereo glasses for head tracking and activemarkers on handheld wands for manipulating virtual pointers and objectswithin the display. This provides 5 x 5 x 5 meters of tracking volume withsub-millimeter accuracy and the latency of 8 msec.504.1. Fishbowl VR DisplayFigure 4.3: Viewpoint calibration procedure to estimate the parameters ofthe camera model for each eye with respect to the display.Viewpoint calibration: To accurately register the tracked viewpoint withrespect to the display, we employed an interactive perceptual viewpoint cal-ibration approach [138]. During the calibration, a user visually aligns 2Dpatterns on the spherical surface using each eye at a time such that patternswill appear undistorted if viewed from a known calibration position as shownin Figure 4.3. This is repeated through a set of predetermined positions thatdefine the parameters of the camera model for each eye. Note that the pat-tern and predetermined positions are generated in the display coordinatesystem, while the user is tracked in the tracking coordinate system. 
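The two frames are related by a similarity transformation (scale, rotation and translation). As a rough sketch of how such a transform can be recovered when corresponding 3D points are available in both frames, the following is a generic Umeyama-style closed-form fit with illustrative names; it is not necessarily the estimation procedure used by the viewpoint calibration described here.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    such that dst ~= s * R @ src + t, in the style of Umeyama's method."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n                       # cross-covariance of the two point sets
    U, S, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U @ Vt))     # guard against a reflection
    D = np.diag([1.0, 1.0, sign])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (xs ** 2).sum() * n  # scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t
```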
The viewpoint calibration approach also estimates the similarity transformation between the display and tracking system, which contains the scale factor of the reconstructed display surface. We then use the calibrated geometric model to render the view-dependent content for each eye.

4.2 Rendering

We have developed a general-purpose rendering engine for multi-screen FTVR displays. This section describes the rendering pipeline for Fishbowl VR. More details of the general-purpose rendering engine can be found in [38].

Figure 4.4: Overview of our rendering pipeline. We generate a view frustum for each tracked viewpoint of the scene (A) and render to an off-screen texture (B). The mapping from projector-space to sphere-space is used to non-uniformly sample the rendered texture to generate the pre-warped image for each projector (C, two projectors shown). For illustration, the pre-warped image (C) is colored to show different projector regions, including: magenta regions that are not visible on the spherical surface (because they do not pass through the bottom hole of the sphere), yellow regions that are visible on the spherical surface, but not from the current viewpoint, and black regions that are alpha blended for a seamless transition in the overlap between projectors.

The rendering system is built with the Unity game engine [135]. As illustrated in Figure 4.4, we use a render-to-texture pass and texture sampling to generate perspective-corrected imagery across multiple screens, based on Raskar's work [113]. In the render-to-texture pass, a virtual camera is placed in the scene at the tracked position of the user and rendered to a texture with a normal rendering pass. For stereo rendering, there is an additional pass for the additional viewpoint. The texture is then sampled in the shader to generate a per-pixel color for each projector.

In a second pass, pixel locations in projector-space are transformed to sphere surface locations using the projector-to-sphere map computed during multi-projector calibration (as described in Chapter 3). Sphere-space locations are transformed to the viewpoint camera's clip space with the standard ModelViewProjection matrix. These locations are then used to sample the rendered texture saved from the last rendering pass. Finally, we render the sampled color from the rendered texture after applying alpha blending based on the weights computed in multi-projector calibration (Chapter 3). The per-screen texture sampling is efficient as it does not involve 3D rendering, just 2D texture-space sampling.

The above process can be summarized as a two-pass rendering pipeline:

Pass 1
  1. Place a virtual camera C at the viewpoint's position
  2. Compute the perspective projection from the camera C
  3. Render the projection to a texture T

Pass 2 (for each projector i, for each pixel p)
  1. Retrieve the 3D position Vp from a look-up table of projector i (computed as described in Chapter 3)
  2. Compute the 2D coordinate UVp by transforming the 3D position Vp into the clip space of camera C
  3. Sample the texture T using UVp to retrieve the color Colorp in RGB channels
  4. Apply alpha blending to Colorp in its alpha channel based on the intensity weight (also computed as described in Chapter 3)
  5. Render the pixel with Colorp

We chose the two-pass rendering approach because of the spherical screen. The non-linearity of the spherical screen is not naturally supported by the perspective projection in a single pass.
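To make the second pass concrete, the following CPU-side sketch mirrors the per-pixel sampling summarized above. In the actual system this logic runs in a Unity shader; the array names (lookup_xyz, alpha_mask) are illustrative stand-ins for the calibration outputs of Chapter 3, not names from the implementation.

```python
import numpy as np

def prewarp_pass(lookup_xyz, alpha_mask, view_proj, rendered_tex):
    """Second pass for one projector: map each projector pixel to its 3D point
    on the sphere (from the calibration look-up table), project that point into
    the viewpoint camera's clip space, and sample the first-pass texture."""
    h, w, _ = lookup_xyz.shape
    th, tw, _ = rendered_tex.shape
    out = np.zeros((h, w, 3))
    for y in range(h):
        for x in range(w):
            p = np.append(lookup_xyz[y, x], 1.0)     # 3D surface point, homogeneous
            clip = view_proj @ p                     # into the viewpoint's clip space
            if clip[3] <= 0.0:
                continue                             # point is behind the viewpoint
            u, v = (clip[:2] / clip[3] + 1.0) / 2.0  # NDC -> texture coordinates
            if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
                texel = rendered_tex[int(v * (th - 1)), int(u * (tw - 1))]
                out[y, x] = alpha_mask[y, x] * texel  # blend weight from calibration
    return out
```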
While variations of the perspective projection exist, such as the off-axis perspective projection [82], they are still linear functions. However, screens composed of planar facets, such as a cube or a pyramid, can use a generalized perspective projection such as the off-axis projection [82] directly.

To simultaneously output to multiple projectors with support for stereo visualization, we use a workstation graphics card that can output frame-synchronized video for the left and right eyes. An NVIDIA Quadro K5200 graphics card was used to output to and synchronize the projectors on each prototype. It is possible to scale up the system with multiple Quadro cards to provide frame-synchronized video for up to 16 outputs. We use NVIDIA Mosaic technology to unify the images of the projectors into one logical screen, using an Nx1 topology (where N is the number of projectors) to represent the circular arrangement of the projectors under the spherical display. Importantly, NVIDIA Mosaic synchronizes all screens in resolution and framerate for stereo rendering. Each projector has a resolution of 1024x768, making a total Mosaic resolution of 4096x768.

4.3 Visual Error Analysis

In this section, we discuss the error model of a Fishbowl VR display. Let E¯ represent a viewpoint looking at a virtual point P, as shown in Figure 4.5 (left). The intersection point D¯ between the ray E¯P and the sphere is the corresponding pixel on the spherical display. In the presence of errors, D is the pixel actually displayed on the sphere. A number of error sources can cause this discrepancy, divided into two categories, the viewpoint error and the display error:

• Viewpoint Error: error in the position of the viewpoint E¯. Multiple sources can cause the viewpoint error, such as tracker jitter, tracker latency and the offset between the viewpoint and the tracker.

• Display Error: error made in displaying the perspective-corrected imagery, caused by the display calibration approach and projector/screen displacement.

To find out how these error sources influence viewers' visual experience, we examine how the viewpoint and display error affect the perceived image individually. Although it would be practical to use a binocular viewpoint model, the analysis of visual error would be complicated and affected by other factors such as the accommodation-vergence conflict. For simplicity, we assume a monocular viewpoint. Despite using a simplified model, the result provides practical guidelines for establishing criteria for system components and determining calibration fidelity.

Figure 4.5: (left) Angular error α in the Fishbowl VR display. The discrepancy between the displayed pixel D and the desired pixel D¯ is caused by the viewpoint error and the display error. (right) Angular error α caused by viewpoint error T in the static visualization.

4.3.1 Metric

We use the eye angular error α [29] as the metric of static visual error. The angular error α is defined as the angular displacement between the displayed pixel D and the desired pixel D¯, seen from the viewpoint E¯:

    \alpha = \arccos\left( \frac{ \vec{D\bar{E}} \cdot \vec{\bar{D}\bar{E}} }{ \| \vec{D\bar{E}} \| \, \| \vec{\bar{D}\bar{E}} \| } \right)    (4.1)

As shown in Figure 4.5 (left), E¯ represents the viewer's position looking at a virtual point P. The discrepancy between the displayed pixel D and the desired pixel D¯ is caused by the viewpoint error and the display error.

4.3.2 Viewpoint Error

The viewpoint error can be modeled as a translation vector T between the measured position E and the actual position of the viewpoint E¯, as shown in Figure 4.5 (right).
Assuming we know the actual viewpoint E¯, the viewpoint error T and the virtual point P, we can solve for D¯ via the ray-sphere intersection equations:

    \begin{cases} \bar{D} = \bar{E} + \lambda (P - \bar{E}) \\ \bar{D}^{T} Q \bar{D} = 0 \end{cases}    (4.2)

where Q is the quadric surface matrix of the sphere and λ is the distance from the actual viewpoint E¯ to the desired pixel D¯. Similarly, we can solve for D using E, T, P and Q. With D and D¯, the angular error α can be computed using equation 4.1 as a function of the virtual point P, the viewpoint E¯ and the viewpoint error T.

4.3.3 Display Error

Display error can be represented by the displacement between the desired 3D position of the pixel D¯ and its actual location D on the display surface in Figure 4.5 (left). This error depends on the accuracy of the display calibration described in Chapter 3. While we could use empirical methods to measure the accuracy of calibration with additional cameras [22], we describe an analytical method in this section to estimate the accuracy using error propagation [48].

In equation 3.1, the 3D positions of projected pixels are computed using ray-sphere intersection based on the projector projection parameters ~pp and the sphere pose parameters ~ps, both in the form of column vectors. To understand the influence of the projector parameters ~pp and sphere pose parameters ~ps on the display error, we compute the covariance matrix of X as:

    \Sigma_X = J_X(p) \, \Sigma_p \, J_X(p)^{T}    (4.3)

where J_X(p) is the Jacobian matrix of the vector function X(~p) with respect to the parameter vector ~p, and Σp is the covariance matrix of ~p. The parameter vector ~p can be the projector parameters ~pp or the sphere pose parameters ~ps.

For the projector parameters ~pp, the covariance matrix Σpp can be estimated using backward propagation of the re-projection error once the projector is calibrated. Assuming ~pp is the vector that contains the projector intrinsics Kp and extrinsic parameters (Rp Tp), the camera matrix of the projector can be expressed as:

    \vec{m} = K_p \, (R_p \;\; T_p) \, \vec{M}    (4.4)

where ~M is the 3D coordinates of points and ~m is the resulting image coordinates in 2D. The covariance matrix Σpp of ~pp is computed using backward propagation from equation (4.4):

    \Sigma_{p_p} = \left( {J_{m}^{p_p}}^{T} \, \Sigma_m^{-1} \, J_{m}^{p_p} \right)^{-1}    (4.5)

where Σm is the re-projection error during projector calibration and J_m^{pp} is the Jacobian matrix of equation (4.4) with respect to the projector parameters ~pp. Hence the resulting covariance matrix ΣX of equation 3.1 can be computed by plugging Σpp back into equation 4.3.

The covariance matrix of the sphere pose parameters can also be computed in a similar manner using a least-squares covariance computation. Assume ~ps is the vector that contains the sphere parameters xs, ys, zs and rs. The covariance matrix Σps can be estimated based on a weighted least squares [130].

Figure 4.6: (left) Visual angle α increased by the magnitude of the viewpoint error T when viewing a virtual point at the center of the sphere. (right) Visual angle α increases as the viewpoint E moves closer to the sphere with a 1 cm viewpoint error. (Axes: angular error in degrees versus translation magnitude of 0-2 cm for viewing distances of 30, 45, 60 and 75 cm, left; versus viewpoint-to-sphere distance of 20-80 cm, right.)

4.3.4 Result

Using the error model, we insert parameters based on the implementation of the 12 inch prototype described in Section 4.1.
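A minimal numerical sketch of how such a simulation can be set up is shown below; the 15 cm radius corresponds to the 12 inch (30 cm diameter) prototype, the sphere is assumed to be centered at the world origin, and the helper names and example values are illustrative rather than taken from the implementation.

```python
import numpy as np

def ray_sphere(origin, target, radius=0.15):
    """First intersection of the ray from origin toward target with a sphere of
    the given radius centered at the world origin (assumes the ray hits it)."""
    d = (target - origin) / np.linalg.norm(target - origin)
    b = 2.0 * np.dot(d, origin)
    c = np.dot(origin, origin) - radius ** 2
    t = (-b - np.sqrt(b * b - 4.0 * c)) / 2.0
    return origin + t * d

def angular_error_deg(viewpoint, viewpoint_err, virtual_point, radius=0.15):
    """Eye angular error between the desired pixel (seen from the true viewpoint)
    and the pixel actually driven (computed from the erroneous viewpoint)."""
    desired = ray_sphere(viewpoint, virtual_point, radius)
    displayed = ray_sphere(viewpoint + viewpoint_err, virtual_point, radius)
    u, v = desired - viewpoint, displayed - viewpoint
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Worst case: tangential 1 cm error, viewer 30 cm from the center, point at the center.
print(angular_error_deg(np.array([0.30, 0.0, 0.0]),
                        np.array([0.0, 0.01, 0.0]),
                        np.array([0.0, 0.0, 0.0])))
```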
In this section, we presentthe simulation results of our Fishbowl VR design.Effect of viewpoint error: Angular error α is maximized when the di-rection of T is perpendicular to the EP . In the following sections, we assume574.3. Visual Error AnalysisEP is tangent to T so that we are evaluating based on the worst case. Fig-ure 4.6(left) shows the simulation result of α as a function of |T | while theviewer is looking at the virtual point P placed at the sphere center. Theslope increases as the viewer gets closer to the sphere, meaning α becomesmore sensitive to translation error if the viewer is closer to the display.Effect of viewpoint position: Figure 4.6(right) illustrates the simula-tion result of how the viewpoint position influences α when the viewer islooking towards a virtual point P placed at the center of the sphere. Asthe viewpoint E¯ moves away from the sphere along E¯P , α decreases dra-matically. To control α, a minimum viewing distance can be establisheddepending on the applications. For example, for interactive applications,the viewing distance is likely shorter than ones only with visualizations.Hence, the FTVR display will require a more accurate tracking system withprecise calibration approaches.Effect of virtual point position: We assume that the virtual point Pcan be placed both inside and outside the sphere. Figure 4.7 illustratesthe changes of angular error α when the virtual point P moves from Pi tothe viewpoint E¯, where i = 1 ∼ 5 indicates five different locations of thevirtual point. As a function of the virtual point position Pi, the angularerror α first decreases when Pi travels towards D¯i as shown in Figure 4.7(left), so that the virtual point away from its corresponding display pixelwill increase α. When Pi arrives at D¯i, there is no angular error since Dioverlaps with D¯i, suggesting that points on the display surface towards theviewer will not cause angular error even with the existence of viewpointerror. As Pi moves out of the sphere further toward E¯, α starts to increaserapidly, meaning points out of the sphere are more sensitive to viewpointerror. Since α depends on the position of virtual point Pi, there exists an“optimal” rendering region with relatively small angular error. For instance,to set a maximum angular error of 0.4◦, virtual objects should be renderedwithin a region which is less than 18 cm inward away from its correspondingpixel, or 5 cm outward away, denoted as -5cm to +18cm.584.3. Visual Error AnalysisFigure 4.7: Effect of the virtual point position on the angular error whenthe viewing distance is 30 cm and viewpoint error is 0.2 cm. (right) Thegeometric layout of different virtual points from P1 to P5 in a top-view withthe viewpoint E. (left) The visual error α caused by the viewpoint error Tchanges as a function of the viewer-object distance between the viewpointE and virtual point Pi. Di represents the corresponding pixel on the screendepicting the virtual point Pi. α is zero when the virtual point Pi is onthe screen overlapping with Di. α increases as Pi moves away inward fromDi, and increases more dramatically as Pi moves away outward from Di.This indicates virtual points in front of the screen are more sensitive to theviewpoint error than points inside the screen. Curves of P1 and P5, as wellas P2 and P4 are almost overlapped in (left) due to their nearly symmetriclayout in (right).Based on this analysis, we have quantitative results for three trackingsystems we use for a 12 inch FTVR prototype. 
The average joint accuracy for Kinect v2 [60] using its joint tracking SDK is around 8 cm [148]. For a maximum angular error of 2°, the viewing distance should therefore be no less than 60 cm, assuming the viewer is looking at a virtual object at the center. At this minimum viewing distance, the rendering range should be -8 cm to +11 cm from the screen. For the Polhemus Fastrak, with a tracking accuracy of 0.2 cm within 150 cm of the transmitter, it is possible to achieve a 0.5° maximum angular error. Similarly, OptiTrack can achieve a tracking accuracy of 0.1 cm, yielding an angular error of 0.25°, making tracking systems with sub-centimeter accuracy suitable for an interactive FTVR system in which viewers get close to the display.

Table 4.1: Estimated angular error caused by different tracking systems with suggested viewing distances and rendering regions for a 12 inch prototype. The viewing distance assumes the virtual object is rendered in the center of the sphere. The rendering region assumes viewing at the minimum distance.

  Tracking system   Tracking error   Angular error   Viewing distance   Rendering region
  Kinect v2         8 cm [148]       <2 degree       >60 cm             -8 cm to +11 cm
  Fastrak           0.2 cm [102]     <0.5 degree     >30 cm             -6 cm to +30 cm
  OptiTrack         0.1 cm [62]      <0.25 degree    >30 cm             -6 cm to +30 cm

Effect of display error: We used the re-projection error of the display calibration in the 12 inch prototype described in Section 4.1. The average 3D Euclidean distance error for pixels on the screen is 0.315 mm when caused by the projector parameters, and 1.344 mm when caused by the sphere pose parameters. This indicates that the error of the sphere pose tends to be more influential than the error of projector calibration, so it is important to improve the sphere pose estimation for a smaller display error.

In addition, the display error is not spatially homogeneous. Figure 4.8 shows display errors computed at each pixel of a projector with the projector and sphere pose parameters as error sources respectively. The display error peaks at the corners and falls off toward the center, by a factor of up to ten times the error at the center. The fringe pixels have significantly more error than pixels at the center, possibly resulting in a noticeable misalignment in the overlapping area, since overlaps happen mostly along the projection fringe. Hence, it is useful to have a pixel-by-pixel optimization after the Euclidean reconstruction to minimize those local errors and alleviate the misalignment. In practice, the local display error in the overlapping area should be no more than 2 mm to avoid making the misalignment noticeable.

If we compare the display error with the viewpoint error, a display error of 1 mm on the display surface only yields a 0.38° angular error at a viewing distance of 30 cm. The viewpoint error therefore causes significantly more angular error than the display error. This is consistent with Holloway's result for HMDs that the viewpoint error is the major cause of visual artifacts such as distortions and the floating effect [51]. However, it is also important to minimize the display error since it accounts for the ghosting effect in the overlapping area of adjacent projectors, especially for view-independent applications in which imagery is wall-papered on the entire sphere so that overlapping pixels are always used for rendering.

Figure 4.8: Display error varied by pixel, caused by (a) projector intrinsics and (b) sphere pose as error source. The display error reaches its peak at the corners of the image plane of the projectors.

4.4 Discussions and Limitations

In summary, we found that for the viewpoint error, the visual error α increases: 1) as the viewpoint error increases, with its slope influenced by the viewing distance; 2) as the viewer moves closer to the sphere; and 3) as the virtual point moves further from the screen. For the display error, we found that the sphere pose parameters play an important role in recovering the 3D coordinates of pixels on the display. The fringe pixels have significantly more error than pixels at the center, possibly resulting in a noticeable misalignment in the overlapping projector area.

While the error model allows us to establish bounds and requirements for the tracking system and calibration approach, it is important to point out several assumptions and limitations of the error model. Firstly, each error source is analyzed independently. Hence the estimation of visual error is based on the assumption of the independence of error sources, which may not hold when one error source depends on another. For example, for the display error, the estimation of the sphere pose may depend on the projector parameters if the calibration approach uses the projectors to estimate the sphere pose, which could potentially lead to cascaded error higher than the estimated bounds. Secondly, the error model is based on monocular viewing, while in practice users view FTVR displays with both eyes. Future work is required to extend the current approach to stereoscopic viewing. Lastly, the error model is based on the assumption of static visualization, in which the viewer remains static with respect to the display. It provides limited insight into temporal aspects such as the latency of tracking. Future work is required to include the analysis of dynamic visualization as users move around the display.

Being aware of such assumptions, we see the proposed error model as useful in providing guidelines for system design, indicating where to spend time in order to improve the visual experience. It also gives some insight into what level of error can be expected for a given set of hardware and software.

4.5 Summary

In this chapter, we present the design and implementation of the Fishbowl VR display by introducing the tracked viewpoint to the calibrated spherical display using the approach described in Chapter 3. We also present the main results of an end-to-end error analysis of the display in terms of the visual error from the viewpoint. By characterizing the nature and sensitivity of the errors, we found: viewpoint error causes significantly more eye angular error than display calibration error; angular error becomes more sensitive to tracking error as the viewer moves closer to the sphere; and angular error is sensitive to the distance between the virtual object and its corresponding pixel on the surface. Taken together, these results provide practical guidelines for choosing the hardware and software components when building a Fishbowl VR display, and can be applied to other configurations of geometric displays.

Chapter 5
Evaluation of Depth and Size Perception

FTVR displays create a compelling 3D spatial effect by rendering to the perspective of the viewer with head-tracking.
Combining FTVR with aspherical display has the potential to enhance the 3D experience with uniqueproperties of the spherical screen such as the enclosing shape and borderlessviews from all angles around the display. The ability to generate a strong3D effect on a spherical display with head-tracked rendering is promisingfor increasing user’s performance in 3D tasks. An unanswered question iswhether these natural affordances of the spherical screen can improve spatialperception in comparison to traditional flat screen.It is important to provide better spatial perception as it can improveusers’ performance in 3D tasks which rely on the depth and size perceptionin 3D applications such as Computer Aided Design (CAD). In this chapter,we described an experiment to determine whether users can perceive thedepth and size of virtual objects better on a Fishbowl VR display comparedto a flat FTVR display.5.1 IntroductionTraditional FTVR displays use a high resolution desktop flat display coupledwith head-tracking to generate view-dependent imagery. The latest advanceof FTVR uses multiple mini-projectors to make it scalable and high resolu-tion with uniformly spaced pixels [157]. Arranged inside a geometric shape,such as a box [127], a cylinder [77] or a sphere [157], it can render 3D scenesby coupling perspective to the user’s viewpoint with wide viewing angles.Among different shapes of displays, the spherical shape is of particular inter-est as it provides seamless and unobstructed views from all angles. Used as a2D surface display, previous studies have identified several unique propertiesof the spherical screen when rendering images directly on its surface [10]. Inaddition to the unobstructed 360-degree field of view, the continuous nature635.1. Introductionof the curved shape provides smooth transition both horizontally and verti-cally [10]. The enclosing nature of the sphere inherently defines the virtualspace within the screen. In contrast to flat displays, spherical displays donot have a fixed “sweet spot” of viewing as the normal of the screen is alwaysparallel to the viewing direction regardless of the user position.These properties are ideal for creating FTVR visualizations. While anumber of spherical display prototypes have been reported [10, 16, 133], itis not clear whether the spherical shape factor can effectively improve per-formance in the 3D space when used for FTVR compared to the traditionalplanar FTVR display. Hence, in this work, we investigate whether view-ers can benefit from the spherical shape factor with additional depth cuessuch as stereopsis and motion parallax. Particularly, we are interested thatwhen coupled with head-tracking, does the compelling 3D effect providedby Fishbowl VR displays improve viewer’s spatial perception.The spatial perception of the screen itself and the projected imageryon the screen are inherently conflated. Visual cues of rendered objects aredirectly depicted on the screen surface. The physical representation of thescreen shape provides the connection between the real world and the virtualenvironment (VE). The screen has a geometric indication that conveys thespatial locations of virtual objects situated in the real world. This implica-tion may influence a viewer’s spatial perception. Depth perception, for ex-ample, is of particular interest. When displaying virtual objects, the screeninherently serves as a depth reference that indicates the relative distance be-tween virtual objects and the viewer. 
The enclosing nature of the sphericalscreen might anchor the depth perception differently from the planar screen.In a similar way, size perception is another example that could be affectedby the shape of the screen. Depth and size perception are intimately relatedas objects appear smaller when they are further away. Size has been usedas an indirect measure of the depth [73] under the assumption of the sizeconstancy. Size constancy refers to the ability to perceive the same size ofan object regardless of its distance from the observer even though the retinalsize of the object changes [140]. When perceiving a virtual object, to somedegree, we can judge the size either based on its spatial size as if it existsin the 3D space, or the 2D size on the screen [76, 142]. Perhaps the screenfactor may affect which mode the we see in the VE.Despite substantial perception studies in VEs, few empirical studies havebeen done regarding the shape factor of the screen. To investigate this ques-tion, we conducted an empirical user study to assess the spatial perceptionwith the Fishbowl VR display in comparison to a flat FTVR display usinga depth ranking task and a size matching task. The result of the study645.2. Depth and Size Perception Studyhelps us to understand the potential benefits and challenges of Fishbowl VRdisplays as well as provides directions for future 3D applications suitable forspherical displays.5.2 Depth and Size Perception StudyWe conducted a user study to evaluate user’s perception on the depth andsize information when visualizing the 3D stimuli on a Fishbowl VR displayin comparison to a flat FTVR display. The purpose of this study is toevaluate the capability of the Fishbowl VR display to provide effective spa-tial perceptions relative to the traditional flat screen. The spherical shapeis of particular interest due to its enclosing shape, borderless screen andconsistent curved surface.5.2.1 ApparatusWe used the 12 inch Fishbowl VR prototype described in Section 4.1 to con-duct the experiment. To minimize differences in the apparatus, we utilized aprojector (Asus P2B [53]) with the same 1024x768 pixel resolution and 120Hz frame rate (60 Hz each eye) to back-project on a planar screen to builda flat FTVR display. It has the physical screen size of 36cm x 27cm, whichhas a similar visible screen area with the spherical screen of 30 cm diameter.Both the spherical and flat FTVR display are coupled with Polhemus track-ing systems which track viewpoints at the update rate of 60 Hz with theaccuracy of 1 mm. They both provide stereopsis and perspective-correctionso users can physically move their viewpoint while visualizing stereo images.The stereo images are generated by 120 Hz projectors and synchronized withshutter glasses. The flat FTVR display experimental setup is shown in Fig-ure 5.1.A Unity [135] application was developed for the experiment to renderthe stimuli as describe below and record the answers as well as the headmovements of participants. A virtual depth axis with 12 tick marks and avirtual pool ball was rendered in the Unity application. We used a stan-dard international billiard ball with the diameter of 57.15 mm as shown inFigure 5.2. We used a post-experiment questionnaire and a demographicquestionnaire in the study which can be found in Appendix C.1.655.2. Depth and Size Perception StudyFigure 5.1: Flat FTVR display experimental setup in this study. (left)Mini-projector back-projects onto a planar screen with the projection sizeof 36cm x 27cm. 
(right) Flat FTVR display shows stimulus to a user in thestudy. The projector has the same resolution and brightness as ones usedin the Fishbowl VR display.5.2.2 TasksAs our goal is to investigate whether a spherical display surface can improvespatial perception with the FTVR display, we measured the exocentric depthperception as well as size perception. Depth and size perception are tightlycoupled by nature. They are important as various 3D interactions suchas selection, translation and scaling depend on them. We evaluated depthand size perception using a depth-ranking task [45] and a size-matchingtask [76] respectively. We used these two tasks due to the effectivenessdemonstrated from previous researchers’ results as discussed in Section 2.2.4.Our present study seeks to provide evidence on spatial perception with abetter understanding of the nature of spherical displays. It also providessome insight on size perception with FTVR displays as there has been fewsize perception studies with FTVR displays.Task 1: Depth Ranking Participants were required to rate the depthof a pool ball floating in a virtual scene. The task is similar to the taskspreviously used by Grossman [45] and Benko [9] in evaluating the volumetricdisplay and spatial augmented reality respectively. In our implementation,a depth axis is presented with 12 tick marks uniformly distributed on theaxis labeled as 0-11 with 2cm spacing. The depth axis and tick marks are665.2. Depth and Size Perception Studyvirtually placed behind the screen with the first tick mark located at thefront screen. A pool ball was drawn floating above the axis as shown inFigure 5.2.The task of participants was to rate the depth of the ball by selecting atick mark at the same depth. In our experiment, stimuli balls are presentedin three possible regions: near (tick 2-3), mid (tick 5-7) and far (tick 9-10).For each trial, the ball will appear in any of the three regions.Figure 5.2: (left) Stimulus used in the experiment, including a virtualpool ball, a depth axis and a physical pool ball. The physical pool ballis a standard international billiard ball with the diameter of 57.15mm. Inthe depth ranking task, participants were required to rate the depth ofthe virtual ball relative to the depth axis. In the size matching task, theyadjusted the size of the virtual ball to match with the physical ball. (right)A top-down view of the stimulus used in the experiment.Task 2: Size Matching As shown in Figure 5.2, participants were pre-sented with both a real pool ball and a virtual pool ball in the display. Theparticipants’ task was to adjust the size of the virtual ball until they per-ceived the size of the virtual ball to be identical to the real ball. This taskis similar to the tasks previously used by Kenyon [76] and Kelly [123]. Inour implementation, the real ball with the diameter of 57.15mm was placed675.2. Depth and Size Perception Study3cm in front of the display. Participants can touch and grasp the real ballto perceive its size while adjusting the virtual ball to match to the real ballusing a controller. Similar to the depth ranking task, the virtual pool ballwill appear in one of the three regions (near, mid and far) with differentinitial sizes.We chose a pool ball as a visual stimuli due to its isotropic shape. 
Thisis to encourage participants to perceive and match the size based on thevolume in 3D rather than a 1D feature such as the length of an edge.5.2.3 ParticipantsSixteen participants (11 male and 5 female) from University of BritishColumbia were recruited to participate in the study with compensation.Ethic was approved by the University of British Columbia Behavioural Re-search Ethics Board (H08-03005). The average age of all participants was29 years old from 18 to 35 years old. In a post-experiment demographicquestionnaire, we asked participants to rate their expertise with 3D inter-faces such as 3D video games and 3D design on a scale from 1 (“novice”) to4 (“expert”). Their answers showed an average score of 1.75 ranging from1 to 4. We also asked participants to provide their previous experience andusage with Virtual Reality with a scale from 1 (“never”) to 4 (“regularly”).The average score of all participants was 1.88 between 1 and 4.5.2.4 ProcedureEach participant was first given a brief introduction to the two displays.They started by filling the consent form. Then they were instructed to sitfacing the display with the viewing distance of 0.7m in both display condi-tions. Participants could move their viewpoint by leaning the upper bodyto the left/right facing the screen with the angle approximately 150 degrees.Their movement was limited to the volume by requiring them to stay seatedat a fixed spot facing the display in order to prevent them from takingviewpoints which would trivialize the depth ranking task. They physicallymoved from one display to another after completing one display condition.In both tasks, participants received verbal instructions to understand theprocedure. They practiced three trials prior to the formal trials per task perdisplay condition.In the depth ranking task, they conducted 12 trials in a row per displaycondition, with four of the near, mid and far viewing distances respectively.The sequence of the trials is randomized. All tested depths were within 1685.2. Depth and Size Perception Studymeter. Participants controlled a cursor in the scene to choose the tick markusing a controller. They pressed buttons on the controller to confirm theresult and proceed to the next trial. In total, each participant conducted24 formal trials plus 6 practice trials. It took about 20 min to complete thedepth ranking task.In the size matching task, they also conducted 12 trials per display con-dition, with four of the near, mid and far viewing distances respectively.Each viewing distance includes four initial sizes, two larger and two smallerones than the real ball with a random scale between 1.08 and 1.26 for largeballs, as well as the scale between 0.73 and 0.91 for small balls, respec-tively. Participants adjusted the size of the virtual ball using a controllerand pressed buttons to confirm the result once they perceived the size ofthe virtual ball was identical to the real ball. For the size matching task,each participant conducted 24 formal trials plus 6 practice trials across twodisplay conditions. It took about 20 min to complete the size matching task.At the end of each display condition, participants were presented witha questionnaire. For each task, we presented three Likert-scale questionsto participants and asked them to rate each with a number in the range -2(“totally disagree”) to 2 (“totally agree”). The three questions addressedthe easiness, intuitiveness and confidence when performing the task with thedisplay. 
After completing each task (2) per display condition (2), participants answered the three questions (3), resulting in a total of twelve questions (2x2x3). Once they completed the entire experiment, they answered a demographic questionnaire and completed the previous post-experiment questionnaire with two additional Likert-scale questions on the overall realism and enjoyment per display condition. Finally, they were asked to rate their preferences between displays and specify reasons for both tasks. Each participant performed all parts of the experiment in one session. The entire session took about 50 min.

5.2.5 Experiment Design

The study used a within-subjects design to evaluate performance across the two display conditions using two tasks. The two display conditions represent the two display devices: a flat FTVR display and a Fishbowl VR display. The order of display conditions was counterbalanced to reduce ordering effects. In each display condition, participants performed the experiment by completing the two tasks and then filling in a post-experiment questionnaire. The two tasks always appeared in the same order. We describe the independent and dependent variables for the two tasks in this section.

Task 1: Depth Ranking We investigated two independent variables, Display and Distance, with two levels (Sphere/Flat) for Display and three levels (Near/Mid/Far) for Distance respectively. Subject performance was evaluated quantitatively based on the dependent variable ErrorMagnitude, defined as the average absolute distance between the reported answer and the correct answer. Distance is included as a factor since we expected ErrorMagnitude to increase as the viewing distance increases.

Task 2: Size Matching We investigated three independent variables, Display, Distance and InitialSize, with two levels (Sphere/Flat), three levels (Near/Mid/Far) and two levels (Large/Small) respectively. We include InitialSize as a factor because size judgments show an anchoring effect, by which the adjusted ball size on a given trial tends to be biased toward the initial ball size set at the beginning of that trial [123]. Similar to previous size perception studies [76, 91], subject performance was evaluated based on the measures SizeRatio and AbsoluteError. Following [76], SizeRatio represents the relative size of the virtual ball compared to the size of the real ball:

    SizeRatio = BallSizeSetByParticipant / CorrectRealBallSize

The numerator is the adjusted ball size reported by participants as identical to the real ball size, which varied in each trial. The denominator is the size of the real ball, fixed at 57.15 mm. Ideally, SizeRatio would be 1 if participants adjusted the ball size without any error to match the real ball.

SizeRatio can indicate under- or over-estimation of the size, and is effective for investigating the influence of InitialSize. However, it is not a good measure of accuracy, since SizeRatio can be above or below 1, so the average SizeRatio can be 1 while the absolute mean error is much greater than zero. As a result, AbsoluteError is another dependent variable calculated to examine the deviation between the ideal result and SizeRatio:

    AbsoluteError = |1 − SizeRatio|

For both tasks, we tracked the head movement and computed the normalized head movement, defined as the amount of head motion in meters per second per trial. As the viewing duration differed across trials and participants, we normalized the amount of head movement by dividing it by the time spent per trial.
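For clarity, the dependent measures defined above could be computed from per-trial records along the following lines; this is a sketch with illustrative function and field names, not the analysis code used in the study.

```python
REAL_BALL_MM = 57.15  # diameter of the physical reference ball

def size_measures(adjusted_diameter_mm):
    """SizeRatio and AbsoluteError for one size-matching trial."""
    size_ratio = adjusted_diameter_mm / REAL_BALL_MM
    return size_ratio, abs(1.0 - size_ratio)

def error_magnitude(reported_tick, correct_tick):
    """Depth-ranking error for one trial, in tick-mark units (2 cm per tick)."""
    return abs(reported_tick - correct_tick)

def normalized_head_movement(path_length_m, duration_s):
    """Head motion per trial (meters per second), normalized by trial duration."""
    return path_length_m / duration_s
```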
Depth and Size Perception StudyFigure 5.3: (left) Error magnitudes with means and 95% confidence interval(CI) for the depth ranking task and size matching task. (right) Normalizedhead movement with means and 95% CI for the two tasks, computed as theamount of head movement divided by the time spent per trial.5.2.6 ResultTask 1: Depth RankingRepeated measures two-way ANOVA (2 Display x 3 Distance) was car-ried out on the ErrorMagnitude. Results revealed an effect of Display(F (1, 15) = 11.00, p < 0.01), but not of Distance (F (2, 30) = 0.287, p =0.753) nor was the interaction (F (2, 30) = 0.448, p = 0.643). The FishbowlVR display (M = 0.50, SE = 0.22) has significantly lower ErrorMagnitudethan the flat FTVR display (M = 3.25, SE = 0.83). These results are shownin Figure 5.3 (left). A complete table of results can be found in Appendix DFigure D.1.ANCOVA on the participants’ expertise on VR and 3D interfaces wascarried out on the ErrorMagnitude with the Display as independent factorand their expertise on VR and 3D interfaces as covariate. The results didnot show significant difference on the covariate of VR expertise (F (1, 29) =0.297, p = 0.590) or 3D interface expertise (F (1, 29) = 1.89, p = 0.180) onErrorMagnitude.Task 2: Size MatchingRepeated measures three-way ANOVA (2Display x 3Distance x 2 InitialSize)was carried out on the AbsoluteError with factors of Display, Distance andInitialSize. Results revealed an effect of Display (F (1, 15) = 5.132, p <0.05), but not ofDistance (F (2, 30) = 2.093, p = 0.141), InitialSize (F (1, 15) =0.546, p = 0.472), nor any interactions. The Fishbowl VR display (M =0.041, SE = 0.008) has significantly lower AbsoluteError than the flat715.2. Depth and Size Perception StudyFTVR display (M = 0.055, SE = 0.010). These results are shown in Fig-ure 5.3 (left). A complete table of results can be found in Appendix DFigure D.2.Repeated measures three-way ANOVA (2 Display x 3 Distance x 2InitialSize) was carried out on the SizeRatio with factors of Display,Distance and InitialSize. Main effects ofDisplay, Distance and InitialSizewere found as Display (F (1, 15) = 4.88, p < 0.05), Distance (F (2, 30) =5.13, p < 0.05) and InitialSize (F (1, 15) = 36.64, p < 0.001). Results re-vealed two-way interaction effects betweenDisplay andDistance (F (2, 30) =5.13, p < 0.05), Display and InitialSize (F (1, 15) = 38.71, p < 0.001), aswell as Distance and InitialSize (F (2, 30) = 12.97, p < 0.001), but not onthe three-way interaction (F (2, 30) = 1.47, p = 0.245). A complete table ofresults can be found in Appendix D Figure D.3.Post-hoc pairwise t-test with Bonferroni correction 2 for the interac-tion Display x Distance shows an effect of Distance when Display is Flatbetween Distance Far and Near (t = 4.23, p < 0.01), as well as Far andMid (t = 2.93, p < 0.05), but not when Display is Sphere. For the in-teraction Display x InitialSize, there is an effect of Display when theInitialSize is Small (t = 3.834, p < 0.01), but not when the InitialSize isLarge. InitialSize has an effect in both Flat display (t = 7.835, p < 0.001)and Sphere display (t = 2.926, p < .05). For the interaction Distance xInitialSize, there is an effect of InitialSize when Distance is Far (t =7.468, p < 0.001) and Mid (t = 4.859, p < 0.001), but not when Distanceis Near. 
Distance has an effect when InitialSize is Large (t = 5.473, p <0.001) but not when InitialSize is Small.ANCOVA on the participants’ expertise on VR and 3D interfaces wascarried out on the AbsoluteError and SizeRatio with the Display as in-dependent factor and their expertise on VR and 3D interfaces as covari-ate. The results did not show significant difference on the covariate of VRexpertise (F (1, 29) = 0.659, p = 0.423) on AbsoluteError or (F (1, 29) =0.829, p = 0.370) on SizeRatio, as well as 3D interface expertise (F (1, 29) =0.0369, p = 0.849) on AbsoluteError or (F (1, 29) = 2.223, p = 0.147) onSizeRatio.2As the two-way interaction effect Distance x InitialSize was not relevant to differ-ences between display conditions, it was not analyzed further in the post-hoc simple effectanaylsis.725.2. Depth and Size Perception StudyFigure 5.4: Participants’ ratings with means (circle), medians (cross) and95% CI, from -2 “totally disagree” to 2 “totally agree”. They reported rat-ings on the confidence, easiness and intuitiveness for the depth ranking andsize matching task, as well as the overall impression regarding the enjoymentand realism per display condition.Head Movement AnalysisWe conducted pairwise t-test on the normalized head movement betweenthe Fishbowl VR and flat FTVR display. For the depth ranking task, at-test indicated that there is no significant difference on the normalized headmovement (t = −1.02, p = 0.322). For the size matching task, the FishbowlVR display has significantly more head movement than the flat FTVR dis-play (t = −2.76, p < 0.05) as shown in Figure 5.3 (right). A complete tableof results can be found in Appendix D Figure D.4 and D.5.Post-Experiment QuestionnaireA Wilcoxon Signed Rank Test was performed on the Likert-scale questions.There were significant differences on the easiness (W = 78, p < 0.01), in-tuitiveness (W = 62, p < 0.01) and confidence (W = 45, p < 0.01) for thedepth-ranking task, but not for the size-matching task. There were also sig-nificant differences on the overall impression of realism (W = 72.5, p < 0.01),and enjoyment (W = 15, p < 0.05) between display conditions. The results735.2. Depth and Size Perception StudyFigure 5.5: The two-way interaction effects between (a) Display xInitialSize, as well as (b) Display x Distance. The size perception is lessinfluenced by Distance as well as InitialSize on the Fishbowl VR displaythan on the flat FTVR display.of questionnaire are summarized in Figure 5.4. For the preference data,75% participants believed they performed better on the Fishbowl VR dis-play for both tasks. There were 93% and 87.5% participants preferring theFishbowl VR display for the depth-ranking and size-matching task respec-tively. A Chi-Square analysis shows significant differences on the perfor-mance (χ2 = 14, p < 0.001) and preference (χ2 = 12.25, p < 0.001) for thedepth-ranking task, as well as the performance (χ2 = 12.875, p < 0.01) andpreference (χ2 = 11.267, p < 0.001) for the size-matching task. A completetable of results can be found in Appendix D Figure D.6.Crossover between Depth and SizeBecause depth and size perception are tightly coupled by nature, we seek toinvestigate whether the under/over perception occurs on both size and depthjudgments in two display conditions. On the flat FTVR display, 86.5% errorswere underestimating the depth and 83% errors were overestimating the size.On the Fishbowl VR display, 50% errors are caused by underestimating thedepth and 67% errors are caused by underestimating the size. 
In short,participants tend to underestimate the depth and overestimate the size onthe flat FTVR display; on the Fishbowl VR display, there is no clear signas whether they under or over estimated the depth and size. As as result,we did not find valuable crossovers between the two tasks.745.3. Discussion and Limitations5.3 Discussion and LimitationsThe Fishbowl VR display shows superiority in both tasks compared to theflat display. In the depth ranking task, participants performed much betteron the Fishbowl VR display with the ErrorMagnitude of 0.5 in comparisonto 3.25 on the flat display, as shown in Figure 5.3(left), which yielded a depthaccuracy of 1cm on the spherical display and 6.5cm on the flat display. Thedifference of performance is consistent with the results of the questionnaire.Participants reported significantly higher scores in the spherical conditionon confidence, easiness and intuitiveness as shown in Figure 5.4.In the size matching task, although the difference of performance isstatistically significant between display conditions, the improvement wasmarginal with the AbsoluteError of 4.1% in the spherical condition and5.5% in the flat condition, which yielded a size accuracy of 2.34mm on thespherical display and 3.14mm on the flat display. Inconsistent with theperformance data, participants rated similar scores over the two displays inthe questionnaire. No significant difference has been found on confidence,easiness or intuitiveness. This indicates the spherical shape did not showsignificant superiority on the accuracy in the size study as much as it showedin the depth study. However, we found an interesting interaction effect onthe dependent variable SizeRatio, implying that the strength of the spher-ical shape in the size study may not be reflected as higher accuracy butrather as better consistency, which we’ll discuss further in this section.5.3.1 Size ConstancyWhile the difference of AbsoluteError in the size study is relatively small,the interaction effect on the SizeRatio betweenDisplay andDistance showsthat Distance influenced user performances differently in the two displayconditions. The different slopes in Figure 5.5 (b) indicates that the sizeperception is less influenced by Distance on the spherical display than on theflat display. When matching the balls with different Distances, participantshad better task precision in the spherical condition.This interaction effect has a better interpretation as size-constancy [76].Failure to preserve size-constancy makes an observer judge size based onvisual angle so that the perceived size of an object shrinks with increasingviewing distance. In our study, if size-constancy is preserved and dominatesacross different distances, the slope is zero with the SizeRatio of 1 for allstimulus in Figure 5.6. To analyze size-constancy, we fit the SizeRatio datawith a regression line over different distances for each display. As shown755.3. Discussion and Limitationsin Figure 5.6, the positive linear relationship between SizeRatio and theviewing distance indicates that participants have a greater tendency of es-timating based on the visual angle rather than preserving size-constancy.Paired t-test on slopes based on the standard error of regression models[3]shows the regression slope with the flat display is significantly larger thanthe slope with the spherical display (t = 2.85, p < 0.05). 
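How such a slope comparison can be carried out is sketched below, assuming per-trial SizeRatio values for each display. The data are synthetic stand-ins, and the generic slope-comparison t statistic used here may differ in detail from the paired test on regression slopes used in the study [3].

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data: SizeRatio per trial at three viewing distances (metres).
rng = np.random.default_rng(1)
distance = np.tile([0.6, 0.9, 1.2], 40)
ratio_flat = 1.0 + 0.21 * (distance - 0.9) + rng.normal(0, 0.05, distance.size)
ratio_sphere = 1.0 + 0.05 * (distance - 0.9) + rng.normal(0, 0.05, distance.size)

def slope_with_se(dist, ratio):
    """Fit SizeRatio = a + b * distance and return the slope, its standard error, and n."""
    res = stats.linregress(dist, ratio)
    return res.slope, res.stderr, len(dist)

b_flat, se_flat, n_flat = slope_with_se(distance, ratio_flat)
b_sph, se_sph, n_sph = slope_with_se(distance, ratio_sphere)

# Compare the two regression slopes using a t statistic built from their standard errors.
t = (b_flat - b_sph) / np.hypot(se_flat, se_sph)
df = (n_flat - 2) + (n_sph - 2)
p = 2 * stats.t.sf(abs(t), df)
print(f"flat slope {b_flat:.3f}, sphere slope {b_sph:.3f}, t = {t:.2f}, p = {p:.4f}")
```

A slope near zero corresponds to preserved size-constancy, while a larger positive slope corresponds to judgments driven by visual angle.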
This indicates thatperceptual judgments of the size on the spherical display better preservedthe size-constancy in comparison to the judgment on the flat screen. Partic-ipants have a greater tendency to base their judgments on the visual anglewhen using flat displays.Figure 5.6: Linear regression of SizeRatio versus the viewing distancefor all participants in two display conditions with the slope of regressionindicating size-constancy. If the size-constancy is completely met, SizeRatiowill be a constant of 1 with the slope of zero. The rise in the size-ratio withthe increasing distance indicates that participants are performing more likevisual angle than size-constancy.Another interaction effect Display x InitialSize also shows interestingperformance consistency as a hysteresis effect of the InitialSize. As illus-trated in Figure 5.5 (a), the distinct slopes show the hysteresis effect isdifferent in the two display conditions. The size perception on the spherical765.3. Discussion and Limitationsdisplay appears to be less influenced by the InitialSize compared to the flatdisplay. Consequently, when matching the balls with different InitialSizes,participants had better task precision in the spherical display condition.When similar size-judgment tasks were conducted in the real world, aprevious study found that participants were less accurate at perceiving thesize in virtual world, and the effect of the Distance factor was greater inthe virtual world than in the real world [128]. Notably, in our study thespherical screen provides task performance closer to the performance in thereal environment with better accuracy and consistency compared to a flatscreen. Perhaps this indicates that the spherical screen may provide bettervisualization in a way closer to the real world. Consistent with the taskperformance, we found that the rate of the realism on the spherical displayis significantly higher than the flat display (W = 72.5, p < 0.01) as shownin Figure 5.4. Eight out of sixteen participants further commented thatthe pool ball looks more 3D-like and realistic on the spherical display whenasked about reasons of their preferences on displays. The level of realismmay influence the visual cues that participants take to infer the position andsize of stimulus. In our study, there are multiple visual cues, including 2Dcues, such as the projected ball’s 2D size on the screen surface, as well as3D cues, such as stereopsis and motion parallax. When a user performs thetasks purely based on 2D cues, the performance will be greatly influenced byDistance and InitialSize. However, if they use more 3D cues, this will leadto a more consistent performance across different Distance and InitialSize.When participants felt the objects appear more realistic when viewing thestimulus on a particular display, it is more likely they counted more on the3D cues rather than 2D cues. This is also referred by Ware as the “dualityof depth perception” when people perceive pictures of objects [142], whichis further investigated in Chapter 6. For 3D displays, it is necessary to makesure that the screen effectively conveys spatial information so that 2D cueswill not overrule 3D cues.5.3.2 VisibilityDue to the nature of the spherical screen, viewers are not able to see theentire curved screen simultaneously. So we made the screen size of the flatFTVR display approximately the same as the visible area of the sphericalscreen. This is also to reduce the influence of screen size on the perceptionof stimulus size. 
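The visibility argument developed in the next paragraphs can also be quantified with simple projection geometry. The following sketch is our own illustrative derivation rather than an analysis from this work: under a small-ball approximation and made-up dimensions, it estimates how far the viewpoint can move sideways before an object rendered behind a flat screen is clipped by the screen edge.

```python
# Illustrative derivation (not from the dissertation): lateral viewpoint range
# before a ball rendered behind a flat screen is clipped by the screen edge.
def lateral_range_of_visibility(w, z_v, z_o, r):
    """Maximum |lateral viewpoint offset| keeping the whole ball on screen.

    w   : half-width of the flat screen (m)
    z_v : viewpoint-to-screen distance (m)
    z_o : screen-to-ball distance, ball behind the screen (m)
    r   : ball radius (m), small-ball approximation for the projected radius
    """
    r_proj = r * z_v / (z_v + z_o)           # projected radius on the screen
    return (w - r_proj) * (z_v + z_o) / z_o  # offset at which the projection reaches the edge

for z_o in (0.1, 0.3, 0.6):                  # the ball placed further behind the screen
    rov = lateral_range_of_visibility(w=0.3, z_v=0.7, z_o=z_o, r=0.05)
    print(f"ball depth {z_o:.1f} m -> lateral range about {rov:.2f} m")
```

Consistent with the discussion below, the computed range shrinks as the ball is placed further behind the screen, while a borderless spherical screen imposes no such cut-off.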
When rendering an object with the same distance to theviewer, the object can be cut-off by the screen edge on the flat display if theviewpoint moves to the side, but not on spherical display as it is borderless.775.3. Discussion and LimitationsAs illustrated in Figure 5.7, the further away the object is to the viewer, thesmaller the Range of Visibility (RoV) is before the object gets cut-off. Thisresults in a distance-dependent RoV on the flat display but a constant RoVon the spherical display. In our study, viewpoint movements are constrainedsince participants are required to sit on a fixed chair when viewing thestimuli. The angle of head movements facing the screen is about 150 degreein both display conditions so they could not see the side view. However, thelimited RoV for objects far away may still discourage participants to movetheir viewpoints so the motion parallax cue has not been exploited on theflat display.Figure 5.7: Limited visibility on the flat FTVR display for displayingobjects behind the screen. The object is cut-off by the screen edge of flatFTVR display if the viewpoint moves to the side, but not on the FishbowlVR display. The further away the object is to the viewer, the smaller thevisibility is before the object gets cut-off.In particular, the RoV has a large impact on the performance of thedepth ranking task. As shown in Figure 5.3 (right), the head movementwas much more frequent in the depth-ranking task than the size-matchingtask. It is a task evaluating the exocentric depth in which participants re-lied heavily on head movements. If the ball gets cut-off by the screen edge,the information is then lost, making it harder to determine the correct tick.Eight out of sixteen participants commented that they got better perspec-785.3. Discussion and Limitationstives on the spherical display when asked for reasons of their preferencesbetween displays in the questionnaire. Participants may obtain more infor-mation on the relative distance by making use of the spherical borderlessfeature. It would be interesting to further explore this issue and compareresults with large flat screens so that the stimuli will not be cut-off by thescreen edge.5.3.3 Head MovementAs shown in Figure 5.8, head movements were mostly side-to-side in bothtasks to provide better side views. The movements appear to be more con-strained vertically in the flat condition than the spherical condition, likelydue to the spherical continuously varying surface which provides visible con-tents changed by the viewpoint heights. Their movements appeared to forma curved path around the virtual ball to keep a consistent viewing distancewith respect to the stimuli target. In the spherical condition, this resultsin a nearly constant distance to the screen surface while the screen distancevaries in the flat condition due to the nature of the screen shape. This dif-ference might influence users’ performance if we consider the vergence/focusconflict: viewers converge toward the virtual object but focus eyes on thedisplay surface. As discussed by Bruder [20], the conflict might become moreevident if viewers get closer to the screen surface while moving their head.In our study, while participants are constrained to sit, we did not limit thetype of their head motion. So if they naturally moved their head following acurvature when trying to see different sides of the object, they could keep arelatively constant screen distance on the spherical display. 
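As a rough numerical illustration of this point (our own sketch with made-up dimensions, not the apparatus values): if the head follows a circular path around the display centre, the distance to a spherical screen surface stays constant, while the perpendicular distance to a flat screen varies with the head angle.

```python
import numpy as np

# Illustrative geometry: the virtual ball sits at the display centre, the head moves
# on a circle of radius R around it, the spherical screen has radius r_s, and the
# flat screen is approximated as the plane x = 0 through the centre.
R, r_s = 0.8, 0.3
theta = np.radians(np.linspace(-60, 60, 7))            # side-to-side head positions
head = np.stack([R * np.cos(theta), R * np.sin(theta)], axis=1)

dist_sphere = np.linalg.norm(head, axis=1) - r_s       # distance to the spherical surface
dist_flat = np.abs(head[:, 0])                         # perpendicular distance to the flat screen

for angle, ds, df in zip(np.degrees(theta), dist_sphere, dist_flat):
    print(f"{angle:+5.0f} deg   sphere {ds:.2f} m   flat {df:.2f} m")
```

With the vergence distance fixed at the ball, the varying screen distance on the flat display changes the vergence/focus mismatch across viewpoints, whereas it remains constant on the spherical display.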
In this case, thespherical screen shape seems to reduce this conflict, though future study isrequired to investigate on that.In the depth-ranking task, participants relied heavily on head movementsto find the closest tick to the stimuli ball in both display conditions. Whileparticipants actively moved viewpoints in both conditions as shown in Fig-ure 5.8 (a)(b), we did not find a significant difference on the normalizedhead movement shown in Figure 5.3(right). As their movements were con-strained on a chair, it is likely that they exploited the possible movementswhile staying on the chair in both conditions, resulting in similar amount ofhead movements in two display conditions.In the size-matching task, participants showed less head movements inboth conditions. While their movements were equally constrained in bothconditions, they moved more frequently in the spherical condition than theflat condition (t = −2.76, p < 0.05) as shown in Figure 5.8 (c)(d). In our795.3. Discussion and Limitationsobservation, participants were more likely to stay at the same viewpointin the flat condition, while they frequently moved viewpoints to comparethe size between the real ball and virtual ball in the spherical condition,implying that they took the motion parallax cue more with the sphericalscreen. This is also consistent with Ponto’s shape-matching study in whichthey found that subjects produced smaller errors when they were allowedto move freely in the environment [103]. As moving viewpoints is a naturalbehavior in the real world, this indicates participants felt more natural andrealistic when viewing the 3D scene on the spherical screen.Figure 5.8: Head movement visualization of all participants in the depthstudy with the (a) flat FTVR display and (b) Fishbowl VR display, as wellas in the size study with the (c) flat FTVR display and (d) Fishbowl VRdisplay. Blue dots represent the viewpoint positions where participants wereviewing the stimulus throughout all trials.While participants were instructed to sit on the chair, we did not observethem trying to stand up and move around the spherical display in the sizestudy. As they were only using part of the spherical display space shownfrom their head movement, it also implies that a whole sphere screen maynot be necessary for the size-matching task. A hemisphere screen facing the805.3. Discussion and Limitationsviewer could potentially be sufficient to support the range of head motion,though future work is required to find the minimum area of the sphericalscreen without compromising the task performance.The head movements finding diverged from the Kenyon’s size-constancystudy [76] in which they found side-to-side head movements were small or ab-sent. One possible reason is the difference between the experiment stimulusused. In their experiment, they used a Coke bottle which had a vertical sizefeature. The side-to-side head movement is less helpful when participantstried to adjust the height of the bottle. This also indicates that the results ofsize perception experiments might be sensitive to different stimulus, whichwill be discussed in Section 5.3.4.5.3.4 Indications to FTVR Studies and ApplicationsAs there have been few size perception studies with FTVR displays, our workseeks to provide insights on the size perception with FTVR displays. Usingthe same size-matching task, we compared our data to the Kenyon’s studyon the CAVE display [76] with the Near viewing distance of 1.07m in theirsparse environment condition. 
As shown in Table 5.1, both AbsoluteError and size-constancy appear to be better on FTVR displays, indicating that FTVR displays might be promising for providing better size perception in the VE.

Table 5.1: Comparison of the size perception on different 3D displays, including the CAVE display from existing work [76], and our work.

Metric                  CAVE    Flat FTVR    Fishbowl VR
Absolute Error          20%     5.5%         4.1%
Size-Constancy Slope    0.36    0.21         0.045

Note: the CAVE slope of 0.36 is obtained after converting the reported value of 0.11 from feet to meters.

While the findings are encouraging, we should be aware of several aspects of the experiment to which the results could be potentially sensitive. Identifying them would be helpful for designing future related experiments. As Ponto mentioned in [103], the accuracy of the perceptual calibration could influence size perception. In our study, we used a perceptual calibration approach to determine the viewing parameters of each eye. In practice, we noticed that inaccurate calibration could make stationary objects appear to move when the viewer changes position, similar to the floating effect mentioned in Ponto's study [103]. Hence it is important to employ a reliable and accurate perceptual calibration method for this type of study. Another factor that may influence the result is the visual stimulus. We chose a spherical stimulus in our study due to its isotropic property. It remains an open question whether a stimulus with a different shape, such as a cube, would reproduce similar results. Future controlled experiments are needed to compare size perception with different stimuli on different screen shapes. Using the methodology of this study, we plan to expand the shape factor from spherical to cubic and cylindrical displays. Further comparisons with immersive displays such as CAVE and HMD displays would also help us to understand the impact of display factors on depth and size perception.

Realistic visual perception is crucial in 3D applications such as Computer Aided Design (CAD) systems, as they need to accurately depict the size, shape and position of the object. Misperception of the distance and size of virtual models would decrease the performance of tasks ranging from 3D scaling to assembly in CAD. The ability to preserve size-constancy allows users to effectively position, scale and design prototypes in the virtual world. It is not just about making correct size judgments between two virtual entities, but also about preserving consistent size perception between the real and virtual environment, such that the perceived scale of a virtual model is consistent when it gets fabricated in the real world. In comparison to other 3D displays such as HMDs, the FTVR display is promising for providing consistent perception as it allows users to see virtual contents situated in the real world. Hence we hope that our work on distance and size perception will help to pave the way for 3D applications such as CAD in FTVR.

5.3.5 Limitations and Future Work

There are several limitations to the experiment presented in this chapter. We used a flat FTVR display as a baseline to evaluate the perception on the spherical display. It remains an open question how the performance would compare to a real-world stimulus, as flat FTVR displays provide limited insight. In particular, size-constancy has been demonstrated in the physical world with consistently reproducible results [76].
We expect theresults would be better using a real world stimuli with future experimentsevaluating the perception on the Fishbowl VR compared to a baseline basedon real world stimulus.Another limitations is that we measured exocentric depth between twovirtual points rather than the egocentric depth. This is different from pre-vious studies [74] in which they measured the egocentric depth. This is a825.4. Summarylimitation as the exocentric distance measure may not directly reflect theegocentric distance which participants perceived when adjusting the size. Itwould be worthwhile further investigating the egocentric depth perceptionand comparing results with the exocentric depth. In addition, as partici-pants performed the depth task prior to the size task, it is possible that theirperformance in the size task could be affected by their performance in thedepth task, which makes the order of two tasks a potential confound vari-able. Future studies could consider separate the two tasks into two studies.While the result indicates that the spherical shape factor improves thedepth and size perception with its unique properties such as enclosure shape,it is not clear on how these properties influence the task performance indi-vidually. Future work is required to understand the mechanisms that maycause the difference. For example, a cubic FTVR display could be usedto evaluate the importance of the borderless feature of the spherical screenwhile having a similar enclosure shape. In addition to its borderless feature,the spherical screen is a display with finite volume as Benko mentioned in[10]. Rendering virtual objects within the volume presents a metaphor asif they are inside a glass globe or display case. The question can be raisedof whether rendering a virtual boundary of the back screen could influencethe task performance as one may perceive the display ending on the otherside. This may also provide additional motion parallax when moving theviewpoints.Finally, in our study we provided various 3D cues such as motion par-allax and stereopsis. It remains to be seen whether more subtle cues suchas the vergence/focus conflict, would impact differently on curved screenscompared to flat screens, as the spherical screen seems to present a largerrange of physical surface depths to focus on. Therefore, the screen shapemay help to limit this conflict depending on different visual stimulus. Fu-ture experiments designed to explore the relationships that may exist willhelp us better understand the impact of display technologies on visual cuespresented to the user.5.4 SummaryIn this chapter, we have presented an empirical study evaluating spatialperception on a Fishbowl VR display using a depth-ranking and a size-matching task. We found that the Fishbowl VR display provides bettertask performance with depth accuracy of 1cm which yielded significantlyless error compared to the accuracy of 6.5cm on the flat FTVR display in835.4. Summarythe depth-ranking task. The performance of the size-matching task was alsobetter with an accuracy of 2.3mm on the spherical display as compared to3.1mm on the flat display. In addition, their performance is more consistenton the spherical display with better size-constancy. 
We believe the FishbowlVR display is promising to improve users’ performance in 3D tasks which relyon the depth and size perception in 3D applications such as CAD that maybenefit from the spatial perception provided by the Fishbowl VR display.84Chapter 6Perceptual Duality of FTVRDisplaysIn this previous chapter, we demonstrated the superiority of Fishbowl VRdisplays in improving spatial perception by depicting 3D objects on a spher-ical screen. Because users perceive 3D objects by looking at digital pixels ona 2D screen, there exists a perceptual duality between the on-screen pixelsand the 3D percept. In this chapter, we investigated this perceptual dualityby evaluating the influence of the on-screen imagery. Understanding the per-ceptual duality helps us to provide accurate perception of real-world objectsdepicted in the virtual environment and pave the way for 3D applications.6.1 IntroductionFTVR displays provide compelling 3D experiences by rendering view depen-dent imagery on 2D screens. While users perceive a 3D object in space, theyare actually looking at pixels on a 2D screen. The users know it is a 2D im-age, and also know it is a 3D object. These two concurrent understandingsrepresent the perceptual duality between the object’s pixels and the objectin space, which could potentially cause perceptual inconsistency betweenthe real and virtual world. This potential inconsistency can be illustratedas an example in Figure 6.1. When a 3D house is rendered in a Fishbowl VRdisplay, the on-screen imagery is computed based on the viewer’s position.Due to perspective projection, the on-screen imagery becomes smaller asthe viewer moves closer, which contradicts to the reality as users expect alarger percept when getting closer to the object. This mismatch may causeperceptual inconsistency and interfere with the 3D experience.The on-screen imagery is a visual stimuli that provides various visualcues to the viewer, including 3D cues (depth cues), such as motion parallaxand binocular stereo, as well as 2D cues (on-screen cues), such as the positionand size of the 2D projection on the screen. While the importance of thesedepth cues has been long appreciated in the practice of FTVR [36, 81, 142],856.1. IntroductionFigure 6.1: A user interacting with the Fishbowl VR display. Due to per-spective projection, the on-screen imagery becomes smaller as the user getscloser, which contradicts to the reality as users expect a larger percept whengetting closer to the object.the influence of on-screen visual cues has been underestimated. As thevirtual objects are depicted via the imagery, these on-screen cues mightinfluence the perception of virtual objects. In our previous example, thesize of the on-screen imagery interferes with the viewer’s size perception inFigure 6.1. When judging the house’s size, they may judge it based on theactual size of the house, or, the 2D size of its projection on the screen. Invision science, similar ambiguity has been previously referred as “dualityof the depth perception in pictures” [44, 46, 142]. They found that addingdepth cues in paintings can make one see in 3D rather than in 2D [142];though their work focuses on static pictures. It is an open question whetherthese findings could be applied to screen-based 3D displays such as FTVRdisplays.To investigate whether the on-screen imagery can influence users’ per-ception, we conducted two experiments that measures users’ size perceptionwith different on-screen imagery on a Fishbowl VR display. 
We focus on the spherical form factor in this study because it has been most widely adopted for FTVR displays [36]. The first experiment evaluated the influence of on-screen imagery with and without the stereo cue. The second study investigated whether different sizes of the on-screen imagery caused by different projection matrices can affect the perceived size of virtual objects.

To the best of our knowledge, this is the first study to evaluate and provide insights on the perceptual duality with FTVR displays. While it is conducted with a Fishbowl VR display, the results apply to most screen-based 3D displays such as a CAVE [29]. Our study establishes a fundamental limitation for a broad range of "screen-based" 3D displays. All screen-based 3D displays, like FTVR and CAVE, approximate holograms by rendering perspective on the surface with the assumption that if the perspective is geometrically correct, the perception will be correct. But our study shows that under some circumstances, this assumption may not hold and the approximated "hologram" causes perceptual bias with visual artefacts. Understanding the perceptual bias helps us to provide accurate perception and pave the way for 3D applications.

Figure 6.2: Illustration of HeadMove and ObjectMove. In ObjectMove, as the object moves towards the user, the on-screen imagery gets larger. In HeadMove, as the user moves towards the object, the on-screen imagery gets smaller.

6.2 User Study 1: Influence of the On-screen Imagery

The purpose of this study is to investigate whether different sizes of the on-screen imagery can affect the perceived size of virtual objects. As the perceived object size can be greatly affected by the visual angle on the retinal image [76], we defined two types of movements (HeadMove and ObjectMove), as illustrated in Figure 6.2, to provide consistent retinal images across conditions with different on-screen sizes, as described below.

Figure 6.3: Perspective projection of a traditional planar FTVR display. (top) The projected size on the screen decreases as the viewpoint moves towards the screen from D to C in HeadMove. (bottom) The projected size increases as the object moves towards the screen from A to B in ObjectMove. The visual angle on the retinal image increases in the same way in both cases; thus a viewer moving toward an object versus an object moving towards the viewer has the same impact on the retinal image.

6.2.1 Projected Size and Movement

As shown in Figure 6.3, the on-screen imagery is computed based on the perspective projection between the virtual object and the viewpoint (the implementation of the perspective-corrected rendering in FTVR can be found in Section 4.2). To keep consistent retinal images, we define two types of movements: HeadMove and ObjectMove. We show that HeadMove and ObjectMove provide distinct on-screen imagery while maintaining the same visual angle.

In HeadMove (Figure 6.3 (top)), the viewpoint moves closer to the virtual object via forward head movements toward the screen. The projected size on the screen can be computed as:

ProjectedSize = L − L·z_o/d        (6.1)

where d is the viewing distance between the object and the viewpoint, z_o is the distance between the object and the screen, and L is the virtual object's size. The projected size decreases as d decreases, shown as the blue line in Figure 6.4 (left); a short numerical sketch of these projection relations is given below.
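The following minimal sketch evaluates equation 6.1 numerically, together with the ObjectMove and visual-angle relations derived in the next two paragraphs (equations 6.2 and 6.3). It illustrates the property the study relies on: the two movements yield identical visual angles but opposite on-screen size trends. The geometric values are illustrative and are not the exact experimental parameters.

```python
import numpy as np

L = 0.13                             # virtual ball diameter (illustrative, metres)
d = np.linspace(0.86, 0.61, 6)       # viewing distance shrinking, roughly 34 to 24 inches

# HeadMove: the ball stays z_o behind the screen while the viewpoint approaches (eq. 6.1).
z_o = 0.15
proj_head = L - L * z_o / d

# ObjectMove: the viewpoint stays z_v in front of the screen while the ball approaches (eq. 6.2).
z_v = 0.55
proj_obj = L * z_v / d

# The visual angle depends only on L and d (eq. 6.3), so it is identical for both movements.
alpha = np.degrees(2 * np.arctan(L / (2 * d)))

for di, ph, po, a in zip(d, proj_head, proj_obj, alpha):
    print(f"d = {di:.2f} m   on-screen (HeadMove) {ph*100:.1f} cm   "
          f"(ObjectMove) {po*100:.1f} cm   visual angle {a:.1f} deg")
```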
In ObjectMove (Figure 6.3 (bottom)), the virtual object moves closer to the viewpoint. The projected size can similarly be computed as:

ProjectedSize = L·z_v/d        (6.2)

where L is the virtual object's size and z_v is the distance between the viewpoint and the screen. Contrary to HeadMove, the on-screen projected size increases as d decreases, shown as the red dashed line in Figure 6.4 (left). In both HeadMove and ObjectMove, the visual angle α can be computed as:

α = 2·arctan(L/(2d))        (6.3)

As the visual angle α only depends on d and L, it is independent of the movement type; thus, the retinal images are the same across HeadMove and ObjectMove, as shown in Figure 6.4 (right). The primary difference between the two movements is the on-screen size of the 2D imagery. Note that the on-screen size changes in opposite directions for HeadMove and ObjectMove: it shrinks in HeadMove and grows in ObjectMove. In this study, we want to test whether the oppositely varied size of the on-screen imagery results in different size interpretations with equivalent retinal images.

While the example shown in Figure 6.3 uses a planar screen, the same projection phenomenon occurs for arbitrary screen shapes, as each pixel on the screen follows the same rule of perspective projection in the view frustum, using the rendering approach described earlier in Figure 4.4.

Figure 6.4: Diagram of equations 6.1, 6.2 and 6.3. (left) Projected size on the screen and (right) visual angle on the retinal image as functions of the viewing distance d. In HeadMove, the projected size decreases as d decreases, while in ObjectMove, the projected size increases as d decreases. In both HeadMove and ObjectMove, the visual angle increases as d decreases, independent of the movement type. The values of the parameters in equations 6.1, 6.2 and 6.3 are based on our experimental setup.

6.2.2 Task

Participants visualize a virtual ball while getting closer to the ball, via either HeadMove or ObjectMove. The task for participants is to judge whether the size of the ball has changed or not by making a three-alternative forced choice, answering the question: "Is the size of the ball getting smaller, larger or unchanged?". Early pilots of this experiment showed that presenting the virtual ball without modifying its size would trivialize the task; to make the task nontrivial, the size of the virtual ball is adjusted by making it smaller, larger or unchanged, so that the ball's size was varied at the same time as the head/object was moving. If participants' perception is influenced by the on-screen imagery, their answers will be biased towards one side.

6.2.3 Experimental Design

We followed a 2x2 within-subjects design with two independent variables:

• C1 the movement condition, which could be HeadMove or ObjectMove. In HeadMove, participants move the head towards the object while in ObjectMove they move the object towards them.

• C2 the viewing condition, which could be Stereo or NonStereo.
InStereo, participants visualize stereoscopic imagery while in NonStereothey visualize monocular imagery set to the mid-point of two eyes assuggested by [36].User performance was evaluated based on the measure BiasError , de-fined as the difference of scores between the reported and expected answer,with the score of Small, Same and Large equal to -1, 0 and 1 respectively.Hence, a positive value of BiasError indicates overestimation of size whilea negative value means underestimation.6.2.4 HypothesisWe hypothesized that:• H1-1: there is a difference of size perception on BiasError betweenthe HeadMove and ObjectMove;• H1-2: Stereo will have lower BiasError than NonStereo because of itsadditional depth cue.These hypothesis were based on the combination of previous researchon the duality of size perception in pictures [35, 142] and the observationsmade using FTVR displays in lab.6.2.5 StimuliSimilar to other size perception studies [73, 123, 152], we chose a sphericalstimuli due to its isotropic shape. We use the same texture and shadowpictorial cues across conditions to help users perceive ball depth. A shadow isdropped to appear on a plane overlapping the physical black surface holdingthe display as shown in Figure 6.5. It is necessary to render a shadow toindicate the ball’s position. In particular, without stereopsis, this is theprimary visual cue that indicates ball depth. We chose a wooden texture so916.2. User Study 1: Influence of the On-screen Imageryusers do not have an obvious size feature. This would encourage users toperceive the size based on the volume in 3D rather than 1D or 2D featureslike the length of a checkerboard pattern. In real life, people do see woodenballs but have no prior knowledge about their exact size. We assume thiswould help to minimize prior size bias.Figure 6.5: Experimental setup of Study 1. In HeadMove, participantsmove their head from D to C when the stimuli stays at A. In ObjectMove,participants move the stimuli from A to B when the head stays at the originD. We use a spherical display (24” diameter) with projectors rear-projectingthrough a projection hole at the bottom of the screen. We track the headposition to render view-dependent imagery shown in the right corner as anexample of NonStereo, and ensure the movement magnitude is 10 inches inboth HeadMove and ObjectMove.6.2.6 ParticipantsSeventeen participants (12 male and 5 female) from the University of BritishColumbia were recruited to participate in the study with compensation.Ethic was approved by the University of British Columbia Behavioural Re-search Ethics Board (H08-03005). All participants passed the stereo acuitytest. The average age of all participants was 29 years old from 18 to 35years old. All participants completed written informed consent. We alsoasked participants to provide their previous experience and usage with Vir-926.2. User Study 1: Influence of the On-screen Imagerytual Reality displays with a scale from 1 (“never”) to 4 (“regularly”). Theaverage score of all participants was 2.4.6.2.7 ProcedureParticipants started by filling out the consent form after verbal explanationsof the study. We measured the interpupillary distance (IPD) of each par-ticipant with a ruler tape and calibrated the viewpoints based on the IPD[138]. Prior to the study, they underwent a stereo acuity test to confirm theeligibility [42]. Then they were seated on a fixed chair in front of the spher-ical display. 
They were instructed to place their head in a position wherethe head could gently touch a wooden bar rigidly attached on the chair toensure the viewing distance d of 34 inches as shown in Figure 6.5. To ensurethe consistency of the moving velocities in HeadMove and ObjectMove, par-ticipants were instructed to perform forward movements toward the screenpaced by an audible electric metronome at 1.5 Hz similar to [18, 91]. Wemeasured the velocity of their head movements before the study and set themeasured velocity to all conditions.As shown in Figure 6.5, in HeadMove, participants were required to judgethe size change of the ball placed at A, while moving their heads forwardtowards the display from D to C. They moved their head 10 inches withthe velocity paced by the metronome. Participants were presented with twosuccessive stimuli per trial, which allows them to confirm their answer beforereporting it. They chose among three alternatives of smaller, unchanged andlarger using a controller to report their answer. In ObjectMove, participantswere required to judge the size change of the ball while pressing a buttonon the controller to move the ball towards them. They moved the ball 10inches from A to B at the pre-measured velocity paced by the metronomewhile keeping their heads stationary at the origin D. Likewise in HeadMove,they were presented with two successive stimuli per trial and reported theanswer among three alternatives using a controller.In both conditions, the movement of head/object caused the viewingdistance d to decrease from 34 inches to 24 inches. This is to ensure thevisual angle changes in the same way across conditions. Each participantconducted 12 data trials plus 3 practice trials per condition, resulting in 48(12x4) data observations for BiasError . The 12 data trials always contained4 larger, 4 smaller and 4 unchanged stimuli in a random sequence at theresizing ratios of either 0% (4 unchanged), 15% (2 larger, 2 smaller) or 30% (2larger, 2 smaller). The initial diameter of the ball was randomized between4 and 6 inches. At the end each condition, we presented a questionnaire936.2. User Study 1: Influence of the On-screen Imagerywith two likert scale questions (confidence and realism) to participants andasked them to rate each with a number in the range -2 (“totally disagree”)to 2 (“totally agree”).Early pilots of this experiment showed that repeated toggling betweenHeadMove and ObjectMove was disorienting; to minimize this disorienta-tion, participants only toggled between HeadMove and ObjectMove onceand switched viewing conditions within each movement condition. Hence wecounter-balanced the movement conditions (2), and also counter-balancedthe viewing conditions (2) within each movement condition (2), resulting in8 (2x2x2) different sequences.6.2.8 ApparatusWe used the 24 inch prototype described in Section 4.1 and Figure 4.2 (right)to conduct the study. As shown in Figure 6.1, the user is tracked using Op-tiTrack [62] optical tracking system with passive markers attached to thestereo glasses. We use Unity game engine [135] to create our 3D contentof the study with a two-pass rendering approach to generate perspective-corrected imagery based on tracked viewpoints [37]. We calibrated thespherical display using an automatic calibration approach [154] describedin Chapter 3 with the on-surface error of 1-2 millimeter. 
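(Returning briefly to the trial schedule described above: a per-condition trial list of this form could be generated along the following lines. This is a minimal sketch with hypothetical labels, not the study's actual implementation.)

```python
import random

def make_trial_list(rng):
    """12 data trials per condition: 4 unchanged (0%), 2 larger and 2 smaller at 15%,
    and 2 larger and 2 smaller at 30%, in random order, each with a random initial diameter."""
    trials = ([("unchanged", 0.00)] * 4 +
              [("larger", 0.15), ("smaller", 0.15)] * 2 +
              [("larger", 0.30), ("smaller", 0.30)] * 2)
    rng.shuffle(trials)
    return [(kind, ratio, rng.uniform(4.0, 6.0)) for kind, ratio in trials]  # diameter in inches

rng = random.Random(7)
for kind, ratio, diameter in make_trial_list(rng):
    print(f"{kind:9s}  resize {ratio:.0%}  initial diameter {diameter:.1f} in")
```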
We used a pattern-based viewpoint calibration [138] to register user-specific viewpoint withrespect to the display (Figure 4.3 (right)) with the average angular errorof less than one degree. The total latency is between 10-20 msec [37]. Weused a post-experiment questionnaire and a demographic questionnaire inthe study as attached in Appendix C.2.6.2.9 ResultData did not meet the normality assumption of ANOVA. A Friedman rankedsum test was performed and revealed a significant difference across condi-tions in BiasError ( χ2(3) = 32.0, p < .001 ). Pairwise post-hoc Wilcoxonsigned rank test for multiple comparisons with Bonferroni correction showsBiasError for HeadMove (M = −0.23, SE = 0.043) is significantly lower(W = 151, p < .001) than ObjectMove (M = 0.48, SE = 0.098) when view-ing in NonStereo; BiasError for HeadMove (M = −0.088, SE = 0.025) issignificantly lower (W = 119, p < .05 ) than ObjectMove (M = 0.24, SE =0.090) when viewing in Stereo. BiasError for Stereo (M = 0.24, SE = 0.090)is significantly lower (W = 94, p < .05 ) than NonStereo (M = 0.48, SE =0.098) when performing ObjectMove, but not when performing HeadMove946.2. User Study 1: Influence of the On-screen ImageryFigure 6.6: BiasError with means and 95% confidence intervals. BiasErrorof HeadMove are below zero while BiasError of ObjectMove are above zero,showing a trend of underestimation in HeadMove and overestimation inObjectMove. Significance values are reported in brackets for p < .05(∗),p < .01(∗∗), and p < .001(∗ ∗ ∗).(W = 20.5, p = .108 ). The mean BiasError with 95% confidence intervals(CI) is shown in Figure 6.6. A complete table of results can be found inAppendix D.7.An one-sample Wilcoxon signed rank test was performed on BiasErrorindicated a significant underestimation of size (W (µ < 0) = 136, p < 0.001)when performing HeadMove and overestimation ( W (µ > 0) = 140, p <0.001 ) when performing ObjectMove. The mean underestimation rate inHeadMove and overestimation rate in ObjectMove is 83.3% when viewing inNonStereo, and reduce to 64.7% when viewing in Stereo.A Friedman ranked sum test was performed on the Likert-scale ques-tions of the confidence and realism. Results revealed a significant differenceacross conditions in the confidence ( χ2(3) = 10.9, p < 0.05 ) and realism( χ2(3) = 8.86, p < 0.05 ). Post-hoc Wilcoxon signed rank test for multiplecomparisons with Bonferroni correction did not show significant differencebetween any pairs. Results of the mean, median and 95% CI are shown inFigure 6.7. A complete table of results can be found in Appendix D.8 andD.9.ANCOVA on the participants’ expertise on VR was carried out on the956.2. User Study 1: Influence of the On-screen ImageryFigure 6.7: Participants’ ratings with means (circle), medians (cross) and95% confidence interval from -2 “totally disagree” to 2 “totally agree” to thequestions of to how confident and real they felt about the reported resultand stimuli in Study 1.BiasError with the movement and viewing conditions as independent factorsand VR expertise as covariate. The results did not show significant differenceon BiasError ( F (1, 63) = 3.31, p = 0.074 ) influenced by the the covariateof VR expertise.6.2.10 DiscussionResults of One-sample Wilcoxon signed rank test show that participantssystematically underestimated size in HeadMove and overestimated size inObjectMove when retina images are the same across conditions. Post-hocanalysis of the Friedman test show that the perceived size in HeadMoveis significantly smaller than ObjectMove. 
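The non-parametric analysis summarized here can be reproduced with standard tools. The sketch below uses synthetic stand-in data shaped like per-participant BiasError means; it illustrates the omnibus Friedman test, Bonferroni-corrected Wilcoxon post-hoc comparisons, and the one-sample Wilcoxon test against zero, and is not the study's analysis code.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: mean BiasError per participant for the four conditions
# (HeadMove/NonStereo, ObjectMove/NonStereo, HeadMove/Stereo, ObjectMove/Stereo).
rng = np.random.default_rng(2)
n = 17
bias = np.column_stack([
    rng.normal(-0.23, 0.18, n),
    rng.normal(0.48, 0.40, n),
    rng.normal(-0.09, 0.10, n),
    rng.normal(0.24, 0.37, n),
])

# Omnibus Friedman test across the four within-subject conditions.
chi2, p = stats.friedmanchisquare(*bias.T)
print(f"Friedman: chi2 = {chi2:.1f}, p = {p:.4f}")

# Post-hoc pairwise Wilcoxon signed-rank tests with Bonferroni correction.
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
for i, j in pairs:
    w, p = stats.wilcoxon(bias[:, i], bias[:, j])
    print(f"condition {i} vs {j}: W = {w:.0f}, p_bonferroni = {min(p * len(pairs), 1.0):.4f}")

# One-sample Wilcoxon against zero, e.g. testing for systematic underestimation in HeadMove/NonStereo.
w, p = stats.wilcoxon(bias[:, 0], alternative="less")
print(f"HeadMove/NonStereo < 0: W = {w:.0f}, p = {p:.4f}")
```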
Together, these results indicate that C1 (movement) affects the perceived object size between HeadMove and ObjectMove. Hence we reject the corresponding null hypothesis and accept H1-1. Results of the post-hoc analysis also show that overestimation of size is less severe in Stereo than in NonStereo, indicating that adding the stereo cue significantly mitigated the perceptual bias. Given this result, we reject the corresponding null hypothesis and accept H1-2. The result of Stereo is consistent with Stefanucci's study [128], in which they found the size underestimation was alleviated by the addition of stereo as a depth cue in the display.

While this perceptual bias was alleviated by the stereo cue, many FTVR displays do not support stereoscopic viewing [36], especially for multiple users, due to limitations of hardware [153]. These FTVR displays require approaches to reduce this perceptual bias when viewing without Stereo. As an important pictorial cue, linear perspective has been closely associated with size perception, as an object that is further away appears smaller to us. As shown in Figure 6.3, the perspective projection makes the on-screen imagery vary with the depth of the viewpoint and the virtual object. A natural question that arises is whether using different projection matrices can affect the under/overestimation of perceived size. Hence, we designed another study aiming at (i) studying the influence of projection matrices on the observed effect and (ii) confirming the influence of the on-screen imagery on the perceived size.

6.3 User Study 2: Influence of the Projection Matrix

We investigated whether different sizes of the on-screen imagery caused by different projection matrices can affect the perceived size of virtual objects. We used two types of projection matrices (Persp and WeakPersp) to create on-screen imagery with different sizes when viewing in NonStereo, as described below.

Projection Matrix

In Persp, the on-screen imagery is generated using perspective projection based on the positions of the viewpoint, screen and virtual object, as shown in Figure 6.3. In WeakPersp, we used an average constant distance z_avg between the viewpoint and the screen, so that the on-screen imagery is generated based on this constant distance, independent of the viewpoint-screen depth, as shown in Figure 6.8 (left). When the viewpoint moves towards the screen in HeadMove, the projected size on the screen is a constant:

ProjectedSize = L·z_avg/(z_o + z_avg)        (6.4)

where z_avg is the constant average distance between the screen and the viewpoint, z_o is the distance between the object and the screen, and L is the virtual object's size. The on-screen projected size is a constant, as shown in Figure 6.8 (right). In ObjectMove, when the object moves closer to the viewpoint, the on-screen projected size increases as z_o decreases, as shown in Figure 6.8 (right).

Figure 6.8: Weak perspective projection of a traditional planar FTVR display. (left) The on-screen imagery is generated based on a constant distance z_avg between the viewpoint and the screen, so that the projected size is independent of the viewpoint-screen depth. (right) Projected size on the screen as a function of viewing distance. In HeadMove, the projected size is a constant independent of the viewing distance, while in ObjectMove, the projected size changes at a smoother gradient compared to the perspective projection in Figure 6.3(b).

We chose WeakPersp for the following reasons; a short numerical comparison of equations 6.1 and 6.4 is sketched first.
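A short numerical comparison of equations 6.1 and 6.4 for HeadMove, using illustrative values only: under Persp the projected size shrinks as the viewer approaches the screen, whereas under WeakPersp it stays constant.

```python
import numpy as np

L = 0.13                         # object size (illustrative, metres)
z_o = 0.15                       # object depth behind the screen
z_avg = 1.27                     # constant warped viewing distance for WeakPersp (illustrative)
z_v = np.linspace(1.2, 0.4, 5)   # the viewer walking towards the screen (HeadMove)

persp = L - L * z_o / (z_v + z_o)                      # eq. 6.1 with d = z_v + z_o: shrinks
weak = np.full_like(z_v, L * z_avg / (z_o + z_avg))    # eq. 6.4: constant, independent of z_v

for zv, p_, w_ in zip(z_v, persp, weak):
    print(f"viewer-screen {zv:.2f} m   Persp {p_*100:.1f} cm   WeakPersp {w_*100:.1f} cm")
```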
First, WeakPerspprovides a smoother transition on the projected size than Persp when theviewpoint gets closer to the object. Comparing Figure 6.3(b) with Fig-ure 6.8(right), WeakPersp generates on-screen imagery with a constant sizein HeadMove, while Persp shrinks the projected size, which contradicts tothe reality as users expect a larger percept when getting closer. Hence, ren-dering on-screen imagery with a constant size can potentially preserve theperceptual consistency between the virtual and real world. In the case ofObjectMove, the gradient of projected size in WeakPersp is smaller thanthe gradient in Persp. While the projected size will still become larger, thesmaller gradient provides a smoother transition so there will be less over-estimation on the perceived size as observed in Study 1. Second, since theon-screen imagery shrinks in Persp when moving towards the screen, thesame virtual object will be rendered with fewer pixels, causing a decrease inthe perceived resolution, which is also against the common sense as closerobjects should appear more clear. WeakPersp warps the viewing distanceat a constant value, which prevents the resolution from decreasing when theviewer moves towards the screen. It is worth noting that WeakPersp still986.3. User Study 2: Influence of the Projection Matrixprovides perspective corrected imagery as the imagery changes when usersperform lateral movements, but the imagery remains unchanged for forwardor backward movements.We evaluate the effect of projection matrices using the same size judge-ment task as described in Section 6.2. As different projection matrices mayalso influence users’ general spatial impression and subjective preference.We also included a forced-choice viewing preference task to assess subjec-tive 3D impression.Figure 6.9: Subjective impression task: participants walked around the dis-play clockwise and performed forward head movement towards the screenfor closer inspection before selecting their preference between a pair of pro-jection conditions (Persp and WeakPersp).6.3.1 Task 1: Subjective ImpressionParticipants visualized a stationary 3D scene with a house model while walk-ing around it. They were instructed to pay close attention to how 3D thescene appeared and how natural the objects looked like as in the real world.As shown in Figure 6.9, participants were instructed to stand facing thefront of the house, and walk to each side of the house clockwise. At eachside of the house, they were asked to inspect the house and move closer untilthe house was lit up shown from a window on the side of the house. Once996.3. User Study 2: Influence of the Projection Matrixthe window lit up, they walked to the next side until completing a full circle.We therefore enforced forward and lateral head movement to simulate theviewing experience for 3D visualization when users browse the 3D data bywalking around, and inspect points of interest by moving closer to the dis-play. The independent variable is C3 (projection) with two levels of Perspand WeakPersp. We measured subjects’ rating and preference on C3 via apost-questionnaire (Appendix C.3).6.3.2 Task 2: Size JudgementWe used the same size judgement task in Study 1. Participants visualizeda virtual ball while getting closer to the ball, via either HeadMove or Ob-jectMove. The task for participants was to judge whether the size of theball has changed or not by making a three-alternative forced choice as de-scribed in Section 6.2.2. 
We followed a 2x2 within-subject design with twoindependent variables as C3 (projection) and C1 (movement). Likewise inTask 1, C3 has two levels of Persp and WeakPersp, and C1 has two levelsof HeadMove and ObjectMove. Similar to Study 1, in HeadMove, partici-pants moved the head towards the object while in ObjectMove they movedthe object towards them. Participants’ performance was evaluated based onthe measure BiasError as in Study 1 described in Section 6.2.3. A positivevalue of BiasError indicates overestimation of size while a negative valuemeans underestimation.6.3.3 HypothesisWe hypothesized that:• H2-1: WeakPersp will be preferred over Persp since the on-screenimagery of WeakPersp does not change when subjects get closer tothe virtual stimuli.• H2-2: WeakPersp will have lower BiasError than Persp since WeakPerspproduces more stable on-screen imagery when the viewing distance isdecreasing.6.3.4 ParticipantsTwelve participants (7 male and 5 female) from University of British Columbiawere recruited to participate in the study with compensation. Ethic was ap-proved by the University of British Columbia Behavioural Research Ethics1006.3. User Study 2: Influence of the Projection MatrixBoard (H08-03005). The average age of all participants was 27 years oldfrom 18 to 35 years old. We also asked participants to provide their previ-ous experience and usage with Virtual Reality displays with a scale from 1(“never”) to 4 (“regularly”). The average score of all participants was 2.1.6.3.5 ProcedureAll participants completed written informed consent. They performed Task1 before Task 2. The order of tasks was the same for all participants, whereasthe projection and movement conditions were counter-balanced within eachtask. In Task 1, participants walked around and visualized the display inPersp and WeakPersp with a maximum viewing distance of 50 inch to thedisplay. The constant viewing distance of WeakPersp is chosen to be 50inch to maximize perceived resolution. The light on each side of the housewould be turned on when participants moved forward and reached 20 inchfrom the display. After each projection condition, they filled in a post-questionnaire (Appendix C.3) with five Likert-scale questions to rate eachwith a number in the range -2 (“not at all”) to 2 (“very much”). Thefive questions were chosen and modified from Parent’s questionnaire whichevaluated the physical presence of virtual art exhibits [100]. At the endof Task 1, participants were asked to provide their preferences and specifyreasons between projection conditions. Task 2 has the same procedure asin Study 1. Participants were seated on a fixed chair in front of the displaywith the procedure described in Section 6.2.7. Likewise in Study 1, eachparticipant conducted 12 data trials plus 3 practice trials per condition,resulting in 48 (12x4) data observations for BiasError .We used the same apparatus as in Study 1 for both tasks, except for thequestionnaires. A post-experiment questionnaire assessing the subjectiveimpression and a demographic questionnaire used in Study 2 can be foundin Appendix C.3.6.3.6 ResultTask 1 - Subjective Impression: Chi-Square analysis shows significantdifferences on the preference of C3(projection) (χ2 = 8.33, p < 0.01) with 11out of 12 participants preferring WeakPersp over Persp. Wilcoxon MatchedPairs Signed Rank Test was performed on the likert scale questions. 
Therewere significant differences on the Q1-consistent (W = 28.0, p < .05 ), Q2-reachable ( W = 55.5, p < .05 ), Q3-geometry ( W = 50.0, p < .05 ), andQ4-realistic (W = 32.5, p < .05 ), but not Q5-presence (W = 37, p = .09 ).1016.3. User Study 2: Influence of the Projection MatrixFigure 6.10: (top) BiasError with means and 95% confidence intervals (CI).The under and overestimation rates in HeadMove and ObjectMove has beenreduced with WeakPersp compared to Persp. (bottom) Participants’ rateswith means (circle), medians (cross) and 95% CI from -2 “not at all” to 2“very much” to the five questions in the post-questionnaire. The five ques-tions include: Q1-consistency (“visualization is consistent with real-worldexperience”), Q2-reachable (“feel can reach and grasp a virtual object”),Q3-geometry (“virtual objects appear geometrically correct”), Q4-realistic(“virtual objects appear realistic”) and Q5-presence (“feel virtual objectsexist in the real environment”). Significance values are reported in bracketsfor p < .05(∗), p < .01(∗∗), and p < .001(∗ ∗ ∗).1026.3. User Study 2: Influence of the Projection MatrixThe results of questionnaire are summarized in Figure 6.10. A completetable of results can be found in Appendix D.11.Task 2 - Size Judgement: Repeated measures two-way ANOVA wascarried out on the BiasError with C1(movement) and C3(projection). Re-sults revealed a two-way interaction effect between C1 and C3 (F (1, 11) =10.1, p < 0.01 ). Main effect of C1 was also found ( F (1, 11) = 36.64, p <.001 ), but not on C3 ( F (1, 11) = 4.14, p = 0.067 ).Post-hoc pairwise t-test with Bonferroni correction for the interactionbetween C1 and C3 shows BiasError for HeadMove (M = −0.125, SE =0.028) is significantly lower ( t = 6.66, p < .001 ) than ObjectMove (M =0.5, SE = 0.086) when viewing in Persp, but not in WeakPersp ( t =2.52, p = 0.08 ). BiasError in WeakPersp (M = 0.215, SE = 0.105) issignificantly lower ( t = 3.77, p < .01 ) than Persp (M = 0.5, SE = 0.086)when performing ObjectMove, but not when performing HeadMove ( t =1.38, p = 0.732). The mean BiasError with 95% CI is shown in Figure 6.10.A complete table of results can be found in Appendix D.10.One-sample t-test was performed on BiasError indicated a significantunderestimation of size ( t(µ < 0) = −4.450, p < 0.001 ) when performingHeadMove in Persp, but not in WeakPersp (t(µ < 0) = −0.491, p = 0.633).It also indicated a significant overestimation of size( t(µ > 0) = 5.826, p <0.001) when performing ObjectMove in Persp, but not in WeakPersp (t(µ >0) = 2.044, p = 0.066 ). In Persp, the mean underestimation rate of Head-Move is 83.3%, and the overestimation rate of ObjectMove is 91.7%, whichreduced to 50.0% and 58.3% respectively in WeakPersp.ANCOVA on the participants’ expertise on VR was carried out on theBiasError with the movement and viewing conditions as independent factorsand VR expertise as covariate. The results showed significant difference onBiasError ( F (1, 43) = 4.82, p < 0.05 ) influenced by the covariate of VRexpertise.6.3.7 DiscussionResults show that participants systematically underestimated size in Head-Move and overestimated size in ObjectMove when viewing in Persp, whileno significant under or overestimation is found when viewing in WeakPersp.Results of two-way anova show that the perceptual bias in Persp is sig-nificantly larger than WeakPersp when performing ObjectMove. These re-sults indicate that different projection matrices affect the perceived object1036.4. General Discussionsize in 3D space. 
Hence we reject the corresponding null hypothesis and accept H2-2. When performing HeadMove, although there is a difference in the perceived size between WeakPersp (M = −0.021, SE = 0.042) and Persp (M = −0.125, SE = 0.028), the result is not significant. However, results of Task 1 show that participants overwhelmingly preferred WeakPersp over Persp when performing HeadMove and visualizing the 3D scene. Given this result, we reject the corresponding null hypothesis and accept H2-1.

6.4 General Discussion

The results of our studies show a perceptual bias in size perception caused by on-screen imagery. Adding the stereo cue can mitigate the observed bias. When viewing without the stereo cue, weak perspective projection reduced the under/overestimation rate. Users also have a strong subjective preference for WeakPersp over Persp.

6.4.1 Influence of the On-screen Imagery

Results of both studies show that participants systematically underestimated size in HeadMove and overestimated size in ObjectMove when retinal images are the same across conditions. This indicates that the on-screen size of the 2D imagery affects the perceived object size in 3D space. In particular, when participants moved closer to the object in HeadMove, they had a tendency to report objects as smaller. This contradicts reality, as we usually expect closer objects to look larger due to a larger visual angle. In addition, the bias appears to be stronger in ObjectMove than in HeadMove in both studies. One potential explanation is that the absolute gradient of the projected size along the viewing distance is steeper in ObjectMove than in HeadMove, as shown in Figure 6.3(b) and Figure 6.8 (right). To encourage perception of 3D features, users were instructed to attend to the change of the ball. Note that if they focused on the change of the shadow or the distance between the shadow and the ball instead, their performance would still be consistent, because they are visualizing the scale of the entire scene and any geometry in the scene is subject to the projection model. However, it may impact the extent of the effect, which may account for the performance variance shown in Figure 6.6.

Knowing the effect of the on-screen imagery helps us to understand spatial perception in screen-based 3D displays. One of the perceptual errors is the size underestimation of virtual objects [73, 91, 128]. Virtual objects are usually located behind the screen in screen-based 3D displays [126, 128, 152], rendering the on-screen imagery smaller than the actual size, as shown in Figure 6.3. If the size perception regresses towards the on-screen imagery, similar to the perceptual bias observed with the real object [35], users will have a tendency to report a modified value for the perceived object size, biased towards the on-screen size, resulting in the underestimation of the reported value. Hence the on-screen imagery could be a possible source for the size underestimation. On the other hand, if virtual objects are rendered in front of the screen, the on-screen imagery will be larger than the actual size. Therefore, we expect users would overestimate the size, which requires future experiments investigating the effect of the on-screen imagery when rendering on different sides of the screen.

The influence of the on-screen imagery might not be restricted to size perception. One of the visual artifacts in screen-based 3D displays is the floating effect [36], sometimes also known as the swimming effect [51]. Users observed stationary virtual objects to move along while they walked around in the environment [103].
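The geometry behind this artifact can be sketched in a few lines. The following is our own illustration, assuming a flat screen and a stationary point behind it: the point's on-screen projection shifts laterally as the viewpoint translates, producing on-screen optical flow even though the virtual point is fixed.

```python
import numpy as np

# A stationary point sits z_o behind a flat screen and the viewer stands z_v in front
# of it (illustrative values). As the viewpoint moves laterally by x_e, the point's
# on-screen projection moves by x_e * z_o / (z_v + z_o) in the same direction.
z_v, z_o = 0.7, 0.2
for x_e in np.linspace(-0.3, 0.3, 7):
    x_screen = x_e * z_o / (z_v + z_o)
    print(f"viewpoint offset {x_e:+.2f} m -> on-screen projection at {x_screen:+.3f} m")
```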
As shown in Figure 6.11, as the viewer moves laterally while visualizing a ball in the center, the projection on the screen also moves based on the viewer's position. Likewise, in our size judgement task, if the on-screen optical flow interferes with users' percept of the virtual object, they may have a tendency to report that the object moves along with their head movement. With system latency and registration error, such local changes may become more noticeable. Hence, the on-screen imagery could be a potential cause of the floating effect.

Figure 6.11: Illustration of the floating effect: users observe stationary virtual objects moving along with them while they walk around in the environment, which could possibly be caused by the movement of the on-screen imagery.

Another well-known problem is the vergence/focus conflict: viewers converge toward the virtual object but focus their eyes on the on-screen imagery, which has been found to cause eyestrain [142]. We see the perceptual duality and the vergence/focus conflict as two separate problems arising from the same origin: users directly see the pixels rather than the object. The depth separation between the on-screen imagery and the 3D object causes visual artifacts and problems, which are unfortunately present in all synthesized 3D displays that use pixels to depict a 3D scene.

6.4.2 Size Perception Accuracy

As BiasError does not directly reflect accuracy (it can be negative or positive, so the average BiasError can be zero while the mean absolute error is much greater than zero), to better understand the effect of WeakPersp and Stereo on performance we also computed AbsoluteError, defined as the average absolute difference between the reported answer and the correct answer. In Study 2, while WeakPersp reduces the perceptual bias of perceived size, it did not significantly improve the accuracy of perceived size, as there is no significant difference (F(1, 11) = 1.53, p = .24) between WeakPersp (M = 0.354, SE = 0.050) and Persp (M = 0.403, SE = 0.056) on AbsoluteError. In Study 1, AbsoluteError is significantly lower (F(1, 16) = 21.1, p < .001) with Stereo (M = 0.306, SE = 0.046) than with NonStereo (M = 0.466, SE = 0.043), suggesting that the addition of the stereo cue not only reduced systematic bias but also improved the accuracy of size perception. The result is consistent with Ware's discussion of the duality of depth perception in pictures: the amount and effectiveness of the depth cues can help viewers to judge the size of a depicted object in 3D space rather than on the picture plane [142]. This is similar to the scenario in which people view a painting in the real world. They can choose to see the depicted object as a 3D structure, or as a 2D picture surface. The painter creates spatial vividness by adding various pictorial depth cues to their work to strengthen the illusion. Similarly, the stereo cue in our study is also an additional depth cue, making it easier to see in a '3D mode' rather than in a '2D mode', reducing sensitivity to the on-screen cues. Hence, it is suggested that future designs of FTVR displays should include stereopsis to alleviate the influence of on-screen cues.

6.4.3 HeadMove vs ObjectMove

HeadMove and ObjectMove are designed to provide equivalent retinal images across conditions while rendering oppositely varied sizes of the on-screen imagery, as the sketch below illustrates.
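The following sketch works through a simplified version of this geometry: a ball of radius r is rendered behind the screen, the relevant screen region is treated as locally flat, and the distances are purely hypothetical (the study's actual display geometry and distances differ). The retinal angle depends only on the eye-to-ball distance, which follows the same profile in both conditions, while the on-screen projected radius also depends on where the screen sits between the eye and the ball.

    # Simplified geometry sketch (assumed values; not the study's actual configuration).
    import math

    r = 0.05          # ball radius in metres (hypothetical)
    d_screen0 = 0.60  # initial eye-to-screen distance (hypothetical)
    d_ball0 = 0.90    # initial eye-to-ball distance; the ball sits behind the screen (hypothetical)

    def retinal_angle(d_eye_ball):
        """Visual angle subtended by the ball; depends only on the eye-to-ball distance."""
        return 2.0 * math.degrees(math.asin(r / d_eye_ball))

    def onscreen_radius(d_eye_screen, d_eye_ball):
        """Approximate radius of the ball's projection on the screen plane (pinhole projection from the eye)."""
        return r * d_eye_screen / d_eye_ball

    for move in (0.0, 0.1, 0.2, 0.3):  # how far the eye (HeadMove) or the ball (ObjectMove) has moved
        head = onscreen_radius(d_screen0 - move, d_ball0 - move)  # eye approaches both screen and ball
        obj = onscreen_radius(d_screen0, d_ball0 - move)          # ball approaches a fixed eye
        print(f"move={move:.1f} m  retinal={retinal_angle(d_ball0 - move):.2f} deg  "
              f"on-screen: HeadMove={head * 100:.2f} cm  ObjectMove={obj * 100:.2f} cm")

With these assumed numbers, the retinal angle is identical across the two conditions at every step, while the on-screen radius shrinks from about 3.3 cm to 2.5 cm in HeadMove and grows to 5.0 cm in ObjectMove; the on-screen gradient is steeper for ObjectMove in this toy configuration, consistent with the explanation offered in Section 6.4.1, although the exact gradients depend on the display and the distances used in the study.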
Naturally, in this study we cannot decouple the on-screen imagery from the movement itself, which makes it ambiguous whether the bias is caused by different on-screen cues or by different movement types. In reality, when our eyes get closer to an object, via either head or object movement, closer objects appear larger due to the increased visual angle. Hence it seems unlikely that the movement types would be responsible for the opposite BiasError observed in our study. We also computed AbsoluteError for HeadMove and ObjectMove. In Study 1, the AbsoluteError of ObjectMove (M = 0.578, SE = 0.038) is significantly higher (F(1, 16) = 45.8, p < .001) than that of HeadMove (M = 0.194, SE = 0.027), as it is in Study 2 (F(1, 11) = 37.25, p < .001). In the study, we provided limited pictorial cues with a plain background to keep the visual stimuli simple. One possible explanation for the performance difference is that the plain background did not provide sufficient depth information when the object moved towards participants in ObjectMove, compared to HeadMove, in which they might have better depth perception via self-movement. As depth and size perception are closely related, the lack of depth cues might be the cause of the difference in AbsoluteError. Additionally, HeadMove provides an extra proprioceptive cue compared to object movement: participants could see real-world objects change as they approached the screen, while the only visual cue in ObjectMove is the virtual object. Therefore, head movement should be considered for better size perception.

Figure 6.12: Participant's view with (top) Persp and (bottom) WeakPersp when moving from (left) far to (right) near. While there is only a subtle difference between Persp and WeakPersp when viewing from far, the imagery of WeakPersp is enlarged and distorted compared to Persp when viewing from near.

6.4.4 Choice of Projection Matrix

In Study 2, we used the weak perspective projection, which fixes the viewing distance at a maximized constant value for higher resolution, resulting in enlarged projected imagery compared to Persp (the sketch below illustrates this numerically). However, the imagery is distorted by the warping, so that straight lines become noticeably curved when viewing at a close distance, as shown in Figure 6.12. In addition, the enlarged imagery inflates the perceived absolute scale of virtual objects. In particular, we found that participants with higher VR expertise tend to overestimate the size with positive BiasError, as shown in Figure 6.13. One potential explanation is that advanced VR users are more sensitive to the enlarged imagery caused by WeakPersp, as Persp is the default projection matrix commonly used in VR applications.

Figure 6.13: BiasError influenced by participants' VR expertise. Participants with higher VR expertise are more likely to overestimate the size in WeakPersp than in Persp, potentially because of the enlarged projected imagery in WeakPersp, which is less commonly used in VR scenes.

Despite these visual artifacts, WeakPersp was still favored by 91.67% of participants. When asked about the reason, 5 out of 12 participants mentioned that the scene looked larger when getting closer, which better matched their real-world experience. Two participants further explained that they felt the scene in Persp was "moving away" from them since it shrank as they moved closer.
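The same simplified geometry used earlier can make the difference between the two projections concrete. The sketch below is illustrative only (the ball radius, the screen-to-ball offset and the maximized viewing distance are assumptions, and the dissertation's renderer builds full head-coupled, off-axis projection matrices rather than this single scalar): Persp projects with the viewer's actual eye-to-screen distance, so the on-screen footprint of an object behind the screen shrinks as the viewer approaches, whereas WeakPersp always uses the fixed, maximized distance, so the footprint stays constant and is therefore enlarged relative to Persp at close range.

    # Illustrative sketch (assumed values; not the dissertation's rendering code).
    r = 0.05      # ball radius in metres (hypothetical)
    c = 0.30      # screen-to-ball distance; the ball sits behind the screen (hypothetical)
    e_max = 0.90  # farthest, "maximized" eye-to-screen distance used by WeakPersp (hypothetical)

    def onscreen_radius(eye_to_screen):
        """Pinhole projection of the ball onto the screen plane for a given eye-to-screen distance."""
        return r * eye_to_screen / (eye_to_screen + c)

    for eye_to_screen in (0.9, 0.6, 0.3):            # the viewer walks from far to near
        persp = onscreen_radius(eye_to_screen)       # Persp: uses the viewer's actual distance
        weak = onscreen_radius(e_max)                # WeakPersp: always uses the fixed maximized distance
        print(f"eye-to-screen {eye_to_screen:.1f} m: Persp {persp * 100:.2f} cm, WeakPersp {weak * 100:.2f} cm")

With the assumed values, the Persp footprint drops from 3.75 cm to 2.5 cm as the viewer approaches while the WeakPersp footprint stays at 3.75 cm, which is the stable, enlarged on-screen imagery described above; the cost is that points away from the warped reference distance are no longer projected along their true lines of sight, consistent with the curvature visible in Figure 6.12.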
This preference finding is interesting if we consider that Persp produces on-screen imagery that is geometrically correct for the viewer's position, while the imagery is actually warped in WeakPersp. The questionnaire result for Q3-geometry shows the opposite rating: participants believed WeakPersp provided a geometrically correct projection with respect to the viewpoint, as shown in Figure 6.10. This leads us to believe that the warped scene in WeakPersp better matched viewers' expectations when moving closer to the screen, even though it is not geometrically correct. A potential explanation might be related to users' prior experience with spherical objects such as fish bowls and snow globes, which usually show visual distortions due to light refraction through the spherical glass and water. It is possible that a similar distortion is expected on spherical displays, which requires further experiments to investigate.

While WeakPersp was favored by participants, it may not be the optimal projection matrix. We investigated WeakPersp and Stereo separately in two studies. In Study 2, we rendered in WeakPersp without the Stereo cue. When the two are coupled, it is worth noting that the disparity of Stereo will not change because of the nature of WeakPersp. In other words, when users get closer to the screen, the disparity deviates from the correct value, which could potentially cause problems. More experiments are required to investigate these potential issues and understand the effect of the projection matrix. Our study shows that WeakPersp is preferred over Persp when viewing in NonStereo for a simple walk-around visualization. This indicates WeakPersp could be considered for NonStereo casual applications, such as attention-drawing showcase visualizations that do not require users to wear glasses.

6.4.5 Design Recommendations for FTVR Displays

As our study found that the on-screen imagery influenced users' size perception, we summarize the following design recommendations for FTVR displays. First, it is suggested to include the stereo cue to reduce the observed perceptual bias. Similar to the stereo cue, other depth cues such as pictorial cues or the motion parallax cue may help to reduce the bias, and it is worthwhile investigating these cues in future studies. Second, when viewing without the stereo cue, projection matrices that produce stable on-screen imagery, such as weak perspective projection, can mitigate the perceptual bias and help ensure users perceive 3D scenes as expected. Third, head movement should be considered to provide accurate size perception in 3D applications that need to accurately depict the size and shape of objects, such as computer-aided design and virtual surgery.

6.4.6 Limitations and Future Work

There are several limitations to the experiments presented in this chapter. Our study was performed on a Fishbowl VR display. We chose a spherical display as it is the most common form factor for recent FTVR displays. We chose a spherical stimulus as it has been widely used in size perception studies [73]. However, awareness should be raised of the potential interaction between the shape of the screen and the stimulus. Our study evaluated the size perception of a spherical stimulus on a spherical screen.
Future work could investigate stimuli with different shapes on displays of other shapes. We expect the findings could extend to other display shapes such as planar or cylindrical displays, because the design of the task, the analysis of the projected size and the head/object movement are independent of the shape factor. We also expect the perceptual bias to be more pronounced when the size of the stimulus is comparable to the size of the screen, because the screen can serve as a reference that makes the local changes of the on-screen imagery more noticeable. It is also interesting to consider how these findings could be transferred to HMDs. Unlike with FTVR displays, users move together with HMD screens: ideally, there is no relative movement between the viewpoint and the screen when an HMD is worn tightly. We therefore expect the result to differ from our study, as the projected size will increase in both HeadMove and ObjectMove, making closer objects look larger.

Secondly, while we found that both Stereo and WeakPersp can alleviate the perceptual bias of perceived size, they were evaluated separately. It is unclear whether applying WeakPersp together with Stereo will overcorrect the perceptual bias. In addition, WeakPersp renders enlarged on-screen imagery by warping the viewer-screen depth, which caused visual artifacts such as distortions on the Fishbowl VR display. The enlarged imagery may also affect the perception of absolute scale in the virtual environment. Future studies would be necessary to investigate the influence of different projection matrices combined with the stereo cue regarding size perception and visual artifacts.

Lastly, our study assessed the effect of on-screen imagery on perceived size. As discussed in Section 6.4.1, we expect the influence of the on-screen imagery not to be limited to size perception. Similar to the perceptual bias observed with real objects [35], it may occur not only for the size of objects but also for other attributes such as position or orientation, which may help to explain some visual artifacts observed in the virtual environment, such as the floating effect. More experiments are required to further understand the influence of on-screen imagery on spatial perception. Our investigation of the on-screen imagery is still at a preliminary stage, but the initial findings are encouraging: the on-screen imagery plays a significant role in size perception, and the stereo cue as well as the projection matrix can mitigate the perceptual bias it causes.

6.5 Summary

FTVR displays render perspective-corrected imagery on a 2D screen. Because users perceive a 3D object by looking at pixels on the 2D screen, there exists a perceptual duality between the on-screen pixels and the 3D percept. In this chapter, we conducted two empirical studies to demonstrate the influence of the on-screen imagery, which caused 83.3% under/overestimation of perceived size. The addition of stereopsis significantly mitigated the perceptual bias to 64.7%. When viewing without stereopsis, weak perspective projection alleviated the perceptual bias to 58.3% and was strongly preferred by 91.7% of users compared to perspective projection.
The results suggest that future designs of Fishbowl VR and volumetric displays should use stereopsis and weak perspective projection to mitigate perceptual bias and reconcile pixels with percept.

Chapter 7

Conclusions

This chapter reviews and summarizes the contributions of this dissertation. We discuss the applicability of our previous findings to a broader research domain, as well as the limitations of our work. Finally, we present future directions along with concluding comments.

7.1 Research Contributions

This dissertation provides four primary contributions to the research domain of 3D displays. These contributions are: creating an automatic display calibration approach (Chapter 3), formulating visual error (Chapter 4), demonstrating the superiority of the spherical screen (Chapter 5), and demonstrating the perceptual duality in FTVR displays (Chapter 6). Chapter 3 uses computer vision techniques to develop an automatic calibration approach to reconstruct the display surface. Chapter 4 uses computer graphics techniques to incorporate the viewpoint into the Fishbowl VR display and analyze the visual error of the perspective-corrected rendering. Chapters 5 and 6 are grounded in human factors studies investigating visual perception on the Fishbowl VR display. Publications from this dissertation are listed in Appendix A and the main contributions are summarized below.

Automatic calibration of a multiple-projector spherical display

i. Created a novel automatic calibration approach for a multiple-projector spherical display. We developed a novel automatic calibration method using a single camera for a multiple-projector spherical display. Modeling the projector as an inverse camera, we estimate the intrinsic and extrinsic projector parameters automatically using a set of projected images on the spherical screen. A calibrated camera is placed beneath the display to observe partially visible projected patterns. Using the correspondence between the observed pattern and the projected pattern, we reconstruct the shape of the spherical display and finally recover the 3D position of each projected pixel on the display.

ii. Applied and evaluated the calibration approach on a Fishbowl VR prototype. We applied the calibration approach in a prototype and achieved sub-millimeter calibration accuracy by estimating the on-surface error. The results were consistent with existing work. The calibrated display can support both view-dependent and view-independent applications.

Error analysis of a Fishbowl VR display

i. Formulated visual error of Fishbowl VR displays. We formulated the visual error of a Fishbowl VR display in terms of display calibration error and head-tracking error. Using this model, we analyzed the sensitivity of the viewer's visual error to errors in each part of the system. We also presented the design and implementation of two Fishbowl VR prototypes.

ii. Established design guidelines for FTVR displays. We applied the error analysis to our prototypes and provided guidelines to minimize visual error for FTVR displays.
We found that head-tracking error causes significantly more visual error than display calibration error; that visual error becomes more sensitive to tracking error when the viewer moves closer to the sphere; and that visual error is sensitive to the distance between the virtual object and its corresponding pixel on the surface. Taken together, these results provide practical guidelines for building a Fishbowl VR display and can be applied to other configurations of geometric displays.

Evaluation of spatial perception in a Fishbowl VR display

i. Compared spatial perception on a Fishbowl VR display with a traditional flat FTVR display. We conducted an experiment and found that the Fishbowl VR display provides better task performance, with a depth accuracy of 1 cm that yielded significantly less error compared to the 6.5 cm accuracy on the flat FTVR display in a depth-ranking task. In a size-matching task, the performance was also better, with an accuracy of 2.3 mm on the spherical display as compared to 3.1 mm on the flat display.

ii. Demonstrated the superiority of the spherical display in providing better size-constancy. We found that the perception of size-constancy is stronger on the Fishbowl VR display than on the flat FTVR display. This indicates that the natural affordances provided by the spherical form factor better preserved size-constancy and improved size perception in 3D compared to a flat display.

Evaluation of perceptual duality of FTVR displays

i. Demonstrated the influence of on-screen imagery on size perception. We conducted two studies and found that the size of the on-screen imagery significantly influenced object size perception, causing 83.3% under/overestimation of perceived size. This perceptual bias indicates there is a perceptual duality between the on-screen pixels and the 3D percept.

ii. Demonstrated the influence of stereopsis on size perception. We conducted a study and found that adding stereopsis can mitigate under- or overestimation of perceived size, which reduced to 64.7% when viewing with stereopsis. It is suggested that future designs of FTVR displays should include the stereo cue to alleviate the influence of on-screen imagery.

iii. Compared size perception under different projection matrices. We conducted a study and found that using weak perspective projection significantly reduced the perceptual bias of size perception to 58.3% and was strongly preferred by 91.7% of users compared to perspective projection. Projection matrices that produce stable on-screen imagery, such as weak perspective projection, should be considered to mitigate the perceptual bias and help ensure users perceive 3D scenes as expected.

7.2 Indications to FTVR Designs

We investigated the visualization and spatial perception provided by Fishbowl VR displays. Based on these results, we now revisit the form factors of FTVR displays and discuss design indications for FTVR displays. As our studies were carried out on two Fishbowl VR prototypes, we also discuss the applicability of our findings to other FTVR displays with different shapes.

Stereopsis
Stereopsis has been shown to be critically important for FTVR displays in various tasks [5, 6]. Consistent with the literature, we also found that stereopsis improved size perception in Chapter 6. While users' task performance was significantly improved with stereopsis, we did not find a significant subjective preference for stereopsis.
Relevant to our findings, recent work also found that while users' performance was significantly degraded without stereopsis, they did not have a strong preference for stereo rendering [36]. Therefore, non-stereo FTVR displays would be reasonable for use cases that simply provide a subjective impression, such as a 3D showcase at an exhibition. However, this work was all carried out on spherical displays. It is possible that the spherical display creates a strong "fishbowl" metaphor of virtual objects contained within the sphere. With this metaphor, the display maintains the 3D illusion without stereopsis. Therefore, this finding may not apply to traditional planar FTVR displays.

Head-coupling
Head-coupling in FTVR displays provides the motion parallax cue by rendering to the viewer's perspective. It is suggested to include head-coupling in Fishbowl VR displays based on the results of Chapters 5 and 6. In Chapter 6, we found that head movement provided accurate size perception with low perceptual bias. Compared to the planar screen, users exhibited more head movement with the spherical screen. For a single viewer, it is probably unnecessary to cover the entire sphere with head-coupling to provide view-dependent 360° imagery, as we found in Chapter 5 that users were only using part of the spherical display space. For a multi-person viewing experience, supporting 360° visibility becomes important, as each viewer gains their own distinct perspective around the display [37].

On-screen Imagery
When users visualize a virtual object in FTVR displays, they are actually looking at pixels on a 2D screen. A perceptual duality exists between the on-screen pixels and the 3D percept. The influence of on-screen imagery inherently exists due to the nature of FTVR displays, which we evaluated on a Fishbowl VR display in Chapter 6. Though our investigation was conducted using a spherical display, the observed effect can be expected independent of the display shape. When the stimulus size is comparable to the screen size, the perceptual bias may be more pronounced, because the screen can serve as a reference that makes the local changes of the on-screen imagery more noticeable. However, if a similar experiment is conducted in a completely dark environment with light coming only from the on-screen imagery, the result might be different, because it becomes difficult to observe the local changes on the screen when the screen itself is invisible.

7.3 Discussion of Fishbowl VR Displays

In Chapter 5, we found that the Fishbowl VR display is promising for providing consistent perception between the physical and digital worlds compared to a traditional flat counterpart display. Therefore, we believe 3D applications that require high-fidelity spatial perception, such as CAD and virtual surgery, can potentially benefit from the spherical screen shape over a traditional flat screen. Given the properties of the Fishbowl VR display, it is more likely that this type of display will be applied in targeted domains rather than seeing widespread adoption in all fields. In this section, we provide a general discussion of the strengths and limitations of the Fishbowl VR display by comparing it to other displays.

The spherical screen of the Fishbowl VR display has an enclosure nature, which presents a metaphor as if virtual objects are inside a glass globe or display case with a finite rendering volume. This is different from flat displays with a window metaphor and an infinite rendering volume.
While it is possible to render virtual objects outside the boundary of the spherical surface, it is inconsistent with the metaphor of having a glass globe and therefore potentially breaks the 3D illusion. Furthermore, the virtual objects are more sensitive to tracking error when rendered outside the screen, as discussed in Section 4.3.4 and Figure 4.7, compared to being rendered inside at an equivalent distance to the screen. Therefore, we constrained the virtual scene inside the sphere in our user studies (Chapters 5 and 6) as well as in demonstrated applications [47, 147, 153].

While the enclosure nature provides a strong metaphor, the finite rendering volume is a constraint when applying this type of display to applications. This limitation is not unique to the spherical surface; other enclosed surfaces, such as cubes and cylinders, are also subject to it. Due to this constraint, this type of display is better applied to targeted applications with exocentric tasks. In exocentric tasks, the position and orientation of objects are defined in an exocentric reference frame [17]. Object-centered applications, such as CAD, can potentially benefit from this type of display by presenting the model in the center of the display. Naturally, the model will be scaled based on the size of the display. A smaller display, such as the 12 inch prototype (Figure 4.2(left)), will be suitable for a single-designer scenario, and a larger display, such as the 24 inch prototype (Figure 4.2(right)), will be suitable for a collaborative working scenario.

The targeted exocentric application domain is consistent with the non-immersive nature of FTVR displays, which provide an "outside-in" viewing experience, compared to immersive displays such as HMDs and CAVEs, which provide an "inside-out" viewing experience. Unlike immersive displays, which isolate the user by blocking out all physical surroundings, FTVR displays are situated in the physical environment. This provides opportunities for users to interact with physical objects, for example note-taking, messaging or recording with existing physical tools while they work in the virtual environment. In a collaborative environment, FTVR displays allow direct communication between users. These are the major strengths of FTVR displays compared to HMDs. Therefore, despite the lack of generalization across various applications, we believe Fishbowl VR displays can be applied to targeted applications with exocentric tasks and a requirement of interacting with the physical surroundings.

7.4 Limitations

Demographic of participants
One of the limitations across all of our studies is the demographic of the participants. Our participants generally came from a pool of engineering students. Hence, they might have higher than average technical skills and experience. It is possible that they had more exposure to VR technologies and 3D displays than the general public. As such, it would be difficult to generalize our results to the general public. Yet, our conclusions from these studies stand for the type of user who could potentially use Fishbowl VR displays in the future. An issue related to the demographic is the sample size. Each experiment involved 12-17 participants, which is typical for human factors studies in the literature. Naturally, a larger sample size would make the results stronger. Yet each experiment is a sub-study, part of a larger study that tried to answer the same research question from a different perspective.
The triangulation from multiple metrics across the sub-studies helps to build internal validity from different perspectives.

Controlled laboratory experiments
Another limitation is that all of our studies were short-term controlled laboratory experiments, which may not reflect the viewing experience under natural conditions, when people use the system in a relaxed and comfortable way. We tried to address this issue by bringing the display to conferences and having attendees try our experimental stimuli in demo sessions. However, the observations were subjective and potentially biased towards our display, given the nature of live demos at conferences.

Abstract task design
An issue related to the study design is the design of the tasks used in our studies, including a size-matching task, a depth-judgement task and a size-judgement task. Unlike real tasks in 3D applications, these tasks are abstract and artificial, which may not directly reflect users' performance in 3D applications. However, it is important to keep the task simple and abstract in controlled experiments to ensure the outcome is only associated with the independent variables. Though these abstract perceptual tasks differ from real tasks, to some degree they imitate general 3D tasks in applications as a combination of depth, size and orientation perception of objects. For example, the size-matching task in Chapter 5 reflects a user's ability to perceive and adjust the scale of 3D objects, which is common in computer-aided design systems; the spatial impression task in Chapter 6 imitates the walk-around visualization experience when 3D displays are used as an attention-drawing showcase or a casual entertainment device.

Experimental stimuli
A more specific issue, relevant to Chapters 5 and 6, is that we used spherical stimuli across different visual tasks. We chose the spherical stimulus shape in our study due to its isotropic property, and it has been commonly used in the perceptual studies literature [73, 89, 101]. As our display is also spherical, it remains a question whether stimuli with different shapes, such as a cube, would reproduce similar results. The same type of study will need to be carried out again for stimuli with other shapes on the Fishbowl VR display. It is also worth noting the discrepancy between the visual quality of the experimental stimuli and the visual fidelity discussed in this dissertation. There has been tremendous effort in computer graphics to develop photo-realistic rendering from the perspective of graphical content. Our goal of improving visual fidelity takes a different perspective, by studying the technical and perceptual factors of 3D displays. Therefore the experimental stimuli do not necessarily require photo-realistic rendering quality. However, future experiments could consider including variations of the visual stimuli, such as different textures or shading models, and investigate the potential interaction between the visual quality of the stimuli and display factors such as the screen shape in Chapter 5 and the movement type in Chapter 6.

7.5 Directions

Short-term directions for the research components of this dissertation are discussed at the end of each chapter. Here we discuss broader directions for supporting perceptual consistency as well as the future potential of Fishbowl VR displays.

Evaluating spatial perception beyond controlled laboratory studies
As discussed in Section 7.4, our findings are based on a small group of users in a series of short-term controlled laboratory studies, which is common for perception studies in the domain of HCI. This type of study design provides strong internal validity and helps us to explore the complexity of spatial perception with specific insights under controlled conditions. One challenge when applying the results to real-world use cases is that the experimental tasks may not be equivalent to real-world tasks that could potentially benefit from improved spatial perception. Therefore, evaluating spatial perception in natural viewing conditions, such as in a field study, would provide data on the effective spatial perception supported by Fishbowl VR displays. In Chapter 6, we designed a walk-around visualization task that simulates the viewing experience when users browse 3D data by walking around and inspect points of interest by moving closer to the display. As head movements can be complicated, combining all possible directions, designing tasks that allow free movement would provide practical insights for realistic scenarios such as visual analytics, training and gaming. In addition to the movement, using real visual stimuli, such as scanned volume datasets [84] generated from ultrasound or computed tomography, would make the results more applicable to use cases such as scientific visualization. These specific experimental design variations are examples of what we see as a promising future direction for understanding and improving spatial perception in 3D displays.

Comparing with other 3D displays
In Chapter 5, we compared the performance of a Fishbowl VR display to a traditional flat FTVR display and found that the spherical display provided better spatial perception than the flat display. It is an open question how the results can be applied to other 3D displays such as HMDs and CAVEs. While there exist studies that compared spatial perception across CAVE, FTVR and HMD systems [31, 41, 78, 106], they all used flat FTVR displays, which support a limited field of view. Comparing the performance of Fishbowl VR displays with other 3D displays would give us a broader understanding of the unique properties of this type of display. Furthermore, as user performance is also task-related, comparing with other 3D displays helps to understand the visual fidelity of different displays and to find appropriate applications that could benefit from Fishbowl VR displays.

Figure 7.1: Examples of potential 3D applications using the Fishbowl VR display: (a) a smart home assistant coupled with a spherical screen situated in the living room, (b) users making eye contact with a virtual agent [47] using a 12 inch prototype, and (c) collaborative tasks for two users to work together in the virtual environment using a 24 inch prototype [36].

Identifying appropriate 3D applications
Our efforts show the potential for Fishbowl VR displays to provide consistent spatial perception in a way similar to the real world. It is unclear which applications can directly benefit from this. Here we discuss several possible areas that can potentially benefit from our findings and the two Fishbowl VR prototypes we developed (Figure 4.2). Naturally, a smaller Fishbowl VR display, such as the 12 inch prototype, would benefit tasks and applications for personal use, while collaborative tasks would favor larger displays like the 24 inch prototype.
Home assistant devices, such as Google Home [57] and Amazon Echo [52], have seen rapid adoption in the home over the past few years. So far, they have been limited to audio devices. As illustrated in Figure 7.1(a), the Fishbowl VR display extends the home assistant concept by using a volumetric 3D display as a focal point of the living room to create an embodied assistant with 360° visibility. Other augmented reality living room systems have been proposed, such as headset AR [61] and projection onto living room surfaces [116]. However, these systems lack the tangible nature of having a volumetric display that is part of the real living room. The spherical display is tangible and could show a virtual object situated in a real living environment. Furthermore, several perceptual studies have found that spherical displays provide better eye contact [47] (shown in Figure 7.1(b)) and gaze perception [99] when interacting with virtual agents and avatars. This makes Fishbowl VR displays, such as the 12 inch prototype, appropriate for home assistant devices that need to be visible from all angles.

Computer Aided Design (CAD) is another area that could benefit from consistent spatial perception. Realistic visual perception is crucial, as users need to accurately perceive the size, shape and position of 3D designs. Misperception of the distance and size of virtual models would decrease the performance of tasks ranging from 3D scaling to assembly in CAD. The ability to preserve size-constancy, as shown in Chapter 5, allows users to effectively position, scale and design prototypes in the virtual environment. It is not just about making correct size judgments between two virtual entities, but also about preserving consistent size perception between the real and virtual environments, so that the perceived scale of a virtual model is consistent when it is fabricated in the real world. In comparison to immersive 3D displays such as HMDs, Fishbowl VR displays provide more consistent perception as they allow users to see virtual content situated in the real world. In particular, in a collaborative environment where multiple users work together, a larger Fishbowl VR display, such as the 24 inch prototype, allows people to share their experience and visualize the same virtual entity situated in the real world. Through co-location, users can explore a virtual environment together while at the same time interacting with each other in the real world, as shown in Figure 7.1(c). We suggest that future studies identify challenges and develop techniques to provide consistent perception for multiple co-located users simultaneously. We hope that our work on spatial perception shows the potential of, and helps to pave the way for, future applications in areas like CAD and home assistant devices.

7.6 Concluding Remarks

To conclude, this dissertation has presented a Fishbowl VR display with new techniques and perceptual studies to improve visual fidelity, working towards bridging the perceptual gap between the virtual and real worlds. We first formulated the visual error (Chapter 4) and created an automatic calibration approach to generate seamless imagery with multiple projectors (Chapter 3), allowing us to subsequently investigate perceptual issues. Once the technical foundation was established, we demonstrated the superiority of the spherical screen as well as the perceptual duality via a series of human factors studies (Chapters 5 and 6), which play an important role in improving visual fidelity and enhancing the 3D experience.
Providing realistic visualization in the virtual environment is difficult and inherently has considerable technical and perceptual challenges. While it was not possible to exhaust all possible aspects, we focused on the core technical and perceptual issues, allowing us to lay the groundwork for reconciling pixels with percept. Our work can thus serve as a baseline for future experiments, theoretical models, interaction techniques, and 3D applications to build on.

Bibliography

[1] Johnny Accot and Shumin Zhai. Beyond fitts' law: models for trajectory-based hci tasks. In Proceedings of the ACM SIGCHI Conference on Human factors in computing systems, pages 295–302. ACM, 1997. → page 14.

[2] Atif Ahmed, Rehan Hafiz, Muhammad Murtaza Khan, Yongju Cho, and Jihun Cha. Geometric correction for uneven quadric projection surfaces using recursive subdivision of bézier patches. ETRI Journal, 35(6):1115–1125, 2013. → page 28.

[3] JM Andrade and MG Estévez-Pérez. Statistical comparison of the slopes of two regression lines: a tutorial. Analytica chimica acta, 838:1–12, 2014. → page 76.

[4] Claudia Armbrüster, Marc Wolter, Torsten Kuhlen, Will Spijkers, and Bruno Fimm. Depth perception in virtual reality: distance estimations in peri- and extrapersonal space. Cyberpsychology & Behavior, 11(1):9–15, 2008. → page 22.

[5] Roland Arsenault and Colin Ware. The importance of stereo and eye-coupled perspective for eye-hand coordination in fish tank vr. Presence: Teleoperators and Virtual Environments, 13(5):549–559, 2004. → pages 21 and 115.

[6] Kevin W Arthur, Kellogg S Booth, and Colin Ware. Evaluating 3d task performance for fish tank virtual worlds. ACM Transactions on Information Systems (TOIS), 11(3):239–265, 1993. → pages 13, 15, 21, 22, and 115.

[7] Ronald Azuma and Gary Bishop. Improving static and dynamic registration in an optical see-through hmd. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 197–204. ACM, 1994. → pages 15 and 25.

[8] Martin Bauer. Tracking Errors in Augmented Reality. PhD thesis, Technische Universität München, 2007. → page 25.

[9] Hrvoje Benko, Ricardo Jota, and Andrew Wilson. Miragetable: freehand interaction on a projected augmented reality tabletop. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 199–208. ACM, 2012. → pages 14, 23, and 66.

[10] Hrvoje Benko, Andrew D Wilson, and Ravin Balakrishnan. Sphere: multi-touch interactions on a spherical display. In Proceedings of the 21st annual ACM symposium on User interface software and technology, pages 77–86. ACM, 2008. → pages 16, 29, 63, 64, and 83.

[11] Hrvoje Benko, Andrew D Wilson, and Federico Zannier. Dyadic projected spatial augmented reality. In Proceedings of the 27th annual ACM symposium on User interface software and technology, pages 645–655. ACM, 2014. → pages 4, 11, 12, 14, 15, 20, and 23.

[12] Francois Berard and Thibault Louis. The object inside: Assessing 3d examination with a spherical handheld perspective-corrected display. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 4396–4404. ACM, 2017. → pages 17 and 23.

[13] Oliver Bimber and Ramesh Raskar. Spatial augmented reality: merging real and virtual worlds. AK Peters/CRC Press, 2005. → page 14.

[14] Barry G Blundell and Adam J Schwarz. The classification of volumetric display systems: characteristics and predictability of the image space. IEEE Transactions on Visualization and Computer Graphics, 8(1):66–75, 2002.
→ page 16.[15] John Bolton, Kibum Kim, and Roel Vertegaal. Snowglobe: a sphericalfish-tank vr display. In CHI’11 Extended Abstracts on Human Factorsin Computing Systems, pages 1159–1164. ACM, 2011. → page 16.[16] John Bolton, Kibum Kim, and Roel Vertegaal. A comparison of com-petitive and cooperative task performance using spherical and flatdisplays. In Proceedings of the ACM 2012 conference on ComputerSupported Cooperative Work, pages 529–538. ACM, 2012. → pages 16and 64.125Bibliography[17] Doug A Bowman, Sabine Coquillart, Bernd Froehlich, Michitaka Hi-rose, Yoshifumi Kitamura, Kiyoshi Kiyokawa, and Wolfgang Stuer-zlinger. 3d user interfaces: New directions and perspectives. IEEEcomputer graphics and applications, 28(6):20–36, 2008. → pages 10,17, and 117.[18] Mark F Bradshaw, Andrew D Parton, and Richard A Eagle. Theinteraction of binocular disparity and motion parallax in determin-ing perceived depth and perceived size. Perception, 27(11):1317–1331,1998. → page 93.[19] Mark F Bradshaw, Andrew D Parton, and Andrew Glennerster. Thetask-dependent use of binocular disparity and motion parallax infor-mation. Vision research, 40(27):3725–3734, 2000. → page 4.[20] Gerd Bruder, Ferran Argelaguet, Anne-He´le´ne Olivier, and AnatoleLe´cuyer. Cave size matters: Effects of screen distance and parallaxon distance estimation in large immersive display setups. Presence:Teleoperators and Virtual Environments, 25(1):1–16, 2016.→ pages 22and 79.[21] Han Chen, Rahul Sukthankar, Grant Wallace, and Kai Li. Scalablealignment of large-format multi-projector displays using camera ho-mography trees. In Proceedings of the conference on Visualization’02,pages 339–346. IEEE Computer Society, 2002. → pages 24 and 28.[22] Han Chen, Rahul Sukthankar, Grant Wallace, and Kai Li. Scalablealignment of large-format multi-projector displays using camera ho-mography trees. In Proceedings of the conference on Visualization’02,pages 339–346. IEEE Computer Society, 2002. → page 56.[23] Yuqun Chen, Douglas W Clark, Adam Finkelstein, Timothy C Housel,and Kai Li. Automatic alignment of high-resolution multi-projectordisplay using an un-calibrated camera. In Proceedings of the conferenceon Visualization’00, pages 125–130. IEEE Computer Society Press,2000. → pages 24 and 28.[24] Lung-Pan Cheng, Eyal Ofek, Christian Holz, Hrvoje Benko, and An-drew D Wilson. Sparse haptic proxy: Touch feedback in virtual envi-ronments using a general passive prop. In Proceedings of the 2017 CHIConference on Human Factors in Computing Systems, pages 3718–3728. ACM, 2017. → pages 11 and 12.126Bibliography[25] Lung-Pan Cheng, Thijs Roumen, Hannes Rantzsch, Sven Ko¨hler,Patrick Schmidt, Robert Kovacs, Johannes Jasper, Jonas Kemper, andPatrick Baudisch. Turkdeck: Physical virtual reality based on people.In Proceedings of the 28th Annual ACM Symposium on User Inter-face Software & Technology, pages 417–426. ACM, 2015. → pages 11and 12.[26] Nusrat Choudhury, Nicholas Ge´linas-Phaneuf, Se´bastien Delorme, andRolando Del Maestro. Fundamentals of neurosurgery: virtual realitytasks for training and evaluation of technical skills. World neuro-surgery, 80(5):e9–e19, 2013. → page 2.[27] Zeynep Cipiloglu, Abdullah Bulbul, and Tolga Capin. A frameworkfor enhancing depth perception in computer graphics. In Proceedingsof the 7th Symposium on Applied Perception in Graphics and Visual-ization, pages 141–148. ACM, 2010. → page 3.[28] Sarah H Creem-Regehr, Jeanine K Stefanucci, and William B Thomp-son. 
Perceiving absolute scale in virtual environments: How theoryand application have mutually informed the role of body-based per-ception. In Psychology of Learning and Motivation, volume 62, pages195–224. Elsevier, 2015. → pages 22 and 23.[29] Carolina Cruz-Neira, Daniel J Sandin, and Thomas A DeFanti.Surround-screen projection-based virtual reality: the design and im-plementation of the cave. In Proceedings of the 20th annual confer-ence on Computer graphics and interactive techniques, pages 135–142.ACM, 1993. → pages 12, 14, 25, 55, and 86.[30] Michael Deering. High resolution virtual reality. ACM SIGGRAPHComputer Graphics, 26(2):195–202, 1992. → page 1.[31] Cagatay Demiralp, Cullen D Jackson, David B Karelitz, Song Zhang,and David H Laidlaw. Cave and fishtank virtual-reality displays: Aqualitative and quantitative comparison. IEEE transactions on visu-alization and computer graphics, 12(3):323–330, 2006. → pages 4, 13,and 120.[32] JP Djajadiningrat, Gerda JF Smets, and CJ Overbeeke. Cubby: amultiscreen movement parallax display for direct manual manipula-tion. Displays, 17(3-4):191–197, 1997. → page 14.127Bibliography[33] C BROWN Duane. Close-range camera calibration. Photogramm.Eng, 37(8):855–866, 1971. → pages 33, 145, and 146.[34] Robert G Eggleston, William P Janson, and Kenneth A Aldrich. Vir-tual reality system effects on size-distance judgements in a virtual en-vironment. In Virtual Reality Annual International Symposium, 1996.,Proceedings of the IEEE 1996, pages 139–146. IEEE, 1996. → page 23.[35] Kevin W Elner and Helen Wright. Phenomenal regression to the realobject in physical and virtual worlds. Virtual Reality, 19(1):21–31,2015. → pages 20, 23, 91, 105, and 111.[36] Dylan Fafard, Ian Stavness, Martin Dechant, Regan Mandryk, QianZhou, and Sidney Fels. Ftvr in vr: Evaluation of 3d perception with asimulated volumetric fish-tank virtual reality display. In Proceedings ofthe 2019 CHI Conference on Human Factors in Computing Systems,pages 1–12, 2019. → pages 85, 86, 91, 96, 105, 116, and 121.[37] Dylan Fafard, Qian Zhou, Chris Chamberlain, Georg Hagemann, Sid-ney Fels, and Ian Stavness. Design and implementation of a multi-person fish-tank virtual reality display. In Proceedings of the 24thACM Symposium on Virtual Reality Software and Technology, VRST’18. ACM, 2018. → pages v, vi, 48, 50, 94, and 116.[38] Dylan Brodie Fafard. A virtual testbed for fish-tank virtual reality:Improving calibration with a virtual-in-virtual display. Master’s thesis,University of Saskatchewan, 2019. → page 52.[39] Gabriel Falcao, Natalia Hurtos, and Joan Massich. Plane-based cali-bration of a projector-camera system. VIBOT master, 9(1):1–12, 2008.→ pages 32 and 145.[40] G. E. Favalora. Volumetric 3d displays and application infrastructure.Computer, 38(8):37–44, 2005. → pages 3, 11, 12, and 16.[41] Andrew Forsberg, Michael Katzourin, Kristi Wharton, Mel Slater,et al. A comparative study of desktop, fishtank, and cave systems forthe exploration of volume rendered confocal data sets. IEEE Transac-tions on Visualization and Computer Graphics, 14(3):551–563, 2008.→ pages 4 and 120.[42] Davide Gadia, Gianfranco Garipoli, Cristian Bonanomi, Luigi Albani,and Alessandro Rizzi. Assessing stereo blindness and stereo acuity ondigital displays. Displays, 35(4):206–212, 2014. → page 93.128Bibliography[43] Michael N Geuss, Jeanine K Stefanucci, Sarah H Creem-Regehr, andWilliam B Thompson. Effect of viewing plane on perceived distancesin real and virtual environments. 
Journal of Experimental Psychology:Human Perception and Performance, 38(5):1242, 2012. → page 22.[44] James J Gibson. The information available in pictures. Leonardo,4(1):27–35, 1971. → pages 20 and 86.[45] Tovi Grossman and Ravin Balakrishnan. An evaluation of depth per-ception on volumetric displays. In Proceedings of the working confer-ence on Advanced visual interfaces, pages 193–200. ACM, 2006. →pages 16, 22, and 66.[46] Ralph Norman Haber. How we perceive depth from flat pictures: Theinherent dual reality in pictorial art enables us to perceive a sceneas three dimensional at the same time we see that the painting orphotograph is actually flat. American Scientist, 68(4):370–380, 1980.→ pages 20 and 86.[47] Georg Hagemann, Qian Zhou, Ian Stavness, and Sidney Fels. In-vestigating spherical fish tank virtual reality displays for establishingrealistic eye-contact. In 2019 IEEE Conference on Virtual Reality and3D User Interfaces (VR), pages 950–951. IEEE, 2019. → pages 117,121, and 122.[48] Richard Hartley and Andrew Zisserman. Multiple view geometry incomputer vision. Cambridge university press, 2003. → pages 32, 36,37, 56, and 145.[49] Michael Harville, Bruce Culbertson, Irwin Sobel, Dan Gelb, AndrewFitzhugh, and Donald Tanguay. Practical methods for geometric andphotometric correction of tiled projector. In Computer Vision and Pat-tern Recognition Workshop, 2006. CVPRW’06. Conference on, pages5–5. IEEE, 2006. → pages 24 and 28.[50] Anuruddha Hettiarachchi and Daniel Wigdor. Annexing reality: En-abling opportunistic use of everyday objects as tangible proxies inaugmented reality. In Proceedings of the 2016 CHI Conference on Hu-man Factors in Computing Systems, pages 1957–1967. ACM, 2016. →page 11.129Bibliography[51] Richard L Holloway. Registration error analysis for augmented real-ity. Presence: Teleoperators and Virtual Environments, 6(4):413–432,1997. → pages 25, 47, 60, and 105.[52] Amazon Inc. Amazon echo, 2019. https://en.wikipedia.org/wiki/Amazon_Echo. → page 122.[53] ASUS Inc. Asus p2b portable led projector, 2020. https://www.asus.com/ca-en/Commercial-Projectors/P2B/. → pages 40, 49, and 65.[54] Facebook Inc. Oculus, 2020. https://www.oculus.com. → pages 10,12, and 15.[55] FLIR Systems Inc. Flea3 usb3 — flir systems, 2020. https://www.flir.com/products/flea3-usb3?model=FL3-U3-13Y3M-C. →page 41.[56] Fujifilm Inc. Fujinon 3.8-13mm f1.4 — fujinon lens, 2020. https://www.fujifilm.com/products/optical_devices/. → page 41.[57] Google Inc. Google home, 2019. https://en.wikipedia.org/wiki/Google_Home. → page 122.[58] HTC Inc. Vive, 2020. https://www.vive.com. → page 15.[59] Matlab Inc. Symbolic math toolbox from matlab, 2019. https://www.mathworks.com/help/symbolic/generate-c-or-fortran-code.html. → page 148.[60] Microsoft Inc. Kinect, 2020. https://developer.microsoft.com/en-us/windows/kinect. → pages 25 and 59.[61] Microsoft Inc. Microsoft HoloLens — mixed reality technology forbusiness, 2020. https://www.microsoft.com/en-us/hololens. →pages 10, 12, 15, and 122.[62] NaturalPoint Inc. Optitrack, 2020. https://optitrack.com. →pages 25, 50, 60, and 94.[63] OPTOMA Inc. Optoma mobile led projector ml750, 2020. https://www.optoma.com/us/product/ml750/. → page 50.[64] Polhemus Inc. Polhemus motion tracking systems, 2020. https://polhemus.com/motion-tracking/overview/. → pages 40 and 50.130Bibliography[65] Pufferfish Inc. Pufferfish interactive spherical displays, 2020. https://pufferfishdisplays.com. → page 16.[66] VICON Inc. Vicon, 2019. https://www.vicon.com. → page 25.[67] XPAND Inc. 
Xpand 3d glasses lite rf, 2020. http://xpandvision.com/products/xpand-3d-glasses-lite-ir-rf/. → page 50.
[68] Brett Jones, Rajinder Sodhi, Michael Murdock, Ravish Mehra, Hrvoje Benko, Andrew Wilson, Eyal Ofek, Blair MacIntyre, Nikunj Raghuvanshi, and Lior Shapira. Roomalive: magical experiences enabled by scalable, adaptive projector-camera units. In Proceedings of the 27th annual ACM symposium on User interface software and technology, pages 637–644. ACM, 2014. → page 14.
[69] Brett R Jones, Hrvoje Benko, Eyal Ofek, and Andrew D Wilson. Illumiroom: peripheral projected illusions for interactive experiences. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 869–878. ACM, 2013. → page 14.
[70] J Adam Jones, J Edward Swan II, Gurjot Singh, Eric Kolstad, and Stephen R Ellis. The effects of virtual reality, augmented reality, and motion parallax on egocentric depth perception. In Proceedings of the 5th symposium on Applied perception in graphics and visualization, pages 9–14. ACM, 2008. → page 22.
[71] Naoki Kawakami, Masahiko Inami, Taro Maeda, and Susumu Tachi. Proposal for the object-oriented display: The design and implementation of the media3. In Proceedings of ICAT, volume 97, pages 57–62, 1997. → page 14.
[72] Daniel F Keefe, Daniel Acevedo Feliz, Tomer Moscovich, David H Laidlaw, and Joseph J LaViola Jr. Cavepainting: a fully immersive 3d artistic medium and interactive experience. In Proceedings of the 2001 symposium on Interactive 3D graphics, pages 85–93. Citeseer, 2001. → pages 11 and 12.
[73] Jonathan W Kelly, Lucia A Cherep, and Zachary D Siegel. Perceived space in the htc vive. ACM Transactions on Applied Perception (TAP), 15(1):2, 2017. → pages 64, 91, 104, 110, and 119.
[74] Jonathan W Kelly, Lisa S Donaldson, Lori A Sjolund, and Jacob B Freiberg. More than just perception–action recalibration: Walking through a virtual environment causes rescaling of perceived space. Attention, Perception, & Psychophysics, 75(7):1473–1485, 2013. → pages 22, 23, and 82.
[75] Jonathan W Kelly, Jack M Loomis, and Andrew C Beall. Judgments of exocentric direction in large-scale space. Perception, 33(4):443–454, 2004. → page 22.
[76] Robert V Kenyon, Daniel Sandin, Randall C Smith, Richard Pawlicki, and Thomas Defanti. Size-constancy in the cave. Presence: Teleoperators and Virtual Environments, 16(2):172–187, 2007. → pages 64, 66, 67, 70, 75, 81, 82, and 87.
[77] Kibum Kim, John Bolton, Audrey Girouard, Jeremy Cooperstock, and Roel Vertegaal. Telehuman: effects of 3d perspective on gaze and pose estimation with a life-size cylindrical telepresence pod. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2531–2540. ACM, 2012. → page 63.
[78] Kwanguk Kim, M Zachary Rosenthal, David Zielinski, and Rachel Brady. Comparison of desktop, head mounted display, and six wall fully immersive systems using a stressful task. In 2012 IEEE Virtual Reality Workshops (VRW), pages 143–144. IEEE, 2012. → page 120.
[79] Volodymyr V Kindratenko. A survey of electromagnetic position tracker calibration techniques. Virtual Reality, 5(3):169–182, 2000. → page 26.
[80] Georg Klein. Visual tracking for augmented reality. PhD thesis, University of Cambridge, City of Cambridge, United Kingdom, 2006. → page 25.
[81] Sirisilp Kongsilp and Matthew N Dailey. Motion parallax from head movement enhances stereoscopic displays by improving presence and decreasing visual fatigue. Displays, 49:72–79, 2017. → pages 13 and 85.
[82] Robert Kooima. Generalized perspective projection. J. Sch. Electron. Eng. Comput. Sci, 2009. → page 54.
[83] Gregory Kramida. Resolving the vergence-accommodation conflict in head-mounted displays. IEEE Transactions on Visualization and Computer Graphics, 22(7):1912–1931, 2015. → page 15.
[84] B. Laha, D. A. Bowman, and J. J. Socha. Effects of vr system fidelity on analyzing isosurface visualization of volume datasets. IEEE Transactions on Visualization and Computer Graphics, 20(4):513–522, 2014. → pages 18, 19, and 120.
[85] Billy Lam, Yichen Tang, Ian Stavness, and Sidney Fels. A 3d cubic puzzle in pcubee. In 3D User Interfaces (3DUI), 2011 IEEE Symposium on, pages 135–136. IEEE, 2011. → page 14.
[86] Billy Shiu Fai Lam. Evaluation of a tangible outward-facing geometric display. Master's thesis, University of British Columbia, Vancouver, 2011. → page 4.
[87] Joseph J LaViola, Andrew S Forsberg, John Huffman, and Andrew Bragdon. Poster: Effects of head tracking and stereo on non-isomorphic 3d rotation. In 2008 IEEE Symposium on 3D User Interfaces, pages 155–156. IEEE, 2008. → page 13.
[88] Ivan KY Li, Edward M Peek, Burkhard C Wünsche, and Christof Lutteroth. Enhancing 3d applications using stereoscopic 3d and motion parallax. In Proceedings of the Thirteenth Australasian User Interface Conference - Volume 126, pages 59–68. Australian Computer Society, Inc., 2012. → page 13.
[89] Sally A Linkenauger, Markus Leyrer, Heinrich H Bülthoff, and Betty J Mohler. Welcome to wonderland: The influence of the size and shape of a virtual hand on the perceived size and shape of virtual objects. PloS one, 8(7):e68594, 2013. → page 119.
[90] Kok-Lim Low, Greg Welch, Anselmo Lastra, and Henry Fuchs. Life-sized projector-based dioramas. In Proceedings of the ACM symposium on Virtual reality software and technology, pages 93–101. ACM, 2001. → page 12.
[91] Xun Luo, Robert Kenyon, Derek Kamper, Daniel Sandin, and Thomas DeFanti. The effects of scene complexity, stereovision, and motion parallax on size constancy in a virtual environment. In Virtual Reality Conference, 2007. VR'07. IEEE, pages 59–66. IEEE, 2007. → pages 4, 23, 70, 93, and 104.
[92] Blair MacIntyre, Enylton Machado Coelho, and Simon J Julier. Estimating and adapting to registration errors in augmented reality systems. In Virtual Reality, 2002. Proceedings. IEEE, pages 73–80. IEEE, 2002. → page 25.
[93] Ryan P McMahan, Doug A Bowman, David J Zielinski, and Rachael B Brady. Evaluating display fidelity and interaction fidelity in a virtual reality game. IEEE Transactions on Visualization and Computer Graphics, 18(4):626–633, 2012. → pages 18 and 19.
[94] Paul Milgram and Fumio Kishino. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77(12):1321–1329, 1994. → pages xiii, 11, 12, 15, 17, and 18.
[95] Mark R Mine, Jeroen Van Baar, Anselm Grundhofer, David Rose, and Bei Yang. Projection-based augmented reality in disney theme parks. Computer, 45(7):32–40, 2012. → page 14.
[96] Betty J Mohler, Sarah H Creem-Regehr, and William B Thompson. The influence of feedback on egocentric distance judgments in real and virtual environments. In Proceedings of the 3rd symposium on Applied perception in graphics and visualization, pages 9–14. ACM, 2006. → page 22.
[97] Daniel Moreno and Gabriel Taubin. Simple, accurate, and robust projector-camera calibration. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 464–471. IEEE, 2012. → page 37.
[98] opencv. opencv, 2019. https://opencv.org/. → pages 41 and 145.
[99] Ye Pan and Anthony Steed. Preserving gaze direction in teleconferencing using a camera array and a spherical display. In 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2012, pages 1–4. IEEE, 2012. → pages 16 and 122.
[100] Anne Parent. A virtual environment task analysis workbook for the creation and evaluation of virtual art exhibits. National Research Council of Canada-Reports-ERB, 1998. → page 101.
[101] Etienne Peillard, Thomas Thebaud, Jean-Marie Normand, Ferran Argelaguet, Guillaume Moreau, and Anatole Lécuyer. Virtual objects look farther on the sides: The anisotropy of distance perception in virtual reality. In IEEE VR Conference 2019, 2019. → page 119.
[102] Fastrak Polhemus. 3space fastrak user's manual. F. Polhemus Inc., Colchester, VT, 1993. → pages 25, 50, and 60.
[103] Kevin Ponto, Michael Gleicher, Robert G Radwin, and Hyun Joon Shin. Perceptual calibration for immersive display environments. IEEE Transactions on Visualization and Computer Graphics, 19(4):691–700, 2013. → pages 23, 80, 81, and 105.
[104] David H Laidlaw Prabhat, Thomas F Banchoff, and Cullen D Jackson. Comparative evaluation of desktop and cave environments for learning hypercube rotations. Technical report, Brown University, 2005. → page 13.
[105] ALGLIB Project. Alglib, 2019. https://www.alglib.net/. → pages 41 and 147.
[106] Wen Qi, Russell M Taylor II, Christopher G Healey, and Jean-Bernard Martens. A comparison of immersive hmd, fish tank vr and fish tank with haptics displays for volume visualization. In Proceedings of the 3rd Symposium on Applied Perception in Graphics and Visualization, pages 51–58. ACM, 2006. → pages 4, 11, 13, 22, and 120.
[107] Qian Zhou. Geometric fish tank vr display calibration library, 2019. https://github.com/GeometricFishTankVR/SphericalDisplayCalibration. → page 148.
[108] Eric D Ragan, Doug A Bowman, Regis Kopper, Cheryl Stinson, Siroberto Scerbo, and Ryan P McMahan. Effects of field of view and visual complexity on virtual reality training effectiveness for a visual scanning task. IEEE Transactions on Visualization and Computer Graphics, 21(7):794–807, 2015. → pages 18 and 19.
[109] Eric D Ragan, Regis Kopper, Philip Schuchardt, and Doug A Bowman. Studying the effects of stereo, head tracking, and field of regard on a small-scale spatial judgment task. IEEE Transactions on Visualization and Computer Graphics, 19(5):886–896, 2012. → page 18.
[110] Andrew Raij, Gennette Gill, Aditi Majumder, Herman Towles, and Henry Fuchs. Pixelflex2: A comprehensive, automatic, casually-aligned multi-projector display. In IEEE International Workshop on Projector-Camera Systems, pages 203–211. Nice, France, 2003. → pages 24 and 28.
[111] Andrew Raij and Marc Pollefeys. Auto-calibration of multi-projector display walls. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 1, pages 14–17. IEEE, 2004. → page 37.
[112] Ramesh Raskar. Immersive planar display using roughly aligned projectors. In Virtual Reality, 2000. Proceedings. IEEE, pages 109–115. IEEE, 2000. → pages 24 and 28.
[113] Ramesh Raskar, Michael S Brown, Ruigang Yang, Wei-Chao Chen, Greg Welch, Herman Towles, Brent Scales, and Henry Fuchs. Multi-projector displays using camera-based registration. In Visualization'99. Proceedings, pages 161–522. IEEE, 1999. → pages 24, 28, 31, 35, 40, 41, and 53.
[114] Ramesh Raskar, Jeroen van Baar, Paul Beardsley, Thomas Willwacher, Srinivas Rao, and Clifton Forlines. Ilamps: Geometrically aware and self-configuring projectors. In ACM SIGGRAPH 2006 Courses, SIGGRAPH '06, page 7–es, New York, NY, USA, 2006. Association for Computing Machinery. → pages 12 and 14.
[115] Ramesh Raskar, Jeroen van Baar, Srinivas Rao, and Thomas Willwacher. Multi-projector imagery on curved surfaces. Mitsubishi Electric Research Labs, pages 1–8, 2003. → pages 24, 28, 38, 40, and 43.
[116] Ramesh Raskar, Greg Welch, Matt Cutts, Adam Lake, Lev Stesin, and Henry Fuchs. The office of the future: A unified approach to image-based modeling and spatially immersive displays. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 179–188. ACM, 1998. → pages 14 and 122.
[117] Ramesh Raskar, Greg Welch, Kok-Lim Low, and Deepak Bandyopadhyay. Shader lamps: Animating real objects with image-based illumination. In Rendering Techniques 2001, pages 89–102. Springer, 2001. → page 14.
[118] Rebekka S Renner, Boris M Velichkovsky, and Jens R Helmert. The perception of egocentric distances in virtual environments - a review. ACM Computing Surveys (CSUR), 46(2):23, 2013. → page 22.
[119] Brian Rogers and Maureen Graham. Motion parallax as an independent cue for depth perception. Perception, 8(2):125–134, 1979. → page 4.
[120] Behzad Sajadi and Aditi Majumder. Scalable multi-view registration for multi-projector displays on vertically extruded surfaces. In Computer Graphics Forum, volume 29, pages 1063–1072. Wiley Online Library, 2010. → pages 24, 38, and 43.
[121] Behzad Sajadi and Aditi Majumder. Automatic registration of multi-projector domes using a single uncalibrated camera. In Computer Graphics Forum, volume 30, pages 1161–1170. Wiley Online Library, 2011. → pages 14, 43, and 45.
[122] Behzad Sajadi and Aditi Majumder. Autocalibration of multiprojector cave-like immersive environments. Visualization and Computer Graphics, IEEE Transactions on, 18(3):381–393, 2012. → pages 14, 24, 38, and 43.
[123] Zachary D Siegel, Jonathan W Kelly, and Lucia A Cherep. Rescaling of perceived space transfers across virtual environments. Journal of Experimental Psychology: Human Perception and Performance, 43(10):1805, 2017. → pages 67, 70, and 91.
[124] Adalberto L Simeone, Eduardo Velloso, and Hans Gellersen. Substitutional reality: Using the physical environment to design virtual reality experiences. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3307–3316. ACM, 2015. → page 12.
[125] Duncan Smith and Sameer Singh. Approaches to multisensor data fusion in target tracking: A survey. IEEE Transactions on Knowledge and Data Engineering, 18(12):1696–1710, 2006. → page 25.
[126] Ian Stavness, Billy Lam, and Sidney Fels. pcubee: a perspective-corrected handheld cubic display. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1381–1390. ACM, 2010. → pages 4, 11, 14, 23, and 105.
[127] Ian Stavness, Florian Vogt, and Sidney Fels. Cubee: a cubic 3d display for physics-based interaction. In ACM SIGGRAPH 2006 Sketches, page 165. ACM, 2006. → pages 14 and 63.
[128] Jeanine K Stefanucci, Sarah H Creem-Regehr, William B Thompson, David A Lessard, and Michael N Geuss. Evaluating the accuracy of size perception on screen-based displays: Displayed objects appear smaller than real objects. Journal of Experimental Psychology: Applied, 21(3):215, 2015. → pages 1, 19, 23, 77, 96, 104, and 105.
[129] Brett Stevens, Jennifer Jerrams-Smith, David Heathcote, and David Callear. Putting the virtual into reality: Assessing object-presence with projection-augmented models. Presence: Teleoperators & Virtual Environments, 11(1):79–92, 2002. → page 2.
[130] G Strang. Introduction to Applied Mathematics. Wellesley-Cambridge, 1986. → pages 33, 57, and 145.
[131] Ivan E Sutherland. A head-mounted three dimensional display. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I, pages 757–764. ACM, 1968. → page 15.
[132] Thiago Teixeira, Gershon Dublon, and Andreas Savvides. A survey of human-sensing: Methods for detecting presence, count, location, track, and identity. ACM Computing Surveys, 5(1):59–69, 2010. → page 25.
[133] F Teubl, Celso S Kurashima, MC Cabral, Roseli D Lopes, Junia C Anacleto, Marcelo K Zuffo, and Sidney Fels. Spheree: An interactive perspective-corrected spherical 3d display. In 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2014, pages 1–4. IEEE, 2014. → pages 16, 29, and 64.
[134] Fernando Teubl, Celso Kurashima, Marcio Cabral, and Marcelo Zuffo. Fastfusion: A scalable multi-projector system. In Virtual and Augmented Reality (SVR), 2012 14th Symposium on, pages 26–35. IEEE, 2012. → pages 14, 24, and 28.
[135] Unity. Unity technologies inc., 2020. https://unity.com/. → pages 53, 65, and 94.
[136] D Vishwanath. Information in surface and depth perception: Reconciling pictures and reality. Perception beyond inference, pages 201–240, 2011. → page 20.
[137] Andrejs Vorozcovs, Wolfgang Stürzlinger, Andrew Hogue, and Robert S Allison. The hedgehog: a novel optical tracking method for spatially immersive displays. Presence, 15(1):108–121, 2006. → page 26.
[138] Andrew Wagemakers, Dylan Fafard, and Ian Stavness. Interactive visual calibration of volumetric head-tracked 3d displays. In 2017 SIGCHI Conference on Human Factors in Computing Systems (CHI '17). ACM, 2017, to appear. → pages 51, 93, and 94.
[139] Andrew John Wagemakers, Dylan Brodie Fafard, and Ian Stavness. Interactive visual calibration of volumetric head-tracked 3d displays. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 3943–3953. ACM, 2017. → pages 14 and 46.
[140] Mark Wagner. Sensory and cognitive explanations for a century of size constancy research. Visual experience: Sensation, cognition, and constancy, pages 1–35, 2012. → page 64.
[141] L. C. Wanger, D. P. Greenberg, and J. A. Ferwerda. Perceiving spatial relationships in computer-generated images. IEEE Computer Graphics and Applications, 12(03):44–51, 54–58, May 1992. → page 21.
[142] Colin Ware. Information visualization: perception for design. Elsevier, 2012. → pages 3, 4, 15, 20, 21, 64, 77, 85, 86, 91, 105, and 106.
[143] Colin Ware, Kevin Arthur, and Kellogg S Booth. Fish tank virtual reality. In Proceedings of the INTERACT'93 and CHI'93 conference on Human factors in computing systems, pages 37–42. ACM, 1993. → pages 12 and 13.
[144] Colin Ware and Glenn Franck. Evaluating stereo and motion cues for visualizing information nets in three dimensions. ACM Transactions on Graphics (TOG), 15(2):121–140, 1996. → pages 13, 21, and 22.
[145] Christian Weichel, Manfred Lau, David Kim, Nicolas Villar, and Hans W Gellersen. Mixfab: a mixed-reality environment for personal fabrication. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3855–3864. ACM, 2014. → page 2.
[146] Wikipedia contributors. Spinning dancer — Wikipedia, the free encyclopedia, 2019. https://en.wikipedia.org/w/index.php?title=Spinning_Dancer&oldid=916380798. → page 21.
[147] Fan Wu, Qian Zhou, Kyoungwon Seo, Toshiro Kashiwaqi, and Sidney Fels. I got your point: An investigation of pointing cues in a spherical fish tank virtual reality display. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1237–1238. IEEE, 2019. → page 117.
[148] Xu Xu and Raymond W McGorry. The validity of the first and second generation microsoft kinect for identifying joint center locations during static postures. Applied Ergonomics, 49:47–54, 2015. → pages 59 and 60.
[149] Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Computing Surveys (CSUR), 38(4):13, 2006. → page 25.
[150] Suya You, Ulrich Neumann, and Ronald Azuma. Orientation tracking for outdoor augmented reality registration. Computer Graphics and Applications, IEEE, 19(6):36–42, 1999. → page 25.
[151] Zhengyou Zhang. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(11):1330–1334, 2000. → pages 32 and 145.
[152] Qian Zhou, Georg Hagemann, Dylan Fafard, Ian Stavness, and Sidney Fels. An evaluation of depth and size perception on a spherical fish tank virtual reality display. IEEE Transactions on Visualization and Computer Graphics, 25(5):2040–2049, 2019. → pages vi, 91, and 105.
[153] Qian Zhou, Georg Hagemann, Sidney Fels, Dylan Fafard, Andrew Wagemakers, Chris Chamberlain, and Ian Stavness. Coglobe: a co-located multi-person ftvr experience. In ACM SIGGRAPH 2018 Emerging Technologies, page 5. ACM, 2018. → pages vi, vii, 97, and 117.
[154] Qian Zhou, Gregor Miller, Kai Wu, Daniela Correa, and Sidney Fels. Automatic calibration of a multiple-projector spherical fish tank vr display. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, pages 1072–1081. IEEE, 2017. → pages v and 94.
[155] Qian Zhou, Gregor Miller, Kai Wu, Ian Stavness, and Sidney Fels. Analysis and practical minimization of registration error in a spherical fish tank virtual reality system. In Asian Conference on Computer Vision, pages 519–534. Springer, 2016. → pages v, 28, 29, 38, 41, and 43.
[156] Qian Zhou, Fan Wu, Sidney Fels, and Ian Stavness. Closer object looks smaller: Investigating the duality of size perception in a spherical fish tank vr display. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–9. ACM, 2020. → page vi.
[157] Qian Zhou, Kai Wu, Gregor Miller, Ian Stavness, and Sidney Fels. 3dps: An auto-calibrated three-dimensional perspective-corrected spherical display. In Virtual Reality (VR), 2017 IEEE, pages 455–456. IEEE, 2017. → pages vi, vii, and 63.
[158] Yi Zhou, Shuangjiu Xiao, Ning Tang, Zhiyong Wei, and Xu Chen. Pmomo: projection mapping on movable 3d object. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 781–790. ACM, 2016. → pages 11, 12, and 14.

Appendix A

List of Publications

A.1 Journal Publication

Qian Zhou, Georg Hagemann, Dylan Fafard, Ian Stavness, and Sidney Fels. An evaluation of depth and size perception on a spherical fish tank virtual reality display. IEEE Transactions on Visualization and Computer Graphics, 25(5):2040–2049, 2019.

A.2 Conference Publication

Qian Zhou, Fan Wu, Sidney Fels, and Ian Stavness. Closer object looks smaller: Investigating the duality of size perception in a spherical fish tank vr display. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–9. ACM, 2020.
Qian Zhou, Fan Wu, Ian Stavness, and Sidney Fels. Match the cube: Investigation of the head-coupled input with a spherical fish tank virtual reality display. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1281–1282. IEEE, 2019.

Qian Zhou, Georg Hagemann, Sidney Fels, Dylan Fafard, Andrew Wagemakers, Chris Chamberlain, and Ian Stavness. Coglobe: a co-located multi-person ftvr experience. In ACM SIGGRAPH 2018 Emerging Technologies, page 5. ACM, 2018.

Dylan Fafard, Qian Zhou, Chris Chamberlain, Georg Hagemann, Sidney Fels, and Ian Stavness. Design and implementation of a multi-person fish tank virtual reality display. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology (VRST), page 5. ACM, 2018.

Dylan Fafard, Andrew Wagemakers, Ian Stavness, Qian Zhou, Gregor Miller, and Sidney S Fels. Calibration methods for effective fish tank VR in multi-screen displays. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 373–376. ACM, 2017.

Qian Zhou, Gregor Miller, Kai Wu, Daniela Correa, and Sidney Fels. Automatic calibration of a multiple-projector spherical fish tank VR display. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1072–1081. IEEE, 2017.

Qian Zhou, Kai Wu, Gregor Miller, Ian Stavness, and Sidney Fels. 3dps: An auto-calibrated three-dimensional perspective-corrected spherical display. In Virtual Reality (VR), pages 455–456. IEEE, 2017.

Qian Zhou, Gregor Miller, Kai Wu, Ian Stavness, and Sidney Fels. Analysis and practical minimization of registration error in a spherical fish tank virtual reality system. In Asian Conference on Computer Vision (ACCV), pages 519–534. Springer, 2016.

A.3 Research Talk

March 2018 - Spherical Fish Tank VR
• Emerging Media Lab, Vancouver, BC, Canada

August 2018 - CoGlobe: A Co-Located Multi-Person FTVR Experience
• Siggraph 2018, Vancouver, BC, Canada

A.4 Additional Publication

Qian Zhou, Sarah Sykes, Sidney Fels, and Kenrick Kin. Gripmarks: Using hand grips to transform in-hand objects into mixed reality input. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–11. ACM, 2020.

Fan Wu, Qian Zhou, Kyoungwon Seo, Toshiro Kashiwaqi, and Sidney Fels. I got your point: An investigation of pointing cues in a spherical fish tank virtual reality display. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1237–1238. IEEE, 2019.

Georg Hagemann, Qian Zhou, Ian Stavness, and Sidney Fels. Investigating spherical fish tank virtual reality displays for establishing realistic eye-contact. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 950–951. IEEE, 2019.

Toshiro Kashiwagi, Kaoru Sumi, Sidney Fels, Qian Zhou, and Fan Wu. Crystal palace: Merging virtual objects and physical hand-held tools. In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pages 1411–1412. IEEE, 2019.

Dylan Fafard, Ian Stavness, Martin Dechant, Regan Mandryk, Qian Zhou, and Sidney Fels. FTVR in VR: Evaluation of 3D perception with a simulated volumetric fish-tank virtual reality display. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, page 533. ACM, 2019.

Georg Hagemann, Qian Zhou, Ian Stavness, Oky Dicky Ardiansyah Prima, and Sidney S Fels. Here's looking at you: A spherical FTVR display for realistic eye-contact. In Proceedings of the 2018 ACM International Conference on Interactive Surfaces and Spaces (ISS), pages 357–362. ACM, 2018.
Appendix B

Multi-Projector Display Calibration Optimization

This chapter describes the implementation and functions for the multi-projector spherical display calibration in Chapter 3. The problem of the display calibration can be formulated as the estimation of a set of parameters $\vec{p}$:

$$\vec{p} = (\vec{p}_c, \vec{p}_s, \vec{p}_{p_1}, \ldots, \vec{p}_{p_N}) \tag{B.1}$$

Assume we have one camera and N projectors. The camera parameters $\vec{p}_c$ have 9 degrees of freedom (DOF): 4 for the focal length and the principal point, and 5 for lens distortion [33]. Each projector has parameters $\vec{p}_{p_i}$ with 10 DOF: 4 for the focal length and the principal point, 3 for rotation, and 3 for translation. The sphere parameters $\vec{p}_s$ have 4 DOF: 3 for the center position and 1 for the radius.

As shown in Figure 3.3, we first obtain an initial guess of the camera parameters $\vec{p}_c$ and the projector parameters $\vec{p}_{p_i}$. The camera intrinsic parameters are estimated based on [151], implemented with OpenCV [98]. The projector intrinsic parameters are estimated based on [39], also implemented with OpenCV. The projector extrinsic parameters are estimated from the canonical solution of a two-view geometry described in [48] (pages 257-260). The sphere parameters $\vec{p}_s$ are estimated with a weighted linear least squares solution [130].

With the initial guess of all parameters obtained, we describe the nonlinear optimization with its error function and Jacobian matrix. For a pixel $j$ located at $\vec{x}_{p_{ij}}$ in the image plane of projector $P_i$, a ray is back-projected and intersects the sphere at the 3D point $\vec{X}_{ij}$. The back-projection and ray-sphere intersection can be expressed as a function $f$ of the variables $\vec{p}_{p_i}$ and $\vec{p}_s$:

$$\vec{X}_{ij} = f(\vec{x}_{p_{ij}}; \vec{p}_{p_i}, \vec{p}_s) \tag{B.2}$$

We now describe the formulation of $\vec{X}_{ij}$. Let $KK_{p_i}$ represent the intrinsic matrix (3x3), and $R_i$ and $T_i$ the rotation (3x3) and translation (3x1) matrices of projector $P_i$. For a pixel $j$ with 2D coordinate $\vec{x}_{p_{ij}}$, the back-projection of a ray starting at pixel $j$ can be formulated as:

$$\mathrm{Ray}(\lambda) = \begin{pmatrix} x_{ij} \\ y_{ij} \\ z_{ij} \end{pmatrix} = \vec{C}_i + \vec{V}_{ij}\lambda, \qquad \vec{C}_i = -R_i^T T_i, \qquad \vec{V}_{ij} = (KK_{p_i} R_i)^{-1} \vec{x}_{p_{ij}} \tag{B.3}$$

Substituting equation B.3 into the sphere equation, with $(a, b, c)$ as the center of the sphere and $r$ as the radius,

$$(x_{ij} - a)^2 + (y_{ij} - b)^2 + (z_{ij} - c)^2 = r^2 \tag{B.4}$$

we can solve for $\vec{X}_{ij} = (x_{ij}, y_{ij}, z_{ij})^T$. Let the 3D point $\vec{X}_{ij}$ be captured by the camera at pixel $\vec{x}_{c_{ij}}$ on the image plane. $\vec{x}_{c_{ij}}$ can be expressed as a function $g$ of $\vec{X}_{ij}$ and $\vec{p}_c$:

$$\vec{x}_{c_{ij}} = g(\vec{X}_{ij}; \vec{p}_c) \tag{B.5}$$

The formulation of $\vec{x}_{c_{ij}}$ is based on the lens distortion and projection in the camera. We use a five-parameter model for lens distortion [33]:

$$\begin{aligned} \begin{pmatrix} x'_{ij} \\ y'_{ij} \end{pmatrix} &= \begin{pmatrix} x_{ij}/z_{ij} \\ y_{ij}/z_{ij} \end{pmatrix}, \qquad r_c^2 = x'^2_{ij} + y'^2_{ij} \\ x''_{ij} &= x'_{ij}(1 + k_1 r_c^2 + k_2 r_c^4 + k_3 r_c^6) + 2 p_1 x'_{ij} y'_{ij} + p_2 (r_c^2 + 2 x'^2_{ij}) \\ y''_{ij} &= y'_{ij}(1 + k_1 r_c^2 + k_2 r_c^4 + k_3 r_c^6) + 2 p_2 x'_{ij} y'_{ij} + p_1 (r_c^2 + 2 y'^2_{ij}) \end{aligned} \tag{B.6}$$

where $(x'_{ij}, y'_{ij})$ is the projected 2D point from $\vec{X}_{ij}$ before lens distortion, and $(x''_{ij}, y''_{ij})$ is the projected point after lens distortion. $k_1, k_2, k_3$ are the radial distortion factors and $p_1, p_2$ are the tangential distortion factors. Finally, $\vec{x}_{c_{ij}}$ can be computed using $(x''_{ij}, y''_{ij})$, where $KK_c$ is the intrinsic matrix of the camera:

$$\vec{x}_{c_{ij}} = KK_c \begin{pmatrix} x''_{ij} \\ y''_{ij} \\ 1 \end{pmatrix} \tag{B.7}$$
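To make the chain of equations B.2-B.7 concrete, the sketch below prototypes the forward model in NumPy: a projector pixel is back-projected onto the sphere (equations B.3-B.4) and the resulting 3D point is projected into the camera with the five-parameter distortion model (equations B.6-B.7). This is only an illustrative sketch, not the C++/OpenCV implementation released with the thesis; the function names, the assumption that the world frame coincides with the camera frame, and the choice of the nearer ray-sphere intersection are all assumptions made here.

import numpy as np

def back_project_to_sphere(x_p, K_p, R, T, center, radius):
    """Equations B.3-B.4: intersect the back-projected ray of projector pixel
    x_p = (u, v) with the sphere. Returns the nearer intersection (an assumption
    of this sketch)."""
    C = -R.T @ T                                       # projector center in the world frame
    V = np.linalg.inv(K_p @ R) @ np.array([x_p[0], x_p[1], 1.0])
    # Solve |C + lam*V - center|^2 = radius^2, a quadratic in lam.
    d = C - center
    a = V @ V
    b = 2.0 * (V @ d)
    c = d @ d - radius ** 2
    disc = b ** 2 - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("ray misses the sphere")
    lam = (-b - np.sqrt(disc)) / (2.0 * a)             # nearer of the two intersections
    return C + lam * V

def project_to_camera(X, K_c, dist):
    """Equations B.6-B.7: project the 3D point X (assumed expressed in the camera
    frame) into the camera image with the five-parameter distortion (k1, k2, p1, p2, k3)."""
    k1, k2, p1, p2, k3 = dist
    x, y = X[0] / X[2], X[1] / X[2]                    # normalized coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + 2.0 * p2 * x * y + p1 * (r2 + 2.0 * y * y)
    u = K_c @ np.array([xd, yd, 1.0])                  # equation B.7
    return u[:2] / u[2]

Composing the two functions gives the predicted camera pixel $\hat{x}_{c_{ij}}$ that enters the re-projection error described next.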
The error function is formulated as the re-projection error in the camera, which is the sum of squared differences, across all pixels and projectors, between the observed pixel $\vec{x}_{c_{ij}}$ and the estimated pixel $\hat{x}_{c_{ij}}$ computed using the above equations:

$$E = \sum_i \sum_j (\vec{x}_{c_{ij}} - \hat{x}_{c_{ij}})^2 \tag{B.8}$$

This error function is minimized with the numerical analysis library ALGLIB [105] using a Levenberg-Marquardt algorithm. To facilitate the solver, we also compute the analytic Jacobian matrix. For illustrative purposes, assume we have 4 projectors with M pixels in each projector. The Jacobian matrix can then be written as:

$$J = \frac{\partial F}{\partial \vec{p}} = \begin{bmatrix}
A_{11} & 0 & 0 & 0 & B_{11} & C_{11} \\
\vdots & & & & \vdots & \vdots \\
A_{1M} & 0 & 0 & 0 & B_{1M} & C_{1M} \\
0 & A_{21} & 0 & 0 & B_{21} & C_{21} \\
\vdots & & & & \vdots & \vdots \\
0 & A_{2M} & 0 & 0 & B_{2M} & C_{2M} \\
0 & 0 & A_{31} & 0 & B_{31} & C_{31} \\
\vdots & & & & \vdots & \vdots \\
0 & 0 & A_{3M} & 0 & B_{3M} & C_{3M} \\
0 & 0 & 0 & A_{41} & B_{41} & C_{41} \\
\vdots & & & & \vdots & \vdots \\
0 & 0 & 0 & A_{4M} & B_{4M} & C_{4M}
\end{bmatrix} \tag{B.9}$$

where the first four block columns correspond to projectors 1-4, the fifth block column to the camera, and the sixth block column to the sphere, and each group of M rows contains the pixels of one projector. The blocks are

$$A_{ij} = \frac{\partial \vec{x}_{c_{ij}}}{\partial \vec{X}_{ij}} \frac{\partial \vec{X}_{ij}}{\partial \vec{p}_{p_i}}, \qquad B_{ij} = \frac{\partial \vec{x}_{c_{ij}}}{\partial \vec{p}_c}, \qquad C_{ij} = \frac{\partial \vec{x}_{c_{ij}}}{\partial \vec{X}_{ij}} \frac{\partial \vec{X}_{ij}}{\partial \vec{p}_s}$$

where $A_{ij}$ is the partial derivative of the camera 2D point $\vec{x}_{c_{ij}}$ with respect to the projector $i$ parameters $\vec{p}_{p_i}$, $B_{ij}$ is the partial derivative of $\vec{x}_{c_{ij}}$ with respect to the camera parameters $\vec{p}_c$, and $C_{ij}$ is the partial derivative of $\vec{x}_{c_{ij}}$ with respect to the sphere parameters $\vec{p}_s$. The analytic expressions of equation B.9 can be derived using equations B.3, B.4, B.6 and B.7. In practice, the analytic expression of the Jacobian matrix is auto-generated using the Symbolic Math Toolbox in MATLAB [59]. The generated analytic expressions are in C format and imported into the optimization code with ALGLIB. The source code of the entire calibration is open source and can be found at [107].
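For readers who want to experiment with this refinement outside the ALGLIB/C++ implementation, the sketch below shows one way the optimization could be set up with SciPy: the per-pixel reprojection residuals of equation B.8 are stacked into one vector, and a sparsity mask mimicking the block structure of equation B.9 is handed to a trust-region least-squares solver (SciPy's Levenberg-Marquardt mode does not accept a sparsity pattern, so 'trf' is used here, and a finite-difference Jacobian stands in for the analytic one). The parameter packing and the forward helper, which composes the two functions sketched above, are assumptions of this example.

import numpy as np
from scipy.optimize import least_squares

def residuals(params, pixel_obs, forward):
    """Equation B.8 residuals. params packs (9 camera, 4 sphere, 10 per projector) values;
    pixel_obs[i] is a list of (projector_pixel, observed_camera_pixel) pairs for projector i;
    forward(x_p, proj, sphere, cam) returns the predicted camera pixel (all assumed here)."""
    cam, sphere = params[:9], params[9:13]
    res = []
    for i, obs in enumerate(pixel_obs):
        proj = params[13 + 10 * i: 23 + 10 * i]
        for x_p, x_c_obs in obs:
            res.extend(forward(x_p, proj, sphere, cam) - np.asarray(x_c_obs))
    return np.asarray(res)

def refine(p0, pixel_obs, forward):
    """Nonlinear refinement with a Jacobian sparsity mask shaped like equation B.9."""
    counts = [len(obs) for obs in pixel_obs]
    sparsity = np.zeros((2 * sum(counts), len(p0)), dtype=int)
    row = 0
    for i, c in enumerate(counts):
        block = slice(row, row + 2 * c)
        sparsity[block, :13] = 1                        # camera + sphere columns (B and C blocks)
        sparsity[block, 13 + 10 * i: 23 + 10 * i] = 1   # projector i columns (A block)
        row += 2 * c
    return least_squares(residuals, p0, args=(pixel_obs, forward),
                         method="trf", jac_sparsity=sparsity)

The sparsity mask is what keeps the problem tractable as the number of correspondences grows, since each residual depends only on one projector's parameters plus the shared camera and sphere parameters.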
Appendix C

User Study Questionnaires

C.1 User Study on Size and Depth Perception

3D Perspective-Corrected Spherical Display Experiment Questionnaire Form (Subject ID, Condition Order, Date).

Page 1 of 2. Using a checkmark (✔) please rate the statements below about the different display conditions, on a scale of Strongly Disagree (SD, -2), Disagree (D, -1), Neutral (N, 0), Agree (A, +1), Strongly Agree (SA, +2). The same statements are rated once for the Flat Screen and once for the Spherical Screen.

Depth Task
• It was easy to judge the depth
• It was intuitive to judge the depth
• I felt I performed the depth task well

Size Task
• It was easy to judge the size and compare it to the real ball
• It was intuitive to adjust the size of the virtual ball
• I felt I performed the size task well

(After all tasks, for each screen)
• I was able to perceive the virtual pool ball like a real pool ball on this screen
• I enjoyed doing tasks on this screen

Page 2 of 2.
1. Please specify your gender: Male / Female
2. Please specify your age range: 18-25 / 26-35 / 36-45 / 46-55 / 56-65 / Above 65
3. Are you a user of 3D graphic interface (e.g. playing 3D Games or using CAD, Blender, Unity3D)? Yes (please mention names of software; frequency of usage: Daily / Weekly / Occasionally) / No
4. Have you experienced with Virtual Reality before? Yes (please mention type of device; times of usage: 1-5 times / 5-10 times / above 10 times) / No
5. For the depth task, you feel you performed better on: Flat Screen / Spherical Screen / No difference on performance. Please specify the reason.
6. For the depth task, you prefer to: Flat Screen / Spherical Screen / No preference. Please specify the reason.
7. For the size task, you feel you performed better on: Flat Screen / Spherical Screen / No difference on performance. Please specify the reason.
8. For the size task, you prefer to: Flat Screen / Spherical Screen / No preference. Please specify the reason.

C.2 User Study 1 on Perceptual Duality

3D Perspective-Corrected Spherical Display Questionnaire Form (Subject ID, Condition Order, Date).

Page 1 of 2. Using a checkmark (✔) please rate the statements below about the different display conditions, on the same scale of Strongly Disagree (-2) to Strongly Agree (+2). The two statements are rated once for each of the four conditions: NonStereo Head-Move, Stereo Head-Move, NonStereo Object-Move, and Stereo Object-Move.
• I was able to perceive the virtual pool ball like a real ball
• I was confident of my answers when completing the task.

Page 2 of 2.
1. Please specify your gender: Male / Female
2. Please specify your age range: 18-25 / 26-35 / 36-45 / 46-55 / 56-65 / Above 65
3. Have you experienced with Virtual Reality before? Yes (please mention type of device; times of usage: 1-5 times / 5-10 times / above 10 times) / No
Please feel free to leave any comments.
C.3 User Study 2 on Perceptual Duality

CRYSTAL: Spherical Fish Tank Virtual Reality Display Experiment Questionnaire Form (Subject ID, Condition Name, Date). Each question is rated on a scale from -2 to +2, with 0 labeled Neutral.

1. How much did your overall experience with virtual objects seem consistent with your real-world experience? (Not at all ... Very consistent)
2. To what extent did you feel you could reach and grasp an object in the virtual environment? (Not at all ... Very much)
3. To what extent did virtual objects appear geometrically correct during your movement, did they seem to have the right size and distance in relation to your position? (Not at all ... Very much)
4. Overall, how realistic did the virtual object appear? (Not at all ... Very realistic)
5. Overall, how strong did you feel the object exist in the real world? (Not at all ... Very strong)

CRYSTAL: Spherical Fish Tank Virtual Reality Display Demographic Questionnaire Form (Subject ID, Condition Order, Date).

1. Please specify your gender: Male / Female
2. Please specify your age range: 18-25 / 26-35 / 36-45 / 46-55 / 56-65 / Above 65
3. Have you experienced Virtual Reality before? Yes (please mention type of device; times of usage: 1-5 times / 5-10 times / above 10 times) / No
4. Which viewing condition would you prefer: Perspective1 / Perspective2. Please specify the reason.

Appendix D

User Study Result

D.1 User Study on Size and Depth Perception

Within Subjects Effects (Type 3 Sums of Squares)
Effect                Sum of Squares   df   Mean Square   F        p       η²p
Display               0.00202          1    0.00202       11.000   0.005   0.423
Residual              0.00275          15   1.83e-4
Distance              1.87e-5          2    9.37e-6       0.287    0.753   0.019
Residual              9.81e-4          30   3.27e-5
Display ✻ Distance    2.71e-5          2    1.35e-5       0.448    0.643   0.029
Residual              9.06e-4          30   3.02e-5

Figure D.1: Repeated measures two-way ANOVA on the absolute depth error with factors of Display and Distance in the depth ranking task in Chapter 5.

Within Subjects Effects (Type 3 Sums of Squares)
Effect                               Sum of Squares   df   Mean Square   F       p       partial η²
Display                              0.00957          1    0.00957       5.132   0.039   0.255
Residual                             0.02797          15   0.00186
Distance                             0.00333          2    0.00167       2.093   0.141   0.122
Residual                             0.02389          30   7.96e-4
InitialSize                          0.00119          1    0.00119       0.546   0.472   0.035
Residual                             0.03265          15   0.00218
Display ✻ Distance                   4.57e-4          2    2.28e-4       0.348   0.709   0.023
Residual                             0.01969          30   6.56e-4
Display ✻ InitialSize                1.32e-4          1    1.32e-4       0.133   0.720   0.009
Residual                             0.01492          15   9.95e-4
Distance ✻ InitialSize               0.00250          2    0.00125       1.430   0.255   0.087
Residual                             0.02619          30   8.73e-4
Display ✻ Distance ✻ InitialSize     0.00119          2    5.97e-4       1.851   0.175   0.110
Residual                             0.00968          30   3.23e-4

Figure D.2: Repeated measures three-way ANOVA on the absolute size error with factors of Display, Distance and InitialSize in the size matching task in Chapter 5.
Within Subjects Effects (Type 3 Sums of Squares)
Effect                               Sum of Squares   df   Mean Square   F       p        partial η²
Display                              0.02342          1    0.02342       4.88    0.043    0.246
Residual                             0.07196          15   0.00480
Distance                             0.01462          2    0.00731       5.13    0.012    0.255
Residual                             0.04276          30   0.00143
InitialSize                          0.07849          1    0.07849       36.64   < .001   0.710
Residual                             0.03213          15   0.00214
Display ✻ Distance                   0.00641          2    0.00320       5.13    0.012    0.255
Residual                             0.01875          30   6.25e-4
Display ✻ InitialSize                0.01634          1    0.01634       28.71   < .001   0.657
Residual                             0.00854          15   5.69e-4
Distance ✻ InitialSize               0.01604          2    0.00802       12.97   < .001   0.464
Residual                             0.01855          30   6.18e-4
Display ✻ Distance ✻ InitialSize     0.00133          2    6.65e-4       1.47    0.245    0.089
Residual                             0.01355          30   4.52e-4

Post Hoc Comparisons - Display ✻ Distance
Comparison                        Mean Difference   SE        df     t         p_bonferroni
Flat Far - Flat Mid               0.02347            0.008     52.1   2.9318    0.045
Flat Far - Flat Near              0.03388            0.008     52.1   4.2325    0.001
Flat Far - Fishbowl Far           -0.00576           0.01122   23.1   -0.5133   1
Flat Mid - Flat Near              0.01041            0.008     52.1   1.3007    1
Flat Mid - Fishbowl Mid           -0.02969           0.01122   23.1   -2.6455   0.126
Flat Near - Fishbowl Near         -0.03081           0.01122   23.1   -2.7453   0.108
Fishbowl Far - Fishbowl Mid       -4.66e-4           0.008     52.1   -0.0582   1
Fishbowl Far - Fishbowl Near      0.00883            0.008     52.1   1.1027    1
Fishbowl Mid - Fishbowl Near      0.00929            0.008     52.1   1.1609    1

Post Hoc Comparisons - Display ✻ InitialSize
Comparison                        Mean Difference   SE        df     t        p_bonferroni
Flat Small - Flat Large           -0.05889           0.00752   22.4   -7.835   < .001
Flat Small - Fishbowl Small       -0.04054           0.01057   18.5   -3.834   0.004
Flat Large - Fishbowl Large       -0.00364           0.01057   18.5   -0.344   1
Fishbowl Small - Fishbowl Large   -0.02199           0.00752   22.4   -2.926   0.032

Figure D.3: Repeated measures three-way ANOVA on the size ratio with factors of Display, Distance and InitialSize, followed by post-hoc analysis t-test with Bonferroni correction on the interaction between Display x Distance and Display x InitialSize in the size matching task in Chapter 5.

Paired Samples T-Test
Flat - Fishbowl   Student's t   statistic -1.02   df 15.0   p 0.322   Mean difference -0.0110   SE difference 0.0107   Cohen's d -0.256

Figure D.4: Results of pairwise t-test on the normalized head movement in the depth ranking task in Chapter 5.

Paired Samples T-Test
Flat - Fishbowl   Student's t   statistic -2.76   df 15.0   p 0.015   Mean difference -0.0295   SE difference 0.0107   Cohen's d -0.689

Figure D.5: Results of pairwise t-test on the normalized head movement in the size matching task in Chapter 5.

Paired Samples T-Test (Wilcoxon W)
Comparison                                           statistic   p       Cohen's d
Fishbowl-Depth-easy - Flat-Depth-easy                78.0 ᵃ      0.002   1.134
Fishbowl-Depth-intuitive - Flat-Depth-intuitive      62.0 ᵇ      0.009   0.834
Fishbowl-Depth-confident - Flat-Depth-confident      45.0 ᵈ      0.007   0.854
Fishbowl-Size-easy - Flat-Size-easy                  35.0 ᵈ      0.146   0.413
Fishbowl-Size-intuitive - Flat-Size-intuitive        14.5 ᵉ      1.000   0.000
Fishbowl-Size-confident - Flat-Size-confident        15.0 ᶠ      0.374   0.250
Fishbowl-Overall-real - Flat-Overall-real            72.5 ᵃ      0.006   0.968
Fishbowl-Overall-enjoy - Flat-Overall-enjoy          15.0 ᵍ      0.048   0.606
ᵃ 4 pair(s) of values were tied; ᵇ 5 pair(s) of values were tied; ᵈ 7 pair(s) of values were tied; ᵉ 9 pair(s) of values were tied; ᶠ 10 pair(s) of values were tied; ᵍ 11 pair(s) of values were tied

Figure D.6: Results of Wilcoxon Signed Rank Test on the Likert-scale post-experiment questions in the depth and size task in Chapter 5. A complete copy of the questionnaire can be found in Appendix C.1.
A completecopy of the questionnaire can be found in Appendix C.1.159D.2. User Study 1 on Perceptual DualityD.2 User Study 1 on Perceptual DualityPaired Samples T-Test      statistic p Cohen's dNonStereo-Object NonStereo-Head Wilcoxon W 151.0 < .001 1.850Stereo-Object Stereo-Head Wilcoxon W 119.0 ᵃ 0.009 0.875NonStereo-Object Stereo-Object Wilcoxon W 94.0 ᵇ 0.010 0.761NonStereo-Head Stereo-Head Wilcoxon W 20.5 ᵈ 0.027 -0.664ᵃ 1 pair(s) of values were tiedᵇ 3 pair(s) of values were tiedᵈ 2 pair(s) of values were tied Figure D.7: Results of pairwise Wilcoxon Signed Rank Test on the biaserror in the size judgement task with p-values before Bonferroni correctionin Chapter 6.Paired Samples T-Test      statistic p MeandifferenceSEdifference Cohen's dStereo-Head NonStereo-Head WilcoxonW 50.5 ᵃ 0.015 1.000 0.226 0.695Stereo-Object NonStereo-ObjectWilcoxonW 45.0 ᵃ 0.059 1.000 0.193 0.518Stereo-Head Stereo-Object WilcoxonW 51.0 ᵇ 0.097 1.000 0.259 0.441NonStereo-HeadNonStereo-ObjectWilcoxonW 38.5 ᵃ 0.227 1.000 0.182 0.313ᵃ 7 pair(s) of values were tiedᵇ 6 pair(s) of values were tied Figure D.8: Results of pairwise Wilcoxon Signed Rank Test on the confi-dence rating with p-values before Bonferroni correction in Chapter 6. Re-sults are not significant after Bonferroni correction. A complete copy of thequestionnaire can be found in Appendix C.2.160D.2. User Study 1 on Perceptual DualityPaired Samples T-Test      statistic p MeandifferenceSEdifference Cohen's dStereo-Head NonStereo-Head WilcoxonW 25.5 ᵃ 0.305 1.000 0.254 0.2810Stereo-Object NonStereo-ObjectWilcoxonW 42.0 ᵇ 0.022 1.500 0.358 0.6369Stereo-Head Stereo-Object WilcoxonW 19.5 ᵇ 0.761 -4.66e−5 0.277 -0.0514NonStereo-HeadNonStereo-ObjectWilcoxonW 52.0 ᵈ 0.090 1.000 0.344 0.4152ᵃ 9 pair(s) of values were tiedᵇ 8 pair(s) of values were tiedᵈ 6 pair(s) of values were tied Figure D.9: Results of pairwise Wilcoxon Signed Rank Test on the realismrating with p-values before Bonferroni correction in Chapter 6. Results arenot significant after Bonferroni correction. A complete copy of the question-naire can be found in Appendix C.2.161D.3. User Study 2 on Perceptual DualityD.3 User Study 2 on Perceptual DualityWithin Subjects Effects  Sum of Squares df Mean Square F p η²pProjection 0.0978 1 0.0978 4.14 0.067 0.273Residual 0.2598 11 0.0236      Movement 2.2245 1 2.2245 36.64 < .001 0.769Residual 0.6678 11 0.0607      Projection ✻ Movement 0.4537 1 0.4537 10.10 0.009 0.479Residual 0.4942 11 0.0449      Note. Type 3 Sums of Squares Projection Movement Projection MovementMean DifferenceSE df t pbonferroniPerspective Object - Perspective Head 0.625 0.0938 21.5 6.66 < .001Perspective Object - WeakPerspective Object 0.285 0.0756 20.1 3.77 0.005Perspective Head - WeakPerspective Head -0.104 0.0756 20.1 -1.38 0.732WeakPerspective Object - WeakPerspective Head 0.236 0.0938 21.5 2.52 0.08Post Hoc Comparisons - Projection ✻ MovementComparisonFigure D.10: Results of repeated measures two-way ANOVA on the biaserror with factors of Projection and Movement, followed by post-hoc pairwiset-test with Bonferroni correction on the interaction between Projection xMovement.162D.3. 
Paired Samples T-Test (Wilcoxon W)
Comparison                                           statistic   p       Cohen's d
WeakPersp-consistent - Persp-consistent             28.0 ᵃ      0.019   1.018
WeakPersp-reachable - Persp-reachable               55.5 ᵇ      0.047   0.720
WeakPersp-correctGeometry - Persp-correctGeometry   50.0 ᵈ      0.024   0.894
WeakPersp-realistic - Persp-realistic               32.5 ᵉ      0.040   0.751
WeakPersp-presence - Persp-presence                 37.0 ᶠ      0.090   0.568
ᵃ 5 pair(s) of values were tied; ᵇ 1 pair(s) of values were tied; ᵈ 2 pair(s) of values were tied; ᵉ 4 pair(s) of values were tied; ᶠ 3 pair(s) of values were tied

Figure D.11: Results of Wilcoxon Signed Rank Test on the Likert-scale post-experiment questions in the subjective impression task in Chapter 6. A complete copy of the questionnaire can be found in Appendix C.3.
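The non-parametric comparisons reported in the tables above can, in principle, be reproduced with standard statistics packages. The sketch below shows one possible way to run pairwise Wilcoxon signed-rank tests with a Bonferroni correction on paired Likert ratings in Python; the condition names and ratings are made-up placeholders, and SciPy's defaults do not necessarily match the tie handling and effect-size computation used to produce the thesis tables.

import numpy as np
from scipy.stats import wilcoxon

def pairwise_wilcoxon_bonferroni(ratings, pairs):
    """Wilcoxon signed-rank tests on paired Likert ratings with Bonferroni correction.
    ratings maps condition name -> per-participant scores; pairs lists the planned comparisons."""
    m = len(pairs)
    out = []
    for a, b in pairs:
        x, y = np.asarray(ratings[a]), np.asarray(ratings[b])
        stat, p = wilcoxon(x, y, zero_method="wilcox")      # zero differences (tied pairs) are dropped
        out.append((a, b, stat, p, min(1.0, p * m)))        # Bonferroni-corrected p-value
    return out

# Hypothetical example with made-up ratings (not data from the studies above):
ratings = {
    "Stereo-Head":    [2, 1, 2, 1, 0, 2, 1, 1, 2, 1, 0, 2],
    "NonStereo-Head": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
}
print(pairwise_wilcoxon_bonferroni(ratings, [("Stereo-Head", "NonStereo-Head")]))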
