Development and Application of a Description-BasedInterface for 3D ReconstructionbyKai WuBachelor of Engineering, Beijing University of Posts and Telecommunications2014A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of Applied ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)February 2018© Kai Wu, 2018AbstractAdvancements in state-of-the-art 3D reconstruction algorithms have sped aheadof the development of interfaces or application programming interfaces (APIs) fordevelopers, especially to those who are not experts in computer vision.In this thesis, we have designed a novel interface, specifically for 3D recon-struction techniques, which uses a description (covering the conditions of the prob-lem) to allow a user to reconstruct the shape of an object without knowledge of 3Dvision algorithms. The interface hides the details of algorithms by using a de-scription of visual and geometric properties of the object. Our interface interpretsthe description and chooses from a set of algorithms those that satisfy the descrip-tion. We show that this description can be interpreted to one appropriate algorithm,which can give a successful reconstruction result.We evaluate the interface through a proof of concept interpreter, which inter-prets the description and invokes one of three underlying algorithms for recon-struction. We demonstrate the link between the description set by the user and theresult returned using synthetic and real-world datasets where each object has beenimaged with the appropriate setup.iiLay Summary3D modelling of the real world has a wide range of applications, including 3Dmapping and navigation, content creation for virtual reality, online shopping, videogames, visual effects, cultural heritage archival, and so on. However, existing ap-proaches requires vision background to fully take advantage of these algorithms.This thesis is dedicated to developing a description-based interface for 3D recon-struction problems. The interface hides the details of algorithms via a descriptionlayer, which describes visual and geometric properties of an object. This descrip-tion can be interpreted into one appropriate algorithm, which leads to a successfulreconstruction result. This allows the application developers to deploy 3D recon-struction functionality in their domain-specific applications without the need tounderstand the algorithm details.iiiPrefaceThe entire work presented here has been done by the author, Kai Wu, with the col-laboration and supervision of Dr. Sidney Fels and Dr. Gregor Miller. A manuscriptdescribing the core of our work and our results has been submitted to a visionconference and is under anonymous review at the moment of thesis submission.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivating Scenario . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 61.2.2 A Problem Space of 3D Reconstruction . . . . . . . . . . 71.2.3 A Description of 3D Reconstruction . . . . . . . . . . . . 71.2.4 A Mapping of 3D Reconstruction . . . . . . . . . . . . . 71.2.5 An Interpretation of 3D Reconstruction . . . . . . . . . . 7v1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.1 Toolboxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2 3D Reconstruction Techniques . . . . . . . . . . . . . . . . . . . 112.2.1 Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2.2 Shading . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.2.3 Silhouette . . . . . . . . . . . . . . . . . . . . . . . . . . 223 A Problem Space of 3D Reconstruction . . . . . . . . . . . . . . . . 243.1 Problem Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.2.1 Discussion and Conclusions . . . . . . . . . . . . . . . . 304 A Description of 3D Reconstruction . . . . . . . . . . . . . . . . . . 324.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.2.1 Setup of Capturing System . . . . . . . . . . . . . . . . . 354.2.2 Texture . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.2.3 Brightness . . . . . . . . . . . . . . . . . . . . . . . . . 374.2.4 Specularity . . . . . . . . . . . . . . . . . . . . . . . . . 384.2.5 Roughness . . . . . . . . . . . . . . . . . . . . . . . . . 404.3 Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.3.1 Perception of Properties . . . . . . . . . . . . . . . . . . 414.4 Discussion and Conclusions . . . . . . . . . . . . . . . . . . . . 445 A Mapping of 3D Reconstruction . . . . . . . . . . . . . . . . . . . . 455.1 Construction of Dataset . . . . . . . . . . . . . . . . . . . . . . . 465.1.1 Selected and Baseline Methods . . . . . . . . . . . . . . 475.1.2 Synthetic Setups . . . . . . . . . . . . . . . . . . . . . . 485.1.3 Quantitative Measures . . . . . . . . . . . . . . . . . . . 485.2 Reduction of Problem Space Dimension . . . . . . . . . . . . . . 495.2.1 Reduction of Problem Space Dimension: PMVS . . . . . 50vi5.2.2 Reduction of Problem Space Dimension: EPS . . . . . . . 535.2.3 Reduction of Problem Space Dimension: GSL . . . . . . 575.3 Construction of Mapping . . . . . . . . . . . . . . . . . . . . . . 595.4 Discussion and Conclusions . . . . . . . . . . . . . . . . . . . . 606 An Interpretation of 3D Reconstruction . . . . . . . . . . . . . . . . 636.1 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . . 646.1.1 Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 656.2 Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666.3 Overview of Datasets . . . . . . . . . . . . . . . . . . . . . . . . 686.3.1 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . 696.3.2 Image Capturing . . . . . . . . . . . . . . . . . . . . . . 696.3.3 Synthetic Dataset . . . . . . . . . . . . . . . . . . . . . . 706.3.4 Real-World Dataset . . . . . . . . . . . . . . . . . . . . . 716.4 Evaluation of Interpreter . . . . . . . . . . . . . . . . . . . . . . 716.4.1 Evaluation 1: Accurate Description, Successful Result . . 726.4.2 Evaluation 2: Less Accurate Description, Less SuccessfulResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756.4.3 Evaluation 3: Inaccurate Description, Poor Result . . . . . 786.5 Discussions and Conclusions . . . . . . . . . . . . . . . . . . . . 817 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857.1 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 887.2 Closing Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91A Supporting Materials . . . . . . . . . . . . . . . . . . . . . . . . . . 98A.1 Definition of 3D Reconstruction . . . . . . . . . . . . . . . . . . 98A.1.1 Basic Notations . . . . . . . . . . . . . . . . . . . . . . . 98A.1.2 Segment and Scell . . . . . . . . . . . . . . . . . . . . . 99A.1.3 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . 100A.1.4 Formal Definition . . . . . . . . . . . . . . . . . . . . . . 100A.1.5 Applied Definition . . . . . . . . . . . . . . . . . . . . . 101viiA.2 Physics-Based Vision . . . . . . . . . . . . . . . . . . . . . . . . 101A.2.1 Radiometric Terms . . . . . . . . . . . . . . . . . . . . . 101A.2.2 Reflectance Model: Microfacet Model . . . . . . . . . . . 103A.2.3 Image Formation: Radiometric Perspective . . . . . . . . 104A.3 Results of Real-World Objects . . . . . . . . . . . . . . . . . . . 106viiiList of TablesTable 2.1 Assumptions made by different classes of photometric stereo. . 20Table 4.1 Model of the 3D reconstruction problem. Visual and geometricproperties are selected from the problem space in Chapter 3. . . 35Table 4.2 The representations of the capturing system. The configurationof the current capturing system is label in red. . . . . . . . . . 36Table 4.3 A Model and corresponding representations of the 3D recon-struction problem. . . . . . . . . . . . . . . . . . . . . . . . . 40Table 4.4 Expression of the reconstruction problem for the four problemconditions proposed in Section 3. . . . . . . . . . . . . . . . . 43Table 5.1 Summary of the selected and baseline algorithms implementedfor the interface. . . . . . . . . . . . . . . . . . . . . . . . . . 47Table 5.2 Summary of synthetic capturing systems for three classes of al-gorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48Table 5.3 This is a 3⇥ 3 factorial design. Every two properties are se-lected to test the main effects and interaction, there are in totalN2combinations. . . . . . . . . . . . . . . . . . . . . . . . . 50Table 5.4 The problem conditions under which PMVS, EPS, and GSLwork successfully in terms of the two metrics accuracy andcompleteness. . . . . . . . . . . . . . . . . . . . . . . . . . . 62ixList of FiguresFigure 1.1 The three layers of the 3D reconstruction interface. . . . . . . 3Figure 1.2 Two target user of the proposed scenario. . . . . . . . . . . . 4Figure 1.3 Scenario of using the traditional approach and the proposedinterface for 3D reconstruction tasks. . . . . . . . . . . . . . 6Figure 2.1 Three classes of algorithms with examples, visual cues usedfor reconstruction, and potential problems. The bottom leftimage: Silhouette Cones, ©CC BY-SA 3.0. The bottom rightimage: Visual Hull, ©CC BY-SA 3.0. . . . . . . . . . . . . . 12Figure 3.1 A list of properties with labels of the proposed problem space. 26Figure 3.2 Embed algorithms into the interface. The process is three-fold:1) reduce problem space by discovering effective properties de-noted by red; 2) discover mapping from the lower-dimensionalproblem space to algorithm, denoted by green; 3) mappingfrom problem space to algorithms denoted by blue. . . . . . . 28Figure 3.3 Four problem conditions selected based on the definition ofproblem space and additional assumptions. The description-based interface will be evaluated using objects satisfying theseconditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Figure 4.1 (a). A red diffuse sphere; (b). a red specular sphere. Thesurface reflects light in a mirror-like way, showing a distortedenvironment. Since no diffuse reflection exists, the colour ofthe surface is no longer visible. . . . . . . . . . . . . . . . . . 38xFigure 4.2 Fresnel reflectance for external reflection from a variety of sub-stances. Since copper and aluminum have significant variationin their reflectance over the visible spectrum, their reflectanceis shown as three separate curves for R, G, and B. Copper’s Rcurve is highest, followed by G, and finally B (thus its reddishcolour). Aluminum’s B curve is highest, followed by G, andfinally R. (Image from “Real-Time Rendering, 3rd edition”,©2008 A K Peters, by permission) . . . . . . . . . . . . . . . 39Figure 4.3 The UI for determining the property settings, including albedo,specular, and roughness of the surface. As shown in (a), theproblem condition in this case is: texture (0.8), albedo (0.8),specular (0.2), roughness(0.2). (b) demonstrates the effect ofthe property settings on a sphere, (c) on a teapot, and (d) showsthe real-world object. . . . . . . . . . . . . . . . . . . . . . . 43Figure 5.1 Example synthetic images. The value of each property rangesfrom 0 to 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48Figure 5.2 Performance of PMVS under six conditions. For instance, (a)shows the performance under the condition of changing tex-ture and albedo levels, while the others are fixed. The maineffect of a property is illustrated by the colour variation alongthe corresponding axis. The monotonic colour variation di-agonally indicates no interaction between the two properties,otherwise, there is an interaction effect. Thus in (a), we ob-serve a main effect of texture on completeness, no other maineffects and interaction effects are present. . . . . . . . . . . . 51Figure 5.3 (a) shows the reflection of light off a specular surface. V1 re-ceived the diffuse component while V2 receives the specularcomponent. (b), (c) shows the images observed from these twoviews. The specular area (red circle) observed in V2 is visiblein V1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52xiFigure 5.4 Performance of Example-based PS under six problem condi-tions. For instance, (a) shows the performance under the condi-tion of changing texture and albedo levels, while the others arefixed. The main effect of a property is illustrated by the colourvariation along the corresponding axis. The monotonic colourvariation diagonally indicates no interaction between the twoproperties, otherwise, there is an interaction effect. Thus in (a),we observe a main effect of albedo on mean angular error, noother main effects and interaction effects are present. . . . . . 54Figure 5.5 An example illustrates the effect of roughness on PS. Albedo isset as 0.8, and specular is set as 0.8. The first column shows theinput images, the second column shows the estimated normalmap, the third column shows the integrated surface, and lastcolumn shows the angular error. We can see from the qualita-tive results (normal map and height map), and quantitative re-sult (angular error) that a medium level roughness would leadto a worse normal estimation since it blurs the specular lobe. . 56Figure 5.6 Performance of Gray-coded SL under six problem conditions.For instance, (a) shows the performance under the conditionof changing texture and albedo levels, while the others arefixed. The main effect of a property is illustrated by the colourvariation along the corresponding axis. The monotonic colourvariation diagonally indicates no interaction between the twoproperties, otherwise, there is an interaction effect. Thus in (a),we observe a main effect of albedo on completeness, no othermain effects and interaction effects are present. . . . . . . . . 58Figure 5.7 Performance of PMVS, EPS, and GSL under all problem con-ditions. These are look-up tables that provide information re-garding the performance of selected algorithms compared tobaseline methods. Once a threshold value e is specified, theselook-up tables can be used as mapping from problem conditionto algorithms, i.e., return successful algorithms given a prob-lem condition. . . . . . . . . . . . . . . . . . . . . . . . . . . 61xiiFigure 6.1 Visual phenomena that indicate poor quality of reconstructionresults. (a) has rough surface which indicates poor accuracy,(b) has incomplete holes which indicates poor completeness,and (c) has spikes which indicates poor normal estimation. . . 66Figure 6.2 Representative objects of the four problem condition discussedin Chapter 3. (a)-(d): Synthetic objects, and (e)-(h) real-worldobjects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68Figure 6.3 Image capturing systems for MVS/VH, PS, and SL. . . . . . . 70Figure 6.4 The representatives of the four classes of objects used for eval-uation. Example images and problem conditions of syntheticobjects are in the first two rows. The correct description ofthe problem condition of corresponding object is presented inthird row. The last row shows the algorithms returned by themapping, from which the interpreter selects one successful al-gorithm, which is coloured in red. . . . . . . . . . . . . . . . 71Figure 6.5 The representatives of the four classes of objects used for eval-uation. Example images and problem conditions of real-worldobjects are in the first two rows. The correct description ofthe problem condition of corresponding object is presented inthird row. The last row shows the algorithms returned by themapping, from which the interpreter selects one successful al-gorithm, which is coloured in red. . . . . . . . . . . . . . . . 72Figure 6.6 (a) shows the problem condition of object ‘bust’ and ‘statue’specified in the 3D software. (b) and (c) shows the syntheticobject rendered using the problem condition in (a). . . . . . . 73Figure 6.7 The interpreter selects an appropriate algorithm based on de-scription. Based on the discovered mapping from Chapter 5,EPS and GSL can achieve a successful reconstruction, whichis labelled in green while PMVS would fail to give successfulresult, which is labelled in red. . . . . . . . . . . . . . . . . . 73Figure 6.8 (a), (c) show the reconstruction results of the interpreted algo-rithm, GSL. (b), (d) show the results of the baseline method. . 74xiiiFigure 6.9 Evaluation 1: correct description leads to successful recon-struction result. The baseline results are provided so that wecan determine the quality of result returned by the algorithmchosen by the interpreter. . . . . . . . . . . . . . . . . . . . . 75Figure 6.10 (a) shows the reconstruction using the incorrect descriptionwhile (b) shows the reconstruction result using the correct de-scription. The text labelled in red represents incorrectly esti-mated property. . . . . . . . . . . . . . . . . . . . . . . . . . 76Figure 6.11 (a)-(d) show the reconstruction results given a less accurate de-scription while (e) shows the result given an accurate descrip-tion. The text labelled in red represents the property estimatedincorrectly. . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Figure 6.12 Evaluation 2: less accurate description may lead to poor re-sult. Desci represents inaccurate descriptions. For each object,the first row represent the description, with the incorrectly esti-mated property coloured in red while the correct ones in black.The algorithms determined by mapping are below the descrip-tion with the algorithm selected by interpreter coloured in red(BL: baseline). The last row shows the corresponding recon-struction results. . . . . . . . . . . . . . . . . . . . . . . . . 79Figure 6.13 Evaluation 2: less accurate description may lead to poor re-construction results. Desci represents inaccurate descriptions.For each object, the first row represent the description, withthe incorrectly estimated property coloured in red while thecorrect ones in black. The algorithms determined by mappingare below the description with the algorithm selected by inter-preter coloured in red (BL: baseline). The last row shows thecorresponding reconstruction results. . . . . . . . . . . . . . 80Figure 6.14 (a) shows the reconstruction result given an incorrect descrip-tion whereas (b) shows the result given a correct description. . 81xivFigure 6.15 Evaluation 3: inaccurate description may lead to poor result.For each description, the first row represent the settings ofproperties. The algorithms determined by mapping are shownbelow with the algorithm selected by interpreter coloured inred (BL: baseline). The last row shows the corresponding re-construction results. We can see that the results of inaccuratedescriptions are poorer than those of accurate descriptions. . . 82Figure 6.16 The evaluation of interpreter using synthetic objects. The firstcolumn presents the description provided to the interpreter.Description i matches with the problem condition of syntheticobject in column (i+ 1), which is labelled in green rectangle.The last column is the algorithm selected by the interpreter.Since the interpreter would return a successful reconstructiongiven a description that matches the problem condition, thequality of reconstruction of the labelled objects indicates suc-cess/failure of the interpreter. . . . . . . . . . . . . . . . . . . 83Figure A.1 Relation between a scell and a segment . . . . . . . . . . . . 99Figure A.2 Illustration of light-matter interaction. . . . . . . . . . . . . . 102Figure A.3 The light-matter interaction. Scene radiance is linear related toincident radiance. . . . . . . . . . . . . . . . . . . . . . . . . 104Figure A.4 The light-sensor interaction. Image irradiance is linearly re-lated to scene radiance. . . . . . . . . . . . . . . . . . . . . . 105Figure A.5 The camera response function is typically non-linear. Thuspixel intensity if non-linear with respect to irradiance. How-ever, this can be corrected by radiometric calibration. Moreoften, it is assumed that pixel intensity is linearly related toimage irradiance in many vision algorithms. . . . . . . . . . . 105Figure A.6 Reconstruction results ofMVS, PS, SL, and the baseline methodVH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106Figure A.7 Reconstruction results ofMVS, PS, SL, and the baseline methodVH (continued). . . . . . . . . . . . . . . . . . . . . . . . . 107xvList of Acronyms• 3D: 3-dimensional• BRDF: Bi-directional Reflectance Distribution Function• CAD: Computer Aided Design• DoF: Degree of Freedom• EPS: Example-based Photometric Stereo• GSL: Gray-coded Structured Light• MVS: Multi-View Stereo• PMVS: Patch-based Multi-View Stereo• PS: Photometric Stereo• SfS: Shape from Shading• SL: Structured Light• VH: Visual HullxviAcknowledgementsI want to thank my supervisor, Professor Sidney Fels, for offering me this op-portunity to pursue this research direction, and provide invaluable guidance in thesupervision of this thesis. I’m also deeply thankful to Dr. Gregor Miller for hismentorship and constant words of encouragement to keep me sane.I want to thank my lab mates for making it such a delight to come to work ev-eryday. I’m thankful for all the white board discussion, coding and experimenting,and late night grinding before deadlines. Those are the memories I will cherish forthe rest of my life.Last but not least, thanks to my family for their unconditional support, espe-cially in times of stress.xviiDedication 献给我的爷爷吴国利先生 xviiiChapter 1IntroductionModelling of the 3D world has been an active research topic in computer visionfor decades and has a wide range of applications including 3D mapping and nav-igation, online shopping, 3D printing, computational photography, video games,visual effects, and cultural heritage archival. The goal of 3D modelling is to recon-struct a 3D model represented by point cloud, voxel grid, depth maps, or surfacemesh, from RGB or range sensors, optionally incorporating the material of thesurface.Achieving this goal is an extremely challenging task, as it involves the reverseprocess of image formation, which is highly likely to result in a variety of possibleresults and solutions. To overcome this challenge, some assumptions must be madein terms of materials, viewpoints, and lighting conditions. In turn, a solid under-standing of the interaction of light with surface geometry and material is a prerequi-site to fully take advantage of the existing techniques. In the past decades, we havewitnessed a variety of tools and approaches to 3D modelling applied successfullyto an assortment of sub-domains, such as Computer Aided Design (CAD) tools [1],arm-mounted probes, active methods [2, 3, 11, 39] and passive image-based meth-ods [17, 23, 25, 38]. Among the existing approaches, active techniques such aslaser scanners [39], Structured Light (SL) systems [11], and Photometric Stereo(PS) [69], as well as passive methods such as Multi-View Stereo (MVS) [59], havebeen the most successful. Laser scanners and structured light techniques are seen togenerate the most accurate results, but are generally complicated to calibrate, time1consuming to scan, and demanding in terms of storing and processing. PhotometricStereo is able to achieve highly detailed reconstruction comparable to that of laserscanners, but the true depth information is lost due to the use of a single viewpoint.Further, MVS requires minimal setup and can work in both controlled, small scalelab settings as well as outdoor, medium to large scale environments. However,the quality of reconstruction is generally noisier, and is susceptible to texture andmaterial property of the surface. All of the aforementioned techniques requires anunderstanding of calibration, stereo correspondence, physics-based vision, and soon, which are not easy tasks to master.Regardless of past success and strong demands across various areas, we havenot yet witnessed any substantial progress in terms of making the mentioned tech-niques accessible to application developers or system designers (termed users forthe rest of the thesis), who generally have little or no computer vision expertise.We’ve made two key observations about computer vision algorithms: 1) few ofthese methods work well under all circumstances, nor do they share the same setupor inputs/outputs, making it difficult for developers to choose an optimal methodfor their particular application; 2) expertise knowledge is a prerequisite to fullyexploit the potentials of existing vision techniques. These observations lead us tothe following question which we address in this thesis: is it achievable to createan interface that can return a reliable reconstruction by one of the best possiblealgorithms based on the descriptions of the object or scene to be reconstructed?The interface consists of the following three layers, see Figure 1.1: the descrip-tion layer sits on top and acts as the medium between the user and the lower layers.It is through this that the user provides a description of the 3D reconstruction prob-lem. The description is passed to the interpreter layer, which chooses appropriatealgorithms given the description, and then configures each algorithm’s parameters.The interpreter can also define any necessary pre- or post-processing operations(such as noise removal or image scaling). The lowest layer of the three is wherethe algorithms sit.2L3: DescriptionL2: InterpreterL1: AlgorithmFigure 1.1: The three layers of the 3D reconstruction interface.1.1 Motivating ScenarioWe propose a scenario to further emphasize and justify the motivation: Ben is anengineer for 3DSense, a company designing and manufacturing 3D capturing sys-tems and software. It is Ben’s job to design 3D camera rigs and related softwarefor 3D capturing tasks. Ben has a background on software engineering while verylimited computer vision knowledge. Daisy, a visual effects artist, wants a 3D mod-elling system that is able to reconstruct real world objects for her projects. Shehas little programming experience, but is highly skilled at 3D modelling tools. SeeFigure 1.2 for more details of their backgrounds respectively.Ben starts by looking into general computer vision libraries and existing 3D re-construction techniques, and comes across a general vision library called OpenCV,several multi-view stereo software (PMVS, MVE), a photometric stereo techniquecalled example-based PS, and a structured light technique called Gray-coded SL.Since he fails to find any out-of-box 3D reconstruction algorithms implemented inOpenCV, he decides to use the existing techniques instead. He develops a soft-ware that integrates multiple algorithms, and provides a window to tune algorithm-specific parameters, as shown in Figure 1.3 (a). Ben builds a multi-purpose camerarig with camera-projector pairs and light sources, that is able to capture appropriateimages for algorithms across varied categories, including MVS, PS, SL, and VHtechniques. More specifically, the rig consists of three rings of camera-projectorpairs, each positioned at a zenith angle of 15, 45, and 75. The azimuth angle be-tween two neighbouring cameras is 30. The light sources are positioned so that nocamera views are blocked. He first tests the system with a white porcelain cup, andselects PMVS as the reconstruction algorithm. The ‘parameter’ window displays a3Application developer BenVisual effects artist DaisySoftware.. . Camera rig .. . , . Background. . -. . . - . . .. .. -... -Figure 1.2: Two target user of the proposed scenario.list of parameters to tune this specific algorithm, including window size, cell size,photo-consistency measure, and seven other parameters, the effects of which arelargely unclear. He runs the software using the default parameter settings since heis unaware of the effects of these parameters. The reconstructed result turns out tobe very poor in the sense that the surface is not smooth and contains many holes.He suspects that this is due to the settings of the parameters. He tries to increasethe cell size while fixing the other parameters, which leads to a even sparser re-sult. After multiple ‘trial-and-error’, the software still fails to achieve a successfulresult. Ben decides to try out another library called OpenVL, which is able to pro-duce a successful solution by selecting one of the underlying algorithms given adescription of appearance of the object. He writes a software that allows the usersto simulate the appearance of an object by tuning visual and geometric parametersusing sliders, including texture, albedo, specularity, and roughness, as shown in4Figure 1.3 (b). The ‘visualization’ window displays a synthetic object, showingthe effects of these parameters. The software then invokes the API to run recon-struction once an algorithm is selected by the interpreter. Ben places the whiteporcelain cup inside the camera rig and moves the sliders until the synthetic ob-ject resembles that of the target object. The parameter settings are as follows: 0.2(texture), 0.8 (albedo), 0.8(specularity), 0.2 (roughness). He then presses the ‘re-construction’ button, the algorithm EPS is selected by the underlying interpreter,and a successful result is obtained. After some refinements of the camera rig andthe software, Ben sends the prototype to the testing team for further testing.Daisy the 3D artist, is invited to a user testing of the 3D modelling systemby 3DSense. She starts by placing a white porcelain vase inside the camera rigand opens the software. She sees a parameter configuration interface with severalsliders labelled with visual and geometric properties, including ‘texture’, ‘albedo’,‘specularity’, and ‘roughness’. She moves the slides and observes that the syn-thetic object displays the effect of the property immediately. After some practice,she starts to tune the visual parameters of the target object: she moves each slideruntil the synthetic object appears to have the same appearance as the target ob-ject. The settings are as follows: 0.2 (texture), 0.8 (albedo), 0.8 (specularity), and0.8 (roughness). The interpreter then selects GSL as the reconstruction algorithm,and proceeds to capture the input images using the camera rig for reconstruction.However, there are noticeable holes on the reconstructed model. She observes thatthis ceramic vase is highly smooth and glossy, thus decides to set ‘roughness’ to0.2. The interpreted algorithm becomes EPS, and the reconstruction result turnsout to be smooth and with no surface holes. Daisy is curious to see how sensitivethe interpreter is with respect to less accurate descriptions. She sets the parametersto the complete opposite, i.e., 0.8 (texture), 0.2 (albedo), 0.2 (specularity), and 0.8(roughness). This time, PMVS is selected by the interpreter, and a noisy recon-structed model is obtained. She then proceeds to do the same with all her objectsincluding a bust, statue tuning each of the properties before doing the reconstruc-tion.5PMVSMVEGray-coded SLEPSLLS-PSBinary-coded SLVisual HullAlgorithms Parameterslevel 1csize 2threshold 0.7wsize 7minNImg 3CPU 4useVis 0maxAng 10PMVSMVEGray-coded SLEPSLLS-PSBinary-coded SLVisual HullAlgorithmsParameterstexturealbedospecularityroughmetallicsub-surf scatteringanisotropicVisualizationconcavity(a). Traditional 3D reconstruction (b). Interface of 3D reconstructionFigure 1.3: Scenario of using the traditional approach and the proposed in-terface for 3D reconstruction tasks.1.2 OutlineThe problem addressed in this thesis can be described as follows: construct an in-terface for 3D reconstruction that can return a reliable reconstruction result by oneof the best-suited algorithms, which is determined by the description of the problemcondition. More specifically, a problem space is proposed that transforms the 3Dreconstruction problem from one requiring knowledge of algorithm details to onethat is based on the relation between the problem condition and algorithms. Next,a well defined model and representations are developed to describe the problemspace definitively. Lastly, mapping between the problem space and the algorithmsis discovered, from which a proof of concept interpreter is proposed. An evaluationis then carried out to verify the robustness of the interpreter.1.2.1 Related WorkWe discuss the existing software and toolboxes for 3D reconstruction, and presentthe required vision background needed to fully take advantage of these toolboxes.A review of the 3D acquisition techniques is provided, organized by the visual andgeometric cues used for reconstruction.61.2.2 A Problem Space of 3D ReconstructionExisting approaches to 3D reconstruction focus more on providing algorithm so-lutions to problems, which we call an algorithm-centred approach. This approachprovides little insight to the problem conditions that a specific algorithm is ap-plicable to. We proposed a problem-centred approach that gives a well-definedproblem space, which allows further investigation of the relation between problemconditions and algorithms (termed mapping in this thesis). This mapping can beused to choose a best possible algorithm based on a described problem condition.The problem condition consists of a variety of visual and geometric properties ofobjects. The aggregate problem conditions is called problem space.1.2.3 A Description of 3D ReconstructionIn previous cases, the mapping from a problem condition to an algorithm has beenambiguous due to the problem space that is poorly defined. Here, we set out to pro-vide a rigorous description of the problem condition itself. First, a model consistingof key object properties is developed. Second, the representations of the problemspace are proposed. Lastly, common 3D reconstruction tasks are expressed usingthe proposed model and representations.1.2.4 A Mapping of 3D ReconstructionTo derive a more precise mapping from problem space to algorithms, we need toevaluate the performance of selected algorithms under varied properties and theircombinations. We use synthetic datasets to achieve this goal. The main challengein conducting such a comprehensive evaluation is the large variations of shape andmaterial properties. To overcome this issue, we first discover properties that havemain and interaction effect on algorithm performance (termed effective property).Then we evaluate the performance of each algorithm under the conditions of theseeffective properties, which serves as the basis of the mapping.1.2.5 An Interpretation of 3D ReconstructionWe conduct the evaluation of the interface around three key evaluation questions:1) can the proof of concept interpreter return one of the best-suited algorithms7that achieves a successful reconstruction given the correct description; 2) will aless accurate description give a poorer reconstruction result; 3) will an inaccuratedescription give a poor reconstruction result. To answer these questions, we carryout three separate experiments, use synthetic and real-world datasets to evaluatethe interpreter in Section 6.4.1.3 ContributionsThe main contribution of this thesis is the development and application of an in-terface for a subset of 3D reconstruction problem, which hides algorithm detailsand allows users to describe conditions surrounding the problem. We focus on asubset of problem, which is defined in Chapter 3, to approach this problem in atractable manner. This described condition, which consists of varied visual andgeometric properties, can be interpreted so that an appropriate algorithm is chosento reconstruct a successful result. This endeavour is non-trivial for two reasons: 1)currently, most approaches can only achieve satisfactory results on a limited set ofcategories of objects; 2) a solid understanding of reconstruction algorithm detailsis a prerequisite to fully take advantage of the existing techniques, which is diffi-cult for application developers to obtain. To some extent, our interface attempts toaddress a broader problem space by incorporating multiple algorithms. Though itcovers a wider range of problem space than a single algorithm, it is still confinedwithin the space covered by the existing techniques within the interface. Thus, ourevaluation is carried out within the problem space covered by the selected algo-rithms.1.4 OrganizationWe organize this thesis as follows. Chapter 2 briefly introduces 3D reconstructiontoolboxes and gives an overview of current landscape of 3D reconstruction field.In Chapter 3, we propose a simplified problem space of 3D reconstruction prob-lems and propose four problem conditions that will be investigated in depth. InChapter 4, we provide a formal description of problem condition of a 3D recon-struction problem. In Chapter 5, we develop the relation from problem condition toalgorithms by evaluating the performance of a selection of algorithms under varied8problem conditions. In Chapter 6, we use both synthetic and real-world datasets todemonstrate the interpretation of the user specified description and the robustnessof the proof of concept interpreter.9Chapter 2Related WorkIn this chapter, we review the existing software and algorithms in the field of 3Dcomputer vision. Section 2.1 discusses the existing software and toolboxes for3D computer vision. Section 2.2 presents a comprehensive review of the field ofimage-based 3D reconstruction algorithms based on varied visual and geometriccues, which include stereo correspondence, shading, silhouette, texture distortion,and focus.2.1 ToolboxesThere have been many attempts in developing computer vision or image process-ing frameworks that support rapid development of vision applications. There aremultiple general vision libraries in this field including OpenCV [14], VLFeat [65],VXL [4] and multiple Matlab libraries [37, 45]. These libraries often provide toolsfor multiple image processing and computer vision problems, including low-visiontasks such as feature detection and matching, middle-level vision tasks such as seg-mentation and tracking, and high-level vision problems such as classification andrecognition. All of these software frameworks and libraries provide vision compo-nents and algorithms without any context of how and when they should be applied.As a result, they often require expert vision knowledge for effective use. For ex-ample, many feature detectors and descriptors are provided by OpenCV but withno indication of under what conditions each works most effectively.10We have witnessed many successful software packages in the field of image-based reconstruction, which is a sub-field of 3D reconstruction. One of the mostwidely used open source software is PMVS developed by Furukawa [23], which isused not only by computer vision and graphics engineers, but also production com-panies like Industrial Light & Magic, and Google, etc. It’s often used together withBundler, which is a Structure from Motion software that estimate camera param-eters from images developed by Noah Snavely [63], and Poisson Surface Recon-struction developed by Michael Misha Kazhdan, which is a surface mesh softwarethat estimate the triangulated surface from oriented point cloud [36]. Some othernotable open source software include VisualSfM [70], CMP-MVS [28], MVE [22],and openMVG [48]. However, effective use of those software requires a basic un-derstanding of the relevant domain, including feature detection, matching, cameracalibration, dense correspondence search, etc.This current situation motivates us to provide an description-based interface fornon-vision users to access the state-of-the-art techniques in their own applications.2.2 3D Reconstruction TechniquesImage-based 3D reconstruction attempts to recover the geometry and material (op-tional) of the object from images under different viewpoints or illuminations. Theend goal here can be described as “given a set of images of an object or a scene, es-timate the most likely 3D shape that explains those images, under the assumption ofknown materials, viewpoints, and lighting conditions”. This definition reveals thatif these assumptions are violated, this becomes an ill-posed problem since multi-ple combinations of geometry, viewpoint and illumination can produce exactly thesame images [52]. Thus this makes for an extremely challenging task.The 3D reconstruction technique exploits a variety of visual and geometric cuesto extract geometry from images: stereo correspondence, shading, contour, texture,focus, etc. This review of algorithms is structured based on these reconstructioncues. Please refer to Figure 2.1 for an overview, where the algorithms are organizedbased on the cue used for reconstruction.11Class Method Cue ProblemShapefromStereoStereo corre-spondenceTexture,Albedo,SpecularShapefromIntensityShadingvariationAlbedo,Specular,Geome-tryShapefromSilhouetteSilhouette GeometryFigure 2.1: Three classes of algorithms with examples, visual cues used forreconstruction, and potential problems. The bottom left image: Silhou-ette Cones, ©CC BY-SA 3.0. The bottom right image: Visual Hull,©CC BY-SA 3.0.2.2.1 StereoStereo correspondence is one of the most widely used visual cues in 3D vision.Passive methods, including stereoscopy, trinocular stereo, and MVS, identify cor-respondences across different views, and estimate the 3D point by triangulation.However these passive approaches suffer from uniform or periodic surfaces. Ac-tive techniques attempt to overcome the correspondence problem by replacing oneof the cameras with a controllable illumination source, e.g., single-point laser, slitlaser scanner, temporal or spatially modulated Structured Light (SL), etc. Herewe refer readers to the survey article by Blais for recent developments of activemethods. We classify one of the most widely used passive methods, MVS algo-rithms, based on the taxonomy proposed in [59], which divides the field into fourclasses based on reconstruction method, and categorize one of the active methods,Structured Light technique, by projection patterns.12Passive method: Multi-View StereoVolumetric stereo algorithms compute a cost function in a 3D volume, then ex-tract a surface from this volume. One successful example is voxel colouring, whichtraverses a sampled 3D space in “depth-order” to identify voxels that have a uniquecolouring, constant across all possible interpretations of the scene [58]. Anotherthread of work formulates the problem in the Markov Random Field (MRF) frame-work and extracts the optimal surface by Graph-Cut algorithms [54, 66, 67].Surface Evolution algorithms work by iteratively evolving a volume or sur-face to minimize a cost function. This includes methods based on voxels, level set,and surface meshes. The Space Carving technique achieves a least-commitmentshape [46] by iteratively removing inconsistent voxels from the scene [38]. Level-set techniques cast the problem as a variational problem, and use a set of PDE’sas cost functions, which are deformed from an initial set of surfaces towards thedetected objects [17]. Other approaches use a deformable model and representthe scene as surface meshes that moves as a function of internal and externalforces [16]. Hiep et al. presented a visibility-based method that transforms a densepoint cloud into a surface mesh, which is fed into a mesh-based variational refine-ment that captures small details, smartly handling photo-consistency, regulariza-tion and adaptive resolution.Region Growing algorithms start with a sparse set of scene points, then prop-agate these points to spatial neighbours, and refine the cost function with respectto position and orientation of the points. Otto and Chau proposed one of the firstwork on region growing stereo search [51]. The idea of this algorithm is as fol-lows: start with an approximate match between a point in one image and a pointin another, use an adaptive least-squares correlation algorithm to produce a moreaccurate match, and use this to predict approximate matches for points in the neigh-bourhood of the first match. Lhuillier and Quan proposed a two-view quasi-denseapproach, which first sorts a list of point correspondences into a list of seed pointsby correlation score. Next, at each step of the propagation, a ‘best’ seed point ischosen. Lastly, in the immediate spatial neighbourhood of this seed point, newpotential matches are checked and the best points are added to the current list ofseed points [40, 41]. This “best-first” strategy guarantees convergence by choosing13only new matches that have not yet been selected. Further, a patch based approachis proposed that undergoes multiple iterations of matching, propagation, and fil-tering [23]. A stereoscopic approach called PatchMatch Stereo, which is inspiredby an approximate nearest neighbour matching algorithm called PatchMatch [7],starts by randomly assigning an oriented plane to each pixel in two views. Next,each pixel is taken through three iterations of propagation and refinement. Theplane is propagated to spatial neighbours, the corresponding pixel from anotherview, and across time. It can achieve sub-pixel accuracy, but is computationallyheavy and challenging for parallelism. There has been some efforts to apply Patch-Match Stereo to multi-view scenarios [24, 64, 72], and develop new propagationschemes to increase the computational efficiency [24].Depthmap Merging algorithms work by computing a per-view depthmap. Bytreating a depthmap as a 2D array of 3D points, multiple depthmaps can be con-sidered as a merged 3D point cloud. A ‘winner-takes-all’ approach uses a setof sampled depth values and picks the value with the highest photo-consistencyscore for each pixel independently. Uniform depth sampling may suffice for simpleand compact objects. However, for complex and large scenes, a proper samplingscheme is crucial to achieve high speed and quality. More sophisticated cost func-tions are derived to account for occlusion or non-Lambertian effects which mayadd noise to the photo-consistency score [25, 67]. In the case of severe occlu-sion, spatial consistency can be enforced under the assumption that neighbouringpixels have similar depth values. This can be formulated under the Markov Ran-dom Field (MRF) framework, where the problem becomes minimizing the sum of aunaryF(·) and pairwise termY(·, ·). The unary term reflects the photo-consistencyscore of assigning a depth value dp from a set of depth value to the pixel p, whereasthe pairwise term enforces the spatial regularization, and assigns the cost of settingdepth label kp, kq to a pair of neighbouring pixels p and q, respectively.E({kp}) =ÂpF(kp)+ Â(p,q)2NY(kp,kq)MVS algorithms generally have a less strict requirement on the setup, and workrelatively reliably in unconstrained environments. However, it suffers under thefollowing two conditions:14Lack of texture: Multi-View Stereo algorithms take advantage of textural in-formation to establish point correspondences across different views. Thus homo-geneous surfaces pose great challenges to MVS algorithms. We have witnessedsurprisingly good results on a textureless object “Dino” in the Middlebury MVSbenchmark [59]. It turns out that MVS algorithms are able to exploit very weak andintricate image textures, most of which come from shading or shadowing effects.However, these texture are so weak that images have to have very high quality.Non-Lambertian surface: MVS algorithms require to observe the same sur-face patch from different angles in order to establish correspondences across views.Thus, the same surface patch needs to have similar or same appearance from differ-ent perspectives, and hence, most of the algorithms assume Lambertian reflectance.Pure Lambertian surfaces are rare in reality, but it is empirically verified that mostMVS algorithms perform reasonably well on non-Lambertian surfaces. As longas the cameras can capture the diffuse reflectance component, then the photo-consistency function is able to identify and ignore images whose non-diffuse ef-fects (e.g., specular highlights) are strong, then utilize the diffuse component inthe remaining images. Further, there are some attempts to overcome this limita-tion, a pure passive methods was proposed that directly model and analyze non-Lambertian effects for MVS algorithms [34, 35].Active method: Structured LightTo overcome the problem of lack of texture, one of the cameras in stereoscopy canbe replaced by an illumination source, e.g., a projector, which is called StructuredLight technique. It is based on projecting a temporally or spatially modulated pat-tern onto a surface and viewing the illuminated surface from one or more pointsof view. The correspondence is easily detected from the projected and imagedpattern, which is triangulated to obtain the a 3D point. Each pixel in the patternis assigned a unique codeword, and the codeword is encoded by using grey level,colour or geometric representations. Structured Light is classified based on thefollowing coding strategy: temporal, spatial and direct codification [55]. Temporaltechniques generate the codeword by projecting a sequence of patterns. Spatialcodification represents each codeword in a unique spatial pattern. Direct codifica-15tion techniques define a codeword for every pixel, which is equal to its grey levelor colour.Temporally encoded SL projects a sequence of patterns successively onto thesurface. The codeword for a given pixel is formed by a sequence of illumina-tion values for that pixel across the projected patterns. This kind of pattern canachieve high accuracy due to two factors: 1) the codeword basis is small (e.g., twofor binary pattern), therefore, each bit is easily distinguishable; 2) a coarse-to-finestrategy is used, and the position of the pixel becomes more precise as the patternsare successively projected. This technique can be further classified by the way pat-tern is encoded temporally: 1) binary codeword; 2) n-ary codeword; 3) Gray-codecombined with phase shifting; 4) hybrid techniques. More details are available inthis review [55].Spatially encoded techniques concentrate all coding into a unique pattern. Thecodeword that labels a certain pixel is obtained from the neighbourhood of pixelsaround it. Normally, the visual features gathered in a neighbourhood are the inten-sity or colour of the pixels or groups of pixels around it.Directly encoded methods represent the codeword in each pixel directly. Toachieve this, we need to use either a large range of colour values or introduce pe-riodicity. However, this kind of pattern is highly sensitive to noise because the“distance” between codewords is nearly zero. Moreover, the perceived colour de-pends not only on the projected colour, but also the intrinsic colour of the surface.Therefore, reference images must be taken to eliminate the effect of surface colour.This kind of coding can be further classified as: 1). codification based on greylevels; 2). codification based on colour. More details are available in review [55].Structured Light techniques overcomes the lack of texture problem by activelyprojecting a pattern onto the surface. However, it still suffers under the followingconditions:Low surface albedo poses a great challenge to active methods, such as SL,which utilize reflected light to establish correspondences across different views.Regardless of which projection pattern is used, the most critical component of anySL system is the decoding process, which retrieves per-pixel codeword from theimaged projection pattern. Thus, the surface albedo needs to be strong enough sothat sufficient amount of reflected light can reach the camera sensor.16Non-Lambertian surfaces exhibit strong reflection in the specular direction.Images of such surfaces are challenging to interpret due to the bright points or high-lights, which makes the projected pattern indistinguishable in these areas. Thus, itis impossible to decode the pixels exhibiting specular effects.Concavity is the cause of global light transport, such as inter-reflection, whichresults in surface patches receiving light from sources other than the projector.Thus, the intensity value or colour of a pixel becomes noisier, which seriouslyaffects the accuracy of decoding process.2.2.2 ShadingShading variation is an effective visual cue for retrieving shape of a surface. Shad-ing variation depends on surface geometry (surface orientation), reflectance (ma-terial), and lighting (illumination). This is generally an ill-posed problem becausedifferent shapes illuminated under different light conditions may produce the sameimage. It becomes possible to estimate surface orientation once the reflectanceproperty and illumination are known. This technique of estimating surface shapeby shading variation is called Shape from Shading. However, this technique re-quires strict constraints on surface geometry since only one input image is used,which leads to a novel technique called Photometric Stereo in which surface orien-tation is determined from two or more images. The idea of Photometric Stereo isto vary the direction of the incident illumination between successive views whileholding the viewing direction constant. This provides enough information to deter-mine surface orientation at each pixel [68]. This technique can produce a surfacenormal map with the same resolution of the input image, i.e., to produce the pixel-wise surface normal map. Since the coefficients of the normal map are continuous,the integrated height map can reach an accuracy that cannot be achieved by any tri-angulation methods. Therefore, the Photometric Stereo technique is more desirableif the intrinsic geometric details are of great importance.Shape from ShadingThe problem of recovering the shape of a surface from intensity variation is firstproposed by Horn [31]. It assumes that the surface under consideration is of a17uniform albedo and reflectance, and that the direction of the single distant lightsource is either known or can be calibrated by the use of a reference object. Thusthe intensity I(x,y) becomes purely a function of the local surface orientation. Theinformation of reflectance, illumination, and viewing geometry can be combinedinto a single function called reflectance map R(p,q), that relates surface orientationdirectly to image intensities:I(x,y) = R(p(x,y),q(x,y))where (p,q) = (zx,zy) are surface gradients. Unfortunately, there are more un-known (per-pixel gradient) than there are measurements (per-pixel intensity). Morespecifically, surface orientation has two unknowns(p,q) whereas measurements ofthe brightness at a single pixel only provide one constraint. Thus, additional in-formation regarding the surface reflectance and illumination, as well as constraintson surface geometry, such as smoothness or integrability are required to estimate(p,q). One common used constraint is smoothness:Zp2x + p2y +q2x +q2ydxdy=Zk—pk2+k—qk2dxdyAnother is the integrability constraint:Z(pyqx)2dxdysince for a valid depth z(x,y) with (p,q) = (zx,zy), we have py = zxy = zyx = qx.Most shape from shading algorithms assume that the surface under consider-ation is of a uniform albedo and reflectance, and that the light source directionsare either known or can be calibrated by the use of a reference object. Thus, theyare applicable to textureless surfaces with uniform and known albedo. Besides, atedious calibration step needs to be carried out to estimate light direction and in-tensity. However, even by assuming the simplest reflectance model, Lambertianreflectance, the survey by Zhang [71] demonstrated that SfS algorithms generallyperform poorly, and none performs well in all cases.18Photometric StereoThe Classical Photometric Stereo, first proposed by Woodham [69], utilized mul-tiple light sources from different directions to overcome the ambiguity of Shapefrom Shading. Assuming Lambertian reflectance, P pixels per image, and Q illu-mination directions, the intensity of the ith pixel under jth illumination isIi, j = ri~n>i ~l j) I= N>Lwhere I 2 RP⇥Q stores the pixel intensity from all images. Each column containspixels from each image while each rows contains intensity of each pixel under allillumination conditions. N 2 RP⇥3 encodes the albedo-scaled surface normal foreach pixel, i.e., Ni,: = ri~n>i . L 2 R3⇥Q encodes the light source directions, i.e.,L:, j =~l j. This surface reflectance, i.e., spatially varying albedo ri, and the normalni can be estimated byN= IL+) ri = kNi,:k) ni =N>i,:kNi,:kThus, the problem of estimating shape of a Lambertian surface under knownlighting conditions has a simple solution. However, this algorithm fails to workonce these constraints are violated. Thus, past research efforts have been focusedon generalizing various assumptions made by classical photometric stereo. For thecamera assumption, orthographic projection can be achieved by using a lens withlong focus and placing the objects far from the camera. The nonlinear response canbe solved by performing radiometric calibration. The shadow and other global lighttransportation are a few of the sources of errors, where some approaches considerthem as outliers and remove them before normal estimation. The reflectance andlighting assumptions, however, are the most complicated since the reflectance prop-erties depends on material property and microscopic structure. Further, lightingcan have either an arbitrary or fixed position, orientation, and intensity. Therefore,19research on Photometric Stereo are generally on two directions: 1) generalizationof reflectance; 2) generalization of lighting conditions. A summary of assumptionsmade by various classes of PS algorithms are presented in Table 2.1.Category Camera Light source ReflectanceClassical PS Orthographic Directional, known inten-sity and directionLambertianGeneralizedlighting PSOrthographic Unknown intensity anddirection, ambientLambertianGeneralizedreflectancePSOrthographic Distant, known intensityand directionNon-LambertianTable 2.1: Assumptions made by different classes of photometric stereo.Generalization of Lighting It is possible to estimate the surface orientationwithout knowing light directions, a case also known as uncalibrated PhotometricStereo, see Table 2.1. Most uncalibrated techniques assume Lambertian techniquesand are based on factorization technique proposed in [27]. Recall the irradianceequation:I= N>LHowever, an infinite number of candidates Nˆ and Lˆ make the above equality met.In fact, any invertible 3⇥3 matrix G defines a candidate pair Nˆ = N ·G, Lˆ=G1L.Thus the normal N and light source direction L can only be recovered up to alinear transformation. It has been shown that only a 3-parameter subset of thesetransformations, known as the Generalized Bas-Relief (GBR) ambiguity, preservesurface integrability [9].Other generalized lighting conditions are any situations other than the idealcase of using a single distant point light source in a dark room, such as naturalambient light, multiple point light sources with or without ambient lighting, etc.To make the problem more tractable, the reflectance model should no longer bea general one, as this involves too many degrees of freedom that results in manydifferent shapes with incorrectly estimated general reflectance and incorrectly es-timated general lighting.20Generalization of Reflectance This direction of research has been to relaxthe assumption of Lambertian reflectance. This can be broadly divided into fourclasses of algorithms.Outlier rejection approach assumes that Non-Lambertian reflectance can bewell approximated by the sum of diffuse and specular lobe. The specular pixelsare considered as outliers in [15] and [8]. Others assume that the colour of thespecular lobe differs from that of the diffuse lobe, which allows the separation ofthe specular and diffuse components [44, 56, 57].Reference object approach uses a reference object that has similar material asthe target object. This is proposed in [62] and later revisited in [29]. The idea isthat surface points with same orientation give similar intensity values under similarreflectance and lighting. It can deal with arbitrary BRDFs as long as the referenceand target object has the same material. It can handle spatially-varying BRDFsas long as there are multiple reference objects. Each reference object serves as a“basis” BRDF, and the BRDF at any point on the target object can be approximatedas a linear combination of the basis BRDFs.Parametric reflectance model approach builds upon the idea that an arbitraryBRDF can be approximated by “basis” BRDFs, and replaces the reference objectswith sophisticated BRDF models. An isotropic Ward model is used as basis BRDF,and the surface orientation and parameters of the reflectance models are estimatediteratively [26].Invariants of BRDF approach exploits various physical properties of BRDFs.While parametric reflectance models are very good at reducing the complexity ofBRDFs, they are usually only valid for a limited class of materials. An alternativeis to exploit the invariants of BRDFs, typically including energy conservation, non-negativity, Helmholtz reciprocity, isotropy, and so on [5, 73].Photometric Stereo can work extremly well under certain constrained condi-tions. However, it generally performs poorly once the aforementioned assumptionsare violated: the classical PS and generalized reflectance PS fail to work under un-calibrated light conditions. The generalized lighting PS only handle Lambertiansurfaces under uncalibrated lighting conditions, but only achieves estimation up toa linear transformation; the classical PS and generalized lighting PS fail to workunder generalized reflectance conditions; and lastly, most PS algorithms fail to21work on conditions of generalized lighting and reflectance, one approach that hasbeen proved to work is to place multiple reference objects in the scene with thetarget object as proposed by [29].2.2.3 SilhouetteIn some cases, it’s an easy task to perform a foreground segmentation of the objectof interest, which leads to a class of techniques that reconstructs a 3D volumet-ric model from the intersection of the binary silhouettes projected into 3D. Theresulting model is called a visual hull.The basic idea of shape from silhouette algorithms is that the object lies in-side the intersection of all visual cones back-projected from silhouettes. Supposethere are multiple views V of the target object. From each viewpoint v 2 V , thesilhouette sv can be extracted, which is the region including the object’s interiorpixels and delimited by the line(s) separating the object from the background. Thesilhouette sv are generally non-convex and can represent holes due to the geometryof the object. A cone-like volume conev called (truncated) extended silhouette isgenerated by all the rays starting at the centre of projection and passing through allthe points of the silhouette. The target object is definitely internal to conev and thisis true fro every view v0 2 V ; it follows that the object is contained inside the vol-ume cV = \v2V cv. As the size of the V goes to infinity, and all possible views areincluded, cV converges to a shape known as the visual hull vh of the target object.Voxel based methods First the object space is split up into a 3D grid of voxels;each voxel is intersected with each silhouette volume; only voxels that lie insideall silhouette volumes remain part of the final shape.Marching intersections basedmethods Themarching intersection (MI) struc-ture consists of 3 orthogonal sets of rays, parallel to the X , Y , and Z axis, whichare arranged in 2D regular arrays, called the X rayset, Y rayset, Z raysetrespectively. Each ray in each rayset is projected to the image plane to find theintersections with the silhouette. These intersections are un-projected to computethe 3D intersection between the ray and the extended silhouette on this ray. Thisprocess is repeated for each silhouette, and the un-projected intersections on thesame ray are merged by the boolean AND operation.22Once the MI data structure representing the intersection of all extended sil-houettes, a triangular mesh is extracted from it. This is done by the MI techniqueproposed in [53] which traverses the “virtual cells” implicitly defined by the MI,builds a proper marching cube (MC) entry for them that in turn is used to index aMC’s lookup table.Exact polyhedral methods The silhouette is converted into a set of convex ornon-convex 2D polygons with holes allowed. The resulting visual hull with respectto those polygonal silhouettes is a polyhedron. The faces of this polyhedron lie onthe faces of the original cones. The faces of the original cones are defined bythe centre of projections and the edges in the input silhouettes. The idea of thismethod is: for each input silhouette si we compute the face of the cone. Thenwe intersect this face with cones of all other input silhouettes, i.e., a polygon-polyhedron intersection. The result of these intersections is a set of polygons thatdefine the surface of the visual hull.Visual Hull algorithms don’t rely on material properties as long as the fore-ground of the image can be reliably segmented, thus is applicable for objects witharbitrary reflectance properties. However, it fails to carve the concavities on theobject surface, thus is unsuitable for concave objects.23Chapter 3A Problem Space of 3DReconstructionWe discussed the current landscape of 3D reconstruction in Chapter 2. Previousresearch has solely focused on developing novel algorithms and software to tacklethis problem. Thus, most research efforts have been devoted to improving algo-rithm performance in terms of accuracy, completeness, computational efficiency,or relax restrictive assumptions so that they can be applied to more general situa-tions. However, this approach, which we call an algorithm-centred approach, facestwo challenges: 1) it provides little insight into the conditions that allow a specificalgorithm to work successfully; 2) it requires domain-specific knowledge to finetune algorithm specific parameters to optimize the performance. This knowledgeis either unknown or largely empirical, with each algorithm mapped roughly to asub-volume in the problem space that is poorly defined, thus requires vision knowl-edge to fully take advantage of these algorithms. In this thesis, problem space isdefined as an Ndimensional space which encompasses the material and shapeproperties of objects, and thus the axes of which consist of characteristic materialand geometric attributes (called properties). The sub-volume of the problem spaceis called problem condition(s). We argue below that a well-defined problem spaceis a critical part of designing an interface of 3D reconstruction, and should receivemore research efforts.We have established in Chapter 2 that most 3D vision algorithms target a lim-24ited set of problem conditions. They may work under one condition, but is highlylikely to fail under others. Thus, they are unsuitable to reconstruct objects with awide range of properties. It is crucial to have multiple algorithms, each registeredto a distinct sub-volume of the problem space in order to design an interface for 3Dreconstruction problem. To achieve this goal, we need to first propose key attributesof objects as axes of this Ndimensional space, which lay the foundation of de-scribing problem conditions in a consistent and rigorous manner. Next, we needto discover the relation between problem conditions and algorithms so that we canknow which algorithm performs well given a specific problem condition, whichwill be discussed in Chapter 5. Recall that problem space is an Ndimensionalspace, of which the axes are material and shape properties of objects. The rea-son we choose material and shape properties is that they can be visually and per-ceptually estimated, and are also widely used by typical 3D vision algorithms asreconstruction cues. With a well-defined problem space, we are able to describethe characteristic properties of an object that are crucial for reconstruction. For in-stance, instead of describing a cup simply as a ‘cup’, we can describe it as ‘a white,glossy porcelain cup with shallow strips on the surface’. The visual and geometricproperties, represented by words such as ‘white’, ‘glossy’, and ‘shallow’, are cru-cial in terms of determining which algorithm is able to perform well. We call it aproblem-centred approach. This approach transforms the 3D reconstruction prob-lem from one requiring knowledge and expertise of specific algorithms in termsof how to use them, to one requiring knowledge of problem conditions, which canbe perceptually estimated or measured. The advantage of the problem-centred ap-proach are as follows: 1) the properties can be universally used by most objects,without the need of algorithm-specific parameters; 2) the properties of the problemspace can be visually or perceptually estimated. Thus, there is no need to under-stand the meanings of algorithm parameters, i.e., no vision knowledge required.In this chapter, we first propose a well-defined problem space consisting of vi-sual and geometric properties of objects in Section 3.1. Since the problem space isgenerally too vast to tackle, we state addition assumptions and underlying rationaleto limit the scope of problem space in Section 3.2. Finally, we propose four mainproblem conditions that we are interested in investigating in this thesis.253.1 Problem SpaceWe first give an overview of problem space, which consists of visual and geometricproperties of real-world objects, as shown in Figure 3.1. These properties can beconceptualized as axes of the 3D reconstruction problem space. This approach al-lows us to think of algorithms as “pointers” to volumes within an Ndimensionalproblem space. Existing algorithms can be incorporated into the interface by evalu-ating the algorithm performance within the problem space, as shown in Figure 3.2.However, by no means is the presented problem space complete. There are manyother properties not included that are commonly seen in the real world. For in-stance, properties such as metalness, emission, occlusion, discontinuity, amongothers, are not considered. However, the listed set of properties are broad enoughto encompass a wide range of real-world objects. To help easy identification of aspecific problem condition, we propose the following labels to differentiate objectclasses, as shown in Figure 3.1.Mixed diffuse and specular (M)Concave (Cv)Smooth (S)Refraction(Rf)Texture Light-matterTranslucencyRepeated Texture (Tr)Textured (T) Subsurface scattering(Ss)Textureless (Tl) Diffuse (D)BrightnessDim (D)Convex (Cx)Rough (R)ConcavityRoughnessOpaque (O)Translucent (Tl)Transparent (Tp)Bright (B)Figure 3.1: A list of properties with labels of the proposed problem space.263.2 AssumptionsThough 3D reconstruction of opaque surfaces with Lambertian reflectance hasbeen a well studies problem, specular, translucent, transparent, and refractive scenesstill pose challenging problems for acquisition systems [32]. Ihrke et al. conducteda comprehensive review on acquisition approaches for transparent and specular sur-faces, and concluded that though different classes of techniques have been devel-oped to deal with these problems and good reconstruction results can be achievedwith current state-of-the-art techniques [32]. However, the proposed approachesare still highly specialized and targeted at a very specific object class [32]. Thus,the limits of existing techniques demands that only a subset of problem space canbe solved robustly. Further, it is more feasible as a proof of concept demonstrationto target a sub-space. Therefore, we make the following assumptions to the keepproblem space more tractable:Simplified Light-Matter Interaction ModelWe assume local interaction model, i.e., global light transport such as transmis-sion, refraction, cast shadow, inter-reflection are not considered. The rationalebehind our choice is that most techniques that have been developed over the pastfew decades mainly tackle object with an opaque, diffuse or mixed surface [32].Since the majority of reconstruction techniques rely on observing light reflected offa surface, surfaces exhibit significant effect of global light transport present a hugechallenge to the reconstruction problem. For specular, refractive, and translucentor transparent objects, only very specialized algorithms are applicable for recon-struction [32]. This is a widely used and accepted model in varied areas of com-puter vision, including shape from stereo, shading, and so on. As more algorithmsbecome available to tackle these types of objects, they can be embedded to the in-terface using the same approach that will be discussed in Chapter 5, as shown inFigure 3.2.Simplified Reflectance ModelMost real-world surfaces are not optically flat, or smooth at a microscopic scale.In most cases, there are irregularities present which are much larger than the light27sub problem spaceproblem spaceDimension reductionMapping discoveringAlgorithmsMappingSuccessful algorithmsFigure 3.2: Embed algorithms into the interface. The process is three-fold:1) reduce problem space by discovering effective properties denoted byred; 2) discover mapping from the lower-dimensional problem space toalgorithm, denoted by green; 3) mapping from problem space to algo-rithms denoted by blue.wavelength, but too small to be seen or resolved. The microfacet theory assumesthat the surface is composed of a large collection of microfacets, each reflectinglight from a given incoming direction into a single outgoing direction which de-pends on microfacet normal. We assume microfacet reflectance model, which con-sists of a diffuse and a specular term. The diffuse term can be approximated by aLambertian model, and the specular term is determined by material, viewing angle,shadowing, surface geometry, and so on.Simplified Geometric ModelIt is a challenging task to model geometry using mathematical descriptions. Forgeometric primitives such as cube, sphere, or cone, etc, it’s possible to describethe shape using concise descriptions. However, the task becomes more challeng-ing when it comes to shapes with varied characteristics. Furthermore it becomesmore ambiguous when natural language is employed. Thus we only consider themicroscopic roughness of the surface, which has a direct relation with the reflec-tion. Other prominent geometric properties such as concavity, which affects self-shadow, inter-reflection, depth-discontinuity, which affects the depth estimation,are not considered.28Surface ReflectanceExisting 3D vision techniques requires distinct cues for reconstruction, be it tex-ture, intensity variation, focus change, and so on. This information will becomemuch noisier and less effective on darker surfaces. Surfaces with low albedo willmake many algorithms ineffective, including most active techniques. In this thesis,dark surfaces could be challenging for all the underlying algorithms implementedwithin the interface. This works fine when it comes to design and use such an inter-face since there is no need for all underlying algorithms to work reliably. However,from a demonstrative standpoint, this would render the interface failing to call anyof the selected algorithms except the baseline methods, which fails to demonstratethe performance of interpreter. Thus, we decided to use objects with bright surfacesso that the interpreter can invoke one of the underlying algorithms given accuratedescription.Setup of Capturing SystemThe configuration of capturing system determines the selection of algorithm ina variety of ways. The capturing device determines what type of data is captured,which could be RGB images, or range data depending on whether a camera of laserscanner is used. The lighting condition determines if a passive or active methodis selected since active methods need controlled lighting. The number of vantagepoints determines if data from different viewpoints or under different illuminationsare needed. In short, variations in the capturing systems can greatly impact whichalgorithm is effective, or the performance of algorithms. However, this signifi-cantly increases the complication of the problem. Thus, we decide to use a fixedcapturing system throughout the thesis. However, note that the results obtainedunder this capturing system might not be applied to data captured using a differentcapturing system. However, the approach would not be affected by the hardwaresetup.Built upon the previously defined problem space, and additional assumptions,we define four classes of problem conditions that will be investigated in depth,as shown in Figure 3.3. We will demonstrate that it is achievable to design adescription-based interface that hides algorithm details and return solutions with-29out knowing the underlying algorithms using objects that satisfy these conditions.4123Surface DescriptionFlat, matteFlat, glossyTextured, matteTextured, glossyImage formationTextureless, near diffuse reflectance, roughTextureless, mixed diffuse and specular reflectance, smoothTextured, near diffuse reflectance, roughTextureless, mixed diffuse and specular reflectance, smoothConditionFigure 3.3: Four problem conditions selected based on the definition of prob-lem space and additional assumptions. The description-based interfacewill be evaluated using objects satisfying these conditions.3.2.1 Discussion and ConclusionsMost vision research has been devoted to developing algorithm novelties and im-proving performance. However, the conditions under which an algorithm worksreliably is often a neglected subject. However, this is a crucial part of designing a3D reconstruction interface. Since no single algorithm can work reliably under awide range of conditions, it is required to have multiple algorithms, each coveringa distinct or partially overlapping sub-volume. The problem then becomes underwhich condition can an algorithm work successfully. To answer this question, wefirst need to define the problem space and problem conditions. Otherwise, it is apoorly-defined task to describe the problem condition of an object.We first proposed a well-defined problem space, which is an Ndimensionalspace, the axes of which are reflective and geometric properties of objects. Theproblem condition(s) is then defined as a sub-volume within the problem space.Since existing 3D vision algorithms can only deal with certain classes of objects,we further proposed assumptions to limit the problem space to a more feasiblerange. Further, to implement a proof of concept interpreter for demonstrative pur-pose, we choose four problem conditions to demonstrate the proposed interpreter.Since this thesis studies a sub-space of the entire problem space, a natural ques-30tion is whether this approach extends to a broader space when incorporating prop-erties that have been left out, such as refraction, sub-surface scattering, and soon. This is a challenging task, and depends on two aspects: 1) robust techniquesthat work successfully under these conditions have been developed; 2) there is away to describe the corresponding properties. The goal of this thesis is to demon-strate whether it is possible for the proof of concept interpreter to find a reliablealgorithm given a description of problem condition. Thus, this is left as a futureresearch direction.31Chapter 4A Description of 3DReconstructionIn Chapter 3, we introduced a problem space for 3D reconstruction, which providesa new perspective to understanding algorithms without delving into algorithm de-tails. However, it is not sufficient to merely have a problem space when it comesto creating an interface for 3D reconstruction. It provides little in terms of de-scribing problem conditions that we are interested in solving. There are two keycomponents that are needed in order to provide a consistent description: a modelwhich addresses what to describe, i.e., which property is included or excluded inthe model, and corresponding representations addressing how to describe the com-ponents of a model, i.e., what characteristics of a property should be of interest.The first issue has already been somewhat addressed in Chapter 3 with the well-defined problem space. Specifically, we can use a subset of properties of the prob-lem space as the components of the model. However, the second issue has yet to beaddressed. We have not discussed which aspect of a property is crucial to problemsolving, and how it is measured or estimated. Besides, it is pointless to discussthe estimation of property parameter without determining the corresponding rep-resentation beforehand. Thus, it is practically impossible to provide a consistentdescription without a model and corresponding representations. Here is a concreteexample to further stress the importance of model and representations: how to de-scribe a porcelain vase? First and foremost, we need to decide which properties32to describe. We use surface ‘texture’ as an example. Next we need to addresshow this property, ‘texture’ in this example, is represented, i.e., what characteris-tics of the property is used to measure the strength of ‘texture’. We can choosethe scale of texel (texture element), or randomness of texel as representation oftexture. Otherwise, it make no sense to quantify a property before determining thecorresponding representation. Thus, this chapter addresses these two issues: first,we propose a model consisting of a subset of properties from the problem space.We recognize that it is practically impossible to achieve an exhaustive description,and the scope of description is determined by the scope of the problem space. Forinstance, sub-surface scattering property should not be included in the descriptionsince it is omitted in the problem space as well. Secondly, we need to determinethe representation of each property. Each property might have multiple facets thatare essential to problem solving. For instance, when we talk about texture, someimportant features include randomness of texture, scale of texel, magnitude andorientation of texture, and so on. Without determining a proper representation ofthe property, there is no way we can proceed to perceptually estimate the strengthof the property.This way of thinking is not specific to our problem. Typical computer visionproblems require, among other factors, a model of the problem space and appro-priate representations [43]. The model characterizes the relevant properties of theelements in the domain and analyze their relations. The representations describean object’s properties selected by the model to facilitate a solution of the problem.For instance, surface orientation is a part of the surface geometry model, and thecorresponding representation can be surface normal or curvature. Another exampleis colour, which is a component of a material model, and the RGB values could beone of the representations of this property. By using this formal description con-sisting of a model and corresponding representations, expressing the conditionswithin which an algorithm works reliably becomes more well-defined. Thus, thischapter is devoted to proposing a model and representations for 3D reconstructionproblem, which are the key components of the interpretable description.In this chapter, we attempt to provide a description of the 3D reconstructionproblem which allows for a well defined specification of the conditions surround-ing the problem. This description abstracts away from the functional specification33of how to estimate a reconstruction. We first propose a formal definition of the 3Dreconstruction problem in Appendix A.1. Next, Section 4.1 proposes a model to 3Dreconstruction by selecting various key characteristics of the problem space thatare crucial for describing the appearance of the object. Section 4.2 discusses cor-responding representations of the proposed model. Section 4.3 provides examplesof expressing 3D reconstruction problems using the proposed model and represen-tations. These following four layers represent the description of our accessible 3Dreconstruction framework: Definition, Model, Representation, and Expression.4.1 ModelModel and representations are fundamental for vision problem solving. Model se-lects characteristic properties of an object, and representations describe the modelselected object properties to facilitate a solution of a class of problem. A model fa-cilitates the representation of aspects in reality that are useful in a particular prob-lem domain [13]. There are two criteria in choosing the model: 1) the modelcomponents need to be useful for problem solving; 2) the components need to bevisually or perceptually identifiable so that it is easy for users to learn to describeproblem conditions using the proposed model. As discussed in Chapter 2, visualand geometric information, such as texture, shading variations, and silhouette arecritical to 3D vision algorithms. In Chapter 3, we summarized key visual and ge-ometric properties of an object that are of great importance for humans to identifyand distinguish objects. It is clear that a subset of information that both visionalgorithms and humans utilize overlaps, for instance, both textural and brightnessinformation are utilized. Thus, we select a subset of properties that are used byvision algorithms as reconstruction cues, and also easily identifiable by humans asthe main components of our model.The selection of algorithm depends not only on material and shape of objects,but also on the capturing system. The lighting condition determines if certain cat-egories of algorithms can be used. For instance, active methods needs controlledlighting condition, which typically use a light source or projector whereas passivemethods work under ambient lighting. The number of vantage points determineswhether a single vantage point algorithm or multiple vantage points algorithm is34utilized. Lastly, the static or dynamic nature of the scene determines whether analgorithm that can dynamically update the scene is needed. The key componentsof the model are shown in Table 4.1.ModelLightingVantage pointsNature of object or sceneTextureBrightnessReflectanceRoughnessConcavityTable 4.1: Model of the 3D reconstruction problem. Visual and geometricproperties are selected from the problem space in Chapter 3.4.2 RepresentationBased on the proposed model of the 3D reconstruction problem, we need to furtherdefine the representations so that the 3D reconstruction problem can be expressedusing the proposed model and representations.4.2.1 Setup of Capturing SystemThe model of the capturing system includes the lighting condition, the numberof vantage points, and the nature of the object or scene. In this thesis, we use afixed capturing system, thus the configurations of the capturing system is constantthroughout the thesis, thus is omitted in the following discussions for the sake ofbrevity. The configuration of the capturing system is shown in Table 4.2.4.2.2 TextureTexture is one of the most important cues for many computer vision algorithms. Itis generally divided into two categories, namely tactile and visual textures. Tactiletextures refer to the immediate tangible feel of a surface, whereas visual texturesrefer to the visual impression that textures produce to the human observer, which35Model Representations ValueNature of scene StaticDynamicLightingPassive Ambient lightActive Light sourceProjectorMixed Ambient& light source& projectorVantage pointSingle vantage point 1Multiple vantage pointsSmall (< 10)Medium (1050)Large (> 50)Table 4.2: The representations of the capturing system. The configuration ofthe current capturing system is label in red.are related to local spatial variations of simple stimuli like colour, orientation andintensity in an image. Here we focus only on visual textures, as they are morewidely used in the stereo vision research. The term ‘texture’ hereafter refers exclu-sively to ‘visual texture’ unless mentioned otherwise.Although texture is an important component in computer vision, there is noprecise definition for the notion of texture itself. The main reason for this is thatnatural textures often exhibit separate yet contradicting properties, such as reg-ularity versus randomness, or uniformity versus distortion, which can hardly bedescribed in a unified manner.There are various properties that make texture distinguishable: scale/size/gran-ularity, orientation, homogeneity, randomness, etc. However, due to the diverse andcomplex nature of textures, it is a challenging task to generate a synthetic texturesolely from these semantic properties, or the other way around, derive parametersfrom a given texture. The stereo vision community often takes a simplified ap-proach, classifying textures into two categories, regular and stochastic, by degreeof randomness. A regular texture is formed by regular tiling of easily identifiableelements (texels) organized into strong periodic patterns. A stochastic texture ex-hibits less noticeable elements and displays rather random patterns. Most of thereal world textures are mixtures of these two categories. In this thesis, we adopt36this simplification and consider texture randomness, which is the amount of distor-tion in the texture. Thus, a uniform texture has low texture randomness whereas ahighly textured surface has high texture randomness.4.2.3 BrightnessWhen light strikes a surface, it may be reflected, transmitted, absorbed, or scat-tered; usually, a combination of these effects occurs. The intensity or colour infor-mation received by a sensor is thus determined, among other factors, by the amountof light available after these interactions. We assume that all effects are local, thusglobal effects such as inter-reflection and transmission, among others, are omitted.This is called a local interaction model. In order to understand the contributing fac-tors of pixel intensity or colour, we need an in-depth understanding of reflection,i.e., how light is reflected off of a surface patch, and the relation between materialand intensity values. The radiometric formation of an image consists of three sep-arate processes: light-matter interaction, light-lens interaction, and light-sensorinteraction, as discussed in Appendix A.2. The conclusion is that image intensityis proportional to diffuse reflectance (albedo).Diffuse reflectance represents light that is refracted into the surface, scattered,partially absorbed, and re-emitted. The Lambertian diffuse model assumes that therefracted light has scattered enough that it has lost all directionality and thus thediffuse reflectance is constant. However, very few materials exhibit a Lambertianresponse. Many materials show a drop in grazing retro-reflection, and many othersshow a peak. This is strongly correlated to surface roughness — smooth surfacestend to have a shadowed edge, and rough surfaces tend to have a peak instead of ashadow. This grazing shadow for smooth surfaces can be explained by the Fresnelequations: at grazing angles, more energy is reflected from the surfaces and less isrefracted into the surface to be diffusely re-emitted. However, diffuse models don’tgenerally consider the effect of surface roughness on Fresnel refraction and eitherassume a smooth surface or ignore the Fresnel effect. In this thesis, we adoptthis diffuse model that neglects surface roughness, thus the albedo is consideredconstant, and not affected by grazing angle and surface roughness.In conclusion, as long as the light-sensor interaction is considered as a linear37mapping (as most vision algorithms do) or calibrated in a pre-processing step, thepixel intensity value is linearly proportional to diffuse reflectance, which is therepresentation we adopt for brightness.4.2.4 SpecularitySpecular surfaces reflect light in nearly a single direction when microscopic surfaceirregularities are small compared to light wavelength, and no subsurface scatteringis present [49]. Unlike diffuse reflections, where we experience the brightnessand colour of an object, specular reflections carry information about the structure,intensity, and spectral content of the illumination field. In other words, specularreflection is simply an image of the environment, or the illumination field, distortedby the geometry of the reflecting surface. For instance, the specular sphere inFigure 4.1 shows a distorted image of the environment instead of the underlyingsurface colour. A purely specular surface is a mirror, which is rare in nature. Mostnatural materials exhibit a mixture of specular and diffuse reflections. We use themicrofacet model as the reflectance model in this thesis, which is discussed inAppendix A.2.2. Thus the amount of specular component is determined by theFresnel reflectance, also named specular reflectance.(a) (b)Figure 4.1: (a). A red diffuse sphere; (b). a red specular sphere. The sur-face reflects light in a mirror-like way, showing a distorted environment.Since no diffuse reflection exists, the colour of the surface is no longervisible.The Fresnel reflectance is the fraction of incoming light that is reflected (asopposed to refracted) from an optically flat surface of a given substance. It variesbased on the light direction and the surface (in this case microfacet) normal. Fres-nel reflectance tells us how much of the light hitting the relevant microfacets (the38ones facing in the half-angle direction) is reflected. Fresnel reflectance depends onrefraction index (in other words, what the object is made of) and the incoming lightangle (which is plotted here on the x-axis), as shown in Figure 4.2. As the angleincreases, the Fresnel reflectance barely changes for the first 45 degrees; afterwardsit starts changing, first slowly up to about 75 degrees, and then for very glancingangles, it rapidly goes to 100% at all wavelengths. Since the Fresnel reflectancestays close to the value for 0 over most of the range, we can think of this valueF(0) as the characteristic specular reflectance of the material. This value has allthe properties of what is typically thought of as a “colour” - it is composed of RGBvalues between 0 and 1, and it is a measure of selective reflectance of light. For thisreason, we will also refer to this value as the specular colour of the surface, de-noted as cspec. We choose specular reflectance as the representation of specularity.Figure 4.2: Fresnel reflectance for external reflection from a variety of sub-stances. Since copper and aluminum have significant variation in theirreflectance over the visible spectrum, their reflectance is shown as threeseparate curves for R, G, and B. Copper’s R curve is highest, followedby G, and finally B (thus its reddish colour). Aluminum’s B curve ishighest, followed by G, and finally R. (Image from “Real-Time Render-ing, 3rd edition”, ©2008 A K Peters, by permission)394.2.5 RoughnessRoughness, which refers to the microscopic shape characteristics of a surface, con-tributes to the way in which light is reflected off of a surface. A smooth surfacemay reflect incident light in a single direction, while a rough surface may scatterthe light in various directions. Thus, variations in microscopic surface geometrycan cause specular reflections to be scattered, blurring the image of the environ-ment in an amount proportional to surface roughness. We need prior knowledge ofthe microscopic surface irregularities, or a model of the surface itself, to determinethe reflection of incident light.Possible surface models are divided into 2 categories: surfaces with exactknown profiles and surfaces with random irregularities. An exact profile may bedetermined by measuring the height at each point on the surface by means of asensor such as the stylus profilometer. This method tends to be cumbersome andimpractical, hence, it is more reasonable to model the surface as a random process,where it is described by a statistical distribution of either its height above a certainmean level, or its slope with respect to its mean (macroscopic) slope. We use thesecond statistical approach as the representation of roughness.In conclusion, the proposed model and representation of 3D reconstruction canbe summarized in Table 4.3.Model RepresentationNature of scene StaticLighting Mixed: ambient, projector, light sourcesVantage point Medium: 10 - 50Texture Texture randomnessLightness Diffuse reflectanceSpecularity Specular (Fresnel) reflectanceRoughness Distribution of microfacet normalTable 4.3: A Model and corresponding representations of the 3D reconstruc-tion problem.404.3 ExpressionIn this section, we discuss the expression of 3D reconstruction problems usingthe proposed model and representations, as shown in Table 4.3. There are tworelated issues that need to be addressed: 1) how intuitive is the proposed model forhumans to describe problem conditions; 2) how can a user estimate the parametersof properties. We first address the issue of property perception in Section 4.3.1.We can show that it is intuitive for humans to perceptually identify materials andestimate material properties. Next, we will use the proposed description to expressthe four problem conditions proposed in Chapter 3.4.3.1 Perception of PropertiesDifferent materials can be visually distinguished because they structure light in aparticular, characteristic way. The way light is structured depends heavily on theshape of object, reflective and transmittive properties of material, and illuminationfield. This process is called the ‘forward optics’ (image formation) process. Thematerial perception problem is, in some sense, the ‘inverse optics’ problem: deter-mining what combination of surface geometry, surface material, and illuminationfield generated a given image [6]. Thus, in order to recover the reflectance prop-erties of materials, the visual system must somehow disentangle the contributionsof the illumination field and geometry. This is arguably the central problem ofmaterial perception, though the answer is currently far from clear [6]. But our in-tuition and many empirical studies support the view that humans are exceptional atmaterial perception.Everyday experience suggests that human’s visual system are extremely goodat estimating material properties. We can effortlessly distinguish numerous differ-ent categories of material: textiles, stones, liquids, foodstuffs, and so on, and canrecognize many specific materials within each class such as silk, wool and cot-ton [18]. Besides, being able to visually distinguish between materials and infertheir properties by sight, is invaluable for many tasks. For instance, when deter-mining edibility, we can make subtle visual judgments of material properties todetermine whether fruit is ripe, whether soup has been left to go cold or whetherbread is going stale [18]. There are numerous experimental evidences to support41this intuition. For example, Sharan et al. have shown that subjects can identifya wide range of materials from photographs even with brief presentations [60].Fleming et al. showed subjects photographs of materials from different categoriesand asked them to rate various subjective qualities, such as hardness, glossinessand prettiness [20]. Even though subjects were not explicitly informed that thesamples belonged to different classes, the subjective ratings of the individual sam-ples were systematically clustered into categories, suggesting that subjects couldtheoretically classify materials through visual judgments of their properties.There have been some empirical studies focused on visual estimation of spe-cific properties of materials, such as glossiness, translucency or surface roughness.For instance, on the topic of glossiness, Nishida and Shinya showed that subjectscan judge the specular reflectance of computer simulated glossy surfaces [50], andFleming et al. showed that this ability generalizes across differences in lighting,as long as the illumination has statistical structure that is typical of the naturalenvironment [19]. Fleming argued that the visual system does not actually esti-mate physical parameters of materials and objects, for instance, parameters of areflectance model. Instead the brain is remarkably adept at building ‘statisticalgenerative models’ that capture the natural degrees of variation in appearance be-tween samples [18]. Though there is currently no universally accepted theory onvisual perception of material, both our intuition and empirical results do suggestthat human are exceptionally good at ‘estimating’ material properties.Despite its subjective ease, material perception still poses the visual systemwith some unique and significant challenges, because a given material can take onmany different appearances depending on the lighting, viewpoint and shape. Thus,we provide a generative approach to visual perception: a simulation software isused to generate images of a synthetic object with varied property settings to aidthe estimation of properties. The user can choose an illumination field and objectshape that is closest to the real-world enviriontment and object. Then this “trial-and-error” approach is used to estimate parameters of properties. More specifically,the user would change the value of each property and see if the rendered resultresembles the real object. A similar approach can be found in [10] where Berkitenand Rusinkiewicz used a synthetic dataset to find the contributing factors of variousPhotometric Stereo algorithms. We demonstrate this approach using the following42example shown in Figure 4.3: the brightness of the object is controlled by thealbedo value, which is determined by the value channel of HSV colour space. Todetermine the specularity and roughness of the object, we experiment with varyingparameters to get the most realistic image.TexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRoughTexSpecAlbRough(a). Propertysettings(b). Syntheticsphere(c). Syntheticvase(d). Real-worldvaseFigure 4.3: The UI for determining the property settings, including albedo,specular, and roughness of the surface. As shown in (a), the prob-lem condition in this case is: texture (0.8), albedo (0.8), specular (0.2),roughness(0.2). (b) demonstrates the effect of the property settings on asphere, (c) on a teapot, and (d) shows the real-world object.Now that we have proposed the model and representations of 3D reconstructionproblem, we can express the four proposed problem conditions using this descrip-tion. Given that all user perceived estimates would likely to be of low resolution,we use three discrete scales to set these properties: low (0.2), medium (0.5), andhigh (0.8). The expression of the reconstruction problem is shown in table 4.4.Object Texture Albedo Specular Rough LabelClass 1 low/med high low/med high Tl-B-D-RClass 2 low/med high high low/med Tl-B-M-SClass 3 high high low/med high T-B-D-RClass 4 high high high low/med T-B-M-STable 4.4: Expression of the reconstruction problem for the four problemconditions proposed in Section 3.434.4 Discussion and ConclusionsThis chapter builds upon the problem space proposed in Chapter 3, and aims to pro-vide a description to allow users to define problem condition of an object defini-tively. The description consists of two parts: a model and corresponding repre-sentations. The model answers the question of what characteristic features of anobject to describe, and the representations address the issue of how to describe theselected set of features of an object.We selected a subset of properties from the problem space as components ofthe model, and proposed corresponding representations for this model, as shown inTable 4.3. We further analyze how easy it is to perceptually estimate properties, andexpress the four problem conditions using the proposed description. The proposeddescription is the first layer of our three-layer description-based interface. It worksas the input of interpreter, and from which, an algorithm that works reliably will bechosen to reconstruct a successful digital shape.Since we only explore a subset of problem space, the proposed model andrepresentations may be insufficient to describe some classes of object, especiallyobjects with translucent, transparent, or refractive surfaces. However, the applica-tion of the description to a broader domain is limited by the availability of existingtechniques. As long as reliably algorithms have been developed, we can extend thecurrent description to incorporate complex reflective and geometric properties.44Chapter 5A Mapping of 3D ReconstructionMost vision work focuses on developing algorithm novelties, and as we have men-tioned, very few investigate the rigorous conditions under which the algorithmsthemselves work. Thus, this knowledge is only known empirically, without a rig-orous definition of the problem conditions. This relation between problem spaceand algorithms (termed as mapping) is one of the key components of the inter-preter, and is responsible for selecting one of the best possible algorithms based ondescribed problem condition. The mapping is essentially a look-up table that re-turns a list of successful algorithms given a problem condition. This section buildsupon the description proposed in Chapter 4, and attempts to find the problem con-ditions surrounding each algorithm empirically. To achieve this goal, we need toevaluate the performance of algorithms under varied problem conditions.Two challenges need to be addressed, the first of which is to evaluate the perfor-mance of algorithms under a variety of problem conditions. This requires a datasetcontaining objects captured for between-category algorithms under a variety ofproblem conditions. To the best of our knowledge, there is no such benchmarkavailable since most 3D benchmarks focus on one specific class of algorithms. Forexample, the Middlebury dataset targets MVS algorithms [59], and the ‘DiLiGenT’dataset targets Photometric Stereo algorithms [61]. This makes such benchmarksonly suitable for evaluation of within-category algorithms. Besides, there are fewdatasets with objects that cover a range of problem conditions. The reason for thelack of such a dataset is that it is practically impossible to create multiple versions45of the same object with one property, e.g., surface texture, material, and so on,varied while everything else is kept constant. In response to this challenge, wecreated a synthetic dataset using the physical-based rendering engine of Blenderto evaluate the 3D reconstruction algorithms. Our dataset includes a collection ofimages of a scene under different materials and lighting conditions. The cameraand projector’s intrinsic and extrinsic parameters are computed directly from theconfigurations of the synthetic setup, and the ground truth, including the 3D modelpoint cloud and normal map, are generated directly from Blender.The second challenge is: the problem space is an Ndimensional space, con-sisting of N properties, each with L levels, thus the total number of combinationsis LN , which increases exponentially as N increases. This is too vast of a problemspace to cope with. To make the problem space more feasible to handle, we adoptthe three-point scale, where L = 3 (Low = 0.2, Medium = 0.5, and High = 0.8).Further, We conductN2L⇥ L factorial studies to determine the properties thathave a significant effect on performance of the algorithms. We can then reduce thespace dimension by considering only these effective properties. Effective proper-ties of an algorithm are those that have a main effect, interaction effect, or both onthe performance of this specific algorithm. We illustrate the performance of algo-rithms under varied conditions as heatmaps so that the main effects and interactionsbetween properties can be easily detected.The chapter is organized as follows: section 5.1 discusses the selected algo-rithms, and the process of creating a synthetic dataset for the evaluation of selectedalgorithms. Section 5.2 discusses the procedure of identifying properties with asignificant effect on algorithm performance so that the problem space can becomemore manageable. Section 5.3 presents the lookup tables, represented as heatmaps,from problem conditions to performance of algorithms, which is served as mappingfrom problem condition to algorithms.5.1 Construction of DatasetThis section discusses the construction of a synthetic dataset that is used to evaluatebetween-class algorithms under varied problem conditions. First we choose threerepresentative algorithms from three distinct algorithm classes, as well as two base-46line methods. Then the synthetic capturing systems are presented with syntheticimages generated as examples. Lastly, quantitative measures used to evaluate theperformance of algorithms are proposed.5.1.1 Selected and Baseline MethodsWe have selected three representative algorithms, each from one of three majorclasses of algorithms presented in Chapter 3: the PMVS proposed in [23], theexample-based Photometric Stereo proposed in [29], and the Gray-coded Struc-tured Light technique. See Table 5.1 for a summary of the selected algorithms.The current implementation of SL projects both column and row patterns, anddepth values are computed using both patterns individually. A depth consistencystep is performed to reject erroneous triangulation results.Technique SummaryPMVS Patch-based, seed points propagation MVS.EPS Example-based Photometric Stereo.GSL Gray coded Structured Light technique.VH Volumetric Visual Hull.LLS-PS Linear least squares Photometric Stereo.Table 5.1: Summary of the selected and baseline algorithms implemented forthe interface.We use two baseline methods to compare our results: Visual Hull and a sim-ple linear least squares based Photometric Stereo (LLS-PS). We use Visual Hullsince it works relatively well as long as the silhouette of the object can be reli-ably extracted, thus being insensitive to material properties. In addition, the truescene is always enclosed by the reconstruction result, so the outcome is alwayspredictable. We use LLS-PS to evaluate Photometric Stereo algorithms. However,there is currently no such PS algorithm that works reasonably well under a varietyof conditions. Thus, we run this baseline algorithm under the optimal condition toensure a best possible result.475.1.2 Synthetic SetupsWe use the physical-based rendering engine of Blender, Cycles, to generate thesynthetic datasets. For each technique, the configuration of the camera remainsfixed. The image resolution is 1280⇥720, with a focal length of 35mm or 1400pix.The synthetic setups are shown in Table 5.2, and some example synthetic imagesgenerated using the setups are shown in Figure 5.1.Method Hardware ArrangementMVS 41 camera 5 rings, each has 1, 8, 8, 12, 12 cameraPS 1 camera+25 light sources 4 rings, each has 1, 8, 8, 8, 8 light sourcesSL 1 camera&projector baseline angle: 10Table 5.2: Summary of synthetic capturing systems for three classes of algo-rithms.The effects of properties simulated by the rendering engine are shown in Fig-ure 5.1.TextureAlbedoSpecularRoughnessFigure 5.1: Example synthetic images. The value of each property rangesfrom 0 to 1.5.1.3 Quantitative MeasuresWe use the metrics proposed in [59] to evaluate MVS and SL algorithms. Morespecifically, we compute the accuracy and completeness of the reconstruction. Foraccuracy, the distance between the points in the reconstruction R and the nearestpoints on ground truthG is computed, and the distance d such that X% of the pointson R are within distance d of G is considered as accuracy. A reasonable d value48is between [3,5]mm, and X is set as 95. The lower the accuracy value, the betterthe reconstruction result. For completeness, we compute the distance from G toR. Intuitively, points on G are not “covered” if no suitable nearest points on R arefound. A more practical approach computes the fraction of points of G that arewithin an allowable distance d of R. Note that as the accuracy improves, the “accu-racy value” goes down, whereas as the completeness improves, the “completenessvalue” goes up.For photometric stereo, depth information is lost since only one viewpoint isused. Thus, the previous metrics are not applicable. Here we employ another eval-uation criteria that is widely adopted, which is based on the statistics of angularerror. For each pixel, the angular error is calculated as the angle between the es-timated and ground truth normal, i.e., arccos(nTg n), where ng and n are the groundtruth and estimated normals respectively. In addition to the mean angular error, wealso calculate the standard deviation, minimum, maximum, median, first quartile,and third quartile of angular errors for each estimated normal map.5.2 Reduction of Problem Space DimensionThe greatest challenge in constructing a mapping from problem space to algorithmsis the large variations in shapes and material properties, which results in a problemspace that is too large to cope with. Suppose there are N properties, each with Ldiscrete levels, then there are in total LN different problem conditions. Thus, thefirst step, discussed in Section 5.2.1, 5.2.2, 5.2.3, is to reduce the dimensions ofproblem space by discovering the properties that have effects on performance ofalgorithms.This study is a 3⇥ 3 factorial design with two properties (factors), each withthree levels, i.e., low (0.2), medium (0.5), and high (0.8), see Table 5.3. We are in-terested in one-way interaction (main effect) and two-way interaction of the prop-erties (factors). A main effect is the effect of one of the independent variables onthe dependent variable, ignoring the effects of all other independent variables. Aninteraction occurs when the effect of one independent variable on the dependentvariable changes depending on the level of another independent variable. In ourcurrent design, this is equivalent to asking whether the effect of property i changes49depending on property j, where i 6= j, i, j 2 {1,2,3,4}. The easiest way to com-municate an interaction is to discuss it in terms of the simple main effects, whichis defined as the main effect of one independent variable (e.g., property i) at eachlevel of another independent variable (e.g., property j). We observe an interac-tion between two factors whenever the simple effects of one change as the levelsof the other factor are changed. Assume that the error variance is small, so thatdifferences in performance that are apparent on the graph are also statistically sig-nificant. Thus, we are able to interpret the main effects and interactions throughgraphs. There is a main effect of a property if there is colour variation along thecorresponding axis. If colour changes monotonically along the diagonal, then thereis no interaction between the two properties, otherwise, there is an interaction ef-fect.Property i0.2 0.5 0.8Property j0.20.50.8Table 5.3: This is a 3⇥3 factorial design. Every two properties are selected totest the main effects and interaction, there are in totalN2combinations.5.2.1 Reduction of Problem Space Dimension: PMVSWe study the main effects and interaction effects of properties on the performanceof PMVS in terms of accuracy and completeness. The performance of the algo-rithm is visualized in Figure 5.2.(a) Texture and Albedo The main effects of texture and albedo on accuracy,and the main effect of albedo on completeness are not significant whereas the maineffect of texture on completeness is significant such that surfaces with higher tex-ture leads to results of higher completeness than less textured surfaces. There isnot significant interaction effect between texture and albedo on either accuracy orcompleteness.(b) Texture and Specularity The main effects of texture and specularity onboth accuracy and completeness are significant such that surfaces with lower tex-50accuracytexture0.2 0.5 0.8albedo0.80.50.2completenesstexture0.2 0.5 0.8albedo0.80.50.20.20.40.6accuracytexture0.2 0.5 0.8specularity0.80.50.2completenesstexture0.2 0.5 0.8specularity0.80.50.20.20.40.6(a). Texture and albedo (b). Texture and specularityaccuracytexture0.2 0.5 0.8roughness0.80.50.2completenesstexture0.2 0.5 0.8roughness0.80.50.20.20.40.6accuracyalbedo0.2 0.5 0.8specularity0.80.50.2completenessalbedo0.2 0.5 0.8specularity0.80.50.20.20.40.6(c). Texture and roughness (d). Albedo and specularityaccuracyalbedo0.2 0.5 0.8roughness0.80.50.2completenessalbedo0.2 0.5 0.8roughness0.80.50.20.20.40.6accuracyspecularity0.2 0.5 0.8roughness0.80.50.2completenessspecularity0.2 0.5 0.8roughness0.80.50.20.20.40.6(e). Albedo and roughness (f). Specularity and roughnessFigure 5.2: Performance of PMVS under six conditions. For instance, (a)shows the performance under the condition of changing texture andalbedo levels, while the others are fixed. The main effect of a prop-erty is illustrated by the colour variation along the corresponding axis.The monotonic colour variation diagonally indicates no interaction be-tween the two properties, otherwise, there is an interaction effect. Thusin (a), we observe a main effect of texture on completeness, no othermain effects and interaction effects are present.51ture or surfaces with higher specularity leads to higher accuracy value. However,we argue that this main effect are caused by the interaction effect between textureand specularity. As we can see, the effect of specularity on both accuracy andcompleteness are less noticeable for lowly and highly textured surfaces whereasthe effect of specularity is most substantial for surfaces with medium texture. Thiseffect can be explained as follows: the specular lobe can only be observed by cam-eras positioned and oriented towards the specular lobe, such as camera V2 shownin Figure 5.3 (a) and (c). Cameras positioned otherwise would observe the truesurface, such as camera V1 shown in Figure 5.3 (a) and (b). The algorithm wouldthen exploit the texture information provided by views like V1, and thus would beable to reconstruct a specular surface.V1V2~nS(a) Image formation (b) V1 (c) V2Figure 5.3: (a) shows the reflection of light off a specular surface. V1 receivedthe diffuse component while V2 receives the specular component. (b),(c) shows the images observed from these two views. The specular area(red circle) observed in V2 is visible in V1.(c) Texture and Roughness The main effects of texture and roughness, andthat of roughness on completeness are not significant whereas the main effect oftexture on completeness is significant such that surfaces with higher textures leadsto results with higher completeness. There is no significant interaction effect be-tween texture and roughness on either accuracy or completeness.(d) Albedo and Specularity The main effects of albedo and specularity onaccuracy is not significant whereas the main effects of albedo and specularity oncompleteness are significant such that surfaces with higher albedo or lower spec-ularity leads to higher completeness. There is no significant interaction effect be-tween albedo and specularity in terms of accuracy and completeness as the valuevaries monotonically along the diagonal.(e) Albedo and Roughness The main effects of albedo and roughness on ac-52curacy and completeness are not significant. There is no interaction effect betweenalbedo and roughness on either accuracy or completeness.(f) Specularity and Roughness The main effects of specularity and roughnesson accuracy and completeness are not significant. There is no interaction effectbetween specularity and roughness on either accuracy or completeness.Summary: PMVSFor accuracy, specularity has a main effect such that higher specularity leads tohigher accuracy value. There are no main effects for other properties. There is asignificant effect between texture and specularity such that the effect of specularityis most substantial on surfaces with medium texture. There is also a significantinteraction effect between albedo and specularity such that specularity is most sub-stantial on surfaces with low albedo value.For completeness, there is a significant main effect from texture, and no othermain effects observed. There is a significant interaction effects between textureand specularity on completeness such that the negative effect of specularity is mostsignificant on surfaces with medium level texture. There are no other significantinteraction effects observed.5.2.2 Reduction of Problem Space Dimension: EPSWe study the main effects and interaction effects of properties on the performanceof EPS in terms of mean and standard deviation (SD) of angular error. The perfor-mance of the algorithm is visualized in Figure 5.4.(a) Texture and Albedo The main effects of texture on mean and SD of angularerror, and the main effect of albedo on SD of angular error are not significant. Themain effect of albedo on mean value of angular error is significant such that theangular error decreases as the albedo increases. There is no significant interactioneffect between texture and albedo on mean and SD of angular error.(b) Texture and Specularity The main effects of texture on mean and SD ofangular error are not significant whereas the main effects of specularity are signifi-cant such that both values increases as specularity increases. There is no significantinteraction effect between texture and specularity in terms of mean and SD of an-53mean of angular errortexture0.2 0.5 0.8albedo0.80.50.2std of angular errortexture0.2 0.5 0.8albedo0.80.50.2 2345mean of angular errortexture0.2 0.5 0.8specularity0.80.50.2std of angular errortexture0.2 0.5 0.8specularity0.80.50.2 11.5 2(a). Texture and albedo (b). Texture and specularitymean of angular errortexture0.2 0.5 0.8roughness0.80.50.2std of angular errortexture0.2 0.5 0.8roughness0.80.50.2 123mean of angular erroralbedo0.2 0.5 0.8specularity0.80.50.2std of angular erroralbedo0.2 0.5 0.8specularity0.80.50.21234(c). Texture and roughness (d). Albedo and specularitymean of angular erroralbedo0.2 0.5 0.8roughness0.80.50.2std of angular erroralbedo0.2 0.5 0.8roughness0.80.50.212345mean of angular errorspecularity0.2 0.5 0.8roughness0.80.50.2std of angular errorspecularity0.2 0.5 0.8roughness0.80.50.2 11.5 22.5(e). Albedo and roughness (f). Specularity and roughnessFigure 5.4: Performance of Example-based PS under six problem conditions.For instance, (a) shows the performance under the condition of changingtexture and albedo levels, while the others are fixed. The main effect ofa property is illustrated by the colour variation along the correspondingaxis. The monotonic colour variation diagonally indicates no interac-tion between the two properties, otherwise, there is an interaction effect.Thus in (a), we observe a main effect of albedo on mean angular error,no other main effects and interaction effects are present.54gular error.(c) Texture and Roughness The main effects of texture on mean and SD ofangular error, and the main effect of roughness on SD of angular error are notsignificant whereas the main effect of roughness on mean of angular error is sig-nificant such that the mean angular error decreases as roughness increases. There isno interaction between texture and roughness in terms of mean and SD of angularerror.(d) Albedo and Specularity The main effects of albedo and specularity onmean and SD of angular error are significant such that the mean and angular errordecrease as albedo increases or specularity decreases. There is also a significantinteraction effect between albedo and specularity such that the effect of specularityon mean and SD of angular error is most significant when the surface albedo is low.(e) Albedo and Roughness The main effects of albedo and roughness on meanof angular error are significant such that the mean angular error decreases as thealbedo or roughness increases whereas the main effects on SD of angular errorare not significant. There is no significant interaction effect between albedo androughness.(f) Specularity and Roughness Themain effect of specularity on mean and SDof angular error is significant such that the values vary as the specularity changes.There is an interaction effect between specularity and roughness. More specifically,the mean and SD of angular error do not decrease monotonically as the roughnessincreases. More specifically, the angular error becomes worse for surfaces withmedium roughness, which is counter-intuitive at first sight. However, we arguethat this is because the roughness is not strong enough to counteract the specularhighlights, causing a smoothed and blurred specular region with larger area, thusleading to a poorer normal estimation. See Figure 5.5 for visual illustrations.Summary: EPSAlbedo, specularity, and roughness all have a main effect on angular error. Morespecifically, higher albedo and roughness lead to lower mean angular error, higherspecularity leads to higher mean and SD angular error. There is a significant inter-action effect between albedo and specularity such that the effect of specularity is55Image Normal map Height map Angular error0802angular error012345678910(a). roughness: 0.20805angular error012345678910(b). roughness: 0.50808angular error012345678910(c). roughness: 0.8Figure 5.5: An example illustrates the effect of roughness on PS. Albedo isset as 0.8, and specular is set as 0.8. The first column shows the inputimages, the second column shows the estimated normal map, the thirdcolumn shows the integrated surface, and last column shows the angularerror. We can see from the qualitative results (normal map and heightmap), and quantitative result (angular error) that a medium level rough-ness would lead to a worse normal estimation since it blurs the specularlobe.56most significant on low albedo surfaces. There is also an interaction between spec-ularity and roughness such that surfaces with medium roughness lead to higherangular error compared to surface with low or high roughness.5.2.3 Reduction of Problem Space Dimension: GSLWe interpret the main effects and interactions of properties on the performance ofGSL in terms of accuracy and completeness. The performance of the algorithm isvisualized in Figure 5.6.(a) Texture and Albedo The main effect of texture on accuracy and complete-ness and the main effect of albedo on accuracy are not significant. The main effectof albedo on completeness is significant such that the results increases as the albedoincreases. There is no significant interaction effect between texture and albedo interms of accuracy and completeness.(b) Texture and Specularity The main effect of texture on accuracy and com-pleteness and the main effect of specularity on accuracy are not significant. Themain effect of specularity on completeness is significant such that the results de-creases as the specularity increases. There is no significant interaction effect be-tween texture and specularity in terms of accuracy and completeness.(c) Texture and Roughness The main effect of texture on accuracy and com-pleteness and the main effect of roughness on accuracy is not significant. The maineffect of roughness on completeness is significant such that the results increases asthe roughness increases. There is no significant interaction effect between textureand roughness in terms of accuracy and completeness.(d) Albedo and Specularity The main effects of albedo and specularity on ac-curacy are not significant whereas the main effects on completeness are significantsuch that the results increase as albedo increase or specularity decreases. There isno significant interaction effect between albedo and specularity in terms of accu-racy and completeness.(e) Albedo and Roughness The main effects of albedo and roughness on ac-curacy, and the main effect of roughness on completeness are not significant. Themain effect of albedo on completeness is significant such that the results increasesas the albedo increases. There is no significant interaction effect between albedo57accuracytexture0.2 0.5 0.8albedo0.80.50.2completenesstexture0.2 0.5 0.8albedo0.80.50.2 0.10.20.30.40.5accuracytexture0.2 0.5 0.8specularity0.80.50.2completenesstexture0.2 0.5 0.8specularity0.80.50.2 0.10.20.3(a). Texture and albedo (b). Texture and specularityaccuracytexture0.2 0.5 0.8roughness0.80.50.2completenesstexture0.2 0.5 0.8roughness0.80.50.2 0.10.20.30.4accuracyalbedo0.2 0.5 0.8specularity0.80.50.2completenessalbedo0.2 0.5 0.8specularity0.80.50.2 0.10.20.30.4(c). Texture and roughness (d). Albedo and specularityaccuracyalbedo0.2 0.5 0.8roughness0.80.50.2completenessalbedo0.2 0.5 0.8roughness0.80.50.2 0.10.20.30.4accuracyspecularity0.2 0.5 0.8roughness0.80.50.2completenessspecularity0.2 0.5 0.8roughness0.80.50.2 0.10.20.30.40.5(e). Albedo and roughness (f). Specularity and roughnessFigure 5.6: Performance of Gray-coded SL under six problem conditions.For instance, (a) shows the performance under the condition of chang-ing texture and albedo levels, while the others are fixed. The main effectof a property is illustrated by the colour variation along the correspond-ing axis. The monotonic colour variation diagonally indicates no in-teraction between the two properties, otherwise, there is an interactioneffect. Thus in (a), we observe a main effect of albedo on completeness,no other main effects and interaction effects are present.58and roughness in terms of accuracy and completeness.(f) Specular and Roughness The main effects of specularity and roughnesson accuracy are not significant whereas the main effect on completeness are sig-nificant such that the results increases as the roughness increases or specularitydecreases. There is no significant interaction effect between specularity and rough-ness in terms of accuracy and completeness.Summary: GSLThere is no main effects or interaction effects observed on the algorithm in terms ofaccuracy. Albedo, specularity, and roughness all have main effects on the algorithmin terms of completeness. There are no interaction effects observed.5.3 Construction of MappingIn the previous section, we have examined the performance of algorithms with twochanging properties at a time. This is equivalent to examine the performance of al-gorithms on a 2dimensional plane embedded in a Ndimensional space. It givesus insights into which properties have significant impacts on the performance ofalgorithms. In this section, we examine the problem conditions consisting of onlyproperties that have significant main or interaction effects on the algorithms. Thisis a much more feasible problem since only a subset of all N properties have a sig-nificant effect on a specific algorithm. The result is a one-to-many mapping fromproblem condition to algorithms. To determine if an algorithm achieves a suc-cessful reconstruction, we compare the quantitative results to those of the baselinemethods. More specifically, an algorithm is considered as a successful candidateif it achieves better reconstruction result than that of baseline methods in terms ofquantitative measures, such as accuracy, completeness, and angular error. To illus-trate the results, all quantitative measures of baseline methods are subtracted fromthose of selected algorithms, the results of which are visualized using heatmaps,as shown in Figure 5.7. To keep the results consistent across all quantitative mea-sures, i.e., positive value represents better result than the baseline methods, weinverse the results of accuracy and angular error. The mapping is essentially alook-up table that returns a list of successful algorithms given a problem condition,59and these heatmap graphs can be viewed as mapping from problem condition toalgorithms: given a problem condition, the users can specify a threshold value e ,which is the minimum tolerance of the quantitative measures between the selectedalgorithms and those of baseline methods. Any algorithm which is at least e abovethe baseline methods are considered algorithms that can work reliably under thespecified problem condition. By default, e = 0. Thus, positive value indicates bet-ter reconstruction, which indicates that the corresponding algorithm is a successfulcandidate. The mapping of the three selected algorithms are shown in Table 5.4.5.4 Discussion and ConclusionsIt is a non-trivial task to find a mapping from problem conditions to algorithmsbased on the description. By no means is the aforementioned approach the onlyway, or a perfect way, since it potentially has the problem of suffering from prop-erty scaling issue. Nonetheless, the factorial studies remains a valuable approachto obtain insights of the effective properties of a specific algorithm. The develop-ment of the mapping is an on-going process. For instance, we can include morequantitative metrics such as colour accuracy, ‘ghost’ reconstruction, and so on. Inorder to make the mapping applicable to objects with more complex shapes, weneed to consider more sophisticated geometric properties besides roughness, suchas concavity, depth-discontinuity, occlusion, etc. Furthermore, the incorporation ofmore algorithms is another way to ensure that the problem space is well covered.60accuracytexture: 0.20specularity0.2 0.5 0.8albedo0.80.50.2completenesstexture: 0.20specularity0.2 0.5 0.8albedo0.80.50.2 -0.6-0.5-0.4-0.3accuracytexture: 0.50specularity0.2 0.5 0.8albedo0.80.50.2completenesstexture: 0.50specularity0.2 0.5 0.8albedo0.80.50.2 -0.5 0 0.5accuracytexture: 0.80specularity0.2 0.5 0.8albedo0.80.50.2completenesstexture: 0.80specularity0.2 0.5 0.8albedo0.80.50.2-0.5 0 0.5(a). Look-up table of PMVS.mean of angular erroralbedo: 0.20roughness0.2 0.5 0.8specularity0.80.50.2std of angular erroralbedo: 0.20roughness0.2 0.5 0.8specularity0.80.50.2-4-2 0mean of angular erroralbedo: 0.50roughness0.2 0.5 0.8specularity0.80.50.2std of angular erroralbedo: 0.50roughness0.2 0.5 0.8specularity0.80.50.2-1 0 1mean of angular erroralbedo: 0.80roughness0.2 0.5 0.8specularity0.80.50.2std of angular erroralbedo: 0.80roughness0.2 0.5 0.8specularity0.80.50.2-1 0 1(b). Look-up table of EPS.accuracyalbedo: 0.20roughness0.2 0.5 0.8specularity0.80.50.2completenessalbedo: 0.20roughness0.2 0.5 0.8specularity0.80.50.2-0.2-0.1 0accuracyalbedo: 0.50roughness0.2 0.5 0.8specularity0.80.50.2completenessalbedo: 0.50roughness0.2 0.5 0.8specularity0.80.50.2 -0.15 -0.1-0.05 0accuracyalbedo: 0.80roughness0.2 0.5 0.8specularity0.80.50.2completenessalbedo: 0.80roughness0.2 0.5 0.8specularity0.80.50.2 00.05 0.1(c). Look-up table of GSL.Figure 5.7: Performance of PMVS, EPS, and GSL under all problem conditions. These are look-up tables that provideinformation regarding the performance of selected algorithms compared to baseline methods. Once a thresholdvalue e is specified, these look-up tables can be used as mapping from problem condition to algorithms, i.e.,return successful algorithms given a problem condition.61Texture Albedo Specular Roughness0.5 0.5 0.2 -0.5 0.8 0.2 -0.5 0.8 0.5 -0.8 0.2 0.2 -0.8 0.5 0.2 -0.8 0.8 0.2 -0.8 0.5 0.5 -0.8 0.8 0.5 -0.8 0.5 0.8 -0.8 0.8 0.8 -Texture Albedo Specular Roughness- 0.2 0.2 0.8- 0.2 0.5 0.8- 0.2 0.8 0.8- 0.5 0.2 0.2- 0.5 0.2 0.5- 0.5 0.2 0.8- 0.5 0.5 0.2- 0.5 0.5 0.8- 0.5 0.8 0.8- 0.8 0.2 0.2- 0.8 0.2 0.8- 0.8 0.5 0.2- 0.8 0.5 0.8- 0.8 0.8 0.2- 0.8 0.8 0.8Texture Albedo Specular Roughness- 0.8 0.2 0.2- 0.8 0.5 0.2- 0.8 0.2 0.8- 0.8 0.5 0.8- 0.8 0.8 0.8Table 5.4: The problem conditions under which PMVS, EPS, and GSL work successfully in terms of the two metricsaccuracy and completeness.62Chapter 6An Interpretation of 3DReconstructionNow that we have proposed a well-defined problem space of 3D reconstructionproblem, as well as a mapping from problem space to algorithms, the interpreta-tion from the problem-centred description to reliable algorithms must be shown.There are many possible ways to interpret the description, our proof of conceptinterpreter is based on the direct evaluation of performance of each 3D reconstruc-tion algorithm under different problem conditions presented in Chapter 5. Thismapping from problem space to algorithms and additional constraints allow a suc-cessful algorithm to be definitively selected.In this chapter, we first propose a proof of concept interpreter, which makesthe three-layer description-based interface complete. By no means is the proposedinterpreter the best possible interpreter, but it suffices to demonstrate the usage ofthe interface. The interpreter is responsible for receiving a description from theuser, and selects an algorithm from a suite of algorithms. We are interested inevaluating the performance of the interpreter under the four problem conditions,which are summarized in Chapter 3. More specifically, we are interested in findingout whether the interpreter is able to translate a user-specified description into analgorithm so that a successful reconstruction result can be achieved. Further, we arealso interested in the performance of the interpreter with less accurate, or inaccuratedescriptions as inputs. It gives us insights into the sensitivity of the interpreter, and63demonstrates cases where it is crucial to provide accurate descriptions.Three algorithms and two baseline methods that are introduced in Chapter 5have been implemented and integrated into the interpreter, providing basic but wellrounded reconstruction capabilities. Although more algorithms can potentially beadded, these methods are sufficient to validate the interface’s ability to translate adescription into a reconstruction solution. The integration of new algorithms thatare more appropriate under particular situations requires only that they be evalu-ated by the same process discussed in Chapter 5, as shown in Figure 3.2. Thisallows researchers to contribute novel algorithms to the interpretation of the inter-face easily.We created both synthetic and real-world datasets to evaluate the performanceof the interpreter. The datasets is made publicly available to encourage the testingof additional reconstruction algorithms. Further, the datasets can be extended toinclude new visual and geometric properties, thus providing a wider coverage ofthe problem space.This chapter is organized as follows: Section 6.1 provides a detailed roadmapof our evaluation that is centred around three key evaluation questions. Section 6.2proposes a proof of concept interpreter. Section 6.3 gives an overview of the cre-ation of dataset, including hardware calibration, and image capturing, and exampleimages of synthetic and real-world dataset. Section 6.4 addresses the evaluationquestion by demonstrating the performance of interpreter under the four problemconditions using synthetic and real-world datasets, where a satisfactory reconstruc-tion result is returned given the correct description of the object.6.1 Evaluation MethodologyThis thesis proposed a description-based interface to 3D reconstruction problemthat hides algorithm details. This interface consists of three separate layers, a de-scription, an interpreter, and an algorithm layer. The interpreter is responsible forselecting an appropriate algorithm for reconstruction based on user-specified de-scription. The goal of our evaluation is to validate if the proposed proof of conceptinterpreter can translate a user-specified description into an algorithm that gives asuccessful reconstruction result. The evaluation is centred around the following64three evaluation questions:1. Can the proof of concept interpreter return one of the best-suited algorithmsthat achieves a successful reconstruction given the correct description?2. Will a less accurate description give a poorer reconstruction result than anaccurate description?3. Will an inaccurate description give a poor reconstruction result?The first evaluation question address the issue of robustness, i.e., can the inter-preter deal with objects with varied material and geometric variations. The othertwo questions address the issue of sensitivity, i.e., can the interpreter still performreliably when the description is less accurate. The criteria of determining whethera reconstruction is successful are detailed below, along with the evaluation steps.6.1.1 CriteriaHow to determine if the reconstruction achieved by the algorithm selected by theinterpreter is successful? In this thesis, visual inspection is utilized to determinethe quality of reconstruction. More specifically, the result returned by the interfaceis compared to that of the baseline algorithms to determine if the quality is accept-able. As previously mentioned, the baseline method is chosen so that it always canprovide a decent reconstruction under most circumstances.The reason that qualitative comparison is sufficient is that we are less interestedin developing novel algorithms or improving algorithm performance. If that is thegoal, we do need a quantitative comparison among algorithms. However, the goalis to validate if the proposed interface can translate a description into an acceptableresult, thus we are more interested in showing that this user-specified descriptioncan indeed be interpreted and invoke a good-enough algorithm. Instead of com-puting quantitative values, such as accuracy, completeness, and angular error, weexamine visual appearances that represent these quantitative measures: the rough-ness of reconstructed surface indicates accuracy, the lack of surface holes indicatescompleteness, and surface spike indicates large angular error, see Figure 6.1.65(a). Rough surface (b). Surface hole (c). Surface spikeFigure 6.1: Visual phenomena that indicate poor quality of reconstruction re-sults. (a) has rough surface which indicates poor accuracy, (b) has in-complete holes which indicates poor completeness, and (c) has spikeswhich indicates poor normal estimation.6.2 InterpreterOur interface consists of three separate layers. The upper layer is the descriptionof the problem condition. The middle layer is the interpreter which receives a de-scription from the user and returns an acceptable result. The bottom layer is theactual implementation of the algorithms. The interpreter is responsible for choos-ing an appropriate 3D reconstruction algorithm based on the description of theproblem domain and additional requirements. Thus, it requires an understandingof algorithm performance across difference ranges of problem space. However,if the interpreter relies solely on the mapping, it is possible that a user-specifieddescription leads to more than one successful algorithm. We propose to use twoadditional constraints so that only one appropriate algorithm is selected.The first constraint is metric-first or shape-first. There are generally two typesof reconstruction results: euclidean/metric reconstruction, and shape reconstruc-tion. The former reconstructs a scene with metric information, but generally givesnoisier results. The latter can achieve quality that is only matched by laser scannersbut lack the depth information. The default setting of this constraint is metric-first.The second constraint is accuracy-first or completeness-first. Methods thatachieve high accuracy do not necessary achieve high completeness. In light ofthis, the user can choose an algorithm based on the priority level of accuracy andcompleteness. If multiple algorithm with comparable accuracy or completenessexists, we use a simple ranking system: if one algorithm has the same accuracybut is more complete, or more complete and equally accurate, it is chosen over the66others. The default is the accuracy-first constraint.Thus, the proposed proof of concept interpreter that consists of two compo-nents, mapping and constraints, as shown in Figure 1. Note that there are manyways of using this mapping information to create an interpreter, and by no meansis the proposed one the optimal interpreter.Algorithm 1 Proof-of-concept interpreterRequire: Description descEnsure: A successful algorithm bestAlgo returned{algos} MAPPING(desc)while {algos} 6= /0 doalgo Top({algos}), Pop({algos})if Euclidean-first thenif Accuracy-first thenscore QUANTSCORE(desc, algo, Accuracy)else if Completeness-first thenscore QUANTSCORE(desc, algo, Completeness)end ifelse if Shape-first thenscore QUANTSCORE(desc, algo, AngularError)end ifif score< bestScore thenbestScore score,bestAlgo algoend ifend whilefunction MAPPING(description)return A list of successful algorithms {algorithms}.end functionfunction QUANTSCORE(description, algorithm, type)if type== Accuracy thenreturn Accuracy score.else if type== Completeness thenreturn Completeness score.end ifreturn Angular error. . Current support three quantitative measures.end function676.3 Overview of DatasetsIn this section, we introduce a synthetic and a real-world dataset to evaluate theperformance of the interpreter under varied problem conditions. To the best ofour knowledge, there is few publicly available dataset that capturing images foralgorithms across different categories. Most existing datasets target a specific classof algorithms, such as the famous Middlebury dataset [59], DiLiGenT [61], DTURobot Image Data Sets [33], and so on.The synthetic dataset contains four objects, as shown in Figure 6.2, and thereal-world dataset contains nine objects, four of which are shown in Figure 6.2.Please refer to Figure A.6, A.7 for the complete dataset. It covers materials that arerepresentative of the four problem conditions proposed in Chapter 3: textureless-diffuse-bright (bust, statue), textureless-mixed-bright (vase0, cup), textured-diffuse-bright (barrel, pot), textured-mixed-bright (vase1, vase). In terms of surface shapes,we have focused mainly on objects with simple geometric properties, namely,smoothly curved surfaces with shallow concavities, see descriptions and exampleimages in Figure 6.2.(a). bust (b). vase0 (c). barrel (d). vase1(e). statue (f). cup (g). pot (h). vaseAppearance • textureless • textureless • textured • textured• diffuse • mixed • diffuse • mixed• bright • bright • bright • brightFigure 6.2: Representative objects of the four problem condition discussed inChapter 3. (a)-(d): Synthetic objects, and (e)-(h) real-world objects.686.3.1 CalibrationFor MVS, a calibration pattern proposed in [42] is imaged under the object, whichis used for calibrating the camera position and orientation by Structure from Mo-tion software, such as VisualSfM [70]. The focal length is known a priori andremains fixed during the image capturing process. Thus the extrinsic parameters ofthe camera can be retrieved up to a similarity transformation, and the reconstruc-tion result is a metric/euclidean reconstruction.For most PS algorithms, i.e., calibrated PS algorithms, it is necessary to esti-mate the light direction and intensity. However, the selected PS algorithm can dealwith unknown light sources and spatially-varying BRDFs. Thus, light calibrationis not a required step. Though it is preferable to correct the non-linear responseof camera, Hertzmann and Seitz discovered that it was unnecessary for EPS [29].Thus, we did not perform the radiometric calibration step. No geometric calibra-tion of the camera is needed.For SL, a open source camera-projector calibration software developed byMorenoand Taubin is used for calibration [47]. This technique works by projecting tempo-ral patterns onto the calibration pattern, and uses local homography to individuallytranslate each checker board corner from the camera plane to the projector plane.This technique can estimate both the intrinsic parameter and camera and projector,and the relative position and orientation.6.3.2 Image CapturingThe creation of the synthetic dataset is mostly the same as that discussed in Chap-ter 5, thus is omitted in this section. The images of real-world dataset are cap-tured using a Nikon D70S camera with a 1870mm lens. The image resolution is3004⇥2000 pix. The setup of the capturing systems are shown in Figure 6.3.For MVS, we capture the dataset by positioning the camera in three differentheights. The objects are about 30 50cm away from the camera, and stays fixedon a turntable. We have followed the following two steps to acquire data: 1) putthe camera at a different height, adjust the orientation so that the object is at thecentre of the frame; 2) take pictures while rotating the table. The table rotatesapproximately 30 every time. We rotate it 12 times and in total, we can obtain 3669images per object.For PS, a 70200mm lens, a hand held lamp, and two reference objects (dif-fuse and glossy) are used. The objects are positioned about 3m from the cam-era to approximate orthographic projection. To avoid inter-reflection, all data arecaptured in a dark room with everything covered by black cloth except the targetobject. We use a hand-held lamp as the light source and choose close to frontalviewpoints to avoid severe self-shadowing effect. We take 20 images per objectand select 15 plus images depending on the severity of the self-shadow effect.For SL, we use a Sanyo Pro xtraX Multiverse projector with a resolution of1024⇥768. The baseline angle of the camera projector pair is approximately 10.To alleviate the effect of ambient light, all images are captured with room lightsoff. To counteract the effect of inter-reflection, additional images are captured byprojecting an all-white and all-black patterns.(a). MVS, VH (b) PS (c) SLFigure 6.3: Image capturing systems for MVS/VH, PS, and SL.6.3.3 Synthetic DatasetEach object in the synthetic dataset represents one of the four problem conditionsdiscussed in Section 3. The description of problem condition of each syntheticobject is shown in Figure 6.4, as well as the appropriate algorithm selected by theinterpreter. Given a user specified description, the proof of concept interpreter willselect an algorithm, and any object that matches this description should be wellreconstructed by this selected algorithm.70Class # 1 2 3 4ObjectsCondition • textureless • textureless • textured • textured• diffuse • mixed • diffuse • mixed• bright • bright • bright • brightDescription Tex 0.2 0.2 0.8 0.8Alb 0.8 0.8 0.8 0.8Spec 0.2 0.8 0.2 0.8Rough 0.8 0.2 0.2 0.2Alg • EPS • EPS • PMVS • PMVS• GSL • EPS • EPS• GSLFigure 6.4: The representatives of the four classes of objects used for evalu-ation. Example images and problem conditions of synthetic objects arein the first two rows. The correct description of the problem condition ofcorresponding object is presented in third row. The last row shows thealgorithms returned by the mapping, from which the interpreter selectsone successful algorithm, which is coloured in red.6.3.4 Real-World DatasetWe select four real-world objects, each represents the one of the four problem con-ditions proposed in Chapter 3, as shown in Figure 6.5. The descriptions of problemconditions of real-world objects are shown in Figure 6.5, as well as the appropriatealgorithm selected by the interpreter. Given a user specified description, the proofof concept interpreter will select an algorithm, and any object that matches thisdescription should be well reconstructed by this selected algorithm.6.4 Evaluation of InterpreterThis section evaluates the proof of concept interpreter, which is three-fold: 1) givenan accurate description, can the interpreter select an algorithm that achieves a suc-cessful reconstruction result; 2) given an less accurate description, would the in-terpreter perform poorer than given an accurate description; 3) given an inaccurate71Class # 1 2 3 4ObjectsCondition • textureless • textureless • textured • textured• diffuse • mixed • diffuse • mixed• bright • bright • dark/bright • dark/brightDescription Tex 0.2 0.2 0.8 0.8Alb 0.8 0.8 0.8 (0.2) 0.8 (0.2)Spec 0.2 0.8 0.2 0.8Rough 0.8 0.2 0.2 0.2Alg • EPS • EPS • PMVS • PMVS• GSL • • GSLFigure 6.5: The representatives of the four classes of objects used for evalua-tion. Example images and problem conditions of real-world objects arein the first two rows. The correct description of the problem condition ofcorresponding object is presented in third row. The last row shows thealgorithms returned by the mapping, from which the interpreter selectsone successful algorithm, which is coloured in red.description, would the interpreter fail to achieve a successful reconstruction. Weprovide demonstrative results of the interpreter using the synthetic and real-worlddataset, as shown in Figure 6.2.6.4.1 Evaluation 1: Accurate Description, Successful ResultIn this section, we evaluate the performance of interpreter given a valid descrip-tion, i.e., whether the interpreter can return a successful reconstruction result givena valid description of problem condition. We provide the detailed process of re-constructing objects using the proposed interface with a example, and show thereconstruction results of the aforementioned datasets.Let’s revisit the scenario proposed in Chapter 1 using the test objects ‘bust’and ‘statue’ as an example. First, the user describes the problem condition basedon visual and geometric properties. In the example, the description of object ‘bust’72and ‘statue’ is: 0.2 (texture), 0.8 (albedo), 0.2 (specularity), and 0.8 (roughness).The synthetic objects with the same problem condition are shown in Figure 6.6.(a). property settings (b). synthetic sphere (c). synthetic objectFigure 6.6: (a) shows the problem condition of object ‘bust’ and ‘statue’specified in the 3D software. (b) and (c) shows the synthetic objectrendered using the problem condition in (a).Next, the interpreter, consisting of mapping and constraints, selects an appro-priate algorithm that could result in a successful reconstruction given the user-specified description. This is achieved by using the mapping discovered in Chap-ter 5, and additional constraints to select a reliable algorithm for reconstruction, asshown in Figure 6.7. In this example, GSL is selected as the interpreted algorithmfor reconstruction..:2 :2" -! " " :0 " Metric comparator 8.8Mapping to PMVSMapping to EPSMapping to GSL.::802002Figure 6.7: The interpreter selects an appropriate algorithm based on descrip-tion. Based on the discovered mapping from Chapter 5, EPS and GSLcan achieve a successful reconstruction, which is labelled in green whilePMVS would fail to give successful result, which is labelled in red.Lastly, the interpreted algorithm accepts images captured using the appropriatecapturing system as input, and returns a reconstruction result. We can see that73the reconstructed model of the interpreted algorithm has a much smoother surfacethan that of the baseline method, which indicates greater accuracy, as shown inFigure 6.8. Thus, we conclude that the interpreted algorithm reconstructs the objectsuccessfully.(a). GSL Result (b). BL Result (c). GSL Result (d). BL ResultFigure 6.8: (a), (c) show the reconstruction results of the interpreted algo-rithm, GSL. (b), (d) show the results of the baseline method.For more results, please refer to Figure 6.9. The figure is organized as follows:‘Desc’ row shows the description of the objects using the proposed model andrepresentations, the ‘Algo’ row shows the algorithm selected by the interpreter. The‘Results’ row shows the reconstruction results of the corresponding algorithm, andthe ‘Baseline’ row demonstrate the results of the baseline method. As discussedpreviously, we utilize visual phenomena, such as surface roughness, holes, spikesas indicators of the reconstruction quality. The baseline method achieves decentreconstruction results on all objects. Though due to the resolution of the voxelgrids, the surfaces are relatively rough. Further, the surface concavities fail to becarved out for concave objects (Bust). The performance of the interpreter on bothsynthetic and real-world datasets are consistent in that the algorithms selected bythe interpreter consistently outperform the baseline technique. All results havemuch smoother reconstructed surfaces, especially those of EPS and GSL, whichis an indicator that the accuracy of the results are much higher using the selectedalgorithm. Further, erroneous results, such as surface holes, spikes are absent fromthe reconstruction results, which indicates that the selected algorithms can achievecompleteness, and angular error no worse than the baseline.Note that the selected objects do no favour any algorithm. The significanceof this study is to demonstrate that it is achievable to translate a user-specified de-scription, which has nothing to do with algorithm-level knowledge, to an algorithm74selected from a suite of algorithms and achieve a successful reconstruction. Thereis no need to understand the underlying algorithms, or set the obscure algorithmspecific parameters.Synthetic and real-world objectsBust& Statue Vase0& Cup Barrel& Pot Vase1& VaseDesc02080208 02080802 08080202 08080802AlgoGSL EPS GSL PMVSResultsBaselineResultsBaselineFigure 6.9: Evaluation 1: correct description leads to successful reconstruc-tion result. The baseline results are provided so that we can determinethe quality of result returned by the algorithm chosen by the interpreter.6.4.2 Evaluation 2: Less Accurate Description, Less SuccessfulResultWe have demonstrated that the interpreter can achieve successful results given ac-curate description in last section. However, how would the interpreter performgiven a less accurate description? There are two guesses: 1) the interpreter wouldreturn a poor result given a less accurate description; 2) the interpreter can still75achieve successful results under some specific circumstances. In this section, weare interested to find out how sensitive the interpreter is given a less accurate de-scription, and cases where it would succeed or fail. We provide four less accuratedescriptions by iterating through four properties with one and only one property setincorrectly each time. For instance, propi is incorrectly estimated for desci whilethe rest of the properties are set correctly, i 2 {1,2,3,4}. Since a description couldbe mapped to multiple algorithms, a description that does not match the objectcould potentially result in a successful reconstruction result as well. We demon-strate the cases where less accurate descriptions lead to poor or successful results,and discuss the underlying rationale.Let’s revisit the scenario proposed in Chapter 1 using the test object ‘vase0’:the user (Daisy) first sets the parameters as follows: 0.2 (texture), 0.8 (albedo),0.8 (specularity), and 0.8 (roughness). However, the interpreted algorithm in thiscase, GSL, fails to reconstruct the glossy area on the surface, resulting in an in-complete result, as shown in 6.10 (a). The user then observes that the surface issmooth enough that the highlighted area is not blurry at all, thus decides to tweakroughness from 0.8 to 0.2. The interpreter selects EPS as the reconstruction algo-rithm, and then proceeds to achieve a smooth and complete reconstruction result,see Figure 6.10 (b). We can see from this example that less accurate descriptioncould lead to less successful reconstruction result.(a). 02080808 02080802Figure 6.10: (a) shows the reconstruction using the incorrect descriptionwhile (b) shows the reconstruction result using the correct description.The text labelled in red represents incorrectly estimated property.However, this might not always be the case. Let’s use object ‘bust’ as an ex-ample: given four less correct descriptions with exactly one property estimatedincorrectly, the algorithm selected by the interpreter can always achieve a success-76ful reconstruction result, see Figure 6.11. We conclude from this example thatit is achievable to obtain a successful reconstruction result given a less accuratedescription.(a). 08080208 (b). 02020208 (c). 02080808 (d). 02080202 (e). 02080208Figure 6.11: (a)-(d) show the reconstruction results given a less accurate de-scription while (e) shows the result given an accurate description. Thetext labelled in red represents the property estimated incorrectly.The question then becomes, under what conditions, the interpreter can achievea successful or poor reconstruction result given a less accurate description. Let’sdenote the algorithm(s) returned by the mapping given description Desci as {Algoi},and the algorithm selected by the interpreter as {Interpi}. For instance, the algo-rithms for object bust given description Desc1 is {Algo1} = {PMVS,EPS,GSL},and {Interp1} = {GSL}. Let’s also denote the algorithm(s) returned by the correctdescription as {Algo}. Thus in the case of object bust, {Algo}= {EPS,GSL}. Weobserve that a less accurate description can possibly lead to a successful recon-struction if and only if{Interpi}\{Algo} 6= /0The reason is that there are multiple algorithms that could work successfullyfor a specific object. Thus if an algorithm other than the interpreted one is chosengiven a less accurate description, a successful reconstruction result is still achiev-able. Let’s use vase0 and bust as an example. The algorithms returned from map-ping are {Algo3} = {EPS,GSL} given Desc3. The algorithms selected by the in-terpreter is {Interp3}= {GSL}. The algorithm(s) mapped from the correct descrip-tion are {Algovase0} = {EPS} for vase0, and {Algobust} = {EPS,GSL} for bust,respectively. We can see that {Interp3}T{Algobust} = {GSL}, thus a less accuratedescription can lead to a successful result whereas {Interp3}T{Algovase0} = /0,thus leading to a incomplete reconstruction result. The take-away message is thatfor an less accurate description, the reconstruction results are generally worse than77that of the interpreted result. However, for conditions that have multiple workingalgorithms, there may very well be an acceptable result.For more results, please refer to Figure 6.12, 6.13. The layout of the graph is asfollows: column wise, Desci, i 2 {1,2,3,4} presents results where a less accuratedescription is provided while the last column presents the results where the correctdescription is provided. Row wise, for each object, the result is divided into threesegments: The description of problem condition is presented in the first segment,which is coloured-coded to indicate the incorrectly set property. The order of prop-erties in the description is: texture, albedo, specularity, and roughness. Since thereis one and only one incorrectly estimated property in each less accurate description,it is coloured in red while the correctly estimated properties are coloured in black.The algorithms returned by the mapping are presented below the description in thesecond segment, with the algorithm chosen by the interpreter coloured in red. Thereconstruction result using the interpreted algorithm is shown in the last segmentof each section.6.4.3 Evaluation 3: Inaccurate Description, Poor ResultWe have demonstrated that given a less accurate description, the results may ormay not be poor. In cases where the algorithm interpreted from a less accuratedescription overlaps with the set of algorithms mapped from an accurate descrip-tion, it is still possible to achieve a successful reconstruction. In this section, weare interested to see if this conclusion can also be applied to completely inaccuratedescription, i.e., is it still possible to return a successful result. In this section, weevaluate whether the interpreter would return a poor reconstruction result given aninaccurate description of problem condition. We evaluate the performance of theinterpreter using both synthetic and real-world objects.Let’s revisit the scenario proposed in Chapter 1 using test object ‘vase0’: theuser (Daisy) is interested to test how sensitive the interpreter is by providing com-pletely incorrect description, i.e., 0.8 (texture), 0.2 (albedo), 0.2 (specularity), and0.8 (roughness) in this case. The interpreter selects PMVS as the reconstruction al-gorithm, and then proceeds to achieve a noisy and incomplete reconstruction result,which is worse than that of the interpreted algorithm, see Figure 6.14.78Object Descriptions and ResultsDesc1 Desc2 Desc3 Desc4 Correct DescBust08080208 02020208 02080808 02080202 02080208• PMVS • EPS • EPS • EPS • EPS• EPS • GSL • GSL • GSL• GSLvase008080802 02020802 02080202 02080808 02080802• PMVS • BL • EPS • EPS • EPS• EPS • GSL • GSLbarrel02080208 08020208 08080808 08080202 08080208• EPS • PMVS • PMVS • PMVS • PMVS• GSL • EPS • EPS • EPS • EPS• GSL • GSL • GSLvase102080802 08020802 08080202 08080808 08080802• EPS • BL • PMVS • PMVS • PMVS• EPS • EPS • EPS• GSL • GSLFigure 6.12: Evaluation 2: less accurate description may lead to poor re-sult. Desci represents inaccurate descriptions. For each object, thefirst row represent the description, with the incorrectly estimated prop-erty coloured in red while the correct ones in black. The algorithmsdetermined by mapping are below the description with the algorithmselected by interpreter coloured in red (BL: baseline). The last rowshows the corresponding reconstruction results.79Object Descriptions and ResultsDesc1 Desc2 Desc3 Desc4 Correct Descstatue08080208 02020208 02080808 02080202 02080208• PMVS • EPS • EPS • EPS • EPS• EPS • GSL • GSL • GSL• GSLcup08080802 02020802 02080202 02080808 02080802• PMVS • BL • EPS • EPS • EPS• EPS • GSL • GSLpot02080208 08020208 08080808 08080202 08080208• EPS • PMVS • PMVS • PMVS • PMVS• GSL • EPS • EPS • EPS • EPS• GSL • GSL • GSLvase02080802 08020802 08080202 08080808 08080802• EPS • BL • PMVS • PMVS • PMVS• EPS • EPS • EPS• GSL • GSLFigure 6.13: Evaluation 2: less accurate description may lead to poor recon-struction results. Desci represents inaccurate descriptions. For eachobject, the first row represent the description, with the incorrectly es-timated property coloured in red while the correct ones in black. Thealgorithms determined by mapping are below the description with thealgorithm selected by interpreter coloured in red (BL: baseline). Thelast row shows the corresponding reconstruction results.80(a). 08020208 (b). 02080802Figure 6.14: (a) shows the reconstruction result given an incorrect descriptionwhereas (b) shows the result given a correct description.For more results, please refer to Figure 6.15. The figure is divided into twosections: one for inaccurate description and one for accurate description. The inac-curate description is property-wise opposite to the corresponding accurate descrip-tion. The mapped algorithms are shown below the description, with the algorithmselected by the interpreter coloured in red. The reconstruction result by the inter-preted algorithm is shown as well.From the study of less accurate descriptions, we have already discovered thatit is still possible to achieve a successful result given a less accurate description.However, it becomes more likely to select a less successful algorithm or even thebaseline method (BL) given an incorrect description. Conversely, a better descrip-tion can lead to a better reconstruction result. The interpreter is, to some extent,tolerant to inaccurate descriptions. The take-away message is that the interpreter,in some cases, can still perform reliably given an inaccurate description. However,it becomes more likely to fail to achieve a successful result given an completelyinaccurate description of problem condition.6.5 Discussions and ConclusionsIn this chapter, we proposed a proof of concept interpreter, and provide demon-strative results of the performance of the interpreter. We are interested to see if theinterpreter is able to handle conditions where accurate or inaccurate descriptionsare provided. The findings can be further summarized into one graph, as shown inFigure 6.16.The figure layout is as follows: each description Desci only matches the prob-81Object Bust Vase0 Barrel Vase1IncorrectDesc 08020802 08020208 02020802 02020208• BL • PMVS • BL • EPS• EPSCorrectDesc 02080208 02080802 08080208 08080802• EPS • EPS • PMVS • PMVS• GSL • EPS • EPS• GSLObject Statue Cup Pot VaseIncorrectDesc 08020802 08020208 02020802 02020208• BL • PMVS • BL • EPS• EPSCorrectDesc 02080208 02080802 08080208 08080802• EPS • EPS • PMVS • PMVS• GSL • EPS • EPS• GSLFigure 6.15: Evaluation 3: inaccurate description may lead to poor result.For each description, the first row represent the settings of properties.The algorithms determined by mapping are shown below with the al-gorithm selected by interpreter coloured in red (BL: baseline). The lastrow shows the corresponding reconstruction results. We can see thatthe results of inaccurate descriptions are poorer than those of accuratedescriptions.82Desc # Bust Vase0 Barrel Vase1 Interp Algo.1 GSL2 EPS3 GSL4 PMVSDesc # Statue Cup Pot Vase Interp Algo.1 GSL2 EPS3 GSL4 PMVSFigure 6.16: The evaluation of interpreter using synthetic objects. The firstcolumn presents the description provided to the interpreter. Descrip-tion i matches with the problem condition of synthetic object in col-umn (i+ 1), which is labelled in green rectangle. The last column isthe algorithm selected by the interpreter. Since the interpreter wouldreturn a successful reconstruction given a description that matches theproblem condition, the quality of reconstruction of the labelled objectsindicates success/failure of the interpreter.83lem condition of object i, which is the object in the (i+1)th column. Based on pre-vious findings, objects in diagonal should return successful reconstruction whereasthe other results may or may not be successful, which is exactly the case.Building upon our description and mapping, we are able to develop a proofof concept interpreter which interprets the description of the problem, selects themost appropriate algorithm based on the mapping and returns a reliable reconstruc-tion result. The development of more sophisticated descriptions for more complexgeometry and material, incorporating new algorithms, and improving the construc-tion of mapping are some potential directions to improve the presented interface.84Chapter 7ConclusionsWith the increased sophistication of vision algorithms comes the increased barriersof applying these algorithms in real world applications. Just as personal comput-ers did not become mainstream until the graphical user interface, computer visionalgorithms will be less likely to reach to masses until creative minds from all disci-plines can take advantage of these amazing technologies without needing expertiseknowledge. To address this challenge, we proposed a three-layer interface, con-sisting of description, interpreter, and algorithms, to allow users with no visionbackground to take advantage of these vision algorithms. We review each of thechapters in details below.A Problem Space of 3D ReconstructionIn this section, we presented a well-defined problem space for 3D reconstruction al-gorithms based on the conditions surrounding the problem. It is an Ndimensionalspace, the axes of which are visual and geometric properties of objects. This al-lows users to think of an algorithm as a “pointer” to a sub-volume within the spacethat it can reliably work under. This object centred problem space allows appli-cation developers to have access to advanced vision techniques without requiringsophisticated vision knowledge of the underlying algorithms.85A Description of 3D ReconstructionAfter proposing a problem space that allows to associate algorithms to problemconditions, we proceeded to discuss a way of describing the problem condition ofa 3D reconstruction problem, which provides an abstraction layer above the un-derlying vision algorithms. The description consists of a model and correspondingrepresentations. The model selects characteristic visual and geometric propertiesof objects as components and the representation uses the key aspect of a property toquantify the strength of the specific property. The proposed description provides aformal and definitive way of describing the problem condition. We further discussthe ease of quantifying the properties, and provide concrete examples to demon-strate the process. The performance of the algorithm can be evaluated given a welldefined description, allowing a better understanding of the working conditions ofthe specific algorithm.A Mapping of 3D ReconstructionOnce we obtained the means of describing the problem conditions, we set out to in-vestigate the working conditions surrounding each algorithm. First, we investigatethe impact of pairwise properties on the performance of three algorithms acrosscategories, from which the effective properties are determined. Next, we evaluatethe selected algorithms under different problem conditions consisting only of theseeffective properties. This allows us to derive a mapping from problem conditionsto algorithms by comparing the performance to that of the baseline methods. Thisinformation provides a deeper understanding of performance of algorithms and po-tentially insights into how they may be improved.An Interpretation of 3D ReconstructionLastly, we demonstrated the use of the interface by a proof of concept interpreter,which returns a successful reconstruction result given a valid description of an ob-ject’s problem condition. The interpreter is the intermediate layer of the interface,which receives a user-specified description and invokes one of the underlying algo-rithms for reconstruction. We are interested to see how the interface would performgiven accurate, less accurate and incorrect descriptions. First, we proposed a proof86of concept interpreter that takes advantage of the discovered mapping and addi-tional constraints. Next, we presented the real-world and synthetic datasets forevaluation. Lastly, we demonstrated the performance of the interpreter by provid-ing accurate and inaccurate descriptions. The performance of the interface echoswith the statement of the thesis that a description-based interface for 3D recon-struction without knowing algorithm detail is achievable.We have discovered that the interpreter can produce a successful reconstructionresult reliably given accurate description. Further, the performance of the interfacedeteriorates as the description get farther away from the accurate description. Thus,we conclude that it is achievable to design a description-based interface that leadsto a successful reconstruction given a correct description of problem condition.Though, the interface, to some extent, is tolerant to the inaccurate description ofproblem condition. Nonetheless, the correct interpretation of problem conditionis key to a successful reconstruction. The main contribution of this thesis is thedevelopment and application of an interface to 3D reconstruction problem. Thesignificance of this contribution is that: 1) few algorithms can work for a diversecategories of objects. The interface, to some extent, can cover a wider range ofobject categories by incorporating multiple algorithms; 2) a description of objectproblem condition is provided to hide the algorithmic details, thus understandingof the algorithm, or conditions of applying algorithms are not a prerequisite.Extending beyond 3D reconstruction, our proposed three-layer interface togeneral vision problems can be applied to allow other vision tasks be more ac-cessible to application developers by providing a description to vision problemswhich allows the specifications of problem conditions. In order to provide such ac-cessibility, a description of a well defined problem space, and ways to discover theoptimal problem conditions must be addressed. This is a non-trivial task to providesuch an abstraction as it requires an understanding of the field and an ability to ab-stract away algorithmic complexity. Further, the construction of mapping requiresmore than just appropriate datasets, but also means of avoiding property scalingissue, which would make the problem space too vast to handle. In conclusion, thisthesis addresses the accessibility of one of vision topics, 3D reconstruction prob-lem, which provides a novel perspective of approaching 3D vision problem without87delving into algorithm details, and establishes a general framework of description-based interface that could be extended to other vision problems.7.1 Future Directions3D reconstruction has been one of the most important topics in vision for decadeswith a range of applications. This thesis focuses on the accessibility of these al-gorithms instead of developing algorithm novelties. We make several assumptionsand simplifications in this thesis, and thus opens up some potential future directionsthat can improve the work completed in this thesis.Geometric ModelThe current model fails to capture the geometric complexity of real world objectsand focuses mainly on visual properties. Target objects with complex geometricproperties, such as severe concavity, occlusion, depth (dis-)continuity, and so onare not captured in the dataset. The issue of incorporating these geometric prop-erties is due to the dilemma of mathematical and semantic representation: mathe-matical representations are easier to estimate but difficult for human interpretationwhile semantic representations are easier for humans to understand but challengingto be modelled mathematically. For instance, surface concavity can be mathemat-ically modelled by curvature, but this is non-trivial to estimate the parameter ofcurvature and translate the numeric value to semantically meaningful terms. Con-cavity can also be represented as a continuum with two extremities: convexity andconcavity, which is intuitive to humans. However, it is hard to interpolate to obtainany semantic terms in between, or get it translated to the mathematical counterpart.Thus, one of the research directions is to develop intuitive and semantically mean-ingful geometric model to better describe objects with complex shapes. This couldsignificantly expand the problem space and allows a much more powerful interfacethat target a broader categories of objects.Property EstimationWe have used a approach of visual inspection and “trial-and-see” to simulate andestimate the property settings of an object, which is based on user input and thus88is relatively subjective, tedious, less rigorous and prone to error. A more robustapproach is to utilize machine learning techniques to obtain visual and geometricinformation directly from images.Quantitative MeasureWe have utilized three metrics: accuracy, completeness and angular error. How-ever, there are other measures worth investigating. For instance, colour accuracythat measures the accuracy of the colour information of the reconstructed model,and ‘ghost’ error, which measures the amount of erroneously reconstructed objector scene, and so on. We provide the users with more options and controls overthe reconstruction result that suits their needs by providing additional quantitativemeasures.Mapping ConstructionThe key component of the interpreter is the mapping from problem conditions toalgorithms. This mapping information gives the interpreter information regardingwhat algorithms work reliably under a given problem condition. Every algorithmthat is to be integrated into the interpreter needs to be evaluated under a varietyof problem conditions. However, this process might suffer from property scalingissues. For instance, if the number of effective properties is N and each propertyhas L levels, the number of different combinations is LN , which increases expo-nentially as the number of properties increases. Therefore, we need better waysto discover this mapping relating problem space and algorithms. Otherwise, themapping construction suffers from what we call property scaling issue.InterpreterInterpreter is the intermediate layer of the three-layer interface, which is responsi-ble for receive user-specified description and invokes one of the successful under-lying algorithms. Currently, the implementation of the proof of concept interpreteris simplistic and does not fully take advantage of the information we have obtainedfrom the mapping construction process. Therefore, we should develop a more so-phisticated interpreter that is more powerful and offers more flexibility and options89to the users.7.2 Closing WordsNowadays, computer vision has been playing an increasingly important role in avariety of fields. However, it also requires application developers increasing exper-tise to apply these technologies to a specific application domain. We hope that theefforts in the vision community can be directed to not only develop more advancedalgorithms, but also easier access to these algorithms as well. The developmentof such a system across vision problems will require significant efforts in the formof detailed problem space, comprehensive description, sophisticated methods ofinterpretation, and so on. This is the work of the entire community of vision re-searchers, and could lead to advancements in the field similar to those seen inComputer Graphics following the creation of OpenGL. We hope that this thesisprovides a starting point for such a significant undertaking, highlighting our initialapproach to the problem, and the challenges that researchers who choose to followmay face.90Bibliography[1] Autodesk. URL http://en.wikipedia.org/wiki/Autodesk. [Accessed Feb. 2nd,2018]. ! pages 1[2] Lidar. URL http://en.wikipedia.org/wiki/Lidar. [Accessed Feb. 2nd, 2018].! pages 1[3] Kinect. URL http://en.wikipedia.org/wiki/Kinect. [Accessed Feb. 2nd, 2018].! pages 1[4] Vxl c++ libraries for computer vision research and implementation.http://vxl.sourceforge.net. [Accessed Feb. 2nd, 2018]. ! pages 10[5] N. G. Alldrin and D. J. Kriegman. Toward reconstructing surfaces witharbitrary isotropic reflectance: A stratified photometric stereo approach. In2007 IEEE 11th International Conference on Computer Vision, pages 1–8.IEEE, 2007. ! pages 21[6] B. L. Anderson. Visual perception of materials and surfaces. CurrentBiology, 21(24):R978–R983, 2011. ! pages 41[7] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. PatchMatch:A randomized correspondence algorithm for structural image editing. ACMTransactions on Graphics (Proc. SIGGRAPH), 28(3), Aug. 2009. ! pages14[8] S. Barsky and M. Petrou. The 4-source photometric stereo technique forthree-dimensional surfaces in the presence of highlights and shadows. IEEETransactions on Pattern Analysis and Machine Intelligence, 25(10):1239–1252, 2003. ! pages 21[9] P. N. Belhumeur, D. J. Kriegman, and A. L. Yuille. The bas-relief ambiguity.International journal of computer vision, 35(1):33–44, 1999. ! pages 2091[10] S. Berkiten and S. Rusinkiewicz. An RGBN benchmark. Technical report,Technical Report TR-977-16, Princeton University, Feb. 2016. ! pages 42[11] F. Bernardini, H. Rushmeier, I. M. Martin, J. Mittleman, and G. Taubin.Building a digital model of michelangelo’s florentine pieta. IEEE ComputerGraphics and Applications, 22(1):59–67, 2002. ! pages 1[12] F. Blais. Review of 20 years of range sensor development. Journal ofElectronic Imaging, 13(1), 2004. ! pages 12[13] R. C. Bolles and P. Horaud. 3dpo: A three-dimensional part orientationsystem. The International Journal of Robotics Research, 5(3):3–26, 1986.! pages 34[14] G. Bradski and A. Kaehler. Learning OpenCV: Computer vision with theOpenCV library. ” O’Reilly Media, Inc.”, 2008. ! pages 10[15] E. N. Coleman and R. Jain. Obtaining 3-dimensional shape of textured andspecular surfaces using four-source photometry. Computer graphics andimage processing, 18(4):309–328, 1982. ! pages 21[16] C. H. Esteban and F. Schmitt. Silhouette and stereo fusion for 3d objectmodeling. Computer Vision and Image Understanding, 96(3):367–392,2004. ! pages 13[17] O. Faugeras and R. Keriven. Variational principles, surface evolution, pde’s,level set methods and the stereo problem. IEEE, 2002. ! pages 1, 13[18] R. W. Fleming. Visual perception of materials and their properties. Visionresearch, 94:62–75, 2014. ! pages 41, 42[19] R. W. Fleming, R. O. Dror, and E. H. Adelson. Real-world illumination andthe perception of surface reflectance properties. Journal of vision, 3(5):3–3,2003. ! pages 42[20] R. W. Fleming, C. Wiebel, and K. Gegenfurtner. Perceptual qualities andmaterial classes. Journal of vision, 13(8):9–9, 2013. ! pages 42[21] D. Forsyth and J. Ponce. Computer vision: a modern approach. UpperSaddle River, NJ; London: Prentice Hall, 2011. ! pages 105[22] S. Fuhrmann, F. Langguth, and M. Goesele. Mve-a multiview reconstructionenvironment. In Proceedings of the Eurographics Workshop on Graphicsand Cultural Heritage (GCH), volume 6, page 8, 2014. ! pages 1192[23] Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis.IEEE transactions on pattern analysis and machine intelligence, 32(8):1362–1376, 2010. ! pages 1, 11, 14, 47[24] S. Galliani, K. Lasinger, and K. Schindler. Massively parallel multiviewstereopsis by surface normal diffusion. In Proceedings of the IEEEInternational Conference on Computer Vision, pages 873–881, 2015. !pages 14[25] M. Goesele, B. Curless, and S. M. Seitz. Multi-view stereo revisited. In2006 IEEE Computer Society Conference on Computer Vision and PatternRecognition (CVPR’06), volume 2, pages 2402–2409. IEEE, 2006. ! pages1, 14[26] D. B. Goldman, B. Curless, A. Hertzmann, and S. M. Seitz. Shape andspatially-varying brdfs from photometric stereo. IEEE Transactions onPattern Analysis and Machine Intelligence, 32(6):1060–1071, 2010. !pages 21[27] H. Hayakawa. Photometric stereo under a light source with arbitrary motion.JOSA A, 11(11):3079–3089, 1994. ! pages 20[28] J. Heller, M. Havlena, M. Jancosek, A. Torii, and T. Pajdla. 3dreconstruction from photographs by cmp sfm web service. InMachineVision Applications (MVA), 2015 14th IAPR International Conference on,pages 30–34, May 2015. doi:10.1109/MVA.2015.7153126. ! pages 11[29] A. Hertzmann and S. M. Seitz. Example-based photometric stereo: Shapereconstruction with general, varying brdfs. IEEE Transactions on PatternAnalysis and Machine Intelligence, 27(8):1254–1264, 2005. ! pages 21,22, 47, 69[30] V. H. Hiep, R. Keriven, P. Labatut, and J.-P. Pons. Towards high-resolutionlarge-scale multi-view stereo. In Computer Vision and Pattern Recognition,2009. CVPR 2009. IEEE Conference on, pages 1430–1437. IEEE, 2009. !pages 13[31] B. K. Horn. Shape from shading: A method for obtaining the shape of asmooth opaque object from one view. 1970. ! pages 17[32] I. Ihrke, K. N. Kutulakos, H. Lensch, M. Magnor, and W. Heidrich.Transparent and specular object reconstruction. In Computer GraphicsForum, volume 29, pages 2400–2426. Wiley Online Library, 2010. ! pages2793[33] R. Jensen, A. Dahl, G. Vogiatzis, E. Tola, and H. Aanæs. Large scalemulti-view stereopsis evaluation. In 2014 IEEE Conference on ComputerVision and Pattern Recognition, pages 406–413. IEEE, 2014. ! pages 68[34] H. Jin, S. Soatto, and A. J. Yezzi. Multi-view stereo beyond lambert. InComputer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEEComputer Society Conference on, volume 1, pages I–I. IEEE, 2003. !pages 15[35] H. Jin, S. Soatto, and A. J. Yezzi. Multi-view stereo reconstruction of denseshape and complex appearance. International Journal of Computer Vision,63(3):175–189, 2005. ! pages 15[36] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. InProceedings of the fourth Eurographics symposium on Geometry processing,volume 7, 2006. ! pages 11[37] P. D. Kovesi. MATLAB and Octave functions for computer vision andimage processing. Available from:<http://www.peterkovesi.com/matlabfns/>. ! pages 10[38] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving.International Journal of Computer Vision, 38(3):199–218, 2000. ! pages 1,13[39] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira,M. Ginzton, S. Anderson, J. Davis, J. Ginsberg, et al. The digitalmichelangelo project: 3d scanning of large statues. In Proceedings of the27th annual conference on Computer graphics and interactive techniques,pages 131–144. ACM Press/Addison-Wesley Publishing Co., 2000. !pages 1[40] M. Lhuillier and L. Quan. Match propagation for image-based modeling andrendering. IEEE Transactions on Pattern Analysis and Machine Intelligence,24(8):1140–1146, 2002. ! pages 13[41] M. Lhuillier and L. Quan. A quasi-dense approach to surface reconstructionfrom uncalibrated images. IEEE transactions on pattern analysis andmachine intelligence, 27(3):418–433, 2005. ! pages 13[42] B. Li, L. Heng, K. Koser, and M. Pollefeys. A multiple-camera systemcalibration toolbox using a feature descriptor-based calibration pattern. InIntelligent Robots and Systems (IROS), 2013 IEEE/RSJ InternationalConference on, pages 1301–1307. IEEE, 2013. ! pages 6994[43] J. J. Little. Recovering shape and determining attitude from extendedgaussian images. PhD thesis, 1985. ! pages 33[44] S. P. Mallick, T. E. Zickler, D. J. Kriegman, and P. N. Belhumeur. Beyondlambert: Reconstructing specular surfaces using color. In 2005 IEEEComputer Society Conference on Computer Vision and Pattern Recognition(CVPR’05), volume 2, pages 619–626. Ieee, 2005. ! pages 21[45] G. Mariottini and D. Prattichizzo. Egt: a toolbox for multiple view geometryand visual servoing. IEEE Robotics and Automation Magazine, 3(12),December 2005. ! pages 10[46] D. Marr. Vision: A computational investigation into the humanrepresentation and processing of visual information. 1982. ! pages 13[47] D. Moreno and G. Taubin. Simple, accurate, and robust projector-cameracalibration. In 3D Imaging, Modeling, Processing, Visualization andTransmission (3DIMPVT), 2012 Second International Conference on, pages464–471. IEEE, 2012. ! pages 69[48] P. Moulon, P. Monasse, R. Marlet, and Others. Openmvg. an open multipleview geometry library. https://github.com/openMVG/openMVG. [AccessedFeb. 2nd, 2018]. ! pages 11[49] S. K. Nayar, K. Ikeuchi, and T. Kanade. Surface reflection: physical andgeometrical perspectives. Technical report, DTIC Document, 1989. !pages 38[50] S. Nishida and M. Shinya. Use of image-based information in judgments ofsurface-reflectance properties. JOSA A, 15(12):2951–2965, 1998. ! pages42[51] G. P. Otto and T. K. Chau. region-growingalgorithm for matching of terrainimages. Image and vision computing, 7(2):83–94, 1989. ! pages 13[52] T. Poggio, V. Torre, and C. Koch. Computational vision and regularizationtheory. Nature, 317(6035):314–319, 1985. ! pages 11[53] C. Rocchini, P. Cignoni, F. Ganovelli, C. Montani, P. Pingi, andR. Scopigno. Marching intersections: an efficient resampling algorithm forsurface management. In Shape Modeling and Applications, SMI 2001International Conference on., pages 296–305. IEEE, 2001. ! pages 2395[54] S. Roy and I. J. Cox. A maximum-flow formulation of the n-camera stereocorrespondence problem. In Computer Vision, 1998. Sixth InternationalConference on, pages 492–499. IEEE, 1998. ! pages 13[55] J. Salvi, J. Pages, and J. Batlle. Pattern codification strategies in structuredlight systems. Pattern recognition, 37(4):827–849, 2004. ! pages 15, 16[56] Y. Sato and K. Ikeuchi. Temporal-color space analysis of reflection. JOSAA, 11(11):2990–3002, 1994. ! pages 21[57] K. Schluns. Photometric stereo for non-lambertian surfaces using colorinformation. In International Conference on Computer Analysis of Imagesand Patterns, pages 444–451. Springer, 1993. ! pages 21[58] S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxelcoloring. In Computer Vision and Pattern Recognition, 1997. Proceedings.,1997 IEEE Computer Society Conference on, pages 1067–1073. IEEE,1997. ! pages 13[59] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. Acomparison and evaluation of multi-view stereo reconstruction algorithms.In 2006 IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR’06), volume 1, pages 519–528. IEEE, 2006. !pages 1, 12, 15, 45, 48, 68[60] L. Sharan, R. Rosenholtz, and E. Adelson. Material perception: What canyou see in a brief glance? Journal of Vision, 9(8):784–784, 2009. ! pages42[61] B. Shi, Z. Wu, Z. Mo, D. Duan, S.-K. Yeung, and P. Tan. A benchmarkdataset and evaluation for non-lambertian and uncalibrated photometricstereo. In Proceedings of the IEEE Conference on Computer Vision andPattern Recognition, pages 3707–3716, 2016. ! pages 45, 68[62] W. M. Silver. Determining shape and reflectance using multiple images.PhD thesis, Massachusetts Institute of Technology, 1980. ! pages 21[63] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photocollections in 3d. In ACM transactions on graphics (TOG), volume 25,pages 835–846. ACM, 2006. ! pages 11[64] Y. Uh, Y. Matsushita, and H. Byun. Efficient multiview stereo byrandom-search and propagation. In 3D Vision (3DV), 2014 2nd InternationalConference on, volume 1, pages 393–400. IEEE, 2014. ! pages 1496[65] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library ofcomputer vision algorithms. http://www.vlfeat.org/, note = ”[Accessed Feb.2nd, 2018]”, 2008. ! pages 10[66] G. Vogiatzis, P. H. Torr, and R. Cipolla. Multi-view stereo via volumetricgraph-cuts. In Computer Vision and Pattern Recognition, 2005. CVPR 2005.IEEE Computer Society Conference on, volume 2, pages 391–398. IEEE,2005. ! pages 13[67] G. Vogiatzis, C. H. Esteban, P. H. Torr, and R. Cipolla. Multiview stereo viavolumetric graph-cuts and occlusion robust photo-consistency. IEEETransactions on Pattern Analysis and Machine Intelligence, 29(12):2241–2246, 2007. ! pages 13, 14[68] R. J. Woodham. Photometric stereo: A reflectance map technique fordetermining surface orientation from image intensity. In 22nd AnnualTechnical Symposium, pages 136–143. International Society for Optics andPhotonics, 1979. ! pages 17[69] R. J. Woodham. Photometric method for determining surface orientationfrom multiple images. Optical engineering, 19(1):191139–191139, 1980. !pages 1, 19[70] C. Wu et al. Visualsfm: A visual structure from motion system. 2011. !pages 11, 69[71] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: asurvey. IEEE transactions on pattern analysis and machine intelligence, 21(8):690–706, 1999. ! pages 18[72] E. Zheng, E. Dunn, V. Jojic, and J.-M. Frahm. Patchmatch based joint viewselection and depthmap estimation. In Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, pages 1510–1517, 2014. !pages 14[73] T. E. Zickler, P. N. Belhumeur, and D. J. Kriegman. Helmholtz stereopsis:Exploiting reciprocity for surface reconstruction. International Journal ofComputer Vision, 49(2-3):215–227, 2002. ! pages 2197Appendix ASupporting MaterialsA.1 Definition of 3D ReconstructionWe will first provide definitions of some basic concepts, which include generalcomputer vision concepts such as scene, camera, and image. We then define afew other terms that are closely related to the reconstruction problem. We thenprovide reasonable approximations for a more practical definition of the problemas a whole.A.1.1 Basic NotationsWe will use the following notations: {Cn}N1n=0 represents the camera set, whichincludes both intrinsic and extrinsic parameters; {In}N1n=0 represents the set of allimages; {Ln}N1n=0 represents the set of light sources.Definition 1 (Scene) The scene S is the four-dimensional joint spatio-temporaltarget of interest.Definition 2 (Image) The image refers to the 2D observation of the 3D scene S onthe image plane of camera Ci at time t0, which is modelled as: Ii = T (S,Ci,L0, t0),or on the image plane of C0 under the light source Li at time ti, Ii = T (S,C0,Li, ti),where T is the geometric/radiometric transformation.T can be a geometric transformation which determines the 2D coordinates of a3D point, or a radiometric transformation which determines the intensity/irradianceinformation from the information of illumination, viewing direction and surface98orientation.A.1.2 Segment and ScellDefinition 3 (Segment)A segment (seg) is a distinct region in the image, and is themost basic element in the image, which can be considered as a generalized pixel.For instance, a segment can be a pixel, a window area, an edge, a contour, or aregion of arbitrary size and shape.Definition 4 (Cue) Cues are the visual or geometric characteristics of the segmentsseg that can be used for reconstruction, denoted as cue(seg).For instance, the cue can be the texture within a window area, the intensity/-colour value of a pixel, or the object contour, etc.Definition 5 (Scell) A scell (scene element, denoted as sc) is a volume in the scenewhich corresponds to at least one segment. A scell can be considered as a general-ization of a voxel.Definition 6 (Property) Properties are the visual and geometric characteristics ofthe scell sc, which would influence the cues of a segment, denoted as prop(sc).The property of the scell can be the 3D position or orientation information,visual texture, reflectance, surface orientation, roughness, convacity, etc.The relation between the terms defined above is shown in Figure A.1.3D to 2D transformationPropertiesCuesegmentscelementFigure A.1: Relation between a scell and a segment99A.1.3 ConsistencyEvery photograph of a 3D scene taken from a camera Ci partitions the set of allpossible scenes into two families, those that reproduce the photograph and thosethat do not. We characterize this constraint for a given shape and a given radianceassignment by the notion of consistency.Definition 7 (Consistency criterion) The consistency criterion checks whether theproperties of a scell sc can produce the cues observed in the corresponding segmentseg.consist(prop(sc),cue(seg)) = 1) consistentconsist(prop(sc),cue(seg)) = 0) not consistentDefinition 8 (Segment consistency) Let S be the scene. A scell s2 S that is visiblefrom Ci is consistent with the image Ii if and only if the consistency criterion istrue.Definition 9 (Image consistency) A scene S is image consistent with image Ii ifany scell 8s 2 S visible from the cameraCi is segment consistent with this image.Definition 10 (Scene consistency) A scene S is scene consistent with a set ofimages {In}N1n=0 if it’s image consistency with each image Ii 2 {In}N1n=0 in the set.A.1.4 Formal DefinitionDefinition 11 (3D reconstruction problem) Given a set of images {In}N1n=0 cap-tured by cameras {Cn}N1n=0 , or under a set of light sources {Ln}N1n=0 , find a set ofscells {scm}M1m=0 such that any scell is consistent with the visible images in the set{In}N1n=0 , i.e., 8sci 2 {scm}M1m=0 , we the have following:consist(prop(sci),cue(seg(i, j))) = 1.where seg(i, j) is the corresponding segment of sci in camera Cj. Alternatively, 3Dreconstruction tries to find a set of scells {scm}M1m=0 that are scene consistent withthe image set {In}N1n=0100A.1.5 Applied DefinitionWhile the definition presented above gives a formal definition of the problem of 3Dreconstruction, it is not necessarily applicable in a practical setting. In this section,we extend this formal definition to an approximate, yet applied version.Definition 12 (Consistency score) The consistency score measures the similaritybetween a scell sc and the corresponding segment seg.consist(prop(sc),cue(seg)) = x, x 2 [0,1]consist(prop(sc),cue(seg)) = 1) consistentconsist(prop(sc),cue(seg)) = 0) not consistentDefinition 13 (Applied consistency criterion) A scell sc and a segment seg areconsidered consistent if the the consistency score is above a pre-defined thresholde .consist(prop(sc),cue(seg))> eDefinition 14 (Applied 3DReconstruction Problem)Given a set of images {In}N1n=0captured by cameras {Cn}N1n=0 , or under a set of light sources {Ln}N1n=0 , find a set ofscells {scm}M1m=0 such that the consistency score between the set of scells and theircorresponding segments {seg(i, j)}M1,N1i=0, j=0 are maximized.maximizeN1Âj=0M1Âi=0consist(prop(sci),cue(seg(i, j)))A.2 Physics-Based VisionA.2.1 Radiometric TermsBelow is a list of radiometry terms, see Figure A.2 for an illustration:• Solid angle (dw): 3D counterpart of angle, dw = dAcosqiR2 (steradian).• Projected solid angle (dW): dW= cosqdw .101• Incident radiance (Li(qi,fi)): light flux received from the direction (qi,fi)on a unit surface area, unit (watt ·m2 · steradian1).• Irradiance (Ei(qi,fi)): light Flux (power) incident per unit surface area fromall direction, Ei(qi,fi) =RWi Li(qi,fi)dWi(watt/m2).• Surface radiance (Lr(qr,fr)): light flux emmited from a unit surface area inthe direction (qr,fr), unit (watt ·m2 · steradian1).Figure A.2: Illustration of light-matter interaction.Definition 15 (BRDF) the ratio of the scene radiance Lr(qr,fr) to the irradianceEi(qi,fi), i.e., f (qi,fi,qr,fr) = Lsur f ace(qr,fr)Esur f ace(qi,fi) .The BRDF is used in the reflectance equation:Lsur f ace(qr,fr) = f (qi,fi,qr,fr)Esur f ace(qi,fi)=ZWf (qi,fi,qr,fr)Li(qi,fi)cosqidwiThe phenomena described by the BRDF includes (at least for dielectrics) twodistinct physical phenomena - surface reflection and subsurface scattering. Sinceeach of these phenomena has different behaviour, BRDFs typically include a sep-arate term for each one. The BRDF term describing surface reflection is usuallycalled the specular term and the term describing subsurface scattering is called thediffuse term.102A.2.2 Reflectance Model: Microfacet ModelThe basis for most physically-based specular BRDF term ismicrofacet theory. Thistheory was developed to describe surface reflection from general (non-opticallyflat) surfaces. The basic assumption underlying microfacet theory is that the sur-face is composed of many microfacet, too small to be seen individually. Each mi-crofacet is assumed to be optically flat, and splits light into exactly two directions- reflection and refraction.The microfacet model postulates that if a surface reflection can occur betweena given light vector l and view vector v, then there must exist some portion of thesurface, or microfacet, with a normal aligned halfway between the l and v. This“half vector”, sometimes referred to as the microsurface normal, is thus definedas h = l+v|l+v| . Not all microfacets for which n = h will contribute to the reflection:some are blocked by other microfacets from the direction of l (shadowing), fromthe direction of v (masking), or from both. Microfacet theory assumes that allshadowed light is lost from the specular term; in reality, due to multiple surfacereflections some of it will eventually be visible, but this is not accounted for inmicrofacet theory. This is not typically a major source of error in most cases (roughmetal surfaces are a possible exception).With these assumptions (optically flat microfacets, no inter-reflection), the mi-crofacet specular BRDF term has the following form:f (l,v) = D(qh)F(qd)G(ql,qv)4cosql cosqvwhere ql and qv are the angles of incidence of l and v vectors with respect tothe normal qh is the angle between the normal and the half vector, and qd is the“difference” angle between l and the half vector. For the specular term, D is themicrofacet distribution function and is responsible for the shape of the specularpeak, F is the Fresnel reflection coefficient, and G is the geometric attenuation orshadowing factor.Although there are several model for subsurface local reflection in the litera-ture, the most widely-used on by far is the Lambertian BRDF term. The Lamber-tian BRDF is actually a constant value; the well known cosine of (n · l) factor ispart of the reflection equation, not the BRDF. The exact value of the Lambertian103BRDF isfLambert(l,v) =cdi f fpHere cdi f f is the fraction of light which is diffusely reflected. It is an RGB valuewith R, G, and B restricted to the 0 - 1 range, and corresponds closely to what mostpeople think of as a “surface colour”. This parameter is typically referred to as thediffuse colour.Most physically plausible models not specifically described in microfacet formcan still be interpreted as microfacet models in that they have a distribution func-tion, a Fresnel factor, and some additional factor which could be considered a geo-metric shadowing factor. The only real difference between microfacet models andother models is whether they include the explicit 14cosql cosqv factor that comes fromthe microfacet derivation. For models that exclude this factor, an implied shad-owing factor can be determined by multiplying the model by 4cosql cosqv afterfactoring out the D and F factors.A.2.3 Image Formation: Radiometric PerspectiveThis section discusses the radiometric perspective of image formation. Specifically,we discuss the contributing factors that determine the pixel intensity of an image.Light-Matter InteractionThe relation between the incoming illumination and reflected light is modelledusing the bidirectional reflectance distribution function (BRDF), refer to the Ap-pendix A.2.1 for definitions of radiometric terms.Incident Radiance Material Scene RadianceFigure A.3: The light-matter interaction. Scene radiance is linear related toincident radiance.As we can see from the definition of BRDF, scene radiance is a function ofBRDF given a fixed incident radiance. BRDF is a bidirectional function, which104depends on both incoming and outgoing directions. It can be simplified under spe-cific reflectance models. For instance, BRDF can be simplified as Diffuse albedoor surface albedo when using Lambertian reflectance, which is the proportion ofincident light that is reflected by the surface.Light-Lens InteractionA common assumption made in vision community is that radiance is constant as itpropagates along a ray. Therefore, the scene radiance is the same as the radiancepassing through the lens, which is the same as the radiance received by the sensor.Since image irradiance is the radiance accumulated on a unit surface area, it followsthat image irradiance is linear related to the scene radiance. Thus, it can be shownthat image irradiance is proportional to scene radiance [21].Scene Radiance Lens Image IrradianceFigure A.4: The light-sensor interaction. Image irradiance is linearly relatedto scene radiance.Light-Sensor InteractionThe camera response function relating image irradiance at the image plane to mea-sured pixel intensity values is a non-linear mapping. It is common to assume thatintensity is linear related to image irradiance in many vision algorithms. A linearrelation can be retrieved by radiometric calibration.Image Irradiance NL-response IntensityFigure A.5: The camera response function is typically non-linear. Thus pixelintensity if non-linear with respect to irradiance. However, this can becorrected by radiometric calibration. More often, it is assumed thatpixel intensity is linearly related to image irradiance in many visionalgorithms.105A.3 Results of Real-World ObjectsImage PMVS EPS GSL VH (BL)Figure A.6: Reconstruction results of MVS, PS, SL, and the baseline methodVH.106Image PMVS Example-based PS Gray SL VH(BL)Figure A.7: Reconstruction results of MVS, PS, SL, and the baseline methodVH (continued).107
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Development and application of a description-based...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Development and application of a description-based interface for 3D reconstruction Wu, Kai 2018
pdf
Page Metadata
Item Metadata
Title | Development and application of a description-based interface for 3D reconstruction |
Creator |
Wu, Kai |
Publisher | University of British Columbia |
Date Issued | 2018 |
Description | Advancements in state-of-the-art 3D reconstruction algorithms have sped ahead of the development of interfaces or application programming interfaces (APIs) for developers, especially to those who are not experts in computer vision. In this thesis, we have designed a novel interface, specifically for 3D reconstruction techniques, which uses a description (covering the conditions of the problem) to allow a user to reconstruct the shape of an object without knowledge of 3D vision algorithms. The interface hides the details of algorithms by using a description of visual and geometric properties of the object. Our interface interprets the description and chooses from a set of algorithms those that satisfy the description. We show that this description can be interpreted to one appropriate algorithm, which can give a successful reconstruction result. We evaluate the interface through a proof of concept interpreter, which interprets the description and invokes one of three underlying algorithms for reconstruction. We demonstrate the link between the description set by the user and the result returned using synthetic and real-world datasets where each object has been imaged with the appropriate setup. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2018-02-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0363879 |
URI | http://hdl.handle.net/2429/64632 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2017-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2018_may_wu_kai.pdf [ 43.39MB ]
- Metadata
- JSON: 24-1.0363879.json
- JSON-LD: 24-1.0363879-ld.json
- RDF/XML (Pretty): 24-1.0363879-rdf.xml
- RDF/JSON: 24-1.0363879-rdf.json
- Turtle: 24-1.0363879-turtle.txt
- N-Triples: 24-1.0363879-rdf-ntriples.txt
- Original Record: 24-1.0363879-source.json
- Full Text
- 24-1.0363879-fulltext.txt
- Citation
- 24-1.0363879.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0363879/manifest