An Interactive 3D Geometric Model Acquisition and Registration System by Yushuang L i u A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF THE REQUIREMENTS FOR T H EDEGREE OF M a s t e r of Science in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Computer Science) We accept this thesis as conforming ^fce^fehe requirecijstandard The University of British Columbia M a y 2003 © Yushuang L i u , 2003 In presenting this degree at the thesis in partial fulfilment University of freely available for reference copying of department this or publication of British Columbia, and study. of I agree I further agree thesis for scholarly purposes by his or her the representatives. may be It C&-r> fiwOlY' SOi #>\Ce_—• A The University of British Columbia Vancouver, Canada DE-6 (2/88) that the for an advanced Library shall make it that permission for extensive granted by the understood head that this thesis for financial gain shall not be allowed without permission. Department of is requirements of my copying or my written Abstract T h e acquisition of geometric information from real world objects has now become a major way of modeling complex scenes and environments. Unfortunately, most optical methods for geometric model acquisition require the combination of partial information from different view points, i n order to obtain a single, coherent model. T h i s , i n turn, requires the registration of partial models into a common coordinate frame, a process that is usually done offline. A s a consequence, holes due to undersampling and missing information often cannot be detected u n t i l after the registration process. T h i s thesis tries to solve the 3D registration problem using graphics hardware to achieve an interactive speed, so that online next-best-view planning and hole detection during the scanning process can be conducted. T h e system implemented realizes effective model reconstruction and refinement by interactively using a stereo camera setup and a hardware accelerated depth-map based registration algorithm. T h e emphasis is put on the hardware accelerated 3D range registration algorithm which can be categorized as a variant of the well known Iterated Closest Point ( I C P ) algorithm. T h i s work contributes to the field of 3D geometric model acquisition i n that it proposes a fast method that leverages the function and structure of modern graphics hardwares. Technically, graphics hardware is used i n two operations. T h e first is to compute a rigid transformation using modern graphics rendering pipeline. T h e second is to combine two rendered depth images to compute quickly the absolute difference of z-values of each projected points i n the overlap region i n the underlying two depth images, where the computation is used to obtain an error metric for a specific candidate transformation during numerical iteration. T h i s error metric is based on depth images which are rendered by point-based rasterization. T h i s system currently performs roughly one registration per second, and is therefore fast enough for on-the-fly evaluation by the user. Given more time, the same method is also capable of producing full geometric models at higher quality. ii Contents Abstract ii Contents iii List of Tables vi List of Figures vii Acknowledgements viii 1 Introduction 1 1.1 Motivation 2 1.2 Related Work 4 1.2.1 3D Measurement Techniques 4 1.2.2 Previous Registration Methods 6 1.3 Contributions and System Overview 8 1.4 Thesis Organization 9 2 Graphics Rendering Pipeline 10 2.1 Rendering Pipeline Overview 10 2.2 Coordinate Systems i n 3D Rendering Pipeline 12 2.2.1 Geometry Related Coordinate Systems 13 2.2.2 Texture Coordinate System 15 iii 3 Range Data Acquisition 3.1 17 Digiclops Stereo V i s i o n System 17 3.1.1 18 Digiclops Stereo Processing 3.2 Integrating Digiclops into the System 20 3.3 Stereo Processing Environment 23 3.3.1 Coordinate Systems Used i n the Triclops L i b r a r y 23 3.3.2 Stereo Processing Parameters 23 4 Range-Based Registration 25 4.1 P r o b l e m Specification and General Approaches 26 4.2 Range-Based I C P A l g o r i t h m 28 4.2.1 E x i s t i n g I C P Variants 29 4.2.2 Overview of O u r Range Based I C P A l g o r i t h m 35 4.2.3 Coordinate Systems Involved 38 4.2.4 Parameter Space 42 4.2.5 Surface Consistency Measure Function 44 4.2.6 Optimization Algorithm 46 5 Hardware Acceleration 51 5.1 Off L o a d i n g Geometric Computations onto G P U 5.2 Accelerating Surface Consistency Evaluation Using N V I D I A Register Combiners Mechanism 52 6 Model Integration 6.1 6.2 51 56 Merging Algorithm 56 6.1.1 Voxel G r i d D a t a Structure 57 6.1.2 Voxel M o d e l Integration 58 G l o b a l Optimisation 59 iv 7 Experimental Results 62 7.1 Synthetic Range Image Test for Registration 63 7.2 R e a l M o d e l Acquisition Result 64 8 Conclusions and Future Work 68 Bibliography 70 v List of Tables Triclops stereo processing parameters, descriptions and values used in the range acquisition module. vi 24 List of Figures 2.1 Diagram of major stages i n graphics rendering pipeline 11 2.2 Coordinate systems i n 3D rendering pipeline 13 3.1 A picture of the Digiclops stereo vision system 19 3.2 Image and world coordinate systems i n the Triclops library 24 4.1 D i a g r a m of the control point matching strategy. 37 4.2 Three explicitly involved coordinate systems 39 4.3 Possible outcomes for simplex deformation t r i a l steps 48 5.1 D i a g r a m of texture compositing strategy w i t h O p e n G L N V I D I A register combiners extension. [Kil02] 54 7.1 Experimental result of synthetic data registration 63 7.2 A picture of the ceramics model unicorn 65 7.3 Experimental results of the real data 67 vii Acknowledgements I would like to thank my supervisor, Wolfgang Heidrich, for his support over the years, for his insightful ideas and the enlightening conversations we have had on the project, and for the opportunity he has offered me to work on this interesting project. T h a n k s also to my advisor, Dinesh P a i , for his valuable advice on my studies, which inspired me to enter the exciting area of computer graphics; thanks to Professor D a v i d Lowe for his kindly proofreading of this thesis; thanks to D o n M u r r a y for his great help i n working w i t h the device. Thanks also to D a v i d P r i t c h a r d , Daniel Archambault, L i s a Streit, Shuzhen wang, and Peng Zhao for their friendship and suggestions on my research; thanks to P o i n t G r e y Research Inc. and B C A S I for their financial support. YUSHUANG L I U The University of British Columbia May 2003 viii Chapter 1 Introduction W i t h the tremendous increase i n P C power and the fast improvement of dedicated graphics hardware, the research scope of computer graphics has been greatly broadened over the past decade. Measurements of real world objects have been employed into each of the three traditional computer graphics research fields to produce more realistic effects, e.g. 3D scanning is used for modeling, motion capture is used for animation, a n d surface reflectance measurement is used for rendering specific materials. This thesis focuses on the topic of fast 3D geometric model acquisition from real world objects. A practical approach for acquiring the 3D shape from a real object and thereby modeling the object is to obtain the surface information directly using a range finder. T h e surface information thus scanned is the so called range data which usually refers to range images or 3D point clouds. Range data is widely used to model real world objects and environments. However there exits a fundamental problem here, which is due to occlusions and/or limitation on field of views, one set of range data scanned from a single point of view contains only samples from part of the object surface. Therefore to acquire a complete model, multiple range data sets from good distributed point of views need to be scanned and then aligned into a common coordinate system. T h e process of aligning two or more three dimensional 1 data sets into one common coordinate system w i t h rigid motions is a fundamental problem, namely 3D registration. For range scanning tasks, although some researchers b u i l d their own range scanners for specific applications [RHHL02], there are several commercial range scanners available to choose which are normally easy to manipulate [Cyb]. I n our project which aims at building a fast, near real-time model acquisition system by taking advantages of dedicated graphics hardware, we use a commercial stereo vision system, the D i g i c l o p s ™ 1 [Res], to do the range scanning, and put more effort on accelerating 3D registration problem by using concurrent graphics hardware. 1.1 Motivation 3D modeling has applications not only i n Computer Graphics but also i n a wide variety of other fields: • Medical Visualization. More and more visualization techniques have been used i n medical diagnose and treatment, such as M R I , C I , Ultrasound. Often, different scans (either from different scanners or from different positions) have to be combined to form a 3D model so that physicians can see the internal organs on a computer screen. • A r t Documentation and Study. One good example is the D i g i t a l Michelangelo Project [ L P C 0 0 ] at Stanford University, which involves scanning the + sculptures and architectures of Michelangelo i n I t a l y P o i n t G r e y Research Inc. and B C A S I for their financial support; thanks to and postprocessing the data acquired there to produce electronic 3D geometric models. For those sculptures and architectures, nothing can be better documents than the 3 D models obtained directly from the real objects. In addition, by producing a set of d i g i c l o p s ™ is a trademark of Point Grey Research Inc. 2 3D computer models, these immortal masterpieces i n art history can be made available to scholars worldwide.- • Computer Aided Manufacturing. One direct application of 3D model acquisition i n this area is reverse engineering, which is the process of producing digital models of existing parts and then remanufacture them. T h i s application problem arises from cases when the mechanical parts need to be remanefactured were hand-tuned to fit the existing machinery or the parts were designed by hand while the remanufacturing process needs a digital model. • Electronic Entertainment and Virtual Reality. A s video games are moving toward interactive 3D graphics, and film industry is relying on synthetic imagery more and more for producing special effects, the demands for digitized models are getting huge. A t the same time, v i r t u a l reality as a way of simulatinPointGrey Research Inc. and B C A S I for their financial support; thanks tog real world environments and experiences is gaining more popularity. A l l of these application areas may require 3D geometric models digitized directly from real world objects or sculptures created by artists. • Electronic Commerce. Online shopping has now been widely accepted and well recognized as a great trend i n this century. More and more individuals as well as organizations are relying on it for shopping supplies and marketing products. If 3D models created from real products can be put on the web and made available to be explored interactively by customers, then the customers will definitely get a much better view of their interested product than only from 2D images. G i v e n so many applications of 3D modeling from real world objects i n both industry and scientific fields, the current techniques are far from practical for most purposes. One of the difficulties concerns the scanning equipment which usually requires sophistications to use. Another comes from the registration process, since 3 most existing 3D free surface registration algorithms require intensive human interaction and they are usually very time consuming as well. For most traditional 3D acquisition methods, the range data capturing and the model creation are separate stages of a pipelined process. Therefore, it is often very hard to make sure a l l relevant data has been captured i n the first stage since the resulted model cannot be seen until after the registration process. Often users w i l l find i n the second stage that certain views have been missed, or the surface of the object is not covered i n certain regions. In this case, users will have to go back to the first stage and capture the missing data. T h i s may be difficult, for example, because the underlying object is no longer available, or because the whole setup needs to be recalibrated. The goal of this particular project is to implement an interactive 3 D ge- ometric model acquisition pipeline which allows a user to monitor the geometric acquisition process and make data capturing decisions accordingly. 1.2 Related Work To acquire a 3D geometric model from real world object has been a long-standing research topic i n the field of computational geometry and computer vision. Especially i n recent years, w i t h the great increment i n P C power and dedicated graphics hardware broadening the application areas of real world 3D models, it is gaining more and more researchers' interests. 1.2.1 3D Measurement Techniques W i t h the development of lasers, C C D ' s and high speed sampling and t i m i n g circuitry, methods of 3D shape measurement of real world objects have been evolving rapidly i n recent years. Based on whether they interact w i t h the object (direct contact w i t h the object or project some k i n d of energy onto it) or not, these methods fall into two primary categories: passive and active methods. 4 Passive shape digitizing algorithms have been deeply explored for decades i n the computer vision community. Several approaches, such as shape from shading and shape from silhouettes for single images, stereo algorithms for image pairs, and optical flow and factorization methods for video streams have been developed and applied to various applications. Although, general passive methods are computationally expensive, there are approaches w i t h i n this scope suitable for real time use having been developed [FHM+83, Mat97, M B R + 0 0 ] . T h i s k i n d of approach has a l i m i t a t i o n that it usually applies to scenes w i t h considerate textures. Various active 3D digitizing methods coming i n a wide range of accuracies as well as costs have been developed. T h e literature provides an abundance of those. For a very nice taxonomy, refer to [ R C M 0 1 ] . In short: active 3D digitizing systems + are either contact sensor based or non-contact sensor based. T y p i c a l touch sensor based systems consist of a touch probe attached to a mechanical jointed arm or pointer. B y touching the probe to the surface of the underlying object and measuring the joint angles and the lengths of the arms when the contact event is signaled, the position of the sensor head, which is also the position of the surface point, is recorded. A m o n g various contact sensor systems, the Coordinate Measuring M a chines ( C M M ' s ) are extremely precise and costly, which are currently the standard tool for making precision shape measurements i n industrial manufacturing. A large majority of active digitizing systems are non-contact sensor based, which operate by projecting specific k i n d of energy waves onto the object and then measure the amount of transmitted or reflected energy or the time needed for the receiver to receive the energy bounced back from the interested object point. T h e energy waves can be b o t h optical or non-optical including sonar, microwave radar ( R A d i o Detecting A n d Ranging), X - R a y , laser etc. For some variants of active non-contact range scanning systems, see the surveys on optical based methods [Bes88], methods based on time of flight [Inc], depth from defocus [SNN96], photometric stereo [HRG97], or projected-light triangulation [Cur97]. T h e real-time scanning-stripe system de- 5 signed by Grass [GTK92] i n 1992 falls i n this scope, too. C o m p a r i n g to passive methods, the active approaches require specific devices controlling and measuring the transmitted/reflected energy or the flying time of reflected energy and are thus more expensive. In addition they are normally faster, more robust and more accurate, and they do not require the existence of good scene textures to work. 1.2.2 Previous Registration M e t h o d s Due to the limitations of the contact based methods (slow, expensive, clumsy, etc.), more and more 3D measurements are made by the non-contact methods which usually offer 3D range data as measurement result. For those model acquisition pipelines starting from range data, normally there are two major phases: first is the registration stage which is the process of bringing two data set into alignment, and then the registered views are integrated into a seamless 3D model. M a n y different methods have been proposed for the registration of two 3D shapes. Some early methods rely on precisely calibrated mechanical equipment [SKSI90, S K S I 9 1 , V A 8 6 ] , such as a turntable and a robot arm to provide well controlled motion of the range finder or the subject. W h e n using methods i n this category, the underlying object and the sensor are moved relatively by a known amount i n the data acquisition stage; and then i n the merging stage, the view transformations provided by the apparatus are assumed to be accurate enough. A c t u a l l y by using dedicated hardware and sophisticated calibration, this approach avoids the registration problem. Usually, it has a lower accuracy than the other approaches explained later, and the relative motions of object and sensor are restricted to calibrated motions. T h e n i n early 90's, a powerful 3D registration algorithm based only on geometry i n the overlapping area, the Iterative Closest Point algorithms or the I C P algorithm i n abbreviation, has been introduced by C h e n and M e d i o n i [CM91] and Besl and M c K a y [BM92]. Since then many invariants based on the basic idea of this algorithm have been proposed for many specific applications. T h e algorithm 6 starts from an initial guess of the rigid motion transformation between two range data sets, and then iteratively evaluates and improves the current transformation by searching for corresponding point pairs on the two data sets and optimizing a predefined similarity measure. I C P has now been a well established algorithm for aligning two or more 3D shapes, which is widely used for the registration of 3D range data sets obtained by 3D scanners. However most existing systems running I C P are slow and not easy to use, either because a lot of human interaction is needed for obtaining a rough alignment or because very little feedback is available i n the data acquisition stage making good view planning almost impossible. To improve the level of control and automation for 3D range registration process, extensive research work has been done. Maver and Bajcs [MB93], and Stamos and A l l e n [SA98] have proposed automated next-best-view systems to guide the user for good sensor poses. T h i s k i n d of approach either involves intensive computational work or needs specific hardware to control the scanner positions. Another trend of automation registration process assumes that the relative location and orientation of the two scans to be aligned are completely unknown, and then by using surface features or exhaustive search, the optimized transformation can be found. These approaches either only deal w i t h special objects or very time consuming and they are typically slow and not robust. Recently, Rusinkiewick and coworker enumerated and classified many i n variants of the I C P algorithm, evaluated the effects of different approaches on the convergence speed [RL01]. Chapter 4 provides a summary of those I C P variants and the comparisons of their performances. Based on the survey, the authors proposed a real-time 3D model acquisition system which allows the user to see the continuously growing model on a computer screen at the same time when the hand held subject is moved and scanned [RHHL02]. T h e quick online feedback makes it possible for the user to oversee the scanning process, thus the next-best-view can always be decided actively and holes i n the partial model can be filled interactively. T h e i r system uses 7 a QOHz. structured-light range finder and can aligned two range data sets i n several tens of milliseconds, a whole model can been obtained w i t h i n half an hour. T h e n , i f necessary, a more accurate postprocessing can been conducted afterwards using the registration results from the real-time phase as i n i t i a l guesses. Besides the traditional I C P algorithm and its variants, there are other approaches to the range registration problem. Labsik et al proposed a new method for the registration of free-form surfaces which is not based on finding closest points [LRGOO], instead their algorithm takes advantage of the modern graphics hardware and uses the depth buffer for determining the distance between the surfaces. Seen from a higher level, this method is still an I C P variant. 1.3 Our Contributions and System Overview 3D model acquisition system aims at an interactive working mode, which al- though it might not be able to create high-resolution models as a variety of existing systems can do, it should be easy to manipulate for general users and fast enough for the user to get a preview of the ongoing partial 3D model. Therefore, view planning can be done while range data is scanned and thus data quality can be ensured. After the interactive range scanning and registration phases, i f desired, a more precise but much slower post-process can be done to get a model w i t h a higher precision. Our 3D model acquisition system consists of three modules: the range scanning module, the 3D-3D range registration module and the model integration module. Since it is aimed at an interactive working mode, speed is the major concern for each of the three modules. For the range scanning, we choose a commercialized stereo vision system, the Digiclops camera, as the range finder, which works in real time. For the 3D-3D range registration process, we adopt a depth based ICP algorithm, which takes advantages of the modern graphics hardware for geo- metric computations and thus runs faster than common I C P algorithms. T h e fast converging downhill simplex method is used for the numerical optimization. A l s o 8 for this module, we proposed a hardware accelerated approach which combines two depth images into one to compute quickly the absolute difference of z-values of each projected point and thus greatly reduces the necessary amount of memory accesses for registration purpose. For the model integration module, a volumetric model is used, which is fast to add new data on and fast to render as well. The major effort has been i n the acceleration of the 3D-3D registration pro- cess using graphics hardware. 1.4 Thesis Organization The remainder of this thesis is organized as follows. Chapter 2 describes some highlighted points of the modern graphics rendering pipeline of which we take advantage for the implementation of our system. Chapter 3 presents the usage of the range scanner, the Digiclops camera i n the model acquisition pipeline. Chapter 4 describes the design of the depth buffer image based 3D range registration algorithm. Some research results of the important invariants of the I C P algorithm have been summarized for a better understanding. Chapter 5 describes two approaches of using concurrent graphics hardware to accelerate the registration process, which are the major contributions of our work. Charpter 6 introduces the integration of the voxel 3D model and some possible ways of optimizing it offline. Chapter 7 presents the experimental results on both synthetic range data sets and real range images grabbed by the Digiclops camera. Chapter 8 makes a conclusion of this research work and suggests some ideas for further improvements of this system. 9 Chapter 2 Graphics Rendering Pipeline Our model acquisition system is designed to take advantage of dedicated graph- ics hardware for 3D range data registration as well as 3D geometric model display tasks. In order to discuss the usages of graphics hardware, it is useful to have a closer look at the fundamental features of a graphics system on which we rely on for the implementations of our model acquisition algorithms. T h e application programm i n g interface ( A P I ) we use to control the computer graphics tools is O p e n G L ( G L is the abbreviation for Graphics L i b r a r y ) , but since most contemporary graphics hardware and A P I s are variations of the traditional 3D rendering pipeline, we w i l l keep the discussion independent of any specific programming interface. OpenGL is only chosen due to its open structure and the fact that it is well specified and documented. 2.1 Rendering Pipeline Overview Because of the processing of information i n 3D graphics rendering occurs i n a.sequential order, where results computed by the previous step are used as input for the current one, an abstract model named "rendering pipeline" has long been used to describe the sequence of operations applied to a l l original data sent through this processing "pipe". T h e output of the pipeline is the final raster image drawn i n the 10 ocs modeling • transformation WCS viewing transformation VCS projection transformation CCS > DCS viewport transformation NDCS perspective division Figure 2.1: D i a g r a m of major stages i n graphics rendering pipeline. In the figure, O C S is the abbreviation for object coordinate system, W C S is for world coordinate system, V C S is for viewing coordinate system, C C S is for clipping coordinate system, N D C S is for normalized device coordinate system, and D C S is for device coordinate system which is also called window coordinate system. frame buffer. Figure 2.1. depicts the major stages of the graphics rendering pipeline which applies to most contemporary graphics hardwares and A P I s . A l l geometric data sent directly for geometric operations are eventually described by vertices sitting i n the object coordinate system which is a local coordinate system tied to the underlying object. If the input data are mathematically described by a few control points, such as smooth curves or surfaces, an evaluator stage w i l l be responsible for approximating curves or surfaces w i t h points before they are sent through the pipeline for geometric operations. Internally, a homogeneous coordinate system which is a 4-dimensional coordinate system denoted by [x,y, z,w] , is used T 11 to represent vertices. A l l geometric operations deal only w i t h homogeneous vertices and their associated data (e.g., normal vectors or colors). T h e major tasks of geometric operations are geometric transformations and their related calculations. A s shown i n the Figure 2.1, a series of operations are applied to b r i n g vertices from their 3D object coordinate system to the pixel positions i n the window coordinate system on the screen, which include: • Transformations. Usually it includes modeling, viewing, and projection operations. W h e n a scene object is sent through the pipeline, some combination of rotation, translation, scaling, reflecting, orthographic projection, and perspective projection transformations w i l l be applied. • Clipping. Objects or part of an object outside of the defined viewing frustum should not be seen by viewers so that they should be clipped or thrown away. • Viewport transformation. T h i s is the process responsible for setting up the correspondences between the transformed coordinates and the quantized screen pixel position i n the window coordinate system. In addition to those major operations which apply to a l l geometric primitives set through the pipeline, some optional steps may also get involved i f enabled. Lighting calculations, texture mapping, and special plan clipping are some examples. 2.2 Coordinate Systems in 3D Rendering Pipeline T h i s work deals mostly w i t h geometric primitives of vertices represented i n and transformed among various coordinate systems i n 3D and 2D spaces. Therefore it may help us get a clearer idea to look at more details of the coordinate systems involved i n the rendering pipeline and the transformations that tie them together. Those coordinate systems would determine the various geometric representations of 12 X v^rnax"""' •/max"''' ^max"' ^ (1,1.1) v "1 I I NDQS (0.0,0) (-1.-1,-1) Figure 2.2: Coordinate systems i n 3D rendering pipeline. a vertex while it is transformed w i t h i n the pipeline before being drawn as a pixel on the screen. 2.2.1 Geometry Related Coordinate Systems. Figure 2.2 shows the five coordinate frames the geometric calculations use i n the rendering pipeline i n the order the operations occur i n a drawing process. A series of mathematical transformations convert the 3D coordinates of geometric primitives i n their individual object coordinate system ( O C S ) , which is the input to the rendering 13 pipeline, to their corresponding pixel positions i n the window coordinate system or the device coordinate system ( D C S ) , which is the final output from the pipeline. Four intermediate coordinate systems including a world coordinate system ( W C S ) , a viewing coordinate system ( V C S ) , a clipping coordinate system ( C C S ) , and a normalized device coordinate system ( N D C S ) sit inside the pipe appearing i n the order as they are mentioned here. W h e n a vertex goes through the rendering pipeline, its representation changes from one coordinate system to the next one u n t i l it is rendered into the frame buffer as a pixel. Originally, a l l geometric primitives are represented i n their i n d i v i d u a l object coordinate systems. For positioning and orientating a l l objects i n a scene and keep tracking the moving objects, a world coordinate system is employed. T h e transformation closely related to this coordinate system is the modeling transformation which is responsible for setting scene objects properly for view. T h e viewing transformation brings a l l vertices from the world coordinate system to the eye coordinate system or the viewing coordinate system which always stitches to the eye point and oriented accordingly. There is a duality i n the nature of viewing and modeling transformations, that is moving the eye i n some way can be viewed as moving the object oppositely. Therefore both viewing transformation and modeling transformation modify the same modelview matrix. However, when rendering a scene, the viewing transformation must precede the modeling transformation i n order to make sure that the incoming object's coordinates are properly transformed to the eye coordinate system. T h e clipping coordinate system coming after the projection transformation is so named because the standard clippings happen right here when geometries are represented i n this coordinate system. A viewing frustum is defined by the projection transformation applied to the scene, which action is analogous to choosing a lens for a camera when shooting a picture using a real camera. Intuitively, only those objects inside this frustum can be seen by the viewer and thus need to be drawn. T h e 14 standard clipping refers to the clipping of scene against the border of the viewing fustum. Usually, two types of projection transformations are provided to control how the viewing frustum is specified and how the objects are projected onto the computer screen. One is the perspective projection which mimics how things are perceived by human eyes or pinhole cameras. T h e most prominent property of perspective projection is the foreshortening effect which is the further an object is to the eye (or the pinhole of camera), the smaller it appears to be. T o acquire photo realistic effects of 3D objects, it has to be perspective projection. T h e other type of projection is the orthographic projection which preserves objects' real size i n 3D by mapping them directly onto the screen. B y using orthographic projection, although the rendered objects may look distorted, they actually reflect the real or scaled measurements of objects i n a l l 3 coordinates. Orthographic projection is therefore widely used i n architectural and computer-aided design applications. After the standard clipping, viewing volume is transformed to a unit cube through a perspective division step which is to divide the three coordinate values by the fourth coordinate, w. T h e unit cube is named normalized device coordinate system. Finally, the viewport transformation maps the scene onto the computer screen by transforming the normalized device coordinate system into the quantized window coordinates or the device coordinate system, which stage is often referred to as Rasterazation. B y manipulating the dimensions of the viewport, the final image can be enlarged, shrunk or stretched. The visible part of the scene are now rendered as a 2D image into the frame buffer, the quantized x and y coordinates of a point are the row and column indices to its corresponding pixel, and its z coordinate is saved i n the depth buffer. 2.2.2 Texture Coordinate System Besides the six aforementioned coordinate systems, the texture coordinate system is another important coordinate frame used i n computer graphics to indicate how 15 textures should be mapped to the geometry. Depending on the nature of textures, texture coordinate can be one, two, three or even four dimensional. W i t h the fourth coordinate typically given the value 1, four dimensional texture coordinates is usually interpreted as homogeneous texture coordinates. W h e n a texture mapped scene is drawn, the object coordinate of a vertex determine the position of its rendered image and the normalized texture coordinates of it are used to look up its associated texel i n texture memories. There are two ways of assigning texture coordinates to vertices. One is to explicitly assign texture coordinates when the geometry is defined, the other is automatic texture generation which is to compute the texture coordinates automatically. This feature plays an important role i n this work. If this automatic texture generation mechanism is used, one or more texture coordinate components can be computed as the distance between the vertex and an arbitrary pre-defined plane i n object or viewing coordinate systems. Besides, a specific mode for generating texture coordinates i n a spherical environment map is also available. Texture coordinate calculations happen before the projection transformation through_ the manipulation of a 4 x 4 homogeneous texture matrix, which makes texture coordinates perspectively correct and makes it possible to apply geometric transformations to textures as well. N o matter what the dimensions of a real texture is, a l l texture coordinates are w i t h i n the range of [0,1]. Texture coordinates outside of this range can either be clamped to 0 or assigned the repeated value of its border. 16 Chapter 3 Range Data Acquisition Our 3D model acquisition system begins w i t h a range data scanning step. T o make it possible for the system to work at interactive rates, choosing a proper range data acquisition device is the first crucial decision to make. A commercially available stereo vision imaging system, the Digiclops Stereo V i s i o n System [PoiOla] provided by Point Grey Research Inc. [Res], which provides real-time 3D digital image capture is used for the range scanning purpose i n our system. 3.1 Digiclops Stereo Vision System As explained i n Chapter 1, the literature provides an abundance of different methods for obtaining 3D geometry from real world objects. Based on whether the measurement is made by interacting w i t h the object or not, these methods follow two directions: passive and active sensing. Passive methods are mainly explored i n the computer vision community, which include shape-from-shading for single image, stereo triangulation for pairs of images, and optical flow and factorization methods for video streams, etc. T h e Digiclops Stereo V i s i o n System is based on stereo triangulation technique and thus falls i n the passive sensing category. 17 3.1.1 Digiclops Stereo Processing A s clarified above, the Digiclops camera is a commercially available stereo vision system which makes range measurements using stereo triangulation techniques. Having been explored by computer vision community for decades, the purpose of the stereo vision technique is to perform range measurements using images obtained from slightly offset cameras. There are three major steps i n performing general stereo processing, which includes: 1. E s t a b l i s h correspondences between image features grabbed by different views of same scene points. The Digiclops accompanying software establishes correspondence between raw images using the S u m of Absolute Differences correlation method. 2. Calculate the relative displacements between those feature correspondences i n image coordinate system. 3. Determine the 3D locations of the scene features relative to the local cameras coordinate system, using the knowledge of the camera geometry. A picture of the Digiclops camera is shown i n Figure 3.1. W i t h a size of 15.5cm x 15.5cm x 5.0cm, it can be easily fixed onto a t r i p o d or held by hand. A s shown i n the picture, on hardware part, instead of making up of only two cameras, it is a trinocular stereo system consisting of an onboard three-camera module. T h e reason for using multiple cameras is to make matching between images more robust. T h e geometric setup of the Digiclops camera is like a letter L w i t h the three cameras situated on the vertices of a right angle triangle. Imposed by the geometric features of the imaging system, the epipolar constraints applies to the stereo system, which transforms the two dimensional search for correspondences into a one dimensional one when establishing correspondences between pixels i n different images. In addition, the L setup of cameras is to align the epipolar lines w i t h the image rows and 18 columns, and to reduce the image deformations due to rectification. O n software part, the Digiclops camera comes w i t h an accompanying software system, the D i g i clops and T r i c l o p s ™ libraries, which i n combination w i t h the three camera module, 1 perform range measurements. Figure 3.1: A picture of the Digiclops stereo vision system. W h e n making range measurements, the camera module obtains three images of the scene object from slightly different view points simultaneously, which are then digitized and stored i n the memory of the computer. W h i l e the images appear quite similar, from the knowledge of stereo vision, there's a shift between image pixels corresponding to the same scene point i n different images. T h e closer the object point is, the larger the shift of its corresponding images appear to be. W h e n the software system processes raw data, correspondences are firstly established between features captured i n two offset images. Based on those correspondences, the shifts 1 Triclops™ is a trademark of the Point Grey Research Inc. 19 are then calculated for all established corresponding point pairs. F i n a l l y , by using the knowledge of the camera's geometry, it is possible to determine the distances between the scene points and a reference plane which is the plane the three camera determine. W i t h the resulted depth/disparity image at hand, it is possible to calculate the 3D position of each point i n the image relative to the origin of the local camera coordinate system. As for the stereo processing strategy of the three raw images obtained simultaneously by Digiclops, the image from the camera located at the square corner of the right angle triangle is taken to be the reference one. T h i s reference image is then correlated horizontally w i t h the image on the left and vertically w i t h the one on the top to get two depth/disparity maps, which are then merged together to produce one dense fused depth map. T h i s approach of correlating b o t h horizontally and vertically has the advantage of introducing redundancies which allow us to obtain more reliable results. For example, based on those redundancies, we can verify whether two matches are compatible or not by testing i f the matched points in the left and top images lie on the same diagonal epipolar line. If this correlation is trustable by a predefined criteria then the two matches measure the same depth and we can be almost absolutely sure that they are correct because the searches for the two corresponding points are made on completely different areas horizontally and vertically. Otherwise, at least one match is false and we can optionally select the better one or ignore them both. . 3.2 Integrating Digiclops into the System According to the classification of 3D range scanners explained i n chapter 1, the Digiclops camera is a passive digitizer using stereo triangulation algorithms to make 3D range measurements, which on hardware part is a three-camera module. T h e design of using three onboard cameras dramatically reduces the difficulty involved in using general stereo vision systems, which as a result makes the Digiclops camera 20 very easy to manipulate for even general users who know nothing about stereo and 3D imaging. Basically, it functions simply as a hand-held camera and can be used to capture 3D real world objects by pointing the lenses to the objects and monitoring the results displayed on a computer screen. There are several practical advantages for choosing the Digiclops camera [PoiOl PoiOlb] as the range scanner i n our model acquisition system. Firstly, the Digiclops camera consists of three on board progressive scan C C D s that allow for full 3 D ranging of moving objects without interlacing problems found i n standard N T S C C C D s . Secondly, the system is calibrated to a high precision prior to shipping, b o t h lens distortions and misalignments have already been compensated and known as system constants. In this way we can avoid having to perform extensive calibration algorithms by ourselves when conducting 3D measurements. T h i r d l y , the combined software A P I , the Digiclops and Triclops S D K ' s , provides real-time range images using stereo vision technology, which allows users to accurately measure the relative distance of every valid pixel i n an image and generate dense depth maps fast and accurately. Finally, the device acquires color of the object at the same time as range information making the registration of textures and geometries done i n one process, which is a task not many range scanners can tackle. T h e Digiclops camera operates at a frame rate of about 16Hz for a image resolution of 640 x 480 from each camera, and its resolution can be as high as 1024 x 768. Instead of using the disparity maps output from the Digiclops directly, our registration algorithm starts from 3D point clouds. Therefore, i n addition to the C P U time required to do other calculations besides the stereo processing, the frame rate for generating new point clouds is i n the order of 10Hz on a 1.6GHz P e n t i u m 4 machine. A l t h o u g h this rate cannot be considered true real-time yet, an interactive model acquisition can be achieved i f the subject is moved slowly. Ideally, we are expecting to realize a 3D video camera out of the Digiclops, 21 which can be used to capture real world 3D shapes i n real-time while the Digiclops camera is held i n the user's hand and moved around the underlying object slowly and statically. However, there are practical problems arising when the scanner is used i n this way. First of all, its operating rate is not fast enough for real-time capturing so that the user has to be very careful when changing view points. Secondly, i n order to make our range based registration algorithm work, two successive images should have good overlapping area. B u t usually the subject is about half a meter away from the camera, so when the user tries to move the Digiclops camera around the subject by hand, slight tilts of the equipment may result i n very large displacements of the scene images, the subject may even get out of the field of view of the cameras on board. Finally, due to the limitation of stereo techniques, the Digiclops fails to produce correct 3D information for points along discontinuities i n the scene. T h i s makes the range registration using 3D measurement calculated directly from original data maybe erroneous. T h i s problem mainly manifests itself along silhouette edges. To get around the above mentioned problems, we have adopted an alternative acquisition strategy to acquire the range data from the underlying object, which includes two major approaches. One approach is a solid colored background has been employed so that a color segmentation algorithm can be used to get r i d of the erroneous sample points along the silhouette together w i t h a l l other background points; the other approach is to fix the Digiclops camera on top of a t r i p o d and rotate the subject instead of the range finder when making range measurements. T o make the movement of the subject easier, we actually put it around the center of a turntable covered by the black background, and rotate the table and thus the subject. I n this way, once the position of the subject relative to Digiclops is fixed, it wouldn't be moved very far from there since the motion of the subject is basically rotate about the a line about to pass its center. 22 3.3 Stereo Processing Environment E a c h group of the three raw images acquired by the Digiclops system has to be processed i n order to get the desired 3D range data. T h e Triclops library accompanying the Digiclops camera is the software S D K responsible for the generation of depth maps from the raw data, which is the ultimate result of the Triclops A P I . Those depth maps can then be used to generate range information i n other formats, e.g. the 3 D point clouds i n our application. 3.3.1 Coordinate Systems Used i n the Triclops L i b r a r y W h e n t a l k i n g about 3D information and operations, there is always a coordinate system involved. O n l y when relative to the coordinate system, the measurements and representations are meaninful. T h e Triclops library represents raw data i n its image coordinate system, the origin of which is at the top left corner of the digitized image. T h i s coordinate is only used by the library functions i n our system and thus can be seen as opaque by the user. T h e depth maps output from the Triclops library are represented i n its world coordinate system, which when seen at the level of the whole acquisition system is actually the local sensor coordinate system and/or the individual object coordinate system. T h e two coordinate systems used for Digiclops data processing by the Triclops library are shown i n Figure 3.2 [PoiOlb]. 3.3.2 The Stereo Processing Parameters Triclops data processing block takes raw images obtained from the Digiclops camera module and produces depth images out of them. Further characteristics of depth maps may vary depending on a number of stereo processing parameter settings. T h e Triclops S D K allows specifying several of the characteristics stereo processing may use. T h e brief descriptions and the values of the parameters we use in our range finding module have been summarized i n Table 3.1. T h e values of a l l parameters are chosen by experiments. 23 Image col A World Z m/ Object in world Image raw World X World Y / ) Viewing direction Figure 3.2: Image and world coordinate systems i n the Triclops library. Table 3.1: Triclops stereo processing parameters, descriptions and values used i n the range acquisition module. Stereo Parameters Description of the Parameter Value of the Parameter Image size or resolution A few supported raw image sizes. Larger ones consume more time; smaller ones contain less data. 640 x 480, Disparity range P i x e l range of distance measurements done i n images. 0 ~ 220 (close to the largest) Mask size Preprocessing W i n d o w size for stereo processing Does image unpacking, smoothing, rectification and edge detection. Validation Supported methods for verifying measurement correctness 13 edge detection mask size: 11 surfaceValidation: on textureValidation: off surface Validations ize: 100 surfaceValidatipnDifference: 0.9 uniquenessValidation: off Regions of Interest Regions of image processing done. A l l o w i n g establishing correspondences to subpixel accuracy, and thus offer more precise measurements. Subpixel Interpolation 24 The full image. on strictSubpixelValidation: off Chapter 4 Range-Based Registration H a v i n g obtained 3D data from the range scanner, i n our case the Digiclops camera, the next step is to align the range data sets scanned from different point of views and thus sitting i n different local sensor coordinate systems to a common global world coordinate system. T h i s process is the so called 3D-3D registration. Researchers from the computer vision community have attempted to formalize the description of this fundamental problem: "Given 3 D data i n a sensor coordinate system, which describes a data shape that may correspond to a model shape, and given a model shape i n a model coordinate system i n a different geometric shape representation, estimate the optimal rotation and translation that aligns or registers the model shape and the data shape minimizing the distance between the shapes and thereby allowing determination of the equivalence of the shapes v i a a mean-square distance metric." [BM92] Usually, a range data registration strategy consists of two major steps: correspondence selection i n which candidate correspondences between data sets are chosen, and rigid motion estimation i n which the rigid motion optimizing a predetermined surface similarity measure function between the two range data sets are estimated. T h e application problem confronted i n our project is the point-set matching problem without correspondences. A s mentioned i n Chapter 1, there exists a well established algorithm, the 25 I C P algorithm, for registering 3D geometric models based on geometry only (these are some I C P variants using additional information besides geometry, e.g. color and intensity [JK97]). A l t h o u g h our algorithm tackles the registration problem from a different approach than the classical I C P s , seen from an abstract level of view, it is still an I C P variant. Therefore, taking a closer look at the common processing phases involved i n basic I C P algorithm may lead to a well understanding of our approach. 4.1 Problem Specification and General Approaches The most popular used strategy for extracting the surface structure of an real world object and thereby modeling the 3D object these days is to obtain multiple range images from various viewing positions and orientations, and then to merge them together to reconstruct the complete surface. Due to the involvement of multiple range views, corresponding transformations that register each of these range images sitting i n their individual local sensor coordinate systems into the global coordinate system need to be determined so that a l l of them can be expressed i n a unique modeling coordinate system. More formally, the range registration problem can be stated as: given N range images of an object i n a scene, each one being a piece of the 3D structure of the object seen from a particular viewpoint and thus sitting i n its i n d i v i d u a l sensor coordinate frame, the registration process is to find N corresponding transformations Tj, T , 2 Tjv, that specify the positions and orientations of each local sensor.frame relative to the unique global modeling frame which we call the reference coordinate system. Given that range image i ( i = 1,...,N) consists of a set of 3D points (or vertices of a mesh) Si represented i n the local sensor coordinate system i, the transformation Tj transforms Si into the unique coordinate system, S i = T(Si), where S^ is the representation of the point set i n the global coordinate system. B y transforming a l l point sets of the N range images, a new set of 3D points which is 26 the union of all transformed range data sets S=\jT (S ) i S^,mathematically = \Jtf i (4.1) i i=l i=l can be generated. T h i s new set of points is the surface boundary 3D model of the object reconstructed from the multiple range views [BL93] o f the subject. T h e existing methods used for solving the 3D-3D registration problem fall into two main categories. One is to rely on precisely calibrated mechanical equipment to determine the transformation between different views. T h i s approach actually avoids the problem of searching for the appropriate transformation by blindly accepting the registration transformation provided by the acquisition apparatus [SKSI90, SKSI91]. Precisely calibrated turntables and robot arms are the most frequently used equipments for registration purpose [SKSI90, S K S I 9 1 , VA86]. The advantage of this approach is that it circumvents the complex registration process. B u t usually, the accuracy of this k i n d of methods is much lower than the methods falling i n the second category and the acquisition systems are expensive and not flexible. T h e second category of registration methods derives the transformation between range images from the information contained i n the range images as well as some additional information provided by the range data acquisition system. In this k i n d of systems, the transformation parameters are gradually updated and refined until according to a predetermined criteria, the views are precisely registered. Normally, a feedback function measuring the quality of the current registration is needed. Also, i n most cases, an estimation of the registration between the image to be registered and the reference one is part of the information available. Theoretically, i f a set of point pairs i n which two points i n different images corresponding to the same surface point on the subject can be precisely matched, then the registration problem is transformed to solving a group of equations. B u t practically, the correspondence information is not usually available, which makes 27 finding the correspondence a sub-problem of the registration process. T w o possible solutions exist. T h e first one pairs points by matching features from the two underlying views, i n which both kinds of features that are either parts of the scene or synthetic features that are added on to the scene manually have been used by researchers [SM92, Chu96, ea95, JH97, FH86]. U s i n g scene features for correspondence matching directly is highly feature dependent, normally it greatly limits the application area of the system since very few objects contain desirable features, while using added on features destroys the surface structure of the original scene. In addition, it spends a large percentage of computation time on extracting the invariant features [CHC99], which is another major drawback. T h e other possible approach to the point matching problem compares the differences i n the structure of the surface across views, which is a process operated over the entire surface or over a set of chosen control points. The I C P method, the most popular registration method used nowadays, is w i t h i n this category. 4.2 Range-Based ICP Algorithm T h e I C P algorithm, originally introduced by C h e n and M e d i o n i [CM91] and then extended by Besl and M c K a y [BM92] i n early 1990's, has now become the dominant method for registering partially overlapping 3D shapes based on geometry only (some approaches may use additional information other than geometry, such as color and intensity). Its m a i n application is for aligning multiple range data outputs of 3D scanners obtained from different point of views and thus sitting i n different local sensor coordinate systems into one common coordinate system. I C P works w i t h two data sets, usually two meshes (but not necessarily), one is the reference and the other is the data set to be registered to the coordinate system i n which the reference range data set sits. Starting from an initial guess of their relative transformation, the algorithm interactively improves the intermediate registration results between the two underlying overlapping surfaces by searching for the unique 28 transformation that optimize a predefined surface consistency measure. G u i d e d by the internal mechanism of the chosen numerical optimization algorithm, the process keeps operating until by a predetermined criteria a solution has been found or the m a x i m u m number of iterations has been reached (in this case, the algorithm fails). A t each iteration, the quality of the current transformation is firstly evaluated by generating corresponding point pairs based upon the current relative position of the two surfaces and calculating the predefined surface consistency measure function based on the matched point pairs. T h e n the registration is gradually improved using the numerical strategy. T h e guideline for the optimization process is actually the numerical algorithm. Four sub-problems need to be solved i n an I C P process, which are: • initializing the transformation parameters which is the starting point for the process to improve; • generating corresponding point pairs based on which the current registration can be evaluated; • coming up w i t h a good surface consistency measure function which is used to rate the level to which the two pieces of surfaces could be aligned; • finding a suitable numerical optimization strategy which is the guide for the whole process. A l t h o u g h a crucial step, the initial guess generation problem is usually not addressed i n the I C P algorithm itself. W h e n applying the I C P algorithm, it is almost always assumed that a starting point is ready. A l l the other three sub-problems are normally included i n the I C P algorithm. 4.2.1 Existing ICP Variants Researchers have been working on different applications of the I C P algorithm and proposed many variants based on the basic strategy affecting all phases of the al- 29 gorithm, refer to papers [FH86, SM92, JH97, D W J 9 7 , C H C 9 8 , C H C 9 9 , Pul99a]. Recently, Rusinkiewicz and his coworker have studied the existing variants thoroughly and done lots of comparisons among them [RL01]. T h e y used a six-stage scheme to describe the basic algorithm and classify those I C P variants, which can be seen as the invariant for a l l the variants. T h e effects of different approaches adopted by the underlying variants i n one or more stages on the speed performance and result accuracy have been explored. These comparison results can be valuable guidelines for building new I C P variants for new application requirements. The six-stage scheme and some of Rusinkiewicz's results are summarized bellow: Control Points Selection Control points are the points sampled from one or both data sets which w i l l then be matched to points i n the other data set. Since it is those control points that w i l l be used for the following registration stages, they need to be very representative for the whole range data set. Five existing selection strategies, including a new one they proposed i n the same work, have been examined, which are: • Besl's approach [BM92], which is to use a l l available points as the control points; • Turk's uniform sub-sampling method [TL94]; • Masuda's random sampling approach [MSY96]; • Weik's approach [Wei97] which takes advantage of the associated intensity of the range data i n addition to the geometry information, selecting points w i t h high intensity gradient; • author's new feature based sampling strategy [RL01] which is to choose control points based on their associated normals, the goal is to make the normals of the selected points span as large an angular space as possible. 30 Considering the sampling direction can be sampling from only one data set or from b o t h [GRB94], there are actually two approaches for each of the above mentioned five. A u t h o r ' s comparisons show that the convergence performance of all those approaches for scenes w i t h a good distribution of normals is similar. However, for some difficult scenes, it is sometimes the small features that actually determine the correct registration. To use feature based methods is usually the way to get the correct registration for those kinds of scenes. Author's normal space sampling method is the only one among the six that successfully aligned such a difficult test scene i n their experiments. For the effects of sampling directions, they found that it depends on the adopted matching strategy as well as on the overlapping condition of the two data sets. If the matching strategy is a symmetric one and the two range images have good overlap, the sampling direction doesn't really matter, otherwise to sample from both data sets is a little bit superior i n terms of b o t h convergence speed and registration accuracy. Point Matching T h e goal of this stage is for each control point to find the corresponding point i n the other range data set so that both of the two points correspond to the same surface point. However there are far less information available for this pairing. Different matching strategies have been proposed. T h e strategies investigated by the authors fall into two major categories: • Projection based algorithms, which include "normal shooting" [CM91], "reverse calibration" [Neu97, BL95] and project-search [Pul99b, Wei97, D W J 9 8 , BS97]. The "normal shooting" approach is to project the control points i n the direction of their normals and find the point at the intersection, while "reverse calibration" is to project the control points of the other data set from the sensor's position and find the intersection point. T h e project-search strategy is to project the control points to the other data set along the sensor's projec31 tion ray and then find the corresponding point according to various predefined metrics. • Non-projection based algorithm, which refers to the matching scheme of finding the closed point i n the other data set [BM92]. Similar to the control points selection approaches, i f we restrict the pairing to only those pairs i n which the two points are compatible to each other according to a specific metric [ G R B 9 4 , Pul99a], then we can get another matching method from each of the above mentioned approaches. A s to time performance, the project-search method shows a significant advantage, while the closest-point algorithm is very robust although it might not be fast. If the pairing is not restricted to compatible pairs, a l l the algorithms can be accelerated by a A; — d tree data structure and/or closest-point caching [Sim96]. Weighting and Rejecting Point Pairs After the control points are matched, different weights are assigned to a l l pairs i n order to give pairs that are most likely to be correctly matched more importance and to reduce the influence of outliers. T h e step that immediately follows weighting, rejecting point pairs, is closely related to weighting, which may reject certain point pairs completely. T h e intention for the rejecting point pairs step is to get r i d of the erroneous points which may be produced i n the range scanning process or when matching the control points. Four weighting methods have been examined i n Rusinkiewicz's work: • Constant weighting, which is actually ignore this step and treat a l l point pairs acquired i n the previous steps equally. • Treating the point pairs w i t h smaller point-to-point distances as more trustable ones, the larger the distance is, the smaller its assigned weight is [GRB94]. 32 • Weighting according to compatibility of normals or colors [GRB94]. • Weighting based on some statistical analysis on the range scan process. P r o m his experiments, Rusinkiewicz concludes that normally the weighting strategy w i l l not affect the convergence rate much, and that the effects w i l l be highly datadependent. A s to the step of rejecting outliers, actually, it can been seen as assigning a 0 weight to certain point pairs. Three kinds of rejection strategies have been explored i n Rusinkiewicz work: • Threshold the point-to-point distances. T h e threshold can be a distance value, a percentage out of all participating pairs [Pul99a] or some value generated from statistics [MSY96]. • Rejecting point pairs that are not consistent w i t h neighboring pairs, usually the consistency here refers to point-to-point distance [DWJ98]. • Rejecting point pairs w i t h one point on surface boundaries [TL94]. Comparisons show that outlier rejection strategies don't not really affect the convergence rate although it might affect the accuracy and stability of the registration algorithm. However, the strategy of excluding boundary point pairs is very useful for eliminating wrongly paired points when aligning partially overlapped range data sets. Surface Consistency Measure and Optimization T h e surface consistency measure is a function based on current alignment of the two underlying data sets, the value of which indicates to what extent b o t h of the two data sets can be representations of the same physical surface under the current registration. A t each optimization iteration, this consistency measure is first 33 evaluated based on the matched point pairs and then improved. So that from the mathematical point of view, the goal of the registration process is actually to optimize the surface consistency measure function. T h e difference between its current value and the predetermined stop criteria is the source of the force that pushes the numerical optimization process to keep searching i n the parameter space. T h e authors explored the effects of two different surface consistency measures (which is named error metric i n their paper) on the performance of the registration process. T h e two surface consistency measures are: • S u m of point-to-point distances over the entire overlapping area. T h e most intuitive approach uses the value of this sum only, while some other approaches may add on color [JK97] differences or differences based on information of other kinds. For this consistency measure, closed-form solution exists, several numerical methods can been used to solve it [ELF97]. • S u m of point-to-plane [CM91] distances, where the plane is the plane contains the corresponding point and perpendicular to the normal associated to this point. T h e optimization of this consistency measure is a non-linear problem, where there's no closed-form solution exists. Experiments show that to adopt different numerical optimization methods does not change the accuracy and stability very much[ELF97]. So that the m a i n concern of choosing a suitable numerical algorithm is the convergence rate. A s to the several existing ways of formulating the searching scheme [Sim96, M S Y 9 6 , B L 9 3 , B M 9 2 , C M 9 1 ] , since more complex ones tend to converge slowly, only two of them are explored by the authors, which include standard "select-match-optimize" scheme w i t h [BM92] and without [CM91] extrapolation. Experiments also show that the point-to-plane measure is dramatically superior than the point-to-point measure. A l t h o u g h the extrapolation strategy during the optimization process increases stability and avoids problems caused by overshoot, it might consume more computing 34 time. Based on their experiments, the authors concluded that for a fast I C P variant which might be suitable for real-time applications, it should use project-search strategy for point matching, a point-to-plane surface consistency measure and adopt the standard "search-match-optimize" I C P iteration [RL01]. T h e y proposed and implemented a new fast I C P algorithm adopting the above mentioned strategies, which is very promising for real-time applications. For other stages where different approaches do not change the convergence rate much, their method adopted the simplest ones which include random control points selecting, constant weighting and a distance threshold based outlier rejecting step. 4.2.2 Overview of O u r Range Based I C P A l g o r i t h m W i t h the exception of the work done by Rusinkiericz et al [RL01], the performance of the 3D-3D registration task has so far been too low for interactive applications. A s mentioned i n the previous subsection, Rusinkiewicz achieves the interactive rates by only using a random subset of points on the model for registration. For the same purpose, we go along a different route: taking advantages of the contemporary graphics hardware and rendering pipeline which are extremely efficient for geometric computations. Considering the fact that the registration problem involves mostly computations that the graphics hardware is optimized for, we adopt a depth based I C P method that by using the rendering pipeline offloads the geometric computations to the dedicated graphics board. O u r algorithm is generated from the intuition that i f two pieces of a 3D shape match (by match, I mean represent the same surface i n the same coordinate frame) , then their overlapping regions should match i n a l l directions, specifically, their depth images match i n the overlapping area. B y looking at the two depth images, the expensive task of explicitly finding corresponding points i n the two 3 D data sets have been avoided. Instead, we only need to do the searching i n one 35 direction which is along the depth. There are other researchers attempted solving the 3D registration problem from this direction [LRGOO]. W i t h these i n m i n d , we render the two range data sets and use the rendered depth buffer images for determining a one dimensional distance based consistency measure function which w i l l be optimized i n the registration process. T h i s algorithm makes use of graphics hardware for b o t h tasks of transforming the geometry from one coordinate system to another, and for evaluating the surface consistency measure function using the rasterization part of the graphics hardware. We can categorize our method using Rusinkiewicz' framework for I C P algorithms [RL01] • Control Points Selection. A l l points i n the current range data set obtained from the range scan module are used as the control points for computing the surface consistency measure. • Point Matching. T h e projection-based approach is adopted. E a c h time when we evaluate the current transformation, we render the data set of selected control points i n the reference coordinate frame v i a the transformation under an orthographic projection, and then pair each visible control point w i t h the point i n the other range image along the same orthographic projection ray, as shown i n Figure 4.1. B y passing the underlying range data through the rendering pipeline, we utilize the fast modern graphics hardware to carry out the complex geometric calculations. T h e reason why we choose an orthographic projection over a perspective one is two fold: firstly, the perspective foreshortening effect is avoided i n this way, so that the resolution of the rendered image is independent of the distance from the viewer; secondly, it simplifies the matching strategy. To pair up the points along the same orthographic projection ray, we can simply match the points i n the two depth images w i t h same row and column indices. W h i l e under a perspective projection, this simple pairing method does not have any meaningful physical interpolation. 36 control points Figure 4.1: Diagram of the control point matching strategy, which is to match points i n different rendered depth maps along same orthographic projection rays. • P o i n t P a i r W e i g h t i n g . We treat all generated point pairs equally by simply ignoring this step, all pairs are now assigned a weight of 1. Note that not a l l points from one range map have a corresponding point i n the other, since the object may not cover the whole image and the stereo algorithm may not be able to recover the depth at every pixel, which means we would have to throw away the control points that can not be paired up by the point matching strategy. • Outliers Rejection. In our model reconstruction system, no explicit re- jection method has been used during the registration process. However, as mentioned i n Chapter 3, a color segmentation method is used to get r i d of 37 background points. T h i s step, at the same time, eliminates most of the erroneous points along surface discontinuities employed by the l i m i t a t i o n of stereo processing. • Surface Consistency M e a s u r e . A sum of point-to-point distance based function is used as our consistency metric. Due to the fact that for a sum of point-to-point distances, the contribution of non-overlapping pixels to the consistency measure is equivalent to pixel pairs that have been perfectly aligned. It is not enough to consider the distances only, therefore the overlapping area has been employed to our surface consistency measure function as well. Details are covered i n the "Surface Consistency Measure" subsection. • O p t i m i z a t i o n . We use the standard "select-match-optimize" iteration scheme and a multidimensional numerical optimization method named "downhill simplex" [ P T V F 9 4 ] to find the local m i n i m u m around an i n i t i a l starting point i n the parameter space. Details of the "downhill simplex" algorithm w i l l be described later on i n subsection "Numerical O p t i m i z a t i o n " . 4.2.3 Coordinate Systems Involved T h e registration problem itself is originated from 3D representations of the same physical surface i n different coordinate systems. So that to make our registration goal more clearly, we need to take a closer look at a l l the coordinate systems involved i n our model acquisition system and find out how they are related to each other. There are a l l together three categories of coordinate systems explicitly involved i n our model acquisition pipeline, which include a global world coordinate system, a series local sensor coordinate systems, and the window coordinate system of a computer screen. Figure 4.2 illustrates the coordinated systems mentioned here. 38 (Xmax-I, Ynuix-I, Zmax-1) s (0, o, 0) Figure 4.2: Three coordinate systems explicitly involved i n the model acquisition pipeline. T h e top one is the unique global world coordinate system which is located at around the geometric center of the underlying subject, the middle one is one of the local sensor coordinate systems which are glued to each i n d i v i d u a l sensor position, while the b o t t o m one is the window coordinate system which is a quantized coordinate system on a computer screen. 39 Local Sensor Coordinate Systems Those are the coordinate systems which the Triclops library functions use for the measurements of the underlying world object and the representations of the resulting 3D range data output. E a c h scanned 3D data set is represented i n its individual local sensor coordinate system whose origin is always located at the pinhole position of the current reference camera. A s explained Chapter ? ? , i n our model acquisition system, i n order to obtain enough range data for reconstructing a complete 3D model, instead of fixing the subject and moving the camera to difference viewing positions, we actually fix the camera and moving the subject. T h i s approach makes the data acquisition process much easier to manipulate, which is mathematically equivalent to the moving camera approach. Because of the duality, we can always think i n the way that the camera moves around the fixed subject, therefore sensor coordinate systems keep changing during the range scanning process although the actual range finder is fixed. In our system, the local sensor coordinate systems are right-handed w i t h a real dimensional measurement of the object i n meters. Global World Coordinate System Our destination, the 3D geometric model, lies i n this space. subject somewhere close to its geometry center. It is fixed on the W i t h o u t loosing generosity, we actually choose the world to be the translated local coordinate system of the first range image whose origin is located at about the center of the subject. A s same as the local sensor system, it is right-handed and w i t h a real measurement i n meter. B y using such a global coordinate frame, we can acquire a model of real size. Window Coordinate System A l l the range data sets are rendered i n this coordinate system for display on the screen and the surface consistency measure is also evaluated i n this quantized coordinate system. T h e window coordinate system is actually an intermediate frame 40 employed by the rendering pipeline we use for registration. Once the registration process is finished, the range data w i l l be transformed directly from its original local coordinate system to the global one and the window coordinate system will decease. Unlike the above mentioned two, it is a left-handed system which is quantized into pixels i n b o t h x and y dimensions and scaled to the range of [0,1] i n depth or the in dimension z. Since that a l l the local sensor coordinate systems and the global one use the same measurement and that they a l l use the right-handed convention, there is only a rigid body transformation relating each of the them to one another. Specifically, one such transformation w i l l bring one local coordinate system to the global coordinate system. O u r registration process is to find these rigid body transformations for each of the local sensor coordinate frames so that a l l valid range data sets can be merged into the global coordinate system. A l t h o u g h there is a left-handed system, the window coordinate, involved i n the registration process, it is actually transparent to the user, which is used by the O p e n G L rendering pipeline and the evaluation of the surface consistency measure only. D u r i n g the registration, b o t h the reference data set and the current data set under registration are rendered into this window coordinate system using same rendering environment. So that a l l the geometric calculations can be carried out implicitly through rendering process and then the surface consistency measure function can be evaluated based on their rendering results i n this quantized system too. Considering that they are b o t h rendered using the same rendering parameters, i f the comparison shows that the two rendered images align well i n this frame, then it means the same rigid transformation aligns the local coordinate frame and the reference frame to the same accuracy as well. 41 4.2.4 P a r a m e t e r Space Since this 3D acquisition pipeline starts from raw data obtained by cameras, are there any intrinsic camera parameters need to be found i n the registration process? T h e answer is no. A l l parameters need to be recovered are the six parameters for rigid body transformation. A l t h o u g h the 3D scanner we use, the Digiclops camera, is actually a stereo vision system, which is made up of three C C D cameras, we don't need to consider any of the intrinsic camera parameters related to the vision system, such as the calibration factors caused by shape of the sensor or len distortions. A s already mentioned i n Chapter 3, the system is precisely calibrated before shipping and a l l the intrinsic stereo parameters that matter are fixed and can be treated as system constants [PoiOla, PoiOlb]. T h e Triclops S D K provides functions to transform points from the image coordinate system to object coordinate system which is the local sensor coordinate system mentioned i n previous subsection. Therefore, the image coordinate system is actually invisible i n the registration and integration processes and can thus be ignored. A l l parameters we need to find out i n the registration process are the six rigid motion parameters describing rotations and translations of the sensor coordinate frame relative to the reference coordinate frame. So that our registration problem is actually a six dimensional optimization problem w i t h each dimension i n the parameter space representing one rigid motion parameter. Representation of Rigid Motion Transformation Euler angles are very popular used i n defining 3D rotations. T h e basic idea of it is to factor a rotation matrix as a product of sub-rotations about the three major coordinate axes. T h e form of the factorization depends on the specified ordering of the three successive rotations, which i n t u r n is determined by the needs of the application. T h e approach of factoring rotation as Euler angles makes it very easy to compose the rotation matrix using O p e n G L A P I , which can be implemented by 42 basically issuing three standard O p e n G L commands i n sequence. There are various conventions for the sub-rotation ordering, such as "yzx" convention and "zxj/" con- vention. We use the xyz" (sometimes named "pich-roll-yaw") convention, which u is popular used i n the computer vision community and aviation engineering. U n der this convention, the rotation matrix is factorized as R = R (6 )Ry(6 )R (6 ), ( x x y z z where the subscripts x, y, z represent the axis around which the sub-rotations are conducted, and 6 ,9 ,9 x y z are values of the sub-rotation angles around the corre- sponding axis. A s explained previously, our registration process starts from 3 D point clouds, and the goal is to optimize the six degree rigid motion parameters between a local sensor coordinate system and the reference global coordinate system. A m o n g those parameters, three are for translations along three coordinate axis and three are for the rotations about the three coordinate axis. T h e position of the range finder i n the global world coordinate system which is abstracted as the origin of the local sensor coordinate system, is expressed by a translation vector, t = (t , t , t ) , T x the relative orientation of it can be described by three angles y z R(9 ,6 ,9 ), x y z while where the subscript x denotes the sub-rotation angle around the x-axis, similar meaning applies to the subscripts y and z. Therefore, a rigid motion of a 3D point p = ( p , P j , p ^ ) r e l a t i v e to the reference global coordinate system can be given by a linear T x / transformation T = Rp + t. However, this representation has a disadvantage coming from its physical interpolation which is the expression of a rotation followed by a transformation. Note that the transformation is now conducted i n the new, already rotated coordinate system. So that for an object point far away from the origin, small variations i n one or more rotation angles may result i n a large displacement of the object point, although the translation vector may be unchanged or very close to the previous one. T h i s is a undesired feature i n our work since the sampled surface points are almost always far from the origin of the global coordinate, which is somewhere close to the geometry center of the object. 43 I f the transformation is instead described by a rotation R a n d a preceding translation t (the rotation center is thus the translated origin), the effects of rotation and translation would be much better distinguished, and thus the above mentioned problem w i l l be avoided. We have therefore decided to adopt this slightly different transformation scheme, which can be mathematically expressed as T = R(p + i ) [NK99, LHS01]. T h e six rigid motion parameters {t ,t ,t ,9 ,6 ,6 ) x y z x y are those need to be recovered by the z registration process. For each range image we acquire i n its i n d i v i d u a l local sensor coordinate system, the six parameters have to be recovered i n order for the data to be aligned to the model sitting i n the global coordinate system. We offload these matrix transformations onto the graphics hardware using the O p e n G L A P I for further calculations. 4.2.5 Surface Consistency Measure Function Throughout the range data registration process, it is necessary to compare two surface representations to evaluate to which degree they could refer to the same physical surface. A numerical surface consistency metric is employed for this purpose, which acts as the objective function to be optimized by the numerical optimization algorithm. Different local surface alignment metrics have been explored by researchers [HubOl]. In our approach an overlapping region based surface distance function is used. For the registration process, we store the partial model that we have acquired so far i n a voxel grid. From this partial model we can easily compute a dense range map by rendering a single O p e n G L point for every occupied voxel. Similarly, every input range image can be converted into a range m a p seen from another view by simply re-projecting the individual points i n the range data set. A s explained i n the algorithm overview section, we match points along orthographic projection rays, where the pair wise distance is simply the distance between two corresponding points i n z direction (in camera space). A n intuitive alignment 44 metric is the sum of pair wise distances over the overlapping region i n the two rendered range images, mathematically represented as: consistencyi(T) = ] T d(p,C(p,T)) (4.2) pesa where function d() is the 3D Euclidean distance between two points d(p[,p~2) = \\p~i — P~2\\, T denotes the current transformation to be evaluated, a n d S 0 (a function of the current transformation T ) denotes the overlapping area between the two depth images, C is the correspondence function that yields the point q on the second surface, which is the projected corresponding point to point p under the current transformation T . Intuitively, the larger the accumulated distance value is, the larger the value of the surface consistency measure function w i l l be. Therefore, the transformation yielding the best registration w i l l be the one w i t h the smallest consistency measure value. However, there's a problem using this consistency measure function. Usually, the larger the overlapping area is, the more corresponding point pairs w i l l be taken into account, and thus the larger the consistency measure tends to be. W h i l e on the other hand, larger overlapping areas often indicate better alignment. B y taking this into consideration, we normalize this consistency measure by the number of pixel pairs that have contributions to the accumulated distance to get an improved consistency metric. , consistency \{T) consistency {T) = ^ ^ mN 2 (4.3) Where ||5 (T)|| is the number of matched point pairs i n the overlapping area. How0 ever, there is another problem arising from the second definition. W h e n there is no corresponding point found for a control point, the point is simply ignored; while when the two paired points are perfectly aligned, the distance between them is 0. T h u s the contributions of the two opposite cases to the consistency measure are identical! In order to encourage larger overlap regions and distinguish the above 45 mentioned two cases, we employ a threshold e into our consistency measure function: Ci i f | | S | | = 0; 0 consistency{T) = { \\ ^ )\\ S i f0 W °W ' < S T E d(p,c(p,T)) f \\S„(T)\\ 1 1 | | 9 | | < e ( ) 4-4 > W o\\ > eb where C i and C are simply two predetermined constants. These two constants and 2 the threshold have t o be carefully chosen so that the condition C l > \\S (T)\\ > \\S (T)\\ 0 is always true. Here, we choose 0 " (4 5) C = \\S (T)\\, and C\ = 1.5, since the condition 2 0 d(p,C(p,T)) < 1.0 always holds for depth buffer images whose values have been normalized to [0,1]. 4.2.6 Optimization Algorithm T h e discussions i n subsection 4.2.4 have made it clear that the registration problem confronted i n this project is a multidimensional optimization problem. M u l t i d i m e n sional here means the number of parameters we are optimizing is more than one. We choose the downhill simplex method [ P T V F 9 4 ] as the numerical optimization method for this task, which simply slides downhill i n cost i n a self-contained manner that almost does not put any limitations on the cost function. I n our application, the surface consistency measure function acts as the cost i n the downhill simplex algorithm. T h i s algorithm requires no derivatives at a l l , only function evaluations are needed. A l t h o u g h i n terms of the number of function evaluations it requires, downhill simplex method appears not very efficient,.it is often the most time efficient method when the cost function is not very complex. N a m e d "downhill simplex", this 46 method has nothing to do w i t h a simplex which is a geometric figure consisting of one more vertex than the number of dimensions of the space it spans. T h e simplex here refers to a figure i n the parameter space and the naming simply uses the deformation of a simplex i n the parameter space to figuratively describe the searching procedure. Generally only simplexes without degeneracy are considered valid. E a c h vertex of a simplex i n an N dimensional parameter space can be represented by an N-dimensional vector which when viewed outside of the downhill simplex algorithm and w i t h i n our 3D registration application is the representation of a six dimensional rigid motion transformation. A s same as most of multidimensional optimization algorithms, downhill simplex method requires an initial guess to get started. T h e n guided by its inner mechanism, the algorithm has the ability to go downhill until it converges at a local m i n i m u m . T h e starting point is actually N + 1 points spanning a simplex i n the N dimensional parameter space. T h e overall force pushing the simplex to a local m i n i m u m is the goal to replace the vertex w i t h the highest cost w i t h a vertex w i t h a lower cost. W o r k i n g iteratively, each iteration starts by a sorting procedure, which is to sort the vertices according to their cost. There are four types of de- formation actions that a simplex can take i n each iteration: reflection, extraction, contraction and multidimensional contraction. Before all actions except for the multidimensional contraction can really happen, a trial deformation of the simplex is conducted and the cost of the potential vertex has to be tested to see i f the trial deformation is feasible. O n l y the deformations that get r i d of the highest point can be really made out. T h e possible simplex deformation actions i n a two dimensional space are depicted i n Figure 4.3 and summarized bellow: • A reflection is to t r y the point that is reflective of the highest vertex against its opposite face (not necessarily two dimensional). Reflection reserves the inner volume of the simplex and thus maintain the degeneracy of the simplex. • A n expansion happens when the previous action reaches a lower cost point. 47 Figure 4.3: Possible outcomes for simplex deformation t r i a l steps: a ) A reflection. b ) A n expansion following the reflection. c ) A one dimensional contraction. d ) A multidimensional contraction [ P T V F 9 4 ] . It is the t r i a l deformation of expanding the simplex further along the same direction as the previous successful reflection or expansion, which makes the searching space expanding. • A contraction trial happens when the expansion can not be applied. T h e simplex contracts along the reversed direction of its preceding successful reflection, expansion or contraction. • A multidimensional contraction applies i f none of the above trials has found a 48 feasible point. T h i s is the action that the simplex contracts along a l l directions towards the lowest vertex, which may make the process converge too fast to a local m i n i m u m . Lensch et al [LHSOO] proposed a random displacement of the highest cost point w i t h i n the simplex using the simulated annealing method to replace the multidimensional contraction deformation. Although this approach avoids the fast local convergence problem i n most cases, it is too slow to be suitable for our interactive modeling project. Here we keep the multidimensional contraction step, as our starting points are close to the real optimization (refer to Chapter 3), the local convergence problem does not seem to affect the registration result much. To start the downhill simplex method, N +1 vertices are needed to initialize a simplex. In our approach, one of them is the i n i t i a l guess of the rigid motion, and the other N points can be carefully deduced from the single initial point. We simply double one of the N original parameters i n t u r n while keeping others unchanged, which can be mathematically represented as P , = Po + PQI where PQ is the initial guess of the rigid motion and P Q is its i th component, Pi is the i th vertex of the starting simplex. T o terminate the process, however, we have to take a l l the dimensions and thus a l l the simplex vertices into consideration. Press et al's termination criteria is used here, which is to end the iteration when the cost of a l l simplex vertices are so similar that further iterations won't improve it much or when the number of deformation trials has reached a predefined m a x i m u m number. T h e downhill method is then restarted immediately using this result as the i n i t i a l guess until the results of two successive downhill simplex optimization processes do not differ much. T h e pseudo code of this algorithm is listed bellow: 49 Initialize; While not terminate { sort all vertices according to their costs, do a reflection trial step, if (the trial cost < the lowest cost) do an expansion trial step, else if (the trial cost > the second highest cost) do a one diemnsional contraction trial step, if (the trial cost > the highest cost) do a multidimensional contraction. } T h e functionality of each deformation trial step is two-fold: one is to return the cost value of the potential point; the other is to replace the highest point w i t h the trial point i f the trial cost is lower than the highest cost, or to keep the previous shape and cost otherwise. For our interactive model acquisition application, the subject is moved slowly and in a consistent manner before a fixed range finder, so that we can use the transformation parameters obtained from the previous frame as the starting point and usually this approach leads to an acceptable solution according to the predetermined criteria. If the motion of the subject is too fast or not i n a consistent manner and the two successively obtained range images do not have good overlap or the rigid transformations for two successive range data sets differ too much, the registration process may fail, and the algorithm has to be reset to a new starting situation, (for example by roughly aligning the partial model by hand to the next image). 50 Chapter 5 Hardware Acceleration The current generation of graphics hardware is quite flexible i n its programma- bility, and its use is no longer limited to the traditional rendering and animation purposes [Kil02]. O u r work of using graphics hardware for registration purpose is inspired by the fact that the graphics hardware is specifically designed to out-strip the performance of conventional C P U hardware for the class of computations that graphics hardware is dedicated to. O u r model acquisition system accelerates the registration process by taking advantages of the contemporary graphics hardware functionality from two separate aspects. 5.1 The Off Loading Geometric Computations onto G P U contemporary graphics hardware today is comparable to C P U s i n design and transistor complexity, so that they are often called G P U s . T h e y are positioned to i m prove the performance of graphics computations which are highly parallel processes w i t h very piplineable algorithms. Consider that the registration process involves mainly geometric computations and that graphics hardware is designed to accelerate exactly those k i n d of computations, we can thus bring graphics hardware that traditionally was specialized for conventional 3D rendering and animation operations to bear on our registration task for accelerating the geometric computations. 51 We therefore off load the geometric primitives, here are the 3D point clouds output from the range acquisition module, onto the graphics board. T h e n , by rendering the 3D points into proper coordinate frames, the mathematical computations that carry out the 3D transformations are thus done through the rendering pipeline by specifically optimized hardware. T h e two range data sets used for the evaluation of the surface consistency measure function, which are two depth images, are generated as the rendering results. T h i s approach, on one hand saves C P U time for other tasks by off loading all geometric computations to the dedicated graphics board, on the other hand paralyzes the complex geometric computations which is the reason for developing graphics hardware by carrying out the m a t h calculations i m p l i c i t l y through the graphics rendering pipeline. 5.2 Accelerating Surface Consistency Evaluation Using NVIDIA Register Combiners Mechanism The most time consuming process for our 3D range registration algorithm is the multidimensional numerical optimization which includes several tens of function evaluations of the alignment metric. For each such evaluation, we need to use not only the depth image rendered from the current point set but also the depth image rendered from the reference geometry, which means we need to access two pieces of memory for the most demanding operation. Note that we already use graphics hardware to accelerate the registration process by generating the two range images in the first place. However, we can go further. If the two depth images can be somehow combined into one, then accessing one piece of memory would be enough to evaluate the alignment metric, that would help speed up the registration process. However, the traditional O p e n G L environment for combining pixel p r i m a r y colour w i t h several texture colours are lacking. 52 T h i s can be achieved w i t h the help of a specific piece of graphics hardware, the N V I D I A register combiners [NVI02] or a similar programmable hardware, which provides a flexible way of combining several parameters to arrive at a final pixel colour. Supporting at least two versatile general combiner stages and a special final combiner stage which is meant to perform a colour sum and fog computation, the register combiners sit i n the O p e n G L rendering pipeline as a rasterization processing stage operating i n parallel to the traditional O p e n G L texture environment. T h e Register Combiners mechanism allows a developer to specify a number of inputs, perform addition, multiplication or dot products, and "produce an output, which may i n t u r n be used as the input to a subsequent combiner. Inputs can be any of the texture colours, vertex colors, or even constants. W h e n the register combin- ers are used for rasterization purpose, the traditional texture environment w i l l be ignored and its state w i l l not be changed by the register combiner stage. Because the developer has control over the inputs to its register set and operations performed on them, the register combiner mechanism can be used to help implement some functions i n hardware for very sophisticated shading techniques. Some tasks requiring multi-pass rendering w i t h the traditional rasterization functionality can be implemented i n fewer passes by using the register combiners. T h i s is where our idea of combining the two depth images into one w i t h rendering pass comes from. A diagram depicting how a complex texture is composted by this specific piece of hardware using O p e n G L extension is shown i n Figure 5.1. Our accelerating strategy starts by encoding the depth information into a rendered color image by applying a one dimensional linear texture whose coordinate has been automatically generated as the distance to a predefined xy plane. A n d then map the texture coordinates to z direction of the object during the rendering process. After that, the absolute distances of the depth values between corresponding points of the reference depth map and the current one are calculated i m p l i c i t l y by using the register combiners texture compositing mechanism. T w o general register combiner 53 Figure 5.1: D i a g r a m of texture compositing strategy w i t h O p e n G L N V I D I A register combiners extension. [Kil02] stages and one global constant color are used i n our implementation. T h e mathematical expression of the distance function (see subsection Surface Consistency Measure) is the absolute difference between the two depth values stored i n the same pixel position of two different depth maps. To calculate the absolute difference, we take advantage of the "mux" operation provided by the register combiners, for which the resulted pixel value is determined by a special alpha channel value: mux{AB, CD) : = (SpareO[a] > 0.5)?AB : CD (5.1) Where AB, CD represent the two texture operands, and SpareO is one of the registers i n the register combiners' register set. In general combiner stage 1, an intermediate image is obtained by programming the combiners carefully. T h e pixel values 54 i n the intermediate image are generated as: {texl + (1 - tex2))/2, (5.2) which w i l l then act as the special alpha values mentioned above. Since we have 0 < texl,tex2 < 1 (5.3) the resulted pixel values fall into the same range as well. T h e n i n general combiner stage 2, the intermediate image is mapped to A = 2* {texl + (1 - tex2))/2 - 1 (5.4) C = - 2 * (texl + (1 - tex2))/2 + 1 and b o t h B a n d D are assigned constant 1. B y subscribing A,B,C,D into equa- tion 5.1, the pixel wised values of the absolute distances between texl a n d texl, \texl — tex2\, can be generated. Finally, after the mux operation is applied, a n d the pixel wised absolute differences are generated i n the rendering process, the surface consistency measure function can be evaluated by reading back this rendered image and summing up the per-pixel differences over the entire overlapping area. 55 Chapter 6 Model Integration In the previous chapters, we have shown how the information for building a 3D geometric model could be obtained by the range scanning and the registration processes in our model acquisation system. T h i s chapter addresses the problem of how this information can be properly merged together to b u i l d realism representation of the object. Since our system is aiming at an interactive working mode, realistic is not the only concern. T o give the user a quick enough feedback about how the model looks like currently so that the user can verify the model and detect holes during the acquisition process is a must. A l s o , the model acquired i n the interactive phase of the system is trading off accuracy for speed, therefore the uneven distribution of the registration inaccuracy may result a significantly different model although the individual range data set is accuratly registered. So i f desired, the same registration algorithm can later be used offline as a post-process method or some other global optimization strategies can be applied to produce a model w i t h a higher accuracy. 6.1 Merging Algorithm Traditionally, computer graphics uses triangles as rendering primitives, which approach has large costs on setup and rasterization for scences consist of large amounts of geometry. In order to decrease such costs and improve the efficiency for surface 56 rendering and display, the approach of point rendering which uses points as display primitives for continuous surfaces has been proposed [1W85]. G i v e n the goal of a live feedback of the model being scanned and reconstructed, some way must be found to display the constantly accumulated partial model at an acceptable frame rate. M o d e l i n g algorithms commonly used these days for display, simplification, and progressive transmission of meshes are too slow to be suitable for real-time use. For an in-depth survey of the existing merging algorithms, reger to paper [Cur97]. Usually, those approaches fall into two major categories: merge by operating only on point clouds [ H D D 9 2 , N A K 9 8 ] or merge using the connectivity information as + well [TL94, BL96]. We therefore adopt a 3D grid data structure and a point rendering approach which draws raw range sample points directly without reconstructing the polygonal surface mesh. T h i s method does not generate the surface model w i t h the highest quality that is possible these days, however it is faster and feasible to display the constantly increasing model on current hardware when the scanning process is on going, so that the user can see which areas of the model have been scanned and where there are holes need to be filled. B y using this approach, we sacrifice the quality of the 3D model under reconstruction i n order to get a reasonably fast merging and display rate for a live feedback. Some more accurate merging algorithms may be applied offline after enough range data have been obtained and roughly aligned to get a surface reconstruction w i t h a higher quality. 6.1.1 Voxel Grid Data Structure A s stated previously, our real-time scanning system adopts a 3D grid model and a point rendering method for interactive model integration and display. A n octree hierarchical data structure is used to achieve the compression of regions of empty space that is not occupied by any surface point. T h e whole octree represents the model we have so far aquired interactively. In each occupied octree cell, the 3D 57 location of the surface point i n the form of a vector of 3 components, the R G B colour data of the point, and an associated rough normal vector are stored. For the grid resolution, theoretically, a cell size on the order of the spacing of the range data output from the range scanner is optimal. Smaller grid cells use more memory, slowdown the rendering frame rate and be more sensitive to noise, yet it won't result i n a better quality limited by the resolution of the input range data from the range finding process. Considering the purpose for using point rendering, which is to trade off accuracy for an interactive frame rate, we adopt this o p t i m a l cell size or a cell size a little bit larger than this one. 6.1.2 Voxel M o d e l Integration Our voxel model merging method discards a l l the connectivity information and performs surface merging based on the point sample themselves only. T w o major tasks are tackled i n this process, one is to quatize the points to be merged to a 3D regular voxel grid and the other is to merge the quantized points into the grid stored i n the octree. Since surface data is accumulating rapidly, i n order to maintain an acceptable rendering rate, the number of primitives that must be rendered has to be restricted. Here we simply combine a l l points i n one grid to one point and discard all redundant, overlapping 3D data, which idea is similar to Rossignac and Borrel's mesh simplification method. A s to color, i n the range acquisition stage, we don't control lighting conditions at a l l , so that colors obtained for the same surface point may vary a little bit from view to view. T h e color information stored i n the acquired model is only roughly true. Therefore, we can randomly choose the color of one point falling into an octree cell as the color of this cell or use a mean color i n some sense. For the use of the associated rough normal vector, as specified i n Chapter [?], our registration process uses the depth image rendered from the current partial model as the reference frame, but since the model is not closed yet and also there might be some holes on the surface, part of the depth image may not be rendered from 58 the points on the front surface which is desired. Instead, it maybe the rendered image of points on the back surface seem through the existing holes or from the side that has not been closed yet. T h i s w i l l lead to significant registration errors for the reference frame may contain wrong geometric information. In order to get around this problem, we apply a back surface culling algorithm when rendering the reference from the partial model. T h i s approach makes sure that only those points on the front surface w i l l get rendred into the reference frame. A rough normal associated w i t h each surface point is maintained at each occupied voxel m a i n l y for a p p l y i n g this algorithm. We dynamically update this rough normal by the averaged viewing direction from which this cell has been seen so far. A s surface samples coming from multiple range images scanned from different point of views may map to the same voxel of the model, the rough normal associated w i t h each voxel changes as the point data accumulating into the octree storing the model. A l t h o u g h this rough normal is sometimes far from the real one, it works well enough for our back face culling algorithm. 6.2 Global Optimisation Usually, for multiple range view registration systems a gap may appear between the first and the last views due to the error accumulation between successive range images. T h i s is caused by the local nature of both the registration process and the geometric information available i n each individual view. In our interactive registration system, we are able to trade off accuracy for speed. So a slower offline post-process using the same strategies may be applied to reconstruct a final 3D model of a higher quality. Registration algorithms can be classified into two categories, pairwise registration and global registration. Pairwise registration algorithms register two data sets at a time while global registration algorithms perform registration process on multiple data sets simultaneously. O u r depth based range registration algorithm is 59 a pairwise registration technique, but its usage i n this system is not limited to pairwise registration only. A s mentioned previously, we use the rendered depth image of the current partial model instead of an individual range image as the reference, which actually consists of multiple range data sets. In this way we are actually registering one range data set to the combination of a l l data sets aligned to the model previously simultaneously. B u t still an uneven distribution of the registration error may result, some global optimisation strategy is needed to decrease the accumulated error due to the local nature of the registration process. In our model acquisition pipeline, after the interactive registration pass, we have already gained more preliminary information for the under constructed 3D model, which include • the user is confident that enough range data sets have been acquired by the range finding module from the interactive working mode (i.e. no holes left unfilled, and the model can be closed); • 3D registration results obtained by the interactive process are avaiable as very good initial guesses for the numerical optimization algorithm running i n the post-processing module; • a preliminary voxel model stored i n an octree is available. Based on the available information, the improvements of the 3D model can be approached from several different approaches: one is to use a better rigid motion prediction obtained from the interactive registration process as the starting points of the numerical optimization, which not only decreases the number of iterations required for registering one image speeding up the numerical process, but also i m proves the registration accuracy; the second approach is instead of registering the current range image i n sequence, we can do a global optimization step by taking the 60 full model, deleting all point contributed from one view, and then re-registering that view into the model. B y applying this global registration strategy, the individual range data set is sort of registered to all the other data sets simultaneously. A s views are randomly picked and treated like this, we can gradually improve the registration of all views. Finally, we can use a voting scheme to determine outliers i n the geometric model (e.g. erroneous data provided to us by the stereo algorithm i n the range finding module). We do this by assuming that only those voxels containing surface points contributed from a certain m i n i m u m number of different views (2-4 usually) yield good results for us. G i v e n all the preliminary information, the post-process improving the model from the above mentioned three approaches can be r u n completely automatically. 61 Chapter 7 Experimental Results Before describing the test results, some practical issues concerning the implementation are discussed. Range Data Acquisition: we use the Triclops library which is the software development kit shipped w i t h the Digiclops. T h e output from this process is the corresponding 3D point cloud ready for registration using rendering pipeline. Test system: our model acquisition system is implemented on a system running Redhat L i n u x 7.3, w i t h Intel Pentium 41.6GHz processor and a GeForce 3 G P U . T h e first part is the scanning process including the generation of 3 D point cloud, which operates at 3-4Hz, and the later one for registration and integration operates at about 1Hz on average. T h i s could be accelerated further by using newer hardware or a dual-processor machine (since the stereo matching and the registration part are independent of each other). Model Integration: for a quick rendering, we use a voxel model stored i n an octree model for the interactive mode. After a range image has been registered to the global coordinates system its points are merged into this octree. 62 Figure 7.1: Experiment result of the synthetic data registration. A picture of surface points from two different views registered to the global coordinate system. 7.1 Synthetic Range Image Test for Registration We tested our range based 3D-3D registration algorithm on a pair of synthetic range images created from a polygonal geometric model orthogonally mapped on the x — y plane. T h e second range image is generated from the same model transformed by a rotation vector of (—5.0,15.0,5.0) (rotation parameters are i n degrees) followed by a translation of 5 percent of its boundaing box size along a l l three major axes. B o t h original range data sets consist about 17,000 points. T h e mergered partial model is shown i n Figure 7.1. Here, we use different colors to distinguish between 63 pixels coming from different scans. T h e z-fighting of the two scans indicates a good alignment. For the computation of per-pixel difference, b o t h implementations using hardware acceleration and calculating in software directly have been tested and compared. To demonstrate the algorithm's ability to converge, we adopted various starting guesses which are a l l not very close to the true value. T h e standard deviation for a l l the six rigid motion papameters have been calculated, among which the standard deviations for rotations are less than 0.5 degree a n d for translations are less than 2 percent of the bounding box size of the subject. Considering the window size used for calculations, the results are satisfying. F r o m these experiments, we can make the following observations: (1) T h e algorithm is capable of aligning range images accuratly. (2) T h e fully hardware accelerated implementation has a lower accuracy relative to the implementation that computs the per-pixel difference i n software. T h i s is because the G P U we use has only eight bits for each color chanel-the most recent generation of graphics hardware should not have these problems. (3) T h e fully hardware accelerated implementation is faster althought there is not too much difference. A s mentioned above, the starting position for this test is not close to the true value, the registration would be faster for better i n i t i a l guesses. 7.2 Real Model Acquisition Result We applied the model acquisition system to a textured ceramics m o d e l (Figure 7.2. D u r i n g the acquisition process, the subject is moved slowly relative to the Digiclops, the newly scanned data set is registered to the partial model, and the updated model is rendered to the screen at the same time. For better control without occluding the object, we actually put the subject onto a plate and rotate the plate instead of the subject itself or the camera. After it is rotated for a whole circle, a l l the side views have been captured, the subject is then rotated about the axis that is roughly normal to the previous axis to get the data at the bottom of the model which can 64 not be acquired by the circular scanning. B y watching the immediately updated model, the user can find holes where no data has been acquired, and do more scans of that area to fill the holes. T h e movements have to be slow enough that successive scans overlap each other i n a not too small area so that the registration result from previous views can be used as the starting point for the registration of the current view. Figure 7.2: A picture of the ceramics model unicorn. 65 Figure 7.3 shows several results of the 3D model acquired from this subject. T h e top image shows the result for a voxel grid of resolution 100 x 100 x 100, while the bottom left image has been generated w i t h a resolution of 200 x 200 x 200. T h e bottome right one includes the global optimization step. A s can been seen, the stereo algorithm combined w i t h our registration method works quite well. There are, however, problems w i t h very sharp features, such as the horn of the unicore. Here the stereo data is very sparse, so that it is hard to capture the horn completely. In addition, the sparseness of the data also causes problems w i t h the alignment. 66 Figure 7.3: Results of the real world object model acquisition. B o t t o m two: a low (left) and a high (right) resolution results without global optimization. Top one: high resolution model w i t h global optimization. 67 Chapter 8 Conclusions and Future Work In summary, we have presented a 3D geometric model acquisition pipeline which consists of three modules: a range scanning module, a 3D-3D registration module and a model integration module. The system is aimed at an interactive working mode, so that efficiency is the major concern for each of the three modules. For the range scanning, we choose the commercialized stereo system, the Digiclops, as the range finder, which works i n real-time. For the 3D-3D registration process, we adopt a depth basesd I C P algorithm, which takes advantages of the modern graphics hardware for geometric computations and thus runs faster than common I C P variants. T h e fast converging downhill simplex method is used for numerical optimization. A n important contribution of this system is a hardware accelerated approach which combines two images into one and thus greatly reduces the necessary amount of memory access for the demanding computation task of surface consistency evaluation. For the model integration module, we use a volumatric model which is fast to add new data on and fast to render as well. T h e major effort, however, has been i n the acceleration of the 3D-3D registration process using graphics hardware. T h e results that we obtained using our system are quite encouraging, i n the future, we would like to further improve the quality of the final model, for example by performing a better color filtering to reduce shading irregulations. We also hope 68 to make use of a multi-processor machine to improve the performance to about 5 frames per second. At that point, the initial guess for the new positions will be more accurate (due to faster update rates), so that the increase in performace will also manifest itself in an increased precision. 69 Bibliography [Bes88] P. Besl. Active optical range imaging sensors. Machine Vision and Applications, 1, 1988. [BL93] G . Blais and M . D . Levine. Registering multiview range data to create 3d computer objects. Technical Report T R - C I M - 9 3 - 1 6 , Center for Intelligent Machines, M c G i l l University, Montheal, Quebec, Canada, November 1993. [BL95] G . Blais and M . Levine. Registering multiview range data to create 3d computer objects. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(8), 1995. [BL96] B.Curless and M . Levoy. A volumetric method for building complex models from range images. In Proceedings of SIGGRAPH, [BM92] 1996. P . Besl and N . M c K a y . A method for registration of 3-d shapes. Transactions on Pattern Analysis and Machine Intelligence IEEE (PAMI), 14(2):239-256, 1992. [BS97] R . Benjemaa and F . Schmitt. Fast global registraion of 3d sampled surfaces using a multi-z-buffer echnique. In Proceedings of 3-D Digital Imaging and Modeling, 1997. [CHC98] C . C h e n , Y . H u n g , and J . Cheng. A fast automatic method for registration of partially-overlapping range images. In Proceedings of 3-D Digital Imaging and Modeling, 1998. [CHC99] C . C h e n , Y . H u n g , and J . Chen. Ransac-based darces: A new ap- proach to fast automatic registration of partially overlapping range i m ages. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11):1229-1234, 1999. [Chu96] C S . Chua. 3d free-form surface registration and object recognition. Internation Journal of Computer Vision, 17:77-99, 1996. 70 [CM91] Y . C h e n and G . Medioni. Object modeling by registration of multi- ple range images. In Proceedings of IEEE Conference on Robotics and Automation, 1991. [Cur97] B r i a n Curless. New Methods for Surface Reconstruction from Range Images. P h D thesis, Stanford University, 1997. [Cyb] [DWJ97] Cyberware. http://www.cyberware.com. C . D o r a i , J . Weng, a n d A . J a i n . O p t i m a l registration o f object views using range data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(10), 1997. [DWJ98] C . Dorai, J . Weng, and A . Jain. Registration a n d integration of multiple object views for 3d model construction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 1998. [ea95] K . Higuchi et a l . B u i l d i n g 3-d models from unregistered range images. Graphics Models and Image Processing, 57(4):315-333, 1995. [ELF97] D . W . Eggert, A . Lorusso, and R . B . Fisher. E s t i m a t i n g 3-d rigid b o d y transformations: A comparison of four major algorithms. Machine Vision and Application, 9(5/6), 1997. [FH86] O . Faugeras a n d M . Hebert. cating of 3-d objects. T h e representation, recognition, a n d lo- International Journal of Robotic Research, 5(3), 1986. [FHM+83] O . Faugeras, B . Hotz, H . M a t h i e u , Z . Zhang T . V i e v i l l e , P . F u a , E . Theron, L . M o l l , G.{ Berry, P. Vuillemin, a n d C . Proy. R e a l time correlation-based stereo: A l g o r i t h m , implementations and applications. Technical Report RR-2013, I N R I A , 1983. [GRB94] G . G o d i n , M . R i o u x , and R . Baribeau. Three-dimensional registration using range and intensity information. In Proceedings of SPIE: Videometric III, volume 2350, 1994. [GTK92] A . Gruss, S. Tada, and T . Kanade. A vlsi smart sensor for fast range imaging. In Proceedings of IEEE International Conference on Intelligent Robots and Systems, 1992. [HDD+92] H . Hoppe, T . DeRose, T . Duchamp, J . M c D o n a l d , a n d W . Stuetzle. Surface reconstruction from unorganized points. I n Proceedings of SIGGRAPH, 1992. 71 [HRG97] G . T a u b i n H . Rushmeier and A . Gueziec. A p p l y i n g shape from lighting variation to b u m p map capture. I n Proceedings of Eurographics Rendering Workshop, 1997. [HubOl] D . F . Huber. A u t o m a t i c 3d modeling using range images obtained from unknown viewpoints. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, pages 153-160, M a y 2001. [Inc] 3 D V Systems Inc. http://www.3dvsystems.com/. [JH97] A . Johnson and M . Hebert. Surface registration by matching oriented points. In Proceedings of 3-D Digital Imaging and Modeling, 1997. [JK97] A . Johnson and S. K a n g . Registration and integration of textured 3-d data. In Proceedings of 3-D Digital Imaging and Modeling, 1997. [Kil02] M . J . K i l g a r d . Graphics hardware functionality for geometric computations w i t h opengl. Circa, 99, 2002. [LHS00] H . Lensch, W . Heidrich, and H . - P . Seidel. A u t o m a t e d texture registration and stitching for real world models. In Proceedings of Pacific Graphics '00, October 2000. [LHS01] H . Lensch, W . Heidrich, and H . - P . Seidel. Silhouette-based algorithm for texture registration and stitching. Graphical Models, 63(4):245-262, J u l y 2001. [LPC+00] M . Levoy, K . P u l l i , B . Curless, S. Rusinkiewicz, D . Koller, L . Pereira, M . Ginzton, S. Anderson, J . Davis, J . Ginsberg, J . Shade, and D . Fulk. T h e D i g i t a l Michelangelo Project: 3D Scanning of Large Statues. In Proceedings of Siggraph, pages 131-144, J u l y 2000. [LRG00] U . Labsik, S. R o m a n , and G . Greiner. D e p t h based registration of freeform surfaces. In Proceedings of the 2000 Conference on Vision and Modeling and Visualization, pages 121-128, November 2000. [1W85] M . levoy and T . W h i t t e d . T h e use of points as display primitives. Technical Report T R 85-022, 1985. [Mat97] Y . Matsumoto. A portable three-dimensional digitizer. In Proceedings of 3-D Digital Imaging and Modeling, 1997. 72 [MB93] J . Maver and R . Bajcsy. Occlusions as a guide for planning the next view. IEEE Transactions on Pattern Analysis and Machine intelligence, 15(5), 1993. [MBR+00] W . Matusik, C . Buehler, R . Raskar, S. Gortler, a n d L . M c M i l l a n . Imagebased visul hulls. In Proceedings of SIGGRAPH, [MSY96] T . Masuda, K . Sakaue, and N . Yokoya. 2000. Registration and integration of multiple range images for 3-d model construction. I n Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1996. [NAK98] M . B e r n N . A m e n t a and M . Kamvysselis. A new voronoi-based surface reconstruction algorithm. In Proceedings of SIGGRAPH, [Neu97] 1998. P. Neugebauer. Geometrical cloning of 3d objects v i a simultaneous registration of multiple range images. In Proceedings of International Conference on Shape Modeling and Applications, M a r c h 1997. [NK99] P . J . Neugebauer and K . K l e i n . Texturing 3d models of real world objects from multiple unregistered photographic views. Computer Graphics Forum, 18(3):245-256, September 1999. [NVI02] N V I D I A Corporation. NVIDIA OpenGL Extension Specifications, 2002. Available from h t t p : / / w w w . n v i d i a . c o m . [PoiOla] P o i n t G r e y Research. DIGICLOPS Installation Guide and Camera Con- trol API Command Reference, 2001. [PoiOlb] P o i n t G r e y Research. TRICLOPS Stereo Vision Software Development Kit (SDK), 2001. [PTVF94] W . Press, S. Teukolsky, W . Vetterling, and N . Flannery. Numerical recipes in C : the art of scientific computing. Cambridge U n i v . Press, 2nd ed. edition, 1994. [Pul99a] K . P u l l i . M u l t i v i e w registration for large data sets. In Proceedings of 3-D Digital Imaging and Modeling, 1999. [Pul99b] K . Pulli. Surface Reconstruction and Display from Range and Color Data. P h D thesis, University of Washington, 1999. 73 [RCM+01] C . Rocchini, R Cignoni, C . M o n t a n i , R P i n g i , and R . Scopigno. A low cost 3d scanner based on structured light. In Proceedings of Eurographics, September 2001. [Res] Point Grey Research, http://www.ptgrey.com/. [RHHL02] S. Rusinkiewicz, O . Hall-Holt, and M . Levoy. Real-time 3d model acquisition. In Proceedings of Siggraph, pages 438-446, J u l y 2002. [RL01] S. Rusinkiewicz and M . Levoy. Efficient variants of the icp algorithm. In Proceedings of Third International Conference on 3D Digital Imaging and Modeling, 2001. [SA98] I. Stamos and P. A l l e n . Interactive sensor planning; In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1998. [Sim96] D . Simon. Fast and Accurate Shape-Based Registration. P h D thesis, Carnegie M e l l o n university, 1996. [SKSI90] Y . Sakaguchi, H . K a t o , K . Sato, and S. Inokuchi. Generatio of 3-d models based on image fusion of range data, November 1990. [SKSI91] Y . Sakaguchi, H . K a t o , K . Sato, and S. Inokuchi. A c q u i s i t i o n of entire surface data based on fusion of range data. IEEE Transactions, 74(10):3417-3422, October 1991. [SM92] F . Stein and G . Medioni. Structual indexing: Efficient 3-d object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):125-145, 1992. [SNN96] M . Watanabe S. Nayar and M . Noguchi. Real-time focus range sensors. IEEE Transactions on Machine Pattern Analysis and Machine Intelligence, 18(12), 1996. [TL94] G . Turk and M . L e v o y Y . Zipped polygon meshes from range images. In Proceedings of SIGGRAPH, [VA86] 1994. B . C . Vemuri and J . K . Aggarwal. 3-d model construction from multiple views using range and intensity data. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 435-437, 1986. 74 [Wei97] S. Weik. Registration of 3-d partial surface models using luminance and depth information. In Proceedings of 3-D Digital Imaging and Modeling, 1997. 75
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- An interactive 3D geometric model acquisition and registration...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
An interactive 3D geometric model acquisition and registration system Liu, Yushuang 2003
pdf
Page Metadata
Item Metadata
Title | An interactive 3D geometric model acquisition and registration system |
Creator |
Liu, Yushuang |
Date Issued | 2003 |
Description | The acquisition of geometric information from real world objects has now become a major way of modeling complex scenes and environments. Unfortunately, most optical methods for geometric model acquisition require the combination of partial information from different view points, in order to obtain a single, coherent model. This, in turn, requires the registration of partial models into a common coordinate frame, a process that is usually done offline. As a consequence, holes due to undersampling and missing information often cannot be detected until after the registration process. This thesis tries to solve the 3D registration problem using graphics hardware to achieve an interactive speed, so that online next-best-view planning and hole detection during the scanning process can be conducted. The system implemented realizes effective model reconstruction and refinement by interactively using a stereo camera setup and a hardware accelerated depth-map based registration algorithm. The emphasis is put on the hardware accelerated 3D range registration algorithm which can be categorized as a variant of the well known Iterated Closest Point (ICP) algorithm. This work contributes to the field of 3D geometric model acquisition in that it proposes a fast method that leverages the function and structure of modern graphics hardwares. Technically, graphics hardware is used i n two operations. The first is to compute a rigid transformation using modern graphics rendering pipeline. The second is to combine two rendered depth images to compute quickly the absolute difference of z-values of each projected points in the overlap region in the underlying two depth images, where the computation is used to obtain an error metric for a specific candidate transformation during numerical iteration. This error metric is based on depth images which are rendered by point-based rasterization. This system currently performs roughly one registration per second, and is therefore fast enough for on-the-fly evaluation by the user. Given more time, the same method is also capable of producing full geometric models at higher quality. |
Extent | 5010896 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-10-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051696 |
URI | http://hdl.handle.net/2429/14176 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2003-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2003-0265.pdf [ 4.78MB ]
- Metadata
- JSON: 831-1.0051696.json
- JSON-LD: 831-1.0051696-ld.json
- RDF/XML (Pretty): 831-1.0051696-rdf.xml
- RDF/JSON: 831-1.0051696-rdf.json
- Turtle: 831-1.0051696-turtle.txt
- N-Triples: 831-1.0051696-rdf-ntriples.txt
- Original Record: 831-1.0051696-source.json
- Full Text
- 831-1.0051696-fulltext.txt
- Citation
- 831-1.0051696.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051696/manifest