PHOTOMETRIC STEREO VIA LOCALITY SENSITIVE HIGH-DIMENSION HASHING by LIN ZHONG B . S c , Zhejiang University, 2002 A THESIS S U B M I T T E D IN PARTIAL F U L F I L L M E N T OF THE REQUIREMENTS FOR T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES ( C O M P U T E R SCIENCE) T H E UNIVERSITY OF BRITISH C O L U M B I A A p r i l 2005 © L i n Zhong, 2005 Abstract Photometric stereo is a shape-from-shading method for recovering threedimensional surface orientation information from two-dimensional images w i t h differing illumination but the same viewing geometry. According to the former photometric stereo methods, the inconvenience of calibration and the cost of searching for the gradient of the best match between reference images' brightness and target images' brightness remain as the major problems in this area. Lately, the increasing interest i n geometry reconstruction by using photometric stereo has led to a new method for improving the original photometric stereo method. T h i s approach is based on the assumption that two points with the same surface property should reflect the same light and show the same brightness. T h i s new method largely simplifies the traditional calibration experiment by getting the reference object's and target object's information at the same time. Moreover, this new convenient method provides greater opportunity for photometric stereo to be applied to practical realtime robot vision. In this thesis, we extend the new photometric stereo method of Hertzmenn & Seitz that uses multiple images of an object together w i t h a calibration object. For each point i n the registered collection of images, we have a large number of brightness values. Photometric stereo finds a matching collection of brightness values from the calibration object and overdetermines the surface normal. W i t h a large number of images i n high dimensions, finding similar brightnesses becomes costly. To speed up the search, we apply locality sensitive high dimensional hashing ( L S H ) to compute the irregular target object's surface orientation. T h e experimental results of a simplified photometric stereo experiment show consistent results i n surface orientation. L S H can be implemented very efficiently; and it offers the possibility of practical photometric stereo computation w i t h a large number of images. Finally, we present the idea that L S H should be applied in a variety of computer vision areas. Contents Abstract ii Contents iii List of Figures v Acknowledgements 1 Introduction 1.1 2 1 Photometric Stereo 2 6 Background 2.1. 2.2 Shape from Shading and Reflectance M a p 7 2.1.1 Diffuse Reflection 7 2.1.2 Specular Reflection 8 2.1.3 Phong Model 9 2.3 9 Photometric Stereo 2.2.1 3 vii Surface Curvature 11 Non-Lambert.ian Photometric Stereo Modeling Implementation and Experiments 12 14 3.1 Experiment Setting 15 3.2 Photometric Stereo from High-Dimension Images 16 iii . 3.2.1 20 3.3 Match From Reference linages for Target Images 22 3.4 High-Dimension Locality Sensitive Hashing 23 3.4.1 Notation 23 3.4.2 The Algorithm 25 3.5 4 Least Squares Fitting of Ellipse Experimental Results 31 Conclusion 51 Bibliography 53 iv List of Figures 3.1 Video images with both the reference and target objects 15 3.2 T h e L S H hash tables structure 27 3.3 Initialization of high-dimensional hash table 29 3.4 Approximate Nearest Neighbor query answering algorithm 29 3.5 Three images of ball under different illustrations 32 3.6 T h e boundary contour of the reference ball 32 3.7 Three images of doll face under different illustrations 33 3.8 The gradient estimation of the ceramic doll face 33 3.9 Gradient estimation of the doll face from different angles 34 3.10 Eight images of a billiard ball under a. hand-held moving light, source 35 3.11 The boundary contour of the reference billiard ball 35 3 . 1 2 Eight input images of the target elephant. 36 3.13 T h e gradient estimation of the target, red elephant 37 3 . 1 4 Gradient Aspect and Needlemap estimation of elephant's side . . . . 38 3 . 1 5 Gradient slope map2 39 3 . 1 6 Aspect and needlemap of front side of elephant estimation 40 3.17 Gradient slope map3 41 3.18 Back side of elephant's aspect and needlemap estimation 42 3.19 T h e gradient estimation for a painted bottle 44 3.20 The 3D reconstructed model of the green bottle 45 3.21 Convergence performance with varying image numbers 46 v 3.22 Convergence performance with varying the number of hash tables . . 47 3.23 T h e cosine of the angle between 49 3.24 Comparison between bottle reconstructed and laser scanned 50 vi Acknowledgements I would like to thank my supervisor D r . James J . Little and D r . Robert W o o d h a m for all their guidance, patience and leadership. W i t h o u t their effort, this thesis would not have been possible. Thanks to D r . D a v i d Lowe for giving me suggestions and being my thesis's second reader. for providing me w i t h image data. I am also grateful to D r . Steven M . Steitz I thank my family and my friends, who have encouraged and supported me in my work. LIN ZHONG THE UNIVERSITY OF BRITISH COLUMBIA April 2005 vii Chapter 1 Introduction V i s i o n is our most powerful but also most complicated sense. A person sees seamless images through the vision apparatus of the eyes and the brain, which is the amazing function we call vision. H u m a n vision receives seamless images, including the details of shape, colour, position and motion, without any noticeable breaks while they are being updated. In other words, people perceive a coherent three-dimensional world with its invariant properties. These properties, i n some way, are sensed by human directly. Basically, the changes of brightness and colour are the most important information for human eyes. H u m a n vision can also immediately distinguish the difference among objects based upon previous experience. However, there also exist some limitations for human vision, such as change blindness, limited spectral sensitivity. misconception and Change blindness means that human vision adapts w i t h difficulty large changes in a scene and misunderstand real objects. It often happens as the result of eye movements, image flips, movie cuts and other disturbances. Misconception means that human vision is influenced by the shape of an object. Limited spectral sensitivity refers to the human eye's limited wavelength range, which ranges only from 400nm to 700nm. As for computer vision, received information consists of mathematically characterized two-dimensional images. T h e explicit challenge of computer vision is how 1 to acquire a three-dimensional description from two-dimensional images. Moreover, how to robustly produce the result is another considerations. Compared to human vision, human judgment is difficult to quantify for computer vision. For example, what information about scenes can be extracted from an image using only basic assumptions about physics and optics? W h a t kinds of computations are required to be performed? How are real-world models and knowledge represented and used? These are the key questions of computer vision. For photometric stereo, how to solve the above questions would help photometric stereo get the 3D information from the 2D images correctly and efficiently. T h e information we can get directly from 2D images are the brightness and its distribution. Brightness is determined by the light source's radiance, the object's geometry, the object's irradiance factor and also the camera's performance. Physical-based computer vision pays attention to the relationship between geometrical structures and radiometry. Recovering the three-dimensional object's shape from a two-dimensional image is an important problem — shape from shading — i n computer vision. In terms of the computation of shape from shading, the method of using multiple 2D images, obtained from the same viewpoint but under different illumination conditions, has been presented as an effective way to obtain additional local radiometric constraints. T h e basic principle of this method is to use multi- ple images to overdetermine the solution locally; further, this method strengthens robustness by permitting local validation of the radiometric models. 1.1 Photometric Stereo T h e photometric stereo method is a shape-from-shading technique to recover the orientation of surface patches from a number of images taken under different lighting conditions [14], which has been implemented i n different experimental environments and has also produced consistently satisfactory results. T h e traditional photometric stereo method is based on physical-based modelling, which follows the principles 2 of optics. It works well for a regular object's computation; however, when it encounters complicated irregular objects, it fails to build up their physical models. T h e traditional photometric stereo technique is not enough in such cases. Visual scientists often use a system of units (photometric units) that scale the physical power (radiometric units) by matching the spectral sensitivity of the eye. Optics determines that the surface normal vector relates the object's geometry to its image's irradiance. In an orthographic projection, the viewing direction and hence the phase angle is constant for all the surface elements. Thus, for a fixed light source and geometry, the ratio of scene radiance to scene irradiance depends on the surface normal vector. B u t the equation cannot be inverted locally because image brightness provide only one measurement, whereas surface normal vector has 2 degrees of freedom ( D O F ) . Under this circumstance, W o o d h a m [27, 28] developed the photometric stereo method by using more than two local brightness measurement i n order to overdetermine the 2 D O F of the surface orientation. In the image irradiance equation, the formulation assumes that each surface element receives illumination only directly from the light source (s). T h i s is correct when scenes consist of a single convex object. Actually, the surface radiance is from the light source, but it is impacted by other factors, such as shadows, inter-reflections and other non-local radiance effects. In our work, we ignore those factors and assume the object's surface is made of the same material. Surface shading provides information of the surface geometry. However, if illumination, camera geometry, or reflectance varies the appearance of the object would be different. A s a result, without knowing the way to build the model — the relationship of the contribution of each factor — it is very difficult to compute the shape. To address this problem, we introduce the theory that two points with the same surface orientation reflect the same light toward the viewer. T h i s is the basic idea of shape by example, i n which we search the same appearance points and assume their surface normals are identical. 3 T h e motivation of this thesis is to overcome the limitation of high computational demands i n example-based photometric stereo when large numbers of images are used. Under the assumption that two points w i t h the same surface property should have the same appearance i n an image, we adapted the method of [11] by placing the reference object and target object i n the same image, which means that calibration of the camera or lighting environment is unnecessary i n this case. T h e reason for using this method is to deal with cases when the target object is hard to move. In that situation, instead of using the former calibration method [28], putting the reference object beside the target object would allow for flexibility for the photometric stereo. Following this setup, we acquired multiple images for both objects. Each pixel on the object has d brightness values from d images. T h e n the problem becomes finding the best match between the brightness vector of the reference object and the brightness vector of the target object; this means searching in a high-dimensional space. Some techniques have already been applied for these purposes. T h e first method is the lookup table [28]; the total size of the lookup table is (2 ), which depends on the dimension d and the number of bits b into which brightbd ness is quantized. T h i s is prohibitively large when d — the number of the images — is large. Also the table is sparse, and time-consuming weighted interpolation is also needed to fill the empty entries i n the table. Hertzmann & Seitz [11] used the A N N (approximate nearest neighbour) method to solve the problem of matching. It solves the problem well, but when the dimension exceeds 10 — 20 [10], it is also highly time consuming. However it depends on the degree of accuracy required. T h e prohibitive computation means the photometric stereo technique is not available for real-time applications. T h e practical benefits of real-time implementation are the ability to integrate photometric stereo and motion, and to replace the harmful laser range sensor for medical shape detection. Recently, a new method, locality sensitive hashing [10], has been introduced in database applications, usually in the context of high-dimensional similarity searching. It keeps the speed of hashing, and adds i n 4 the property of clustering similar points together. We implemented a robust system analyzing high dimensional images. O u r approach simplifies the traditional shape from shading experiment by avoiding calibration as in [28]; moreover, it can rapidly and accurately estimate the surface orientation of irregular objects from a large database of example images v i a LSH. We applied the algorithm to several examples: Lambertian and non-Lambertian cases, multiple reference objects, different surface materials. We also compared the results w i t h the exhaustive search method and the laser scanned images to show the accuracy of the method. Moreover, i n this thesis, we analyzed computational complexity to illustrate the efficiency of this algorithm. In this thesis, we described the method in the following order. 2 introduced related work in computer vision. Chapter Chapter 3 demonstrated the ex- periment's details, including the environmental setting, and explained the speedup technique used in this thesis. T h e final chapter provided the results of our method and conclusions. 5 Chapter 2 Background Given a fixed viewpoint and a certain illumination, objects w i t h various surface materials, following optic laws, will each have a distinctive appearance. A l l of these constraints determine the colour and intensity of the image of an object. T h e local reflectance character depends not only on the irregular geometry of the object but also on the reflectance surface material of the objects. We also need to consider the inter-reflections and casting shadows. Moreover, the wavelength of the light must also be taken into consideration for the reflectance model building. How to model all those factors to build up a general reflectance model exactly is very complicated and difficult. However, research on reflectance models is still ongoing. Book [26] collects quite a few introductions to physical modeling. After thinking about these physical constraints on reflectance, H o r n [12] first made a great observation that image irradiance can be expressed as a function that only depends on surface orientation, which is suitable for many practical image applications. Horn's research introduces the idea of the reflectance map and points out a new formulation for the problem of shape from shading. T h e reflectance map makes explicit the relationship between surface orientation and image brightness. It is a representational tool used i n developing methods for recovering surface shape from images. 6 2.1 Shape from Shading and Reflectance M a p Shape from shading is a method for determining the shape of a surface from its image. Following standard geometry, the surface is represented as z = f(x, y) i n a left-handed Euclidean coordinate system, when the viewer looks i n the positive Z direction. T h e image projection is orthographic and i n the X F - p l a n e . Thus, the surface gradient (p, q) is denoted as df(x,y) P = ^ x - ' df(x,y) q Z = ^ y — the surface normal vector of that point (x,y) is \p,q, — 1]. We could also simply use [p, q] to represent the surface orientation. T h e image irradiance equation is E(x,y) = R(p,q), where E{x,y) is the image irradiance and R(p,q) is the reflectance map [14, 16]. T h e reflectance map determines the image brightness as a function of the surface orientation. T h e n the shape-from-shading problem is defined as computing for a smooth surface z = f(x,y). For a single image, shape-from-shading problems are typically solved by exploitation of a priori constraints on the reflectance map R(p, q) [5, 15]. In the following we will discuss the different reflectance map models. 2.1.1 Diffuse Reflection W h e n the surface is microscopically rough, the light rays will reflect and diffuse i n many different directions. Reflection from rough surfaces such as clothing, paper, and asphalt roadways leads to a type of reflection known as diffuse reflection. L a m bertian reflection is the ideal case of diffuse reflection. For the ideal Lambertian material surface, which is illuminated by a distant light source, appears equally bright from all viewing directions and reflects all incident light, absorbing none. In Lambertian reflection, the image irradiance is proportional to the cos(0j), where the Oi is the incident light angle. Consider a source of radiance E illuminating 7 a Lambertian surface. T h e scene radiance is L = - £ c o s 6 > j , f o r 0* > 0, 7T Taking dot products of the corresponding unit vectors, we obtain cos Oi — 1+ . pp + qq s ._ s A = T h i s gives us a good idea of how brightness depends on surface orientation. T h i s result is called the reflectance map, denoted by R(p,q) [14]. T h e reflectance map depends on the properties of the object's surface material and the distribution of light sources. T h e radiance cannot be negative, so we should, strictly speaking, impose the restriction 0 < Oi < n/2. T h e radiance will be zero for values of Oi outside this range. p/ N [ P , q ) 1 + PsV + QsQ Jl+p>+q'Wl+Pi+<l'i For other diffuse reflection cases, the radiance will also depend on the Albedo — the material's bidirectional reflectance factor. It varies spatially, independent of the gradient. 2.1.2 Specular Reflection Reflection off smooth surfaces such as mirrors or a calm body of water leads to a kind of reflection known as specular reflection. Each individual light ray of the bundle follows the law of reflection. If the bundle of light rays is incident upon a smooth surface, then the light rays reflect and remain concentrated i n a bundle upon leaving the surface. Specular surface reflects all incident light i n a direction that lies in the same plane as the incident ray and the surface normal. Specular reflection is also a function of the light incidence angle 0. If 0 = 0 ° , there is almost no specular reflectance. W h e n 0 = 80°, specular reflectance is high. A full specular reflectance function is the Bi-directional Reflectance Distribution Function ( B R D F ) . For many materials the specular coefficient is approximately constant. Helmholtz Reciprocity 8 is one of B R D F ' s important properties: fr(&i,Qo) = fr(6o,6i) where #j is the incident angle and the 9 is the output angle. T h e equation illustrates 0 that the reflectance of the material is not one-way street; its incoming to outgoing pathway is the same as its outgoing to income pathway. 2.1.3 Phong Model The P h o n g reflection is another type of reflectance; it is a combination of the diffuse and specular illumination. T h i s is an empirical model based not on physics but on physical observation. Its image irradiance is proportional to [13] . R{Pt9) cos0i[(l-a)+acos (s/2) n = ^oWm where Qi is the incident angle, (f> is the phase angle (the angle between the vector pointing to the light source and the vector pointing to the viewer), and s is the off-specular angle (the angle between the vector pointing to the viewer and the vector that defines, relative to the light-source direction and the surface normal, the direction of perfect specular reflection), a is a fraction, 0 < a < 1. T h e factor a adjusts how much of the incident light is reflected specularly, and n is a number that models how compact the specular patch is comparing w i t h the perfect specular reflection, i.e., n would be very large, like n > 200, if the surface is specular. W h e n n is small, the surface is duller. Parameters a and n vary w i t h respect to the material's properties. 2.2 Photometric Stereo The photometric stereo method has been developed to recover the orientation of surface patches from a number of images taken under different lighting conditions 9 [24, 27, 17, 28]. W o o d h a m [27, 28] presents the three image irradiance equations: Ei(x,y) = Ri(p,q), E (x,y) = Rz(p,q), E (x,y) = R (p, 2 3 3 q), The three image irradiance equations overdetermine the solution of p, q, at each point of (x,y), because the solution uses the three brightness measurements of E\, E2 and E3 to configure the two parameters p and q. T h e images can only describe the object's 2-D information, and the three image irradiance equations present the way to connect 3-D information with 2D images. B y measuring the [E\, E2, E3] under the three light source situation, we acquire the 3D p and q by computing the gradient of the object. T h e implementation method is simple. It keeps the three light source and camera still, and takes images of the calibration object and the target object at the same position one by one. T h e calibration object of known shape is made of the same material as that of the target object's surface. It uses the triple of measured brightness values \E\, E2, E3} to search for the corresponding gradient (p, q) in a lookup table. T h e method is straightforward but still has some limitations. Assume the image has 2 =256 gray values, then the full size of the 8 lookup table is 2 x 2 x 2 —• 2 8 8 8 24 entries. Besides that, we consider that the per entry is two double type size (for the gradient p and q) to store the gradient value. T h e n the full size is more than 1G. Moreover, the huge table is very sparse, which results in time-consuming interpolation to fill the table. However, the interpolation is no ganrantee that all the table entries are filled. To solve the large table search, Woodham's work [28] accelerates the search by using 2 6 gray value; the other 2 2 value is separated as the three tables choose an index. Thus, by using the lookup table w i t h 2 18 entries, the retrieving achieves a near-real-time throughput at 15 H z . If the surface material's albedo also varies spatially, independent of the gradient, 10 then the image irradiance equation w i l l be [7, 17, 22, 23] Ei(x,y) = p(x,y)Ri(p,q), Ei{x,y) = p(x,y)R (p,q), Ez{x,y) = p(x,y)R (p,q), 2 3 E {x,y) = p{x,y)R (p,q) 4 4 where the p(x, y) is the albedo, a function of (x,y). Therefore, the former image irradiance equation is a special case, where the albedo is the constant (and normalized to 1). For this situation, there are 3 D O F (degrees of freedom) i n the equation, p, q, p(x,y). If the albedo parameter p is constant, then we can simplify the problem as the former; if the albedo varies along w i t h the position of the surface, then we need one more light source of image irradiance equation to over-determine the 3 D O F . Thus, I n addition to a fixed scale factor, the reflectance map gives the dependence of scene radiance on surface orientation. It is often convenient to plot the surface R(p,q) as a function of the gradient(p, q). T h e pg-plane is called gradient space, and every point i n it corresponds to a particular surface orientation. T h e above expression also assumes that any additive offsets, e.g., due to atmospheric scattering, have been removed. 2.2.1 Surface Curvature We use the representation for surface curvature based on the gradient (p, q) [28]. There are 3 D O F to the curvature of a point of a smooth surface. One representation is the 2 x 2 matrix of the second partial derivatives of the surface z — f(x,y). _ d f(x,y) ~ dx 2 P x 2 _ d f(x,y) ~ dxdy 2 , P y _ d f(x,y) ~ dydx 2 , Q x , < l y T h e Hessian matrix of z = f(x, y) is H = = Vx Py Qx % 11 • - _ ~ d f(x,y) Oy ' 2 2 . T h e n the surface curvature matrix is C = (l+p + )-l 2 2 q+ 1 -pq -pq p +1 q H 2 T h e principal curvatures of the matrix C are k\ and k , which are also the two 2 eigenvalues of the matrix C. u\ and u> are eigenvectors of the C, which are also the 2 direction of ki and k . Thus k\, k , u\, and u> together are the four independent 2 2 2 parameters that describe the surface curvature of that the object. Also, Besl and J a i n [4] introduced the idea that the sections of a smooth surface could be segmented into one of eight basic types based on the sign and the zeros of Gaussian and mean curvature. T h e Gaussian curvature K, also called the total curvature, is the product, K — k\k , of the principal curvatures. 2 T h e mean curvature H is the average, H — {k\ + k )/2, of the principal curvatures. T h e surface curvature 2 provides another description for the 3D shape information. 2.3 Non-Lambertian Photometric Stereo Modeling M u c h research is devoted to the recovery of non-Lambertian reflectance parameters. Nayar [22] applied photometric stereo using a "hybrid reflectance model." Tagare and deGigueiredo [25] developed the theory of photometric stereo for the class of m — lobe reflectance maps. K a y & Caelli [20] continued their work by investigating the problem from a practical point of view. T h e y applied nonlinear regression to a large number of input images. Those methods not only recover the surface orientation but also compute reflectance parameters. T h e problem is that their methods requires quite a lot of images, and the algorithms are fairly complicated. Many non-Lambertian surfaces exhibit near-Lambertian behaviour outside their regions of specularity. B y segmenting these surface areas, it is a very attractive option to apply a linear algorithm, which was developed for Lambertian surfaces, to surfaces w i t h non-Lambertian reflectance and treat highlights as deviations from Lambertian 12 Law. T h i s technique was proposed by Coleman and J a i n [7]. T h e above work deals mostly with gray-scale images. Some recent work uses colour images to get more redundant images by separating an R G B colour image into 3 independent images. These methods are called Shape-from-Color. T h e surface should be illuminated by several light sources that are spectrally distinct and whose directions do not lie i n the same plane, which means they are not linearly dependent on each other. T h i s method was applied to the task of face recognition. Christensen [6] introduced the method of colour photometric stereo for surfaces w i t h an arbitrary reflectance. T h e method is a generalization of [28] and also uses look-up tables. T h e disadvantage of this method is that the surface should be either uniformly coloured or its colours should form distinct separable clusters in the colour space, which significantly restricts the choice of acceptable surfaces. Another disadvantage is the need for a preliminary calibration. To overcome these limitations, most subsequent work on photometric stereo turned to using analytic models rather than the empirical models of surface reflectance [2]. Some of these methods yield good results, while real-world materials do not fit these traditional models. Therefore, researchers developed the approach of combining the developed empirical models and the analytic models to simulate the real-world's complicated surface material [21, 30, 1, 11, 29]. T h e y introduced the method of using more than one calibration object, each of them has a weight, while at the same time segmenting the real target object w i t h different reflectance properties. T h e disadvantage of this method is that it requires quite a lot of images and the matching algorithms are pretty inefficient. In this thesis, we show that these major difficulties are overcome w i t h the new technique—high-dimension locality sensitive hashing, w i t h which the photometric stereo method becomes a practical shape recovery application. 13 Chapter 3 Implementation and Experiments We use more than three images to overdetermine the mapping relationship between the geometry and the brightness and so overcome the limitations of the original photometric stereo method. W o o d h a m [28] and H o r n [14] used only three images ( 3 D O F ) to solve the problem of Photometric stereo — computing the 2 D O F s of the gradient p and q. T h i s methodology works well when the reference object and the target object are easy to move to the same spot to simulate the same illumination condition. Also it analyzes the reference object and target object separately, which is not convenient enough for real-time processing. O n the other hand, Hertzmann & Seitz's [11] work, uses multiple images to reconstruct the geometry of the target object by simply putting the calibration object and the target object in the same scene. T h i s method is much more flexible than Woodham's [28] work. However, their geometry is not really accurate as the method's major objective is to determine the geometry's reconstruction, and also the speed is 5 minutes, which is too long. Based on the definition of photometric stereo, and inspired by the novel idea of texture reconstruction, we would like to adapt the methods by putting the calibration object and the target objects in the scene together, using the locality high-dimension 14 hashing technique to accelerate the search for the mapping relationship between the gradient and the corresponding brightness of the objects. 3.1 Experiment Setting T h e reference sphere and the target object have been juxtaposed w i t h a certain distance on the same scene. Their surfaces have been painted w i t h the same red tempera paints to get the Lambertian effect. We set a C a n o n O p t u r a P i N T S C digital video camcorder stationary to facing the objects and moved the light source, a P h i l i p 150W bulb, by hand, 4 meters from the objects. T h e n we recorded a series of video sequence, in which the reference object and target object were shot with same exposure under different light illuminations. We separated the 640 x 480 video into 640 x 480 image frames. Due to the hand-held light source's movement and the digital camera's focus is automatically adjusted, it is necessary to eliminate those blurred images. A n example of the frame is showed in Figure 3.1. After that, we separated the calibration object and the target object from the image and analyzed them. Figure 3.10 and Figure 3.12 present the reference object and the target object under the same light source. (a) image 1 (b) image2 Figure 3.1: Video images w i t h both the reference and target objects 15 3.2 Photometric Stereo from High-Dimension Images We captured many 2D images under different lighting, so that for a particular pixel position for all these images, we got an array of the correspondent brightness values. These brightness values could be treated as a very big number by concatenating them; however, this method would take out the independence of the data to each other. Considering this, we would handle the multiple brightness values as a high dimensional brightness vector. Determining the shape from the shading information of multiple images is the basic idea of photometric stereo. T h e radiometry of the object i n the image depends on illumination, camera geometry, and the object's reflectance factor of the object. A change i n one of them w i l l affect the appearance of the object i n the image. Unless we know all the factors accurately, it is very hard to compute the shape. In this case, we propose a calibration method by addressing the orientation consistency cue [11]: Cue: T w o points with the same material and the same surface orientation reflect identical brightness when they are under the same illumination situation. T h i s cue holds under the following assumption: A s s u m p t i o n : B o t h points have the same B R D F property, the light sources are directional (i.e., distant), the camera is orthographic, and there are no shadows, interreflections and transparency effects, or other non-local effects that do not depend purely on the B R D F . Also, the surface orientation, incident illumination, and viewer direction are the same for both points. The above cue is clear about presenting the distribution of the object's radiometry of the object. If a point has a highlight on an object, the other point w i t h the same surface geometry should have the same highlight too. Therefore, we can apply the cue since we know one point's brightness and its surface orientation, and since another point has the same brightness, we can say that the second point has the same surface orientation as the first point. In this way, we can extend this 16 knowledge to compute the other points' surface orientations. Now, we w i l l present our implementation algorithm for computing the surface orientation of an irregular shape object by using its known shape object as calibration. Also, we introduce an effective technique for accelerating the searching process in real-time. After isolating the reference object and the target object from the captured video frames, and following the basic theory, we present the following notation representation. We use / [ , . . . , I r d to denote the reference images and l\,...,I d to present the target images, where d is the dimension. Further, we make the assumption that the corresponding reference I\ and targets l\ are exposed under the same illumination condition. For each pixel position p(x, y) i n all reference images, let V r p be the brightness vector, which is the set of brightnesses observed at that pixel over n images: •t jt ]T. 1,<J> •••! d,qi 1 i (3.2) For our case, we tried the 4, 8, 16, 32, 64 observation vectors to determine one gradient vector [p, q]. (3-3) T h e idea comes from photometric stereo by multiple images [24]. It is very difficult to straightforwardly set up the relationship between the brightness appearance and the gradient object. Thus, the three image irradiance equations are presented as: Ei(x,y) = q) E (x,y) = R (p,q) E (x,y) = R (p,q) 2 3 17 2 3 (3.4) which use 3 D 0 F brightness to overdetermine the 2 D O F gradient. If the surface material's bidirectional reflectance albedo factor also varies spatially, independent of the gradient, then the image irradiance equations become E\{x,y) p(x,y)Ri(p,q), E {x,y) p{x,y)R (p,q), E (x,y) p{x,y)R (p, E (x,y) p(x,y)R (p,q), 2 3 4 2 3 (3.5) q), 4 T h e n there are 3 D O F , including the albedo and the gradient, which need to be determined. Therefore, we may use 4 D O F brightness to overdetermine this situation by using 4-images irradiation equations. Moreover, although we make the above assumptions, i n the our experiments, we could not get the illumination to be as accurate as the former calibration method [28]. This's the reason why we would rather use more than 4-images irradiance equations to overdetermine the gradient and the albedo. Above all, given a pixel q on the target object, the surface normal at q is determined simply by retrieving a point p on the reference object w i t h the best matching of the brightness vector, which minimizes \\V£ — V^'||. A complete correspondence determines the surface normal for each pixel on the target object. (3.6) It is very novel to realize that this formulation treats the photometric stereo methodology as 2-images stereo matching problem—figure out the pixel by pixel similarity between the image of the target object and the image of the reference object. Woodham[27] addresses the correspondence problem i n binocular stereo should be fundamentally exclusive when solving the photometric stereo, however, under the assumption made i n this paper, the camcorder captured both the reference and target objects at the same time. T h e light illumination is distant to the objects, so that the mapping relationship between the reference object and target object is 18 acceptable. Compared to the model-fitting methods [2, 14, 22, 27, 11] done before, we present an efficient hashing mapping technique for our model. If the reflectance map's parametric form varies, the hashing method would be a good alternative choice. First, let us process the images of the reference object. A s we mentioned in the experimental setting, the images with both reference object and target object are captured by a stationary camcorder. A t the same time, we move the light source by hand 3-4 meters away from the objects to simulate distant illumination. We select d frames from the captured video series. T h e method needs to make sure that the illumination situations are not linearly dependent on each other, and also that the images need to be very clear, for when the illumination changes, the focus of the lens of the digital camcorder will change as well, so that some of the images are not out of focus, which are not as suitable as good samples to serve as the references. T h e captured images are in R G B mode. Instead of processing all the three-color channels for further reference, it is reasonable to analyze only the most sensitive color channel. For our case, the reference object and the target object are both painted w i t h red, flat tempera paint, so red is the most sensitive channel here. After that, we need to preprocess the d images. First, we separate the image with the reference objects and the target objects into two images. Second, we need to isolate the reference object from the background. O u r method is to sum the d reference images, and compute the threshold of the summed image according to its histogram. T h e threshold should be the dividing value to separate the black background and the object on the histogram. Based on that, we can then figure out the equation of the reference sphere object for further computing the geometry of the reference sphere. Here we use the least square fit-ellipse algorithm to get the 2D ellipse equation from the reference sphere's boundary data. Also, the threshold should be applied to the target images to separate them from its background. T h e purpose of computing the 2D equation is to get the 3D ellipse equation. T h e n we 19 can get each pixel of the ellipse's gradient. T h e robust algorithm of getting the fitting ellipse equation will be introduced i n the following section. 3.2.1 Least Squares Fitting of Ellipse Here we adapted a new algorithm for fitting ellipses [8]. It is a direct ellipse specific fitting algorithm, which can also handle bad data very well. T h e ellipse equation can be presented as F(a, x) = a • x = ax + bxy + cy + dx + ey + f = 0 2 (3.7) 2 where a = [a, b, c, d, e, /] and x = \x , xy, y , x, y, 1]. T h e aim here is to determine 2 2 the parameter a. In general, the problem of parameter estimation is often cast as an optimization problem. One of the traditional methods for optimization uses the least squares fitting. T h e other two general techniques are K a l m a n filtering and robust clustering. For the least squares fitting method, the focus is on computing the set of parameters that minimize the distance between the real-data points and the formulated curves and surfaces. Given finite input data set points D = (x,y)i, i 6 TV, which i n this case is the coordination set of the reference object's boundary points, the problem is to fit the ellipse curve equation F(a, x) to D and minimize the distance measure: 1 N — ^2 dist(F(a, .D)) —> minimum (3.8) i=l It is difficult to solve the optimization problem in general, and we do not guarantee there is a solution for it. In this case, we adapted the method by arbitrarily scaling the parameters to get the right equation of the ellipse equality constraint 4ac—b = 1 2 [9] to solve the ellipse fitting problem. T h e n we can cast the problem into a quadratic 20 constraint, which also expresses it i n the matrix form a Ca = 1: 0 0 2 0 0 0 0 -1 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (3.9) J T h e n the problem reduces to minimize E = aC T | | D a | | , subject to the constraint 2 = 1 where the matrix D is defined as D = [x\,X2, ...,x ] . T a n B y applied the Lagrange multiplier A and differentiation, the algorithm gets the following system of simultaneous equations 2D Da - 2XCa = 0 (3.10) 1 afCa = 1 (3.11) It is equivalent to the following: where S is the scatter matrix D D. T Sa = XCa (3.12) dfCa (3.13) = 1 T h e n the problem is solved by considering the generalized eigenvectors Sa = XCa. If (Xi,Ui) solves this , where Aj is the eigenvalue and Ui is the eigenvector, then so does (A, fiUi) for any \x. According to a Ca T we can figure out the parameter fi, and we can find the value of fi u[Cui 2 Mi uj Cui ufSui = 1, = 1 giving (3.14) Finally, setting a, = /Zjttj. Follow this algorithm, it yields six eigenvalue-eigenvector pairs (Xi, Ui). Each of them gives a local m i n i m u m if the term under the square root of the solution of mu is positive. S is positive, so the denominator ufSui is also 21 positive for a l l itj. Therefore, the square root exists if Aj > 0, so any solutions for figuring a must have positive eigenvalues. T h e problem of minimizing E = \\Da\\ subject to 4ac — b = 1 generates 2 exactly one solution for a, which has been proved i n [8]. Now, we can transfer the ellipse equation F(a, x) into {x-cx) + \ (y-cy) 2 2 2 =r (3.15) 2 where (cx,cy) is the center of the ellipse. T h e object surface is give i n the format of z = f{x,y) i n a left-handed Euclidean coordinate system, i n which the viewer looks in the direction of the positive Z-axes. T h e equation of the sphere can been written down as (x-cx) + 2 \ (y-cy) + 2 2 z =r 2 2 (3.16) T h e gradient of a given point on the reference sphere can be expressed as P = ~ . 9 = -7 z z (3-17) T h e n the surface normal of that point can be presented as [p, q, — 1]. 3.3 Match From Reference Images for Target Images Above all, the least-squares ellipse-fitting algorithm helps to compute the gradient for each point on the reference object. Now we need to set up the mapping relationship between the gradient and the brightness. Given that the brightness data is more than 3 images, the next step is to think about how to store the high-dimensional data. Also, we should think about what kind of storage is convenient for the next step, that is, searching for the gradient of the irregular object. There are three factors that should be taken into consideration: the table's size, the searching speed, and the hit accuracy. In general, the task of finding the reference object brightness vector closest to the target object's brightness vector is a nearest-neighbor search problem. Some techniques have already been applied for this purpose. T h e first 22 method is the lookup table [28]; it stores the gradient by using the brightness as the index; the total size of the lookup table depends on the dimension of the brightness. If the dimension of the brightness is 3, then the size of the table would be ( 2 ) . d 3 Also the table is sparse, and so time-consuming weighted interpolations are also required to fill the empty entries i n the table. A l t h o u g h the lookup table's initialization is slow, this method is suitable for the search when the dimension is less than 4. However, when the dimensions increase, this method can not do the job well. Hertzmann & Seitz's work, they used the A N N (approximate nearest neighbour) method to solve the problem. Basically, it uses the k — d trees [3] to solve the problem. To some degree, it can solve the problem well, but when the dimension exceeds 10 — 20 [10], it is also highly time consuming. Recently, a new method, locality sensitive hashing [19], has been introduced i n database applications, usually i n the context of high-dimensional similarity searching. It preserves the fast character of hashing, but also adds i n the property of similar points that are clustered together. T h e following section will describe in detail how to apply locality-sensitive hashing in the photometric stereo. 3.4 3.4.1 H i g h - D i m e n s i o n Locality Sensitive H a s h i n g Notation To determine the geometry of the target sphere, we need to compute the target object's surface normal. In other words, we need to figure out the object's gradient. We already have the mapping relationship between the calibration object's brightness and its gradient. For the target object, the job is to figure out its gradient distribution by using the mapping relationship according to its high dimensional brightness vector. T h e main goal is to find the target object's best brightness vector to match the reference object's brightness vector, and use the reference object's gradient as the target object's gradient. T h e n the problem can be posed as a N e a r e s t 23 N e i g h b o r S e a r c h ( N N S ) problem. T h e N N S problem is also defined as e - N N S : Given a set P of points in a normed space l , preprocess P so as to efficiently d return a point p G P for any given query point q, such that d(p, q) < (1 + e)d(q, P), where d(q, P) is the distance of q to the its closest point i n P. For the purpose of devising a main memory algorithm for solving the e — NNS) problem, Indyk & M o t w a n i [19] introduced the Locality Sensitive Hashing ( L S H ) technique for high-dimensional data. In our case, we adapted the original L S H algorithm [10] to search for matches i n the high dimensional brightness vector spaces. T h e basic idea of L S H is to hash the high dimensional points so as to ensure that the probability of collision is much higher for points that are close to each other than for those that are far apart. A hash function is a function that converts input from a large domain into output i n a smaller range. In our case, the input data are the image size brightness vectors p = [x\,..., Xd]; however, there are possible values of p while we may have on the order of 1 0 of the calibration object. 5 2 bd pixels i n the image Therefore, our input data is from a large domain, but the cardinality of the input data is small compared to its domain. Based on that, using hashing to solve the matching problem is reasonable. In the following, we w i l l present the algorithm i n detail. Let if denote the Euclidean space R d under the l vector p is [x\,... Xd}- T h e length of the vector is ( | x i | r p norm, where the point + •. • + \x i\ ) ^ the brightness of a certain pixel in image d', where 1 < d' < d. r 1 r ( • x$ is Furthermore, dr{p,<l) — \\p — q\\ denotes the distance between the points p and q i n if. Also H d presents the d-dimensional H a m m i n g metric space; it could be understood as the space of binary vectors of length d under the standard H a m m i n g matrices. 24 3.4.2 The Algorithm First of all, we illustrate Indyk & Motwani's [19] algorithm in terms of inputs represented as binary vectors using the standard H a m m i n g metric. Let C be the largest coordinate in the whole points set P. T h e n we can embed P into the H a m m i n g cube H d with d! = Cd, by transforming each point p = (xi, ...Xd) into a binary vector. v(p) = U n a r y ( z ) . . . U n a r y ( a ; ) c 1 c d (3.18) where U n a r y . ( x ) is the unary representation of x, i.e., is a sequence of x ones c followed by C — x zeroes. For example, if x is 67, and C is 255, then 67 255-67=188 U n a r y ( x ) = T^~l 0^~0 c ., (3.19) For any pair of points p, q w i t h coordinates i n the set 1...C, di(p,q)=d (v(p),v(q))). H (3.20) Where d\ is the distance under norm l\ and djj is the H a m m i n g distance function. T h a t embedding equation preserves the H a m m i n g distances between the two highdimensional points. T h e H a m m i n g distance between points [78,45,30] and points [82, 56,89] w i t h C — 255 is 4+11+59 — 74. B y doing this, we discover the connection that allows us to express the distance between a pair of high-dimensional points. T h e result also showed that the closest points have the m i n i m u m difference for all the dimensions of the points. Therefore, i n the sequence we can concentrate on solving e - N N S i n the H a m m i n g space H '. d T h e unary representation i n H a m m i n g space provides us w i t h a straightforward framework for the algorithm. However, it is very expensive to convert the data into unary representation when C and dimension d is large. For this reason, we need to work out a modified algorithm and i n time make it run independent of C . Then, we implemented an improved L S H algorithm without converting the high dimensional data to the H a m m i n g space. 25 T h e details of the algorithm are as follows. I is the number of hash tables, and there are I subsets I\... Ii... I\. The dimension of the data is denoted as d. Let p\i denote the projection of vector v(p) = [x\,... ,Xd} on the coordinate set / , and let be the brightness of point p in image d!, where k < d. We compute p|/. by selecting the coordinate positions and applying the hashing function g%(p) to a certain bucket of the hash table Ti, i G I. Denote gi(jp) = p\j r T h r o u g h that, we project the high-dimensional data v(p) to a certain hash number gt(jp). B y the preprocessing, we store each p G P i n the bucket gi(p), for i = 1,..., I, and get I compact hash tables for the next step's nearest neighbour searching. B y computing the projection per Ii, we get the whole projection p\j. A s the total number of buckets per hash table may be large, we compress the buckets by assigning them to a standard fixed chaining number of bucket hashing. Thus, there are two levels of the hashing technique for a certain pixel point p: • A p p l y the L S H function to map the high-dimensional brightness vector p to the bucket gi(p). • Use the standard hash function to map the contents of these buckets into a hash table of size M. The m a x i m u m bucket size of the latter hash table is denoted as B. In the algorithm the hash table is a chained hash table. W h e n the number of points matching a certain hash number exceeds the size B of the bucket, a new bucket is dynamically allocated and linked to and from the old bucket. However, to process the large amount of data, dynamically allocating the space is not an advisable choice here, since the images' size are large, which means the table is large. In our implementation we use the simpler fixed size buckets method for each hash table item rather than chaining. The approach is as follows: if a bucket in a given hashing index is full, a new matching point cannot be added to it, since it will be added to some other index w i t h high probability. In our application, we put the point into the next bucket 26 available for hashing indexing. B y using the fixed bucket size policy, we reduce the link structure's overhead. T h e whole structure is clearer and also convenient for the next step's search. T h e structure of the hash table is presented i n 3.2. B M< V I I t I I I I I Figure 3.2: T h e L S H hash tables structure Each bucket not only stores the high-dimension brightness vector but also keeps its correspondent gradient vector [p, q]. T h e brightness vector in our case is stored as unsigned char set and the 2D gradient vector is saved as two float number. After that, we must discuss how to determine the suitable parameters I, the number of the hash tables and M, the size of the hash tables. T h e number n of points from the reference object's surface, the size M of the hash table, and the maximum bucket size B are related in the following equation: n M = a- (3.21) where a is the memory utilization parameter. Basically, a > 1. It is the ratio of the memory allocated for the index to the size of the data set, and should be taken into consideration when the dimension is very large and the number of buckets per hash table index exceeds the memory. In our application, we set the whole table size to contain all the data set in the hash table as: M = L | j 27 (3.22) It is also very important to choose the I - the number of hash tables. In our experiments, we compared the results from using different choices of /. T h e more hash tables, the more accurate the search would be. However, beyond a certain number, the result does not improve significantly. T h e n there is no reason to waste space and time to do the search i n those redundant hash tables. T h e m a i n goal is to find the threshold for the choice of I, and to get the best matching results. Basically, from the definition of [10], let B(p,r) notate the set of point elements from P within the distance r from p. Also following is the definition of ( n , r , P\, P2) - sensitive 2 as: • if p E B(q,ri) then P \h(a) • if p i B(q,r ) then P #[/i(a) = h(p)} < p2 2 rH = h(p)] > pi r where H is the hashing function family, pi > p and r\ <r . 2 2 T h e n the / is calculated as I= n (3.23) p and log l / p 2 For more detail see [18]. After setting the parameters for the hash table's size and its total number, the hash tables are constructed to store the reference object's brightness vectors, including their corresponding gradient vector. F i g 3.3 shows the initialization procedure for filling the hash tables with the reference object's data. To process a query q, the brightness vector from the irregular target object, we search all indices hash tables g\(q),gi(q) until we encounter at least c • I (c > 1) points matching the current brightness vector. Clearly, the number of accesses is always upper bounded by the number of indices, which is equal to /. Let pi, ...,pt be the points encountered i n the process of searching i n the hash table Tj, by the hashing number index, and we get a fixed number of buckets linked by the index. T h e job 28 A l g o r i t h m Initialization for the hash table Prehash Points set P (the high-dimensional data vectors) and I (number of hash tables), for (i = 1 , . . . , I) Initialize hash table 7$ by generating a random hash function gi(-) for i = 1,... ,1 for (j = 1,... ,n) Insert the point pj to the bucket gi{pj) of the hash table Ti O u t p u t Hash tables Tj, i = 1 , . . . , I Figure 3.3: Initialization of high-dimensional hash table by inserting the points of brightness vectors and their gradient vector. is to choose the best fit - the closest to q from those buckets. T h e n approximate K — NNS is applied to minimize the distance between the query point q and the hit bucket's point. T h e optimal value of K - the number of the buckets per hash index - is chosen to maximize the probability that a point p "close" to q will fall into the same bucket as q, and also to minimize the probability that a point p' far away from q w i l l fall into the same bucket. In general, we may return one best match point from each hash table, and put those best fits from different hash tables together. T h e n the problem remains as a K — NNS problem for these points, and pick up the "closest" point p to q, and use p's gradient vector as query point g's gradient. The query strategy is presented i n F i g 3.4. A l g o r i t h m Approximate Nearest Neighbor Query hashquery ( A set of n query points q from the target object) for (i = 1,..., n) for (hash tables Tj, j — 1 , . . . , I) generated by the hash table initialization algorithm O u t p u t A set S of 1 (or fewer) approximate nearest neighbors according to q R e t u r n the nearest neighbours for each qi found i n set S Figure 3.4: Approximate Nearest Neighbor query answering algorithm The L S H function we used takes advantage of a linear combination of high- 29 dimensional data. B y doing this, we also get the distance between any two points. Those brightness vectors are presented as [x\,..., Xd], and we make the L S H mapping strategy the hashing function i n our implementation as h(xi, ...,x ) = (ai • xi H d \-ad- i r f ) m o d M (3.25) T h e permutation hashing method is good for implementation; how to choose the parameters and make this method random enough is very important at this stage. T h e hashing function should also satisfy the requirement that there be a low probability of collision. Here, we also denote the hash function as h(x) = (ax T m o d M). • M = 2 — m,m is a prime; C • ai is a random number from interval [0, M — 1]. where C is a number to make the M < N—N is the pixel number from reference image, and <ij is sufficiently random to give low probability of collision. For our experiment, we choose C from 8 to 15. B y doing that, the I hash tables have different size of length and different bucket size for each hash index. T h e hashing should be doing well; otherwise the result will not be satisfactory. B y tried some hashing number M and a, the results of our experiment are reasonably accurate. T h e results of our experiments are discussed i n the next section. In general, through all the above steps, we first get the reference object's oneto-one mapping relationship between the brightness vector and the gradient vector. T h e n we store the mapping into the high-dimensional hash tables to preserve the locality similarities between the near neighbour points. For the best fit for the irregular target object's gradient vector, we figure out the best fit from each hash table, then pick up the "closest" point among those best fits and calculate the query point's approximate gradient. A s a result, the irregular target object's surface orientation has been figured out. 30 3.5 Experimental Results In this section we report the results of our experiments w i t h the locality-sensitive hashing method. T h e surface orientation's estimations are evaluated i n three formats as the figure shows: (i) the slope angle of the gradient t a n the aspect of the gradient tan _ 1 (j>/ij); - 1 \J(p + q ); (ii) 2 2 (iii) the needlemap for the surface orientation T h e first experiment's source image is provided by [28]. T h e reference sphere (260 x 310 pixels) (Figure 3.5) and the target doll face (260 x 380 pixels) (Figure 3.7) were made w i t h ceramic material, which approximates the perfect Lambertian situation. Figure 3.6 shows the reference sphere's contour estimation by the leastsquare fitting algorithm and its gradient mapping. B o t h the reference object and the target object images were taken i n a calibration manner; the light illumination for the reference object and the target object is identical. Because of the quality of the lighting, using only two hash tables already estimates the doll's surface orientation very well. Figure 3.8 shows the gradient estimation for the doll face by varying the number of hash tables. Figure 3.9 shows the doll face's orientation estimation from a different viewing point. T h e second experiment uses the example shown i n Figure 3.1: one billiard ball (208 x 185 pixels) (Figure 3.10) and one plastic elephant toy (371 x 271 pixels) (Figure 3.12). After painting them w i t h the same red tempera paint, we put them together, and shot a video with the Sony Camcorder. Figure 3.11 shows the reference ball's gradient estimation. We took the video sequence for the elephant toy from three viewing angles. Figures 3.13(t), 3.15(.i), and 3.17(i) shows the three examples' gradient estimation w i t h varying the input image number and the number of the hash tables. Figures 3.14, 3.16, and 3.18 illustrated the gradient aspect and needlemap evaluation for the three examples. T h e t h i r d experiment's source data is derived from [11]. T h e reference object is also a billiard ball (328 x 322 pixels) 3.19(e), and the target object is a painted bot- 31 (a) Light source 1 (b) Light source 2 (c) Light source 3 Figure 3.5: Three images of ball under different illustration: (a) light source 1; (b) light source 2; (c) light source 3. (a) (b) Figure 3.6: T h e boundary contour of the reference ball and the gradient of the reference ball. tie (398 x 1176) 3.19(a). In this case, the object is an example of a non-Lambertian surface with distinct specular highlights. B o t h the calibration object and the test object appear i n the same image; there are 8 colour images. Figure 3.19 shows its gradient estimation. Also, we reconstructed the green bottle's depth map i n figure 3.20. 32 (a) (b) (c) Figure 3.7: Three images of doll face under different illustrations: (a) light source 1; (b) light source 2; (c) light source 3. (a) 1 hash table (b) 3 hash tables (c) 6 hash tables Figure 3.8: T h e gradient estimation of the ceramic doll face by varying the number of hash tables. Since the input images are acquired from the original photometric stereo method, and they are quite accurate. Therefore,only three hash tables already give a very good estimation of the gradient. T h e estimation used 3 hash talbes is much smoother t h a n the one only used 1 hash table. We could see the difference from the forehead part. 33 Figure 3.9: Gradient estimation of doll face from different angles, including the gradient slope angle and gradient aspect estimation 34 (a) Light source 1 (b) Light source 2 (c) Light source 3 (d) Light source 4 (e) Light source 5 (f) Light source 6 (g) Light source 7 (h) Light source 8 Figure 3.10: Eight images of a billiard ball under a hand-held moving light source (a) (b) (c) (d) Figure 3.11: T h e boundary contour of the reference billiard ball and the gradient of the reference billiard ball 35 (a) Light 1 (b) Light 2 (c) Light 3 (d) Light 4 (e) Light 5 (f) Light 6 (g) Light 7 (h) Light 8 Figure 3.12: Eight input images of the target elephant under the same light source as the billiard ball 36 Figure 3.13: T h e gradient estimation of the target red elephant according to different hash numbers and based on different number sample images. T h e images show the slope angle of the gradient. Row 1 to Row 5 uses 4, 8, 16, 32, 64 images separately. C o l u m n 1 to C o l u m n 4 uses 1, 2, 4, 6 hash tables separately. 37 (a) Gradient aspect of elephant's side (b) Needlemap of elephant's side Figure 3.14: Gradient Aspect and Needlemap estimation of elephant's side 38 (a) 2 hash tables and 16 images (b) 4 hash tables and 16 images H (c) 6 hash tables and 16 images I (d) 2 hash tables and 32 images (e) 4 hash tables and 32 images (f) 6 hash tables and 32 images (g) 2 hash tables and 64 images (h) 4 hash tables and 64 images (i) 6 hash tables and 64 images Figure 3.15: Gradient slope map2: the gradient estimation of the target red elephant according to different hash number and based on different numbers sample images 39 J f 1 (a) Aspect of front side of elephant M< 11* t fr rf 11r / **s***^*zzzzzzz~~z*i- :r»t:ts=^;::::::::-;i::HH::^5^r:»J3 ^ Kfc.\r\ .>. , ,B , ,.,.,,,.<, * • ( . • , * + + ^•*•** ^*** *^* , ********* A / * • • - • ^ . - i f ^^a-t-c** re*-re f i * • t ***** ***n A * **Z-t ** -•-«- - . . » . .....,...„./•• . , . ********** J^S'l' ( T R1 < * • ' > J t (b) Needlemap of front side of elephant Figure 3.16: Aspect and needlemap of front side of elephant estimation 40 (a) 2 hash tables and 16 images (b) 4 hash tables and 16 images (c) 6 hash tables and 16 images (d) 2 hash tables and 32 images (e) 4 hash tables and 32 images (f) 6 hash tables and 32 images (g) 2 hash tables and 64 images (h) 4 hash tables and 64 images (i) 6 hash tables and 64 images Figure 3.17: Gradient slope map3: T h e gradient estimation of the target red elephant according to different hash numbers and based on different number sample images 41 (a) Aspect of back side of elephant (b) Needlemap of back side of elephant Figure 3.18: Back side of elephant's aspect and needlemap estimation 42 We also analyzed the results by varying the number of images d and the number of hashing I. Figure 3.21 — the plastic elephant experiment — shows that the more input images are used, the more accurate the results. T h e resolution of the 640 x 480 camcorder is not high, so we need more images to overdetermine the gradient. T h e results in Figure 3.22 present the change in the average angular difference when we increase the number of hash tables. T h e figure shows that the larger I is, the more accurate the gradient estimation. There is a threshold for the choice of I. In Figure 3.22, the threshold is 6. It means that using 6 hash tables already generate a good result for the gradient estimation. A d d i n g more hash tables does not improve accuracy. For comparison purposes we computed the gradient for the bottle example by brute-force lookup; i.e., we built a table w i t h all the approximately 10,000 pixels in the calibration object. Each entry has a key of 24 x 8 bits. C o m p u t i n g the gradient by direct lookup and finding the best-matching key by linear search in the table takes approximately 4 hours on a 2.4GHz Intel machine w i t h 2 G B of memory. In contrast our L S H version took 5 minutes for the bottle example and 90 seconds for the elephant example w i t h 80 x 8 bit keys. We also computed the point-by-point angular difference between the bottle's normal computed by L S H and the normals computed by brute-force lookup. Figure 3.23 shows an image of the angle's cosine between the normals. In addition, we also compared our results to the laser-scanned results. Figure 3.24 presents the difference between them. We selected several slices of the same position, both from the reconstructed bottle and the laser-scanned model. The reconstructed bottle shows that our result is very close to the laser-scanned model. Another significant performance improvement of our method is speed. T h e table size of the lookup-table of Woodham's method is 2 , where d is the dimension bd of the images and b is the number of quantization levels. T h e size of the L S H hash table is I x N , r where I is the number of the hash tables and 7V is the reference r 43 () a (b) (c) (d) (e) Figure 3.19: T h e gradient estimation for a painted bottle, (a) shows one input image, (b) shows one reference image (c) shows the slope angle of the gradient, (d) shows the aspect of the gradient, (e) shows the needlemap of the surface orientation. 11 X 26 6 I 1 0 10 1 20 1 30 1 1 40 50 Number of images 1 1 1 60 70 80 Figure 3.21: Convergence performance w i t h varying image numbers. T h e vertical axis displays the average angle between the normal computed w i t h the L S H method using the first n images and 6 hash tables, and the normal computed by brute force with 80 images, for the plastic elephant example. 46 1 2 3 4 5 6 Number of hash tables 7 8 9 Figure 3.22: Convergence performance with varying the number of hash tables. T h e vertical axis displays the average angle between the normal computed w i t h the n hash tables and 64 input images, and the normal computed from brute force search from 80 input images for the plastic elephant example. 47 objects' image size exclusive of the background size. T h e search is O(N) problem. Here we assume the pixel numbers in reference image (N ) r and the target image (iV )are similar size N. T h e table size for brute force search method is d x N , where t T d is the input image numbers. T h e computational complexity is 0(N ). 2 matching method, the table size is d x N , r 0(N)0(l/e) d is I x N r For A N N and the computational complexity is [19], which is exponential in d. In our implementation, the table size the total number of operations is (d +1 x B) x N, where I is the number of hash tables and B is the bucket size of the hash items. Thus the computational complexity is 0(N); i.e., the L S H hashing is constant per pixel. Evaluating the hashing function is simple, and the tables can be limited i n size, so the results can be computed rapidly. There is a cost involved i n setting up the hash table, but we envision inspection applications where that cost can be absorbed at the beginning, and only inexpensive hashing operations are done per object. 48 Figure 3.23: T h e cosine of the angle between the gradient computed by L S H and the gradient computed by brute force lookup. T h e image shows values between 1.0 in white and 0.985 (approximately the cosine of 10 degrees) in black. 49 50 Chapter 4 Conclusion In this thesis, we present a new method that uses locality sensitive hashing to rapidly compute the high-dimensional data's best match for photometric stereo. T h e advantage of putting the reference object and target object in the same scene, following Hertzmann & Seitz, is that there is no need to move the reference object and target object one by one to get the same illumination, which largely simplifies the experimental process. T h r o u g h this method, we can only move the easy handle refenrence object around the hard to move target object and get its surface orienation. According to the robustness of the algorithm, we found that low-resolution video sequence input image frames (the toy elephant example) also generated a satisfactory gradient estimation. T h e multiple results show that this method works very accurately for both Lambertian and non-Lambertian cases, and the computation time is much reduced. T h e simplified experiment and the robust searching algorithm broaden the practical applications for photometric stereo. T h e idea is simple but very useful. T h e practical benefit of the fast photometric stereo implementation is that it could be integrated with motion detection. It provides an efficient reconstruction for an object's geometry, which can be an appropriate alternative for laser range sensors. T h e data acquisition process—taking photos by cameras—is not harmful to the 51 patients, and also is cheaper than the laser range sensors and also easy to control. T h e fast speed L S H photometric stereo method would allow the viewer to observe the surface model immediately, and also enable the viewer to control the model for where he wants to look more i n detail at the same time. B y doing so, it would save a lot of time for both the patients and the doctor to make other appointments for further examinations by slow photometric stereo method. Moreover, the efficient photometric stereo v i a L S H could also been embedded into a robot to get better path exploration by detecting the surrounding obstacles' shape. Also out of the efficient matching method, we could use more than one reference object to build up the search hash tables, which would not affect the performance of the searching speed. Based on that, we can further do photometric stereo on colourful objects. There are some interesting unanswered questions regarding for future work: how can we verify the accuracy of results? For the Laser Scanned result, we find that there are some holes on the model; and for brute force result, we find that the search is based on the accuracy of reference images. How can we bound the number of hash tables and the dimensions of the input images? For different objects and different lighting, do we always need a large number of images? How can we use photometric stereo v i a L S H to handle varied and colourful objects? Also when moving the camcorder, the auto focus may blur the images, therefore how do we select the subset of images that leads to a fast and accurate solution? Moreover, how do we speed up the algorithm to real time, and how do we derive depth data effectively from orientation data? 52 Bibliography [1] S. B a r s k y and M . Petrou. T h e 4-source photometric stereo technique for three-dimensional surfaces in the presence highlights and shadows. Trans.PAMI, 25(10):1239-1252, October 2003. [2] R . Basri and D . Jacobs. Computer IEEE Photometric stereo w i t h general unknown lighting. Vision and Pattern Recognition, 2:374-381, 2001. [3] J . L . Bentley. Multidimensional binary search trees used for associative searching. Communication of the ACM, 18(9), September 1975. [4] P. J . Besl and R . C . Jain. Three-dimensional object recognition. Computer Vision Graphics Image Process, 33:33-80, 1986. [5] P. B r o u . U s i n g the gaussian image to find the orientation of objects. Int. J. Robotics Res., 3:89-125, 1984. [6] P. H . Christensen and L . G . Shapiro. Three-dimensinal shape from color photometric stereo. Int'l J. computer Vision, 13(2):213-227, 1994. [7] E . Coleman and R . J a i n . Obtaining 3-dimensional shape of textured and specular surface using four-source photometry. Comp. Graphics and Image Proc, 18:308-328, 1982. [8] A . F i t z g i b b o n , M . P i l u , and R . B , Fisher. Direct least square fitting of elipses. IEEE Transactions on PAMI, 21(5):38-55, M a y 1999. [9] A . W . F i t z g i b b o n and R . B . Fisher. A buyer's guide to conic fitting. In Machine British Vision Conf., B i r m i n g h a m , England, 1995. [10] Aristides Gionis, P i o t r Indyk, and Rajeev Motwani. Similarity search i n high dimensions v i a hashing. In VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases, pages 518-529. M o r g a n K a u f m a n n P u b lishers Inc., 1999. 53 [11] A . Hertzmann and S. M . Seitz. Shape and materials by example: A photometric stereo approach. Conference on Computer Vision and Pattern Recognition., l(9):533-540, June 2003. [12] B . K . P. Horn. Understanding image intensities. Artif. Inteli, 8:201-231, 1977. [13] B . K . P. Horn. Hill-shading and the reflectance map. In IEEE, volume 69, pages 14-47, 1981. [14] B . K . P. Horn. Robot Vision. T h e M I T Press, Cambridge, Massachusetts, 1986. [15] B . K . P. H o r n and K . Ikeuchi. T h e mechanical maniplation of randomly oriented parts. Sci. Am., 251(8), 1984. [16] B . K . P. H o r n and eds. M . J . Brooks. Shpae from Shading. M I T Press, C a m - bridge, Mass., 1989. [17] K . Ikeuchi. Determining surface orientations of specular surfaces by using the photometric stereo method. IEEE Trans. PAMI, 3:661-669, 1981. [18] P. Indyk. Nearest neighbors i n high-dimensional spaces. Handbook of Discrete and Computational Geometry (2nd edition), 2003. [19] P i o t r Indyk and Rajeev Motwani. Approximate nearest neighbor - towards removing the curse of dimensionality. In STOC annual ACM symposium '98: Proceedings of the thirtieth on Theory of computing, pages 604-613. A C M Press, 1998. [20] G . K a y and T . Caelli. Estimating the parameters of an illumination model using photometric stereo. Graphical Models and Image Processing, 56(5):365388, September 1995. [21] S. Magda, T . Zickler, D . Kriegman, and P. Belhumeur. Beyond lambert: Reconstructing surfaces with arbitrary brdfs. In ICCV 2001, pages 391-398, 2001. [22] S. Nayar, K . Ikeuchi, and T . Kanade. Determing shape and reflectance of hybrid surfaces by photometric sampling. IEEE Trans, on Robotics and Automation, 6(4):418-431, 1990. [23] M . A . Penna and S. S.chen. Shape-from-shading using multiple light sources. Int. J. Inteli. Syst., 1, 1986. [24] W i l l i a m . M . Silver. Determining shape and reflectance using multiple M I T , Cambridge, M A , 1980. 54 images. [25] Hemant D . Tagare and Pari J . P. deFigueiredo. A theory of photometric stereo for a class of diffuse non-lambertian surfaces. IEEE Trans. PAMI, 13(2):133- 152, 1991. [26] L . B . Wolff, S. A . Shafer, and G . E . Healey. Physics-based and Practice, vision: Principles Shape Recovery. Jones and Bartlett, Boston, M A , 1992. [27] R . J . Woodham. Photometric method for determining suface orientation from multiple images. Optical Engineering, [28] Robert J . Woodham. 19(1), 1980. Gradient and curvature from the method, including local confidence estimation. photometric-stereo Optical Society of America, l(3):3050-3068, 1994. [29] L . Zhang, B . Curless, A . Hertzmann, and S. M . Seitz. Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo. In The 9th IEEE International Conference on Computer Vision, pages 618-625, Oct. 2003. [30] T . Zicklerand, P. Belhumeur, and D . J . Kriegman. Helmholtz stereopsis: E x ploiting reciprocity for surface reconstruction. In ECCV 869-884, 2002. 55 2001, volume 3, pages
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Photometric stereo via locality sensitive high-dimension...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Photometric stereo via locality sensitive high-dimension hashing Zhong, Lin 2005
pdf
Page Metadata
Item Metadata
Title | Photometric stereo via locality sensitive high-dimension hashing |
Creator |
Zhong, Lin |
Date Issued | 2005 |
Description | Photometric stereo is a shape-from-shading method for recovering three-dimensional surface orientation information from two-dimensional images with differing illumination but the same viewing geometry. According to the former photometric stereo methods, the inconvenience of calibration and the cost of searching for the gradient of the best match between reference images' brightness and target images' brightness remain as the major problems in this area. Lately, the increasing interest in geometry reconstruction by using photometric stereo has led to a new method for improving the original photometric stereo method. This approach is based on the assumption that two points with the same surface property should reflect the same light and show the same brightness. This new method largely simplifies the traditional calibration experiment by getting the reference object's and target object's information at the same time. Moreover, this new convenient method provides greater opportunity for photometric stereo to be applied to practical real-time robot vision. In this thesis, we extend the new photometric stereo method of Hertzmenn & Seitz that uses multiple images of an object together with a calibration object. For each point in the registered collection of images, we have a large number of brightness values. Photometric stereo finds a matching collection of brightness values from the calibration object and overdetermines the surface normal. With a large number of images in high dimensions, finding similar brightnesses becomes costly. To speed up the search, we apply locality sensitive high dimensional hashing (LSH) to compute the irregular target object's surface orientation. The experimental results of a simplified photometric stereo experiment show consistent results in surface orientation LSH can be implemented very efficiently; and it offers the possibility of practical photometric stereo computation with a large number of images. Finally, we present the idea that LSH should be applied in a variety of computer vision areas. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2009-12-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051569 |
URI | http://hdl.handle.net/2429/16440 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2005-0351.pdf [ 6.36MB ]
- Metadata
- JSON: 831-1.0051569.json
- JSON-LD: 831-1.0051569-ld.json
- RDF/XML (Pretty): 831-1.0051569-rdf.xml
- RDF/JSON: 831-1.0051569-rdf.json
- Turtle: 831-1.0051569-turtle.txt
- N-Triples: 831-1.0051569-rdf-ntriples.txt
- Original Record: 831-1.0051569-source.json
- Full Text
- 831-1.0051569-fulltext.txt
- Citation
- 831-1.0051569.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051569/manifest