MULTI-RESOLUTION STEREO VISION WITH APPLICATION TO THE AUTOMATED MEASUREMENT OF LOGS

by

JAMES JOSEPH CLARK

B.A.Sc., The University of British Columbia, 1980

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Electrical Engineering)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September, 1985
© James Joseph Clark, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Date: September 6, 1985

ABSTRACT

A serial multi-resolution stereo matching algorithm is presented that is based on the Marr-Poggio matcher (Marr and Poggio, 1979). It is shown that the Marr-Poggio disambiguation and in-range/out-of-range mechanisms are unreliable for non-constant disparity functions. It is proposed that a disparity function estimate, reconstructed from the disparity samples at the lower resolution levels, be used to disambiguate possible matches at the higher resolutions. Also presented is a disparity scanning algorithm with a similar control structure, which is based on an algorithm recently proposed by Grimson (1985). It is seen that the proposed algorithms will function reliably only if the disparity measurements are accurate and if the reconstruction process is accurate.
The various sources of errors in the matching process are analyzed in detail. Witkin's (Witkin, 1983) scale space is used as an analytic tool for describing a hitherto unreported form of disparity error: that caused by the spatial filtering of the images with non-constant disparity functions. The reconstruction process is analyzed in detail. Current reconstruction methods are reviewed. A new method for reconstructing functions from arbitrarily distributed samples, based on applying coordinate transformations to the sampled function, is presented. The error due to the reconstruction process is analyzed, and a general formula for the reconstruction error as a function of the function spectra, sample distribution and reconstruction filter impulse response is derived.

Experimental studies are presented which show how the matching algorithms perform with surfaces of varying bandwidths, and with additive image noise.

It is proposed that matching of scale space feature maps can eliminate many of the problems that the Marr-Poggio type of matchers have. A method for matching scale space maps which operates in the domain of linear disparity functions is presented. This algorithm is used to experimentally verify the effect of spatial filtering on the disparity measurements for non-constant disparity functions. It is shown that measurements can be made on the binocular scale space maps that give an independent estimate of the disparity gradient. This leads to the concept of binocular diffrequency. It is shown that diffrequency measurements are not affected by the filtering effect for linear disparities. Experiments are described which show that the disparity gradient can be obtained by diffrequency measurement.

An industrial application for stereo vision is described: the automated measurement of logs, or log scaling. A moment based method for estimating the log volume from the segmented two dimensional disparity map of the log scene is described. Experiments are described which indicate that log volumes can be estimated to within 10%.

Table of Contents

Abstract ii
Table of Contents iv
List of Figures vii
Acknowledgements xv
I. INTRODUCTION 1
  1.1 The Stereo Vision Problem 1
  1.2 Overview of the Thesis 8
II. FEATURE MATCHING 13
  2.1 Image Representations 13
  2.2 The Feature Matching Problem 21
  2.3 The Marr-Poggio Matching Algorithm 23
  2.4 Problems with the Marr-Poggio Matching Scheme 27
  2.5 A Simplified Multi-resolution Matching Algorithm 34
  2.6 Disparity Scanning Matching Algorithms 39
  2.7 Other matching methods 42
  2.8 Summary of Chapter 2 44
III. RECONSTRUCTION OF THE DISPARITY FUNCTION FROM ITS SAMPLES 45
  3.1 Introduction 45
  3.2 Interpolator Methods 48
  3.3 The Methods of Grimson and Terzopoulos 49
  3.4 The WKS Sampling Theorem and its Extensions 57
  3.5 The Transformation or Warping Method - 1D Case 65
  3.6 The Transformation or Warping Method - 2D Case 75
  3.7 Implementation of the 2D Transformation Method 88
  3.8 Including Surface Gradient Information in the Reconstruction Process 95
  3.9 Summary of chapter 3 99
IV. ERROR ANALYSIS OF DISCRETE MULTIRESOLUTION FEATURE MATCHING 100
  4.1 Sources of Error in Discrete Multiresolution Feature Matching 100
  4.2 Effect of Sensor Noise on Feature Position Errors 106
  4.3 Analysis of Disparity Measurement Error due to Filtering 115
  4.4 Reconstruction Error Analysis 145
  4.5 Matching Error Analysis 170
  4.6 Geometry Errors 183
  4.7 Effect of the various errors on the multi-resolution matching algorithm 189
  4.8 Summary of chapter 4 192
V. EXPERIMENTS WITH THE DISCRETE MULTI-RESOLUTION MATCHING ALGORITHMS 194
  5.1 Introduction 194
  5.2 Implementation of the Multi-Resolution Feature Extraction 198
  5.3 Frequency Response of the Matching Algorithms 205
  5.4 Surface Gradient Response of the Matching Algorithms 220
  5.5 Performance of the Matching Algorithms with Additive Noise 226
  5.6 Comparison of the Simplified Matching Algorithm with the DispScan Matching Algorithm 231
  5.7 Application of Image Analysis to Log Scaling 233
  5.8 Summary of Chapter 5 257
VI. SCALE SPACE FEATURE MATCHING 259
  6.1 Introduction 259
  6.2 Matching of Scale Space Image Representations 263
  6.3 Matching of Two Dimensional Scale Space Feature Maps 267
  6.4 Problems With Scale Space Matching 271
  6.5 Implications for Biological Depth Perception 277
  6.6 Summary of Chapter 6 279
VII. BINOCULAR DIFFREQUENCY 280
  7.1 Introduction 280
  7.2 Diffrequency Measurement 282
  7.3 Psychophysical Evidence for Diffrequency Stereo 288
  7.4 Experiments 292
  7.5 Summary of Chapter 7 297
VIII. CONCLUSIONS AND A LOOK TO THE FUTURE 298
  8.1 Summary and Conclusions 298
  8.2 Directions For Future Work 304
Appendices 306
References 333

List of Figures

1.1 Two correlated arrays of numbers which encode a three dimensional scene 3
1.2 Two views of a scene taken from different vantage points 4
1.3 The setup that produced the images shown in figure 1.2 4
1.4 The structure of the thesis 9
2.1 A summary of the topics covered in chapter 2 14
2.2 A feature pyramid 15
2.3 The response of the ∇²G filter to a step input. The zero crossings of the filtered output are seen to coincide with the edge 18
2.4 The scale map of a random one dimensional function 20
2.5 The geometry of the epipolar lines 22
2.6 The Marr-Poggio multi-resolution matching scheme 25
2.7 The three matching pools of the three pool hypothesis 26
2.8 The failure of the Marr-Poggio in-range/out-of-range mechanism for discontinuous disparity functions. After Grimson, 1985 29
2.9 The failure of the Marr-Poggio in-range/out-of-range mechanism for linear (continuous) disparity functions 30
2.10 The failure of the Marr-Poggio disambiguation procedure for non-constant disparity functions 32
2.11 The operation of the multi-resolution nearest neighbour matching algorithm 35
2.12 Nearest neighbour matching 37
2.13 The processing flow of a multi-level iterative matching algorithm 38
2.14 The operation of the multi-resolution Dispscan matching algorithm 41
3.1 The topics covered in this chapter 47
3.2 Computational molecules for the relaxation surface approximation algorithm (after Terzopoulos, 1982). The thick bars indicate the boundary of the grid 54
3.3 The three special types of sample distributions handled by the reconstruction formula of Yen (1956) 63
3.4 a) A function, f(t), sampled at non-uniformly distributed positions. b) The transformed function, g(τ), sampled at uniformly distributed positions 66
3.5 A burst type of signal, with time-varying bandwidth 70
3.6 The reconstruction of a chirp signal for uniform and non-uniform sampling 74
3.7 The hexagonal sampling lattice for functions with isotropic spectra 76
3.8 The Voronoi and Dirichlet tessellations for a set of points 82
3.9 a) A set {x_i}. b) An attempt to create a GHT from {x_i}. c) Some local GHTs of the point set of a) 85
3.10 The operation of the mapping heuristic for N = 7 89
3.11 The sample locations in g-space for N = 7 90
3.12 The relation of the heuristic mapping efficiency in terms of sample density to the shape of the sample distribution 94
4.1 The topics covered in chapter 4 101
4.2 The perturbation of feature contours by additive noise 107
4.3 The probability density function of zero crossing position error, for σ = 1, and SNR = 0.5, 1, 2 and 4 112
4.4 The probability density function of zero crossing position error, for SNR = 1, and σ = 0.5, 1, 2 and 4 113
4.5 Probability of an n pixel error, for n = 0, 1 and 2, given that q = 1/√2 and σ = 1, as a function of the SNR 114
4.6 The left and right scale maps of a randomly textured, tilted surface with a disparity gradient of -60/255 117
4.7 The relationship between the left and right scale maps of a tilted surface 119
4.8 A zero crossing contour of the random process F(x,σ) 121
4.9 The probability density function of the disparity measurement error for k = 1, 2, 3 and 4 126
4.10 The probability of an n pixel disparity measurement error as a function of the disparity gradient, given q = 1/√2, for n = 0, 1, 2 and 3 128
4.11 The left and right skew maps obtained from a randomly textured surface with a horizontal disparity gradient of -60/255, with σ = 2 133
4.12 The probability density of disparity measurement error for zero crossing features, σ = 1, β₁ = -0.1 to -0.4 138
4.13 The probability density of disparity measurement error for zero crossing features, β₁ = -0.1, σ = 1 to 4 139
4.14 The probability of an N pixel error for zero crossing features as a function of β₁, for σ = 1, q = 1/√2 and N = 0, 1, 2 and 3 140
4.15 The probability of an N pixel error for zero crossing features as a function of σ, for β₁ = -0.1, q = 1/√2 and N = 0, 1, 2 and 3 141
4.16 The probability density function of the disparity measurement error for extremum features, for β₁ = -0.1 to -0.4 with σ = 1 142
4.17 The probability density function of the disparity measurement error for extremum features with β₁ = -0.1, for σ = 1, 2, 3 and 4 with q = 1/√2 143
4.18 The probability of an N pixel disparity measurement error for extremum features as a function of β₁, for σ = 1 and q = 1/√2 143
4.19 The probability of an N pixel disparity measurement error for extremum features as a function of σ, for β₁ = -0.1 and q = 1/√2 144
4.20 The shapes of the regions of support for F(ω) and G(ω) for exact reconstruction of f(x) from its samples 147
4.21 Aliasing error in the reconstruction caused by too low a sample density 148
4.22 The effect of having an improper reconstruction filter. Note that the central repetition is partly filtered out and that parts of the other repetitions are passed by the filter 148
4.23 A plot of Slepian's approximation to the optimum filter 154
4.24 A plot of Slepian's first approximation to the optimum filter extended past its region of strict validity 155
4.25 The average RMS reconstruction error for a Gaussian process and filter, with σ = 1/√3, for c = 5, 10, 15 and 20 164
4.26 The average RMS reconstruction error for a Gaussian process and filter, with σ = 1/√6, for c = 5, 10, 15 and 20 165
4.27 The average RMS reconstruction error for a Gaussian process and filter, with σ = 1/√12, for c = 5, 10, 15 and 20 165
4.28 The matching process 171
4.29 The analysis of the matching error given that the closest match to the estimated match position is the Nth feature 174
4.30 The theoretical probability of obtaining the correct match as a function of the disparity measurement error, σ = √2 178
4.31 Experimentally derived relationship between the probability of obtaining the correct match and the error in the disparity estimate, for a number of different angle quantizations 179
4.32 The probability density of obtaining a matching error as a function of the error in the disparity estimate 180
4.33 The distortion of zero crossing contours for non-constant disparity functions 181
4.34 The probability of obtaining the correct match as a function of the disparity estimate error for non-constant disparity functions 182
4.35 The stereo camera geometry 184
4.36 The effects of vertical misalignment on the disparity measurements 187
4.37 The action of the various errors on the matching process 190
5.1 The topics covered in this chapter 195
5.2 The spatial filtering process 199
5.3 The frequency response of the four spatial filters 203
5.4 The zero crossing pyramid of a random image pair 206
5.5 The variation of the RMS disparity error with the matching region size 207
5.6 The variation of the RMS disparity error with the number of resolution levels 208
5.7 The RMS disparity error as a function of the number of relaxation iterations 209
5.8 Perspective plots of the disparity function obtained using matching method 1, with the warping reconstruction method 210
5.9 RMS disparity error as a function of σ_g for matching algorithm 1, maximum disparity = 5 211
5.10 RMS disparity error as a function of σ_g for matching algorithm 1, maximum disparity = 10 212
5.11 RMS disparity error as a function of σ_g for matching algorithm 1, maximum disparity = 15 212
5.12 RMS disparity error as a function of σ_g for matching algorithm 1, maximum disparity = 20 213
5.13 RMS disparity error as a function of σ_g for matching algorithm 2, maximum disparity = 5 213
5.14 RMS disparity error as a function of σ_g for matching algorithm 2, maximum disparity = 10 214
5.15 RMS disparity error as a function of σ_g for matching algorithm 2, maximum disparity = 15 214
5.16 RMS disparity error as a function of σ_g for matching algorithm 2, maximum disparity = 20 215
5.17 RMS disparity error as a function of σ_g for matching algorithm 3, maximum disparity = 5 215
5.18 RMS disparity error as a function of σ_g for matching algorithm 3, maximum disparity = 10 216
5.19 RMS disparity error as a function of σ_g for matching algorithm 3, maximum disparity = 15 216
5.20 RMS disparity error as a function of σ_g for matching algorithm 3, maximum disparity = 20 217
5.21 Perspective plots of the error maps for the three reconstruction methods, obtained for σ_g = 40 217
5.22 Perspective plots of the disparity function obtained for σ_g = 40 for the three reconstruction techniques 218
5.23 The left hand scale map for the disparity gradient experiments 222
5.24 The right hand scale map for the β₁ = -20/255 case 222
5.25 The right hand scale map for the β₁ = -40/255 case 223
5.26 The right hand scale map for the β₁ = -60/255 case 223
5.27 The right hand scale map for the β₁ = -80/255 case 224
5.28 The measured RMS disparity error due to filtering as a function of σ, for disparity gradients of -20/255, -40/255, -60/255 and -80/255 225
5.29 The expected RMS disparity error due to filtering as a function of σ, for disparity gradients of -20/255, -40/255, -60/255 and -80/255 225
5.30 Increase in RMS disparity error as a function of added noise variance. Surface σ = 30, iterative matching, zero crossings only 227
5.31 Increase in RMS disparity error as a function of added noise variance. Surface σ = 15, iterative matching, zero crossings only 228
5.32 Increase in RMS disparity error as a function of added noise variance. Surface σ = 30, iterative matching, zero crossings and extrema 228
5.33 Increase in RMS disparity error as a function of added noise variance. Surface σ = 15, iterative matching, zero crossings and extrema 229
5.34 The pseudo-variance of the additive noise error as a function of the additive noise variance, for σ = √2 230
5.35 The RMS disparity errors for the Dispscan and simplified matching algorithms as a function of σ_g 232
5.36 A log lying on a flat deck 236
5.37 The effects of thresholding figure 5.36 236
5.38 The result of applying an edge operator (Marr-Hildreth) to figure 5.36 238
5.39 The video log scaling system setup 239
5.40 Two stereo image pairs depicting single log scenes 244
5.41 The zero crossings of the stereo pairs shown in figure 5.40 245
5.42 The thresholded disparity maps of the log scenes 246
5.43 The approximation of the log boundary by its convex hull 248
5.44 The filled in log region 249
5.45 The detected log boundary 250
5.46 Fitting an ellipsoid to the log region 255
6.1 The topics covered in this chapter 260
6.2 Three adjacent one dimensional slices of a two dimensional scale map exhibiting non-well-behavedness 268
6.3 A linked coarsely quantized scale map (√2:1 σ ratio) 272
6.4 Splitting and merging of scale map contours for nonlinear disparities 274
6.5 A stereo pair of scale maps with sinusoidal disparity 274
6.6 The scale maps of a real stereo image pair 276
7.1 The topics to be covered in this chapter 281
7.2 The spatial organization of a foveal image representation 284
7.3 The transformed version of the foveal image representation of figure 7.2 284
7.4 The relationship between the left and right foveal scale maps 286
7.5 The relationship between the left and right scale maps for β₀ = 0 287
7.6 The relationship between the left and right foveal scale maps for β₀ = 0 287
7.7 A pair of ambiguous sinusoidal stimuli 289
7.8 The diffrequency search paths for a logarithmically scaled σ axis 293
7.9 The RMS diffrequency error as a function of σ for β₁ = -20/255, linear disparity 294
7.10 The RMS diffrequency error as a function of σ for β₁ = -40/255, linear disparity 295
7.11 The RMS diffrequency error as a function of σ for β₁ = -60/255, linear disparity 295
7.12 The RMS diffrequency error as a function of σ for β₁ = -80/255, linear disparity 296
7.13 The standard deviation of the diffrequency quantization error 296
2.1 The diamond search path and the five edge modes 313
2.2 Examples of edges of the five edge types 314

ACKNOWLEDGEMENTS

The production of a thesis does not occur in a vacuum. Without the support and interaction of a number of people and organizations this particular thesis would never have been completed. I owe a great deal to my supervisor, Dr. Peter Lawrence, whose never-ending enthusiasm and confidence in my work gave me the encouragement needed to successfully undertake this research. I would like to acknowledge the advice provided, at various times, by Dr. Allan Mackworth of the Computer Science department at U.B.C.

The fellow students with whom one has the opportunity of working provide much of the intellectual and social interaction that one requires if one is not to emerge from one's studies narrow-minded and lacking in basic tennis skills. I have had the pleasure of having many fine people as colleagues. I would especially like to acknowledge the contributions, both social as well as intellectual, of Brian Maranda, Richard Jankowski, Nick Jaeger, Norman Beaulieu, Kevin Huscroft and Jim Reimer. As well, I must acknowledge some of the Computer Science students that have shown me the view from their side, as well as keeping me entertained on Friday afternoons. Barry Brachman has been a friend as well as a colleague, and Jim Little and Marc Majka have shown me the computational side of Computer Vision.

I would also like to acknowledge the support given by the Forest Engineering Institute of Canada, particularly Verne Wellburn and Alex Sinclair. The B.C. Science Council funded the research described in this thesis with a grant, and provided me with a much appreciated G.R.E.A.T. scholarship.
Finally, I would like to thank my mother and father for being behind me all the way through my long studies, and for not making me become a welder or something like my high school counselor suggested.

I - INTRODUCTION

"They came to Bethsaida and some people brought him a blind man whom they begged him to touch. He took the blind man by the hand and led him outside the village. Then putting spittle on his eyes and laying his hands on him, he asked, 'Can you see anything?' The man, who was beginning to see, replied, 'I can see people; they look like trees to me, but they are walking about.' Then he laid his hands on the man's eyes again and he saw clearly; he was cured and he could see everything plainly and distinctly."¹

1.1 - The Stereo Vision Problem

Like the blind man in the quotation that begins this thesis, the machines of man's creation have recently acquired the gift of sight. This gift gives these machines the ability to perform tasks unheard of a scant decade ago, such as autonomously manoeuvring in a loosely constrained environment, visually inspecting industrial components for defects, locating and tracking objects for the purpose of manipulating them, and just plain seeing what's out there. However, unlike the benefactor of the above quotation, human engineers have somewhat less than divine powers. As a result our machines currently operate in a mostly black and white, blurred, and two dimensional world, and don't really understand what they are looking at unless it is explained to them by a human. Most vision systems that are used in industrial applications at this time cannot determine the three dimensional position of an object except under the most contrived conditions. It will be a while before we see robots with spatial perception good enough to enable them to play baseball (not to mention the other skills involved).
There have been methods developed, such as structured light and laser ranging (see the survey article by Jarvis, 1983), which can accurately determine the three dimensional positions of objects. These techniques, however, require a very constrained environment and are active, which means that they alter their visual environment in some fashion (such as by projecting patterns of light on to the scene). Being taken out of this constrained environment and placed into a less constrained one significantly reduces the ability of these systems.

¹ Mark 8:22-25, Jerusalem Bible

The fact that a system uses active methods is not necessarily a drawback, and active methods are often used by engineering systems (the bat's echo-location sensing system is an example). However, such methods must be judged on their size, power consumption, efficiency and effect on the visual environment. From a biological point of view, the method of choice for depth perception is passive. Much of the current research into visual depth perception methods is centred around the determination of how biological systems perceive depth, and the design of similar passive methods tailored for use in machines. Despite some encouraging advances in our understanding of such methods (due in large part to the efforts of the late David Marr and his co-workers at the Massachusetts Institute of Technology), stereo vision systems that can operate well in loosely constrained environments have not yet been developed.

It has been the author's experience that laymen, as well as educated people not acquainted with the computer vision field, greatly underestimate the difficulties involved in the perception of depth. The usual argument that they present is that the human visual system can perceive depth so effortlessly that the processes involved must be fairly simple. One way in which to answer these people is to present them with two arrays of numbers, such as those in figure 1.1, and ask them to determine the depth of the object which gave rise to these arrays of numbers. This simple example points out the fact that the human visual system actually contains a very complex processing mechanism that is tailored to process spatial imagery such as that caused by patterns of light falling on our retinae. This same mechanism operates very inefficiently when presented with stimuli such as the array of numbers depicted in figure 1.1.
One way such they, quite naturally, in which to as answer assume that the these people is to those in figure 1.1, and ask them to the depth of the object which gave rise to these arrays of numbers. This simple points out the processing that mechanism fact that the human visual system that is tailored to process spatial actually imagery contains such as a very that complex caused by patterns of light falling on our retinae. This same mechanism operates very inefficiently when presented way a with stimuli such as the array of numbers depicted in figure 1.1. In fact the only human can detect depth in this pair centres altogether and try and determine of arrays is to bypass the visual processing the correlation between the two arrays with the use of the higher cognitive centres of the brain, which are ill adapted for such computations. In doing this sort of experiment, a human begins to appreciate actually involved in stereoscopic depth perception. the complex processes that are 3 67 68 73 67 56 44 49 55 48 47 49 49 46 41 44 47 48 50 57 53 67 70 77 76 65 65 50 58 52 45 50 48 45 44 45 49 45 51 50 46 68 74 76 73 65 78 65 56 61 56 57 54 47 49 52 54 52 54 57 52 71 73 72 72 58 71 80 58 61 63 58 57 48 49 48 50 49 52 55 51 71 73 69 66 53 46 76 56 48 62 56 52 48 44 47 48 49 51 54 54 74 79 76 69 61 34 70 71 60 67 66 56 52 49 49 52 53 55 59 58 68 78 81 69 65 32 43 81 70 62 67 56 50 46 47 49 51 53 58 58 61 76 80 69 67 42 24 65 74 59 66 56 48 44 45 47 48 50 54 57 66 73 81 73 73 56 24 42 77 65 67 62 52 45 48 48 49 51 56 57 62 67 73 71 72 62 25 23 59 71 61 62 51 45 42 44 45 48 52 54 65 64 68 72 74 69 36 20 42 73 67 68 61 45 44 44 52 54 57 60 67 64 65 72 76 73 49 20 30 64 75 68 65 50 46 43 50 51 55 57 69 60 64 65 72 73 57 25 22 48 77 67 63 54 49 47 51 53 53 57 81 63 61 60 72 81 69 38 19 40 77 76 60 51 50 50 54 51 53 55 87 62 50 49 63 81 72 45 17 31 63 76 56 34 45 49 53 48 49 52 94 70 46 48 63 85 78 54 25 26 58 75 54 25 42 54 58 52 52 57 96 79 41 46 55 77 78 62 31 21 45 69 44 20 32 48 
53 50 49 53 99 85 52 49 55 76 82 68 46 20 38 62 43 18 21 38 49 44 46 50 96 95 64 50 54 68 83 75 59 23 28 62 47 21 17 28 4 4. 45 42 52 87 27 26 22 17 13 9 6 9 9 10 11 13 16 19 23 20 17 26 33 33 29 26 23 19 14 11 8 8 10 7 10 9 13 13 22 24 19 16 25 33 27 25 25 19 15 12 10 9 11 8 10 12 11 12 19 23 24 18 17 24 26 24 24 20 15 13 8 11 10 10 9 12 15 15 18 21 24 25 20 17 27 25 22 21 16 13 9 10 11 11 8 11 14 16 17 20 24 26 25 17 28 25 21 19 16 15 9 8 10 10 9 9 12 16 16 17 23 26 27 21 27 26 21 18 17 16 11 8 9 9 8 8 12 14 15 16 19 25 28 21 25 26 21 18 19 15 13 10 9 8 8 10 12 13 15 14 18 24 26 20 23 24 20 17 17 15 12 9 8 8 9 12 13 13 14 14 16 22 25 15 19 23 20 19 17 16 14 10 8 9 9 11 11 13 13 14 14 21 21 14 15 19 20 18 18 15 12 12 7 8 8 8 9 11 12 13 13 16 16 9 13 16 20 20 16 15 13 13 8 6 7 7 10 10 12 12 11 12 9 5 12 15 20 22 17 17 15 14 10 8 5 6 10 11 13 12 11 11 7 1 13 16 20 22 20 18 18 14 13 8 5 7 8 12 13 10 8 5 3 1 13 15 17 22 22 18 18 14 14 9 5 14 15 16 20 21 20 18 15 15 11 15 15 14 17 19 16 16 16 14 12 14 11 13 14 19 14 14 14 14 11 7 6 9 10 12 11 4 0 1 3 11 11 10 10 7 5 10 10 11 10 6 6 6 2 4 99 78 50 51 59 77 77 65 34 22 56 58 27 18 27 41 48 47 53 LEFT 26 24 19 16 11 9 11 11 13 12 13 16 19 25 26 27 29 30 34 31 28 25 20 17 12 9 10 9 12 11 13 15 17 21 24 21 25 31 34 32 6 6 6 9 9 0 7 9 9 10 10 11 2 0 RIGHT FIGURE 1.1 Two correlated arrays of numbers which encode a three dimensional scene In order to understand the difficulties involved in stereoscopic depth perception one must first understand the process by which depth can, in principle, be measured. Consider the pair of images shown in figure 1.2. These are two views of the same scene as seen from two different vantage points. A schematic description of the situation is depicted in Figure 1.3. Looking closely at these images we notice that the image of the same physical point (such as the lower right corner of the telephone for example) occurs at different points in the two images. 
Even closer inspection reveals that this difference in position is not the same for all physical points in the scene. In fact, it can be seen from figure 1.3 that the farther away a physical point is from the cameras, the smaller is the difference in position, or disparity. This is the basic principle which allows us to measure depth given two images of a scene taken from two different positions.

FIGURE 1.2 Two views of a scene taken from different vantage points.

FIGURE 1.3 The setup that produced the images shown in figure 1.2.

As Marr (1974) has stated, there are three steps that are required for the determination of depth in this manner to proceed. These are:

1. A point in one of the images corresponding to an actual physical event must be found.

2. The corresponding point must be found in the other image.

3. The disparity between these corresponding points is measured and, given the geometry of the imaging process, the depth to the point in space giving rise to the physical event is computed.

The second of these steps, commonly referred to as the correspondence problem, is the step that has proven the most difficult to implement. The problem, reduced to its most basic form, is: how can we distinguish one point in an image from all of the other points in the image?

Part of the answer to this question lies in the first step of the stereo depth perception process, namely the requirement that the features that we try to match between the two images arise from unique physical events. As Marr and Poggio (1979) point out, this rules out the use of image intensity as a matching feature, since a physical event cannot be uniquely associated with a given image intensity, due to the fact that there are many physical events which give rise to that image intensity. Marr and Poggio (1979) suggest that features which are more directly related to physical events, such as sharp changes in image intensity, be used. However, even if one can find a set of features that can be uniquely associated with physical events, the correspondence problem is still not solved. This is because of the inherent ambiguity of these matching features. Unless very complex features are used, such as object descriptions (chair, telephone etc.), implying that an immense amount of processing has already been done to extract these features, the features to be matched are to some extent ambiguous. This means that two independent features may look identical (even if they result from totally separate physical events) and can be confused by the system that is attempting to find correspondences.
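The third of Marr's steps, the triangulation itself, is straightforward once a correspondence is in hand. The following is a minimal sketch of that geometry for a parallel-axis stereo pair; the focal length, baseline, and disparity values are illustrative assumptions, not numbers taken from this thesis:

```python
def depth_from_disparity(d, f=0.05, b=0.12):
    """Depth Z of a scene point from its disparity d, for a parallel-axis
    stereo pair with focal length f and baseline b (all in metres).
    By similar triangles: d = f * b / Z, hence Z = f * b / d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d

# Farther points yield smaller disparities, the effect visible in figure 1.3:
near = depth_from_disparity(0.002)    # 2 mm disparity on the image plane
far = depth_from_disparity(0.0005)    # 0.5 mm disparity
assert near < far                     # near = 3.0 m, far = 12.0 m
```

Note that depth varies inversely with disparity, so a fixed error in the measured disparity produces a depth error that grows with the square of the distance; this is one reason the accuracy of the disparity measurements matters so much in what follows.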
to physical events, if one can find with physical events, the correspondence of the inherent ambiguity of these matching object descriptions (chair, already been done telephone Marr and Poggio (1979) suggest that such as sharp other changes in image a set of features that can be uniquely problem is still not solved. This is because features. Unless very complex features such as etc.), implying that an immense amount of processing has to extract these features, are used, the features to be matched are to some extent ambiguous. This means that two independent features may look identical, (even if they result from totally separate physical events) and can be confused by the system that is 6 attempting to find correspondences. One of the main challenges that stereo vision researchers have faced is in reducing this feature ambiguity without excessively increasing the amount of processing required to find the features. The Multi-Resolution Paradigm For most scenes the range of shifts or disparities scene can be bounded. This means that any search between the two images of that for corresponding features can be limited to a certain finite region. Now, if the density of the features were such that there were few features in this matching region then the correspondence Thus one possible method for determining problem would not be very difficult. correspondences involves matching scene features which have a low density. Alternatively one can limit the size of the matching region, which means limiting the range of disparities that the system devised a scheme disparity range. whereby This type can handle. Marr and Poggio (1979) one could use features with a high density of scheme was first used in the work and handle a large of Moravec (1977). 
The technique used by Marr and Poggio, which was intended to model the way in which human (and some other biological systems) performed stereoscopic multi-resolution feature set Such a set consists The low density, or low resolution, feature depth perception, involved using a of collections of features of various densities. set can be matched over a large range of disparities, but only a sparse set of depth values is obtained. These values, however, can be used to guide the estimate to the next dense (higher resolution) feature set, which can be matched over a smaller range of disparities, and which provide a denser set of depth values. This process can proceed to higher and higher resolutions. The net result of this algorithm is that a dense set of depth values can be obtained over a large disparity range. However, this explanation has been oversimplified, and like many things, the process is not a simple as it seems. There must be a way of transferring information from one resolution level to another, which was not addressed by Marr and Poggio in their proposal of the multi-resolution method. The matching algorithm proposed by Marr and Poggio, which we describe in chapter 2, is seen to have problems with non-constant disparity functions. in detail 7 Goals of the thesis The objectives of this thesis are four-fold. First we wish to develop a multi-resolution stereo matching algorithm that is potentially rapid enough for real time applications. Secondly we want to analyze the component parts of the resulting algorithms to see where and how errors are introduced, and how these errors affect the performance of the matching algorithms. Thirdly, we wish to test the algorithm thoroughly and apply it to an industrial task, that of log scaling. Our final goal is to understand the mechanisms that are performance of multi-resolution matching algorithms and propose other offer improved performance. 
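The coarse-to-fine control structure of the multi-resolution paradigm can be made concrete with a toy one-dimensional sketch. Everything below is an illustrative assumption rather than the thesis's own matcher: raw intensities are compared with a sum-of-squared-differences window, and Gaussian blurring stands in for the low resolution feature sets (the algorithms studied later match features such as zero crossings instead):

```python
import math
import random

def gaussian_blur(signal, sigma):
    """Blur a 1-D signal with a sampled Gaussian kernel (edges clamped)."""
    half = int(3 * sigma)
    kernel = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            j = min(max(i + k, 0), n - 1)   # clamp at the signal boundaries
            acc += kernel[k + half] * signal[j]
        out.append(acc)
    return out

def match_1d(left, right, estimate, radius, w=2):
    """For each pixel, pick the integer disparity within `radius` of the
    current estimate that minimizes the SSD between small windows."""
    n = len(left)
    disparity = list(estimate)
    for i in range(w, n - w):
        best_d, best_err = disparity[i], float("inf")
        for d in range(int(estimate[i]) - radius, int(estimate[i]) + radius + 1):
            if i + d - w < 0 or i + d + w >= n:
                continue
            err = sum((left[i + k] - right[i + d + k]) ** 2 for k in range(-w, w + 1))
            if err < best_err:
                best_d, best_err = d, err
        disparity[i] = best_d
    return disparity

def coarse_to_fine(left, right, levels=3, radius=2):
    """Match heavily blurred (low resolution) copies first; each level's
    sparse disparity map guides, and restricts, the search at the next."""
    estimate = [0] * len(left)
    for level in reversed(range(levels)):
        sigma = 2 ** level
        estimate = match_1d(gaussian_blur(left, sigma),
                            gaussian_blur(right, sigma),
                            estimate, radius)
    return estimate
```

In this blur-only sketch each level can move the estimate by at most ±2 pixels, so three levels together cover roughly ±6 pixels of disparity while every individual search stays small; this additive growth (multiplicative, if the coarse levels are also subsampled) is the essence of the multi-resolution paradigm. The passing of `estimate` between levels is exactly the inter-level information transfer whose difficulties the following chapters examine.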
most important to the algorithms which may 8 1.2 - Overview of the Thesis This section briefly describes the layout of the thesis and points out the contributions that each chapter makes to the attainment topics to be covered in the thesis, of the goals of the thesis. Figure 1.4 depicts the where they can be found, them. In addition, at the end of each chapter and the relationship we present a summary of the most between important points raised in that chapter. Chapter Marr-Poggio 2 begins with multi-resolution a discussion algorithm of multi-resolution (1979) is examined image in some representations. detail. The We point out difficulties involved with the Marr-Poggio method when the disparity function is not constant It is shown information that, must to handle non-constant be accurately passed to disparity functions, the higher the low resolution resolutions. Based on this disparity idea we propose a modification to the Marr-Poggio algorithm. A different sort of matching algorithm, based on some recent work of Grimson (1985) is described, which involves scanning through a large disparity range for possible matches. Grimson's method is essentially single-resolution, using lower resolution information only to disambiguate competing matches. We show that this method also has difficulty with non-constant this algorithm which allows non-constant For data from chapter the multi-resolution methods lower resolutions to higher disparity functions. We propose a modification of disparity functions to be handled. discussed is seen in this thesis, the projection to be of paramount of disparity importance. Thus, in 3, we discuss, at some length, various methods by which this can be done. 
Grimson (1981b, 1982) has also discussed this problem, but only in the context of 'filling in' the gaps between the disparity values at the highest resolution, rather than the process of reconstructing the disparity function at the lower resolutions, which is what we are concerned with in this thesis. We show that the assumptions implicit in Grimson's reconstruction method are not entirely valid for the lower resolutions. Because of this we look at other possible methods for the disparity function reconstruction. This search led us to examine methods based on sampling theory; that is, the theory of reconstructing analytic functions from their samples.

FIGURE 1.4 The structure of the thesis.

One of the problems that we encounter is that the disparity samples we obtain are distributed non-uniformly, whereas most reconstruction methods based on sampling theory require the samples to lie on regular lattices (such as rectangular or hexagonal ones). We therefore develop a method, based on the sampling theory for uniform sample distributions, which allows for the reconstruction of functions from non-uniformly distributed samples. We present the one and two dimensional versions of this method, and give a computational algorithm for the two dimensional case. This method has a wider domain of applicability than stereo vision, of course; it has been successfully applied to the reconstruction of synthetic aperture radio telescope imagery (Clark, Palmer and Lawrence, 1985). We also discuss how this method can be altered to handle the inclusion of disparity gradient, or surface normal, information in the reconstruction process. It has been suggested (e.g.
Ikeuchi, 1983) that the addition of such independent information about the surface will result in a more robust system for obtaining the three dimensional structure of an object.

Chapter 4, in some respects, forms the heart of the thesis. In this chapter we analyze the various processes by which errors can arise in the determination of the depth function. These errors can be partitioned into four distinct types. The first type is an error in the measured position of the features. This can result from sensor noise, position quantization, and camera misalignment, as well as from a heretofore unreported effect involving the spatial filtering of images of tilted surfaces. The second class of errors is the matching errors. These are the errors which arise from the incorrect matching of ambiguous features. We show, for our simplified algorithm, that the matching error is sensitive to errors in the disparity estimate obtained from the lower resolution levels. It is shown that using more complex features reduces the matching error. However, it is also shown that non-constant disparity functions can cause distortion in the feature parameters (such as edge orientation), which may cause features to be matched incorrectly. Experimental studies are described which show that this distortion causes an increase in the matching error. The third class of errors consists of those incurred in transferring disparity information from one resolution level to the next. The chief component of this error is the error in performing the reconstruction of the disparity function from the disparity samples at a given resolution. The fourth and last type of error described is the error inherent in the computation of the depth values from the disparity values. In general, these errors depend on the accuracy to which the various parameters of the imaging process, such as the baseline between the sensors, the relative sensor tilt, the focal length, etc., are known.
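The dependence of the depth values on the imaging parameters can be made concrete with the standard parallel-axis stereo relation Z = f·b/d (focal length f, baseline b, disparity d); this formula and the numbers below are illustrative assumptions for this sketch, not values taken from the thesis:

```python
def depth_from_disparity(d_pixels, focal_px, baseline_m):
    """Depth for an idealized parallel-axis stereo rig: Z = f * b / d."""
    return focal_px * baseline_m / d_pixels

z_true = depth_from_disparity(20.0, focal_px=800.0, baseline_m=0.5)    # 20.0 m
# A 1% error in the assumed baseline produces a 1% error in every depth
# value, since Z is directly proportional to b.
z_bad = depth_from_disparity(20.0, focal_px=800.0, baseline_m=0.505)
rel_err = (z_bad - z_true) / z_true                                    # ~0.01
```

Errors in the other parameters propagate similarly, which is why accurate calibration matters for metric applications such as log scaling.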
If the values of these parameters are in error, then so will be the depth values. This is an important consideration for the many industrial applications (such as the log scaling application described in chapter 5) which require accurate depth measurements.

Chapter 5 describes the results of a number of experiments designed to check the analyses done in chapter 4. We run the multi-resolution matching algorithms proposed in chapter 2 on a number of different synthetic image pairs. These image pairs are created from arrays of uncorrelated (white) normally distributed random numbers. The disparity functions that are used are gaussian cylinders; that is, the disparity is constant parallel to one axis and varies with a gaussian shape parallel to the other axis. The spread (or variance) of this gaussian disparity function is varied, which has the result of changing the bandwidth of the disparity function.

Chapter 5 concludes with an example of the application of stereo depth perception to an industrial task: that of measuring, or scaling, lumber in a typically messy lumber sort yard. We show that stereo vision provides a feasible solution to the automation of this log scaling process. The steps involved in this automated log scaling procedure are outlined and examples of the process on actual images are provided. Estimates of the obtainable log measurement accuracy are given.

Chapters 6 and 7 address alternative methods for obtaining disparity information. These methods, which are based on scale space image representations, are immune to some of the problems encountered by the matching techniques proposed in chapter 2. Chapter 6 considers the idea of matching scale maps (defined in chapter 2). Chapter 7 discusses a recently proposed binocular measurement process, that of diffrequency. Introduced by Blakemore (1970), this process involves measuring, in some fashion, the difference in the spatial frequency content between the two images.
Using the scale space transform as an analytic tool, we give this idea a firm mathematical as well as representational basis. It is shown that measurements of the diffrequency can be made on the binocular scale space image representation, and that these diffrequency values are directly related to the disparity gradient of the surface.

The final chapter (8) contains a discussion of the results obtained in the previous chapters and makes some conclusions. We provide possible directions for future research, based on the unanswered questions that the thesis raises.

2 - FEATURE MATCHING

2.1 - Image Representations

In this chapter we discuss the ways in which a pair of images can be matched to yield the correspondence between them. A summary of the topics covered in this chapter is given in block diagram form in figure 2.1. The sections marked with '*' contain new material.

In order for a feature to be adequate for use in matching, it must correspond to a physical event in the scene, such as a discontinuity in surface orientation, the shadow of an object, a surface marking and so on. Such features (often referred to as edges) usually correspond to localized changes in image intensity. Raw image intensity levels are not adequate features for use in the matching process, since they cannot be uniquely associated with physical events in the scene and may be mistakenly confused with each other. Gray levels are indistinguishable to a higher degree than are the so-called edge features; that is, in a given image region there are more indistinguishable gray levels than there are edges.

In order for a type of feature to be useful in a multi-resolution matching scheme, one must be able to create from it a multi-resolution image representation of each image of the pair. Such a representation consists of a set of single resolution feature maps, each comprised of features localized at a given resolution. The feature density generally (but not necessarily) decreases as the resolution decreases, thus reducing the matching ambiguity (as there are fewer features to be confused with each other). A popular multi-resolution image representation is the feature pyramid (also known as the feature cone) (Tanimoto, 1978; Levine, 1978). A feature pyramid is depicted in figure 2.2.

FIGURE 2.1 A summary of the topics covered in chapter 2. (*) Indicates new material.

A feature pyramid is made up of a number of single resolution feature maps, each at a resolution that is some fraction of the resolution of the preceding level. Each level is spatially quantized, with a quantization proportional to its resolution. This means that there is less data at the lower resolutions than at the higher resolutions, giving the characteristic pyramid shape.

FIGURE 2.2 A feature pyramid.

Since there is less data at the lower resolution levels, the processing required in matching at the lower resolutions is much less than that required at the higher resolution levels. In the case of stereo matching, one can perform a single resolution matching process over a large disparity range quickly and cheaply at a low resolution, and use the disparity estimates so obtained to limit the size of the search region at the higher resolutions. Since the search at the higher resolutions then takes place over a smaller region, the amount of processing resulting from these matching processes is reduced. In this way an overall saving in computation may be achieved. Grimson (1985) discusses the savings in computation gained from such a scheme.

Feature pyramids are distinguished by three factors: the type of feature encoded, the way in which the information at the various resolutions is obtained, and the ratio of the resolutions of succeeding levels. If the ratio of resolution between succeeding levels is 2:1, the resulting pyramid is known as a quadtree. Applications and descriptions of quadtrees can be found in (Rosenfeld, 1984). These structures can be extended to three spatial dimensions; the resulting three dimensional implementations of feature pyramids are known as oct-trees (Srihari, 1984). Some implementations obtain the lower resolution levels by averaging, in some way, the information contained in the higher resolutions. Other implementations obtain each resolution level by independent means. As will be seen, the stereo algorithms to be described later acquire the multi-resolution feature maps by independent means, rather than by operating on the information in the higher resolutions.

Often multi-resolution feature descriptions have a constant spatial quantization at all feature resolutions. The resulting structure is strictly not a pyramid, and loses the advantage of having less data to process at the lower resolutions. These types of structures are used when one wishes to have a high spatial resolution along with a low feature density.
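The storage advantage being given up in such flat structures is easy to quantify. With a 2:1 resolution ratio, each coarser pyramid level holds a quarter of the data of the level above it, so all of the coarser levels together add only about a third of the base level's storage; a small sketch (the sizes are illustrative):

```python
def pyramid_sizes(base_side, levels):
    """Pixel counts of the levels of a 2:1 (quadtree-style) pyramid
    built on a base_side x base_side feature map."""
    return [(base_side >> k) ** 2 for k in range(levels)]

sizes = pyramid_sizes(512, 4)            # [262144, 65536, 16384, 4096]
# All coarser levels together add ~1/3 of the base level's storage
# (the geometric series 1/4 + 1/16 + ... approaches 1/3):
overhead = sum(sizes[1:]) / sizes[0]     # 0.328125
```

The same ratio bounds the matching work at the coarser levels, which is what makes the coarse search in the multi-resolution scheme cheap.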
One can also envisage a multi-resolution image representation wherein the difference in resolution between successive levels is vanishingly small. In such a case we would have a continuum of resolution levels. Such an image representation has been called a scale space representation (Witkin, 1983), where the term scale space refers to the continuum of resolution values. The term scale space originally was meant to apply to the case of the zero crossings of ∇²G filtered images (these terms will be defined shortly). However, the concept can, and should be, extended to cover any type of feature for which a continuum of resolutions can be defined.

We have stated that features based on localized changes in image intensity are suitable features for a stereo matching system. How can we obtain a multi-resolution image representation based on these features? This question was addressed by Marr and Hildreth (1980), who proposed an edge operator that was able to detect, or localize, edges at a given scale or resolution. By altering a parameter in their operator, edges at different resolutions could be detected. This operator consists of three parts. The first part involves smoothing the image by convolving it with a gaussian low pass filter. This allows different resolutions to be achieved by changing the cutoff frequency of the low pass filter. The second part of Marr and Hildreth's multi-resolution edge detection operator involves taking the second spatial derivative (or, in two dimensions, the Laplacian) of the low pass filtered signal. The final step is to determine where the differentiated signal passes through zero. Such a point is called a zero crossing, and for linear image intensity changes it locates the edge exactly. The filter is popularly known as a ∇²G filter, from its component parts. In two dimensions the ∇²G filter's impulse response is written:

∇²G(r) = −(1/πσ⁴)[1 − r²/2σ²]e^(−r²/2σ²)    (2.1.1)

where r² = x² + y². The factor σ in the filter response is the scale factor. Increasing σ reduces the resolution of the resulting edge representation. The operation of the ∇²G edge detector on a step edge signal is shown in figure 2.3.

An edge pyramid can be built up using the Marr-Hildreth edge operator by filtering an image with the ∇²G filter for a number of different values of σ, and then finding the zero crossings. We will call the resulting structure a zero crossing pyramid. In order to reduce computation and storage requirements in our stereo system, we quantize the zero crossing position to a level directly proportional to the σ value of the ∇²G filter at each resolution. Thus the positions of low resolution edges are specified less accurately than are edges at the high resolutions.

A scale space image representation can be obtained using the ∇²G filter by applying the filter to an image at a continuum of σ values. In fact this was the original definition of the scale space. We can write this operation as an integral transform, which we call the scale space transform, as follows (for the one dimensional case):

FIGURE 2.3 The response of the ∇²G filter to a step input. The zero crossings of the filtered output are seen to coincide with the edge.

F(x,σ) = d²/dx² ∫ f(u)(1/(σ√2π))e^(−(x−u)²/2σ²) du    (2.1.2)

F(x,σ) is said to be the scale space transform of f(x). The above equation is seen to be a convolution of f(x) with the function d²/dx² (1/(σ√2π))e^(−(x−u)²/2σ²), which is the impulse response of the one dimensional ∇²G filter. This one dimensional scale space transform can be straightforwardly extended to handle higher dimensions. This is discussed in more detail in chapter 4.3.

The function G(x,σ), obtained as follows:

G(x,σ) = 1 if F(x,σ) = 0, and zero otherwise    (2.1.3)

is called the scale map function of f(x).
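Equation (2.1.1) and the zero crossing behaviour of figure 2.3 can be checked numerically. The sketch below (Python; the σ values, signal length, edge position and kernel support are illustrative choices, not values from the thesis) evaluates the two dimensional ∇²G response, confirms its sign change at r = √2σ, and applies a one dimensional ∇²G operator to a step edge, recovering a single zero crossing at the edge:

```python
import numpy as np

def v2g(r, sigma):
    """Two dimensional ∇²G impulse response, equation (2.1.1)."""
    u = r**2 / (2.0 * sigma**2)
    return -(1.0 / (np.pi * sigma**4)) * (1.0 - u) * np.exp(-u)

sigma = 3.0
r0 = np.sqrt(2.0) * sigma
# Negative in the centre of the 'mexican hat', positive outside r = sqrt(2)*sigma:
assert v2g(r0 - 1e-6, sigma) < 0.0 < v2g(r0 + 1e-6, sigma)

# One dimensional Marr-Hildreth operator on a step edge (cf. figure 2.3):
# gaussian smoothing and second differentiation combined in a single kernel.
n, edge, s = 256, 128, 4.0
signal = np.where(np.arange(n) >= edge, 1.0, 0.0)
x = np.arange(-25.0, 26.0)
kernel = (x**2 / s**4 - 1.0 / s**2) * np.exp(-x**2 / (2.0 * s**2))  # d2G/dx2
resp = np.convolve(signal, kernel, mode='same')

# Zero crossings where the response is significant (masking floating point
# flicker in the flat regions and convolution border effects):
strong = np.abs(resp) > 0.01 * np.abs(resp).max()
zc = [i for i in range(30, n - 30)
      if np.sign(resp[i]) != np.sign(resp[i + 1]) and strong[i] and strong[i + 1]]
# zc holds a single index, located at the step edge.
```

The scale map G(x,σ) of equation (2.1.3) simply records such zero crossing locations at every value of σ.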
It is a binary function that is zero everywhere except at the zeroes of the scale space transform. The scale map of a random function is shown in figure 2.4. The σ scale is logarithmic. Note how, as the resolution decreases (i.e. with increasing σ), the density of the zero crossing contours also decreases. The two image representations described here, the zero crossing pyramid and the scale map, will be the representations that we will be using in our development of the stereo matching process in the rest of the thesis.

FIGURE 2.4 The scale map of a random one dimensional function.

2.2 - The Feature Matching Problem

Once the multi-resolution image description has been built for the two images in the stereo pair, we must match them to obtain the disparity function between corresponding points in the two images. The disparity function, which is usually required to be known at all points in the image, not just at those points coincident with a feature, is defined as follows. Suppose that the intensity functions for the two images are given by f(x_R) for the right image and g(x_L) for the left image. If we assume perfectly matched sensors and ideal viewing conditions, then we can write:

f(x_R) = g(x_R + d(x_R))    (2.2.1)

where d(x_R) is the disparity function. Note that d is a vector quantity; that is, it has two components. This would seem to imply that any search for a match would require searching through a two dimensional space, so that each of the two components of the disparity vector could be determined. However, it can be shown that the two components of the disparity vector are not independent of each other. This means that we can reduce the dimensionality of the search from two dimensions to one. This dependence between the components of the disparity vector is inherent in the so-called epipolar constraint. The idea behind this constraint can be seen in figure 2.5. If one forms a plane defined by the inter-ocular axis (the line between the focal points of the two eyes or cameras) and the direction of view of one of the cameras (i.e. the line from the camera focal point through the scene point being imaged), then the intersections of this plane with the image planes of the two cameras define two lines, one in each camera's image plane. The importance of these lines, called the epipolar lines, is that any feature on the epipolar line in one image has its corresponding feature on the associated epipolar line in the other image. Thus, to perform the search for a match, one need only search along the epipolar line in the other image associated with the given view direction. Note that, in general, the epipolar lines change as the view direction changes. This means that one must determine where the epipolar lines are before the search is begun.

FIGURE 2.5 The geometry of the epipolar lines.

To determine the epipolar line for a given view direction, knowledge of the camera geometry, such as the direction of the inter-ocular axis, must be employed. If accurate knowledge of the camera geometry is not available, then the search cannot be limited to the epipolar lines, and a (possibly modified) two dimensional search must be employed. A particularly simple epipolar geometry results if the optic axes of the two cameras are parallel. In this case the epipolar lines are all parallel and horizontal. This is the epipolar geometry that is assumed in all of the experiments described in this thesis, so that the search for a match takes place along the horizontal coordinate. Even if the actual camera geometry is not of this simple form, one can in principle rectify the images by coordinate transformations so that the epipolar lines are parallel and horizontal.

2.3 - The Marr-Poggio Matching Algorithm

Marr and Poggio (1979) proposed a multi-resolution matching algorithm which was intended to model the way in which the human visual system performs stereo matching. This method was subsequently implemented by Grimson (1981a). The operation of this matching mechanism was as follows.
A multi-resolution image representation is constructed, comprised of the zero crossings of ∇²G filtered images at four different resolutions. The filters used are 3, 6, 12, and 24 picture elements (pixels) in size. The orientations of the zero crossings are quantized to 30° levels. The positions of the zero crossings are measured to within one pixel, and the same spatial resolution (pixel size) is used at all four of the levels (hence, the resulting structure is not a pyramid).

Matching of the pair of multi-resolution zero crossing representations proceeds along the epipolar lines. At a given resolution level, zero crossings in the search region are examined to see if they have the same contrast sign (i.e. light to dark, or dark to light) and roughly the same orientation as the zero crossing to be matched (Marr and Poggio, 1979). To reduce the possibility of ambiguous matching, the size of the region that is searched must be limited, so that the probability of getting more than one possible match in the search region is low. Since the search region is limited in size to √2σ, the maximum disparity that can be measured is 4√2σ. Thus the disparity range that can be measured at the low resolutions, where σ is large, is relatively large, while at the high resolutions it is very small. The Marr-Poggio method detects when the true disparity is out of range of the search (that is, when the true match does not lie in the region in which we are searching) and when the lower resolutions are in range. If so, the disparity measured by the low resolution searches is used to shift the search region in the higher resolutions so as to bring the search region into range of the actual match. This process is depicted in figure 2.6.

FIGURE 2.6 The Marr-Poggio multi-resolution matching scheme.

The Marr-Poggio method determines whether or not a given search region is in range with the following probabilistic scheme. If the search region is in range, the probability of a feature having at least one match in the search region approaches 1. If the true disparity is out of the search range, then the probability of there being at least one (false) match was shown by Marr and Poggio to be on the order of 0.7. Thus, by examining the percentage of zero crossings that have possible matches in a neighborhood about the feature being matched, one can determine whether or not the search region is in range in that neighborhood. Clearly, the neighborhood in which the statistics of the match are tabulated must be large enough to provide a meaningful estimate of the true proportion. If the proportion of matches in a neighborhood falls below a certain threshold (say 0.8), then the neighborhood is declared out of range and all matches in it are rejected. The lower resolutions are then examined to see if they are in range. If they are, then the disparity information provided by these low resolutions is used to shift the search region at the high resolution, and the matching process is repeated until the search region is in range at the high resolution for all points in the image.

It can happen that there is more than one possible match in the search region (even when the search region is in range). Since it is obvious that not all of these possible matches can be the correct one, we will incur an error in the match unless we can find out which one is the correct one, that is, unless we can disambiguate the possible matches. Marr and Poggio perform the disambiguation by examining whether the measured disparity, with reference to the current disparity estimate, is positive, negative or zero. This measure is then compared to the dominant disparity sign in a neighborhood about the feature being matched. If the sign of the possible match agrees with the dominant disparity sign, then it is chosen as the correct match. If none of the possible matches agree with the dominant disparity sign, or if there is more than one match that agrees with the dominant disparity sign, then no match is made. This is known as the three pool matching hypothesis, as it assumes that there are only three types of disparity detectors, divergent (−), convergent (+) and null (0), which are broadly tuned, as depicted in figure 2.7. The three pool scheme has been proposed as a model for the disparity disambiguation mechanism of the human visual system (Marr and Poggio, 1979, p.323).

Grimson (1985) suggests that the information from the lower resolutions can be used to disambiguate possible matches. This involves choosing, from among the candidates, the match whose disparity is in agreement with the low resolution disparity information in a neighborhood about the feature in question. If the low resolution information cannot disambiguate the candidates, then no match is made. The size of the neighborhood within which the low resolution disparity is examined can be varied.

FIGURE 2.7 The three matching pools of the three pool hypothesis.

2.4 - Problems with the Marr-Poggio Matching Scheme

There are a number of problems with the Marr-Poggio matching scheme, especially with regard to matching images with non-constant disparity functions. We describe these problems in this section.

Passing information from lower resolutions to higher
In a recent paper (Grimson, 1985) Grimson makes the following comments:

"What is the effect of driving the matching process in a coarse-to-fine manner? At the next finer level, there will in general be twice as many feature points. If image features persist across scales, which they usually do, then in general, each of the feature points at the finer scale can be associated with a feature point at the coarser scale. This will not always be the case, of course, and if there is no corresponding feature at the coarser scale, then ... the use of multiple scales implies no savings of computational expense."

This quotation implies that in the Marr-Poggio scheme information is transferred from lower resolutions to higher resolutions only for those features that persist across scales. However, since there are generally twice as many feature points at the next higher resolution, it follows that only half of the features will persist between one resolution and the lower resolution level (this is easily observed in the scale space representation, e.g. figure 2.4). As Grimson points out, the fact that information from the next lower resolution is not available for matching those features which do not persist to the lower resolutions means that a larger search region is needed, increasing the probability of obtaining ambiguous matches, and hence increasing the probability of matching error.

One way of solving the problem of transferring information between resolution levels is not to rely on only those features that persist between resolution levels, but rather to reconstruct the disparity function from the information at all of the features at the lower resolutions, so that an estimate of the disparity is available at all points in the lower resolution image. In this way, a disparity estimate is available for all features in the higher resolution, not just those that persist from the lower resolutions. This technique is hinted at by Grimson (1981a) in his early paper describing his implementation of the Marr-Poggio matching scheme. He proposes extracting the disparity information from a region in a low resolution image, to be used by the higher resolution matching, by finding the median, mode or average of the disparity values in that region. Grimson did not go into any detail on this topic (only one sentence in a 36 page paper), and it is not clear whether or not he was suggesting a reconstruction scheme such as we are proposing; in any case, the suggestion seems to contradict the implications of the quote given above, taken from a later paper of his (Grimson, 1985).

In-range/out-of-range detection

A second, and more fundamental, problem with the Marr-Poggio matching scheme rests in what they refer to as the continuity constraint. This constraint states that disparity varies smoothly almost everywhere and that only a small fraction of the scene is composed of boundaries that are discontinuous in depth (Marr and Poggio, 1979, p.303). Grimson (1985) gives a simple example which illustrates how the in-range/out-of-range detection mechanism of Marr and Poggio, which is based on the matching statistics in the neighborhood about the feature to be matched, fails for discontinuous disparity functions. This failure is given as a reason for why the continuity constraint is required. The idea behind Grimson's example is depicted in figure 2.8. Suppose that the matching statistics are compiled over the square region with sides of length d, as shown in the figure. Now suppose that a fraction (x/d) of this neighborhood covers region A, which has a disparity that is out of range of the matching process, and the remaining (1−x/d) portion of the neighborhood covers region B, which is in range of the matching process.
If e is the threshold on the percentage of matching zero crossings in the neighborhood that is required to declare the match (of the feature in the centre of the neighborhood) in range, then for what values of x will the percentage of matched points in the neighborhood exceed e? Grimson shows that if x is less than or equal to (1−e)d/0.3, then the matches in the neighborhood will be declared within range. It is clear from the diagram that if x is less than 0.5d, then the match in the centre of the region will be over region B, and hence will be in range. However, if e is greater than 0.85, the algorithm can actually conclude that the match is out of range. Conversely, if e is too low (i.e. less than 0.85), then one can encounter situations wherein matches that are actually out of range are taken by Marr and Poggio's algorithm to be in range.

FIGURE 2.8 The failure of the Marr-Poggio in-range/out-of-range mechanism for discontinuous disparity functions. After Grimson, 1985.

Grimson took this failure of the Marr-Poggio in-range/out-of-range detector to be a result of the violation of the continuity constraint. However, as we will show, the Marr-Poggio in-range/out-of-range detector can also fail for disparity functions that satisfy the continuity constraint. In fact, it can be shown that their in-range/out-of-range detector will only work 100% of the time for constant disparity functions. To see this, consider a modified version of Grimson's example. Instead of having a discontinuous disparity function, we assume a linear disparity function, as shown in figure 2.9, which obviously satisfies the continuity constraint. We assume, for convenience, that the disparity function varies only along the x axis and is constant along the y axis. The following analysis also holds for the more general case of an arbitrarily oriented linear disparity function. Suppose that the range of the matching process is 2w.
Suppose that we have a possible match to a feature at point P, and that the disparity estimate about point P is such that the match at P is actually in range of the matching process (although the matching algorithm does not know this).

FIGURE 2.9 The failure of the Marr-Poggio in-range/out-of-range mechanism for linear (continuous) disparity functions.

Suppose that we count the percentage of features in a neighborhood of size d about P that have at least one possible match. Since we know that nearly 100% of the features that are in range of the matching process will have at least one possible match, and that nearly 70% of the features that are out of range will nevertheless have at least one (chance) match, we can calculate, exactly as the Marr-Poggio in-range/out-of-range mechanism does, the percentage of features in the neighborhood having a match. If the slope of the disparity function is m, then it is simple to show that this percentage is:

p = 100·[0.7 + 0.6w/(md)]     (2.4.1)

If this percentage exceeds e (say 85%) then the match is taken to be in-range. However, we know that the match is actually in-range. Thus if p is less than e then the match will be erroneously declared out of range. We cannot reduce e below 70% because otherwise matches that are truly out of range will be taken to be in range. Note that, even if we take e to be a very non-conservative value near 70% (i.e. 70%+δ), there can be cases where in-range matches will be rejected. This is because, for any δ>0, we can find a value of the disparity function slope, m, for which p is less than 70+δ. In fact this occurs when m > 60w/(dδ). This means that the Marr-Poggio in-range/out-of-range detection scheme fails even for some continuous surfaces. Upon closer examination, it is seen that the Marr-Poggio theory rests, for strict validity, not on a continuity constraint but on a constancy constraint.
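Equation (2.4.1) can be evaluated numerically to see how steep disparity slopes defeat the detector (a sketch; the values of w, d and e are illustrative, and the formula assumes md > 2w so that part of the neighbourhood is out of range):

```python
# Evaluating equation (2.4.1): p = 100*[0.7 + 0.6w/(md)] for matching
# range 2w, neighbourhood size d, and disparity function slope m.

def match_percentage(w, m, d):
    return 100.0 * (0.7 + 0.6 * w / (m * d))

w, d, e = 2.0, 16.0, 85.0
for m in (0.4, 1.0, 4.0):
    p = match_percentage(w, m, d)
    print(m, round(p, 2), p >= e)   # only the shallowest slope stays above e
```

As the slope m grows, p falls toward 70%, so any threshold e above 70% will eventually reject matches that are in fact in range.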
We will see in the next discussion that this is the case for the Marr-Poggio disambiguation scheme as well. Grimson (1981), (1985) suggests that figural continuity can be used to disambiguate possible matches. The use of figural continuity, which was first proposed by Mayhew and Frisby, involves distinguishing between matches which give rise to small scattered segments, indicating out-of-range, and matches which give rise to extended contours, indicating in-range. This method seems to be very effective. We have not studied the use of figural continuity in our algorithms, as it is fairly computationally intensive, and as we shall see later, our algorithms work well enough without it.

Problems with disambiguation

As you may recall, one of the methods that Marr and Poggio suggested for disambiguating between possible matches was to examine the dominant sign of the disparity in a neighborhood about the feature to be matched. The candidate match which was consistent with the dominant disparity sign was then chosen as the correct match. However, this method works only for disparity functions that are constant, or almost so. That this is so can be seen in figure 2.10. Suppose we want to disambiguate the possible matches to a feature located at point P. Suppose that there are two possible matches, one with a positive disparity (relative to the current disparity estimate) and the other with a negative disparity. In order to disambiguate these we find the dominant disparity sign of the unambiguously matched features in a neighborhood about P.

FIGURE 2.10 The failure of the Marr-Poggio disambiguation procedure for non-constant disparity functions.

Note that these disparities are both positive and negative. Furthermore, since there are more negative disparities in the neighborhood than positive ones, the dominant disparity sign is taken to be negative.
Thus the candidate match with the negative disparity is taken to be the correct match, even though it is in fact incorrect. One can modify the disambiguation method so that, if there is more than one type of disparity sign in a neighborhood, then no attempt is made to assign a match to the ambiguous feature. However, if a disparity function is non-constant almost everywhere then very little disambiguation will be performed and very few matches will be made. Grimson (1985) suggests that figural continuity can be used to disambiguate possible matches as well as to determine whether or not a feature is in range of the matching process. This method involves accepting only those matches which result in extended contours with sufficiently large extent. However, the test for figural continuity can be computationally expensive and may break down for noisy images, or for images whose features are distorted, as described in chapter 4. In these cases the zero crossing contours may become broken up.

2.5 - A Simplified Multi-resolution Matching Algorithm

Based on the discussion in the previous section we can make the following observations:
1. The in-range/out-of-range detection mechanism of Marr and Poggio is unreliable for non-constant disparity functions.
2. The disambiguation process proposed by Marr and Poggio is likewise unreliable for non-constant disparity functions.
3. The way in which disparity information is passed from lower resolutions to higher resolutions needs to be examined in more detail than was done by Marr and Poggio, and Grimson.
Since these in-range/out-of-range detection and disambiguation operations are unreliable for non-constant disparity functions (which are the rule rather than the exception in most applications), and since these operations are somewhat computationally intensive, it is evident that eliminating them would simplify and speed up the matching. One might now wonder what the effect of eliminating these operations would be. Clearly, we would run the risk of accepting out-of-range or ambiguous matches, resulting in an increased probability of incorrect matching. We now propose a simple multi-resolution matching algorithm that does not perform in-range/out-of-range detection or disambiguation, but that instead tries to ensure that the probability of a feature being out of range, or being ambiguously matched, is very small, by relying heavily on the disparity information passed to the higher resolutions from the lower.

The multi-resolution nearest neighbour matching algorithm

The operation of this algorithm, which we call the nearest-neighbour matching algorithm, is depicted schematically in figure 2.11. The idea behind this algorithm is as follows.

FIGURE 2.11 The operation of the multi-resolution nearest neighbour matching algorithm.

We eliminate all in-range/out-of-range checking and disambiguation of possible matches, partly to save on computation and partly because of the difficulties these processes have with non-constant disparity functions.
In so doing we require that a disparity estimate be available that can be used to determine the centre of the matching region at a given resolution, and that this estimate be as accurate as possible in order to reduce the probability of the feature being out of range of the matching process. In order to reduce the probability of obtaining ambiguous matches, we must restrict the effective matching range. This is done by using nearest neighbour matching. If the disparity estimate is sufficiently accurate, then the match whose disparity is closest to the estimated disparity is most likely the correct match, and we therefore treat it as such. Hence the term 'nearest neighbour matching'. This form of matching is depicted in figure 2.12. This type of matching algorithm requires that the disparity estimate be fairly accurate (however, as we will see in chapter 4.5, some disparity error can be tolerated). In our system the disparity estimate is obtained from the lower resolution matching processes. In order to have a disparity estimate for all feature points in the higher resolution, it is necessary to reconstruct the disparity function from the sparser set of disparity values available at the lower resolutions. Thus, for the high resolution disparity estimate to be accurate, it is required that both the disparity measurements at the lower resolutions, and the reconstruction of the disparity function from these measurements, be accurate. One way in which to provide a more accurate disparity estimate at a given resolution level is to transfer information from the higher resolutions as well as the lower (after an estimate is available at the higher resolution, of course). The processing flow of such an iterative matching method is given in figure 2.13. This multi-level flow of information from low resolutions to high resolutions and back to low resolutions is similar to that proposed by Terzopoulos (1982) in his multi-grid surface reconstruction algorithm (discussed in chapter 3.3).
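The nearest neighbour selection rule can be sketched in a few lines (names are illustrative; the optional cutoff stands in for the restricted effective matching range):

```python
# A minimal sketch of nearest-neighbour match selection: among the
# candidate disparities found for a feature, pick the one closest to the
# estimate reconstructed from the lower resolution.

def nearest_neighbour_match(candidates, estimate, max_range=None):
    """Pick the candidate disparity closest to the reconstructed estimate,
    or None when even the closest lies outside the matching range."""
    if not candidates:
        return None
    best = min(candidates, key=lambda dsp: abs(dsp - estimate))
    if max_range is not None and abs(best - estimate) > max_range:
        return None                  # leave the feature unmatched
    return best

print(nearest_neighbour_match([-3.0, 1.5, 6.0], estimate=2.0))  # 1.5
```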
The main drawback of this approach is the increased computational requirement, as the matching, and the reconstruction of the disparity function from the matching, must be performed in every iteration. Experiments described in chapter 5 indicate, however, that some improvement in matching performance is obtained as a result of such iteration. The feature representation that is used in our algorithm is the zero crossing pyramid, with the spatial quantization proportional to the value of σ. This means that the average zero crossing density, in zero crossings per pixel, is independent of the resolution level. Using a pyramid structure means that less computation is required to perform the matching search for a given disparity range at low resolutions than at high resolutions. We also consider the use of extrema (peaks or valleys) of the ∇²G filtered image as features to be matched.

FIGURE 2.12 Nearest neighbour matching.

This was suggested by Frisby and Mayhew (1981), who claimed that extremum features were required if some phenomena concerning human stereopsis were to be explained. The bulk of the thesis involves an examination of the disparity measurement process (chapters 3 and 4) and of the effect of the reconstruction process on the performance of the matching algorithm (chapter 5). These theoretical examinations are supported by experimental results as well. These studies indicate that, if the reconstruction is performed sufficiently well, the simplified matching algorithm described here works well.

FIGURE 2.13 The processing flow of a multi-level iterative matching algorithm.

2.6 - Disparity Scanning Matching Algorithms
In a recent paper (Grimson, 1985), Grimson suggested that, instead of guiding the matching from low resolutions to high, the matching be done at the highest resolution only, by scanning through a large disparity range and noting at which disparities a match was possible for a given feature. Thus, for each feature, a list of possible disparity values could be tabulated. Then these possible matches could be disambiguated in some fashion. Grimson recommends performing the disparity scan at lower resolutions and using the information so obtained to do the disambiguation. Note that, since the disparity range is the same for all resolutions, the following comment by Nishihara (Nishihara, 1984) rings true: "Marr and Poggio's idea of trading off resolution for range seems to be largely abandoned in (Grimson's) technique." Grimson's method of disambiguation is as follows. Given a set of possible matches at a given resolution, a neighbourhood about the feature point at the next lower resolution is checked for unambiguous matches. If the disparity values of these unambiguous matches are all the same, and if this disparity value is one of the possible disparity values for the high resolution feature, then the high resolution feature is assigned that match. Otherwise the high resolution feature is assigned no match at all. Note that this method suffers from the same problem as did Marr and Poggio's disambiguation mechanism; it fails for non-constant disparity functions. If a region of the lower resolution image has non-constant disparity then there will, in general, be more than one disparity value in that region. Thus no assignment of disparity values will be possible for the higher resolution features. To remedy this problem we propose the following algorithm, which we call the multi-resolution dispscan algorithm. We begin the matching process at the very lowest resolution level. At this level we scan through the entire disparity range, looking for possible matches.
At each feature for which we have found at least one possible match, we examine a neighborhood of points about this feature. For each possible disparity of the central feature, we count the percentage of matches in this neighborhood that result in a disparity within 2 pixels of this disparity. The candidate disparity which has the largest percentage of such matches is taken to be the correct disparity for the feature. It is obvious that we cannot do this at the higher resolutions when the disparity function is not constant, for the reasons that we gave for the inadequacy of the Marr-Poggio matching technique. However, after doing the above process at the lowest resolution, we have an estimate of the disparity function, with which we can modify the matching process at the higher resolutions. At the higher resolutions we do the following. As at the lowest resolution, we scan through the entire disparity range for each feature and mark down all the possible matches that a feature can have. Then, for each possible disparity of the central feature, we examine the possible matches of the features in a given neighborhood and tabulate the percentage of neighborhood disparities which, when added to the difference between the disparity estimate at the central feature and the disparity estimate at the neighborhood feature, lie within 2 pixels of the candidate disparity. The candidate disparity with the largest percentage is chosen as the correct one. Note that the disparity estimate from the lower resolution is being used to guide the disambiguation process, but in a way that allows the algorithm to work for non-constant disparity functions. The operation of the algorithm is depicted graphically in figure 2.14.
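The disambiguation vote at the higher resolutions might be sketched as follows (a simplified 1-D illustration; the neighbour data, the names, and the handling of the 2-pixel tolerance are illustrative):

```python
# Sketch of the dispscan vote.  For each candidate disparity of the
# central feature, count neighbourhood match disparities that, after
# compensating for the difference between the reconstructed estimates at
# the centre and at the neighbour, fall within 2 pixels of the candidate.

def dispscan_vote(candidates, neighbours, est_centre, tol=2.0):
    """neighbours: list of (matched_disparity, estimate_at_neighbour)."""
    def support(c):
        hits = sum(1 for dsp, est in neighbours
                   if abs((dsp + (est_centre - est)) - c) <= tol)
        return hits / max(len(neighbours), 1)
    return max(candidates, key=support)

# A sloping disparity function: neighbour estimates differ from the
# centre estimate, yet the compensation still lets the vote succeed.
neighbours = [(10.0, 9.0), (12.0, 11.0), (14.0, 13.0), (3.0, 9.0)]
print(dispscan_vote([4.0, 11.0], neighbours, est_centre=10.0))  # 11.0
```

Without the compensation term the three consistent neighbours would disagree with each other, which is exactly why the unmodified scheme fails on non-constant disparity functions.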
This algorithm, like the algorithm described in the previous section, depends crucially on the accuracy of the disparity information provided by the lower resolutions. This means that it is important that the disparity values obtained at the low resolutions be as accurate as possible, and that the reconstruction of the disparity function from these values be as accurate as possible. As we have stated earlier, the disparity measurement and reconstruction processes are studied in some detail in the remainder of the thesis. The next chapter focuses on methods for performing the reconstruction process.

FIGURE 2.14 The operation of the multi-resolution Dispscan matching algorithm.

2.7 - Other matching methods

The Marr-Poggio type of stereo matcher is not the only mechanism that has been proposed. Historically, the Marr-Poggio algorithm arose from a consideration of previous efforts to model the human stereo vision system. The bulk of these methods were based on the proposal by Julesz (1971) that the human stereo vision process is cooperative. That is, a solution for the disparity function is obtained by the cooperation of a large number of spatially distributed, but interacting, disparity detecting neurons. Such cooperative algorithms were proposed by Nelson (1975), Dev (1975), and Sugie and Suwa (1977). Marr and Poggio (1976) presented a cooperative algorithm (later analyzed in Marr, Palm, and Poggio, 1978) which incorporated physical constraints in the formulation of the algorithm. However, even this algorithm, which performed better than the previously mentioned algorithms, works poorly on natural imagery (Marr and Poggio, 1979, p.303).
Also, as Marr (1982) remarked, iterative methods (which most cooperative algorithms are) are not likely to be used by biological systems because of the slow speed of the neurons. Fast operation is obtainable only by one-shot algorithms utilizing highly parallel networks of processing elements (neurons). The inadequacy of the cooperative method led Marr and Poggio to develop their multi-resolution matching algorithm, which has been described in the previous sections. Research into stereo vision has not been limited to the modeling of biological systems, of course. There have been many methods proposed specifically for machine vision. The earliest of these methods was the intensity area correlation technique (e.g. Levine, O'Handley, and Yagi, 1973). This technique involves correlating the intensity functions of regions of the stereo image pair to determine the disparity in these regions. These methods typically suffer from low resolution and from the high ambiguity of the image intensity values. Baker and Binford (1981) used edge correlation in addition to intensity correlation in an effort to reduce the matching ambiguity. They introduced a number of constraints which both reduced the ambiguity and sped up the computations. They also used a coarse-to-fine approach to limit the amount of computation required for the correlation. Ohta and Kanade (1985) have proposed an algorithm which uses dynamic programming methods for searching large spaces of candidate matches in three dimensions (that is, along the epipolar line as well as across it, and along the disparity dimension).
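The intensity area correlation technique described above can be sketched, for a single scan line, as a normalized cross-correlation search (a toy illustration; the window size, search range, and test signals are invented):

```python
# Slide a window from the left image along the corresponding epipolar
# line of the right image and take the disparity with the highest
# normalized cross-correlation.

def best_disparity(left, right, centre, half, max_disp):
    """left, right: 1-D intensity lists; window of 2*half+1 pixels."""
    win = left[centre - half:centre + half + 1]

    def ncc(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        da = sum((x - ma) ** 2 for x in a) ** 0.5
        db = sum((y - mb) ** 2 for y in b) ** 0.5
        return num / (da * db) if da and db else -1.0

    scores = {}
    for d in range(-max_disp, max_disp + 1):
        lo = centre + d - half
        seg = right[lo:lo + len(win)]
        if lo >= 0 and len(seg) == len(win):
            scores[d] = ncc(win, seg)
    return max(scores, key=scores.get)

left  = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0]  # pattern shifted by +3
print(best_disparity(left, right, centre=4, half=2, max_disp=4))  # 3
```

The sketch also shows where the ambiguity complained of above comes from: in textureless regions the correlation scores are nearly equal for many disparities.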
Baker and Binford (1981) also use dynamic programming to search the space of possible disparity values for matches. The drawback of these dynamic programming approaches seems to lie in the immense amount of computation required, especially for complex scenes. From the point of view of simplicity of implementation, the Marr-Poggio type of matching, or the nearest neighbour type of matching we propose, is preferable over the correlational methods. It is for this reason that we have concentrated on the Marr-Poggio type of approach.

2.8 - Summary of Chapter 2

- The problem of matching feature based image descriptions was introduced.
- The Marr-Poggio (1979) multi-resolution matching method was reviewed.
- The out-of-range detection and disambiguation techniques used in the Marr-Poggio multi-resolution matching method were shown to be unreliable for non-constant disparity functions.
- Questions were raised about the way in which the Marr-Poggio algorithm transferred information from the lower resolutions to the higher.
- A simple algorithm (the Nearest Neighbour Matching Algorithm) was proposed, which does away with the out-of-range detection and disambiguation processes, and instead relies on reconstruction of the disparity function at each resolution to provide a disparity estimate to guide the matching at the next higher resolution.
- Disparity scanning matching algorithms, as proposed by Grimson (1985), are introduced.
- A multi-resolution disparity scanning matching algorithm (the Multi-Resolution DispScan Algorithm), based on reconstruction of the disparity function at each resolution, is proposed. This algorithm is able to handle non-constant disparity functions, whereas it is not clear whether or not Grimson's method can.
- It is shown that the performance of the matching algorithms proposed in this chapter depends crucially on the accuracy of the disparity measurements at the lower resolutions and on the accuracy of the disparity function reconstruction process.
III - RECONSTRUCTION OF THE DISPARITY FUNCTION FROM ITS SAMPLES

3.1 - Introduction

In the discussion, in the previous chapter, of the discrete multi-resolution matching algorithms, it was pointed out that the sparsely sampled disparity function obtained at a given resolution level must be, at least partially, reconstructed or interpolated to provide a denser sampling. This denser sampling is needed since guidance of the matching process at the next higher resolution requires a depth estimate at each feature location. Because the feature density increases as the resolution increases, the lower resolution disparity function is not sampled at all locations corresponding to the higher resolution feature positions. Hence the lower resolution disparity function must be reconstructed or interpolated to provide disparity values at all feature locations in the higher resolution image representation. Reconstruction of the disparity function from its samples is also needed at the highest resolution level, where there is no higher resolution matching process to provide disparity estimates to. In this case the reconstruction is required to fill in the gaps between feature points and provide a complete disparity map, that is, an array containing a disparity value at each point in the image. One may argue that it is not necessary to know the disparity at each point in the image, but only where it is needed by some visual process. However, it is unlikely that the place where a disparity value is required will always coincide with a location at which the matching algorithm explicitly provides a disparity value. Hence some amount of reconstruction or interpolation is required. Grimson (Grimson, 1981b) provides further motivation, based on psychophysical considerations, for performing the interpolation or reconstruction operation. In this chapter we will discuss methods for performing the reconstruction process. It should be pointed out that the applicability of the reconstruction methods described in this chapter is not limited to the stereo vision case, but extends to a very wide range of applications. For example, (Clark et al, 1985) discusses the application of the transformation domain method, described later, to an application
reconstruction of methods It further In this should described or interpolation motivation, chapter be we pointed in this based will out chapter reconstruction on that is not always be psychophysical discuss stereo vision case but to a very wide range of applications. For example, discusses the use of the transformation will the methods of domain of limited to (Clark et al, the 1985) method described later to an application 46 in radio astronomy. The topics covered in this chapter are listed in block diagram form in figure 3.1. A note on the terminology used in this thesis. There are three terms which will be used to denote the process of obtaining the value of a function at a point where it is not known explicitly. These are interpolation, approximation and reconstruction. These are, for the purpose of this thesis, defined as follows. Interpolation is the process of fitting a known (class of) function(s) through the measured function values such that the resulting function has the same values at the measurement points as the measured values. Approximation is the same as interpolation except that the condition that the interpolated function have the same values as the measured values at the measurement points is not enforced. Reconstruction the process of obtaining the exact function that has been sampled, using is defined as information (or assumptions) about the underlying function (for example, whether or not it is bandlimited). It should be noted that many of the methods, however the underlying function, and reconstruction difference is that the hence nothing can be methods look similaT to interpolation interpolation methods assume nothing of the said, in general, of their accuracy. In the reconstruction methods the underlying functions are assumed to satisfy some constraints, which allow the accuracy of the reconstruction to be determined. Proofs of Theorems stated in this chapter can be found in the Appendix. 
FIGURE 3.1 The topics covered in this chapter: the introduction; the methods of Grimson and Terzopoulos; the WKS sampling theory and its extensions; the warping method; the extension to 2D; implementation (relaxation); and adding gradient information to the reconstruction process. (Starred blocks indicate new material.)

3.2 - Interpolatory Methods

We will only briefly describe some interpolation methods, as these produce results which typically contain more error than the methods to be described later. A good review of interpolation methods can be found in Appendix VI of (Grimson, 1981b). The simplest form of interpolation is known as nearest neighbour interpolation. In this method a given function value is assigned the value of the function sample nearest to it. A slightly more complex method is linear interpolation, wherein straight line segments are fitted between adjacent sample points (for the one dimensional case; in two dimensions the points are segmented, or triangularized, into triplets and a planar function, f = a + bx + cy, is fit through each of these triplets). This basic idea can be extended to higher order polynomial functions. However, higher order polynomials tend to exhibit oscillatory behaviour which may not be present in the actual function.
Spline interpolation methods instead fit low order polynomial sections piecewise, with smoothness constraints imposed so that adjacent sections match. For example, in the case of cubic splines (the 1D case), two of the coefficients in a patch may be set so that the function values at the ends of the patch match the measured sample values, and the other two coefficients will be chosen so as to ensure that the first and second derivatives of the cubic segment match those of the two adjacent cubic segments.

3.3 - The Methods of Grimson and Terzopoulos

The first detailed analysis of the surface reconstruction problem in the context of stereo vision was performed by Grimson (Grimson, 1981b, 1982) as part of his PhD research. The approach Grimson took was to construct a complete surface (or disparity) description based only on the surface information known along zero crossing contours (Grimson used zero crossings as the features to be matched in the correspondence process). To constrain the infinite set of possible surfaces that could satisfy the disparity values known along the zero crossing contours, Grimson relied on the conditions imposed by the known imaging process which, when coupled with the shape and reflectance characteristics of the surface, gave rise to the zero crossings in the image. Informally, his method was to compute the surface which fitted the known surface depth values and was 'most consistent' with the implicit shading information. Crudely put, this information can be described as implying that the reconstructed surface should, when passed through a ∇²G filter, not contain any new zero crossings that were not in the original zero crossing set. This was formally referred to by Grimson (1981b, 1982) as the 'surface consistency' constraint, and informally as the 'No news is good news' constraint, and was stated as follows: The absence of zero crossings constrains the surface shape.
Grimson shows that the adoption of this constraint results in the conclusion that the best surface to fit the known data is the one that minimizes the variation in surface orientation (also known as the quadratic variation of the depth gradient function) over the surface. Grimson shows (1982) that the functional to be minimized is the following:

Θ(f) = [∬(f_xx² + 2f_xy² + f_yy²) dx dy]^(1/2)     (3.3.1)

Grimson (1982) shows that the above minimization problem can be characterized by using the calculus of variations to provide a set of differential equations (known as the Euler equations) that the minimal function must obey. Doing this, Grimson obtained the following differential equation for f:

∇⁴f = f_xxxx + 2f_xxyy + f_yyyy = 0     (3.3.2)

where ∇⁴ is the biharmonic operator. The boundary conditions for this P.D.E. are given by (for the case of a square boundary, aligned with the coordinate axes):

f_xx = 0, f_y = 0 for the boundaries parallel to the x axis.     (3.3.3)
f_yy = 0, f_x = 0 for the boundaries parallel to the y axis.     (3.3.4)

It can be shown (Terzopoulos, 1982) that the minimal surface function obtained as the solution of the above P.D.E. can be modeled as the shape that a thin metal plate takes when it is constrained to pass through the known depth values. Grimson (1981b) presents a computational method whereby the minimal surface can be obtained. Essentially this method involves searching for the surface function f which minimizes the functional Θ(f) (equation 3.3.1) with a conjugate gradient search algorithm. Since the surface representation to be determined in practice, as well as the input depth constraints, are defined in a discrete rather than continuous fashion, the above minimization problem is converted into a discrete problem. This involves the conversion of the differentiations into differences and the integrations into summations.
The function to be minimized then becomes (Grimson, 1981b, p.196):

Σ_ij [ (v_{i+1,j} − 2v_{i,j} + v_{i−1,j})² + 2(v_{i+1,j+1} − v_{i+1,j} − v_{i,j+1} + v_{i,j})² + (v_{i,j+1} − 2v_{i,j} + v_{i,j−1})² ] + β Σ_ij (v_{i,j} − c_{i,j})²     (3.3.5)

where {v_ij} is the set of reconstructed surface depth values and {c_ij} is the set of known surface depths. The indices i,j refer to positions on the grid of surface points. The scalar β is a smoothness parameter. If it is zero, the resulting surface will fit the known data values exactly. If non-zero, then the resulting surface will be a smooth approximation to the actual surface. In this fashion errors in the measured surface depth values may be smoothed out. Computationally, the conjugate gradient algorithm of Grimson, for the discrete case, can be formulated as a relaxation algorithm (see Terzopoulos, 1982). Relaxation methods are iterative procedures for determining the solutions of linear systems of equations of the form:

Au = b     (3.3.6)

and hence:

u = A⁻¹b     (3.3.7)

where A is a known N×N nonsingular matrix and b is a known N×1 column vector. Relaxation methods provide estimates of the solution vector u. The operation of the relaxation methods can be described by the following matrix equation:

u^(k+1) = Gu^(k) + r     (3.3.8)

where G is the iteration matrix, which is a function of A. The new estimate of the solution vector, u^(k+1), is obtained by multiplying the previous estimate of the solution vector, u^(k), by the iteration matrix G and adding to it the vector r, which derives from the constraints (and has zero entries at the points for which no depth value is known). Three different types of relaxation schemes can be put into this form. The first type is Jacobi relaxation, for which G = D⁻¹(L+U), where A = D−L−U is the decomposition of A into diagonal D, upper triangular U, and lower triangular L components. The other two types of relaxation methods are Gauss-Seidel relaxation, for which G = (D−L)⁻¹U, and Successive Overrelaxation (SOR), for which G = (I−ωD⁻¹L)⁻¹[(1−ω)I+ωD⁻¹U], with the relaxation parameter ω ∈ (1,2).
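These schemes can be written element-wise; the following sketch implements the SOR update, which reduces to Gauss-Seidel for ω = 1 (a Jacobi version would use only the previous iterate on the right-hand side). The toy system is illustrative:

```python
# Element-wise relaxation for Au = b (equivalent to the G-matrix forms).

def sor_solve(A, b, omega, iters=100):
    """Sweep the unknowns in order, mixing the old value with the newly
    relaxed one by the factor omega; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    u = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (1 - omega) * u[i] + omega * (b[i] - s) / A[i][i]
    return u

A = [[4.0, 1.0], [1.0, 3.0]]           # diagonally dominant, so converges
b = [9.0, 7.0]
print(sor_solve(A, b, omega=1.0))      # converges to about [1.818, 1.727]
```

The exact solution is (20/11, 19/11), so the Gauss-Seidel sweep reaches it to machine precision well within the 100 iterations.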
The Jacobi method is a parallel method, in that each element of the new solution vector can be computed simultaneously. It requires that both the new and old solution vectors be stored completely. The Gauss-Seidel method is a sequential method, as the computation of a particular element of the new solution vector can use the elements of the new solution vector that have already been computed. Thus the Gauss-Seidel algorithm will operate faster than the Jacobi algorithm. The Successive Overrelaxation algorithm is a generalization of the Gauss-Seidel algorithm (one obtains the Gauss-Seidel algorithm from the SOR algorithm when ω = 1). This algorithm scales the residual vector (the vector that is added to the old solution vector to provide the new solution vector) by a number greater than one in an attempt to speed convergence. Terzopoulos (1982) provides the conditions on G which ensure that the iterative process converges. In particular he shows that the SOR algorithm cannot be guaranteed to converge if ω is greater than or equal to 2. The matrix A is given by the Hessian of the functional Θ:

A = [∂²Θ(u)/∂u_{ij}∂u_{kl}],  1 ≤ i,j,k,l ≤ N     (3.3.9)

where N×N is the size of the grid of depth values. Note that A is an N²×N² sized array. For any reasonable value of N this results in a very large array. However, A is very sparse, and banded. We can characterize the computations required at each point in the surface grid at each iteration as a multiplication of the neighbouring surface depth estimates by a set of fixed coefficients, as shown in the following relaxation iteration formula:

v_{i,j} ← [ 8(v_{i−1,j} + v_{i+1,j} + v_{i,j−1} + v_{i,j+1}) − 2(v_{i−1,j−1} + v_{i+1,j−1} + v_{i−1,j+1} + v_{i+1,j+1}) − (v_{i−2,j} + v_{i+2,j} + v_{i,j−2} + v_{i,j+2}) + βc_{i,j} ] / (20 + β)     (3.3.10)

These coefficients are termed computational molecules by Terzopoulos (1982), who derived the
These interior grid points as well as for boundary conditions imposed by the same molecules were obtained the grid points formulation of the by Grimson (1981b) in his specification of the conjugate gradient algorithm. These computational molecules will be used in the implementation of a relaxation algorithm later in the thesis, and are displayed in figure 3.2. The convergence of the aforementioned relaxation methods and of the conjugate gradient methods turns out to be painfully slow. In an effort to speed up the computation of the solution vector, Terzopoulos (1982) proposed the This method involves using depth The coarsely sampled depth values at one of the standard constraints the single grid relaxation at a use of a iterative multi-grid algorithm. number of different levels of resolution. low resolutions algorithms. would be interpolated first, using Since these algorithms reduce the high frequency errors in the depth estimate very quickly, a coarse solution can be obtained with a small number relaxation of on the iterations. This coarse estimate is then next higher resolution level. The processing the highest resolution level has been reached. Because, high since frequency used errors are diminished, and as an initial estimate continues in this manner for until at each resolution level, the relatively decreasing the resolution decreases the frequency of the errors that can be filtered out, it can be seen that such multi-grid methods can quickly eliminate resolution relaxation but take far measurements components having fairly low frequencies. The normal, single methods can eliminate the high frequency error components quite quickly, more are error iterations to eliminate the low frequency errors. Thus, if depth available at a number of different spatial resolutions, then the Terzopoulos multi-grid algorithm would be expected to be much faster than Grimson's method for surface reconstruction. 
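The interior molecule of equation (3.3.10) can be exercised directly. The sketch below is my own illustration, not code from the thesis: it uses Jacobi-style sweeps with the interior molecule only and, for simplicity, holds a border band of width two fixed at the data values rather than applying Terzopoulos' boundary molecules; the grid size and the heavy data weighting β = 30 are arbitrary choices. Planar depth data has zero quadratic variation, so the iteration should converge to the plane itself.

```python
# Relaxation with the interior computational molecule of eq. (3.3.10).
# Known depths c form a plane, which has zero quadratic variation, so the
# minimizer (and the fixed point of the iteration) is the plane itself.
N = 20
beta = 30.0  # arbitrary choice; weights the data term heavily
c = [[2.0 * i + 3.0 * j + 1.0 for j in range(N)] for i in range(N)]

# Initialize with the data on a border band of width 2 (a simple stand-in
# for the boundary molecules) and zero in the interior.
v = [[c[i][j] if min(i, j, N - 1 - i, N - 1 - j) < 2 else 0.0
      for j in range(N)] for i in range(N)]

for _ in range(400):
    new = [row[:] for row in v]          # Jacobi-style: update from old grid
    for i in range(2, N - 2):
        for j in range(2, N - 2):
            edge = v[i-1][j] + v[i+1][j] + v[i][j-1] + v[i][j+1]
            diag = v[i-1][j-1] + v[i+1][j-1] + v[i-1][j+1] + v[i+1][j+1]
            far  = v[i-2][j] + v[i+2][j] + v[i][j-2] + v[i][j+2]
            new[i][j] = (8*edge - 2*diag - far + beta * c[i][j]) / (20 + beta)
    v = new

worst = max(abs(v[i][j] - c[i][j]) for i in range(N) for j in range(N))
print(worst)  # tends to zero as the iteration converges to the plane
```

Note the slowness the text complains about: even on this 20×20 grid, hundreds of sweeps are needed; with a small β the iteration would converge more slowly still.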
Let us now discuss whether or not Grimson and Terzopoulos' algorithms are actually suitable for our application. Let us start by checking the validity of the assumptions made by Grimson (1981b).

FIGURE 3.2 Computational molecules for the relaxation surface approximation algorithm (after Terzopoulos, 1982). The thick bars indicate the boundary of the grid.

The starting point for Grimson's development of his surface reconstruction procedure was his surface consistency constraint. Basically this said that if there was a region of the surface for which there was no zero crossing observed in the ∇²G image, then the surface could not be changing appreciably; otherwise a zero crossing would have been created. This statement seems reasonable enough, but it neglects one important fact. This fact is that the ∇²G filtering operation restricts the density of observed zero crossings. In fact, the larger the space constant of the ∇²G filter, the lower the density of observed zero crossings. This can be easily seen by looking at any one of the scale space maps that are depicted in chapters 4 through 6. Thus we see that Grimson's surface consistency constraint strictly holds only for the case of ∇² filtering, and not for ∇²G filtering. There can be cases where the surface changes appreciably and yet no zero crossing is observed, simply because the change in the surface caused an intensity change which was of too high a frequency and was therefore filtered out by the ∇²G operation. This may not be a major problem for the higher resolution surface representations, since the high frequency surface changes that are filtered out would not be perceived by the visual system anyway. At the lower resolutions, however, the problem is more serious. The surface changes that are filtered out at these resolutions will be fairly low frequency changes, and ones that would be perceivable by the visual system.
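The dependence of zero-crossing density on the filter's space constant is easy to check numerically. The sketch below (my own illustration) uses the fact that ∇²G filtering scales a sinusoid sin(kx) by −k²exp(−k²σ²/2), and counts the sign changes of the filtered output of a two-component test signal at a small and a large σ; the test signal and the two scales are arbitrary choices.

```python
# Zero-crossing density of a 1-D LoG-filtered signal at two filter scales.
# For f(x) = sin(x) + 0.5 sin(8x), the LoG output follows component by
# component from the frequency response -k^2 * exp(-k^2 sigma^2 / 2).
import math

def log_response(x, sigma):
    out = 0.0
    for amp, k in [(1.0, 1.0), (0.5, 8.0)]:
        out += -amp * k * k * math.exp(-k * k * sigma * sigma / 2) * math.sin(k * x)
    return out

def count_zero_crossings(sigma, x_max=20 * math.pi, dx=0.002):
    n, x = 0, dx
    prev = log_response(x, sigma)
    while x < x_max:
        x += dx
        cur = log_response(x, sigma)
        if prev * cur < 0:
            n += 1
        prev = cur
    return n

n_fine = count_zero_crossings(0.05)   # small space constant
n_coarse = count_zero_crossings(0.5)  # large space constant
print(n_fine, n_coarse)  # far fewer zero crossings at the larger scale
```

At the small scale the sin(8x) component dominates the filtered output and its zero crossings survive; at the large scale that component is attenuated by a factor of roughly e⁻⁸ and the crossing density collapses to that of the low frequency component, which is the point made in the text.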
Of course, if one is only trying to obtain a multi-resolution surface representation, as Terzopoulos (1982) does, then it is not expected that the lower resolution representations would capture all of the higher frequency surface changes. On the other hand, we have seen that the multi-resolution stereo matching process requires an accurate disparity estimate from the low resolution information, so that the matching can proceed efficiently and accurately at the higher resolutions. Thus, if the low resolution surface reconstruction process does not detect the higher frequency changes in the surface (or disparity), then the higher resolution matching algorithms will be required to be more robust.

Thus we should ask ourselves whether or not there are other surface reconstruction methods that can better reconstruct the higher frequency surface changes. This question prompted the search for reconstruction methods which do not assume Grimson's surface consistency constraint; these methods are described in the next chapter, and all of them are based on assumptions about the frequency content of the function to be reconstructed (i.e. bandlimitedness).³

³ One of the reasons that Grimson did not account for this may lie in the fact that, in the statement of his Surface Consistency Theorem and its proof, he used ∇² filtering instead of the more general case of ∇²G filtering. In the case of ∇² filtering all of the (topographic) zero crossings are present and the Theorem holds. However, when ∇²G filtering is used the Surface Consistency Theorem is no longer valid. It should be pointed out that Grimson intended his surface reconstruction algorithm for the task of filling in the disparity array at the highest resolution only.

Before we get into these reconstruction methods, let us briefly discuss the applicability of Terzopoulos' multi-grid algorithm to our system. At first glance Terzopoulos' algorithm appears to be tailor made for the multi-resolution matching processes. However, there is a problem with applying Terzopoulos' multi-grid algorithm to multi-resolution matching. Terzopoulos' algorithm requires that depth estimates be available at all resolutions before the surface reconstruction can take place. The multi-resolution matching algorithm, on the other hand, clearly requires that the disparity function at a given resolution level be reconstructed before the features at the next resolution level can be matched. This means that the multi-grid reconstruction method is inapplicable. The best that one can do is to perform a number of relaxation iterations at each resolution level. This state of affairs is not as bad as it seems, because the disparity function estimate from the previous resolution level is available to provide a starting point for the relaxation process. Presumably, much of the low frequency error will have been suppressed in this manner. Thus, the number of relaxation iterations needed to achieve a certain error level is less than what would be expected from a single resolution level relaxation operation. However, having said this, it turns out that in practice the relaxation method is still fairly slow compared to the reconstruction methods to be discussed in the next chapter.

3.4 - The WKS Sampling Theorem and its Extensions

In this section we will be looking at methods for the reconstruction of functions which are based upon series expansions of these functions. It has been shown, by a host of researchers including E.T. Whittaker (1915), J.M. Whittaker (1929), Kotel'nikov (1933), and C.E. Shannon (1949), that one can represent a bandlimited function with an infinite series whose coefficients are the samples of the function, suitably distributed. The precise statement of this result is given here as Theorem 3.1, which we call (after (Jerri, 1977)) the WKS sampling theorem, in honour of the aforementioned mathematicians.
Theorem 3.1 The uniform 1-D sampling theorem (WKS)

If f(t) has a Fourier transform F(ω) such that F(ω) = 0 for |ω| > ω₀ = π/T, then f(t) can be reconstructed exactly from its samples f(nT), taken at the points t = nT, n = 0,±1,±2,..., as follows:

f(t) = Σ_{n=−∞}^{∞} f(nT) sin[ω₀(t−nT)]/[ω₀(t−nT)]    (3.4.1)

This theorem requires for its validity that the interval between sampling instants be a constant. That is, the spatial or temporal function must be sampled uniformly. Our application, however, produces samples that are non-uniformly distributed. Hence the WKS sampling theorem, as it stands, is not valid for our application. We will now look at adapted sampling methods, ones that allow the reconstruction of functions from non-uniformly distributed samples.

One of the conditions on f(x) for the WKS theorem to hold is that it be bandlimited. This means that the following expression is true:

f(x) = ∫_I e^{jωx} F(ω) dω    (3.4.2)

where I is some bounded interval (which we will take, without loss of generality, to be [−π,π] in the subsequent text) and F(ω) is the Fourier transform of f(x). Now suppose that, instead of this condition, f(x) was bandlimited with respect to some other integral transform. For example:

f(x) = ∫_I K(x,ω) F_K(ω) dω    (3.4.3)

Suppose further that F_K ∈ L₂(I) (that is, F_K has bounded energy), that K(x,ω) ∈ L₂(I), and that there exists a set of points {x_n} such that {K(x_n,ω)}, for all integer n, is a complete orthogonal set on the interval I. Then any 'K-bandlimited' function f(x) can be reconstructed as specified in Theorem 3.2 (due to Kramer (1959)).

Theorem 3.2 The generalized WKS sampling theorem (due to Kramer, 1959; this precise statement of his theorem is that contained in Jerri, 1977 (Theorem III-A-1))

Let I be an interval and L₂(I) be the class of functions φ(x) for which ∫_I |φ(x)|² dx < ∞. Suppose that for each real x:

f(x) = ∫_I K(x,ω) g(ω) dω    (3.4.4)

where g(ω) ∈ L₂(I).
Suppose that for each real x, K(x,ω) ∈ L₂(I), and that there exists a countable set E = {x_n} such that {K(x_n,ω)} is a complete orthogonal set on I. Then:

f(x) = lim_{N→∞} Σ_{n=−N}^{N} f(x_n) S_n(x)    (3.4.5)

where

S_n(x) = ∫_I K(x,ω)K*(x_n,ω)dω / ∫_I |K(x_n,ω)|²dω    (3.4.6)

Note that if K(x,ω) = e^{jωx} and x_n = nT then we get the standard WKS theorem. As an example, a valid K(x,ω) is the Bessel function of mth order, ωJ_m(xω), which results in the inverse Hankel transform:

f(x) = ∫_I F(ω) ωJ_m(xω) dω    (3.4.7)

where I = [0,1]. For this case it can be shown that:

S_n(x) = 2x_n J_m(x) / [J_{m+1}(x_n)(x_n² − x²)]    (3.4.8)

The sample sequence for this reconstruction formula is given implicitly by the positions of the zeroes of J_m(x). That is, x_n satisfies J_m(x_n) = 0. It is easily seen, by looking at a graph of the function J_m, that for any value of m the sample sequence is not uniform (although the sequence approaches uniformity for large values of x). For this reconstruction formula to hold, f(x) must be bandlimited with respect to the Hankel transform. This means that F(ω) (which is the Hankel transform of f(x)) vanishes for ω outside the interval I = [0,1]. This is a different condition on f(x) than in the Fourier transform case. Note that a function f(x) that is not bandlimited in the usual Fourier transform sense may still be reconstructible with the above Bessel function reconstruction formula. There are many other types of functions that can be used as kernels in reconstruction formulae. Jerri (1977) lists some of these, including the associated Legendre functions and the Chebyshev functions of the second kind. All of these reconstruction formulae require that the function be sampled in some non-uniform fashion. However, this sample sequence is fixed for each different reconstruction filter kernel. Thus, while the sample distributions are non-uniform, they are certainly not arbitrary.
Thus these methods are not applicable to our application, in which the sample distribution is not known a priori and varies from case to case.

One possible way in which we might proceed is to choose, for each case, a kernel K(x,ω) ∈ L₂(I) such that {K(x_n,ω)} is a complete orthogonal set for the given arbitrary sample distribution {x_n}. In general this would seem to be an impossible task, involving a search over the entire space of L₂(I). However, one way to try to obtain a sampling theorem for an arbitrary sample set {x_n} is to take a specific kernel and ask whether or not the set {K(x_n,ω)} is a complete orthogonal set (although the researchers who took this approach did not explain their motivation in this fashion). This approach was taken by Beutler (1966), Yao and Thomas (1967), and Higgins (1976). They took as the reconstruction kernel the standard Fourier transform kernel e^{jωx} and asked the question: under what conditions is the set {e^{jωx_n}} a complete orthogonal set? This question was first looked into by Paley and Wiener (1934), who showed that this set is closed⁴ if the sample set {x_n} obeys the following condition:

|x_n − n| < 1/π²    (3.4.10)

That is, if the sample set deviates by an amount less than 1/π² from the uniform sample sequence, then the set {e^{jωx_n}} is closed. Levinson (1940) provided a looser bound on the maximum allowable sample deviation and stated that:

|x_n − n| < 1/4    (3.4.11)

Levinson claimed that this is the 'best possible' bound.

⁴ Closure in this context means that:

∫_I f(x)e^{jωx_n} dx = 0 for all x_n iff f(x) = 0 almost everywhere    (3.4.9)

Since the functions e^{jωx_n} are not harmonically related, the reconstruction methods which utilize these basis sets are known as non-harmonic Fourier series methods. Given a complete basis set {e^{jωx_n}}, the resulting reconstruction filter functions are of the Lagrange type. Higgins (1976) shows that for every complete set {e^{jωx_n}} there is a unique biorthogonal set of functions {g_n(x)} such that:

⟨g_n | g_m⟩ = δ_nm    (3.4.12)

where δ_nm is the Kronecker delta and ⟨.|.⟩ indicates the Hilbert space inner product operation. Higgins shows that the reconstruction function S_n(x) corresponding to g_n is given by the Lagrange type reconstruction function:

S_n(x) = H(x)/[H'(x_n)(x − x_n)]    (3.4.13)

where:

H(x) = (x − x₀) Π_{n=1}^{∞} (1 − x/x_n)(1 − x/x_{−n})    (3.4.14)

Note that in the uniform case, x_n = n, this reduces to S_n(x) = sin[π(x−x_n)]/[π(x−x_n)]. Higgins further states that if the sample sequence x_n is a rational function of n, then the above expression for H(x) can be put into closed form. Levinson (1940) also pointed out the existence of the biorthogonal sequence and wrote down the reconstruction formula for S_n(x). This result was also derived by Yao and Thomas (1967), albeit in a less elegant, more direct, manner. Presumably one could extend this approach to kernels other than the Fourier one, and determining the closure conditions for other transform kernels would perhaps allow a wider range of sample sequences to be used for reconstruction. We have come across no studies of such an extension, however, and it was felt to be outside the scope of this thesis to attempt our own study of this matter.

A different approach was proposed by Yen (1956), who provided reconstruction formulae for a number of special cases of non-uniform sampling.
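The interpolating behaviour of the Lagrange functions (3.4.13)-(3.4.14) can be verified numerically. The sketch below (my own illustration) builds a truncated version of H(x) for a deliberately perturbed sample sequence x_n = n + 0.2 sin(n), which satisfies Levinson's bound (3.4.11) since |0.2 sin(n)| < 1/4, and checks that S_n(x_m) ≈ δ_nm; the truncation length, the particular perturbation, and the numerical derivative step are arbitrary choices.

```python
# Lagrange-type reconstruction functions for a non-uniform sample set.
import math

N = 40  # truncation of the infinite product in H(x)

def x_s(n):
    return n + 0.2 * math.sin(n)  # |x_n - n| < 1/4 (Levinson's bound)

def H(x):
    prod = x - x_s(0)
    for n in range(1, N + 1):
        prod *= (1 - x / x_s(n)) * (1 - x / x_s(-n))
    return prod

def S(n, x, h=1e-6):
    dH = (H(x_s(n) + h) - H(x_s(n) - h)) / (2 * h)  # numerical H'(x_n)
    if abs(x - x_s(n)) < 1e-9:    # take the limit as x approaches x_n
        x = x_s(n) + 1e-6
    return H(x) / (dH * (x - x_s(n)))

s_self = S(3, x_s(3))   # close to 1: S_n interpolates its own sample
s_other = S(3, x_s(5))  # exactly 0: H vanishes at every other sample point
print(s_self, s_other)
```

The interpolation property S_n(x_m) = δ_nm survives truncation of the product, because H still vanishes exactly at every retained sample point; what truncation does affect is the completeness of the expansion between samples.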
His method involved applying a number of constraints on the reconstructed function values, obtained from the nature of the sampling distribution, and setting up systems of linear equations from which the function values can be solved. The special cases for which Yen provided reconstruction formulae are as follows.

1. - Migration of a finite number of uniform sample points.
2. - Sampling with a single gap in an otherwise uniform distribution.
3. - Recurrent nonuniform sampling.
4. - Reconstruction of time-limited, band-limited signals from arbitrarily distributed samples.

The types of sample distributions implied by the first three cases of Yen are depicted in figure 3.3. The reconstruction formulae that result are very complex and will not be written down here. The interested reader is referred to (Yen, 1956). These equations also appear in the PhD thesis of F. Marvasti (Marvasti, 1973), who used them in an adaptive quantizer. He did not credit Yen with these equations, although he did include (Yen, 1956) in his list of cited literature. Marvasti did, however, propose some interesting methods of his own in his thesis, which we will now discuss. He presents a pair of techniques for 'uniformizing' a non-uniform sample sequence so that the function can be reconstructed with the WKS formula. The first of these techniques involves estimating the function values at uniformly spaced positions based on a finite number of previous (non-uniformly distributed) function samples. To do this estimation he used the method of (Yen, 1956) (again, Marvasti did not cite this paper as the source of this method) which reconstructs a time-limited and bandlimited function from its samples (which can be arbitrarily distributed) in the interval where the function is non-zero.

FIGURE 3.3 The three special types of sample distributions handled by the reconstruction formulae of Yen (1956).
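The linear-equation flavour of this approach can be sketched as follows (my own illustration, not Yen's actual formulae): for a signal known to lie in the span of a finite number of shifted sinc functions, samples taken at migrated positions give a linear system whose solution recovers the uniform-rate sample values. The signal, the sample positions, and the system size are arbitrary choices.

```python
# Solving for a band-limited signal's uniform-sample values from
# non-uniformly placed samples by setting up a linear system, in the
# spirit of Yen's approach (migrated uniform sample points).
import math

def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(math.pi * u) / (math.pi * u)

M = 9
a_true = [1.0, -0.5, 2.0, 0.3, -1.2, 0.8, 0.0, 1.5, -0.7]  # uniform samples

def f(t):  # band-limited signal built from the uniform samples
    return sum(a_true[k] * sinc(t - k) for k in range(M))

# Non-uniform sample positions (migrated from the uniform grid t = j).
tau = [j + 0.2 * math.cos(j) for j in range(M)]
A = [[sinc(tau[j] - k) for k in range(M)] for j in range(M)]
b = [f(tj) for tj in tau]

# Gaussian elimination with partial pivoting.
for col in range(M):
    piv = max(range(col, M), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, M):
        m = A[r][col] / A[col][col]
        for cc in range(col, M):
            A[r][cc] -= m * A[col][cc]
        b[r] -= m * b[col]
a_rec = [0.0] * M
for r in range(M - 1, -1, -1):
    s = b[r] - sum(A[r][cc] * a_rec[cc] for cc in range(r + 1, M))
    a_rec[r] = s / A[r][r]

err = max(abs(a_rec[k] - a_true[k]) for k in range(M))
print(err)  # the uniform-rate values are recovered from migrated samples
```

Note the limitation the text goes on to describe: this exact recovery relies on the signal being (effectively) time-limited to the span of the basis, which is precisely the assumption that fails for general imagery.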
Although Marvasti does not mention it, it is clear that if the function whose samples are to be uniformized is not time-limited, then this uniformization will not yield exact results. Hence the reconstructed function, after uniformization, will be in error.

The second of Marvasti's uniformization schemes works by predicting the function values at the uniformly distributed locations, based on the information provided by a finite number of preceding (non-uniformly distributed) function samples, with a linear predictor. This method produces estimates of the function values at uniformly distributed locations, but with an added jitter noise. Marvasti states that the error in the prediction method is greater than that produced using Yen's time-limited function reconstruction formula.

To conclude this section we will consider reconstruction methods which involve simple linear filtering of the sampled function (which can be modeled as an impulse train). Marvasti (1984) shows that, if the sampling function S = Σ_{n=−∞}^{∞} δ(x−x_n) is thought of as the zero-crossing set of the following FM signal:

FM = sin[ω_c x + ∫ p(x)dx]    (3.4.15)

and if ω_c is much larger than the bandwidth of p(x), then the jitter noise produced by passing the sampled function through a lowpass filter is negligible. A similar result was obtained by Papoulis (1966), although his derivation proceeded directly from the WKS formulation. The essential conclusion of both Marvasti's and Papoulis' analyses is that the jitter error is negligible only if the deviation of the sample positions from the closest uniform sequence is small enough. This is the same sort of condition imposed by the non-harmonic Fourier series methods. It is clear that all of the methods described in this section tightly constrain the allowed sample positions. Such techniques are therefore inapplicable to situations wherein the sample density varies significantly.
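The effect of sample jitter on a WKS-style reconstruction can be sketched numerically (my own illustration): a bandlimited sinusoid is reconstructed with the uniform sinc series (3.4.1), once from truly uniform samples and once from jittered samples that are naively treated as if they sat on the uniform grid. The signal, the deterministic jitter pattern, and the truncation are arbitrary choices.

```python
# Jitter error in sinc-series reconstruction: uniform samples versus
# non-uniform samples naively treated as uniform.
import math

def f(t):
    return math.sin(2 * math.pi * 0.2 * t)  # bandlimited below Nyquist (T = 1)

def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(sample_pos, t):
    # WKS formula (3.4.1) assuming the samples sit on the grid t = n.
    return sum(f(sample_pos(n)) * sinc(t - n) for n in range(-100, 201))

uniform = lambda n: float(n)
jittered = lambda n: n + 0.2 * math.sin(2.7 * n)  # deterministic jitter

ts = [40 + 0.1 * i for i in range(201)]  # evaluate on [40, 60]
err_uniform = max(abs(reconstruct(uniform, t) - f(t)) for t in ts)
err_jitter = max(abs(reconstruct(jittered, t) - f(t)) for t in ts)
print(err_uniform, err_jitter)  # the jitter dominates the truncation error
```

With uniform samples the only error is series truncation; with jittered samples the interpolant is forced through the wrong values at the grid points, and the resulting jitter error grows with the deviation of the sample positions, in line with the Marvasti and Papoulis analyses.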
In applying the stereo depth measurement algorithms described in this thesis to real life imagery, one is often faced with feature images wherein the feature density varies considerably over the image. Thus, it is evident that the reconstruction methods just discussed are clearly inappropriate for our application. What we require is a reconstruction method that allows reconstruction of functions from truly arbitrarily distributed sample sets. The next three sections describe the derivation and implementation of just such a method.

3.5 - The Transformation or Warping Method - 1D Case

In this section we develop a one dimensional sampling theory, which we extend to two dimensions in the next section, that allows one, under certain conditions, to reconstruct bandlimited functions exactly from arbitrarily distributed samples. This theory is seen to be a generalization of the analysis of Papoulis (1966), who showed how the standard uniform sampling theory of Whittaker, Shannon and Kotel'nikov could be extended to sample sequences that were slight deviations from a uniform sample sequence. We show how a more general result can be obtained by treating a non-uniform sample sequence as resulting from a coordinate transformation of a uniform sample sequence, instead of as being merely deviated from the uniform sequence. Section 3.6 details how these coordinate transformations can be determined in the two dimensional case. Section 3.7 describes a heuristic algorithm, based on the theory developed in section 3.6, for performing two dimensional function reconstruction from non-uniformly distributed samples. We will first derive a reconstruction theorem for the one dimensional case and then extend this result to the two dimensional case in the next section.

The starting point for our derivation is the classical Whittaker-Kotel'nikov-Shannon (WKS) sampling theorem (Theorem 3.1). Let us consider a non-uniform sample sequence {t_n}, where t_n, the position of the nth sample, is not necessarily equal to nT. For example, refer to the function shown in figure 3.4a, which is sampled non-uniformly at the locations t_n. Now, suppose we apply a stretching/compressing transformation, τ = γ(t), to f(t), such that we end up with another function h(τ) = f(γ⁻¹(τ)). If the transformation γ between t and τ is such that γ(t_n) = t₀ + nT, for some arbitrary t₀, then the samples of h(τ) will be uniformly spaced, as shown in figure 3.4b, with a sampling period of T units, and we can use Theorem 3.1. For the reconstruction of h(τ) to be exact we must have that h(τ) be bandlimited to ω₀ = π/T. If this is so, we can then reverse the stretching/compression operation and retrieve the reconstructed function f(t) by using the relationship:

f(t) = h(γ(t))    (3.5.1)

FIGURE 3.4 a) A function, f(t), sampled at non-uniformly distributed positions. b) The transformed function, h(τ), sampled at uniformly distributed positions.

Substituting this relationship into (3.4.1) of Theorem 3.1, and using τ = γ(t), yields:

f(t) = Σ_{n=−∞}^{∞} f(t_n) sin[ω₀(γ(t)−nT)]/[ω₀(γ(t)−nT)]    (3.5.2)

Hence, in order to reconstruct f(t) from its non-uniformly spaced samples f(t_n), it suffices to find the invertible and one-to-one function γ(t) such that γ(t_n) = nT, and then to use (3.5.2). The reconstruction formula (3.5.2) is equivalent to the one derived by Papoulis (1966), who treated the case of sample positions that were deviated slightly from a uniform sample distribution. However, his analysis indicated that this reconstruction would never be exact, but would always be subject to an aliasing error which became smaller as the sample deviation became smaller.
This conclusion is too pessimistic, however, and it can be shown that there are cases for which the samples are not uniformly distributed and yet the reconstruction can be exact. The conditions under which an exact reconstruction can be obtained are discussed below.

In order for (3.5.2) to hold, the function h(τ) must be bandlimited to ω₀. Thus h(τ) is a member of the set B_ω₀, which is defined as the set of all functions whose Fourier transforms vanish for |ω| > ω₀. Let us define the set C_γ to be the set of all functions which are the image of a function in B_ω₀ under the transformation γ⁻¹. It can be seen that C_γ is the set of all functions that can be reconstructed exactly with (3.5.2), for a given γ. The set C_γ is clearly non-empty, and thus Papoulis' assertion that (3.5.2) is only approximately true for all functions is incorrect. Equation (3.5.2) is approximate only for those functions that are not members of C_γ. An interesting point to be noted is that the functions in C_γ are generally not band-limited. That this is so can be seen by examination of the relationship between the spectra of h(τ) and f(t). It is possible to show that:

F(λ) = ∫_{−ω₀}^{ω₀} P(λ,ω)H(ω)dω    (3.5.3)

where the function P, which can be thought of as a frequency-variant blurring function, is defined by:

P(λ,ω) = ∫_{−∞}^{∞} e^{j2πωγ(τ)} e^{−j2πλτ} dτ    (3.5.4)

That is, P(λ,ω) is the Fourier transform (with respect to τ) of the angle modulated signal:

p_ω(τ) = e^{j2πωγ(τ)}    (3.5.5)

Thus, if p_ω(τ) is not bandlimited, as is usually the case for angle modulated signals, then generally f(t) will not be either. One can show that there exists a transformation γ such that the FM signal defined by:

f(t) = e^{j∫₀ᵗ Ω(s)ds}    (3.5.6)

is a member of C_γ when Ω(s) is a positive, continuous function. Hence, such a function can always be reconstructed exactly, when sampled at the times t_n = γ⁻¹(nT), even when it is not strictly bandlimited.
Let us summarize the details of the above analysis in the form of a theorem:

Theorem 3.3 The non-uniform 1-D sampling theorem

Let a function f(t) of one variable be sampled at the points t = t_n, where {t_n} is not necessarily a sequence of uniformly spaced numbers. If a one-to-one continuous mapping γ(t) exists such that nT = γ(t_n), and if h(τ) = f(γ⁻¹(τ)) is bandlimited to ω₀ = π/T, then the following equation holds:

f(t) = Σ_{n=−∞}^{∞} f(t_n) sin[ω₀(γ(t)−nT)]/[ω₀(γ(t)−nT)]    (3.5.7)

This theorem can be generalized to include other orthogonal basis functions, in the same manner that Theorem 3.1 was generalized to Theorem 3.2. This generalization will not be explicitly stated here.

The reconstruction method described here can be thought of in a different manner. Consider a 'burst' type signal such as that shown in figure 3.5. Intuitively, we would expect that a uniform sampling of f(t) would not be the most efficient sampling scheme. It seems reasonable to require a sample density in the central region, where there are high frequency components, higher than in the outer regions, where there are lower frequency components. Suppose that, at every point of the function f(t), we make a local estimate of its bandwidth, B(t). Then it would follow that we would have to sample at a rate of 2B(t) samples/unit time, that is, at the points t_n given implicitly by:

t_n = n/(2B(t_n))    (3.5.8)

in order to allow an exact reconstruction of the signal. This conclusion was reached (albeit from a different direction) by Horiuchi (1968), who derived the reconstruction formula:

f(t) = Σ_{n=−∞}^{∞} f(t_n) sin[π(2B(t)t−n)]/[π(2B(t)t−n)]    (3.5.9)

for a signal with a time varying bandwidth B(t) that is sampled at the points t_n.

FIGURE 3.5 A burst type of signal, with time-varying bandwidth.

The instantaneous sampling rate, 2B(t), can be thought of as the derivative of the mapping function, and hence we have that:

∂γ(t)/∂t = (2π/ω₀)B(t)    (3.5.10)

or γ(t) = k + ∫₀ᵗ (2π/ω₀)B(τ)dτ. If the bandwidth B(t) is a constant (or approximately so over a given interval) then we can say:

γ(t) = (2π/ω₀)B(t)t    (3.5.11)

With this equation for γ(t) we can see that equations (3.5.2) and (3.5.9) are equivalent. If the bandwidth is not approximately constant then our equation (3.5.2) and Horiuchi's equation (3.5.9) are not equivalent.

Equations (3.5.8) and (3.5.10) tell us (implicitly) how to optimally sample a signal, that is, how to sample a signal with the smallest number of sample points while still allowing an exact reconstruction of the signal. Equation (3.5.9) suggests that we could interpret the reconstructed signal f(t) as the response of a time varying (or adaptive) lowpass filter, with bandwidth B(t), to the impulse train Σ_n f(t_n)δ(t−t_n). Thus one can envisage the following procedure. Given an arbitrary function f(t), we estimate its bandwidth as a function of time. We then integrate this bandwidth function, as in equation (3.5.10), to yield the warping function γ(t). When this warping function crosses an integer value n, we sample f(t). We then store or transmit the sample f(t_n) along with the time at which the sample was taken, t_n. We can then, knowing all of the f(t_n) and t_n values, reconstruct f(t) using (3.5.7).

The preceding analysis is complicated by the fact that an exact 'local' bandwidth measure does not exist, as bandwidth is defined globally, being a frequency domain measure. Thus we can only obtain local bandwidth 'estimates', which may cause concern as to the validity of equations (3.5.8)-(3.5.10). In practice, however, the reconstruction formulas that are used are defined only over a finite area (truncation of the reconstruction series), and so one loses nothing by assuming the local bandwidth over this finite area to be the actual bandwidth.

In practice, the use of the above reconstruction theorem requires the knowledge of the function γ(t) at all points t for which we desire a reconstruction. If an analytical expression for the sampling sequence t_n is known (e.g. t_n = s(n)) then we can simply extend this analytical expression to include non-integer values (e.g. γ⁻¹(t) = s(t)). If no such analytical expression is available (as is usually the case), or if the analytical expression cannot be extended to non-integer values, then γ(t) must be found by interpolation between the known γ(t_n) = nT points. The only constraint on this interpolation is that it must yield a γ that is one-to-one and invertible (monotonic).

We will now present an example showing the effectiveness of non-uniform sampling and reconstruction for signals with time varying bandwidth. Consider the following FM signal:

f(t) = sin[2πφ(t)]    (3.5.12)

where φ(t) is the phase function, defined in terms of the instantaneous signal frequency (bandwidth) as follows:

φ(t) = ∫₀ᵗ B(τ)dτ    (3.5.13)

Let us consider the case of a quadratic phase function, φ(t) = k₁t² + k₂t + k₃, or equivalently a linear bandwidth function. Specifically, let us define:

B(t) = t/20,000    (3.5.14)

and hence k₁ = 1/40,000, k₂ = 0 and k₃ = 0. It can be seen that the Nyquist rate (for |t| ≤ 1000) is 1/10 samples per unit time. Thus the uniform sample sequence (for |t| ≤ 1000) is t_n = 10n. In the non-uniform case, the sampling sequence is obtained from equation (3.5.8) as follows:

γ(t) = c∫₀ᵗ (2π/ω₀)B(τ)dτ = 2cφ(t)    (3.5.15)

The constant c is chosen so that there are 100 samples in the interval [0,1000] (the same as for the uniform case). We then have that:

γ(t) = ct²/20,000    (3.5.16)

We know that γ(t_n) = n. Therefore we must have that γ(1000) = 100. This yields a value of 2 for c.
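This example can be reproduced numerically. With c = 2 the warping function is γ(t) = t²/10,000, so the sample positions satisfying γ(t_n) = n are t_n = 100√n, and the warped function h(τ) = sin(πτ/2) is bandlimited to π/2 < ω₀ = π. The sketch below (my own illustration; the evaluation range and series length are arbitrary choices, and the series is not truncated to 21 terms as in the experiment reported below) applies the non-uniform reconstruction formula (3.5.7) to the chirp.

```python
# Non-uniform reconstruction of the chirp f(t) = sin(2*pi*t^2/40000)
# by the warping method: gamma(t) = t^2/10000, samples at t_n = 100*sqrt(n).
import math

def f(t):
    return math.sin(2 * math.pi * t * t / 40000.0)

def gamma(t):
    return t * t / 10000.0

def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(math.pi * u) / (math.pi * u)

N = 150  # series length (t_N is about 1225)
samples = [(100.0 * math.sqrt(n), n) for n in range(N + 1)]

def reconstruct(t):
    g = gamma(t)  # warp t, then apply the uniform sinc series in tau
    return sum(f(tn) * sinc(g - n) for tn, n in samples)

# Evaluate the reconstruction error over t in [400, 800].
ts = [400 + 4 * i for i in range(101)]
err = max(abs(reconstruct(t) - f(t)) for t in ts)
print(err)  # small: the chirp lies in C_gamma although it is not bandlimited
```

The chirp is reconstructed to within a small residual that comes only from truncating the series, which is the behaviour Theorem 3.3 predicts for a member of C_γ.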
We can now see that the sample sequence for the non-uniform case is given by:

t_n = 100√n    (3.5.17)

Note that, for n < 100, the value of |t_n − 10n| is greater than 10/4. This violates the condition (3.4.11) that the sample sequence must meet for the non-harmonic Fourier series methods to work. Thus these methods are inappropriate for this application. The function in equation (3.5.12) was sampled at the uniform and non-uniform sample points, and then reconstructed using reconstruction formulae (3.4.1) and (3.5.7) respectively. The summations in each of these reconstruction formulae were truncated to 21 samples. The magnitudes of the errors of the two reconstructions are shown in figure 3.6, as a function of time (and hence of the signal bandwidth). Notice that the error for the uniform case rises as the signal frequency rises, because of the increased aliasing and truncation errors, while the error for the non-uniform case remains more or less constant, as expected. The total RMS error for 10 < t < 990 is 0.0982 for the uniform case and 0.0706 for the non-uniform case.

FIGURE 3.6 The reconstruction of a chirp signal for uniform and non-uniform sampling.

Let us now examine the extension of the above one dimensional sampling theory to the case of two dimensions.

3.6 - The Transformation or Warping Method - 2D Case

As in the one dimensional case, the development of a two dimensional non-uniform sampling theory begins with the consideration of the uniform sampling theory. The theory behind the reconstruction of functions of two variables from uniformly distributed samples of these functions was developed by Petersen and Middleton (1962). Mersereau (1979) and Mersereau and Speake (1983) have studied the more general problem of processing multidimensional signals that have been sampled on uniform lattices, especially hexagonal lattices, which have added importance in this thesis.
We are concerned here only with signal reconstruction, but it is evident that the type of signal processing techniques described by Mersereau and Speake can be extended, using the results of this thesis, to the case of non-uniform sampling. The essentials of the work of Petersen and Middleton (1962) are summarized in Theorem 3.4. This theorem describes the conditions under which a function of two variables, f(x), can be reconstructed exactly from its samples taken at points on a uniform lattice. This theorem basically extends the one-dimensional uniform sampling theorem (Theorem 3.1) to two dimensions.

Theorem 3.4 The uniform two-dimensional sampling theorem

Suppose that a function of two variables f(x) is sampled at points in the infinite sampling set {x_s} defined by:

{x_s} = {x: x = l₁v₁ + l₂v₂, l₁,l₂ = 0,±1,±2,±3,..., v₁ ≠ kv₂}    (3.6.1)

The vectors v₁ and v₂ form the basis for the sampling lattice defined by the points in {x_s}. Such a sampling lattice is shown in figure 3.7 for v₁ = (2/√3, 0) and v₂ = (1/√3, 1). Furthermore, let the support of the Fourier transform F(ω) of f(x) be bounded by the region R in ω space.
The spectrum of the sampled function, f_s(x) = Σ_s δ(x−x_s)f(x), is made up of an infinite number of repetitions of the spectrum F(ω), and is given by F_s(ω) = Σ_s F(ω+ω_s), where the set {ω_s} is defined by:

{ω_s} = {ω: ω = l₁u₁ + l₂u₂, l₁,l₂ = 0,±1,±2,±3,..., u₁ ≠ ku₂}    (3.6.2)

and where the frequency domain basis vectors u₁ and u₂ are related to the spatial domain basis vectors v₁ and v₂ by:

u₁ᵀv₁ = u₂ᵀv₂ = 2π, and u₁ᵀv₂ = u₂ᵀv₁ = 0    (3.6.3)

FIGURE 3.7 The hexagonal sampling lattice for functions with isotropic spectra.

If F(ω) = 0 wherever F(ω+ω_s) ≠ 0 (for every ω_s ≠ 0), then the spectral repetitions do not overlap, and the following equation holds:

f(x) = Σ_s f(x_s)g(x−x_s)    (3.6.4)

where g(x) is the inverse Fourier transform of the lowpass filter function G(ω) defined by:

G(ω) = Q,         ω ∈ R
G(ω) = 0,         ω−ω_s ∈ R for some ω_s ≠ 0
G(ω) = arbitrary, elsewhere    (3.6.5)

Q is a constant that is equal to the area of each of the sampling lattice cells, and is the inverse of the sample density. In terms of the sampling lattice basis vectors v₁ and v₂, Q is given by:

Q = [|v₁|²|v₂|² − (v₁ᵀv₂)²]^(1/2)    (3.6.6)

Now, following the lead of the analysis performed in section 3.5 for the one-dimensional case, let us introduce a second function of two variables, h(ξ), that is the image of f(x) under the coordinate transformation

ξ = γ(x)    (3.6.7)

i.e.

f(x) = h(γ(x))    (3.6.8)

Let this γ be such that the set of non-uniformly spaced sample points {x_s} is transformed into a uniformly spaced set {ξ_s} (such as the set defined by 3.6.1), whose points lie on a regular lattice. Since {ξ_s} lies on a regular lattice, the function h(ξ) can be reconstructed from its samples under the conditions of Theorem 3.4. Once h(ξ) has been reconstructed, f(x) can be obtained by reversing the coordinate transformation (3.6.7).
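The reciprocal relation (3.6.3) and the cell area (3.6.6) are easy to verify numerically. The sketch below is our illustration: it builds the frequency-domain basis for the lattice v₁ = (2/√3, 0), v₂ = (1/√3, 1) of figure 3.7 by solving the 2x2 system by hand, and evaluates Q.

```python
import math

v1 = (2.0 / math.sqrt(3.0), 0.0)
v2 = (1.0 / math.sqrt(3.0), 1.0)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

# Solve u_i^T v_j = 2*pi*delta_ij, i.e. U = 2*pi*(V^-1)^T for the 2x2 case.
det = v1[0] * v2[1] - v1[1] * v2[0]
u1 = (2.0 * math.pi * v2[1] / det, -2.0 * math.pi * v2[0] / det)
u2 = (-2.0 * math.pi * v1[1] / det, 2.0 * math.pi * v1[0] / det)

# Cell area Q of eq. (3.6.6): the inverse of the sample density.
Q = math.sqrt(dot(v1, v1) * dot(v2, v2) - dot(v1, v2) ** 2)
```

For this hexagonal lattice Q = 2/√3, i.e. roughly 0.87 samples per unit area.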
This reconstruct-and-invert procedure is the basis of our non-uniform two-dimensional sampling theorem, which is stated below as Theorem 3.5.

Theorem 3.5 The non-uniform two-dimensional sampling theorem

Suppose that a function of two variables f(x) is sampled at points in the infinite set {x_s}. Now, if there exists a one-to-one continuous mapping γ such that:

ξ = γ(x) and ξ_s = γ(x_s)    (3.6.9)

and if the function h(ξ) defined by

h(ξ) = f(γ⁻¹(ξ))    (3.6.10)

satisfies the conditions of Theorem 3.4, then the following is true:

h(ξ) = Σ_s h(ξ_s)g(ξ−ξ_s)    (3.6.11)

where g(ξ) is as defined by (3.6.5). Hence:

f(x) = Σ_s f(x_s)g(γ(x)−γ(x_s))    (3.6.12)

Let us assume that h(ξ) is an isotropic function, where we here take the term isotropic to mean that the support of the Fourier transform of the function is a disk-shaped region in the frequency plane, centred about the origin. If this is the case, we can define the region R, mentioned in Theorem 3.4, as R = {ω: |ω| ≤ π}. It can then be shown (Petersen and Middleton, 1962, eq. 74, with B=1/2) that:

g(ξ) = (π/√3)J₁(π|ξ|)/(π|ξ|)    (3.6.13)

where J₁ is the first-order Bessel function of the first kind. Combining equations (3.6.12) and (3.6.13) yields the following result:

f(x) = Σ_s f(x_s)(π/√3)J₁(π|γ(x)−γ(x_s)|)/(π|γ(x)−γ(x_s)|)    (3.6.14)

It can be shown (Petersen and Middleton, 1962) that the most efficient sampling lattice for such an isotropic case is a hexagonal lattice, with a characteristic spacing (in ξ space) of 2/√3. This lattice is the one shown in figure 3.7. The values of γ(x_s) at the sample points are fixed by the mapping onto the reconstruction lattice. If we wish to use equation (3.6.12) as our reconstruction formula, however, we must know the values of γ(x) at all points x at which a reconstruction is to be obtained. As in the one-dimensional case we can, once the values of γ at the sample points {x_s} are fixed, interpolate to find γ(x) at all other points. Unlike the one-dimensional case, however, finding the mapping is not a trivial process.
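The interpolation kernel of (3.6.13) is straightforward to evaluate without a special-function library, using the standard integral representation J₁(z) = (1/π)∫₀^π cos(θ − z sin θ)dθ. The sketch below is our illustration; note that g(0) = π/(2√3), since J₁(z)/z → 1/2 as z → 0.

```python
import math

def bessel_j1(z, steps=2000):
    # J1(z) = (1/pi) * integral over [0, pi] of cos(theta - z*sin(theta)),
    # evaluated with the trapezoidal rule.
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - z * math.sin(math.pi)))
    for k in range(1, steps):
        theta = k * h
        total += math.cos(theta - z * math.sin(theta))
    return total * h / math.pi

def g(xi1, xi2):
    # Reconstruction kernel g(xi) = (pi/sqrt(3)) * J1(pi*|xi|)/(pi*|xi|),
    # eq. (3.6.13).
    r = math.pi * math.hypot(xi1, xi2)
    if r < 1e-12:
        return math.pi / (2.0 * math.sqrt(3.0))   # limit: J1(r)/r -> 1/2
    return (math.pi / math.sqrt(3.0)) * bessel_j1(r) / r
```

Each term of the reconstruction sum (3.6.14) is then f(x_s) * g applied to γ(x) − γ(x_s).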
We must therefore find a mapping, Γ: {x_s}→{ξ_s}, between the sample sets in x and ξ space, that yields a one-to-one and continuous mapping function γ. In the one-dimensional case one and only one such interpolation exists (restricting the sign of the derivative of γ to be positive), given by γ(t_n) = ξ_n = nT, but in two dimensions there may be, in general, any number of mappings. The difficulty lies in the fact that there is no general scheme for ordering arbitrarily distributed points in two dimensions, analogous to the sequential ordering available in one dimension, such that adjacency properties are preserved. For the purpose of the following discussion let us make the following definitions.

Definition: Partition
A partition of a planar region R is a set of line segments, called Links, that divide R into a number of distinct, possibly overlapping, subregions. The endpoints of these Links are called the Vertices of the partition. There can be no free Vertices (i.e. Vertices belonging to only one Link) in a partition, except at the boundary of R.

Definition: Tessellation
A tessellation is a partition whose regions do not overlap.

Definition: Voronoi Tessellation
The Voronoi tessellation with respect to the point set {ξ_s} is the tessellation of the ξ plane whose Links consist of the points equidistant from two points ξ_i, ξ_j ∈ {ξ_s} and no closer to any other point in {ξ_s}. The Vertices of the Voronoi tessellation are those points equidistant from three or more points in {ξ_s} and no closer to any other point in {ξ_s}. Each subregion created by the Voronoi tessellation contains all points closer to a given point in {ξ_s} than to any other point in {ξ_s}.
Definition: Dirichlet Tessellation
The Dirichlet tessellation (sometimes referred to as the Delaunay triangulation) can be thought of as the dual of the Voronoi tessellation. The Dirichlet tessellation is created by connecting, with line segments, those points in {ξ_s} whose Voronoi regions share a common Link, i.e. nearest neighbours. An example of a Voronoi tessellation and a Dirichlet tessellation defined for a set of points can be seen in figure 3.8. Further discussion of Dirichlet and Voronoi tessellations can be found in Ahuja and Schacter (1983). The regions of the Dirichlet tessellation are non-overlapping triangles, and the Dirichlet tessellation created from the points in the hexagonal sampling lattice is a triangular tessellation. We will denote this particular tessellation by D_ξ.

FIGURE 3.8 The Voronoi and Dirichlet tessellations for a set of points.

Definition: Adjacency
Two points are defined to be P-Adjacent with respect to a partition P if they share a common Vertex of P.

Definition: Adjacency Conserving Partition Mapping (ACPM)
A mapping, Γ: {x_s}→{ξ_s}, is termed an ACPM if it takes a partition of {x_s} into a partition of {ξ_s} such that points that are P-adjacent in {x_s} have P-adjacent images in {ξ_s}.

As a result of the preservation of adjacency properties, a region in a partition has the same number of Links as its image under an ACPM, and each Vertex has the same number of Links as its image under an ACPM. Hence, the regions of the image of D_ξ under any ACPM, P_x = Γ⁻¹(D_ξ), are also triangular (although possibly overlapping). Note also that the inverse of an ACPM is itself an ACPM.

Definition: Generalized Hexagonal Tessellation
A Generalized Hexagonal Tessellation (GHT) is a tessellation created by applying an ACPM to D_ξ.
All interior Vertices of a GHT are the junction of six Links. As was said at the beginning of this section, once we have a mapping, Γ, between the (known) points x_s ∈ {x_s} and the points ξ_s ∈ {ξ_s}, we can determine the mapping function, γ(x), for a point x not necessarily a member of {x_s}, by interpolation of these values. The fact that the regions of P_x are triangular suggests that, for a point x in the interior of one of these regions, γ(x) should be some linear combination of the values of γ at the Vertices of the region. Such a linear combination can be written as:

γ(x) = Σ_{i=1}^{3} γ(x_i) I(x, x_i, x_{(i+1)}, x_{(i+2)})    (3.6.15)

where V(x) = (x₁,x₂,x₃) is the vertex set of the region of P_x containing x, and where the subscript (i) denotes (i) modulo 3, plus 1, so that the vertex indices cycle through 1, 2 and 3. The function I(x, x_i, x_j, x_k) is some interpolation function which results in an invertible γ. The simplest such interpolation that we can do between three non-collinear points is a trilinear interpolation, which fits a planar (vector valued) surface to these three points. The interpolation function for this method is given in equation (3.6.16). This equation describes a plane passing through the points (x₁¹, x₁², 1), (x₂¹, x₂², 0) and (x₃¹, x₃², 0):

I(x, x₁, x₂, x₃) = [x¹(x₂²−x₃²) + x²(x₃¹−x₂¹) + (x₂¹x₃² − x₃¹x₂²)]/A    (3.6.16)

where

A = x₁¹(x₂²−x₃²) + x₁²(x₃¹−x₂¹) + (x₂¹x₃² − x₃¹x₂²)    (3.6.17)

and where x = (x¹, x²), the superscripts denoting the two components of a point.
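The trilinear interpolation of (3.6.15)-(3.6.16) amounts to expressing x in barycentric coordinates of its containing triangle and applying the same weights to the vertex images. A minimal sketch (ours; the function names are not from the thesis):

```python
def tri_weight(x, p1, p2, p3):
    # I(x, p1, p2, p3) of eq. (3.6.16): the plane through
    # (p1, 1), (p2, 0), (p3, 0), evaluated at x.
    a = (p1[0] * (p2[1] - p3[1]) + p1[1] * (p3[0] - p2[0])
         + (p2[0] * p3[1] - p3[0] * p2[1]))               # A of eq. (3.6.17)
    num = (x[0] * (p2[1] - p3[1]) + x[1] * (p3[0] - p2[0])
           + (p2[0] * p3[1] - p3[0] * p2[1]))
    return num / a

def gamma_interp(x, verts, images):
    # gamma(x) = sum_i images[i] * I(x, verts[i], verts[i+1], verts[i+2]),
    # eq. (3.6.15), with cyclic vertex indexing.
    out = [0.0, 0.0]
    for i in range(3):
        w = tri_weight(x, verts[i], verts[(i + 1) % 3], verts[(i + 2) % 3])
        out[0] += w * images[i][0]
        out[1] += w * images[i][1]
    return tuple(out)
```

At a vertex the weight of that vertex is 1 and the other two are 0, so the interpolant agrees with the known values γ(x_s), and it is affine (hence invertible) inside each non-degenerate triangle.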
We can now state the following theorem, which supplies the conditions under which a given one-to-one point set mapping will yield the one-to-one and continuous mapping function that is required for equation (3.6.11) to be valid.

Theorem 3.6 Invertibility of a mapping

Given the Dirichlet tessellation D_ξ and the trilinear interpolation defined by equation (3.6.16), then, if there is an ACPM, Γ⁻¹, such that the image of D_ξ, P_x = Γ⁻¹(D_ξ), is a GHT, and the points in the set V(x) are not collinear, the mapping γ(x) defined by (3.6.15) is one-to-one and continuous.

The key condition in Theorem 3.6 is that P_x, the image of D_ξ, be a GHT. Thus, in order to perform the reconstruction for a given sample set {x_s}, we need first to find a tessellation P_x for which the mapping Γ from P_x to D_ξ is an ACPM. It is suspected that it is not possible to create a GHT from all sets of points, and that the construction algorithms would run into trouble trying to order certain types of sample point sets. Consider, for example, sets in which large numbers of points lie along radial lines. This type of sample point distribution is produced in X-ray tomography (e.g. see figure 3 of Pan and Kak, 1983). Such a set of points is shown in figure 3.9a. We try to create a GHT from this set by mapping regions of the point set to the regions of the triangular tessellation D_ξ derived from the hexagonal lattice of figure 3.7. The result of this ordering operation is shown in figure 3.9b. As we proceed, the regions become thinner and thinner, and it turns out that, beyond a certain point, no points can be mapped without creating an overlapping region or violating the requirement that interior Vertices be the junction of six Links. This, of course, does not constitute a proof of our conjecture, as there are a number of other ways of trying to construct a GHT from this set of points, one of which may work; but it can be seen that the ordering of such a point set is problematic, and we have no proof either way.
Even if a GHT could be found in such cases, the thinness of the regions of the GHT so found would cause problems in performing the interpolation. However, if we truncate the reconstruction formula (3.6.12) to take into account only those samples in a finite region about x, then it is not necessary for γ to be one-to-one and continuous everywhere. The mapping function need be one-to-one and continuous only over this restricted region. This means that we need only a partition that is locally a GHT, and in this way it is expected that a mapping can be found for any sample set {x_s}. For example, figure 3.9c shows a few of these local GHTs defined on the sample set of figure 3.9a.

FIGURE 3.9 b) An attempt to create a GHT from {x_s}.

The price we pay for this weakening of Theorem 3.5 is that the reconstruction is no longer exact, even if h(ξ) is suitably bandlimited. In practice such a truncation of the reconstruction equation is unavoidable, as one can only process a finite number of samples in a finite time. Let the finite set of sample points used to reconstruct f(x) be {x_s}₀ ⊂ {x_s}. Note that for different reconstruction points x we may have different sets {x_s}₀. When only a finite number of terms are used in equation (3.6.12), the resulting value of f(x) will not be exact, but will be subject to a 'truncation' error term. This truncation error is defined in equation (3.6.18) and can be bounded as shown in equation (3.6.19).

e_t(x) = |f(x) − f_R(x)| = |Σ_{x_s ∉ {x_s}₀} f(x_s)(π/√3)J₁(π|γ(x)−γ(x_s)|)/(π|γ(x)−γ(x_s)|)|    (3.6.18)

e_t(x) ≤ Σ_{x_s ∉ {x_s}₀} |f(x_s)|(π/√3)(1/√2)/(π|γ(x)−γ(x_s)|)    (3.6.19)

In finding this bound we have used the Triangle Inequality and the fact that |J₁(x)| ≤ 1/√2 (Abramowitz and Stegun, 1965). The above bound suggests that we should make the distances between ξ ∈ {ξ_s}₀ and all points not in {ξ_s}₀ as large as possible. In other words, {ξ_s}₀ should consist of the N_g points closest to ξ.
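The selection rule implied by the bound, keep the N_g samples whose images lie closest to ξ, is simple to state in code. The sketch below is our illustration; it also evaluates the remainder bound (3.6.19) for the excluded samples, where the coefficient (π/√3)(1/√2)/π simplifies to 1/√6.

```python
import math

def select_nearest(xi, xi_samples, n_g=7):
    # Indices of the n_g sample images closest to xi = gamma(x),
    # plus the indices of the excluded remainder.
    order = sorted(range(len(xi_samples)),
                   key=lambda i: math.dist(xi, xi_samples[i]))
    return order[:n_g], order[n_g:]

def truncation_bound(xi, xi_samples, f_samples, excluded):
    # Bound (3.6.19): sum over excluded samples of
    # |f(x_s)| * (pi/sqrt(3)) * (1/sqrt(2)) / (pi * |xi - xi_s|)
    #   = |f(x_s)| / (sqrt(6) * |xi - xi_s|).
    total = 0.0
    for i in excluded:
        d = math.dist(xi, xi_samples[i])
        total += abs(f_samples[i]) / (math.sqrt(6.0) * d)
    return total
```

In a practical system the excluded sum would itself be truncated to the samples inside some working window; the bound then only accounts for those.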
How are we to determine which points {x_s}₀ map into {ξ_s}₀? Theorem 3.6 provides one constraint on {x_s}₀, by requiring that the partition P_x0 be a tessellation and map continuously into the Dirichlet tessellation D_ξ0, where P_x0 and D_ξ0 are the partitions formed from the points in {x_s}₀ and {ξ_s}₀ respectively. In order for the interpolation of γ (equation 3.6.15) to be valid, we must also stipulate that x be contained in one of the regions of P_x0. This in turn means that ξ lies in one of the regions of D_ξ0.

Theorem 3.5 requires, through the conditions of Theorem 3.4, that H(ω) = 0 wherever H(ω+ω_s) ≠ 0, for every ω_s ≠ 0. If this condition is not satisfied, the value of f(x) obtained with equation (3.6.12) will be in error. This error is referred to as aliasing error. It can be shown that if the aliasing errors due to two different sample distributions are compared, the distribution with the higher density will have the lower aliasing error. Now, it can be shown that, for a given f(x), the localized aliasing error of the reconstruction (3.6.12) decreases as the density of {x_s}₀ increases. This suggests that we should select the {x_s}₀ that has, in addition to the above mentioned conditions, the maximum possible density. This will ensure that, for a given f(x), we will obtain the minimum possible aliasing error, or, alternatively, will give us the maximum allowable bandlimit that a function can possess while still yielding an exact reconstruction.

We can summarize the conditions on {x_s}₀ as follows:
- The set {x_s}₀ must be such that there exists a tessellation P_x0 that can be continuously mapped into the Dirichlet tessellation D_ξ0.
- The point x at which the function is to be reconstructed must lie within one of the regions of P_x0.
- The density of the set {x_s}₀ must be the maximum possible, subject to the above two constraints.
In general, finding the optimal mapping that jointly minimizes the aliasing and truncation errors is very difficult. In the next section we will present a heuristic algorithm which is near optimal for homogeneous sample distributions. This algorithm guarantees finding a mapping which locally satisfies the conditions of Theorem 3.6.

3.7 - Implementation of the 2D Transformation Method

Based on the foregoing discussions, we propose the following reconstruction algorithm. This algorithm finds, for a given point x and sample set {x_s}, a subset {x_s}₀ of {x_s} that locally satisfies the conditions of Theorem 3.6. It will be seen that this algorithm is generally sub-optimal, in that the truncation and aliasing errors may not take on the minimum possible values. For homogeneous sample distributions, however, this algorithm will be optimal.

The motivation behind the algorithm is as follows. In the application that initiated this study, the sample distributions were non-uniform but homogeneous. This is the case in many applications. Thus the sample points {x_s} can be thought of as arising from the perturbation of a regular sample set, such as the hexagonal lattice. Our algorithm assumes that the perturbation is small enough that a given point on the original hexagonal lattice will remain somewhere in a 60° sector about its original position. See figure 3.10 for an example of such a perturbation.

The algorithm begins by trying to find the centre point of the lattice. Such a point is denoted by ξ₀ in figure 3.11. We map to ξ₀ the point in {x_s} closest to x₀, the point at which we wish to perform the reconstruction.
In the particular algorithm described here we use N_g = 7, so that {ξ_s}₀ consists of the seven points closest to ξ. Algorithms can be devised for higher values of N_g, but they become increasingly more difficult. Once we have mapped the point closest to x₀ to ξ₀, we must find the other N_g − 1 points of {x_s}₀ and their images in {ξ_s}₀. Because we have assumed that the points in {x_s} result from only slight perturbations of the hexagonal lattice, we can use the following heuristic procedure: divide the region about the 'original' point x₀ into six 60° sectors, as shown in figure 3.10, and find the point in {x_s} closest to x₀ within each of these sectors. These points are the other points of {x_s}₀, and each is mapped to the hexagonal lattice point of {ξ_s}₀ associated with its sector, as shown in figure 3.11.

FIGURE 3.10 The operation of the mapping heuristic for N_g = 7.

This algorithm is described procedurally in a pseudo high level language below.

procedure RECONSTRUCT({x_s},{f(x_s)},x,f(x))
(* To reconstruct the value of a function f(x) at a point x, given an arbitrary set of function samples {x_s}. It is assumed that N_g = 7 and that {x_s} is homogeneous. *)
begin
  IF x = x_i ∈ {x_s} THEN
    f(x) = f(x_i)
  ELSE
    StartSearch = x    (* Start the spiral search at x *)
    FindNearestNeighbor(x₀,{x_s},x,StartSearch)
      (* Look for the centre point of {x_s}₀ *)
    FindMapping(x₀,{x_s},{x_s}₀,{ξ_s}₀)
      (* Find the mapping between {x_s}₀ and {ξ_s}₀ *)
    InterpolateMapping(x,ξ,{x_s}₀,{ξ_s}₀)
      (* Interpolate to find the mapping of x into ξ (e.g. using equations (3.6.15) and (3.6.16)) *)
    f(x) = 0
    FOR i = 1 TO N_g DO BEGIN    (* Compute the reconstruction sum *)
      f(x) = f(x) + g(ξ−ξ_i)f(x_i)
    ENDFOR
  ENDELSE
endproc

FIGURE 3.11 The sample locations in ξ space for N_g = 7.

The nearest neighbour finding procedure 'FindNearestNeighbor' can be done in a number of ways. For example, the efficient spiral search technique of (Hall, 1982), modified to search over monotonically increasing distances, was used in the examples described later in this thesis. This modification is described in the appendix.
The mapping procedure 'FindMapping' determines the mapping between {x_s}₀ and {ξ_s}₀, given the sample set {x_s}. This procedure is detailed in the following pseudo high level program.

procedure FindMapping(x₀,{x_s},{x_s}₀,{ξ_s}₀)
  ξ₀ = (0,0)    (* Map x₀ to ξ₀, the centroid of {ξ_s}₀ *)
  StartSearch = x₀
  while Found1 = false or Found2 = false or Found3 = false or
        Found4 = false or Found5 = false or Found6 = false do
  (* While the N_g points in {ξ_s}₀ have not all been assigned, do the following: *)
  begin
    FindNearestNeighbor(x_n,{x_s},x₀,StartSearch)
    (* Perform a spiral search, starting from StartSearch, to find the x_n closest to x₀ but no nearer than StartSearch. *)
    If (−30° < Angle(x_n−x₀) ≤ 30°) and (Found1 = false) then
      (* Determine whether or not x_n is in the ξ₁ sector. If so, assign x_n to ξ₁. *)
      ξ₁ = (2/√3, 0);  x₁ = x_n;  Found1 = true
    endif
    If (30° < Angle(x_n−x₀) ≤ 90°) and (Found2 = false) then
      ξ₂ = (1/√3, 1);  x₂ = x_n;  Found2 = true
    endif
    If (90° < Angle(x_n−x₀) ≤ 150°) and (Found3 = false) then
      ξ₃ = (−1/√3, 1);  x₃ = x_n;  Found3 = true
    endif
    If (150° < Angle(x_n−x₀) ≤ 210°) and (Found4 = false) then
      ξ₄ = (−2/√3, 0);  x₄ = x_n;  Found4 = true
    endif
    If (210° < Angle(x_n−x₀) ≤ 270°) and (Found5 = false) then
      ξ₅ = (−1/√3, −1);  x₅ = x_n;  Found5 = true
    endif
    If (270° < Angle(x_n−x₀) ≤ 330°) and (Found6 = false) then
      ξ₆ = (1/√3, −1);  x₆ = x_n;  Found6 = true
    endif
    StartSearch = x_n    (* Start search for the next sample at x_n *)
  endwhile
endproc

The Angle function used here computes the angle between the vector x−x₀ and some reference vector. The above mapping heuristic works well when the samples are distributed more or less isotropically. For example, in figure 3.12a the set {x_s}₀ of maximum density has been found. When the sample distribution is markedly non-isotropic it would be expected that another mapping procedure could do better, as can be seen in figure 3.12b, where the optimum {x_s}₀ has clearly not been found by the algorithm.
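The sector logic of FindMapping can be compacted: the sector index is just the angle, offset by 30°, divided by 60°. A sketch in Python (ours; the thesis pseudocode's spiral search is replaced here by a brute-force nearest scan for clarity):

```python
import math

SQ3 = math.sqrt(3.0)
HEX_TARGETS = [(2 / SQ3, 0.0), (1 / SQ3, 1.0), (-1 / SQ3, 1.0),
               (-2 / SQ3, 0.0), (-1 / SQ3, -1.0), (1 / SQ3, -1.0)]

def sector(dx, dy):
    # Sector 0 covers angles in [-30, 30) degrees, sector 1 covers
    # [30, 90), and so on around the circle.
    ang = math.degrees(math.atan2(dy, dx))
    return int(((ang + 30.0) % 360.0) // 60.0)

def find_mapping(x0, samples):
    # For each 60-degree sector about x0, pick the sample closest to x0.
    # Returns {sector index: (sample point, hexagonal target xi)}.
    best = {}
    for p in samples:
        dx, dy = p[0] - x0[0], p[1] - x0[1]
        if dx == 0.0 and dy == 0.0:
            continue                     # x0 itself maps to xi0 = (0, 0)
        s = sector(dx, dy)
        d = math.hypot(dx, dy)
        if s not in best or d < best[s][0]:
            best[s] = (d, p)
    return {s: (p, HEX_TARGETS[s]) for s, (d, p) in best.items()}
```

For an unperturbed hexagonal neighbourhood this mapping is the identity, which is the sanity check one would expect of the heuristic.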
It is natural to ask how the reconstruction method described in this chapter can be extended from two dimensions to the reconstruction of higher dimensional functions. It is evident that Theorem 3.5, following the analysis of Petersen and Middleton (1962) for the uniform case, can be directly extended to any dimension, merely by increasing the dimensionality of the functions and variables involved. The search procedure, outlined above, for the determination of Γ can be similarly extended. From this Γ, the function γ can be determined by fitting an n-dimensional hyperplane to n+1 points, where n is the dimensionality of the sampled function. The efficiency of the mapping heuristic can be expected to fall as the dimension increases, however. Examples of two dimensional function reconstruction can be found in (Clark, Palmer and Lawrence, 1984).

FIGURE 3.12 The relation of the heuristic mapping efficiency to the shape of the sample distribution: a) isotropic distribution, b) anisotropic distribution.

3.8 - Including Surface Gradient Information in the Reconstruction Process

In reconstructing the shapes of surfaces in practice, the height (or depth, or surface amplitude) information that is available is frequently sparse and noisy. In these cases better surface reconstructions can be performed if one can obtain some other independent measures of the surface shape. For example, a surface shape descriptor that is often available is the value of the surface gradient vector. An example of such a measurement is given by the diffrequency measurement described in chapter 7 of this thesis. Other means by which surface gradient information can be obtained include photometric stereo (Woodham, 1978). The question one then asks is: how can the information about the surface gradient be combined with the depth measurements to perform the surface reconstruction?
Grimson (1984) considers this question, and provides a numerical method for obtaining measurements of the surface gradient along zero crossing contours, with an eye to incorporating this information into his surface reconstruction algorithm (see the discussion in chapter 3.3). Unfortunately, he did not provide any method for performing this incorporation. Ikeuchi (1983) also talks about combining depth gradient measurements obtained using shape from shading with depth measurements obtained from a stereo algorithm into a depth map. He proposed a scheme whereby the relative depth of a surface is obtained from the depth gradient (or surface normal) information, by minimizing the squared difference between the reconstructed depth gradient and the measured (sampled) gradient over the surface. An absolute depth function is then obtained by using the amplitude information to determine the depth offset. It can be seen, however, that only one depth measurement is sufficient, in principle, to specify this depth offset, and Ikeuchi uses only one, although more of the depth measurements could be used to provide a more accurate estimate of this offset. This leads to the conclusion that Ikeuchi's technique does not take advantage of all of the information available in the depth amplitude measurements, but relies inordinately on the gradient information.

In this chapter we show how the transformation reconstruction method described in the previous section can be extended to incorporate depth gradient information. This will be done for the two dimensional case only; the results for other dimensions can be obtained in a similar manner. It will be seen that both the amplitude and gradient information are ascribed equal importance in performing the reconstruction.
The extension of the two dimensional transformation reconstruction method which allows the incorporation of gradient information is based on the theory developed by Petersen and Middleton (1964) for the case of uniform sampling. They show that, for uniform sampling using both amplitude and gradient information at each sample point, one requires only one third the sample density for exact reconstruction, compared with the case in which amplitude information only is used. Conversely, this means that functions with three times the bandwidth can be reconstructed exactly with the same sample positions. The reconstruction theorem for the uniform sampling, amplitude plus gradient case is stated below as Theorem 3.7, and is due to Petersen and Middleton (1964).

Theorem 3.7 The uniform two dimensional sampling theorem for amplitude and gradient sampling on a hexagonal grid

Suppose that a function of two variables f(x), with isotropic spectrum bandlimited to 2πB radians, and its gradient ∇f(x) are sampled at points in the infinite hexagonal sampling set {x_s} defined by:

{x_s} = {x: x = l₁v₁ + l₂v₂, l₁,l₂ = 0,±1,±2,±3,..., v₁ ≠ kv₂}    (3.8.1)

where:

v₁ = (√3/(2B), −1/(2B)) and v₂ = (0, 1/B)    (3.8.2)

The corresponding frequency domain spectral repetition lattice basis vectors are given by:

u₁ = (4πB/√3, 0) and u₂ = (2πB/√3, 2πB)    (3.8.3)

The result of this frequency domain repetition lattice is that the spectral repetitions exhibit a threefold overlap. The addition of the gradient information ensures that this overlap does not result in any aliasing errors.
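The basis vectors of (3.8.2)-(3.8.3) and the one-third-density claim can be checked directly. In the sketch below (our illustration, not thesis code), the amplitude-only reference density comes from the standard packing argument: the densest amplitude-only hexagonal lattice for an isotropic bandlimit of 2πB places the spectral disk repetitions at a centre spacing of 4πB, and the spatial cell area is (2π)² divided by the frequency cell area.

```python
import math

B = 1.0
v1 = (math.sqrt(3.0) / (2.0 * B), -1.0 / (2.0 * B))     # eq. (3.8.2)
v2 = (0.0, 1.0 / B)
u1 = (4.0 * math.pi * B / math.sqrt(3.0), 0.0)          # eq. (3.8.3)
u2 = (2.0 * math.pi * B / math.sqrt(3.0), 2.0 * math.pi * B)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

# Spatial cell area of the amplitude-plus-gradient lattice.
cell_grad = abs(v1[0] * v2[1] - v1[1] * v2[0])

# Amplitude-only hexagonal lattice for the same bandlimit: frequency cell
# of area (sqrt(3)/2)*(4*pi*B)^2, hence spatial cell (2*pi)^2 / that area.
freq_cell_amp = (math.sqrt(3.0) / 2.0) * (4.0 * math.pi * B) ** 2
cell_amp = (2.0 * math.pi) ** 2 / freq_cell_amp

density_ratio = cell_grad / cell_amp    # cell 3x larger -> 1/3 the density
```

The reciprocal conditions u_i'v_j = 2πδ_ij also hold for this lattice, exactly as in (3.6.3).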
If the above conditions hold, then the following equation is true:

f(x) = Σ_s [f(x_s)g(x−x_s) + ∇f(x_s)·K(x−x_s)]    (3.8.4)

where g(x) is given by the following expression:

g(x) = [6√3/(2πB)³] {sin(2πBx¹/√3)[cos(2πBx²) − cos(2πBx¹/√3)]}/[x¹((x¹)² − 3(x²)²)]    (3.8.5)

and where:

K(x) = x g(x)    (3.8.6)

The non-uniform sampling theorem for amplitude and gradient sampling can be derived from Theorem 3.7 in the same manner as the amplitude-only theorem was obtained in chapter 3.6. The non-uniform sampling theorem is therefore only stated (as Theorem 3.8), and not explicitly derived.

Theorem 3.8 The non-uniform two dimensional sampling theorem for amplitude and gradient sampling on a hexagonal grid

Suppose that a function of two variables f(x) and its gradient ∇f(x) are sampled at points in the infinite set {x_s}. Now, if there exists a one-to-one continuous mapping γ such that:

ξ = γ(x) and ξ_s = γ(x_s)    (3.8.7)

and if the function h(ξ) defined by

h(ξ) = f(γ⁻¹(ξ))    (3.8.8)

satisfies the conditions of Theorem 3.7, then the following is true:

h(ξ) = Σ_s [h(ξ_s)g(ξ−ξ_s) + ∇h(ξ_s)·K(ξ−ξ_s)]    (3.8.9)

where g(x) and K(x) are as defined by equations (3.8.5) and (3.8.6). Hence we have, after applying the transformation:

f(x) = Σ_s [f(x_s)g(γ(x)−γ(x_s)) + (Jᵀ∇f(x_s))·K(γ(x)−γ(x_s))]    (3.8.10)

where J = ∂x/∂γ is the Jacobian matrix of the inverse transformation γ⁻¹(ξ), which accounts for the chain rule relation ∇h(ξ) = Jᵀ∇f(x).

The heuristic mapping algorithm described in chapter 3.7 can be used to implement the amplitude and gradient sampling and reconstruction process, as well as the amplitude-only reconstruction process. The importance of Theorem 3.8 is that it shows the existence of a method by which a two dimensional function can be reconstructed using non-uniformly distributed information about both its amplitude and its gradient.

3.9 - Summary of chapter 3

- The disparity function must be reconstructed as accurately as possible from its samples at each resolution level.
- The reconstruction process must handle arbitrarily distributed samples.
- Grimson's surface reconstruction method (relaxation method) is based on his surface consistency constraint, which we show to be invalid for smoothed (low pass filtered) images.
- Current reconstruction methods based on sampling theory can not handle arbitrarily distributed samples.
- A warping or transformation method is introduced which can handle arbitrarily distributed samples.
- This method is shown to have the ability to incorporate independent surface gradient information into the reconstruction process.

IV - ERROR ANALYSIS OF DISCRETE MULTIRESOLUTION FEATURE MATCHING

4.1 - Sources of Error in Discrete Multiresolution Feature Matching

In this chapter we analyze the various sources of error that are involved in discrete multiresolution feature matching. We also provide a general model of discrete multiresolution feature matching systems which will enable us to estimate the relative effects of the different types of errors on the accuracy, and, more importantly from a procedural point of view, on the stability of the matching algorithms. This has not been done, except at the most superficial level, by other researchers in this area. Without such an analysis as the one presented here, one cannot fully predict the performance of a given matching algorithm. For example, Marr and Poggio (1979) examined in detail only one of the sources of error discussed here, that of the mismatching of 'ghost' features, and did not take into account the other sources of error which, as we will see, are as important, if not more so. The topics discussed in this chapter are summarized in block diagram form in figure 4.1. The sources of error in a general discrete multiresolution feature matching process are listed below, with their specific characteristics and effects.

1. Sensor Noise - (section 4.2)
The image sensors that are used in practical vision systems, such as video cameras or biological retinae, are non-ideal devices, and as such produce signals which are contaminated with various types of noise. In the case of video cameras the types of noise that may be present include thermal noise of the electronic circuitry used to process the video signal, shot noise due to the random nature of the electron beam in the vidicon tube, 1/f noise due to surface effects on the vidicon target, imperfections or blemishes in the optics of the camera, and so on. The main effect of these noise sources is to produce shifts in the parameters of the measured features. For example, in the case of zero crossing features, the sensor noise causes a random shift in the measured position of the zero crossings. This would then cause an error in the value of disparity that was derived from these zero crossing positions.

FIGURE 4.1 The topics covered in chapter 4.

2. Filtering Error - (section 4.3)

It can be shown that if, in the feature construction process, one performs spatial filtering of the image (as is the case in all multiresolution systems), there will, in general, be a non-corresponding change (i.e.
a change in one image that is uncorrelated with a change in the other image) in the position of corresponding features in a stereo pair. Specifically, this means that if a stereo camera pair is viewing a surface that gives rise to a non-constant disparity (such as a tilted surface), the disparity derived from the difference of the measured positions of corresponding zero crossings will not be the true disparity. This is due to the fact that, in this case, the spatial frequency content of the two images is different, but the images are being filtered with the same filter. Thus corresponding features, which have different spatial characteristics, will be, when filtered, altered in a non-corresponding fashion. This type of error has not been previously described in the computer vision literature. We will use the scale space transform described in chapter 2 to analyze this error.

3. Reconstruction Error - (section 4.4)

In any discrete multiresolution stereo system it is necessary to have a complete disparity function at all resolutions in the system. Since the matching process does not (in most cases) produce disparity values at all the points in our working space (but only at the positions where the features of the two images were matched), one must perform a disparity function reconstruction to fill in all the required missing disparity values. As we have seen in chapter 3.6, if the true disparity function is suitably bandlimited (for a definition of what suitably bandlimited means in this context see chapter 3.6) then the disparity function can be, in principle, reconstructed exactly from a set of its samples. However, this is only true if we have available an infinite number of samples, and if the density of this sample set is sufficiently high. In practice these conditions are not always attained, and as a result the reconstruction process will yield a disparity function that is only an approximation to the true disparity function.
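The truncation effect just described can be sketched numerically. The following is a minimal Whittaker-Shannon reconstruction of a bandlimited test function from a finite set of uniform samples; the test signal, the sample spacing and the sample counts are illustrative assumptions, not values from the text:

```python
import math

def sinc_reconstruct(samples, t0, dt, x):
    # Whittaker-Shannon reconstruction from uniformly spaced samples
    total = 0.0
    for k, s in enumerate(samples):
        u = (x - (t0 + k * dt)) / dt
        total += s * (1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u))
    return total

# a bandlimited test "disparity" (highest frequency 0.3 < Nyquist 0.5 for dt=1)
d = lambda x: math.sin(2 * math.pi * 0.1 * x) + 0.4 * math.cos(2 * math.pi * 0.3 * x)

dt = 1.0
err = {}
for half in (20, 200):     # number of samples on each side of the test point
    xs = [-half * dt + k * dt for k in range(2 * half + 1)]
    samples = [d(x) for x in xs]
    err[half] = abs(sinc_reconstruct(samples, xs[0], dt, 0.25) - d(0.25))
```

With any finite sample window the reconstruction is only approximate; enlarging the window tightens the error but never removes it, which is the truncation error referred to above.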
Hence the reconstructed disparity function is the true disparity function with a reconstruction error added to it.

4. Matching Error - (section 4.5)

Features which are to be matched are generally primitive features, as explained in chapter 2. As such they are to some extent ambiguous, meaning that it is not always possible to determine correspondences between images. The various types of matching algorithms that have been proposed try to reduce this ambiguity by making use of some of the structural properties of the features. For example, the Marr-Poggio matching of zero crossing features utilizes the fact that zero crossings cannot get too close together. However, no matter how elaborate the matching algorithm is, there will always be some occasions in which incorrect correspondences are made. Thus the disparity values produced by the matching algorithm will be subject to a matching error. It will be shown that the distribution of the matching error depends critically on the distribution of the error in the initial disparity estimate that was used to guide the matching. It turns out that this becomes quite important in examining the stability of the multiresolution matching algorithm. From their writings, it appears that Marr and Poggio were unaware of the importance of this fact. It will also be shown that the orientation parameters of corresponding features can be distorted when the disparity function is not constant. This too results in errors.

5. Quantization Error

In practice, the positions of the features can be, or will be, measured only to within a certain precision. This means that the measured feature position will differ from the true position. This results in an error which is referred to as quantization error. In pyramid type systems (see chapter 2), the quantization error decreases as the resolution increases.
In what follows we will assume that the features are located to within a quantization interval of qσ, where σ is the filter space constant. In the cases in which we can determine the continuous probability density function p_e(e), we obtain the discrete probability mass function p̂_e(n) that arises from the quantization process as follows:

p̂_e(n) = ∫ from (n−1/2)qσ to (n+1/2)qσ of p_e(e) de   (4.1.1)

6. Geometry Error - (section 4.6)

The imaging system, being non-ideal, may introduce non-corresponding geometric distortions in the images, which will result in non-corresponding shifts in the feature positions. Vidicon camera tubes are notorious for such geometric distortions. Another type of error that is a function of the geometry of the imaging system is produced when the geometric parameters of the imaging system are imprecisely known. These parameters include the inter-camera baseline distance, the relative angle of tilt of the cameras, the camera focal lengths, and the size of the sensors. All of these parameters are involved in the calculation of depth from disparity values, and hence, if any of them are in error, the computed depth will be in error as well. Since the disparity to depth conversion occurs after the disparity measurement, these geometric errors do not affect the performance of the disparity measurement algorithm. A type of geometry error that does affect the matching process is the error caused by not knowing accurately enough the epipolar lines along which the search for matching features must be made. For example, if the cameras have a vertical offset relative to each other, and this offset is not known, disparity measurements made of non-vertical features will be in error.

We will now examine in detail the cause and the effects of each of the types of errors listed above. For the purposes of these analyses we will assume that the two input images are zero mean, white, Gaussian random processes.
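The conversion in equation (4.1.1) can be carried out numerically when no closed form is available. A minimal sketch using composite Simpson integration; the zero-mean Gaussian density and the bin placement centred on integer pixel positions are assumptions made here for concreteness, not the densities derived below:

```python
import math

def simpson(f, a, b, n=200):
    # composite Simpson rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3

# illustrative continuous position-error density: zero-mean Gaussian, unit std
p = lambda e: math.exp(-e * e / 2) / math.sqrt(2 * math.pi)

q, sigma = 1 / math.sqrt(2), 1.0    # pixel size q*sigma, as in the text
bin_w = q * sigma

def pmf(n):
    # eq (4.1.1): integrate the density over the n-th quantization bin
    return simpson(p, (n - 0.5) * bin_w, (n + 0.5) * bin_w)

p0 = pmf(0)                                  # probability of a zero pixel error
total = sum(pmf(n) for n in range(-8, 9))    # should be close to 1
```

The bin probabilities sum to one (up to the truncated tails), so the quantized mass function is a proper distribution.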
This is done in order that we may be able to get a handle on the mathematical analysis involved in computing the distributions of the various errors, and thus obtain representative results on the performance of the matching algorithms when disturbed by these various noise sources. In most cases a realistic image model would mean that the mathematics involved in estimating the effects of these errors would become enormously complex. As will be seen, even with the assumption of Gaussian white noise, we obtain only approximate results in some cases.

The error analyses will be done for the cases of zero crossing and extremal point features. The extremal points are the zero crossings of ∂/∂x₁(∇²G∗f(x₁,x₂)). That is, they are the zero crossings of the directional derivative of the ∇²G filtered image in the horizontal direction. We will denote the probability density function of the zero crossing errors by p(e), and the probability density function of the extremal point errors by ρ(e). The corresponding discrete probability mass functions will be indicated by a caret over the p's.

Section 4.7 will present a discussion of how the various sources of error affect the performance of the simplified multi-resolution matching algorithm.

4.2 - Effect of Sensor Noise on Feature Position

In this section we examine the following problem. Suppose we have a white Gaussian signal process f(x) that is ∇²G filtered (with a filter scale constant σ). Let us define the position of an arbitrary feature (extremum point or zero crossing) of f(x) to be x₀; thus f(x₀) = 0. Now suppose we add to f(x) a noise signal n(x), which has a Gaussian distribution, so that the output of the filter is f_n(x) = f(x) + n(x). Let the variance of the unfiltered signal be σ_f², and the variance of the unfiltered noise signal be σ_n². Let the feature position of f_n(x) that corresponds to x₀ be denoted x_c.
Correspondence in this case is defined as follows:

lim (as σ_n → 0) |x_c − x₀| = 0   (4.2.1)

and x_c and x₀ lie on corresponding feature contours. That is:

lim (as σ_n → 0) D(C_n − C_0) = 0   (4.2.2)

where C_n is the feature contour through the point x_c and C_0 is the feature contour through the point x₀. D is some metric measuring the distance between two feature contours. We define the point x_n to be the point on C_n that is closest to x₀ along the horizontal line through x₀. The relationship between x₀, x_n, C_n and C_0 is shown in figure 4.2. Note that, in general, x_n and x₀ are not corresponding points, as defined by equation (4.2.1).

Now suppose we were trying to match the features of f(x) with those of f_n(x). If there was no added noise, then the measured disparity would be zero everywhere (we assume that the correspondence problem can be solved). Now, if we add some noise, the horizontal disparity between corresponding features (which is what is measured by most stereo matchers) is a random variable equal to the distance between the points x_n and x₀. Thus we can write the feature position error e_p as follows:

e_p = (x_n − x₀)·(1,0)   (4.2.3)

FIGURE 4.2 The perturbation of feature contours by additive noise.

Our goal in the remainder of this section is to derive an expression for the probability density function of e_p. A one dimensional version of this problem was analyzed by Lunscher (1983). However, the noise model he used was unnecessarily restrictive (as he assumed the noise to be constant while the signal was linear). Wiejak (1983) discusses the probability of a zero crossing changing sign, or of a new zero crossing being formed, due to the addition of noise. Nishihara (1983) also analyzes this problem. In (Nishihara, 1982) experimental evidence for the effects of noise on the zero crossings is presented, but no analysis is performed.
However, in this thesis we are concerned only with the perturbation of zero crossings due to noise, as the appearance or disappearance of zero crossings will affect only the matching statistics and not the accuracy of the disparity measurements themselves.

Zero Crossing Position Errors

We assume that, near the unperturbed zero crossing point x₀, the signal function f(x) and the noise function n(x) can both be approximated by a plane. We can then write f(x) as follows:

f(x) = (a/c, b/c)·(x − x₀)   (4.2.4)

where (a,b,c) is the unit normal vector of the signal plane and · indicates the dot product operation. Similarly, the noise function can be approximated by:

n(x) = (a_n/c_n, b_n/c_n)·(x − x_n0)   (4.2.5)

where (a_n, b_n, c_n) is the unit normal vector of the noise function plane. We will take, without loss of generality, x₀ = (0,0). The function obtained by adding the noise to f(x) can also be approximated by a plane, and can be defined by:

n(x) + f(x) = (a/c + a_n/c_n, b/c + b_n/c_n)·x − (a_n/c_n, b_n/c_n)·x_n0   (4.2.6)

One can now see that x_n, the horizontal position of the perturbed zero crossing, is equal to the position error e_p, and is given by:

e_p = (a_n/c_n)x_n0 / (a/c + a_n/c_n)   (4.2.7)

where x_n0 here denotes the horizontal component. Thus the error is seen to be proportional to the slope of the noise function and inversely proportional to the signal slope. These proportionalities were also noted by Nishihara (1983).

We need to find an expression for the joint probability density of k₁ = x_n0, k₂ = (a/c) and k₃ = (a_n/c_n). Since f(x) and n(x) are uncorrelated we have that:

p(k₁,k₂,k₃) = p_f(k₂) p_n(k₁,k₃)   (4.2.8)

Also, we can assume that x_n0 will be independent of the slope of the noise function. Thus we can write:

p(k₁,k₂,k₃) = p_f(k₂) p_n(k₁) p_n(k₃)   (4.2.9)

The probability density of x_n0, p_n(k₁), is assumed to be uniform over a range 1/R, where R is the expected zero crossing rate of a horizontal slice of the random functions f(x) and n(x).
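Equation (4.2.7) can be exercised with a small Monte Carlo sketch. The unit zero crossing rate and the use of the SNR as the ratio of the slope standard deviations are assumptions made for illustration:

```python
import random
import statistics

def zero_shift(sig_slope, noise_slope, noise_zero):
    # perturbed zero crossing position for locally linear signal and noise,
    # f(x) = sig_slope*x and n(x) = noise_slope*(x - noise_zero); cf. eq (4.2.7)
    return noise_slope * noise_zero / (sig_slope + noise_slope)

def median_abs_shift(snr, n=20000, seed=0):
    rng = random.Random(seed)
    shifts = []
    for _ in range(n):
        k2 = rng.gauss(0, snr)          # signal slope; std taken as the SNR
        k3 = rng.gauss(0, 1.0)          # noise slope, unit std
        x0 = rng.uniform(-0.5, 0.5)     # noise zero position, uniform over 1/R, R = 1
        shifts.append(abs(zero_shift(k2, k3, x0)))
    return statistics.median(shifts)

m1 = median_abs_shift(1.0)
m4 = median_abs_shift(4.0)
```

As the derivation predicts, a steeper typical signal slope (higher SNR) pulls the position error distribution toward smaller values.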
Rice (1945) has derived the following expression for R in terms of ψ_1d(τ), the autocovariance function of a horizontal slice of n(x). His result is:

R = [−ψ_1d''(0)/ψ_1d(0)]^(1/2) / π   (4.2.10)

The autocovariance function is derived in the appendix, and from it we have:

R = (1/π)(31/22)^(1/2)(1/σ)   (4.2.11)

where σ is the space constant of the ∇²G filter.

With regard to k₂ and k₃, it can be seen that k₂ = ∂f(x)/∂x₁ and k₃ = ∂n(x)/∂x₁. The probability densities of k₂ and k₃ can then be shown to be as follows (Rice, 1945):

p_f(k₂) = [−2πψ_f''(0)]^(−1/2) exp(k₂²/(2ψ_f''(0)))   (4.2.12)

p_n(k₃) = [−2πψ_n''(0)]^(−1/2) exp(k₃²/(2ψ_n''(0)))   (4.2.13)

where ψ_f''(τ) = ∂²/∂τ₁²[ψ_f(τ)], ψ_f(τ) being the autocovariance function of f(x), and likewise ψ_n''(τ) = ∂²/∂τ₁²[ψ_n(τ)], ψ_n(τ) being the autocovariance function of n(x). We can now write:

p(k₁,k₂,k₃) = [R/(4π(ψ_f''(0)ψ_n''(0))^(1/2))] exp([k₂²/ψ_f''(0) + k₃²/ψ_n''(0)]/2)   for |k₁| < 1/R
            = 0   for |k₁| > 1/R   (4.2.14)

Now let k₄ = k₂ + k₃ and k₅ = k₃/k₄, so that, from equation (4.2.7), e_p = k₁k₅. The random variable k₄ has a Gaussian distribution, with variance −[ψ_f''(0) + ψ_n''(0)]. The random variable k₅ is shown by Miller (1964, p50, equation 2.4.1) to have the following density:

p_k5(u) = |W|^(1/2) / (π[W₂₂u² + 2W₁₂u + W₁₁])   (4.2.15)

where W is the inverse of the covariance matrix of the random variables k₄ and k₃. It can be shown that:

W = (1/σ_n'²) [  1/s²      −1/s²
                −1/s²   (1+s²)/s² ]   (4.2.16)

where we have set −ψ_n''(0) = σ_n'² and defined s² = ψ_f''(0)/ψ_n''(0) to be the 'signal to noise ratio', or SNR.
Equation (4.2.15) for p_k5 reduces to:

p_k5(u) = s/(π[(1+s²)u² − 2u + 1])   (4.2.17)

The position error e_p is seen to be the product of the two random variables k₁ and k₅. Since k₁ is uniformly distributed from −1/R to 1/R, we can write (following Miller, 1964, p48, in his proof of Theorem 4):

p_ep(e) = (R/2) ∫ over |u| > R|e| of p_k5(u)/|u| du   (4.2.18)

This integral can be evaluated (using integrals #109 and #120 of the CRC Standard Mathematical Tables) to yield:

p_ep(e) = (R/2π)[tan⁻¹(((1+s²)R|e| + 1)/s) − tan⁻¹(((1+s²)R|e| − 1)/s)]
        + (Rs/4π) log[((1+s²)(Re)² + 1)² − 4(Re)²]
        − (Rs/2π) log[(1+s²)(Re)²]   (4.2.19)

In figure 4.3 we plot p_ep(e_p) for σ_f = 1, for a number of different values of s². Notice that, as the SNR increases, the distribution shifts to smaller error values. Figure 4.4 depicts the case of s = 1, for a number of different filter σ's. Notice that, as the filter σ increases, the distribution expands towards higher error values. It can also be seen that, for large e, p(e) falls off only as a constant times 1/((1+s²)e²). This means that the variance of p(e) is undefined (but see section 5.5 for the definition of a 'pseudo-variance').

Extremal Point Position Errors

The analysis for the extremum feature case is the same as for the zero crossing feature case. However, in this case the actual value of R is different. It can be shown that (see Rice, 1945 for the derivation):

R = (1/π)[−ψ_1d''''(0)/ψ_1d''(0)]^(1/2) = (1/π)(137/31)^(1/2)(1/σ)   (4.2.20)

Note that R_extremum/R_zc = 1.771. This means that the expected interval between extrema is less than the expected interval between zero crossings of a function.

FIGURE 4.3 The probability density function of zero crossing position error, for σ_f = 1 and SNR = 0.5, 1., 2., and 4.
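The rate expressions (4.2.11) and (4.2.20) can be checked against the quoted ratio directly (taking a unit space constant):

```python
import math

# zero crossing and extremum rates from eqs (4.2.11) and (4.2.20), sigma = 1
R_zc = math.sqrt(31 / 22) / math.pi
R_ext = math.sqrt(137 / 31) / math.pi
ratio = R_ext / R_zc        # algebraically equal to sqrt(137*22)/31
```

The ratio is about 1.771, confirming that extrema occur more densely than zero crossings.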
FIGURE 4.4 The probability density function of zero crossing position error, for SNR = 1 and σ_f = .5, 1., 2. and 4.

From figure 4.4 we can see that, as the expected feature interval 1/R decreases, the probability density function of the position error becomes compressed towards smaller error values. From this we conclude that the position error of the extremum features will always be less than that of the zero crossing features.

Quantized Position Error Computation

To find the discrete probability mass function p̂_ep(n) we can use equation (4.1.1). The resulting integral is not amenable to simple analytical integration techniques, so we must rely on numerical integration. To compute the following graphs we used Lyness' SQUANK integration algorithm (Lyness). The special case of n = 0 gives us the probability of a zero pixel error. This probability is plotted in figure 4.5, along with the probability of a 1, 2 or 3 pixel error, as a function of the signal to noise ratio, given that q = 1/√2 and σ_f = 1.

It should again be pointed out that the analysis presented in this chapter is valid only for relatively small noise levels (i.e. large SNR). In addition to the noise causing a perturbation in the position of the true (physically significant) features, an excessively high noise level will cause the creation of features that are entirely noise related and have no physical significance. Furthermore, the true features will begin to be broken up, and may vanish entirely.

FIGURE 4.5 Probability of an n pixel error, for n = 0, 1, and 2, given that q = 1/√2 and σ_f = 1, as a function of the SNR.

4.3 - Analysis of Disparity Measurement Errors due to Filtering

In this section we examine the errors in the measured disparity that result from the fact that we are spatially filtering the images with the ∇²G filter. This process has not been described elsewhere in the literature.
We will initially examine the one dimensional case and then discuss the extension to two dimensions. We will use, in the analysis of the filtering error, the concept of the scale space transform (SST) introduced in chapter 2.1. The use of the SST will allow us to find the relationship between the corresponding zero crossings of the two stereo ∇²G filtered images.

One Dimensional Filtering

Let us consider a situation wherein we are viewing a surface that gives rise to a disparity function of the form:

d(x_L) = x_R − x_L = β₀ + β₁x_L   (4.3.1)

It can be seen that if the left eye sees a light intensity pattern g(x) and the right eye a light intensity pattern f(x), then g(x) and f(x) are related as follows:

g(x_L) = f(x_R)   (4.3.2)

Note that this equation is only approximately true, as the observed intensity of a scene point generally depends on the angle between the surface normal vector and the view direction. However, for parallel cameras with large focal lengths the difference between the view directions of the two cameras is usually fairly small, so that the difference in observed image intensity will be small. Highly specular surfaces are quite troublesome in this regard, since the observed image intensity is very sensitive to the angle of observation. In order to analyze such cases we must use an equation such as g(x_L)h(x_L, n̄(x,y)) = f(x_R), where h is some function of position and of the surface normal vector n̄.
If we assume that (4.3.2) is valid we can write, using (4.3.1):

g(x_L) = f(β₀ + (1+β₁)x_L)   (4.3.3)

The SST of f(x) is given by equation (2.1.2) and is repeated below:

F(x,σ) = d²/dx² ∫ from −∞ to ∞ of [f(u)/(σ√(2π))] exp(−(x−u)²/2σ²) du   (4.3.4)

The SST of g(x) is obtained by substituting (4.3.3) into (4.3.4) to yield:

G(x,σ) = d²/dx² ∫ from −∞ to ∞ of [f(β₀ + (1+β₁)u)/(σ√(2π))] exp(−(x−u)²/2σ²) du   (4.3.5)

Letting v = β₀ + (1+β₁)u we obtain:

G(x,σ) = (1+β₁)² d²/d[(1+β₁)x]² ∫ from −∞ to ∞ of [f(v)/((1+β₁)σ√(2π))] exp(−((1+β₁)x + β₀ − v)²/2((1+β₁)σ)²) dv   (4.3.6)

Thus the SSTs of the left and right images are related as follows:

G(x,σ) = (1+β₁)² F(β₀ + (1+β₁)x, (1+β₁)σ)   (4.3.7)

or

F(x,σ) = (1+β₁)⁻² G((x−β₀)/(1+β₁), σ/(1+β₁))   (4.3.8)

Thus, for β₀ = 0, the right hand scale map is obtained from the left hand scale map by a uniform expansion in both the x and σ directions by a factor of 1/(1+β₁). An example of this can be seen in figure 4.6, which shows the left and right scale maps of a randomly textured surface with β₁ = −60/255.

FIGURE 4.6 The left and right scale maps of a randomly textured, tilted surface with a disparity gradient of −60/255.

The fact that the left and right scale maps are scaled versions of each other is readily apparent.

Now consider the idealized case in which we can match the corresponding zero crossings between the left and right images with no error whatsoever. Let us define the measured disparity as follows:

d_m(x_L,σ) = x_R(σ) − x_L(σ)   (4.3.9)

The variables x_L(σ) and x_R(σ) are the positions of the corresponding zero crossings in the left and right images, measured with filter resolution σ.
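The scaling relation (4.3.7) can be verified numerically. The sketch below evaluates the SST by differentiating the Gaussian kernel under the integral; the test signal and the disparity parameters are illustrative assumptions:

```python
import math

def sst(f, x, sigma, lo=-40.0, hi=40.0, n=4000):
    # F(x, sigma) of eq (4.3.4): second x-derivative of the Gaussian-smoothed
    # signal, computed by differentiating the kernel under the integral
    # (trapezoid rule; the integrand decays to zero well inside [lo, hi])
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n + 1):
        u = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        t = x - u
        kern = math.exp(-t * t / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        acc += w * f(u) * kern * (t * t / sigma ** 4 - 1 / sigma ** 2)
    return acc * h

b0, b1 = 0.3, 0.1                        # assumed disparity parameters
f = lambda v: math.exp(-v * v / 2)       # smooth test image intensity
g = lambda u: f(b0 + (1 + b1) * u)       # left image induced by eq (4.3.3)

x, s = 0.7, 1.5
lhs = sst(g, x, s)
rhs = (1 + b1) ** 2 * sst(f, b0 + (1 + b1) * x, (1 + b1) * s)
```

The two sides agree to within quadrature error, illustrating that the left scale map is a uniformly stretched copy of the right one.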
The error in the disparity measurement at a resolution σ is given by:

e_d(x_L,σ) = d_m(x_L,σ) − d(x_L) = x_R(σ) − x_L(σ) − β₀ − β₁x_L(σ)   (4.3.10)

Let us denote the track, in scale space, of the single zero crossing feature (defined by G(x,σ) = 0) that passes through the point (x_L,σ₀) by the functional representation:

σ = C(x),  with inverse x = C⁻¹(σ)   (4.3.11)

Note that C(x) is a monotonic function, as the point where dC/dx is zero is the point where two distinct features merge, and does not strictly belong to the contour (see Yuille and Poggio, 1983a for more on this matter).

Let us assume that the zero crossing locations are measured at a resolution σ = σ₀, so that C(x_L) = σ₀ and F(x_R,σ₀) = 0. From equation (4.3.8) it can be shown that, since the points (x_R,σ₀) and (x_L,σ₀) lie on corresponding zero crossing contours,

C((x_R − β₀)/(1+β₁)) = σ₀/(1+β₁)   (4.3.12)

This is shown diagrammatically in figure 4.7. We can rewrite the disparity error in terms of C⁻¹(σ) as follows:

e_d(x_L,σ₀) = (1+β₁)C⁻¹(σ₀/(1+β₁)) − x_L − β₁x_L   (4.3.13)

Note that the disparity error is independent of β₀. This is what we would expect, since translation of a function does not affect the magnitude of its frequency spectrum.

Let us set e_d to zero in equation (4.3.13) and solve for σ = C_ze(x). This will define the family of zero disparity error scale map contours, or zero error loci. The significance of this family of curves lies in the fact that, if the scale map contours do not belong to this family, there will be a non-zero disparity measurement error. Setting e_d to zero and rearranging (4.3.13) we obtain,

FIGURE 4.7 The relationship between the left and right scale maps of a tilted surface.

x = C_ze⁻¹(σ₀/(1+β₁)) = (x_L + β₁x_L)/(1+β₁) = x_L   (4.3.14)

This equation defines a family of vertical lines.
If we substitute (4.3.14) into (4.3.13), we obtain the following relationship:

e_d(σ₀) = (σ₀/σ)[C⁻¹(σ) − C_ze⁻¹(σ)],  where σ = σ₀/(1+β₁)   (4.3.15)

Therefore we conclude that the disparity error is σ₀/σ times the horizontal difference between the scale space zero crossing contour through the point (x_L,σ₀) and the zero error line x = x_L, measured at σ = σ₀/(1+β₁). An example of how one can determine the disparity error with a graphical construction is shown in figure 4.7; this construction is possible provided one has the scale map of the left hand image.

The important point to be noted in the above discussion is that, for there to be zero disparity measurement error, all of the scale map contours of the image function must belong to the family of zero error lines. In general this is not the case, and we must conclude that one can, in general, expect to obtain zero disparity measurement error only when one is viewing a surface with constant disparity (β₁ = 0), or when the zero crossing measurements are made at a resolution of σ₀ = 0.

In (Clark and Lawrence, 1984c) there is a derivation of the disparity measurement error for f(x) = exp(−(x−x₀)²/2), for the case of zero crossing features. They show that the error in this case increases as σ increases and as |β₁| increases.

We will now provide an approximation for the probability density function of the disparity measurement error for f(x) a Gaussian white random process.

Zero Crossing Features

It turns out that it is not possible to determine an exact expression for the probability density function p_ed(e_d), due to the complex, non-stationary nature of the SST, F(x,σ), which is a two dimensional random process. We can, however, derive an approximation to p_ed(e_d) that is valid for small values of β₁, using the following procedure. Consider the situation shown in figure 4.8, which depicts a zero crossing contour of a particular realization of the random process F(x,σ).
Note that, for small β₁, the contour is approximately straight and forms an angle φ with the σ axis. The error e_d in this case is given simply by:

FIGURE 4.8 A zero crossing contour of the random process F(x,σ).

e_d = (x₁ − x₀)σ₀/σ₁ = (x₁ − x₀)(1+β₁)   (4.3.16)

Now (x₁ − x₀) can be expressed in terms of φ, σ₀ and β₁ to give the following expression for e_d:

e_d = −σ₀β₁μ   (4.3.17)

where μ = tan(φ) is the slope of the line perpendicular to the zero crossing contour at (x₀,σ₀). The gradient vector η = (η₁,η₂) = ∇F(x,σ), measured at (x₀,σ₀), is also perpendicular to the zero crossing line. Thus we have that:

μ = η₂/η₁   (4.3.18)

and

e_d = −σ₀β₁(η₂/η₁)   (4.3.19)

Now our problem reduces to finding the probability density function of the above function of the two random variables η₁ and η₂, given that F(x₀,σ₀) = 0. However, because of the complicated, non-stationary nature of F(x,σ), its autocovariance function is not easy to work with. Fortunately, we can convert our problem from the two dimensional case involving F(x,σ) to a one dimensional problem involving the function g(x) = F(x,σ₀), whose autocovariance function is well behaved. That we are able to do this results from an interesting property of the scale space transform. It was pointed out by Yuille and Poggio (1983a) that the scale space transform of any function satisfies a differential equation of the type commonly known as the Heat, or Diffusion, equation. In particular we have that:

∂²F(x,σ)/∂x² = (1/σ) ∂F(x,σ)/∂σ   (4.3.20)

Now ∂F(x,σ)/∂σ is merely η₂, and ∂F(x,σ)/∂x is η₁. Thus we can write:

μ = (σ₀ ∂²F(x,σ₀)/∂x²) / (∂F(x,σ₀)/∂x)   (4.3.21)

Following the notation of Rice (1945), let us set ξ = g(x) = F(x,σ₀), η = dg(x)/dx = ∂F(x,σ₀)/∂x, and ζ = d²g(x)/dx² = ∂²F(x,σ₀)/∂x². Thus we can write:

μ = σ₀ζ/η   (4.3.22)

We now have an expression for μ in terms of the derivatives of a one dimensional function g(x).
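The diffusion property (4.3.20) that underlies this reduction can be checked numerically. The sketch below tests the equation on the Gaussian-smoothed signal itself; since F is the second x-derivative of that signal and x-derivatives commute with the diffusion operator, F satisfies the same equation. The test signal is an illustrative assumption:

```python
import math

def smooth(f, x, sigma, lo=-40.0, hi=40.0, n=4000):
    # Gaussian smoothing of f at x with scale sigma (trapezoid rule)
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n + 1):
        u = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        acc += w * f(u) * math.exp(-(x - u) ** 2 / (2 * sigma ** 2))
    return acc * h / (sigma * math.sqrt(2 * math.pi))

f = lambda u: math.sin(1.3 * u) * math.exp(-u * u / 8)   # arbitrary smooth signal
x0, s0, d = 0.4, 1.2, 1e-4

# the two sides of eq (4.3.20), estimated by central differences
lhs = (smooth(f, x0 + d, s0) - 2 * smooth(f, x0, s0) + smooth(f, x0 - d, s0)) / d ** 2
rhs = (smooth(f, x0, s0 + d) - smooth(f, x0, s0 - d)) / (2 * d) / s0
```

The second spatial derivative matches (1/σ) times the scale derivative, as the diffusion equation requires.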
We can write the conditional probability density of ζ and η, given that ξ = 0, as follows (VanMarcke, 1983, p52):

p_c(ζ,η|ξ=0) = (1/(2π|B|^(1/2))) exp(−(ζ,η)B⁻¹(ζ,η)ᵀ/2)   (4.3.23)

where B is a function of the partitioned covariance matrix B₀ of the vector of random variables (ζ,η,ξ):

B = B₁₁ − B₁₂B₂₂⁻¹B₁₂ᵀ   (4.3.24)

B₀ = [ B₁₁   B₁₂
       B₁₂ᵀ  B₂₂ ]   (4.3.25)

where B₁₁ is 2×2, B₁₂ is 2×1 and B₂₂ is 1×1. It can be shown (Rice, 1945) that B₁₁, B₁₂ and B₂₂ can be expressed in terms of the autocovariance function ψ(τ) of g(x) as follows:

B₁₁ = [ ψ''''(0)      0
        0        −ψ''(0) ]   (4.3.26)

B₁₂ᵀ = (ψ''(0), 0)   (4.3.27)

B₂₂ = ψ(0)   (4.3.28)

Hence we have that:

B⁻¹ = [ a²  0
        0   b² ]   (4.3.29)

where we have defined:

a² = ψ(0)/[ψ(0)ψ''''(0) − (ψ''(0))²]   (4.3.30)

and

b² = −1/ψ''(0)   (4.3.31)

Thus we have that:

p_c(ζ,η|ξ=0) = (ab/2π) exp(−(a²ζ² + b²η²)/2)   (4.3.32)

Now let us make the transformation μ = ζσ₀/η. From the laws of transformations of random variables (e.g., see VanMarcke 1983, p32) we have that the probability density function of μ and η, given ξ = 0, is:

p_m(μ,η|ξ=0) = p_c(ημ/σ₀, η|ξ=0)(|η|/σ₀)   (4.3.33)
             = (ab/(2πσ₀)) |η| exp(−(a²μ²/σ₀² + b²)η²/2)   (4.3.34)

Now, to obtain p_m(μ) we need only integrate out the dependence of p_m(μ,η|ξ=0) on η. Doing so we get:

p_m(μ) = (bσ₀/a)/(π[μ² + (bσ₀/a)²])   (4.3.35)

(Notice that, in the limit as σ₀ approaches zero, p_m(μ) approaches δ(μ). Thus the angle of the zero crossing contour with respect to the vertical approaches zero with probability one. This proves the conjecture that the scale map contours are vertical at σ = 0.)

Now, making the transformation e_d = −σ₀β₁μ, we obtain:

p_ed(e_d) = k/(π[e_d² + k²])   (4.3.36)

where we have defined:

k = −β₁σ₀²b/a   (4.3.37)

This zero crossing disparity error is seen to have a Cauchy distribution. The form of the autocovariance function is derived in the appendix, and it is seen that:

ψ⁽ⁿ⁾(0) = (−1)^(n/2) √π (n+4)! / [(n/2 + 2)! 2^(n+5) σ₀^(n+1)]   for n even
        = 0   for n odd.   (4.3.38)
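As a numerical companion to the derivation, the derivatives (4.3.38) can be combined as in (4.3.30), (4.3.31) and (4.3.37), and the Gaussian fit that follows can be checked by matching peak values. The particular σ₀ and β₁ are illustrative assumptions:

```python
import math

def psi_deriv(n, s0):
    # psi^(n)(0) from eq (4.3.38) for even n; s0 is the filter space constant
    return ((-1) ** (n // 2) * math.sqrt(math.pi) * math.factorial(n + 4)
            / (math.factorial(n // 2 + 2) * 2 ** (n + 5) * s0 ** (n + 1)))

s0, b1 = 2.0, 60 / 255
p0d, p2d, p4d = psi_deriv(0, s0), psi_deriv(2, s0), psi_deriv(4, s0)
a2 = p0d / (p0d * p4d - p2d ** 2)        # eq (4.3.30)
b2 = -1.0 / p2d                          # eq (4.3.31)
k = b1 * s0 ** 2 * math.sqrt(b2 / a2)    # eq (4.3.37), sign dropped

# fit a Gaussian to the Cauchy density (4.3.36) by matching the peak value
sigma_fit = math.sqrt(math.pi / 2) * k
peak_cauchy = 1 / (math.pi * k)
peak_gauss = 1 / (sigma_fit * math.sqrt(2 * math.pi))
```

The ratio b/a evaluates to exactly 1/σ₀, so the Cauchy scale collapses to k = σ₀β₁, and peak matching reproduces the pseudo-variance πσ₀²β₁²/2 derived below.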
From these relations we can compute the value of k. The result is:

k = σ₀|β₁|   (4.3.39)

Thus, finally, we get:

p_ed(e_d) = σ₀|β₁|/(π[e_d² + σ₀²β₁²])   (4.3.40)

In figure 4.9 we plot p_ed(e_d) for various values of k. Note that as k increases the density function spreads out towards higher error values. The variance of e_d is undefined, but we can obtain a measure of the dispersion of p_ed by fitting a Gaussian density to p_ed, setting p_ed(0) equal to 1/(σ_ed√(2π)). Doing this we obtain:

σ_ed² = πσ₀²β₁²/2   (4.3.41)

In terms of σ₁ = σ₀/(1+β₁) we get:

σ_ed² = πσ₀²(σ₀/σ₁ − 1)²/2   (4.3.42)

Thus the "variance" of the disparity measurement error due to filtering grows rapidly as σ₁ departs from σ₀, for a given σ₀. Note that this variance is independent of the power in the original signal. This is to be expected, as scaling a function does not alter the locations of its zero crossings.

FIGURE 4.9 The probability density function of the disparity measurement error for k = 1, 2, 3, and 4.

Quantized Disparity Measurement Error for Zero Crossing Features

Let us now include the effect of quantization by converting the continuous probability density function p_ed to the discrete probability mass function p̂_ed, using equation (4.1.1). Doing so, we get:

p̂_ed(n) = −(1/π)[tan⁻¹((n+1/2)q/β₁) − tan⁻¹((n−1/2)q/β₁)]   (4.3.43)

The probability of having a zero pixel error, p̂_ed(0), is given by:

p̂_ed(0) = −(2/π)tan⁻¹(q/(2β₁))   (4.3.44)

Note that p̂_ed(n) is independent of the resolution of the filter. This is due to the fact that, as σ increases, the size of the pixels (qσ) increases as well. In figure 4.10 we plot p̂_ed(n) as a function of the disparity gradient for n = 0, 1, 2, and 3, with q = 1/√2.
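A minimal sketch of the quantized error probabilities. It uses the positive-gradient form of (4.3.43) (the minus signs in the text correspond to negative β₁), with illustrative parameter values:

```python
import math

def p_quant(n, q, b1):
    # eq (4.3.43) for a positive disparity gradient b1; note that the filter
    # space constant cancels, since both the Cauchy scale (sigma*b1) and the
    # pixel size (q*sigma) carry the same factor of sigma
    return (math.atan((n + 0.5) * q / b1) - math.atan((n - 0.5) * q / b1)) / math.pi

q, b1 = 1 / math.sqrt(2), 60 / 255
p0 = p_quant(0, q, b1)          # probability of a zero pixel error
total = sum(p_quant(n, q, b1) for n in range(-2000, 2001))
```

The telescoping sum of arctangent differences converges to one, and the zero-pixel probability agrees with the closed form (4.3.44).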
Extremum Features

We can obtain the probability density function of the extremum position error by taking the covariance matrix of (4.3.24) to be:

B₁₁ = [ −ψ⁽⁶⁾(0)     0
        0        ψ⁽⁴⁾(0) ]   (4.3.45)

B₁₂ᵀ = (ψ⁽⁴⁾(0), 0)   (4.3.46)

B₂₂ = −ψ''(0)   (4.3.47)

Thus we get, for B defined as in (4.3.24) and (4.3.29):

a² = −ψ''(0)/[ψ''(0)ψ⁽⁶⁾(0) − (ψ⁽⁴⁾(0))²]   (4.3.48)

and

b² = 1/ψ⁽⁴⁾(0)   (4.3.49)

Apart from these changes the rest of the derivation is the same as for the zero crossing feature case. Thus we have that:

FIGURE 4.10 The probability of an n pixel disparity measurement error as a function of the disparity gradient, given q = 1/√2, for n = 0, 1, 2, and 3.

p_ed(e_d) = k/(π[e_d² + k²])   (4.3.50)

where k is as defined by (4.3.37). Using equation (4.3.38) to obtain the required derivatives, we get:

k = σ₀β₁   (4.3.51)

This is the same result as for the zero crossing case. Thus we conclude that, for the one dimensional case, the effect of filtering on the disparity measurement is the same for extremum features as for zero crossing features.

Quantized Disparity Measurement Error for Extremum Features

It is clear that the discrete probability mass function for the extremum feature case is the same as that of the zero crossing case, and is therefore given by equation (4.3.43).

Nonlinear Disparity Functions

In general, the disparity functions that one obtains from real world surfaces will not be of the linear form assumed above. If the disparity function is non-linear, a relationship between the left and right eye SSTs such as that in equation (4.3.7) cannot be found. This being so, we cannot expect to be able to find an expression, analogous to (4.3.14), for the zero disparity error curves. The best we can do in such a case is to linearize the disparity function about a point at which we wish to obtain a value for the disparity measurement error.
If we expand d(x) in a Taylor's series about a point x₀ we have,

d(x) = Σ from n=0 to ∞ of [dⁿd(x₀)/dxⁿ] (x−x₀)ⁿ/n!   (4.3.52)

We can linearize by taking the first two terms of this expansion. This linearization will be valid (i.e. the approximation error will be less than a certain amount) only in a small neighbourhood about x₀. We will assume that this linearization is valid. We then say that,

d(x) = d(x₀) + [dd(x₀)/dx](x − x₀) = β₀(x₀) + xβ₁(x₀)   (4.3.53)

where β₁(x₀) = dd(x₀)/dx and β₀(x₀) = d(x₀) − x₀β₁(x₀). This linearization then allows the computation of the disparity measurement error.

Extension to the Two Dimensional Case

Now let us consider a two dimensional surface that, when viewed binocularly, gives rise to a linear disparity function of the form (4.3.1). Then, if the left eye sees a light intensity pattern g(x,y) and the right eye an intensity pattern f(x,y), g(x,y) and f(x,y) are related as follows:

g(x_L,y) = f(x_R,y)   (4.3.54)

Given the disparity function (4.3.1) we can show that:

g(x,y) = f(β₀ + (1+β₁)x, y)   (4.3.55)

Now let us define the two dimensional Scale Space Transform (2DSST) as follows:

F(x,y,σ,ε) = [ε²∂²/∂x² + ∂²/∂y²] ∫∫ from −∞ to ∞ of [f(u₁,u₂)/(2πσ²ε)] exp(−[(x−u₁)²/ε² + (y−u₂)²]/2σ²) du₁du₂   (4.3.56)

Note that this is a generalization of the two dimensional transform of Yuille and Poggio, who defined:

F(x,y,σ) = ∇² ∫∫ from −∞ to ∞ of [f(u₁,u₂)/(2πσ²)] exp(−[(x−u₁)² + (y−u₂)²]/2σ²) du₁du₂   (4.3.57)

The reason for this generalization is to facilitate the description of the relation between the transforms of the left and right images. Since the disparity gradient is in the x-direction only, the effect of having a non-constant disparity is to skew the 2DSST in the (x,y,σ,ε) space as we go from the left eye to the right eye. This is why we need an extra scaling parameter, the skew factor ε, in our definition of the 2DSST. Let us now derive the relationship between G(x,y,σ,ε) and F(x,y,σ,ε).
Using (4.3.55) we can write:

    G(x,y,σ,ε) = [ε²∂²/∂x² + ∂²/∂y²] ∫∫_{-∞}^{∞} f(β₀ + (1+β₁)u₁, u₂) (σ²/2πε) e^{-[(x-u₁)²/ε² + (y-u₂)²]/2σ²} du₁du₂   (4.3.58)

Letting v₁ = u₁(1+β₁) + β₀ and v₂ = u₂ we obtain:

    G(x,y,σ,ε) = [ε²∂²/∂x² + ∂²/∂y²] ∫∫_{-∞}^{∞} f(v₁,v₂) (σ²/2πε) e^{-[((x(1+β₁)+β₀)-v₁)²/(ε(1+β₁))² + (y-v₂)²]/2σ²} dv₁dv₂/(1+β₁)   (4.3.59)

Now, if we define x' = x(1+β₁) + β₀ and ε' = ε(1+β₁), then we can rewrite the above equation as:

    G(x,y,σ,ε) = [ε²∂²/∂x² + ∂²/∂y²] ∫∫_{-∞}^{∞} f(v₁,v₂) (σ²/2πε') e^{-[(x'-v₁)²/ε'² + (y-v₂)²]/2σ²} dv₁dv₂   (4.3.60)

so that:

    G(x,y,σ,ε) = F(x(1+β₁)+β₀, y, σ, ε(1+β₁))   (4.3.61)

and conversely:

    F(x,y,σ,ε) = G((x-β₀)/(1+β₁), y, σ, ε/(1+β₁))   (4.3.62)

Now let us define a two dimensional function, by holding σ and y constant, as follows:

    F(x,ε) = F(x, y₀, σ₀, ε)   (4.3.63)

We will call this transform, for lack of a better term, the Skew Space Transform (SKST) of the two dimensional function f(x). This transform is parametrized by the y₀ and σ₀ values, and is a planar slice through the four dimensional SST defined by (4.3.56). We can now describe the relation between the left and right skew maps (analogous to the scale maps described earlier). We have:

    F(x,ε) = G(x/(1+β₁), ε/(1+β₁))   (4.3.64)

where we have set β₀ to zero for reasons of clarity. Now, let ε = 1. This gives us the case of ∇²G filtering. The points x_R for which F(x_R,1) = 0 and x_L for which G(x_L,1) = 0 define the locations of zero crossings of the filtered image along y = y₀, for a filter space constant of σ₀. It is evident that the analysis of the two dimensional disparity measurement error will be similar to the one dimensional analysis, with the skew maps F(x,ε) and G(x,ε) replacing the scale maps, and ε replacing σ.
In particular we can write (following (4.3.15)):

    e_d = (1+β₁)[C⁻¹(1/(1+β₁)) - C_ze⁻¹(1/(1+β₁))]   (4.3.65)

where x = C(ε) is the track in (x,ε) space (skew space) of the zero crossing of G(x,ε) that passes through (x_L,1), and x = C_ze(ε) is the zero error line, given simply by x = x_L.

In figure 4.11 we plot the skew maps of F(x,ε) = 0 and G(x,ε) = 0 for a surface with β₁ = -60/255, for σ = 2. Compare these maps with the scale space maps of the one dimensionally filtered image shown in figure 4.6.

FIGURE 4.11 The left and right skew maps obtained from a randomly textured surface with a horizontal disparity gradient of -60/255, with σ = 2.

As in the one dimensional case we can obtain an approximate expression for the probability density function of the disparity measurement error, given that g(x) is a zero mean white Gaussian process. We make the assumption that Δε = -β₁/(1+β₁) is small, so that the zero crossing through the point (x_L,1) is approximately straight, forming an angle φ. The error in this case is given simply by:

    e_d = (x₁ - x_L)(1+β₁)   (4.3.66)

Now (x₁ - x_L) can be expressed in terms of φ and β₁ to give the following expression for e_d:

    e_d = β₁M/(1+β₁)   (4.3.67)

where M = tan(φ) is the slope of the line perpendicular to the zero crossing contour at (x_L,1). The gradient vector η = (η₁,η₂) = ∇F(x,ε), measured at (x_L,1), is also perpendicular to the zero crossing line. Thus we have that:

    M = η₂/η₁   (4.3.68)

and

    e_d = β₁(η₂/η₁)/(1+β₁)   (4.3.69)

Now our problem reduces to finding the probability density function of the above function of the two random variables η₁ and η₂, given that F(x_L,1) = 0. The solution of this problem, however, is quite a bit more complex than was the case in the one dimensional analysis.
The reason for this is that the differential equation relating the ε derivatives to the x derivatives is not as simple as equation (4.3.20). It can be shown (by performing the differentiation with respect to y in (4.3.56)) that F(x,ε) can be written as:

    F(x,ε) = ε² ∂²H₁(x,ε)/∂x² + H₂(x,ε)   (4.3.70)

where

    H₁(x,ε) = ∫_{-∞}^{∞} h₁(u) (σ₀/ε√(2π)) e^{-(x-u)²/2ε²σ₀²} du   (4.3.71)

    H₂(x,ε) = ∫_{-∞}^{∞} h₂(u) (σ₀/ε√(2π)) e^{-(x-u)²/2ε²σ₀²} du   (4.3.72)

and where h₁ and h₂ are defined as follows:

    h₁(u) = ∫_{-∞}^{∞} [f(u,v)/(σ₀√(2π))] e^{-(y₀-v)²/2σ₀²} dv   (4.3.73)

    h₂(u) = ∫_{-∞}^{∞} [f(u,v)/(σ₀√(2π))] [(y₀-v)²/σ₀² - 1] e^{-(y₀-v)²/2σ₀²} dv   (4.3.74)

We can now see that:

    ∂F/∂x = ε² ∂³H₁/∂x³ + ∂H₂/∂x = η₁, when ε = 1   (4.3.75)

    ∂F/∂ε = 2ε ∂²H₁/∂x² + ε² ∂³H₁/∂x²∂ε + ∂H₂/∂ε = η₂, when ε = 1   (4.3.76)

It can be shown (Yuille and Poggio, 1983a) that, since the Gaussian function is the Green's function for the diffusion equation, and H₁ and H₂ are the result of convolving some function with the Gaussian, H₁ and H₂ are solutions of the diffusion equation. With the boundary conditions on H₁ and H₂ as described by Yuille and Poggio (appendix, Yuille and Poggio, 1983a), it can be seen that:

    ∂²H₁/∂x² = (1/ε) ∂H₁/∂ε   (4.3.77)

    ∂²H₂/∂x² = (1/ε) ∂H₂/∂ε   (4.3.78)

Therefore we can write η₂ in terms of x derivatives only, as follows:

    η₂ = 2ε ∂²H₁/∂x² + ε³ ∂⁴H₁/∂x⁴ + ε ∂²H₂/∂x², when ε = 1   (4.3.79)

Let α = (η₁, η₂, ξ), where ξ = F(x_L,1). The covariance matrix of α is given by E(αα^T), where E(·) denotes the expectation operator.
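The diffusion-equation identity (4.3.77) can be checked numerically. The sketch below is my own illustration, assuming σ₀ = 1 so that the identity takes exactly the printed form. Smoothing cos(ωu) with a Gaussian of standard deviation ε merely attenuates it, so H is available in closed form and both sides of the identity can be compared by finite differences:

```python
import math

SIGMA0 = 1.0  # the identity (4.3.77) as printed corresponds to sigma_0 = 1

def H(x, eps, omega=2.0):
    """Gaussian smoothing of cos(omega*u) with standard deviation eps*SIGMA0:
    the convolution only attenuates the cosine by exp(-(omega*eps*sigma0)^2/2)."""
    return math.exp(-0.5 * (omega * eps * SIGMA0) ** 2) * math.cos(omega * x)

def d2H_dx2(x, eps, h=1e-4):
    # second central difference in x
    return (H(x + h, eps) - 2.0 * H(x, eps) + H(x - h, eps)) / h**2

def dH_deps(x, eps, h=1e-6):
    # central difference in the skew parameter eps
    return (H(x, eps + h) - H(x, eps - h)) / (2.0 * h)

x, eps = 0.3, 1.0
lhs = d2H_dx2(x, eps)          # second x derivative
rhs = dH_deps(x, eps) / eps    # (1/eps) * d H / d eps
print(lhs, rhs)
```

The two printed values agree to within the finite-difference error, confirming that the Gaussian-smoothed signal satisfies the diffusion relation used to eliminate the ε derivatives in (4.3.79).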
Using the procedure of Rice (Rice, 1945) it can be shown that:

    E(η₁²) = -[ψ₁⁽⁶⁾(0) + 2ψ₁₂⁽⁴⁾(0) + ψ₂⁽²⁾(0)]   (4.3.80)

    E(η₂²) = 4ψ₁⁽⁴⁾(0) + 4ψ₁⁽⁶⁾(0) + ψ₁⁽⁸⁾(0) + 4ψ₁₂⁽⁴⁾(0) + 2ψ₁₂⁽⁶⁾(0) + ψ₂⁽⁴⁾(0)   (4.3.81)

    E(ξ²) = ψ₁⁽⁴⁾(0) + 2ψ₁₂⁽²⁾(0) + ψ₂(0)   (4.3.82)

    E(η₁η₂) = 0   (4.3.83)

    E(η₁ξ) = 0   (4.3.84)

    E(η₂ξ) = 2ψ₁⁽⁴⁾(0) + ψ₁⁽⁶⁾(0) + 2ψ₁₂⁽⁴⁾(0) + 2ψ₁₂⁽²⁾(0) + ψ₂⁽²⁾(0)   (4.3.85)

In the above, ψ₁(τ) is the autocovariance function of H₁, ψ₂(τ) that of H₂, and ψ₁₂(τ) is the cross-covariance of H₁ and H₂. If we partition the covariance matrix of α as shown in equation (4.3.25), we can see that the joint probability density function of η₁ and η₂, given that ξ = 0, is determined by the conditional covariance matrix:

    [ 1/a²   0  ;  0   1/b² ]   (4.3.86)

where

    1/a² = E(η₁²)   (4.3.87)

and

    1/b² = E(η₂²) - E(η₂ξ)²/E(ξ²)   (4.3.88)

The covariance functions, as well as formulae for their derivatives at zero, are derived in the Appendix. Inserting these values into the above equations we get:

    1/a² = 18.25√π/σ⁵   (4.3.89)

    1/b² = [1.579σ⁴ + 4.345σ² + 16.11]√π/σ⁵   (4.3.90)

We have that:

    p(η₂,η₁|ξ=0) = (ab/2π) e^{-(a²η₁² + b²η₂²)/2}   (4.3.91)

If we let M = η₂/η₁ we get:

    p(M,η₁) = |η₁| (ab/2π) e^{-η₁²(a² + b²M²)/2}   (4.3.92)

Integrating out the dependence on η₁ gives us:

    p(M) = a/[πb(M² + a²/b²)]   (4.3.93)

Making the transformation e_d = -β₁M we get:

    p(e_d) = k/[π(e_d² + k²)]   (4.3.94)

where:

    k = -β₁a/b = -β₁√(0.881 + 0.238σ² + 0.0864σ⁴)   (4.3.95)

As in the one dimensional case the error has a Cauchy distribution. In figure 4.12 we plot p(e_d) for a range of β₁ values, holding σ = 1. In figure 4.13 we plot p(e_d) for a range of σ values, holding β₁ = -0.1. Notice that p(e_d) is relatively insensitive to the value of σ over quite a large range of σ values. Recall that in the one-dimensional case p(e_d) was not a function of σ at all.

FIGURE 4.12 The probability density of disparity measurement error for zero crossing features, σ = 1, β₁ = -0.1 to -0.4.

FIGURE 4.13 The probability density of disparity measurement error for zero crossing features, β₁ = -0.1, σ = 1 to 4.

Quantized Zero Crossing Disparity Measurement Error

It can be seen that the form of the quantized disparity measurement error probability mass function is the same as for the one-dimensional case. The form of k is different, however, as seen above. In figure 4.14 we plot p(n) as a function of the disparity gradient for n = 0, 1, 2 and 3, with q = 1/√2 and σ = 1. In figure 4.15 we plot p(n) as a function of σ for n = 0, 1, 2 and 3, holding β₁ = -0.1 and q = 1/√2. Note that these probabilities are relatively insensitive to changes in σ over a large range of σ values. This is due to the fact that the size of the pixels increases linearly with σ.

Extremum Features

The probability density function of the disparity measurement error for the extremum feature case has the same form (equation 4.3.94) as in the zero crossing feature case. The matrix B is different, however, due to the difference in the autocovariance functions in the two cases. As in the one dimensional case, the autocovariances in the extremum and zero crossing cases can be seen to be related by:

    ψ_extremum(τ) = -ψ''_zero crossing(τ)   (4.3.96)

We can then calculate the new values of 1/a² and 1/b². Doing so we obtain:

    1/a² = 7035√π/(64σ⁹)   (4.3.97)

and

    1/b² = √π[2.695σ⁴ + 20.87σ² + 99.49]/σ⁹   (4.3.98)

FIGURE 4.14 The probability of an N pixel error for zero crossing features as a function of β₁, for σ = 1, q = 1/√2 and N = 0, 1, 2 and 3.
FIGURE 4.15 The probability of an N pixel error for zero crossing features as a function of σ, for β₁ = -0.1, q = 1/√2 and N = 0, 1, 2 and 3.

Hence:

    k = -β₁√(0.0245σ⁴ + 0.1898σ² + 0.9051)   (4.3.99)

We plot the resulting error function p(e_d) for σ = 1 and β₁ = -0.1 to -0.5 in figure 4.16, and in figure 4.17 we plot p(e_d) for σ = 1, 2, 3 and 4 with β₁ = -0.1 and q = 1/√2. Compare these graphs to the zero crossing feature case; they differ only slightly.

Quantized Extremum Disparity Measurement Error

The basic expression for the probability mass function p(n) is the same as for the zero crossing feature case, except that the expression for k(σ) is different, as noted above. In figure 4.18 we plot the probability of getting an N pixel disparity measurement error, when using extremum features, as a function of β₁ for σ = 1, given q = 1/√2. In figure 4.19 we plot the probability of getting an N pixel disparity measurement error, when using extremum features, as a function of σ for β₁ = -0.1, given q = 1/√2.

FIGURE 4.16 The probability density function of the disparity measurement error for extremum features, for β₁ = -0.1 to -0.4, with σ = 1.

FIGURE 4.17 The probability density function of the disparity measurement error for extremum features, with β₁ = -0.1, for σ = 1, 2, 3 and 4, with q = 1/√2.

FIGURE 4.18 The probability of an N pixel disparity measurement error for extremum features as a function of β₁, for σ = 1 and q = 1/√2.

FIGURE 4.19 The probability of an N pixel disparity measurement error for extremum features as a function of σ, for β₁ = -0.1 and q = 1/√2.
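The Cauchy error densities above are simple to evaluate. The following sketch (my own, taking the polynomial coefficients printed in (4.3.95) and (4.3.99) as given) compares the error scale k for the two feature types, showing how close they are:

```python
import math

def k_zero_crossing(beta1, sigma):
    """Cauchy scale parameter for zero crossing features, equation (4.3.95)."""
    return -beta1 * math.sqrt(0.881 + 0.238 * sigma**2 + 0.0864 * sigma**4)

def k_extremum(beta1, sigma):
    """Cauchy scale parameter for extremum features, equation (4.3.99)."""
    return -beta1 * math.sqrt(0.9051 + 0.1898 * sigma**2 + 0.0245 * sigma**4)

def p_error(e, k):
    """Cauchy density of the disparity measurement error, equation (4.3.94)."""
    return k / (math.pi * (e**2 + k**2))

beta1, sigma = -0.1, 1.0
kz = k_zero_crossing(beta1, sigma)
ke = k_extremum(beta1, sigma)
print(kz, ke)  # the two feature types give very similar error scales
```

Consistent with figures 4.16 and 4.17, the zero crossing and extremum densities differ only slightly, and k grows slowly with σ, which is why the densities are relatively insensitive to the filter scale.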
[Numbering error: text for leaf 144 not produced.]

4.4 - Reconstruction Error Analysis

In this section we will examine the errors that arise in the reconstruction of the disparity function from the sparsely distributed samples provided by the matching algorithm. We will analyze in detail the errors for the non-uniform reconstruction method of chapter 3 described earlier. The interpolation scheme of Grimson (1981b) and Terzopoulos (1982) will not be analyzed here, partly due to the difficulty of doing so.

Uniform Reconstruction Error Analysis

Let us begin by examining the errors produced by the standard uniform two dimensional reconstruction process as derived by Petersen and Middleton (1962). It will be seen that the error analysis for this case generalizes to produce an error analysis for the non-uniform cases. The reconstruction equation of Petersen and Middleton was written down in chapter 3, and is repeated below:

    f_r(x) = Σ_s f(x_s) g(x - x_s)   (4.4.1)

where

    {x_s} = {x : x = l₁v₁ + l₂v₂ ; l₁,l₂ = 0, ±1, ±2, ...}   (4.4.2)

It can be shown that the spectrum of the sampled function

    f_s(x) = Σ_s f(x)δ(x - x_s)   (4.4.3)

is given by

    F_s(ω) = Σ_s F(ω + ω_s)   (4.4.4)

where the set {ω_s} is the lattice dual to {x_s}, whose basis vectors satisfy u_i·v_j = 2πδ_ij. The above holds only if {x_s} contains an infinite number of points. F(ω) is the Fourier transform of f(x). Using the convolution property of the Fourier transform, we can write:

    F_r(ω) = F_s(ω)G(ω) = Σ_s F(ω + ω_s)G(ω)   (4.4.5)

It can be seen from this equation that if F(ω + ω_s) = 0 for ω_s ≠ 0 wherever F(ω) ≠ 0, and if G(ω) is given by equation 3.6.5, then F_r(ω) = F(ω), and the reconstruction is exact. The above conditions on G(ω) and F(ω) for exact reconstruction are illustrated graphically in figure 4.20.

There are three situations in which the above reconstruction formula will not be exact. The first situation occurs when the spectral repetitions in the spectrum of the sampled function f_s(x) overlap with the central (i.e. (l₁,l₂) = (0,0)) repetition, as shown in figure 4.21. Some of the energy in the spectral repetitions is added to the energy of the central repetition passed by the filter. This causes an error in the reconstructed value known as aliasing error.[6]

The second situation in which reconstruction errors arise is when the reconstruction filter, g(x), does not pass all of the central spectral repetition. In other words, part of the energy in the function that we are trying to reconstruct is filtered out. Sometimes this is not such a bad thing, because some of the aliased energy from the other spectral repetitions may also be filtered out. The reconstruction filter can also have too wide a bandwidth, so that even if the function has a high enough sample density to eliminate all aliasing (or overlap of the spectral repetitions), the filter may pass some of the energy in the spectral repetitions anyway. Both of these situations are depicted in figure 4.22. In practice this error, which we call the Filtering Error, can be avoided if the maximum region of support of F(ω) and the sampling lattice are known, as one can then design an appropriate filter.

[6] The aliasing error derives its name from the fact that a given frequency component of the spectrum of f(x), when replicated, actually becomes a different frequency component. Thus these frequency components are actually under an assumed name (frequency), or alias.

FIGURE 4.20 The shapes of the regions of support for F(ω) and G(ω) for exact reconstruction of f(x) from its samples.

FIGURE 4.21 Aliasing error in the reconstruction caused by too low a sample density.

FIGURE 4.22 The effect of having an improper reconstruction filter. Note that the central repetition is partly filtered out and that parts of the other repetitions are passed by the filter.
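A one dimensional sketch (my own, not from the thesis) illustrates the aliasing error: a sinusoid sampled below its Nyquist rate is indistinguishable from its alias, so the ideal low-pass reconstruction follows the alias rather than the original:

```python
import math

def sinc_reconstruct(samples, T, x):
    """Ideal low-pass reconstruction from uniform samples f(nT), the
    one dimensional analogue of (4.4.1) with g the sinc interpolation filter."""
    total = 0.0
    for n, fn in enumerate(samples):
        u = (x - n * T) / T
        total += fn * (1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u))
    return total

T = 1.0                    # sample spacing, so the passband is |f| < 0.5
f_true = 0.8               # above the Nyquist rate of 0.5
f_alias = 1.0 - f_true     # the spectral repetition folds 0.8 down to 0.2

samples = [math.cos(2 * math.pi * f_true * n * T) for n in range(400)]
x = 200.25                 # evaluate well inside the sample window
rec = sinc_reconstruct(samples, T, x)
print(rec, math.cos(2 * math.pi * f_alias * x))
```

The reconstructed value matches the 0.2 cycle/sample cosine, not the original 0.8 cycle/sample one: the energy of the replicated spectrum has been passed off "under an assumed frequency".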
The third type of error that can arise, which will be seen to be a form of filtering error, occurs when a finite subset of the points in {x_s} is used to perform the reconstruction of a value of f(x). This error is known as truncation error (because it arises when the reconstruction summation is truncated). The effect of limiting the number of sample points used in the reconstruction is equivalent to reconstruction with a filter that is a spatially truncated version of the optimum reconstruction filter. Since the support of this truncated filter in the space domain is bounded, its support in the frequency domain cannot be bounded. Hence this truncation error is seen to actually be a filtering error, since all frequencies will be passed to some extent by the filter, including those present in the non-central spectral repetitions of f(x). We can write down an expression that includes the effects of all the above errors on the reconstructed function as follows:

    e_t(x) = f(x) - f_t(x) = f(x) - [Σ_s f(x_s)δ(x - x_s)] * g_t(x)   (4.4.6)

where g_t(x) is the truncated version of the reconstruction filter. Obviously, in order to minimize the truncation, or filtering, error, we should choose the filter function g_t(x) that has the following properties:

- g_t(x) = 0 outside some region R (i.e. g_t is spatially bounded);
- |G_t(ω)| is a minimum for all ω outside the region Ω, where Ω is the region of support of F(ω).

Derivation of the optimal filter under truncation

We will now derive an approximation to this optimal (in the sense of minimizing the truncation error) filter. Let us assume that both R and Ω are disk shaped, with radii a and b respectively. The total energy in the filter function can be seen to be given simply by:

    A_R = ∫∫_R |g(x)|² dx = (1/4π²) ∫∫_{-∞}^{∞} |G(ω)|² dω   (4.4.7)

The second relation comes from Parseval's theorem. The condition that the energy in G(ω) be minimized for |ω| > b is equivalent to the condition that the energy in G(ω) be maximized for |ω| < b.
Let us define this energy by A_Ω. This has the following value:

    A_Ω = (1/4π²) ∫∫_Ω G*(ω)G(ω) dω   (4.4.8)

where * indicates the complex conjugate. Our task is therefore to determine the function g(x) that maximizes A_Ω/A_R. We can write A_Ω in terms of g(x) as follows:

    A_Ω = (1/16π⁴) ∫∫_Ω [∫∫_R e^{jω·x} g*(x) dx][∫∫_R e^{-jω·y} g(y) dy] dω   (4.4.9)

This can be simplified to:

    A_Ω = (1/4π²) ∫∫_R ∫∫_R K_Ω(y - x) g(y) g*(x) dx dy   (4.4.10)

where

    K_Ω(y - x) = (1/4π²) ∫∫_Ω e^{-jω·(y-x)} dω   (4.4.11)

Slepian (1964) shows that the maximum of A_Ω/A_R is equal to the largest eigenvalue of the following integral equation, and that the corresponding eigenfunction of this integral equation is the optimal filter function:

    λψ(y) = ∫∫_R K_Ω(y - x)ψ(x) dx, for |y| < a   (4.4.12)

Slepian also shows that this equation can be rewritten in the form:

    αψ(y) = ∫∫_R e^{jc x·y} ψ(x) dx, for |y| < a   (4.4.13)

where c = b/a and α = 2πλ/c. Note that, for the case a = b = ∞ (i.e. g(x) is non-truncated and non-bandlimited), we get the equation:

    αg(y) = ∫∫_{-∞}^{∞} e^{jx·y} g(x) dx   (4.4.14)

or αg(y) = G(y). The only function whose Fourier transform has the same form as itself is the Gaussian. That the Gaussian is the optimum function in terms of jointly localizing the energy in both the space (or time) and frequency domains is a well known result. However, once we apply the condition that the function be space limited, the Gaussian is no longer the optimum function. In order to find this function we must solve the above integral equation. Slepian (1964) showed that this equation can be rewritten as:

    γ_{N,n} φ_{N,n}(r) = ∫₀¹ J_N(crr')√(crr') φ_{N,n}(r') dr', 0 < r < 1   (4.4.15)

where

    φ_{N,n}(r) = √r R_{N,n}(r)   (4.4.16)

and

    γ_{N,n} = cα_{N,n}/(2πj^N), for n,N = 0,1,2,...   (4.4.17)

and where we define ψ_{N,n}(x) = R_{N,n}(r)cos(Nθ) to be the polar representation of ψ_{N,n}(x).
Slepian also shows that the solution of the above integral equation is also the solution to the following Sturm-Liouville equation:

    (1 - r²)φ'' - 2rφ' + [(1/4 - N²)/r² - c²r² + χ]φ = 0   (4.4.19)

The bounded solutions of this equation, for arbitrary N, are known as the generalized prolate spheroidal functions (Slepian, 1964). Bounded solutions occur only for discrete values χ_{N,n} of χ (Slepian, 1964, p. 3039), whose associated eigenvalues can be ordered so that λ_{N,n+1} < λ_{N,n}; one can similarly order γ_{N+1,n} < γ_{N,n}. Thus the largest eigenvalue of the integral equation (4.4.13) is obtained when N = n = 0. Thus we have that:

    g(x) = ψ_{0,0}(x) = φ_{0,0}(|x|)/√|x|   (4.4.20)

Now it remains to find an expression for φ_{0,0}(r). In the literature no closed form exists for this function and we can only write down approximations. Slepian (1964) provides a number of approximations to this function that are valid over different ranges of r. For large c and small r (|r| < c^{-1/4}) he shows that:

    φ_{0,0}(r) = k√r e^{-r²c/2} L₀⁽⁰⁾(r²c), for |r| < c^{-1/4}   (4.4.21)

where L₀⁽⁰⁾(x) = 1 is the Laguerre polynomial of degree zero and k is a constant. We can now write:

    g(x) = k e^{-c|x|²/2}, for |x| < c^{-1/4}   (4.4.22)

Note that this is the Gaussian function which, as we have seen earlier, is the exact solution when there are no spatial or frequency domain constraints. When we apply such constraints, requiring that the function be bandlimited and spacelimited, the Gaussian is only an approximate solution. Let us now rescale the spatial axes (recall that we earlier scaled so that a = 1), giving x → x/a, c → a²c, to give:

    g(x) = k e^{-c|x|²/2}, for |x| < (a²c)^{-1/4}   (4.4.23)

This result is similar to the one obtained by Shanmugan et al (1979), and later by Lunscher (1983), who found that the one dimensional filter whose step response has maximum energy in the vicinity of the step input is approximately Gaussian. A question arises as to the validity of extending this approximation beyond the rather restrictive range over which it is strictly defined (i.e. |x| < (a²c)^{-1/4}). Lunscher (1983) does not say anything about this, but apparently assumes that the approximation can be extended to |x| < a (recall that by definition g(x) = 0 for |x| > a). Shanmugan et al (1979) cite a paper by Streifer (1969), claiming that he shows that little error is incurred by extending the approximation. However, Shanmugan et al had misread Streifer's remarks. Streifer says that the approximation can be extended because, in his application, the prolate spheroidal function is multiplied by a function which decreases rapidly for x > c^{-1/4}. Thus, in his application, extending the approximation will not incur much additional error. In our application (as well as Lunscher's and that of Shanmugan et al) we must be more careful in extending our approximations.

Actually we need not extend the above approximation past its region of validity at all, since Slepian (1964, p. 3027) provides, in addition to the above approximation, approximations which, taken together, are valid over the range from |x| = c^{-1/4} to |x| = a. These can be paraphrased as follows:

    g(x) = k e^{-ac(a - √(a²-|x|²))} / [(a + √(a²-|x|²))(a²-|x|²)^{1/4}]   (4.4.24)

valid for (a²c)^{-1/4} ≤ |x| ≤ a - (a²c)^{-1/4}, and

    g(x) = k e^{-a²c} I₀(ac√(a²-|x|²))   (4.4.25)

valid for a - (a²c)^{-1/4} ≤ |x| ≤ a, where I₀ is the modified Bessel function of the first kind.

For large values of c, the first of these approximations covers most of the range in which we are interested (i.e. 0 to a). A plot of g(r) using these approximations, for r in the range 0 to 1, for a = 1 and various values of c, is given in figure 4.23. In figure 4.24 we plot, under the same conditions, the filter g(r) that results from extending the approximation of (4.4.23) past its range of validity.
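The near-optimality of the extended Gaussian filter can be checked numerically. The sketch below is my own one dimensional analogue (with assumed values a = 1, b = 8): it computes the energy concentration ratio A_Ω/A_R, the quantity maximized by the prolate spheroidal functions, for the Gaussian of (4.4.23) truncated to |x| ≤ a:

```python
import math

def g(x, c):
    """Gaussian approximation (4.4.23) to the optimal truncated filter,
    up to a multiplicative constant."""
    return math.exp(-c * x * x / 2.0)

def G(w, a, c, n=600):
    """Fourier transform of g truncated to |x| <= a (g even, so a cosine integral)."""
    dx = 2.0 * a / n
    return sum(g(-a + (i + 0.5) * dx, c) * math.cos(w * (-a + (i + 0.5) * dx)) * dx
               for i in range(n))

def concentration(a, b, n=300):
    """Fraction of the truncated filter's energy inside the band |w| <= b."""
    c = b / a  # space-bandwidth parameter, as in the rescaled (4.4.23)
    dw = 2.0 * b / n
    band = sum(G(-b + (i + 0.5) * dw, a, c) ** 2 * dw for i in range(n))
    # Parseval (1D): total energy in G is 2*pi times the energy of g over |x| <= a
    dx = 2.0 * a / 4000
    total = 2 * math.pi * sum(g(-a + (i + 0.5) * dx, c) ** 2 * dx for i in range(4000))
    return band / total

ratio = concentration(a=1.0, b=8.0)
print(ratio)  # close to 1: nearly all the energy lies inside the band
```

For a large space-bandwidth product the truncated Gaussian concentrates almost all of its energy inside the band, which is why figures 4.23 and 4.24 differ so little.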
It is seen that the filter that results from using equation (4.4.23) for all x is not appreciably different from the filter that arises from the strictly valid approximation. Thus we will, in the rest of the thesis, follow the intuition of Lunscher and Shanmugan et al, and use equation (4.4.23) to define the approximately optimal filter (under the specification of truncation). It should be pointed out at this time that this filter is only specified to within a multiplicative constant. The problem of scaling the filter properly will be discussed later in this section, when we derive expressions for the mean square reconstruction error.

FIGURE 4.24 A plot of Slepian's first approximation to the optimum filter extended past its region of strict validity.

Derivation of the general reconstruction error expression

We will now derive a formula for the reconstruction error for the general case (of linear filter reconstruction). Since the disparity functions that we will be trying to reconstruct are, in practice, of a non-deterministic nature, we will treat the disparity function f(x) as a stochastic process, and determine an expression for the mean square reconstruction error. This has been done, in part, by Petersen and Middleton (1962), who show that, for the case of ideal filtering, the minimum mean square reconstruction error is given by:

    E{[f(x) - f_R(x)]²} = (1/4π²) ∫∫_{-∞}^{∞} [Φ(ω) - (G(ω)/Q) Σ_s e^{jx·ω_s} Φ(ω + ω_s)] dω   (4.4.26)

where Φ(ω) is the Fourier transform of the autocovariance function of f(x) (i.e. the power spectral density), and Q is as defined in equation (3.6.6) of chapter 3. G(ω) is the Fourier transform of the ideal reconstruction filter, and is defined by:

    G(ω) = QΦ(ω)/[Σ_s Φ(ω + ω_s)]   (4.4.27)

The set {ω_s} is the set of the frequency domain repetitions caused by the sampling (this set, defined by equation (3.6.1), is the dual of the spatial sample set, as shown by equation (3.6.3)). If the power spectral density of f(x) is sufficiently bandlimited then the minimum mean square error vanishes (note: minimum in this context means that the mean square error produced using the above filter G(ω) is less than or equal to the mean square error produced using any other filter). Petersen and Middleton also derive the minimum mean square reconstruction error uniformly averaged over a sampling cell. This quantity may be useful in the analysis of the reconstruction errors in the non-uniform case, where the size of the sampling cell varies. This averaged mean square reconstruction error is given by:

    Ē{[f(x) - f_R(x)]²} = (1/4π²) ∫∫_{-∞}^{∞} Φ(ω)[1 - G(ω)/Q] dω   (4.4.28)
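The dual set {ω_s} is mechanical to compute. The sketch below (my own, assuming the biorthogonality convention u_i·v_j = 2πδ_ij for (3.6.3)) computes the frequency-domain basis from a spatial sampling basis; for the regular hexagonal lattice with nearest-neighbour spacing 2a it reproduces the basis vectors quoted later in (4.4.61):

```python
import math

def dual_basis(v1, v2):
    """Frequency-domain dual basis satisfying u_i . v_j = 2*pi*delta_ij,
    i.e. 2*pi times the inverse-transpose of the matrix with columns v1, v2."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    u1 = (2 * math.pi * v2[1] / det, -2 * math.pi * v2[0] / det)
    u2 = (-2 * math.pi * v1[1] / det, 2 * math.pi * v1[0] / det)
    return u1, u2

# Regular hexagonal spatial lattice with nearest-neighbour distance 2a
a = 1.0 / math.sqrt(3.0)
v1 = (2 * a, 0.0)
v2 = (a, a * math.sqrt(3.0))
u1, u2 = dual_basis(v1, v2)
print(u1, u2)  # (pi/a)(1, -1/sqrt(3)) and (pi/a)(0, 2/sqrt(3))
```

The same routine works for any non-degenerate sampling basis, so the frequency-domain repetition set {ω_s} of (4.4.4) can be generated as all integer combinations l₁u₁ + l₂u₂.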
However, this equation includes the effects of aliasing only, as it assumes that the ideal reconstruction filter is being used. In practice, as we have seen, we may not be reconstructing with the ideal reconstruction filter. In this case the above formulae for the mean square errors supply only a lower bound; we need a more general formula for the mean square reconstruction error. Again, such an equation is supplied by Petersen and Middleton (1962), and is given below:

    E{[f(x) - f_R(x)]²} = K(0) - 2Σ_s K(x - x_s)g(x - x_s) + Σ_{s₁}Σ_{s₂} K(x_{s₁} - x_{s₂})g(x - x_{s₁})g(x - x_{s₂})   (4.4.29)

where K(x) is the autocovariance function of f(x). Petersen and Middleton do not, however, provide a general frequency domain representation of the mean square error such as was given for the ideal filter case (4.4.28). They do show that the first two terms of the above expression can be put in terms of frequency domain quantities as follows:

    K(0) - 2Σ_s K(x - x_s)g(x - x_s) = (1/4π²) ∫∫_{-∞}^{∞} [Φ(ω) - 2(G(ω)/Q) Σ_s e^{-jx·ω_s} Φ(ω - ω_s)] dω   (4.4.30)

It is the third term in equation (4.4.29) that they do not provide a frequency domain expression for. Let us call this third term T, for simplicity. We will now derive a frequency domain expression for T.
We can expand T in terms of delta functions as follows, using the integral properties of the delta function:

    T = ∫∫∫∫_{-∞}^{∞} K(r - s)g(x - r)g(x - s) Σ_{s₁}Σ_{s₂} δ(r - x_{s₁})δ(s - x_{s₂}) dr ds   (4.4.31)

It can be shown (using the result of Appendix A in Petersen and Middleton, 1962) that:

    Σ_{s₁}Σ_{s₂} δ(r - x_{s₁})δ(s - x_{s₂}) = (1/Q²) Σ_{s₁}Σ_{s₂} e^{-jr·ω_{s₁}} e^{-js·ω_{s₂}}   (4.4.32)

where {ω_s} is the dual set to {x_s}, as given by equation (3.6.3). Thus we can write:

    T = (1/Q²) Σ_{s₁}Σ_{s₂} ∫∫∫∫_{-∞}^{∞} K(r - s)g(x - r)g(x - s) e^{-jr·ω_{s₁}} e^{-js·ω_{s₂}} dr ds   (4.4.33)

If we make the change of variables y = r - x and z = s - x, and use the fact that g(x) and K(x) are even functions, we obtain:

    T = a ∫∫∫∫_{-∞}^{∞} K(y - z)g(y)g(z) e^{-jy·ω_{s₁}} e^{-jz·ω_{s₂}} dy dz   (4.4.34)

where we have defined, for simplicity, the operator a to be:

    a = (1/Q²) Σ_{s₁}Σ_{s₂} e^{-jx·(ω_{s₁} + ω_{s₂})}   (4.4.35)

Separating out the functions that depend only on y gives us:

    T = a ∫∫_{-∞}^{∞} g(y) e^{-jy·ω_{s₁}} [∫∫_{-∞}^{∞} K(y - z)g(z) e^{-jz·ω_{s₂}} dz] dy   (4.4.36)

The integral in the brackets can be recognized as a convolution. Hence we can write:

    T = a ∫∫_{-∞}^{∞} g(y) e^{-jy·ω_{s₁}} [K(y) * g(y)e^{-jy·ω_{s₂}}] dy   (4.4.37)

Replacing the bracketed term by its Fourier transform representation gives:

    T = (a/4π²) ∫∫_{-∞}^{∞} g(y) e^{-jy·ω_{s₁}} [∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_{s₂}) e^{jy·ω} dω] dy   (4.4.38)

Rearranging this equation gives us:

    T = (a/4π²) ∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_{s₂}) [∫∫_{-∞}^{∞} g(y) e^{-jy·ω_{s₁}} e^{jy·ω} dy] dω   (4.4.39)

Evaluation of the bracketed integral as a Fourier transform results in:

    T = (a/4π²) ∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_{s₁})G(ω - ω_{s₂}) dω   (4.4.40)

Substituting the expression for a yields:

    T = (1/4π²Q²) ∫∫_{-∞}^{∞} Φ(ω) Σ_{s₁}Σ_{s₂} e^{-jx·(ω_{s₁}+ω_{s₂})} G(ω - ω_{s₁})G(ω - ω_{s₂}) dω   (4.4.41)

The total mean square error, E, for the general case of aliasing and truncation can now be written:

    E = (1/4π²) ∫∫_{-∞}^{∞} [Φ(ω) - 2(G(ω)/Q) Σ_s e^{-jx·ω_s} Φ(ω - ω_s) + (1/Q²)Φ(ω) Σ_{s₁}Σ_{s₂} e^{-jx·(ω_{s₁}+ω_{s₂})} G(ω - ω_{s₁})G(ω - ω_{s₂})] dω   (4.4.42)

This formula is valid for any filter function g(x) and for any random process f(x). We can obtain an expression for the average mean square error over a single sample cell Γ, as was done by Petersen and Middleton (1962) in the case of ideal filtering. Let us denote this average mean square error by Ē. It can be seen that Ē is given by:

    Ē = (1/Q) Σ_s ∫∫_Γ [K(0) - 2K(x - x_s)g(x - x_s) + Σ_{s₁}Σ_{s₂} K(x_{s₁} - x_{s₂})g(x - x_{s₁})g(x - x_{s₂})] dx   (4.4.43)

Making a change of variables, and noting that the summation of integrals over the elementary cells Γ is the same as integrating over the entire space, allows us to write:

    Ē = K(0) - (2/Q) ∫∫_{-∞}^{∞} K(x)g(x) dx + (1/Q) ∫∫_{R²} Σ_{s₁} K(x_{s₁})g(x - x_{s₁})g(x) dx   (4.4.44)

Let us define the function r(x) as follows:

    r(x) = 1 for x ∈ Γ, and r(x) = 0 elsewhere   (4.4.45)

Then, averaging the expression (4.4.41) for T over the cell, we can write the third term of (4.4.43) as:

    (1/4π²Q³) Σ_{s₁}Σ_{s₂} ∫∫_{-∞}^{∞} r(x) e^{-jx·(ω_{s₁}+ω_{s₂})} dx ∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_{s₁})G(ω - ω_{s₂}) dω   (4.4.46)

Recognizing the first integral in the above expression as a Fourier transform allows us to rewrite this as:

    (1/4π²Q³) Σ_{s₁}Σ_{s₂} R(ω_{s₁} + ω_{s₂}) ∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_{s₁})G(ω - ω_{s₂}) dω   (4.4.47)

Petersen and Middleton (1962, Appendix D) show that R(ω_s) = Q for ω_s = 0 and is zero for all other ω_s ∈ {ω_s}. This means that, since ω_{s₁} + ω_{s₂} is a member of {ω_s}, R(ω_{s₁} + ω_{s₂}) is equal to Q for ω_{s₂} = -ω_{s₁} and is zero elsewhere. This allows us to rewrite (4.4.47) as:

    (1/4π²Q²) Σ_s ∫∫_{-∞}^{∞} Φ(ω)G(ω - ω_s)G(ω + ω_s) dω   (4.4.48)

Combining this result with equation (4.4.30) gives us the following expression for the general averaged mean square error:

    Ē = (1/4π²) ∫∫_{-∞}^{∞} Φ(ω)[1 - (2/Q)G(ω) + (1/Q²) Σ_s G(ω - ω_s)G(ω + ω_s)] dω   (4.4.49)

It can be seen that this expression reduces to (4.4.28) when G(ω) is the ideal reconstruction filter. Notice the difference between this expression and the one given previously (equation (4.4.28)) for the ideal filter case: with a non-ideal filter (e.g. a truncated one), the average mean square error depends on the sample set, whereas in the ideal case it did not.

Optimal scaling of the reconstruction filter

Let us now consider the problem of scaling the filter. As we have seen earlier, the optimum (under truncation) filter is only specified to within a multiplicative constant. What should this constant be? To answer this question, let us suppose that G(ω) can be written as kS(ω), where S(0) = 1. It makes sense to determine k such that Ē is minimized. We can write:

    Ē = (1/4π²) ∫∫_{-∞}^{∞} Φ(ω)[1 - (2k/Q)S(ω) + (k²/Q²) Σ_s S(ω - ω_s)S(ω + ω_s)] dω   (4.4.50)

Differentiating with respect to k and setting the result equal to zero to find the extremal points yields:

    k = Q ∫∫_{-∞}^{∞} Φ(ω)S(ω) dω / ∫∫_{-∞}^{∞} Φ(ω) Σ_s S(ω + ω_s)S(ω - ω_s) dω   (4.4.51)

Often we will not have complete information about the process to be reconstructed. In such a case we will find k such that Ē is minimized for some assumed process. For example, we could assume that f(x) is a constant. Then k is given by:

    k = Q/Σ_s S²(ω_s)   (4.4.52)

From the above discussion we can see that, in general, the reconstruction error depends on three factors:

1. The power spectral density Φ (or equivalently the autocovariance function) of f(x).
2. The reconstruction filter G(ω).
3. The sample set {x_s} (or its frequency domain dual set {ω_s}).

Variation of any of these parameters will cause a change in the average mean squared error.

Example of the reconstruction error computation

We will now illuminate the details of the preceding discussion with an example. The conditions of this example are similar to the conditions of some of the experiments described in chapter 5. Let the power spectral density of the process be given by a cylindrical Gaussian function:

    Φ(ω) = A²σ_s√π e^{-ω₁²σ_s²/4} δ(ω₂)   (4.4.53)

where ω = (ω₁,ω₂). Let the filter be given by the approximate filter defined by (4.4.23), except that it extends to ∞.
That is:

    g(x) = k e^{-c|x|²/2}   (4.4.54)

Thus:

    G(ω) = (2πk/c) e^{-|ω|²/2c}   (4.4.55)

We will determine k so that the mean square error is minimized when σ_s is ∞ (i.e. when f(x) is constant). Doing this yields:

    2πk/c = Q/Σ_s e^{-|ω_s|²/c}   (4.4.56)

Let us define S as follows:

    S = Σ_s e^{-|ω_s|²/c}   (4.4.57)

Then Ē can be written:

    Ē = (A²σ_s√π/4π²) ∫_{-∞}^{∞} [e^{-ω₁²σ_s²/4} - (2/S)e^{-ω₁²(1/2c + σ_s²/4)} + (1/S)e^{-ω₁²(1/c + σ_s²/4)}] dω₁   (4.4.58)

This integral can be evaluated to give:

    Ē = (A²/2π)[1 + (1/S){1/√(1 + 4/(cσ_s²)) - 2/√(1 + 2/(cσ_s²))}]   (4.4.59)

The RMS error (√Ē) is seen to be proportional to A, the amplitude of the function being reconstructed. This dependence can be seen in the experiments described in chapter 5. Notice that Ē does not go to zero as σ_s goes to infinity (except for c = 0), but approaches (A²/2π)(1 - 1/S). This is due to the fact that the exponential filter lets in some energy from the non-central spectral repetitions that are created by the sampling. This is an example of the Filtering Error described earlier in this section (not to be confused with the filtering error described in section 4.3, which refers to the effects of ∇²G filtering). However, as the distance between samples decreases to zero, S goes to one and Ē goes to zero. As σ_s approaches zero, Ē approaches A²/2π for all values of c (except zero).

Let us assume that we have regular hexagonal sampling. Then, from chapter 3, we have that:

    ω_s = l₁u₁ + l₂u₂   (4.4.60)

for l₁, l₂ taking on all integer values. The vectors u₁ and u₂ are obtained from the spatial sampling basis using (3.6.3).
If we let the distance between nearest neighbour samples be 2a, then the resulting frequency domain sample basis vectors can be computed to be:

u₁ = (π/a)(1, −1/√3) and u₂ = (π/a)(0, 2/√3)    (4.4.61)

We can now write S_g as follows:

S_g = ΣΣ exp(−4π²(l₁² − l₁l₂ + l₂²)/(3a²c))    (4.4.62)

In figure 4.25 we plot the average RMS reconstruction error √E as a function of σ_s for four different values of the filter constant c, given that A = √(2π) and a = 1/√3. Figures 4.26 and 4.27 are similar to figure 4.25 except that a = 1/√6 and 1/√12 respectively. Some conclusions can be immediately drawn. The first is that reducing the sample spacing, a, reduces the error. Secondly, increasing the value of the filter constant c reduces the error for small values of σ_s, while decreasing c reduces the error for large values of σ_s. It is also evident that increasing the separation between samples results in increased error for large σ_s, due to the passing by the filter of the energy in the spectral repetitions, which move closer to the frequency plane origin as the samples get farther apart.

Nonuniform Sampling

Let us now consider the case of the reconstruction error produced by the warping or transformation method described in chapter 3 for nonuniformly distributed samples. Recall that this method involved making a coordinate transformation based on the sample distribution, in such a way that the samples in the new coordinate system were uniformly distributed, so that the standard uniform reconstruction method of Petersen and Middleton (1962) could be used. The transformed version, h(x), of the function f(x) to be reconstructed is related to f(x) as follows:

h(x) = f(γ⁻¹(x))    (4.4.63)

where γ(x) is the transformation from uniform to nonuniform coordinates.
FIGURE 4.25 The average RMS reconstruction error for a Gaussian process and filter, with a = 1/√3, for c = 5, 10, 15 and 20.

FIGURE 4.26 The average RMS reconstruction error for a Gaussian process and filter, with a = 1/√6, for c = 5, 10, 15 and 20.

FIGURE 4.27 The average RMS reconstruction error for a Gaussian process and filter, with a = 1/√12, for c = 5, 10, 15 and 20.

To determine the average mean square reconstruction error for a given reconstruction filter g(x), we need only determine the power spectral density, Φ_h(ω), of h(x) and then use equation (4.4.49). Note that the set {ω_s} is fixed for all cases, as described in chapter 3.6.

Let us assume that we know the power spectral density, Φ_f(ω), of the function, f(x), that is to be reconstructed. The question to be answered now is: how can Φ_h(ω) be obtained from Φ_f(ω)? For arbitrary transformations this is an intractable problem. However, by making some assumptions about the transformation we can obtain some representative results. For example, in the development of the heuristic transformation algorithm described in chapter 3.7, we assumed that the nonuniform sampling lattice arose from perturbing a uniform sampling lattice slightly. In this case we can model the transformation as follows:

γ⁻¹(x) = Ax + B(x)    (4.4.64)

where B(x) is some random vector process and A is a constant 2x2 matrix. We can, without loss of generality, assume A to be the identity matrix, I, by suitably rotating, scaling and translating the target (e.g. I space in chapter 3) coordinate system. Thus we can write:

h(x) = f(x + B(x))    (4.4.65)

Let us assume that both f(x) and B(x) are stationary and zero mean processes.
Then the autocovariance (or autocorrelation) functions of f(x) and h(x) can be written as follows:

ψ_f(τ) = E{f(x)f(x+τ)} = E{f(0)f(τ)}    (4.4.66)

ψ_h(τ) = E{h(x)h(x+τ)} = E{h(0)h(τ)}    (4.4.67)

Using the transformation between f and h we get:

ψ_h(τ) = E{f(x+B(x))f(x+τ+B(x+τ))} = E{f(0)f(B(x+τ)−B(x)+τ)}    (4.4.68)

Let us define a new random vector process, c(x,τ), as follows:

c(x,τ) = B(x+τ) − B(x)    (4.4.69)

Since B(x) is stationary we can write:

c(x,τ) = c(τ) = B(τ) − B(0)    (4.4.70)

Now let us obtain a single random value of c(τ), or event, and call this event c₁(τ). For this event we can write ψ_h1(τ) in terms of ψ_f(τ) as follows:

ψ_h1(τ) = E{f(0)f(c₁(τ)+τ)} = ψ_f(τ + c₁(τ))    (4.4.71)

However, in determining ψ_h(τ) we must consider all possible values that c(τ) can take on. Thus we must perform an expectation operation with respect to c(τ). Doing so yields:

ψ_h(τ) = E{ψ_f(τ+c(τ))} = ∫ P_c(c) ψ_f(τ+c) dc    (4.4.72)

where P_c(c) is the probability density function of the random process c. If P_c(c) and ψ_f(τ) are even functions the above equation is seen to be a convolution. Hence we can write:

ψ_h(τ) = P_c(τ) * ψ_f(τ)    (4.4.73)

Taking the Fourier transform of this relationship allows us to write:

Φ_h(ω) = Φ_c(ω) Φ_f(ω)    (4.4.74)

where Φ_c(ω) is the characteristic function of the distribution P_c(τ). Thus, if we know P_c(c) and ψ_f(τ) we can, in principle, determine Φ_h(ω), which can be used in equation (4.4.49) to compute the average mean square reconstruction error.

When modelling the random perturbation function, B(x), one must keep in mind the invertibility condition on γ⁻¹(x) (which states that the Jacobian determinant of the transformation must never equal zero). This condition means that B(x) must obey the following:

|I + ∂B(x)/∂x| > 0    (4.4.75)

(We have arbitrarily selected the sign of the Jacobian to be positive; it could just as easily be negative, in which case the above quantity must always be less than zero.)
This can be written as:

1 + η₁₁ + η₂₂ + η₁₁η₂₂ − η₁₂η₂₁ > 0    (4.4.76)

where η₁₁ = ∂b₁/∂x₁, η₁₂ = ∂b₁/∂x₂, η₂₁ = ∂b₂/∂x₁ and η₂₂ = ∂b₂/∂x₂.

One possible model for B(x) can be obtained by assuming the probability density of the η_ij to vanish outside the range (−δ,δ), where δ lies in the range [−(1+√3)/2, (√3−1)/2]. With this condition, the Jacobian of γ⁻¹ is guaranteed to be always positive.

Let us further assume that the power spectral densities of the derivatives of the b_i(x) vanish outside a disk of radius B in the frequency domain (i.e. the b_i(x) are bandlimited to B). Papoulis (1967) shows that the derivatives of a bandlimited deterministic function are themselves limited in magnitude. He extends this result to the case of random functions in the one dimensional case. However, the limit is now a limit on the RMS value of the derivatives. His derivation can be extended to the two dimensional case to give the following limit on the mean square value of the derivatives of b_i(x):

E{|∂^(k+r) b_i(x,y)/∂x^k ∂y^r|²} ≤ P B^(2k+2r)    (4.4.77)

where P is the power in b_i(x), defined as

P = (1/4π²) ∫∫_Ω Φ_b(ω) dω    (4.4.78)

and Ω is the region of support of the Fourier transform of b_i(x) (i.e. where it is non-zero). Note that this does not limit the maximum of the derivatives of the functions. However, if the b_i are Gaussian distributed, then so are the derivatives of the b_i. If we fix the B√P product so that it is less than δ/2, then the probability that the magnitude of the derivatives will not exceed δ will be 0.955 (as this is the 2σ value of the Gaussian distribution). Thus we can create a model for the perturbation noise B(x) by assuming the functions b₁(x) and b₂(x) to be Gaussian distributed and bandlimited such that the product of the bandwidth and the square root of the power of the b_i is less than δ/2, where δ is in the range [−(1+√3)/2, (√3−1)/2]. Since each b_i(x) has a Gaussian distribution, so does c(x). Thus the characteristic function Φ_c(ω) is Gaussian.
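The convolution relationship (4.4.73) can be illustrated with a small Monte Carlo sketch. Here ψ_f is taken, purely for illustration, to be a unit Gaussian autocorrelation, and the perturbation interval c is replaced by an independent zero mean Gaussian draw of standard deviation s at each lag (a simplifying assumption); under these assumptions the blurred autocovariance has a closed form against which the sample average can be checked:

```python
import math, random

random.seed(0)

def psi_f(t):
    # illustrative autocorrelation of f (a unit Gaussian; an assumption)
    return math.exp(-t*t/2)

def psi_h_mc(t, s, n=200000):
    # Monte Carlo expectation E{psi_f(t + c)} with c ~ N(0, s^2),
    # following (4.4.72)
    acc = 0.0
    for _ in range(n):
        acc += psi_f(t + random.gauss(0.0, s))
    return acc/n

def psi_h_conv(t, s):
    # closed form of the convolution (P_c * psi_f)(t) for a Gaussian P_c
    v = 1.0 + s*s
    return math.exp(-t*t/(2*v))/math.sqrt(v)

print(psi_h_mc(1.0, 0.5), psi_h_conv(1.0, 0.5))
```

The sample expectation and the Gaussian-blurred closed form agree to within the Monte Carlo noise, which is the content of (4.4.72)-(4.4.73) for this model.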
This means that Φ_h(ω) for this perturbation model is simply a Gaussian weighted version of Φ_f(ω).

In the case of the sample sequence created by the zero crossing contours of ∇²G filtered images, the above perturbation model is not valid. It appears to be very difficult to obtain a model for this case, primarily because of the high degree of correlation exhibited by the perturbation function (because zero crossings lie along continuous contours), and also because of the difficulty in finding a model which both satisfies the invertibility constraint on γ⁻¹(x) and results in a closed form expression for the characteristic function of c(x). For this reason no further work was done on trying to obtain a model for the perturbation function for the zero crossing sample sequence. However, this section has developed the theory necessary to compute the average mean square reconstruction error for the case of reconstructing from non-uniformly distributed samples using the transformation method of chapter 3, even if it turns out that the application of this theory to some sample sequences may be an intractable problem.

4.5 - Matching Error Analysis

We will now consider the contribution to disparity error produced by incorrect matching of features. Marr and Poggio (1979) discuss this topic in terms of the probability of there being more than one feature in the matching region. The assumption implicit in their work is that if there is only one feature in the matching range then that feature is the correct match. However, as we will see in this section, this assumption is valid only if the disparity estimate error is very small. They did not consider the effect on the matching error of a relatively large disparity estimate error which, as we have seen in the previous sections, can result from the action of a number of error processes. In this section we provide a detailed analysis of the matching error, for the case of zero crossing features with nearest neighbour matching. This analysis brings out the dependence of the matching error on the error in the disparity estimate as well as on the size of the matching region.
(Recall from chapter 2 that in this form of matching, the match is taken to be the matching feature nearest to the estimated match location.)

The matching process is depicted in figure 4.28. The true match is a distance e_d away from the estimated match position, and is taken to be the origin of the epipolar line. The matching region extends a distance r_m away from the estimated match position on either side. Ghost matches (defined as features, other than the true match, which lie within the matching region) lie at distances τ_j from the true match. An incorrect match will be made if one of the ghost matches (say ghost match j) lies closer to the estimated match position than does the true match. In this case the matching error will be equal to τ_j (and not zero). Note that reducing the size of the matching region will not necessarily reduce the probability of error. It will do so only if the error in the disparity estimate is smaller than the size of this reduced matching region. In general this will not be the case, and the matching region must be fairly large so that, if there is a relatively large error in the disparity estimate, there is still a chance that the true match (or a ghost match close to it) will be chosen, resulting in a reduced error.

FIGURE 4.28 The matching process.

In general, what we require of our matching algorithms is that, as we proceed to higher levels of resolution, the variance of the disparity error gets smaller in absolute terms (or stays more or less constant in terms of our pixels, which get smaller as the resolution increases). To this end, we provide an analysis of the matching error to see if, in fact, the disparity error does converge (i.e.
the matching error should be smaller than the error in the initial disparity estimate).

For the purposes of the following analysis let us assume that the matching region is of infinite extent. This is not as bold an assumption as it may seem, because of the way that the matching is done. Recall that the match is taken to be the closest matching feature to the estimated match position. The only time that having a very large matching region will have an effect is when the true match is missing; then, instead of just having a 'no-match' situation (which would not affect the matching error), we would have a match which would always be incorrect. Note that this can also happen with smaller matching regions, but with the larger matching regions there is a slight possibility that the induced matching error will be quite large (if there are no ghost matches near to where the true match should be).

Probability density of the matching error

We can write the probability density of the matching error e_m as p_m(e_m) = Prob{a matching feature at τ = e_m, given that there is a matching feature at 0 (the true match) and that there are no features in the region [τ, 2e_d−τ] (otherwise these features would have been selected as the match)}.

Let us define P_N(τ) to be the probability density of the interval τ between the feature at the origin and the Nth matching feature, where we order consecutive features along the epipolar line ...−2,−1,0,1,2..., with feature 0 corresponding to the feature at the origin.⁸ With these definitions it can be seen that the probability density of the matching error can now be written:

p_m(e_m) = Σ_N P_N(e_m) ∫₀^(2e_d−e_m) P_(N−1)(τ₁) dτ₁    for e_d < e_m < 2e_d    (4.5.1)

and

p_m(e_m) = Σ_N P_N(e_m) ∫_(2e_d−e_m)^∞ P_(N+1)(τ₁) dτ₁    for 0 < e_m < e_d    (4.5.2)

and is zero for all other values of e_m.
For each value of N in the summation, the right hand side of these equations is the probability that the matching feature is the Nth feature from the zeroth feature, times the probability that there is no other feature closer to the estimated match position. Note that in evaluating the probability of there being a feature closer to the estimated match position than the Nth feature, we need only consider the N+1th and the N−1th features. If e_m is greater than e_d we need only look at the N−1th feature, since the N−ith (i>1) features can be closer to the estimated match position than the Nth feature only if the N−1th feature is as well. A similar argument holds for the case of e_m less than e_d, in which case we need only consider the N+1th feature. This is shown in figure 4.29. We have assumed that the occurrence of a gap with no features in the interval (e_m, 2e_d−e_m) is independent of the occurrence of the features at zero and e_m. This is not strictly valid (except for large values of e_m), but is an assumption we must make in order to obtain any mathematical headway.

⁸This notation differs slightly from the notation of Longuet-Higgins (1963), who defined P_N(τ) to be the probability density of the interval τ between the feature at the origin and the N+1th matching feature. The reason we use the modified notation is to allow the definition of P₀(τ), which is the probability density of the same feature being τ units apart, which is obviously equal to one at τ = 0 and zero everywhere else.

An interesting special case is that of e_m = 0. p_m(0) is the probability that the correct match will be found. From the above equations we can see that:

p_m(0) = 1 − ∫₀^(2e_d) P₁(τ₁) dτ₁    (4.5.3)

Note that p_m(0) is a function of the disparity estimate error, e_d. Longuet-Higgins (1963) provides a number of approximations to P₁(τ) for the case of zero crossing features of a one dimensional random Gaussian distributed process, f(x).
For the case of matching zero crossings without regard to their sign, Longuet-Higgins gives the following approximation:

P₁(τ) = X(+,−;τ) − X(+,−,−;τ)    (4.5.4)

where:

X(+,−;τ) = (1/2π) √(−ψ₀″/ψ(τ)) √(M₁₁M₂₂) [√(1−ν₁₂²) − ν₁₂ ARCOS(ν₁₂)] / [ψ²(0) − ψ²(τ)]^(3/2)    (4.5.5)

X(+,−,−;τ) = (1/4π²) ∫₀^τ √(−ψ₀″/(Dψ(τ₁))) √(M₁₁M₂₂M₃₃) [√|ν| + s₁a₁ + (s₂−π)a₂ + (s₃−π)a₃] dτ₁    (4.5.6)

FIGURE 4.29 The analysis of the matching error, given that the closest match to the estimated match position is the Nth feature.

where D is the determinant of the matrix with elements ψ_rs, r,s = 1,...,n (4.5.7), M_ln is the cofactor of ψ_ln in D (4.5.8), and n = 2 for X(+,−) and n = 3 for X(+,−,−). Further:

ν_ij = M_ij/√(M_ii M_jj)    (4.5.9)

s₁ = ARCOS[(ν₃₁ν₁₂ − ν₂₃)/√((1−ν₃₁²)(1−ν₁₂²))]    (4.5.10)

s₂ = ARCOS[(ν₁₂ν₂₃ − ν₃₁)/√((1−ν₁₂²)(1−ν₂₃²))]    (4.5.11)

s₃ = ARCOS[(ν₂₃ν₃₁ − ν₁₂)/√((1−ν₂₃²)(1−ν₃₁²))]    (4.5.12)

(These angles are to be taken in the range (0,π).)

a₁ = ν₁₂ν₃₁ + ν₂₃    (4.5.13)

a₂ = ν₂₃ν₁₂ + ν₃₁    (4.5.14)

a₃ = ν₃₁ν₂₃ + ν₁₂    (4.5.15)

ψ(τ) is the autocorrelation function of the random process, f(x). The subscripts on the autocorrelation function in the above matrices have the following meaning:

ψ_ij = ψ(τ_i − τ_j)    (4.5.16)

For the case of matching zero crossings with the same contrast sign, Longuet-Higgins gives the following approximation:

P₁(τ) = X(+,+;τ) − X(+,+,+;τ)    (4.5.17)

where:

X(+,+;τ) = (1/2π) √(−ψ₀″/ψ(τ)) √(M₁₁M₂₂) [√(1−ν₁₂²) + ν₁₂ ARCOS(−ν₁₂)] / [ψ²(0) − ψ²(τ)]^(3/2)    (4.5.18)

X(+,+,+;τ) = (1/4π²) ∫₀^τ √(−ψ₀″/(Dψ(τ₁))) √(M₁₁M₂₂M₃₃) [√|ν| + s₁a₁ + s₂a₂ + s₃a₃] dτ₁    (4.5.19)

In their analysis of the ghost match probabilities, Marr and Poggio (1979) used only X(+,−;τ) and X(+,+;τ) in approximating P₁(τ) for the two cases. Furthermore, the formulae for these approximations that they wrote down in their paper contained some minor errors. These errors were propagated through to some of Grimson's writings (Grimson, 1981a, 1981b).
However, from the appearance of the graphs of the functions provided by Grimson (1981b) it is clear that the proper equations were used, and that the equations in Marr and Poggio's paper were merely misprinted (and copied as printed by Grimson). The errors were the following: the exponent of the [ψ²(0)−ψ²(τ)] term in the definition of P₁(τ) should have been −3/2; and the exponents of M₂₂(τ) and M₃₃(τ) in the definition of H(τ) were 1 when they should have been 2. Note that Marr and Poggio used the notation of (Rice, 1945), whereas we use the notation of (Longuet-Higgins, 1963).

The autocorrelation function of a one dimensional slice of a random two dimensional gaussian process that has been ∇²G filtered is derived in the Appendix. Using this autocorrelation function, we can compute P₁(τ) and hence p_m(0) as a function of e_d for the cases of matching zero crossings with and without regard to the sign of the zero crossings (alternatively these cases can be viewed as matching zero crossings whose orientations are quantized to within 180° and 360° respectively). The resulting functions are plotted in figure 4.30 (for a filter σ of √2). The curves do not approach zero asymptotically as they should, but pass right through zero and go negative. This is due to the fact that the approximations used for P₁(τ) are accurate only for small values of τ; for large values of τ they overestimate P₁(τ).

It should also be pointed out that the plots of P₁(τ) given by Grimson (1981b, p. 76-77) are for the case of the filter σ = 1. It might be assumed from his graphs that only the horizontal scale changes when σ changes, but it is not so. The vertical scale also changes. In fact, as σ becomes larger (coarser resolution), the peak height of the probability distribution becomes smaller.
This is not indicated by Grimson's graphs as omly the horizontal axis on his graphs are scaled in terms of a, while the scaling on the vertical axes is constant 9 177 overestimate P-^(r). Note that as the disparity estimate error is increased, the probability of choosing the correct match quickly goes to zero. Effect of quantization of the orientation of the zero crossings We would crossings expect that, according to their if instead of matching raw zero crossings, we matched zero orientations (within some angle quantization) the probability of obtaining the correct match for a given disparity estimate error would increase. However, the analysis of this probability becomes increasingly more difficult (because one is required to do a two dimensional analysis instead of a one dimensional analysis) and one can only find very crude approximations. We can, however, perform experimental determinations of this probability for different orientation quantizations. This was done for angle quantizations of 22.5°, 180° 60°, and 360°. We generated a 256x256 array of gaussian distributed random variables with mean 128 and variance 64 . This array was then filtered with a V G filter with o=\/2. The 2 2 zero crossings of this filtered array were found and the orientations of these zero crossings were quantized to the desired granularity. This array of zero crossings was replicated in another array which was shifted by an amount e^ with respect to the initial zero crossing array. The matching process was then performed between these two arrays. The total number of correct matches was found and this number was divided by the total number of all matches to give the probability of obtaining a correct match. This procedure was repeated for a number of different values for e^ (integer steps from 0 to 25). The results are displayed in the graph shown in figure 4.31. 
It is evident that the smaller the angle quantization, the higher the probability of obtaining the correct match for a given error in the disparity estimate (however, see the discussion in the next section on the sensitivity of the matching process to perturbations in the zero crossing orientations, which increases as the angle quantization decreases).

FIGURE 4.30 The theoretical probability of obtaining the correct match as a function of the disparity measurement error, σ = √2.

FIGURE 4.31 Experimentally derived relationship between the probability of obtaining the correct match and the error in the disparity estimate for a number of different angle quantizations.

Another quantity of interest is the probability density of obtaining a given non-zero matching error as a function of the error in the disparity estimate. However, for non-zero matching errors, finding even an approximate expression for the right hand side of equation (4.5.2) is very difficult, as finding a closed form expression for P_N(τ) for N ≥ 3 (for the case of matching zero crossings without regard to sign) can not be done in general (see the discussion on this point in Longuet-Higgins, 1963). We can, however, as we did for the zero matching error case, obtain some representative results numerically. The results are depicted in figure 4.32 for the cases of e_m = 0 (same as previously), 1, 2 and 3. As expected, the probability density for a non-zero matching error is a maximum at some non-zero value of the disparity estimate error, and the larger the matching error, the larger is the disparity error for which the probability reaches a peak. This indicates that, as the disparity estimate error increases, the expected value of the magnitude of the matching error will also increase.
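Equation (4.5.3) can be illustrated for the simplest possible interval statistics. If the ghost features are idealized as a Poisson stream of density lam along the epipolar line (an assumption; the actual zero crossing interval densities are the Longuet-Higgins approximations above), then P₁(τ) = lam·exp(−lam·τ) and (4.5.3) gives p_m(0) = exp(−2·lam·e_d), which a direct simulation of nearest-neighbour matching reproduces:

```python
import math, random

random.seed(2)

def p_correct(e_d, lam, trials=40000):
    # nearest-neighbour matching against a one-sided Poisson ghost stream;
    # the true match sits at the origin, the estimate at e_d
    hits = 0
    for _ in range(trials):
        pos, ghosts = 0.0, []
        while pos < 4*e_d + 10.0:     # generate ghosts well past the region of interest
            pos += random.expovariate(lam)
            ghosts.append(pos)
        best = min(ghosts + [0.0], key=lambda x: abs(x - e_d))
        hits += (best == 0.0)
    return hits/trials

for e_d in (0.5, 1.0, 2.0):
    print(e_d, p_correct(e_d, 0.5), math.exp(-2*0.5*e_d))
```

The simulated proportion of correct matches tracks exp(−2·lam·e_d) closely, and falls quickly with the disparity estimate error, in qualitative agreement with figure 4.30.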
In conclusion, we can make some general comments about the matching error. If the error in the disparity estimate is relatively small, there will be a high probability that the correct match will be found. Thus we can be confident that our multi-resolution matching algorithm will converge. If, on the other hand, the error in the disparity estimate is large, then the matching error may be large as well. However, there is still at least a 50% chance that the matching error will be less than the disparity error (since the match may be either closer to or farther away from the true match, and for large disparity errors the chances of one or the other happening are about equal). Thus the matching algorithm may still converge if an iterative procedure (one in which the matching is done repeatedly at a single resolution level) is performed. It is difficult to evaluate how small the disparity error must be in order for the matching procedure to converge. The problem is further complicated by the highly correlated nature of most disparity functions encountered in practice, which will mean that in one section of the image the disparity estimate may be highly accurate but way off in some other region. This often will happen near disparity discontinuities, where the reconstruction process produces a large amount of error. If the disparity error in a given region is high enough, then the matching process will hardly ever produce the correct match. Thus the matching process is inherently unstable, as a large enough disturbance (error) can dislodge the disparity estimate from the somewhat stable point at the correct value, and it will never make its way back to the correct value.

FIGURE 4.32 The probability density of obtaining a matching error as a function of the error in the disparity estimate.
Errors in the orientation

When the disparity surface is non-constant, the zero crossing contours will be distorted in going from one image to the other, as shown in figure 4.33. Let us assume that the orientation of a given zero crossing in the left image is θ₀. Then, if the disparity function is linear along the y axis with a gradient of m, and constant along the x axis, the orientation of the corresponding zero crossing in the right image is not θ₀ but is given by:

θ₁ = tan⁻¹(tan(θ₀) + m)    (4.5.20)

The result of this change in the orientation of the zero crossings is to cause the matching process to make incorrect matches, or to cause matches to be missed.

FIGURE 4.33 The distortion of zero crossing contours for non-constant disparity functions.

To test the effect of disparity gradients on the orientation, and how it affects the matching process, we performed the following experiment. We generated a 256x256 array of gaussian distributed random values, as above. A second array was generated which contained the values of the first array horizontally shifted by an amount equal to 20*exp(-(I-128)**2/3200), where I is the row number of the array (1-256). Thus the disparity is constant along a row and varies along a column. Then we tried to perform the matching of the zero crossings for angle quantizations of 22.5, 60, and 180 degrees, for various values of the disparity estimate error. The percentage of correct matches was tabulated in each case. The results are shown in figure 4.34. It can be seen, in comparison with figure 4.32, that one of the effects of the non-constant disparity is to reduce the proportion of correct matches. This is due to the change in the zero crossing orientation. It can also be seen
that the effect is more pronounced for the smaller levels of angle quantization. This is to be expected, as the larger the angle quantization, the larger the perturbation required to produce a change in the angle measurement. The bottom line is that if the disparity function is non-constant (as is usually the case), the matching algorithm will produce some matching error over and above all the other sources of error we have discussed.

FIGURE 4.34 The probability of obtaining the correct match as a function of the disparity estimate error for non-constant disparity functions.

4.6 - Geometry Errors

Errors in camera parameters

In the Appendix are derived the relationships between the camera geometry (shown in figure 4.35), the image (x₁,x₂) coordinates and the physical (viewer centred) (X,Y,Z) coordinates. These relationships are summarized as follows:

Z = (1+a²)f²d_x/[f(x₁−x₂) + a(f²+x₁x₂)]    (4.6.1)

X = x₂Z/f    (4.6.2)

Y = y₂Z/f    (4.6.3)

where a = tan(2β).

FIGURE 4.35 The stereo camera geometry.

We can now determine the sensitivity of the computed position (X,Y,Z) to errors in the measured (or assumed) camera parameters β, d_x and f. The sensitivities are obtained by partial differentiation with respect to the parameters in question. Thus, for Z we get:

∂Z/∂f = 2Z/f − Z[D+2af]/A    (4.6.4)

where D = x₁−x₂ is the disparity and

A = fD + a(f²+x₁x₂)    (4.6.5)

We also have:

∂Z/∂d_x = Z/d_x    (4.6.6)

∂Z/∂x₁ = −Z(f+ax₂)/A    (4.6.7)

∂Z/∂x₂ = Z(f−ax₁)/A    (4.6.8)

∂Z/∂β = 2sec²(2β) ∂Z/∂a = 2sec²(2β)[2af²d_x − Z(f²+x₁x₂)]/A    (4.6.9)

For f² ≫ x₁x₂, β = 0, d_z = 0 and d_x = d, the angle sensitivity can be written:

∂Z/∂β = −2Z²/d    (4.6.10)

This sensitivity can become quite large for large depth values, meaning that slight errors in the measured camera tilt angle can result in large errors in the computed depth.
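The small-angle approximation (4.6.10) can be checked against a finite difference of the depth equation (4.6.1) directly. The camera values below (f, d_x, x₁, x₂) are arbitrary illustrative numbers chosen so that f² ≫ x₁x₂:

```python
import math

def Z(beta, f, d_x, x1, x2):
    # depth from (4.6.1), with a = tan(2*beta)
    a = math.tan(2*beta)
    return (1 + a*a)*f*f*d_x/(f*(x1 - x2) + a*(f*f + x1*x2))

f, d_x, x1, x2 = 50.0, 0.3, 2.0, 1.0   # illustrative camera parameters
h = 1e-6
dZ_dbeta = (Z(h, f, d_x, x1, x2) - Z(-h, f, d_x, x1, x2))/(2*h)
approx = -2*Z(0.0, f, d_x, x1, x2)**2/d_x   # approximation (4.6.10)
print(dZ_dbeta, approx)
```

The central difference agrees with −2Z²/d to well under a percent for these values, and the size of the numbers (thousands of units of depth error per radian of tilt error) illustrates why the tilt angle must be known accurately.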
The sensitivities of X can be written as:

∂X/∂f = (x₂/f)∂Z/∂f − (x₂/f²)Z    (4.6.11)

∂X/∂d_x = (x₂/f)∂Z/∂d_x    (4.6.12)

∂X/∂x₁ = (x₂/f)∂Z/∂x₁    (4.6.13)

∂X/∂x₂ = (x₂/f)∂Z/∂x₂ + Z/f    (4.6.14)

∂X/∂β = (x₂/f)∂Z/∂β    (4.6.15)

The Y sensitivities are obtained in a similar fashion.

For truly accurate depth and position computation, precise values for the camera parameters must be obtained. This can be done by careful setup of the cameras, minimizing the effects of external disturbances such as vibration, or by accurate estimation of the camera parameters from image plane measurements of ground control points. This is frequently done in photogrammetric applications (see Ghosh, 1979).

Errors due to vertical misalignments

If the relative camera tilt angle is nonzero, then there will be vertical as well as horizontal disparities (another way of saying that the epipolar lines are not horizontal). If the matching algorithm assumes a horizontal epipolar line along which to search for matches, then the fact that there are vertical disparities will cause errors in the measured disparities, and also may cause matches to disappear altogether. These two events are depicted in figure 4.36.

Let us now derive the probability density of the disparity error produced by vertical misalignment for the case of gaussian random white noise processes. It can be seen from figure 4.36 that the disparity error e_v is a function of the zero crossing orientation θ and the amount of vertical misalignment, δ (which is assumed to be constant), and is given by:

e_v = δ/tan(θ)    (4.6.16)

For any isotropic random process the angle θ of the zero crossings has a uniform distribution in the interval (−π,π). However, in our matching algorithm we ignore all zero crossings that lie close (within an angle Δ) to the horizontal. Thus we take the distribution of the angles to be uniform only over the ranges (Δ−π,−Δ) and (Δ,π−Δ).
Thus we have:

P_θ(θ) = 1/[2π−4Δ] for θ ∈ (Δ−π,−Δ) and (Δ,π−Δ)    (4.6.17)

Outside this range P_θ(θ) is zero. Let μ = 1/tan(θ), so that θ = tan⁻¹(1/μ). The probability density of μ is then given by:

P_μ(μ) = P_θ(θ(μ))|dθ(μ)/dμ|    (4.6.18)

= [(2π−4Δ)(μ²+1)]⁻¹ for tan⁻¹(1/μ) ∈ (Δ−π,−Δ) and (Δ,π−Δ)    (4.6.19)

FIGURE 4.36 The effects of vertical misalignment on the disparity measurements.

Outside this range P_μ(μ) is zero. Since e_v = δμ we can write:

P_ev(e_v) = P_μ(μ(e_v))|∂μ(e_v)/∂e_v| = δ/[(2π−4Δ)(δ²+e_v²)] for e_v ∈ (0, δ/tan(Δ))    (4.6.20)

Outside this range P_ev(e_v) is zero. The variance of P_ev is given by:

σ_ev² = (2δ/(2π−4Δ)) ∫₀^(δ/tanΔ) e_v²/[δ²+e_v²] de_v    (4.6.21)

= δ²[1/(tan(Δ)(π−2Δ)) − 1/2]    (4.6.22)

For small Δ we have:

σ_ev² ≈ δ²/(πΔ)    (4.6.23)

Thus the standard deviation of the error due to the vertical misalignment is seen to be proportional to the magnitude of the vertical misalignment.

4.7 - Effect of the various errors on the multi-resolution matching algorithm

In this section we discuss how the errors described in the previous sections affect the performance of the simplified multi-resolution matching algorithm. The various error sources can be seen to act on the matching process at the points shown in figure 4.37. The disparity measurement errors due to filtering, sensor noise, quantization and vertical misalignment can be thought of as adding to the positions of the zero crossings that are input to the matcher. The matching error is added to the output of the matcher, and the reconstruction error is added to the output of the reconstruction process. Note that the reconstruction process actually filters, or smooths,
Smoothing of the disparity function results in a reconstruction error if the disparity function has appreciable high frequency components. It is also evident from figure 4.37 that the error in the disparity estimate at a given resolution level depends on the disparity estimate obtained from the next lower (and, in the iterative algorithm, the next higher) resolution level, and hence on the various errors at that level. We can write down a recursion which defines the error in the disparity estimate at a given resolution level, k, as follows:

e_d^(k) = d^(k) − d = e_m^(k) + e_f^(k) + e_n^(k) + e_q^(k) + e_v^(k) + r{e_d^(k−1)} (4.7.1)

e_d^(0) = d^(0) − d (4.7.2)

where d^(0) is the initial (lowest resolution) disparity estimate, d^(k) is the disparity estimate at level k, d is the true disparity function, and e_m, e_f, e_n, e_q and e_v are the matching, filtering, sensor noise, quantization and vertical misalignment errors. The reconstruction operation is indicated by r{...}. It can be seen that it acts on the filtering, sensor noise, quantization and vertical misalignment errors. Since the reconstruction operation is essentially a smoothing, these errors are smoothed out somewhat as well.

FIGURE 4.37 The action of the various errors on the matching process.

Recall that, in chapter 3, we showed that the reconstruction error was reduced when the sample density increased. This fact, coupled with the smoothing of the error function by the reconstruction process, suggests that including slightly incorrect matches, rather than getting rid of them, may actually reduce the overall disparity error, by increasing the sample density, which in turn reduces the reconstruction error. This would probably be the case only in the regions of the images for which the disparity function was rapidly changing. It would be desirable to obtain a closed form expression for the probability distribution of the disparity error at each resolution level, so that we could examine the convergence of the matching algorithm.
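The claim that the reconstruction operation r{...}, being essentially a smoothing, attenuates the measurement errors can be illustrated with a minimal sketch; the moving average below is only a stand-in for the actual reconstruction filters of chapter 3.

```python
import random

def smooth(values, w=5):
    # Moving-average stand-in for the reconstruction operator r{...}.
    half = w // 2
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def rms(v):
    return (sum(x * x for x in v) / len(v)) ** 0.5

random.seed(0)
errors = [random.gauss(0.0, 1.0) for _ in range(4096)]  # white measurement error
smoothed = smooth(errors)
# rms(smoothed) is roughly 1/sqrt(5) of rms(errors) for a 5-sample average
```
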
However, even if we assume white Gaussian noise for the input images, the distributions of the various error sources are all markedly non-Gaussian, as we have seen in the discussion in the previous sections. Thus it is not possible to obtain analytical expressions for the disparity error probability density function. We can, however, with reference to figure 4.37, make some qualitative statements. We know, from the earlier sections in this chapter, that all of the disparity measurement errors (except for the error due to vertical misalignment of the cameras) and the reconstruction errors decrease as the resolution increases. This means that as we proceed to higher and higher resolutions, the accuracy of the disparity measurements increases. As well, the accuracy of the disparity function estimate increases. The only question lies with the matching algorithm. If the matching algorithm can match accurately then convergence of the matching algorithm is assured. However, as we have seen, the performance of our matching algorithm depends on the estimate of the disparity that guides it. In chapter 4.5 we saw that the probability density of the matching error exhibits an impulse at zero error. The magnitude of this impulse is a monotonically decreasing function of the error in the disparity estimate. The importance of this analysis is that it shows that the disparity function estimate can have a certain level of error and the matching algorithm will still yield exact matches most of the time. Crudely put, we can conclude that if the disparity measurements and the reconstruction of the disparity function are sufficiently accurate at all resolution levels, then the matching algorithm will converge.
If the errors in the disparity measurements are excessively high, as may happen with very noisy sensors, or if the reconstruction of the disparity function is poor, as may happen with surfaces that have high spatial frequency components or from using poor reconstruction methods, then the matching algorithm may not converge. Often it may happen that the various sources of error are not uniformly distributed throughout the image but rather tend to accumulate in distinct regions. In this case the matching algorithm may converge over most of the image but diverge over scattered patches of it. This is seen in some of the experiments described in the next chapter, especially when the reconstruction process is not done sufficiently well.

4.8 - Summary of chapter 4

- The major sources of error in the simplified multi-resolution matching algorithm are sensor noise, spatial filtering effects, reconstruction errors, matching errors and geometry errors.
- The disparity error due to the sensor noise increases as the resolution decreases and as the signal to noise ratio decreases.
- The filtering error, due to spatial filtering of the images for non-constant disparity functions, increases as the disparity gradient increases, and as the resolution decreases.
- The left and right scale maps of a one-dimensional stereo pair, for linear disparity functions, are related by a simple expansion factor. Two dimensional functions can be represented by the Two Dimensional Scale Space Map, which has two spatial dimensions and two scale dimensions. A two dimensional slice through this function, obtained by holding one scale dimension and one spatial dimension constant, results in the Skew Map of the function. It is seen that the Skew Maps of two functions, for linear disparity, are related by a simple expansion factor.
- The reconstruction error is composed of three, somewhat interacting, components: the truncation, aliasing and filtering errors.
- The optimal truncated reconstruction filter is derived, consisting of generalized prolate spheroidal wavefunctions.
- A general expression for the reconstruction error is derived, involving the sample distribution, function spectrum, and the reconstruction filter impulse response.
- The reconstruction error is seen to, in general, rise as the resolution decreases (due to decreased sample density) and as the disparity function bandwidth increases.
- The distribution of the matching error exhibits an impulse at zero error. The magnitude of this impulse decreases as the error in the disparity estimate supplied to the matching process is increased. The fact that this impulse exists indicates that the matcher can tolerate some error in the disparity estimate and still yield exact matching.
- The level of quantization of the zero crossing orientation affects the matching error. In general, the finer the quantization, the smaller the error.
- If the disparity function is not constant, then, in general, the orientation of corresponding zero crossings will not be the same. This can cause an increase in the matching error. The increase in the matching error is seen to be greater for finer orientation quantizations, and for zero crossings whose orientation approaches vertical.
- Errors in the measured or assumed camera geometry parameters will cause errors in the computation of depth and position from the disparity measurements.
- Vertical misalignment of the cameras causes errors in the measured disparity. This error is greatest for zero crossings near horizontal. For Gaussian white random noise images the error standard deviation is proportional to the amount of vertical misalignment.
- The total disparity error at any resolution can be written as a recursive function of the errors at the previous resolutions.
- The reconstruction process tends to smooth out the disparity measurement errors (not including the errors produced by the reconstruction process itself).
- If the disparity errors at each resolution are relatively small (compared to some multiple of the ∇²G filter σ) and if the reconstruction process is sufficiently accurate, then the matching algorithm will converge.
- The matching algorithm may converge over most of an image and diverge over small patches of the image.

V - EXPERIMENTS WITH THE DISCRETE MULTI-RESOLUTION MATCHING ALGORITHMS

5.1 - Introduction

This chapter presents the description and results of computational experiments performed to illustrate some of the analyses of the discrete multi-resolution matching algorithms that were done in the previous chapter. The topics covered in this chapter and the relationship between them are summarized in figure 5.1. A summary of the findings of this chapter is given at the end of the chapter. The experiments detailed in this chapter are designed to examine the effects of sensor noise, reconstruction errors, filtering errors and matching errors on the performance of the multi-resolution matching algorithms. All of the experiments described in this chapter (except for the last section) were performed on images whose intensities were randomly distributed with Gaussian distributions with mean 128 and standard deviation of 64. The intensity distributions were truncated so that all the intensities had values between zero and 255. The departure from the true Gaussian distribution caused by this truncation is assumed to be negligible. The measure of error that is used in these experiments is the RMS disparity error; that is, the square root of the average of the square of the difference between the measured disparity and the actual disparity at each point in the image.
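The test images and the error measure described above are easy to reproduce; the sketch below (Python/NumPy, function names ours) generates a truncated-Gaussian random image and computes the RMS disparity error.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_image(shape=(256, 256), mean=128.0, std=64.0):
    # Gaussian intensities, truncated so all values lie between 0 and 255.
    return np.clip(rng.normal(mean, std, shape), 0.0, 255.0)

def rms_disparity_error(measured, actual):
    # Square root of the average squared difference between the measured
    # and the actual disparity at each point in the image.
    return float(np.sqrt(np.mean((measured - actual) ** 2)))
```
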
The surfaces used in the experiments were ones which gave rise to disparity functions of the form

d(x,y) = d_max exp(−y²/2σ_s²) (5.1.1)

and

d(x,y) = d_max exp(−x²/2σ_s²) (5.1.2)

FIGURE 5.1 The topics covered in this chapter.

These are cylindrical gaussian functions. These disparity functions were chosen for a number of reasons. First, by changing σ_s we can change the effective bandwidth of the disparity function, thereby exercising the reconstruction algorithms. Secondly, if the axis of these cylinders is aligned with the x-axis, the disparity gradient along the x-axis is zero. Thus there is no filtering effect in this case. Furthermore, there are no surface self-occlusions which would cause missing or incorrect matches. If the cylinder axis is aligned with the y-axis then there is a disparity gradient along the x-axis. If we constrain the disparity gradient along the x-axis to be less than one, there will be no occlusions. Thus, we can obtain a measure of the effect of the filtering error on the matching algorithm by performing the experiment first on a cylinder with its axis along the x-axis and then repeating it with the cylinder axis aligned with the y-axis. The change in the observed disparity error can be attributed to the filtering error effect (but see the remarks below on decoupling the error components).
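Equations (5.1.1) and (5.1.2) can be realized as below (Python/NumPy); centring the cylinder at the middle of the image is our assumption.

```python
import numpy as np

def gaussian_cylinder(d_max, sigma_s, size=256, axis='x'):
    # Cylindrical gaussian disparity function, eqs. (5.1.1)/(5.1.2):
    #   axis='x': d(x,y) = d_max exp(-y^2 / 2 sigma_s^2)  (no x-gradient)
    #   axis='y': d(x,y) = d_max exp(-x^2 / 2 sigma_s^2)  (x-gradient present)
    coords = np.arange(size) - size // 2
    profile = d_max * np.exp(-coords**2 / (2.0 * sigma_s**2))
    if axis == 'x':
        return np.tile(profile[:, None], (1, size))  # varies with y (rows)
    return np.tile(profile[None, :], (size, 1))      # varies with x (cols)
```

With the axis along x the disparity gradient along x is identically zero, which is the occlusion-free case used to isolate the filtering error.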
The third reason for using these gaussian cylinder functions was that they somewhat modeled the disparity functions expected in the log scaling application described in section 5.6. We examine, in these experiments, the use of three different matching algorithms. These are:

1. Matching using zero crossing features only.
2. Iterative (two pass) matching using zero crossing features only.
3. Iterative (two pass) matching using zero crossing and extremum features.

Furthermore, three different types of reconstruction algorithms are tried. These are:

1. Warping or transformation reconstruction method (see chapter 3.7) using the optimal filter derived in chapter 4.4.
2. Relaxation reconstruction method (see chapter 3.3). This method is used only for the iterative matching algorithms, in order to speed its convergence rate.
3. 9x9 averaging. This method was suggested by Grimson (1981a) as a means of obtaining a disparity value from a region. It consists of merely averaging the values of all sample points located in a 9x9 pixel region of the reconstruction grid centred on the point to be reconstructed. The motivation for using such a method is that it provides a check on whether or not a simple reconstruction is all that is required. This method is computationally much cheaper than the other two reconstruction techniques. However, such a large region causes problems when the disparity function is not constant.

One of the difficulties that we encounter in doing these experiments is in decoupling the various sources of error from the resultant disparity error. The effect of the sensor noise is decoupled from the other error contributions by repeating a given experiment holding all conditions the same, except that a Gaussian random function with a given variance is added to one of the input images. The increase in the disparity error can then be attributed to this added noise signal.
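A minimal sketch of the 9x9 averaging method (Grimson, 1981a) as described above; representing the sparse matched samples as a dictionary is our choice, not the thesis's data structure.

```python
import numpy as np

def average_reconstruct(samples, size=64, w=9):
    # samples: {(row, col): disparity} at matched feature points.
    # Each grid point receives the mean of all sample values in the
    # w x w neighbourhood centred on it (zero where no samples fall inside).
    vals = np.zeros((size, size))
    mask = np.zeros((size, size))
    for (r, c), v in samples.items():
        vals[r, c] = v
        mask[r, c] = 1.0
    half = w // 2
    out = np.zeros((size, size))
    for r in range(size):
        for c in range(size):
            r0, r1 = max(0, r - half), min(size, r + half + 1)
            c0, c1 = max(0, c - half), min(size, c + half + 1)
            n = mask[r0:r1, c0:c1].sum()
            out[r, c] = vals[r0:r1, c0:c1].sum() / n if n > 0 else 0.0
    return out
```

Averaging over so large a region is cheap, but, as noted, it blurs any non-constant disparity function.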
Similarly, the effect of changing reconstruction methods and the effect of changing the surface bandwidth are obtained with the same process, while holding all other conditions the same. However, as we have seen in the previous chapter, the matching error is dependent on the errors in the disparity estimate, which in turn is a function of the various measurement and reconstruction errors. This means that the matching error cannot be held constant while varying a given parameter. Thus the observed differences in the disparity error between two experiments will always consist of changes in the matching error as well as changes in the disparity measurement error due to the effect being tested for (such as sensor noise). However, all is not lost. If the disparity estimate error (due to sensor noise, reconstruction etc.) is small enough so that the correct match is always within the matching region, then the matching error will be essentially independent of the disparity estimate error (as was shown in the previous chapter). In this case the changes in the observed disparity error will be due to changes in the parameter that is being varied. Before going on to the presentation of the actual experiments we will briefly describe the implementation of the multi-resolution feature detection algorithm.

5.2 - Implementation of the Multi-Resolution Feature Extraction Subsystem

In this section we discuss the implementation of the subsystem responsible for the production of the multi-resolution feature image representation. This subsystem can be broken up into two sections: the spatial filtering to produce the set of spatial frequency channels, and the feature detection. The spatial filtering is performed as shown in figure 5.2. The lowpass filter and sub-sampler sections form a two-dimensional decimator, or sampling rate reducer. The lowpass filter restricts the maximum frequency of the filtered image to one half its previous maximum, to limit the aliasing error when the image is subsampled.
Each decimation stage reduces the number of image samples by a factor of four (by two in each of the horizontal and vertical directions). Each lowpass filter section has exactly the same set of filter coefficients. Each successive stage of the decimator is followed by a ∇²G bandpass filter. Even though the coefficients for each of these ∇²G filters are the same, the apparent frequency responses of these filters with respect to the input have different centre frequencies because of the sampling rate reduction. This scheme of spatial frequency channel production offers distinct advantages over the direct method, in which the input signal is filtered by four separate bandpass filters, each having a different frequency response. The first, and probably least important, advantage is that only one set of filter coefficients is required for all the lowpass filters and for all the ∇²G filters. A more important advantage lies in the fact that the centre frequency of the prototypical bandpass filter, with respect to the input, is fairly high, on the order of π/2 radians. In designing finite wordlength digital filters the number of coefficients required to approximate an ideal filter response to a given accuracy is inversely proportional to the centre frequency. For example, in the direct method we would require a filter size on the order of 8Nx8N for the lowest (fourth) spatial frequency channel filter (given that the highest frequency filter was of size NxN), compared to the NxN size filter required in the hierarchical scheme for all the channels. Of course we must take into account the low pass filters in the hierarchical case, but these too will be of constant, not exponential, size.

FIGURE 5.2 The spatial filtering process.

In addition, the structure of our hierarchical filtering system facilitates the pipelining of computation, as can be seen in (Clark and Lawrence, 1985c). The filtering method described in this thesis can be compared with the technique developed by Crowley and Stern (1984). They compute the Difference of Low-Pass (or DOLP) transform of an image, which, for Gaussian low-pass filters, closely approximates the ∇²G filter. Their method uses the separability of the Gaussian low-pass filter to reduce computation and also uses subsampling to reduce the amount of computation. Their method produces bandpass filtering at resolution levels that are a factor of √2 apart, in comparison to our method, which produces bandpass filters with resolutions a factor of 2 apart. In some cases this may be useful, but for our application the Crowley and Stern method needs to do twice as much computation as is actually required. Clark and Lawrence (1984, 1985c) describe a proposed hardware implementation of the filtering method described in this thesis that takes advantage of the fact that it can be implemented in a pipelined fashion to increase the speed of computation. It is not clear whether the method of Crowley and Stern can be similarly configured. If we let the lowpass filter prototype have a frequency response L(ω₁,ω₂) and the bandpass filter have a frequency response B(ω₁,ω₂), then the frequency responses of the spatial frequency channels, referred to the input, are as follows:

H₁(ω₁,ω₂) = B(ω₁,ω₂)
H₂(ω₁,ω₂) = B(2ω₁,2ω₂)L(ω₁,ω₂)
H₃(ω₁,ω₂) = B(4ω₁,4ω₂)L(ω₁,ω₂)L(2ω₁,2ω₂)
H₄(ω₁,ω₂) = B(8ω₁,8ω₂)L(ω₁,ω₂)L(2ω₁,2ω₂)L(4ω₁,4ω₂)

and

L(ω₁,ω₂) = L(ω₁+2πk, ω₂+2πl) for k,l = ±1,2,3,...

The prototype lowpass filter was designed by transforming a one-dimensional lowpass filter using the McClellan transformation (McClellan, 1973). This transformation takes a one-dimensional filter with transfer function F₁(ω) and produces a two-dimensional filter with transfer function:
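The cascade of decimation and bandpass stages in figure 5.2 can be sketched as follows; the binomial lowpass and difference-of-lowpass bandpass below are crude stand-ins for the 25-tap McClellan lowpass and the ∇²G filter, chosen only to keep the example short.

```python
import numpy as np

def lowpass(img):
    # Separable 5-tap binomial lowpass (stand-in for the half-band prototype).
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, img)

def bandpass(img):
    # Difference-of-lowpass approximation to the nabla^2 G bandpass filter.
    return img - lowpass(img)

def spatial_channels(img, levels=4):
    # Each stage: bandpass output at the current rate, then lowpass and
    # 2x subsampling in each direction, so the same filter coefficients
    # serve every level (as in figure 5.2).
    out = []
    for _ in range(levels):
        out.append(bandpass(img))
        img = lowpass(img)[::2, ::2]
    return out
```
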
a This and produces a two-dimensional filter with transfer function: (5.2.1) 201 where: f(w cj ) lJ This = 2 transformation two-dimensional arcos[.5(cos(cji) + cos(co ) + cos(a),)cos(6J )-l)] 2 preserves the optimality (if present) of the one-dimensional filter in the design. The one-dimensional algorithm (McClellan (5.2.2) 2 et al 1973) filter was designed using the Remez exchange to produce an optimal half band lowpass filter (optimal in the sense that the peak approximation error to an ideal lowpass filter is minimized using a minimax criterion). The used in peak sidelobe level of the the filter. For N=25 this low pass filter is set level is about by the -33dB. One number of coefficients result of the transformation is that the resulting filter displays octant symmetry. Thus L(6Ji£j ) 2 L(WI,-CJ ) = = U-cj!A> ) - 2 2 = 1X^2,-0)0 \X-UJ (JJI) = means that, for N odd, there are only ( N + l ) V 8 + (n + l)/4 N . This 2 can result symmetry is taken computer, instead processor described using the Fast VAX-11/750 in in a large advantage of being of. savings in to build a in (Clark and Lawrence, Fourier Transform (FFT) we 1985c), on were special an U and however. Typical C P U times to use device This 2 FPS-100 array throughput a general such implemented the minicomputer. The multi-resolution filtering process figure 5.2, increased forced purpose we L(GJ2A>I) L(-Cc> £Ji). - 2 = unique filter coefficients instead of computation However, since able IX-CL) -CJ ) 2 McClellan as the if the purpose systolic filtering operations processor used was still attached to a that depicted for performing four level filtering on a 256x256 image were on the order of 70 seconds, for a lightly loaded system. 
The prototype bandpass filter is, as mentioned earlier, a ∇²G filter with transfer function

B(ω₁,ω₂) = k(ω₁² + ω₂²)e^(−σ²(ω₁² + ω₂²)) (5.2.3)

The value of σ is chosen to trade off between high bandwidth (lower number of filter coefficients) and low aliasing error (due to the sampling of the ideal continuous filter). We set the σ value for the highest resolution level to be √2. The frequency response of the four spatial filters is shown in figure 5.3. One of the spatial frequency axes has been suppressed (ω₂ = 0) for clarity. The peak sidelobe levels are below −33 dB in all cases. Zero crossings are detected by scanning along horizontal lines (rasters) for either a zero value or for a change in sign. When one of these is found, a zero crossing is assigned to the position of the left pixel in the case of a sign change, and to the zero pixel in the zero value case. Once the zero crossings have been detected, we perform a thinning procedure which gets rid of small (one or two pixels across) isolated clumps of zero crossings. After this is done, we compute a measure of the angle of the zero crossing contour through each zero crossing pixel. This value is then used in the matching algorithm to disambiguate between possible matches. The angle measurements are quantized to sectors of 60°, that is, 6 sectors in a 360° circle. To provide some measure of noise immunity, we ignore all zero crossings whose contrast falls below a given threshold. The threshold used will depend on the expected signal to noise ratio of the ∇²G filtered images (which in turn will depend on the processor word length and camera characteristics). In our experiments we used a threshold of 20/255, as the majority of noise-like zero crossings fell below this threshold. The zero crossing detection algorithm did not use the array processor at all, and typical CPU times for the zero crossing detection process on a four level image set (256x256, 128x128, ...)
were on the order of 90 seconds for a lightly loaded system. Combining these times with the typical filtering times reported above results in times on the order of 320 seconds (about five and a half minutes) for performing the multi-resolution filtering and zero crossing extraction on a stereo pair of 256x256 images.

FIGURE 5.3 The frequency response of the four spatial filters.

Extremal points (points of local maxima and minima) were also detected. This was performed by searching along horizontal lines between successive zero crossings for the maximum, or minimum, value. Note that this procedure finds only one extremal point between successive zero crossings. Thus, since the expected number of extremal points is greater than the expected number of zero crossings (see equations 4.3.11 and 4.3.20), we will not find all the extrema with this procedure. However, in practice, the number of extrema due to noise is substantial. The largest extremum in a given interval (which we call a semi-local extremum) is most likely not a noise extremum (although the position of this extremum may be perturbed by noise). In using the semi-local extrema as features we trade off feature density for the assurance that the features are created by events in the scene and not by noise.

5.3 - Frequency Response of the Matching Algorithms

In this chapter we examine the performance of three different matching schemes, using three different reconstruction methods, as we vary the frequency domain characteristics of the surface disparity function. The change in the disparity function frequency content is attained by varying the value of σ_s in equations (5.1.1) and (5.1.2). The values of σ_s used in the experiments in this chapter are 5, 10, 15, 20, 25, 30, 35, 40, 45 and 80 pixels. All stereo pairs in the experiments were 256x256 arrays of white gaussian random numbers with mean 128 and standard deviation of 64.
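The feature detection described in section 5.2 (raster scanning for zero crossings, followed by selection of semi-local extrema) can be sketched in one dimension as follows; the thinning and orientation steps are omitted, and the contrast measure used here is our assumption.

```python
def zero_crossings(row, threshold=0.0):
    # Scan a filtered raster for a zero value or a sign change; a sign
    # change is assigned to the left pixel, an exact zero to that pixel.
    # Crossings whose contrast |row[i+1]-row[i]| falls below the threshold
    # are ignored (this contrast measure is an assumption).
    out = []
    for i in range(len(row) - 1):
        a, b = row[i], row[i + 1]
        if a == 0.0:
            out.append(i)
        elif a * b < 0.0 and abs(b - a) >= threshold:
            out.append(i)
    return out

def semilocal_extrema(row, crossings):
    # Between each pair of successive zero crossings keep only the sample
    # of largest magnitude: the semi-local extremum.
    out = []
    for a, b in zip(crossings, crossings[1:]):
        out.append(max(range(a + 1, b + 1), key=lambda j: abs(row[j])))
    return out
```
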
The left and right images are related by the following equation:

I_right(x,y) = I_left(x + d(x,y), y) (5.3.1)

where d(x,y) is given in (5.1.1). Figure 5.4 shows the zero crossing pyramid of such a random image pair. The three matching algorithms that are used are:

1. Single pass, zero crossing features only.
2. Two pass iterative, zero crossing features only.
3. Two pass iterative, zero crossing and extremum features.

The size of the matching region was seven pixels wide (r_m = 3 pixels). Figure 5.5 shows how the RMS disparity error varies as the size of the matching region changes (single pass, zero crossings only, with reconstruction by the transformation method). Beyond r_m = 3 there is not much difference in the measured RMS disparity error. This is because the matching is done from the centre of the matching region outwards. Thus a feature in the outlying parts of the matching region will be taken to be the match only if there are no features closer towards the centre of the matching region (see chapter 4.5). For large r_m this will have a very small probability of happening, and hence increasing r_m past a certain point will have little effect.

FIGURE 5.4 The zero crossing pyramid of a random image pair.

FIGURE 5.5 The variation of the RMS disparity error with the matching region size.

For the purposes of the experiments three resolution levels were used. The variation of the RMS disparity error with changes in the number of resolution levels is shown in figure 5.6. It can be seen from this graph that at least three levels of resolution are required for these experiments. The three reconstruction methods used in these experiments are those listed in section 5.1. The relaxation method is used only in the iterative procedures, as its convergence is otherwise too slow.
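Relation (5.3.1) can be used directly to synthesize test pairs; a nearest-pixel sketch (the rounding and border clipping are our choices, not the thesis's):

```python
import numpy as np

def make_right(left, d):
    # I_right(x,y) = I_left(x + d(x,y), y), eq. (5.3.1), with nearest-pixel
    # sampling; x indexes columns and y rows. Out-of-range source columns
    # are clipped, mimicking the unmatchable border regions.
    rows, cols = left.shape
    right = np.empty_like(left)
    for y in range(rows):
        for x in range(cols):
            src = min(cols - 1, max(0, int(round(x + d[y, x]))))
            right[y, x] = left[y, src]
    return right
```
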
Figure 5.7 charts the RMS disparity error as a function of the number of relaxation iterations performed, and provides a visual indication of the convergence. It is evident that even more iterations are necessary for complete convergence. However, even at fifty iterations the amount of computation is very high, and the reconstruction takes longer than for the averaging or transformation methods. The results of the experiments to estimate the frequency response are summarized by the graphs shown in figures 5.9 to 5.20. The thick solid line in each of these graphs represents an estimate of the RMS error due to the regions of the image that can not be matched (because the two images do not overlap in these regions). We assume that the disparity value obtained in these regions will be, on the average, one half of the actual disparity at these points. This assumption is supported by examination of the actual disparity values obtained in these regions in the experimental tests.

FIGURE 5.6 The variation of the RMS disparity error with the number of resolution levels.

FIGURE 5.7 The RMS disparity error as a function of the number of relaxation iterations.

Perspective shaded plots of the disparity functions obtained with algorithm 1 and the warping reconstruction method, for disparity function σ_s of 10, 20, 40, and 80, are given in figure 5.8. It can be seen that the majority of the matching errors are to be found at the edges of the image. Also noticeable is the degradation in performance for the higher bandwidth surface, evident in the poorly defined ridge in the peak of the measured disparity function in the σ_s = 10 case. In figure 5.21 are shown perspective plots of the error maps obtained using matching method 1 for the three different reconstruction methods, for the case of σ_s = 40.
The number of relaxation iterations used for the relaxation reconstruction was 200. Note that the errors are typically localized to small patches of relatively high error. The warping method is seen to yield the lowest error. Figure 5.22 shows perspective plots of the disparity functions for these cases. The warping method results in the least amount of spike errors, but produces a surface that is not as smooth as the relaxation or averaging methods. The following points can be made from these results:

1. The averaging method performs comparably to the transformation method, while the relaxation method is somewhat worse. This is to be expected since the transformation method, due to truncation, is far from optimal, and the relaxation method is not convergent. The warping or transformation method is seen to be the best overall in terms of reducing the amount of the highly localized spike errors.

2. The addition of extrema features to the zero crossing features appreciably improves the performance with relaxation reconstruction, but does not affect the performance with the other two reconstruction methods, except at low σ_s values. This can be explained as due to the fact that the relaxation reconstruction converges more quickly when the sample density is increased, while for the other methods increasing the sample density may not improve the reconstruction if the disparity function is already oversampled.

(The program for plotting these shaded plots was written by Richard Jankowski of the Electrical Engineering Department, U.B.C.)

FIGURE 5.8 Perspective plots of the disparity function obtained using matching method 1, with the warping reconstruction method.

FIGURE 5.9 RMS disparity error as a function of σ_s, for matching algorithm 1, maximum disparity = 5.
g Surface Sigma FIGURE 5.10 RMS disparity error as a function of o , for matching algorithm 1, maximum disparity = 10. g -1 Recon. Method D Warping O eo u b Ed .?._ Averaging 20 30 40 50 Surface Sigma FIGURE 5.11 RMS disparity error as a function of o maximum disparity = 15. 30 40 60 70 80 for matching algorithm 50 Surface Sigma F I G U R E 5.12 RMS disparity error as a function of o' for matching algorithm maximum disparity = 20. O t-i Recon. Method o Warping . ? . . A V erag 1 ng ^"Relaxation CO- u> Cz3 efl CM - a, 3S -A A O- 10 20 A A —T— 30 —r40 —r60 50 Surface Sigma • u 70 80 FIGURE 5.13 RMS disparity error as a function of a , for matching algorithm 2, maximum disparity = 5. g O Recon. Method D Warping . 9.. Averaging a Relaxation P 5 - i 0 i 10 i 20 i 30 i 40 i 50 i 60 I 70 T 80 Surface Sigma FIGURE 5.14 RMS disparity error as a function of a , for matching algorithm 2, maximum disparity = 10. g Recon. Method o Warping . 9_. AY?rag ing a Relaxation n O u ed a CM CO CO - r - 10 20 -r- 30 40 50 60 70 80 Surface Sigma FIGURE 5.15 RMS disparity error as a function of o maximum disparity = 15. o-| 0 i 10 1 20 i 30 '• 1 1 40 50 i 60 Surface Sigma FIGURE 5.16 RMS disparity error as a function of a maximum disparity = 20. for matching algorithm 2, g) i 70 80 for matching algorithm 2, 215 Recon. Method a Warping . ?.. A Y crag l ng A Relaxation u> O n Ui u Ed ed cuCX CO Q S o- 0 i i 10 20 i 30 A ., 40 i 50 Surface Sigma FIGURE 5.17 RMS maximum disparity = 5. 70 80 disparity error as a function of o S' for matching algorithm 3, Recon. Method • Warping u O n u Averaging Relaxation H cd 60 CM- CO CO OS "T" 10 20 30 40 —T" 50 Surface Sigma FIGURE 5.18 RMS maximum disparity = 10. 60 70 80 disparity error as a function of o S' for matching algorithm 3, kl O u Recon. Method a Warping m- . 9 . . AY J l J J 1 ng e a ^""Relaxation W CM - 10 CO 10 20 30 40 50 60 70 80 Surface Sigma FIGURE 5.19 RMS disparity error as a function of og. 
FIGURE 5.20 RMS disparity error as a function of σ_g, for matching algorithm 3, maximum disparity = 20.

This has as a corollary the statement that the reconstruction error is not a major component of the disparity error, at least not for the larger σ_g values.

FIGURE 5.22 Perspective plots of the disparity function obtained for σ_g = 40 for the three reconstruction techniques (warping, relaxation, averaging).

3. The RMS disparity error is proportional to the maximum disparity value. This has been seen in chapter 4.4 to be the case for the reconstruction error. It is also evident that the other sources of error, that is, additive noise, quantization error, and filtering error, would, in general, not be related to the maximum disparity value. This implies that the reconstruction process, coupled with the matching process at low resolutions, as described in chapter 4.5, is responsible for the bulk of the disparity error.

4. The RMS disparity error, after subtracting the estimated RMS component due to edge effects, is seen to increase as σ_g decreases. This is due to the reconstruction error, and roughly follows the example given in chapter 4.4 (where the same type of disparity function was used, but a slightly different type of filter). This effect can be seen in figure 5.18 for the σ_g = 10 case, where the disparity function is not completely reconstructed, due to the rapid change in the disparity function.

5. The RMS disparity error is seen to paradoxically rise for large σ_g values in the case of the warping reconstruction method. This is due to edge effects. The warping method assumes a fixed boundary of zero value, while the other two methods assume a free boundary. Thus, in the warping case, the reconstructed function is pulled down to zero at the edges.
The error incurred by this pulling down is largest for large σ_g surfaces, since the disparity values near the edges of such surfaces are relatively high compared to the values for low σ_g surfaces, which quickly drop down to zero.

5.4 - Surface Gradient Response of the Matching Algorithms

The effect of the surface gradient (in the x-direction) on the performance of the matching algorithms is obtained as follows. The RMS disparity measurement error is found for a number of different gaussian cylinder surfaces, as in the previous section. However, in these cases the axes of the cylinders were aligned with the y-axis. This means that there is a disparity gradient along the x-axis. Because of this disparity gradient we expect to observe an increase in the measured RMS disparity error over the case in which the cylinder axis was aligned with the x-axis. To obtain a value for the error due to the filtering effect, we run the same experiment twice, once with the gaussian cylinder aligned with the x-axis and then with the cylinder axis aligned with the y-axis. The difference in the measured RMS disparity error can be attributed to the effect of the ∇²G filtering process on the feature positions. For most values of σ_g the average disparity gradient is approximately given by:

(∂d/∂x)_avg = d_max/256     (5.4.1)

We performed the experiments for four values of d_max: 5, 10, 15, and 20. Ten different values of surface sigma were used; these were the same values as used in the experiments of the previous section. Since the average disparity gradient is essentially independent of surface sigma, the RMS errors obtained using the different sigma values (for the same d_max) can be averaged to obtain a better estimate of the filtering error component. However, the results of this experiment were inconclusive, as the variance of the error differences so obtained was very high.
In order to fully examine the effect of disparity gradients on the filtering error component, we must somehow decouple the disparity measurement process from the correspondence process. This can be done, for the case of one-dimensional intensity functions, with the use of scale-space matching as described in chapter 6. With this method we can determine the correspondence between two one-dimensional intensity functions. Once this correspondence has been established, we can measure the disparity values between corresponding features. We can then compare these measured disparity values to the actual disparity values to obtain filtering error values. We can then compare these experimentally determined filtering errors to the theoretical predictions given in chapter 4.

In chapter 4 it was pointed out that the theoretical analysis of the filtering error could be done only in the case of linear disparity functions. Thus, in the following experiments, we used only linear disparity functions. The case of nonlinear disparity functions is treated in chapter 6.

The experiments proceeded as follows. We created a random one-dimensional function f(x). A second random function, g(x), was then derived from f(x) using equation (4.3.3). This means that f(x) and g(x) can be thought of as the right and left intensity functions arising from a surface having a disparity function of the form d(x) = β₁x. The scale maps of these two functions (for a given value of β₁) were then computed. These two scale maps were then matched using the technique described in section 6.2. After the matching had been performed, the disparity between corresponding scale map contours was measured. The RMS error between the measured and actual disparities for each σ value in the scale map is then obtained.
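The stimulus generation for these experiments can be sketched in a few lines. Equation (4.3.3) is not reproduced in this chunk, so the sketch below simply resamples f so that the two signals differ by a linear shift d(x) = β₁x; the warp direction and the smoothing used to make f band-limited are assumptions, not the thesis's exact construction.

```python
import numpy as np

# Sketch: build a random 1-D function f and a companion g whose local
# shift relative to f grows linearly, d(x) = beta1 * x.
rng = np.random.default_rng(0)
x = np.arange(256, dtype=float)
f = np.convolve(rng.standard_normal(256), np.ones(9) / 9.0, mode="same")

beta1 = -20.0 / 255.0                # one of the four gradients tested
g = np.interp(x + beta1 * x, x, f)   # second member of the pair
# at position x0, g carries the value of f at x0 + beta1*x0, so the
# shift between the pair is beta1*x0, i.e. a linear disparity function
```

With β₁ = -20/255 the shift reaches about 20 pixels at the right edge of a 256-sample signal, matching the scale of the gradients used in figures 5.24 to 5.27.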
This procedure was performed for four values of the disparity gradient β₁: -20/255, -40/255, -60/255 and -80/255. The right hand scale maps of the intensity functions for each of these four cases are shown in figures 5.24 to 5.27. The left hand scale map is the same for each case, since the same random function is used as the left hand function. The scale map of this function is shown in figure 5.23. The results are plotted in figure 5.28.

Figure 5.29 shows a theoretical prediction of the expected RMS disparity measurement error due to filtering as a function of σ and β₁. This prediction was obtained by assuming that the RMS filtering error is given by the square root of the sum of the variance of the theoretical filtering error distribution (equation 4.3.40) and the variance of the quantization error distribution. However, since the variance of the filtering error distribution, as strictly defined, is infinite, we can only obtain a 'pseudo-variance' value. This pseudo-variance is obtained by fitting a gaussian distribution to the actual filtering error distribution. This is performed by setting the peak values of the distributions to be the same. Doing this, we obtain for the pseudo-variance the expression given in equation (4.3.41). The variance of the quantization error is simply given as 1/6 of the zero crossing position quantization level, and is thus equal to 1/6, since the zero crossing position measurements in this experiment were quantized to one pixel, for all σ and β values.

FIGURE 5.24 The right hand scale map for the β₁ = -20/255 case.

FIGURE 5.26 The right hand scale map for the β₁ = -60/255 case.

FIGURE 5.28 The measured RMS disparity error due to filtering as a function of σ for disparity gradients of -20/255, -40/255, -60/255 and -80/255.

FIGURE 5.29 The expected RMS disparity error due to filtering as a function of σ for disparity gradients of -20/255, -40/255, -60/255 and -80/255.
Comparing figures 5.28 and 5.29 shows that the experimentally obtained filtering errors are indeed similar to the theoretical predictions. This experiment thus shows that the filtering error effect is obtained in practice, and needs to be considered in analyzing the disparity errors produced by a matching algorithm.

5.5 - Performance of the Matching Algorithms with Additive Noise

In this section we describe experiments that were performed to test the effect of adding gaussian white noise to one of the images in a stereo pair on the performance of the discrete multi-resolution matching algorithms. These experiments proceeded as follows. First, the experiments described in section 5.3 (for the case of d_max = 10) were performed to give a noise-free baseline for the RMS error measurements. Then the experiments were repeated, this time with gaussian white noise of variance σ_n² added to one of the images. This extra noise caused a shift in the position of the zero crossings of that image, thereby causing an error in the measured disparity. This was analyzed in chapter 4.2. The difference in the RMS error between the two sets of experiments was tabulated. The results of these experiments are depicted in figures 5.30 to 5.33. The maximum noise variance tested was 400. This corresponds to a minimum signal to noise ratio of 10, as the signal variance was 4096. Note that this is a fairly high signal to noise ratio, yet the effect of the added noise on the RMS disparity error was still significant.

We can obtain a theoretical prediction of the expected RMS disparity error due to the additive noise by using the results of the analysis of this error performed in chapter 4.2. As was the case with the filtering error, the variance of the theoretical additive noise error distribution is undefined. Thus the best that we can do is to qualitatively compare the measured RMS error with a 'pseudo-variance' of the additive noise error distribution.
The pseudo-variance measure that we use is defined implicitly as follows:

∫₀^(σ_P) p(e) de = (1/√(2π)) ∫₀¹ e^(-x²/2) dx = 0.3413     (5.5.1)

Thus σ_P is the point at which the area under the probability density curve is equal to the area under the standard gaussian (normal) probability density curve in the interval (0,1). (Recall that the standard deviation of the standard gaussian distribution is 1 and the mean is zero; thus the above interval is one standard deviation from the mean of the distribution.)

The pseudo-variance is plotted in figure 5.34 as a function of the noise variance, for a filter σ of √2 (the highest resolution filter). It is seen that the pseudo-variance is a roughly linear function of the additive noise variance. This concurs with the experimental evidence that the measured RMS disparity error is a roughly linear function of the noise variance (note that the statistical nature of the measurements means that there will be some deviation of the measurements from any trend such as the assumed linear one).

FIGURE 5.30 Increase in RMS disparity error as a function of added noise variance. Surface σ = 30, Iterative matching, Zero crossings only.

FIGURE 5.31 Increase in RMS disparity error as a function of added noise variance. Surface σ = 15, Iterative matching, Zero crossings only.

FIGURE 5.32 Increase in RMS disparity error as a function of added noise variance. Surface σ = 30, Iterative matching, Zero crossings and extrema.

FIGURE 5.33 Increase in RMS disparity error as a function of added noise variance. Surface σ = 15, Iterative matching, Zero crossings and extrema.
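Definition (5.5.1) has a direct empirical reading. If the error density is assumed symmetric about zero (an assumption made here for the sketch), then requiring the area on (0, σ_P) to equal 0.3413 makes σ_P the 68.26% quantile of the absolute error, which can be computed from samples:

```python
import numpy as np

def pseudo_variance_point(errors):
    """Empirical reading of the pseudo-variance definition (5.5.1):
    sigma_P is the point where the area under the error density on
    (0, sigma_P) equals 0.3413, the standard-normal mass on (0, 1).
    Assuming a density symmetric about zero, this is the 68.26%
    quantile of |error|."""
    e = np.abs(np.asarray(errors, dtype=float))
    return np.quantile(e, 2 * 0.3413)
```

For standard-normal errors this recovers σ_P close to 1, which is the calibration built into the definition.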
However, the magnitude of the pseudo-variance is about 1/5 that of the measured increase in the RMS disparity error. This is presumably a result of the fact that our definition of the pseudo-variance was somewhat ad-hoc, and another definition may have produced a closer fit to the measured magnitudes. However, the form of the definition is such that one would expect it to capture more truly the nature of the variation in the disparity error. Also, the discrepancy in the error magnitudes is almost certainly due in part to the fact that the errors due to the noise-induced shifting of the zero crossing and extrema features (which is what the pseudo-variance is a measure of) can not be decoupled completely from the effects of the matching and reconstruction processes.

It is seen from the additive noise tests that the warping reconstruction method performs the best in terms of minimizing the increase in RMS disparity error. This suggests that the warping method is better than the other methods at smoothing out the high frequency error components. The RMS disparity error increases only slightly as σ_g is decreased from 30 to 15. This suggests that the mechanism causing the increase in the disparity error in the presence of additive noise is only loosely coupled to the reconstruction process. Similarly, the addition of the extremum features does not affect the increase in RMS disparity error appreciably (except for the relaxation case at σ_g = 15). This again suggests that the reconstruction process is loosely coupled to the noise error mechanism.

FIGURE 5.34 The pseudo-variance of the additive noise error as a function of the additive noise variance, for σ_f = √2.

5.6 - Comparison of the Simplified Matching Algorithm with the DispScan Multi-resolution Matching Algorithm
In this section we compare the performance of the multi-resolution Dispscan matching algorithm with that of the simplified matching algorithm. We tested the multi-resolution Dispscan algorithm on random image pairs with varying σ_g and a maximum disparity of 10 pixels. The 9x9 averaging and warping methods were used to do the disparity function reconstruction. The resulting RMS disparity errors, as a function of σ_g, are plotted in figure 5.35.

It is seen that the multi-resolution Dispscan matching algorithm performs somewhat better than the simplified method. However, the Dispscan algorithm involves much more computation and takes almost twice as long. It is also observed that the Dispscan algorithm with the warping reconstruction method performs better than the Dispscan algorithm with the 9x9 averaging reconstruction. This indicates that, even for the Dispscan algorithm, the reconstruction process is very important to the performance of the overall matching process.

FIGURE 5.35 The RMS disparity errors for the Dispscan and simplified matching algorithms as a function of σ_g.

5.7 - Application of Image Analysis to Log Scaling

In this section of the thesis we discuss the application of image analysis methods to a particular industrial application, that of log scaling. Log scaling is the process of measuring, or estimating, the volume of wood in a log or group of logs. There are four basic reasons for wanting to scale logs (Sinclair, 1980):

1. To determine payments and royalties to the Crown;
2. To determine payments to logging contractors;
3. To find the volume of logs for sale, trade or transfer;
4. To calculate divisional, departmental, or area production.

The two most common techniques for scaling logs are weigh scaling and stick scaling. Weigh scaling involves weighing a truckload of logs and subtracting the weight of the empty truck.
The resulting load weight can be used to estimate the total volume of the load of logs, provided the mean density of the logs is known. Stick scaling involves a man walking over a log as it lies on the ground and measuring the length of the log, and its widths at the two ends, with a measuring stick. These measurements are written into a log book from which, at the end of a shift or workday, the log volumes are calculated. Stick scaling is more accurate on a piece by piece basis than is weight scaling, as the log density used in the estimation of the weight scale volume can be in error, due to variations in the log's moisture content as well as to invalid assumptions as to the species of log. In contrast, weight scaling provides information about a set of logs only. However, the main drawback of stick scaling is that it is very slow compared to weight scaling, especially if the logs are small. This is an important consideration now that forest firms are harvesting smaller and smaller logs. Sample scaling, which involves stick scaling only a small sample set of logs from a larger group of logs, can be used to speed up the scaling operation, but is useful only for large, homogeneous groups of logs for which volume statistics can be adequately characterized (similar to the case of weight scaling).

Sinclair (1980) has noted that, in a large number of British Columbia coastal logging operations, the log scaling function was the controlling factor in determining productivity. Hence, if one could speed up the log scaling process, then presumably one could improve the productivity of a sort yard operation. Sinclair also states that as the scalers are rushed or pressured, the quality of the scaling is diminished (the measurement error increases). Thus automating the scaling process, which would presumably be unaffected by the workplace pressures that a human scaler is exposed to (such as the menacing presence of very large logging machinery rumbling about nearby), would be expected to increase the reliability and repeatability of the log scaling process.
This section of the thesis looks at a process whereby the log scaling operation can be done automatically.

One could conceivably automate the stick scaling operation by designing a robot to directly replace the human stick scaler. This machine would walk along the logs and measure them with a shiny chrome measuring stick. This approach, however, would have most of the problems of human stick scaling, as well as some new ones. At any rate, the problems of engineering such a mechanism are enormous, and the present state of the art is not sufficiently advanced to permit such a design.

Demaerschalk et al (1980) have demonstrated improved accuracy over weight scaling by the use of optical methods in truck load volume estimation. Their method involved taking photographs of the truck load from the back and the side, enlarging these photographs, and manually estimating (by counting the dots on an overlay grid) various geometric parameters, such as the area of bark, the number of logs and so on. These parameters were input to a statistical regression procedure which provided an estimate for the volume of the truck load. Although this technique was designed to replace weight scaling, the general principle, that of optically sensing and measuring log parameters, could be applied to the domain of stick scaling; that of logs lying flat on the ground.

(A sort yard is a place where logs from the harvesting grounds are sorted as to species and grade, bundled, and sent to the mills.)

This type of procedure could, in principle, be fully automated with television cameras and (special purpose) image analysis hardware, instead of manually taking photographs, enlarging them, and painfully counting dots.
Simple techniques

The simplest technique that one can use for processing images, yet one of the most widely used in log handling applications (as well as in other applications), is that of binary thresholding. In this technique, all pixels whose intensity is greater than a given threshold are assigned a high or '1' level. All pixels whose intensities fall below this threshold are assigned the complementary low or '0' value. Thus the image is partitioned into two sets, one comprised of the pixels with level '1' and the other containing the pixels with level '0'. If the intensity of the object(s) that one is interested in is sufficiently different from that of the other parts of the image, then this method provides a simple mechanism for separating the object from its surround. However, if the objects do not have a uniform intensity distribution, then the thresholding operation will cause the object to become broken up. Similarly, if the surround does not have a uniform intensity distribution, then parts of it will be confused with the object. Figure 5.36 shows a photograph of a typical log lying on a flat deck. Figure 5.37 shows the result of applying a threshold equal to the mean intensity of the original image. Note that the black and white regions do not correspond to any meaningful structures.

In general, making the thresholding technique viable requires that one be able to control the lighting conditions, as well as the characteristics of the objects being imaged, very closely. This can be done, for example, by backlighting the log, which reduces the apparent texture of the log's visible surface (since it is in darkness) and provides a bright, uniform background. This technique, commonly referred to as broken beam scanning, has been successfully employed in sawmills, where such a setup can be readily arranged. Examples of such systems are given in (Vit, 1962) and (Hand, 1975).
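The binary thresholding just described amounts to a one-line operation; a minimal sketch (the mean-intensity default mirrors what was done for figure 5.37):

```python
import numpy as np

def threshold_image(img, t=None):
    """Binary thresholding as described above: pixels brighter than the
    threshold are assigned '1', the rest '0'.  When no threshold is
    given, the mean intensity is used (as for figure 5.37)."""
    if t is None:
        t = img.mean()
    return (img > t).astype(np.uint8)
```

Applied to a textured log on a textured background, the resulting regions fragment exactly as the text describes, which is why the method is viable only under tightly controlled lighting.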
Another implementation of the thresholding technique, which has been proposed for the scaling of logs, is to paint over the surface of the log with highly reflective white paint (thus reducing the effect of the surface texture) and to darken the background (Miller and Tardif, 1970). In (Whittington, 1979) is described a method whereby an array of photodiodes senses, and thresholds, the light reflected off of a peeled log. Vadnais (Vadnais, 1976) describes a similar system that detects the boundaries of a sawn board in order to determine proper edging strategies.

FIGURE 5.37 The effects of thresholding figure 5.36.

However, in the situations encountered in most sort yards, one can not practically control either the lighting or the characteristics of the surfaces (logs and background). Thus the image processing technique must be able to handle textured surfaces and varying contrasts, such as is the case in figure 5.36.

Some of the more advanced log handling image processing systems utilize edge detection algorithms. These techniques are more powerful than the thresholding techniques because of their relative insensitivity to changes in illumination across the scene. Instead of trying to distinguish between the body of the log and the background, edge based techniques search for the boundary between the log and the background. Since logs are usually simply connected (no holes), the body of the log can be determined from its boundary. Edge detection is performed by measuring, with some differential operator, the local variation in the intensity. If this variation exceeds a given level, we infer an edge in that locality. There are many such edge operators that have been used (see Rosenfeld and Kak, 1976 for a review; also see chapter 2 of this thesis for a discussion of the Marr-Hildreth theory of edge detection). However, even edge detection based schemes are not in themselves adequate for our application. Figure 5.38 shows the results of applying an edge detector to the image of figure 5.36. Notice that, while the edges of the boundary of the log are found, many other edges are found as well.
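The Marr-Hildreth style operator referred to above (Gaussian smoothing followed by a Laplacian, with edges marked at zero crossings) can be sketched with plain numpy. This is a minimal illustration of the idea, not the implementation that produced figure 5.38; the kernel width and the zero-crossing test are assumptions.

```python
import numpy as np

def log_zero_crossings(img, sigma=2.0):
    """Marr-Hildreth style edge detection: Gaussian smoothing, then a
    discrete Laplacian, with edges marked at sign changes (a sketch)."""
    n = int(3 * sigma)
    k = np.arange(-n, n + 1, dtype=float)
    g = np.exp(-k**2 / (2 * sigma**2))
    g /= g.sum()
    # separable Gaussian smoothing (rows, then columns)
    sm = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1,
                             img.astype(float))
    sm = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, sm)
    # discrete 5-point Laplacian
    lap = np.zeros_like(sm)
    lap[1:-1, 1:-1] = (sm[:-2, 1:-1] + sm[2:, 1:-1] +
                       sm[1:-1, :-2] + sm[1:-1, 2:] - 4.0 * sm[1:-1, 1:-1])
    # a zero crossing is a sign change between adjacent pixels
    zc = np.zeros(img.shape, dtype=np.uint8)
    zc[:, :-1] |= ((lap[:, :-1] * lap[:, 1:]) < 0).astype(np.uint8)
    zc[:-1, :] |= ((lap[:-1, :] * lap[1:, :]) < 0).astype(np.uint8)
    return zc
```

On a textured scene every intensity fluctuation produces such crossings, which is exactly the overabundance of edges visible in figure 5.38.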
It is a very difficult task to determine which of these edges belong to the log-background boundary and which are due to the texture in the log interior and background. Thus, to solve our particular problem, which is to distinguish logs from a background when both are highly textured and of varying contrast, we require a more complex image analysis technique.

In the previous chapters of the thesis we have seen that stereo depth perception works best under just these conditions. That is, the more edge segments there are in a stereo image pair, the greater the reliability of the depth measurement process. Also, since the logs in our application are lying on a flat deck, their occluding boundary (which is the boundary that we see in the image) is closer to the camera than the background. Thus thresholding the depth value will give us, in principle, an error free segmentation of the logs from the background.

FIGURE 5.38 The result of applying an edge operator (Marr-Hildreth) to figure 5.36.

There are other methods, other than the one described in this thesis, for determining the distance to the logs; these include laser ranging, and structured light techniques (Jarvis, 1983). These techniques are finding widespread application in industrial vision systems. However, they generally require a constrained environment (in terms of controlling the ambient light). Such control may not be attainable in a logging sort yard, and a system that can operate under more arbitrary conditions would be desirable.

Analysis of the measurement accuracy

The setup of the video log scaling system is given in figure 5.39. We assume that the camera axes are parallel, that the logs are lying on a flat deck parallel to the image planes of the cameras, and that the focal points of the cameras are the same distance from the deck.
In the Appendix is derived the formula for the depth in terms of the disparity. For the simple geometry involved in this application we have:

z = fd/D, or D = fd/z     (5.7.1)

where D is the disparity, z is the vertical depth from the camera focal point to the point on the log or deck being imaged, f is the camera focal length, and d is the distance between the focal points of the two cameras.

FIGURE 5.39 The video log scaling system setup.

Let the size of the sensor be given by s, and let the number of imaging elements on the sensor (pixels) be denoted N. Then we have that:

p = N*D/s     (5.7.2)

is the disparity measured in pixels. Let the distance from the camera focal points to the deck be denoted h. Then the minimum disparity will be:

p_min = N*f*d/(s*h)     (5.7.3)

If the maximum log radius is denoted r, then the maximum disparity range that the matching algorithm will be faced with is given by:

Δp = 2*N*f*d*r/[(s*h)(h-2r)]     (5.7.4)

The mean depth accuracy is given by 2r/Δp (for quantization of disparity to the nearest pixel), or:

depth resolution = s*h*(h-2r)/(N*f*d)     (5.7.5)

In this application the depth resolution is not important. More important is the horizontal position accuracy. However, the scaling factor for the horizontal position is almost constant, and can be calibrated beforehand, or during the log scaling process if ground control points can be detected in the images. If the horizontal scale factor is known, the error in the measured ground position, ΔX, is a function of the position error in the image, Δx. It can be shown that:

ΔX = (h + f)s/(fN) * Δn     (5.7.6)

where Δn is the image position error measured in pixels. In the best case analysis, the only error in the image position will be due to the pixel quantization. In this case Δn = 1/√12.
If we are viewing a cylindrical log with dimensions L x W, then its volume is given by:

V = π(W/2)²L     (5.7.7)

If the errors in the measurements of L and W are equal to ΔX, then we can write the normalized error in the computed volume as:

ΔV/V = (1/L + 2/W)ΔX     (5.7.8)

The field of view common to both cameras has dimensions (sh/f) (vertical) by max(sh/f - d, 0) (horizontal). For the camera setup in the examples that follow, typical values for the camera parameters were:

d = 10 inches
N = 256 pixels
h = 100 inches
f = 46 mm
r = 8 inches
s = 25 mm

Thus we have that:

s/N = 0.1 mm/pixel
p_min = 45 pixels
Δp = 9 pixels
mean depth resolution = 1.89 inches/pixel

The field of view is 55.5 inches by 45.5 inches. If the only errors in localizing positions in the image were due to quantization of the pixels, then we have that ΔX = 0.062 inches. If there is a log in the scene with dimensions L = 40 inches and W = 10 inches, then the normalized error in its volume computation would be ΔV/V = 0.013 (or 1.3%). In practice Δn may be higher, on the order of 1 pixel. In this case ΔV/V would rise to 4.6%.

Doubling the resolution to N = 512 would extend the required disparity range to 18 pixels, but would halve the error in the volume. Moving the cameras farther apart would also increase the range in disparity values, but would do so at the cost of less overlap in the stereo pair, which means that disparity values would be available over a smaller region. Because we are only using the disparity values to distinguish the logs from the background, we do not require high disparity resolution. Thus we can move the cameras closer together and get an increased image overlap and a decrease in the disparity range. As we have seen in the experiments in the previous section, and in the theoretical analyses of chapter 4, the matching algorithm performs better for smaller disparity ranges.
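The example numbers above can be recomputed directly from equations (5.7.1) to (5.7.8); a quick worked check, with millimetres appearing only in the ratio f/s so the mixed units are harmless:

```python
from math import sqrt, pi

N = 256      # pixels across the sensor
f = 46.0     # focal length, mm
s = 25.0     # sensor size, mm
d = 10.0     # camera separation, inches
h = 100.0    # focal point height above the deck, inches
r = 8.0      # maximum log radius, inches

p_min = N * f * d / (s * h)                        # deck disparity, pixels
dp = 2.0 * N * f * d * r / (s * h * (h - 2 * r))   # disparity range (5.7.4)
depth_res = 2.0 * r / dp                           # inches per pixel

# best-case position error (5.7.6) with dn = 1/sqrt(12);
# convert f and s to inches so the (h + f) term is consistent
f_in, s_in = f / 25.4, s / 25.4
dX = (h + f_in) * s_in / (f_in * N) / sqrt(12)     # inches

L, W = 40.0, 10.0                                  # example log, inches
V = pi * (W / 2.0) ** 2 * L                        # volume (5.7.7)
dV_over_V = (1.0 / L + 2.0 / W) * dX               # relative error (5.7.8)
```

This recomputation gives Δp of about 9 pixels, ΔX of about 0.062 inches and ΔV/V of about 1.4%, in line with the figures quoted above; p_min comes out near 47 pixels and the depth resolution near 1.8 inches/pixel, slightly different from the quoted 45 pixels and 1.89 inches/pixel, presumably because of rounding in the original arithmetic.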
Thus, the reduced matching range created by moving the cameras closer together should improve the extraction of the disparity maps. We can not move the cameras too close together, however, as then the logs and the background will not be sufficiently separated in disparity value to allow a noise free segmentation. Experiments have shown that a disparity range of between 4 and 15 pixels gives good results. In the cases described later, the disparity ranges were on the order of 5 pixels.

In a practical system we would require a vertical (from the camera's point of view) field of view of about 10 metres, and a horizontal field of view of at least 2 metres. The height of the camera would be on the order of 10 metres above the ground on which the logs are lying. Given a sensor size of 25 mm, the field of view requirements result in a needed focal length of 25 mm. Assuming a resolution of 512x512 (N = 512), taking r = 0.25 m, and requiring the disparity range Δp to be 10 pixels, gives that the distance between the cameras should be 4 metres. The horizontal field of view is then 6 metres. The mean depth resolution is 5 cm/pixel. Thus a log with a diameter of 25 cm would have a disparity range of 5 pixels.

Finding the disparity map

The matching algorithm described in chapter 2 can be used to obtain the disparity map from the stereo image pair of the scene containing the logs. Because of the limited disparity range produced by the log scene for a 256x256 sensor resolution, it was not necessary to use more than one level of resolution in the experiments that we performed. In a practical situation the sensor resolution may be larger, in which case more than one resolution level may be required. However, even when the sensor resolution is increased, the disparity range is not excessively large. Thus we can expect that the multi-resolution matching algorithm will perform well.
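The practical-system design point quoted above can also be checked against the same geometry as equation (5.7.4); here f/s = 1, so the millimetre terms cancel:

```python
N, f, s = 512, 25.0, 25.0        # pixels; focal length and sensor size, mm
h, r, d = 10.0, 0.25, 4.0        # camera height, log radius, baseline, m

dp = 2.0 * N * (f / s) * d * r / (h * (h - 2.0 * r))  # disparity range, px
depth_res = 2.0 * r / dp                               # metres per pixel
```

This gives a disparity range of about 10.8 pixels and a depth resolution of about 4.6 cm/pixel, consistent with the quoted figures of roughly 10 pixels and 5 cm/pixel (and with a 25 cm diameter log spanning about 5 pixels of disparity).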
For the matching process to work with only a few levels of resolution, the initial disparity estimate must be as accurate as possible. Fortunately, the nature of our application ensures that this can be the case. The minimum disparity will always, for parallel camera axes, occur at the background. Since the background is fixed in relation to the cameras, its disparity can be computed or measured beforehand. The size of the logs can be bounded by some expected value (say 75 cm), and so we can bound the maximum disparity value. Thus a reasonably good estimate of the average scene disparity can be computed.

The matching process was performed on the stereo image pairs shown in figure 5.40, using only one level of resolution. The initial disparity estimate was, for the purposes of this experiment, measured directly by hand on the images. There was also a vertical misalignment, which was measured by hand on the images and accounted for in the matching process. These steps can be done in a practical system as an initial calibration step. After this is done for a given camera setup, a number of images can be processed with no human intervention.

The zero crossings of the left and right images in the stereo pairs of figure 5.40 are shown in figure 5.41. We do not do the reconstruction process at the highest resolution in this case, because we wish both to save computation and to avoid distorting the disparity discontinuity at the boundary of the log. The segmentation process, to be explained next, does the disparity function reconstruction in such a way as to accurately retain the position of the log boundary. It is still necessary to perform the reconstruction step at the lower resolutions.

Segmenting the disparity map to find the log

Once the disparity map has been computed, it can be thresholded to extract all the objects which have a disparity greater than a given value.
If the deck that the logs are lying on is parallel to the image planes of the cameras, then the disparity of the (image of the) deck will always be constant, and the magnitude of the disparity of the logs will always be greater than the disparity of the deck. Thus the objects that are left after the thresholding process should always correspond to logs. In practice, there will always be a small amount of noise in the computed disparity function, which results in some isolated points which are not part of the log. These can be filtered out by requiring that there be a given number of other points in a given sized neighbourhood about the point in question. If this condition is not satisfied then the point is removed. At this point in the processing we have a sparse set of disparity points which loosely define the log region, as seen in figure 5.42, which depicts the result of thresholding and filtering the disparity maps obtained from the stereo pairs of figure 5.40. Note that the disparity map of figure 5.42b) contains many incorrect disparity values along the right hand side of the image. This is due to the fact that there was very little image detail in this region, and the zero crossings that were found there were mainly due to camera noise. However, when the segmentation process described below is performed, these points form a distinctly non-log shaped region and are thus recognized as a non-log object and discarded.

FIGURE 5.40 Two stereo image pairs depicting single log scenes.

FIGURE 5.41 The zero crossings of the stereo pairs shown in figure 5.40.

FIGURE 5.42 The thresholded disparity maps of the log scenes.

The remaining processing steps require that we know the boundary of our log region. Thus we must somehow obtain this boundary from the set of points produced by the thresholding process. We accomplish this in the following manner. For each pixel in the thresholded image we search fifteen pixels to either side of the pixel in question along a line with an orientation of 90°. If there is at least one 'hi' pixel on both sides of the centre pixel then we set the centre pixel to 'hi'. We then repeat this process with search line orientations of 77°, 45°, 22° and 0°. Finally, the entire process is repeated. It can be seen that, if instead of just the five angles above we used a continuum of angles, and if we extended the search range to infinity instead of 15 pixels, and if the process was iterated indefinitely, the boundary of the resulting filled-in region would be the convex hull [12] of the original set of thresholded points. However, the convex hull is often not a desirable representation for the log region, since sharp protrusions on the log's surface, such as may result from knots or branches, will cause the convex hull to diverge widely from the true boundary, as shown in figure 5.43. For this reason, as well as to save on computation, we limit the region of search in the filling-in process to 15 pixels. Thus, a non-convex region can be obtained, which is still filled in and roughly conforms to the log shape. The result of applying this filling-in process to our real log image is shown in figure 5.44. We can now determine the log boundary simply by noting which pixels are 'hi' and are adjacent to only one 'hi' pixel to the left and right. The log boundaries obtained for the real log images are shown in figure 5.45. The actual log boundaries are also drawn in (dotted lines) for comparison purposes. In practice the image may contain more than one log, or there may be other objects (real, or hallucinated by the computer) which pass the disparity threshold. In this case there will be more than one connected region in the segmented output. In this case we can examine the shapes of these regions to see whether or not they have the characteristics of logs (long and thin). If they do then we compute their volumes. If not, we discard them.
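A minimal sketch of the directional filling-in step, assuming a boolean image array. The five angles, the 15 pixel reach, and the repetition of the entire process follow the text; the digital-line sampling and the array handling are our own choices.

```python
import numpy as np

# Directional fill-in: a pixel becomes 'hi' if there is at least one 'hi'
# pixel within `reach` pixels on *both* sides of it along the search line.

ANGLES_DEG = [90, 77, 45, 22, 0]   # search line orientations, per the text
REACH = 15                         # search extent, pixels

def fill_in(mask, angles=ANGLES_DEG, reach=REACH, passes=2):
    mask = mask.astype(bool).copy()
    h, w = mask.shape
    for _ in range(passes):                 # "the entire process is repeated"
        for ang in angles:
            dx = float(np.cos(np.radians(ang)))
            dy = float(np.sin(np.radians(ang)))
            new = mask.copy()
            for y in range(h):
                for x in range(w):
                    if mask[y, x]:
                        continue
                    def hit(sign):          # any 'hi' pixel on this side?
                        for k in range(1, reach + 1):
                            yy = y + int(round(sign * k * dy))
                            xx = x + int(round(sign * k * dx))
                            if 0 <= yy < h and 0 <= xx < w and mask[yy, xx]:
                                return True
                        return False
                    if hit(+1) and hit(-1):
                        new[y, x] = True
            mask = new
    return mask
```

For example, two 'hi' pixels 20 apart on the same row are bridged by the 0° passes, while pixels on other rows are left untouched.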
The moment calculation described in the next section requires that a representation of the boundary of each closed region be constructed. This representation is known as the chain code (Freeman, 1974). The algorithm for deriving this representation from the boundary image will not be given here. We note that the closed boundaries in the boundary map will often have other boundaries inside them. We then perform the moment computations on each closed boundary separately, assigning them unique labels, 1, 2, 3 and so on. Such computations include the determination of the volume associated with the boundary which, as we will see in the next section, the moments computed on each region allow us to estimate. Other possible computations include measurements which distinguish between logs and other objects by the shape of the boundary, for example by comparing the long slender shape of a log boundary with its convex hull.

[12] The convex hull of a set of points is the convex polygon of smallest area that encloses all of the points in the set.

FIGURE 5.43 A log boundary and its convex hull.

Computation of the Log Volume

Once the log has been successfully separated out from its background, its volume can be estimated. Over the course of time a number of simple volume formulae have been developed, based on making only a small number of measurements. Since speed is important to the log scaling process, scales requiring only a small number of measurements are commonly used (Watts, 1983, Handbook of Forestry for B.C.). There are two commonly used volume measures, the cunit (or cubic foot) and the board foot. A board foot is the volume of wood in a piece of dimensions 12"x12"x1". There are board foot scaling rules and cubic foot scaling rules.

FIGURE 5.44 The filled in log region.

FIGURE 5.45 The detected log boundary.
These rules include adjustments for such things as kerf loss (due to non-zero width sawmill blades) and butt flare. Due to these adjustments the relation between board feet and cubic feet measures is not 12 board feet to the cubic foot but rather 5.63 (Dilworth, 1975). We list some of the scaling rules below (from Dilworth, 1975).

Board Foot Scaling Rules

Knouf's rule of thumb - V = (D² - 3D)L/20, where V = log volume, L = log length in feet, and D = mean diameter in inches.

Girard and Bruce rule of thumb - V = 1.58D² - 4D - 8 (for 32 foot logs).

Scribner's Decimal C log rule (Scribner's Dec. C.) - drops off the least significant digit of the volume (gives volume in tens of board feet). This is the official rule for the U.S. forest service.

British Columbia rule (superseded by the Smalian rule given below) - V = (D - 3/2)²(.7854)(8L/132). This is the old B.C. standard.

Sammi's rule of thumb - V = (D - 1)²L/20.

Brereton - V = 0.0654D²L. Used for measuring logs to be exported on ships.

Doyle - V = (D - 4)²L/16. Erratic measure, not widely used.

Cubic Foot Scaling Rules

Rapraeger's rule - V = .005454154(D + L/16)²L, where D = diameter at the small end. Assumes a taper of 1 inch every 8 feet of length.

Sorenson's rule - V = .005454154(D + L/20)²L. Assumes a taper of 1 inch every 10 feet of length.

Huber's rule - V = CL, where C = area at the centre of the log. This is unsuitable for decked, rafted or loaded logs due to the difficulty of measuring the radius at the centre.

Smalian's rule - V = (b + t)L/2, where b = area of the base of the log and t = area of the top of the log. This is the official rule in British Columbia.

Dilworth (1975) supplies a table of the relative accuracies of the cubic foot rules listed above. These were obtained from measurements on a series of Douglas fir logs, and the volume obtained by Newton's rule, which is V = (b + 4c + t)L/6 (where c is the area of a slice through the centre of the log), was used as a baseline for comparison purposes.
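For reference, the cubic foot rules quoted above (and Newton's baseline rule) transcribe directly into code. Here D is the small-end diameter in inches, L the length in feet, and b, c, t are cross-sectional areas in square feet; note that the constant .005454154 is just π/4 divided by 144, converting a diameter in inches to an area in square feet.

```python
import math

# Cubic foot log scaling rules as quoted in the text; Newton's rule is the
# baseline used in the accuracy comparisons.

def smalian(b, t, L):            # b, t: end areas (sq ft)
    return (b + t) * L / 2.0

def huber(c, L):                 # c: centre cross-section area (sq ft)
    return c * L

def sorenson(D, L):              # assumes 1 inch of taper per 10 ft
    return 0.005454154 * (D + L / 20.0) ** 2 * L

def rapraeger(D, L):             # assumes 1 inch of taper per 8 ft
    return 0.005454154 * (D + L / 16.0) ** 2 * L

def newton(b, c, t, L):          # baseline rule
    return (b + 4.0 * c + t) * L / 6.0

if __name__ == "__main__":
    # Sanity check: for a perfect cylinder (b = c = t = A) every end-area
    # rule reduces to A*L.
    A = math.pi * 0.5 ** 2       # 1 ft diameter
    print(smalian(A, A, 10), huber(A, 10), newton(A, A, A, 10))
```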
The results were as follows: Smalian: -5.8%; Huber: -2.9%; Sorenson: -3.78%; Rapraeger: -4.7%.

In our automated process, we could, in principle, determine the volume more accurately than any of these rules by summing up the incremental areas along the axis of the detected log. However, a given user of the system may want the volume measures to be comparable to the value given by the log scaling rule that they are used to (such as the official government rule). In this case we would use the appropriate log rule. What is required of our image analysis process in order to compute the volumes, given the segmentation of the log? If we were to use the accurate method, we would need to determine the axis of the log; a deceptively difficult task. The use of one of the various log rules requires that one or more of the following be known: the length L of the log along its major axis, the diameter of the log at the top, centre and bottom, and the mean log diameter. In the system described herein the following method is used to find the axis of the log. We first measure the moments m00, m10, m01, m11, m20 and m02, where:

m00 = ∫∫R dx dy      (5.7.9)
m10 = ∫∫R x dx dy    (5.7.10)
m01 = ∫∫R y dx dy    (5.7.11)
m11 = ∫∫R xy dx dy   (5.7.12)
m20 = ∫∫R x² dx dy   (5.7.13)
m02 = ∫∫R y² dx dy   (5.7.14)

Using a discrete version of Green's theorem (Tang, 1982), the above moments can be computed with line integrals along the boundary of R (R is the segmented log region), thereby cutting the required computation by an order of magnitude. The reader is directed to (Tang, 1982) for the exact formulation of the moment calculations. Once these moments have been computed, the centroid and axes of the ellipsoid that gives rise to these moment values can be determined.
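The boundary-only evaluation of the moments can be sketched for a polygonal boundary. These are the standard Green's theorem formulas for polygon moments, not Tang's (1982) discrete formulation, but they illustrate the same reduction of an area integral (5.7.9)-(5.7.14) to a sum over boundary vertices.

```python
# Region moments from the boundary alone, via Green's theorem applied to a
# polygon: each edge (x0,y0)->(x1,y1) contributes terms weighted by the
# cross product c = x0*y1 - x1*y0.

def polygon_moments(pts):
    """pts: list of (x, y) boundary vertices in counter-clockwise order.
    Returns a dict with m00, m10, m01, m11, m20, m02."""
    m = dict(m00=0.0, m10=0.0, m01=0.0, m11=0.0, m20=0.0, m02=0.0)
    n = len(pts)
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        c = x0 * y1 - x1 * y0
        m["m00"] += c / 2.0
        m["m10"] += (x0 + x1) * c / 6.0
        m["m01"] += (y0 + y1) * c / 6.0
        m["m20"] += (x0 * x0 + x0 * x1 + x1 * x1) * c / 12.0
        m["m02"] += (y0 * y0 + y0 * y1 + y1 * y1) * c / 12.0
        m["m11"] += (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0) * c / 24.0
    return m
```

On the unit square this recovers the analytic values m00 = 1, m10 = m01 = 1/2, m20 = m02 = 1/3, m11 = 1/4.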
The centroid of this ellipsoid is given by:

(xc, yc) = (m10/m00, m01/m00)    (5.7.15)

The axes of the log will pass through this point. The axis vectors are the eigenvectors of the following matrix:

C = [ C20  C11 ]
    [ C11  C02 ]                 (5.7.16)

where:

C11 = m11/m00    (5.7.17)
C20 = m20/m00    (5.7.18)
C02 = m02/m00    (5.7.19)

The eigenvalues of this matrix give the lengths of the axes corresponding to the (mutually perpendicular) eigenvectors. Once the long axis has been determined we can search along this axis for the boundary points which cross this axis. The distance between these points gives the estimated length of the log. Correspondingly, we can find the width of the log by finding the intersection of the short axis with the boundary. These width and length estimates can be used in a number of the above scaling rules. We should be able to obtain a more accurate volume calculation by integrating the incremental areas measured along the long axis. We can also obtain the radius of the middle of the log, which is needed in some rules, by measuring the width of the log along the short axis (through the centroid). These measurements are depicted in figure 5.46. Note that the above procedure will not work very well for objects that are warped or curved, as the central axis cannot then be approximated as a straight line, as we have assumed. Also the centroid may, in such a case, lie outside the log region. The algorithm described above requires that the centroid be inside the boundary. More complex shape analysis techniques must be employed for such cases. We have not examined this problem any closer than this. However, the automated scaling system would in all likelihood be used to measure high value logs, which are unlikely to exhibit warpage and other defects. We have applied the above procedure to the real log images pictured earlier.
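A sketch of the axis extraction, using the closed-form eigendecomposition of the 2x2 matrix (5.7.16). One assumption is made explicit here: we take the second moments about the centroid (central moments), which we read as the intent of (5.7.17)-(5.7.19), so that the resulting axes pass through (xc, yc).

```python
import math

# Centroid and principal axes from the moments m00..m02 of equation
# (5.7.9)-(5.7.14).  Assumption: the C matrix entries are central moments.

def log_axes(m):
    xc, yc = m["m10"] / m["m00"], m["m01"] / m["m00"]
    c20 = m["m20"] / m["m00"] - xc * xc
    c02 = m["m02"] / m["m00"] - yc * yc
    c11 = m["m11"] / m["m00"] - xc * yc
    # closed-form eigenvalues of [[c20, c11], [c11, c02]]
    tr, det = c20 + c02, c20 * c02 - c11 * c11
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    lam_big, lam_small = tr / 2.0 + disc, tr / 2.0 - disc
    theta = 0.5 * math.atan2(2.0 * c11, c20 - c02)   # long-axis orientation
    return (xc, yc), theta, (lam_big, lam_small)
```

For an axis-aligned rectangle of length L the large eigenvalue is L²/12, so the axis lengths follow from the eigenvalues up to a known normalization.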
The actual volumes of the visible portions of the logs were measured by hand (stick scaling) using Newton's formula, and are compared below with the volumes computed using the video method with a number of scaling rules, including the integration along the axis method. The first number is the volume for the first log (figure 5.40a) and the second number for the log shown in figure 5.40b.

FIGURE 5.46 Fitting an ellipsoid to the log region.

Comparison of Volume Estimates (in cubic feet)

1. Hand Measured Volume = 1.480 ; 0.720
2. Integration along Axis = 1.749 ; 0.564
3. Length*π*(Width/2)² = 1.749 ; 0.637
4. Smalian Rule Volume = 1.538 ; 0.617
5. Huber's Rule Volume = 1.678 ; 0.698
6. Sorenson's Rule Volume = 1.454 ; 0.617
7. Rapraeger's Rule Volume = 1.631 ; 0.619
8. Newton's Rule Volume = 1.468 ; 0.671
9. Frustum of Cone Volume = 1.536 ; 0.617
10. Average of the estimates = 1.590 ; 0.630

It is seen that the correspondence of the various estimates to the hand measured values varies somewhat. The differences range from about 2% to 20%. The experiments described in this thesis were done chiefly to show that stereo disparity can be used to segment out the log from its background, and were not designed to obtain statistics as to the accuracy of the system over a large sample of logs. More experimentation is required in this regard. Also, better techniques for obtaining the log region from the thresholded disparity map could be devised, which would increase the accuracy. However, it is evident that stereo disparity can indeed be used to get an estimate of the log volume (around 10% accuracy with the methods given here). The processing algorithm as it stands could now be implemented with special purpose hardware or with off the shelf image processing hardware. This has not been done but is the logical next step. Such a hardware implementation would enable a test of the algorithm in an industrial setting. This test would presumably bring to light some unthought of practical problems which need solving, and could also be used to provide a statistical estimate of the obtainable volume accuracy (after performing a large number of tests on different logs). Following this prototype phase, a production model could be developed, if this was indicated by the technical and economic analysis of the prototype performance.

5.8 - Summary of Chapter 5

- An efficient method for implementing the zero crossing pyramid construction was presented.
- Experiments were performed to test the performance of the matching algorithms on random noise stereograms with non-constant disparity functions.
- The RMS disparity errors were seen to decrease as the disparity functions became smoother.
- The disparity errors were seen to be concentrated in small patches.
- The warping or coordinate transformation reconstruction method worked better than either the averaging or relaxation reconstruction methods, with regard to the RMS disparity error.
- The RMS disparity errors were seen to increase with the disparity range of the disparity function. This can be explained as being due to the reconstruction error.
- The filtering effect predicted in chapter 4.3 was observed experimentally in the case of one dimensional random image pairs. Its effect on the matching of two dimensional image pairs could not be ascertained.
- It was shown experimentally that the increase in the RMS disparity error when gaussian white noise was added to the input images was linearly related to the standard deviation of the added noise. This was in agreement with the predictions of chapter 4.2.
- The multi-resolution DispScan algorithm was seen to be slightly more accurate than the simplified matching algorithm. This improvement, however, was at the expense of increased computation.
- The simplified matching algorithm was successfully applied to the log scaling problem.
- Moment based methods were developed to estimate the long and short axes of the logs, from which measurements can be made to be used in volume estimate calculations.
- The difference between the hand measured volume and the volume estimate obtained from these calculations was on the order of 2 to 20%. The errors are due in part to the boundary finding methods used and in part to the pixel quantization. Errors also arise from incorrect disparity values near the true boundary. Better boundary finding methods may result in more accurate estimates.

VI - SCALE SPACE FEATURE MATCHING

6.1 - Introduction

This chapter discusses the possibility of using scale space representations in the stereo matching process. The topics covered in this chapter are illustrated in figure 6.1. We have shown that one of the main problems that the discrete multi-resolution feature matching algorithms suffer from is the error induced by the reconstruction process. This error is largest at low resolutions and can cause the multi-resolution matching algorithms to become unstable. It would be preferable, then, to have an algorithm in which the reconstruction process would not be required, at least at low resolutions. We have also seen that the accuracy of the disparity measurements decreases as the resolution decreases, at least for 'tilted' surfaces. Thus it would be preferable to have a matching algorithm that made disparity measurements at high resolutions only. Let us summarize these two conditions on the matching algorithm as follows:

C1 - The disparity measurements are to be interpolated only at the highest resolution level.

C2 - Disparity measurements are to be made (or used) only at the highest resolution level.
From the above, we can say that a desirable feature matching algorithm would be one that in some way refrained from making any disparity measurements, and interpolating these measurements, until the highest possible resolution level. However, as seen earlier, the ambiguity problem remains: as the resolution increases, the ambiguity between features increases, making the matching very complex. In order to reduce the amount of feature ambiguity we must reduce the resolution. However, reducing the resolution also reduces the accuracy of the disparity measurement. Clearly, some form of multi-resolution matching is necessary. The question is: is there any multi-resolution feature matching algorithm that can satisfy conditions C1 and C2 defined above?

FIGURE 6.1 The topics covered in this chapter. ((*) indicates new material.)

The answer to this question lies in the scale space representation of the stereo images. Before we describe how the use of the scale space representation allows this question to be answered in the affirmative, we should discuss again the reasons why discrete multi-resolution matching algorithms cannot satisfy conditions C1 and C2. In order for the matching process to take place at a (discrete) resolution level, with a minimum of feature ambiguity, we must have a good enough estimate of the disparity so that the size of the search region is sufficiently small to ensure that the percentage of false matches is suitably small. This means that we must have some a priori knowledge of the disparity function, which can only come from the lower resolution matching process.
This means that the algorithm must have measured the disparity at some lower resolution. This means that condition C2 cannot be met. Furthermore, since the low resolution disparity measurements must be sparser than the density of features to be matched at the higher resolutions, there is an interpolation or reconstruction operation required to provide the disparity information to the higher resolution matching processes. Therefore, it is clear that condition C1 cannot be met by these types of algorithms. It can be seen that the fundamental problem that these discrete matching methods have is that they perform a given matching operation at one resolution at a given time. That is, although they may use information from other resolutions to guide the matching, the matching itself is between features at a single resolution level only. This is the key point. If one allows the matching algorithm to match features across resolution levels, then conditions C1 and C2 can indeed be satisfied. This brings us to the idea of scale space matching. The scale map of an image provides an integrated multi-resolution representation of the image. There is no built-in resolution chauvinism to cause one to believe that matching should be done at one resolution at a time. Look again at the scale maps of the simulated and real stereo pairs that were shown in chapters 4 and 5. Now try this simple experiment. Get two identical pieces of cardboard or paper with long thin slits cut in them. Place these over the scale maps of a stereo pair such that the slits lie over a line of constant resolution (σ, and the same in both). Now try and do the matching (assuming that you have an error prone estimate of the actual matches). It is pretty difficult. Now remove the pieces of cardboard and try to match the scale map contours. If you are like most humans this will be a much easier task than the previous one. Presumably this will be the case for computers as well.
If one examines the way in which humans perform the task of matching the contours of the scale maps, one finds that it is the global shape of the contours and the relationships between the contours that are used to perform the matching. This is due to the fact that, though segments of the scale map contours over a narrow range of σ values may be quite ambiguous, segments which cover a large range of σ values are much less ambiguous. Since, in the scale map, the contours are continuous in x and σ, knowing the correspondence between contours at low resolutions means that one knows the correspondence at high resolutions as well, and vice-versa. Thus, one needs only measure the disparity at one resolution (i.e. the highest resolution, for the smallest error) to know the disparity values at all resolutions. Thus condition C2 is satisfied. Furthermore, interpolation or reconstruction of the disparity function is required only at the highest resolution (so that a suitably dense set of disparity values is available for the next processing step). Thus condition C1 is satisfied. From this we conclude that, given that the process of scale map matching is feasible, such a process would be preferred to the discrete multi-resolution algorithms described in the earlier part of this thesis. In the following section we examine some methods that have been developed for matching scale space image representations.

6.2 - Matching of Scale Space Image Representations

As the concept of scale space image representations is fairly new, it stands to reason that the application of these representations to computational vision processes is only getting started. There has, to the author's knowledge, been only one system described in the literature that explicitly matches scale space representations. This system, described in the thesis of F. Mokhtarian (Mokhtarian, 1984), is used to match the scale space representations of the curvature functions of two planar curves (see Mackworth and Mokhtarian, 1984). This system was designed for a limited domain, that is, one in which the scale maps to be matched were scaled and translated versions of each other. The reasoning behind this matching algorithm is as follows. The scale map can be considered as a tree structure. Witkin (1983) was the first to point this out. The root of this tree is an imaginary contour which encloses all of the real scale map contours. Each of the real contours in the scale map encloses zero or more other contours, which correspond to the children of that contour. Each of these children encloses its own children, and so on. Thus the scale map can be thought of as a multi-level tree. Each of the nodes in the tree corresponds to a connected scale space contour, and has associated with it a left and right branch and a peak (unless the contour is incomplete, in which case it passes through either the left, right or top boundaries of the scale map). Since the scale maps are assumed to be related by only a scaling and a translation, the transformation between corresponding contours in a pair of scale maps has only two parameters which need be determined. Mackworth and Mokhtarian's matching algorithm determines the minimum cost tree match, where the cost of a contour match is defined as the average distance between the two contours once the smaller has been transformed (i.e. scaled and translated). Incomplete contours are not matched. The algorithm used to determine the lowest cost node matches is an adaption of the Uniform Cost Algorithm (Nilsson, 1971). Full details of the matching algorithm can be found in Mokhtarian's thesis (Mokhtarian, 1984).
We have developed a somewhat simpler scale map matching algorithm which operates in the same domain as Mackworth and Mokhtarian's algorithm. As with Mackworth and Mokhtarian's algorithm, we create a tree from the scale map whose nodes correspond to scale map contours; the children of these nodes are those scale map contours enclosed by the node contour. However, instead of performing a search to find the lowest cost (as defined above) set of node matches, our algorithm merely tries to maximize the correlation of the polarity of the scale map contours along the line of minimum σ. The polarity of a scale map contour is defined, in this instance, to be the sign of the ∇²G filtered signal outside (that is, above and to the sides of) the contour. The ordered list of contour polarities along the minimum σ line provides a signature of the scale map, which can be used to match scale maps. The autocorrelation of this signature almost always, in practice, exhibits a distinct peak at some shift. Thus, to match two scale maps we can measure the correlation of the ordered polarity lists along the minimum σ line. The peak of this correlation function will give the horizontal shift between corresponding scale map contours at minimum σ. Note that, for non-zero disparity gradients, there will be scale map contours in the edge regions of one of the scale maps which have no corresponding contour in the other, due to the difference in the effective field of view in the two images. Thus, in this case, the horizontal shift predicted by the correlation will not be zero, but will be equal to the number of these 'new' boundary contours on the left side of the scale map. Also, if there is a non-zero disparity gradient, new contours may appear at the bottom of one of the scale maps and will not correspond with any of the contours in the other scale map. In this case the correlation peak will be diminished from its maximum possible value, and correct matches cannot be made for all contours. The larger the disparity is, the greater the number of new contours. To handle this, our algorithm does the following.
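The polarity-signature correlation can be sketched as follows. Representing the signature as a list of +1/-1 values and doing a brute-force shift search are our own choices; the data in the example are illustrative.

```python
# Match two scale-map signatures (ordered contour polarities along the
# minimum-sigma line) by sliding one over the other and keeping the shift
# with the highest agreement count.

def signature_correlation(left, right, max_shift=None):
    """left, right: lists of +1/-1 polarities.  Returns (best_shift, score),
    where element i of `left` is aligned with element i - best_shift of
    `right`."""
    if max_shift is None:
        max_shift = max(len(left), len(right))
    best = (0, -1)
    for shift in range(-max_shift, max_shift + 1):
        score = 0
        for i, p in enumerate(left):
            j = i - shift
            if 0 <= j < len(right) and right[j] == p:
                score += 1
        if score > best[1]:
            best = (shift, score)
    return best
```

For example, if the right signature is the left one with two 'new' boundary contours prepended, the peak occurs at a shift of -2, reflecting exactly the count of new contours described above.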
First we make a list consisting of all of the possible matches for each contour in the left hand scale map. We order the contours in the left and right scale maps according to height. If we assume that the left image is the expanded one, then we know that a given contour in the left scale map cannot match to any contour in the right scale map that is larger. Also we know that matching contours must have the same polarity. Furthermore, since the disparity function is linear, we know that the positional order of contours in the right hand scale map must be the same as the order of the contours in the left scale map that they match to. That is, there are no position reversals. [13] We can determine the proper correspondences by applying these three constraints to weed out the list of possible matches. This algorithm has been tested on a number of one dimensional stereo scale map pairs of random Gaussian noise, with linear disparity functions. The matching algorithm matched perfectly in all of these cases. These cases were described in chapter 5.4, where they were used to test for the presence of the filtering effect described in chapter 4.3.

There are in the literature descriptions of systems which perform matching of multi-resolution image representations that are similar to scale map representations. Most notable of these is the representation of Crowley (Crowley, 1984, Crowley and Parker, 1984, and Crowley and Stern, 1984). This representation is a coarsely quantized (in the σ value) scale map, where the features are the peaks (what we have called extrema) and ridges in the two dimensional ∇²G filtered image. [14] This representation is also in the form of a tree, where the peaks and ridges are the nodes, and they are linked both at a given resolution level as well as across resolution levels.
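A sketch of the three-constraint pruning. Contours are modelled here as (position, height, polarity) tuples and candidate matches are filtered in positional order; the thesis does not specify how the surviving candidates are finally selected, so the greedy leftmost pick below is our own assumption.

```python
# Weed out a candidate match list using the three constraints from the text:
# (1) a left contour cannot match a larger right contour (left image assumed
#     to be the expanded one),
# (2) matching contours must have the same polarity,
# (3) no position reversals (the disparity function is linear).

def prune_matches(left, right, cands):
    """left, right: lists of (position, height, polarity) contour tuples.
    cands: {left index: [candidate right indices]}.
    Returns {left index: right index}."""
    matches = {}
    last_pos = float("-inf")
    for i in sorted(cands, key=lambda i: left[i][0]):   # positional order
        pos_i, h_i, pol_i = left[i]
        keep = [j for j in cands[i]
                if right[j][1] <= h_i                   # constraint 1: size
                and right[j][2] == pol_i                # constraint 2: polarity
                and right[j][0] > last_pos]             # constraint 3: order
        if keep:
            j = min(keep, key=lambda j: right[j][0])    # greedy leftmost pick
            matches[i] = j
            last_pos = right[j][0]
    return matches
```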
In addition, the local maxima of the three dimensional filter output (analogous to a quantized 2D scale space transform) are also used as features, and hence become nodes in the representation. It is clear that this representation is a type of scale space representation, although the tree structure is different than the one we have described earlier. The application that Crowley had in mind when he developed this representation was the description and matching of shapes. It is evident that this representation can also be used to perform the stereo correspondence matching. Indeed, a colleague of his (Lowrie, 1984) did, in fact, use this representation to perform matching of stereo pairs. Just as a note in passing to complete this section: Yuille and Poggio, in one of their recent technical reports (Yuille and Poggio, 1983a), commented that they were looking into using scale maps in matching stereo pairs. It is clear that the scale map concept is turning out to be a useful practical tool as well as a theoretical tool.

[13] Such reversals can occur with nonlinear disparities, such as are obtained for long thin objects such as wires or poles.

[14] Actually, for computational reasons, Crowley uses the DOG or Difference Of Gaussians operator, which can be shown to be a good approximation to the ∇²G operator (see e.g. Marr and Hildreth, 1980).

6.3 - Matching of Two Dimensional Scale Maps

The above mentioned methods are all one dimensional processes, in that matching is done only along a line in the image plane, and the scale maps that are matched are only one dimensional. There are two drawbacks to this one-dimensional matching process. The first drawback is that such methods are unable to match scale maps derived from scenes having an appreciable vertical disparity component. Granted, such a condition may be rare, depending on the camera geometries, and if vertical disparities are present the images may be rectifiable, to produce horizontal epipolar lines.
The second drawback occurs when the scale maps are computed via two dimensional filtering of the image, as is the case in a number of stereo systems (e.g. Grimson, dimensional scale 1981a). map (as It defined can be shown by Yuille that a one dimensional and Poggio, 1984a (equation slice of a two 4.3.57 of this thesis)) is, in general, not well-behaved where well-behavedness here refers to the property of scale maps of having contours that never contain points of local minima (or as Yuille and Poggio call them, upside down mountains and volcanoes). The proof of this statement is given in the appendix along with the conditions under which a slice of a two dimensional scale map is well-behaved. A scale map that is not well-behaved may possibly contain that never reach encounter contours the minimum a line. Clearly, the matching algorithms described above will difficulty in such a case. Just to drive home the non-well-behavedness of these one dimensional slices of scale maps consider the maps shown in figure 6.2. These represent three adjacent (i.e. slightly different y coordinates) a Gaussian white noise image function. 15 slices of a two dimensional scale map for The presence of scale map contours with local minima are evident These local minima are marked by small open circles on the maps. There are three ways in which we can remedy this problem. The first way is suggested by the theory developed in chapter 4.3 concerning the skew map representation of a two dimensional function. It was shown then that the skew map (which has only one spatial dimension) has the same properties as a one dimensional scale map. Since Yuille and Poggio These examples of one dimensional scale map slices were computed and plotted by Hans Wasmeier of the Electrical Engineering Department, U B C as part of a course project 15 268 FIGURE 6.2 Three adjacent one dimensional slices of a two dimensional scale map exhibiting non-well-behavedness. 
(1983a) have shown that all one dimensional scale maps are well-behaved, then so is the skew map. Thus we conclude that we can use the skew map representation as our input to the matching algorithms. There is a problem with this approach, however. The way in which the skew map is computed is to take a two-dimensional slice of a four-dimensional function (i.e. that defined by equation 4.3.56). Thus, the production of a skew map involves an order of magnitude increase in the amount of computation over the production of a scale map (which involves computation of a three dimensional function). This would rule out such an approach for use in all practical implementations. A second remedy to the problem of matching two dimensional image representations involves actually performing the matching in two dimensions. Yuille and Poggio have shown that the full three dimensional form of the two dimensional scale map is always well-behaved (even though slices of it are not). Although the two dimensional scale map contours may contain saddle points, which look like local minima when sliced, there is always a path from such a point to the minimum σ line. Thus the two dimensional scale map contours can be matched and tracked to the maximum possible resolution. This solution, however, suffers from the same problem as the skew map method: matching a two dimensional scale map involves an order of magnitude increase over the matching of a one dimensional scale map. The increase in computation will probably not be as great as for the skew map method, however, since the tree matching process involves fewer computations than does the spatial filtering process. Thus this method is preferable to the skew map method. The third solution, and probably the best, is to perform one dimensional filtering only. That is, compute separate one dimensional scale maps along each epipolar line in the images and match these.
The filtering and matching computational complexity will both be an order of magnitude less than for the two dimensional scale map matching method. The only possible drawback to this solution is the effect of one dimensional filtering on a two dimensional image function. As Grimson (1981a) points out, one dimensional, or directional, filtering tends to smear out edges in a direction perpendicular to the line of filtering. We can summarize by stating that, if we want to match scale based representations of two dimensional image functions, we must do one of the following:

1 - If the epipolar lines of the imaging system are not known then we must perform a full two dimensional matching of the two dimensional scale maps.

If the epipolar lines of the imaging system are known then:

2 - We can perform a full two dimensional matching of the two dimensional scale maps anyway.

3 - We can compute the skew maps at each epipolar line of the two images and match these.

4 - We can perform one dimensional filtering along the epipolar lines in order to compute the one dimensional scale maps along these lines, and match these maps.

From the point of view of minimizing computational complexity, method 4 is the best. However, other considerations, such as minimization of feature distortion, or lack of information about the imaging geometry, may indicate that one of the other methods be used.

6.4 - Problems With Scale Space Feature Matching

As with all computational vision processes, there are problems, theoretical and practical, with implementing stereo algorithms based on scale space feature matching. First and foremost of these problems is the immense computational load that is imposed in the computation of the scale maps themselves. For each value of σ used in the construction of the scale map, a complete convolution of the image with the appropriate ∇²G filter is required. In order to maintain coherence between the scale map contours, the quantization between the discrete σ values must be fairly small. Looking at the examples shown in chapters 4 and 5, and later in this chapter, one can see that this quantization should be at most on the order of 1.1:1 (i.e.
σ(k)/σ(k-1) < 1.1 for a logarithmic scale), and smaller for regions where the scale map contours approach the horizontal. Obviously, there is very little coherence between segments of a scale map contour when the discrete σ values are spaced as widely as σ(k)/σ(k-1) = 2. From this it is evident that the number of complete image convolutions required to compute a scale map, over the same range of σ as the discrete multi-resolution algorithms, is at least LOG(2.)/LOG(1.1) = 7.27 times as much as for the discrete algorithms. Note that the Crowley (Crowley, 1984 and Crowley and Parker, 1984) and Lowrie (Lowrie, 1983) methods, which were not designed with the scale space in mind, have quantization ratios of √2:1. These methods would be expected to have problems with loss of contour coherence, and would thus require the addition of heuristics designed to handle this problem. This is shown in figure 6.3, where we have quantized the left hand scale map shown in figure 4.5 to a ratio of √2:1 and linked the contour segments using the Crowley method (basically, linking the contour segments at one σ value to the nearest contour segment of the same sign at the next lowest σ value). Comparing this (coarsely) quantized version of the scale map to the finely quantized (ratio = 1.018:1) scale map of figure 4.5, we see that the coarsely quantized scale map differs in some places from the finely quantized scale map. It must be noted, however, that, by and large, the two scale maps agree, and it may be possible to obtain matches between two coarsely quantized scale maps.
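The convolution-count figure quoted above is just log(2)/log(ratio), the number of σ samples needed per octave of scale; a one-line check (the function name is ours):

```python
import math

def convolutions_per_octave(ratio):
    # Number of sigma samples needed to span one octave (a factor of two in
    # sigma) when successive sigma values are spaced by `ratio`.
    return math.log(2.0) / math.log(ratio)

print(round(convolutions_per_octave(1.1), 2))         # 7.27, the figure quoted above
print(round(convolutions_per_octave(2.0 ** 0.5), 2))  # 2.0, for the Crowley/Lowrie ratio
```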
FIGURE 6.3 A linked coarsely quantized scale map (√2:1 σ ratio)

The computational load is a great problem in that, with current hardware, scale space matching algorithms, in their full two dimensional glory, take far too long to be of any practical use. Indeed, all the experiments on scale space matching described in this thesis are for the one dimensional case only. From another point of view, the computational load problem is but a paper tiger, in that one can always (at least to a certain point) create faster and more complex hardware architectures that will be able to bear the computational load. For the time being, however, it is a problem, and scale space matching systems will not be ready for the real-time world for a while yet.

A more fundamental problem results from the fact that the left and right image functions are not, in general, merely expanded or compressed versions of each other. This means that the shapes of corresponding scale map contours may not be mere expanded versions of each other. This may make some matching algorithms fail, if they were based on matching the contour shapes. Furthermore, nonlinear disparity functions can give rise to instances wherein the scale map contours split and merge in different fashions in the left and right scale maps. This essentially means that the left and right sides of a closed contour in one scale map may correspond to the right and left sides of two adjacent contours in the other scale map. This is illustrated in Figure 6.4. Note that one cannot achieve correspondence in this case by matching complete contours. A possible solution to this problem is to manipulate the scale maps so that they are transformed into an isomorphic pair. One can do this by identifying places in the scale maps where splitting or merging of scale map contours is likely to occur.
For example, these points are narrow necks between closely spaced scale map contours. Some of these points have been indicated (with a small circle) in figure 6.5, which shows a stereo pair of scale maps derived for a sinusoidal disparity function (d(x) = 40sin(πx/256)). Once these points have been detected, one can edit the scale map by splitting or merging the contours that have been singled out, in such a way that the scale maps are matchable with the previously discussed scale map matching methods. Alternatively, a tree matching method which performs splitting and merging of nodes in order to make the trees isomorphic can be used (see for example the method of Lu, 1984). The tree in this case would be the tree whose nodes are scale map contours and whose children are contours that are contained by the father contour (this is the tree discussed earlier in the discussion of the scale map matching algorithm of Mackworth and Mokhtarian). A second problem that is encountered in practice is that image intensity functions have noise added to them (i.e. from the camera electronics). If the noise is uncorrelated between the left and right image functions, then the scale maps of the two images will contain some non corresponding contours, or at the very least the positions of the corresponding contours will be perturbed enough so that splitting and merging of the contours, relative to the other scale map, will take place (even for linear disparity functions). The matching algorithms must be able to handle the extra noise generated contours as well as the non corresponding splitting and merging of the scale map contours.

FIGURE 6.4 Splitting and merging of scale map contours for nonlinear disparities
FIGURE 6.5 A stereo pair of scale maps with sinusoidal disparity

The effect of additive Gaussian white noise on the scale maps of a stereo pair for the linear disparity case can be seen in the experiments of chapter 5.
In practice, real images exhibit very non-linear disparity functions. This can be seen in figure 6.6, which shows the scale maps of a real image pair. Splitting and merging of contours can be seen. However, the human eye can easily determine the correspondences between these two maps. This indicates that it may be possible to fashion a scale space matcher that will work on real stereo imagery.

FIGURE 6.6 The scale maps of a real stereo image pair.

6.5 - Implications for Biological Depth Perception

The above scale space matching techniques raise a number of interesting questions about whether or not such processes may be used in biological vision systems. Marr and Hildreth (1980) and Marr and Poggio (1981) have pointed out current physiological evidence that the human visual system does contain ∇²G type filtering mechanisms that cover a wide range of resolutions or scales. Marr and Ullman (1981) go on to claim that certain neurons in the visual cortex may be performing zero crossing detection. Thus, it would seem that all the neural machinery is present to, at the very least, compute some form of scale map. What is not clear is whether there are mechanisms, higher up in the visual processing hierarchy, that perform the matching of these scale map like representations. If such mechanisms exist, are they like the ones discussed earlier (i.e. tree matching), or do they use some other algorithm? We cannot say at present, and must wait for further neurophysiological research. One can obtain evidence for scale map based processing in the human visual system via psychological experiments. One such piece of evidence is the observation that people can fuse stereograms of sparse line drawings as well as dense random dot stereograms. The discrete multi-resolution matching algorithms analyzed earlier can be seen to have a lot of trouble with very sparse feature sets, since the reconstruction process on sparse feature sets will be very error prone. Since the scale map method does not require any reconstruction to take place, the density of the feature set does not affect the quality of the correspondences that are achieved.
Julesz (1970) pointed out the apparent differences in the performance of the human visual system on sparse and dense feature sets, and proposed that there be two mechanisms for performing stereopsis. One would work on the densely featured images and the other would perform the relatively simple task of matching sparse images. This conclusion is perfectly valid, except that a single method that could perform both of these tasks would be preferable, if only on the grounds of simplicity and elegance. According to current thought in biological circles, development of the human visual system is due to evolutionary pressures. That is, any feature of the visual system, such as depth perception, was developed because in doing so the survivability of the organism was in some way enhanced with the addition of the feature. However, while it is fairly easy to come up with ways in which the addition of depth perception would benefit an organism, it is difficult to see what advantage there would be in handling very sparse imagery such as random dot stereograms, which rarely, if ever, occur in nature. Thus it is unreasonable to expect the human visual system to evolve a separate depth perception mechanism capable of handling only very sparse imagery. The conclusion we can make is that the human depth perception mechanism, while developed to handle dense imagery, can also, in the course of its functioning, handle very sparse imagery as well. The point to be made is that scale map matching methods can handle dense and sparse images with equal facility and thus, in the light of the psychological observation noted above, would be preferred over a method such as the discrete
multi-resolution method, whose performance degrades as the feature density decreases.

6.6 - Summary of Chapter 6

- A multi-resolution matching algorithm that used disparity measurements obtained at high resolution only, and performed disparity function reconstruction at high resolution only, would perform better than one that used low resolution (and higher error) disparity measurements and disparity function reconstruction. Scale space matching algorithms need only measure and reconstruct the disparity function at the highest available resolution, thereby minimizing resolution dependent errors.
- A constraint based method for matching scale maps for linear disparities is proposed.
- This algorithm is successfully applied to random image pairs (chapter 5.4).
- Scale space matching of two dimensional image pairs is problematic due to the increased computational requirements.
- Nonlinear disparity functions distort the scale maps, making the determination of correspondences quite difficult.
- Scale maps can handle sparse stereo pairs just as easily as dense stereo pairs, which is not the case for the other multi-resolution matching methods described in this thesis (but is the case for the human vision system).

VII - BINOCULAR DIFFREQUENCY

7.1 - Introduction

In chapter 4.3 we proved, using the Scale Space Transform, that the measurements of position disparity made from the zero crossings of ∇²G filtered images will not be exact if the true disparity function is not constant. This error was shown, for random noise images, to increase as the filter resolution decreased. This observation leads one to ask whether or not there is some measurement that can be made, other than position disparity, that provides a better shape descriptor. In this chapter we show that one can obtain, from the Scale Map, measurements of binocular diffrequency, loosely defined as the difference in spatial frequency content of the two image functions.
It will be shown that, in the case of linear disparity functions, these diffrequency values are directly related to the disparity gradient, and contain none of the errors induced by spatial filtering that were observed in the position disparity measurements. We can conclude that for low spatial resolutions, where the position disparity errors are large, the diffrequency measurements may be a better descriptor of the surface shape. We present the findings of some psychophysical research done by others which tends to support this conclusion. The results of experiments on one dimensional functions are presented which show that the measurements of diffrequency from the scale maps do provide a good estimate of the disparity gradient, and that the errors in these measurements are more or less independent of the spatial resolution of the filters and are due primarily to quantization. Figure 7.1 shows the structure of this chapter.

FIGURE 7.1 The topics to be covered in this chapter: Introduction (7.1); Measuring diffrequency from a pair of scale maps (7.2); Psychophysical evidence for diffrequency (7.3); Experiments on linear disparities (7.4); Summary (7.5).

7.2 - Diffrequency Measurement

Let us consider a situation wherein we are viewing a surface that gives rise to a disparity function with a constant, non zero, gradient when viewed binocularly. For example, a tilted planar surface viewed at a distance large compared to the inter-ocular baseline gives rise to an approximately constant disparity gradient. The true disparity function has the following form in this case.
d(x_L) = x_R − x_L = β₀ + β₁x_L    (7.2.1)

It can be seen that if the left eye sees a light intensity pattern g(x) and the right eye a light intensity pattern f(x), then g(x) and f(x) are related as follows:

g(x_L) = f(x_R)    (7.2.2)

and hence we can write, using equation (7.2.1):

g(x_L) = f(β₀ + (1+β₁)x_L)    (7.2.3)

We can now, using equation (4.3.7), show that the SSTs of the left and right images are related as follows:

G(x,σ) = (1+β₁)² F(β₀ + (1+β₁)x, (1+β₁)σ)    (7.2.4)

or

F(x,σ) = (1+β₁)⁻² G((x−β₀)/(1+β₁), σ/(1+β₁))    (7.2.5)

Note that the scale map of f(x) is obtained from the scale map of g(x) by a uniform expansion by a factor 1/(1+β₁) in both the x and σ directions. Look again at figure 4.5, which shows a scale map pair for a tilted surface. It is clear that the left and right scale maps are scaled versions of each other.

Foveal Diffrequency

If we make the coordinate transformations:

r = log₂(σ) and y = x/σ    (7.2.6)

it can be shown that:

G~(y,r) = 2^(−k) F~(y + β₀2^(−r), r+k)    (7.2.7)

where k = log₂(1+β₁), G~(y,r) = G(x,σ) and F~(y,r) = 2^r F(x,σ). It can be seen that, for r fixed to some value r₀, G~(y,r₀) is the same as a scaled and translated version of F~(y,r₀). This transformation is the type one would expect to find in a foveal system, where the highest resolution (smallest σ) elements are clustered in a central region and the lower resolution elements are more loosely distributed outwards toward the periphery. Such a foveal representation, similar to the human system, is shown schematically in figure 7.2. This diagram shows the field of view for each resolution level, where the number of elements in any one resolution level is the same. We have assumed that there are elements sensitive to the central region at all resolutions. There is evidence, both neurological and psychological, that this is indeed the case in humans (see for example, Wilson and Giese (1977), especially equation (10)).
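The scaling relation between the left and right transforms (equations 7.2.3 through 7.2.5) can be checked numerically. The sketch below verifies the underlying identity at the level of plain Gaussian smoothing; differentiating both sides twice gives the corresponding relation for the ∇²G filtered images, and hence for their zero crossings. The test function f and the values of β₀ and β₁ are arbitrary choices of ours:

```python
import math

def gauss(t, s):
    return math.exp(-t * t / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def smooth(func, x, s, h=0.01, span=8.0):
    # Numerical convolution of func with a Gaussian of width s (Riemann sum).
    n = int(span * s / h)
    return sum(func(x - k * h) * gauss(k * h, s) * h for k in range(-n, n + 1))

# A test "image" f and its linearly disparate partner g(x) = f(b0 + (1+b1)x),
# as in equation (7.2.3); f, b0 and b1 are arbitrary test choices.
f = lambda x: math.sin(0.7 * x) + 0.5 * math.cos(1.9 * x)
b0, b1 = 0.3, 0.25
a = 1.0 + b1
g = lambda x: f(b0 + a * x)

# The smoothed left image at (x, sigma) equals the smoothed right image at
# (a*x + b0, a*sigma): one scale map is a uniformly scaled copy of the other.
x, sigma = 0.8, 1.5
lhs = smooth(g, x, sigma)
rhs = smooth(f, a * x + b0, a * sigma)
print(abs(lhs - rhs))  # quadrature error only
```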
Applying the transformations of equation (7.2.6) to the foveal representation of figure 7.2 results in the representation shown in figure 7.3. The most striking feature of this image representation is its homogeneity, which is reminiscent of the spatial frequency columns found in the human visual cortex (Maffei and Fiorentini (1977)). Because of the effect of the transformation (7.2.6) on the foveal representation, we will denote the image of the SST under this transformation as the foveal scale space transform or FSST.

FIGURE 7.2 The spatial organization of a foveal image representation.
FIGURE 7.3 The transformed version of the foveal image representation under the transformation (7.2.6).

The FSST of a function f(x) is defined as follows:

F~(y,r) = (2^(−r)/√(2π)) d²/dy² ∫ f(v2^r) e^(−(y−v)²/2) dv    (7.2.8)

The set of zeroes of the FSST will be referred to as the foveal scale map of f(x). If one knows β₀ and β₁, one can graphically construct the foveal scale map of g(x) from the foveal scale map of f(x), where f(x) and g(x) are the right and left eye image functions. In order to do this, the correspondence problem must be solved. That is, one must be able to tell which contours in the foveal scale map of f(x) correspond with those in the foveal scale map of g(x). We will assume that this correspondence problem can be solved without any error. Using the relation expressed in equation (7.2.7) we can see that a point (y,r) in the foveal scale map of f(x) is the same as the point (y + β₀2^(−r), r+k) in the foveal scale map of g(x), where k = log₂(1+β₁). Hence one can obtain the foveal scale map of g(x) by adding to each point on the foveal scale map of f(x) the vector (β₀2^(−r), k). We will call the elements of this vector the foveal disparity (β₀2^(−r)) and the foveal diffrequency (k). This construction process is illustrated in the example shown in figure 7.4. The important point to be made about the above process is that it is, in principle, invertible.
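The foveal coordinate transformation and the correspondence vector of equation (7.2.7) can be illustrated directly. In this sketch (variable names are ours), a feature at (x,σ) in one map is mapped to its partner using equation (7.2.3), and both are converted to foveal coordinates; the two foveal points then differ by (β₀2^(−r), k) with k = log₂(1+β₁), the 2^(−r) factor coming out at the shifted point:

```python
import math

def to_foveal(x, sigma):
    # The coordinate transformation of equation (7.2.6): y = x/sigma, r = log2(sigma).
    return x / sigma, math.log2(sigma)

b0, b1 = 0.5, 0.25          # arbitrary test values of the disparity parameters
k = math.log2(1.0 + b1)

# By equation (7.2.3), a feature at x in one image corresponds to one at
# (1+b1)x + b0 in the other, at a scale larger by the factor (1+b1).
x, sigma = 2.0, 4.0
y_f, r_f = to_foveal(x, sigma)
y_g, r_g = to_foveal((1.0 + b1) * x + b0, (1.0 + b1) * sigma)

# In foveal coordinates the two points differ by the vector (b0*2**-r, k):
# the vertical offset alone determines the disparity gradient.
print(r_g - r_f)   # equals k (up to rounding)
print(y_g - y_f)   # equals b0 * 2**(-r_g) (up to rounding)
```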
That is, given that we know the foveal scale maps of both f(x) and g(x), we should be able to determine β₀ and β₁, the parameters of the true disparity function. This inversion process is the basis of diffrequency stereo: the determination of disparity gradients from frequency differences. Let us assume that β₀ is zero (this may be a good assumption for human vision systems, since the disparity at the centre of the visual field, or fovea, is always being driven to zero by the vergence mechanisms). In this case the left and right scale maps are related as shown in figure 7.5. In particular, the point in the right scale map corresponding to a given point in the left scale map is obtained by finding the intersection of the line passing through the origin and the given point in the left scale map with the corresponding right eye scale map contour. It can be observed that the ratio of the σ values of corresponding right and left eye points is a constant, and that this ratio is equal to (1+β₁).

FIGURE 7.4 The relationship between the left and right foveal scale maps.

Thus, if we measure this ratio, the disparity gradient can be easily obtained. If we transform to the foveal scale space, the problem is even simpler, as shown in figure 7.6. In this case, corresponding points in the two foveal scale maps are located directly above each other, and the vertical shift is equal to k = log₂(1+β₁). From this measurement the disparity gradient can again be obtained.

FIGURE 7.6 The relationship between the left and right foveal scale maps for β₀ = 0.

7.3 - Psychophysical Evidence For Diffrequency Stereo

Blakemore (1970) and Fiorentini and Maffei (1971), in independent experiments, presented human subjects with binocularly ambiguous stimuli consisting of sinusoidal gratings whose period was different in each eye. Such a stimulus pair is shown in figure 7.7.
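The inversion just described, with β₀ = 0, amounts to reading the disparity gradient off either the σ ratio of corresponding points (figure 7.5) or the vertical shift between the foveal maps (figure 7.6). A minimal sketch (function names are ours):

```python
import math

def gradient_from_sigma_ratio(sigma_left, sigma_right):
    # sigma_R / sigma_L = 1 + b1 for corresponding points when b0 = 0.
    return sigma_right / sigma_left - 1.0

def gradient_from_foveal_shift(k):
    # k = log2(1 + b1) is the vertical shift between the foveal scale maps.
    return 2.0 ** k - 1.0

b1 = 0.25                                  # assumed true disparity gradient
sigma_left = 3.0
sigma_right = (1.0 + b1) * sigma_left
print(gradient_from_sigma_ratio(sigma_left, sigma_right))   # 0.25
print(gradient_from_foveal_shift(math.log2(1.0 + b1)))      # recovers 0.25
```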
A unique disparity function can not be found for such stimuli, yet the subjects did not perceive any rivalrous surfaces, but instead perceived a stable tilted surface. Blakemore claimed that this was evidence for a binocular mechanism based on the perception of frequency differences between the two eyes, as the usual binocular mechanism based on retinal position disparity would not give a unique result. Tyler and Sutter (1978) improved on Blakemore's experiment in an effort to eliminate all possibility of the use of any position disparity information. They used dynamic (i.e. time varying) random noise images such that the left and right eye images were uncorrelated. Thus there were no corresponding features in the left and right images that could be used to obtain position disparity measurements. The spatial frequency ranges of the two images were, however, different. Thus, if there is any binocular mechanism in the human visual system based on diffrequency measurements, then a perception of depth should be elicited. This is in fact what the subjects tested by Tyler and Sutter reported. The effect was weaker than for the pure sinusoidal grating stimuli, but was nonetheless present. They also found that the diffrequency mechanism operated only at medium or high diffrequency values; presumably because the error involved in using only position disparity information at low diffrequencies is small. This is in accord with the findings of chapter 4.4, wherein we showed that the disparity error does in general increase as the disparity gradient increases.
Tyler and Sutter argue that diffrequency measurements could be made at low spatial frequencies, and as such could be useful for animals, such as crabs and some reptiles, that have overlapping visual fields but no conjunctive eye movements. The equations derived above relating binocular diffrequency to the disparity gradient are exact. That is, there is no inherent error in the measurement of diffrequency for linear disparity gradients, in contrast to the case of position disparity measurement.

FIGURE 7.7 A pair of ambiguous sinusoidal stimuli.

This would indicate that diffrequency measurements made at low resolutions would be just as precise as those made at high resolutions. Thus Tyler and Sutter's argument seems to be supported by our analysis. It should be pointed out here that we can not obtain an expression for the relationship between the scale maps for a nonlinear disparity function. In such a case the diffrequency measurements may exhibit a form of filtering error such as that derived for the disparity measurement case in chapter 4.3. However, as was pointed out in chapter 4.3, if the region of interest is sufficiently small, we can linearize the disparity function about the centre of that region. This linearization process will become less valid as the resolution decreases. It has been found (Blakemore, 1970) that a fused perception of a tilted surface can be obtained even if the cumulative horizontal position disparity exceeds Panum's fusional range.16 This behaviour can not be explained as the result of matching algorithms such as Marr and Poggio's (1979). It has been found that the limiting factor in obtaining binocular fusion is the disparity gradient and not the position disparity (Burt and Julesz, 1980, and Tyler, 1973). It can be seen that a disparity gradient limited fusional range can be readily explained by positing a diffrequency mechanism.

16 Panum's fusional range is the range of disparities over which a binocular stimulus consisting of a vertical line can be brought into correspondence by the human visual system, for a given position of the eyes.
If the diffrequency value exceeds the spatial frequency scatter of the diffrequency sensitive neurons, then a diffrequency value can not be obtained. Hence fusion will not be produced. In light of these psychophysical findings, it seems evident that there could exist in the human visual system diffrequency sensitive mechanisms performing depth processing in some fashion. There would need to be a scatter in the retinal receptive fields of the spatial frequency sensitive feature detector neurons that detect the neural information afferent from the foveal units. This scatter would need to be larger for the higher resolution neurons, in order to receive the foveal disparity (β₀2^(−r)). These diffrequency sensitive neurons, sensitive to different disparity ranges, would combine with the normal disparity detection mechanisms (see Barlow et al 1967) to feed signals to a set of higher level cortical neurons which would compute the β₀ and β₁ values (or at least values of parameters which in some sense describe the depth and tilt of the surface). The fact that corresponding foveal scale map contours lie right above each other (as shown by figure 7.6) means that the connections required for diffrequency measurement can be local and regular (see figure 7.2). The analysis presented in this section indicates that a possible reason for the existence of diffrequency measurement units in the human visual system, at low spatial frequencies at least, is to provide better surface shape descriptors than those provided by measurements of position disparity. In fact, since the measurements of position disparity at low resolutions are susceptible to the filtering errors discussed in chapter 4.3, the diffrequency measurements, which are immune to these errors (for linear disparities anyway), can be used to provide reliable matching information at low resolutions for the initial phases of the multi-resolution matching algorithms.
discussed at low resolutions Perhaps more importantly, disparities 4.3, the diffrequency anyway), for the initial measurements, can be used to provide phases of the multi-resolution the diffrequency measurement process provides measure of the surface orientation, which can then be used in conjunction with the position disparity measurements to provide an improved reconstruction of the surface 291 shape. 292 7.4 - Experiments In this section we present the results of experiments designed to test the adequacy of diffrequency measurements in estimating the disparity gradient of a surface, when that surface gives rise to a random, Gaussian, intensity distribution. The experiments proceeded as follows. We generated the left and right scale maps of one dimensional random, white, Gaussian distributed, functions for constant of -20/255, -40/255, -60/255 and -80/255. disparity gradients These scale maps are the same ones shown in figures 5.23 to 5.27. We then, for each disparity gradient case, match the left and right scale maps (using measure the simple method the diffrequency value of each left hand scale along which equation described map. Since the scale we measure for this curve 6.2). Then, for each maps have is not a a logarithmically scaled straight line, a but, rather, as follows. The diffrequency path corresponding point as /3j is varied) value of a we scale map contour crossing that value of o the disparity is derived in chapter axis, in the the path is curved. The (i.e. the path of a for linear (x,o) axes scaling through the point (in the left hand scale map) (x ,a ) is given by: 0 a a x/x — 0 0 (7.4.1) 0 Making the logarithmic scaling transformation: I = (255./6.65)LOG (a-2) (7.4.2) 2 that is used in the computation of the scale maps, we get I(x) = ,6.65I /255, (255./6.65)LOG ((2 + 2' )x/x 0 2 for the equation of the diffrequency path. Some figure 7.7. 
Notice how these paths are almost vertical near the x origin and flatten out towards large x values.

FIGURE 7.8 The diffrequency search paths for a logarithmically scaled σ axis.

Once all of the diffrequency values have been measured, the RMS error in these diffrequency measurements is then computed. The variation of the RMS diffrequency error with σ for each value of disparity gradient tested is shown in figures 7.9 to 7.12. In all the cases, the diffrequency error was on the order of 10% of the actual value for all σ values in the range tested. It can be seen, however, that the diffrequency errors for the linear disparity functions are not zero (whereas the theory predicts zero error). Also, the error is seen to rise with an increase in σ. These effects can be accounted for by the quantization in the scale map contour position measurements. An expression for the diffrequency quantization error variance as a function of the disparity gradient and σ is derived in the Appendix. The standard deviation of the quantization error (the square root of the variance) is a measure of the RMS quantization error. This is plotted in figure 7.13 as a function of σ for disparity gradients of −20/255, −40/255, −60/255 and −80/255. It is seen that the quantization error does indeed form a large portion of the observed RMS diffrequency measurement error.
FIGURE 7.9 The RMS diffrequency error as a function of σ for β₁ = −20/255, linear disparity
FIGURE 7.10 The RMS diffrequency error as a function of σ for β₁ = −40/255, linear disparity
FIGURE 7.11 The RMS diffrequency error as a function of σ for β₁ = −60/255, linear disparity
FIGURE 7.12 The RMS diffrequency error as a function of σ for β₁ = −80/255, linear disparity
FIGURE 7.13 The standard deviation of the diffrequency quantization error

The variation in the measured diffrequency error with σ (the diffrequency value itself should be independent of σ) can be explained by the fact that there are fewer contours for larger σ values, and hence fewer diffrequency measurements to be made. Since there are fewer measurements, the statistical fluctuations in the estimate of the mean diffrequency error increase.

7.5 - Summary of Chapter 7

- The disparity gradient of a surface can be obtained from measurements made directly on the scale maps of a stereo image pair of that surface.
- The binocular diffrequency, defined as the ratio of corresponding spatial frequencies in the scale maps of a one dimensional image pair, is equal to 1/(1+β₁), where β₁ is the disparity gradient of the surface.
- If one makes the coordinate transformation r = log₂(σ), y = x/σ to the scale map, one obtains the foveal scale map, so called because its structure resembles that of the human fovea, in that the density of elements is greater for high resolutions than for low. In the foveal scale map, corresponding features lie directly above each other, resulting in a simple implementation structure for the diffrequency measurement.
- The diffrequency process has been proposed by others as an alternate (to the standard position disparity stereo process) mechanism for the acquisition of depth information in animals.
- The diffrequency measurements, for linear disparities, are not affected by the spatial filtering process, as was the case for position disparity measurements. This suggests that, for low resolutions at least, diffrequency measurements may be more reliable measures of the surface shape than position disparity measurements.

- Experiments were done which show that diffrequency measurements can be obtained from random one dimensional stereo pairs for linear disparities, and that the errors in these measurements were on the order of the expected quantization error.

- The disparity gradient information obtained from diffrequency measurements is independent of the position disparity information. Therefore diffrequency measurements can be used in the disparity function reconstruction process (see section 3.8) to obtain a better reconstruction than is possible with position disparity information alone.

VIII - CONCLUSIONS AND A LOOK TO THE FUTURE

8.1 - Summary and Conclusions

In this thesis we have presented a very simple serial multi-resolution stereo feature matching algorithm. This algorithm was based on the Marr-Poggio (1979) algorithm. We have dispensed with their computationally intensive disambiguation and in-range/out-of-range detection mechanisms on the grounds that these mechanisms do not work reliably for non-constant disparity functions. To take the place of these mechanisms we proposed that a disparity function estimate, obtained from the lower resolution levels by a reconstruction process, be used to disambiguate competing matches at a given resolution. Two matching techniques for a given resolution level were proposed: the nearest neighbour matching scheme, which takes the match closest to the estimated match position to be the correct one, and the more computationally intensive dispscan algorithm, which takes as the correct match the one resulting in the largest correlation with the estimated disparity function in a neighbourhood about the match.
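The two per-level disambiguation rules just described can be sketched as follows (a minimal illustration; the names and the plain correlation coefficient are our choices, not the thesis code):

```python
def nearest_neighbour_match(candidate_disparities, estimate):
    # Nearest-neighbour rule: accept the candidate disparity closest to
    # the estimate reconstructed from the lower resolution levels.
    return min(candidate_disparities, key=lambda d: abs(d - estimate))

def dispscan_match(candidate_profiles, estimate_profile):
    # Dispscan rule: accept the candidate whose disparity profile over a
    # neighbourhood about the match correlates best with the estimated
    # disparity function over that neighbourhood.
    def correlation(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        da = sum((x - ma) ** 2 for x in a) ** 0.5
        db = sum((y - mb) ** 2 for y in b) ** 0.5
        return num / (da * db) if da > 0 and db > 0 else 0.0
    return max(candidate_profiles, key=lambda p: correlation(p, estimate_profile))

# A coarse-level estimate of 4.2 pixels resolves the ambiguity between
# the candidate disparities 1.0, 4.0 and 7.5 in favour of 4.0:
best = nearest_neighbour_match([1.0, 4.0, 7.5], 4.2)
```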
We predicted that, in order for such simple matching methods to work, the accuracy of the low resolution disparity estimate would need to be quite high. In this light we analyzed the mechanisms which would give rise to errors in the disparity estimate and tried to predict what effect they would have on the matching algorithms. These analyses have shown that the error in the disparity estimate, for white gaussian noise images, increases as the resolution decreases, as the disparity gradient increases, as the feature density (usually a function of resolution) decreases, as the camera signal to noise ratio decreases, and as the disparity range increases. We were not able to fully detail how these errors affect the matching process, due to the mathematical complexity. We were able to show that the nearest neighbour matching scheme can tolerate a certain level of error in the disparity estimate, which depends on the feature density and type, and still yield exact matching. Thus we can conclude that if the disparity estimate is sufficiently accurate at each resolution level (and the estimate can be less exact at lower resolutions than at high) then the matching algorithm will converge to near the correct value. This general statement was borne out by the experimental results given in chapter 5. The experimental evidence made clear the importance of the reconstruction process: the process of obtaining a complete disparity function estimate at each resolution level from the samples of the disparity function at that resolution. If the reconstruction was not done well, either because the reconstruction method was poor or because the true disparity function was varying too rapidly, there would be significant amounts of error in the final disparity function estimate. It was seen that, since the sources of error generally show spatial variations, the errors in the final disparity function estimate were also not uniformly distributed.
For example, regions where the disparity function changes rapidly will be prone to reconstruction and filtering error, while relatively smooth regions will be accurately measured.

We examined a number of techniques for performing the reconstruction process, which necessarily involves non-uniformly distributed samples. We pointed out that the relaxation method proposed by Grimson (1981b) was based on assumptions (his surface consistency constraint) which were not always valid. This fact, combined with the shortcomings of other known methods for reconstructing functions from non-uniformly distributed samples, led us to the development of our own method, which we call the warping or transformation method. This method was seen to perform better than the relaxation method or an averaging method in regards to minimizing the large isolated errors that the matching algorithm produces when it diverges. We have also pointed out that the process of reconstructing the disparity function from its samples also smooths the errors in these samples somewhat. This may explain why the crude 9x9 averaging reconstruction technique worked well (apart from its propensity for localized error): the averaging of the disparity values also averages the disparity errors, which, being zero mean and uncorrelated, should average out to zero.

The key factor in obtaining high accuracy from our simplified matching algorithm is in keeping the disparity measurements and the reconstructed disparity function estimate sufficiently accurate at all resolutions. However, at low resolutions, which are the most important in terms of the convergence of the algorithm, obtaining accurate disparity measurements is difficult. We have proposed two possible answers to this problem. The first is to make more accurate measurements of the disparity function at very coarse resolutions. We have shown that this can be done by using diffrequency measurements, which are immune to spatial filtering error (for linear disparities at least) and can be easily measured (implying a very simple implementation). The diffrequency measurement mechanism may possibly be important as a vision module of its own, due to its ability to make accurate measurements of the disparity gradient at the lower resolutions. The second answer lies in matching scale space representations as a whole. Scale maps can be matched without making any disparity measurements or reconstructions. Once the scale maps have been matched, the disparity can be measured at the highest resolution in the scale map, and then reconstructed to give a complete disparity function.

We have shown that the simplified multi-resolution matching algorithm can be successfully applied to the industrial task of automated log scaling.
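The warping idea behind our transformation method — map the non-uniform sample points onto a uniform lattice with a one-to-one mapping γ and interpolate there, as in Theorem 3.3 of chapter 3 — can be sketched in a few lines. The piecewise-linear γ below is only an illustrative choice of mapping:

```python
import math

def sinc(x):
    # The interpolation kernel g of the uniform sampling theorem.
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def gamma(t, sample_points):
    # A piecewise-linear one-to-one mapping with gamma(t_n) = n; any
    # continuous one-to-one mapping with this property would serve.
    if t <= sample_points[0]:
        return t - sample_points[0]
    for n in range(len(sample_points) - 1):
        t0, t1 = sample_points[n], sample_points[n + 1]
        if t <= t1:
            return n + (t - t0) / (t1 - t0)
    return len(sample_points) - 1 + (t - sample_points[-1])

def reconstruct(t, sample_points, sample_values):
    # f(t) = sum_n f(t_n) g(gamma(t) - n), evaluated at a single point t.
    g = gamma(t, sample_points)
    return sum(fn * sinc(g - n) for n, fn in enumerate(sample_values))
```

At a sample point t_n the warped argument is exactly n, so the reconstruction interpolates the samples.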
8.2 - Directions for Future Work

In this thesis we have tried to keep our algorithms simple, for the sake of rapid computation. However, increased matching accuracy may be obtained with the use of more complex matching algorithms. For example, the figural continuity constraint suggested by Grimson (1985) and by Ohta and Kanade (1985) may provide improved performance. However, these algorithms must be made more computationally efficient. Incorporating information from other vision modules, for example shape from shading, will certainly help the convergence of the matching algorithm (see Ikeuchi, 1983).
Information obtained as a result of a recognition process can also be used to aid in the matching process. For example, the vision module may decide that it is looking at a box (or a log) and use a 3D model of the object to guide the stereo matching process.

Also, more complex reconstruction algorithms may be examined. The current reconstruction techniques have problems with non-bandlimited functions, such as disparity functions with discontinuities (which real scenes usually have). Grimson (1981b) and Terzopoulos (1982) have discussed methods for handling the reconstruction of functions with discontinuities. They have apparently not pursued this any further. It may be that what is required is a change in the representation of the disparity data for a scene from a functional form, wherein each point has only one disparity value, to an object based representation, wherein each point of an object has a unique disparity value. Reconstruction would then be performed over a given object, independently of all the other objects. Note that this method requires input from the higher cognitive levels of the vision system.

The scale space methods so briefly described in this thesis need much improvement. In particular, matching of scale maps of non-linear disparity functions should be looked into. Two dimensional scale space matching needs examination, as does the problem of the excessive computation required to calculate the scale maps for an entire two dimensional image. Perhaps approximate methods based on quantizing the scale space or coding the scale space contours could be devised.

From an engineering standpoint the obvious next step is to implement the algorithms described herein in hardware capable of real time operation. This is especially so for the case of the log scaling application described in chapter 5, where processing times on the order of seconds are required, as opposed to the 20 minute span taken by the general purpose minicomputer.
We have addressed this problem of hardware implementation in some detail elsewhere (Clark and Lawrence, 1984; Clark and Lawrence, 1985b).

APPENDIX I - PROOFS OF THEOREMS STATED IN CHAPTER 3

Proof of THEOREM 3.1 :

Proofs of this theorem can be found in many places. For a survey of these see (Jerri, 1977).

Proof of THEOREM 3.2 :

The proof of theorem 3.2 can be found in (Jerri, 1977).

Proof of THEOREM 3.3 :

Since h(τ) is bandlimited to ω₀ = π, we can write (from Theorem 3.1):

h(τ) = Σ_n h(n) g(τ - n)   (1)

Now, since a one-to-one continuous mapping γ(t) exists such that n = γ(t_n) and τ = γ(t), we have that

h(γ(t)) = Σ_n h(γ(t_n)) g(γ(t) - n)   (2)

Because h(γ(t)) = f(t), we have that

f(t) = Σ_n f(t_n) g(γ(t) - n)   (3)

Q.E.D.

Proof of THEOREM 3.4 :

For a proof of this theorem see Petersen and Middleton, 1962.

Proof of THEOREM 3.5 :

From Theorem 3.4 we have that:

h(ξ) = Σ_s h(ξ_s) g(ξ - ξ_s)   (4)

Now, since γ(x_s) = ξ_s and γ(x) = ξ, we have that:

h(γ(x)) = Σ_s h(γ(x_s)) g(γ(x) - γ(x_s))   (5)

The condition that the Jacobian of the transformation ξ = γ(x) be non-zero everywhere, along with the condition that h(γ(x)) = f(x), then gives:

f(x) = Σ_s f(x_s) g(γ(x) - γ(x_s))   (6)

Q.E.D.

Proof of THEOREM 3.6 :

A mapping γ(x) is one-to-one and continuous if the determinant |∂γ(x)/∂x| of its Jacobian matrix is non-zero everywhere. At the interior points of P_x (those points that are not vertex or link points of the partition) we have γ as defined in equations (27), (28) and (29). We can rewrite (27) as follows:

γ(x) = A⁻¹[Γᵀ U A⁻¹ x + C]   (7)

where Γ = [τ₁ τ₂ τ₃] and U is the matrix of sample point coordinate differences of equation (8):

U = | (x²₂-x²₃)  (x¹₃-x¹₂)  1 |
    | (x²₃-x²₁)  (x¹₁-x¹₃)  1 |   (8)
    | (x²₁-x²₂)  (x¹₂-x¹₁)  1 |

The vector C is of no consequence in this proof, as it does not appear in the expression for the Jacobian. A is as given in equation (29).
After some algebraic manipulation we get that:

J = |A|⁻² |Γᵀ U| = αβ/|A|²   (9)

where

α = x¹₁(x²₂ - x²₃) + x¹₂(x²₃ - x²₁) + x¹₃(x²₁ - x²₂)   (10)

β = τ¹₁(τ²₂ - τ²₃) + τ¹₂(τ²₃ - τ²₁) + τ¹₃(τ²₁ - τ²₂)   (11)

and the superscripts denote the two components of the points. It can be shown that α = 0 iff the points x₁, x₂, x₃ are collinear, and that β = 0 iff the points τ₁, τ₂, τ₃ are collinear. Since the points τ₁, τ₂, τ₃ of the hexagonal lattice are not collinear, β ≠ 0, and the Jacobian at the interior points of P_x will be nonzero when the points in the set V(x) are not collinear. In order that γ be one-to-one and continuous at the links of P_x, we must ensure that the values of the Jacobians on either side of a link of P_x have the same sign.

Consider figures 1.1 and 1.2, which illustrate the mapping of points in x space to points in τ space. We have mapped the points x₁, x₂, x₃ in x space to the points τ₁, τ₂, τ₃ of the hexagonal lattice in τ space, creating the partition regions P₁ and D₁. Let us assume that α > 0 and that β > 0 for this mapping. We now wish to map a sample point x₄ in x space to the point τ₄ of the hexagonal lattice in τ space, creating the partition regions P₂ and D₂ that share a common link with P₁ and D₁. Imagine that τ₄ was not constrained to lie on a vertex of the hexagonal sampling lattice, but could lie anywhere in τ space. It can be seen that if τ₄ was to lie anywhere on the line through τ₁ and τ₂, β would be zero (as the points τ₁, τ₂ and τ₄ would then be collinear); furthermore, only along this line can β be zero. Hence if τ₄ lies on the same side of the line through τ₁ and τ₂ as does τ₃, then β is positive, and if it lies on the opposite side then β is negative. Now, since τ₄ lies on the opposite side of this line on the hexagonal sampling lattice, β for the region formed by τ₁, τ₂ and τ₄ is negative.

FIGURE 1.1 A portion of the partition P_x in x space.
FIGURE 1.2 A portion of the partition D_x in τ space.

Hence, for the Jacobian of γ to have the same sign in P₂ as in P₁, α must also be negative for P₂. That is, x₄ must lie on the side of the line through x₁ and x₂ opposite to x₃; in other words, the region formed by the points x₁, x₂ and x₄ must not overlap the region formed by the points x₁, x₂ and x₃. That is, P_x must be a tessellation for the mapping function γ to be one-to-one and continuous.

Q.E.D.

Proof of THEOREM 3.7 :

The proof of this theorem can be found in (Petersen and Middleton, 1964).

Proof of THEOREM 3.8 :

The proof of this theorem follows from the proof of theorem 3.7.

APPENDIX II - A METRICALLY ORDERED SEARCH ALGORITHM

This appendix describes the nearest neighbour search algorithm that is used to perform searches in the transformation reconstruction algorithm described in chapter 3. The algorithm is a modification of the one in (Hall, 1982) and performs a diamond search over arbitrary convex regions (whereas Hall's algorithm performs a rectangular search over a rectangular region). The algorithm can be extended to search over arbitrary regions by modifying the tests for reaching the region boundaries. The improvement of the algorithm given here over that of Hall is that the search is metrically ordered. That is, no point is searched before a point that is closer (using a Euclidean distance metric) to the start point. The search is performed along the path shown in figure 2.1. Note how the search path changes in response to reaching a boundary of the search region. Most of the complexity in the algorithm is due to the handling of these boundary conditions.

The edges of the search path are divided into five different types. These types, numbered 1, 2, 3, 4 and 5, are defined as follows:

Edge Type 1: Edges along the lower right side of the diamond.

Edge Type 2: Edges along the upper right side of the diamond.

Edge Type 3: Edges along the upper left side of the diamond.

Edge Type 4: Edges along the lower left side of the diamond.
Edge Type 5: The short edge causing an offset in the search path between edge types 4 and 1.

Examples of these edge types are shown in figure 2.1.

FIGURE 2.1 The diamond search path and the five edge types.

The edges, along which the search takes place, can be in one of five modes. These modes describe the relation of the edge to the boundaries of the search region. The five modes are summarized as follows:

Mode 1 = I : Mode I represents the case wherein an edge lies completely within the search region. That is, no part of the edge lies in the boundary.

Mode 2 = IO : Mode IO represents the case wherein the initial portion of the edge lies within the search region and the rest of the edge lies beyond the boundary.

Mode 3 = OI : Mode OI represents the case wherein the initial portion of the edge lies outside the boundary and the rest of the edge lies within the search region.

Mode 4 = OIO : Mode OIO represents the case wherein the initial part of the edge lies outside the boundary, the middle part lies within the search region, and the final part of the edge lies outside the search region.

Mode 5 = B : Mode B represents the case wherein the entire edge lies outside of the search region.

Examples of edges of each of these modes are given in figure 2.2.

FIGURE 2.2 Examples of edges in the five edge modes.

The operation of the algorithm is fairly simple. The search merely cycles from edge 1 to edge 2 to edge 3 to edge 4 to edge 5 to edge 1, and so on. After each cycle the length of the edges is increased by one pixel. When the search along any one of the edges encounters a boundary, the mode of that edge, and of the next edge, is altered to indicate this fact. The search point (X,Y) and the endpoint of the previous edge (E(i-1)) are also altered when the search along edge i reaches a boundary. The exception is edge type 5, whose parameters (length, mode) are never changed. In this way the search pattern shown in figure 2.1 is obtained.
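The ordering property of the diamond search — no point visited before a closer one — can also be captured in a few lines with a priority queue. This sketch trades the edge and mode bookkeeping of the appendix for a heap, and the predicate names are ours:

```python
import heapq

def metrically_ordered_search(start, in_region, found):
    # Visit the lattice points of the search region in non-decreasing
    # Euclidean distance from `start`, stopping at the first point for
    # which `found` is true. `in_region` plays the role of the boundary
    # tests of the diamond search.
    x0, y0 = start
    heap = [(0.0, start)]
    seen = {start}
    while heap:
        _, (x, y) = heapq.heappop(heap)
        if found((x, y)):
            return (x, y)
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in seen and in_region(nxt):
                seen.add(nxt)
                d = ((nxt[0] - x0) ** 2 + (nxt[1] - y0) ** 2) ** 0.5
                heapq.heappush(heap, (d, nxt))
    return None  # region exhausted without a hit
```

The explicit diamond path of the appendix achieves a comparable visiting order without storing a queue of frontier points, which is the attraction of the edge bookkeeping above.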
A pseudo-high-level language description of this algorithm is given below.

begin
    ** (X,Y) is the starting point **
    ** d is the search step size **
    E1 <- X + d;                      ** Initialize the edge directions **
    E2 <- Y - d;
    E3 <- X - d;
    E4 <- Y + d;
    for i <- 1 until 4 do m(i) <- I;  ** Initialize the edge modes to I **
    TEST(X,Y);
L:  SEARCH EDGE 1;                    ** Search along the edge directions **
    SEARCH EDGE 2;
    SEARCH EDGE 3;
    SEARCH EDGE 4;
    SEARCH EDGE 5;
    go to L;
end

procedure DIAGONAL SEGMENT SEARCH(dx,dy,X1,X2,Y1)
begin
    N <- floor(|(X2-X1)/dx|);
    if N = 0 then return;
    for i <- 1 until N do
    begin
        X1 <- X1 + dx;                ** Increment the search position **
        Y1 <- Y1 + dy;
        TEST(X1,Y1);                  ** Test for the quantity being searched for **
    end
end

procedure SEARCH EDGE 1
begin
    dx = 1;                           ** Initialize the direction of search **
    dy = -1;

    Step 1: [Mode I edge generation]
    if m(1) = I then
    begin
        if E1 > XRIGHT then go to HITBOUND1;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,E1,Y);
        E4 <- E4 + d;
        return;
    end
HITBOUND1:
    begin
        if E1 > 2*XRIGHT - XSTART then go to ENTERB;
        m(1) <- IO;
        Let m(2) be adjusted to reflect that the initial segment of edge 2 is now outside of the search region;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,XRIGHT,Y);
        E4 <- E4 + d;
        return;
    end

    Step 2: [Mode IO edge generation]
    if m(1) = IO then
    begin
        if E1 > 2*XRIGHT - XSTART then go to ENTERB;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,XRIGHT,Y);
        E4 <- E4 + d;
        X <- E1;
        Y <- YSTART;
        return;
    end

    Step 3: [Mode OI edge generation]
    if m(1) = OI then
    begin
        if E1 > 2*XRIGHT - XSTART then go to ENTERB;
        Y <- YTOP;
        X <- E1 - Y + YSTART;
        if E1 > XRIGHT then go to HITBOUND2;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,E1,Y);
        E4 <- E4 + d;
        return;
    end
HITBOUND2:
    begin
        m(1) <- OIO;
        Let m(2) be adjusted to reflect that the initial segment of edge 2 is now outside of the search region;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,XRIGHT,Y);
        X <- E1;
        Y <- YSTART;
        E4 <- E4 + d;
        return;
    end

    Step 4: [Mode OIO edge generation]
    if m(1) = OIO then
    begin
        if E1 > 2*XRIGHT - XSTART then go to ENTERB;
        Y <- YTOP;
        X <- E1 - Y + YSTART;
        DIAGONAL SEGMENT SEARCH(dx,dy,X,XRIGHT,Y);
        X <- E1;
        Y <- YSTART;
        E4 <- E4 + d;
        return;
    end

    Step 5: [Mode B edge generation]
    if m(1) = B then
    begin
        E4 <- E4 + d;
        X <- E1;
        Y <- YSTART;
        return;
    end

    Step 6: [Outer right boundary first reached]
ENTERB:
    begin
        m(1) <- B;
        Let m(2) be adjusted to reflect that the initial segment of edge 2 is now outside of the search region;
        if all m(i) = B then return;
        E4 <- E4 + d;
        X <- E1;
        Y <- YSTART;
        return;
    end
end

procedure SEARCH EDGE 5
begin
    X <- X + 1;
    if X > XRIGHT then m(1) <- B
    else TEST(X,Y);
    return;
end

The procedures SEARCH EDGE 2, 3, and 4 are not written down here, as a space saving measure. The form of these procedures is the same as for procedure SEARCH EDGE 1; details such as the signs of dx and dy, and the detection and handling of the boundary conditions, are different.

APPENDIX III - DERIVATION OF COVARIANCE FUNCTIONS

In this appendix we derive the expressions for the covariance functions and their derivatives that are required in the body of the report.

One dimensional case.

We will first determine the autocovariance of the slice along σ = σ₀ of the one dimensional scale space transform. This slice is obtained by filtering a one dimensional white Gaussian signal, having power spectral density σ_f², with a filter having the following frequency response:

H(ω) = -σ₀ω² e^(-ω²σ₀²/2)   (1)

The power spectrum of the filtered signal is given by S(ω) = σ_f²|H(ω)|², giving:

S(ω) = σ_f²σ₀²ω⁴ e^(-ω²σ₀²)   (2)

The autocovariance of the filtered function is simply the Fourier transform of S(ω), so that we get:

ψ(τ) = (σ_f²/(8σ₀³√π)) [.25(τ/σ₀)⁴ - 3(τ/σ₀)² + 3] e^(-.25(τ/σ₀)²)   (3)

If one performs a McLaurin series expansion of e^(-.25(τ/σ₀)²), it can be seen that

ψ⁽ⁿ⁾(0) = (-1)^(n/2) σ_f² (n+4)! / [(n/2+2)! 2^(n+5) σ₀^(n+3) √π]   (4)

for n even, and is identically zero for n odd.

Two dimensional case.
and H . Let S](CJ) and S (w) 2 2 of H i and H 2 be the power spectral densities of respectively. Let us define PI(CJ) and P (CJ) to be the power spectral 2 2 respectively. These are related to Si and S 2 P,(o>) = \oWe~ ° \ P (u) |a w e u2 = 2 2 2e2/2 2 2 _ c j 2 a 2 e V 2 | 2 along with densities of as follows: 2 S,(w) (5) S (CJ) (6) 2 —cj cr e /2 2 where 2 2 2 is 2 O CJ C -x /2e a 2 d /dx 2 2 2 o/e\/2ire The . transforms of PI(CJ) the Fourier transform of the filter ^2(r) are function 2 and P (w) 2 autocovariance respectively. In functions ^ i ( r ) and order determine to these the Fourier functions we must find expressions for Si and S . 2 Let us define the following two dimensional functions: = f(x,y)»a5(x)V27r e" g (x,y) = f(x,y)»a5(x)/ /27r 9 /3y gl (x,y) 2 where (•) l 2 (7) y / 2 a 2 e " y V 2 a 2 (8) indicates the convolution operator. It can be seen that: g.(x,y ) = h,(x) (9) g (x,y ) = h (x) (10) 0 and 2 0 2 322 That is, h, and h Mersereau hi and h are slices of the two dimensional functions g, and g . We can use 2 2 and Oppenheim (1974) slice projection theorem to find from which we can then get Si and S . Using u 2 the the Fourier transforms of the slice projection theorem we can write: 2 U22 2n /Ih,(x)} = o.SZ o t~ ° 4o flh (x)} = - a T "a^j'e""^ a 2 = o^Vl-n (11) and 2 7 2 dw = 2 a J IK la (12) Hence we have that: 2 2 S,(o>) = Oj o 27r (13) S (w) = oflir/o (14) and 2 2 Therefore we obtain: P,(w) = ofl-noW e P2(CJ) ofltioW e ~" 2 e , a J (15) _ c j 2 e J a 2 (16) and = Evaluating the inverse Fourier transforms and setting e -1 yields: 323 \MT) = .25a o /7r[.25(T/a) -3(r/a) + 3 ] e " i 2 , l 2 2 5 ( T / ( 7 ) 2 (17) -4 and \p {r) = \|/,(r). As before o 2 it can be shown that the value of the derivatives of these functions at zero are given by: xjjS \0) = (-l) n n/2 i / * ( n + 4)!/[(n/2 + 2)!2 n+ 4 a ~ ] n (18) 1 for n even and are zero for n odd. The cross-covariance mean of ^ function \jj of H , and H u can be shown to be the geometric 2 and \p . 
Thus, we have: 2 = iMO .25a ; V7r/a[.25(T/a) -3(T/a) + 3 ] e " 4 2 2 5 ( T / a ) 2 (19) and ^ ( )(0) 12 n = ( - l ) V 7 r ( n + 4)!/t(n/2 + 2)!2 n/ n+ 4 a n + 1 ] (20) for n even and are equal to zero for n odd. One dimensional slice of a two dimensional function Consider the 2D filter with the following frequency response: H(CJ,^ ) 2 = J a*(w 1 +a)' 2 )e" By the slice-projection theorem the following Fourier . transform: (w2l+6,J2)aV2 (Mersereau (21) and Oppenheim, 1974) a slice p(x) of h(x,y) has 324 P(u) = - o<sZ W u> )c-V » ^ ^ m aj/27r[u a + l ] e " 2 2 + + 2 2 /2 (22) 2 (23) u i a V 2 2 This result, apart from a scale factor of 27ro , was derived by Grimson (1981b). The power spectrum of p(x) P (u) 2 = can now be determined and is given by: 27ra [l + 2o u + o - V ] e " 2 2 2 (24) u 2 a 2 Taking the inverse Fourier transform of the power spectrum yields the covariance function of the signal obtained by passing white noise, with unit variance, through the filter. <//(T) = ( o j / 7 r / 4 ) [ l l - 5 ( T / a ) + .25(T/0) ]e" 2 4 T2 (25) 325 APPENDIX IV - DERIVATION OF THE DEPTH FROM DISPARITY EQUATION In this appendix we derive the relationship between the three dimensional position of a physical scene point being imaged, and the two dimensional positions of the image of the scene point in the image planes of the two cameras. Figure 4.35 depicts the geometry of the imaging situation. From this diagram we can see that: tan(2/3+0,) = ( a+ xyOAl-aXj/f) r (26) where tan(2)3) Xj (27) is the horizontal position of the imaged point in the image plane of camera 1, and f is the focal length of camera 1. Similarly we have that: tan(-0 ) 2 = -x /f (28) 2 It can be seen that: X = Z x / f and Y = Zy /f 2 (29) 2 where (X,Y,Z) is the three dimensional position of the scene point being imaged. Thus the X and Y coordinates of the scene point are functions of the depth Z, the camera focal length, and the image plane coordinates of the scene point in camera 2. 
From figure 4.35 we can see that: d x = Ztan(-0 )+(Z-d )tan(20+0 ) 2 z 2 (30) 326 or, since d z = ad Z = d [ l + atan(20 + x > x e)]/[tan(-e ) + tan(2/3+0 )] I 2 1 (31) Hence, substituting in the expressions for the tangents we have, after some algebra: Z = where D = X ] - x 2 f (l + a ) d / [ f D + a(f + x x )] 2 : x is the disparity. 2 1 2 (32) 327 APPENDIX V - T H E CONDITIONS FOR W E L L - B E H A V E D SLICES O F 2D SCALE MAPS In this appendix we derive the conditions on a two dimensional function which ensure that a given one dimensional slice of its scale map is itself a well behaved (in the sense of Yuille and Poggio, 1983a) scale map. A well behaved scale map comes from a • scale space transform of the form: F(x,a) = where Ii} USZj(u)e~ ~ (X is some linear (33) du} U)2/2o2 differential operator in x. Now, let F (x,a) be a slice of the 2 - D scale space transform defined by equation (4.3.57) (a skew factor of 1) as follows: F*(x.o) = V ;;" f*(u,v)(aV27r)e" 2 00 [(x " u)2 + ( y _ v ) 2 ; i / 2 a 2 dudv| y = y() (34) We can rewrite this as: F*(x,o) = Og(u,x) + h ( u ) ] e ( x u ) V 2 a 2 du (35) where g(u,x) = /r f*(u,v)/(27r) [(x-u)Vo -l] oo 2 e~ ( y , r v ) 2 / 2 a 2 dv (36) dv (37) and h(u) = /r.f.(u.v)/(2ir) [ ( y „ - v ) V a - l ] 2 e" ( y °" v ) 2 / 2 a 2 We can rewrite the scale space transform of equation (33) as follows: 328 F(x.cr) = = j" r o f(u)Ue~ ( x " u ) V 2 ° }du (38) 2 /VCuMx-iOe-^-^^du (39) where p(x-u) is a polynomial in (x-u). Thus, for F (x,o) to be well to be well behaved we require that: g(u,x) + h(u) = f(u)p(x-u) (40) where f(u) is any function of u and p(x-u) is a polynomial in (x-u). This implies that: g(u,x) = [p(x-u)-l]h(u) = p*(x-u)h(u) (41) * * where p is also a polynomial. 
From equations (36) and (37) we can see that p* can only be of the following form:

p^*(x-u) = k\left[(x-u)^2/\sigma^2 - 1\right]   (42)

where k is some constant. This condition means that:

\int_{-\infty}^{\infty} f^*(u,v)\, e^{-(y_0-v)^2/2\sigma^2}\, dv = k \int_{-\infty}^{\infty} f^*(u,v)\left[(y_0-v)^2/\sigma^2 - 1\right] e^{-(y_0-v)^2/2\sigma^2}\, dv   (43)

This equation can only be satisfied if f*(u,v) is separable, that is, if:

f^*(u,v) = f_1(u)\, f_2(v)   (44)

Thus we conclude that a one dimensional slice of a two dimensional scale map is itself a well behaved scale map if and only if the two dimensional function which produced the two dimensional scale map is separable with respect to the axis along which the slice is made.

APPENDIX VI - DERIVATION OF THE DIFFREQUENCY QUANTIZATION ERROR

The disparity gradient β₁ is a function of the scales of corresponding scale map contours, as detailed in chapter 7. If the scale in the left image is σ₁, and that in the right image is σ₂, then the disparity gradient is given by:

\beta_1 = \sigma_2/\sigma_1 - 1   (45)

so that 1 + β₁ = σ₂/σ₁. The values of the scales in our experiments are not determined exactly but are quantized. This means that the computed value of β₁ is also quantized, and is not exact. In this appendix we derive the probability density function of the error in the disparity gradient produced by this quantization, and also compute the variance of this quantization error. The quantization error is given by:

e = (\sigma_2 + e_2)/(\sigma_1 + e_1) - \sigma_2/\sigma_1   (46)
  = (e_2\sigma_1 - e_1\sigma_2)/\left[\sigma_1(\sigma_1 + e_1)\right]   (47)

where e₁ and e₂ are the quantization errors of the measured values of σ₁ and σ₂. We assume that e₁ and e₂ are uniformly distributed and independent of each other. Thus their joint probability density can be written as:

p_{e_1 e_2}(e_1, e_2) = 1/\left[(a_1 - b_1)(a_2 - b_2)\right] \quad\text{for } a_1 < e_1 < b_1 \text{ and } a_2 < e_2 < b_2   (48)

and is zero otherwise. The values of a₁, a₂, b₁ and b₂ are functions of the true values of σ₁ and σ₂, and are as follows for the logarithmic quantization used in our experiments:

a_1 = 2^{k(N_1 - 1/2)} - 2^{kN_1}   (49)
b_1 = 2^{k(N_1 + 1/2)} - 2^{kN_1}   (50)
a_2 = 2^{k(N_2 - 1/2)} - 2^{kN_2}   (51)
b_2 = 2^{k(N_2 + 1/2)} - 2^{kN_2}   (52)

where k = 6.65/255, 2^{kN_1} is the quantized value of σ₁ and 2^{kN_2} is the quantized value of σ₂, for N₁ and N₂ integers. Using the laws of transformation of variables for probability density functions we obtain:

p_e(e) = \int_{-\infty}^{\infty} p_{e_1 e_2}\big(e_1, e_2(e, e_1)\big)\left|\partial(e_1, e_2)/\partial(e_1, e)\right| de_1   (53)

Using equation (48) we can see that this expression can be rewritten as:

p_e(e) = \int_{l_1}^{l_2} (\sigma_1 + e_1)/\left[(a_1 - b_1)(a_2 - b_2)\right] de_1   (54)

where

l_1 = \max\left\{a_1,\ (a_2 - e\sigma_1)\sigma_1/\left[e\sigma_1 + \sigma_2\right]\right\}   (55)

and

l_2 = \min\left\{b_1,\ (b_2 - e\sigma_1)\sigma_1/\left[e\sigma_1 + \sigma_2\right]\right\}   (56)

and l₁ < l₂. If l₁ > l₂ then p_e(e) = 0. Thus we get, after integration:

p_e(e) = (l_2 - l_1)\left(\sigma_1 + (l_1 + l_2)/2\right)/\left[(a_1 - b_1)(a_2 - b_2)\right]   (57)

for l₁ < l₂, and zero for l₁ > l₂. Let us define L₁ and L₂ to be the values of e for which l₁ and l₂ switch between their two possible forms (equations (55) and (56)). These values are seen to be:

L_1 = (a_2\sigma_1 - a_1\sigma_2)/(a_1\sigma_1 + \sigma_1^2)   (58)
L_2 = (b_2\sigma_1 - b_1\sigma_2)/(b_1\sigma_1 + \sigma_1^2)   (59)

It can be seen that, since the cases l_1 = (a_2\sigma_1 - e\sigma_1^2)/(\sigma_2 + e\sigma_1) and l_2 = (b_2\sigma_1 - e\sigma_1^2)/(\sigma_2 + e\sigma_1) never occur at the same time, and since a₁ is always less than b₁, the only case for which l₁ > l₂ (and p_e is zero) is when:

a_1 > (b_2\sigma_1 - e\sigma_1^2)/(\sigma_2 + e\sigma_1)   (60)

or when:

b_1 < (a_2\sigma_1 - e\sigma_1^2)/(\sigma_2 + e\sigma_1)   (61)

Let us define L₃ and L₄ to be the values of e for which l₁ = l₂. It can be shown that these values are:

L_3 = (b_2\sigma_1 - a_1\sigma_2)/(a_1\sigma_1 + \sigma_1^2)   (62)
L_4 = (a_2\sigma_1 - b_1\sigma_2)/(b_1\sigma_1 + \sigma_1^2)   (63)

We can now obtain an expression for p_e(e) by substituting the proper values of l₁ and l₂ into equation (57) according to the value of e.
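The error model of equations (46)-(48) is straightforward to simulate. The sketch below is our own illustration (the function names and the sampling helper are ours, and the cell bounds follow the logarithmic quantization of eqs. (49)-(52) as reconstructed here): it draws uniform quantization errors over one quantization cell and evaluates e both by the direct form (46) and the collapsed form (47):

```python
import random

def quant_error_direct(s1, s2, e1, e2):
    # e = (s2 + e2)/(s1 + e1) - s2/s1                       (eq. 46)
    return (s2 + e2) / (s1 + e1) - s2 / s1

def quant_error_closed(s1, s2, e1, e2):
    # e = (e2*s1 - e1*s2) / [s1*(s1 + e1)]                  (eq. 47)
    return (e2 * s1 - e1 * s2) / (s1 * (s1 + e1))

def sample_errors(s1, s2, k=6.65 / 255, n=10000, seed=0):
    # The true scale lies within half a logarithmic step of the quantized
    # value s = 2^{kN}, so the error bounds of eqs. (49)-(52) reduce to
    # a = s*(2^{-k/2} - 1) and b = s*(2^{k/2} - 1).
    rng = random.Random(seed)
    lo1, hi1 = s1 * (2 ** (-k / 2) - 1), s1 * (2 ** (k / 2) - 1)
    lo2, hi2 = s2 * (2 ** (-k / 2) - 1), s2 * (2 ** (k / 2) - 1)
    return [quant_error_closed(s1, s2, rng.uniform(lo1, hi1), rng.uniform(lo2, hi2))
            for _ in range(n)]
```

With k = 6.65/255 the cells are narrow, so the sampled disparity-gradient errors are small (well under a few percent for scales of order unity); their empirical variance approximates the integral of eq. (64) below.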
We will not write down the expression here, as it is very tedious and does not convey much more than equation (57). The variance of the diffrequency quantization error can be computed as follows:

\sigma_q^2 = \int_{-\infty}^{\infty} e^2\, p_e(e)\, de   (64)

A closed form expression for this integral can be obtained but is not given here due to its length. The standard deviation (the square root of the variance) is plotted in figure 7.13 in chapter 7 as a function of σ₂ and β₁ (where we have set σ₁ = σ₂/(1 + β₁)).

References

1) Abramowitz, M. and Stegun, I.A. 1965, "Handbook of Mathematical Functions", Dover, New York
2) Ahuja, N. and Schacter, B. 1983, "Pattern Models", John Wiley and Sons
3) Baker, H.H. and Binford, T.O. 1981, "Depth from edge and intensity based stereo", Proc. 7th Int. Joint Conf. Art. Intell., Vancouver, B.C.
4) Barlow, H.B., Blakemore, C. and Pettigrew, J.D. 1967, "The neural mechanism of binocular depth discrimination", Journal of Physiology, London, Vol. 193, pp 327-342
5) Beutler, F.J. 1966, "Error free recovery of signals from irregularly spaced samples", SIAM Review, Vol. 8, No. 3, pp 328-335
6) Blakemore, C. 1970, "A new kind of stereoscopic vision", Vision Research, Vol. 10, pp 1181-1199
7) Burt, P. and Julesz, B. 1980, "A disparity gradient limit for binocular fusion", Science, Vol. 208, pp 615-617
8) Clark, J.J. and Lawrence, P.D. 1984, "A hierarchical image analysis system based upon oriented zero crossings of bandpassed images", in Multiresolution Image Processing and Analysis, Rosenfeld, A. (ed.), pp 148-168, Springer-Verlag, Berlin
9) Clark, J.J., Palmer, M.R. and Lawrence, P.D. 1985a, "A transformation method for the reconstruction of functions from non-uniformly spaced samples", accepted for publication, IEEE Transactions on Acoustics, Speech and Signal Processing
10) Clark, J.J. and Lawrence, P.D. 1985b, "A systolic parallel processor for the rapid computation of multi-resolution edge images using the \nabla^2 G operator", accepted for publication, Journal of Parallel and Distributed Computing
11) Clark, J.J. and Lawrence, P.D. 1985c, "A theoretical basis for diffrequency stereo", submitted for publication
12) Crowley, J.L. 1984, "A multiresolution representation for shape", in Multiresolution Image Processing and Analysis, Rosenfeld, A. (ed.), pp 169-189, Springer-Verlag, Berlin
13) Crowley, J.L. and Parker, A.C. 1984, "A representation for shape based on peaks and ridges in the difference of lowpass transform", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 156-169
14) Crowley, J.L. and Stern, R.M. 1984, "Fast computation of the difference of low-pass transform", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 212-222
15) Demaerschalk, J.P., Cottell, P.L. and Zobeiry, M. 1980, "Photographs improve statistical efficiency of truckload scaling", Vol. 10, No. 3, pp 269-277
16) Dev, P. 1975, "Perception of depth surfaces in random-dot stereograms: A neural model", International Journal of Man-Machine Studies, Vol. 7, pp 511-528
17) Dilworth, J.R. 1975, "Log Scaling and Timber Cruising", OSU Book Stores, Corvallis, Oregon
18) Fiorentini, A. and Maffei, L. 1971, "Binocular depth perception without geometrical cues", Vision Research, Vol. 11, pp 1299-1311
19) Freeman, H. 1974, "Computer processing of line-drawing images", Computer Surveys, Vol. 6, pp 57-97
20) Frisby, J.P. and Mayhew, J.E.W. 1980, "Spatial frequency tuned channels: implications for structure and function from psychophysical and computational studies of stereopsis", Philosophical Transactions of the Royal Society of London B, Vol. 290, pp 95-116
21) Frisby, J.P. and Mayhew, J.E.W. 1981, "Psychophysical and computational studies towards a theory of human stereopsis", Artificial Intelligence, Vol. 17, pp 349-385
22) Ghosh, S.K. 1979, "Analytical Photogrammetry", Pergamon Press, New York
23) Grimson, W.E.L. 1981a, "A computer implementation of a theory of human stereo vision", Philosophical Transactions of the Royal Society of London B, Vol. 292, pp 217-253
24) Grimson, W.E.L. 1981b, "From Images to Surfaces: A Computational Study of the Human Early Visual System", MIT Press, Cambridge, Mass.
25) Grimson, W.E.L. 1982, "A computational theory of visual surface interpolation", Philosophical Transactions of the Royal Society of London B, Vol. 298, pp 395-427
26) Grimson, W.E.L. 1984, "Binocular shading and visual surface reconstruction", Computer Vision, Graphics and Image Processing, Vol. 28, No. 1, pp 19-43
27) Grimson, W.E.L. 1985, "Computational experiments with a feature based stereo algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 1, pp 17-34
28) Hall, R.W. 1982, "Efficient spiral search in bounded spaces", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 4, No. 2, pp 208-214
29) Hand, D.E. 1975, "Scanners can be simple", in Modern Sawmill Techniques, White, V. (ed.), Vol. 6, pp 187-196, Miller-Freeman, San Francisco
30) Higgins, J.R. 1976, "A sampling theorem for irregularly spaced sample points", IEEE Transactions on Information Theory, September 1976
31) Horiuchi, K. 1968, "Sampling principle for continuous signals with time-varying bands", Information and Control, Vol. 13, pp 53-61
32) Ikeuchi, K. 1983, "Constructing a depth map from images", MIT AI Memo 744, Mass. Inst. Tech., Cambridge, Mass.
33) Jarvis, R.A. 1983, "A perspective on range finding techniques for computer vision", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 2, pp 122-139
34) Jerri, A.J. 1977, "The Shannon sampling theorem - its various extensions and applications: A tutorial review", Proceedings of the IEEE, Vol. 65, No. 11, pp 1565-1596
35) Julesz, B. 1971, "The Foundations of Cyclopean Perception", University of Chicago Press, Chicago
36) Kotel'nikov, V.A. 1933, "On the transmission capacity of 'ether' and wire in electrocommunications", Izd. Red. Upr. Svyazi RKKA (Moscow)
37) Kramer, H.P. 1959, "A generalized sampling theorem", Journal of Mathematical Physics, Vol. 38, pp 68-72
38) Levine, M.D. 1978, "A knowledge based computer vision system", in Computer Vision Systems, Hanson, A. and Riseman, E. (eds.), pp 335-351, Academic Press
39) Levine, M.D., O'Handley, D.A. and Yagi, G.M. 1973, "Computer determination of depth maps", Computer Graphics and Image Processing, Vol. 2, pp 131-150
40) Levinson, N. 1940, "Gap and Density Theorems", American Mathematical Society Colloquium Publications, Vol. 26, American Mathematical Society, New York
41) Longuet-Higgins, M.S. 1962, "The distribution of intervals between zeroes of a stationary random function", Philosophical Transactions of the Royal Society of London A, Vol. 254, pp 557-599
42) Lowry, A. 1984, M.S. Thesis, Carnegie-Mellon University, Pittsburgh, PA
43) Lu, S.Y. 1984, "A tree matching algorithm based on node splitting and merging", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 249-256
44) Lunscher, W.H.H.J. 1983, "A digital image preprocessor for optical character recognition", MASc Thesis, Dept. of Electrical Engineering, University of British Columbia, Vancouver
45) Lyness, J.N., "SQUANK (Simpson Quadrature Used Adaptively - Noise Killed)", CACM Algorithm No. 379
46) Mackworth, A.K. and Mokhtarian, F. 1984, "Scale based description of planar curves", Dept. of Computer Science Technical Report 84-1, University of British Columbia; also in Proceedings of the Fifth National Conference of the Canadian Society for Computational Studies of Intelligence, London, Ont., 1984
47) Maffei, L. and Fiorentini, A. 1977, "Spatial frequency rows in the striate visual cortex", Vision Research, Vol. 17, pp 257-264
48) Marr, D. 1974, "A note on the computation of binocular disparity in a symbolic, low level processor", MIT AI Memo 327, Mass. Inst. Tech., Cambridge, MA
49) Marr, D. 1982, "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information", W.H. Freeman, San Francisco
50) Marr, D. and Hildreth, E. 1980, "Theory of edge detection", Proceedings of the Royal Society of London B, Vol. 207, pp 187-217
51) Marr, D., Palm, G. and Poggio, T. 1978, "Analysis of a cooperative stereo algorithm", Biological Cybernetics, Vol. 28, pp 223-239
52) Marr, D. and Poggio, T. 1976, "Cooperative computation of stereo disparity", Science, Vol. 194, pp 283-287
53) Marr, D. and Poggio, T. 1979, "A computational theory of human stereo vision", Proceedings of the Royal Society of London B, Vol. 204, pp 301-328
54) Marr, D. and Ullman, S. 1981, "Directional selectivity and its use in early visual processing", Proceedings of the Royal Society of London B, Vol. 211, pp 151-180
55) Marvasti, F. 1973, "Transmission and Reconstruction of Signals using Functionally Related Zero-Crossings", PhD Thesis, Rensselaer Polytechnic Institute, Troy, New York
56) Marvasti, F. 1984, "Spectrum of non-uniform samples", Electronics Letters, Vol. 20, No. 21, p 896
57) McClellan, J.H. 1973, "The design of two-dimensional digital filters by transformations", Proceedings of the 7th Annual Princeton Conference on Information Sciences and Systems
58) McClellan, J.H., Parks, T.W. and Rabiner, L.R. 1973, "A computer program for designing optimum FIR linear phase digital filters", IEEE Transactions on Audio and Electroacoustics, Vol. 21, pp 506-526
59) Mersereau, R.M. 1979, "The processing of hexagonally sampled two-dimensional signals", Proceedings of the IEEE, Vol. 67, pp 930-949
60) Mersereau, R.M. and Oppenheim, A.V. 1974, "Digital reconstruction of multidimensional signals from their projections", Proceedings of the IEEE, Vol. 62, pp 1319-1338
61) Mersereau, R.M. and Speake, T.C. 1983, "The processing of periodically sampled multidimensional signals", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 31, No. 1, pp 188-194
62) Miller, K.S. 1964, "Multidimensional Gaussian Distributions", John Wiley and Sons, New York
63) Miller, D.G. and Tardif, Y. 1970, "A video technique for measuring the solid volume of stacked pulpwood", Pulp and Paper Magazine of Canada, Vol. 71, No. 8, pp 40-41
64) Mokhtarian, F. 1984, "Scale space description and recognition of planar curves", MSc Thesis, Dept. of Computer Science, University of British Columbia, Vancouver, B.C.
65) Moravec, H.P. 1977, "Towards automatic visual obstacle avoidance", Proc. 5th Int. Joint Conf. Artificial Intell., p 584
66) Nelson, J.I. 1975, "Globality and stereoscopic fusion in binocular vision", Journal of Theoretical Biology, Vol. 49, pp 1-88
67) Nishihara, H.K. 1983, "Hidden information in early visual processing", Proc. SPIE, Vol. 360, pp 76-87
68) Nishihara, H.K. 1984, "Practical real-time imaging stereo matcher", Optical Engineering, Vol. 23, No. 5, pp 536-545
69) Ohta, Y. and Kanade, T. 1985, "Stereo by intra- and inter-scanline search using dynamic programming", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 2, pp 139-154
70) Paley, R.E.A.C. and Wiener, N. 1934, "Fourier Transforms in the Complex Domain", American Mathematical Society Colloquium Publications, Vol. 19, American Mathematical Society, New York
71) Pan, S.X. and Kak, A.C. 1983, "A computational study of reconstruction algorithms for diffraction tomography: Interpolation versus filtered backpropagation", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 31, No. 5, pp 1262-1275
72) Papoulis, A. 1966, "Error analysis in sampling theory", Proceedings of the IEEE, Vol. 54, No. 7, pp 947-955
73) Papoulis, A. 1967, "Limits on bandlimited signals", Proceedings of the IEEE, Vol. 55, No. 10, pp 1677-1686
74) Petersen, D.P. and Middleton, D. 1962, "Sampling and reconstruction of wave-number limited functions in N-dimensional Euclidean spaces", Information and Control, Vol. 5, pp 279-323
75) Petersen, D.P. and Middleton, D. 1964, "Reconstruction of multidimensional stochastic fields from discrete measurements of amplitude and gradient", Information and Control, Vol. 7, pp 445-476
76) Rice, S.O. 1945, "Mathematical analysis of random noise", Bell System Technical Journal, Vol. 24, pp 46-156
77) Rosenfeld, A. (ed.) 1984, "Multiresolution Image Processing and Analysis", Springer-Verlag, Berlin
78) Rosenfeld, A. and Kak, A.C. 1976, "Digital Picture Processing", Academic Press, New York
79) Shanmugan, K.S., Dickey, F.M. and Green, J.A. 1979, "An optimal frequency domain filter for edge detection in digital pictures", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 1, pp 37-49
80) Sinclair, A.W.J. 1980, "Evaluation and economic analysis of twenty-six log sorting operations on the coast of British Columbia", FERIC Technical Note TN-39, Forest Engineering Research Institute of Canada
81) Slepian, D. 1964, "Prolate spheroidal wave functions, Fourier analysis and uncertainty - IV: Extensions to many dimensions; generalized prolate spheroidal functions", Bell System Technical Journal, November 1964, pp 3009-3057
82) Srihari, S.N. 1984, "Multiresolution 3-d image processing and graphics", in Multiresolution Image Processing and Analysis, Rosenfeld, A. (ed.)
83) Streifer, W. 1965, "Optical resonator modes - rectangular reflectors of spherical curvature", Journal of the Optical Society of America, Vol. 55, No. 7, pp 868-877
84) Sugie, N. and Suwa, M. 1977, "A scheme for binocular depth perception suggested by neurophysiological evidence", Biological Cybernetics, Vol. 26, pp 1-15
85) Tanimoto, S.L. 1978, "Regular hierarchical image and processing structures in machine vision", in Computer Vision Systems, Hanson, A. and Riseman, E. (eds.), Academic Press
86) Tang, G.Y. 1982, "A discrete version of Green's theorem", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 4, No. 3, pp 242-249
87) Terzopoulos, D. 1982, "Multi-level reconstruction of visual surfaces: Variational principles and finite element methods", MIT AI Memo 671, Mass. Inst. Tech., Cambridge, Mass.
88) Tyler, C.W. 1973, "Stereoscopic vision: Cortical limitations and a disparity scaling effect", Science, Vol. 181, pp 276-278
89) Tyler, C.W. and Sutter, E.E. 1979, "Depth from spatial frequency difference: An old kind of stereopsis?", Vision Research, Vol. 19, pp 359-365
90) Vadnais, C. 1976, "Raise sideboard recovery with computerized edger", in Modern Sawmill Techniques, White, V. (ed.), Vol. 6, pp 154-161, Miller-Freeman, San Francisco
91) VanMarcke, E. 1983, "Random Fields: Analysis and Synthesis", MIT Press, Cambridge, Massachusetts
92) Vit, R. 1962, "Electronic log scaler and its application in the logging industry", Canadian Pulp and Paper Assoc., Woodlands Section, Index No. 2125 (B6), p 526
93) Watts, S.B. (ed.) 1983, "Forestry Handbook for B.C.", published by the Forestry Undergraduate Society, University of B.C., Vancouver
94) Whittaker, E.T. 1915, "On the functions which are represented by the expansions of the interpolatory theory", Proceedings of the Royal Society of Edinburgh, Vol. 35, pp 181-194
95) Whittaker, J.M. 1929, "The Fourier theory of the cardinal functions", Proceedings of the Mathematical Society of Edinburgh, Vol. 1, pp 169-176
96) Whittington, J.A. 1979, "Computer control in a chip-n-saw operation", pp 111-118
97) Wiejak, J.S. 1983, "Edge location accuracy", Proc. SPIE, Vol. 467, pp 164-169
98) Wiley, R.G. 1978, "Recovery of bandlimited signals from unequally spaced samples", IEEE Transactions on Communications, Vol. 26, No. 1, pp 135-137
99) Wilson, H.R. and Bergen, J.R. 1979, "A four mechanism model for spatial vision", Vision Research, Vol. 19, pp 19-32
100) Wilson, H.R. and Giese, S.C. 1977, "Threshold visibility of frequency gradient patterns", Vision Research, Vol. 17, pp 1177-1190
101) Witkin, A. 1983, "Scale-space filtering", Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, West Germany, pp 1019-1022
102) Woodham, R.J. 1978, "Reflectance map techniques for analysing surface defects in metal castings", T.R. 457, AI Lab, Mass. Inst. Tech., Cambridge, Mass.
103) Yao, K. and Thomas, J.B. 1967, "On some stability and interpolating properties of nonuniform sampling expansions", IEEE Transactions on Circuit Theory, Vol. 14, pp 404-408
104) Yen, J.L. 1956, "On nonuniform sampling of bandwidth-limited signals", IRE Transactions on Circuit Theory, December 1956, pp 251-257
105) Yuille, A.L. and Poggio, T. 1983a, "Scaling theorems for zero-crossings", MIT AI Memo 722, Mass. Inst. Tech., Cambridge, Mass.
106) Yuille, A.L. and Poggio, T. 1983b, "Fingerprint theorems for zero-crossings", MIT AI Memo 730, Mass. Inst. Tech., Cambridge, Mass.
Item Metadata
Title | Multi-resolution stereo vision with application to the automated measurement of logs |
Creator | Clark, James Joseph |
Publisher | University of British Columbia |
Date Issued | 1985 |
Description | A serial multi-resolution stereo matching algorithm is presented that is based on the Marr-Poggio matcher (Marr and Poggio, 1979). It is shown that the Marr-Poggio feature disambiguation and in-range/out-of-range mechanisms are unreliable for non-constant disparity functions. It is proposed that a disparity function estimate reconstructed from the disparity samples at the lower resolution levels be used to disambiguate possible matches at the high resolutions. Also presented is a disparity scanning algorithm with a similar control structure, which is based on an algorithm recently proposed by Grimson (1985). It is seen that the proposed algorithms will function reliably only if the disparity measurements are accurate and if the reconstruction process is accurate. The various sources of errors in the matching are analyzed in detail. Witkin's (Witkin, 1983) scale space is used as an analytic tool for describing a hitherto unreported form of disparity error, that caused by spatial filtering of the images with non-constant disparity functions. The reconstruction process is analyzed in detail. Current methods for performing the reconstruction are reviewed. A new method for reconstructing functions from arbitrarily distributed samples based on applying coordinate transformations to the sampled function is presented. The error due to the reconstruction process is analyzed, and a general formula for the error as a function of the function spectra, sample distribution and reconstruction filter impulse response is derived. Experimental studies are presented which show how the matching algorithms perform with surfaces of varying bandwidths, and with additive image noise. It is proposed that matching of scale space feature maps can eliminate many of the problems that the Marr-Poggio type of matchers have. A method for matching scale space maps which operates in the domain of linear disparity functions is presented. 
This algorithm is used to experimentally verify the effect of spatial filtering on the disparity measurements for non-constant disparity functions. It is shown that measurements can be made on the binocular scale space maps that give an independent estimate of the disparity gradient. This leads to the concept of binocular diffrequency. It is shown that the diffrequency measurements are not affected by the spatial filtering effect for linear disparities. Experiments are described which show that the disparity gradient can be obtained by diffrequency measurement. An industrial application for stereo vision is described. The application is automated measurement of logs, or log scaling. A moment based method for estimating the log volume from the segmented two dimensional disparity map of the log scene is described. Experiments are described which indicate that log volumes can be estimated to within 10%. |
Subject | Forests and forestry -- Measurement |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-06-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096549 |
URI | http://hdl.handle.net/2429/25582 |
Degree | Doctor of Philosophy - PhD |
Program | Electrical and Computer Engineering |
Affiliation | Applied Science, Faculty of; Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |