MULTI-RESOLUTION STEREO VISION WITH APPLICATION TO THE AUTOMATED MEASUREMENT OF LOGS by JAMES JOSEPH CLARK B.A.Sc, The University of British Columbia, 1980 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES Electrical Engineering We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA September, 1985 c James Joseph Clark, 1985 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make i t freely available for reference and study. I further for scholarly purposes may be granted by the head of my department or by his or her representatives. It i s understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of ELCCT/ZJCAC^ £rQ£{jj ££fU*}& The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 Date .Sgy-rctwb-er- 6, agree that permission for extensive copying of this thesis DE-6 (3/81) ABSTRACT A serial multi-resolution stereo matching algorithm is presented that is based on the Marr-Poggio matcher (Marr and Poggio, 1979). It is shown that the Marr-Poggio feature disambiguation and in-range/out-of-range mechanisms are unreliable for non-constant disparity functions. It is proposed that a disparity function estimate reconstructed from the disparity samples at the lower resolution levels be used to disambiguate possible matches at the high resolutions. Also presented is a disparity scanning algorithm with a similar control structure, which is based on an algorithm recently proposed by Grimson (1985). It is seen that the proposed algorithms will function reliably only if the disparity measurements are accurate and if the reconstruction process is accurate. The various sources of errors in the matching are analyzed in detail. Witkin's (Witkin, 1983) scale space is used as an analytic tool for describing a hitherto unreported form of disparity error, that caused by spatial filtering of the images with non-constant disparity functions. The reconstruction process is analyzed in detail. Current methods for performing the reconstruction are reviewed. A new method for reconstructing functions from arbitrarily distributed samples based on applying coordinate transformations to the sampled function is presented. The error due to the reconstruction process is analyzed, and a general formula for the error as a function of the function spectra, sample distribution and reconstruction filter impulse response is derived. Experimental studies are presented which show how the matching algorithms perform with surfaces of varying bandwidths, and with additive image noise. It is proposed that matching of scale space feature maps can eliminate many of the problems that the Marr-Poggio type of matchers have. A method for matching scale space maps which operates in the domain of linear disparity functions is presented. This algorithm ii is used to experimentally verify the effect of spatial filtering on the disparity measurements for non-constant disparity functions. It is shown that measurements can be made on the binocular scale space maps that give an independent estimate of the disparity gradient This leads to the concept of binocular diffrequency. It is shown that the diffrequency measurements are not affected by the spatial filtering effect for linear disparities. Experiments are described which show that the disparity gradient can be obtained by diffrequency measurement. An industrial application for stereo vision is described. The application is automated measurement of logs, or log scaling. A moment based method for estimating the log volume from the segmented two dimensional disparity map of the log scene is described. Experiments are described which indicate that log volumes can be estimated to within 10%. iii Table of Contents Abstract ii Table of Contents iv List of Figures vii Acknowledgements xv I. I N T R O D U C T I O N 1 1.1 The Stereo Vision Problem 1 1.2 Overview of the Thesis '. 8 II. F E A T U R E M A T C H I N G 13 2.1 Image Representations 13 2.2 The Feature Matching Problem 21 2.3 The M a r r - Poggio Matching Algorithm 23 2.4 Problems with the Marr-Poggio Matching Scheme 27 2.5 A Simplified Multi-resolution Matching Algorithm 34 2.6 Disparity Scanning Matching Algorithms 39 2.7 Other matching methods 42 2.8 Summary of Chapter 2 44 III. R E C O N S T R U C T I O N O F T H E D I S P A R I T Y F U N C T I O N F R O M ITS S A M P L E S 45 3.1 Introduction 45 3.2 Interpolator Methods 48 3.3 The Methods of Grimson and Terzopoulos 49 3.4 The W K S Sampling Theorem and its Extensions 57 3.5 The Transformation or Warping Method - I D Case 65 3.6 The Transformation or Warping Method - 2D Case 75 3.7 Implementation of the 2D Transformation Method 88 3.8 Including Surface Gradient Information in the Reconstruction Process 95 3.9 Summary of chapter 3 99 IV. E R R O R A N A L Y S I S O F D I S C R E T E M U L T I R E S O L U T I O N F E A T U R E M A T C H I N G 100 iv 4.1 Sources of Error in Discrete Multiresolution Feature Matching 100 4.2 Effect of Sensor Noise on Feature Position 106 4.3 Analysis of Disparity Measurement Errors due to Filtering 115 4.4 Reconstruction Error Analysis 145 4.5 Matching Error Analysis 170 4.6 Geometry Errors 183 4.7 Effect of the various errors on the multi-resolution matching algorithm 189 4.8 Summary of chapter 4 192 V. E X P E R I M E N T S W I T H T H E D I S C R E T E M U L T I - R E S O L U T I O N M A T C H I N G A L G O R I T H M S 194 5.1 Introduction 194 5.2 Implementation of the Multi-Resolution Feature Extraction 198 5.3 Frequency Response of the Matching Algorithms 205 5.4 Surface Gradient Response of the Matching Algorithms 220 5.5 Performance of the Matching Algorithms with Additive Noise 226 5.6 Comparison of the Simplified Matching Algorithm with the DispScan Multi-resolution Matching Algorithm 231 5.7 Application of Image Analysis to Log Scaling 233 5.8 Summary of Chapter 5 257 VI. S C A L E S P A C E F E A T U R E M A T C H I N G 259 6.1 Introduction 259 6.2 Matching of Scale Space Image Representations 263 6.3 Matching of Two Dimensional Scale Maps 267 6.4 Problems With Scale Space Feature Matching 271 6.5 Implications for Biological Depth Perception £ 277 6.6 Summary of Chapter 6 279 VII. B I N O C U L A R D I F F R E Q U E N C Y 280 7.1 Introduction 280 7.2 Diffrequency Measurement 282 v 7.3 Psychophysical Evidence For Diffrequency Stereo 288 7.4 Experiments 292 7.5 Summary of Chapter 7 297 VIII. C O N C L U S I O N S A N D A L O O K T O T H E F U T U R E 298 8.1 Summary and Conclusions 298 8.2 Directions for Future Work 304 Appendices 306 References : 333 vi List of Figures 1.1 Two correlated arrays of numbers which encode a three dimensional scene 3 1.2 Two views of a scene taken from different vantage points 4 1.3 The setup that produced the images shown in figure 1.2 4 1.4 The structure of the thesis 9 2.1 A summary of the topics covered in chapter 2 14 2.2 A feature pyramid 15 2.3 The response of the V 2 G filter to a step input. The zero crossings of the filtered output are seen to coincide with the edge 18 2.4 The scale map of a random one dimensional function 20 2.5 The geometry of the epipolar lines 22 2.6 The Marr-Poggio multi-resolution matching scheme 25 2.7 The three matching pools of the three pool hypothesis 26 2.8 The failure of the Marr-Poggio in-range/out-of-range mechanism for discontinuous disparity functions. After Grimson, 1985 29 2.9 The failure of the Man-Poggio in-range/out-of-range mechanism for linear (continuous) disparity functions 30 2.10 The failure of the Marr-Poggio disambiguation procedure for non- constant disparity functions 32 2.11 The operation of the multi-resolution nearest neighbour matching algorithm 35 2.12 Nearest neighbour matching 37 2.13 The processing flow of a multi-level iterative matching algorithm 38 2.14 The operation of the multi-resolution Dispscan matching algorithm 41 3.1 The topics covered in this chapter 47 3.2 Computational molecules for the relaxation surface approximation algorithm (after Terzopoulos, 1982). The thick bars indicate the boundary of the grid 54 3.3 The three special types of sample distributions handled by the reconstruction formula of Yen (1956) 63 vii 3.4 a) A function, f(t) sampled at non-uniformly distributed positions b) The transformed function, g(r), sampled at uniformly distributed positions 66 3.5 A burst type of signal, with time-varying bandwidth 70 3.6 The reconstruction of a chirp signal for uniform and non-uniform sampling 74 3.7 The hexagonal sampling lattice for functions with isotropic spectra 76 3.8 The Voronoi and Dirichlet tessellations for a set of points 82 3.9 a) A set {x,,}. b) An attempt to create a G H T from (x.J. c) Some local GHTs of the point set of a) 85 3.10 The operation of the mapping heuristic for N g =7 89 3.11 The sample locations in \ for N =7 90 3.12 The relation of the heuristic mapping efficiency in terms of sample density to the shape of the sample distribution 94 4.1 The topics covered in chapter 4 101 4.2 The perturbation of feature contours by additive noise 107 4.3 The probability density function of zero crossing position error, for o<~\, and SNR = .5, 1., 2., and 4 112 4.4 The probability density function of zero crossing position error, for SNR=1, and a f = .5, 1., 2. and 4 113 4.5 Probability of an n pixel error, for n=0, 1, and 2, given that q = l/i/2 and 0^=1, as a function of the SNR 114 4.6 The left and right scale maps of a randomly textured, tilted surface with a disparity gradient of -60/255 117 4.7 The relationship between the left and right scale maps of a tilted surface 119 4.8 A zero crossing contour of the random process F(x,a) 121 4.9 The probability density function of the disparity measurement error for k = 1, 2, 3, and 4 126 viii 4.10 The probability of an n pixel disparity measurement error as a function of the disparity gradient, given q = l/j/2, for n = 0, 1, 2, and 3 128 4.11 The left and right skew maps obtained from a randomly textured surface with a horizontal disparity gradient of -60/255 with a — 2 133 4.12 The probability density of disparity measurement error for zero crossing features, a = l , 0, = -0.1 to -0.4 138 4.13 The probability density of disparity measurement error for zero crossing features, 0, = -0.1, o = 1, to 4 139 4.14 The probability of an N pixel error for zero crossing features as a function of j3, for a = l , q = 1V2 and N = 0,1,2 and 3 140 4.15 The probability of an N pixel error for zero crossing features as a function of a for p\ = -0.1, q = l/j/2 and N = 0,1,2 and 3 141 4.16 The probability density function of the disparity measurement error for extremum features for p\=-0.1 to -0.4 with a = l 142 4.17 The probability density function of the disparity measurement error for extremum features with p\=-0.1 for a = 1,2,3 and 4 with q = l/j/2 143 4.18 The probability of an N pixel disparity measurement error for extremum features as a function of p\ for a = 1 and q = l / / 2 143 4.19 The probability of an N pixel disparity measurement error for extremum features as a function of a for /3j = -0.1 and q = l/v/2 144 4.20 The shapes of the regions of support for F(a>) and G(c3) for exact reconstruction of f(x) from its samples 147 4.21 Aliasing error in the reconstruction caused by too low a sample density 148 4.22 The effect of having an improper reconstruction filter. Note that the central repetition is partly filtered out and that parts of the other repetitions are passed by the filter 148 4.23 A plot of Slepians approximation to the optimum filter 154 ix 4.24 A plot of Slepians first approximation to the optimum filter extended past its region of strict validity 155 4.25 The average RMS reconstruction error for a Gaussian process and filter, with a = lV3, for c=5, 10, 15 and 20 164 4.26 The average RMS reconstruction error for a Gaussian process and filter, with a = lV6, for c=5, 10, 15 and 20 165 4.27 The average RMS reconstruction error for a Gaussian process and filter, with a = l/>/12, for c=5, 10, 15 and 20 165 4.28 The matching process 171 4.29 The analysis of the matching error given that the closest match to the estimated match position is the Nth feature 174 4.30 The theoretical probability of obtaining the correct match as a function of the disparity measurement error, a = / 2 178 4.31 Experimentally derived relationship between the probability of obtaining the correct match and the error in the disparity estimate for a number of different angle quantizations 179 4.32 The probability density of obtaining a matching error as a function of the error in the disparity estimate 180 4.33 The distortion of zero crossing contours for non constant disparity functions 181 4.34 The probability of obtaining the correct match as a function of the disparity estimate error for non-constant disparity functions 182 4.35 The stereo camera geometry 184 4.36 The effects of vertical misalignment on the disparity measurements 187 4.37 The action of the various errors on the matching process 190 5.1 The topics covered in this chapter 195 5.2 The spatial filtering process 199 5.3 The frequency response of the four spatial filters 203 5.4 The zero crossing pyramid of a random image pair 206 x 5.5 The variation of the RMS disparity error with the matching region size 207 5.6 The variation of the RMS disparity error with the number of resolution levels 208 5.7 The RMS disparity error as a function of the number of relaxation iterations 209 5.8 Perspective plots of the disparity function obtained using matching method 1, with the warping reconstruction method 210 5.9 RMS disparity error as a function of a g , for matching algorithm 1, maximum disparity = 5 211 5.10 RMS disparity error as a function of a g , for matching algorithm 1, maximum disparity = 10 212 5.11 RMS disparity error as a function of a g , for matching algorithm 1, maximum disparity = 15 212 5.12 RMS disparity error as a function of a g ) for matching algorithm 1, maximum disparity = 20 213 5.13 RMS disparity error as a function of a g , for matching algorithm 2, maximum disparity = 5 213 5.14 RMS disparity error as a function of a , for matching algorithm 2, maximum disparity = 10 214 5.15 RMS disparity error as a function of a g , for matching algorithm 2, maximum disparity = 15 214 5.16 RMS disparity error as a function of 0 g , for matching algorithm 2, maximum disparity = 20 215 5.17 RMS disparity error as a function of a g , for matching algorithm 3, maximum disparity = 5 215 5.18 RMS disparity error as a function of a , for matching algorithm 3, maximum disparity = 10 216 5.19 RMS disparity error as a function of o g > for matching algorithm 3, maximum disparity = 15 216 5.20 RMS disparity error as a function of a , for matching algorithm 3, maximum disparity = 20 217 xi 5.21 Perspective plots of the error maps for the three reconstruction methods, obtained for a =40 217 5.22 Perspective plots of the disparity function obtained for a =40 for the three reconstruction techniques 218 5.23 The left hand scale map for the disparity gradient experiments 222 5.24 The right hand scale map for the /3 j =—20/255 case 222 5.25 The right hand scale map for the /31 =—40/255 case 223 5.26 The right hand scale map for the /31 =—60/255 case 223 5.27 The right hand scale map for the /31 = -80/255 case 224 5.28 The measured RMS disparity error due to filtering as a function of a for disparity gradients of -20/255, -40/255, -60/255 and -80/255 225 5.29 The expected RMS disparity error due to filtering as a function of a for disparity gradients of -20/255, -40/255, -60/255 and -80/255 225 5.30 Increase in RMS disparity error as a function of added noise variance. Surface a ' 30, Iterative matching, Zero crossings only 227 5.31 Increase in RMS disparity error as a function of added noise variance. Surface a = 15, Iterative matching, Zero crossings only 228 5.32 Increase in RMS disparity error as a function of added noise variance. Surface a = 30, Iterative matching, Zero crossings and extrema 228 5.33 Increase in RMS disparity error as a function of added noise variance. Surface o —15, Iterative matching, Zero crossings and extrema 229 5.34 The pseudo-variance of the additive noise error as a function of the additive noise variance, for = y/2 230 5.35 The RMS disparity errors for the Dispscan and simplified matching algorithms as a function of a g 232 5.36 A log lying on a flat deck 236 5.37 The effects of thresholding figure 5.36 236 5.38 The result of applying an edge operator (Marr-Hildreth) to figure 5.36 238 5.39 The video log scaling system setup 239 xii 5.40 Two stereo image pairs depicting single log scenes 244 5.41 The zero crossings of the steTeo pairs shown in figure 5.40 245 5.42 The thresholded disparity maps of the log scenes 246 5.43 The approximation of the log boundary by its convex hull 248 5.44 The filled in log region 249 5.45 The detected log boundary 250 5.46 Fitting an ellipsoid to the log region 255 6.1 The topics covered in this chapter 260 6.2 Three adjacent one dimensional slices of a two dimensional scale map exhibiting non- well- behavedness 268 6.3 A linked coarsely quantized scale map (j/2:l a ratio) 272 6.4 Splitting and merging of scale map contours for nonlinear disparities 274 6.5 A stereo pair of scale maps with sinusoidal disparity 274 6.6 The scale maps of a real stereo image pair 276 7.1 The topics to be covered in this chapter 281 7.2 The spatial organization of a foveal image representation 284 7.3 The transformed version of the foveal image representation of figure 7.1 284 7.4 The relationship between the left and right foveal scale maps 286 7.5 The relationship between the left and right scale maps for /30 = 0 287 7.6 The relationship between the left and right foveal scale maps for /30 = 0 287 7.7 A pair of ambiguous sinusoidal stimuli 289 7.8 The diffrequency search paths for a logarithmically scaled a axis 293 7.9 The RMS diffrequency error as a function of a for p\= -20/255, linear disparity 294 7.10 The RMS diffrequency error as a function of a for p\= -40/255, linear disparity 295 7.11 The RMS diffrequency error as a function of a for j3 x = -60/255, linear disparity 295 7.12 The RMS diffrequency error as a function of a for 0!= -80/255, linear disparity 296 xiii 7.13 The standard deviation of the diffrequency quantization error 296 2.1 The diamond search path and the five edge types 313 2.2 Examples of edges in the five edge modes 314 xiv ACKNOWLEDGEMENTS The production of a thesis does not occur in a vacuum. Without the support and interaction of a number of people and organizations this particular thesis would never have been completed. I owe a great deal to my supervisor, Dr. Peter Lawrence, whose never-ending enthusiasm and confidence in my work gave me the encouragement needed to successfully undertake this research. I would like to acknowledge the advice provided, at various times, by Dr. Allan Mackworth of the Computer Science department at U.B.C. The fellow students with which one has the opportunity of working with provide much of the intellectual and social interactions that one requires if they are not to emerge from their studies narrow-minded and lacking in basic tennis skills. I have had the pleasure of having many fine people as colleagues. I would especially like to acknowledge the contributions, both social as well as intellectual, of Brian Maranda, Richard Jankowski, Nick Jaeger, Norman Beaulieu, Kevin Huscroft and Jim Reimer. As well, I must acknowledge some of the Computer Science students that have shown me the view from their side, as well as keeping me entertained on Friday afternoons. Barry Brachman has been a friend as well as a colleague, and Jim Little and Marc Majka have shown me the computational side of Computer Vision. I would also like to thank the support given by the Forest Engineering Institute of Canada, particularly Verne Wellburn and Alex Sinclair. The B.C. Science Council funded the research described in this thesis with a grant and provided me with a much appreciated G.R.E.A.T. scholarship. xv Finally, I would like to thank my mother and father for being behind me all the way through my long studies, and not making me become a welder or something like my high school counselor suggested. xvi 1 I - INTRODUCTION "They came to Bethsaida and some people brought him a blind man whom they begged him to touch. He took the blind man by the hand and led him outside the village. Then putting spittle on his eyes and laying his hands on him, he asked, 'Can you see anything?' The man, who was beginning to see, replied, 'I can see people; they look like trees to me, but they are walking about.' Then he laid his hands on the man's eyes again and he saw clearly; he was cured and he could see everything plainly and distinctly."1 1.1 - The Stereo Vision Problem Like the blind man in the quotation that begins this thesis, the machines of man's creation have recently acquired the gift of sight. This gift allows these machines the ability to perform tasks unheard of a scant decade ago such as being able to autonomously manoeuver in a loosely constrained environment, visually inspect industrial components for defects, locating and tracking objects for the purpose of manipulating them, and just plain seeing what's out there. However, unlike the benefactor of the above quotation, human engineers have somewhat less than divine powers. As a result our machines currently operate in a mostly black and white, blurred, and two dimensional world and don't really understand what they are looking at unless it is explained to them by a human. Most vision systems that are used in industrial applications at this time cannot determine the three dimensional position of an object except under the most contrived conditions. It will be a while before we see robots with spatial perception good enough to enable them to play baseball (not to mention the other skills involved). There have been methods developed, such as structured light and laser ranging (see the survey article by Jarvis, 1983) which can accurately determine the three dimensional positions of objects. These techniques, however, require a very constrained environment and are active, which means that they alter their visual environment in some fashion (such as by projecting patterns of light on to the scene). Being taken out of this constrained environment and placed into a less constrained one significantly reduces the ability of these systems. The Mark 8:22-25, Jerusalem Bible 2 fact that a system uses active methods is not necessarily a drawback, and such methods are often used by biological systems (the bat's echo-location is a good example). From an engineering point of view, an active sensing system must be judged on its size, power consumption, efficiency and its effect on the environment. However, for visual perception of depth the method of choice for biological systems is passive. Much current research into visual processing is centred around the determination of how biological passive depth perception methods function and the design of similar methods tailored for use in machines. Despite some encouraging advances in our understanding of such methods (due in large part to the efforts of the late David Marr and his co-workers at the Massachusetts Institue of Technology) stereo vision systems that can operate well in loosely constrained environments have not yet been developed. It has been the author's experience with laymen, as well as educated people not acquainted with the computer vision field, that they greatly underestimate the difficulties involved in the perception of depth. The usual argument that they present is that the human visual system can perceive depth so effortlessly that they, quite naturally, assume that the processes involved must be fairly simple. One way in which to answer these people is to present them with two arrays of numbers, such as those in figure 1.1, and ask them to determine the depth of the object which gave rise to these arrays of numbers. This simple example points out the fact that the human visual system actually contains a very complex processing mechanism that is tailored to process spatial imagery such as that caused by patterns of light falling on our retinae. This same mechanism operates very inefficiently when presented with stimuli such as the array of numbers depicted in figure 1.1. In fact the only way a human can detect depth in this pair of arrays is to bypass the visual processing centres altogether and try and determine the correlation between the two arrays with the use of the higher cognitive centres of the brain, which are ill adapted for such computations. In doing this sort of experiment, a human begins to appreciate the complex processes that are actually involved in stereoscopic depth perception. 3 67 67 68 71 71 74 68 61 66 62 65 67 69 81 87 94 96 99 96 87 68 70 74 73 73 79 78 76 73 67 64 64 60 63 62 70 79 85 95 99 73 77 76 72 69 76 81 80 81 73 68 65 64 61 50 46 41 52 64 78 67 76 73 72 66 69 69 69 73 71 72 72 65 60 49 48 46 49 50 50 56 65 65 58 53 61 65 67 73 72 74 76 72 72 63 63 55 55 54 51 44 65 78 71 46 34 32 42 56 62 69 73 73 81 81 85 77 76 68 59 49 50 65 80 76 70 43 24 24 25 36 49 57 69 72 78 78 82 83 77 55 58 56 58 56 71 81 65 42 23 20 20 25 38 45 54 62 68 75 77 48 52 61 61 48 60 70 74 77 59 42 30 22 19 17 25 31 46 59 65 47 45 56 63 62 67 62 59 65 71 73 64 48 40 31 26 21 20 23 34 49 50 57 58 56 66 67 66 67 61 67 75 77 77 63 58 45 38 28 22 49 48 54 57 52 56 56 56 62 62 68 68 67 76 76 75 69 62 62 56 46 45 47 48 48 52 50 48 52 51 61 65 63 60 56 54 44 43 47 58 41 44 49 49 44 49 46 44 45 45 45 50 54 51 34 25 20 18 21 27 44 45 52 48 47 49 47 45 48 42 44 46 49 50 45 42 32 21 17 18 47 49 54 50 48 52 49 47 48 44 44 43 47 50 49 54 48 38 28 27 48 45 52 49 49 53 51 48 49 45 52 50 51 54 53 58 53 49 4 4. 41 50 51 54 52 51 55 53 50 51 48 54 51 53 51 48 52 50 44 45 48 57 50 57 55 54 59 58 54 56 52 57 55 53 53 49 52 49 46 42 47 53 46 52 51 54 58 58 57 57 54 60 57 57 55 52 57 53 50 52 53 LEFT 26 28 27 29 27 26 27 28 27 25 23 19 15 13 12 13 13 14 15 14 24 25 26 26 25 24 25 25 26 26 24 23 19 16 15 16 15 15 15 11 19 20 22 23 25 24 22 21 21 21 20 20 20 20 20 20 17 16 14 13 16 17 17 19 19 20 21 19 18 18 17 19 18 20 22 22 22 20 17 14 11 12 13 14 15 15 16 16 17 19 17 17 18 16 17 20 22 21 19 19 9 9 9 11 12 13 13 15 16 15 15 16 15 15 17 18 18 20 16 14 11 10 6 8 10 8 9 9 11 13 12 14 12 13 15 18 18 18 16 14 11 9 9 8 9 11 10 8 8 10 9 10 12 13 14 14 14 15 16 14 13 12 9 10 11 10 11 10 9 9 8 8 7 8 10 13 14 15 14 14 12 11 10 7 8 10 11 10 9 8 8 9 8 6 8 8 9 11 12 11 13 13 11 10 10 9 8 9 8 8 9 9 8 7 5 5 5 6 7 6 16 15 13 9 12 12 11 9 8 10 12 11 8 7 6 7 6 6 6 6 19 17 16 13 11 15 14 12 12 12 13 11 9 10 10 8 9 11 10 7 25 21 19 13 12 15 16 16 14 13 13 13 11 10 11 12 10 11 10 9 26 24 23 22 19 18 17 16 15 15 14 13 12 12 13 13 12 10 11 9 27 21 20 24 23 21 20 17 16 14 14 14 13 12 12 10 11 10 10 10 29 25 17 19 24 24 24 23 19 18 16 14 13 11 11 8 4 7 9 10 30 31 26 16 18 25 26 26 25 24 22 21 16 12 11 5 0 5 9 11 34 34 33 25 17 20 25 27 28 26 25 21 16 9 7 3 1 6 4 2 31 32 33 33 24 17 17 21 21 20 15 14 9 5 1 1 3 2 0 0 RIGHT FIGURE 1.1 Two correlated arrays of numbers which encode a three dimensional scene In order to understand the difficulties involved in stereoscopic depth perception one must first understand the process by which depth can, in principle, be measured. Consider the pair of images shown in figure 1.2. These are two views of the same scene as seen from two different vantage points. A schematic description of the situation is depicted in Figure 1.3. Looking closely at these images we notice that the image of the same physical point (such as the lower right corner of the telephone for example) occurs at different points in the two images. Even closer inspection reveals that this difference in position is not the same for all FIGURE 1.2 Two views of a scene taken from different vantage points. FOCAL PoiNT I M A G E P L f l M E FIGURE 1.3 The setup that produced the images shown in figure 1.2. physical points in the scene. In fact it can be seen from figure 1.3 that the farther away a 5 physical point is from the cameras, the smaller is the difference in position, or disparity. This is the basic principle which allows us to measure depth given two images of a scene taken from two different positions. As Marr (1974) has stated, there are three steps that are required for the determination of depth in this manner to proceed. These are: 1. A point in one of the images corresponding to an actual physical event must be found. 2. The corresponding point must be found in the other image. 3. The disparity between these corresponding points is measured and, given the geometry of the imaging process, the depth to the point in space giving rise to the physical event is computed. The second of these steps, commonly refered to as the correspondence problem, is the step that has proven the most difficult to implement. The problem, reduced to its most basic form, is how can we distinguish one point in an image from all of the other points in the image? Part of the answer to this question lies in the first step of the stereo depth perception process, namely the requirement that the features that we try to match between the two images arise from unique physical events. As Marr and Poggio (1979) point out, this rules out the use of image intensity as a matching feature since a physical event cannot be uniquely associated with a given image intensity due to the fact that there are many physical events which give rise to that image intensity. Marr and Poggio (1979) suggest that other features which are more directly related to physical events, such as sharp changes in image intensity be used. However, even if one can find a set of features that can be uniquely associated with physical events, the correspondence problem is still not solved. This is because of the inherent ambiguity of these matching features. Unless very complex features such as object descriptions (chair, telephone etc.), implying that an immense amount of processing has already been done to extract these features, are used, the features to be matched are to some extent ambiguous. This means that two independent features may look identical, (even if they result from totally separate physical events) and can be confused by the system that is 6 attempting to find correspondences. One of the main challenges that stereo vision researchers have faced is in reducing this feature ambiguity without excessively increasing the amount of processing required to find the features. The Multi-Resolution Paradigm For most scenes the range of shifts or disparities between the two images of that scene can be bounded. This means that any search for corresponding features can be limited to a certain finite region. Now, if the density of the features were such that there were few features in this matching region then the correspondence problem would not be very difficult. Thus one possible method for determining correspondences involves matching scene features which have a low density. Alternatively one can limit the size of the matching region, which means limiting the range of disparities that the system can handle. Marr and Poggio (1979) devised a scheme whereby one could use features with a high density and handle a large disparity range. This type of scheme was first used in the work of Moravec (1977). The technique used by Marr and Poggio, which was intended to model the way in which human (and some other biological systems) performed stereoscopic depth perception, involved using a multi-resolution feature set Such a set consists of collections of features of various densities. The low density, or low resolution, feature set can be matched over a large range of disparities, but only a sparse set of depth values is obtained. These values, however, can be used to guide the estimate to the next dense (higher resolution) feature set, which can be matched over a smaller range of disparities, and which provide a denser set of depth values. This process can proceed to higher and higher resolutions. The net result of this algorithm is that a dense set of depth values can be obtained over a large disparity range. However, this explanation has been oversimplified, and like many things, the process is not a simple as it seems. There must be a way of transferring information from one resolution level to another, which was not addressed by Marr and Poggio in their proposal of the multi-resolution method. The matching algorithm proposed by Marr and Poggio, which we describe in detail in chapter 2, is seen to have problems with non-constant disparity functions. 7 Goals of the thesis The objectives of this thesis are four-fold. First we wish to develop a multi-resolution stereo matching algorithm that is potentially rapid enough for real time applications. Secondly we want to analyze the component parts of the resulting algorithms to see where and how errors are introduced, and how these errors affect the performance of the matching algorithms. Thirdly, we wish to test the algorithm thoroughly and apply it to an industrial task, that of log scaling. Our final goal is to understand the mechanisms that are most important to the performance of multi-resolution matching algorithms and propose other algorithms which may offer improved performance. 8 1.2 - Overview of the Thesis This section briefly describes the layout of the thesis and points out the contributions that each chapter makes to the attainment of the goals of the thesis. Figure 1.4 depicts the topics to be covered in the thesis, where they can be found, and the relationship between them. In addition, at the end of each chapter we present a summary of the most important points raised in that chapter. Chapter 2 begins with a discussion of multi-resolution image representations. The Marr-Poggio multi-resolution algorithm (1979) is examined in some detail. We point out difficulties involved with the Marr-Poggio method when the disparity function is not constant It is shown that, to handle non-constant disparity functions, the low resolution disparity information must be accurately passed to the higher resolutions. Based on this idea we propose a modification to the Marr-Poggio algorithm. A different sort of matching algorithm, based on some recent work of Grimson (1985) is described, which involves scanning through a large disparity range for possible matches. Grimson's method is essentially single-resolution, using lower resolution information only to disambiguate competing matches. We show that this method also has difficulty with non-constant disparity functions. We propose a modification of this algorithm which allows non-constant disparity functions to be handled. For the multi-resolution methods discussed in this thesis, the projection of disparity data from lower resolutions to higher is seen to be of paramount importance. Thus, in chapter 3, we discuss, at some length, various methods by which this can be done. Grimson (1981b, 1982) has also discussed this problem, but only in the context of 'filling in' the gaps between the disparity values at the highest resolution, rather than the process of reconstructing the disparity function at the lower resolutions, which is what we are concerned with in this thesis. We show that the assumptions implicit in Grimson's reconstruction method are not entirely valid for the lower resolutions. Because of this we look at other possible methods for the disparity function reconstruction. This search led us to examine methods based on sampling theory; that is the theory of reconstructing analytic functions from their samples. One 9 Scale space matching Image representations CH.Z Feature matching 1 CW-fo Binocular Diffrequency measurement CH.2 Disparity function reconstruction Error analysis CH.3 CH.4 Experiments I CH.5 Application to Log scaling < CH. & Conclusions CH.8 CH"? FIGURE 1.4 The structure of the thesis. of the problems that we encounter is that the disparity samples we obtain are distributed non-uniformly, whereas most reconstruction methods based on sampling theory require the 10 samples to be on regular lattices (such as rectangular or hexagonal). We therefore develop a method, based on the sampling theory for uniform sample distributions, which allows the reconstruction of functions from non-uniformly distributed samples. We present one and two dimensional versions of this method, and give a computational algorithm for the two dimensional case. This method has a wider domain of applicability than the stereo vision problem of course; it has been succesfully applied to the reconstruction of synthetic aperture radio telescope imagery (Clark, Palmer and Lawrence, 1985). We also discuss how this method can be altered to handle the inclusion of disparity gradient, or surface normal information in the reconstruction process. It has been suggested (e.g. Ikeuchi, 1983) that the addition of such independent information about the surface will result in a more robust system for obtaining the three dimensional structure of an object Chapter 4, in some respects, forms the heart of the thesis. In this chapter we analyze the various processes by which errors can arise in the determination of the depth function. These errors can be partitioned into four distinct types. The first type is an error in the measured position of the features. This can result from sensor noise, position quantization, camera misalignment as well as from a heretofore unreported effect involving spatial filtering of images of tilted surfaces. The second class of errors is the matching errors. These are the errors which arise from incorrect matching of ambiguous features. We show, for our simplified algorithm, that the matching error is sensitive to errors in the disparity estimate obtained from the lower resolution levels. It is shown that using more complex features reduces the matching error. However, it is shown that non-constant disparity functions can cause distortion in the parameters (such as edge orientation) of the features which may cause them to be matched incorrectly. Experimental studies are described which show that this distortion causes an increase in the matching error. The third class of errors consists of those incurred in transferring disparity information from one resolution level to the next The chief component of this error is the error in 11 performing the reconstruction of the disparity function from the disparity samples at a given resolution. The fourth and last type of error described is the error inherent in the computation of the depth values from the disparity values. In general, these errors depend on the accuracy to which the various parameters of the imaging process, such as the baseline between the sensors, relative sensor tilt, focal length, etc., are known. If the values of these parameters are in error, then so will the depth values. This is an important consideration for the many industrial applications (such as the log scaling application described in chapter 5) which require accurate measurements. Chapter 5 describes the results of a number of experiments designed to check the analyses done in chapter 4. We run the multi-resolution matching algorithms proposed in chapter 2 on a number of different synthetic image pairs. These image pairs are created from arrays of uncorrelated (white) normally distributed random numbers. The disparity functions that are used are gaussian cylinders, that is constant disparity parallel to one axis and varying with a gaussian shape parallel to the other axis. The spread (or variance) of this gaussian disparity function is varied, which has the result of changing the bandwidth of the disparity function. Chapter 5 concludes with an example of the application of stereo depth perception to a typically messy industrial application; that of measuring or scaling lumber in a lumber sort yard. We show that stereo vision provides a feasible solution to the automation of this process. The steps involved in this automated log scaling procedure are outlined and examples of this process on actual images are provided. Estimates of the log measurement accuracy obtainable are given. Chapters 6 and 7 address alternative methods for obtaining disparity information. These methods, which are based on scale space image representations, are immune to some of the problems encountered by the matching techniques proposed in chapter 2. 12 Chapter 6 considers the idea of matching scale maps (defined in chapter 2). Chapter 7 discusses a recently proposed binocular process, that of diffrequency measurement Introduced by Blakemore (1970), this process involves measuring, in some fashion, the difference in the spatial frequency content between the two images. Using the scale space transform as an analytic as well as a representational tool, we give this idea a firm mathematical basis. It is shown that measurements of the diffrequency can be made on the binocular scale space image representation and that these diffrequency values are directly related to the disparity gradient of the surface. The final chapter (8) contains a discussion of the results obtained in the previous chapters and makes some conclusions. We provide possible directions for future research, based on the unanswered questions that the thesis raises. 13 D - F E A T U R E M A T C H I N G 2.1 - Image Representations In this chapter we discuss ways in which a pair of images can be matched to yield the correspondence between them. A summary of the topics covered in this chapter is given in block diagram form in figure 2.1. The bulk of the sections marked with '* ' contain new material. In order that image pairs be matchable it is necessary to create a symbolic representation of the images in terms of features, such as localized intensity changes. These features should correspond to distinct physical events in the scene. In this regard, features such as localized image intensity changes (often mistakenly referred to as edges) usually correspond to such physical events in the scene as shadows and other discontinuities in illumination, discontinuities in the surface orientation of objects, surface markings and so on. Thus these features will be adequate for use by the matching process. Features such as the raw image intensity levels are not adequate for use by the matching process since they cannot be uniquely associated with a given physical event in the scene. These so-called gray level features also exhibit a higher degree of ambiguity than do edge features. That is, in a given region of an image there are more indistinguishable gray levels than there are indistinguishable edges. In order for a given type of feature to be useful in a multi-resolution matching scheme, one must be able to create a multi-resolution image representation comprised of these features. In general, such a representation consists of a set of single resolution descriptions, each at a different resolution. The feature density generally (but not neccessarily) decreases as the resolution decreases, thus reducing the matching ambiguity (as there are less features to be confused with each other). A popular multi-resolution image representation is the feature pyramid (also known as the feature cone) (Tanimoto, 1978, and Levine 1978). A feature pyramid is depicted in figure 14 Introduction Image representations I CH.l.\ Feature Matching CH2.I CH 2-1 The Marr-Poggio matching algorithm CH. 23 A simplified Problems # method with the CW.3-S Marr-Poggio ^ approach CU.2.4 Summary Disparity Scan « algorithms CH.2.6 (*) Indicates New Material FIGURE 2.1 A summary of the topics covered in chapter 2. 2.2. A feature pyramid is made up of a number of single resolution feature maps, each 15 FIGURE 2.2 A feature pyramid. a resolution that is some fraction of the preceeding level. Each level is spatially quantized at a level proportional to its resolution. That is, the lower the resolution, the coarser is the quantization. This means that there is less information at the lower resolution levels, resulting in the characteristic pyramid shape. Since there is less data at the lower resolutions than at the higher resolution, processing takes place much more quickly at the lower resolution levels. If one can use the information gained from these low resolution processes to guide the processsing at the high resolutions, then the amount of processing required at the higher resolutions may be reduced. In this way an overall saving in computation over a single high resolution process may be achieved. In the case of stereo image matching, one can perform a matching search over a large range cheaply at low resolutions and use the disparity estimates so obtained to limit the required size of the search region at the higher resolutions. Grimson (1985) discusses the savings in computation possible from such a stereo matching scheme over a single resolution matching scheme. 16 Feature pyramids are distinguished by three factors; the ratio between succesive resolutions, the type of feature encoded in the pyramid, and the way in which the information at the various resolutions are obtained. If the ratio of succeeding resolutions is 2:1 the resulting pyramid is known as a quadtree. Applications and descriptions of quadtrees can be found in (Rosenfeld, 1984). Quadtrees can be extended to three spatial dimensions. These three dimensional structures are known as oct-trees (Srihari, 1984). Some implementations of feature pyramids obtain the lower resolution levels by averaging the information contained in the higher resolution levels in some way. Other implementations obtain each resolution level by independent means. As will be seen, the stereo algorithms to be described later acquire the multi-resolution feature maps by operating on the information in the higher resolutions. Often multi-resolution feature descriptions have a constant spatial quantization at all feature resolutions. The resulting structure is strictly not a pyramid, and loses the advantage of having less data to process at the lower resolutions. These types of structures are used when one wishes to have a high spatial resolution, along with a low feature density. One can also envisage a multi-resolution image representation wherein the difference in resolution between successive levels is vanishingly small. In such a case we would have a continuum of resolution levels. Such an image representation has been called a scale space representation (Witkin, 1983), where the term scale space refers to the continuum of resolution values. The term scale space originally was meant to apply to the case of the multi-resolution zero crossings of V 2 G filtered images (these terms will be defined shortly). However, the concept can, and should be, extended to cover any type of feature for which a continuum of resolutions can be defined. We have stated that features based on localized changes in image intensity are suitable features for a stereo matching system. How can we obtain a multi-resolution image representation based on these features? This question was addressed by Marr and Hildreth (1980), who proposed an edge operator that was able to detect edges at a given scale or 17 resolution. By altering a parameter in their operator, edges at different resolutions could be detected, or localized. This operator consists of three parts. The first part involves smoothing the image by convolving it with a gaussian low pass filter. This allows different resolutions to be achieved by changing the cutoff frequency of the low pass filter. The second part of Marr and Hildreth's multi-resolution edge detection operator involves taking the second spatial derivative (or in two dimensions taking the Laplacian) of the low pass filtered signal. The final step is to determine where the differentiated signal passes through zero. Such a point is called a zero crossing, and for linear image intensity changes locates the edge exactly. The filter is popularly known as a V 2 G filter, from its component parts. In two dimensions the V 2 G filter's impulse response is written: V 2G(r) = - ( l / i r o - 4 ) [ l - r V 2 a J ] e " ( r V 2 c 7 2 ) (2.1.1) where r2 = x2 + y2. The factor a in the filter response is the scale factor. Increasing o reduces the resolution of the resulting edge representation. The operation of the V 2 G edge detector is shown in figure 2.3 on a step edge signal. An edge pyramid can be built up using the Marr-Hildreth edge operator by filtering an image with the V 2 G filter for a number of different values of a, and then finding the zero crossings. We will call the resulting structure a zero crossing pyramid. In order to reduce computation and storage requirements in our stereo system, we quantize the zero crossing position to a level directly proportional to the a value of the V 2 G filter at each resolution. Thus the positions of low resolution edges are specified less accurately than are edges at the high resolutions. A scale space image representation can be obtained using the V 2 G filter by applying the filter to an image at a continuum of a values. In fact this was the original definition of the scale space. We can write this operation as an integral transform, which we call the scale space transform, as follows (for the one dimensional case): 18 i I STEP INPUT i i i •-t LARGE CT ? f ) ( ~ \ J V^G OUTPUT &L——^— »t ^ — ^ SMALL cT FIGURE 2.3 The response of the VJG filter to a step input The zero crossings of the filtered output are seen to coincide with the edge. F(x,a) = dVdx2 /".RujaV/5Jr e"(x-u)V2aJdu (2.1.2) F(x,a) is said to be the scale space transform of f(x). The above equation is seen to be a convolution of f(x) with the function dVdx2 aV/27re~^x _ u^/ 2 a , which is the impulse response of the one dimensional V2G filter. This one dimensional scale space transform can be straightforwardly extended to handle higher dimensions. This is discussed in more detail in chapter 4.3. The function G(x,o), obtained as follows: G(x,a) = 1 if F(x,o) = 0, and zero otherwise (2.1.3) is called the scale map function of f(x). It is a binary function that is zero everywhere except at the zeroes of the scale space transform. The scale map of a random function is shown in figure 2.4. The a scale is logarithmic. Note how, as the resolution decreases (i.e 19 increasing a), the density of the zero crossing contours also decreases. The two image representations described here, the zero crossing pyramid and the scale map, will be the representations that we will be using in our development of the stereo matching process in the rest of the thesis. FIGURE 2.4 The scale map of a random one dimensional function. 21 2.2 - The Feature Matching Problem Once the multi-resolution image description has been built for the two images in the stereo pair, we must match them to obtain the disparity function between corresponding points in the two images. The disparity function, which is usually required to be known at all points in the image, not just at those points coincident with a feature, is defined as follows. Suppose the intensity functions in the two images are given by f(x^) for the right image and gCxjJ for the left image. If we assume perfectly matched sensors and ideal viewing conditions then we can write: f(x R ) = g(xR-rf(xR)) (2.2.1) where d(x^) is the disparity function. Note that d is a vector quantity. That is, it has two components. This would seem to imply that any search for a match would require searching over a two dimensional space, so that both components of the disparity vector could be determined. However, it can be shown that the two components of the disparity vector are not independent of each other. This means that we can reduce the dimensionality of the search to one dimension. This dependence between the components of the disparity vector is inherent in the so-called epipolar constraint. The idea behind this constraint can be seen in figure 2.5. If one forms a plane defined by the inter-ocular axis (the line between the focal points of the eyes or cameras) and the. direction of view of either of the cameras (i.e. the vector from the camera focal point through the scene point being imaged), then the intersection of this plane with the image planes of the two cameras define two lines, one in each camera's image plane. The importance of these lines, called the epipolar lines, is that any feature on the epipolar line in one image has its corresponding feature on the associated epipolar line in the other image. Thus, to perform the search for a match, one need only search along the epipolar line in the other image associated with given view direction. Note that, in general, the epipolar lines change as the view direction changes. This means that one must determine where the epipolar lines are before the search is begun. To determine the 22 IMAGE OPTIC A X E S F I G U R E 2.5 The geometry of the epipolar lines. epipolar line for a given view direction requires accurate knowledge of the camera geometry, such as the inter-ocular axis. If this is not known then a (possibly limited if a coarse knowledge of the camera geometry is known) two dimensional matching search must be employed. A particularly simple epipolar geometry results if the optic axes of the two cameras are parallel. In this case the epipolar lines are all parallel to each other and are horizontal. This is the assumed geometry for most experiments in stereo vision, and in this thesis we will be no exception. Even if the camera geometry is not such that horizontal epipolar lines are produced one can in principle rectify the images by some coordinate transformations so that in the modified image space the epipolar lines are horizontal. 23 2.3 - The Marr-Poggio Matching Algorithm Marr and Poggio (1979) proposed a multi-resolution matching mechanism which was intended to model the way in which the human visual system performed stereo matching. This method was subsequently implemented by Grimson (1981a). The operation of this algorithm was as follows. A mulu-resolution image representation comprised of the zero crossings of a V 2 G filtered image at four different resolutions is constructed. The values of o for these four filters are 3, 6, 12, and 24 picture elements (pixels). The orientation of the zero crossings are quantized to 3 0 ° levels. The positions of the zero crossings are measured to within one pixel, and the same spatial resolution (pixel size) is used at all four levels (hence, the resulting structure is not a pyramid). The pair of multi-resolution image representations are then matched. The matching in the Marr-Poggio scheme proceeds in parallel at each resolution. Search for matching zero crossings is done along the epipolar lines. Matching zero crossings must have the same contrast sign (i.e. light to dark or dark to light) and have roughly the same orientation. The region over which a match is searched for is limited in size to reduce the possibility of obtaining ambiguous matches (that is, more than one possible match in the search region). Marr and Poggio have shown (Marr and Poggio, 1979) that if the size of the matching region is less than V2o, then the probability of getting more than one match in the search region is very low. Because the size of the search region is limited to V2o, the maximum disparity that can be measured at a given resolution level is 4\/2o. Thus, since a decreases as the resolution increases, the maximum disparity range at the high resolutions is very small. Conversely, at low resolutions the disparity range is relatively high. The Marr-Poggio method detects when a given high resolution search region is out of range of the true disparity (this means that the true match does not lie in the region in which we are searching). When this happens, the lower resolutions are examined to see if they are in range. If so, the disparity information obtained by the low resolution searches are used to shift the search region in the 24 higher resolutions so as to bring the search region into range of the actual disparity. This process is depicted in figure 2.6. The Marr-Poggio method determines whether or not a given search region is in range with the following probabilistic scheme. If the search region is out of range then the probability of there being at least one match in the search region was shown by Man and Poggio to be on the order of 0.7. If the search region is in range, the probability of there being at least one match in the search region approaches 1. Thus, by examining the percentage of zero crossings that have possible matches in a neighbourhood one can determine whether or not the search region is in range of the true disparity in that neighborhood. Clearly, the neighborhood in which the statistics are tabulated must be large enough to provide a meaningful estimate of the match proportion. If the proportion of matches in a neighborhood falls below a certain threshold (say .8) then the neighborhood is declared out of range and all matches are rejected. The lower resolutions are then examined to see if they are in range. If they are then the disparity information provided by these low resolutions are used to shift the search region at the high resolution and the matching process is repeated until the search region is in range at the high resolution for all points in the image. It can happen that there is more than one possible match in the search region (even when the search region is in range). Since it is obvious that not all of these possible matches can be the correct one, unless we can disambiguate these possible matches, that is, find out which one is the correct one, we will incur an error in the match. Marr and Poggio perform the disambiguation by examining whether the measured disparity, with reference to the current disparity estimate, is positive, negative or zero. This measure is then compared to the dominant disparity sign in a neighborhood about the feature being matched. If the sign of the possible match agrees with the dominant match then it is chosen as the correct match. If none of the possible matches agree with the dominant disparity sign, or if there is more than one match that agrees with the dominant disparity sign then no match is made. This is known as the three pool matching hypothesis as it assumes that there are only three types of disparity detectors, divergent (-), convergent ( + ) and null (0) which are 25 LOW R E S M A T C H I N G ^ do-RANGE H i ' R E S . MATCHING R A N G E _ D I S P A R | T Y - F U N C T I O N H I - R E S IN R A N G E F I G U R E 2.6 The Marr-Poggio multi-resolution matching scheme. broadly tuned as to disparity and are grouped into three pools. The scheme is depicted in figure 2.7. The three pool mechanism is one model that has been proposed for the human visual system (Marr and Poggio, 1979, p323). Grimson (1985) suggests that information from the lower resolutions can be used to disambiguate between possible matching candidates. This involves choosing the match whose disparity is in agreement with the disparity information in a neighborhood about the feature under question at the lower resolution. If the low resolution information cannot disambiguate the matches (as for example would be the case if the low resolution disparity varied within the neighborhood) then no match at all is made. 26 MATCHING ) RANGE ( d CONVERGENT POOL N U L L POOL DIVERGENT POOL FIGURE 2.7 The three matching pools of the three pool hypothesis. 27 2.4 - Problems with the Marr-Poggio Matching Scheme There are a number of problems with the Marr-Poggio matching scheme, especially with regard to matching images with non-constant disparity functions. We describe these problems in this section. Passing information from lower resolutions to higher. In a recent paper (Grimson, 1985) Grimson makes the following comments: "What is the effect of driving the matching process in a coarse-to-fine manner? At the next finer level, there will in general be twice as many feature points. If image features persist across scales, which they usually do, then in general, each of the feature points at the finer scale can be associated with a feature point at the coarser scale. This will not always be the case, of course, and if there is no corresponding feature at the coarser scale, then ... the use of multiple scales implies no savings of computational expense." This quotation implies that in the Marr-Poggio scheme information is transferred from lower resolution to higher resolutions only for those features that persist across scales. However, since there are generally twice as many feature points at the next higher resolution, it follows that only half of the features will persist between one resolution and the next lower resolution level (this is easily observed in the scale space representation, e.g. figure 2.4). As Grimson points out, the fact that information from the lower resolution is not available for matching those features which do not persist to the lower resolutions, means that a larger search region is needed, increasing the probability of obtaining ambiguous matches, and hence, increasing the probability of matching error. One way of solving the problem of transferring information between resolution levels is not to rely on only those features that persist between resolution levels, but rather reconstruct the disparity function from the information at all features at the lower resolutions, so that an estimate of the disparity is available at all points in the lower resolution image. In this way, a disparity estimate is available for all features in the higher resolution, not just those features that persist from the lower resolutions. This technique is hinted at by Grimson (1981a) in his early paper describing his implementation of the Marr-Poggio matching scheme. 28 He proposes extracting the disparity information from a region in a low resolution image, to be used by the higher resolution matching, by finding the median, mode or average of the disparity values in that region. Grimson did not go into any detail on this topic (only one sentence in a 36 page paper) and it is not clear whether or not he was suggesting a reconstruction scheme such as we are proposing and in any case seems to contradict the implications of the quote given above taken from a later paper of his (Grimson,1985). In- range/Out- of- range detection A second, and more fundamental, problem with the' Marr-Poggio matching scheme rests in what they refer to as the continuity constraint. This constraint states that disparity varies smoothly almost everywhere and that only a small fraction of the scene is composed of boundaries that are discontinuous in depth (Marr and Poggio, 1979 p.303). Grimson (1985) gives a simple example which illustrates how the in-range/out-of-range detection mechanism of Marr and Poggio, which is based on matching statistics in the neighborhood about the feature to be matched, fails for discontinuous disparity functions. This failure is given as a reason for why the continuity constraint is required. The idea behind Grimson's example is depicted in figure 2.8. Suppose that the matching statistics are compiled over the square region with sides of length d, as shown in the figure. Now suppose that (x/d) of this neighborhood covers region A, which has a disparity that is out of range of the matching process and the remaining (1-x/d) portion of the neighborhood covers region B which is in range of the matching process. If e is the threshold of the percentage of matching zero crossings in the neighborhood that is required to declare the match (of the feature in the centre of the neighborhood) in range, then for what values of x will the percentage of matched points in the neighborhood exceed el Grimson shows that if x is less than or equal to (l-e)d/0.3 then the matches in the neighborhood will be declared within range. It is clear from the diagram that if x is less than 0.5d then the match in the centre of the region will be over region B, and hence will be in range. However, if e is greater than 0.85 the algorithm will actually conclude that the match is out 29 ou-r-crf -range A 7--0 we can find a value of the disparity function slope, m, for which p is less than 70 + 8. In fact this occurs when m > 60w/d5. This means that the Marr-Poggio in-range/out-of-range detection scheme fails even for some continuous surfaces. Upon closer examination, it is seen that the Marr-Poggio theory requires not a continuity constraint but a constancy constraint for strict validity. We will see in the next discussion that this is the case for the Marr-Poggio disambiguation scheme as well. Grimson (1985) suggests that figural continuity can be used to disambiguate possible matches. The use of figural continuity, which was first proposed by Mayhew and Frisby (1981), involves distinguishing between matches which give rise to small scattered segments, indicating out-of-range, and matches which give rise to extended contours, indicating in-range. This method seems to be very effective. We have not studied the use of figural continuity in our algorithms, as it is fairly computationally intensive, and as we shall see later, our algorithms work well enough without it Problems with disambiguation As you may recall, one of the methods that Marr and Poggio suggested for disambiguating between possible matches was to examine the dominant sign of the disparity in a neighborhood about the feature to be matched. The candidate match which was consistent with the domimant disparity sign was then chosen as the correct match. However this method works only for disparity functions that are constant, or almost so. That this is so can be seen in figure 2.10. Suppose we want to disambiguate the possible matches to a feature located at point P. Suppose that there are two possible matches, one with a positive disparity (relative to the current disparity estimate) and the other with a negative disparity. In order to disambiguate these we find the dominant disparity sign of the unambiguously matched features 32 matched •features dispari+y £ estimats •function FIGURE 2.10 The failure of the Marr-Poggio disambiguation procedure for non-constant disparity functions. in a neighborhood about P. Note that these disparities are both positive and negative. Futhermore, since there are more negative disparities in the neighborhood than positive ones, the dominant disparity sign is taken to be negative. Thus the candidate match with the negative disparity is taken to be the correct match even though it is in fact incorrect One can modify the disambiguation method so that if there is more than one type of disparity sign in a neighborhood then no attempt is made to assign a match to the ambiguous feature. However, if a disparity function is non-constant almost everywhere then very little disambiguation will be performed and very few matches will be made.. Grimson (1985) suggests that figured continuity can be used to disambiguate possible matches as well as determine whether or not a feature is in range of the matching process. This method involves accepting only those matches which result in extended contours with sufficiently large extent However, the test for figural continuity can be computationally expensive and may break down for noisy images, or for images whose features are distorted as described in chapter 4. In these cases the zero crossing contours may be become broken 33 up. 34 2.5 - A Simplified Multi-resolution Matching Algorithm. Based on the discussion in the previous section we can make the following observations. 1. The in-range/out-of-range detection mechanism of Marr and Poggio is unreliable for non-constant disparity functions. 2. The disambiguation process proposed by Marr and Poggio is likewise unreliable for non-constant disparity functions. 3. The way in which disparity information is passed from lower resolutions to higher resolutions needs to be examined in more detail than was done by Marr and Poggio, and Grimson. Since the in-range/out-of-range detection and disambiguation processes are unreliable for non-constant disparity functions (which are the rule rather than the exception in most applications), and since these operations are quite computationally intensive (because of the need to examine a neighborhood of points for each feature to be matched) we wonder what would be the effect of eliminating these operations all together. Clearly, eliminating these operations would simplify the processing somewhat and resulting in a faster matching scheme. O n the other hand, it is evident that in eliminating these processes we run the risk of increased error due to incorrect matching. A system that did not perform in-range detection or disambiguation would have to ensure that the probability of having out-of-range or ambiguous matches be very small. We now propose a simple matching algorithm that tries to satisfy these conditions by relying heavily on the information passed to the higher resolutions from the lower. The multi-resolution nearest neighbour matching algorithm The operation of this algorithm, which we call the nearest- neighbour matching algorithm is depicted schematically in figure 2.11. The idea behind this algorithm is as 35 L E F T IMAGE DISPARITY ESTIMATE FROM PREVIOUS LEVEL V 2 G FILTER AND IC. DETECT NEAREST NEIGHBOUR k MATCHER DISPARITY S A M P L E S DISPARITY FUNCTION RECONSTRUCTION V 2 G FILTER AND Z.C. DETECT DISPARITY ESTIMATE TO NEXT L E V E L FIGURE 2.11 The operation of the multi-resolution nearest neighbour matching algorithm. follows. We eliminate all in-range/out-of-range checking and disambiguation of possible matches, partly to save on computation and partly because of the difficulties these processes have with non-constant disparity functions. In so doing we require that a disparity estimate be available that can be used to determine the centre of the matching region at a given resolution, and that this estimate be as accurate as possible in order to reduce the probability of the feature being out of range of the matching process. In order to reduce the probability of obtaining ambiguous matches, we must restrict the effective matching range. This is done 36 by using nearest neighbour matching. If the disparity estimate is sufficiently accurate, then the match whose disparity is closest to the estimated disparity is most likely the correct match, and we therefore treat it as such. Hence the term 'nearest neighbour matching'. This form of matching is depicted in figure 2.12. This type of matching algorithm requires that the disparity estimate be fairly accurate (however, as we will see in chapter 4.5, some error can be tolerated). In our system the disparity estimate is obtained from the lower resolution matching processes. In order to have a disparity estimate for all feature points in the higher resolution it is necessary to reconstruct the disparity function from the sparser set of disparity values available at the lower resolutions. Thus, for the high resolution disparity estimate to be accurate it is required that both the disparity measurements at the lower resolutions, and the reconstruction of the disparity function from these measurements be accurate. One way in which to provide a more accurate disparity estimate at a given resolution level is to transfer information from the higher resolutions as well as the lower (after an estimate is available at the higher resolution, of course). The processing flow of such an iterative matching method is given in figure 2.13. This multi-level flow of information from low resolutions to high resolutions back to low resolutions is similar to that proposed by Terzopoulos (1982) in his surface reconstruction algorithm (discussed in chapter 3.3). The main drawback of this approach is the increased computational requirement, as the matching, and disparity function reconstruction must be performed in every iteration. Experiments described in chapter 5 indicate, however, that some improvement in the performance of the matching algorithm is obtained as a result of such iteration. The feature representation that is used in our algorithm is the zero crossing pyramid with the spatial quantization proportional to the value of a. This means that the average zero crossing density in zero crossings per pixel is independent of the resolution level. Using a pyramid structure means that less computation is required to perform the matching search for a given disparity range at low resolutions than at high resolutions. We also consider the use of extrema (peaks or valleys) of the V 2 G filtered image as features to be matched. This was 37 EPIPOLAR LINE 21 ESTIMATED MATCH POSITION POSSIBLE MATCHES CHOSEN MATCH (NEAREST NEIGHBOR) FIGURE 2.12 Nearest neighbour matching. suggested by Frisby and Mayhew (1981) who claimed that extremum features were required if some phenomena concerning human stereopsis were to be explained. The bulk of the thesis involves an examination of the disparity measurement error mechanisms (chapter 4) and their effect on the performance of the matching algorithm as well as of the reconstruction process (chapters 3 and 4). These theoretical examinations are supported by experimental results (chapter 5). These studies indicate that, if the reconstruction process is performed sufficiently well, the simplified matching algorithm described here works well. 38 RESOLUT ION L E V E L 1 L E V E L 2 L E V E L 3 M r M r M L E V E L 4 F I G U R E 2.13 The processing flow of a- multi-level iterative matching algorithm. 39 2.6 - Disparity Scanning Matching Algorithms. In a recent paper (Grimson, 1985) Grimson suggested that instead of guiding the matching from low resolutions to high, the matching be done at the highest resolution only, by scanning through a large disparity range and noting at which disparities a match was possible for a given feature. Thus for each feature, a list of possible disparity values could be tabulated. Then these possible matches could be disambiguated in some fashion. Grimson recommends using the information obtained by performing the disparity scan at lower resolutions to do the disambiguation. Note that since the disparity range is the same for all resolutions the following comment by Nishihara (Nishihara, 1984) rings true: "Marr and Poggio's idea of trading off resolution for range seems to be largely abandoned in (Grimson's) technique." Grimson's method of disambiguation is as follows. Given a set of possible matches at a given resolution, a neighbourhood at the next lower resolution about the feature point is checked for unambiguous matches. If the disparity values of these unambiguous matches are all the same, and if this disparity value is one of the possible disparity values for the high resolution feature, then the high resolution feature is assigned that match. Otherwise the high resolution feature is assigned no match at all. Note that this method suffers from the same problem as did Marr and Poggio's disambiguation mechanism; it fails for non-constant disparity functions. If a region of the lower resolution image has non-constant disparity then there will, in general, be more than one disparity value in that region. Thus no assignment of disparity values will be possible for the higher resolution features. To remedy this problem we propose the " following algorithm, which we call the multi-resolution dispscan algorithm. We begin the matching process at the very lowest resolution level. At this level we scan through the entire disparity range, looking for possible matches. At each feature for which we have found at least one match, we examine a neighborhood of points about this feature. For each possible disparity of the central feature we count the percentage of matches in this neighborhood that result in a disparity within 2 pixels of this disparity. The candidate disparity which has the largest percentage of such 40 matches is taken to be the correct disparity for the feature. It is obvious that we cannot do this at the higher resolutions when the disparity function is not constant, for the same reasons that we gave for the inadequacy of the Marr-Poggio matching technique. However, after doing the above process at the lowest resolution, we have an estimate of the disparity function, with which we can modify the matching process at the higher resolutions. At the higher resolutions we do the following. As at the lowest resolution, we scan through the entire disparity range and mark down all the possible disparities that a given feature can have. Then, for each such possible disparity, we examine the features in a neighborhood about this feature and tabulate the percentage of possible matches whose disparities, when added to the difference between the disparity estimate at the central feature and the disparity estimate at the neighborhood feature, lie within 2 pixels of the candidate disparity. The candidate disparity with the largest percentage is chosen as the correct one. Note that the disparity estimate from the lower resolution is being used to guide the disambiguation process, but in a way that allows the algorithm to work for non-constant disparity functions. The operation of the algorithm is depicted graphically in figure 2.14. This algorithm, like the simple multi-resolution matching algorithm described in the previous section, depends crucially on the accuracy of the information provided by the lower resolutions. This means that it is important that the disparity values obtained at the low resolutions must be as accurate as possible and that the reconstruction of the disparity function from these values be as accurate as possible. As we have stated earlier, the examination of the disparity measurement and reconstruction process are studied in some detail in the remainder of the thesis. The next chapter, focuses on methods for performing the reconstruction process. » DISPARITY 4 ± 2 PIXEL MATCHING RANGE ESTIMATED H DISPARITY FUNCTION MINUS _ r _ O F F S E T n il N = 5 FOR 0 -23 N = 9 P0RD-I6 M=3 F0RD-/2 N = 4-F0RD=G • P O S S I B L E NEIGHBORHOOD F E A T U R E MATCH © P O S S I B L E C E N T R A L F E A T U R E MATCH X NEIGHBORHOOD F E A T U R E POSITION ® C E N T R A L F E A T U R E POSITION FIGURE 2.14 The operation of the multi- resolution Dispscan matching algorithm. 42 2.7 - Other matching methods The Marr-Poggio type of stereo matcher is not the only mechanism that has been proposed. Historically, the Marr-Poggio algorithm arose from a consideration of previous efforts to model the human stereo vision system. The bulk of these methods were based on the proposal by Julesz (1971) that the human stereo vision process is cooperative. That is, a solution for the disparity function is obtained by the cooperation of a large number of spatially distributed, but interacting, disparity detecting neurons. Such cooperative algorithms were proposed by Nelson (1975), Dev (1975), Sugie and Suwa (1977). Marr and Poggio (1976) presented a cooperative algorithm (later analyzed in Marr, Palm, and Poggio, 1978) which incorporated physical constraints in the formulation of the computational structure of the algorithm. However, even this cooperative algorithm, which performed better than the previously mentioned algorithms, works poorly on natural imagery (Marr and Poggio, 1979, p303). Also, as Marr (1982) has remarked, iterative methods (which most cooperative algorithms are) are not likely to be used by biological systems because of the slow speed of the neurons. Fast operation is obtainable only by one-shot algorithms utilizing highly parallel networks of processing elements (neurons). The inadequacy of Marr and Poggio's cooperative method led Marr and Poggio to develop their multi-resolution matching algorithm which has been described in the previous sections. Research into stereo vision has not been limited to the modeling of biological systems, of course. There have been many methods proposed specifically for machine vision. The earliest of these methods was the intensity based area correlation technique (e.g. Levine, O'Handley, and Yagi, 1973). This technique involves correlating the intensity functions of regions of the stereo image pair to determine the disparity in these regions. These methods typically suffer from low resolution and from the high ambiguity of the image intensity values. Baker and Binford (1981) used edge correlation in addition to intensity correlation in an effort to reduce the matching ambiguity. These also introduced a number of constraints which both reduced the ambiguity and sped up the computations. They also used a coarse-to-fine approach to limit the amount of computation required for the correlation. Ohta 43 and Kanade (1985) have proposed an algorithm which uses dynamic programming methods for searching large spaces of candidate matches in three dimensions (that is along the epipolar line as well as across it, and along the disparity dimension). Baker and Binford (1981) also use dynamic programming to search the space of possible disparity values for matches. The drawback of these dynamic programming approaches seem to lie in the immense amount of computation required, especially for complex scenes. From the point of view of simplicity in implementation the Marr-Poggio type of matching, or the nearest neighbour type of matching we propose is preferable over the correlational methods. It is for this reason that we have concentrated on the Marr-Poggio type of approach. 44 2.8 - Summary of Chapter 2 - The problem of matching feature based image descriptions was introduced. - The Marr-Poggio (1979) multi-resolution matching method was reviewed. - The out-of-range detection and disambiguation techniques used in the Marr-Poggio multi-resolution matching method were shown to be unreliable for non-constant disparity functions. Questions were raised about the way in which the Marr-Poggio algorithm transferred information from the lower resolutions to the higher. - A simple algorithm, (the Nearest Neighbour Matching Algorithm) was proposed which does away with the out-of-range detection and disambiguation processes, and instead relies on reconstruction of the disparity function at each resolution to provide a disparity estimate to guide the matching at the next higher resolution. Disparity scanning matching algorithms, as proposed by Grimson (1985) are introduced. A multi-resolution disparity scanning matching algorithm based on reconstruction of the disparity function at each resolution is proposed (the Multi-Resolution DispScan Algorithm). This algorithm is able to handle non-constant disparity functions, whereas it is not clear whether or not Grimson's method can. - It is shown that the peformance of the matching algorithms proposed in this chapter crucially depend on the accuracy of the disparity measurements at the lower resolutions and on the accuracy of the disparity function reconstruction process. 45 III - RECONSTRUCTION OF THE DISPARITY FUNCTION FROM ITS SAMPLES 3.1 - Introduction In the discussion, in the previous chapter, of the discrete multi-resolution matching algorithms, it was pointed out that the sparsely sampled disparity function obtained at a given resolution level must be, at least partially, reconstructed or interpolated to provide a denser sampling. This denser sampling is needed since guidance of the matching process at the next higher resolution requires a depth estimate at each feature location in this higher resolution. Because the feature density increases as the resolution increases, it follows that the lower resolution disparity function is not sampled at all locations corresponding to the higher resolution feature positions. Hence the lower resolution disparity function must be reconstructed or interpolated to provide disparity values at all feature locations in the higher resolution image representation. The reconstruction of the disparity function from its samples is also required at the highest resolution level, where there is no need to provide disparity estimates to a higher resolution feature matching process. In this case the reconstruction is needed to fill in the gaps between the feature points and provide a complete disparity map, that is, an array containing disparity values at each point in the image. One may argue that it is not necessary to know the disparity at each point in the image, but only where it is needed by some visual process. However, it is unlikely that the place where a disparity value is required is always going to coincide with a location at which the matching algorithm explicitly provides a disparity value. Hence some amount of reconstruction or interpolation will always be required. Grimson (Grimson, 1981b) provides further motivation, based on psychophysical considerations, for the interpolation process. In this chapter we will discuss methods of performing the reconstruction operation. It should be pointed out that the domain of applicability of the reconstruction methods described in this chapter is not limited to the stereo vision case but to a very wide range of applications. For example, (Clark et al, 1985) discusses the use of the transformation reconstruction method described later to an application 46 in radio astronomy. The topics covered in this chapter are listed in block diagram form in figure 3.1. A note on the terminology used in this thesis. There are three terms which will be used to denote the process of obtaining the value of a function at a point where it is not known explicitly. These are interpolation, approximation and reconstruction. These are, for the purpose of this thesis, defined as follows. Interpolation is the process of fitting a known (class of) function(s) through the measured function values such that the resulting function has the same values at the measurement points as the measured values. Approximation is the same as interpolation except that the condition that the interpolated function have the same values as the measured values at the measurement points is not enforced. Reconstruction is defined as the process of obtaining the exact function that has been sampled, using information (or assumptions) about the underlying function (for example, whether or not it is bandlimited). It should be noted that many of the reconstruction methods look similaT to interpolation methods, however the difference is that the interpolation methods assume nothing of the underlying function, and hence nothing can be said, in general, of their accuracy. In the reconstruction methods the underlying functions are assumed to satisfy some constraints, which allow the accuracy of the reconstruction to be determined. Proofs of Theorems stated in this chapter can be found in the Appendix. Grimson and Terzopoulos1 reconstruction methods (Relaxation) Introduction CH 3.1 I The WKS sampling theory and i t s extensions CH 3A (*) Indicates New Material The Warping * method I CH3.5 Extension to 2D C H 3.fo Implementation I CH 3.? Adding gradient information to the reconstruction process CH 3.? F I G U R E 3.1 The topics covered in this chapter. 48 3.2 - lnterpolatory Methods We will only briefly describe some interpolation methods, as these produce results which typically contain more error than the methods to be described later. A good review of interpolation methods can be found in Appendix VI of (Grimson, 1981b). The simplest form of interpolation is known as nearest neighbour interpolation. In this method a given function value is assigned the value of the function sample nearest to it. A slightly more complex method is linear interpolation wherein straight line segments are fitted between adjacent sample points (for the one dimensional case; in two dimensions the sample points are segmented, or triangularized, into triplets and a planar function, f = a + bx + cy, is fit through each of these triplets). This basic idea can be extended to higher order polynomials. However, higher order polynomial functions tend to exhibit oscillatory behaviour which almost certainly would not be present in the actual function. Spline interpolation methods involve piecewise matching of low order polynomial sections such that certain smoothness constraints are met. For example, in the case of cubic splines, two of the coefficients in a patch (ID case) may be set so that the measured function values at the ends of the patch match with the interpolation, and the other two coefficients will be chosen so as to ensure that the first and second derivative of the cubic segment match those of the two adjacent cubic segments; 49 3.3 - The Methods of Grimson and Terzopoulos The first detailed analysis of the surface reconstruction problem in the context of stereo vision was performed by Grimson, (Grimson, 1981b, 1982) as part of his PhD research. The approach Grimson took was to construct a complete surface (or disparity) description based only on the surface information known along zero crossing contours (Grimson used zero crossings as the features to be matched in the correspondence process). To constrain the infinite set of possible surfaces that could satisfy the conditions imposed by the disparity values known along the zero crossing contours, Grimson relied on the properties of the imaging process which, when coupled with the shape and reflectance characteristics of the surface, gave rise to the zero crossings in the image. Informally, his method was to compute the surface which fitted the known surface depth values and was 'most consistent' with the implicit shading information. Crudely put, this information can be described as implying that the reconstructed surface should, when passed through a V 2 G filter, not contain any new zero crossings that were not in the orignal zero crossing set This was formally referred to by Grimson (1981b, 1982) as the 'surface consistency' constraint, and informally as the 'No news is good news' constraint and was stated as follows: The absence of zero crossings constrains the surface shape. Grimson shows that the adoption of this constraint results in the conclusion that the best surface to fit the known data is the one that minimizes the variation in surface orientation (also known as the quadratic variation of the depth gradient function) over the surface. Grimson shows (1982) that the functional to be minimized is the following: 6(0 = [ / / ( f x x 2 + 2f x y 2 + f y y 2 ) d x d y ] 1 / 2 (3.3.1) Grimson (1982) shows that the above minimization problem can be characterized by using the calculus of variations to provide a set of differential equations (known as the Euler equations) that the minimal function must obey. Doing this, Grimson obtained the following differential 50 equation for f: V ' f = f +2f +f - 0 (3 3 2) xxxx xxyy yyyy \J.->.*-J where V is the biharmonic operator. The boundary conditions for this P.D.E. are given by (for the case of a square boundary, aligned with the coordinate axes): f = 0, f X X y = 0 for the boundaries parallel to the x axis. (3.3.3) f x x = 0, f y y x = 0 for the boundaries parallel to the y axis. (3.3.4) It can be shown (Terzopoulos, 1982) that the minimal function obtained as the solution of the above P.D.E. can be modeled as the surface that a thin metal plate takes when it is constrained to pass through the known depth values. Grimson (1981b) presents a computational method whereby the minimal surface can be obtained. Essentially this method involves searching for the surface function f which minimizes the functional 0(0 (equation 3.3.1) with a conjugate gradient search algorithm. Since the surface representation to be determined in practice, as well as the input depth constraints, are defined in a discrete rather than continuous fashion, the above minimization problem is converted into a discrete • problem. This involves conversion of the differentiations into differences and the integrations into summations. The function to be minimized then becomes (Grimson, 1981b, pl96): yy\-7 m-f + where Is- •} is the set of reconstructed surface depth values and {c. •} is the set of known 51 surface depths. The indices ij refer to positions on the grid of surface points. The scalar 0 is a smoothness parameter. If it is zero, the resulting surface will fit the known data values exactiy. If non-zero then the resulting surface will be a smooth approximation to the actual surface. In this fashion errors in the measured surface depth values may be smoothed out. Computationally, the conjugate gradient algorithm of Grimson, for the discrete case can be (see Terzopoulos, 1982) formulated as a relaxation algorithm. Relaxation methods are iterative procedures for determining the solutions of linear systems of equations of the form: Au = I (3.3.6) and hence: u = A " 1 ! (3.3.7) where A is a known NxN nonsingular matrix and J is a known Nxl column vector. Relaxation methods provide estimates of the solution vector u . The operation of the relaxation methods can be described by the following matrix equation: S ( k + D = G U ( k ) + E (3.3.8) where G is the iteration matrix and is a function of A and where Pc=J. The current estimate of the solution vector u ^ k + 1 ^ is obtained by multiplying the previous estimate of the solution vector u ^ ) by the iteration matrix G and adding to it the vector of known depth constraints, f (which is zero for the points at which no depth value is known). Three different types of relaxation schemes can be put into this form. The first type is Jacobi relaxation, for which G = D~^(L+U) where A = D-L-U is the decomposition of A into diagonal D, upper triangular U, and lower triangular L components. The other two types of relaxation methods are the Gauss-Seidel relaxation, for which G = (D-L)~Hj, and Successive 52 Overrelaxation (SOR) for which G = (I-wD~1L)~1[(l-w)I+a;D~'1TJ] with the relaxation parameter we(1,2). The Jacobi method is a parallel method in that each element of the new solution vector can be computed simultaneously. It requires that both the new and old solution vectors be stored completely. The Gauss-Seidel method is a sequential method as the computation of a particular element of the new solution vector can use the elements of the new solution vector that have been already computed. Thus the Gauss-Seidel algorithm will operate faster than the Jacobi algorithm. The Successive Overrelaxation algorithm is a generalization of the Gauss-Seidel algorithm (one obtains the Gauss-Seidel algorithm from the SOR algorithm when u> — l). This algorithm scales the residual vector (the vector that is added to the old solution to provide the new solution vector) by a number greater than one in an attempt to speed convergence. Terzopoulos (1982) provides the conditions on G which insure that the iterative process converges. In particular he shows that the SOR algorithm can not be guaranteed to converge if co is greater than or equal to 2. The matrix A is given by the Hessian of the functional 0(0 and is given by: A = [310(uh)/3uij9ukl], l + (20+0)^ (3.3.10) These coefficients are termed computational molecules by Terzopoulos (1982) who derived the 53 values of the coefficients, both for the interior grid points as well as for the grid points near the grid boundaries where the boundary conditions imposed by the formulation of the problem come into play. These same molecules were obtained by Grimson (1981b) in his specification of the conjugate gradient algorithm. These computational molecules will be used in the implementation of a relaxation algorithm later in the thesis, and are displayed in figure 3.2. The convergence of the aforementioned relaxation methods and of the conjugate gradient methods turns out to be painfully slow. In an effort to speed up the computation of the solution vector, Terzopoulos (1982) proposed the use of a multi-grid iterative algorithm. This method involves using depth constraints at a number of different levels of resolution. The coarsely sampled depth values at the low resolutions would be interpolated first, using one of the standard single grid relaxation algorithms. Since these algorithms reduce the high frequency errors in the depth estimate very quickly, a coarse solution can be obtained with a small number of iterations. This coarse estimate is then used as an initial estimate for relaxation on the next higher resolution level. The processing continues in this manner until the highest resolution level has been reached. Because, at each resolution level, the relatively high frequency errors are diminished, and since decreasing the resolution decreases the frequency of the errors that can be filtered out, it can be seen that such multi-grid methods can quickly eliminate error components having fairly low frequencies. The normal, single resolution relaxation methods can eliminate the high frequency error components quite quickly, but take far more iterations to eliminate the low frequency errors. Thus, if depth measurements are available at a number of different spatial resolutions, then the Terzopoulos multi-grid algorithm would be expected to be much faster than Grimson's method for surface reconstruction. Let us now discuss whether or not Grimson and Terzopoulos' algorithms are actually suitable for our application. Let us start by checking the validity of the assumptions made by Grimson (1981b). 54 FIGURE 3.2 Computational molecules for the relaxation surface approximation algorithm (after Terzopoulos, 1982). The thick bars indicate the boundary of the grid. The starting point for Grimson's development of his surface reconstruction procedure was his surface consistency constraint. Basically this said that if there was a region of the surface for which there was no zero crossing observed in the V 2 G image then the surface could not be changing appreciably, otherwise a zero crossing would have been created. This 55 statement seems reasonable enough but it neglects one important fact. This fact is that the V 2 G filtering operation restricts the density of observed zero crossings.2 In fact, the larger the space constant of the V 2 G filter, the lower the density of observed zero crossings. This can be easily seen by looking at any one of the scale space maps that are depicted in chapters 4 through 6. Thus we see that Grimson's surface consistency constraint strictly holds only for the case of V 2 filtering, and not for V 2 G filtering. There can be cases where the surface changes appreciably and not result in an observed zero crossing, simply because the change in the surface caused an intensity change which was of too high a frequency and was therefore filtered out by the V 2 G operation. This may not be a major problem for the higher resolution surface representations since these high frequency surface changes that are filtered out would not be perceived by the visual system anyway. At the lower resolutions, however, the problem is more serious. The surface changes that are filtered out at these resolutions will be fairly low frequency changes, and ones that would be perceivable by the visual system. Of course, if one is only trying to obtain a multi-resolution surface representation, as Terzopoulos (1982) does, then it is not expected that the lower resolution representation would capture all of the higher frequency surface changes. On the other hand, we have seen that the multi-resolution stereo matching process requires an accurate disparity estimate from the low resolution information so that the matching can proceed efficiently and accurately at the higher resolutions. Thus, if the low resolution surface reconstruction process does not detect the higher frequency changes in the surface (or disparity) then the higher resolution matching algorithms will be required to be more robust Thus we should ask ourselves whether or not there are other surface reconstruction methods, which do not assume Grimson's3 surface consistency constraint, that can better reconstruct the higher frequency surface changes. This question prompted the search for reconstruction methods that are described in the next chapter, all of which are based on assumptions of the frequency content of the function to 2 One of the reasons that Grimson did not account for this may lie in the fact that, in the statement of his Surface Consistency Theorem and its proof, he used V 2 filtering instead of the more general case of V 2 G filtering. In the case of V 2 filtering all of the (topographic) zero crossings are present and the Theorem holds. However, when V 2 G filtering is used the Surface Consistency Theorem is no longer valid. 3 It should be pointed out that Grimson intended his surface reconstruction algorithm for the task of filling in the disparity array at the highest resolution only. 56 be reconstructed (i.e. bandlimitedness). Before we get into these methods, let us briefly discuss a problem with applying Terzopoulos' multi-grid algorithm to our multi-resolution matching algorithm. At first glance Terzopoulos' algorithm appears to be tailor made for the reconstruction processes to be done in the multi-resolution matching system. However, Terzopoulos' algorithm requires that depth estimates be available at all resolutions before the surface reconstruction can take place. Clearly, the multi-resolution matching algorithm requires that the disparity function at a given resolution level be reconstructed before the next resolution level features can be matched. This means that the multi-grid reconstruction method is inapplicable. The best that one can do is to perform a number of relaxation iterations at each resolution level. This state of affairs is not as bad as it seems because the disparity function estimate from the previous resolution level is available to provide an starting point for the relaxation process. Presumably, much of the low frequency errors will have been suppressed in this manner. Thus, the number of relaxation iterations needed to achieve a certain error level is less than what would be expected from a single resolution level relaxation operation. However, having said this, it turns out that in practice, the relaxation reconstruction method is still fairly slow compared to the methods to be discussed in the next chapter. 57 3.4 - The W K S Sampling Theorem and its Extensions In this section we will be looking at methods for the reconstruction of functions that are based upon series expansions of these functions. It has been shown, by a host of researchers including J . M Whittaker (1929), E .T. Whittaker (1915), C .E . Shannon (1949) and Kotel'nikov (1933), that one can represent a bandlimited function with an infinite series whose coefficients are the samples of the function, suitably distributed. The precise statement of this result is given here as Theorem 3.1, which we call (after (Jerri, 1977)) the W K S sampling theorem, in honour of the aforementioned mathematicians. Theorem 3.1 The uniform 1-D sampling theorem (WKS) If f(t) is a function of one variable having a Fourier transform F(o>) such that F(CJ) = 0 for o)>o;o = 7r/T and is sampled at the points t n = nT, n = 0,+ l , ± 2 , . . . , then f(t) can be reconstructed exactly from its samples f(nT) as follows: f(t) = Z f(nT)sin[w 0(t-nT)]/[w 0(t-nT)] (3.4.1) rv=-oo This sampling theorem requires for its validity that the samples of the function be distributed uniformly. That is, the spatial or temporal interval between sampling instants be a constant Our application, however, produces samples that are non-uniformly distributed. Hence, the W K S sampling theorem, as it stands, is not valid for our application. We will now look at methods, adapted from the W K S theorem that allow the reconstruction of functions from non-uniformly distributed samples. One of the conditions on f(x) for the W K S theorem to hold is that it be bandlimited. This means that the following expression is true: 58 f(x) = / , e ^ ^ u j d u (3.4.2) where ] is some bounded interval (which we will take, without loss of generality, to be [-7r,7r] in the subsequent text) and F(CJ) is the Fourier transform of f(x). Now, suppose, instead of this condition, f(x) was bandlimited with respect to some other integral transform. For example: f(x) = / j K(XJCJ)Fk(W)CL> (3.4.3) Suppose further that F K eLj(I) (that is, f(x) has bounded energy) and that K(x£))e L2(I) is such that the set {K(xnJcj)} for all integer n is a complete orthogonal set on the interval I. Then any 'K bandlimited' function f(x) can be reconstructed as specified in Theorem 3.2 (due to Kramer (1959)). Theorem 3.2 The generalized WKS sampling theorem (due to Kramer, 1959, this precise statement of his theorem is that contained in Jerri, 1977 (Theorem III-A-1)) Let I be an interval and Lj(I) be the class of functions )g(u)du (3.4.4) where g(w)eL2(I). Suppose for each real x, K(x^>)eL2(I), and that there exists a countable set E={xn} such that {K(x n £))3 is a complete orthogonal set on I. Then: f(x) = lim Ef(x )Sn(x) (3.4.5) where 59 Sn(x) = ; i K( X ^)K*(x n J w)cL;/ / I |K(x n ^)| 2 cL J (3.4.6) Note that if K(XA>) = e J C d X and xn = nT then we get the standard WKS sampling theorem. As an example, a valid K(x^) is the Bessel function of m order, wJ^xw), which results in the inverse Hankel transform: f(x) = / j F(u) o)Jm(xw)dw (3.4.7) where I = [0,1]. For this case it can be shown that: S n « = V m W ^ V - ^ ^ + W <3A8> The sample sequence for this reconstruction formula is given implicitly by the positions of the zeroes of Jm(x). That is, x m n satisfies J m ( x m n ) = 0. It is easily seen, by looking at a graph of the function J for any value of m, that the sample sequence is not uniform (however, the sequence approaches uniformity for large values of x ). For this reconstruction formula to hold, f(x) must be bandlimited with respect to the Hankel transform. This means that F(u>) (which is the Hankel transform of f(x)) vanishes for u> outside the interval I = [0,1]. This is a different condition on f(x) than in the Fourier transform case. Note that a function f(x) that is not bandlimited in the usual Fourier transform sense may still be reconstructible with the above Bessel function reconstruction formula. There are many other types of functions that can be used as kernels in reconstruction formula. Jerri (1977) lists some of these, including the associated Legendre functions and the Chebyshev functions of the second kind. All of these reconstruction formula require that the function be sampled in some non-uniform fashion. However, this sample sequence is fixed, for each different reconstruction filter kernel. Thus while the sample distributions are non-uniform they are certainly not 60 arbitrary. Thus these methods are not applicable to our application in which the sample distribution is not known a priori and varies from case to case. One possible way in which we might try to obtain a sampling theorem for an arbitrary sample distribution {xnl is to determine, for each case, a kernel function K(x£))eL 2 ( I ) such that (K(xnA>)} is a complete orthogonal set. In general this would seem to be an impossible task, involving a search over the entire space of 1^ (1). However, one way in which one can proceed is to choose a specific kernel and ask whether or not the set {K(xnjx>)3 is a complete orthogonal set for an arbitrary sample set {xnL This approach (although they might not have explained their motivation in this fashion) was taken by Beutler (1966), Yao and Thomas (1967), and Higgins (1976). They took as the reconstruction kernel function the standard Fourier transform kernel e^ x and asked the question: Under what conditions is the set {e^n} a complete orthogonal set? This question was first looked into by Paley and Weiner (1934) who showed that this set is closed" if the sample set {xfi} obeys the following condition: |xn-n| < 1/7T2 (3.4.10) That is, if the sample set deviated by an amount less than l/7r2 from the uniform sample sequence, then the set {e^ xn} is closed. Levinson (1940) provided a tighter lower bound on the maximum allowable sample deviation and stated that: |xn|-|n| < 1/4 (3.4.11) Levinson claimed that this is the 'best possible' bound. "Closure in this context means that: fjfitxXeJ^nJdx = 0 (3.4.9) iff f(x)=0 almost everywhere for all x and all x n e ^ x n 5 -61 Since the e ^ n are not harmonically related, reconstruction methods which utilize these functions are known as non-harmonic Fourier series methods. Given this basis set it can be easily shown that: S n(x) = sin [7r (x-x n ) ] / [7r (x-x n ) ] (3.4.12) Higgins (1976) shows that for every basis set lgn(x)} for a Hilbert space there corresponds a unique biorthogonal basis {g n(x)} such that = S n m , where 6n m is the Kronecker delta and <.|.> indicates the inner product operation. Thus this results in another set of complete orthogonal reconstruction kernel functions. For example, Higgins shows that, for g n = e ^ ° x n the resulting reconstruction filter function corresponding to g is given by the Lagrange type reconstruction function: Sn(x) = H(x)/[H'(x n )(x-x n )] (3.4.13) where: H(x) = (x-Xo)rT(l-x/x r l)(l-x/x_ t l) (3.4.14) Higgins further states that if the sample sequence x n is a rational function of n, then the expression for H(x) can be put into closed form. It should be pointed out that Levinson (1940) also pointed out the existence of the biorthogonal sequence and wrote down the above formula for S n(x) (although not for the reasons that Higgins did). This result was also derived by Yao and Thomas (1967), albeit in a less elegant, more direct manner. Presumably one could extend this method (of determining closure conditions on the sample sequences) for other kernels besides the Fourier transform kernels. This would perhaps allow a wider range of different sample sequences to be used for reconstruction. We have come across no studies of such an approach, however, and it was felt to be outside the 62 scope of this thesis to attempt our own study of this matter. A different approach was proposed by Yen (1956) who provide reconstruction formula for a number of special cases of non-uniform sampling. His method involved applying a number of constraints on the reconstructed function value, obtained from the nature of the sampling distribution, and setting up systems of linear equations from which the function values can be solved. The special cases for which Yen provided reconstruction formulae are as follows. 1. - Migration of a finite number of uniform sample points. 2. - Sampling with a single gap in an otherwise uniform distribution. 3. - Recurrent nonuniform sampling. 4. - Reconstruction of time-limited, band-limited signals from arbitrarily distributed samples. The types of sample distributions implied by the first three cases of Yen are depicted in figure 3.3. The reconstruction formulae that result are very complex and will not be written down here. The interested reader is referred to (Yen, 1956). These equations also appear in the PhD thesis of F. Marvasti (Marvasti, 1973), who used them in an adaptive quantizer. He did not credit Yen with these equations, although he did include (Yen, 1956) in his list of cited literature. Marvasti did, however, propose some interesting methods of his own in his thesis, which we will now discuss. He presents a pair of techniques for 'uniformizing' a non-uniform sample sequence so that the function can be reconstructed with the WKS formula. The first of these techniques involves estimating the function values at uniformly spaced positions based on a finite number of previous (non-uniformly distributed) function samples. To do this estimation he used the method of (Yen, 1956) (again Marvasti did not cite this paper as the source of this method) which reconstructed a time-limited and bandlimited function from its samples (which can be arbitrarily distributed) in the interval where the function is non-zero. 63 FIGURE 3.3 The three special types of sample distributions handled by the reconstruction formula of Yen (1956). Although Marvasti does not mention it, it is clear that if the function whose samples are to uniformized are not time-limited, then this uniformization will not yield exact results. Hence the reconstructed function, after uniformization, will be in error. The second of Marvasti's uniformization schemes works by predicting the function values at the uniformly distributed 64 based on the information provided by a finite number of preceding function samples (non-uniformly distributed) with a linear predictor. This method produces estimates of the function values at uniformly distributed locations, but with an added jitter noise. Marvasti states that the error in the prediction method is greater than that produced using Yen's time-limited function reconstruction formula. To conclude this section we will consider reconstruction methods which involve simple linear filtering of the sampled function (which can be modeled as an impulse train). Marvasti (1984) shows that, if the sampling function, S = E8(x-x ) is thought of as the zero-crossing f\'-oa 11 of the following F M signal: F M = sin[wcx + fp(x)dx] (3.4.15) and if u>Q is much larger than the bandwidth of p(x), then the jitter noise produced by passing the sampled function through a lowpass filter is negligible. A similar result was obtained by Papoulis (1966) although his derivation proceeded directly from the WKS formulation. The essential conclusion of both Marvasti's and Papoulis' analyses is that the jitter error is negligble only if the deviation of the sample positions from the closest uniform sequence was small enough. This is the same sort of condition imposed by the non-harmonic Fourier series methods. It is clear that all of the methods described in this previous section all tightly constrain the allowed sample positions. Such techniques are therefore inapplicable to situations wherein the sample density varies significantly. In applying the stereo depth measurement algorithms described in this thesis to real life imagery, one is often faced with feature images wherein the feature density varies considerably over the image. Thus, it is evident that the reconstruction methods just discussed are clearly inappropriate for our application. What we require is a reconstruction method that allows reconstruction of functions from truly arbitrarily distributed sample sets. The next three sections describe the derivation and implementation of just such a method. 65 3.5 - The Transformation or Warping Method - ID Case In this section we develop a one dimensional sampling theory, which we extend to two dimensions in the next section, that allows one, under certain conditions, to reconstruct bandlimited functions exactly from arbitrarily distributed samples. This theory is seen to be a generalization of the analysis of Papoulis (1966) who showed how the standard uniform sampling theory of Whittaker, Shannon and Kotel'nikov could be extended to sample sequences that were slight deviations from a uniform sample sequence. We show how a more general result can be obtained by treating a non-uniform sample sequence as resulting from a coordinate transformation of a uniform sample sequence instead of being merely deviated from the uniform sequence. Section 3.6 details how these coordinate transformations can be determined in the two dimensional case. Section 3.7 describes a heuristic algorithm, based on the theory developed in section 3.6, for performing two dimensional function reconstruction from non-uniformly distributed samples. We will first derive a non-uniform sampling theorem for the one dimensional case and then extend this result to the two dimensional case in the next section. The starting point for our derivation is the classical Whittaker-Kotel'nikov-Shannon (WKS) sampling theorem (Theorem 3.1). Let us consider a non-uniform sample sequence {tn5 where t , the position of the n sample, is not necessarily equal to nT. For example, refer to the function shown in figure 3.4a. This function is sampled non-uniformly as shown at the locations tn. Now, suppose we apply a stretching/compressing transformation to f(t), described by T — 7(f), such that we end up with another function h(r) as shown in figure 3.4b, with a sampling period of T units. If the transformation, 7 , between t and r is such that U + nT = 7(t n), for some arbitrary to, then the samples of the function h(r) will be uniformly spaced (figure 3.4b) and we can use Theorem 3.1. For the reconstruction of h(r) to be exact we must have that h(r) be bandlimited to u 0 = 7r /T. If this is so, we can then reverse the stretching/compression operation, and retrieve the reconstructed version of the function f(t) by using the relationship: 66 0) b) FIGURE 3.4 a) A function, f(t) sampled at non-uniformly distributed positions b) The transformed function, g(r), sampled at uniformly distributed positions. f(t) = h(7(t)) (3.5.1) Substituting this relationship into (3.4.1) of Theorem 3.2, and using T = 7(t) yields 67 f(t) = L f(t) sin[wo(7(t)-nT)]/[a)o(7(t)-nT)] (3.5.2) n=-oo " Hence, in order to reconstruct f(t) from its non-uniformly spaced samples f (tn), it suffices to find the invertible and one-to-one function 7(f) such that 7(tn) = nT and then to use (3.5.2). The reconstruction formula (3.5.2) is equivalent to the one derived by Papoulis (1966), who treated the case of sample positions that were deviated slightly from a uniform sample distribution. However, his analysis indicated that this reconstruction would never be exact, but would always be subject to an aliasing error which became smaller as the sample deviation became smaller. This conclusion is too pessimistic, however, and it can be shown that there are cases for which the samples are not uniformly distributed and yet the reconstruction can be exact The conditions under which an exact reconstruction can be obtained are discussed below. In order for (3.5.2) to hold, the function h(r) must be bandlimited to u>0- Thus h(r) is a member of the set B„, , which is defined as the set of all functions whose Fourier "Jo transforms vanish for |o>| >w0- Let us define the set C to be the set of all functions which are the image of a function in B„ under the transformation 7 " \ It can be seen that C "J P 7 is the set of all functions that can be reconstructed exactly with (3.5.2), for a given 7 . The set C is clearly non-empty, and thus Papoulis' assertion that (3.5.2) is only approximately true for all functions is incorrect Equation (3.5.2) is approximate only for those functions that are not members of C . An interesting point to be noted, is that the functions in C are generally not band-limited. That this is so can be seen by examination of the relationship between the spectra of h(r) and f(t). It is possible to show that: 68 F (X ) = f%?(k»)H(u>)±> (3.5.3) where the function P, which can be thought of as a frequency-variant blurring function, is defined by: P(XA>) = J^e^2 7 r^'y(T))e^2 7 r X rdr (3.5.4) That is, P (AAJ ) is the Fourier transform of the angle modulated signal: pw(T) = e /2™ T « (3.5.5) Thus, if Pw(t) is not bandlimited, as is usually the case for angle modulated signals, then generally f(t) will not be either. One can show that there exists a transformation 7 such that the FM signal defined by: f(t) = t~J •f°n ( s ) d s (3.5.6) is a member of C when fl(s) is a positive, continuous function. Hence, such a function can always be reconstructed exactly, when sampled at the times t n=7~^(nT), even when it is not strictly bandlimited. Let us summarize the details of the above analysis in the form of a theorem : Theorem 3.3 The non-uniform 1-D sampling theorem Let a function f(t) of one variable be sampled at the points t = t , where t is not necessarily a sequence of uniformly spaced numbers. If a one-to-one continuous mapping 7(t) exists such that nT= 7(tn)> and if h(r) = f(y~\r)) is bandlimited to u)t = ir/T, then the 69 following equation holds: f(t) = I f ( t j sin[ej.(7(t)-nT)]/[wo(7(l>-nT)] n--co i J (3.5.7) This theorem can be generalized to include other orthogonal basis functions in the same manner that Theorem 3.1 was generalized to Theorem 3.2. This generalization will not be explicitly stated here. The reconstruction method described here can be thought of in a different manner. Consider a 'burst' type signal such as that shown in figure 3.5. Intuitively, we would expect that a uniform sampling of f(t) that allows an exact reconstruction, would not be the most efficient sampling scheme. It seems reasonable to require a higher sample density in the central region where there are high frequency components than in the outer regions where there are lower frequency components. Suppose that, at every point of the function f(t), we make a local estimate of its bandwidth - B(t). Then, it would follow that we would have to sample at a rate of 2B(t) samples/unit time in order to allow an exact reconstruction of the signal. This conclusion has been reached (albeit from a different direction) by Horiuchi (1968) who derived the reconstruction formula (3.5.9) for a signal with a time varying bandwidth B(t) that is sampled at the points t given implicitly by: tn = n/(2B(tn)) (3.5.8) f(t) = I f ( t j sin[7r(2B(t)t-n)]/[7r(2B(t)t-n)] n--oa Ii (3.5.9) The derivative of the mapping function, b j ( i ) / o i can be thought of as the instantaneous sampling rate and hence we have that: 70 f (t) t F I G U R E 3.5 A burst type of signal, with time-varying bandwidth 97(t)/9t = (27T/cj0)B(t) or 7(t) = k +J*0t(27r/cj0)B(r)dr (3.5.10) If the bandwidth B(t) is a constant (or approximately so over a given interval) then we can say: 7(t) = (27r/o)0)B(t)t (3.5.11) With this equation for 7(t) we can see that equation (3.5.2) and (3.5.9) are equivalent If the bandwidth is not approximately constant then our equation (3.5.2) and Horiuchi's equation (3.5.9) are not equivalent. 71 Equations (3.5.8) and (3.5.10) tells us (implicitly) how to optimally sample a signal, that is, how to sample a signal with the smallest number of sample points while still allowing an exact reconstruction of the signal. Equation (3.5.9) suggests that we could interpret the reconstructed f(t) as the response of a time varying (or adaptive) lowpass filter, with bandwidth B(t), to the signal L f(t)5(t-tn). Thus one can envisage the following process. Given an arbitrary function f(t) we estimate its bandwidth as a function of time. We then integrate this bandwidth function, as in equation (3.5.10), to yield the warping function 7(t). When this warping function crosses an integer value-n, we sample f(t). We then store or transmit the sample f(tn) along with the time at which the sample was taken, tfl. We can then, knowing all of the f(tfl) and tn values, reconstruct f(t) using (3.5.7). The preceding analysis is complicated by the fact that an exact 'local' bandwidth measure does not exist as bandwidth is defined globally, being a frequency domain measure. Thus, we can only obtain local band width 'estimates', which may cause concern as to the validity of equations (3.5.8)-(3.5.10). In practice, however, the reconstruction formulas that are used are defined only over a finite area (truncation of the reconstruction series), and so one loses nothing by assuming the local bandwidth over this finite area to be the actual bandwidth. In practice, the use of the above reconstruction theorem requires the knowledge of the function 7(t) at all points t for which we desire a reconstruction. If an analytical expression for the sampling sequence tQ is known (e.g. t = s(n)) then we can simply extend this analytical expression to include non integer values (e.g. 7(f) = s(t)). If no such analytical expression is available (as is usually the case) or if the analytical expression can not be extended to non integer values, then 7(0 must be found by interpolation between the known 7(n) points. The only constraint on this interpolation is that it must yield a 7 that is one-to-one and invertible (monotonic). We will now present an example showing the effectiveness of non-uniform sampling and reconstruction for signals with time varying bandwidth. Consider the following F M signal: 72 f(t) = sin[27T0(t)] (3.5.12) where = l,u, + l2Uj, 11,12=0,±1,±2,±3..., Ui^ kuj} (3.6.2) 77 and where the frequency domain basis vectors u} and u2 are related to the spatial domain basis vectors v, and v2 by u,Tv, = u2Tv2 = 2ir, and UiTv2 = u2Tv, = 0 (3.6.3) If F(c3) = 0 when F(CJ+C3s)^0 (for every "g^0), then the spectral repetitions do not overlap, and the following equation holds: f(x) = I f(x )g(x-x ) (3.6.4) .03 S S where g(x) is the inverse Fourier transform of the lowpass filter function G(w) defined by: = Q ueR G(o5) = arbitrary c3iR, c3-c3^ R (3.6.5) = 0 "~^s e^ Q is a constant that is equal to the area of each of the sampling latdce cells and is the inverse of the sample density. In terms of the sampling lattice basis vectors, v, and v2, Q is given by Q = •|v1|i|v2|'-(vlTv2)J (3.6.6) Now, following the lead of the dimensional case, let us introduce a second of f(x) under the coordinate transformation I = 7(x) analysis performed in section 3.5 for the one function of two variables, h(x), that is the image (3.6.7) 78 i.e. f(x) = h(f) = h( 7 (x ) ) (3.6.8) Let the coordinate transformation, -y(x), be such that the set of non-uniformly spaced samples {Xg}, is transformed into a uniformly spaced set of samples {J^ (such as the set defined by equation 3.6.1). Since. {fs} contains points that lie on a regular lattice, the function h(f ) can be reconstructed under the conditions of Theorem 3.4. If h ( £ ) is suitably bandlimited then this reconstruction will be exact Once h ( £ ) has been reconstructed, f(x) can be obtained by reversing the coordinate transformation (3.6.7). This is the basis of our non-uniform two dimensional sampling theorem which is stated below as Theorem 3.5. Theorem 3.5 The non-uniform two dimensional sampling theorem Suppose that a function of two variables f(x) is sampled at points in the infinite set {xg}. Now, if there exists a one-to-one continuous mapping y such that: 1 = 7(x) and Ig = T ( x s ) (3.6.9) and if the function defined by h(?) = f(7_1(I)) (3.6.10) satisfies the conditions of Theorem 3.4 then the following is true: h ( f ) = £ h ( ? )g(I-? ) (3.6.11) where g(£) is as defined by (3.6.5). Hence: 79 f(x) = I f(is)g(7(x)-7(xs)) (3.6.12) Let us assume that h ( £ ) is an isotropic function, where we have here taken the term isotropic, as in (Petersen and Middleton, 1962), to mean that the support of the Fourier transform of the function is a disk shaped region in the frequency plane, centred about the origin. If this is the case, we can define the region R, mentioned in Theorem 3.4, as R = (H : |X|<7r} . It can then be shown that (Petersen and Middleton, 1962 eq. 74 with B = l / 2 ) : g(I) = (fi/v'3M*\l\)'(*\'Z\) (3-6.13) where Jj is the first order Bessel function of the first kind. Combining equations (3.6.12) and (3.6.13) yields the following result: f(x) = L f(xs) (TTV3)1,(771 T(x)-T(x s)j )/(7TiT(x)-T(X G)|) (3.6.14) It can be shown (Petersen and Middleton, 1962) that the most efficient sampling lattice (in J space) for such an isotropic process is a hexagonal lattice with a characteristic spacing of 2/y/3. This sampling lattice is the one shown in figure 3.7. The values of £ g are fixed by this lattice and, by equation (3.6.13), the values of 7(xg) are also fixed. If we are to use equation (3.6.12) as our reconstruction formula, we must, as in the one dimensional case, know the mapping function y(x) at all sample points and at all other points at which we wish to obtain a reconstruction. As in the one dimensional case we can, once we know the values of 7(x) at the sample points {xg}, interpolate to find 7(x) for any x. Unlike the one dimensional case, however, it is not a trivial matter to obtain a 80 mapping, r-{xs}->{^s}, between the sample sets in x and \ space, that yields a one-to-one and continuous mapping function 7 . In the one dimensional case one and only one such mapping exists (restricting the sign of the derivative of 7 to be positive), given by interpolation of 7(t f i) = nT, but in two dimensions there may be, in general, any number of such mappings. The difficulty lies in the fact that there is no general scheme for ordering arbitrarily distributed points in two dimensions analogous to the sequential ordering available in one dimension, such that adjacency properties are preserved. For the purpose of the following discussion let us make the following definitions. Definition Partition A partition of a planar region R is a set of line segments, called Links, that divide R into a number of distinct, possibly overlapping subregions. The endpoints of these Links are called the Vertices of the partition. There can be no free vertices (i.e. those vertices belonging to only one Link) in a partition except at the boundary of R. Definition Tessellation A tessellation is a partition whose regions do not overlap. Definition Voronoi Tessellation The Voronoi tessellation of the J plane with respect to the point set {J^ is the tessellation whose Links consist of points equidistant from any two points ^.,?-e{7 } and no 1 J s closer to any other points J^e{jj. The vertices of the Voronoi tesellation are those points equidistant from three or more points in { £ g } and no closer to any other point in The subregions created by the Voronoi tessellation contain all points ^ closer to a given point in { £ g } than any other point in {|g}. 81 Definition Dirichlet Tessellation The Dirichlet tessellation (sometimes referred to as the Delaunay triangulation) is the dual of the Voronoi tesselation. The Dirichlet tessellation can be thought of as the set of line segments connecting the points in {J^ whose subregions in the Voronoi tessellation of ? with respect to {J^ share a common Link. A n example of Voronoi and Dirichlet tessellations are shown in figure 3.8. Further discussion of Dirichlet and Voronoi tessellations can be found in Ahuja and Schacter (1983). It can be seen that the partition created by connecting the points in the hexagonal sampling lattice to their nearest neighbours is a Dirichlet tessellation and as such, the regions created by the partition are non-overlapping triangles. We will denote this particular tessellation by D ^ . Definition P- Adjacency Two points are defined to be P-Adjacent if they are vertices of a partition P and share a common Link. Definition Adjacency Conserving Partition Mapping (ACPM) A mapping, T ^ x l - X ^ } , is termed an A C P M if it takes a partition having vertex set {Xgl into a partition having vertex set {1^ such that the points in {xl have the same P-adjacency as their images in f £ g } . As a result of the preservation of adjacency properties, a region in a partition has the same number of Links as the corresponding region in its image under an A C P M . Also, each vertex has the same number of links as its image under an A C P M . Hence, since D ^ has triangular regions, the regions of any A C P M , P , of D ^ are also triangular (although possibly overlapping). Note also that the inverse of an A C P M is itself an A C P M . Yoronoi Dirichlet F I G U R E 3.8 The Voronoi and Dirichlet tessellations for a set of points Definition Generalized Hexagonal Tessellation (GHT) A tessellation created by applying an A C P M to the tessellation D ^ is called a Generalized Hexagonal Tessellation or G H T . Al l vertices of a G H T are the junction of six Links. As was said at the beginning of this section, once we have a mapping, T, between the points in {Xg} and the points in ij^, we can determine the mapping function, 7(x), for x, not necessarily a member of {Xg}, by interpolation of the T values. The fact that the regions of P x are triangular suggests that, given a point x in the interior of one of these regions, we should let £=7(x) be some linear combination of the (known) values of 7(x) at the vertices of this region. Such a linear combination can be written as: 83 7(x) = | r ( x . ) I ( x ^ ( i ) ^ ( i + 1 )^( i + 2)j) (3.6.15) where V(x) = (xi,x2,x3) is the vertex set of the region of Px containing x and where (i)3 denotes (i)modulo(3)+l. The function I(x,x,.\ Jc,- , j( r , ^ ) is some interpolation function which results in an invertible 7 . The simplest such interpolation that we can do between three non-collinear points is a Trilinear Interpolation which fits a planar (vector valued) surface to these three points. The interpolation function for this method is given in equation (3.6.16). This equation describes a plane passing through the points (xAx^ .rjXxj'.xAO), and (xj'.x^O). I(x^1 >x2^3) = [x'(xVx23) + x2(x1 3-x1 2) + (x 2 3x' 2-x 2 2x 1 3)]/A (3.6.16) where A = xI 1(x2 2-x2 3) - xMxVx1,) + (x 1 2x 2 3-x I 3x 2 2) (3.6.17) and x = (x^x2). We can now state the following theorem which supplies the conditions under which a given one—to-one point set mapping will yield the one-to-one and continuous mapping (transformation) function that is required for equation (3.6.11) to be valid. Theorem 3.6 Invertibility of a mapping Given the Dirichlet tessellation D h on ~\ as defined above, and the trilinear interpolation defined by equation (3.6.16) then, if there is an ACPM, r ~ \ such that the image of D ^ Px is a G H T and the points in the set V(x) are not collinear, the mapping 7(x) defined by (3.6.15) is one-to-one and continuous. 84 The key condition in Theorem 3.6 is that T~ be an A C P M , or conversely, that P , the image of D ^ , be a G H T . Thus, in order to perform the reconstruction of a function for a given sample set Ix J, we need to first find the tessellation P v whose image under the O A mapping T is the hexagonal lattice tessellation, D ^ . The regions of P x all must have three sides and the vertices of P x must be the junction of six Links. It is suspected that it is not possible to create a G H T from all sets of points }xg{. We have no proof of this conjecture, but it can be seen that algorithms for creating G H T s run into trouble trying to order large numbers of points in some cases. For example, consider the set of points shown in figure 3.9a. This set made up of points that lie along twelve radial lines. This type of sample set is found in X - r a y tomography (e.g. see figure 3 of Pan and Kak, 1983). We try to create a G H T from this set by trying to map points in it to by following the ordering of points in { £ g } given in figure 3.7. The result of this process is shown in figure 3.9b. As we proceed the regions of the tessellation become thinner and less regular. It turns out that, using our construction, beyond a certain point no points can be mapped without creating an overlapping region or a region with collinear vertices. This, of course, does not constitute a valid proof of our conjecture as there are a number of other ways of trying to construct a G H T from this set of points; one of which may work. However, this example does point out the problems involved in performing the mapping operation. Even if a G H T could be found the thinness of the regions of the G H T so found would cause problems with the interpolation process. However, if we truncate the reconstruction formula (3.6.12) to only take into account those samples in a finite region about x, then it would not be necessary for y to be one-to-one and continuous everywhere. The mapping function need be one-to-one and continuous only over this restricted region. This would mean that we would need only to find a mapping function that created a partition that was only locally a G H T . In this way it is 85 b) An attempt to create a GHT from {Xg}. expected that a mapping could be found for any set {xg}. For example, figure 3.9c shows a few of these local GHTs defined on the sample set of figure 3.9a. The price we pay for 86 this weakening of Theorem 3.5 is that the reconstruction is no longer exact, even if h(£ ) is suitably bandlimited. In practice such a truncation of the reconstruction equation is unavoidable as one can only process a finite number of samples in a finite time. Let the finite set of sample points used to reconstruct f(x) be }xs}0e{xs}. Note that for different reconstruction points x we may have different sets {xg}0. When only a finite number of terms are used in equation (3.6.12) the resulting value of f(x0) will not be exact but will be subject to a 'truncation' error term. This truncation error is defined in equation (3.6.18) and can be bounded as shown in equation (3.6.19). e2t(x) = |f(x>fR(x)|> = |S ftxs) (wV3)J,(|7(xh7(xs)|)/(Tr|7(x>-7(xs)|)|J (3.6.18) In finding this bound we have used the Triangle Inequality and the fact that | Jj(x)| e(e). The corresponding discrete probability mass functions will be indicated by a ' * ' over the p's. Section 4.7 will present a discussion of how the various sources of error affect the performance of the simplified multi-resolution matching algorithm. 106 4.2 - Effect of Sensor Noise on Feature Position In this section we examine the following problem. Suppose we have a white Gaussian signal process that is V 2 G filtered (with a filter scale constant o^) yielding a random function, f(x). Let us define the position of an arbitrary feature (extremum point or zero crossing) of f(x) to be XQ. Thus f(xQ) = 0. Now suppose we add to the original signal a noise signal, which has a Gaussian distribution, so that the output of the filter is given by fn(x) = f(x) + n(x). Let the variance of the unfiltered noise signal n(x) be denoted and the variance of the unfiltered signal be of. Let the position of the feature of fn(x) that corresponds to XQ be given by x c . Correspondence in this case is defined as follows: lim |x - x J = 0 (4.2.1) Note that x n and x lie on corresponding feature contours. That is: lim D(CN-C0) = 0 (4.2.2) where CFI is the feature contour through the point x*c and CQ is the feature contour through the point XQ. D is some metric measuring the distance between two feature contours. Now define the point x*n to be the point on CFI that is the closest to XQ along the horizontal line through XQ. The relationship between XQ, Xc, x*n, CN and CQ is shown in figure 4.2. Note that, in general x n and XQ are not corresponding points, as defined by equation (4.2.1). Now suppose we were trying to match the features of f(x) with those of fn(x). If there was no added noise, then the measured disparity would be zero everywhere (we assume that the correspondence problem can be solved). Now, if we add some noise, the horizontal shift, or horizontal disparity, between corresponding features (which is what is measured by most stereo matchers) is a random variable equal to the distance between points x*n and XQ. Thus we can write the feature position error, e as follows: 107 FIGURE 4.2 The perturbation of feature contours by additive noise. e p = (xn-x 0>(l,0) (4.2.3) Our goal in the remainder of this section is to derive an expression for the probability density function of e . A one dimensional version of this problem was analyzed by Lunscher (1983). However, the noise model he used was unecessarily restrictive (as he assumed the noise to be constant while the signal was assumed linear). Wiejak (1983) discusses the probability of a zero crossing changing sign or of a new zero crossing being formed due to the addition of noise. Nishihara (1983) also analyzes this problem. In (Nishihara, 1982) is presented experimental evidence for the effects of noise on the zero crossings and no analysis is performed. However, in this thesis we are concerned only with the perturbation of zero crossings due to noise, as the appearance or disappearance of zero crossings will affect only the. matching statistics and not the accuracy of the disparity measurements themselves. Zero Crossing Position Errors 108 We assume that, near the unperturbed zero crossing point XQ, the signal function f(x) and the noise function n(x) can both be approximated by a plane. We can then write f(x) as follows: f(x) = (a/c,b/c)-(x-x0) (4.2.4) where (a,b,c) is the unit normal vector of the signal plane and x : y indicates the dot product operation. Similarly, the noise function can be approximated by: nbn>cn) is the unit normal vector of the noise function plane. We will take, without loss of generality, XQ = (0,0). The function obtained by adding the noise to f(x) can also be approximated by a plane, and can be defined by: n(x) + f(x) = (a/c+an/cn,b/c + bn/cn)-(x) - (x n 0)-(a n/c n ,b n/c n) (4.2.6) One can now see that xn, the horizontal position of the perturbed zero crossing, is equal to the position error, ep, and is given by: e p = ^ V S ^ + V ^ <4-17) Thus the error is seen to be proportional to the slope of the noise function and inversely proportional to the signal slope. These proportionalities were also noted by Nishihara (1983). We need to find an expression for the joint probability density of k, = x n , k2 = (a/c) and k 3=(a n/c n). Since f(x) and n(x) are uncorrelated we have that: p(k1,k2,k3) = pp(k2) pn(ki,k3) (4.2.8) 109 Also, we can assume that XRQ will be independent of the slope of the noise function. Thus we can write: pCkiXk, ) = P f (k 2 ) P x n 0 (k, ) p R 3(k 3) (4.2.9) The probability density of xFIQ is assumed to be uniform over a range 1/R, where R is the expected zero crossing rate of a horizontal slice of the random functions f(x) and n(x). Rice (1945) has derived the following expression for R, in terms of the autocovariance function, ^ ^ ( T ) of a horizontal slice of n(x). His result is: R = [->//ld"(0)/V/ld(0)]1/2/7r (4.2.10) The autocovariance function is derived in the appendix and we have: R = (l/of) i/(31/22) (4.2.11) where is the space constant of the V 2 G filter. With regard to k2 and k3 it can be seen that k2 = 3f(x)/9xi and k3 — 9n(x)/9xi. The probability densities of k2 and k3 can then be shown to be as follows (Rice, 1945): P f(k 2) = [-27r^f11(0)] 1 / 2 exp(k2V^f1](0)) (4.2.12) and P f (k 3 ) = [-27rV/n11(0)] 1 / 2 exp(k3Vv//nI1(0)) (4.2.13) 110 where \p^n(r) = d1/dri2[\p^r)], being the autocovariance function of f(x). Likewise, \l>nn(T) — 9V9TI2[V/>- (T)], ^ ( 7 ) being the autocovariance function of n(x). We can now write: p(k„k 2,k 3) = [ R / ( 4 j r i / t y f » ( < W n " ( 0 ) ) ] exp(-[k22/i//fI1(0) + k32/i//n11(0)]) |k,|l/R Now, from equation 4.2.7 we have that e = k]k3/(k2 + k3). The random variable k4 = k2 + k3 has a Gaussian distribution, with variance [\t'fn(0)+\//n11(0)]/2. The random variable kj = k 3 /k 4 is shown by Miller (1964, p50, equation 2.4.1) to have the following density: p k 5(u) = 2|W|1/2(W22U2 + W 1 1)/ 7r[(W 2 2u 2 + W11)3-4W 1 2 2u 2] (4.2.15) where W is the inverse of the covariance matrix of the random variables k4 and k3. It can be shown that: Wa2 2/s2 -2/s 2 -2/s 2 2(l + s2)/s2 (4.2.16) where we have set -\J/nn(0) = o 2 and defined s2 = \//^l(0)/\//n11(0) to be the 'signal to noise ratio' or SNR. Equation (4.2.14) for p ^ reduces to: pk 5(u) = S[U2(1 + S2)+1]/TT([U2(1 + S2) + 1]2-4U2) (4.2.17) The position error e is seen to be the product of the two random variables ki and k5. Since k, is uniformly distributed from -1 /R to l /R , we can write (following Miller, 1964, p48, in his proof of Theorem 4): I l l Pep(e) = RJ Pk5(u)/u du (4.2.18) "Re This integral can be evaluated (substitute v = u2 and use integrals #109 and #120 of the CRC Standard Mathematical Tables) to yield: p (e) = R/4 - R/(27r)tan 1[{(l + s l)J(Re)J + (s2-l)}/2s] + Rs/(47r)log|(l + s2)2(Re)4 + 2(s2-l)(Re)2 + l| - Rs/(27r)log|(l + s2)(Re)2| (4.2.19) In figure 4.3 we plot Pep(ep) Tor o^=l, for a number of different values of s2. Notice that, as the SNR increases, the distribution shifts to smaller error values. Figure 4.4 depicts the case of s=l, for a number of different filter o's. Notice that, as the filter a increases, the distribution expands towards higher error values. It can be seen that, for large e, p(e) approaches l/(2R(l + s2)e2). Thus e2p(e) vanishes slower than 1/e as e goes to °°. This means that the variance of p(e) is undefined, (but see section 5.5 for the definition of a 'pseudo-variance'). Extremal Point Position Errors The analysis for the extremum feature case is the same as for the zero crossing feature case. However, in this case the actual value of R is different It can be shown that (see Rice, 1945 for the derivation): Rextremum = Q SNR • = .5 o =1.0 A = 2.0 + =4.0 Zero Crossing Shift FIGURE 4.3 The probability density function of zero crossing position error, for a f = l , and SNR = .5, 1., 2., and 4. 0 1 2 3 4 5 Zero Crossing Shift FIGURE 4.4 The probability density function of zero crossing position error, for SNR=1, and o f = .5, 1., 2. and 4. 113 4.4 we can see that, as decreases, the probability density function of the position error becomes compressed towards smaller error values. From this we conclude that the position error of the extremum features will always be less than that for the zero crossing features. Quantized Position Error Compulation. To find the discrete probability mass function, ppr.(n) we can use equation (4.1.1). The ep resulting integral is not amenable to simple analytical integration techniques so we must rely on numerical integration. To compute the following graphs we used Lyness' (Lyness) SQUANK integration algorithm. The special case of n = 0 gives us the probability of a zero pixel error. This probability is plotted in figure 4.5, along with the probability of 'a 1, 2 or 3 pixel error as a function of the signal to noise ratio, given that q = l V 2 , and or=l. It should be again pointed out that the analysis presented in this chapter is valid for relatively small noise levels (i.e. small SNR). In addition to the noise causing a perturbation in the position of the true (physically significant) features, an excessively high noise level will cause the creation of features that are entirely noise related and have no physical significance. Furthermore, the true features will begin to be broken up, and may vanish entirely. 114 O Signal to Noise Ratio FIGURE 4.5 Probability of an n pixel error, for n=0, 1, and 2, given that q = l//2 and o f=l, as a function of the SNR. 115 4.3 - Analysis of Disparity Measurement Errors due to Filtering In this section we examine the errors in the measured disparity that result from the fact that we are spatially filtering the images with the V 2 G filter. This process has not been described elsewhere in the literature. We will initially examine the one dimensional case and then discuss the extension to two dimensions. We will use in the analysis of the filtering error the concept of the scale space transform (SST) introduced in chapter 2.1. The use of the SST will allow us to find the relationship between the corresponding zero crossings of the two stereo V 2 G filtered images. One Dimensional Filtering Let us consider a situation wherein we are viewing a surface that gives rise to a disparity function of the form: It can be seen that if the left eye sees a light intensity pattern g(x) and the right eye a light intensity pattern f(x), then g(x) and f(x) are related as follows: Note that this equation is only approximately true, as the observed intensity of a scene point generally depends on the angle between the surface normal vector and the view direction. However, for parallel cameras with large focal lengths the difference between view directions for the two cameras are usually fairly small, so that the difference in observed image intensity will be small. Highly specular surfaces are quite troublesome in this regard, since the observed image intensity is very sensitive to the angle of observation. In order to analyze such cases we must use an equation such as g(x^)h(x^,n(x,y)) = f(xR), where h is some function of position and of the surface normal vector n . If we assume that (4.3.2) is valid d(xL) = x R - x L = 0o + p \ x L (4.3.1) g(xL) = f(xR) (4.3.2) 116 we can write, using (4.3.1): g(xL) = f(/3„ + (l + p\)xL) (4.3.3) The SST of f(x) is given by equation (2.1.2) and is repeated below: F(x,a) = dVdx J f*Jf(u)aV(/2T)] e ~ ( x - u ) V 2 a 2 du (4.3.4) The SST of g(x) is obtained by substituting (4.3.3) into (4.3.4) to yield: G(x,a) = dVdx 2/" o o[f(/3 0 + (l + / 3 1 )u )aV( /2^ ) ]e " ( x ~ u ) 2 / 2 a 2 du (4.3.5) letting v = j30 + (l + j3i)u we obtain: G(x,a) = (l + /31)2dVd((l + /3 1)x) 2;r a 3[f(v)aV((l + /31Vr2^r)] e - (x( l + p\) + pVv)V2((l + p\)a)2 d v ( 4 1 6 ) Thus the SSTs of the left and right images are related as follows: G(x,a) = (l + /3ir 1F(^o + (l + |31)x,(l + /31)a) (4.3.7) or F(x,a) = (l + p\)G((x-p\)/(l + p\),a/(l + p\)) (4-3.8) Thus, for /30 = 0, the right hand scale map is obtained from the left hand scale map by a uniform expansion in both the x and a directions by a factor of 1/(1 + /3i). An example of this can be seen in figure 4.6 which shows the left and right scale maps of a randomly 117 FIGURE 4.6 The left and right scale maps of a randomly textured, tilted surface with a disparity gradient of -60/255. textured surface with j3j =-60/255. The fact that the left and right scale maps are scaled versions of each other are readily apparent Now, consider the idealized case in which we can match the corresponding zero crossings between the left and right images with no error whatsoever. Let us define the measured disparity as follows: V x L ' a ) = x R ( ° ) - x L < a ) ( 4 - 1 9 ) The variables x^(o) and xR(a) are the positions of the corresponding zero crossings in the left and right images, measured with filter resolution a. The error in the disparity measurement at a resolution a is given by: e d ( x L ' a ) = dr^x\Jo)~d^l) = xR(ahx L(a)-/3o-0.x L(a) (4.3.10) 118 Let us denote the track, in scale space, of the single zero crossing feature (defined by G(x,o) = 0), that passes through the point (x^,a0), by the functional representation: a = C(x) with inverse x = C _ 1 ( a ) (4.3.11) Note that C(x) is a monotonic function, as the point where dC/dx is zero is the point where two distinct features merge and does not strictiy belong to the contour (see Yuille and Poggio, 1983a for more on this matter). Let us assume that the zero crossing locations are measured at a resolution o = o0. Then we have that C(x^) = o 0 . From equation (4.3.8) it can be shown that, since F(xR ,a o) = 0, and since the points (xR,a 0) and (x^,o0) lie on corresponding zero crossing contours, C((x R-/3o)/(l + /3i)) = a , / ( l + p\) (4.3.12) This is shown diagrammatically in figure 4.7. We can rewrite the disparity error in terms of C~*(a) as follows: ed(xL,a 0) = (l + j3,)Cr1(ao/(l + /3 , ) ) -x L - /3 1 x L (4.3.13) Note that the disparity error is independent of pY This is what we would expect since translation of a function does not affect the magnitude of its frequency spectra. Let us set e d to zero in equation (4.3.13) and solve for a=C z e(x). This will define the family of zero disparity error loci. The significance of this family of curves lies in the fact that, if the scale map contours do not belong to this family, there will be a non-zero disparity measurement error. Setting e^ to zero and rearranging (4.3.13) we obtain, 119 0" FIGURE 4.7 The relationship between the left and right scale maps of a tilted surface. x = C_ 1z e(a 0/(1 + 13 0) = (xL+p'1xL)/(l + p\) = xL (4.3.14) This equation defines a family of vertical lines. If we substitute (4.3.14) into (4.3.13), we obtain the following relationship, ed(a) = (a./a)[CT1(a)-Cr1ze(a)] (4.3.15) 120 Therefore we conclude that the disparity error is a Jo times the horizontal difference between the scale space zero crossing contour through the point (x^,a0) and the zero error line, x = x^, measured at o = a o / ( l + 0i). An example of this is shown in figure 4.7, which illustrates how one can determine the disparity error with a graphical construction provided one has the scale map of the left hand image. The important point to be noted in the above discussion is that, for there to be zero disparity measurement error, all of the scale map contours of the image function must all belong to the family of zero error lines. In general, this is not the case, and we must conclude that one can, in general, expect to obtain zero disparity measurement error only when one is viewing a surface with constant disparity (/3i=0) or when the'zero crossing measurements are made at a resolution of o 0 = 0. In (Clark and Lawrence, 1984c) is a derivation of the disparity measurement error for —fx—x Y /2 f(x)=e v 0 / , for the case of zero crossing features. They show that the error in this case increases as o increases and as |0,| increases. We will now provide an approximation for the probability density function of the disparity measurement error for f(x) a Gaussian white random process. Zero Crossing Features It turns out that it is not possible to determine an exact expression for the probability density function Pe d(e (j) due to the complex, non-stationary nature of the SST, F(x,a), which is a two dimensional random process. We can, however, derive an approximation to Pecj(e(j), that is valid for small values of fiu using the following procedure. Consider the situation shown in figure 4.8, which depicts a zero crossing contour of a particular realization of the random process F(x,a). Note that for small A a the zero crossing contour is approximately straight and forms an angle with the o axis. The error, e^ in this case is given simply by: cr 121 F(x, to the discrete probability mass function, p e d , using equation (4.1.1). Doing so, we get: p e d(n) = -(l/7T)[tan_ 1((n+ l/2)q/p\) - tan" 1((n-l/2)q/p\)] (4.3.43) The probability of having a zero pixel error (Ped(0)) is given by: 127 Ped(0) = -(2/7r)tan-1(q/(2/31)) (4.3.44) Note that P e d(n) is independent of the resolution of the filter. This is due to the fact that as a increases, the size of the pixels (qa) increases as well. In figure 4.10 we plot P e d(n) as a function of the disparity gradient for n = 0, 1, 2, and 3, with q = 1V2. Extremum Features We can obtain the probability density function of the extremum position error by taking the covariance matrix of (4.3.24) to be: B» -^ ( 6 ) (0) 0 0 tfw(0) (4.3.45) B 1 2 T = 0//4)(0), 0 ) (4.3.46) B 2 2 = -*"(0) (4.3.47) Thus we get, for B defined as in (4.3.24) and (4.3.29): a2 = -^"(0)/[i//"(0)V/(6)(0)-(^(4)(0))2] (4.3.48) and b2 = lA//(%) (4.3.49) Apart from these changes the rest of the derivation is the same as for the zero crossing feature case. Thus we have that: 128 Disparity Gradient FIGURE 4.10 The probability of an n pixel disparity measurement error as a function of the disparity gradient, given q=lV2, for n = 0, 1, 2, and 3. ^ e d(e d) = k/(7r[ed 2 + k']) (4.3.50) where k is as defined by (4.3.33). Using equation (4.3.38) to obtain the required derivatives we get: k = a,0, (4.3.51) This is the same result as for the zero crossing case. Thus we conclude that, for the one dimensional case, the effect of filtering on the disparity measurement is the same for extremum features as for zero crossing features. Quantized Disparity Measurement Error for Extremum Features It is clear that the discrete probability mass function for the extremum feature case is the same as that of the zero crossing case and is therefore given by equation (4.3.43). 129 Nonlinear Disparity Functions In general, the disparity functions that one obtains from real world surfaces will not be of the linear form assumed above. If the disparity function is non-linear, a relationship between the left and right eye SSTs such as that in equation (4.3.7) can not be found. This being so, we can not expect to be able to find an expression, analogous to (4.3.14), for the zero disparity error curves. The best we can do in such a case is to linearize the disparity function about a point at which we wish to obtain a value for the disparity measurement error. If we expand d(x) in a Taylor's Series about a point x0 we have, d(x) = L d ( n ) r f (x„ ) /dx ( n ) ( x - x „ ) n (4.3.52) n=o We can linearize by taking the first two terms of this expansion. This linearization will be valid (i.e. the approximation error less than a certain amount) only in a small neighbourhood about x0. We will assume that this linearization will be valid. We then say that, d(\) = d(x0) + dc?(x„)/dx (x-x0) = /UxO+x/S^x,,) (4.3.53) where /31 = dof(x0)/dx and i30 = rf(x0)-Xo/31(Xo). This linearization then allows the computation of the disparity measurement error. Extension to the Two Dimensional Case. Now let us consider a two dimensional surface that, when viewed binocularly, gives rise to a linear disparity function of the form (4.3.1). Then, if the left eye sees a light intensity pattern g(xL,y) and the right eye an intensity pattern given by f(xR,y), then g(x,y) and f(x,y) are related as follows: 130 g(xL,y) = f(xR,y) (4.3.54) Given the disparity function (4.3.1) we can show that: g(x,y) = fU30 + (H-p\)x,y) (4.3.55) Now let us define the two dimensional Scale Space Transform (2DSST) as follows. F(x,y,o,e) = [e 232/3x2 + 32/3y2] ;/:00f(u1,u2)aV(27re)e-[(x-u')Ve2 + (>'-U2^/2a2du1duJ (4.3.56) Note that this is a generalization of the two dimensional transform of Yuille and Poggio who defined: F(x,y,a,e) = V2j;"00f(u1,u2)a2/(27Te)e"t(x_Ui:)2 + (y"U2)2j/2a2du1du2 (4.3.57) The reason for this generalization is to facilitate the description of the relation between the transforms of the left and right images. Since the disparity gradient is in the x-direction only, the effect of having a non-constant disparity is to skew the 2DSST in the (x,y,o\e) space as we go from the left eye to the right eye. This is why we need an extra scaling parameter, the skew factor e, in our definition of the 2DSST. Let us now derive the relationship between G(x,y,a,e) and F(x,y,o,c). Using (4.3.62) we can write: G(x,y,o,e) = [e 232/3x2 + 32/3y2] SSZjWo + (l + P 1)u1>u2)a2/(27r e )e~ [(x~ U l )'+ ( y _ U 2 ) 2 ] /2a2dUldu2 (4.3.58) 131 Letting Vi = u 1(l + p,i) + j30 and v2 = u 2 we obtain: G(x,y,a,e) = [e 2 9 2 /9x 2 + 9 2/9y 2] r f ! . f ( v , v , ) a V ( 2 , ^ d v ^ v . / d + ^ O (4.3.59) Now, if we define x = x(l + /30 + pY and e = e(l + /3j) then we can rewrite the above equation as: G(x,y,a,e) = [e 29 2/9 x2 + 9 2/9y 2] ; ; : 0 0 f ( v 1 , v 2 ) a 2 / ( 2 7 r e ) e - [ ( x - V l ) 2 / & 2 + ( y - V 2 « / 2 c ; 2 d v 1 d v 2 (4.3.60) = F(x,y,o,e) = F(x(l + /31) + /30,y,a,e(l + /3,)) (4.3.61) and conversely: F(x,y,a,e) = G((x-p\)/(l + p\),y,CT,e/(l + p\)) (4.3.62) Now let us define a two dimensional function, by holding o and y to be constant, as follows: F(x,e) = F(x,y o,0 o ,O (4.3.63) We will call this transform, for lack of a better term, the Skew Space Transform (SKST) of the two dimensional function f(x). This transform is parametrized by the y0 and o0 values, and is a planar slice through the four dimensional SST defined by (4.3.56). We can now describe the relation between the left and right skew maps (analogous to the scale maps described earlier). We have: 132 F(x,e) = g(x/(l + /31),e/(l + /31)) (4.3.64) where we have set p\ to zero for reasons of clarity. Now, let e = 1. This then gives us the case of V 2 G filtering. The points x L and x R for which F(x R , l ) = 0 and G(x L , l) = 0 define the locations of zero crossings of the V 2 G filtered image along y = y0 for a filter space constant of a 0 . It is evident that the analysis of the two dimensional disparity measurement error will be similar to the one dimensional analysis, with F replacing F and e replacing a. In particular we can write (following (4.3.15)): where x=C (e) is the track in (x,e) space (skew space) of the zero crossing of G(x,e) that passes through (x^,l), and C ~ * z e ( e ) is the zero error line given simply by x = x^. In figure 4.11 we plot the skew maps of F(x,e) = 0 and G(x,e) = 0, for a surface with /3! =—60/255 for a = 2. Compare these maps with the scale space maps of the one dimensionally filtered image shown in figure 4.6. As in the one dimensional case we can obtain an approximate expression for the probability density function of the disparity measurement error, given that g(x) is a zero mean white gaussian process. We make the assumption that Ae =-/3 1/(l + /31) is small, so that the zero crossing through the point (x^.l) is approximately straight, forming an angle 0. The error, eH in this case is given simply by: Now (xi-xT) can be expressed in terms of 0 and j3] to give the following expression for e^l/G + flO) = (l + p\)[C \l/(l + ^ )yc ^(1/(1+ /3 0)] (4.3.65) e d = (x!-x L )/ei = (xj-XoXl + ^ O (4.3.66) 133 Skew map. Disparity Gradient = 0, sigma = 2 Skew map, Disparity Gradient=-60, sigma = 2 I T : : : 1 1 FIGURE 4.11 The left and right skew maps obtained from a randomly textured surface with a horizontal disparity gradient of -60/255 with a =2. (4.3.67) where M=tan() is the slope of the line perpendicular to the zero crossing contour at (xL,l). The gradient vector, 77 = (77i,7}2)=VF(x,e) measured at (xL,l) is also perpendicular to the zero crossing line. Thus we have that: M = Th / r j , (4.3.68) and e d = /3I(T?2/T?1)/(1 + P\) (4.3.69) Now our problem reduces to finding the probability density function of the above function of 134 the two random variables 77 j and 7? 2 given that F (x L > l ) = 0. The solution of this problem, however, is quite a bit more complex than was the case in the one dimensional analysis. The reason for this is that the differential equation relating the e derivatives to the x derivatives is not as simple as equation (4.3.20). It can be shown (by performing the differentiation with respect to y in (4.3.56)) that F(x,e) can be defined as: F(x,e) = € J3V9x 2H,(x,e) + H2(x,e) (4.3.70) where H,(x,e) = sZMv)oAe/2*) e - ( x - u ) V 2 e 2 a ° 2 d u (4.3.71) H,(x,e) = SZMu)Oo/(e/3x) e ~ ( x _ u ) 2 / 2 e 2 a ( ) 2 d u (4.3.72) and where hi and h2 are defined as follows: hi(u) = SZjMoJ{\TH) e " ( y o " v ) 2 / 2 C T ° 2 d v (4.3.73) h>(") = /r o o f(u,v)/(a„/2i)[(y 0 -v)Va„ 2 -l] e _ ( y - v ) 2 / 2 f f 2 d v (4.3.74) We can now see that: 9F79x = e 2 9 3 H,/9x 3 + 9H 2 /3x = i?, when e = l (4.3.75) 135 9F/9e = 2e92H,/9x2 + e 29 3H,/9x29e + 9H2/9e = r)2 when e = l (4.3.76) It can be shown that (Yuille and Poggio, 1983a), since the Gaussian function is the Green's function for the diffusion equation, and H , and H 2 are the result of convolving some function with the Gaussian, then H , and H 2 are solutions of the diffusion equation. With the boundary conditions on FL and H 2 as described by Yuille and Poggio (appendix, Yuille and Poggio, 1983a) it can be seen that: 9 2 H,/9x 2 = ( l /c )9H 1 /9e (4.3.77) 9 2 H 2 /9x 2 = ( l /e)9H 2 /9e (4.3.78) Therefore we can write rj 2, in terms of x derivatives only, as follows: r?2 = 2e9 2 H 1 /9x 2 + e 3 9 4 Hj/9x 4 + e9 2 H 2 /9x 2 when e = l (4.3.79) Let a = ( T j j , r j2, £ ) . The covariance matrix of a is given by E ( a a T ) . Where E(-) indicates the matrix expectation operator. Using the procedure of Rice (Rice, 1945) it can be shown that: E(rH) = -^i ( 6 ) (0) + 2^ 1 2 ( 4 )(0)-i// 2 ( 2 )(0) (4.3.80) Efai) = -4^ ] ( 4 \0 )+a 2 ^ 1 ( 8 \0 ) + a 2 » / / 2 ( 4 \ 0 ) + 4a^ 1 ( 6 )(0) + 2 a 2 » / / 1 2 ( 6 ) ( 0 ) (4.3.81) mi) = Vi ( 4 ) (0) + 2 « / / 1 2 ( 2 ) ( 0 ) + ^ 2 ( 0 ) (4.3.82) 136 E(7?i7h) = 0 (4.3.83) E(T?,0 = 0 (4.3.84) Efa,$) = 2 ^ / % ) + a« / / I ( 6 ) (0 ) + 2a^ 1 2 ( 4 )(0) + 2^ n ( 2 )(0) + ai//2(2 )(0) " (4.3.85) In the above i//2(r) is the autocovariance of Hi , I / / 2 ( T ) that of H 2 and ^ I 2 ( T ) is the cross-covariance function of H} and H 2 . If we partition the covariance matrix of a as shown in equation (4.3.25), we can see that the joint probability density function of rji and 772 given that £ =0 is given by: B 1/a2 0 0 l/b! (4.3.86) where 1/a2 = E(T?0 (4.3.87) and 1/b2 = E f a D - E O ^ V E t t 1 ) (4.3.88) The covariance functions are derived in the Appendix, as well as formulae for their derivatives at zero. Inserting these values into the above equations we get: 1/a2 = 18.25v/ir/a5 (4.3.89) 137 1/b2 = [1.579a4 + 4.345a2 + 16.11]i/7T/o5 (4.3.90) We have that: PC(T?2,T?,|$=0) = (ab/27r) e ( a ^ l J + b 2 7 ? 2 J ) / 2 (4.3.91) If we let M =773/77, we get: P M ( M 7 h ) = (ab/27T)7 ? 1e7 ? l 2 ( a 2 + b 2 M 2 ) / 2 (4.3.92) Integrating out the dependence on 771 gives us: P^(M) = a/[7rb(M2 + a2/b2)] (4.3.93) Making the transformation e^ = -p\ju we get: P e d ^ = k/[7r(ed2 + k2)] (4.3.94) where: k = -/3,a/b = -f3lV/(0.881 + 0.238a2 + 0.0864a4) (4.3.95) As in the one dimensional case the error has a Cauchy distribution. In figure 4.12 we plot Pe(j(e) for a range of /3] values, holding a = 1. In figure 4.13, we plot p e d for a range of a values, holding $1 = -0.1. Notice that p e d is relatively insensitive to the value of a over quite a large range of a values. Recall that in the one-dimensional case, p . was not a function of a at all. FIGURE 4.12 The probability density of disparity measurement error for zero crossing features, o = l , p\ = -0.1 to -0.4. Zero C Feature Shift FIGURE 4.13 The probability density of disparity measurement error for zero crossing features, p\ = -0.1, a = 1, to 4. 139 Quantized Zero Crossing Disparity Measurement It can be seen that the form of the quantized disparity measurement error probability mass function is the same as for the one-dimensional case. The form of k is different however, as seen above. In figure 4.14 we plot Pecj(n) as a function of the disparity gradient for n = 0, 1, 2 and 3 with q = l/\/2, and o = 1. In figure 4.15 we plot p £ ( j as a function of o for n = 0, 1, 2 and 3, holding p\=-0.1 and q = 1//2. Note that these probabilities are relatively insensitive to changes in o over a large range of o values. This is due to the fact that the size of the pixels increase linearly with a. Extremum Features The probability density function of the disparity measurement error for the extremum feature case has the same form (equation 4.3.94) as in the zero crossing feature case. The matrix B that is used is different, however, due to the difference in the autocovariance functions in the two cases. As in the one dimensional case the autocovariances in the extremum and zero crossing cases can be seen to be related by: extremum zero crossing (4.3.96) We can then calculate the new values of 1/a2 and 1/b2. Doing so we obtain l /a : = 7035,/Tr/(64a7) (4.3.97) and 1/b2 = / rr [ a4(2.695) + o 2(20.87) + (99.49)] lo9 (4.3.98) Hence o *-w N value D = 0 O = 1 A = 2 + = 3 -0 .1 -0.2 -0.3 - 0 . 4 Disparity Gradient for ZC Features -0.5 FIGURE 4.14 The probability of an N pixel error for zero crossing features as function of p\ for a = l, q = I V 2 and N = 0,1.2 and 3. N value • = 0 o = 1 A = 2 + = 3 -e- -p-4* -e—< i iti it f—1' 1 2 3 4 5 Filter Sigma for ZC Features FIGURE 4.15 The probability of an N pixel error for zero crossing features as function of a for 0, = -0.1, q = 1//2 and N = 0,1,2 and 3. 141 k = (-/?,>/ (0.0245a4 + 0.1898a2 + 0.9051) (4.3.99) We plot the resulting error function /^(e^) for a = l and /3,=-0.1 - > -0.5 in figure 4.16, and in figure 4.17 we plot /^(e^) for a =1,2,3 and 4 with /3;=-0.1 and q = l/j /2. Compare these graphs to the zero crossing feature case; these differ only slightly. Quantized Extremum Disparity Measurement Error The basic expression for the probability mass function />ed(n) is the same as for the zero crossing feature case, except that the expression for k(a) is different, as noted above. In figure 4.18 we plot the probability of getting an N pixel disparity measurement error, when using extremum features, as a function of /3] for a = l , given q = l/v/2. In figure 4.19 we plot the probability of getting an N pixel disparity measurement error, when using extremum features, as a function of a for /3j=-.l, given q = l/|/2. 0.5 1 1.5 Extremum Feature Shift FIGURE 4.16 The probability density function of the disparity measurement error for extremum features for 0j = -O.l to -0.4 with o = l. 0.5 1 1.5 Extremum Feature Shift FIGURE 4.17 The probability density function of the disparity measurement error for extremum features with 01=-0.1 for a = 1,2,3 and 4 with q = l/y/2. o u Ed CU as «0 © O >> — d CO X) O hi C U N value • = 0 o = 1 = 2 + = 3 j 4 fr 9 $L-$I -0.1 -0.2 -0.3 -0.4 Disparity Gradient for EX Features -e—o ^ 1--0.5 FIGURE 4.18 The probability of an N pixel disparity measurement error for extremum features as a function of p\ for a=l and q = l/i/2. N value • = 0 o - 1 = 2 + = 3 -e- -e-•t1 -e—), and the reconstruction is exact. The above conditions on G(w) and F(CJ) for exact reconstruction are illustrated graphically in figure 4.20. There are three situations in which the above reconstruction formula will not be exact The first situation occurs when the spectral repetitions of F(c5) in the spectrum of the sampled function f (x) overlap with the central (i.e. (\u\2) = (0,0)) repetition, as shown in figure 4.21. Some of the energy in the spectral repetitions is added to the energy of the central repetition passed by the filter. This causes an error in the reconstructed value known as aliasing error.6 The second situation in which reconstruction errors arise is when the reconstruction filter, g(x), does not pass all of the central spectral repetition. In other words part of the energy in the function that we are trying to reconstruct is filtered out Sometimes this is not such a bad thing because some of the aliased. energy from the other spectral repetitions may also be filtered out The reconstruction filter can also have too wide a bandwidth, so that even if the function has a high enough sample density to eliminate all aliasing (or overlap of the spectral repetitions), the filter may pass some of the energy in the spectral repetitions anyway. Both of these situations are depicted in figure 4.22. In practice this error, which we call the Filtering Error, can be avoided if the maximum region of support of F(c3) and the sampling lattice is known, as one can then design an appropriate filter. 6The aliasing error derives its name from the fact that a given frequency component of the spectrum of f(x), when replicated, actually becomes a different frequency component Thus these frequency components are actually under an assumed name (frequency) or alias. 147 FIGURE 4.20 The shapes of the regions of support for F(CJ) and G(ZJ) for exact reconstruction of f(x) from its samples. SUPPORT OF • - A L I A S I N G ERROR FIGURE 4.21 Aliasing error in the reconstruction caused by too low a sample density. 148 ^ = FILTER ERROR FIGURE 4.22 The effect of having an improper reconstruction Filter. Note that the central repetition is partly filtered out and that parts of the other repetitions are passed by the filter. The third type of error that can arise, which will be seen to be a form of filtering error, occurs when a finite subset of the points in ixg} are used to perform the reconstruction of a value of f(x). This error is known as truncation error (because it arises when the reconstruction summation is truncated). The effect of limiting the number of sample points used in the reconstruction is equivalent to reconstruction with a filter that is a spatially truncated version of the optimum reconstruction filter. Since the support of this truncated filter in the space domain is bounded, its support in the frequency domain cannot be bounded. \ Hence this truncation error is seen to actually be a filtering error, since all frequencies will be passed to some extent by the filter, including those present in the non-central spectral repetitions of f(x). We can write down an expression that includes the effects of all the above errors on the reconstructed function as follows: 149 e_(x) = f(x>f (x) = f(x) - Zf(5)6[(x-xJ]*g t(x) (4.4.6) where gt(x) is the truncated version of the reconstruction filter. Obviously, in order to minimize the truncation, or filtering error, we should choose the Filter function gt(x) that has the following properties: - gt(x) = 0 outside some region R (i.e. is spatially bounded) - |G(t3)| is a minimum for all u> outside the region fi, where Q, is the region of support of F(CJ). Derivation of the optimal filter under truncation We will now derive an approximation for this optimal (in the sense of minimizing the truncation error) filter. Let us assume that both R and 0 are disk shaped with radii a and b respectively. The total energy in the filter function can be seen to be given simply by: A R = //R|g(x)| 2dS = / ; ! j G ( ^ ) | 2 c £ 3 (4.4.7) The second relation comes from Parseval's theorem. The condition that we minimize the energy in G(c3) for |c3|>b is equivalent to the condition that the energy in G(a>) be maximized for |o>| //(x)dx for |y|(r')dr' 0 n ( r ) = / r R N n ( r ) (4.4.16) and TlSU = v/caN n/(27rjN) (4.4.17) for n,N = 0,1,2... and where we define ^ N , n ( r ' e ) = R N , n ( r ) c o s ( N 9 ) <4 A 1 8> to be the polar representation of ^ N n ( x ) . Slepian also shows that the solution of the the above integral equation is also the solution to the following Sturm-Liouville equation ( 1 - ^ ) 0 ^ - 2 ^ + [(1/4- N 2 )/r 2 -c 2 r 2 + x]tf> = 0 (4.4.19) The bounded solutions of this equation, for arbitrary N are known as the generalized prolate spheroidal Junctions (Slepian, 1964). Bounded solutions occur only for discrete x values, given by X N n where X N n < X N n + 1- One can similarly order the 7 N n (Slepian, 1964, p. 3039) to g i v e 7 N , n + l < 7 N n a n d 7 N + l n < 7 N n - T n u s ^ l a r § e s t eigenvalue of the integral equation (4.4.13) is obtained when N = n = 0. Thus we have that g(x) = ^a Q( x ) = 0o o( | x | ) V | x | (4.4.20) 152 Now it remains to find an expression for 0QQ(I"). In the literature no closed form exists for this function and we can only write down approximations. Slepian (1964) provides a number of approximations to this function that are valid over different ranges of r. For large -1/4 c and small r (|r|00(r) = kVr e ~ r 2 c / 2 L 0 ( 0 ) ( r J c ) for |x|x/a, c->a 2c to give g(x) = k e_ clxl2 / 2 for |x| <(a 2c)~ 1 / 4 (4.4.23) This result is similar to the one obtained by Shanmugan et al (1979), later corrected by Lunscher (1983), who found the one dimensional filter whose step response had maximum energy in the vicinity of the step input, A question arises as to the validity of this approximation beyond the rather restrictive range that it is strictly defined over (i.e. |x| <(a2c)~^ /4). Lunscher (1983) does not say anything about this but apparently assumes that this approximation can be extended to |x|a). Shanmugan et al (1979) cite a paper by Streifer (1969) claiming that he shows that little error is incurred by extending the range of validity of the above approximation. However, Shanmugan et al misread Streifer's remarks. Streifer says that the approximation can be 153 extended in his application because, in his application the prolate spheroidal function is multiplied by a function which decreases rapidly for x>c~^ / 4. Thus, in his application, extending the approximation will not incur much additional error. However, in our application (as well as Lunscher's and that of Shanmugan et al) we must be more careful in extending our approximations. Actually we need not extend the above approximation past its region of validity at all since Slepian (1964, p 3027) provides, in addition to the above approximation, approximations which, taken together, are valid over the range from |x| = c ~ t o |x| =a. These can be paraphrased as follows: g(x) = k 2e~ac(v/a2"lxl 3 )/[(a+v/(a2-|x| 2))(a2-|x|2)1/4] (4.4.24) valid for (a2c)~1/4<|x| (CJ+C3 )] (4.4.27) m s The set lc3s3 is the set of the frequency domain repetitions caused by the sampling (this set is the dual of the spatial sample set defined by equation (3.6.1) as shown by equation (3.6.3)). If the power spectral density of f(x) is sufficiently bandlimited then the minimum mean square error vanishes (note: minimum in this context means that the mean square error produced using the above filter G(c3) is less than or equal to the mean square error produced using any other filter). Petersen and Middleton also derive the minimum mean square reconstruction error uniformly averaged over a sampling cell. This quantity may be useful in the analysis of the reconstruction errors in the non-uniform case, where the size of the sampling cell varies. This averaged mean square reconstruction error is given by: EA{[f(x>fR(x)?} = 1/(4TT2) / /^*fR(x)]2} = K(U) - 2IK(x-x„)g(x-xJ + ZIK(x -x fetf-i" )g(x-ST ) (4.4.29) m %,U& 1 2 1 2 where K(x) is the autocovariance function of f(x). Petersen and Middleton do not, however, provide a general frequency domain representation of the mean square error such as was given for the ideal filter case (4.4.28). They do show that the first two terms of the above expression can be put in terms of frequency domain quantities as follows: K(U) - 2IK(x-xs)g(i-xs) = 1/(4*')//"„,[*(£) - 2(G(£)/Q)Lexp(-jx£ )*(S-5 )]dw ' (4.4.30) S S It is the third term in equation (4.4.29) that they do not provide a frequency domain expression for. Let us call this third term T, for simplicity. We will now derive a frequency domain expression for T. We can expand T in terms of delta functions as follows, using the integral properties of the delta function: T = ////rooK(r?-s)g(x-r)g(x-s)ZI6(r-xSi)5(s-xS2)d?ds (4.4.31) It can be shown (using the result of Appendix A in Petersen and Middleton, 1962) that: ZZ5(r-xSi)5(s-xS2) = (l/Q')ELexp(-jr u^expHsu^) (4.4.32) 157 where {ajg} is the dual set to l!xgj] as given by equation (3.6.3). Thus we can write: T = ( l / Q 2 ) Z I / ; / / " f f i K ( r - s ) g ( x - r ) g ( x - s ) e x p ( - j r ^ S i ) e x p H s ^ S j ) d r d s (4.4.33) If we make the change of variables y = r - x , and z = s - x , and use the fact that g(x) and K(x) are even functions we obtain: T = a ; / / ; " o o K(y-z)g(y)g(z)exp(- jyw S i )exp(-jztJ S 2 )dydz (4.4.34) where we have defined, for simplicity, the operator a to be: a = (l/Q2)LLexp(-jx-(w_ +w„ )) (4.4.35) Separating out the functions that depend only on y gives us: T = a / / " r a g ( y ) e x p ( - j y ^ S i ) [ / / " t t K ( y-z ) g(z ) e x p(-jzicj S 2 ) d z ] d y (4.4.36) The integral in the brackets can be recognized as a convolution. Hence we can write: T = a / / " B , g (y ) e x p ( - j yw S i ) [K (y )»g^ ) e x p ( - j y 5 S j ) ]dy (4.4.37) Replacing the bracketed term by its Fourier transform representation gives: T = a / ( 4 7 r 2 ) ; / " „ g ( y ) e x p ( - j y w S i ) [ / ; " f f i 4 > ( ^ ) G ( ^ - w s p e x p a y w)d3 ] d y (4.4.38) Rearranging this equation gives us: T = a/(47r2)//"w#(w)G(i-wS2)[;/rrog^)exp(-jy^Si)expCyw)dy]dw (4.4.39) 158 Evaluation of the bracketed integral as a Fourier transform results in: T = a/(47rJ)fJ"" * (S)G(w-5 c )G@-c3 )du (4.4.40) Substituting the expression for a yields: T = (1/(4TT2Q 2 ))//" r o £)£Eexp^ (4.4.41) The total mean square error, E, for the general case of aliasing and truncation can now be written, and is done so below: E = l/(47T2);/roa[*(w)-2(G(w)/Q)Iexp(-jx^s)d>(^-ws)] + (4.4.42) (1/Q J)Jj"w*(S>ELexp<-jx-(5 +w ))G<5-5 )G(5-5 )]du This formula is valid for any filter function g(x) and for any random process f(x). We can obtain an expression for the average mean square error over a single sample cell, T, as was done by Petersen and Middleton (1962) in the case of ideal filtering. Let us denote this average mean square error by E . It can be seen that E is given by: E = K(0) - (1/QJEJ J r [2K(x-x s )g(x-x ) + Z Z K ( x g - x g )g(x-x )g(x-x g )]dx (4.4.43) Making a change of variables and noting that the summation of integrals over the elementary cells T is the same as integrating over the entire space allows us to write: 159 K(0) - (2/Q)/;" o oK(x)g(x)cS + (l/Q)/JR2LK(xs-xSj)g(x-xSi)g(x-xSi))dx (4.4.44) Let us define the function r(x) as follows: r(x) = 1, for x e T and r(x) = 0 elsewhere. (4.4.45) Then, using equation (4.4.42) we can write the third term in equation (4.4.29) as: (l/47r 2Q 3)IL;/" o or(x)exp(-jx-(w +w ))dx J f~„(w)G(u-£ )G(S-5 . )du (4.4.46) Recognizing the first integral in the above expression as a Fourier transform allows us to rewrite this as: (l/(47r2Q3))IIR(wS i +uS)S / " 0 0 $ ( w ) G ( w - w S i ) G ( w - w S 2 ) c £ 3 (4.4.47) Petersen and Middleton (1962, Appendix D) show that R(wg) = Q for c3sek3g} = U and is is zero for cj„e{wJ & TJ. This means that, since co„ +a>0 is a member of koj , R(C3„ +CJ„ ) S> o S>1 »2 S S i s2 equal to Q for c3 = - C J and is zero elsewhere. This allows us to rewrite (4.4.47) as: 31 » 2 (l/(4 7r JQ 2))Z;;" o o4'(w)G(^-^ s)G(w+ws)ck3 (4.4.48) Combining this result with equation (4.4.30) gives us the following expression for the general averaged mean square error: E = l/(47r2);/rc=*(^)[l " (2/Q)G(£) + (1/Q J)EG={u>xv>i). Let the filter be given by the approximate filter defined by (4.4.23), except that it extends to °°. That is: We will determine k so that the mean square error is minimized when a g is °° (i.e. when f(x) is constant). Doing this yields: d>(cj) = A2asv/7Texp(cjj2as2/4)5(aj2) (4.4.53) g(x) = k exp(-c|x|2/2) (4.4.54) Thus: (4.4.55) (k/v/(27rc)) = Q/Iexp(-H 2 /c ) (4.4.56) Let us define S as follows: 162 S =;Iexp(-|aJs|Vc) (4.4.57) Then E can be written: E - (A'o^ic)/(4Tt>)SZJe*vHjSos>/4) ~ (2/S)exp(-w,2(l/2c+os2)) + (l/S)exp(-w1 2(l/c+a sV4))]dcj1 (4.4.58) This integral can be evaluated to give: E = A2/(27r)[l + ( l /S) { lV(l + 4/(caS2))-2V(l + 2/(cas2))}] (4.4.59) The RMS error (i/E) is seen to be proportional to A, the amplitude of the function being reconstructed. This dependance can be seen in the experiments described in chapter 5. Notice that E does not go to zero as a g goes to infinity (except for c=0) but approaches 1— 1/S. This is due to the fact that the exponential filter lets in some energy from the non-central spectral repetitions that are created by the sampling. This is an example of Filtering Error which was described earlier in this section (not to be confused with the filtering error described in section 4.3 which refers to the effects of V 2 G filtering). However, as the distance between samples decreases to zero, S goes to one and E goes to zero. As a g approaches zero, E approaches A2/27T for all values of c (except zero). Let us assume that we have regular hexagonal sampling. Then, from chapter 3, we have that: w g = lju. + lju, (4.4.60) for lu\2 taking on all integer values. The vectors u \ and u 2 are obtained from the spatial sampling basis using (3.6.3). If we let the distance between nearest neighbour samples be 2a, then the resulting frequency domain sample basis vectors can be computed to be: 163 u, = (7T/a)(l,-lV3) and u2 = (7r7a)(0,2V3) (4.4.61) We can now write S as follows: S = ZZexp(-47r2(l12-l1l2 + l22)/(3a2c)) (4.4.62) In figure 4.25 we plot the average RMS reconstruction error v/E as a function of og for four different values of the filter constant c, given that A=\/(2n), and a = lV3. Figures 4.26 and 4.27 are similar to figure 4.25 except that a = l/i/6 and 1/2/3 respectively. Some conclusions can be immediately drawn. The first is that reducing the sample spacing, a, reduces the error. Secondly, increasing the value of the filter constant c, reduces the error for small values of ag, while decreasing c reduces the error for large values of ag. It is also evident that increasing the separation between samples results in increased error for large ag due to the passing by the filter of the energy in the spectral repetitions, which move closer to the frequency plane origin as the samples get farther apart. Nonuniform Sampling Let us now consider the case of the reconstruction error produced by the warping or transformation method described in chapter 3 for nonuniformly distributed samples. Recall that this method involved making a coordinate transformation based on the sample distribution, in such a way that the samples in the new coordinate system were uniformly distributed, so that the standard uniform reconstruction method of Petersen and Middleton (1962) could be used. The transformed version, h(x), of the function, f(x) to be reconstructed is related to f(x) as follows: h(x) = f(7_1(x)) (4.4.63) where 7(x) is the transformation from uniform to nonuniform coordinates. To determine the 164 H 1 1 H 1 1 1 4 1 1 V-c value • = c = 5 o = c = 10 A = c = 15 + = c = 20 H 1 1-- A A ia A A A A A A A A A A 6 A A A A—i i -e—e—e—e—e—e—e—e—e—e—e—e—e—e—6—e—e—e—<) -B B B B B B B B B B B B B S B—-B B—11 5 -1— 10 -r-15 20 i 25 Sigma S FIGURE 4.25 The average RMS reconstruction error for a Gaussian process and filter, with a = lV3, for c=5, 10, 15 and 20. FIGURE 4.26 The average RMS reconstruction error for a Gaussian process and filter, with a = l V 6 , for c=5, 10, 15 and 20. 165 c value D = c = 5 o = c = 10 A = c = 15 + = c = 20 1 H t- H 1 1- H (- •+-• (-10 15 Sigma S 20 25 FIGURE 4.27 The average RMS reconstruction error for a Gaussian process and filter, with a = W l 2 , for c=5, 10, 15 and 20. average mean square reconstruction error for a given reconstruction filter g(x), we need only determine the power spectral density, $^(5), of h(x) and then use equation (4.4.49). Note that the set {a>g} is fixed for all cases, as described in chapter 3.6. Let us assume that we know the power spectral density, $fe>), of the function, f(x), that is to be reconstructed. The question to be answered now is: How can $^(u>) be obtained from $ft(j) as follows: 167 *hl(T) = Hf(0)f(c1(7)+7)} = * ft+£>(?)) (4.4.71) However, in determining '/'^(r) we must consider all possible values that c ( r ) can take on. Thus we must perform an expectation operation with respect to c(r) . Doing so yields: tfh(r) = mft+c(T))} = UZaVc(c> ft +c)dc (4.4.72) where P c (c ) is the probability density funcdon of the random process c . If P c (c) and \frft) are even functions the above equation is seen to. be a convolution. Hence we can write: tfh(r) = pftMft) (4.4.73) Taking the Fourier transform of this relationship allows us to write: * h (5 ) = ft) (4.4.74) where #c(w) is the characteristic Junction of the distribution P c (T) . Thus, if we know p (c) and ft) we can, in principle, determine ^ ( C J ) which can be used in equadon (4.4.49) to compute the average mean square reconstruction error. When modelling the random perturbation function, B(x), one must keep in mind the invertibility condition on 7~^(x) (which states that the Jacobian determinant of the transformation must never equal zero). This condition means that B(x) must obey the following: |l + 9B(x)/3x| > 0 (4.4.75) (we have arbitrarily selected the sign of the Jacobian to be positive, it could just as easily be negative, in which case the above quantity must always be less than zero.) This can be 168 written as: 1 + T h i + 7 h 2 + T?n7? 22 - 77 1 277 21 > 0 (4.4.76) where rju - dby/dxu Vii - 3bi/3x 2, 7?2) = 9b 2 /3xi and 7722 = 3b 2/3x 2. One possible model for 5(x) can be obtained by assuming the probability density of the 7 7 . . to vanish over the range (-5,6), where 5 lies in the range [-(l+i/3)/2, (1—1/3)/2]. With this condition, the Jacobian of 7 " ^ is guaranteed to be always positive. Let us further assume that the power spectral density of the bj(x) vanish outside a disk of radius B in the frequency domain (i.e. the bj(x) are bandlimited to B). Papoulis (1967) shows that the derivatives of a bandlimited deterministic function are themselves limited in magnitude. He extends this result to the case of random functions in the one dimensional case. However, the limit is now a limit on the RMS value of the derivatives. His derivation can be extended to the two dimensional case to give the following limit on the mean square value of the derivatives of bj(x): H|3 k + rb,(x,y)/3x k3y r| 2} ^ P B 2 k + 2 r (4.4.77) where P is the power in b t(x), defined as P = (l/47T2)//n 4>b(£)c£ (4.4.78) and is the region of support of the Fourier transform of b,(x) (i.e. where it is non zero). Note that this does not limit the maximum of the derivatives of the functions. However, if the bj are Gaussian distributed, then so are the derivatives of the bj. If we fix the B / P product so that it is less than 8/2, then the probability that the magnitude of the derivatives will not exceed 5 will be 0.955 (as this is the 2a value of the Gaussian distribution). Thus we can create a model for the perturbation noise B(x) by assuming the functions bi(x) and 169 b2(x) to be Gaussian distributed and bandlimited such that the product of the bandwidth and the square root of the power of the bj(x) are less than 5/2, where 5 is in the range [ - ( W 3 ) / 2 , d V 3 ) / 2 ] . Since B(x) has a Gaussian distribution, so does c(x). Thus the characteristic function <£c(u>) is also Gaussian. This means that ^ ( w ) for this perturbation model is simply a Gaussian weighted version of ^ w ) . In the case of the sample sequence created by zero crossing contours of V 2 G filtered images, the above perturbation model is not valid. It appears to be very difficult to obtain a model for this case, primarily because of the high degree of correlation exhibited by the perturbation function in the zero crossing case (because zero crossings lie along continuous contours) and also because of the difficulty in finding a model which both satisfies the invertibilty constraint on 7~^(x) and results in a closed form expression for the characteristic function of c(x). For this reason no further work was done on trying to obtain a model for the perturbation function for the zero crossing sample sequence. However, this section has developed the theory necessary to compute the average mean square reconstruction error for the case of reconstructing from non-uniformly distributed samples using the transformation method of chapter 3, even if it turns out that the application of this theory to some sample sequences may be an intractable problem. 170 4.5 - Matching Error Analysis We will now consider the contribution to the disparity error produced by incorrect matching of features. Marr and Poggio (1979) discuss this topic in terms of the probability of there being more than one feature in the matching region. The assumption implicit in their work is that if there is only one feature in the matching range then that feature is the correct match. However, as we will see in this section, this assumption is valid only if the disparity estimate error is very small. They did not consider the effect on the matching error of relatively large disparity estimate error which, as we have seen in the previous sections, can result from the action of a number of error processes. In this section we provide a detailed analysis of the matching error, for the case of zero crossing features with nearest neighbour matching.7 This analysis brings out the dependence of the matching error on the error in the disparity estimate as well as on the size of the matching region. The matching process is depicted in figure 4.28. The true match is a distance e^ away from the estimated match position and is taken to be the origin of the epipolar line. The matching region extends a distance r m away from the estimated match position on either side. Ghost matches (defined as features, other than the true match, which lie within the matching region) lie at distances T- from the true match. An incorrect match will be made if one of the ghost matches (say ghost match j) lies closer to the estimated match position than does the true match. Thus the matching error will be equal to Tj (and not zero). Note that reducing the size of the matching region will not necessarily reduce the probability of error. It will do so only if the error in the disparity estimate is smaller than the size of this reduced matching region. In general this will not be case and the matching region must be fairly large so that, if there is a relatively large error in the disparity estimate, there is still a chance that the true match (or a ghost match close to it) will be chosen, resulting in a reduced error. In general, what we require from our matching algorithms, is that, as we proceed to higher levels of resolution, the variance of the disparity error gets smaller in 'Recall from chapter 2 that in this form of matching, the match is taken to be the matching feature nearest to the estimated match location. 171 < f t r u e m a t c h m FIGURE 4.28 The matching process. absolute terms (or stays more or less constant in terms of our pixels, which get smaller as the resolution increases). To this end, we provide an analysis of the matching error to see if, in fact, the disparity error does converge (i.e. the matching error should be smaller than the error in the initial disparity estimate). For the purposes of the following analysis let us assume, that the matching region is of infinite extent This is not as bold an assumption as it may seem because of the way that the matching is done. Recall that the match is taken to be the closest matching feature to the estimated match position. The only time that having a very large matching region will, have an effect is when the true match is missing, and instead of just having a 'no-match' situation (which would not affect the matching error) we would have a match which would always be incorrect Note that this can also happen with smaller matching regions, but with the larger matching regions there is a slight possibility that the induced matching error will be quite large (if there are no ghost matches near to where the true match should be). 172 Probability density of the matching error We can write the probability density of the matching error e m as p(em) = Prob{ a feature at T= e m given that there is a matching feature at 0 (the true match) and that there are no matching features in the region [T,2e^~r] (otherwise these features would have been selected as the match) 3 Let us define P^C?") to be the probability density of the interval r between the feature at the origin and the N ^ 1 matching feature (where we order consecutive features along the epipolar line : ...-2-1,0,1,2... 0 corresponding to the feature at the origin).8 Let us further define P (N) to be the probability of a feature being the N feature given that it is r units from the zeroth feature (the one at the origin). With these definitions it can be seen that the probability density of the matching error can now be written: Pm = N ? 0 P em(N*W P ^ r O d r , ! for 0l) features can be closer to the estimated match position This notation differs slightly from the notation by Longuet-Higgins (1963) who defined P(T) to be the probability density of the interval r between the feature at the origin and the N+1 matching feature. The reason we use the modified notation is to allow the definition of P0(T) which is the probability of the same feature being r units apart, which is obviously equal to one at T=0 and zero everywhere else. 173 than the N m feature only if the N - l m feature is as well. A similar argument holds for the case of e m less than e ,^ in which case we need only consider the N + l ^ feature. This is shown in figure 4.29. We have assumed that the occurence of a gap with no features in the interval (e m ,2e ) is independent of the occurence of the features at zero and e . This is not strictly valid (except for large values of em) but is an assumption we must make in order to obtain any mathematical headway. An interesting special case is that of e m = 0. Pm(0) i s m e probability that the correct match will be found. From the above equations we can see that: 2ed Pm(0) = 1 - JT o P^rOdT, (4.5.3) Note that Pm(0) is a function of the disparity estimate error, e .^ Longuet-Higgins (1963) provides a number of approximations to for the case of zero crossing features of a one dimensional random Gaussian distributed process, f(x). For the case of matching zero crossings without regard to their sign, Longuet-Higgins gives the following approximation: P,(r) = X(+,-;r) - X( + ,-,-;r) (4.5.4) where: X( + , - , t ) = (l/27ry(-^0/i//(r)X/(MnM2 2)[v/(l-v1 2 2)-v1 2ARCOS(v1 2)]/[^2(0)-^2(r)] (4.5.5) X( + ,--;r) = (l/47r2);v/(-/(MiiM 2 2)[v/(l-v 1 2 2 ) + v 1 2 ARCOS(-v 1 2 )] / [^ 2 (0)-^ 2 (r)] (4.5.18) 176 t X( + , + ,+ ;T) = (l/47r2)/v/(- S3 p s i o N O ^ o o a-Angle Quantiz. o 180 degrees o 360 degrees 1 1 i 5 10 15 20 Error in the Disparity Estimate 25 FIGURE 4.30 The theoretical probability of obtaining the correct match as a function of the disparity measurement error, o = y/2 5 10 15 20 Error in the Disparity Estimate FIGURE 4.31 Experimentally derived relationship between the probability of obtaining the correct match and the error in the disparity estimate for a number of different angle quantizations. 179 (4.5.2) is very difficult as finding a closed form expression for P n (T ) for N>3 (for the case of matching zero crossings without regard to sign) can not be done in general (see the discussion on this point in Longuet-Higgins, 1963). However, we can, as we did for the zero matching error case, obtain some representative results numerically. The results are depicted in figure 4.32 for the cases of e m = 0 (same as previously), 1, 2 and 3. As expected, the probability density for a non-zero matching error is a maximum at some non-zero value of the disparity estimate error, and the larger the matching error, the larger is the disparity error for which the probability reaches a peak. This indicates that, as the disparity estimate error increases, the expected value of the magnitude of the matching error will also increase. In conclusion we can make some general comments about the matching error. If the error in the disparity estimate is relatively small, there will be a high probability that the correct match will be found. Thus we can be confident that our multi-resolution matching algorithm will converge. If, on the other hand, the error in the disparity estimate is large then the matching error may be large as well. However, there is still at least a 50% chance that the matching error will be less than the disparity error (since the match may be either closer to or farther away from the true match, and for large disparity errors the chances of one or the other happening are about equal). Thus the matching algorithm may still converge if an iterative procedure (one in which the matching is done repeatedly at a single resolution level) is performed. It is difficult to evaluate how small the disparity error must be in order for the matching procedure to converge. The problem is further complicated by the highly correlated nature of most disparity functions encountered in practice, which will mean that in one section of the image the disparity estimate may be highly accurate but way off in some other region. This often will happen near disparity discontinuities where the reconstruction process produces a large amount of error. If the disparity error in a given region is high enough then the matching process will hardly ever produce the correct match. Thus the matching process is inherently unstable as a large enough disturbance (error) can dislodge the disparity estimate from the somewhat stable point at the correct value, and will never make its way back to the correct value. 180 Matching Error • 1.0 5 10 15 20 Error in the Disparity Estimate 25 FIGURE 4.32 The probability density of obtaining a matching error as a function of the error in the disparity estimate. Errors in the orientation When the disparity surface is non-constant the zero crossing contours will be distorted in going from one image to the other, as shown in figure 4.33. Let us assume that the orientation of a given zero crossing in the left image is 0O. Then, if the disparity funcdon is lineaT along the y axis with a gradient of m, and constant along the x axis, the orientation of the corresponding zero crossing in the right image is not 0O but is given by: 0 i = tan_1(tan(0o)+m) (4.5.20) The result of this change in the orientation of the zero crossings is to cause the matching process to make incorrect matches, or to cause matches to be missed. To test the effect of disparity gradients on the orientation, and how it affects the matching process, we performed the following experiment: We generated a 256x256 array of gaussian distributed random values, as above. A second 181 FIGURE 4.33 The distortion of zero crossing contours for non constant disparity functions. array was generated which contain the values of the first array horizontally shifted by an amount equal to 20*exp(-(I-128)**2/3200) where I is the row number of the array (1-256). Thus the disparity is constant along a row and varies along a column. Then we tried to perform the matching of the zero crossings for angle quantizations of 22.5, 60, and 180 degrees, for various values of the disparity estimate error. The percentage of correct matches was tabulated in each case. The results are shown in figure 4.34. It can be seen, in comparison with figure 4.32 that one of the effects of the non-constant disparity is to reduce the proportion of correct matches. This is due to the change in the zero crossing orientation. It can also be seen that the effect is more pronounced for the smaller levels of angle quantization. This is to be expected as the larger the angle quantization, the larger the required perturbation to get a change in the angle measurement The bottom line is that if the disparity function is non-constant (as is usually the case) the matching algorithm will produce some matching error over and above all the other sources of error we have discussed. Error in the Disparity Estimate FIGURE 4.34 The probability of obtaining the correct match as a function of the disparity estimate error for non-constant disparity functions. 183 4.6 - Geometry Errors Errors in camera parameters In the Appendix is derived the relationships between the image displacements, (X],x2) the camera geometry (shown in figure 4.35) and the physical (viewer centred) coordinates (X,Y,Z). These relationships are summarized as follows: Z = (l + a J ) f d x / [ f ( x -x 2 ) + a(f -x 1 x 2 ) ] (4.6.1) X = x,Z/f Y = y 2 Z/f where a = tan(2/3). We can now determine the sensitivity errors in the measured (or assumed) camera obtained by partial differentiation with respect get: 3Z/3f = 2Z/f - Z[D+2af]/A where D=x, -x 2 is the disparity and A = fD + a ( f + x,x2) (4.6.2) (4.6.3) of the computed position coordinates (X,Y,Z) to parameters j3, d x , and f. The sensitivities are to the parameters in question. Thus, for Z we (4.6.4) (4.6.5) We also have: 184 FIGURE 4.35 The stereo camera geometry. 3 Z / 3 d x = Z / d x (4.6.6) 3 Z / 3 X J = -Z(f+ax 2 ) /A (4.6.7) 3Z/3x 2 = Z(f-ax,)/A (4.6.8) 3Z/3/3 = sec2(/3)9Z/3a = [2af 7 d x /A - Z(P + x,x2)]sec2(^)/A (4.6.9) For P>>XiX2> 0=0, d z=0 and d x =d the angle sensitivity can be written: 3Z/30 = - Z 2 / d (4.6.10) 185 This sensitivity can become quite large for large depth values, meaning that slight errors in the measured camera tilt angle can result in large errors in the computed depth. The sensitivities of X can be written as: 3X/3f = (x2/03Z/3f-(xJ/f2)Z (4.6.11) 3X/3dx = (x2/f)3Z/3dx (4.6.12) 3X/3x, = (Xj/OSZ/Sx, (4.6.13) 3X/3x2 = (x2/f)3Z/3x2 + Z/f (4.6.14) 3X/3/3 = (x2/Q3Z/3/3 (4.6.15) The Y sensitivities are obtained in a similar fashion. For truly accurate depth and position computation, precise values for the camera parameters must be obtained. This can be done by careful setup of the cameras, minimizing the effects of external disturbances such as vibration, or by accurate estimation of the camera parameters from image plane measurements of ground control points. This is frequently done in photogrammetric applications (see Ghosh, 1979). Errors due to vertical misalignments 186 If the relative camera tilt angle is nonzero then there will be vertical as well as horizontal disparities (another way of saying that the epipolar lines are not horizontal). If the matching algorithm assumes a horizontal epipolar line along which to search for matches then the fact that there are vertical disparities will cause errors in the measured disparities and also may cause matches to disappear altogether. These two events are depicted in figure 4.36. Let us now derive the probability density of the disparity error produced by this vertical misalignment for the case of gaussian random white noise processes. It can be seen from figure 4.36 that the disparity error ey is a function of the zero crossing orientation 8 and the amount of vertical misalignment, 8 (which is assumed to be constant), and is given by: For any isotropic random process the angle 8 of the zero crossings has a uniform distribution in the interval (-7T,7r). However, in our matching algorithm we ignore all zero crossings that lie close (within an angle A) to the horizontal. Thus we take the distribution of the angles to be uniform only over the ranges ( -A,A -7r) and (A,7r-A). Thus we have: Outside this range ?Q(8) is zero. Let u = l/tan(0) and 8 = tan (1/M). The probability density of u is then given by: ey = 5/tan(0) (4.6.16) P « ( 0 ) = 1/[2TT-4A] for 6>e(-A,A-7r) and (A,TT-A) (4.6.17) P M (M) = ?e(8)\d8(n)/dn\ (4.6.18) = [(2TT-4A)(M2 + 1)] -1 for tan (l /M )e( -A,A -7r) and (A ,7r-A) (4.6.19) 187 ASSUMED EPIPOLAR "> LINE TRUE EPIPOLAR-7 LINE ^ ZERO CROSSING 8 t INCORRECT DISPARITY t ZERO CROSSING MISSING MATCHES FIGURE 4.36 The effects of vertical misalignment on the disparity measurements. Outside this range P^(M) is zero. Since ty = 5n we can write: Pev = PM(M(e v)|3M(ev)/3e x (4.6.20) 5/[(27T-4A)(5 J + ev2)] for ev)e(0,8/tan(A) (4.6.21) Outside this range Pe v(ey) is zero. The variance of P P V is given by ev o e y 2 = (25/(27r-45))/e2/[52 + e2]de = 52[l/(tan(A)(7r-2A))-l/2] o (4.6.22) For small A we have: o e v 2 = 62/(7rA) (4.6.23) 188 Thus the standard deviation of the error due to the vertical misalignment is seen to be proportional to the magnitude of the vertical misalignment 189 4.7 - Effect of the various errors on the multi-resolution matching algorithm In this section we discuss how the errors described in the previous sections affect the performance of the simplified multi-resolution matching algorithm. The various error sources can be seen to act on the matching process at the points shown in figure 4.37. The disparity measurement errors due to filtering, sensor noise, quantization, and vertical misalignment can be thought of as adding to the positions of the zero crossings that are input to the matcher. The matching error is added to the output of the matcher, and the reconstruction error is added to the output of the reconstruction process. Note that the reconstruction process actually filters, or smooths, the other error functions. However, the reconstruction process also smooths out the disparity function, resulting in a reconstruction error if the disparity function has appreciable high frequency components. It is also evident from figure 4.37 that the matching error depends on the disparity estimate obtained from the next lower (and in the iterative algorithm, the next higher) resolution level, and hence on the various errors at that level. We can write down a recursion which defines the error in the disparity estimate at a given resolution level, k, as follows: e d « = d-d = eW+riejV-^e^ + ^ K c ^ + ^ (4.7.-1) e d ( 0 ) = d o " d ( 4 > 7 - 2 ) where d 0 is the initial (lowest resolution) disparity estimate, and d is the true disparity function. The reconstruction process is indicated by r{...}. It can be seen, that the reconstruction operation acts on the filtering, sensor noise, quantization and vertical misalignment errors. Since the reconstruction operation is essentially a smoothing, these errors are smoothed out somewhat as well. Recall that, in chapter 3, we showed that the 190 FIGURE 4.37 The action of the various errors on the matching process. reconstruction error was reduced when the sample density increased. This fact, coupled with the smoothing of the error function by the reconstruction process suggests that the effect of including matches that are slightly incorrect, rather than getting rid of them may actually reduce the overall disparity error by increasing the sample density which in turn reduces the reconstruction error. This would probably be the case only in the regions of the images for which the disparity function was rapidly changing. It would be desirable to obtain a closed form expression for the probability distribution of the disparity error at each resolution level, so that we could examine the convergence of the matching algorithm. However, even if we assume white Gaussian noise for the input images, the distribution of the various error sources are all markedly non-Gaussian as we have seen in the discussion in the previous sections. Thus it is not possible to obtain analytical expressions for the disparity error probability density function. We can, however, 191 with reference to figure 4.37, make some qualitative statements. We know, from the earlier sections in this chapter, that all of the disparity measurement errors (except for the error due to vertical misalignment of the cameras) and the reconstruction errors decrease as the resolution increases. This means that as we proceed to higher and higher resolutions, the accuracy of the disparity measurements increase. As well, the accuracy of the disparity function increases. The only question lies with the matching algorithm. If the matching algorithm can match accurately then convergence of the matching algorithm is assured. However, as we have seen, the performance of our matching algorithm depends on the accuracy of the disparity estimate that guides it. In chapter 4.5 we saw that the probability density of the matching error exhibits an impulse at zero error. The magnitude of this impulse is a monotonically decreasing function of the error in the disparity estimate. The importance of this analysis is that it shows that the estimate in the disparity function can have a certain level of error and the matching algorithm will still yield exact matches most of the time. Crudely put, we can conclude that if the disparity measurements are sufficiently accurate, and the disparity function reconstruction is sufficiently accurate, at all resolution levels, then the matching algorithm will converge. If the errors in the disparity measurements are excessively high, as may happen with very noisy sensors, or if the disparity function reconstruction is poor, as may happen with surfaces that have high spatial frequency components or from using poor reconstruction methods, the matching algorithm may not converge. Often it may happen that the various sources of error are not uniformly distributed throughout the image but rather tend to accumulate in distinct regions. In this case the matching algorithm may converge over most of the image but diverge over scattered patches of it This is seen in some of the experiments described in the next chapter, especially when the reconstruction process is done sufficiently well. 192 4.8 - Summary of chapter 4 - The major sources of error in the simplified multi-resolution matching algorithm are sensor noise, spatial filtering effects, reconstruction errors, matching errors and geometry errors. - The disparity error due to the sensor noise increases as the resolution decreases and as the signal to noise ratio decreases. - The filtering error, due to spatial filtering of the images for non-constant disparity functions, increases as the disparity gradient increases, and as the resolution decreases. - The left and right scale maps of a one-dimensional stereo pair, for linear disparity functions, are related by a simple expansion factor. - Two dimensional functions can be represented by the Two Dimensional Scale Space Map which has two spatial dimensions and two scale dimensions. A two dimensional slice through this function, obtained by holding one scale and one spatial dimension constant, results in the Skew Map of the function. It is seen that the Skew Maps of two functions, for linear disparity, are related by a simple expansion factor. - The reconstruction error is composed of three, somewhat interacting, components, the truncation, aliasing and filtering errors. - The optimal truncated reconstruction filter is derived, consisting of generalized prolate spheroidal wavefunctions. - A general expression for the reconstruction error is derived, involving the sample distribution, function spectrum, and the reconstruction filter impulse response. - The reconstruction error is seen to, in general, rise as the resolution decreases (due to decreased sample density) and as the disparity function bandwidth increases. 193 - The distribution of the matching error exhibits an impulse at zero error. The magnitude of this impulse decreases as the error in the disparity estimate supplied to the matching process is increased. The fact that this impulse exists indicates that the matcher can tolerate some error in the disparity estimate and still yield exact matching. - The level of quantization of the zero crossing orientation affects the matching error, ln general the finer the quantization, the smaller the error. - If the disparity function is not constant, then, in general, the orientation of corresponding zero crossings will not be the same. This can cause an increase in the matching error. The increase in the matching error is seen to be greater for finer orientation quantizations, and for zero crossings whose orientation approaches vertical. - Errors in the measured or assumed camera geometry parameters will cause errors in the computation of depth and position from the disparity measurements. - Vertical misalignment of the cameras cause errors in the measured disparity. This error is greatest for zero crossings near horizontal. For Gaussian white random noise images the error standard deviation is proportional to the amount of vertical misalignment - The total disparity error at any resolution can be written as a recursive function of the errors at the previous resolutions. - The reconstruction process tends to smooth out the disparity measurement errors (not including the errors produced by the reconstruction process itself). - If the disparity errors at each resolution are relatively small (compared to some multiple of the V 2 G filter a) and if the reconstruction process is sufficiently accurate then the matching algorithm will converge. - The matching algorithm may converge over most of an image and diverge over small patches of the image. 194 V - EXPERIMENTS WITH T H E DISCRETE M U L T I - R E S O L U T I O N MATCHING ALGORITHMS 5.1 - Introduction This chapter presents the description and results of computational experiments performed to illustrate some of the analyses of the discrete multi- resolution matching algorithms that were done in the previous chapter. The topics covered in this chapter and the relationship between them are summarized in figure 5.1. A summary of the findings of this chapter is given at the end of the chapter. The experiments detailed in this chapter are designed to examine the effects of sensor noise, reconstruction errors, filtering errors and matching errors on the performance of the multi-resolution matching algorithms. All of the experiments described in this chapter (except for the last section) were performed on images whose intensities were randomly distributed with Gaussian distributions with mean 128 and standard deviation of 64. The intensity distributions were truncated so that all the intensities had values between zero and 255. The departure from the true Gaussian distribution caused by this truncation is assumed to be negligible. The measure of error that is used in these experiments is the RMS disparity error. That is, the square root of the average of the square of the difference between the measured disparity and the actual disparity at each point in the image. The surfaces used in the experiments were ones which gave rise to disparity functions of the form d(x,y) = d, max exp(-yV2o 2 s) (5.1.1) and rf(x,y) = rfmax exp(-x2/2a2 s) (5.1.2) 195 Introduction CH5.I Application of stereo vision * to log scaling Experiments CHS."? Implementation of the f i l t e r CH 5.2 Review of basic techniques CM 5.1 Analysis of the measurement accuracy CH 5.? * Finding the log in the disparity map CHS.? Estimating the log volume CH 5.? Summary Frequency response CM 5.3 Error due to disparity gradients CH 6". 4 Error due to sensor noise ^ I C H 5.5 Comparison * of the simple method to the dispscan method (*) Indicates New Material FIGURE 5.1 The topics covered in this chapter. These are cylindrical gaussian functions. These disparity functions were chosen for a number of reasons. First, by changing o we can change the effective bandwidth of the disparity 196 function, thereby exercising the reconstruction algorithms. Secondly, when the axis of these cylinders is aligned with the x-axis, the disparity gradient along the x-axis is zero. Thus there is no filtering effect in this case. Furthermore there are no self-occlusions of the surface which would cause missing or incorrect matches. If the cylinder axis is aligned with the y-axis then there is a disparity gradient along the x-axis. If we constrain the disparity gradient along the x-axis to be less than one, there will be no occlusions. Thus, we can obtain a measure of the effect of the filtering error on the matching algorithm by performing the experiment first on a cylinder with its axis along the x-axis and then repeating it with the cylinder aligned with the y-axis. The change in the observed disparity error can be attributed to the filtering error effect (but see the remarks below on decoupling the error components). The third reason for using these gaussian cylinders was that they somewhat modeled the expected disparity functions obtained in the log scaling application described in section 5.6. We examine, in these experiments, the use of three different matching algorithms. These are: 1. Matching using zero crossing features only. 2. Iterative (two pass) matching using zero crossing features only. 3. Iterative (two pass) matching using zero crossing and extremum features. Furthermore, three different types of reconstruction algorithms are tried. These are: 1. Warping or Transformation reconstruction method (see chapter 3.7) using the optimal filter derived in chapter 4.4. 2. Relaxation reconstruction method, (see chapter 3.3). This method is used only for the iterative matching algorithms, in order to speed its convergence rate. 3. 9x9 Averaging. This method was suggested by Grimson (1981a) as a means of obtaining a disparity value from a region. It consists of merely averaging the values of all 197 sample points located in a 9x9 pixel region of the reconstruction grid centred on the point to be reconstructed. The motivation for using such a method is that it provides a check on whether or not a simple reconstruction is all that is required. This method is computationally much cheaper than the other two reconstruction techniques. However such a large region causes problems when the disparity function is not constant. One of the difficulties that we encounter in doing these experiments is in decoupling the various sources of error from the resultant disparity error. The effect of the sensor noise is decoupled from the other error contributions by repeating a given experiment holding all conditions the same except that a Gaussian random function with a given variance is added to one of the input images. The increase in the disparity error can be then attributed to this added noise signal. Similarly, the effect of changing reconstruction methods and the effect of changing the surface bandwidth is obtained with the same process; varying a given parameter while holding all other conditions the same. However, as we have seen in the previous chapter, the matching error is dependent on the errors in the disparity estimate, which in turn is a function of the various measurement and reconstruction errors. This means that the matching error can not be held constant Thus the observed differences in disparity error between two experiments will always consist of changes in the matching error as well as changes in the disparity measurement error due to the effect being tested for (such as sensor noise). However, all is not lost If the disparity estimate error (due to sensor noise, reconstruction etc.) is small enough so that the correct match is always within the matching region, then the matching error will be essentially independent of the disparity estimate error (as was shown in the previous chapter). In this case the changes in the observed disparity error will be due to changes in the parameter that is being varied. Before going on to the presentation of the actual experiments we will briefly describe the implementation of the multi-resolution feature detection algorithm. -198 5.2 - Implementation of the Multi-Resolution Feature Extraction ln this section we discuss the implementation of the subsystem responsible for the production of the multi-resolution feature image representation. This subsystem can be broken up into two sections: the spatial filtering to produce the set of spatial frequency channels, and the feature detection. The spatial filtering is performed as shown in figure 5.2. The lowpass filter and sub-sampler sections form a two-dimensional decimator or sampling rate reducer. The lowpass filter restricts the maximum frequency of the image to one half its previous maximum, to limit the aliasing error when the filtered image is subsampled. Each decimation stage reduces the number of image samples by a factor of four (by two in each of the horizontal and vertical directions). Each lowpass filter section has exactly the same set of filter coefficients. Each successive stage of the decimator is followed by a V 2 G bandpass filter. Even though the coefficients for each of these V 2 G filters are the same, the apparent frequency response of these filters with respect to the input have different centre frequencies because of the sampling rate reduction. This scheme of spatial frequency channel production offers distinct advantages over the direct method in which the input signal is filtered by four separate bandpass filters, each having a different frequency response. The first, and probably least important advantage, is that only one set of filter coefficients is required for all the lowpass filters and for all the V 2 G filters. A more important advantage lies in the fact that the centre frequency of the prototypical bandpass filter, with respect to the input, is fairly high, on the order of n/2 radians. In designing finite wordlength digital bandpass filters the number of coefficients required to approximate an ideal filter response to a given accuracy is inversely proportional to the centre frequency. For example, in the direct method we would require a filter size on the order of 8Nx8N for the lowest (fourth) spatial frequency channel filter (given that the highest frequency filter was of size NxN), compared to the NxN size filter required in the hierarchical scheme for all the channels. Of course we must take into account the low pass filters in the hierarchical case but these too will be of constant, not exponential, size. In 199 INPUT V 2 6 , C H A K I N F l 1 IMAGE (N-N) BANDPASS FILTER (N.N) LOW PASS FILTER • S U B -SAMPLER V 2 6 BPF ^ . C H A N N E L V 2 G BPF , CHANNEL 3 fW) LP F S S — i — PRODUCTION OF SPATIAL FREQUENCY C H A N N E L S FIGURE 5.2 The spatial Filtering process. addition, the structure of our hierarchical filtering system facilitates the pipelining of computation as can be seen in (Clark and Lav/Tence, 1985c). The filtering method described in this thesis can be compared with the technique developed by Crowley and Stern (1984). They compute the Difference of Low-Pass (or 200 DOLP) transform of an image, which, for Gaussian low-pass filters, closely approximates the V 2 G filter. Their method uses subsampling to reduce computation and also uses the separability of the Gaussian low-pass filter to reduce the amount of computation. Their method produces bandpass filtering at resolution levels that are a factor of \/2 apart, in comparison to our method which produces bandpass filters with resolution a factor of 2 apart In some cases this may be useful, but for our application the Crowley and Stern method needs to do twice as much computation than is actually required. Clark and Lawrence (1984, and 1985c) describe a proposed hardware implementation of the filtering method described in this thesis that takes advantage of the fact that it can be implemented in a pipelined fashion to increase the speed of computation. It is not clear whether the method of Crowley and Stern can be similarly configured. If we let the lowpass filter prototype have a frequency response U^\U>i) and the bandpass filter have a frequency response B (b> i£> 2 ) then the frequency responses of the spatial frequency channels, referred to the input, are as follows: H1(CJ1JW2) = B(CJJ^J) H 2 (w i j w 2 ) = B(2CJ !,2CJ 2)L(W i AJ2) H3 (OJ ] A> 2) = B(4co, ,4CJ 2 )Uw x A> 2 )L(2o) i ,2CJ 2) H4(W1A)2) = B(8CL)I,8CJ2)UCJ1AJ2)U2W1,2CO2^4JI,4J2) and ^ w 1 £ ) 2 ) = L(a)1 + 27rkA>2 + 27r^ ) for k,l = ± 1,2,3... The prototype two-dimensional lowpass filter was designed by transforming a one-dimensional lowpass filter using the McClellan transformation (McClellan, 1973). This transformation takes a one-dimensional filter with transfer function F i ( c o ) and produces a two-dimensional filter with transfer function: (5.2.1) 201 where: f(wlJcj2) = arcos[.5(cos(cji) + cos(co2) + cos(a),)cos(6J2)-l)] (5.2.2) This transformation preserves the optimality (if present) of the one-dimensional filter in the two-dimensional design. The one-dimensional filter was designed using the Remez exchange algorithm (McClellan et al 1973) to produce an optimal half band lowpass filter (optimal in the sense that the peak approximation error to an ideal lowpass filter is minimized using a minimax criterion). The peak sidelobe level of the low pass filter is set by the number of coefficients used in the filter. For N=25 this level is about -33dB. One result of the McClellan transformation is that the resulting filter displays octant symmetry. Thus L(6Ji£j 2) = L(GJ2A>I) = L(WI,-CJ2) - U-cj!A>2) = 1X^2,-0)0 = \X-UJ2(JJI) = IX-CL)U-CJ2) - L ( -Cc> 2 £Ji ) . This means that, for N odd, there are only (N+l)V8 + (n + l)/4 unique filter coefficients instead of N 2 . This can result in a large savings in computation and increased throughput if the symmetry is taken advantage of. However, since we were forced to use a general purpose computer, instead of being able to build a special purpose device such as the systolic processor described in (Clark and Lawrence, 1985c), we implemented the filtering operations using the Fast Fourier Transform (FFT) on an FPS-100 array processor attached to a VAX-11/750 minicomputer. The multi-resolution filtering process used was still that depicted in figure 5.2, however. Typical C P U times for performing four level filtering on a 256x256 image were on the order of 70 seconds, for a lightly loaded system. The prototype bandpass filter is, as mentioned earlier, a V2G filter with transfer function B(a> 1 p 2 )=k(w 1 2 +w 2 2 )e" a 2 ( c J l 2 + £ j 2 2 ) (5.2.3) 202 The value of o is chosen to trade off between high bandwidth (lower number of filter coefficients) and low aliasing error (due to the sampling of the ideal continuous filter). We set the o value for the highest resolution level to be y/ 2. The frequency response of the four spatial filters is shown in figure 5.3. One of the spatial frequency axes has been suppressed (CJ2 = 0) for clarity. The peak sidelobe levels are less than 33 dB in all cases. Zero crossings are detected by scanning along horizontal lines (rasters) for either a zero value or for a change in sign. When one of these is found, a zero crossing is assigned to the position of the left pixel in the case of a sign change, and to the zero pixel in the zero value case. Once the zero crossings have been detected, we perform a thinning procedure which gets rid of small (one or two pixels across) isolated clumps of zero crossings. After this is done, we compute a measure of the angle of the zero crossing contour through the zero crossing pixel. This value is then used in the matching algorithm to disambiguate between possible matches. The angle measurements are quantized to to sectors of 6 0 ° , that is, 6 sectors in a 360° circle. To provide some measure of noise immunity, we ignore all zero crossings whose contrast falls below a given threshold. The threshold used will depend on the expected signal to noise ratio of the V 2 G filtered images (which in turn will depend on the processor word length and camera characteristics). In our experiments we used a threshold of 20/255 as the majority of noise-like zero crossings fell below this threshold. The zero crossing detection algorithm did not use the array processor at all, and typical CPU times for the zero crossing detection process on a four level image set (256x256, 128x128,...) were on the order of 90 seconds for a lightly loaded system. Combining these times with the typical filtering times reported above for the filtering process results in times on the order of 320 Seconds (about five and a half minutes) for performing the multi-resolution filtering and zero crossing extraction on a stereo pair of 256x256 images. 203 FREQUENCY FIGURE 5.3 The frequency response of the four spatial filters. Extremal points (points of local maxima and minima) were also detected. This was performed by searching along horizontal lines between successive zero crossings for the maximum, or minumum value. Note that this procedure Finds only one extremal point between successive zero crossings. Thus, since the expected number of extremal points is greater than the expected number of zero crossings (see equations 4.3.11 and 4.3.20), we will not find all 204 the extrema with this procedure. However, in practice, the number of extrema due to noise is substantial. The largest extremum in a given interval (which we call a semi-local extremum) is most likely not a noise extremum (although the position of this extremum may be perturbed by noise). In using the semi-local extrema as features we trade off feature density for the assurance that the features are created by events in the scene and not by noise. 205 5.3 - Frequency Response of the Matching Algorithms In this chapter we examine the performance of three different matching schemes, using three different reconstruction methods, as we vary the frequency domain characteristics of the surface disparity function. The change in the surface disparity function frequency content is attained by varying the value of a g in the disparity function equation (5.1.1) and (5.1.2). The values of a g used in the experiments in this chapter are 5, 10, 15, 20, 25, 30, 35, 40, 45 and 80 pixels. All stereo pairs in the experiments were 256x256 arrays of white gaussian random numbers with mean 128 and standard deviation of 64. The left and right images are related by the following equation: I r i g h t(x,y) = Ileft(x+tf(x,y),y) (5.3.1) where d(x,y) is given in (5.1.1). Figure 5.4 shows the zero crossing pyramid of such a random image pair. The three matching algorithms that are used are: 1. - Single pass, zero crossing features only. 2. - Two pass iterative, zero crossing features only. 3. - Two pass iterative, zero crossing and extremum features. The size of the matching region was seven pixels wide (r = 3 pixels). Figure 5.5 shows how the RMS disparity error varies as the size of the matching region changes (single pass, zero crossings only with reconstruction by the transformation method). Beyond r m there is not much difference in the measured RMS disparity error. This is because the matching is done from the centre of the matching region outwards. Thus a feature in the outlying parts of the matching region will be taken to be the match only if there is no features closer towards the centre of the matching region (see chapter 4.5). For large r m this will have a very small probability of happening, and hence increasing r past a certain point will have little effect 206 FIGURE 5.4 The zero crossing pyramid of a random image pair. For the purposes of the experiments three resolution levels were used. The variation 207 o CM o Ui Ui CO © a, tn • i—< Q ° S i z e of R • = 2 o = 3 = 4 10 20 T 30 40 50 Surface Sigma —r-60 — r -70 80 FIGURE 5.5 The variation of the RMS disparity error with the matching region size. of the RMS disparity error with changes in the number of resolution levels is shown in figure 5.6. It can be seen from this graph that at least three levels of resolution are required for these experiments. The three reconstruction methods used in these experiments are those listed in section 5.1. The relaxation method is used only in the iterative procedures, as the convergence is too slow in the single pass case. Figure 5.7 charts the RMS disparity error as a function of the number of relaxation iterations performed, and provides a visual indication of the convergence. It is evident that even more iterations are necessary for complete convergence. However, even at fifty iterations the amount of computation is very high, and the reconstruction takes longer than for the averaging or transformation methods. The results of the frequency response experiments are summarized by the graphs shown in figures 5.9 to 5.20. The thick solid line in each of these graphs represents an estimate of the RMS error due to the regions of the image that can not be matched (because the two images do not overlap in these regions). We assume that the disparity value FIGURE 5.6 The variation of the RMS disparity error with the number of resolution levels. Surface Sigma FIGURE 5.7 The RMS disparity error as a function of the number of relaxation iterations. 209 obtained in these regions will be, on the average, one half the disparity of the actual disparity at these points. This assumption is supported by examination of the actual disparity values obtained in these regions in the experimental tests. Perspective shaded plots10 of the disparity functions obtained with algorithm 1 and the warping reconstruction method for disparity function a of 10, 20, 40, and 80 are given in figure 5.8. It can be seen that the majority of the matching errors are to be found at the edges of the image. Also noticeable is the degradation in performance for the higher bandwidth surface, evident in the poorly defined ridge in the peak of the measured disparity function in the o— 10 case. In figure 5.21 is shown perspective, plots of the error maps obtained using matching method 1 for the three different reconstruction methods, for the case of a g = 40. The number of relaxation iterations used for the relaxation reconstruction was 200. Note that the errors are typically localized to small patches of relatively high error. The warping method is seen to yield the lowest error. Figure 5.22 shows perspective plots of the disparity functions for these cases. The warping method results in the least amount of spike errors, but produces a surface that is not as smooth as the relaxation or averaging methods. The following points can be made from these results: 1. - The averaging method performs comparably to the transformation method, while the relaxation method is somewhat worse. This is to be expected since the transformation method, due to truncation, is far from optimal, and the relaxation method is not convergent The warping or transformation method is seen to be the best overall in terms of reducing the amount of the highly localized spike errors. 2. - The addition of extrema features to the zero crossing features improves the performance with relaxation reconstruction, but does not appreciably affect the performance with the other two reconstruction methods except at low a g values. This can be explained as due to the fact the the relaxation reconstruction method requires less iterations to converge 1 0The program for plotting these shaded plots was written by Richard Jankowski of the Electrical Engineering Department U.B.C. 210 FIGURE 5.8 Perspective plots of the disparity function obtained using matching method 1, with the warping reconstruction method. when the sample density is increased, while for the other methods, increasing the sample density may not improve the reconstruction if the disparity function is already oversampled. 8-O co u. IM CO CO CM 05 10 20 T -30 40 Recon. Method a Warping . ?_. Ayerag ing 50 Surface Sigma 60 —r— 70 B0 FIGURE 5.9 RMS disparity error as a function of og, for matching algorithm 1, maximum disparity = 5. Surface Sigma FIGURE 5.10 RMS disparity error as a function of og, for matching algorithm 1, maximum disparity = 10. -1 O eo u b Ed Recon. Method D Warping .?._ Averaging 20 30 40 50 Surface Sigma 60 70 80 FIGURE 5.11 RMS disparity error as a function of o for matching algorithm maximum disparity = 15. 30 40 50 Surface Sigma FIGURE 5.12 RMS disparity error as a function of o' for matching algorithm maximum disparity = 20. O CO-t-i u> Cz3 efl a, 3S CM -O -- A A A A Recon. Method o Warping . ? . . A V erag 1 ng ^"Relaxation • u 10 20 —T— 30 —r-40 50 —r-60 70 80 Surface Sigma FIGURE 5.13 RMS disparity error as a function of ag, for matching algorithm 2, maximum disparity = 5. O P 5 -i i i i i i i I T 0 10 20 30 40 50 60 70 80 Surface Sigma FIGURE 5.14 RMS disparity error as a function of ag, for matching algorithm 2, maximum disparity = 10. Recon. Method D Warping . 9.. Averaging a Relaxation O n u CM ed a CO CO Recon. Method o Warping . 9_. AY?rag ing a Relaxation - r -10 20 30 40 50 60 70 80 Surface Sigma FIGURE 5.15 RMS disparity error as a function of o for matching algorithm 2, maximum disparity = 15. - r -o-| i 1 i '• 1 1 i i 0 10 20 30 40 50 60 70 80 Surface Sigma FIGURE 5.16 RMS disparity error as a function of a g ) for matching algorithm 2, maximum disparity = 20. 215 u> O n Ui u Ed ed CX CO Q cu-o-Recon. Method a Warping . ?.. A Y crag l ng A Relaxation i S A ., i i i 0 10 20 30 40 50 Surface Sigma FIGURE 5.17 RMS disparity error as a function of o maximum disparity = 5. 60 70 80 S' for matching algorithm 3, u O n u H CM-cd CO CO OS Recon. Method • Warping Averaging Relaxation "T" 10 20 —T" 50 30 40 Surface Sigma FIGURE 5.18 RMS disparity error as a function of o maximum disparity = 10. 60 70 80 S' for matching algorithm 3, k l O m-u W CM -10 CO Recon. Method a Warping . 9 . . AY e Jl a JJ 1 ng ^""Relaxation 10 20 50 30 40 Surface Sigma 60 70 80 FIGURE 5.19 RMS disparity error as a function of og. for matching algorithm maximum disparity = 15. Surface Sigma FIGURE 5.20 RMS disparity error as a function of os, for matching algorithm maximum disparity = 20. 217 This has as a corollary the statement that the reconstruction error is not a major component 218 WARPING AVERAGING RELAXATION FIGURE 5.22 Perspective plots of the disparity function obtained for a =40 for the three reconstruction techniques. of the disparity error, at least not for the larger a g values. 219 3. - The RMS disparity error is proportional to the maximum disparity value. This has been seen in chapter 4.4 to be the case for the reconstruction error. It is also evident that the other sources of error, that is, additive noise, quantization error, and filtering error, would, in general, not be related to the maximum disparity value. This implies that the reconstruction process, coupled with the matching process, as described in chapter 4.5, at low resolutions, is responsible for the bulk of the disparity error. 4. - The RMS disparity error, after subtracting the estimated RMS component due to edge effects, is seen to increase as ag decreases. This is due to the reconstruction error and roughly follows the example given in chapter 4.4 (where the same type of disparity function was use, but a slightly different type of filter). This effect can be seen in figure 5.18 for the a = 10 case where the disparity function is not completely reconstructed, due to the rapid change in the disparity function. 5. - The RMS disparity error in the case of the warping reconstruction method is seen to paradoxically rise for large ag values. This is due to edge effects. The warping method assumes a fixed boundary of zero value, while the other two methods assume a free boundary. Thus, in the warping case, the reconstructed function is pulled down to zero at the edges. The error incurred by this pulling down is largest for large ag surfaces since the disparity values near the edges of such surfaces are relatively high compared to the values for low ag surfaces which quickly drop down to zero. 220 5.4 - Surface Gradient Response of the Matching Algorithms The effect of the surface gradient (in the x-direction) on the performance of the matching algorithms is obtained as follows. The RMS disparity measurement error is found for a number of different gaussian cylinder surfaces as in the previous section. However, in these cases the axis of the cylinders were aligned with the y-axis. This means that there is a disparity gradient along the x-axis. Because of this disparity gradient we expect to observe an increase, in the measured RMS disparity error from the case in which the cylinder axis was aligned with the x-axis. To obtain a value for the error due to the filtering effect, we run the same experiment twice, once with the gaussian cylinder aligned with the x-axis and then with the cylinder axis aligned with the y-axis. The difference in the measured RMS disparity error can be attributed to the effect of the V 2 G filtering process on the feature positions. For most values of o the average disparity gradient is approximately given by: daVdxn = dmflv/256 (5.4.1) avg max v ' We performed the experiments for four values of ^ m a x ; 5, 10, 15, and 20. Ten different values of surface sigma were used; these were the same values as used in the experiments of the previous section. Since the average disparity gradient is essentially independent of the surface sigma, the RMS errors obtained using the different sigma values (for the same rf_„v) rn.3.x can be averaged to obtain a better estimate of the filtering error component However the results of this experiment were inconclusive, as the variance of the error differences so obtained were very high. In order to fully examine the effect of disparity gradients on the filtering error component we must somehow decouple the disparity measurement process from the correspondence process. This can be done, for the case of one-dimensional intensity functions, with the use of scale-space matching as described in chapter 6. With this method we can determine the correspondence between two one dimensional intensity functions. Once this correspondence has been established, we can measure the disparity values between 221 corresponding features. We can then compare these disparity values to the actual disparity values to obtain filtering error values. We can then compare these experimentally determined filtering errors to the theoretical predictions given in chapter 4. In chapter 4 it was pointed out that the theoretical analysis of the filtering error could be done only in the case of linear disparity functions. Thus, in the following experiments, we used only linear disparity functions. The case of nonlinear disparity functions is treated in chapter 6. The experiments proceeded as follows. We created a random one dimensional function f(x). A second random function, g(x), was then derived from f(x) using equation (4.3.3). This means that f(x) and g(x) can be thought of as the right and left intensity functions arising from a surface having a disparity function of the form d(x) = (lix. The scale maps of these two functions (for a given value of p\) were then computed. These two scale maps were then matched using the technique described in section 6.2. After the matching had been performed the disparity between corresponding scale map contours were computed for each a value in the scale map. The RMS error between these measured disparity values and the actual disparities for each value of o in the scale map is then obtained. This procedure was performed for four values of the disparity gradient p\; -20/255, -40/255, -60/255 and -80/255. The right hand scale maps of the intensity functions for each of these four cases are shown in figures 5.24 to 5.27. The left hand scale map is the same for each case, since the same random function is used as the left hand function. The scale map of this function is shown in figure 5.23. The results are plotted in figure 5.28. Figure 5.29 shows a theoretical prediction of the expected RMS disparity measurement error due to filtering as a function of a and p\. This prediction was obtained by assuming that the RMS filtering error is given by the square root of the sum of the variance of the theoretical filtering error distribution (equation 4.3.40) and the variance of the quantization error distribution. However, since the variance of the filtering error distribution, as strictly defined, is infinite, we can only obtain a 'pseudo-variance' value. This pseudo-variance is obtained by fitting a gaussian distribution to the actual filtering error distribution. This is performed by setting the peak values of the distributions to be the same. FIGURE 5.24 The right hand scale map for the ^ =-20/255 case. Doing this we obtain for the pseudo-variance the expression given in equation (4.3.41). The 223 F I G U R E 5.26 The right hand scale map for the p\ =-60/255 case, variance of the quantization error is simply given as 1/6 of the zero crossing position 224 FIGURE 5.28 The measured RMS disparity error due to filtering as a function of o for disparity gradients of -20/255, -40/255, -60/255 and -80/255. 225 CM F i l t e r S i g m a FIGURE 5.29 The expected RMS disparity error due to Filtering as a function of o for disparity gradients of -20/255, -40/255, -60/255 and -80/255. quantization level, and is thus equal to 1/6 since the zero crossing position measurements in this experiment were quantized to one pixel, for all a and |3 values. Comparing figures 5.28 and 5.29 show that the experimentally obtained filtering errors are indeed similar to the theoretical predictions. This experiment thus shows that the filtering error effect is obtained in practice and needs to be considered in analyzing the disparity errors produced by a matching algorithm. 226 5.5 - Performance of the Matching Algorithms with Additive Noise In this section we describe experiments that were performed to test the effect of adding gaussian white noise to one of the images in a stereo pair on the performance of the discrete multi-resolution matching algorithms. These experiments proceeded as follows. First the experiments described in section 5.3 (for the case of d m a x = 10) were performed to give a noise-free baseline for the RMS error measurements. Then the experiments were repeated, this time with gaussian white noise of variance on added to one of the images. This extra noise caused a shift in the position of the zero crossings of that image, thereby causing an error in the measured disparity. This was analyzed in chapter 4.2. The difference in the RMS error between the two sets of experiments were tabulated. The results of these experiments are depicted in figures 5.30 to 5.33. The maximum noise variance tested was 400. This corresponds to a minimum signal to noise ratio of 10, as the signal, variance was. 4096. Note that this is a fairly high signal to noise ratio yet the effect of the added noise on the RMS disparity error was still significant. We can obtain a theoretical prediction of the expected RMS disparity error due to the additive noise by using the results of the analysis of this error performed in chapter 4.2. As was the case in the filtering error, the variance of the theoretical additive noise error distribution is undefined. Thus the best that we can do is to qualitatively compare the measured RMS error with a 'pseudo-variance' of the additive noise error distribution. The pseudo-variance measure that we use is defined implicitly as follows: • f P e p ( e P ) d C P = i e _ x V 2 d x = 0.3413 (5.5.1) Thus <7p is the point at which the area under the probability density curve is equal to the area under the standard gaussian (normal) probability density curve in the interval (0,1) (recall that the standard deviation of the standard guassian distribution is 1 and the mean zero, thus the above interval is one standard deviation from the mean of the distribution). The pseudo-variance is plotted in figure 5.34 as a function of the noise variance for a filter a Ul o u U . >><=> Recon. Method • = Warping o = Averaging A = Relaxation 120 160 200 240 280 Noise Variance 400 FIGURE 5.30 Increase in RMS disparity error as a function of added noise variance. Surface o =30, Iterative matching, Zero crossings only. T 160 200 240 280 Noise Variance 320 360 400 FIGURE 5.31 Increase in RMS disparity error as a function of added noise variance. Surface o = 15, Iterative matching, Zero crossings only. O u u< Recon. Method D = Warping o = Averaging A = Relaxation 160 200 240 260 Noise Variance r 1 320 360 400 FIGURE 5.32 Increase in RMS disparity error as a function of added noise variance. Surface a = 30, Iterative matching, Zero crossings and extrema. Noise Variance FIGURE 5.33 Increase in RMS disparity error as a function of added noise variance. Surface a = 15, Iterative matching, Zero crossings and extrema. 229 of \/2 (the highest resolution filter a). It is seen that the pseudo-variance is a linear function of the additive noise variance. This concurs with the experimental evidence that indicates that the RMS disparity error is a roughly linear function of the additive noise variance (note that the statistical nature of the measurements means that there will be some deviation of the measurements from any trend such as the assumed linear one). However, the magnitude of the pseudo-variance is about 1/5 that of the measured increase in the RMS disparity error. This is presumably a result of the fact that our definition of the pseudo-variance was somewhat ad-hoc, and another definition may have produced a closer fit to the measured magnitudes. However the nature of the definition is such that one would expect it to capture more truly the form of the variation in the disparity error. Also, the discrepancy in the error magnitudes is almost certainly due in part to the fact that the errors due to the noise induced shifting of the zero crossing and extrema features (which is v/hat the pseudo-variance is a measure of) can not be decoupled completely from the effects of the matching and reconstruction processes. It is seen from the additive noise tests that the warping reconstruction method performs the best in terms of minimizing the increase in the RMS disparity error. This suggests that the warping method is better than the other methods at smoothing out the high frequency error components. The RMS disparity error increases only slightly as a g is decreased from 30 to 15. This suggests that the mechanism causing the increase in the disparity error in the presence of additive noise is only loosely coupled to the reconstruction process. Similarly, the addition of the extremum features does not affect the increase in RMS disparity error appreciably (except for the relaxation case at a g = 15). This again suggests that the reconstruction process is loosely coupled to the noise error mechanism. 230 o o d - | I 1 1 1 1 I I i T 0 40 80 120 160 200 240 280 320 360 400 Noise Variance FIGURE 5.34 The pseudo-variance of the additive noise error as a function of the additive noise variance, for of = y/2. 231 5.6 - Comparison of the Simplified Matching Algorithm with the DispScan Multi-resolution Matching Algorithm. In this section we compare the performance of the multi-resolution Dispscan matching algorithm with the simplified matching algorithm. We tested the multi-resolution Dispscan algorithm on the random image pairs with varying a g and a maximum disparity of 10 pixels. The 9x9 average and warping methods were used to do the disparity function reconstruction. The resulting RMS disparity errors, as a function of a g are plotted in figure 5.35. It is seen that the multi-resolution Dispscan matching algorithm performs somewhat better than the simplified method. However the Dispscan algorithm involves much more computation and takes almost twice as long. It is also observed that the Dispscan algorithm performs better with the warping reconstruction method than with the 9x9 averaging reconstruction method. This indicates that, even for the Dispscan algorithm, the reconstruction process is very important to the performance of the overall matching process. Ui O co U Ui W Algorithm D dispscan. avg o dispscan, warp a simple, warp 30 40 50 Surface Sigma 70 FIGURE 5.35 The RMS disparity errors for the Dispscan and simplified matching algorithms as a function of a_. 233 5.7 - Application of Image Analysis to Log Scaling In this section of the thesis we discuss the application of image analysis methods to a particular industrial application, that of log scaling. Log scaling is the process of measuring, or estimating the volume of wood in a log or group of logs. There are four basic reasons for wanting to scale logs: (Sinclair, 1980) 1. To determine payments and royalties to the Crown; 2. To determine payments to logging contractors; 3. To find the volume of logs for sale, trade or transfer; 4. To calculate divisional, departmental, or area production. The two most common techniques for scaling logs are weigh scaling and stick scaling. Weigh scaling involves weighing a truckload of logs and subtracting this weight from the weight of the empty truck. The resulting load weight can be used to estimate the total volume of the load of logs provided the mean density of the logs is known. Stick scaling involves a man walking over a log as it lies on the ground and measuring the length and the widths of the log at the two ends with a measuring stick. These measurements are written into a log book from which, at the end of a shift or workday, the log volumes are calculated. Stick scaling is more accurate on a piece by piece basis than is weight scaling, as the log density used in the estimation of the weight scale volume can be in error due to variations in the log's moisture content as well as to invalid assumptions as to the species of log. In contrast weight scaling provides information about a set of logs only. However, the main drawback of stick scaling is that it is very slow compared to weight scaling, especially if the logs are small. This is an important consideration now that forest firms are harvesting smaller and smaller logs. Sample scaling, which involves stick scaling only a small sample set of logs from a larger group of logs, can be used to speed up the scaling operation, but is useful only for large, homogenous, groups of logs for which volume statistics can be 234 adequately characterized (similar to the case of weight scaling). Sinclair (1980) has noted that, in a large number of British Columbia coastal logging operations, the log scaling function was the controlling factor in determining productivity. Hence, if one could speed up the log scaling process, then presumably one could improve the productivity of a sort yard operation11. Sinclair also states that as the scalers are rushed or pressured, the quality of the scaling is diminished (the measurement error increases). Thus automating the scaling process, which would presumably be unaffected by the workplace pressures that a human scaler is exposed to (such as the menacing presence of very large logging machinery rumbling about nearby), would be expected to increase the reliability and repeatability of the log scaling process. This section of the thesis looks at a process whereby the log scaling operation can be done automatically. One could conceivably automate the stick scaling operation by designing a robot to directly replace the human stick scaler. This machine would walk along the logs and measure them with a shiny chrome measuring stick. This approach however, would have most of the human problems of stick scaling as well as some new ones. At any rate the problems of engineering such a mechanism are enormous and the present state of the art is not sufficiently advanced to permit such a design. Demaerschalk et al (1980) have demonstrated improved accuracy over weight scaling by the use of optical methods in truck load volume estimation. Their method involved taking photographs of the truck load from the back and the side, enlarging these photographs and manually estimating (by counting the dots on an overlay grid) various geometric parameters such as the area of bark, the number of logs and so on. These parameters were input to a statistical regression procedure which provided an estimate for the volume of the truck load. Although this technique was designed to replace weight scaling, the general principle, that of optically sensing and measuring log parameters, could be applied to the domain of stick scaling; that of logs lying flat on the ground. n A sort yard is a place where logs from the harvesting grounds are sorted as to species and grade, bundled, and sent to the mills. 235 This type of procedure, instead of manually talcing photographs, enlarging them and painfully counting dots, could, in principle, be fully automated with television cameras and (special purpose) image analysis hardware. Simple techniques The simplest technique that one can use for processing images, yet one of the most widely used in log handling applications (as well as other applications) is that of binary thresholding. In this technique, all piexels whose intensity is greater than a given threshold is a assigned a high or T level. All pixels whose intensities fall below this threshold would be assigned the complementary low or '0' value. Thus the image is partitioned into two sets, those comprised of pixels with level T and the other containing the pixels with level '0'. If the intensity of the object(s) that one is interested in is sufficiently different from the other parts of the image, then this method provides a simple mechanism for separating the object from its surround. However, if the objects do not have an uniform intensity distribution, then the thresholding operation will cause the object to become broken up. Similarly if the surround does not have a uniform intensity distribution, then parts of it will be confused with the object. Figure 5.36 shows a photograph of a typical log lying on a flat deck. Figure 5.37 shows the result of applying a threshold equal to the mean intensity of the original image. Note that the black and white regions do not correspond to any meaningful structures. In general, to make the thresholding technique viable requires that one be able to control the lighting conditions, as well as the characteristics of the objects being imaged, very closely. This, for example, can be done by backlighting of the log which reduces the apparent texture of the logs visible surface (since it is in darkness) and provides a bright, uniform background. This technique, commonly refered to as broken beam scanning has been successfully employed in sawmills, where such a setup can be readily arranged. Examples of such systems are given in (Vit, 1962) and (Hand, 1975). Another method which has been proposed for the implementation of the thresholding technique, is to paint over the surface of the log with highly reflective white paint (thus reducing the effect of texture) and darkening the background (Miller and Tardif, 1970). In (Whittington, 1979) is described a method whereby 236 FIGURE 5.37 The effects of thresholding figure 5.36 an array of photodiodes senses, and thresholds, the light reflected off of a peeled log. 237 Vadnais (Vadnais, 1976) describes a similar system that detects the boundaries of a sawn board in order to determine proper edging strategies. However, in the situations encountered in most sort yards, one can not practically control either the lighting or the characteristics of the surfaces (logs and background). Thus the image processing technique must be able to handle textured surfaces and varying contrasts, such as is the case in figure 5.36. Some of the more advanced log handling image processing systems utilize edge detection algorithms. These techniques are more powerful than the thresholding techniques because of their relative insensitivity to changes in illumination across the scene. Instead of trying to distinguish between the body of the log and the background, edge based techniques search for the boundary between the log and the background. Since logs are usually simply connected (no holes) the body of the log can be determined from its boundary. Edge detection is performed by measuring, with some differential operator, the local variation in the intensity. If this variation exceeds a given level, we infer an edge in that locality. There are many such edge operators that have been used (see Rosenfeld and Kak, 1976 for a review, also see chapter 2 of this thesis for a discussion of the Marr- Hildreth theory of edge detection). However, even edge detection based schemes are not in themselves adequate for our application. Figure 5.38 show the results of applying an edge detection operator to the image of figure 5.36. Notice that, while the boundary of the log and the background is detected, many other edges are found as well. It is a very difficult task to determine which of these edges belong to the log-background boundary and which are due to the texture in the log interior and background. Thus, to solve our particular problem, which is to distinguish logs from the background when both the logs and the background are highly textured and of varying contrast, we require a more complex image analysis technique. In the previous chapters of the thesis we have seen that stereo depth perception works best under just these conditions. That is, the more edge segments there are in a stereo image pair, the greater the reliability of the depth measurement process. Also, since the logs in our application are lying on a flat deck, their occluding boundary (which is the boundary that we see in the image) is closer to 238 FIGURE 5.38 The result of applying an edge operator (Marr-Hildreth) to figure 5.36. the camera than is the background. Thus thresholding the depth value will give us, in principle, an error free segmentation of the logs from the background. There are other methods, other than the one described in this thesis, for determining the distance to the logs; these include laser ranging, and structured light techniques (Jarvis, 1983). These techniques are finding widespread application in industrial vision systems. However, they generally require a constrained environment (in terms of controlling the ambient light). Such control may not be attainable in a logging sort yard, and a system that can operate under more arbitrary conditions would be desirable. Analysis of the measurement accuracy The setup of the video log scaling system is given in figure 5.39. We assume that the camera axes are parallel and that the logs are lying on a flat deck parallel to the image planes of the cameras and that the focal points of the cameras are the same distance from the deck. In the Appendix is derived the formula for the depth in terms of the disparity. For the simple geometry involved in this application we have: 239 FIGURE 5.39 The video log scaling system setup. z = fd/D or D = fd/z (5.7.1) where D is the disparity, z is the vertical depth from the camera focal point to the the point on the log or deck being imaged, f is the camera focal length and d is the distance between the focal points of the two cameras. Let the size of the sensor be given by s and the number of imaging elements on the sensor (pixels) be denoted N. Then we have that: p = N*D/s (5.7.2) is the disparity measured in pixels. Let the distance from the camera focal points to the deck be denoted h. Then the minimum disparity will be: (5.7.3) 240 If the maximum log radius is denoted r max then the maximum disparity range that the matching algorithm will be faced with is given by: Ap = 2*N*f*d*r/[(s*h)(h-2r)] (5.7.4) The mean depth accuracy is given by 2r/Ap (for quantization of disparity to the nearest pixel) or: In this application the depth resolution is not important More important is the horizontal position accuracy. However the scaling factor for the horizontal position is almost constant and can be calibrated beforehand or during the log scaling process if ground control points can be detected in the images. If the horizontal scale factor is known, the error in the measured ground position, AX, is a function of the position error in the image, Ax. It can be shown that: where An is the image position error measured in pixels. In the best case analysis the only error in the image position will be due to the pixel quantization. In this case An = l/v/12. If we are viewing a cylindrical log with dimensions of LxW, then its volume is given by: If the errors in the measurements of L and W are equal to A X , then we can write the normalized error in the computed volume as: depth resolution = sh(h-2r)/Nfd (5.7.5) A X = (h + f)s/(fN) An (5.7.6) V = TT(W/2)2L (5.7.7) A V / V = (1/L + 2/W) A X (5.7.8) 241 The Field of view common to both cameras has dimensions (sh/f) (vertical) by (max(sh/f-d,0)) (horizontal). For the camera setup in the examples that follow, typical values for the camera parameters were: d = 10 inches N = 256 pixels h = 100 inches f = 46 mm r = 8 inches s = 25 mm Thus we have that: p = 0.1 mm/pixel Pmin = 4 5 P i x e l s Ap = 9 pixels The mean depth resolution = 1.89 inches/pixel The field of view is 55.5 inches by 45.5 inches. If the only errors in localizing positions in the image were due to quantization of the pixels, then we have that Ax = 0.062 inches. If there is a log in the scene with dimensions L=40 inches and W= 10 inches, then the normalized error in its volume computation would be A V / V = 0.013 (or 1.3%). In practice An may be higher; on the order of 1 pixel. In this case A V / V would rise to 4.6%. 242 Doubling the resolution to N = 512 would extend the required disparity range to 18 pixels but would halve the error in the volume. Moving the cameras farther apart would also increase the range in disparity values, but would do so at the cost of less overlap in the stereo pair, which means that disparity values would be available over a smaller region. Because we are only using the disparity values to distinguish the logs from the background, we do not require high disparity resolution. Thus we can move the cameras closer together and get an increased image overlap and a decrease in the disparity range. As we have seen in the experiments in the previous section, and in the theoretical analyses of chapter 4, the matching algorithm performs better for smaller disparity ranges. Thus, the reduced matching range created by moving the cameras closer together should improve the extraction of the disparity maps. We can not move the cameras too close together, however, as then the logs and the background will not be sufficiently separated in disparity value to allow a noise free segmentation. Experiments have shown that a disparity range of between 4 and 15 pixels gives good results. In the cases described later, the disparity ranges were on the order of 5 pixels. In a practical system we would require a vertical (from the camera's point of view) field of view of about 10 metres, and a horizontal field of view of at least 2 metres. The height of the camera would be on the order of 10 metres above the ground on which the logs are lying. Given a sensor size of 25 mm the field of view requirements result in a needed focal length of 25 mm. Assuming a resolution of 512x512 (N=512) and taking r=0.25m, and requiring that the disparity range, Ap to be 10 pixels gives that the distance between the cameras be 4 metres. The horizontal field of view is then 6 metres. The mean depth resolution is 5 cm/pixel. Thus a log with a diameter of 25 cm would have a disparity range of 5 pixels. Finding the disparity map The matching algorithm described in chapter 2 can be used to obtain the disparity map from the stereo image pair of the scene containing the logs. Because of the limited disparity range produced by the log scene for a 256x256 sensor resolution, it was not necessary to use more than one level of resolution in the experiments that we performed. In 243 a practical situation the sensor resolution may be larger, in which case more than one resolution level may be required. However, even when the sensor resolution is increased to 512x512 the disparity range is not excessively large. Thus we can expect that the multi-resolution matching algorithm will perform well. For the matching process to work with only a few levels of resolution, the initial disparity estimate must be as accurate as possible. Fortunately, the nature of our application ensures that this can be the case. The minimum disparity will always, for parallel camera axes, occur at the background. Since the background is fixed in relation to the cameras, its disparity can be computed or measured beforehand. The size of the logs can be bounded by some expected value (say 75 cm), and so we can bound the maximum disparity value. Thus a reasonably good estimate to the average scene disparity can be computed. The matching process was performed on the stereo image pairs shown in figure 5.40 using only one level of resolution. The initial disparity estimate was measured, for the purposes of this experiment, directly by hand on the images. There was also a vertical misalignment which was measured by hand on the images and accounted for in the matching process. These steps can be done in a practical system as an initial calibration step. After this is done for a given camera setup a number of images can be processed with no human intervention. The zero crossings of the left and right images in the stereo pairs of figure 5.40 are shown in figure 5.41. We do not do the reconstruction process at the highest resoluion in this case because we wish to both save computation and not distort the disparity discontinuity at the boundary of the log. The segmentation process to be explained next does the disparity function reconstruction in such a way as to accurately retain the position of the log boundary. It is still necessary to perform the reconstruction step at the lower resolutions. Segmenting the disparity map to find the log Once the disparity map has been computed it can be thresholded to extract all the objects which have a disparity greater than a given value. If the deck that the logs are lying 244 FIGURE 5.40 Two stereo image pairs depicting single log scenes. on is parallel to the image planes of the cameras then the disparity of the deck will be constant and the (magnitude of the) disparity of the logs will always be greater than the disparity of the deck. Thus the objects that are left after the thresholding process should always correspond to logs. In practice, there is always a small amount of noise in the computed disparity function, which results in some isolated points which are not part of the log. These can be Filtered out by requiring that there be a given number of other points in a given sized neighbourhood about the point in question. If this condition is not satisfied then the point is removed. At this point in the processing we have a sparse set of points which loosely define the logs region, as seen in figure 5.42, which depicts the result of thresholding and filtering the disparity map obtained from the stereo pairs of figure 5.40. Note that the disparity map of figure 5.42b) contains many incorrect disparity values along the right hand side of the image. This is due to the fact that there was very little image detail in this region and the zero crossings that were found were mainly due to camera noise. However, when the segmentation process, described below, is performed, these noise points form a distinctly non-log shaped region and are thus recognized as a non-log object 245 FIGURE 5.41 The zero crossings of the stereo pairs shown in figure 5.40. and discarded. 246 FIGURE 5.42 The thresholded disparity maps of the log scenes. The remaining processing steps require that we know the boundary of our log region. Thus we must somehow obtain this boundary from the set of points produced by the 247 thresholding process. We accomplish this in the following manner. For each pixel in the thresholded image we search fifteen pixels to either side of the pixel in question along a line with an orientation of 9 0 ° . If there is at least one 'hi' pixel on both sides of the centre pixel then we set the centre pixel to 'hi'. We then repeat this process with search line orientations of 7 7 ° , 4 5 ° , 22° and 0 ° . Finally the entire process is repeated. It can be seen that, if instead of just the five angles above, we used a continuum of angles, and if we extended the search range to infinity instead of 15 pixels, and if the process was iterated indefinitely, the boundary of the resulting filled in region would be the convex hull12 of the original set of thresholded points. However, often the convex hull is not a desirable representation for the log region, since sharp protusions on the log's surface, such as may result from knots or branches, will cause the convex hull to widely diverge from the true boundary of the log as shown in figure 5.43. For this reason, as well as to save on computation, we limit the region of search in the filling in process to 15 pixels. Thus, a non-convex region can be obtained, which is still filled in and roughly conforms to the log shape. The result of applying this filling in process to our real log image is shown in figure 5.44. We can now determine the log boundary simply by noting which pixels are 'hi' and are adjacent to only one 'hi' pixel to the left and right The log boundaries obtained for the real log images are shown in figure 5.45. The actual log boundaries are also drawn in (dotted lines) for comparison purposes. In practice the image may contain more than one log or there may be other objects (real or hallucinated by the computer) which pass the disparity threshold. In this case there will be more than one connected region in the segmented output In this case we can examine the shapes of these regions to see whether or not they have the characteristics of logs (long and thin). If they do then we compute their volumes. If not, we discard them. The moment calculation described in the next section requires that a representation of each boundary be constructed, known as the chain code representation (Freeman, 1974). The algorithm for deriving this representation from the boundary image map will not be given 1 2 The convex hull of a set of points is the convex polygon of smallest area that encloses all of the points in the set 248 LOG BOUNDARY •CONVEX HULL FIGURE 5.43 The approximation of the log boundary by its convex hull. here. However we note that it is possible to distinguish between the different closed boundaries by assigning them unique labels. For example we could number the boundaries 1. 2, 3 and so on. We can then perform computations on each boundary separately. Such computations would include shape measurements to distinguish between logs and other objects which do not have the long and slender shape associated with logs. Other computations include the determination of the moments of the region inside the boundary, which as we will see in the next section, allow us to estimate the volume of a log. Volume Computation Once the log has been succesfully separated out from its background, its volume can be estimated. Over the course of time a number of simple formulae have been developed to estimate the volume of a log based on a small number of measurements. Since speed is often required in the scaling process, having to make only a small number of measurements is important There are two scales of volume that are commonly used; the board foot and the cunit (or cubic foot) (Watts, 1983, the Forestry Handbook for B.C.). A board foot is the 249 FIGURE 5.44 The filled in log region, volume of wood in a piece of dimensions 12"xl2"xl". There are board foot scaling rules and 250 1 V \\ \\ \ \ \\ \ v \\ [t V \ \ \ \ V \ V J V /? V. Sr FIGURE 5.45 The detected log boundary. cubic foot scaling rules. These rules include adjustments for such things as kerf loss (due to non-zero width sawmill blades) and butt flare. Due to these adjustments the relation between 251 board feet and cubic feet measures is not 12 board feet to the cubic foot but rather 5.63 (Dilworth, 1975). We list some of the scaling rules below (from Dilworth, 1975). Board Foot Scaling Rules Knoufs rule of thumb - V = (D 2-3D)L/20 where V = log volume, L = log length in feet, and D = mean diameter in inches. Girard and Bruce rule of thumb - V = 1.58D2-4D-8 (for 32 foot logs). Scribners Decimal C log rule (Scribners Dec. C.) - drop off the least significant digit of the volume (gives volume in tens of board feet). This is the official rule for the U.S. forest service. British Columbia rule - V = (D-3/2)2(.7854)(8L/132). This is the old B.C. standard (superceded by the Smalian rule given below). Sammi's rule of thumb - V = (D-l) 2 L/20 Brereton - V = 0.0654D2L. Used for measuring logs to be exported on ships. Doyle - V = (D-4)2L/16. Erratic measure, not widely used. Cuhic Foot Scaling Rules Rapraeger's rule - V = .005454154(D+L/16)2L. D = diameter at the small end. Assumes a taper of 1 inch every 8 feet of length. Sorenson's rule - V = .005454154(D + L/20)2L. Assumes a taper of 1 inch every 10 feet of length. Ffuber's rule - V = C L where C = area at centre of the log. This is unsuitable for decked, rafted or loaded logs due to the difficulty of measuring the radius of the centre. Smalian - V = (b+t)L/2 where b = area of the base of the log and t = area of the top of the log. This is the official rule in British Columbia. 252 Dilworth (1975) supplies a table of the relative accuracies of the cubic foot rules listed above. These were obtained from measurements on a series of Douglas fir logs, and the volume obtained by Newton's rule, which is V = (b + 4c + t)L/6 (where c is the area of a slice through the centre of the log), was used as a baseline for comparison purposes. The results were as follows: Smalian : -5.8% Huber : -2.9% Sorenson : -3.78% Rapraeger : -4.7% In our automated process, we could, in principle, determine the volume more accurately than any of these rules by summing up the incremental areas along the axis of the detected log. However, a given user of the system may want the volume measures to be comparable to the value given by the log scaling rule that they are used to (such as the official government rule). In this case we would use the appropriate log rule. What is required of our image analysis process in order to compute the volumes, given the segmentation of the log? If we were to use the accurate method, we would need to determine the axis of the log; a deceptively difficult task. The use of one of the various log rules requires that one or more of the following be known: The length L of the log along its major axis, the diameter of the log at the top, centre and bottom, and the mean log diameter. In the system described herein the following method is used to find the axis of the log. We first measure the moments, m 0 0 , m10, mou mlu m 2 0 and m 0 2 , where: m 0 0 = / f D dxdy (5.7.9) 253 m 1 0 = J*/Rx dxdy (5.7.10) m oi •= / / R y dxdy (5.7.11) mn = / TRxy dxdy (5.7.12) m 2 o = //R x 2 dxdy (5.7.13) m o 2 = J/RY 3 dxdy (5.7.14) Using a discrete version of Greens's theorem (Tang, 1982) the above moments can be computed with line integrals along the boundary of R (R is the segmented log region), thereby cutting the required computation by an order of magnitude. The reader is directed to (Tang, 1982) for the exact formulation of the moment calculations. Once these moments have been computed the centroid and axes of the ellipsoid that also gives rise to these moment values can be determined. The centroid of this ellipsoid is given by: (xc,yc) = (m 1 0/m 0o,moi/moo) (5.7.15) The axes of the log will pass through this point The axis vectors are the eigenvectors of the following matrix: C = C20 c„ C n Co 2 (5.7.16) where: 254 C n = rrin/m, '00 (5.7.17) C 2 0 = m 2 0 /m ( [00 (5.7.18) C 0 2 = ni 0 2 /m ( '00 (5.7.19) The eigenvalues of this matrix give the lengths of the axes corresponding to the perpindicular eigenvector. Once the long axis has been determined we can search along this axis for the boundary points which cross this axis. The distance between these points gives the estimated length of the log. Correspondingly, we can find the width of the log by finding the intersection of the short axis with the boundary. These width and length estimates can be used in a number of the above scaling rules. We should be able to obtain a more accurate volume calculation by integrating the incremental areas measured along the long axis. We can also obtain the radius of the middle of the log, which is needed in some of the scaling rules, by measuring the width of the log along the short axis (through the centroid). These measurements are depicted in figure 5.46. Note that the above procedure will not work very well for objects that are warped or curved as the central axis cannot be approximated as a straight line as we have assumed. Also the centroid may, in such a case, lie outside the log region. The algorithm described above requires that the centroid be inside the boundary. More complex shape analysis techniques be employed for such cases. We have not examined this problem any closer than this. However, the automated system would in all likelihood be used to measure high value logs which are unlikely to exhibit warpage and other defects. We have applied the above procedure to the real log images pictured earlier. The actual volume of the visible portion of the logs were measured by hand (stick scaling) using Newton's formula and are compared below with the volume computed using the video method 255 FIGURE 5.46 Fitting an ellipsoid to the log region. with a number of scaling rules, including the integration method. The first number is the volume for the first log (figure 5.40a) and the second number for the log shown in figure 5.40b. Comparison of Volume Estimates (in cubic feet) 1. Hand Measured Volume = 1.480 ; 0.7200 2. Integration along Axis = 1.749 ; 0.564 3. Length*7T "(Width/2)2 = 1.749 ; 0.637 4. Smalian Rule Volume = 1.538 ; 0.617 5. Huber's Rule Volume = 1.678 ; 0.698 6. Sorenson's Rule Volume = 1.454 ; 0.617 256 7. Rapraeger's Rule Volume = 1.468 ; 0.619 8. Newton's Rule Volume = 1.631 ; 0.671 9. Frustum of Cone Volume = 1.536 ; 0.617 10. Average of the estimates = 1.590 ; 0.630 It is seen that the correspondence of the various estimates to the hand measured values vary somewhat. The differences range from about 2% to 20%. The experiments described in this thesis were done chiefly to show that stereo disparity can be used to segment out the log from its background, and were not designed to obtain statistics as to the accuracy of the system over a large sample of logs. More experimentation is required in this regard. Also better techniques for obtaining the log region from the thresholded disparity map could be devised, which would increase the accuracy. However, it is evident that stereo disparity can indeed be used to get an estimate (around 10% accuracy with the methods given here). The algorithm as it stands now could be implemented in off the shelf image processing hardware or with special purpose hardware. This has not been done but is the logical next step. Such a hardware implementation would enable a test of the algorithm in the industrial setting. This test would presumably bring to light some unthought of practical problems which need solving, and can also be used to provide a statistical estimate of the obtainable volume accuracy obtainable (after performing a large number of tests on different logs). Following this prototype phase, a production model could be developed, if this was indicated by the technical and economic analysis of the prototype performance. 257 5.8 - Summary of Chapter 5 - An efficient method for implementing the zero crossing pyramid construction was presented. - Experiments were performed to test the performance of the matching algorithms on random noise stereograms with non-constant disparity functions. - The RMS disparity errors were seen to decrease as the disparity functions became smoother and smoother. - The disparity errors were seen to be concentrated in small patches. - The warping or transformation reconstruction method worked better than either the 9x9 averaging or relaxation reconstruction methods, with regard to decreased RMS disparity error. - The RMS disparity errors were seen to increase with the disparity range of the disparity function. This can be explained as being due to the reconstruction error. - The filtering effect predicted in chapter 4.3 was observed experimentally in the case of one dimensional random image pairs. Its effect on the matching of two dimensional image pairs could not be ascertained. - It was shown experimentally that the increase in the RMS disparity error when gaussian white noise was added to the input images was linearly related to the standard deviation of the added noise. This was in agreement with the predictions of chapter 4.2. - The multi-resolution DispScan algorithm was seen to be slightly more accurate than the simplified matching algorithm. This improvement, however, was at the expense of increased computation. 258 - The simplified matching algorithm was successfully applied to the log scaling problem. - Moment based methods were developed to estimate the long and short axes of the logs, from which measurements can be made to be used in volume estimate calculations. - The difference in the volume estimate obtained from these calculations and the hand measured volume was on the order of 2 to 20%. The errors are due in part to the boundary finding methods used and in part to the pixel quantization. Errors also arise from incorrect disparity values near the true boundary, used. Better boundary finding methods may result in more accurate estimates. 259 VI - S C A L E SPACE FEATURE M A T C H I N G 6.1 - Introduction This chapter discusses the possibility of using scale space representations in the stereo matching process. The topics covered in this chapter are illustrated in figure 6.1. We have shown that one of the main problems that the discrete multi-resolution feature matching algorithms suffer from is the error induced by the reconstruction process. This error is largest at low resolutions and can cause the multi-resolution matching algorithms to become unstable. It would be preferable, then, to have an algorithm in which the reconstruction process would not be required, at least at low resolutions. We have also seen that the accuracy of the disparity measurements decrease as the resolution decreases, at least for 'tilted' surfaces. Thus it would be preferable to have a matching algorithm that made disparity measurements at high resolutions only. Let us summarize these two conditions on the matching algorithm as follows: C l - The disparity measurements are to be interpolated only at the highest resolution level. C2 - Disparity measurements are to be made (or used) only at the highest resolution level. From the above, we can say that a desirable feature matching algorithm would be one that in some way refrained from making any disparity measurements, and interpolating these measurements, until the highest possible resolution level. However, as we have seen earlier, the ambiguity between features increases as the resolution increases, making the matching problem very complex. In order to reduce the amount of feature ambiguity we must reduce the resolution. However, reducing the resolution also reduces the accuracy of the disparity measurement Clearly, some form of multi-resolution matching is necessary. The question that remains is - is there any multi-resolution feature matching algorithm that can satisfy conditions C l and C2 defined above? 260 Introduction CH6.» Matching without k Scale space disparity measurement matching or reconstruction ' 1 CM 6.2. CH 6.1 Problems with scale space matching Biological implications CH C.s Mokhtarian's method JL CH6.I A new matching method CM 6.7 Experiments (chapter 5.4) CM 64 Matching 2D scale maps CM 6.3 (*) Indicates New Material FIGURE 6.1 The topics covered in this The answer to this question lies in the Before we describe how the use of the scale chapter. scale space representation of the stereo images, space representation allows this question to be 261 answered in the affirmative, we should discuss again the reasons why discrete multi-resolution matching algorithms cannot satisy conditions C l and C2. ln order for the matching process to take place at a (discrete) resolution level, with a minimum of feature ambiguity, we must have a good enough estimate of the disparity so that the size of the search region for a match is sufficiently small to ensure that the percentage of false matches is suitably small. This means that we have some a priori knowledge of the disparity function which can only come from the lower resolution matching process. This means that the algorithm must have measured the disparity at some lower resolution. This means that condition C2 cannot be met Furthermore, since the low resolution disparity measurements are sparser than the features to be matched at the higher resolutions, there must be an interpolation or reconstruction operation to provide the higher density of disparity information required by the higher resolution matching processes. Therefore, it is clear that condition C l can not be met by these types of algorithms. It can be seen that the fundamental problem that these discrete matching methods have is that they perform a given matching operation at one resolution at a given time. That is, although they may use information from other resolutions to guide the matching, the matching itself is between features at a single resolution level only. This is the key point. If one allows the matching algorithm to match features across resolution levels, then conditions C l and C2 can indeed be satisfied. This brings us to the idea of scale space matching. The scale map of an image provides an integrated multi-resolution representation of the image. There is no built-in resolution chauvinism to cause one to believe that matching should be done at one resolution at a time. Look again at the scale maps of the simulated and real stereo pairs that were shown in chapter 4 and 5. Now try this simple experiment Get two identical pieces of cardboard or paper with long thin slits cut in them. Place these over the scale maps of a stereo pair such that the slits lie over a line of constant resolution (a, and the same in both). Now try and do the matching (assuming that you have an error prone estimate of the actual matches). It is pretty difficult Now remove the pieces of cardboard and try to match the scale map contours. If you are like most humans this will be a much easier task than 262 the previous one. Presumably this will be the case for computers as well. If one examines the way in which humans perform the task of matching the contours of the scale maps, they find that it is the global shape of the contours and the relationships between the contours that are used to perform the matching. This is due to the fact that, though segments of the scale map contours over a narrow range of a values may be quite ambiguous, contour segments which cover a large range of a values are much less ambiguous. Since, in the scale map, the contours are continuous in x and a, knowing the correspondence between contours at low resolutions means that one knows the correspondence at high resolutions as well, and vice-versa. Thus, one needs only measure the disparity at one resolution (i.e. the highest resolution, for the smallest error) to know the disparity values at all resolutions. Thus condition C 2 is satisfied. Furthermore interpolation or reconstruction of the disparity function is required only at the highest resolution (so that a suitably dense set of disparity values are available for the next processing step). Thus condition CI is satisfied. From this we conlude that, given that the process of scale map matching is feasible, such a process would be preferred to the discrete multi-resolution algorithms described in the earlier part of this thesis. In the following section we examine some methods that have been developed for matching scale space image representations. 263 6.2 - Matching of Scale Space Image Representations As the concept of scale space image representations is fairly new it stands to reason that the application of these representations to computational vision processes is only getting started. There has, to the authors knowledge, been only one system described in the literature that explicitly matches scale space function representations. This system, described in the thesis of F. Mokhtarian (Mokhtarian, 1984) is used to match the scale space representations of the curvature function of two planar curves (see Mackworth and Mokhtarian, 1984). This system was designed for a limited domain, that is one in which the scale maps to be matched were scaled and translated versions of each other. The reasoning behind this matching algorithm is as follows. The scale map can be considered as a tree structure. Witkin, (1983) was. the first to point this out The root of this tree is an imaginary contour which encloses all of the real scale map contours. Each of the real contours in the scale map encloses zero or more other contours, which correspond to the children of that contour. Each of these children enclose their own children and so on. Thus the scale map can be thought of as a multi-level tree. Each of the nodes in the tree correspond to a connected scale space contour, and have associated with them a left and right branch and a peak (unless the contour is incomplete, in which case it passes either the left, right or top boundaries of the scale map). Since the scale maps are assumed to be related by only a scaling and a translation, the transformation between corresponding contours in a pair of scale maps has only two parameters which need be determined. Mackworth and Mokhtarian's matching algorithm determines the minimum cost tree match, where the cost of a contour match is defined as the average distance between contours once the smaller of the two has been transformed (i.e. scaled and translated). Incomplete contours are not matched. The algorithm used to determine the lowest cost node matches is an adaption of the Uniform Cost Algorithm (Nilsson, 1971). Full details of the matching algorithm can be found in Mokhtarian's thesis (Mokhtarian, 1984). 264 We have developed a somewhat simpler scale map matching algorithm which operates in the same domain as Mackworth and Mokhtarian's algorithm. As with Mackworth and Mokhtarian's algorithm, we create a tree from the scale map whose nodes correspond to scale map contours and the children of these nodes are those scale map contours enclosed by the node contour. However, instead of performing a search to find the lowest cost (as defined above) set of node matches, our algorithm merely tries to maximize the correlation of the polarity of the scale map contours along the line of minimum a. The polarity of a scale map contour is defined, in this instance, to be the sign of the V 2 G filtered signal outside (that is, above and to the sides) the contour. The ordered list of contour polarities along the minimum o line provides a signature of the scale map, which can be used to match the scale maps. The autocorrelation of this signature almost always, in practice, exhibits a distinct peak at some shift. Thus, to match two scale maps we can measure the correlation of the ordered polarity list along the minimum a line. The peak of this correlation function will give the horizontal shift between corresponding scale map contours at minimum o. Note that, for non-zero disparity gradients, there will be scale map contours on the edge regions of one of the scale maps which have no corresponding contour in the other, due to the difference in the effective field of view in the two images. Thus, in this case, the horizontal shift predicted by the correlation will not be zero, but will be equal to the number of these 'new' boundary contours on the left side of the scale map. Also, if there is a non-zero disparity gradient, new contours may appear at the bottom of one of the scale maps and will not correspond with any of the contours in the other scale map. In this case the correlation peak will be diminished from its maximum possible value and correct matches cannot be made for all contours. The larger the disparity is, the greater the number of new contours. To handle this, our algorithm does the following. First we make a list consisting of all of the possible matches for each contour in the left hand scale map. We order the contours in the left and right scale maps according to height If we assume that the left image is the expanded one then we know that a given contour in the left scale map cannot match to any contour in the right scale map that is larger. Also we know that matching contours must have the same polarity. Furthermore, since the disparity function is linear, we know that the positional order 265 of contours in the right hand scale map must be the same as the order of the scale map contours in the left map that they match to. That is, there are no position reversals.13 We can determine the proper correspondences by applying these three constraints to weed out the list of possible matches. This algorithm has been tested on a number of one dimensional stereo scale map pairs of random Gaussian noise, with linear disparity functions. The matching algorithm matched perfectly on all of these cases. These cases were described in chapter 5.4 where they were used to test for the presence of the Filtering effect described in chapter 4.3. There are in the literature descriptions of systems which perform matching of multi-resolution image representations that are similar to scale map representations. Most notable of these is the representation of Crowley (Crowley, 1984, Crowley and Parker, 1984, and Crowley and Stern, 1984). This representation is seen to be a coarsely quantized (in the a value) scale map, where the features are the peaks (what we have called extrema) and ridges in the two dimensional V 2 G filtered image.14 This representation is also in the form of a tree, where the peaks and ridges are the nodes and they are linked both at a given resolution level as well as across resolution levels. In addition the local maxima of the three dimensional filter output (analogous to a quantized 2D scale space transform) are also used as features, and hence become nodes in the representations. It is clear that this representation is a type of scale space representation, although the tree structure is different than the one we have described earlier. The application that Crowley had in mind when he developed this representation was in the description and matching of shapes. It is evident that his representation can also be used to perform the stereo correspondence matching. Indeed, a colleague of his (Lowrie, 1984) did, in fact, use this representation to perform matching of stereo pairs. Just as a note in passing to complete this section, Yuille and Poggio, in one of their recent technical reports (Yuille and Poggio, 1983a) commented that they were looking into 1 3 Such reversals can occur with nonlinear disparities, such as are obtained for long thin objects such as wires or poles. "Actually, for computational reasons, Crowley uses the D O G or Difference Of Gaussians operator which can be shown to be a good approximation to the V 2 G operator (see e.g. Marr and Hildreth, 1980). 266 using scale maps in matching stereo pairs. It is clear that the scale map concept is turning out to be a useful practical tool as well as a theoretical tool. 267 6.3 - Matching of Two Dimensional Scale Maps The above mentioned methods are all one dimensional processes in that matching is done only along a line in the image plane, and the scale maps that are matched are only one dimensional. There are two drawbacks to this one-dimensional matching process. The first drawback is that such methods are unable to match scale maps derived from scenes having an appreciable vertical disparity component. Granted, such a condition may be rare, depending on the camera geometries, and if vertical disparities are present the images may be rectifiable, to produce horizontal epipolar lines. The second drawback occurs when the scale maps are computed via two dimensional filtering of the image, as is the case in a number of stereo systems (e.g. Grimson, 1981a). It can be shown that a one dimensional slice of a two dimensional scale map (as defined by Yuille and Poggio, 1984a (equation 4.3.57 of this thesis)) is, in general, not well-behaved where well-behavedness here refers to the property of scale maps of having contours that never contain points of local minima (or as Yuille and Poggio call them, upside down mountains and volcanoes). The proof of this statement is given in the appendix along with the conditions under which a slice of a two dimensional scale map is well-behaved. A scale map that is not well-behaved may possibly contain contours that never reach the minimum a line. Clearly, the matching algorithms described above will encounter difficulty in such a case. Just to drive home the non-well-behavedness of these one dimensional slices of scale maps consider the maps shown in figure 6.2. These represent three adjacent (i.e. slightly different y coordinates) slices of a two dimensional scale map for a Gaussian white noise image function. 1 5 The presence of scale map contours with local minima are evident These local minima are marked by small open circles on the maps. There are three ways in which we can remedy this problem. The first way is suggested by the theory developed in chapter 4.3 concerning the skew map representation of a two dimensional function. It was shown then that the skew map (which has only one spatial dimension) has the same properties as a one dimensional scale map. Since Yuille and Poggio 15These examples of one dimensional scale map slices were computed and plotted by Hans Wasmeier of the Electrical Engineering Department, UBC as part of a course project 268 FIGURE 6.2 Three adjacent one dimensional slices of a two dimensional scale map exhibiting non-well-behavedness. (1983a) have shown that all one dimensional scale maps are well-behaved, then so is the skew map. Thus we conclude that we can use the skew map representation as our input to the matching algorithms. There is a problem with this approach, however. The way in which the skew map is computed is to take a two-dimensional slice of a four-dimensional function (i.e. that defined by equation 4.3.56). Thus, the production of a skew map involves an order of magnitude increase in the amount of computation over the production of a scale map (which involves computation of a three dimensional function). This would rule out such an 269 approach for use in all practical implementations. A second remedy to the problem of matching two dimensional image representations involves actually performing the matching in two dimensions. Yuille and Poggio have shown that the full three dimensional form of the two dimensional scale map is always well-behaved (even though slices of it are not). Although the two dimensional scale map contours may contain saddle points, which look like local minima when sliced, there is always a path from such a point to the minimum a line. Thus contours can be matched and tracked to their maximum possible resolution. This solution, however, suffers from the same problem as the skew map method. Matching a two dimensional scale map involves an order of magnitude increase over the matching of a two dimensional scale map. The increase in computation will probably not be as great as for the skew map method, however, since the tree matching process involves fewer computations than does the spatial filtering process. Thus this method is preferable to the skew map method. The third solution, and probably the best, is to perform one dimensional filtering only. That is, compute separate one dimensional scale maps along each epipolar line in the images and match these. The filtering and matching computational complexity will both be an order of magnitude less than the two dimensional scale map matching method. The only possible drawback to this solution is the effect of one dimensional filtering on a two dimensional image function. As Grimson (1981a) points out, one dimensional, or directional, filtering tends to smear out edges in a direction perpendicular to the line of filtering. We can summarize by stating that, if we want to match scale based representations of two dimensional image functions, we must do one of the following: 1 - If the epipolar lines of the imaging system are not known then we must perform a full two dimensional matching of the two dimensional scale maps. If the epipolar lines of the imaging system are known then: 270 2 - We can perform a full two dimensional matching of the two dimensional scale maps anyway. 3 - We can compute the skew maps at each epipolar line of the two images and match these. 4 - We can perform one dimensional filtering along the epipolar lines in order to compute the one dimensional scale maps along these lines, and match these maps. From the point of view of minimizing computational complexity method 4 is the best. However, other considerations, such as minimization of feature distortion, or lack of information about the imaging geometry may indicate that one of the other methods be used. 271 6.4 - Problems With Scale Space Feature Matching As with all computational vision processes, there are some problems, practical and theoretical, with implementing stereo algorithms based on scale space feature matching. First and foremost of these problems is the immense computational load that is imposed in the computation of the scale maps themselves. For each value of a used in the construction of the scale map, a complete convolution of the image with the appropriate V 2 G filter is required. In order to maintain coherence of the scale map contours the quantization between the o values must be fairly small. Looking at the examples shown in chapters 4 and 5 and later in this chapter, one can see that this quantization should be at most on the order of 1.1:1 (i.e. a^/aj c_-^0 and that /3>0 for this mapping. We now wish to map a sample point in x space to the point T4 of the hexagonal lattice in J space, to create the partiton regions P 2 , D 2 that share a common Link with P i and D i . Imagine that r4 was not constrained to lie on a vertex of the hexagonal sampling lattice, but could lie anywhere in J space. It can be seen that if I\ was to lie anywhere on the line through Tj and T2 that /? would be zero (as the points riJ,2 and T4 are collinear). Furthermore it can be seen that only along this line can /3 be zero. Hence if T4 lies on the same side of the line through T] and T2 as does T3 then /3 is positive, and if it lies on the opposite side then 0 is negative. Now since we have the constraint that T4 must lie on the hexagonal sampling lattice, it can be seen that 0 for the region formed by and T 4 is negative. Now for "3 FIGURE 1.1 A portion of the partition P-^ Q in x space. r4 FIGURE 1.2 A portion of the partition D^Q in \ space. the Jacobian of 7 to have the same sign in P 2 as in P i , a for P 2 must also be negative. Hence xt must lie on the side of the line through it, and x2 opposite to x3. In other words the region formed formed by points Xi, x 2 , and x 3 must not overlap the region formed by points x,, x 2 , and x 4 . That is, P x must be a tessellation for the mapping function 7 to be one-to-one and continuous. Q.ED. Proof of THEOREM 3.7 : The proof of this theorem can be found in (Petersen and Middleton, 1964). Proof of THEOREM 3.8 : The proof of this theorem follows from the proof of theorem 3.7. 312 APPENDIX II - A METRICALLY ORDERED SEARCH ALGORITHM This appendix describes the algorithm that is used to perform nearest neighbour searches in the transformation reconstruction algorithm described in chapter 3. This algorithm is a modification of the one in (Hall, 1982) and performs a diamond search over a rectangular region (whereas Hall's algorithm performed a rectangular search). This algorithm can be extended to search over arbitrary convex regions by modifying the test for reaching the boundaries. The improvement over the algorithm given in the paper above is that the search is metrically ordered. That is, no point is searched before a point that is closer (using a Euclidean distance metric) to the start point The search is performed along the path shown in figure 2.1. Note how the search path changes in response to reaching a boundary of the search region. Most of the complexity in the algorithm is due to the handling of these boundary conditions. The edges of the search path are divided into five different types. These types, numbered 1,2,3,4 and 5, are defined as follows: Edge Type 1: Edges along the lower right side of the diamond. Edge Type 2: Edges along the upper right side of the diamond. Edge Type 3: Edges along the upper left side of the diamond. Edge Type 4: Edges along the lower left side of the diamond. Edge Type 5: The short edge causing an offset in the search path between edge type 4 and 1. Examples of these edge types are shown in figure 2.1. The edges, along which the search takes place, can be in one of five modes. These modes describe the relation of the edge with the boundaries of the search region. These five modes are summarized as follows: Mode 1 = 1: Mode I represents the case wherein an edge lies completely within the search region. That is, no part of the edge lies in the boundary. 313 FIGURE 2.1 The diamond search path and the five edge types. Mode 2 = IO : Mode 10 represents the case wherein the initial portion of the edge lies within the search region and the rest of the edge lies beyond the boundary. Mode 3 = 01 : Mode 01 represents the case wherein the initial portion of the edge lies outside the boundary and the rest of the edge lies within the search region. Mode 4 = 010 : Mode OIO represents the case wherein the initial part of the edge lies outside the boundary, the middle part lies within the search region, and the final part of the edge lies outside the search region. Mode 5 = B : Mode B represents the case wherein the entire edge lies outside of the search region. Examples of edges of each of these modes are given in figure 2.2. 314 B O U N D A R Y M O P E 1 MODE Z \ MODE 3 FIGURE 2.2 Examples of edges in the five edge modes. The operation of the algorithm is fairly simple. The search merely cycles between edge 1 to edge 2 to edge 3 to edge 4 to edge 5 to edge 1 etc. After each cycle the length of the edges is increased by one pixel. When the search along any one of the edges encounters a boundary, the mode of that edge, and of the next edge is altered to indicate this fact. The search point (X,Y) and the endpoint of the previous edge (E(i-1)) are also altered when the search along edge i reaches a boundary. The exception is edge type 5 whose parameters (length, mode) are never changed. In this way the search pattern shown in figure 2.1 is obtained. A pseudo-high-level language description of this algorithm is given below, begin ** X,Y is the starting point ** E l < - X + d; •* d is the search step size ** E2 <- Y - d ; ** Initialize the edge directions ** E3 <- X - d ; E4 < - Y + d; f o r i <- 1 u n t i l 4 d o m(i) <- I ** Initialize the edge modes to I X = X + 1; TEST(X,Y) SEARCH E D G E 2 SEARCH E D G E 3 ** Search along the edge directions ** SEARCH E D G E 4 SEARCH E D G E 5 SEARCH E D G E 1 g o t o L end p r o c e d u r e D I A G O N A L SEGMENT SEARCH(dx,dy,Xl,X2,Yl) b e g i n N < - floor(|(X2-Xl)/dx|); i f N = 0 t h e n r e t u r n f o r i < - 1 u n t i l N d o b e g i n XI < - XI + dx; ** Increment the search position ** Y l < - Y l + dy; TEST(Xl.Y) ** Test for the quantity being searched for ** e n d end p r o c e d u r e SEARCH E D G E 1 b e g i n dx = 1; ** Initialize the direction of search ** dy = -1 ; Step 1: [Mode I edge generation] if m(l) = I then begin if E l > XRIGHT then go to HITBOUND1; D I A G O N A L SEGMENT SEARCH(dx,dy,X,El,Y); E4 < - E4 + d; return end HITBOUND1: begin if E l > 2*XRIGHT - XSTART then go to ENTERB; m(l) < - IO; Let m(2) be adjusted to reflect that the initial segment of edge 2) is now outside of the search region.; D I A G O N A L SEGMENT SEARCH(dx,dy,X,XRIGHT,Y); E4 < - E4 + d; return end Step 2: [Mode IO edge generation] If m(l) = IO then begin if E l > 2*XRIGHT - XSTART then go to ENTERB; D I A G O N A L SEGMENT SEARCH(dx,dy,X,XRIGHT,Y); E4 < - E4 + d; X < - E l ; Y <- YSTART; return end Step 3: [Mode 01 edge generation] If m(l) = 01 then begin if E l > 2*XRIGHT - XSTART then go to ENTERB; Y <- YTOP X <- E l - Y + YSTART; if E l > XRIGHT then go to HITB0UND2; D I A G O N A L SEGMENT SEARCH(dx,dy,X,El,Y); E4 <- E4 + d; return end HITBOUND2: begin m(l) < - 010; Let m(2) be adjusted to reflect that the initial segment of edge 2) is now outside of the search region.; D I A G O N A L SEGMENT SEARCH(dx,dy,X,XRIGHT,Y); X < - E l ; Y <- YSTART; E4 < - E4 + d; return end Step 4: [Mode 010 edge generation] if m(l) = 010 then begin if E l > 2*XRIGHT - XSTART then go to ENTERB; Y <- YTOP; X <- E l - Y + YSTART; D I A G O N A L SEGMENT SEARCH(dx,dy,X,XRIGHT,Y); X <- E l ; Y <- YSTART; E4 <- E4 + d; return end Step 5: [Mode B generation] if m(l) = B then begin E4 <- E4 + d; X <- E l ; Y <- YSTART; return end Step 6: [Outer right boundary first reached] ENTERB: begin m(l) < - B; Let m(2) be adjusted to reflect that the initial segment of edge 2) is now outside of the search region.; if all m(i) = B then return; else E4 <- E4 + d; 319 X <- E l ; Y <- YSTART; return end end procedure SEARCH E D G E 5 begin X < - X + 1; if X > XRIGHT then m(l) = B; else TEST(X,Y) return end The procedures SEARCH E D G E 2, 3, and 4 are not written down here, as a space saving measure. The form of these procedures is the same as for procedure SEARCH E D G E 1. Details such as the sign of dx and dy, and the detection and handling of the boundary condition are different. 320 APPENDIX m - DERIVATION OF COVARIANCE FUNCTIONS In this appendix we derive the expressions for the covariance functions and their derivatives that are required in the body of the report. One dimensional case. We will first determine the autocovariance of the slice along o=o0 of the one dimensional scale space transform. This slice is obtained by filtering a one dimensional white Gaussian signal, having power spectral density of, with a filter having the following frequency response: H(w) = - a o 3 w 2 e " " 2 0 ( , V 2 (1) The power spectrum of the filtered signal is given by a-2|H(w)|2 giving: S(w) = ofo^t'^ 0" 2 (2) The autocovariance of the filtered function is simply the Fourier transform of S(a>) so that we get: ) = \oWe~u2°2e2/2\2 S,(w) (5) P 2(u) = | a 2 w 2 e _ c j 2 a 2 e V 2 | 2 S2(CJ) (6) —cj2cr2e2/2 where O 2CJ 2C is the Fourier transform of the filter function -x 2 /2e 2 a 2 d2/dx2 o/e\/2ire . The autocovariance functions ^i(r) and ^2(r) are the Fourier transforms of PI(CJ) and P2(w) respectively. In order to determine these functions we must find expressions for Si and S 2. Let us define the following two dimensional functions: gl(x,y) = f(x,y)»a5(x)V27r e " y / 2 a (7) g2(x,y) = f(x,y)»a5(x)/ l /27r 9 2/3y 2 e " y V 2 a 2 (8) where (•) indicates the convolution operator. It can be seen that: g.(x,y0) = h,(x) (9) and g2(x,y0) = h2(x) (10) 322 That is, h, and h 2 are slices of the two dimensional functions g, and g2. We can use the Mersereau and Oppenheim (1974) slice projection theorem to find the Fourier transforms of hi and hu from which we can then get Si and S2. Using the slice projection theorem we can write: /Ih,(x)} = o.SZao2t~U22°2n 4o2 = o^Vl-n (11) and flh2(x)} = - a T " a ^ j ' e " " ^ 7 2 dw2 = a J IK la (12) Hence we have that: S,(o>) = Oj2o227r (13) and S2(w) = oflir/o2 (14) Therefore we obtain: P,(w) = ofl-noW e ~ " 2 e , a J (15) and P2(CJ) = ofltioW e _ c j 2 e J a 2 (16) Evaluating the inverse Fourier transforms and setting e -1 yields: 323 \MT) = .25a i 2o l/7r[.25(T/a) ,-3(r/a) 2 + 3 ] e " - 2 5 ( T / ( 7 ) 2 (17) -4 and \p2{r) = o \|/,(r). As before it can be shown that the value of the derivatives of these functions at zero are given by: xjjSn\0) = ( - l ) n / 2 i / * ( n + 4)!/[(n/2 + 2)!2n + 4 a n ~ 1 ] (18) for n even and are zero for n odd. The cross-covariance function \jju of H, and H 2 can be shown to be the geometric mean of ^ and \p2. Thus, we have: iMO = .25a;V7r/a[.25(T/a) 4-3(T/a) 2 + 3 ] e " - 2 5 ( T / a ) 2 (19) and ^12(n)(0) = (-l) n /V7r(n + 4)!/t(n/2 + 2)!2n + 4 a n + 1 ] (20) for n even and are equal to zero for n odd. One dimensional slice of a two dimensional function Consider the 2D filter with the following frequency response: H(CJ,^2) = a*(wJ1+a)'2)e"(w2l+6,J2)aV2 (21) By the slice-projection theorem (Mersereau and Oppenheim, 1974) a slice p(x) of h(x,y) has the following Fourier . transform: 324 P(u) - o2)c-V+»2^/2^2 (22) = aj/27r[u2a2 + l ] e " u i a V 2 (23) This result, apart from a scale factor of 27ro2, was derived by Grimson (1981b). The power spectrum of p(x) can now be determined and is given by: P2(u) = 27ra2[l + 2o 2u 2 + o - V ] e " u 2 a 2 (24) Taking the inverse Fourier transform of the power spectrum yields the covariance function of the signal obtained by passing white noise, with unit variance, through the filter. Z = d x [ l + atan(20 + eI)]/[tan(-e2) + tan(2/3+01)] (31) Hence, substituting in the expressions for the tangents we have, after some algebra: Z = f 2(l + a : )d x / [ fD + a(f2 + x1x2)] (32) where D=X]-x2 is the disparity. 327 APPENDIX V - T H E CONDITIONS FOR W E L L - B E H A V E D SLICES OF 2D SCALE MAPS In this appendix we derive the conditions on a two dimensional function which ensure that a given one dimensional slice of its scale map is itself a well behaved (in the sense of Yuille and Poggio, 1983a) scale map. A well behaved scale map comes from a • scale space transform of the form: F(x,a) = USZj(u)e~(X~U)2/2o2du} (33) where Ii} is some linear differential operator in x. Now, let F (x,a) be a slice of the 2-D scale space transform defined by equation (4.3.57) (a skew factor of 1) as follows: F*(x.o) = V 2 ; ; " 0 0 f * ( u , v)(aV27r ) e " [ ( x " u ) 2 + ( y _ v ) 2 ; i / 2 a 2 d u d v | y = y ( ) (34) We can rewrite this as: F*(x,o) = Og(u,x) + h ( u ) ] e ( x - u ) V 2 a 2 d u (35) where g(u,x) = /roof*(u,v)/(27r) [(x-u)Vo 2 -l] e ~ ( y , r v ) 2 / 2 a 2 d v (36) and h(u) = /r.f.(u.v)/(2ir) [ ( y „ - v ) V a 2 - l ] e " ( y ° " v ) 2 / 2 a 2 d v (37) We can rewrite the scale space transform of equation (33) as follows: 328 F(x.cr) = j " r o f ( u ) U e ~ ( x " u ) V 2 ° 2 } d u (38) = /VCuMx - i O e - ^ - ^ ^ d u (39) where p(x-u) is a polynomial in (x-u). Thus, for F (x,o) to be well to be well behaved we require that: g(u,x) + h(u) = f(u)p(x-u) (40) where f(u) is any function of u and p(x-u) is a polynomial in (x-u). This implies that: g(u,x) = [p(x-u)-l]h(u) = p*(x-u)h(u) (41) * * where p is also a polynomial. From equations (4) and (5) we can see that p can only be the following: p*(x-u) = k[(x-u)2/o2-l] (42) where k is some constant This condition means that: / r o / ( u , v ) e - ( y ° - v ) 2 / 2 a 2 d v = k ; " o o f * ( u , v ) [ ( y c - v ) 2 / a 2 - l ] e ( y ° - v ) 2 / 2 a 2 d v (43) This equation can only be satisfied if f(u,v) is separable, that is if: f(u,v) = f,(u)f2(v) (44) Thus we conclude that a one dimensional slice of a two dimensional scale map is itself a well behaved scale map if and only if the two dimensional function which produced the two dimensional scale map is separable with respect to the axis along which the slice is made. 330 APPENDIX VI - DERIVATION OF THE DIFFREQUENCY QUANTIZATION ERROR The disparity gradient 0 i is a function of the scales of corresponding scale map contours, as detailed in chapter 7. If the scale in the left image is 0 , and that in the right image is o 2 then the disparity gradient is given by: 0, = (a./o,-].) (45) Let /3 = , l + /3j = o2/ol. The values of the scales in our experiments are not determined exacdy but are quantized. This means that the computed value of (3 is also quantized, and is not exact In this appendix we derive the probability density function of the error in the disparity gradient produced by this quantization, and also compute the variance of this quantization error. The quantization error is given by: e = (a2 + e2)/(ai + e,) - o2/Oi (46) = ( e i O ^ d O i V t o i C a i + ei)] (47) where ei and e2 are the quantization errors of the measured values of Oi and o 2 . We assume that es and e2 are uniformly distributed and independent of each other. Thus their joint probability density can be written as: P P P (ei,e2) = l/[(a1-bi)(a2-b2)] for aibi and a 2b 2 (48) C l C 2 and is zero otherwise. The values of ai, a2, bi, and b2 are functions of the true values of Ox and o2. These functions are as follows, for the logarithmic quantization used in our experiments: 331 = 2k N>-2k^N' + 1 / 2 ) (49) b = 2 k N , _ 2 k(N,- l /2 ) (50) ^ _ 2 k N 2 _ 2 k ( N , + l/2) (51) b _ 2 k N 2 _ 2 k ( N : - l / 2 ) (52) where k = 6.65/255, 2 + 2 k N l is the quantized value of a, and 2 + 2 k N z is the quantized value of a 2 for Ni and N 2 integers. Using the laws of transformation of variables for probability density functions we obtain: Pe(e) = J"!'CDPe]e2(ei,e2(e,e1))|a(e1,e2)/9(e1,e)|de1 (53) Using equation (48) we can see that this expression can be rewritten as Pe(e) = /^(crj -*- e1)/[(a1-b1)(a2-b2)] de2 (54) where 1, = max{ a!, {&2-Qal)a1/[ta1->ro J } (55) and 12 = max! bi, (b-to^o^/leo^ + o 2] ) (56) and lil 2 then p£(e) = 0. Thus we get after integration: Pe(e) = ( l 3 - l iXa . + (li + l2)/2)/[(a1-b1)(a3-b2)] (57) for 1]<12 and is zero for 1,>12. Let us define Lj and L 2 to be the values of e for which li and 12 switch between their two possible forms (equations (55) and (56)). These values are seen to be: Li = (a 2a,-a 1o 2)/(a,o, + ai 2) ' (58) U = ( b ^ - b . o O / t b . o . + o,2) (59) It can be seen that, since the cases l] = (a2a,-eai2)/(a2 + eai) l 2 = ( b 2 a i - e a j 2 ) / ( a 2 + eOi) never occur at the same time and that a i is always less than bi, the only case for which li>l 2 (and p e is zero) is when: ai>(b 2 Oi -eo 2 2 ) / (o- 2 + effi) (60) or when: bi<(b,ai-eoV)/(o 2 + e a i ) (61) Let us define L 3 and L* to be the values of e for which li=l 2 . It can be shown that these values are: L, = (b 2a,-a,0 2)/(a,a, + 0 2 2 ) (62) L 4 = (a 20i-b ,0 2 )/(a ,0]+ 0 2 2 ) (63) We can now obtain an expression for p g by substituting in the proper value of li and 1, in equation (?) according to the value of e. We will not write down the expression here as it is very tedious and does not contribute much more than equation (57). The variance of the diffrequency quantization error can be computed as follows: ° q = /!=oe 2Pe(e)d e <64) A closed form expression for this integral can be obtained but is not given here due to its length. The standard deviation (square root of the variance) is plotted in figure 7.13 in chapter 7 as a function of a2 and 0i (where we have set o i = o 2 / ( l + 0j)). 334 References 1) Abramowitz, M. and Stegun, I.A. 1965, "Handbook of Mathematical Functions.". Dover, New York 2) Ahuja, N. and Schacter, B. 1983, "Pattern Models.". John Wiley and Sons 3) Baker, H.H., and Binford, T.O. 1981, "Depth from edge and intensity based stereo.", Proc. 7th Int. Joint Conf. Art. Intell., Vancouver, B.C. 4) Barlow, H.B., Blakemore, C. and Pettigrew, J.D. 1967, "The neural mechanism of binocular depth discrimination.", Journal of Physiology, London, Vol. 193, pp 327- 342 5) Beutler, F.J. 1966, "Error free recovery of signals from irregularly spaced samples.", SI AM Review, Vol. 8, No. 3, pp 328-335 6) Blakemore, C. 1970, "A new kind of stereoscopic vision.", Vision Research, Vol. 10, pp 1181-1199 7) Burt, P., and Julesz, B. 1980, "A disparity gradient limit for binocular fusion.", Science, Vol. 181, pp 276-278 8) Clark, J.J., and Lawrence, P.D. 1984, "A hierarchical image analysis system based upon oriented zero crossings of bandpassed images.", in Multiresolution Image Processing and Analysis. Rosenfeld, A. ed., pp 148-168, Springer-Verlag, Berlin 9) Clark, J.J., Palmer, M.R. and Lawrence, P.D. 1985a, "A transformation method for the reconstruction of functions from non-uniformly spaced samples.", Accepted for publication, IEEE Transactions on Acoustics, Speech and Signal Processing. 10) Clark, J.J., and Lawrence, P.D. 1985b, "A systolic parallel processor for the rapid computation of multi-resolution edge images using the V 2 G operator.", Accepted for publication, Journal of Parallel and Distributed Computing. 11) Clark, J.J., and Lawrence, P.D. 1985c, "A theoretical basis for diffrequency stereo.", Submitted for publication. 335 12) Crowley, J.L. 1984, "A multiresolution representation for shape.", in Multiresolution Image Processing and Analysis. Rosenfeld, A. ed., pp. 169-189, Springer-Verlag, Berlin 13) Crowley, J.L. and Parker, A.C., 1984, "A representation for shape based on peaks and ridges in the difference of lowpass transform.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 156-169 14) Crowley, J.L. and Stern, R.M. 1984, "Fast computation of the difference of low-pass transform.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 212-222 15) Demaerschalk, J.P., Cottell, P.L., and Zobeiry, M. 1980, "Photographs improve statistical efficiency of truckload scaling.", Vol. 10, No. 3, pp 269-277 16) Dev, P. 1975, "Perception of depth surfaces in random-dot stereograms: A neural model.", International Journal of Man-Machine Studies, Vol. 7, pp 511-528 17) Dilworth, J.R. 1975, "Log Scaling and Timber Cruising", OSU Book Stores, Corvallis, Oregon 18) Fiorentini, A. and Maffei, L. 1971, "Binocular depth perception without geometrical cues.", Vision Research, Vol. 11, pp 1299-1311 19) Freeman, H. 1974, "Computer processing of line-drawing images.", Computer Surveys, Vol. 6, pp 57-97 20) Frisby, J.P. and Mayhew, J.E.W. 1980, "Spatial frequency tuned channels: implications for structure and function from psychophysical and computational studies of stereopsis.", Philosophical Transactions of the Royal Society of London B, Vol. 290, pp 95-116 21) Frisby, J.P. and Mayhew, J.E.W. 1981, "Psychophysical and computational studies towards a theory of human stereopsis.", Artificial Intelligence, Vol. 17, pp 349-385 22) Ghosh, S.K. 1979, "Analytical Photogrammetrv.". Pergamon Press, New York 23) Grimson, W.E.L. 1981a, "A computer implementation of a theory of human stereo vision.", Philosophical Transactions of the Royal Society of London B, Vol. 292, pp217-253 336 24) Grimson, W.E.L. 1981b, "From Images to Surfaces: A Computational Study of the Human Early Visual System.". MIT Press, Cambridge, Mass. 25) Grimson, W.E.L. 1982, "A computational theory of visual surface interpolation.", Philosophical Transactions of the Royal Society of London B, Vol. 298, pp 395-427 26) Grimson, W.E.L. 1984, "Binocular shading and visual surface reconstruction.". Computer Vision, Graphics and Image Processing, Vol. 28, No. I, pp 19-43 27) Grimson, W.E.L. 1985, "Computational experiments with a feature based stereo algorithm.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 1, pp 17-34 28) Hall, R.W., 1982, "Efficient spiral search in bounded spaces.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 4, No. 2, pp 208-214 29) Hand, D.E. 1975, "Scanners can be simple.", in Modern Sawmill Techniques. White, V. (ed), Vol. 6, pp 187-196, Miller- Freeman, San Fransisco. 30) Higgins, J.R. 1976, "A sampling theorem for irregularly spaced sample points.", IEEE Transactions on Information Theory, September 1976 31) Horiuchi, K. 1968, "Sampling principle for continuous signals with time-varying bands.", Information and Control, Vol. 13, pp 53-61 32) Ikeuchi, K. 1983, "Constructing a depth map from images.", MIT Al Memo 744, Mass. Inst. Tech., Cambridge Mass. 33) Jarvis, R.A. 1983, "A perspective on range finding techniques for computer vision.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 2, pp 122-139 34) Jerri, A.J. 1977, "The Shannon sampling theorem-Its various extensions and applications: A tutorial review.", Proceedings of the IEEE, Vol. 65, No. 11, pp 1565-1596 35) Julesz, B . 1971, "The Foundations of Cvclopean Perception", University of Chicago Press, Chicago 337 36) Kotel'nikov, V.A. 1933 "On the transmission capacity of 'ether' and wire in electrocommunications.", lzd. Red. Upr. Svyazi RKKA (Moscow) 37) Kramer, H.P. 1959 "A generalized sampling theorem.", Journal of Mathematical Physics, Vol. 38., pp 68-72 38) Levine, M.D. 1978, "A knowledge based computer vision system.", in Computer Vision Systems. Hanson, A. and Riseman, E. (eds.) pp 335-351, Academic Press 39) Levine, M.D., O'Handley, D.A., and Yagi, G.M. 1973, "Computer determination of depth maps.", Computer Graphics and Image Processing, Vol. 2, pp 131-150 40) Levinson, N. 1940, "Gap and Density Theorems". American Mathematical Society Colloquim Publications, Vol. 26, American Mathematical Society, New York 41) Longuet-Higgins, M.S. 1962, "The distribution of intervals between zeroes of a stationary random function.", Philosophical Transactions of the Royal Society of London A., Vol. 254, pp 557-599 42) Lowry, A. 1984, it it M.S. Thesis, Carnegie-Mellon University, Pittsburgh, PA 43) Lu, S.Y. 1984, " A tree matching algorithm based on node splitting and merging.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 2, pp 249-256 44) Lunscher, W.H.H.J., 1983, "A digital image preprocessor for optical character recognition.", MaSc thesis, Dept. Electrical Engineering, University of British Columbia, Vancouver. 45) Lyness, J.N., "SQUANK (Simpson Quadrature Used Adaptively Noise Killed)", CACM Algorithm No. 379 46) Mackworth, A.K. and Mokhtarian, F. 1984, "Scale based description of planar curves.", Dept. of Computer Science Technical Report, 84-1, University of British Columbia, also in Proceedings of the Fifth National Conference of the Canadian Society for Computational Studies of Intelligence, London, Ont. 1984 47) Maffei, L. and Fiorentini, A. 1977, "Spatial frequency rows in the striate visual cortex.", Vision Research, Vol. 17, pp 257-264 338 48) Marr, D. 1974, "A note in the computation of binocular disparity in a symbolic, low level processor.", MIT Al memo no. 327, Mass. Institute of Tech., Cambridge, MA. 49) Marr, D. 1982, "Vision; A Computational Investigation in the Human Reperesentation and Processing of Visual Information.". W.H. Freeman, San Francisco. 50) Marr, D. and Hildreth, E. 1980, "Theory of edge detection.", Proceedings of the Royal Society of London B, Vol. 207, pp 187-217 51) Marr, D., Palm, G., and Poggio, T. 1978, "Analysis of a cooperative stereo algorithm.", Biological Cybernetics, vol. 28, pp 223-239 52) Marr, D., and Poggio, T. 1976, "Cooperative computation of stereo disparity.", Science, Vol. 194, pp 283-287 53) Marr, D. and Poggio, T. 1979, "A computational theory of human stereo vision.", Proceedings of the Royal Society of London B, Vol. 204, pp 301-328 54) Marr, D. and Ullman, S. 1981, "Directional selectivity and its use in early visual processing.", Proceedings of the Royal Society of London B, Vol. 211, pp 151-180 55) Marvasti, F. 1973, "Transmission and Reconstruction of Signals using Functionally Related Zero-Crossings.", PhD Thesis, Rensselaer Polytechnic Institute, Troy, New York 56) Marvasti, F. 1984, "Spectrum of non-uniform samples.", Electronics Letters, Vol. 20, No. 21, p 896 57) McClellan, J.H. 1973, "The design of two-dimensional digital filters by transformations.", Proceedings of the 7th Annual Princeton Conference on Information Science Systems 58) McClellan, J.H. Parks, T.W. and Rabiner, L.R. 1973, "A complete program for designing optimum FIR linear phase digital filters.", IEEE Transactions on Audio and Electroacoustics, Vol. 21, pp 506-526 59) Mersereau, R.M., 1979, "The processing of hexagonally sampled two-dimensional signals.", Proceedings of the IEEE, Vol. 67, pp 930-949 339 60) Mersereau, R.M. and Oppenheim, A.V. 1974 "Digital reconstruction of multidimensional signals from their reconstructions.", Proceedings of the IEEE, Vol. 62, pp 1319-1338 61) Mersereau, R.M. and Speake, T.C., 1983, "The processing of periodically sampled multidimensional signals.", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 31, No. 1, pp 188-194 62) Miller, K.S., 1964, "Multidimensional Gaussian Distributions.", John Wiley and Sons, New York 63) Miller, D.G. and Tardif, Y. 1970, "A video technique for measuring the solid volume of stacked pulpwood.", Pulp and Paper Magazine of Canada, Vol. 71, No. 8, pp 40-41 64) Mokhtarian, F., 1984, "Scale space description and recognition of planar curves.", MSc. Thesis, Dept. of Computer Science, University of British Columbia, Vancouver, B.C. 65) Moravec, H.P. 1977, "Towards automatic visual obstacle avoidance.", Proc. 5th Int. Joint Conf. Artificial Intell., p 584 66) Nelson, J.L, 1975, "Globality and stereoscopic fusion in binocular vision.", Journal of Theoretical Biology, Vol. 49, pp 1-88 67) Nishihara, H.K. 1983, "Hidden information in early visual processing.", Proc. SPIE, Vol. 360, pp 76-87 68) Nishihara, H.K. 1984, "Practical real-time imaging stereo matcher.", Optical Engineering, Vol. 23, No. 5, pp 536-545 69) Ohta, Y. and Kanade, T. 1985, "Stereo by intra- and inter-scan line search using dynamic programming.", IEEE Transactions on Pall. Anal, and Mach. Intell., Vol. 7, No. 2, pp 139-154 70) Paley, R.E.A.C. and Weiner, N. 1934, "Fourier Transforms in the Complex Domain", American Mathematical Society Colloquim Publications, Vol. 19, American Mathematical Society, New York 71) Pan, S.X. and Kak, A.C. 1983, "A computational study of reconstruction algorithms for diffraction tomography: Interpolation versus Filtered Backpropogation.", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 31, No. 5, pp 1262-1275 340 72) Papoulis, A. 1966, "Error analysis in sampling theory.", Proceedings of the IEEE, Vol. 54, No. 7, pp 947-955 73) Papoulis, A. 1967, "Limits on bandlimited signals.", Proceedings of the IEEE, Vol. 55, No. 10, pp 1677-1686 74) Petersen, D.P. and Middleton, D. 1962, "Sampling and reconstruction of wave-number limited functions in N-dimensional Euclidean spaces.", Information and Control, Vol. 5, pp 279-323 75) Petersen, D.P. and Middleton, D. 1964, "Reconstruction of multidimensional stochastic fields from discrete measurements of amplitude and gradient", Information and Control, Vol. 7, pp 445-476 76) Rice, S.O. 1945, "Mathematical analysis of random noise.", Bell System Technical Journal, Vol. 24, pp 46-156 77) Rosenfeld, A. (ed.) 1984, "Multiresolution Image Processing and Analysis", Springer-Verlag, Berlin 78) Rosenfeld, A. and Kak, A.C. 1976, "Digital Picture Processing", Academic Press, New York 79) Shanmugan, K.S., Dickey, F.M., and Green, J.A. 1979, "An optimal frequency domain filter for edge detection in digital pictures.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 1, pp 37-49 80) Sinclair, A.W.J. 1980, "Evaluation and economic analysis of twenty-six log sorting operations on the coast of British Columbia.", FERIC Technical Note TN-39, Forest Engineering Research Institute of Canada 81) Slepian, D. 1964, "Prolate spheroidal wave functions, Fourier analysis and uncertainty-IV: Extensions to many dimensions; Generalized prolate spheroidal functions.", Bell System Technical Journal, November 1964, pp 3009-3057 82) Srihari, S.N. 1984, "Multiresolution 3-d image processing and graphics.", in Multiresolution Image Processing and Analysis, A. Rosenfeld 83) Streifer, W. 1965, "Optical resonator modes - rectangular reflectors of spherical curvature.", Journal of the Optical Society of America, Vol. 55, no. 7, pp 868-877 341 84) Sugic, N. and Suwa, M . 1977, "A scheme for binocular depth perception suggested by neurophysiological evidence.", Biological Cybernetics, Vol. 26, pp 1-15 85) Tanimoto, S.L. 1978, "Regular hierarchical image and processing structures in machine vision.", in Computer Vision Systems. Hanson, A. and Riseman, E. (eds), Academic Press 86) Tang, G.Y. 1982, "A discrete version of Green's theorem.", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 4, No. 3, pp 242-249 87) Terzopoulos, D. 1982, "Multi-level reconstruction of visual surfaces: Variational principles and Finite element methods.", MIT Al memo 671, Mass. Inst. Tech., Cambridge, Mass. 88) Tyler, C.W. 1973, "Stereoscopic vision: Cortical limitations and a disparity scaling effect". Science, Vol. 181, pp 276- 278 89) Tyler, C.W. and Sutter, E.E. 1979, "Depth from spatial frequency difference: An old kind of stereopsis?", Vision Research, Vol. 19, pp 359-365 90) Vadnais, C. 1976, "Raise sideboard recovery with computerized edger.", in Modern Sawmill Techniques. White, V. (ed.), Vol. 6, pp 154-161, Miller- Freeman, San Francisco 91) VanMarcke, E. 1983, "Random Fields: Analysis and Synthesis.". MIT Press, Cambridge Massachussetts 92) Vit, R. 1962, "Electronic log scaler and its application in the logging industry.", Canadian Pulp and Paper Assoc., Woodlands Section, Index No. 2125 (B6), p526 93) Watts, S.B. (ed.), 1983 "Forestry Handbook for B.C.". published by the Forestry Undergraduate Society, University of B.C., Vancouver 94) Whittaker, E.T. 1915, "On the functions which are represented by the expansions of interpolatory theory.", Proceedings of the Royal Society of Edinburgh, Vol. 35, pp 181-194 95) Whittaker, J.M. 1929, "The Fourier theory of the cardinal functions.", Proceedings of the Mathematical Society of Edinburgh, Vol. 1, pp 169-176 342 96) Whittington, J.A. 1979, "Computer control in a chip-n-saw operation.", pill-118 97) Wiejak, J.S. 1983, "Edge location accuracy.", Proc. SPIE, Vol. 467, pp 164-169 98) Wiley, R.G. 1978, "Recovery of bandlimited signals from unequally spaced samples.", IEEE Transactions on Communications, Vol 26, No. 1, pp 135-137 99) Wilson, II.R. and Bergen, J.R. 1979, "A four mechanism model for spatial vision.", Vision Research, Vol. 19, pp 19-32 100) Wilson, H.R. and Giese, S.C. 1977, "Threshold visibility of frequency gradient patterns.", Vision Research, Vol. 17, pp 1177-1190 101) Witkin, A. 1983, "Scale-space filtering.", Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe West Germany, pp 1019-1022 102) Woodham, R.J. 1978, "Reflectance map techniques for analysing surface defects in metal castings.", T.R. 457, AI Lab, Mass. Inst. Tech., Cambridge, Mass. 103) Yao, K. and Thomas, J.B. 1967, "On some stability and interpolating properties of nonuniform sampling expansions.", IEEE Transactions on Circuit Theory, Vol. 14, pp 404-408 104) Yen, J.L. 1956, "On nonuniform sampling of bandwidth-limited signals.", IRE Transactions on Circuit Theory, December 1956, pp 251-257 105) Yuille, A.L., and Poggio, T. 1983a, "Scaling theorems for zero-crossings.", MIT AI memo 722, Mass. Inst. Tech., Cambridge, Mass 106) Yuille, A.L., and Poggio, T. 1983b, "Fingerprint theorems for zero-crossings.", MIT AI memo 730, Mass. Inst. Tech., Cambridge, Mass