THE USE OF THE GABOR EXPANSION IN COMPUTER VISION SYSTEMS

Richard Neil Braithwaite
B.Sc. (Electrical Engineering), University of Calgary

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1989
© Richard Neil Braithwaite, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

Abstract

This thesis explores the use of the Gabor expansion in computer vision systems. The exploration is performed in two parts: the properties of the Gabor expansion are reviewed, and various image segmentation algorithms whose basis is the Gabor expansion are presented. The first part comprises a review of Gabor's original expansion of a one-dimensional signal [1] and Daugman's two-dimensional generalization [2] [3]. In both cases, important properties are discussed: the optimality of the "elementary function" with respect to minimal uncertainty, and the proper spacing of these functions to ensure a unique Gabor expansion. Methods for solving the Gabor expansion are investigated. Two non-iterative methods are discussed: the "inverse of the overlap matrix", and Bastiaans' "biorthonormal projection function" [4] [5].
One iterative method, Daugman's "steepest descent" [3], is reviewed. Two new iterative methods, the "warping method" and the "locally biorthonormal projection function," are presented. The complexity and convergence properties of these new methods are intermediate to Daugman's and Bastiaans' methods. In the second part of this thesis, Gabor expansion-based segmentation algorithms are presented. The properties of these algorithms (edge detection, texture segmentation, motion segmentation, and segmentation by depth) are investigated. It is shown that "local measures", which are by-products of the segmentation algorithms, can be used both to segment images and to provide three-dimensional descriptions of viewed objects. It is concluded that the Gabor expansion is a useful preprocessing step for computer vision systems.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 One-Dimensional Gabor Expansion
  2.1 Joint Time-Frequency Representations
  2.2 One-Dimensional Gabor Expansion
3 Two-Dimensional Gabor Expansion
  3.1 Joint Position-Frequency Representations
  3.2 Two-Dimensional Gabor Expansion
4 Solving for Gabor Expansion Coefficients
  4.1 Projection Coefficients and Expansion Coefficients
  4.2 Direct Methods
    4.2.1 Inverse of Overlap Matrix
    4.2.2 Biorthonormal Projection Function
  4.3 Iterative Least Squares Solution
    4.3.1 Steepest Descent
    4.3.2 Warping Method
    4.3.3 Locally Biorthonormal Projection Function
  4.4 Suitability for Vision Analysis
5 Implementing a Log-Polar Lattice
  5.1 Orientation
  5.2 Frequency
  5.3 Spatial Sampling Lattice
  5.4 Special Case: Circular Elementary Functions
6 Local Measures
  6.1 Local Measures for a Single Image
    6.1.1 Local Magnitude
    6.1.2 Local Preferred Orientation and Moment of Inertia
  6.2 Local Measures for Two Related Images
    6.2.1 Local Difference in Phase
    6.2.2 Local Difference in Preferred Orientation
7 Forming and Characterizing Pixel Groupings
  7.1 Forming Pixel Groupings
    7.1.1 Edge Detection and Texture Segmentation
    7.1.2 Motion Segmentation
    7.1.3 Segmentation by Depth
    7.1.4 Integrating Maps to Form Relevant Pixel Groupings
  7.2 Characterizing Regions and Edges
    7.2.1 Texture of a Region
    7.2.2 Edges
    7.2.3 Three-Dimensional Position and Motion
8 Examples
  8.1 Edge Detection
    8.1.1 Position Localization
    8.1.2 Noise Immunity
    8.1.3 Relevant Frequencies
    8.1.4 Local Curvature Estimation
  8.2 Texture Segmentation
    8.2.1 Segmentation by Orientation
    8.2.2 Segmentation by Frequency
  8.3 Motion Segmentation
    8.3.1 Camouflaged Object
    8.3.2 Aperture Problem and Insufficient Spectral Energy
    8.3.3 Translation in Depth
  8.4 Discussion
9 Conclusion
  9.1 Summary
  9.2 Further Research
References
A Selecting an Appropriate Convergence Factor
B Lower Bound for the Optimal Convergence Factor
C Partial Derivatives for Three-Dimensional Motion

List of Tables

8.1 Motion Parameters for the Camouflaged Circle
8.2 Motion Parameters for Translating Circle
8.3 Motion Parameters for Circle: Pure Translation Assumed
8.4 Motion Parameters for Shrinking Circle

List of Figures

2.1 Effective Duration of Gaussian Window
2.2 Effective Bandwidth of Gaussian Window
2.3 Cosine-type Elementary Function
2.4 Sine-type Elementary Function
2.5 Gabor Sampling Lattice
2.6 Logarithmic Sampling Lattice
3.7 2D Cosine-type Elementary Function
3.8 2D Sine-type Elementary Function
3.9 Frequency and Orientation Bandwidth of an Elementary Function
3.10 2D Sampling Lattice
3.11 2D Sampling Lattice, Gabor Generalization
3.12 Spectral Coordinates, Log-polar Lattice
4.13 Window for Bastiaans' Biorthonormal Function
4.14 Local Neighbourhood for Gabor Lattice
4.15 Local Neighbourhoods for Logarithmic Sampling Lattice
6.16 Moment of Inertia of Masses
6.17 Moment of Inertia for a Single Frequency
7.18 Vision System with Two Viewpoints
7.19 Four and Eight Point Connectivity
8.20 Test Image 1: A Circle
8.21 Magnitude Map: Test Image 1, fk = 0.125 cycle/pixel
8.22 Magnitude Map: Test Image 1, fk = 0.25 cycle/pixel
8.23 Edge Detector Output: Test Image 1, fk = 0.125 cycle/pixel
8.24 Edge Detector Output: Test Image 1, fk = 0.25 cycle/pixel
8.25 Test Image 2: A Circle With Additive Noise
8.26 Magnitude Map: Test Image 2, fk = 0.125 cycle/pixel
8.27 Magnitude Map: Test Image 2, fk = 0.25 cycle/pixel
8.28 Edge Detector Output: Test Image 2, fk = 0.125 cycle/pixel
8.29 Edge Detector Output: Test Image 2, fk = 0.25 cycle/pixel
8.30 Test Image 3: One Frequency, Four Orientations
8.31 Magnitude Map: Test Image 3, fk = 0.125 cycle/pixel
8.32 Magnitude Map: Test Image 3, fk = 0.25 cycle/pixel
8.33 Edge Detector Output: Test Image 3, fk = 0.125 cycle/pixel
8.34 Edge Detector Output: Test Image 3, fk = 0.25 cycle/pixel
8.35 Test Image 4: A Rectangle
8.36 Edge Detector Output: Test Image 4
8.37 Test Image 5: One Frequency, Two Orientations
8.38 Magnitude Map: Test Image 5, fk = 0.125 cycle/pixel
8.39 Magnitude Map: Test Image 5, fk = 0.25 cycle/pixel
8.40 Texture Segmentation Output: Test Image 5, fk = 0.125
8.41 Texture Segmentation Output: Test Image 5, fk = 0.25
8.42 Test Image 6: Two Frequencies, One Orientation
8.43 Magnitude Map: Test Image 6, fk = 0.125 cycle/pixel
8.44 Magnitude Map: Test Image 6, fk = 0.25 cycle/pixel
8.45 Texture Segmentation Output: Image 6, fk = 0.125
8.46 Texture Segmentation Output: Image 6, fk = 0.25
8.47 Test Image 7: Camouflaged Circle
8.48 Motion Segmentation Output: Test Image 7
8.49 Motion Segmentation Output: Translating Circle
8.50 Motion Segmentation Output: Shrinking Circle

Acknowledgements

I would like to thank Dr. Beddoes, my supervisor, for his continual support. A special thanks to my brother, Murray Braithwaite, who displayed skill and endurance while editing early revisions of this thesis. I would also like to thank Michele, my wife, for her love and support during the past two years of my studies. This work has been partially funded by the Natural Sciences and Engineering Research Council and by the Advanced Systems Foundation through postgraduate scholarships.

Chapter 1

Introduction

In 1946, Denis Gabor [1] presented a new signal representation that describes a one-dimensional signal in terms of two variables: time and frequency. This representation (referred to as the "one-dimensional Gabor expansion") is suitable for analyzing aperiodic signals such as sound. Gabor's motivation for designing this joint time-frequency representation was to analyze the human auditory system. More recently, Daugman [2] [3] generalized the Gabor expansion for analysis of two-dimensional signals. This "two-dimensional Gabor expansion" is believed to have a form similar to that of the early stages of human vision [2] [6] [7]. Such a claim suggests that the two-dimensional Gabor expansion could be a useful preprocessing step in a computer vision system.

A computer vision system extracts, organizes, and interprets information obtainable from a sequence of digitized images. The images are presented to the vision system as matrices of pixel intensities. Each pixel has a two-dimensional spatial position and its intensity varies with time.
No existing computer vision system can directly interpret an undifferentiated sequence of pixel intensities with much success; the vision task must be structured into a hierarchy of stages. The initial stage of computer vision analysis consists of forming simple, but useful, spatial and temporal pixel groupings such as edges, regions, and flow vectors. These pixel groupings are characterized by various measures such as position, preferred (spectral) orientation, and preferred frequency. For a vision system analyzing a sequence of images, the detected changes in these measures over time provide useful information. If binocular (stereo) vision is available, changes of these measures induced by a change of viewpoint can also be utilized in the subsequent stage of vision analysis.

The second stage of computer vision analysis consists of assigning three-dimensional properties to the newly-formed pixel groupings. These three-dimensional properties must be inferred because an image is a two-dimensional projection which under-determines a three-dimensional object. The inference process requires additional information such as knowledge of the imaging geometry (viewpoint constraints) and knowledge of physical constraints on objects being viewed. The constraint information is applied to the results of the initial stage analysis, namely, the characterization of simple pixel groupings by various measures, to perform the task of inferring three-dimensional properties. Three-dimensional properties of interest include boundaries, surface slant and tilt, relative depth, and rotational and translational motion.

In a general vision system, various techniques of forming and characterizing pixel groupings are integrated. These techniques include edge detection [8] [9], texture segmentation [9] [7] [11], motion segmentation [12] [13] [14] [15] [16], and segmentation by depth [17].
However, most existing implementations of these techniques have been developed in isolation, without thought of integration. Each technique preprocesses pixel intensities differently. The complexity of a general vision system is significantly reduced if all techniques employ a common preprocessing algorithm. This thesis proposes that the two-dimensional Gabor expansion be used as the common preprocessing algorithm for integrated vision systems.

A review of the Gabor expansion comprises chapters 2 through 4. Chapter 2 presents the Gabor expansion in its original one-dimensional form, introducing important concepts and definitions that are also used in the two-dimensional Gabor expansion. The two-dimensional Gabor expansion, as developed by Daugman [2] [3], is restated and expanded for use with visual data in chapter 3. Chapter 4 discusses various methods of solving the Gabor expansion and the suitability of each method for computer vision. The remaining chapters (5 through 8) are devoted to the Gabor expansion-based segmentation algorithms. Chapter 5 outlines the implementation of the log-polar form of the two-dimensional Gabor expansion. The log-polar form is amenable to segmentation algorithms because it represents the image in terms of useful local parameters (position, frequency, orientation). In chapter 6, the coefficients of the Gabor expansion are combined to produce local measures. The segmentation algorithms, which use local measures to form and characterize pixel groupings, are introduced in chapter 7. Chapter 8 illustrates the properties of these segmentation algorithms using simple test images.

Chapter 2

One-Dimensional Gabor Expansion

The Gabor expansion was originally applied to analyze one-dimensional signals, such as sound [1]. The one-dimensional Gabor expansion represents a signal as a sequence of frequency (pitch) elements of short duration occurring at specified time instants.
This representation resembles a musical score. The mathematics of one-dimensional signal analysis are presented in this chapter, setting the foundation for later generalization to two-dimensional analysis in the vision context.

2.1 Joint Time-Frequency Representations

The one-dimensional Gabor expansion is a type of joint time-frequency representation. A joint time-frequency representation describes a one-dimensional signal, s(t), by two variables: time and frequency. The representation emphasizes the time-varying nature of the signal's frequency spectrum. In order to form a joint time-frequency representation, the frequency spectra over a series of short time intervals are required. The duration of each short time interval is defined by a "window function." A window function comprises a time duration and a bandwidth. The time duration is the standard deviation of the window function in the time domain (the "effective time duration"). The bandwidth is the standard deviation of the window function in the frequency domain (the "effective bandwidth"). The effective time duration and the effective bandwidth for a Gaussian window are shown in figures 2.1 and 2.2, respectively.

Figure 2.1: Effective Duration of Gaussian Window

Figure 2.2: Effective Bandwidth of Gaussian Window

The effective time duration and the effective bandwidth are denoted by the width of the rectangular window included in their respective figures. Either the effective time duration or the effective bandwidth can be used to define the effective size of the window function. This effective size will be referred to as the "scale" of the window function. In this chapter, any quantitative reference to scale is made using the time domain definition (the effective time duration). The actual size of the window function is referred to as the "support width."
The support width is defined in the time domain as the minimum time epoch that encompasses the non-zero portion of the window function (or other type of function). The support width for a Gaussian window is infinite in extent. In a computer implementation, the Gaussian window is truncated, thereby reducing the support width. If the window function is over-zealously truncated, certain properties of the Gaussian waveform are lost (ie. minimum joint uncertainty, see section 2.2). On the other hand, an excessively large support width needlessly burdens the computer with insignificant calculations.

The window function is used to extract a portion of the signal that is localized (concentrated) in time by the effective duration and in frequency by the effective bandwidth. The window function must be both time-shifted and frequency-modulated in order to extract the signal energy that is concentrated about a specified time-frequency-phase coordinate (t_n, f_k, p). The t_n is the time shift which defines the temporal coordinate of the new function. The f_k is the modulation frequency which defines the spectral coordinate. The p is the phase of the modulation frequency relative to the coordinate t_n.

To extract the energy localized about (t_n, f_k, p), the continuous inner product of the signal and the time-shifted, frequency-modulated window function is calculated. The inner product is a projection of the time-shifted, frequency-modulated window function onto the signal. This time-shifted, frequency-modulated window function will be referred to as a "projection function." The value obtained from the inner product is the "projection coefficient."

As mentioned, the joint time-frequency representation describes a one-dimensional signal by two variables. A continuous form of a joint time-frequency representation over-determines the signal s(t).
Therefore, the continuous joint time-frequency representation can be sampled without loss of information [5]. If the signal s(t) has a bandwidth of B Hz, the sample points in time and frequency must be chosen such that the overall sampling rate is 2B independent measurements per second. If properly sampled, the set of projection coefficients contains sufficient information to reconstruct the signal s(t). The reconstructed signal is given by

s(t) = \sum_{nkp} a(t_n, f_k, p) \, EF(t; t_n, f_k, p),    (2.1)

where EF() is an elementary function centered at (t_n, f_k, p) and a() is the associated weighting coefficient. If the projection coefficient is used as a weighting coefficient, an elementary function must be chosen such that any overlap (in time or frequency) between adjacent projection functions is compensated; failure to compensate for overlap will introduce artifacts into the reconstructed signal. Thus, the elementary function must be biorthonormal to the set of projection functions. (Biorthonormal means orthonormal in both the time and frequency domains.) An alternative approach is to first select the elementary function, then design an appropriate biorthonormal projection function. In this approach, which is used in the Gabor expansion, the weighting coefficients a() are referred to as "expansion coefficients."

2.2 One-Dimensional Gabor Expansion

Gabor [1] proposed that time-shifted, frequency-modulated Gaussian waveforms could be used as elementary functions in signal reconstruction. Gabor's elementary functions, which are denoted by \psi(), are a special case of the elementary functions EF() defined in (2.1). For the remainder of this chapter, any reference to elementary function will mean Gabor's elementary function. The elementary functions in the Gabor expansion occur in quadrature pairs. This thesis selects cosine-type and sine-type elementary functions, which are given by (2.2) and (2.3), respectively.
\psi(t; n, k, 0) = g_n(t) \cos\theta_{nk}(t)    (2.2)

\psi(t; n, k, \pi/2) = g_n(t) \sin\theta_{nk}(t)    (2.3)

where

g_n(t) = \exp[-\pi (t - t_n)^2 / (2 (\Delta t)^2)]    (2.4)

\theta_{nk}(t) = 2\pi f_k (t - t_n)    (2.5)

The two types of elementary functions are shown in figures 2.3 and 2.4. The elementary functions are centered at time t_n and frequency f_k. The envelope of the elementary function has the same Gaussian shape in both the time and frequency domains. The time domain envelope is given by (2.4). The frequency domain envelope is given by

G_k(f) = \exp[-\pi (f - f_k)^2 / (2 (\Delta f)^2)]    (2.6)

The standard deviations of the Gaussian envelope in the time and frequency domains define the effective duration, \Delta t, and the effective bandwidth, \Delta f, of the elementary function, respectively. The effective duration and the effective bandwidth are measures of time and frequency localization, respectively. Localization is related to the minimum separation for which two independent signal features can be resolved. To resolve two features that have a small separation in time, the elementary functions must have a short effective duration.

Figure 2.3: Cosine-type Elementary Function

Figure 2.4: Sine-type Elementary Function

Similarly, fine resolution in frequency requires the elementary function to have a small effective bandwidth. Arbitrary resolution cannot be achieved in both the time and frequency domains simultaneously. The joint time-frequency localization of an elementary function is subject to the Heisenberg uncertainty product, which can be expressed as

\Delta t \, \Delta f \geq 1/2.    (2.7)

The Gaussian waveform minimizes the Heisenberg uncertainty product such that (2.7) is an equality [1]. Since (2.7) is not affected by time-shift or frequency-modulation, all of Gabor's elementary functions possess the minimal uncertainty property. Minimal uncertainty means the Gabor elementary function optimally trades off the resolution in each domain.
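The quadrature pair (2.2)-(2.5) can be sketched numerically. The following is a minimal Python illustration, not the thesis's implementation; the Gaussian envelope constant follows the form of (2.4) as written above, and the exact normalization is an assumption.

```python
import numpy as np

# A numerical sketch of the quadrature pair of elementary functions,
# eqs. (2.2)-(2.5). The envelope constant is an assumption consistent with
# eq. (2.4) above; the thesis's exact normalization may differ.

def gaussian_envelope(t, t_n, dt):
    """g_n(t): Gaussian envelope centred at t_n with effective duration dt."""
    return np.exp(-np.pi * (t - t_n) ** 2 / (2.0 * dt ** 2))

def elementary(t, t_n, f_k, dt, phase="cos"):
    """Cosine- or sine-type elementary function centred at (t_n, f_k)."""
    theta = 2.0 * np.pi * f_k * (t - t_n)            # eq. (2.5)
    carrier = np.cos(theta) if phase == "cos" else np.sin(theta)
    return gaussian_envelope(t, t_n, dt) * carrier

t = np.linspace(-4.0, 4.0, 8001)
step = t[1] - t[0]
cos_atom = elementary(t, 0.0, 2.0, 0.5, "cos")       # figure 2.3 analogue
sin_atom = elementary(t, 0.0, 2.0, 0.5, "sin")       # figure 2.4 analogue

# The two members of a quadrature pair are orthogonal to each other.
print(abs(np.sum(cos_atom * sin_atom) * step) < 1e-9)   # True
```

The cosine-type atom is even about t_n and the sine-type atom is odd, which is why their inner product vanishes even though neither is orthogonal to its lattice neighbours.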
The Gabor expansion of a one-dimensional signal is given by

s(t) = \sum_{nkp} a_{nkp} \psi(t; t_n, f_k, p).    (2.8)

The Gabor expansion coefficients, a_{nkp}, form a discrete set of empirical measurements that describes the signal s(t). The Gabor expansion is a sampling process defined by the center coordinates of the elementary functions (ie. t_n, f_k). The center coordinates for the set of elementary functions form a sampling lattice that covers the joint time-frequency space. The sampling lattice suggested by Gabor [1] is shown in figure 2.5. A sine and cosine pair of elementary functions are centered at each point in the lattice. The lattice points are evenly spaced in the time and frequency domains. The time and frequency sampling intervals are given by (2.9) and (2.10), respectively.

\Delta T = t_n - t_{n-1} = 2^{0.5} \Delta t    (2.9)

\Delta F = f_k - f_{k-1} = 2^{0.5} \Delta f    (2.10)

Figure 2.5: Gabor Sampling Lattice

The sampling intervals, \Delta T and \Delta F, are determined by the effective duration, \Delta t, and the effective bandwidth, \Delta f, respectively. The sampling area, \Delta T \Delta F, is twice the Heisenberg uncertainty product. The increase in area is required because each lattice point (t_n, f_k) is assigned two measurements: a sine and cosine expansion coefficient.

The coefficients associated with the Gabor sampling lattice provide a "complete" description of the signal s(t), containing enough information to reconstruct the original signal. In addition, each expansion coefficient in this lattice is an independent measurement. Since all the expansion coefficients are independent, the solution to the Gabor expansion (ie. a_{nkp}) is unique.

The Gabor lattice defines equally-spaced frequency channels, making it useful for describing harmonic signals such as music or vibrations. As a consequence of this harmonic frequency spacing, the temporal sampling density is the same for each frequency channel.
For most signals, the lower frequency channels encode coarse signal variations; the higher frequency channels encode finer detail. A lattice that has a higher temporal sampling density for the higher frequency channels is better suited than the Gabor lattice for representing these non-harmonic signals. The logarithmic lattice (shown in figure 2.6) is one such lattice that has a variable temporal sampling density.

Figure 2.6: Logarithmic Sampling Lattice

For this lattice, the scale of the Gaussian envelope is a function of the modulation frequency. The time domain definition of scale is given by

(\Delta t)_k = \lambda / f_k,    (2.11)

where \lambda is a constant. This restriction on the scale of the Gaussian envelope makes the logarithmic form of the Gabor expansion a "multi-scale" representation. The time and frequency intervals defined by (2.9) and (2.10) can be used for the logarithmic sampling lattice. The sampling intervals are given by

(\Delta T)_k = 2^{0.5} (\Delta t)_k = 2^{0.5} \lambda / f_k    (2.12)

and

(\Delta F)_k = 2^{0.5} (\Delta f)_k = 2^{0.5} f_k / (2 \lambda).    (2.13)

The temporal sampling density increases with frequency. The overall sampling density in the joint time-frequency space (ie. (\Delta T)_k (\Delta F)_k) remains constant.

Chapter 3

Two-Dimensional Gabor Expansion

This chapter introduces the two-dimensional generalization of the Gabor expansion. The two-dimensional Gabor expansion is a type of joint position-frequency representation, typically used to analyze images. Images represented by the "log-polar" form of the Gabor expansion are used in the segmentation algorithms of chapter 7.

3.1 Joint Position-Frequency Representations

The joint position-frequency representation is a generalization of the time-frequency representation, describing an image I(x, y) by two spatial position variables and two spatial frequency (spectral) variables.
The window function used in the joint position-frequency representation comprises an effective spatial extent and an effective bandwidth in each of the x and y directions. The effective spatial extents and effective bandwidths are defined by the standard deviation of the window function in their respective domain and direction. The projection function is a position-shifted and frequency-modulated window function. A projection coefficient is an estimate of the signal energy concentrated about the position-frequency-phase coordinate (x, y, f_x, f_y, p). The frequency coordinates f_x and f_y can be represented by a modulation frequency f and an orientation \phi. The modulation frequency is given by

f = (f_x^2 + f_y^2)^{0.5}.    (3.14)

The orientation of the modulating wave relative to the x axis is given by

\phi = \arctan(f_y / f_x).    (3.15)

The set of projection coefficients can be used to reconstruct the image I(x, y). The reconstructed image is given by

I(x, y) = \sum_{nmklp} a(x_n, y_m, f_k, \phi_l, p) \, EF(x, y; x_n, y_m, f_k, \phi_l, p),    (3.16)

where EF() is the two-dimensional elementary function and a() is the weighting coefficient. The weighting coefficients can be either the projection or expansion coefficients depending on the problem definition.

3.2 Two-Dimensional Gabor Expansion

Daugman [2] generalized Gabor's elementary functions into two dimensions. For the two-dimensional case, the elementary function is a modulated elliptical Gaussian waveform. The two-dimensional cosine-type and sine-type elementary functions are given by (3.17) and (3.18), respectively.

\psi(x, y; n, m, k, l, 0) = g_{nm}(\tilde{x}, \tilde{y}) \cos\theta_{nmkl}(x, y)    (3.17)

\psi(x, y; n, m, k, l, \pi/2) = g_{nm}(\tilde{x}, \tilde{y}) \sin\theta_{nmkl}(x, y)    (3.18)

where

(\tilde{x}, \tilde{y}) = (x \cos\gamma_e + y \sin\gamma_e, \; -x \sin\gamma_e + y \cos\gamma_e)    (3.19)

g_{nm}(\tilde{x}, \tilde{y}) = \exp\{ -(\pi/2) [ (\tilde{x} - \tilde{x}_n)^2 / (\Delta x)^2 + (\tilde{y} - \tilde{y}_m)^2 / (\Delta y)^2 ] \}    (3.20)

\theta_{nmkl} = 2\pi [ f_x(k) (x - x_n) + f_y(l) (y - y_m) ]    (3.21)

The elementary functions appear in figures 3.7 and 3.8.
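As a minimal sketch (not the thesis's implementation), the two-dimensional elementary function can be generated on a discrete grid; the envelope constant mirrors the one-dimensional form above and is an assumption.

```python
import numpy as np

# Sketch of the 2D elementary function, eqs. (3.17)-(3.21): a frequency-
# modulated elliptical Gaussian. The envelope constant mirrors the 1D form
# above and is an assumption about the thesis's normalization.

def elementary_2d(x, y, x_n, y_m, fx, fy, dx, dy, gamma_e, phase="cos"):
    # Rotated coordinates relative to the centroid, eq. (3.19).
    xr = (x - x_n) * np.cos(gamma_e) + (y - y_m) * np.sin(gamma_e)
    yr = -(x - x_n) * np.sin(gamma_e) + (y - y_m) * np.cos(gamma_e)
    env = np.exp(-0.5 * np.pi * (xr ** 2 / dx ** 2 + yr ** 2 / dy ** 2))  # (3.20)
    theta = 2.0 * np.pi * (fx * (x - x_n) + fy * (y - y_m))               # (3.21)
    return env * (np.cos(theta) if phase == "cos" else np.sin(theta))

xs = np.linspace(-8.0, 8.0, 129)
X, Y = np.meshgrid(xs, xs)
# Envelope elongated along y (gamma_e = 0), modulation along x.
atom = elementary_2d(X, Y, 0.0, 0.0, 0.25, 0.0, dx=2.0, dy=4.0, gamma_e=0.0)
print(atom.shape, atom[64, 64])   # (129, 129) 1.0
```

Changing gamma_e rotates the elliptical envelope without moving the modulation direction, which is exactly the freedom the log-polar lattice of section 3.2 exploits.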
Figure 3.7: 2D Cosine-type Elementary Function

Figure 3.8: 2D Sine-type Elementary Function

The two-dimensional elementary function has its spatial centroid located at (x_n, y_m) and its spectral centroid located at (f_x(k), f_y(l)). The elliptical Gaussian envelope of the elementary function can be rotated using (3.19) such that the orientation of the major axis relative to the x-axis is given by \gamma_e.

The two-dimensional elementary function exhibits the same minimal uncertainty as its one-dimensional counterpart. The uncertainty constraint for the two-dimensional case is given by

\Delta x \, \Delta y \, \Delta f_x \, \Delta f_y \geq 1/4.    (3.22)

\Delta x and \Delta y define the effective spatial extent of the elliptical Gaussian envelope in the x and y directions. \Delta f_x and \Delta f_y define the effective bandwidth in each direction. Equation (3.22) assumes that the x- and y-axes correspond to the principal axes of the elliptical Gaussian waveform (ie. \gamma_e = 0 or \gamma_e = \pi/2). If this is not the case, the rotated axes (\tilde{x}, \tilde{y}) defined by (3.19) should be used.

The effective spatial extent of the elliptical Gaussian envelope can be described by the scale and the aspect ratio. The scale is given by

\Delta s = \Delta x  if \Delta x \geq \Delta y;  \Delta y  otherwise.    (3.23)

\Delta s is the effective spatial extent of the Gaussian cross-section measured along the major axis of the ellipse. The aspect ratio is given by

a = \Delta y / \Delta x  if \Delta x \geq \Delta y;  \Delta x / \Delta y  otherwise.    (3.24)

The aspect ratio is less than or equal to unity.

If one of the principal axes of the elliptical envelope has the same orientation as the modulating wave (ie. \gamma_e = \phi_l or \gamma_e = \phi_l + \pi/2), the effective bandwidths can be described by a frequency and orientation bandwidth. The frequency bandwidth is given by

\Delta f = 1 / (2 a \Delta s)  if \gamma_e = \phi_l + \pi/2;  1 / (2 \Delta s)  if \gamma_e = \phi_l.    (3.25)

The orientation bandwidth is given by

\Delta \phi = 1 / (2 f_k \Delta s)  if \gamma_e = \phi_l + \pi/2;  1 / (2 f_k a \Delta s)  if \gamma_e = \phi_l.    (3.26)

The frequency and orientation bandwidths of a two-dimensional elementary function are shown in figure 3.9.

Figure 3.9: Frequency and Orientation Bandwidth of an Elementary Function

The two-dimensional Gabor expansion is given by

I(x, y) = \sum_{nmklp} a_{nmklp} \psi(x, y; n, m, k, l, p).    (3.27)

The sampling lattice for the two-dimensional Gabor expansion is illustrated in figure 3.10. Each spectral coordinate (f_x, f_y) has a corresponding spatial (x-y) sampling lattice.

Figure 3.10: 2D Sampling Lattice

The spacing between adjacent elementary functions for the two-dimensional generalization of Gabor's sampling lattice is given by

\Delta X = 2^{0.5} \Delta x,    (3.28)

\Delta Y = 2^{0.5} \Delta y,    (3.29)

\Delta F_x = 2^{0.5} \Delta f_x,    (3.30)

and

\Delta F_y = 2^{0.5} \Delta f_y.    (3.31)

The above spacing is valid for two-dimensional elementary functions that are separable in x and y. If the elementary functions are not separable in x and y, then a sampling lattice based on the rotated axes (\tilde{x}, \tilde{y}) is used. In the two-dimensional Gabor lattice, \Delta x, \Delta y, \Delta f_x, and \Delta f_y are the same for each elementary function. Therefore the spatial sampling intervals (\Delta X, \Delta Y) are identical for each spectral coordinate (f_x, f_y). A generic spatial sampling lattice can be formed if a common spatial origin is selected. This generic spatial lattice is shown in figure 3.11 along with the spatial frequency (spectral) lattice.

Figure 3.11: 2D Sampling Lattice, Gabor Generalization

In the segmentation algorithms of chapter 7, local (spectral) orientation characteristics from different frequency channels are compared. To obtain a local orientation, the spectral coordinates for the Gabor lattice must be transformed from Cartesian to polar form. Even after the coordinate transformation, it is difficult to compare local orientations because the orientation bandwidth of an elementary function in the Gabor lattice decreases with frequency.
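This narrowing of the orientation bandwidth can be illustrated numerically. With a fixed transverse bandwidth, the angular spread subtended at the centre frequency shrinks roughly as arctan of their ratio; the numbers below are illustrative and are not taken from the thesis.

```python
import numpy as np

# Why local orientations are hard to compare on the Cartesian Gabor lattice:
# with a fixed transverse bandwidth df, the orientation bandwidth subtended at
# centre frequency f shrinks roughly as arctan(df / f). Illustrative values
# only; the trend, not the constants, is the point.

df = 0.05                                    # fixed transverse bandwidth
freqs = np.array([0.0625, 0.125, 0.25, 0.5]) # one octave apart
dphi = np.arctan2(df, freqs)                 # orientation bandwidth per channel

print(np.degrees(dphi).round(1))             # strictly decreasing with frequency
```

Each octave step in frequency roughly halves the subtended orientation bandwidth, so coefficients from different frequency channels sample orientation with different resolutions.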
An alternative sampling scheme is the log-polar lattice. The log-polar lattice is amenable to image segmentation because its spectral representation uses frequency and orientation as basis parameters. The comparison of local orientations is simplified because all elementary functions in the log-polar lattice have the same orientation bandwidth.

For the log-polar lattice, the scale and orientation of the Gaussian envelope are restricted. The scale of the elliptical Gaussian envelope is given by

(\Delta s)_k = \lambda / f_k.    (3.32)

The (spatial) orientation of the major axis of the ellipse is normal to the (spectral) orientation of the elementary function (ie. \gamma_e = \phi_l + \pi/2). Because of these restrictions, the log-polar lattice provides a multi-scale image representation characterized by a spatial sampling density that increases with frequency. The sampling intervals for the log-polar lattice are defined as

(\Delta X)_{kl} = 2^{0.5} \lambda / f_k,    (3.33)

(\Delta Y)_{kl} = 2^{0.5} a \lambda / f_k,    (3.34)

(\Delta F)_k = 2^{0.5} f_k / (2 a \lambda),    (3.35)

and

\Delta \Phi = 2^{0.5} / (2 \lambda).    (3.36)

(\Delta X)_{kl} and (\Delta Y)_{kl} are the spatial sampling intervals for the rotated axes. Rotated axes are not required if the Gaussian envelope of the elementary function is circular (ie. a = 1). In such a case, the spatial sampling intervals are dependent on the frequency but not the orientation of the spectral coordinate. The spectral coordinates for the log-polar lattice are shown in figure 3.12. The orientation spacing is independent of the modulating frequency f_k; the frequency spacing is independent of the orientation \phi_l.

Figure 3.12: Spectral Coordinates, Log-polar Lattice

Chapter 4

Solving for Gabor Expansion Coefficients

This chapter introduces methods of solving the Gabor expansion for a one-dimensional signal s(t). These methods are generalized into two dimensions for vision analysis. In both cases, the methods are complicated by the non-zero overlap between elementary functions.
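The non-zero overlap is easy to demonstrate in one dimension. The sketch below, which assumes the chapter 2 envelope form and lattice spacing, computes the inner product of two cosine-type elementary functions one lattice step apart:

```python
import numpy as np

# Two cosine-type atoms one temporal lattice step apart are not orthogonal.
# Envelope form and spacing follow chapter 2 as reconstructed above; the
# constants are assumptions for illustration.

def atom(t, t_n, f_k, dt):
    g = np.exp(-np.pi * (t - t_n) ** 2 / (2.0 * dt ** 2))
    return g * np.cos(2.0 * np.pi * f_k * (t - t_n))

t = np.linspace(-10.0, 10.0, 20001)
step = t[1] - t[0]
DT = 2 ** 0.5 * 1.0                          # lattice spacing, eq. (2.9), dt = 1

overlap = np.sum(atom(t, 0.0, 0.5, 1.0) * atom(t, DT, 0.5, 1.0)) * step
print(abs(overlap) > 1e-3)                   # True: neighbouring atoms overlap
```

Because this inner product is small but non-zero, the projection coefficient at one lattice point picks up contributions from its neighbours, and every solution method below must compensate for that leakage.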
Because the elementary functions are non-orthogonal, the nature of a given elementary function's overlap with all other elementary functions becomes important. There are two categories of methods: direct and iterative. The direct methods are global, compensating for the overlap of the elementary functions in a single set of calculations. The iterative methods are local; they compensate for the overlap of adjacent elementary functions by using repeated local error calculations. The direct methods provide the actual expansion coefficients at their completion. The iterative methods provide an approximate solution after each iteration, supplying the actual coefficients in the limit, after the solution sequence has converged. All of the methods discussed in this chapter require a set of projection functions whose coefficients are used to calculate the coefficients of the Gabor expansion. Before proceeding to the methods of solving the Gabor expansion, it is necessary to establish a relationship between the projection coefficients and expansion coefficients.

4.1 Projection Coefficients and Expansion Coefficients

This section establishes a relationship between an arbitrary, but complete, set of projection coefficients and the Gabor expansion coefficients. The relationship introduces the overlap matrix, which plays an important role in determining the coefficients of the Gabor expansion. A projection coefficient for the signal s(t) is obtained using

c_{ijr} = ∫ s(t) PF(t; t_i, f_j, p_r) dt,   (4.37)

where PF() is the projection function. If s(t) is replaced by the Gabor expansion

s(t) = Σ_{nkp} a_{nkp} ψ(t; n, k, p),   (4.38)

then (4.37) becomes

c_{ijr} = ∫ [Σ_{nkp} a_{nkp} ψ(t; n, k, p)] PF(t; i, j, r) dt.   (4.39)

Since the a_{nkp} are constants for a given signal s(t), (4.39) can be rewritten as

c_{ijr} = Σ_{nkp} a_{nkp} ∫ ψ(t; n, k, p) PF(t; i, j, r) dt.   (4.40)

The integral in (4.40) is a continuous inner product that defines the overlap between the elementary function ψ() and the projection function PF(). This overlap, denoted by q(), is given by

q(i, j, r; n, k, p) = ∫ ψ(t; n, k, p) PF(t; i, j, r) dt.   (4.41)

Thus, (4.40) can be expressed as

c_{ijr} = Σ_{nkp} a_{nkp} q(i, j, r; n, k, p).   (4.42)

Equation (4.42) captures the relationship between the projection coefficients and the Gabor expansion coefficients. This relationship can be expressed in matrix form. Expressed as a vector, the set of expansion coefficients is given by

a = [a(1) ⋯ a(indexg) ⋯]^T,   (4.43)

where

indexg = n + k_offset (k + p_offset p),   (4.44)

and a() is the expansion coefficient defined in (4.38). The vector of projection coefficients is similarly defined as

c = [c(1) ⋯ c(indexp) ⋯]^T,   (4.45)

where

indexp = i + j_offset (j + r_offset r),   (4.46)

and c() is the projection coefficient defined by (4.39). The matrix form of (4.42) is

c = Qa,   (4.47)

where

Q = [ q(1, 1) ⋯ q(1, indexg); ⋮ ⋱ ⋮; q(indexp, 1) ⋯ q(indexp, indexg) ],   (4.48)

and q() is the overlap defined by (4.41). The overlap matrix Q can be defined for two-dimensional signals by modifying indexp and indexg to include the additional variables.

4.2 Direct Methods

The Gabor expansion coefficients can be obtained directly using the inverse of the overlap matrix or the biorthonormal projection function.

4.2.1 Inverse of Overlap Matrix

The Gabor expansion coefficients can be obtained from the projection coefficients. Premultiplying (4.47) by Q^{-1} gives

a = Q^{-1} c.   (4.49)

This method assumes that Q^{-1} exists. This inverse exists if Q is square and has full rank. The matrix is square if the number of points in the projection lattice and expansion lattice are equal.
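The relationship c = Qa (4.47) and the inverse recovery (4.49) can be checked numerically. The sketch below builds a small one-dimensional lattice of Gaussian-windowed cosines and, as in section 4.3, uses the elementary functions as their own projection functions; all lattice parameters here are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Small 1-D lattice of Gaussian-windowed cosines (illustrative parameters).
t = np.linspace(-8, 8, 4001)
dt = t[1] - t[0]
centers = [-3.0, 0.0, 3.0]          # temporal lattice points t_n
f0 = 0.5                            # single modulation frequency

def psi(tn, p):
    """Elementary function psi(t; n) with quadrature phase p (0 or pi/2)."""
    return np.exp(-np.pi * ((t - tn) / 2.0) ** 2) * np.cos(2 * np.pi * f0 * (t - tn) - p)

basis = [psi(tn, p) for tn in centers for p in (0.0, np.pi / 2)]

# Overlap matrix (eq. 4.41): q(i; n) = integral of psi_n * psi_i dt, with the
# elementary functions doubling as the projection functions.
Q = np.array([[np.sum(bi * bn) * dt for bn in basis] for bi in basis])

a = np.array([1.0, -0.5, 2.0, 0.0, 0.3, 1.5])        # expansion coefficients
s = sum(ak * bk for ak, bk in zip(a, basis))          # synthesized signal
c = np.array([np.sum(s * bi) * dt for bi in basis])   # projection coefficients

a_hat = np.linalg.solve(Q, c)       # inverse recovery, a = Q^{-1} c (eq. 4.49)
```

Because the projections are linear, c equals Qa up to floating-point rounding, and solving the square full-rank system returns the original coefficients.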
The matrix has full rank if each elementary function in the expansion lattice is an independent measurement of the signal. A common image processing technique is to filter the signal with a set of bandpass filters. A bandpass filter produces a projection coefficient at each pixel position, thereby oversampling the image. An oversampled set of projection coefficients can be used to obtain the Gabor expansion coefficients if (4.47) is premultiplied by [Q^T Q]^{-1} Q^T instead of Q^{-1}. [Q^T Q]^{-1} Q^T is used because Q is not a square matrix when the projection lattice is oversampled. The Gabor expansion coefficients can be obtained from the oversampled set of projection coefficients using

a = [Q^T Q]^{-1} Q^T c.   (4.50)

The inverse of Q^T Q exists if each elementary function in the expansion lattice is an independent measurement. (Note that this is also a necessary condition for the existence of Q^{-1}.) Thus, if the expansion lattice is correctly sampled, the projection lattice can be oversampled. Note that (4.50) can be rewritten as

a = Q_N^{-1} c_N,   (4.51)

where

Q_N = Q^T Q   (4.52)

and

c_N = Q^T c.   (4.53)

Premultiplying by Q^T effectively creates a new set of projection functions, given by (4.54). The lattice defined by the centers of the new projection functions no longer oversamples the image. This lattice contains the same points as the expansion lattice. Any method requiring equivalent projection and expansion lattices can be generalized to include oversampled projection lattices by applying the transformation defined in (4.54). Thus, the iterative methods presented in later sections of this chapter can assume equivalent lattices without loss of generality.

The inverse of the overlap matrix method is only useful for small-order signals. For large-order signals such as images, the amount of memory required to store the elements of Q^{-1} is prohibitively large. A 256 x 256 pixel image has a 65,536 x 65,536 overlap matrix if the set of elementary functions is complete. It is presently impractical to store the four billion elements.

4.2.2 Biorthonormal Projection Function

One method of circumventing the prohibitive calculation and storage costs of determining the inverse is to select a set of projection functions that are biorthonormal to the set of expansion functions. The biorthonormal relation is defined as

∫ ψ(t; n, k, p) PF(t; i, j, r) dt = { 1 if n = i, k = j, and p = r; 0 otherwise }.   (4.55)

If (4.55) is satisfied, the overlap matrix Q equals the identity matrix. Bastiaans [4] obtained an expression for PF(t; n, k, p) for the one-dimensional case. Bastiaans' expression uses the sampling lattice suggested by Gabor [1]. The biorthonormal projection function is given by

PF(t; n, k, p) = γ(t − nΔT) cos(2πkt/ΔT − 2πkn − p),   (4.56)

where

γ(t) = (2/ΔT)^{0.5} K_0^{-1.5} exp[π(t/ΔT)^2] Σ_{i+0.5 > t/ΔT} exp[−π(i + 0.5)^2]   (4.57)

and K_0 = 1.85407468. The window function (4.57) is shown in figure 4.13.

Figure 4.13: Window for Bastiaans' Biorthonormal Function

Porat and Zeevi [7] provided a generalization of Bastiaans' window function for the two-dimensional Gabor lattice. For this type of lattice, the biorthonormal window function is given by (4.58). Using a biorthonormal projection function to determine the Gabor expansion coefficients has limitations. The window function γ(t) has an infinite support width; a truncated window must be used in practical implementations, thereby introducing errors. The window function presented above is only valid for Gabor sampling lattices; irregular sampling lattices require a different window function for each point on the sampling lattice.

4.3 Iterative Least Squares Solution

The Gabor expansion can be solved using an iterative least squares approach. This section outlines iterative methods for solving the one-dimensional Gabor expansion. The methods can be extended into two dimensions by including the additional variables. The least squares estimate of a_{nkp} minimizes the following cost function:

J = (1/2) ∫ e^2(t) dt,   (4.59)

where

e(t) = s(t) − Σ_{nkp} a_{nkp} ψ(t; n, k, p).   (4.60)

The derivative of the cost function is given by (4.61), where

J_a(n, k, p) = δJ / δa(n, k, p).   (4.62)

The matrix forms of (4.59) and (4.61) are given by (4.63) and (4.64), respectively:

J = (1/2) a^T Q a − c^T a + const,   (4.63)

(∇J) = Qa − c,   (4.64)

where

(∇J) = [J_a(1, 1, 1) ⋯ J_a(N, K, P)]^T.   (4.65)

The cost function is minimized when the gradient (∇J) is zero. The gradient of the cost function is equal to the projection of the Gabor waveform onto the error e(t). The Gabor waveform, in this case, is used as both the projection and elementary functions. The resulting overlap matrix Q is equivalent to the Gram matrix [19], since both are defined by the continuous inner product of pairs of elementary functions. The overlap matrix is also equivalent to the Hessian matrix [20], since the pair-wise overlaps of the elementary functions are the same as the second partial derivatives of the cost function (4.59).

4.3.1 Steepest Descent

Daugman [3] proposed a neural network configuration to solve for the coefficients of the Gabor expansion. The proposed configuration is equivalent to the steepest descent method used in non-linear optimization [20]. In this thesis, Daugman's approach will be referred to as the "steepest descent method." An inverse calculation is not required for the steepest descent method. The steepest descent method uses local error information, provided by the gradient, to update the expansion coefficient estimates. The expansion coefficient update equation is given by

a_{i+1} = a_i − β(∇J),   (4.66)

where β is the convergence factor.
If the convergence factor is properly chosen, the coefficient estimates will improve with each iteration, providing the solution in the limit. The rate of convergence is controlled by the convergence factor β. The optimal convergence factor β_i* at iteration i is given by [20]

β_i* = [(∇J)^T (∇J)] / [(∇J)^T Q (∇J)].   (4.67)

Since the optimal convergence factor is dependent on the error gradient, it must be recomputed for each iteration. Excessive computational and storage requirements associated with the overlap matrix Q make (4.67) impractical for large-order signals such as images. An alternative method is to use a constant β. The constant must be small enough to guarantee convergence. The sequence will converge if the constant β is chosen such that

0 < β < 2β_i*.   (4.68)

The derivation of (4.67) and (4.68) can be found in appendix A. The constraint (4.68) requires knowledge of β_i*. Fortunately, the lower bound for β_i* can be estimated once the sampling lattice is defined. A more conservative version of (4.68) is given by

0 < β < 2β*_{LB} ≤ 2β_i*,   (4.69)

where β*_{LB} is the lower bound of β_i*. Once the lower bound is determined, the constant β should be chosen such that

β*_{LB} ≤ β < 2β*_{LB}.   (4.70)

A good choice for β is near the lower bound β*_{LB}. The convergence factor defined by (4.67) is global; thus all coefficient estimates are updated using the same β. It is possible to assign a local convergence factor to each point in the sampling lattice. The lower bound for the optimal convergence factor at the lattice point (n, k, p) is approximated by

β*_{LB}(n, k, p) = 1 / Σ_{ijr} |q(i, j, r; n, k, p)|.   (4.71)

The derivation of (4.71) can be found in appendix B. The minimum from the set of β*_{LB}(n, k, p) can be used as the global lower bound β*_{LB}.

4.3.2 Warping Method

The warping method is a variant of the steepest descent method, modeled after quasi-Newton methods used in non-linear optimization [20].
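The constant-β update (4.66) can be sketched in a few lines; the step size below is taken from the row-sum lower bound of the form in (4.71), and the small symmetric overlap matrix with unit diagonal is an illustrative stand-in, not a matrix from the thesis.

```python
import numpy as np

# Steepest-descent sketch (eq. 4.66): a_{i+1} = a_i - beta (Q a_i - c).
# Q is an illustrative symmetric positive-definite overlap matrix with
# unit diagonal.
Q = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
a_true = np.array([2.0, -1.0, 0.5])
c = Q @ a_true                       # projection coefficients (eq. 4.47)

beta = 1.0 / np.abs(Q).sum(axis=1).max()   # conservative step from a row-sum bound
a = np.zeros(3)
for _ in range(200):
    grad = Q @ a - c                 # error gradient (eq. 4.64)
    a = a - beta * grad
```

With β below twice the optimal value, the iteration contracts toward Q^{-1}c; the rate is governed by the eigenvalue spread of Q, which motivates the warping method of the next subsection.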
In a quasi-Newton implementation, an approximation to the inverse of the Hessian (overlap) matrix is used to improve the convergence of the iterative sequence. In the warping method, an approximation to the inverse of the overlap matrix is used to partially compensate for the overlap of neighbouring elementary functions. This approximation is referred to as the "warping matrix." The better the approximation, the greater the improvement in the rate of convergence over the steepest descent method. The rate of convergence for the steepest descent method is limited by the eigenvalue spread of the overlap (Hessian) matrix [20]. If the spread is large, many iterations are required to reach the solution. The convergence can be improved by warping the error gradient (∇J). The new rate of convergence depends on the eigenvalue spread of the product WQ, where W is the warping matrix. The warping matrix premultiplies the update term, as seen in (4.72):

a_{i+1} = a_i − βW(∇J).   (4.72)

Ideally the warping matrix should equal Q^{-1}, making all eigenvalues equal. Unfortunately, the matrix Q requires too much memory for large-order signals such as images. Instead of using a full matrix, a reduced-order overlap matrix Q_{nkp} may be used. Q_{nkp} describes the overlap characteristics for a local neighbourhood centered about the lattice point (n, k, p). The local neighbourhood shown in figure 4.14 contains the elementary function (n, k, p) and its temporal and spectral neighbours. Once the local neighbourhood is defined, the warping matrix is given by

W_{nkp} = Q_{nkp}^{-1}.   (4.73)

The warping vector for the Gabor expansion coefficient a_{nkp} is given by

w_{nkp} = Γ_{nkp} W_{nkp},   (4.74)

where Γ_{nkp} is an operator that selects the row associated with a_{nkp} from the matrix W_{nkp}. The coefficient update equation becomes

a_{nkp} = a_{nkp} − β w_{nkp} (∇J)_{nkp},   (4.75)

where (∇J)_{nkp} is the gradient of the Gabor expansion coefficients contained in the local neighbourhood centered at (n, k, p). A warping vector w_{nkp} must be calculated for each point in the sampling lattice. If the size of the warping vector is denoted by N_W, the memory required to store the set of warping vectors is N_W N_{EF}. The memory requirements are much less than those of the full matrix Q if N_W << N_{EF}. The memory requirements are further reduced if most of the local neighbourhoods have the same overlap characteristics. In this case, most of the points in the sampling lattice will have the same warping vector. A local neighbourhood for a one-dimensional Gabor lattice is shown in figure 4.14. With the exception of the endpoints in the lattice, all local neighbourhoods have the same overlap characteristics. Thus, only one warping vector is required for the Gabor sampling lattice. The number of unique neighbourhoods for a logarithmic lattice depends on the neighbourhood size and the frequency spacing. One warping vector is required if the local neighbourhood is limited to temporal neighbours. Two warping vectors, one for each local neighbourhood, are required for the logarithmic sampling lattice shown in figure 4.15.

Figure 4.14: Local Neighbourhood for Gabor Lattice

4.3.3 Locally Biorthonormal Projection Function

The update term in (4.75) can be expressed as

w_{nkp}(∇J)_{nkp} = −∫ e(t) [Σ_{ijr} w_{nkp}(i, j, r) ψ(t; i, j, r)] dt.   (4.76)

The new projection function created by the warping vector is given by

PF(t; n, k, p) = Σ_{ijr} w_{nkp}(i, j, r) ψ(t; i, j, r).   (4.77)

This projection function is orthonormal to all elementary functions in the local neighbourhood and is referred to as the "locally biorthonormal projection function."
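The warping update of section 4.3.2 can be sketched by inverting only a banded, reduced-order part of the overlap matrix and using it to precondition the gradient; the matrices, band widths, and step size below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Warping-method sketch (eq. 4.72): a_{i+1} = a_i - beta W (Q a_i - c),
# where W approximates Q^{-1} using only nearest-neighbour overlaps.
n = 8
Q = (np.eye(n)
     + 0.30 * (np.eye(n, k=1) + np.eye(n, k=-1))    # adjacent overlaps
     + 0.05 * (np.eye(n, k=2) + np.eye(n, k=-2)))   # weaker distant overlaps
a_true = np.arange(1.0, n + 1)
c = Q @ a_true

# Reduced-order warping matrix: invert only the tridiagonal neighbourhood.
W = np.linalg.inv(np.eye(n) + 0.30 * (np.eye(n, k=1) + np.eye(n, k=-1)))

def iterate(precond, steps, beta=0.5):
    a = np.zeros(n)
    for _ in range(steps):
        a = a - beta * precond @ (Q @ a - c)
    return a

a_warp = iterate(W, 60)          # warped gradient
a_sd = iterate(np.eye(n), 60)    # plain steepest descent, same step count
```

Because WQ clusters its eigenvalues near one, the warped iteration reaches a given accuracy in fewer steps than plain steepest descent on the same problem.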
Figure 4.15: Local Neighbourhoods for Logarithmic Sampling Lattice

The support width of the projection function defined by (4.77) is larger than the support width of the elementary function. In a practical implementation, the locally biorthonormal projection function is truncated. Because the locally biorthonormal projection function is used in an iterative implementation, the error introduced by truncation affects the convergence properties, not the final estimate of the Gabor expansion coefficients. The remainder of this subsection outlines a method for calculating the optimal truncated projection function for a given support width. Assume the projection and elementary functions are normalized such that

∫ ψ(t; n, k, p) PF(t; n, k, p) dt = 1.   (4.78)

The normalization makes the diagonal elements of the overlap matrix Q unity. The normalized overlap matrix Q can be written as

Q = I + V,   (4.79)

where V is a matrix containing all the off-diagonal elements of Q. Note that

[I + V][I − V] = I − V^2.   (4.80)

If all the elements of V are small (i.e., much less than one), then

I − V^2 ≈ I   (4.81)

and

Q^{-1} ≈ I − V = 2I − Q.   (4.82)

Using (4.82), an estimate of the locally biorthonormal projection function is recursively defined by

PF_{i+1}(t; n, k, p) = 2 PF_i(t; n, k, p) − Σ_{ijr} q(i, j, r; n, k, p) PF_i(t; i, j, r).   (4.83)

Since (4.83) is a recursive expression, it is necessary to select an initial set of projection functions. For convenience, the elementary functions of the Gabor expansion are used as the initial set of projection functions. The optimal truncated projection function for a given support width is obtained as follows:

• Normalize the projection function PF_i(t; n, k, p) to satisfy (4.78).

• Calculate the overlap elements q(i, j, r; n, k, p).

• Calculate the new projection function using (4.83).

• Truncate the projection function to the pre-specified support width.

• Repeat until the sequence has converged.
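In matrix terms, the recursion (4.83) without truncation is a Newton-Schulz-type iteration for Q^{-1}: collecting the projection-function weights into a matrix X, the update reads X_{i+1} = 2X_i − X_i Q X_i. A minimal sketch, with an illustrative near-identity overlap matrix and the elementary functions (the identity) as the initial projection set:

```python
import numpy as np

# Newton-Schulz sketch of the recursion (4.83): converges to Q^{-1}
# when the off-diagonal overlap V = Q - I is small (eqs. 4.79-4.82).
Q = np.array([[1.00, 0.20, 0.05],
              [0.20, 1.00, 0.20],
              [0.05, 0.20, 1.00]])

X = np.eye(3)                 # initial projection set: the elementary functions
for _ in range(30):
    X = 2 * X - X @ Q @ X     # matrix form of PF_{i+1} = 2 PF_i - sum q PF_i
```

The first step reproduces the one-shot approximation Q^{-1} ≈ 2I − Q of (4.82); further steps refine it quadratically. Truncating X's rows to a finite support, as in the procedure above, trades this accuracy for storage.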
The recursive procedure will converge if the sampling lattice is defined such that the Gabor expansion is unique. The resulting set of projection functions can be used in either the steepest descent or warping methods. An interesting example of the locally biorthonormal projection function for the Gabor sampling lattice occurs when the support width is restricted to ΔT. The window function for this example is given by

γ(t) = { K exp[π(t/ΔT)^2] for −ΔT/2 < t ≤ ΔT/2; 0 otherwise },   (4.84)

where K is a normalization constant. This window function, suggested by Gabor [1] in his 1946 paper, coincides with the central lobe of Bastiaans' window function (see figure 4.13).

4.4 Suitability for Vision Analysis

The segmentation algorithms in the later chapters use the log-polar form of the Gabor expansion. This section discusses the suitability of each method for solving the log-polar form of the Gabor expansion in the computer vision context. The inverse of the overlap matrix method is useful when the number of elementary functions in the expansion lattice is small. Since an image requires a large expansion lattice, this method is impractical for computer vision analysis. The biorthonormal projection function is designed for the Gabor lattice. The method can be modified to accommodate the log-polar lattice; however, the storage costs are high. Because the spatial sampling density varies with the modulation frequency of the elementary function, the overlap characteristics are dependent on the lattice position. Thus, a different window function must be stored for each point in the lattice. The steepest descent method is the simplest and most flexible of the iterative approaches. If the convergence factor is sufficiently small, the steepest descent method can be used on any type of lattice, including the log-polar lattice. Because the steepest descent method is based on local error calculations, it is suitable for parallel implementation. The drawback of this method is that its rate of convergence is the slowest of the three iterative approaches. The warping method has improved convergence properties over the steepest descent approach. This improvement in convergence is the result of increased complexity. In the warping method, all error gradients in the local neighbourhood must be calculated before a coefficient can be updated. This serial requirement complicates parallel implementation. The warping method can be used on the log-polar lattice. If the local neighbourhood is properly chosen, the number of unique warping vectors can be minimized. In defining a local neighbourhood, it should be remembered that there is a tradeoff between the neighbourhood size and the rate of convergence. A good compromise is a local neighbourhood containing only spatial neighbours. In this case, only one warping vector is required. The locally biorthonormal projection function has improved convergence properties without introducing serialism. This method, however, requires a different window function for each unique local neighbourhood. If the local neighbourhood contains only spatial neighbours, one window function per modulating frequency per orientation is required. The locally biorthonormal projection function method is used in the segmentation examples presented in chapter 8. In these examples, the log-polar lattice comprises circular elementary functions; thus the orientation neighbours can be included in the local neighbourhood without increasing the number of window functions. For the local neighbourhood containing spatial and orientation neighbours, one window function per modulating frequency is required.

Chapter 5

Implementing a Log-Polar Lattice

This chapter begins the second part of the thesis: the presentation of the Gabor expansion-based segmentation algorithms.
The log-polar form of the Gabor expansion is used in the segmentation algorithms because the "log" provides a multi-scale representation of the image and the "polar" provides frequency and orientation as canonical variables. These properties are useful for creating "local measures," an important primitive used in the segmentation algorithms. The log-polar lattice is defined by a set of orientation, frequency, and spatial coordinates. This chapter outlines the steps required to implement a log-polar sampling lattice. A special case of the log-polar lattice that contains circular elementary functions is addressed in the final section. This special lattice is used in subsequent chapters.

5.1 Orientation

If the Gabor expansion is to be complete and unique, the orientation spacing must be selected such that

ΔΦ = π / n_φ,   (5.85)

where n_φ, the number of orientations, is an integer. From (5.85) it can be seen that the number of orientations in the log-polar lattice determines the orientation spacing. After the orientation spacing is determined, the orientation coordinates are given by

φ_l = φ_ref + lΔΦ,   (5.86)

where 1 ≤ l ≤ n_φ and φ_ref is a reference orientation. In the segmentation examples of chapter 8, the reference orientation is φ_ref = −π/2; the number of orientations is n_φ = 4. The orientation spacing has been previously defined by (3.36). Equation (3.36) is independent of frequency, determined solely by λ. Combining (3.36) and (5.85), λ can be expressed in terms of the number of orientations n_φ (5.87).

5.2 Frequency

The frequency spacing is given by

f_k − f_{k−1} = (ΔF_k + ΔF_{k−1}) / 2.   (5.88)

Unlike the orientation spacing, the frequency spacing is dependent on the modulation frequency of the elementary function. A given modulation frequency can be defined in terms of a higher modulation frequency.
The recursive relationship between adjacent modulation frequencies is given by (5.89). Since λ is determined by the orientation spacing, only a and the reference frequency f_ref are required to define the frequency coordinates. The frequency coordinates are given by (5.90), where Δk is the difference in the frequency index between the reference frequency f_ref and the desired frequency f_k.

5.3 Spatial Sampling Lattice

The sampling intervals in the spatial domain (ΔX, ΔY) can be defined in terms of a, λ, f_k, and φ_l. Since these variables have been defined in the previous sections, the spatial sampling intervals for each spectral coordinate (f_k, φ_l) are already determined. The spatial lattices will be defined once a spatial origin (x_ref, y_ref) is selected for each spectral coordinate. The coordinates for the rotated x-y lattice are defined as

x(n, m, k, l) = x_ref(k, l) + (2^{0.5} λ / f_k)[n sin φ_l − ma cos φ_l]   (5.91)

and

y(n, m, k, l) = y_ref(k, l) + (2^{0.5} λ / f_k)[n cos φ_l + ma sin φ_l].   (5.92)

5.4 Special Case: Circular Elementary Functions

Setting a = 1 simplifies the implementation of the log-polar lattice. All the elementary functions will have circular envelopes; thus the inconvenience of using rotated axes is avoided. If a common spatial origin is used, all spatial sampling lattices for a given frequency are identical. The x and y coordinates for this restricted lattice are given by (5.93) and (5.94), respectively:

x(n, k) = x_ref(k) + n(2^{0.5} λ / f_k),   (5.93)

y(m, k) = y_ref(k) + m(2^{0.5} λ / f_k).   (5.94)

Chapter 6

Local Measures

In order to form relevant pixel groupings, segmentation algorithms must cluster adjacent regions that are similar. The similarity of regions is gauged by comparing "local measures." All local measures defined in this chapter (local magnitude, local preferred orientation, local moment of inertia, local difference in phase, and local difference in preferred orientation) are calculated from Gabor expansion coefficients.
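The lattice construction of sections 5.1 to 5.4 can be sketched for the circular case (a = 1). The number of orientations and reference orientation follow the thesis's chapter 8 examples; the reference frequency, frequency ratio, and envelope constant λ are illustrative assumptions.

```python
import numpy as np

# Log-polar lattice sketch, circular elementary functions (a = 1).
n_phi = 4
d_phi = np.pi / n_phi                                 # orientation spacing (eq. 5.85)
phi_ref = -np.pi / 2                                  # reference orientation
orientations = [phi_ref + l * d_phi for l in range(1, n_phi + 1)]   # (eq. 5.86)

f_ref, rho, lam = 0.25, 2.0, 1.0                      # assumed reference values
frequencies = [f_ref * rho ** dk for dk in range(4)]  # geometric frequency ladder

def spatial_lattice(f_k, n_range=range(-1, 2)):
    """One square spatial lattice per frequency (eqs. 5.93-5.94), spacing
    2^0.5 lam / f_k about a common origin (x_ref, y_ref) = (0, 0)."""
    step = 2 ** 0.5 * lam / f_k
    return [(n * step, m * step) for n in n_range for m in n_range]

lattice = {f_k: spatial_lattice(f_k) for f_k in frequencies}
```

The construction makes the multi-scale behaviour explicit: halving the frequency doubles the spatial spacing, so sampling density grows with frequency as required by (3.33).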
Thus, these measures are localized to the spatial region and frequency band defined by an elementary function centered at (x_n, y_m, f_k). A local measure is categorized according to the number of images used in its calculation. The local difference in phase and the local difference in preferred orientation require the Gabor expansion coefficients from two images; the other local measures require the coefficients from a single image. The Gabor expansion coefficients occur as quadrature pairs. An alternative definition uses a magnitude and a phase. The magnitude at the lattice point (x_n, y_m, f_k, φ_l) is given by

M(n, m, k, l) = [a^2(n, m, k, l, 0) + a^2(n, m, k, l, π/2)]^{0.5}.   (6.95)

M(n, m, k, l) is referred to as the "oriented magnitude." The phase at (x_n, y_m, f_k, φ_l) is given by

θ(n, m, k, l) = arctan[a(n, m, k, l, π/2) / a(n, m, k, l, 0)].   (6.96)

6.1 Local Measures for a Single Image

Three local measures are proposed for use on a single image. These measures include the local magnitude, the local preferred orientation, and the normalized local moment of inertia. The latter two measures use the local magnitude as a confidence estimate.

6.1.1 Local Magnitude

The magnitude of the image localized about (x_n, y_m, f_k) is given by

M(n, m, k) = Σ_l M(n, m, k, l).   (6.97)

Equation (6.97) assumes that the spatial lattice points (x_n, y_m) are the same for each orientation φ_l (i.e., a = 1 and a common spatial reference). If the lattices are different, some method of interpolation is required to align the magnitudes from different orientations.

6.1.2 Local Preferred Orientation and Moment of Inertia

Another single-image measure is the local preferred orientation. The local preferred orientation is defined using a modified version of the moment of inertia of masses equation. The moment of inertia of masses is given by [21]

I_T = Σ_i r_i^2 Δm_i,   (6.98)

where Δm_i is a mass and r_i is the radial distance of the mass from the axis of rotation T (see figure 6.16).
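The magnitude-and-phase rewriting of equations (6.95) to (6.97) can be sketched directly; the coefficient values below are illustrative, and arctan2 is used so the phase covers the full (−π, π] range.

```python
import numpy as np

# Oriented magnitude and phase from a quadrature pair (eqs. 6.95-6.96),
# plus the local magnitude (eq. 6.97) summed over orientations.
def magnitude_phase(a_cos, a_sin):
    """a_cos = a(n,m,k,l,0); a_sin = a(n,m,k,l,pi/2)."""
    M = np.hypot(a_cos, a_sin)        # oriented magnitude (eq. 6.95)
    theta = np.arctan2(a_sin, a_cos)  # phase (eq. 6.96)
    return M, theta

# Illustrative coefficients for four orientations at one lattice point.
a_cos = np.array([3.0, 0.0, -1.0, 0.6])
a_sin = np.array([4.0, 2.0, 0.0, 0.8])
M, theta = magnitude_phase(a_cos, a_sin)
M_local = M.sum()                     # local magnitude (eq. 6.97)
```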
In the present application, the moment of inertia is defined in the spatial frequency (spectral) domain, as shown in figure 6.17. The mass Δm_i is replaced by the oriented magnitude M(n, m, k, l). The radial distance is defined by

r_l = f_k sin(φ_l − φ_T),   (6.99)

where φ_T is the orientation of the axis of rotation T. At each point (x_n, y_m, f_k) there is a localized moment of inertia for an axis of rotation T. This local moment of inertia is given by

I_T(n, m, k) = f_k^2 Σ_l M(n, m, k, l) sin^2[φ_l − φ_T(n, m, k)].   (6.100)

The axis with the minimum moment of inertia is defined as the local preferred orientation. The orientations of the axes with the minimum and maximum moments of inertia are given by

φ_T(n, m, k) = 0.5 arctan[Σ_l M(n, m, k, l) sin 2φ_l / Σ_l M(n, m, k, l) cos 2φ_l].   (6.101)

Two orientations are obtained from (6.101) due to the fact that

tan(2φ_T) = tan(π + 2φ_T).   (6.102)

The two orientations defined by (6.101) are substituted into (6.100) to determine which corresponds to the minimum moment of inertia. The axes with the minimum and maximum moments of inertia are normal to each other. The preferred orientation φ_T(n, m, k) is independent of the local magnitude M(n, m, k). The moment of inertia, as defined in (6.100), is dependent on M(n, m, k). To eliminate its dependence on the local magnitude, the moment of inertia can be normalized such that

Ī_T(n, m, k) = Σ_l M(n, m, k, l) sin^2[φ_l − φ_T(n, m, k)] / Σ_l M(n, m, k, l).   (6.103)

The new measure, the normalized local moment of inertia, is independent of M(n, m, k) and is restricted to the range

0 ≤ Ī_T(n, m, k) ≤ 0.5.   (6.104)

Since a small normalized moment of inertia indicates tight clustering about the preferred orientation, Ī_T(n, m, k) can be considered a pseudo-variance of the local orientations. In later stages of visual processing, the local measures will be combined to characterize a larger pixel grouping.
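The preferred-orientation computation of equations (6.101) and (6.103) can be sketched as follows; the orientation set matches a four-orientation lattice, while the magnitude values are illustrative. The ambiguity noted in (6.102) is resolved by evaluating both candidate axes and keeping the one with minimum inertia.

```python
import numpy as np

# Local preferred orientation (eq. 6.101) and normalized local moment of
# inertia (eq. 6.103) from oriented magnitudes at one lattice point.
phis = np.array([0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4])

def moment(M, phi_T):
    """Normalized moment of inertia about axis phi_T (eq. 6.103)."""
    return np.sum(M * np.sin(phis - phi_T) ** 2) / np.sum(M)

def preferred_orientation(M):
    phi = 0.5 * np.arctan2(np.sum(M * np.sin(2 * phis)),
                           np.sum(M * np.cos(2 * phis)))   # eq. 6.101
    # (6.101) yields two candidate axes (eq. 6.102); keep the minimum-inertia one.
    return min((phi, phi + np.pi / 2), key=lambda p: moment(M, p))

M = np.array([0.1, 5.0, 0.1, 0.1])     # energy clustered near 45 degrees
phi_T = preferred_orientation(M)
I_T = moment(M, phi_T)                 # small value: tight orientation clustering
```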
In order to combine local measures properly, each measure is assigned a confidence estimate. The confidence of a measure is inversely proportional to the sensitivity of the measure to errors in the oriented magnitude M(n, m, k, l). The sensitivity derivative for φ_T(n, m, k) is given by

δφ_T(n, m, k) / δM(n, m, k, l) = Ī_T(n, m, k) / M(n, m, k).   (6.105)

Thus, the confidence of the measure is high if the normalized local moment of inertia is small and the local magnitude is large. For simplicity, the confidence of the local preferred orientation measure will be defined as

C_{φ(T)}(n, m, k) = M(n, m, k).   (6.106)

The sensitivity derivative for Ī_T(n, m, k) is given by

δĪ_T(n, m, k) / δM(n, m, k, l) = [sin^2(φ_l − φ_T) − Ī_T(n, m, k)] / M(n, m, k).   (6.107)

If the local magnitude is large, the confidence of the measure is high. The confidence of the normalized local moment of inertia measure will be defined as

C_{Ī(T)}(n, m, k) = M(n, m, k).   (6.108)

6.2 Local Measures for Two Related Images

If two related images are available, the differences between the reference image and the other image can be used as measures. Two difference measures are proposed: the local difference in phase and the local difference in preferred orientation. The difference in phase is localized to the region defined by an elementary function centered at (x_n, y_m, f_k, φ_l). To be consistent with the single-image measures, all measures should be localized about (x_n, y_m, f_k). This can be achieved by splitting the local difference in phase into two measures: the magnitude and direction of translational displacement. The difference in preferred orientation is localized about (x_n, y_m, f_k).
6.2.1 Local Difference in Phase The difference in phase at the lattice point (xn,ym, fki<Pi) is given by A0(n, m, fc, I) = arctan(w) (6.109) where _ a0(tt, m, fc, Z,0)ai(ra, m,k,l, f ) - 00(71, m, fc, Z, f )ai(n, m, fe, Z, 0) ao(n,m,fe,/,0)a1(n,m,fe,/,0) -f O o ( n , m , f e , / , £ ) a i ( n , m , f e , / , ^ ) ' The subscript 0 of a0 denotes the reference image; the subscript 1 of ax denotes the other image. The sensitivity derivatives for AO are given by SA0(n,m, fc,Z) o0(n,m, k,l,p + ^) 6oo(n, m, k,l,p) [Mo(n, m, fe, Z)]2 and (6.111) 6A6(ntm,fe,Z) _ ai(n,m,fe,Z,p + f) i i 2 £ai (n,m, fc, l,p) [Mi(n.,m, fc, Z)]2 If the oriented local magnitude M{n,m,k,l) is large for both images, the confidence of the measure is high. The confidence of the measure will be defined as Cz\o(n, m, fc, Z) = < M0(n, m, fc, Z) if Mo(n, m, fc, Z) < Mi(n, m, fc, Z) (6.113) Mi(n,m, fc,Z) otherwise. Oppenheim and Lim [22] stress the importance of phase information in the Fourier representation of images. It is shown that the structure of an image is stored in the phase information. Zeevi and Porat [23] found that the structure is also stored in the localized Chapter 6. Local Measures 50 phase information obtained from Gabor expansion coefficients. If the structure is shifted due to a displacement between two images, then there should be an associated shift in the local phase. I propose that A6 be used to estimate small translational displacements of an elementary function. The local phase shift A0(n,m, k, I) can be approximated by a translation of the ele-mentary function (xn,ym,fk,fa) along the axis denned by fa. Consider as an example an elementary function whose preferred orientation is along the x-axis. Such an elementary function is given by VKz,.y;Zn,3/m,/fc,0,p) = g{y\ym)g{x-yxn)cos[2irfk{x - xn) - p], (6.114) where g{x,xn) = eM-*{X~2lffh (6.H5) If (6.114) is translated along the x-axis by Sx, then il>(x,y;xn+8x,ym,fk,0,p) = g{y;ym)g{x;xn+Sx)cos[2Tt fk(x-xn)-2-K fkSx-p). 
(6.116)

If (6.114) is phase shifted by Δθ, then

ψ(x,y; x_n, y_m, f_k, 0, Δθ + p) = g(y; y_m) g(x; x_n) cos[2π f_k (x − x_n) − Δθ − p].  (6.117)

If g(x; x_n) ≈ g(x; x_n + δx) (i.e., δx is small and λ is large), then

Δθ ≈ 2π f_k δx.  (6.118)

If an object represented by an elementary function moves, a phase shift is induced. The phase shift can be used to estimate the component of the object's motion along the elementary function's preferred orientation, φ_l. If the magnitude and direction of the object's translational displacement are denoted by D(n,m,k) and φ_D(n,m,k), respectively, then the component of the translational displacement in the direction φ_l (referred to as the "component displacement") is given by

D(n,m,k,l) = D(n,m,k) cos[φ_D(n,m,k) − φ_l].  (6.119)

If D(n,m,k,l) is negative, the direction of the component displacement is π + φ_l.

In (6.119), the object's component displacement is measured using Δθ. Since Δθ is restricted to (−π, π], the measurable component displacements are restricted to

−1/(2f_k) < D(n,m,k,l) ≤ 1/(2f_k).  (6.120)

Larger component displacements are incorrectly estimated because the measure is aliased into the restricted range.

Equation (6.119) has two unknown variables: the magnitude D(n,m,k) and the direction φ_D(n,m,k). If two component displacements are available, both variables can be obtained using

D(n,m,k) = [D_x²(n,m,k) + D_y²(n,m,k)]^0.5  (6.121)

φ_D(n,m,k) = arctan[D_y(n,m,k) / D_x(n,m,k)],  (6.122)

where

D_x(n,m,k) = [D(n,m,k,i) sin φ_j − D(n,m,k,j) sin φ_i] / sin(φ_j − φ_i)  (6.123)

D_y(n,m,k) = [D(n,m,k,j) cos φ_i − D(n,m,k,i) cos φ_j] / sin(φ_j − φ_i).  (6.124)

The above solution requires two component displacements. Since a typical log-polar lattice contains more than two orientations, a weighted combination of all component displacements can be used. The weight assigned to each component displacement equals the confidence of the local difference in phase measure.
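Equation (6.118) gives a direct way to turn a measured local phase difference into a component displacement, valid only while the displacement stays inside the alias-free range of (6.120). A short sketch (function names are illustrative):

```python
import numpy as np

def component_displacement(dtheta, fk):
    """Invert Eq. (6.118): dtheta ≈ 2*pi*fk*dx, so dx ≈ dtheta/(2*pi*fk).

    Only trustworthy for |dx| < 1/(2*fk); larger displacements alias
    because dtheta is restricted to (-pi, pi].
    """
    return dtheta / (2.0 * np.pi * fk)

def alias_free_limit(fk):
    """Half-period bound on the measurable component displacement, Eq. (6.120)."""
    return 1.0 / (2.0 * fk)
```

For example, a phase shift of π/2 at f_k = 0.25 cycle/pixel corresponds to a one-pixel component displacement, well inside the two-pixel alias-free limit.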
The weighted least squares estimate for D(n,m,k) and φ_D(n,m,k) minimizes

J = Σ_l C_Δθ(n,m,k,l) e²(n,m,k,l),  (6.125)

where

e(n,m,k,l) = D(n,m,k,l) − D(n,m,k) cos[φ_D(n,m,k) − φ_l].  (6.126)

The solution which minimizes (6.125) is given by

D_x(n,m,k) = [Σ_{i,j} w_{ij}(n,m,k) D(n,m,k,i) sin φ_j] / [Σ_{i,j} w_{ij}(n,m,k) sin(φ_j − φ_i)]  (6.127)

D_y(n,m,k) = [Σ_{i,j} w_{ij}(n,m,k) D(n,m,k,j) cos φ_i] / [Σ_{i,j} w_{ij}(n,m,k) sin(φ_j − φ_i)],  (6.128)

where

w_{ij}(n,m,k) = C_Δθ(n,m,k,i) C_Δθ(n,m,k,j) sin(φ_j − φ_i).  (6.129)

6.2.2 Local Difference in Preferred Orientation

The local difference in preferred orientation is given by

Δφ_T(n,m,k) = 0.5 arctan [ Σ_{i,j} M_0(n,m,k,i) M_1(n,m,k,j) sin 2(φ_j − φ_i) / Σ_{i,j} M_0(n,m,k,i) M_1(n,m,k,j) cos 2(φ_j − φ_i) ].  (6.130)

The sensitivity derivatives for the local difference in preferred orientation can be expressed as

δΔφ_T(n,m,k)/δM_0(n,m,k,i) = δφ_T,0(n,m,k)/δM_0(n,m,k,i)  (6.131)

and

δΔφ_T(n,m,k)/δM_1(n,m,k,j) = δφ_T,1(n,m,k)/δM_1(n,m,k,j).  (6.132)

The confidence of the measure will be defined as

C_Δφ(T)(n,m,k) = { C_φ(T),0(n,m,k) if C_φ(T),0 < C_φ(T),1; C_φ(T),1(n,m,k) otherwise.  (6.133)

The local difference in preferred orientation is also subject to aliasing errors. If the local moment of inertia is zero, the measurable difference is restricted to

−π/2 < Δφ_T ≤ π/2.  (6.134)

The smallest range of measurable differences in the preferred orientation occurs when the normalized local moment of inertia is one-half. This range is given by

−π/4 < Δφ_T ≤ π/4.  (6.135)

Chapter 7

Forming and Characterizing Pixel Groupings

This chapter presents the Gabor expansion-based segmentation algorithms, which include edge detection, texture segmentation, motion segmentation, and segmentation by depth. The segmentation algorithms cluster adjacent spatial regions that contain similar local measures, thereby forming meaningful pixel groupings.
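Since e(l) in (6.126) is linear in D_x = D cos φ_D and D_y = D sin φ_D, the weighted least squares fit of (6.125) can also be computed from the normal equations, which is equivalent to the closed form (6.127)-(6.129). A sketch under that linear parameterization (names are illustrative):

```python
import numpy as np

def displacement_from_components(d, phi, conf):
    """Weighted least-squares fit of D(l) ≈ Dx*cos(phi_l) + Dy*sin(phi_l),
    the linear form of D*cos(phi_D - phi_l) in Eqs. (6.125)-(6.126).

    d: component displacements, phi: orientations, conf: confidences C(l).
    Returns (magnitude D, direction phi_D).
    """
    d = np.asarray(d, dtype=float)
    phi = np.asarray(phi, dtype=float)
    w = np.asarray(conf, dtype=float)
    A = np.stack([np.cos(phi), np.sin(phi)], axis=1)
    # Solve (A^T W A) [Dx, Dy]^T = A^T W d
    dx, dy = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * d))
    return float(np.hypot(dx, dy)), float(np.arctan2(dy, dx))
```

With noise-free component displacements from (6.119) the fit recovers the true magnitude and direction exactly, given at least two distinct orientations.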
Once the pixel groupings are formed, a more global characterization is created by combining local measures. The global characterization often provides a three-dimensional description of the objects being viewed.

7.1 Forming Pixel Groupings

The local measures of the previous chapter are the basis of the pixel grouping techniques presented in this chapter. The techniques include edge detection, texture segmentation, motion segmentation, and segmentation by depth.

7.1.1 Edge Detection and Texture Segmentation

Edge detection and texture segmentation have similar implementations. Both techniques use a magnitude map that describes the spatial positioning of local magnitudes M(n,m,k) for a given frequency f_k. There is a magnitude map for each frequency f_k. Higher frequency magnitude maps have a higher density of local magnitudes.

Both the edge detection and texture segmentation techniques search for large values in the magnitude maps. In texture segmentation, large local magnitudes are connected to form regions. If necessary, these regions are subdivided into smaller groupings based on the preferred orientation. Edge detection uses peaks in the magnitude map to find transitions between distinctive texture regions.

Edge Detection

The transition region separating two or more texture regions is referred to as an "edge." The width of an edge is at least as wide as the spatial sampling interval; thus the best localization of an edge position is provided by the highest frequency magnitude map. The higher frequency magnitude maps, however, tend to produce false edge detections when the image is corrupted by random spectral variations. This sensitivity to noise occurs because elementary functions that have high modulation frequencies also have large bandwidths.
The tradeoff between spatial extent and spectral bandwidth for the elementary function becomes a tradeoff between position localization and noise invariance for the edge detection task.

Within the transition region denoted by an edge, the local spectral activity has a wide frequency bandwidth oriented in a direction normal to the edge tangent. Because of the wide bandwidth, it is possible to combine magnitude maps from different frequency bands to produce an edge detector that is noise-invariant and provides good position localization. One such edge detector uses a weighted sum of all relevant magnitude maps. Since each magnitude map has a different spatial sampling density, this edge detector requires the interpolation of lower frequency local magnitudes. If the interpolated local magnitude is denoted by M(x,y,k), the combined map is given by

M_N(k)(x,y) = Σ_k N(k) M(x,y,k),  (7.136)

where N(k) is a weighting term. The weighting term can be used to emphasize relevant frequencies and mask other frequencies. Determining which frequency maps are relevant is postponed until the adjacent texture regions have been characterized (see section 7.2.2).

Using the same weighting term, the preferred orientation of an edge segment can be estimated. The preferred orientation of the edge segment at (x,y) is given by

φ_T(x,y) = 0.5 arctan [ Σ_k N(k) f_k² Σ_l M(x,y,k,l) sin 2φ_l / Σ_k N(k) f_k² Σ_l M(x,y,k,l) cos 2φ_l ].  (7.137)

The preferred orientation is normal to the tangent of the edge at (x,y).

Texture Segmentation

Image texture is a pattern of pixel intensities that possesses some discernible attribute. Two such attributes include periodicity and directionality. The absence of periodicity or directionality can also be considered an attribute. Texture segmentation is a technique by which local patterns of pixel intensities with common attributes are grouped to form homogeneous texture regions. Texture segmentation is performed in stages.
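The weighted sum of (7.136) can be sketched as follows, assuming the lower frequency maps have already been interpolated onto the sampling grid of the highest frequency map (the interpolation step itself is omitted; names are illustrative):

```python
import numpy as np

def combined_magnitude_map(mag_maps, weights):
    """Weighted sum of magnitude maps, Eq. (7.136).

    mag_maps: dict {k: 2-D array}, all assumed resampled to a common grid;
    weights: dict {k: N(k)}, used to emphasize or mask frequency bands.
    """
    keys = sorted(mag_maps)
    out = np.zeros_like(np.asarray(mag_maps[keys[0]], dtype=float))
    for k in keys:
        out += weights[k] * np.asarray(mag_maps[k], dtype=float)
    return out
```

Setting N(k) = 0 for a band removes that band's contribution entirely, which is how irrelevant frequencies are masked.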
The first stage is to partition the magnitude map into regions with large local magnitudes. The second stage is to segment these regions by the normalized moment of inertia. In the final stage, regions with a low normalized moment of inertia are segmented based on preferred orientation. Thus, three types of regions are formed, of which the final type can be further subdivided into orientation bands.

Unlike an edge, there is no requirement that the texture of a region have wide spectral bandwidth. Combining information by adding magnitude maps is pointless, since the resulting algorithm would be unable to discern two narrow-band textures that have the same preferred orientation but different frequencies. An alternative approach is to segment each magnitude map separately, then integrate the spatial information. In such an approach, segmentation by texture frequency is done without regard for the orientation characteristics.

7.1.2 Motion Segmentation

Motion segmentation uses the local difference in phase and the local difference in preferred orientation to form regions that have similar motion. Local translational motion is estimated using the difference in phase. The difference in preferred orientation is used to estimate local rotation about a lattice point.

Like texture segmentation, motion segmentation is performed in stages. The first stage is to segment the magnitude maps based on local magnitude. If the local magnitude is small, the confidence of a motion measure will be low. The second stage is to separate stationary and moving regions. The moving regions may be subdivided based on speed and direction of translation or rotation. The speed of the translational motion is given by

V(n,m,k) = D(n,m,k) / (t_1 − t_0),  (7.138)

where t_0 and t_1 are the time values for the reference image and the successive image, respectively. The velocity of the angular rotation is given by

ω(n,m,k) = Δφ_T(n,m,k) / (t_1 − t_0).
(7.139)

Because the translational and rotational velocity estimates are obtained from angular measurements, both can be corrupted by aliasing. To avoid aliasing, limits are imposed on the maximum measurable velocities. The measurable translational speed is limited to

V(n,m,k) < 1 / [2 f_k (t_1 − t_0)].  (7.140)

The maximum measurable speed is larger for elementary functions that have lower modulation frequencies. Lower frequency estimates are less likely to be corrupted by aliasing. It is possible to predict if aliasing has corrupted a high frequency estimate by checking lower frequency estimates, provided that the texture of the moving region has a wide spectral bandwidth.

Any estimate of translational motion may suffer from the aperture problem [8]. In such a case, only the component of the motion in the direction of the preferred orientation can be measured. Motion estimates suffering from the aperture problem occur in regions where the local moment of inertia is small. Since combining local measures often increases the moment of inertia, the aperture problem can usually be resolved in the characterization stage.

The measurable rotational velocity is conservatively restricted to

−π / [4(t_1 − t_0)] < ω(n,m,k) < π / [4(t_1 − t_0)].  (7.141)

The range of measurable rotational velocities is independent of the modulation frequency; thus aliasing cannot be resolved using lower frequency estimates.

7.1.3 Segmentation by Depth

"Depth" refers to the distance of a viewed object from the camera. This distance is measured along the z-axis in a three-dimensional space. The depth of a point in a three-dimensional space can be estimated if images from two or more viewpoints are available. The case of two viewpoints is shown in figure 7.18. A point (x,y,z) will be projected onto an image plane for the left and right viewpoints at positions (x_0, y_0) and (x_1, y_1), respectively.
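The speed estimate (7.138) and its alias-free bound (7.140) can be sketched directly (function names are illustrative):

```python
def translational_speed(D, t0, t1):
    """Eq. (7.138): speed of local translational motion between two frames."""
    return D / (t1 - t0)

def max_measurable_speed(fk, t0, t1):
    """Alias-free speed bound of Eq. (7.140).

    Follows from |D| < 1/(2*fk) in Eq. (6.120): lower modulation
    frequencies tolerate faster motion before the phase measure aliases.
    """
    return 1.0 / (2.0 * fk * (t1 - t0))
```

Halving the modulation frequency doubles the maximum measurable speed, which is why lower frequency estimates can be used to detect aliasing in the higher frequency ones.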
The displacement in the position of the projected point along the viewing baseline (referred to as the "baseline displacement") is given by

D_B(x_0,y_0) = [(x_1 − x_0)² + (y_1 − y_0)²]^0.5 cos[φ_B − arctan((y_1 − y_0)/(x_1 − x_0))],  (7.142)

where φ_B is the orientation of the viewing baseline.

Figure 7.18: Vision System with Two Viewpoints

From the geometry shown in figure 7.18, the depth of the projected point is given by

z(x_0,y_0) = z_f [ D_B(x_0,y_0) / (e − D_B(x_0,y_0)) ],  (7.143)

where z_f is the focal length and e is the baseline separation of the two viewpoints.

Instead of estimating the depth of a point, this implementation attempts to estimate the depth of a region. The size of the region is defined by the spatial extent of the elementary function centered at (n,m,k). A higher frequency elementary function defines a smaller region. In order to estimate depth, a Gabor expansion is performed on the left and right images. The baseline displacement is calculated using local phase differences between the left and right images. If the same spatial origin is used for each Gabor lattice, the estimated depth of the region centered at (n,m,k) is given by

z(n,m,k) = z_f [ D_B(n,m,k) / (e − D_B(n,m,k)) ],  (7.144)

where

D_B(n,m,k) = D(n,m,k) cos[φ_B − φ_D(n,m,k)].  (7.145)

D(n,m,k) and φ_D(n,m,k) are the magnitude and direction of the measured displacement (see section 6.2.1); D_B(n,m,k) is the baseline displacement.

Because the baseline displacement D_B(n,m,k) is measured using local phase differences, the depth estimate may be corrupted by aliasing. An aliased depth estimate will appear closer to the observer. The range of measurable depths is restricted to

0 ≤ z(n,m,k) < z_f / (2 e f_k − 1).  (7.146)

In order to measure depths that are farther, the above-mentioned range must be transformed by shifting one of the spatial sampling lattices. If the left view is used as
the reference viewpoint, the origin of the spatial lattice in the right image is shifted to the left along the viewing baseline. The estimated depth for the shifted binocular vision system is given by

z(n,m,k) = z_f [ (D_B(n,m,k; δ_s) + δ_s) / (e − D_B(n,m,k; δ_s) − δ_s) ],  (7.147)

where δ_s is the lattice shift along the viewing baseline, and D_B(n,m,k; δ_s) is the measured baseline displacement when one of the lattices is shifted by δ_s. The range of measurable depths for the shifted binocular vision system is given by

z_f [ (δ_s − 1/(2f_k)) / (e − δ_s + 1/(2f_k)) ] < z(n,m,k) < z_f [ (δ_s + 1/(2f_k)) / (e − δ_s − 1/(2f_k)) ].  (7.148)

The lattice shift focuses the system on a reference depth. The reference depth, which corresponds to D_B(n,m,k; δ_s) = 0, is given by

z_ref = z_f [ δ_s / (e − δ_s) ].  (7.149)

Since the reference depth must be positive and finite, the lattice shift must be restricted to

0 < δ_s < e.  (7.150)

Because there is a limited range of measurable depths for a given frequency f_k, it may be necessary to choose a reference depth that corresponds to a region of interest. If the lattices are shifted to focus on this region of interest, the sign of the displacement will separate near and far regions. Near (far) regions are closer to (farther from) the observer than the reference depth.

The near and far region segmentation requires a focusing mechanism to direct the lattice shift. The displacement D(n,m,k; δ_s) can be used to direct the lattice shift if no aliasing has occurred. For larger displacements, an alternative approach is required. If the region of interest has a wide frequency bandwidth, lower frequency estimates can be used to determine if aliasing has occurred. If aliasing has occurred, the lower frequency estimates can be used to make coarse adjustments to the spatial origin.
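The depth recovery of (7.147), which reduces to the unshifted case when δ_s = 0, can be sketched as follows (the equation numbers refer to the reconstructed forms above; function names are illustrative):

```python
def depth_estimate(zf, e, DB, ds=0.0):
    """Depth from baseline displacement, Eq. (7.147); ds = 0 gives Eq. (7.144).

    zf: focal length, e: baseline separation, DB: measured baseline
    displacement, ds: lattice shift along the baseline (0 <= ds < e).
    """
    return zf * (DB + ds) / (e - DB - ds)

def reference_depth(zf, e, ds):
    """Eq. (7.149): the depth at which the shifted system measures DB = 0."""
    return zf * ds / (e - ds)
```

With DB = 0 the shifted estimate lands exactly on the reference depth, and the sign of DB then separates near regions from far regions.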
Since the largest adjustment is e, the lowest frequency in the set of elementary functions should be chosen such that

f_min < 1/(2e).  (7.151)

Finer adjustments can be made later using the higher frequency estimates, thereby reducing the size of the region of interest.

Segmentation by depth is similar to motion segmentation. The magnitude maps are segmented to identify depth measures that have high confidence. A reference depth is selected; the depth measures are segmented into near and far regions.

7.1.4 Integrating Maps to Form Relevant Pixel Groupings

The edge detection, texture segmentation, motion segmentation, and segmentation by depth algorithms produce various maps. The edge detection produces one map that combines magnitude information from each frequency. The other segmentation methods produce individual maps for each frequency. Each map provides its "best guess" of how the image is to be segmented. These guesses are integrated to obtain the best possible segmentation.

In order to segment a given map, each point is assigned a label based on the local measures. The edge detection labels include edge and not edge. The texture segmentation labels include insufficient spectral energy, high moment of inertia (no preferred orientation), and orientation. The motion segmentation labels include insufficient spectral energy, aliased measure, low moment of inertia (aperture problem), and motion. The segmentation by depth labels include insufficient spectral energy along the viewing baseline, aliased measure, and relative depth. Orientation, motion, and relative depth are considered quantitative measures. For quantitative measures, the value of the measure is included in the label.

Figure 7.19: Four and Eight Point Connectivity

After the labels have been assigned, regions and edges are formed by connecting adjacent points with common labels.
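The grouping step, connecting adjacent points that share a label, is a standard connected-components pass over the label map. A minimal illustration for a boolean label mask, supporting both the four-point and eight-point neighbourhoods of figure 7.19 (not the thesis implementation):

```python
from collections import deque

def connected_regions(mask, eight=False):
    """Group adjacent points with a common label via breadth-first search.

    mask: 2-D list of booleans marking points that share a label.
    eight=True selects eight-point connectivity (used for edges);
    the default is four-point connectivity (used for regions).
    Returns (number of regions, 2-D array of region labels, 0 = unlabeled).
    """
    H, W = len(mask), len(mask[0])
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight:
        nbrs += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    labels = [[0] * W for _ in range(H)]
    count = 0
    for i in range(H):
        for j in range(W):
            if mask[i][j] and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in nbrs:
                        v, u = y + dy, x + dx
                        if 0 <= v < H and 0 <= u < W and mask[v][u] and labels[v][u] == 0:
                            labels[v][u] = count
                            q.append((v, u))
    return count, labels
```

Two diagonally touching points form one region under eight-point connectivity but two regions under four-point connectivity, which is why the thinner edge chains use the eight-point rule.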
Points that have quantitative measures are connected if the difference between the values is small. The edges are formed using an eight point connectivity; the regions are formed using a four point connectivity. The four and eight point connectivities for a rectangular spatial sampling lattice are shown in figure 7.19.

Once the regions and edges are formed, the various maps are integrated. The integrated map has a spatial sampling density equal to that of the highest frequency map. When the maps are being integrated, a preferred scale (frequency) must be chosen. The preferred scale defines the minimum size of a region and the maximum local density of edges. The choice of scale can be guided by the confidence estimates within the individual segmentation maps. Even though a preferred scale is chosen, the higher frequency maps are still required for labeling points near texture boundaries [24] and for the localization of edges. The preferred scale may be different for each region.

7.2 Characterizing Regions and Edges

Once the regions and edges have been formed, local measures are combined to provide more global characterizations. These characterizations are used to infer three-dimensional properties.

7.2.1 Texture of a Region

The texture of a region is characterized by a preferred orientation, a normalized moment of inertia, a dominant frequency along an orientation, and an omni-directional dominant frequency. The respective measures are expressed as

φ_T(R) = 0.5 arctan [ Σ_k N(k) f_k² Σ_{n,m∈R} Σ_l M(n,m,k,l) sin 2φ_l / Σ_k N(k) f_k² Σ_{n,m∈R} Σ_l M(n,m,k,l) cos 2φ_l ]  (7.152)

Ī_T(R) = Σ_k N(k) f_k² Σ_{n,m∈R} Σ_l M(n,m,k,l) sin²(φ_l − φ_T(R)) / Σ_k N(k) f_k² Σ_{n,m∈R} Σ_l M(n,m,k,l)  (7.153)

f_T(R,l) = Σ_k N(k) f_k Σ_{n,m∈R} M(n,m,k,l) / Σ_k N(k) Σ_{n,m∈R} M(n,m,k,l)  (7.154)

f_T(R) = Σ_k N(k) f_k Σ_{n,m∈R} M(n,m,k) / Σ_k N(k) Σ_{n,m∈R} M(n,m,k),  (7.155)

where R denotes the region being characterized. The weighting term N(k) can be used to mask frequencies that are not relevant. The lowest relevant frequency is determined by the size of the region being characterized.
Any low frequency elementary function whose effective spatial extent is larger than the region is not required to characterize the texture.

Texture can be used to infer surface properties. If the three-dimensional surface has a homogeneous microstructure, perspective distortion associated with differences in depth will cause changes in the local density of the image texture. The magnitude and direction of the maximum change in the local density of the texture is measured by the texture gradient. To obtain the texture gradient, a region is divided into smaller sub-regions. By calculating the difference in dominant frequency between pairs of sub-regions, the magnitude and direction of the texture gradient can be estimated. The magnitude and direction of the texture gradient can be used to estimate the slant and tilt of a homogeneous surface [8]. Changes in the preferred orientation across sub-regions can be used to infer surface curvature [8].

7.2.2 Edges

For an edge, the lowest relevant frequency depends on the dominant frequencies of the adjacent texture regions. The spectral peaks, which identify the edge, appear in only the magnitude maps that have frequencies higher than the dominant frequency. Magnitude maps whose frequencies are less than or equal to the dominant frequency of the adjacent regions are not required for edge detection.

Edge and depth information can be combined to detect occluding contours. An occluding contour is a closed edge that follows a discontinuity in depth. The occluding contour provides an outline of the two-dimensional projection of an object. By assuming certain constraints [8], the two-dimensional outline can be used to infer three-dimensional shape.
7.2.3 Three-Dimensional Position and Motion

In order to estimate the three-dimensional position and motion of an object, the viewpoint constraints relating the two-dimensional projected coordinates to the original three-dimensional coordinates must be known. The three-dimensional coordinate (x,y,z) is related to the two-dimensional projection coordinate (x̄,ȳ) as follows:

x̄ = [z_f / (z_f + z)] x  (7.156)

ȳ = [z_f / (z_f + z)] y,  (7.157)

where z_f is the focal length of the camera. The problem of converting the two-dimensional projection coordinates into three-dimensional physical coordinates is underconstrained for a single image. A second image from a different viewpoint or time is required to fully constrain the problem.

Three-Dimensional Position

The two-dimensional projected position of a segmented region will be defined by its center of area. The center of area is given by

x̄_0 = (1/N_R) Σ_{n,m∈R} x_n  (7.158)

ȳ_0 = (1/N_R) Σ_{n,m∈R} y_m,  (7.159)

where N_R is the number of lattice points in the segmented region R. The three-dimensional position of the object corresponding to the segmented region is given by

x_0 = [(z_f + z(x̄_0,ȳ_0)) / z_f] x̄_0  (7.160)

y_0 = [(z_f + z(x̄_0,ȳ_0)) / z_f] ȳ_0  (7.161)

z_0 = z(x̄_0,ȳ_0).  (7.162)

Three-Dimensional Motion

Three-dimensional motion of a point (x,y,z) can be expressed as

V_x(x,y,z) = T_x + (y − y_c) R_z − (z − z_c) R_y  (7.163)

V_y(x,y,z) = T_y + (z − z_c) R_x − (x − x_c) R_z  (7.164)

V_z(x,y,z) = T_z + (x − x_c) R_y − (y − y_c) R_x,  (7.165)

where T_x, T_y, T_z are the components of the translational motion along the x-, y-, and z-axes; R_x, R_y, R_z are the components of rotation viewed along the x-, y-, and z-axes; and (x_c, y_c, z_c) is the center of the rotation. The two-dimensional projection of the motion along an orientation φ_l, obtained by differentiating (7.156) and (7.157), is given by

V̄(x̄,ȳ,φ_l) = [z_f / (z_f + z)] [V_x cos φ_l + V_y sin φ_l] − [(x̄ cos φ_l + ȳ sin φ_l) / (z_f + z)] V_z.  (7.166)

Assuming that z_f and φ_l are known, the unknown variables in (7.166) include the nine motion parameters plus the depth at each point (x̄,ȳ).
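The projection pair (7.156)-(7.157) and its inversion (7.160)-(7.162) can be sketched directly; the inversion needs an external depth estimate, which is exactly why a second viewpoint or time instant is required (function names are illustrative):

```python
def project(x, y, z, zf):
    """Perspective projection onto the image plane, Eqs. (7.156)-(7.157)."""
    s = zf / (zf + z)
    return s * x, s * y

def back_project(xb, yb, z, zf):
    """Recover three-dimensional coordinates from a projected point and an
    estimated depth z, Eqs. (7.160)-(7.162)."""
    s = (zf + z) / zf
    return s * xb, s * yb, z
```

Composing the two maps at any depth returns the original point, confirming that depth is the only missing degree of freedom.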
Instead of measuring the motion of a point, this implementation measures the motion of a region defined by an elementary function. The apparent two-dimensional motion of the elementary function (n,m,k,l) along the orientation φ_l can be approximated by

V̄(n,m,k,l) ≈ D(n,m,k,l) / Δt.  (7.167)

This approximation can be used to form the following cost function:

J = Σ_{n,m,k∈R} N(k) Σ_l C(n,m,k,l) e²(n,m,k,l),  (7.168)

where

e(n,m,k,l) = D(n,m,k,l) − V̄(n,m,k,l) Δt.  (7.169)

Most of the motion parameters can be estimated by minimizing the cost function (7.168). The cost function (7.168) assigns the same motion parameters to each lattice point in a segmented region, implicitly assuming rigid body motion. The motion of deformable bodies must be described locally. The partial derivatives of the cost function with respect to the motion parameters are listed in appendix C.

Only six of the nine motion parameters are independent. The parameter dependence can be illustrated by rewriting the three-dimensional motion equations as

V_x(x,y,z) = [T_x − y_c R_z + z_c R_y] + y R_z − z R_y  (7.170)

V_y(x,y,z) = [T_y − z_c R_x + x_c R_z] + z R_x − x R_z  (7.171)

V_z(x,y,z) = [T_z − x_c R_y + y_c R_x] + x R_y − y R_x.  (7.172)

The ambiguity between translation and center of rotation can be resolved by assigning values to either the translations (T_x, T_y, T_z) or the center of rotation (x_c, y_c, z_c).

Physical constraints on the trajectory of a moving object can be exploited when assigning values to dependent parameters. If it is assumed that the acceleration of the moving object is constant and the velocity is finite, the changes in the motion parameters during a short time interval are small. Large parameter changes can be discouraged by adding penalty terms to the cost function (7.168).
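For fixed measurements, evaluating the base cost (7.168) for a candidate set of motion parameters is a weighted sum of squared residuals; a sketch in which the weights N(k) and C(n,m,k,l) are folded into a single per-measurement weight (names are illustrative):

```python
import numpy as np

def motion_cost(D, V, w, dt):
    """Eqs. (7.168)-(7.169): J = sum over measurements of w * (D - V*dt)^2.

    D: measured component displacements; V: projected velocities predicted
    by candidate motion parameters via Eq. (7.166); w: combined weights
    N(k)*C(n,m,k,l); dt: time interval between the two images.
    """
    e = np.asarray(D, dtype=float) - np.asarray(V, dtype=float) * dt
    return float(np.sum(np.asarray(w, dtype=float) * e ** 2))
```

The cost vanishes exactly when the candidate parameters predict every measured component displacement, and a gradient-based minimizer over the six independent parameters would drive it toward that point.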
The new cost function, which penalizes large parameter changes, is given by

J_p = J + Δt Σ_i C_i (S_i ΔP_i)²,  (7.173)

where

S_i = δV̄ / δP_i  (7.174)

ΔP_i = P_i − P̂_i.  (7.175)

The other terms in (7.173) are as follows: P_i is one of the nine motion parameters; P̂_i is the motion parameter i as predicted by the past estimates; and C_i is the confidence of the predicted motion parameter i.

Other constraints and measurements can be included in the cost function (7.173) to further improve the motion parameter estimates. These other sources of information include the local difference in preferred orientation (localized rotation), the change in the three-dimensional position (motion using feature-based correspondence [16]), and a priori knowledge of the objects being viewed. Default conditions can be incorporated into (7.173) by setting the predicted parameter P̂_i equal to the default value and assigning it a relatively low confidence C_i.

Chapter 8

Examples

This chapter presents examples of the previously mentioned techniques for forming and characterizing pixel groupings. Pixel groupings are formed using edge detection, texture segmentation, and motion segmentation. Characterization of edges by local curvature and of regions by rigid body motion is included.

The purpose of this chapter is to illustrate, by example, the properties of the Gabor expansion-based algorithms. The algorithms use thresholds that have been subjectively chosen to best illustrate the properties of the current example. Further work would be needed to set these thresholds automatically.

Each example in this chapter uses a synthetic test image. The pixel intensities for a test image are restricted to integer grey-level values between 0 (black) and 255 (white). The Gabor expansion is performed on the test image to obtain a set of expansion coefficients. These coefficients are used to form magnitude maps for each frequency represented in the lattice.
To display the magnitude maps, the local magnitudes are assigned a grey-level value. The maximum grey-level value (white, 255) is assigned to the largest local magnitude within a given map.

8.1 Edge Detection

The proposed edge detector applies a threshold to each magnitude map to detect spectral peaks, referred to as edges. A second threshold based on local moment of inertia is applied to the edges to determine which segments have no preferred orientation. The segments lacking a preferred orientation are marked in white in the display of the edge detector output. The edge segments with a preferred orientation are marked by a grey square with a white line. The white line denotes the direction of the tangent of the edge segment. This direction is normal to the local preferred orientation. The remaining regions, whose local magnitude is less than the threshold, are marked in black.

The following subsections use examples to illustrate properties of the proposed edge detector. These properties include position localization, noise immunity, and local curvature estimation.

8.1.1 Position Localization

A circle on a dark background is used as a test image to illustrate position localization. The test image is shown in figure 8.20.

Figure 8.20: Test Image 1: A Circle

The magnitude maps for two frequencies appear in figures 8.21 and 8.22.

Figure 8.22: Magnitude Map: Test Image 1, f_k = 0.25 cycle/pixel

Figures 8.23 and 8.24 display the corresponding edge detector outputs.

Figure 8.23: Edge Detector Output: Test Image 1, f_k = 0.125 cycle/pixel

Comparing the edge detector outputs for the two frequencies, it can be seen that the higher frequency output (figure 8.24) has a narrower edge width. The narrower edge width indicates better position localization of the edge.

8.1.2 Noise Immunity

The circle test image is corrupted by additive noise to illustrate the property of noise immunity.
The noise-corrupted test image is shown in figure 8.25. The magnitude maps for the two frequencies appear in figures 8.26 and 8.27. Figures 8.28 and 8.29 display the edge detector outputs. Comparing the edge detector outputs that appear in figures 8.28 and 8.29 with those of the previous example (figures 8.23 and 8.24), it can be seen that the additive noise causes false edge detections. Since fewer noise-induced (false) edge segments are produced in the lower frequency output (figure 8.28), the lower frequency edge detector has greater noise immunity.

Figure 8.25: Test Image 2: A Circle With Additive Noise
Figure 8.27: Magnitude Map: Test Image 2, f_k = 0.25 cycle/pixel
Figure 8.28: Edge Detector Output: Test Image 2, f_k = 0.125 cycle/pixel
Figure 8.29: Edge Detector Output: Test Image 2, f_k = 0.25 cycle/pixel

8.1.3 Relevant Frequencies

To illustrate the concept of relevant frequencies (see section 7.2.2), a test image whose texture has a dominant frequency is used (see figure 8.30).

Figure 8.30: Test Image 3: One Frequency, Four Orientations

The background texture for the test image is synthesized using Gabor expansion coefficients from one frequency. The expansion coefficients used in synthesizing the texture have the same magnitude but random phases [10]. The texture in the circle is generated in a similar manner.

Figures 8.31 and 8.32 display the magnitude maps for two frequencies. The lower of the two frequencies is the same as the dominant frequency of the generated texture. The edge detector outputs appear in figures 8.33 and 8.34. The lower frequency output (figure 8.33) does not "see" the edge. The edge is detectable using the higher frequency output (figure 8.34). Thus, the magnitude maps whose frequency is less than or equal to the dominant frequency of the adjoining regions are not relevant to the edge detection task.
Figure 8.32: Magnitude Map: Test Image 3, f_k = 0.25 cycle/pixel
Figure 8.34: Edge Detector Output: Test Image 3, f_k = 0.25 cycle/pixel
Figure 8.35: Test Image 4: A Rectangle

8.1.4 Local Curvature Estimation

A test image consisting of a rectangle on a dark background is used to illustrate the detection of edge segments that have high local curvature (i.e., corners). An edge segment with a high local curvature will have a high local moment of inertia. Thus, the edge detector can be used to detect such edge segments.

The test image and the edge detector output appear in figures 8.35 and 8.36, respectively. In the edge detector output, the edge segments with high local curvature are marked in white. For the test image in figure 8.35, these segments are the corners of the rectangle.

Figure 8.36: Edge Detector Output: Test Image 4

8.2 Texture Segmentation

Texture segmentation is performed using a modified version of the edge detector. The threshold used on the magnitude map is adjusted such that regions with large local magnitudes can be detected in addition to spectral peaks. The output format for texture segmentation is the same as the edge detector: black denotes regions with low local magnitudes; white denotes regions with no preferred orientation; a white line on a grey square denotes the direction normal to the preferred orientation.

8.2.1 Segmentation by Orientation

Figure 8.37 contains the segmentation by orientation test image.

Figure 8.37: Test Image 5: One Frequency, Two Orientations

The background texture of the test image is restricted to a single frequency and orientation. The texture in the circle has the same dominant frequency as the background but a different orientation. The magnitude maps for two frequencies appear in figures 8.38 and 8.39. Figures 8.40 and 8.41 display the texture segmentation outputs for the same frequencies.
In the lower frequency output (figure 8.40), two distinct regions can be formed if local measures with common orientations are connected. The higher frequency output (figure 8.41) is the same as an edge detector output, localizing the position of the transition between the two textured regions.

8.2.2 Segmentation by Frequency

The segmentation-by-frequency test image is shown in figure 8.42. In this example, the textures in the circle and in the background have the same orientation but different dominant frequencies. The magnitude maps for the two frequencies are shown in figures 8.43 and 8.44. Figures 8.45 and 8.46 display the corresponding texture segmentation outputs. Both texture segmentation outputs can discern the circle if the thresholds are properly chosen. The selection of the thresholds can be guided by the boundary position, which is localized using higher frequency maps (not shown).

Figure 8.39: Magnitude Map: Test Image 5, fk = 0.25 cycle/pixel
Figure 8.41: Texture Segmentation Output: Test Image 5, fk = 0.25
Figure 8.43: Magnitude Map: Test Image 6, fk = 0.125 cycle/pixel
Figure 8.45: Texture Segmentation Output: Image 6, fk = 0.125
Figure 8.46: Texture Segmentation Output: Image 6, fk = 0.25

8.3 Motion Segmentation

The input to the proposed motion segmentation algorithm consists of two magnitude maps from successive time instants. At each time instant, a threshold is applied to the magnitude map to determine which regions contain enough spectral energy to accurately measure the motion. The motion estimate is unreliable if the local magnitude is below the threshold at one or both time instants. The regions containing unreliable motion estimates are marked in black in the motion segmentation output. In the regions with sufficient spectral energy, the magnitude and direction of the motion can be estimated.
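The two-instant reliability test described above can be sketched in a few lines; the threshold value and array names are illustrative assumptions, and a real implementation would apply this mask point by point over one frequency band's magnitude maps.

```python
import numpy as np

def reliable_motion_mask(mag_t0, mag_t1, thresh=0.1):
    """Motion is considered measurable only where the local magnitude
    exceeds the threshold at BOTH time instants; elsewhere the motion
    segmentation output is marked black. The threshold is a placeholder."""
    return (mag_t0 >= thresh) & (mag_t1 >= thresh)

m0 = np.array([0.0, 0.5, 0.5])   # local magnitudes at time t
m1 = np.array([0.5, 0.05, 0.5])  # local magnitudes at time t + dt
print(reliable_motion_mask(m0, m1).tolist())  # [False, False, True]
```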
If the magnitude of the motion is above a specified threshold, the region is labeled "moving." In the output, the moving regions are marked by a black dot with an attached white line. The black dot represents the starting position; the white line represents the direction of motion.

8.3.1 Camouflaged Object

The test image for the motion segmentation example appears in figure 8.47. The test image contains a camouflaged circle. The circle is not discernible because the texture for both the circle and the background consists of a random pattern which has a wide spectral bandwidth. Even though the random pattern disguises the stationary circle, the shape and motion of the moving circle are highlighted in the motion segmentation output. The motion segmentation output for a one-pixel displacement of the circle along the x-axis is shown in figure 8.48.

Figure 8.47: Test Image 7: Camouflaged Circle
Figure 8.48: Motion Segmentation Output: Test Image 7

The three-dimensional rigid body motion of an object can be estimated if assumptions are made. It is assumed that the circle in figure 8.47 is a two-dimensional projection of a flat disk, initially positioned at (x_0, y_0, 0). The center of rotation is assumed to be the center of the circle, i.e.,

(x_c, y_c, z_c) = (x_0, y_0, 0).   (8.176)

The measured three-dimensional motion parameters, subject to the above-mentioned assumptions, appear in table 8.1. The measured direction of translational motion is close to the actual value; the magnitude is over-estimated.

Motion  Measured  Actual
Tx       1.156    1.000
Ty      -0.020    0.000
Tz       0.001    0.000
Rx      -0.001    0.000
Ry       0.003    0.000
Rz       0.001    0.000

Table 8.1: Motion Parameters for the Camouflaged Circle

8.3.2 Aperture Problem and Insufficient Spectral Energy

The aperture problem is inherent in local measurements of motion for regions that have a low moment of inertia.
In these regions, the measured direction of motion follows the preferred orientation of the local texture instead of the actual direction of motion; the component of motion normal to the preferred orientation is not measured. The other problem inhibiting local measurements of motion is insufficient spectral energy within a region. The absence of spectral energy implies the absence of reference points needed to detect movement. In these regions, no motion is measured.

To illustrate the effects of insufficient spectral energy and the aperture problem, consider the test image in figure 8.20. If the circle is translated along the x-axis, the motion segmentation output should be similar to figure 8.48. Figure 8.49 displays the actual motion segmentation output. The motion of the interior of the circle and the background cannot be measured because there is insufficient spectral energy in these regions. The local measurements of motion at the top and bottom edges of the circle are incorrect because of the aperture problem. The preferred orientation of these edges is along the y-axis; the actual motion is along the x-axis.

The motion segmentation output that appears in figure 8.49 offers two possible segmentations: a pair of arcs, or a ring. If a threshold is applied to both the motion and the confidence (local magnitude), the resulting segmentation consists of two arcs positioned at the leading and trailing edges of the circle. If a threshold is applied to the confidence alone, the motion segmentation algorithm detects the ring that encloses the circle. The three-dimensional motion parameters for the circle, as estimated by the local measures of motion within the arcs and the ring, appear in table 8.2. Both estimates use the same assumptions as the previous motion example; the circle is a two-dimensional projection
of a rigid, flat disk initially positioned at (x_0, y_0, 0).

Figure 8.49: Motion Segmentation Output: Translating Circle

Motion  Arcs    Ring    Actual
Tx       0.430   0.221   1.000
Ty       0.025  -0.001   0.000
Tz       0.055   0.044   0.000
Rx      -0.001   0.000   0.000
Ry      -0.010  -0.015   0.000
Rz      -0.001   0.000   0.000

Table 8.2: Motion Parameters for Translating Circle

As a consequence of insufficient spectral energy in the interior of the circle and the low local moment of inertia along the ring (aperture problem), the local motion information is too sparse to accurately estimate the three-dimensional motion of the circle. The dominant error in the estimation of the motion parameters occurs because a translation along the x-axis (Tx) and a rotation about the y-axis (Ry) produce similar local motions for the given segmentations (arcs or ring).

The estimate of three-dimensional motion can be improved if a priori knowledge is exploited. If it is known that the motion is purely translational, the rotational parameters can be set to zero. The resulting estimate of the three-dimensional motion parameters appears in table 8.3. For this example, the pure translation assumption significantly improves the estimate of the three-dimensional motion parameters.

Motion  Arcs    Ring    Actual
Tx       0.911   0.792   1.000
Ty       0.007  -0.007   0.000
Tz       0.038   0.026   0.000
Rx       0.000   0.000   0.000
Ry       0.000   0.000   0.000
Rz       0.000   0.000   0.000

Table 8.3: Motion Parameters for Circle: Pure Translation Assumed

8.3.3 Translation in Depth

Figure 8.50 displays the motion segmentation output for a shrinking circle. If the center of the circle coincides with the center of the camera view, and if rigid body motion is assumed, the perceived three-dimensional motion of the shrinking circle is a pure translation in depth (all other translation and rotation parameters are zero). If these assumptions
are combined with those of the previous motion examples, the translation in depth can be accurately measured. The measured three-dimensional motion parameters appear in table 8.4. Table 8.4 also contains the estimate of the three-dimensional motion parameters subject to the pure translation assumption. For this example, the pure translation assumption does not improve the parameter estimates.

Figure 8.50: Motion Segmentation Output: Shrinking Circle

Motion  Measured  Pure Translation  Actual
Tx      -0.025    -0.007            0.000
Ty       0.053     0.025            0.000
Tz       0.941     0.940            1.000
Rx      -0.001     0.000            0.000
Ry      -0.001     0.000            0.000
Rz      -0.000     0.000            0.000

Table 8.4: Motion Parameters for Shrinking Circle

8.4 Discussion

The edge detector has scale-dependent properties which include position localization (section 8.1.1), noise immunity (section 8.1.2), and relevant frequencies (section 8.1.3). For a high frequency map, the edge detector output has good position localization, but poor noise immunity. For a low frequency map, the edge detector output has good noise immunity, but poor position localization. The low frequency map, however, may be irrelevant to the edge detection task if its frequency is less than or equal to the dominant frequency of the adjoining regions' texture. Therefore, a good edge detector should combine information from all relevant magnitude maps, using low frequency maps to detect the presence of an edge, and high frequency maps to localize its position.

In some of the edge detector outputs, there are breaks and double detections of the edge contour (see figure 8.24). At these locations, the actual edge contour passes between Gabor expansion lattice points, thus splitting the edge's energy between adjacent local magnitudes. Applying a high threshold to the magnitude map produces a break in the detected edge contour; applying a low threshold produces a double detection.
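The combination rule suggested above (low frequency maps confirm an edge's presence, high frequency maps localize it) can be sketched as follows. The boolean detection arrays and the halo size, which stands in for the low frequency map's poorer localization, are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def combine_edge_maps(low_freq_edges, high_freq_edges, halo=1):
    """Keep a high-frequency detection only if a low-frequency detection
    (noise-immune but poorly localized) lies within 'halo' lattice points.
    Inputs are boolean detection maps on a common lattice."""
    near_low = np.zeros(low_freq_edges.shape, dtype=bool)
    for y, x in zip(*np.nonzero(low_freq_edges)):
        near_low[max(0, y - halo):y + halo + 1,
                 max(0, x - halo):x + halo + 1] = True
    return high_freq_edges & near_low

low = np.zeros((5, 5), dtype=bool)
low[2, 2] = True                       # coarse, noise-immune detection
high = np.zeros((5, 5), dtype=bool)
high[2, 3] = True                      # well-localized true edge
high[0, 0] = True                      # isolated noise-induced detection
kept = combine_edge_maps(low, high)
print(bool(kept[2, 3]), bool(kept[0, 0]))  # True False
```

The noise-induced high-frequency detection is discarded because no low-frequency detection confirms it, while the true edge survives with high-frequency localization.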
This problem can be corrected by using a more elaborate peak-finding technique in place of a threshold, or by shifting (jittering) the Gabor expansion lattice for better detection of problem edge segments. If the "jittering lattice" approach is used, information from each shifted lattice is combined to achieve better edge detection. The position localization of the edge is still limited by the resolution of the chosen frequency map; further position localization is only achieved by utilizing higher frequency maps.

The texture segmentation algorithm is similar to the edge detection algorithm; both outputs highlight large values in a given magnitude map. The difference between the two algorithms is that texture segmentation uses lower frequency maps that are considered irrelevant to the edge detection task. In these lower frequency maps, a threshold is selected to highlight a region instead of an edge. A region can be subdivided by preferred orientation if there are distinct changes in the directionality of the region's texture (section 8.2.1). In the higher frequency maps, the texture segmentation and edge detection algorithms are identical. (Note the similarities between figures 8.24 and 8.41.)

In the motion segmentation algorithm, the aperture problem and the effect of insufficient spectral energy have been illustrated (section 8.3.2). These problems are not exclusive to the Gabor expansion-based algorithm; all motion segmentation algorithms that use local measurements experience these two problems [8] [16]. Unlike many implementations, however, the Gabor expansion-based algorithm can detect the aperture problem and insufficient spectral energy by using the normalized local moment of inertia and the local magnitude, respectively. It has been shown that the information used in the motion segmentation algorithm can also be used to estimate three-dimensional rigid body motion (section 8.3).
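The confidence-weighted combination used to bias the three-dimensional estimate toward high-quality data can be sketched as a weighted least-squares fit. The design matrix A below is a generic stand-in mapping motion parameters to predicted local displacements — not the thesis's actual flow constraints — and the weights play the role of local confidence (local magnitude).

```python
import numpy as np

def weighted_motion_estimate(A, d, confidence):
    """Minimize sum_i w_i * (d_i - A[i] @ m)**2 over motion parameters m.

    A          -- one row of (stand-in) flow constraints per local measurement
    d          -- measured local displacements (e.g. from local phase shifts)
    confidence -- per-measurement weights; low-confidence data contribute less
    """
    w = np.sqrt(confidence)
    m, *_ = np.linalg.lstsq(A * w[:, None], d * w, rcond=None)
    return m

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))          # 20 local measurements, 3 parameters
m_true = np.array([1.0, 0.0, 0.5])
d = A @ m_true                        # noise-free displacements for the demo
m_est = weighted_motion_estimate(A, d, np.ones(20))
print(np.allclose(m_est, m_true))     # True
```

Down-weighting measurements flagged by the normalized moment of inertia (aperture problem) or by low local magnitude amounts to shrinking their entries in `confidence`.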
Local phase shifts within a segmented region are combined to produce an estimate of the three-dimensional motion parameters. The estimate of the three-dimensional motion parameters will be good if many high-quality (high-confidence) local motion measurements are available (section 8.3.1). If the quality of the local motion measurements is low because of the aperture problem or insufficient spectral energy, then additional assumptions are required to ensure a good estimate of the three-dimensional motion (section 8.3.2).

Chapter 9

Conclusion

9.1 Summary

The purpose of this thesis has been to show the utility of the Gabor expansion as a visual preprocessing step by presenting various segmentation algorithms whose basis is the Gabor expansion. The segmentation algorithms include edge detection (section 7.1.1), texture segmentation (section 7.1.1), motion segmentation (section 7.1.2), and segmentation by depth (section 7.1.3). The properties of the first three segmentation algorithms have been illustrated by example in chapter 8.

The most interesting of the algorithms is motion segmentation. The motion segmentation algorithm is novel because it uses the local phase information obtainable from the Gabor expansion to estimate local displacements between pairs of images. The motion segmentation algorithm is rich with information; local measures obtained from the algorithm can be combined to estimate the three-dimensional motion of a viewed object (section 7.2.3). A weighted combination, based on the confidence of local measures, is used to bias the estimation process in favor of higher quality data.

The utility of the Gabor expansion is increased when the segmentation algorithms are implemented together in an integrated vision system. An integrated vision system combines information from various segmentation modules to provide reliable partitioning of an image.
When the Gabor expansion-based segmentation algorithms are implemented in an integrated vision system, only one Gabor expansion per viewpoint per time instant is required, independent of the number of segmentation modules. Since the Gabor expansion is the most computationally intensive step in each of the segmentation algorithms, the efficiency of the integrated system is much greater than the sum of the individual modules. An integrated vision system that exclusively uses Gabor expansion-based segmentation algorithms has theoretical value. Such a system presents a unified approach to the early stages of computer vision, which loosely parallels biological vision.

9.2 Further Research

It is hoped that this thesis will stimulate further research into the Gabor expansion in both the computer and biological vision context. This section lists some areas that require further study.

The process of integrating spatial data from various segmentation maps (section 7.1.4) has not been fully developed in this thesis. An integration procedure that combines segmentation maps of different scales (or equivalently, different frequencies) is required to successfully segment complex images such as natural scenes. The integration procedure must be able to select the appropriate scale at which a scene or an object is viewed. Fine details of an image can be described as a group of edges if a fine scale (high frequency map) is selected or as a texture if a coarse scale (low frequency map) is selected. Even though the selection of the scale can be partially guided by the data (i.e., the confidence of local measures within each segmentation map), the use of a set of heuristic rules seems unavoidable.

The motion segmentation example in section 8.3 uses two related images; there is no uncorrelated noise between the two images. Uncorrelated noise will degrade the estimate of the three-dimensional motion parameters. To minimize the effect of the noise, the
motion parameters should be estimated using more than two images. Although the cost function (7.173) provides a method for combining past parameter estimates, it should not be used for noisy data because of the similarity of local motions for certain translation-rotation pairs (see section 8.3.2). The exceptional cases occur when either the pure translation or pure rotation assumption can be applied. Because the three-dimensional motion estimate is noise-sensitive, local measures from different time instants should be combined before estimating the motion parameters. The motion parameters should be estimated by minimizing a new cost function that comprises the local displacements from a series of time instants instead of only two time instants.

It would be interesting to compare the performance of the Gabor expansion-based segmentation algorithms with the currently used segmentation algorithms. The comparison should be performed separately on individual modules, as well as collectively in an integrated vision system. The performance evaluation should consider the accuracy of the segmentation, generality of application, computational efficiency, and ease of parallel implementation.

It would also be interesting to ascertain if the Gabor expansion-based segmentation algorithms are useful models of biological vision. In particular, is there a connection between the range of measurable depths defined by (7.148) for computer vision and Panum's fusional area [8] for biological vision? Researchers in the field of biological vision may be able to answer this question, and may be able to find other parallels between these segmentation algorithms and biological vision.

References

[1] D. Gabor, "Theory of communication," J. Inst. Elec. Eng., vol. 93, pp. 429-457, 1946.
[2] J. D. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," J. Opt. Soc. Am. A, vol. 2, no. 7, pp.
1160-1169, 1985.
[3] J. D. Daugman, "Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, no. 7, pp. 1169-1179, 1988.
[4] M. J. Bastiaans, "Gabor's expansion of a signal into Gaussian elementary signals," Proc. IEEE, vol. 68, pp. 538-539, 1980.
[5] M. J. Bastiaans, "A sampling theorem for the complex spectrogram, and Gabor's expansion of a signal in Gaussian elementary signals," Optical Eng., vol. 20, no. 4, pp. 594-598, 1981.
[6] J. Daugman, "Entropy reduction and decorrelation in visual coding by oriented neural receptive fields," IEEE Trans. Biomedical Eng., vol. 36, no. 1, pp. 107-114, 1989.
[7] M. Porat and Y. Zeevi, "The generalized Gabor scheme of image representation in biological and machine vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, no. 4, pp. 452-467, 1988.
[8] D. Marr, Vision. San Francisco, CA: Freeman, 1982.
[9] M. Turner, "Texture discrimination by Gabor functions," Biol. Cybern., vol. 55, pp. 71-82, 1986.
[10] M. Porat and Y. Zeevi, "Localized texture processing in vision: Analysis and synthesis in the Gaborian space," IEEE Trans. Biomedical Eng., vol. 36, no. 1, pp. 115-129, 1989.
[11] A. Bovik, M. Clark, and W. Geisler, "Computational texture analysis using localized spatial filtering," Proc. IEEE Comp. Soc., Miami Beach, Florida, 1987, pp. 201-206.
[12] T. Lawton, "Outputs of paired Gabor filters summed across the background frame of reference predict the direction of movement," IEEE Trans. Biomedical Eng., vol. 36, no. 1, pp. 130-139, 1989.
[13] D. J. Heeger, "Optical flow using spatiotemporal filters," Int. J. Comp. Vision, pp. 279-302, 1988.
[14] A. B. Watson and A. Ahumada, "Model of human visual-motion sensing," J. Opt. Soc. Am. A, vol. 2, no. 2, pp. 322-341, 1985.
[15] E. Adelson and J. Bergen, "Spatiotemporal energy models for the perception of motion," J. Opt. Soc. Am. A, vol. 2, no. 2, pp. 284-299, 1985.
[16] J. Aggarwal and N. Nandhakumar, "On the computation of motion from sequences of images - A review," Proc. IEEE, vol. 76, no. 8, pp. 917-935, 1988.
[17] M. Jenkin, "The stereopsis of time-varying images," Research in Biological and Computational Vision, RBCV-TR-84-3, University of Toronto, 1984.
[18] A. K. Mackworth, personal communication.
[19] I. Gohberg and S. Goldberg, Basic Operator Theory. Boston, MA: Birkhauser, 1981.
[20] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.
[21] F. Beer and E. Johnston, Mechanics for Engineers: Statics and Dynamics. New York, NY: McGraw-Hill, 1976.
[22] A. Oppenheim and J. Lim, "The importance of phase in signals," Proc. IEEE, vol. 69, no. 5, pp. 529-541, 1981.
[23] Y. Zeevi and M. Porat, "Phasogram: Image representation by localized phase," J. Opt. Soc. Am. A, vol. 3, no. 13, p. 115, 1986.
[24] R. Wilson and M. Spann, "Finite prolate spheroidal sequences and their applications II: Image feature description and segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, no. 2, pp. 193-203, 1988.

Appendix A

Selecting an Appropriate Convergence Factor

This appendix contains the derivation of the range of convergence factors for which an iterative sequence is guaranteed to converge. As an intermediate step, the optimal convergence factor for the steepest descent method is derived.

The cost of the coefficient errors at iteration i is given by

J(a_i) = (1/2) a_i^T Q a_i - c^T a_i + const,   (A.177)

where Q is the symmetric overlap matrix, a is the expansion coefficient vector, and c is the projection coefficient vector. Since the expansion coefficient update equation for the steepest descent method is given by

a_{i+1} = a_i - β(∇J),   (A.178)

the cost at iteration i+1 can be expressed as

J(a_{i+1}) = J(a_i) - β(Q a_i - c)^T (∇J) + (β²/2)(∇J)^T Q (∇J).   (A.179)

Since

∇J = Q a_i - c,   (A.180)

equation (A.179) can be rewritten as

J(a_{i+1}) = J(a_i) - β(∇J)^T(∇J) + (β²/2)(∇J)^T Q (∇J).
(A.181)

The derivative of (A.181) with respect to β is given by

dJ(a_{i+1})/dβ = -(∇J)^T(∇J) + β(∇J)^T Q (∇J).   (A.182)

Therefore the optimal convergence factor, obtained by setting (A.182) equal to zero, is given by

β* = (∇J)^T(∇J) / [(∇J)^T Q (∇J)].   (A.183)

To guarantee convergence, the cost must be reduced after each iteration, that is,

J(a_{i+1}) - J(a_i) < 0.   (A.184)

The difference in cost can be expressed as

J(a_{i+1}) - J(a_i) = -β(∇J)^T(∇J) + (β²/2)(∇J)^T Q (∇J).   (A.185)

Combining (A.183), (A.184), and (A.185) leads to

β(β - 2β*) < 0.   (A.186)

Therefore, the convergence of the iterative sequence is guaranteed if

0 < β < 2β*.   (A.187)

Appendix B

Lower Bound for the Optimal Convergence Factor

This appendix contains the derivation of the lower bound for the optimal convergence factor used in the steepest descent method. The derivation is first done for the one-dimensional Gabor expansion using the equally-spaced Gabor lattice. The result is modified to give an approximate lower bound for other types of lattices.

The optimal convergence factor for the steepest descent method is given by

β* = (∇J)^T(∇J) / [(∇J)^T Q (∇J)].   (B.188)

The lower bound of (B.188) can be estimated because Q has a distinctive structure. Assume that the elementary functions have been normalized such that

∫ |ψ(t; n,k,p)|² dt = 1.   (B.189)

The overlap between a pair of elementary functions, (i,j,r) and (n,k,p), is given by

q(i,j,r; n,k,p) ≈ (-1)^((n+i)(k-j)) exp{-(π/2)[(n-i)² + (k-j)²]} cos(p - r).   (B.190)

From (B.190), it can be seen that q(·) is dependent on the time and frequency separation between the two elementary functions, but not on the reference coordinate (n,k,p). Assume that the Gabor lattice extends to infinity in both time and frequency. Under this assumption all elementary functions in the Gabor lattice have the same overlap characteristics.
In such a case, the lowest optimal convergence factor is given by

β*_LB = N_EF / [Σ_nkp Σ_ijr |q(i,j,r; n,k,p)|],   (B.191)

where N_EF is the number of elementary functions in the lattice. Since all of the overlap characteristics are the same, the denominator of (B.191) can be rewritten as

Σ_nkp Σ_ijr |q(i,j,r; n,k,p)| = N_EF Σ_ijr |q(i,j,r; n,k,p)|.   (B.192)

Therefore, the lower bound for the optimal convergence factor for the Gabor lattice is given by

β*_LB = 1 / [Σ_ijr |q(i,j,r; n,k,p)|].   (B.193)

Because the lattice has a finite size in any practical implementation, (B.193) is only an approximation. In a finite Gabor lattice, estimates of the lower bound are larger if the reference coordinate (n,k,p) is near the endpoints of the lattice. Equation (B.193) can be considered a local lower bound which is assigned to the update equation for the coefficient a_nkp. The minimum of the local lower bounds can be used as the global lower bound for the optimal convergence factor.

The concept of a local lower bound can be applied to other types of lattices. The local lower bound for a one-dimensional Gabor expansion is approximated by

β_LB(n,k,p) ≈ 1 / [Σ_ijr |q(i,j,r; n,k,p)|].   (B.194)

The local lower bound for a two-dimensional Gabor expansion is approximated by

β_LB(n,m,k,l,p) ≈ 1 / [Σ_ijuvr |q(i,j,u,v,r; n,m,k,l,p)|].   (B.195)

Appendix C

Partial Derivatives for Three-Dimensional Motion

This appendix contains the partial derivatives of the three-dimensional motion cost function with respect to the motion and depth parameters. The motion cost function is given by

J = Σ_nmk N(k)² Σ_l C(n,m,k,l) e(n,m,k,l)²,   (C.196)

where

e(n,m,k,l) = D(n,m,k,l) - V(n,m,k,l) Δt.   (C.197)

Let

k_x = 2πf_k cos φ_l,   (C.198)
k_y = 2πf_k sin φ_l,   (C.199)

and

k_z = -(2πf_k / z_f)(x cos φ_l + y sin φ_l).
(C.200)

The partial derivatives of (C.196) with respect to the motion parameters are given by

δJ/δT_x = -2Δt Σ_nmk N(k)² Σ_l C e k_x / (z_f + z),   (C.201)
δJ/δT_y = -2Δt Σ_nmk N(k)² Σ_l C e k_y / (z_f + z),   (C.202)
δJ/δT_z = -2Δt Σ_nmk N(k)² Σ_l C e k_z / (z_f + z),   (C.203)
δJ/δR_x = -z_c δJ/δT_y + y_c δJ/δT_z - 2Δt Σ_nmk N(k)² Σ_l C e [-k_y z_f / (z_f + z)],   (C.204)
δJ/δR_y = z_c δJ/δT_x - x_c δJ/δT_z - 2Δt Σ_nmk N(k)² Σ_l C e [k_x z_f / (z_f + z)],   (C.205)
δJ/δR_z = y_c δJ/δT_x - x_c δJ/δT_y - 2Δt Σ_nmk N(k)² Σ_l C e [2πf_k (y cos φ_l - x sin φ_l) / (z_f + z)],   (C.206)
δJ/δx_c = R_z δJ/δT_y - R_y δJ/δT_z,   (C.207)
δJ/δy_c = R_x δJ/δT_z - R_z δJ/δT_x,   (C.208)
δJ/δz_c = R_y δJ/δT_x - R_x δJ/δT_y.   (C.209)

Since the motion parameters are constant for a given rigid body, the last three derivatives are linearly dependent on the translational derivatives. Thus, only six of the nine derivatives are independent.

The partial derivative of (C.196) with respect to the depth z(n,m,k) is given by

δJ/δz(n,m,k) = 2Δt Σ_l [N(k)² C e / (z_f + z)²] {T_x k_x + T_y k_y + T_z k_z + R_x[-k_y(z_f + z_c) + k_z y_c] + R_y[k_x(z_f + z_c) - k_z x_c] + R_z[-k_x y_c + k_y x_c]}.   (C.210)
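As a numerical illustration of the steepest descent results in Appendix A, the sketch below minimizes J(a) = (1/2)a^T Q a - c^T a using the update (A.178) with the optimal convergence factor (A.183), and checks that a fixed factor inside the bound (A.187) also reduces the cost. The small matrix Q is an arbitrary symmetric positive-definite stand-in for the overlap matrix, not one derived from actual elementary functions.

```python
import numpy as np

def cost(Q, c, a):
    """J(a) = 0.5 a^T Q a - c^T a, as in (A.177) (constant dropped)."""
    return 0.5 * a @ Q @ a - c @ a

def steepest_descent(Q, c, beta=None, iters=100):
    """Iterate a_{i+1} = a_i - beta * grad J, per (A.178).
    If beta is None, the optimal factor (A.183) is recomputed each step."""
    a = np.zeros_like(c)
    for _ in range(iters):
        g = Q @ a - c                               # gradient (A.180)
        if g @ g < 1e-30:
            break
        b = (g @ g) / (g @ Q @ g) if beta is None else beta
        a = a - b * g
    return a

# Stand-in overlap matrix (symmetric positive definite) and projection vector
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])

a_opt = steepest_descent(Q, c)                      # optimal beta each step
print(np.allclose(Q @ a_opt, c, atol=1e-8))         # True: solves Q a = c

# A fixed beta inside (A.187), here 1.9 * beta*, must still reduce the cost
g0 = -c                                             # gradient at a = 0
beta_star = (g0 @ g0) / (g0 @ Q @ g0)
a1 = -1.9 * beta_star * g0                          # one update from a = 0
print(cost(Q, c, a1) < cost(Q, c, np.zeros(2)))     # True
```

With the optimal factor the iteration converges to the solution of Q a = c, i.e. to the expansion coefficients; the second check verifies the guaranteed cost reduction for any factor in (0, 2β*).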