@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Applied Science, Faculty of"@en, "Electrical and Computer Engineering, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Braithwaite, Richard Neil"@en ; dcterms:issued "2010-08-27T16:50:38Z"@en, "1989"@en ; vivo:relatedDegree "Master of Applied Science - MASc"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """This thesis explores the use of the Gabor expansion in computer vision systems. The exploration is performed in two parts: the properties of the Gabor expansion are reviewed, and various image segmentation algorithms whose basis is the Gabor expansion are presented. The first part comprises a review of Gabor's original expansion of a one-dimensional signal [1] and Daugman's two-dimensional generalization [2] [3]. In both cases, important properties are discussed: the optimality of the "elementary function" with respect to minimal uncertainty, and the proper spacing of these functions to ensure a unique Gabor expansion. Methods for solving the Gabor expansion are investigated. Two non-iterative methods are discussed: the "inverse of the overlap matrix", and Bastiaans' "biorthonormal projection function" [4] [5]. One iterative method, Daugman's "steepest descent" [3], is reviewed. Two new iterative methods, the "warping method" and the "locally biorthonormal projection function," are presented. The complexity and convergence properties of these new methods are intermediate to Daugman's and Bastiaans' methods. In the second part of this thesis, Gabor expansion-based segmentation algorithms are presented. The properties of these algorithms—edge detection, texture segmentation, motion segmentation, and segmentation by depth—are investigated. 
It is shown that "local measures", which are by-products of the segmentation algorithms, can be used both to segment images and to provide three-dimensional descriptions of viewed objects. It is concluded that the Gabor expansion is a useful preprocessing step for computer vision systems."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/27824?expand=metadata"@en ; skos:note "THE USE OF THE GABOR EXPANSION IN COMPUTER VISION SYSTEMS. Richard Neil Braithwaite B. Sc. (Electrical Engineering) University of Calgary A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA August 1989 © Richard Neil Braithwaite, 1989 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Electrical Engineering, The University of British Columbia, Vancouver, Canada. Date: August 20, 1989. Abstract This thesis explores the use of the Gabor expansion in computer vision systems. The exploration is performed in two parts: the properties of the Gabor expansion are reviewed, and various image segmentation algorithms whose basis is the Gabor expansion are presented. The first part comprises a review of Gabor's original expansion of a one-dimensional signal [1] and Daugman's two-dimensional generalization [2] [3].
In both cases, important properties are discussed: the optimality of the \"elementary function\" with respect to minimal uncertainty, and the proper spacing of these functions to ensure a unique Gabor expansion. Methods for solving the Gabor expansion are investigated. Two non-iterative methods are discussed: the \"inverse of the overlap matrix\", and Bastiaans' \"biorthonormal projection function\" [4] [5]. One iterative method, Daugman's \"steepest descent\" [3], is reviewed. Two new iterative methods, the \"warping method\" and the \"locally biorthonormal projection function,\" are presented. The complexity and convergence properties of these new methods are intermediate to Daugman's and Bastiaans' methods. In the second part of this thesis, Gabor expansion-based segmentation algorithms are presented. The properties of these algorithms—edge detection, texture segmentation, motion segmentation, and segmentation by depth—are investigated. It is shown that \"local measures\", which are by-products of the segmentation algorithms, can be used both to segment images and to provide three-dimensional descriptions of viewed objects. It is concluded that the Gabor expansion is a useful preprocessing step for computer vision systems.
Table of Contents

Abstract ii
List of Tables vi
List of Figures vii
Acknowledgements x
1 Introduction 1
2 One-Dimensional Gabor Expansion 4
2.1 Joint Time-Frequency Representations 4
2.2 One-Dimensional Gabor Expansion 8
3 Two-Dimensional Gabor Expansion 16
3.1 Joint Position-Frequency Representations 16
3.2 Two-Dimensional Gabor Expansion 17
4 Solving for Gabor Expansion Coefficients 25
4.1 Projection Coefficients and Expansion Coefficients 25
4.2 Direct Methods 27
4.2.1 Inverse of Overlap Matrix 28
4.2.2 Biorthonormal Projection Function 29
4.3 Iterative Least Squares Solution 31
4.3.1 Steepest Descent 32
4.3.2 Warping Method 34
4.3.3 Locally Biorthonormal Projection Function 36
4.4 Suitability for Vision Analysis 39
5 Implementing a Log-Polar Lattice 41
5.1 Orientation 41
5.2 Frequency 42
5.3 Spatial Sampling Lattice 42
5.4 Special Case: Circular Elementary Functions 43
6 Local Measures 44
6.1 Local Measures for a Single Image 45
6.1.1 Local Magnitude 45
6.1.2 Local Preferred Orientation and Moment of Inertia 45
6.2 Local Measures for Two Related Images 48
6.2.1 Local Difference in Phase 49
6.2.2 Local Difference in Preferred Orientation 52
7 Forming and Characterizing Pixel Groupings 54
7.1 Forming Pixel Groupings 54
7.1.1 Edge Detection and Texture Segmentation 54
7.1.2 Motion Segmentation 57
7.1.3 Segmentation by Depth 58
7.1.4 Integrating Maps to Form Relevant Pixel Groupings 62
7.2 Characterizing Regions and Edges 64
7.2.1 Texture of a Region 64
7.2.2 Edges 65
7.2.3 Three-Dimensional Position and Motion 65
8 Examples 70
8.1 Edge Detection 70
8.1.1 Position Localization 71
8.1.2 Noise Immunity 73
8.1.3 Relevant Frequencies 77
8.1.4 Local Curvature Estimation 80
8.2 Texture Segmentation 81
8.2.1 Segmentation by Orientation 81
8.2.2 Segmentation by Frequency 82
8.3 Motion Segmentation 87
8.3.1 Camouflaged Object 88
8.3.2 Aperture Problem and Insufficient Spectral Energy 90
8.3.3 Translation in Depth 92
8.4 Discussion 94
9 Conclusion 96
9.1 Summary 96
9.2 Further Research 97
References 99
A Selecting an Appropriate Convergence Factor 102
B Lower Bound for the Optimal Convergence Factor 104
C Partial Derivatives for Three-Dimensional Motion 106

List of Tables

8.1 Motion Parameters for the Camouflaged Circle 89
8.2 Motion Parameters for Translating Circle 91
8.3 Motion Parameters for Circle: Pure Translation Assumed 92
8.4 Motion Parameters for Shrinking Circle 93

List of Figures

2.1 Effective Duration of Gaussian Window 5
2.2 Effective Bandwidth of Gaussian Window 6
2.3 Cosine-type Elementary Function 10
2.4 Sine-type Elementary Function 11
2.5 Gabor Sampling Lattice 13
2.6 Logarithmic Sampling Lattice 14
3.7 2D Cosine-type Elementary Function 18
3.8 2D Sine-type Elementary Function 18
3.9 Frequency and Orientation Bandwidth of an Elementary Function 20
3.10 2D Sampling Lattice 21
3.11 2D Sampling Lattice, Gabor Generalization 22
3.12 Spectral Coordinates, Log-polar Lattice 24
4.13 Window for Bastiaans' Biorthonormal Function 30
4.14 Local Neighbourhood for Gabor Lattice 36
4.15 Local Neighbourhoods for Logarithmic Sampling Lattice 37
6.16 Moment of Inertia of Masses 46
6.17 Moment of Inertia for a Single Frequency 46
7.18 Vision System with Two Viewpoints 59
7.19 Four and Eight Point Connectivity 63
8.20 Test Image 1: A Circle 71
8.21 Magnitude Map: Test Image 1, fk = 0.125 cycle/pixel 72
8.22 Magnitude Map: Test Image 1, fk = 0.25 cycle/pixel 72
8.23 Edge Detector Output: Test Image 1, fk = 0.125 cycle/pixel 73
8.24 Edge Detector Output: Test Image 1, fk = 0.25 cycle/pixel 74
8.25 Test Image 2: A Circle With Additive Noise 74
8.26 Magnitude Map: Test Image 2, fk = 0.125 cycle/pixel 75
8.27 Magnitude Map: Test Image 2, fk = 0.25 cycle/pixel 75
8.28 Edge Detector Output: Test Image 2, fk = 0.125 cycle/pixel 76
8.29 Edge Detector Output: Test Image 2, fk = 0.25 cycle/pixel 76
8.30 Test Image 3: One Frequency, Four Orientations 77
8.31 Magnitude Map: Test Image 3, fk = 0.125 cycle/pixel 78
8.32 Magnitude Map: Test Image 3, fk = 0.25 cycle/pixel 78
8.33 Edge Detector Output: Test Image 3, fk = 0.125 cycle/pixel 79
8.34 Edge Detector Output: Test Image 3, fk = 0.25 cycle/pixel 79
8.35 Test Image 4: A Rectangle 80
8.36 Edge Detector Output: Test Image 4 81
8.37 Test Image 5: One Frequency, Two Orientations 82
8.38 Magnitude Map: Test Image 5, fk = 0.125 cycle/pixel 83
8.39 Magnitude Map: Test Image 5, fk = 0.25 cycle/pixel 83
8.40 Texture Segmentation Output: Test Image 5, fk = 0.125 84
8.41 Texture Segmentation Output: Test Image 5, fk = 0.25 84
8.42 Test Image 6: Two Frequencies, One Orientation 85
8.43 Magnitude Map: Test Image 6, fk = 0.125 cycle/pixel 85
8.44 Magnitude Map: Test Image 6, fk = 0.25 cycle/pixel 86
8.45 Texture Segmentation Output: Image 6, fk = 0.125 86
8.46 Texture Segmentation Output: Image 6, fk = 0.25 87
8.47 Test Image 7: Camouflaged Circle 88
8.48 Motion Segmentation Output: Test Image 7 89
8.49 Motion Segmentation Output: Translating Circle 91
8.50 Motion Segmentation Output: Shrinking Circle 93

Acknowledgements

I would like to thank Dr. Beddoes, my supervisor, for his continual support. A special thanks to my brother, Murray Braithwaite, who displayed skill and endurance while editing early revisions of this thesis. I would also like to thank Michele, my wife, for her love and support during the past two years of my studies. This work has been partially funded by the Natural Sciences and Engineering Research Council and by the Advanced Systems Foundation through postgraduate scholarships.
Chapter 1 Introduction In 1946, Dennis Gabor [1] presented a new signal representation that describes a one-dimensional signal in terms of two variables: time and frequency. This representation (referred to as the \"one-dimensional Gabor expansion\") is suitable for analyzing aperiodic signals such as sound. Gabor's motivation for designing this joint time-frequency representation was to analyze the human auditory system. More recently, Daugman [2] [3] generalized the Gabor expansion for analysis of two-dimensional signals. This \"two-dimensional Gabor expansion\" is believed to have a form similar to that of the early stages of human vision [2] [6] [7]. Such a claim suggests that the two-dimensional Gabor expansion could be a useful preprocessing step in a computer vision system. A computer vision system extracts, organizes, and interprets information obtainable from a sequence of digitized images. The images are presented to the vision system as matrices of pixel intensities. Each pixel has a two-dimensional spatial position and its intensity varies with time. No existing computer vision system can directly interpret an undifferentiated sequence of pixel intensities with much success; the vision task must be structured into a hierarchy of stages. The initial stage of computer vision analysis consists of forming simple, but useful, spatial and temporal pixel groupings such as edges, regions, and flow vectors. These pixel groupings are characterized by various measures such as position, preferred (spectral) orientation, and preferred frequency. For a vision system analyzing a sequence of images, the detected changes in these measures over time provide useful information. If binocular (stereo) vision is available, changes of these measures induced by a change of viewpoint can also be utilized in the subsequent stage of vision analysis.
The second stage of computer vision analysis consists of assigning three-dimensional properties to the newly-formed pixel groupings. These three-dimensional properties must be inferred because an image is a two-dimensional projection which underdetermines a three-dimensional object. The inference process requires additional information such as knowledge of the imaging geometry (viewpoint constraints) and knowledge of physical constraints on objects being viewed. The constraint information is applied to the results of the initial stage analysis, namely, the characterization of simple pixel groupings by various measures, to perform the task of inferring three-dimensional properties. Three-dimensional properties of interest include boundaries, surface slant and tilt, relative depth, and rotational and translational motion. In a general vision system, various techniques of forming and characterizing pixel groupings are integrated. These techniques include edge detection [8] [9], texture segmentation [9] [7] [11], motion segmentation [12] [13] [14] [15] [16], and segmentation by depth [17]. However, most existing implementations of these techniques have been developed in isolation, without thought of integration. Each technique preprocesses pixel intensities differently. The complexity of a general vision system is significantly reduced if all techniques employ a common preprocessing algorithm. This thesis proposes that the two-dimensional Gabor expansion be used as the common preprocessing algorithm for integrated vision systems. A review of the Gabor expansion comprises chapters 2 through 4. Chapter 2 presents the Gabor expansion in its original one-dimensional form, introducing important concepts and definitions that are also used in the two-dimensional Gabor expansion. The two-dimensional Gabor expansion, as developed by Daugman [2] [3], is re-stated and expanded for use with visual data in chapter 3.
Chapter 4 discusses various methods of solving the Gabor expansion and the suitability of each method for computer vision. The remaining chapters (5 through 8) are devoted to the Gabor expansion-based segmentation algorithms. Chapter 5 outlines the implementation of the log-polar form of the two-dimensional Gabor expansion. The log-polar form is amenable to segmentation algorithms because it represents the image in terms of useful local parameters (position, frequency, orientation). In chapter 6, the coefficients of the Gabor expansion are combined to produce local measures. The segmentation algorithms, which use local measures to form and characterize pixel groupings, are introduced in chapter 7. Chapter 8 illustrates the properties of these segmentation algorithms using simple test images. Chapter 2 One-Dimensional Gabor Expansion The Gabor expansion was originally applied to analyze one-dimensional signals, such as sound [1]. The one-dimensional Gabor expansion represents a signal as a sequence of frequency (pitch) elements of short duration occurring at specified time instants. This representation resembles a musical score. The mathematics of one-dimensional signal analysis are presented in this chapter, setting the foundation for later generalization to two-dimensional analysis in the vision context. 2.1 Joint Time-Frequency Representations The one-dimensional Gabor expansion is a type of joint time-frequency representation. A joint time-frequency representation describes a one-dimensional signal, s(t), by two variables: time and frequency. The representation emphasizes the time-varying nature of the signal's frequency spectrum. In order to form a joint time-frequency representation, the frequency spectra over a series of short time intervals are required. The duration of each short time interval is defined by a \"window function.\" A window function comprises a time duration and a bandwidth.
The time duration is the standard deviation of the window function in the time domain (the \"effective time duration\"). The bandwidth is the standard deviation of the window function in the frequency domain (the \"effective bandwidth\"). The effective time duration and the effective bandwidth for a Gaussian window are shown in figures 2.1 and 2.2, respectively. [Figure 2.1: Effective Duration of Gaussian Window] [Figure 2.2: Effective Bandwidth of Gaussian Window] The effective time duration and the effective bandwidth are denoted by the width of the rectangular window included in their respective figures. Either the effective time duration or effective bandwidth can be used to define the effective size of the window function. This effective size will be referred to as the \"scale\" of the window function. In this chapter, any quantitative reference to scale is made using the time domain definition (the effective time duration). The actual size of the window function is referred to as the \"support width.\" The support width is defined in the time domain as the minimum time epoch that encompasses the non-zero portion of the window function (or other type of function). The support width for a Gaussian window is infinite in extent. In a computer implementation, the Gaussian window is truncated, thereby reducing the support width. If the window function is over-zealously truncated, certain properties of the Gaussian waveform are lost (i.e., minimum joint uncertainty; see section 2.2). On the other hand, an excessively large support width needlessly burdens the computer with insignificant calculations. The window function is used to extract a portion of the signal that is localized (concentrated) in time by the effective duration and in frequency by the effective bandwidth.
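The truncation trade-off described above can be sketched numerically. This is a minimal illustration, not the thesis's implementation: it assumes a Gaussian envelope of the form exp(−π(t − t_n)²/(2Δt²)) and a hypothetical truncation threshold `eps` below which the window is zeroed.

```python
import math

def gaussian_window(t, t_n=0.0, dt_eff=1.0):
    """Gaussian window with effective duration dt_eff, centred at t_n
    (envelope form assumed: exp(-pi (t - t_n)^2 / (2 dt_eff^2)))."""
    return math.exp(-math.pi * (t - t_n) ** 2 / (2.0 * dt_eff ** 2))

def truncation_half_width(dt_eff, eps=1e-3):
    """Half of the finite support width obtained by truncating the window
    where it falls below the (hypothetical) threshold eps, i.e. the T
    solving gaussian_window(T) = eps."""
    return math.sqrt(-2.0 * dt_eff ** 2 * math.log(eps) / math.pi)
```

A looser threshold gives a smaller support width (fewer calculations) at the cost of discarding more of the Gaussian tail, which is the trade-off the text describes.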
The window function must be both time-shifted and frequency-modulated in order to extract the signal energy that is concentrated about a specified time-frequency-phase coordinate (t_n, f_k, p). Here t_n is the time shift, which defines the temporal coordinate of the new function; f_k is the modulation frequency, which defines the spectral coordinate; and p is the phase of the modulation frequency relative to the coordinate t_n. To extract the energy localized about (t_n, f_k, p), the continuous inner product of the signal and the time-shifted, frequency-modulated window function is calculated. The inner product is a projection of the time-shifted, frequency-modulated window function onto the signal. This time-shifted, frequency-modulated window function will be referred to as a \"projection function.\" The value obtained from the inner product is the \"projection coefficient.\" As mentioned, the joint time-frequency representation describes a one-dimensional signal by two variables. A continuous form of a joint time-frequency representation overdetermines the signal s(t). Therefore, the continuous joint time-frequency representation can be sampled without loss of information [5]. If the signal s(t) has a bandwidth of B Hz, the sample points in time and frequency must be chosen such that the overall sampling rate is 2B independent measurements per second. If properly sampled, the set of projection coefficients contains sufficient information to reconstruct the signal s(t). The reconstructed signal is given by

$s(t) = \sum_{nkp} a(t_n, f_k, p)\, EF(t; t_n, f_k, p)$  (2.1)

where EF() is an elementary function centered at (t_n, f_k, p) and a() is the associated weighting coefficient.
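The inner product that yields a projection coefficient can be sketched as a discrete sum. This is a hedged illustration, not the thesis's code: it assumes a Gaussian window of the form exp(−π(t − t_n)²/(2Δt²)) and shows only the cosine-type (p = 0) projection function.

```python
import math

def projection_coefficient(signal, t_n, f_k, dt_eff, ts):
    """Approximate the continuous inner product of a signal with a
    cosine-type projection function: a Gaussian window shifted to t_n
    and modulated at f_k.  ts is a uniform sample grid standing in for
    the continuous integral."""
    step = ts[1] - ts[0]
    total = 0.0
    for t in ts:
        g = math.exp(-math.pi * (t - t_n) ** 2 / (2.0 * dt_eff ** 2))
        total += signal(t) * g * math.cos(2.0 * math.pi * f_k * (t - t_n)) * step
    return total

# projecting a pure cosine at f_k = 0.5 onto a matched projection
# function centred at t_n = 0 with effective duration 2.0
ts = [i * 0.01 - 20.0 for i in range(4001)]
c = projection_coefficient(lambda t: math.cos(math.pi * t), 0.0, 0.5, 2.0, ts)
```

When the signal's local frequency matches f_k, the coefficient is large; mismatched frequencies are attenuated by the window's effective bandwidth.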
If the projection coefficient is used as a weighting coefficient, an elementary function must be chosen such that any overlap (in time or frequency) between adjacent projection functions is compensated; failure to compensate for overlap will introduce artifacts into the reconstructed signal. Thus, the elementary function must be biorthonormal to the set of projection functions. (Biorthonormal means orthonormal in both the time and frequency domains.) An alternative approach is to first select the elementary function, then design an appropriate biorthonormal projection function. In this approach, which is used in the Gabor expansion, the weighting coefficients a() are referred to as \"expansion coefficients.\" 2.2 One-Dimensional Gabor Expansion Gabor [1] proposed that time-shifted, frequency-modulated Gaussian waveforms could be used as elementary functions in signal reconstruction. Gabor's elementary functions, which are denoted by ψ(), are a special case of the elementary functions EF() defined in (2.1). For the remainder of this chapter, any reference to an elementary function will mean Gabor's elementary function. The elementary functions in the Gabor expansion occur in quadrature pairs. This thesis selects cosine-type and sine-type elementary functions, which are given by (2.2) and (2.3), respectively.

$\psi(t; n, k, 0) = g_n(t) \cos\theta_{nk}(t)$  (2.2)

$\psi(t; n, k, \tfrac{\pi}{2}) = g_n(t) \sin\theta_{nk}(t)$  (2.3)

where

$g_n(t) = \exp\left[-\frac{\pi (t - t_n)^2}{2(\Delta t)^2}\right]$  (2.4)

$\theta_{nk}(t) = 2\pi f_k (t - t_n)$  (2.5)

The two types of elementary functions are shown in figures 2.3 and 2.4. The elementary functions are centered at time t_n and frequency f_k. The envelope of the elementary function has the same Gaussian shape in both the time and frequency domains. The time domain envelope is given by (2.4).
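A quadrature pair of elementary functions can be sketched as follows. This is an illustrative sketch, assuming the Gaussian envelope form exp(−π(t − t_n)²/(2Δt²)); the phase argument selects the cosine-type or sine-type member of the pair.

```python
import math

def elementary_function(t, t_n, f_k, dt_eff, phase=0.0):
    """Gabor elementary function psi(t; n, k, p): a Gaussian envelope
    centred at t_n modulating a wave at frequency f_k.  phase = 0 gives
    the cosine-type function, phase = pi/2 the sine-type function."""
    envelope = math.exp(-math.pi * (t - t_n) ** 2 / (2.0 * dt_eff ** 2))
    theta = 2.0 * math.pi * f_k * (t - t_n)
    return envelope * (math.cos(theta) if phase == 0.0 else math.sin(theta))
```

At the centre t = t_n the cosine-type function attains the envelope peak while the sine-type function is zero, which is what makes the pair a quadrature (90-degree phase-shifted) pair.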
The frequency domain envelope is given by

$G_k(f) = \exp\left[-\frac{\pi (f - f_k)^2}{2(\Delta f)^2}\right]$  (2.6)

The standard deviations of the Gaussian envelope in the time and frequency domains define the effective duration, Δt, and the effective bandwidth, Δf, of the elementary function, respectively. The effective duration and the effective bandwidth are measures of time and frequency localization, respectively. Localization is related to the minimum separation for which two independent signal features can be resolved. To resolve two features that have a small separation in time, the elementary functions must have a short effective duration. [Figure 2.3: Cosine-type Elementary Function] [Figure 2.4: Sine-type Elementary Function] Similarly, fine resolution in frequency requires the elementary function to have a small effective bandwidth. Arbitrary resolution cannot be achieved in both the time and frequency domains simultaneously. The joint time-frequency localization of an elementary function is subject to the Heisenberg uncertainty product, which can be expressed as

$\Delta t\, \Delta f \geq \tfrac{1}{2}.$  (2.7)

The Gaussian waveform minimizes the Heisenberg uncertainty product such that (2.7) is an equality [1]. Since (2.7) is not affected by time-shift or frequency-modulation, all of Gabor's elementary functions possess the minimal uncertainty property. Minimal uncertainty means the Gabor elementary function optimally trades off the resolution in each domain. The Gabor expansion of a one-dimensional signal is given by

$s(t) = \sum_{nkp} a_{nkp}\, \psi(t; t_n, f_k, p).$  (2.8)

The Gabor expansion coefficients, a_nkp, form a discrete set of empirical measurements that describes the signal s(t). The Gabor expansion is a sampling process defined by the center coordinates of the elementary functions (i.e., t_n, f_k).
The center coordinates for the set of elementary functions form a sampling lattice that covers the joint time-frequency space. The sampling lattice suggested by Gabor [1] is shown in figure 2.5. A sine and cosine pair of elementary functions are centered at each point in the lattice. The lattice points are evenly spaced in the time and frequency domains. The time and frequency sampling intervals are given by (2.9) and (2.10), respectively.

$\Delta T = t_n - t_{n-1} = 2^{0.5}\, \Delta t$  (2.9)

$\Delta F = f_k - f_{k-1} = 2^{0.5}\, \Delta f$  (2.10)

[Figure 2.5: Gabor Sampling Lattice] The sampling intervals, ΔT and ΔF, are determined by the effective duration, Δt, and the effective bandwidth, Δf, respectively. The sampling area, ΔT ΔF, is twice the Heisenberg uncertainty product. The increase in area is required because each lattice point (t_n, f_k) is assigned two measurements: a sine and a cosine expansion coefficient. The coefficients associated with the Gabor sampling lattice provide a \"complete\" description of the signal s(t), containing enough information to reconstruct the original signal. In addition, each expansion coefficient in this lattice is an independent measurement. Since all the expansion coefficients are independent, the solution to the Gabor expansion (i.e., a_nkp) is unique. The Gabor lattice defines equally-spaced frequency channels, making it useful for describing harmonic signals such as music or vibrations. As a consequence of this harmonic frequency spacing, the temporal sampling density is the same for each frequency channel. For most signals, the lower frequency channels encode coarse signal variations; the higher frequency channels encode finer detail. [Figure 2.6: Logarithmic Sampling Lattice]
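The lattice spacing of (2.9) and (2.10) can be checked numerically. This is a sketch under the assumption that the elementary functions sit at the minimal-uncertainty equality Δt·Δf = ½, so the sampling area ΔT·ΔF comes out to twice that product.

```python
import math

def gabor_lattice_intervals(dt_eff):
    """Sampling intervals (2.9)-(2.10) for a given effective duration,
    with the effective bandwidth fixed by the minimal-uncertainty
    equality dt * df = 1/2 (Gaussian elementary functions, assumed)."""
    df_eff = 0.5 / dt_eff
    return math.sqrt(2.0) * dt_eff, math.sqrt(2.0) * df_eff

dT, dF = gabor_lattice_intervals(0.8)
# the sampling area dT * dF equals twice the uncertainty product dt * df,
# accommodating the sine and cosine coefficient assigned to each point
```

Whatever effective duration is chosen, the product ΔT·ΔF stays fixed; only the shape of each lattice cell changes.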
A lattice that has a higher temporal sampling density for the higher frequency channels is better suited than the Gabor lattice for representing these non-harmonic signals. The logarithmic lattice (shown in figure 2.6) is one such lattice that has a variable temporal sampling density. For this lattice, the scale of the Gaussian envelope is a function of the modulation frequency f_k. The time domain definition of scale is given by

$(\Delta t)_k = \frac{A}{f_k}$  (2.11)

where A is a constant. This restriction on the scale of the Gaussian envelope makes the logarithmic form of the Gabor expansion a \"multi-scale\" representation. The time and frequency intervals defined by (2.9) and (2.10) can be used for the logarithmic sampling lattice. The sampling intervals are given by

$(\Delta T)_k = 2^{0.5} (\Delta t)_k = 2^{0.5} \frac{A}{f_k}$  (2.12)

and

$(\Delta F)_k = 2^{0.5} (\Delta f)_k = 2^{0.5} \frac{f_k}{2A}.$  (2.13)

The temporal sampling density increases with frequency. The overall sampling density in the joint time-frequency space (i.e., (ΔT)_k (ΔF)_k) remains constant. Chapter 3 Two-Dimensional Gabor Expansion This chapter introduces the two-dimensional generalization of the Gabor expansion. The two-dimensional Gabor expansion is a type of joint position-frequency representation, typically used to analyze images. Images represented by the \"log-polar\" form of the Gabor expansion are used in the segmentation algorithms of chapter 7. 3.1 Joint Position-Frequency Representations The joint position-frequency representation is a generalization of the time-frequency representation, describing an image I(x, y) by two spatial position variables and two spatial frequency (spectral) variables. The window function used in the joint position-frequency representation comprises an effective spatial extent and an effective bandwidth in each of the x and y directions. The effective spatial extents and effective bandwidths are defined by the standard deviation of the window function in their respective domain and direction.
The projection function is a position-shifted and frequency-modulated window function. A projection coefficient is an estimate of the signal energy concentrated about the position-frequency-phase coordinate (x, y, f_x, f_y, p). The frequency coordinates f_x and f_y can be represented by a modulation frequency f and an orientation φ. The modulation frequency is given by

$f = (f_x^2 + f_y^2)^{1/2}.$  (3.14)

The orientation of the modulating wave relative to the x axis is given by

$\phi = \arctan\frac{f_y}{f_x}.$  (3.15)

The set of projection coefficients can be used to reconstruct the image I(x, y). The reconstructed image is given by

$I(x, y) = \sum a(x_n, y_m, f_k, \phi_l, p)\, EF(x, y; x_n, y_m, f_k, \phi_l, p)$  (3.16)

The two-dimensional elementary functions, which occur in quadrature pairs, are given by

$\psi(x, y; n, m, k, l, 0) = g_{nm}(x, y) \cos\theta_{nmkl}(x, y)$  (3.17)

$\psi(x, y; n, m, k, l, \tfrac{\pi}{2}) = g_{nm}(x, y) \sin\theta_{nmkl}(x, y)$  (3.18)

where

$(\tilde{x}, \tilde{y}) = (x \cos\gamma_e + y \sin\gamma_e,\; -x \sin\gamma_e + y \cos\gamma_e)$  (3.19)

$g_{nm}(x, y) = \exp\left[-\frac{\pi}{2}\left(\frac{(x - x_n)^2}{(\Delta x)^2} + \frac{(y - y_m)^2}{(\Delta y)^2}\right)\right]$  (3.20)

$\theta_{nmkl} = 2\pi[f_x(k)(x - x_n) + f_y(l)(y - y_m)]$  (3.21)

The elementary functions appear in figures 3.7 and 3.8. [Figure 3.7: 2D Cosine-type Elementary Function] [Figure 3.8: 2D Sine-type Elementary Function] The two-dimensional elementary function has its spatial centroid located at (x_n, y_m) and spectral centroid located at (f_x(k), f_y(l)). The elliptical Gaussian envelope of the elementary function can be rotated using (3.19) such that the orientation of the major axis relative to the x-axis is given by γ_e. The two-dimensional elementary function exhibits the same minimal uncertainty as its one-dimensional counterpart. The uncertainty constraint for the two-dimensional case is given by

$\Delta x\, \Delta y\, \Delta f_x\, \Delta f_y \geq \tfrac{1}{4}.$  (3.22)

Δx and Δy define the effective spatial extent of the elliptical Gaussian envelope in the x and y directions. Δf_x and Δf_y define the effective bandwidth in each direction. Equation (3.22) assumes that the x- and y-axes correspond to the principal axes of the elliptical Gaussian waveform (i.e., γ_e = 0 or γ_e = π/2).
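A two-dimensional elementary function can be sketched directly from the quadrature-pair form above. This is an illustrative sketch, not the thesis's implementation: it assumes the unrotated case γ_e = 0 and an envelope of the form exp(−(π/2)[(x−x_n)²/Δx² + (y−y_m)²/Δy²]).

```python
import numpy as np

def elementary_function_2d(x, y, xn, ym, fx, fy, dx, dy, sine=False):
    """2D Gabor elementary function with gamma_e = 0: an elliptical
    Gaussian envelope centred at (xn, ym) times a modulating wave with
    spectral centroid (fx, fy).  sine=False gives the cosine-type
    member of the quadrature pair, sine=True the sine-type member."""
    g = np.exp(-np.pi / 2.0 * (((x - xn) / dx) ** 2 + ((y - ym) / dy) ** 2))
    theta = 2.0 * np.pi * (fx * (x - xn) + fy * (y - ym))
    return g * (np.sin(theta) if sine else np.cos(theta))

# sample a cosine-type function on a small grid
xs, ys = np.meshgrid(np.arange(-8.0, 9.0), np.arange(-8.0, 9.0))
ef = elementary_function_2d(xs, ys, 0.0, 0.0, 0.25, 0.0, 3.0, 3.0)
```

Rotating the envelope (γ_e ≠ 0) would amount to evaluating the Gaussian on the rotated coordinates of (3.19) instead of the raw offsets.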
If this is not the case, the rotated axes (x̃, ỹ) defined by (3.19) should be used. The effective spatial extent of the elliptical Gaussian envelope can be described by the scale and the aspect ratio. The scale is given by

$\Delta s = \begin{cases} \Delta x & \text{if } \Delta x > \Delta y \\ \Delta y & \text{otherwise.} \end{cases}$  (3.23)

Δs is the effective spatial extent of the Gaussian cross-section measured along the major axis of the ellipse. The aspect ratio is given by

$a = \begin{cases} \Delta y / \Delta x & \text{if } \Delta x > \Delta y \\ \Delta x / \Delta y & \text{otherwise.} \end{cases}$  (3.24)

The aspect ratio is less than or equal to unity. If one of the principal axes of the elliptical envelope has the same orientation as the modulating wave (i.e., γ_e = φ_l or γ_e = φ_l + π/2), the effective bandwidths can be described by a frequency and an orientation bandwidth. [Figure 3.9: Frequency and Orientation Bandwidth of an Elementary Function] The frequency bandwidth is given by

$\Delta f = \begin{cases} \dfrac{1}{2 a \Delta s} & \text{if } \gamma_e = \phi_l + \tfrac{\pi}{2} \\[4pt] \dfrac{1}{2 \Delta s} & \text{if } \gamma_e = \phi_l. \end{cases}$  (3.25)

The orientation bandwidth is given by

$\Delta \phi = \begin{cases} \dfrac{1}{2 \Delta s f} & \text{if } \gamma_e = \phi_l + \tfrac{\pi}{2} \\[4pt] \dfrac{1}{2 a \Delta s f} & \text{if } \gamma_e = \phi_l. \end{cases}$  (3.26)

The frequency and orientation bandwidths of a two-dimensional elementary function are shown in figure 3.9. The two-dimensional Gabor expansion is given by

$I(x, y) = \sum_{nmklp} a_{nmklp}\, \psi(x, y; n, m, k, l, p).$  (3.27)

The sampling lattice for the two-dimensional Gabor expansion is illustrated in figure 3.10. Each spectral coordinate (f_x, f_y) has a corresponding spatial (x-y) sampling lattice. [Figure 3.10: 2D Sampling Lattice] The spacing between adjacent elementary functions for the two-dimensional generalization of Gabor's sampling lattice is given by

$\Delta X = 2^{0.5}\, \Delta x,$  (3.28)

$\Delta Y = 2^{0.5}\, \Delta y,$  (3.29)

$\Delta F_x = 2^{0.5}\, \Delta f_x,$  (3.30)

and

$\Delta F_y = 2^{0.5}\, \Delta f_y.$  (3.31)

The above spacing is valid for two-dimensional elementary functions that are separable in x and y. If the elementary functions are not separable in x and y, then a sampling lattice based on the rotated axes (x̃, ỹ) is used.
In the two-dimensional Gabor lattice, Δx, Δy, Δf_x, and Δf_y are the same for each elementary function. Therefore the spatial sampling intervals (ΔX, ΔY) are identical for each spectral coordinate (f_x, f_y). A generic spatial sampling lattice can be formed if a common spatial origin is selected. This generic spatial lattice is shown in figure 3.11 along with the spatial frequency (spectral) lattice. [Figure 3.11: 2D Sampling Lattice, Gabor Generalization] In the segmentation algorithms of chapter 7, local (spectral) orientation characteristics from different frequency channels are compared. To obtain a local orientation, the spectral coordinates for the Gabor lattice must be transformed from Cartesian to polar form. Even after the coordinate transformation, it is difficult to compare local orientations because the orientation bandwidth of an elementary function in the Gabor lattice decreases with frequency. An alternative sampling scheme is the log-polar lattice. The log-polar lattice is amenable to image segmentation because its spectral representation uses frequency and orientation as basis parameters. The comparison of local orientations is simplified because all elementary functions in the log-polar lattice have the same orientation bandwidth. For the log-polar lattice, the scale and orientation of the Gaussian envelope are restricted. The scale of the elliptical Gaussian envelope is given by

$(\Delta s)_k = \frac{A}{f_k}.$  (3.32)

The (spatial) orientation of the major axis of the ellipse is normal to the (spectral) orientation of the elementary function (i.e., γ_e = φ_l + π/2). Because of these restrictions, the log-polar lattice provides a multi-scale image representation characterized by a spatial sampling density that increases with frequency.
The sampling intervals for the log-polar lattice are defined as

(ΔX)_{kl} = 2^{0.5} λ / f_k (3.33)

and

(ΔY)_{kl} = 2^{0.5} λ a / f_k, (3.34)

with the frequency spacing ΔF_k given by (3.35) and the orientation spacing Δφ given by (3.36). (ΔX)_{kl} and (ΔY)_{kl} are the spatial sampling intervals for the rotated axes. Rotated axes are not required if the Gaussian envelope of the elementary function is circular (i.e. a = 1). In such a case, the spatial sampling intervals are dependent on the frequency but not the orientation of the spectral coordinate. The spectral coordinates for the log-polar lattice are shown in figure 3.12. The orientation spacing is independent of the modulating frequency f_k; the frequency spacing is independent of the orientation φ_l.

Figure 3.12: Spectral Coordinates, Log-polar Lattice

Chapter 4

Solving for Gabor Expansion Coefficients

This chapter introduces methods of solving the Gabor expansion for a one-dimensional signal s(t). These methods are generalized to two dimensions for vision analysis. In both cases, the methods are complicated by the non-zero overlap between elementary functions. Because the elementary functions are non-orthogonal, the nature of a given elementary function's overlap with all other elementary functions becomes important. There are two categories of methods: direct and iterative. The direct methods are global, compensating for the overlap of the elementary functions in a single set of calculations. The iterative methods are local; they compensate for the overlap of adjacent elementary functions by using repeated local error calculations. The direct methods provide the actual expansion coefficients at their completion. The iterative methods provide an approximate solution after each iteration, supplying the actual coefficients in the limit, after the solution sequence has converged. All of the methods discussed in this chapter require a set of projection functions whose coefficients are used to calculate the coefficients of the Gabor expansion.
Before proceeding to the methods of solving the Gabor expansion, it is necessary to establish a relationship between the projection coefficients and expansion coefficients.

4.1 Projection Coefficients and Expansion Coefficients

This section establishes a relationship between an arbitrary, but complete, set of projection coefficients and the Gabor expansion coefficients. The relationship introduces the overlap matrix, which plays an important role in determining the coefficients of the Gabor expansion. A projection coefficient for the signal s(t) is obtained using

c_{ijr} = ∫ s(t) PF(t; t_i, f_j, p_r) dt, (4.37)

where PF() is the projection function. If s(t) is replaced by the Gabor expansion

s(t) = Σ_{nkp} a_{nkp} ψ(t; n, k, p), (4.38)

then (4.37) becomes

c_{ijr} = ∫ [Σ_{nkp} a_{nkp} ψ(t; n, k, p)] PF(t; i, j, r) dt. (4.39)

Since a_{nkp} are constants for a given signal s(t), (4.39) can be rewritten as

c_{ijr} = Σ_{nkp} a_{nkp} ∫ ψ(t; n, k, p) PF(t; i, j, r) dt. (4.40)

The integral in (4.40) is a continuous inner product that defines the overlap between the elementary function ψ() and the projection function PF(). This overlap, denoted by q(), is given by

q(i, j, r; n, k, p) = ∫ ψ(t; n, k, p) PF(t; i, j, r) dt. (4.41)

Thus, (4.40) can be expressed as

c_{ijr} = Σ_{nkp} a_{nkp} q(i, j, r; n, k, p). (4.42)

Equation (4.42) captures the relationship between the projection coefficients and the Gabor expansion coefficients. This relationship can be expressed in matrix form. Expressed as a vector, the set of expansion coefficients is given by

a = [a(1) ⋯ a(index_g) ⋯]^T, (4.43)

where

index_g = n + k_offset (k + p_offset p), (4.44)

and a() is the expansion coefficient defined in (4.38).
The vector of projection coefficients is similarly defined as

c = [c(1) ⋯ c(index_p) ⋯]^T, (4.45)

where

index_p = i + j_offset (j + r_offset r), (4.46)

and c() is the projection coefficient defined by (4.39). The matrix form of (4.42) is

c = Qa, (4.47)

where

Q = [ q(1, 1) ⋯ q(1, index_g) ; ⋮ ⋱ ⋮ ; q(index_p, 1) ⋯ q(index_p, index_g) ], (4.48)

and q() is the overlap defined by (4.41). The overlap matrix Q can be defined for two-dimensional signals by modifying index_p and index_g to include the additional variables.

4.2 Direct Methods

The Gabor expansion coefficients can be obtained directly using the inverse of the overlap matrix or the biorthonormal projection function.

4.2.1 Inverse of Overlap Matrix

The Gabor expansion coefficients can be obtained from the projection coefficients. Premultiplying (4.47) by Q^{-1} gives

a = Q^{-1} c. (4.49)

This method assumes that Q^{-1} exists. This inverse exists if Q is square and has full rank. The matrix is square if the number of points in the projection lattice and expansion lattice are equal. The matrix has full rank if each elementary function in the expansion lattice is an independent measurement of the signal. A common image processing technique is to filter the signal with a set of bandpass filters. A bandpass filter produces a projection coefficient at each pixel position, thereby oversampling the image. An oversampled set of projection coefficients can be used to obtain the Gabor expansion coefficients if (4.47) is premultiplied by [Q^T Q]^{-1} Q^T instead of Q^{-1}. [Q^T Q]^{-1} Q^T is used because Q is not a square matrix when the projection lattice is oversampled. The Gabor expansion coefficients can be obtained from the oversampled set of projection coefficients using

a = [Q^T Q]^{-1} Q^T c. (4.50)

The inverse of Q^T Q exists if each elementary function in the expansion lattice is an independent measurement. (Note that this is also a necessary condition for the existence of Q^{-1}.)
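The relationship c = Qa in (4.47) and the inverse-of-overlap-matrix solution (4.49) can be checked numerically. The sketch below uses a small, illustrative 1D lattice (the grid, dT, and f0 are assumptions, not the thesis's choices), with the elementary functions serving as their own projection functions:

```python
import numpy as np

# Discrete sketch of (4.41)-(4.49): sample a small set of 1D Gabor
# elementary functions, form the overlap matrix by inner products,
# verify c = Q a, and recover the coefficients by inverting Q.
t = np.linspace(-16, 16, 513)
dt = t[1] - t[0]

def psi(t, n, k, p, dT=4.0, f0=0.25):
    g = np.exp(-np.pi * ((t - n * dT) / dT) ** 2)   # Gaussian envelope
    return g * np.cos(2 * np.pi * k * f0 * (t - n * dT) - p * np.pi / 2)

lattice = [(n, k, p) for n in (-1, 0, 1) for k in (1, 2) for p in (0, 1)]
Psi = np.array([psi(t, n, k, p) for (n, k, p) in lattice])

a_true = np.random.default_rng(0).normal(size=len(lattice))
s = Psi.T @ a_true                 # synthesize s(t) as in (4.38)

Q = (Psi @ Psi.T) * dt             # overlap matrix, entries per (4.41)
c = (Psi @ s) * dt                 # projection coefficients (4.37)
a_est = np.linalg.solve(Q, c)      # inverse-of-overlap-matrix method (4.49)
```

For an oversampled projection lattice, replacing `np.linalg.solve` with `np.linalg.lstsq` on the rectangular Q implements the normal-equations form (4.50).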
Thus, if the expansion lattice is correctly sampled, the projection lattice can be oversampled. Note that (4.50) can be rewritten as

a = Q_N^{-1} c_N, (4.51)

where

Q_N = Q^T Q (4.52)

and

c_N = Q^T c. (4.53)

Premultiplying by Q^T effectively creates a new set of projection functions, given by (4.54). The lattice defined by the centers of the new projection functions no longer oversamples the image; this lattice contains the same points as the expansion lattice. Any method requiring equivalent projection and expansion lattices can be generalized to include oversampled projection lattices by applying the transformation defined in (4.54). Thus, the iterative methods presented in later sections of this chapter can assume equivalent lattices without loss of generality.

The inverse of the overlap matrix method is only useful for small-order signals. For large-order signals such as images, the amount of memory required to store the elements of Q^{-1} is prohibitively large. A 256 × 256 pixel image has a 65,536 × 65,536 overlap matrix if the set of elementary functions is complete. It is presently impractical to store the four billion elements.

4.2.2 Biorthonormal Projection Function

One method of circumventing the prohibitive calculation and storage costs of determining the inverse is to select a set of projection functions that are biorthonormal to the set of expansion functions. The biorthonormal relation is defined as

∫ ψ(t; n, k, p) PF(t; i, j, r) dt = 1 if n = i, k = j, and p = r; 0 otherwise. (4.55)

Figure 4.13: Window for Bastiaans' Biorthonormal Function

If (4.55) is satisfied, the overlap matrix Q equals the identity matrix. Bastiaans [4] obtained an expression for PF(t; n, k, p) for the one-dimensional case. Bastiaans' expression uses the sampling lattice suggested by Gabor [1].
The biorthonormal projection function is given by

PF(t; n, k, p) = γ(t + nΔT) cos(2πkt/ΔT − 2πkn − p), (4.56)

where

γ(t) = (2/ΔT)^{0.5} (K_0/π)^{−1.5} exp[π(t/ΔT)^2] Σ_{i+0.5 ≥ t/ΔT} exp[−π(i + 0.5)^2] (4.57)

and K_0 = 1.85407468. The window function (4.57) is shown in figure 4.13. Porat and Zeevi [7] provided a generalization of Bastiaans' window function for the two-dimensional Gabor lattice. For this type of lattice, the biorthonormal window function is given by (4.58).

Using a biorthonormal projection function to determine the Gabor expansion coefficients has limitations. The window function γ(t) has an infinite support width; a truncated window must be used in practical implementations, thereby introducing errors. The window function presented above is only valid for Gabor sampling lattices; irregular sampling lattices require a different window function for each point on the sampling lattice.

4.3 Iterative Least Squares Solution

The Gabor expansion can be solved using an iterative least squares approach. This section outlines iterative methods for solving the one-dimensional Gabor expansion. The methods can be extended into two dimensions by including the additional variables. The least squares estimate of a_{nkp} minimizes the cost function

J = (1/2) ∫ e^2(t) dt, (4.59)

where

e(t) = s(t) − Σ_{nkp} a_{nkp} ψ(t; n, k, p). (4.60)

The derivative of the cost function is given by

J_a(n, k, p) = −∫ e(t) ψ(t; n, k, p) dt, (4.61)

where J_a(n, k, p) denotes ∂J/∂a_{nkp}. (4.62)

The matrix forms of (4.59) and (4.61) are given by (4.63) and (4.64), respectively:

J = (1/2) a^T Q a − a^T c + const, (4.63)

(∇J) = Qa − c, (4.64)

where

(∇J) = [J_a(1, 1, 1) ⋯ J_a(N, K, P)]^T. (4.65)

The cost function is minimized when the gradient (∇J) is zero. The gradient of the cost function is equal to the projection of the Gabor waveform onto the error e(t). The Gabor waveform, in this case, is used as both the projection and elementary functions.
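The matrix forms (4.63)-(4.64) admit a quick numerical sanity check: for any symmetric positive definite Q and vector c (random test data below, not a particular Gabor lattice), the gradient of J(a) = ½aᵀQa − aᵀc is Qa − c, and it vanishes at the solution of Qa = c:

```python
import numpy as np

# Finite-difference check of (4.63)-(4.64) on synthetic data.
rng = np.random.default_rng(3)
n = 5
A = rng.normal(size=(n, n))
Q = A @ A.T + n * np.eye(n)      # symmetric positive definite, Gram-like
c = rng.normal(size=n)

def J(a):
    return 0.5 * a @ Q @ a - a @ c       # quadratic cost (4.63), const dropped

def grad_J(a):
    return Q @ a - c                     # gradient (4.64)

a0 = rng.normal(size=n)
eps = 1e-6
fd = np.array([(J(a0 + eps * np.eye(n)[i]) - J(a0 - eps * np.eye(n)[i]))
               / (2 * eps) for i in range(n)])   # central differences
```

Central differences are exact for a quadratic up to rounding, so `fd` matches `grad_J(a0)` closely.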
The resulting overlap matrix Q is equivalent to the Gram matrix [19], since both are defined by the continuous inner product of pairs of elementary functions. The overlap matrix is also equivalent to the Hessian matrix [20], since the pair-wise overlaps of the elementary functions are the same as the second partial derivatives of the cost function (4.59).

4.3.1 Steepest Descent

Daugman [3] proposed a neural network configuration to solve for the coefficients of the Gabor expansion. The proposed configuration is equivalent to the steepest descent method used in non-linear optimization [20]. In this thesis, Daugman's approach will be referred to as the "steepest descent method." An inverse calculation is not required for the steepest descent method. The steepest descent method uses local error information, provided by the gradient, to update the expansion coefficient estimates. The expansion coefficient update equation is given by

a_{i+1} = a_i − β(∇J), (4.66)

where β is the convergence factor. If the convergence factor is properly chosen, the coefficient estimates will improve with each iteration, providing the solution in the limit. The rate of convergence is controlled by the convergence factor β. The optimal convergence factor β_i* at iteration i is given by [20]

β_i* = (∇J)^T(∇J) / [(∇J)^T Q (∇J)]. (4.67)

Since the optimal convergence factor is dependent on the error gradient, it must be re-computed for each iteration. Excessive computational and storage requirements associated with the overlap matrix Q make (4.67) impractical for large-order signals such as images. An alternative method is to use a constant β. The constant must be small enough to guarantee convergence. The sequence will converge if the constant β is chosen such that

0 < β < 2β_i*. (4.68)

The derivation of (4.67) and (4.68) can be found in appendix A. The constraint (4.68) requires knowledge of β_i*.
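A minimal sketch of the constant-β iteration (4.66): here β is set to the reciprocal of the largest absolute row sum of Q. Since the largest eigenvalue of Q cannot exceed that row sum, this constant satisfies the convergence condition (4.68). The atoms and lattice are illustrative, not the thesis's:

```python
import numpy as np

# Steepest descent for Q a = c: a <- a - beta * (Q a - c), cf. (4.64), (4.66).
t = np.linspace(-16, 16, 513)
dt = t[1] - t[0]

def atom(t, n, k, dT=4.0, f0=0.25):
    g = np.exp(-np.pi * ((t - n * dT) / dT) ** 2)
    return g * np.cos(2 * np.pi * k * f0 * (t - n * dT))

Psi = np.array([atom(t, n, k) for n in (-1, 0, 1) for k in (1, 2)])
Psi /= np.sqrt(np.sum(Psi ** 2, axis=1, keepdims=True) * dt)  # unit-norm atoms

a_true = np.random.default_rng(2).normal(size=Psi.shape[0])
s = Psi.T @ a_true
Q = (Psi @ Psi.T) * dt               # overlap (Gram/Hessian) matrix
c = (Psi @ s) * dt

beta = 1.0 / np.max(np.sum(np.abs(Q), axis=1))   # constant convergence factor
a = np.zeros_like(a_true)
for _ in range(500):
    a -= beta * (Q @ a - c)          # gradient step
```

With well-separated atoms the overlap matrix is close to the identity and the iteration converges quickly; a large eigenvalue spread would slow it, which motivates the warping method below.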
Fortunately, the lower bound for β_i* can be estimated once the sampling lattice is defined. A more conservative version of (4.68) is given by

0 < β < 2β*_{LB} ≤ 2β_i*, (4.69)

where β*_{LB} is the lower bound of β_i*. Once the lower bound is determined, the constant β should be chosen such that

β*_{LB} ≤ β < 2β*_{LB}. (4.70)

A good choice for β is near the lower bound β*_{LB}. The convergence factor defined by (4.67) is global; thus all coefficient estimates are updated using the same β. It is possible to assign a local convergence factor to each point in the sampling lattice. The lower bound for the optimal convergence factor at the lattice point (n, k, p) is approximated by

β*_{LB}(n, k, p) = 1 / Σ_{ijr} |q(i, j, r; n, k, p)|. (4.71)

The derivation of (4.71) can be found in appendix B. The minimum from a set of β*_{LB}(n, k, p) can be used as the global lower bound β*_{LB}.

4.3.2 Warping Method

The warping method is a variant of the steepest descent method, modeled after quasi-Newton methods used in non-linear optimization [20]. In a quasi-Newton implementation, an approximation to the inverse of the Hessian (overlap) matrix is used to improve the convergence of the iterative sequence. In the warping method, an approximation to the inverse of the overlap matrix is used to partially compensate for the overlap of neighbouring elementary functions. This approximation is referred to as the "warping matrix." The better the approximation, the greater the improvement in the rate of convergence over the steepest descent method. The rate of convergence for the steepest descent method is limited by the eigenvalue spread of the overlap (Hessian) matrix [20]. If the spread is large, many iterations are required to reach the solution. The convergence can be improved by warping the error gradient (∇J). The new rate of convergence depends on the eigenvalue spread of the product WQ, where W is the warping matrix. The warping matrix premultiplies the update term as seen in (4.72).
a_{i+1} = a_i − βW(∇J). (4.72)

Ideally, the warping matrix should equal Q^{-1}, making all eigenvalues equal. Unfortunately, the matrix Q requires too much memory for large-order signals such as images. Instead of using a full matrix, a reduced-order overlap matrix Q_{nkp} may be used. Q_{nkp} describes the overlap characteristics for a local neighbourhood centered about the lattice point (n, k, p). The local neighbourhood shown in figure 4.14 contains the elementary function (n, k, p) and its temporal and spectral neighbours. Once the local neighbourhood is defined, the warping matrix is given by

W_{nkp} = Q_{nkp}^{-1}. (4.73)

The warping vector for the Gabor expansion coefficient a_{nkp} is given by

w_{nkp} = f_{nkp} W_{nkp}, (4.74)

where f_{nkp} is an operator that selects the row associated with a_{nkp} from the matrix W_{nkp}. The coefficient update equation becomes

a_{nkp} = a_{nkp} − β w_{nkp}(∇J)_{nkp}, (4.75)

where (∇J)_{nkp} is the gradient of the Gabor expansion coefficients contained in the local neighbourhood centered at (n, k, p). A warping vector w_{nkp} must be calculated for each point in the sampling lattice. If the size of the warping vector is denoted by N_W, the memory required to store the set of warping vectors is N_W N_{EF}. The memory requirements are much less than those of the full matrix Q if N_W << N_{EF}. The memory requirements are further reduced if most of the local neighbourhoods have the same overlap characteristics. In this case, most of the points in the sampling lattice will have the same warping vector. A local neighbourhood for a one-dimensional Gabor lattice is shown in figure 4.14. With the exception of the endpoints in the lattice, all local neighbourhoods have the same overlap characteristics. Thus, only one warping vector is required for the Gabor sampling lattice. The number of unique neighbourhoods for a logarithmic lattice depends on the neighbourhood size and the frequency spacing. One warping vector is required
if the local neighbourhood is limited to temporal neighbours. Two warping vectors, one for each local neighbourhood, are required for the logarithmic sampling lattice shown in figure 4.15.

Figure 4.14: Local Neighbourhood for Gabor Lattice (reference Gabor function and neighbouring Gabor functions)

4.3.3 Locally Biorthonormal Projection Function

The update term in (4.75) can be expressed as

w_{nkp}(∇J)_{nkp} = −∫ e(t) [Σ_{ijr} w_{nkp}(i, j, r) ψ(t; i, j, r)] dt. (4.76)

The new projection function created by the warping vector is given by

PF(t; n, k, p) = Σ_{ijr} w_{nkp}(i, j, r) ψ(t; i, j, r). (4.77)

This projection function is orthonormal to all elementary functions in the local neighbourhood and is referred to as the "locally biorthonormal projection function."

Figure 4.15: Local Neighbourhoods for Logarithmic Sampling Lattice

The support width of the projection function defined by (4.77) is larger than the support width of the elementary function. In a practical implementation, the locally biorthonormal projection function is truncated. Because the locally biorthonormal projection function is used in an iterative implementation, the error introduced by truncation affects the convergence properties, not the final estimate of the Gabor expansion coefficients. The remainder of this subsection outlines a method for calculating the optimal truncated projection function for a given support width. Assume the projection and elementary functions are normalized such that

∫ ψ(t; n, k, p) PF(t; n, k, p) dt = 1. (4.78)

The normalization makes the diagonal elements of the overlap matrix Q unity. The normalized overlap matrix Q can be written as

Q = I + V, (4.79)

where V is a matrix containing all the off-diagonal elements of Q. Note that

[I + V][I − V] = I − V^2. (4.80)

If all the elements of V are small (i.e.
much less than one), then

I − V^2 ≈ I (4.81)

and

Q^{-1} ≈ I − V = 2I − Q. (4.82)

Using (4.82), an estimate of the locally biorthonormal projection function is recursively defined by

PF_{i+1}(t; n, k, p) = 2PF_i(t; n, k, p) − Σ_{ijr} q(i, j, r; n, k, p) PF_i(t; i, j, r). (4.83)

Since (4.83) is a recursive expression, it is necessary to select an initial set of projection functions. For convenience, the elementary functions of the Gabor expansion are used as the initial set of projection functions. The optimal truncated projection function for a given support width is obtained as follows:

• Normalize the projection function PF_i(t; n, k, p) to satisfy (4.78).
• Calculate the overlap elements q(i, j, r; n, k, p).
• Calculate the new projection function using (4.83).
• Truncate the projection function to the pre-specified support width.
• Repeat until the sequence has converged.

The recursive procedure will converge if the sampling lattice is defined such that the Gabor expansion is unique. The resulting set of projection functions can be used in either the steepest descent or warping methods. An interesting example of the locally biorthonormal projection function for the Gabor sampling lattice occurs when the support width is restricted to ΔT. The window function for this example is given by γ(t) = K exp[π(t/ΔT)^2] for −ΔT/2 < t < ΔT/2, and zero otherwise.

[...]

φ_ref is a reference orientation. In the segmentation examples of chapter 8, the number of orientations is four. The orientation spacing has been previously defined by (3.36). Equation (3.36) is independent of frequency, determined solely by λ. Combining (3.36) and (5.85), λ can be expressed in terms of the number of orientations.

5.2 Frequency

The frequency spacing is given by

f_k − f_{k−1} = ΔF_k + ΔF_{k−1}. (5.88)

Unlike the orientation spacing, the frequency spacing is dependent on the modulation frequency of the elementary function. A given modulation frequency can be defined in terms of a higher modulation frequency.
The recursive relationship is given by (5.89). Since λ is determined by the orientation spacing, only a and the reference frequency f_ref are required to define the frequency coordinates. The frequency coordinates are given by (5.90), where Δk is the difference in the frequency index between the reference frequency f_ref and the desired frequency f_k.

5.3 Spatial Sampling Lattice

Chapter 5. Implementing a Log-Polar Lattice

The sampling intervals in the spatial domain (ΔX, ΔY) can be defined in terms of a, λ, f_k, and φ_l. Since these variables have been defined in the previous sections, the spatial sampling intervals for each spectral coordinate (f_k, φ_l) are already determined. The spatial lattices will be defined once a spatial origin (x_ref, y_ref) is selected for each spectral coordinate. The coordinates for the rotated x-y lattice are defined as

x(n, m, k, l) = x_ref(k, l) − (2^{0.5}λ/f_k)[n sin φ_l − m a cos φ_l] (5.91)

and

y(n, m, k, l) = y_ref(k, l) − (2^{0.5}λ/f_k)[n cos φ_l + m a sin φ_l]. (5.92)

[...]

Chapter 6. Local Measures

The oriented magnitude at (x_n, y_m, f_k, φ_l) is given by

M(n, m, k, l) = [a^2(n, m, k, l, 0) + a^2(n, m, k, l, π/2)]^{0.5}. (6.95)

M(n, m, k, l) is referred to as the "oriented magnitude." The phase at (x_n, y_m, f_k, φ_l) is given by

θ(n, m, k, l) = arctan[ a(n, m, k, l, π/2) / a(n, m, k, l, 0) ]. (6.96)

6.1 Local Measures for a Single Image

Three local measures are proposed for use on a single image. These measures include the local magnitude, the local preferred orientation, and the normalized local moment of inertia. The latter two measures use the local magnitude as a confidence estimate.

6.1.1 Local Magnitude

The magnitude of the image localized about (x_n, y_m, f_k) is given by

M(n, m, k) = Σ_l M(n, m, k, l). (6.97)

Equation (6.97) assumes that the spatial lattice points (x_n, y_m) are the same for each orientation φ_l. The moment of inertia about an axis T is given by

I_T = Σ_i r_i^2 Δm_i, (6.98)

where Δm_i is a mass and r_i is the radial distance of the mass from the axis of rotation T (see figure 6.16).
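A synthetic sketch of the single-image measures of this chapter: the oriented magnitude (6.95) and phase (6.96) from the quadrature coefficients, the local magnitude (6.97), and the preferred-orientation and moment-of-inertia measures developed below. The orientation lattice and coefficient data are invented for illustration:

```python
import numpy as np

phis = np.linspace(0, np.pi, 8, endpoint=False)   # assumed orientation lattice

# synthetic quadrature coefficients a(...,0) and a(...,pi/2) at one point:
E = np.exp(2.0 * np.cos(2 * (phis - np.pi / 4)))  # energy clustered near pi/4
a0 = np.cos(0.3) * E
a1 = np.sin(0.3) * E

M_l = np.hypot(a0, a1)                 # oriented magnitude (6.95)
phase = np.arctan2(a1, a0)             # phase (6.96)
M = np.sum(M_l)                        # local magnitude (6.97)

# preferred orientation: extremizes the spectral moment of inertia
phi_T = 0.5 * np.arctan2(np.sum(M_l * np.sin(2 * phis)),
                         np.sum(M_l * np.cos(2 * phis)))
inertia = lambda T: np.sum(M_l * np.sin(phis - T) ** 2)
if inertia(phi_T + np.pi / 2) < inertia(phi_T):   # keep the minimizing axis
    phi_T += np.pi / 2
I_r = inertia(phi_T) / M               # normalized moment of inertia
```

Because I_T(T) + I_T(T + π/2) = M, the normalized moment at the minimizing axis never exceeds 0.5, consistent with the range stated below.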
In the present application, the moment of inertia is defined in the spatial frequency (spectral) domain as shown in figure 6.17. The mass Δm_i is replaced by the oriented magnitude M(n, m, k, l). The radial distance is defined by r_l = f_k sin(φ_l − T). The moment of inertia about an axis of orientation T is

I_T(n, m, k) = Σ_l M(n, m, k, l) sin^2(φ_l − T). (6.100)

The orientation that extremizes (6.100) satisfies

φ_T(n, m, k) = 0.5 arctan[ Σ_l M(n, m, k, l) sin 2φ_l / Σ_l M(n, m, k, l) cos 2φ_l ]. (6.101)

The two orientations defined by (6.101) are substituted into (6.100) to determine which corresponds to the minimum moment of inertia. The axes with the minimum and maximum moments of inertia are normal to each other. The preferred orientation φ_T(n, m, k) is independent of the local magnitude M(n, m, k). The moment of inertia, as defined in (6.100), is dependent on M(n, m, k). To eliminate its dependence on the local magnitude, the moment of inertia could be normalized such that

I_r(n, m, k) = Σ_l M(n, m, k, l) sin^2(φ_l − φ_T) / Σ_l M(n, m, k, l). (6.103)

The new measure, the normalized local moment of inertia, is independent of M(n, m, k) and is restricted to the range

0 ≤ I_r(n, m, k) ≤ 0.5. (6.104)

Since a small normalized moment of inertia indicates tight clustering about the preferred orientation, I_r(n, m, k) can be considered a pseudo-variance of the local orientations. In later stages of visual processing, the local measures will be combined to characterize a larger pixel grouping. In order to combine local measures properly, each measure is assigned a confidence estimate. The confidence of a measure is inversely proportional to the sensitivity of the measure to errors in the oriented magnitude M(n, m, k, l). The sensitivity derivative for φ_T(n, m, k) is given by

δφ_T(n, m, k) / δM(n, m, k, l) = I_r(n, m, k) / M(n, m, k). (6.105)

Thus, the confidence of the measure is high if the normalized local moment of inertia is small and the local magnitude is large. For simplicity, the confidence of the local preferred orientation measure will be defined as
C_{φ_T}(n, m, k) = M(n, m, k). (6.106)

The sensitivity derivative for I_r(n, m, k) is given by

δI_r(n, m, k) / δM(n, m, k, l) = [sin^2(φ_l − φ_T) − I_r(n, m, k)] / M(n, m, k). (6.107)

If the local magnitude is large, the confidence of the measure is high. The confidence of the normalized local moment of inertia measure will be defined as

C_{I_r}(n, m, k) = M(n, m, k). (6.108)

6.2 Local Measures for Two Related Images

If two related images are available, the differences between the reference image and the other image can be used as measures. Two difference measures are proposed: the local difference in phase and the local difference in preferred orientation. The difference in phase is localized to the region defined by an elementary function centered at (x_n, y_m, f_k, φ_l). To be consistent with the single image measures, all measures should be localized about (x_n, y_m, f_k). This can be achieved by splitting the local difference in phase into two measures: magnitude and direction of translational displacement. The difference in preferred orientation is localized about (x_n, y_m, f_k).

6.2.1 Local Difference in Phase

The difference in phase at the lattice point (x_n, y_m, f_k, φ_l) measures a component displacement. If the elementary function (6.114), taken here with orientation φ_l = 0, is translated by δx, it can be written as

ψ(x, y; x_n + δx, y_m, f_k, 0, p) = g(y; y_m) g(x; x_n + δx) cos[2π f_k (x − x_n) − 2π f_k δx − p]. (6.116)

If (6.114) is phase shifted by Δθ, then

ψ(x, y; x_n, y_m, f_k, 0, Δθ + p) = g(y; y_m) g(x; x_n) cos[2π f_k (x − x_n) − Δθ − p]. (6.117)

Equating (6.116) and (6.117) relates the phase shift to the displacement; the component displacement is

D(n, m, k, l) = Δθ(n, m, k, l) / (2π f_k). (6.119)

If D(n, m, k, l) is negative, the direction of the component displacement is π + φ_l. In (6.119), the object's component displacement is measured using Δθ. Since Δθ is restricted to (−π, π], the measurable component displacements are restricted to

−1/(2 f_k) < D(n, m, k, l) ≤ 1/(2 f_k). (6.120)

Larger component displacements are incorrectly estimated because the measure is aliased into the restricted range. Equation (6.119) has two unknown variables: the magnitude D(n, m, k) and the direction φ_D(n, m, k). The direction is given by

φ_D(n, m, k) = arctan[ D_y(n, m, k) / D_x(n, m, k) ], (6.122)

where

D_x(n, m, k) = [D(n, m, k, i) sin φ_j − D(n, m, k, j) sin φ_i] / sin(φ_j − φ_i) (6.123)
and

D_y(n, m, k) = [D(n, m, k, j) cos φ_i − D(n, m, k, i) cos φ_j] / sin(φ_j − φ_i). (6.124)

The above solution requires two component displacements. Since a typical log-polar lattice contains more than two orientations, a weighted combination of all component displacements can be used. The weight assigned to each component displacement equals the confidence of the local difference in phase measure. The weighted least squares estimate for D(n, m, k) and φ_D(n, m, k) minimizes

J = Σ_l C_{Δθ}(n, m, k, l) e^2(n, m, k, l), (6.125)

where

e(n, m, k, l) = D(n, m, k, l) − D(n, m, k) cos[φ_D(n, m, k) − φ_l]. (6.126)

The solution which minimizes (6.125) is a confidence-weighted combination of the pairwise solutions: D_x(n, m, k) and D_y(n, m, k) are given by (6.127) and (6.128), with weights w_{ij}(n, m, k) derived from the confidences of the contributing measures.

The local difference in preferred orientation is given by

Δφ_T(n, m, k) = 0.5 arctan[ Σ_{ij} M_0(n, m, k, i) M_1(n, m, k, j) sin 2(φ_j − φ_i) / Σ_{ij} M_0(n, m, k, i) M_1(n, m, k, j) cos 2(φ_j − φ_i) ], (6.130)

where M_0 and M_1 are the oriented magnitudes of the two images. The sensitivity derivatives for the local difference in preferred orientation can be expressed as δΔφ_T(n, m, k)/δM_0(n, m, k, i) (6.131) and δΔφ_T(n, m, k)/δM_1(n, m, k, j) (6.132). The confidence of the measure is defined by (6.133).

[...]

Chapter 7. Forming and Characterizing Pixel Groupings

The velocity estimates are obtained by dividing the corresponding displacement measures by the time interval t_1 − t_0 (7.139). Because the translational and rotational velocity estimates are obtained from angular measurements, both can be corrupted by aliasing. To avoid aliasing, limits are imposed on the maximum measurable velocities. The measurable translational speed is limited to

V(n, m, k) < 1 / [2 f_k (t_1 − t_0)]. (7.140)

The maximum measurable speed is larger for elementary functions that have lower modulation frequencies. Lower frequency estimates are less likely to be corrupted by aliasing. It is possible to predict if aliasing has corrupted a high frequency estimate by checking
lower frequency estimates, provided that the texture of the moving region has a wide spectral bandwidth. Any estimate of translational motion may suffer from the aperture problem [8]. In such a case, only the component of the motion in the direction of the preferred orientation can be measured. Motion estimates suffering from the aperture problem occur in regions where the local moment of inertia is small. Since combining local measures often increases the moment of inertia, the aperture problem can usually be resolved in the characterization stage. The measurable rotational velocity is conservatively restricted to

−π / [4(t_1 − t_0)] < ω(n, m, k) < π / [4(t_1 − t_0)]. (7.141)

The range of measurable rotational velocities is independent of the modulation frequency; thus aliasing cannot be resolved using lower frequency estimates.

7.1.3 Segmentation by Depth

"Depth" refers to the distance of a viewed object from the camera. This distance is measured along the z-axis in a three-dimensional space. The depth of a point in a three-dimensional space can be estimated if images from two or more viewpoints are available. The case of two viewpoints is shown in figure 7.18. A point (x, y, z) will be projected onto an image plane for the left and right viewpoints at positions (x_0, y_0) and (x_1, y_1), respectively. The displacement in the position of the projected point along the viewing baseline (referred to as the "baseline displacement") is given by

D_B(x_0, y_0) = [(x_1 − x_0)^2 + (y_1 − y_0)^2]^{0.5} cos[φ_B − arctan((y_1 − y_0)/(x_1 − x_0))], (7.142)

Figure 7.18: Vision System with Two Viewpoints

where φ_B is the orientation of the viewing baseline.
From the geometry shown in figure 7.18, the depth of the projected point is given by

z(x_0, y_0) = z_f [ D_B(x_0, y_0) / (e − D_B(x_0, y_0)) ], (7.143)

where z_f is the focal length and e is the baseline separation of the two viewpoints. Instead of estimating the depth of a point, this implementation attempts to estimate the depth of a region. The size of the region is defined by the spatial extent of the elementary function centered at (n, m, k). A higher frequency elementary function defines a smaller region. In order to estimate depth, a Gabor expansion is performed on the left and right images. The baseline displacement is calculated using local phase differences between the left and right images. If the same spatial origin is used for each Gabor lattice, the estimated depth of the region centered at (n, m, k) is given by

z(n, m, k) = z_f [ D_B(n, m, k) / (e − D_B(n, m, k)) ], (7.144)

where

D_B(n, m, k) = D(n, m, k) cos[φ_B − φ_D(n, m, k)]. (7.145)

D(n, m, k) and φ_D(n, m, k) are the magnitude and direction of the measured displacement (see section 6.2.1); D_B(n, m, k) is the baseline displacement. Because the baseline displacement D_B(n, m, k) is measured using local phase differences, the depth estimate may be corrupted by aliasing. An aliased depth estimate will appear closer to the observer. The range of measurable depths is restricted to

0 ≤ z(n, m, k) < z_f / (2 e f_k − 1). (7.146)

In order to measure depths that are farther, the above-mentioned range must be transformed by shifting one of the spatial sampling lattices. If the left view is used as the reference viewpoint, the origin of the spatial lattice in the right image is shifted to the left along the viewing baseline. The estimated depth for the shifted binocular vision system is given by

z(n, m, k) = z_f [ (D_B(n, m, k; δ_s) + δ_s) / (e − D_B(n, m, k; δ_s) − δ_s) ], (7.147)

where δ_s is the lattice shift along the viewing baseline, and D_B(n, m, k; δ_s) is the measured baseline displacement when one of the lattices is shifted by δ_s.
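A small numeric sketch of the depth relation, written in the form consistent with the measurable-depth range (7.146); the focal length, baseline, and channel frequency below are illustrative values, not ones used in the thesis:

```python
# Depth from baseline displacement, z = z_f * D_B / (e - D_B).
def depth(D_B, z_f, e):
    return z_f * D_B / (e - D_B)

z_f = 1.0     # focal length (assumed units)
e = 10.0      # baseline separation (assumed units)
f_k = 0.25    # modulation frequency of the channel (assumed)

# The largest unaliased phase displacement is 1/(2 f_k); it maps to the
# depth limit z_f / (2 e f_k - 1) quoted in (7.146).
z_limit = depth(1.0 / (2.0 * f_k), z_f, e)
```

Depth increases monotonically with the baseline displacement in this form, and displacements aliased past 1/(2 f_k) would indeed map to depths inside the measurable range, i.e. closer to the observer.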
The range of measurable depths for the shifted binocular vision system is given by

z_f [ (δ_s − 1/(2 f_k)) / (e − δ_s + 1/(2 f_k)) ] < z(n, m, k) < z_f [ (δ_s + 1/(2 f_k)) / (e − δ_s − 1/(2 f_k)) ]. (7.148)

The lattice shift focuses the system on a reference depth. The reference depth, which corresponds to D_B(n, m, k; δ_s) = 0, is given by

z_ref = z_f [ δ_s / (e − δ_s) ]. (7.149)

Since the reference depth must be positive and finite, the lattice shift must be restricted to

0 < δ_s < e. (7.150)

Because there is a limited range of measurable depths for a given frequency f_k, it may be necessary to choose a reference depth that corresponds to a region of interest. If the lattices are shifted to focus on this region of interest, the sign of the displacement will separate near and far regions. Near (far) regions are closer to (farther from) the observer than the reference depth. The near and far region segmentation requires a focusing mechanism to direct the lattice shift. The displacement D(n, m, k; δ_s) can be used to direct the lattice shift if no aliasing has occurred. For larger displacements, an alternative approach is required. If the region of interest has a wide frequency bandwidth, lower frequency estimates can be used to determine if aliasing has occurred. If aliasing has occurred, the lower frequency estimates can be used to make coarse adjustments to the spatial origin. Since the largest adjustment is e, the lowest frequency in the set of elementary functions should be chosen such that

f_min < 1/(2e). (7.151)

Finer adjustments can be made later using the higher frequency estimates, thereby reducing the size of the region of interest. Segmentation by depth is similar to motion segmentation. The magnitude maps are segmented to identify depth measures that have high confidence. A reference depth is selected; the depth measures are segmented into near and far regions.
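The shifted system can be sketched the same way; z_f, e, and δ_s below are illustrative values, and the near/far labeling follows the sign convention of this sketch's depth formula:

```python
# Shifted binocular sketch: the lattice shift delta_s focuses the system on
# a reference depth, and the sign of the measured baseline displacement
# separates near from far regions.
def shifted_depth(D_B, z_f, e, delta_s):
    return z_f * (D_B + delta_s) / (e - D_B - delta_s)

z_f, e, delta_s = 1.0, 10.0, 4.0
z_ref = shifted_depth(0.0, z_f, e, delta_s)   # reference depth, D_B = 0

def near_or_far(D_B):
    # positive displacement -> beyond the reference depth in this sketch,
    # negative -> nearer than the reference depth
    return "far" if D_B > 0 else ("near" if D_B < 0 else "reference")
```

Setting D_B = 0 recovers the reference depth z_f δ_s/(e − δ_s), and displacements of either sign move the estimate to the far or near side of it.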
7.1.4 Integrating Maps to Form Relevant Pixel Groupings

The edge detection, texture segmentation, motion segmentation, and segmentation by depth algorithms produce various maps. The edge detection produces one map that combines magnitude information from each frequency. The other segmentation methods produce individual maps for each frequency. Each map provides its "best guess" of how the image is to be segmented. These guesses are integrated to obtain the best possible segmentation.

In order to segment a given map, each point is assigned a label based on the local measures. The edge detection labels include edge and not edge. The texture segmentation labels include insufficient spectral energy, high moment of inertia (no preferred orientation), and orientation. The motion segmentation labels include insufficient spectral energy, aliased measure, low moment of inertia (aperture problem), and motion. The segmentation by depth labels include insufficient spectral energy along the viewing baseline, aliased measure, and relative depth. Orientation, motion, and relative depth are considered quantitative measures. For quantitative measures, the value of the measure is included in the label.

After the labels have been assigned, regions and edges are formed by connecting adjacent points with common labels. Points that have quantitative measures are connected if the difference is small. The edges are formed using an eight point connectivity; the regions are formed using a four point connectivity. The four and eight point connectivities for a rectangular spatial sampling lattice are shown in figure 7.19.

[Figure 7.19: Four and Eight Point Connectivity]

Once the regions and edges are formed, the various maps are integrated. The integrated map has a spatial sampling density equal to that of the highest frequency map.
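The region-forming step can be realized as a breadth-first connected-components pass over the lattice. This sketch assumes a rectangular lattice of quantitative measures and a user-chosen tolerance; the thesis does not prescribe a data layout, so the interface is an assumption:

```python
from collections import deque

# 4- and 8-point neighbour offsets for a rectangular lattice (figure 7.19).
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def form_regions(values, tol, neighbours=FOUR):
    # Group adjacent lattice points whose quantitative measure (orientation,
    # motion, or relative depth) differs by at most `tol`. Returns an integer
    # region id per point; `values` is a 2-D list of floats.
    h, w = len(values), len(values[0])
    region = [[-1] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if region[sy][sx] != -1:
                continue
            region[sy][sx] = next_id
            q = deque([(sy, sx)])
            while q:
                y, x = q.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and region[ny][nx] == -1
                            and abs(values[ny][nx] - values[y][x]) <= tol):
                        region[ny][nx] = next_id
                        q.append((ny, nx))
            next_id += 1
    return region
```

Passing `neighbours=EIGHT` gives the eight point connectivity used for edges; the default four point connectivity is the one used for regions.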
When the maps are being integrated, a preferred scale (frequency) must be chosen. The preferred scale defines the minimum size of a region and the maximum local density of edges. The choice of scale can be guided by the confidence estimates within individual segmentation maps. Even though a preferred scale is chosen, the higher frequency maps are still required for labeling points near texture boundaries [24] and for the localization of edges. The preferred scale may be different for each region.

7.2 Characterizing Regions and Edges

Once the regions and edges have been formed, local measures are combined to provide more global characterizations. These characterizations are used to infer three-dimensional properties.

7.2.1 Texture of a Region

The texture of a region is characterized by a preferred orientation, a normalized moment of inertia, a dominant frequency along an orientation, and an omni-directional dominant frequency. The respective measures are expressed as

\phi_R = 0.5 \arctan\left[ \frac{\sum_k N(k) f_k^2 \sum_{n,m \in R} \sum_l M(n,m,k,l) \sin 2\phi_l}{\sum_k N(k) f_k^2 \sum_{n,m \in R} \sum_l M(n,m,k,l) \cos 2\phi_l} \right], (7.152)

M_I(R) = \frac{\sum_k N(k) f_k^2 \sum_{n,m \in R} \sum_l M(n,m,k,l) \sin^2(\phi_l - \phi_R)}{\sum_k N(k) f_k^2 \sum_{n,m \in R} \sum_l M(n,m,k,l)}, (7.153)

f_D(R, l) = \frac{\sum_k N(k) f_k \sum_{n,m \in R} M(n,m,k,l)}{\sum_k N(k) \sum_{n,m \in R} M(n,m,k,l)}, (7.154)

f_D(R) = \frac{\sum_k N(k) f_k \sum_{n,m \in R} M(n,m,k)}{\sum_k N(k) \sum_{n,m \in R} M(n,m,k)}, (7.155)

where R denotes the region being characterized. The weighting term N(k) can be used to mask frequencies that are not relevant. The lowest relevant frequency is determined by the size of the region being characterized. Any low frequency elementary function whose effective spatial extent is larger than the region is not required to characterize the texture.

Texture can be used to infer surface properties. If the three-dimensional surface has a homogeneous microstructure, perspective distortion associated with differences in depth will cause changes in the local density of the image texture. The magnitude and direction of the maximum change in the local density of the texture is measured by the
texture gradient. To obtain the texture gradient, a region is divided into smaller sub-regions. By calculating the difference in dominant frequency between pairs of sub-regions, the magnitude and direction of the texture gradient can be estimated. The magnitude and direction of the texture gradient can be used to estimate the slant and tilt of a homogeneous surface [8]. Changes in the preferred orientation across subregions can be used to infer surface curvature [8].

7.2.2 Edges

For an edge, the lowest relevant frequency depends on the dominant frequencies of the adjacent texture regions. The spectral peaks, which identify the edge, appear in only the magnitude maps that have frequencies higher than the dominant frequency. Magnitude maps whose frequencies are less than or equal to the dominant frequency of the adjacent regions are not required for edge detection.

Edge and depth information can be combined to detect occluding contours. An occluding contour is a closed edge that follows a discontinuity in depth. The occluding contour provides an outline of the two-dimensional projection of an object. By assuming certain constraints [8], the two-dimensional outline can be used to infer three-dimensional shape.

7.2.3 Three-Dimensional Position and Motion

In order to estimate the three-dimensional position and motion of an object, the viewpoint constraints relating the two-dimensional projected coordinates to the original three-dimensional coordinates must be known. The three-dimensional coordinate (x, y, z) is related to the two-dimensional projection coordinate (\bar{x}, \bar{y}) as follows:

\bar{x} = \frac{z_f}{z_f + z} x, (7.156)

\bar{y} = \frac{z_f}{z_f + z} y, (7.157)

where z_f is the focal length of the camera. The problem of converting the two-dimensional projection coordinates into three-dimensional physical coordinates is underconstrained for a single image.
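The texture measures of section 7.2.1 can be sketched as weighted orientation statistics. In this sketch the magnitudes are assumed to be pooled over the region's lattice points into an array M[k, l], and the squared-frequency weighting is an assumption of the sketch:

```python
import numpy as np

def preferred_orientation(M, f, phi, N):
    # Preferred orientation of a region: a weighted circular mean over the
    # doubled orientation angle (so phi and phi + pi are equivalent).
    # M[k, l]: magnitudes pooled over the region, f[k]: radial frequencies,
    # phi[l]: orientations, N[k]: frequency weights. The f**2 weighting is
    # an assumption of this sketch.
    w = (N * f**2)[:, None] * M
    return 0.5 * np.arctan2((w * np.sin(2 * phi)).sum(),
                            (w * np.cos(2 * phi)).sum())

def moment_of_inertia(M, f, phi, N, phi_R):
    # Normalized moment of inertia about the preferred orientation phi_R:
    # 0 for a single dominant orientation, larger as spectral energy spreads
    # over many orientations (no preferred orientation).
    w = (N * f**2)[:, None] * M
    return float((w * np.sin(phi - phi_R)**2).sum() / w.sum())
```

A region with all of its energy at one orientation returns that orientation and a zero moment of inertia, matching the labels used in section 7.1.4.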
A second image from a different viewpoint or time is required to fully constrain the problem.

Three-Dimensional Position

The two-dimensional projected position of a segmented region will be defined by its center of area. The center of area is given by

\bar{x}_0 = \frac{1}{N_R} \sum_{n,m \in R} \bar{x}_n, (7.158)

\bar{y}_0 = \frac{1}{N_R} \sum_{n,m \in R} \bar{y}_m, (7.159)

where N_R is the number of lattice points in the segmented region R. The three-dimensional position of the object corresponding to the segmented region is given by

x_0 = \left[ \frac{z_f + z(\bar{x}_0, \bar{y}_0)}{z_f} \right] \bar{x}_0, (7.160)

y_0 = \left[ \frac{z_f + z(\bar{x}_0, \bar{y}_0)}{z_f} \right] \bar{y}_0, (7.161)

z_0 = z(\bar{x}_0, \bar{y}_0). (7.162)

Three-Dimensional Motion

Three-dimensional motion of a point (x, y, z) can be expressed as

V_x(x, y, z) = T_x + (y - y_c) R_z - (z - z_c) R_y, (7.163)

V_y(x, y, z) = T_y + (z - z_c) R_x - (x - x_c) R_z, (7.164)

V_z(x, y, z) = T_z + (x - x_c) R_y - (y - y_c) R_x, (7.165)

where T_x, T_y, T_z are the components of the translational motion along the x-, y-, and z-axes; R_x, R_y, R_z are the components of rotation viewed along the x-, y-, and z-axes; and (x_c, y_c, z_c) is the center of the rotation. The two-dimensional projection of the motion along an orientation \phi_l is given by

V(\bar{x}, \bar{y}, \phi_l) = \frac{z_f V_x \cos\phi_l + z_f V_y \sin\phi_l - (\bar{x} \cos\phi_l + \bar{y} \sin\phi_l) V_z}{z_f + z}. (7.166)

Assuming that z_f and \phi_l are known, the unknown variables in (7.166) include nine motion parameters plus the depth at each point (\bar{x}, \bar{y}). Instead of measuring the motion of a point, this implementation measures the motion of a region defined by an elementary function. The apparent two-dimensional motion of the elementary function (n, m, k, l) along the orientation ...

Appendix B: Lower Bound for the Optimal Convergence Factor

Each elementary function is normalized such that

\int |g(t; n, k, p)|^2 dt = 1. (B.189)

The overlap between a pair of elementary functions, (i, j, r) and (n, k, p), is given by

q(i, j, r; n, k, p) \approx (-1)^{(n+i)(k-j)} \exp\left\{ -\frac{\pi}{2} \left[ (n-i)^2 + (k-j)^2 \right] \right\} \cos(p - r). (B.190)

From (B.190), it can be seen that q() is dependent on the time and frequency separation between the two elementary functions, but not on the reference coordinate (n, k, p). Assume that the Gabor lattice extends to infinity in both time and frequency.
Under this assumption all elementary functions in the Gabor lattice have the same overlap characteristics. In such a case, the lowest optimal convergence factor is given by

\beta_{LB} = \frac{N_{EF}}{\sum_{n,k,p} \sum_{i,j,r} |q(i, j, r; n, k, p)|}, (B.191)

where N_{EF} is the number of elementary functions in the lattice. Since all of the overlap characteristics are the same, the denominator of (B.191) can be rewritten as

\sum_{n,k,p} \sum_{i,j,r} |q(i, j, r; n, k, p)| = N_{EF} \sum_{i,j,r} |q(i, j, r; n, k, p)|. (B.192)

Therefore, the lower bound for the optimal convergence factor for the Gabor lattice is given by

\beta_{LB} = \frac{1}{\sum_{i,j,r} |q(i, j, r; n, k, p)|}. (B.193)

Because the lattice has a finite size in any practical implementation, (B.193) is only an approximation. In a finite Gabor lattice, estimates of the lower bound are larger if the reference coordinate (n, k, p) is near the endpoints of the lattice. Equation (B.193) can be considered a local lower bound which is assigned to the update equation for the coefficient a_{nkp}. The minimum of the local lower bounds can be used as the global lower bound for the optimal convergence factor. The concept of a local lower bound can be applied to other types of lattices. The local lower bound for a one-dimensional Gabor expansion is approximated by

\beta_{LB}(n, k, p) \approx \frac{1}{\sum_{i,j,r} |q(i, j, r; n, k, p)|}. (B.194)

The local lower bound for a two-dimensional Gabor expansion is approximated by

\beta_{LB}(n, m, k, l, p) \approx \frac{1}{\sum_{i,j,u,v,r} |q(i, j, u, v, r; n, m, k, l, p)|}. (B.195)

Appendix C: Partial Derivatives for Three-Dimensional Motion

This appendix contains the partial derivatives of the three-dimensional motion cost function with respect to the motion and depth parameters. The motion cost function is given by

J = \sum_{n,m,k} N(k) \sum_l C(n, m, k, l) \, e(n, m, k, l)^2, (C.196)

where

e(n, m, k, l) = D(n, m, k, l) - V(n, m, k, l) \Delta t. (C.197)

Let

k_x = z_f \cos\phi_l, (C.198)

k_y = z_f \sin\phi_l, (C.199)

and

k_z = -(\bar{x} \cos\phi_l + \bar{y} \sin\phi_l)
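The local lower bound of (B.193)/(B.194) can be sketched numerically using the overlap form of (B.190). The pi/2 decay constant, the truncated neighbourhood, and the small phase set used here are all assumptions of this sketch, standing in for the infinite lattice:

```python
import numpy as np

def overlap(i, j, r, n, k, p):
    # Overlap between elementary functions (i, j, r) and (n, k, p), following
    # the form of eq. (B.190): it depends only on the time separation (n - i),
    # the frequency separation (k - j), and the phase difference (p - r).
    return ((-1.0)**((n + i) * (k - j))
            * np.exp(-0.5 * np.pi * ((n - i)**2 + (k - j)**2))
            * np.cos(p - r))

def local_lower_bound(n, k, p, extent=3, phases=(0.0, np.pi / 2)):
    # beta_LB(n, k, p) ~ 1 / sum_{i,j,r} |q(i, j, r; n, k, p)| (cf. eq. B.194).
    # The infinite lattice is truncated to a (2*extent + 1)^2 neighbourhood
    # and a two-element phase set, both assumptions of this sketch.
    total = sum(abs(overlap(i, j, r, n, k, p))
                for i in range(n - extent, n + extent + 1)
                for j in range(k - extent, k + extent + 1)
                for r in phases)
    return 1.0 / total
```

Because the self-overlap term contributes 1 to the denominator, the bound is always below 1; the Gaussian decay means the truncation error is negligible for a modest `extent`.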
(C.200)

The partial derivatives of (C.196) with respect to the motion parameters are given by

\frac{\delta J}{\delta T_x} = -2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_x}{z_f + z}, (C.201)

\frac{\delta J}{\delta T_y} = -2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_y}{z_f + z}, (C.202)

\frac{\delta J}{\delta T_z} = -2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_z}{z_f + z}, (C.203)

\frac{\delta J}{\delta R_x} = -z_c \frac{\delta J}{\delta T_y} + y_c \frac{\delta J}{\delta T_z} - 2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_y z - k_z y}{z_f + z}, (C.204)

\frac{\delta J}{\delta R_y} = -x_c \frac{\delta J}{\delta T_z} + z_c \frac{\delta J}{\delta T_x} - 2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_z x - k_x z}{z_f + z}, (C.205)

\frac{\delta J}{\delta R_z} = y_c \frac{\delta J}{\delta T_x} - x_c \frac{\delta J}{\delta T_y} - 2 \Delta t \sum_{n,m,k} N(k) \sum_l C e \frac{k_x y - k_y x}{z_f + z}, (C.206)

\frac{\delta J}{\delta x_c} = R_z \frac{\delta J}{\delta T_y} - R_y \frac{\delta J}{\delta T_z}, (C.207)

\frac{\delta J}{\delta y_c} = R_x \frac{\delta J}{\delta T_z} - R_z \frac{\delta J}{\delta T_x}, (C.208)

\frac{\delta J}{\delta z_c} = R_y \frac{\delta J}{\delta T_x} - R_x \frac{\delta J}{\delta T_y}. (C.209)

Since the motion parameters are constant for a given rigid body, the last three derivatives are linearly dependent on the translational derivatives. Thus, only six of the nine derivatives are independent. The partial derivative of (C.196) with respect to the depth z(n, m, k) is given by

\frac{\delta J}{\delta z(n, m, k)} = \frac{2 \Delta t N(k)}{(z_f + z)^2} \sum_l C e \left\{ T_x k_x + T_y k_y + T_z k_z + R_x [-k_y (z_f + z_c) + k_z y_c] + R_y [k_x (z_f + z_c) - k_z x_c] + R_z [-k_x y_c + k_y x_c] \right\}. (C.210)
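The rigid-body motion model of (7.163)-(7.165) and the translational derivative (C.201) can be sketched and checked numerically. The sketch restricts the cost to pure translation with a single frequency band (N(k) = 1), and its projected-velocity model V = (k_x T_x + k_y T_y + k_z T_z)/(z_f + z) is an assumption chosen to be consistent with (C.198)-(C.203):

```python
import numpy as np

def motion_field(p, T, R, c):
    # 3-D velocity of the point p = (x, y, z) under rigid motion
    # (eqs. 7.163-7.165): translation T, rotation components R, centre c.
    d = np.asarray(p, float) - np.asarray(c, float)
    Tx, Ty, Tz = T
    Rx, Ry, Rz = R
    return np.array([Tx + d[1] * Rz - d[2] * Ry,
                     Ty + d[2] * Rx - d[0] * Rz,
                     Tz + d[0] * Ry - d[1] * Rx])

def orientation_coeffs(xbar, ybar, phi_l, z_f):
    # The k-coefficients of eqs. (C.198)-(C.200) for image point (xbar, ybar)
    # and orientation phi_l.
    return (z_f * np.cos(phi_l),
            z_f * np.sin(phi_l),
            -(xbar * np.cos(phi_l) + ybar * np.sin(phi_l)))

def cost_and_Tx_gradient(D, C, kx, ky, kz, z, T, z_f, dt):
    # Pure-translation cost (C.196, with N(k) = 1) and its analytic T_x
    # derivative (C.201). The velocity model below is an assumption of
    # this sketch, consistent with the derivatives (C.201)-(C.203).
    V = (kx * T[0] + ky * T[1] + kz * T[2]) / (z_f + z)
    e = D - V * dt                          # residual, eq. (C.197)
    J = float(np.sum(C * e**2))
    dJ_dTx = float(-2 * dt * np.sum(C * e * kx / (z_f + z)))
    return J, dJ_dTx
```

A finite-difference check of the analytic gradient confirms the form of (C.201) under the assumed velocity model; for a point on the optical axis, pure T_z motion produces no projected velocity, as expected.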