Invariant Representation of Image Functions under Gamma Correction and Similarity Transformations i>y Andreas Siebert M . S . , T h e University of G e o r g i a , 1991 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F THE REQUIREMENTS FOR T H E DEGREE OF Doctor of Philosophy in THE FACULTY OF G R A D U A T E STUDIES (Department of C o m p u t e r Science) we accept this thesis as conforming to the required standard The University of British Columbia October 2000 © A n d r e a s Siebert, 2000 In presenting degree freely at this the thesis in partial fulfilment University of British Columbia, I agree that the available for copying of department publication this or of reference thesis by this for his thesis and study. scholarly or. her for Department The University of British C o l u m b i a Vancouver, Canada DE-6 (2/88) the requirements I further agree that purposes may representatives. financial permission. of gain It shall not be is an permission for granted allowed advanced Library shall make by understood be for the that without it extensive head of my copying or my written Abstract This work focuses on image retrieval and recognition in environments where the images are subject to a non-linear brightness change known in image processing as gamma correction. Our empirical data shows that gamma correction changes images significantly, resulting in poor retrieval results if not addressed. The proposed solution is based on a novel differential invariant under this kind of radiometric transformation. Since imaged objects are often subject not only to radiometric changes but also to variations of the scene geometry, we propose a representation of two-dimensional image functions that is simultaneously invariant under gamma correction and some geometric transformations, namely translation, rotation, and scaling. • An implementation of the proposed invariants based on derivatives of the Gaussian is given. For gamma correction without geometric scaling, improved image retrieval performance based on the invariant representation is demonstrated in both a template matching scenario and a histogram based retrieval system. The proposed invariants perform unsatisfactorily under scaling. The key reasons for this behavior are discussed, and empirical data on the accuracy of the proposed invariants are provided.. Contents Abstract ii Contents iii List of Tables vi List of Figures viii Acknowledgements 1 2 xii Introduction 1 1.1 Gamma Correction 2 1.2 Image Retrieval Systems 4 1.3 Image Functions and Derivatives 5 1.4 Geometric Invariants and Transformations . 8 1.5 Research Hypothesis 14 Background 2.1 2.2 16 Image Retrieval Systems 16 2.1.1 Histogram Based Retrieval 17 2.1.2 Wavelet Based Retrieval 18 2.1.3 Invariant Based Retrieval 18 2.1.4 Other Retrieval Techniques 21 Derivatives 22 iii 3 2.3 Scaling 24 2.4 Cameras and Gamma Correction 27 Differential Invariants 3.1 3.2 3.3 3.4 4 Differential Invariants for 1-d Functions . 29 3.1.1 Invariants under Scaling 29 3.1.2 Invariants under Gamma Correction 31 3.1.3 Modification of the Invariants 40 Differential Invariants for 2-d Functions 42 3.2.1 Rotationally Symmetric Operators . . 44 3.2.2 Validation of the 2-d Invariants 46 Implementation of Differential Invariants . 48 3.3.1 Gaussian Kernels 48 3.3.2 Convolution 49 3.3.3 Combined Scaling Operators 51 Summary 55 3.4.1 The Whole Family 55 3.4.2 Computational Complexity 56 Invariance under Gamma Correction: Empirical Analysis 4.1 5 29 59 Application Independent Performance Measures . 60 4.1.1 Accuracy 62 4.1.2 Reliability 67 4.1.3 Entropy 68 4.2 Application Scenario I: Template Matching 4.3 Application Scenario II: Histogram Based Retrieval 71 r ,. . . 82 Invariance under Scaling: Empirical Analysis 97 5.1 98 Assumptions and Error Sources iv 5.2 5.3 5.4 6 Performance Measures 108 5.2.1 Accuracy Ill 5.2.2 Reliability 5.2.3 Entropy . . .116 •. . 116 Excursion: The Averaged Wavelet Transform 118 5.3.1 Shift Variance of the Wavelet Transform 121 5.3.2 Shift Invariance by Averaging . 122 5.3.3 The AWT Filterbank .124 The Quest for Characteristic Points in 0 Space 127 Conclusion 134 Bibliography 137 Appendix A The 1-d Gaussian and its Derivatives 145 Appendix B The 2-d Gaussian and its Derivatives 146 Appendix C Test Images Database 148 Appendix D Binary Correlation Accuracy Matrices 151 Appendix E Histogram Distance Matrices 158 Appendix F Point Spread Function Patterns 161 Appendix G Scaling Behavior of the Invariant Appendix H Averaged Wavelet Transform H.l O i23 m 163 168 1-d AWT 168 H.2 2-d AWT 171 Appendix I © Space 173 v List of Tables 3.1 Invariants under similarity transformations and brightness changes 55 4.1 Error measures for 0 i 2 66 4.2 Percentages of reliable points for T O 7 ©m.127 . : 68 4.3 Normalized entropy H, 16 bins 69 4.4 Normalized entropy H, 256 bins 71 4.5 Correlation accuracy CA, template size 6 x 8 80 4.6 Correlation accuracy CA, template size 10 X 10 82 4.7 Distance matrix, intensity images 90 4.8 Distance matrix, invariant representation, a = 0 90 4.9 Distance matrix, invariant representation, a = 1.0 92 4.10 Recognition rates R 92 4.11 Mean runner-up distinctions 94 5.1 Error measures, camera scaling vs. synthetical scaling 5.2 Error measures for 0 i2 113 5.3 Error measures for 0 i23 113 5.4 5.5 Percentages of reliable points for Normalized entropy H . . . E.l Distance matrix, intensity images, a — 0 E.2 Distance matrix, intensity images, a = 1.0 m m O i2, ©mi23 • • m vi . . . 108 116 118 ,. . . . 158 160 E.3 Distance matrix, invariant representation, a = 0 160 E.4 Distance matrix, invariant representation, a = 1.0 160 vii List of Figures 1.1 Gamma correction intensity mapping 3 1.2 The invariant framework 5 3.1 A 1-d example function, its derivatives and its invariants . . . 43 3.2 The 1-d Gaussian and its derivatives 49 3.3 Partial derivatives of the 2-d Gaussian 53 3.4 The 2-d Gaussian combined operators 54 3.5 Example images and their invariants 58 4.1 Camera gamma correction versus synthetic gamma correction 61 4.2 Accuracy of invariant © i 2 7 63 4.3 Accuracy of invariant Q i2-y with prefiltering 4.4 Reliable points for 0 i 2 4.5 Histograms for entropy computation for Q i2-y 4.6 The template location problem 4.7 Template matching on synthetic images 74 4.8 Template matching on real images 76 4.9 Pseudocode for template location 77 4.10 Binary correlation accuracy matrices . 78 4.11 Comparison of BCAMs 79 4.12 Correlation accuracies 81 4.13 Correlation accuracy medians and means 81 . . m 64 m m 7 • • • 70 m . viii 67 72 4.14 Histogram based retrieval . . . < . 83 4.15 Mag-Lap 1-d feature histograms 86 4.16 Mag-Lap 2-d feature histograms for intensity image 87 4.17 Mag-Lap 2-d feature histograms for invariant representations 88 4.18 Retrieval rankings for query image Build 89 4.19 Retrieval rankings for query image WoBA 89 4.20 Retrieval rankings for query image ToolA ,. . 4.21 Pseudocode for recognition rate computation 89 91 4.22 Recognition rates R : 93 4.23 Mean runner-up distinctions 94 5.1 Interpolation kernels 100 5.2 Point spread function, razor edge 102 5.3 Area based sampling of a chessboard pattern 103 5.4 Camera scaled images 104 5.5 Camera scaling and synthetical scaling with the Gaussian 107 5.6 Creation of new extrema in 2-d by Gaussian smoothing 109 5.7 Synthetic scaling scheme 110 5.8 Example alphal image Build, for 5.9 Scaling by filtering vs. scaling by optical zooming, for 0 T O i2 and O i23 • m 0 i2 . 112 . . '. ' m 110 5.10 Camera scaled invariant vs. SO invariant 0 i2, no prefiltering 114 5.11 Camera scaled invariant vs. SO invariant 0 i2, with prefiltering 115 • 117 5.12 Reliable points for invariant m m 0 i2 m 5.13 Histograms for entropy computation, for 0 i2 m and 0 i23 5.14 Shift variance of the wavelet transform T O • • • 119 122 5.15 Pseudocode for the averaged wavelet transform . 123 5.16 Shift invariance of the averaged wavelet transform 124 5.17 1-d AWT 125 filterbanks 5.18 2-d AWT filterbank, Haar wavelet, 126 ix 5.19 2-d AWT filterbank, Daub wavelet 126 5.20 2-d AWT filterbank, Daub wavelet 126 5.21 2-d AWT, Haar filters, applied to image Build 128 5.22 Pseudocode for 0 space creation 129 5.23 Standard deviation in O i2 130 5.24 Standard deviation in 0 i23 5.25 Trajectories through 0 i2 m 5.26 Trajectories through 0 i23 2 8 Cl m m space space 130 space m • 131 space . 132 Test images I 149 C. 2 Test images II 150 D. l 152 Correct maximum correlation position images I, template size 6 x 8 D.2 Correct maximum correlation position images II, template size 6 x 8 153 D.3 Comparison of BCAMs, template size 6 x 8 154 D.4 Comparison of BCAMs, template size 6 x 8 155 D.5 Correct maximum correlation position images I, template size 10 X 10 156 D. 6 Correct maximum correlation position images II, template size 10 X 10 157 E. l Retrieval rankings for query image Build 159 E.2 Retrieval rankings for query image WoBA 159 E. 3 Retrieval rankings for query image ToolA 159 F. l Point spread function, cross pattern, Sony camera 162 F. 2 Point spread function, cross pattern, Olympus camera 162 G. l Scaling by filtering vs. scaling by optical zooming, for 0 i 3 164 G.2 Camera scaled invariant vs. SO invariant 0 i23, no prefiltering 165 G.3 Camera scaled invariant vs. SO invariant 0 i23, with prefiltering 166 G.4 Reliable points for invariant m m m 2 167 0 i23 m x H.l Averaged Wavelet Transform, 1-d, Daub2 wavelet, rectangular signal 169 H.2 Averaged Wavelet Transform, 1-d, Daubs wavelet, rectangular signal • • • 170 H. 3 Averaged Wavelet Transform, 2-d, Daub2 wavelet, image Build 172 I. 1 174 Max-Min difference in 0 space xi Acknowledgements The unfaltering and uncompromising support granted to me by my research supervisor, Bob Woodham, was essential for the success of this work. His advice concerning all kinds of scientific and non-scientific problems was invaluable. David Lowe and Raymond Ng, serving on my thesis committee, added many helpful suggestions and showed many ways to enrich my work. The classy team of system administrators reliably guaranteed the smooth running of the technical infrastructure. Their support was highly appreciated. The Computer Science department of the University of British Columbia in general and the Laboratory for Computational Intelligence in particular provided a very pleasant working environment where the exchange of ideas flourished and a warm atmosphere of cooperation reigned. It was a privilege to be a part of this fine institution. ANDREAS The University of British Columbia October 2000 xii SIEBERT Chapter 1 Introduction The goal of an important class of image processing systems is to recognize or to retrieve objects. Those objects are mapped onto digital images in many ways, depending on the scene geometry and the lighting. But it is often only the identity of an object that is of interest, not its particular size or location or orientation or brightness. In fact, all those variations that an object can undergo pose a serious problem for machine recognition and retrieval. One way to address this problem is by design of an invariant representation of the objects of interest. Invariant, that is, under a specified set of transformations. This thesis focuses on a non-linear radiometric transformation known as gamma correction. The effect of gamma correction on an image and the features derived from it can be quite significant, as our empirical data demonstrates, leading to mediocre retrieval performance if no countermeasures are taken. In large, uncontrolled image databases, it is typically unknown if gamma correction has been applied to any of the images or not, and if so, the specific value of gamma is not known either. In order to solve this problem, we present a family of differential invariants under gamma correction, which are designed to be simultaneously invariant under some common geometric transformations. We show how the invariants can be implemented, how they perform in a template matching application, and how they can be integrated into a histogram based image retrieval system. We start this introduction with a section on gamma correction, the main transformation of 1 interest. A short discussion of image retrieval systems as a preeminent application area of invariant representations follows. Next, we devote a section to image functions and their derivatives, since these are at the heart of the proposed invariants. We continue with a section on the kind of geometric transformations that we aim to be invariant under, with special emphasis on scaling. Finally, we present our research hypothesis. 1.1 Gamma Gorrection Historically, the term gamma correction was used to describe the non-linearity of photographic film. In the world of electronic imagery, gamma correction is used in the image formation process and for display devices [Poy96]. Actually, these two applications of gamma correction are the inverse of each other. Here, we are only concerned with the former, but conceptually the two are the same, differing just in the numerical value of gamma. The rationale behind gamma correction in cameras lies in human perception. The sensitivity of human senses in general and of the human eye in particular follow a logarithmic law, known as Weber's law. It can also be expressed by saying that the just noticeable difference JND, = (1.1) where q is the variable of interest, is constant. The variable of interest here is brightness. Eq. (1.1) indicates the problem that occurs if brightness is quantized in a linear way. Let us assume that the human eye can perceive brightness changes of JND(,=1%, and that the brightness values are linearly quantized into 8 bits, i.e ranging uniformly from 0 to 255. Then the difference between b=250 and b=251 is 1/250=0.4%, which is less than the just noticeable difference and therefore wasteful. On the other hand, the difference between b=10 and b = l l is 1/10=10%, which is much higher than the JND, leading to perceptible blocking artifacts in the image. 1 In order to take the non-linearity of the human perception into account, quantization of the brightness values can be done according to a power law., This power law, known as gamma ^he example images printed in this work are subject to the rendering idiosyncrasies of the'printing process and should only seen as approximations. 2 correction, is defined to be I^:=pP (1.2) where p > 0 is a constant that is determined by the value of 7, 7 > 0. Actually, for 8-bit data, p is set such that the intensity values 0 . . . 255 are mapped onto the same interval, which is obtained by setting p = 2 5 5 1-7 . Fig. 1.1 demonstrates gamma correction for different values of 7 . The x 0 50 100 150 200 Figure 1.1: Gamma correction as a function of intensity. (dashed) 7 = 0.45; (dotted) 7 = 2.22. 250 (solid) linear quantization, 7 = 1; axis is the input intensity, and the y axis is the intensity after gamma correction. For 7 = 1 , we have a linear mapping, equivalent to no gamma correction. The dashed line corresponds to 7=0.45, a typical value for cameras. It can be seen that the lower intensity values get spread out (they are "allocated more bits"), while the higher intensity values are compressed (they are "allocated fewer> bits"), and the ratio between adjacent brightness levels is approximately constant. For example, if 7=0.45, then the intensity value 50 is mapped to 122.5, i.e. the lower 50 intensity levels are mapped to 122 levels under gamma correction, while intensity value 205 is mapped to 231.1, i.e. the 50 highest intensity levels are compressed to 24 levels under gamma correction. The dotted line corresponds to 7=2.22=1/0.45, a typical value for display devices, and it has the reverse effect compared to the dashed line. Where the dashed and dotted lines are far away from the solid line, the effect of gamma correction onto the original image is large. Incidentally, the non-linear quantization in input devices also precompensates for the non-linear mapping from voltage to brightness on electronic displays, i.e. for the monitor gamma. 3 For emphasis, let's reiterate that gamma correction is done for perceptual purposes only, as a kind of image enhancement operation. There is no need to do gamma correction if the images 2 are not intended for human observers but rather serve as input to machine algorithms. This way, we will find images that have or have not undergone gamma correction. We also like to stress that the power law referred to as gamma correction is a rather general transformation, which for example subsumes linear brightness changes as a special case (7=1). 1.2 Image Retrieval Systems A computer vision task that has received a lot of attention is the retrieval of image data from image databases (see e.g. [FSN+95, JFS95, Pic95, PPS96, SC96, SM97, MCL97, YI99]). Typically, a user presents the system with an image of the search object and wants a ranking of the images in the data base, according to some measure of similarity, where the instances of that search object are ranked highest. The design of a suitable similarity metric is difficult, since simple mathematical measures, like the mean squared error, take into account neither human perception nor any notion of semantic similarity. "Content Based Retrieval", where content presumably refers to the semantics of the object or scene to be retrieved, has been the buzzword in the field, but it remains, at this time, an elusive goal. The general approach that this thesis takes in order to deal with data variability and similarity is outlined in fig. 1.2. The idea is to derive an invariant representation of the input image (see e.g. [MZ92b, Wei93]) and to employ it for all vision tasks later on. We assume a certain set of a priori known transformations that the input image may have undergone, but we do not know the parameters of those transformations. The invariant representation consists of a real valued matrix of the same size as the input image. This invariance implies that similarity with respect to the defined transformations ideally becomes identity in the invariant representation. On real data, the invariant representation certainly will not be identical for all instances, but for small deviations from identity we can Note that performing gamma correction in order to compensate for the characteristics of the monitor is also done only for the human observer of the monitor output. 2 4 image 1 local differential invariants invariant representation feature extraction indexing, etc. image 2 Figure 1.2: The invariant framework. The "black box" called local d i f f e r e n t i a l invariants should produce an invariant representation of the input image, irrespective of any transformation T performed on that image. All subsequent processing is done on that invariant representation. reasonably expect straightforward metrics like the mean squared error to perform well. Note that this notion of similarity is strictly syntactic, not semantic, in that it depends only on well denned transformations, not on any knowledge or assumptions about the scene or the observer. The kind of retrieval dealt with in this work stays in the realm of 2-d and it is appearance based. That is, no models of the scenes or objects are employed. Instead, the objects of interest and the search objects are represented by their 2-d images. No 3-d aspects are covered, i.e. changes of the camera-object geometry other than similarity transformations in the image plane are beyond the scope of this work. However, there are no general 3-d invariants for 2-d image processing systems anyways since the mapping from the 3-d world to 2-d images entails a loss of information [BWR92, BWR93]. 1.3 Image Functions and Derivatives This thesis will assume that, at least conceptually, images can be described by continuous, differentiable, bandlimited functions, the so-called image functions. These 2-d image functions map a spatial location to brightness values. 3 3 We use the terms intensity and gray value synonymously with brightness. 5 Electronic imaging devices, like CCD cameras, typically sample the image function with finite size rectangular sensor elements. The classic sampling theorem states that a bandlimited continuous function can be recovered completely from the sample values if the sample rate is above the Nyquist rate. Otherwise, aliasing occurs. However, the sampling theorem assumes point sampling, while physically realizable sensors necessarily integrate over a finite area. We call this kind of sampling area based sampling, sometimes also referred to as scanning through the aperture. Area based sampling is equivalent to convolving the image function with the sensor element response function before point sampling, which amounts to a non-ideal low pass filtering. 4 In many computer vision applications, area based sampling is a reasonable approximation to point sampling, but in the context of scaling, it is important to keep that difference in mind. Note also that geometric concepts like points and lines are mathematical abstractions which are now squeezed into the straitjacket of a discrete grid and endowed with a brightness value. For the purposes of this work, the image functions are scalar, defining images in terms of their gray level values. The geometric and radiometric transformations of interest and the proposed invariants do not depend on the wavelength of the incident light. Therefore, we ignore color, but it is conceptually straightforward to consider vector image functions and compute invariants separately for each channel. The invariants proposed later on are based on differentials. Differentials are local properties which are computed on the immediate neighborhood of a pixel [RW95]. They can be computed everywhere, at each pixel location. No preprocessing step like segmentation, keypoint selection, or line extraction has to be done. This is in contrast with point based invariants where the establishment of invariants critically depends on identifying corresponding points. Also, local features tend to be more useful in the presence of occlusion than global features. Strictly speaking, the concept of derivatives is undefined on discrete points - it is a concept related to continuous functions, based on the existence of an e-environment around the point of interest. It is here that the assumption of an underlying continuous, differentiable image function comes in. Theoretically, if the conditions of the sampling theorem are met, this underlying For example, the frequency response of a boxcar filter is a sine function, and the frequency response of a Gaussian is a Gaussian. 4 6 continuous image function is uniquely denned by the sample points. In practice, however, there are many ways to compute derivatives for discrete data (see e.g. [Wei94, AndOO]). Equivalently, there are many ways to fit a continuous function to sample points. It is important to understand why the theoretically unique underlying continuous image function can not be obtained in practice. The first reason is that the interpolation function dictated by the sampling theorem is the sine function, sinc(x)=sin(x)/x. This function has infinite support, so it has to be approximated by a kernel offinitesize. The convergence of the sine function towards zero is slow, which means that small kernels give only a rough approximation, but small kernels are essential both for efficiency of computation and to keep the computation localized. Second, the sampled data is necessarily noisy. The noise sources are the light itself, which has statistical properties, and the optical and electronical devices involved in the imaging process. It is standard practice to deal with noise by low pass filtering the image. There is a multitude of low pass filters and parameter settings to choose from, each particular choice effectively resulting in a different image function. In the context of the computation of derivatives, the necessity to smooth noisy data is even more paramount. This can be seen by looking at the Fourier transforms of derivatives. If the 1-d function f(x) has the Fourier transform F(u), then its first derivative f'(x) has the Fourier transform iu>F(u), its second derivative f"(x) has the Fourier transform — LO F(LO), and 2 so on. That is, each derivation introduces a factor of magnitude u>. That means that noise, which usually consists of high frequency components, is amplified, and it is amplified more than the low frequency components that make up the signal. One way to combine smoothing with the computation of derivatives is to use the derivatives of the Gaussian. They can be determined analytically in a closed form. Then the derivatives can be computed by convolving the image with the sampled derivatives of Gaussian (see e.g. [KvD87, Lin94a, SM97]). This is actually how the proposed invariants have been implemented. 7 1.4 Geometric Invariants and Transformations Invariance, according to the Oxford English Dictionary, is "the property of remaining unaltered or of being the same in different circumstances". Or, more specific in the context of mathematics, invariant means "unchanged by a specified transformation or operation". That is, an invariant is always an invariant under transformation T . So it is critical to identify the set of transformations under which invariance is to be achieved. We have already introduced gamma correction as an instance of radiometric transformations. Some of the most basic geometric transformations are translation, rotation, and scaling, which are summarily called similarity transformations, and these are the geometric transformations we aim to be invariant under. Similarity transformations result from a rigid relative movement between the camera and the object. That is, moving the camera or the object parallel to the imaging plane results in a translation, described by the parameters Ax, Ay, rotating the camera or the object about the optical axis results in a rotation, described by the parameter </>, and moving the camera or the object away from each other along the optical axis results in an equal scaling for the x-axis and the y-axis, described by the parameter a. In other words, this constraint similarity transformation is completely defined in terms of four parameters (sometimes also called degrees of freedom, or DOF), namely the translates along the axes of the image plane, the rotation angle, and the scaling factor. Note that we exclude here rotations out of the image plane. Those would expose 5 a different aspect of the object which could be captured only with a 3-d model. The assumption that the mapping of the real world object onto the image is not changed by the similarity transformations strictly holds only under orthogonal projection. The imaging process is more accurately modeled by perspective projection, but under certain conditions, e.g. when the object is far from the camera, orthogonal projection is a reasonable approximation. In matrix In the language of matrix theory, two matrices A, B £ C " " are said to be similar if there exists a nonsingular P G C " X " h that B = P~ AP, which defines a similarity transformation. A similarity transformation amounts to a change in basis, i.e. a change of coordinate system. 5 x 1 s u c 8 notation, the 2-d similarity transformation with four degrees of freedom can be described by X a cos (j> a sin <j> X s = — asm<j> cxcoscf) Vs + y Ax (1.3) Ay As mentioned before, this thesis stays in the realm of 2-d vision, defining objects in terms of their 2-d projections onto images! Sometimes, it is argued that affine transformations which also model the skewing of an object are a useful approximation to perspective projections (e.g. in [ASBH90, MZ92b, TB97]). The approximation is not adequate if the viewing distance is small compared to the depth range of the scene, which is sometimes referred to as weak perspective. In matrix notation, affine transformations is described by y a a b c d X y + Ax (1.4) Ay Note that the affine transformations has six degrees of freedom. A helpful visualization is this: while the 4-DOF similarity transformation maps a square to a square, an affine transformation maps a square to a parallelogram. Also, all areas are changed by the same ratio, and parallelism is preserved. However, there is a price to pay for the number of transformations or parameters one wants to remain invariant under. Each additional transformation requires more data for the invariant computation and tends to decrease its robustness. Specifically, in the context of differential invariants, more transformations mean higher derivatives, which are well known to be sensitive to noise. Also, increasing the number of images that are mapped onto the same invariant representation may result in an undesirable loss of discriminatory power. Depending on the application, a square should or should not be seen as being equivalent to a parallelogram. Finally, if we disallow skew or different scaling along the axes, rotational invariance can be obtained by the use of rotationally symmetric operators. Therefore, this thesis focuses on 4-DOF similarity transformations rather than on affine transformations. This implies the assumption that there is no significant skew between different instances of the same object. 9 We also assume that the geometric transformations don't change the image function, i.e. the intensity, in any other way than the one defined directly by the geometry. For example, if I(x, y) is the value of the image function before a translation by (Aa;, Ay), then we assume that /(x+Acc, y+ Ay) has the identical intensity value after the translation. This is only an approximation since the translation changes the object-lightsource-camera geometry. The intensity as a function of that geometry is described by the so-called bidirectional reflectance distribution function (BRDF, see e.g. [Hor86]). The exact form of the BRDF depends on the surface properties. For an ideal Lambertian surface, the reflected intensity is independent of the viewing direction, but for other surfaces, it is not. The transformations we have considered are only a small subset of a potentially infinite number of image transformations. In the malleable world of digital image processing, basically any arbitrary transformation of the image could be done, irrespective of physical constraints. Some common examples are histogram equalization or lossy compression, e.g. JPEG compression, which happen to be highly non-linear, irreversible transformations. There is no way to account for all of these transformations. In the absence of a specific application scenario, it seems advisable to stick to a few commonly encountered transformations which, to a first approximation, correspond to actual physical processes that have a direct bearing on the image function, and which are also often done synthetically in software. Scaling Among the three transformations that make up the similarity transformations, scaling occupies, in the context of differential invariants, a special position, while translational and rotational invariance can be seen as straightforward, even trivial. Note that in this work, invariance under geometric transformations means that if the respective transformation maps an image point to a new location, then the value of the function at the location before the transformation equals the value of the function at the new location after the transformation. Derivatives of continuous functions do not change as the underlying function is shifted, so translational invariance is inherent in the use of differential operators. However, the area- 10 based sampling process in the imaging device, as described in section 1.3, introduces the so-called phasing effect, sometimes referred to as alignment or registration problem. That is, the response of a sensor element to an edge depends on the relative position of the the sensor element to that edge. This is a key distinction between the classical mathematical work that deals with abstract, ideal entities like points and lines, and image processing where the physics of the image formation process matters. Rotational invariance can be achieved by employing rotationally symmetric operators. Again, the geometric layout of the finite size rectangular sensor elements allows only for a certain approximation. The approximation of discs by small squares has been well studied, and the approximation error for the proposed operators turns out to be negligible. Scaling poses some more fundamental problems. Many different notions of scale can be found in the computer vision literature, so we start with specifying our notion of scale. We are interested in the geometric changes that an image undergoes as the camera moves away from the imaged object. That is, scale here is directly connected to the notion of size. Assuming that this change of size of the imaged object doesn't lead to the appearance or disappearance of any image structure, scale can be modeled as a change of variable of the image function [RV91, SM96]. Definition (Scale). Let I(x, y) be a continuous image function. Then its scaled versions are I (ax, ay), where a > 0 is the scaling factor. For a < 1 we have an enlargement ("smaller scale"), and for a > 1 a reduction ("larger scale"). The scaling factor is a continuous variable, corresponding to the continuous distance between object and camera. Notions of scale reported in the literature that are different from the one introduced here are mentioned in section 2.3. In terms of the physical constraints of the imaging device rather than in terms of continuous mathematics, scale could be measured by the number of pixels a certain world object gets mapped onto, sometimes called nominal resolution. Since the number of sensor elements of a camera is fixed, an object gets mapped onto fewer pixels as the distance between camera and object increases. This means that the object gets sampled at a lower rate, which will lead to aliasing if the conditions of the sampling theorem are not met anymore. 11 For our purposes, a change of focal length (zooming) is basically equivalent to a change in the camera-object distance. We assume the object of interest to be almost planar so that the whole object can be simultaneously in focus. The camera settings are adjusted if need be such that the object of interest always remains focussed. Among the many synthetic scale spaces that have been proposed the Gaussian scale space deserves special attention. In general terms, a scale space consists of the original signal and a family of derived signals where the derivation is controlled by one parameter, i.e. the scale parameter [Wit83]. The goal usually is to extract information from the original signal at different levels, from fine to coarse. In the case of the Gaussian scale space, the family of derived signals is created by convolving the original signal with Gaussian kernels whose size is determined by the continuous parameter o. It has been shown that the Gaussian kernel is the unique kernel that guarantees causality (i.e. no new level curves are created as the scale increases), homogeneity (i.e. all spatial points are treated the same way), and isotropy (i.e. all scales are treated the same way) [Koe84]. The change of nominal resolution as the scale changes prompts us to revisit the sampling theorem. Is an image function describing the real world bandlimited? Certainly, the imaging sensors impose a lower and an upper cut-ofF frequency on the image.. But as the camera moves closer to an object, more and more details appear, which suggests that real world scenes are not bandlimited, for all practical purposes. This induces an asymmetry onto the scaling process. The details that appear when a camera moves closer to an object can not be predicted from the original image, they represent a gain of information. By contrast, the process of disappearing details when a camera moves away from an object can be fairly accurately simulated with an appropriate filter. This asymmetry means that any comparison between images of different scale has to be done on the larger scale. Formalization of Invariance We now formalize the concept of a radiometric invariant. The term radiometric refers to brightness changes of the image function, rather than to geometric transformations which are treated slightly 12 differently. Let / : K 1-4 K Definition (Radiometric Invariant). 6 n + be an image function that maps a spatial location to a non-negative brightness value, and T R : K i-> K be a + + radiometric transformation that maps a brightness value to a brightness value. Then a radiometric invariant 0 : R + i-> K is a function such that 0(/(x)) = 0(T (/(x))) R with n e N,x € R " . Particularly, an invariant under linear brightness change obeys e(/(x)) = e(fc/(x)) and an invariant under gamma correction obeys 0(/(x)) = 0(p/(xP) with k,p,y > 0. In this definition, K + includes zero. Image functions are often quantized, typically into integers from 0 to 255, but we treat them generally as non-negative, real valued functions. For images, n=2. Again, image locations are typically sampled on integer numbered discrete grids, but we assume a continuous, real valued spatial domain. Radiometric transformations change the value of the image function, but they do not change the image geometry. By contrast, similarity transformations act on the coordinate system. That is, geometric transformations change the spatial arguments of the image function rather than the function value directly. Definition (Geometric Invariant). Let / : R n i-> R + be an image function and T G : K H-> R " be a geometric transformation that maps a spatial location to a spatial n In math parlance, gamma correction would be defined as a transformation group (see e.g. [01v95]), i.e. as a transformation acting on a space, in this case the space of image functions. The "gamma correction group" is parameterized by a pair (p,7). The identity element is (p= 1,7=1), and P2 • (pi • ) = (P2 -pi) • Z ^ • [01v95] then defines an invariant of a transformation group as "a real-valued function whose values are unaffected by the group transformations". 6 12 13 1 location. Let © be the invariant before and 0 be the invariant after the geometric T g transformation. Then a geometric invariant 0 : E + H-> R is a function such that 0(/(x)) = 9 « ( / ( T ( x ) ) ) T G with n £ N, x e R . n In the case of the geometric transformation being translation or rotation, this definition amounts to the rather trivial statement that a rigid displacement of the image function results in an equal rigid displacement of the invariant function. The situation is more interesting in the case of scaling since it actually changes the gradient magnitudes of the image function. We are interested in hybrid invariants under both radiometric and geometric transformations, so the question arises how to combine them. A standard solution is to compute the radiometric invariants in a local coordinate system. That is, the origin of the coordinate system is assumed to be at the point of interest, and the canonical orientation of the coordinate system is defined to be along the gradient direction at that point. In this work, we use rotationally symmetric operators to achieve rotational invariance at any given' point. This is an approach similar to the canonical orientation method but it avoids some of its difficulties, e.g. the 180 degrees ambiguity of the gradient direction. With a local coordinate system in place we can concentrate on invariants under radiometric transformations and scale. 1.5 Research Hypothesis The following research hypothesis has guided this work: A representation of an image function based on differentials can be designed and implemented that is simultaneously invariant under gamma correction and similarity • transformation, where the transformation parameters are assumed to be unknown. Such a representation will yield improved image retrieval results. 14 After a closer look at related work, we develop the theoretical framework for the differential invariants and show how they can be implemented in 2-d by means of the derivatives of the Gaussian. We test the accuracy of our data and algorithms by comparing the invariant representations computed from images with gamma correction to the one computed from images without gamma correction. The problems that occur if gamma correction is not addressed and the benefits to be reaped from the invariant representation are shown in two application scenarios. The first one compares the accuracy of template matching on intensity images to the accuracy on the invariant representation if the search image has undergone gamma correction but the template has not. For the given data, the mean correlation accuracy increases from approximately 50% to 74% for a template of size 6x8, and from 63% to 82% for a template of size 10x10. The second application scenario presents a histogram based retrieval system whose performance is evaluated in terms of its recognition rate. Again, the histogrammed features are first computed from the intensity images, and the resulting recognition rates are compared to the recognition rates resulting from the histogrammed features computed from the invariant representation. For the given data, the recognition rate increases from 50% to 100%. We continue with a performance analysis of the invariants under scaling, where the notion of scale refers to an increase of the object-camera distance. The empirical results we present are mostly negative in the sense that we haven't been able to robustly compute differential scale invariants, and we strive to give the reasons for this, before we finally conclude. To the best of our knowledge, no invariant under gamma correction has been reported in the literature before. 15 Chapter 2 Background The work that is of greatest interest to us here concerns image retrieval and object recognition systems. Many retrieval techniques have been proposed in the literature,, some of them based on invariants. Invariants will be discussed only as far as they are relevant to image processing applications. We focus on the image representation and feature selection side of the retrieval systems, neglecting topics like indexing and similarity metrics. By contrast, some concepts and methods that are particular relevant deserve their own sections. We have a look at a few methods that have been proposed for computing derivatives on images, indispensable for the implementation of differential invariants. The subsequent section covers the large body of work dealing with scaling. We emphasize those approaches that contribute to linear scale space theory, but we also mention some of the wide variety of alternative notions of scale. In the final section, we give a few pointers to work that deals with cameras in general and gamma correction in particular. 2.1 Image Retrieval Systems In this section, we classify the retrieval systems according to the main techniques they use. Most retrieval systems employ several techniques and methods, so clearly there is some overlap among the following subsections. Also, we acknowledge the large number of retrieval systems proposed. This limited overview cannot do justice to all of them and is necessarily incomplete. A recent 16 survey paper is [MCL97]. We use the terms retrieval and recognition synonymously. Typically, recognition involves the matching of an a priori defined model to an image description. This matching process is sometimes referred to as classification or identification or categorization or detection. The result of the matching may be binary (object present / not present), or it may provide a likelihood or confidence measure of an object being present in an image. In image retrieval, the matching is done over a possibly large set of images from an existing image database. The object of interest may occur in several images, and the user is often interested in a ranking of the matches according to some similarity metric. Nevertheless, both retrieval and recognition aim to establish the presence of a particular pattern in an image, and for the purpose of this review there is no need to distinguish between those terms. Many recently published papers have labeled themselves as content based retrieval. Ideally, the user would specify a query in semantic terms of the problem domain (e.g. "Show me all images with happy people in them"), and the retrieval system would automatically translate the query into image features that can be efficiently extracted and matched. Unfortunately, the mapping of low-level image features to meaningful high-level descriptions remains a difficult and largely unsolved problem. Therefore, we refrain from calling anyof the current retrieval systems "content based". 2.1.1 Histogram Based Retrieval Histogram based approaches to image retrieval simply count the numbers of certain features occurring in an image. This allows for fast retrieval since the dimensionality of the image is reduced to the number of histogram bins. The method is conceptually straightforward,, well understood, and easy to implement. On the other hand, all spatial information is lost. It is inherent in the reduction.of dimensionality that many images are mapped into the same histogram. Swain and Ballard [SB91] use color histograms to index images. They also introduce the idea of histogram intersection for the matching of histograms. Funt and Finlayson [FF95] extend this work by histogramming color ratios in order to achieve lower sensitivity to illumination changes. 17 IBM's QBIC system [FSN+95] is an early commercially available image retrieval system that tries to allow queries based on visual content. It uses color histograms as well as color layout and texture histograms to determine the similarity between images. Vellaikal and Kuo [VK95] propose a hierarchical scheme of histograms of different color resolution. Schiele and Crowley [SC96, SCOO] use what they call multidimensional receptivefieldhistograms which turn out to be histograms of simple operator outputs, like gradient direction, gradient magnitude, Laplacian, or Gabor filters. 2.1.2 Wavelet Based Retrieval Wavelets have been an active research field for a while. The wavelet transform provides an efficient, information preserving, non-redundant decomposition of an image into dyadic scales. In the context of image retrieval, a significant drawback of the wavelet transform is its lack of shift invariance. This problem will be covered in more depth in section 5.3. Telfer et al. [TSD+94] propose the use of wavelet features to classify sonar imagery where the best features are learned by a neural network. Jacobs, Finkelstein, and Salesin [JFS95] build a fast retrieval system by keeping only the largest wavelet coefficients for each image and quantifying them coarsely. Oren et al. [OPS+97] train a system to learn a wavelet template for the detection of pedestrians. Mallet et al. [MCKV97] design wavelets to optimize the classification of spectral data using discriminant analysis. Tieng and Boles [TB97] apply a wavelet transform to the affine invariant representation of object contours in order to recognize planar objects under affine transformations. 2.1.3 Invariant Based Retrieval A major problem in image retrieval is the sheer size of the search space. Assuming that the description and distinction of images require many features and that those features are subject to variation if the scene geometry or the lighting change, a single real world object corresponds to a large number of possible image instances. Reducing the search space by assigning different instances of the same object that are related by a well-defined transformation to the same equivalent class can increase the efficiency of the search significantly. Such equivalent classes can be 18 induced by invariants under the respective transformations. Invariants have been known for some time. A classic paper by Hilbert from 1893 [Hil93] that proved the existence of a complete system of invariants for algebraic forms was the culmination of 19 century forays into this field. th text of differential geometry. Many invariants were developed in the con- Roughly a century later Weiss, Forsyth, Mundy, Zisserman, et al. [Wei88, FMZ 91, MZ92b, Wei93] published seminal work concerning the use of invariants + in the field of computer vision. Those papers also contain tutorial introductions and historical overviews. [MZF93] presents a number of computer vision applications of invariants. The step from invariants in differential, continuous geometry to invariants in computer vision is larger than it may appear at first. Geometry deals with points and lines which are abstract entities with ideal properties, e.g. no finite width. In computer vision, we necessarily have to deal with points and lines of finite extent. Even more important, computer vision is based on discrete, sampled data where some concepts from continuous mathematics cannot be applied, e.g. the limit of a quantity towards zero. And with the exception of binary data, images represent a variable, often brightness, that can take on a large range of values. This "grayness" property has no counterpart in classic geometry. Barrett et al. [BPHB91] present several methods to extract projective invariants from images. Van Gool et al. [GMP095] build their approach to invariance on the theory of Lie groups. Wood [Woo96] reviews a wide variety of invariants used for pattern recognition. The following pointers give an idea of the many different invariants that have been proposed for various applications. Arbter et al. [ASBH90] propose the use of affine invariant Fourier descriptors for the recognition of 3-d objects. They use the Fourier coefficients of normalized contours, in this case silhouettes of airplanes, as features. Rivlin and Weiss [RW95] use local invariants for curves where the curves are represented in a so-called canonical coordinate system that depends on the curve itself. Zhao and Chen [ZC97] introduce affine invariants computed along curves, based on moments. Flusser and Suk [FS98] present invariants under blur, i.e. under convolution with a symmetric point spread function, also based on image moments. They combine invariance under radiometric (sometimes also called photometric) transformation with invariance under geomet- 19 ric transformation. The same general technique, i.e. mixed invariants based on moments, has been employed by Reiss [Rei93] and van Gool et al. [GMU96]. The radiometric transformations they address are linear brightness changes with and without offset. That is, they have the form I = ki -\-b where b may be zero. [GMU96] points out that such a linear equation is a reasonable 1 model for photometric changes if the objects are planar and far away from both camera and light sources. In that case, the bidirectional reflectance distribution function is approximately identical at all points of the object. Kliot and Rivlin [KR98] try to balance local invariants and global invariants by representing shapes as sets of contours. Invariant signatures are computed separately for each contour, and these are organized in a tree structure that guides the retrieval process. Alferez and Wang [AW99], like [FS98], combine geometric invariants with radiometric invariants. They represent curves by affine parameterizations, project them onto basis functions (e.g. wavelets, NURBS=Non-Uniform Rational B-Splines, etc.), and form invariant ratios of the transform coefficients. For the invariance under illumination, Lambertian reflectance is assumed. Abbasi and Mokhtarian [AM99] report on a retrieval system where the objects are represented by the maxima of the curvature scale space of their boundaries. The scale space is created by smoothing the curve with a Gaussian. Note that all techniques reported in this paragraph depend on object contours or shapes. They assume that robust algorithms to extract those contours or shapes are available. The work that is most relevant and most closely related to this thesis is by Schmid and Mohr [SM96, SM97]. They employ a feature vector of nine differential invariants under the group of rigid displacements, i.e. translation and rotation. They compute differentials by convolution with Gaussian kernels, building on the notion of a local jet. The n th order local jet is defined as the equivalence class of all images that agree in all partial derivatives up to the n ih order at the location of interest [KvD87, KvD92]. Schmid and Mohr's multiscale approach consists of computing the feature vector for several different values of the a of the Gaussian. They show how a ratio of suitable derivatives would be scale invariant, commenting that the computation of that invariant would actually depend on the size of the Gaussian window. Ravela and Manmatha [RM98] present a retrieval system similar to that by Schmid and Mohr, using the same differential invariants. 20 Lowe [Low99] proposes an object recognition system based on features that are invariant under similarity transformation. Special attention is given to the task of extracting key points that are invariant under scaling (compare [SMB98] for an appreciation of the difficulty of this problem). A staged filtering approach, referred to as the Scale Invariant Feature Transform (SIFT), identifies key points as extrema of a difference-of-Gaussian function that remain extrema across all scales of the difference-of-Gaussian image pyramid. Robustness with respect to illumination changes is increased by giving more importance to the gradient orientation than the gradient magnitude. The gradient orientations are the basis for the local descriptions of the image in terms of so-called orientations planes. These are demonstrated to yield robustness to local geometric distortions. Beis and Lowe [BL99] present a 3-d object recognition designed to not rely on invariant features, arguing that since there exist no invariants for general 3-d to 2-d mappings, a richer set of shape information can be employed by using noninvariant groupings. 2.1.4 Other Retrieval Techniques The retrieval and recognition techniques mentioned in this section are only peripherally related to our work. However, they give an idea of the richness of the image retrieval field and point to approaches that could benefit from the use of invariants or approaches that compete with invariant based methods. A classic technique to reduce the dimensionality of data is the principal component analysis (PCA). The idea is to decorrelate the data such that most of the signal is concentrated in the first few dimensions (or components), so that higher components contain only little information and therefore can be discarded. Turk and Pentland [TP91] use the PCA for face recognition, and PCA is also frequently used in MIT's Photobook [PPS96] system. Murgu [Mur96] employs PCA for tree crown detection. Color has been widely used in retrieval. The work by Swain and Ballard [SB91] on color indexing and by the QBIC team [FSN 95] has already been mentioned. The robustness of color + based features critically depends on color constancy. That is, the color of objects is well known to be a function of the scene geometry and of the lighting conditions. Therefore, algorithms have been developed that try to determine the color of objects regardless of those external parameters, e.g. by Barnard, Funt and Finlayson [BFF97]. Slater and Healey [SH96] propose color invariants based on the distribution of spectral reflectance. These are later used for the retrieval of satellite images [HJ96]. An interesting approach to extract invariant groupings has been proposed by Lowe [Low85, Low87] and later on employed by many other researchers, e.g. by Havaldar et al. [HMS96] and by Iqbal and Aggarwal [IA99]. The general idea behind perceptual organization is to form groupings of perceptual features of image structures that are likely to be non-accidental and invariant over a wide range of viewpoints. Typical grouping criteria are proximity, parallelism, and collinearity. Searching and matching is then performed on those groupings. Forsyth and Fleck [FFB96] propose a retrieval system where the groupings are more knowledge based, in this case related to human anatomy. Aksoy and Haralick [AH99] formulate a grouping method as a graph-theoretic clustering approach. The work by Turk and Pentland [TP91] on face recognition has triggered a large amount of research in that field. An early face retrieval system is presented by Bach, Paul, and Jain [BPJ93]. Other popular application domains are the retrieval of logos and trademarks, see e.g. Kim and Kim [KK97], and of fingerprints, see e.g. Ratha et al. [RKCJ96]. Current research focuses on retrieval in multimedia databases where video and audio data has to be accommodated. A recent survey is provided by Yoshitaka and Ichikawa [YI99]. Related work is done in the field of digital libraries, see e.g. [Pic95] or [PP96]. 2.2 Derivatives Derivatives have played a prominent role in computer vision right from the start. They are closely related to the concepts of gradient and edge. An edge, or image discontinuity, is characterized by a gradient extrema, or, equivalently, by a zerocrossing of the second derivative. However, in edge detection, one is typically interested in the existence and the precise location of the edge [MH80, Can86, TP86]. By contrast, we are mainly interested in the magnitude of the derivatives everywhere, not just at the maxima or zerocrossings. 22 In continuous mathematics, the concept of derivatives is well defined and understood. However, in image processing we deal with sampled, discrete data. This has important implications for the computation of the derivatives. Theoretically, the sample points of a bandlimited function sampled at or above the Nyquist rate allow the reconstruction of the underlying continuous function, thereby allowing an arbitrarily precise computation of the derivatives. However, the reconstruction filter is the sine function. Infinite support and slow convergence of the sine function mean that the optimal reconstruction filter can only be approximated in practice. Seen from a different angle, there are many ways to fit a function to data points, and the way this fitting is done determines the value of the computed derivative. The classic gradient operators - Roberts, Sobel, Prewitt, Hueckel - simply compute the gradient with small orthogonal masks of size 2x2 or 3x3 that give weights of opposite sign to the pixels in the neighborhood of the point of interest. These operators can be found in most textbooks, e.g. in [Jai89]. Zuniga and Haralick [ZH87] propose a so-called integrated directional derivative gradient operator which is based on a cubic polynomial surface fit. It aims to reduce both the edge direction bias and the noise sensitivity, compared to the classic operators. Weiss [Wei94] also employs a data dependent way to compute derivatives. He designs filters for derivation that preserve polynomials up to a given order. Ando [AndOO] proposes a so-called consistent gradient operator minimizing the difference between the gradient derived from a continuous function and the discrete gradient derived from the corresponding sampled function. This criterion results in gradient masks that are slight modifications of the Sobel operator. The derivatives of the Gaussian have a long history as derivative operators in computer vision. They combine the derivative computation with smoothing, which is essential in the presence of noise. The response of the Gaussian derivative operator to different edge types and to noise has been studied by Williams and Shah [WS93] and Blom et al. [BtHRKV93]. [tHRFSV94] also contains a short analysis of the behavior of Gaussian derivative kernels under noise. Deriche [Der90] shows how derivatives can be efficiently implemented, based on recursive filtering. Van Vliet et al. [vVYV98] extend this approach to design recursive Gaussian filters. 23 As previously mentioned, Koenderink and van Dorn [KvD87] use the local jet of the derivatives of the Gaussian to describe an image at the points of interest. The Gaussian is well known to be an optimal filter in the sense that it maximizes the space-frequency localization (see e.g. [Lin94a]). The Gaussian kernel also has outstanding properties in the context of scaling, as pointed out in [Koe84]. Therefore, the Gaussian has become the kernel of choice in applications dealing with linear scale space in general and with differential approaches to multiscaling in particular. Some prominent examples are Mokhtarian and Mackworth [MM86], with a scale space for curvature; Lu and Jain [LJ89, LJ92], with an analysis of edge behavior in scale space; Lindeberg [Lin94a, Lin94b, Lin98], with his scale-space primal sketch; Schmid and Mohr [SM96, SM97], with their multiscaled differential invariant vectors; and Elder and Zucker [EZ98, Eld99] with their edge blur estimation. We discuss the Gaussian in more detail in the following section. 2.3 Scaling The term scale is.used in many contexts, often with a different meaning. The purpose of a specific scale space depends on the application, so the choice of a scale space must be made with a task in mind. The application of interest here is the imaging of objects as the distance between object and camera increases. A simple model of scale for this process is variable substitution, which is equivalent to a change of coordinate system. The scaling is described by one variable, the scaling factor a. This model of scale has been adopted for instance by Schmid and Mohr [SM97]. By contrast, synthetic scale spaces can be designed to have specific desirable properties that may or may not correspond to a physical process in the real world. In a general sense, a scale space is a one parameter family of signals derived from the original signal, with the purpose to go from a fine representation of the signal to a coarse one as the scaling parameter increases. A widely used technique to represent an image at multiple scales is based on image pyramids. An image pyramid consists of several levels of image copies of decreasing resolution, typically halved from level to level. The kind of filter applied to the image at each level determines the type of the pyramid. Burt and Adelson [BA83] discuss Gaussian and Laplacian pyramids. 24. Unser, Aldroubi, and Eden [UAE93] propose a polynomial spline pyramid. Lowe [Low99] builds a pyramid with the difference of Gaussians. Marr [Mar82] employs the notion of scale in his definition of a representational scheme he calls the primal sketch. The primal sketch consists of primitives of the same kind, e.g. blobs, at different scales, and these primitives in turn are grouped into larger scale tokens. Marr and Hildreth [MH80] observe that intensity changes in an image occur over a wide range of scales. They argue that intensity changes at a given scale are best detected by finding the zero crossings of the image convolved with the Laplacian of Gaussian. Witkin [Wit83] notes that a fundamental problem for the computation of derivatives is its dependence on the size of the neighborhood (or window) used for that computation, and that there can be no such thing as a categorically correct universal scale. His answer to this problem is what he calls scale space filtering. He creates the scale space by convolving the signal with a Gaussian of increasing <r, simultaneously tracking the zero crossings of the second derivative, where the second derivative refers to the convolution of the signal with the second derivative of the Gaussian. Koenderink [Koe84] emphasizes that world objects exist as meaningful objects only over a certain range of scales, between an inner and an outer scale, and that it doesn't make any physical sense to speak of the limit of a size towards zero. But a visual system should be able to handle all images scales between the cut-off scales simultaneously (the so-called deep structure of an image). He showed that the Gaussian is the only kernel that guarantees all the following scale space properties. • Causality: Causality is taken to mean that no new level surfaces are introduced as the scale increases, i.e. "no new structure". • Isotropy. Isotropy means that all spatial locations are treated the same way. • Homogeneity. Homogeneity means that all scales are treated the same way. Isotropy and homogeneity can be interpreted as a "no knowledge" assumption, since they avoid any bias in scale or location. Koenderink also showed that the Gaussian scale space satisfies a 25 diffusion equation which describes the evolution of a heat distribution over time in a homogeneous medium with uniform conductivity. Lindeberg [Lin94a, Lin94b] extended Koenderink's work by analyzing the kernels that introduce no new extrema as the scale increases, rather than the weak "no new level surfaces" condition. It turns out that in 1-d, it is again the Gaussian kernel which has this property, but this result does not generalize to 2-d or higher dimensions. Lindeberg [Lin98] also suggests a mechanism to select a specific, presumably interesting, scale in scale space. The book edited by ter Haar Romeny [tHR94] reflects the state of scale space theory. [tHRFSV94] is a short tutorial on the same topic. B-Splines are sometimes used as an efficient, compact alternative to Gaussian kernels. An analysis of the scale space derived from B-splines can be found in Wang and Lee [WL98]. A fresh outlook on scale was induced by the arrival of wavelets. Wavelets as basis functions with compact support provide an orthogonal decomposition of a signal into dyadic scales. The orthogonality property implies that the scales are decorrelated. Unlike the image pyramids, the wavelet transform is non-redundant. Early monographs of the field are provided by Daubechies [Dau92] and Chui [Chu92], and tutorial articles e.g. by Strang [Str89], Rioul and Vetterli [RV91], and by [PTVF92]. In a seminal paper, Mallat [Mal89] proposes a wavelet representation for images. He also shows how an image can be approximated by a multiresolution approach, and how to compute the wavelet transform efficiently by recursive filtering. In subsequent work, he analyzes the zero crossings of the wavelet transform [Mal91] and employs it to characterize edges [MZ92a]. The Gaussian kernels and the wavelet transform create a linear scale space. But several non-linear scale spaces have been reported in the literature as well. Perona and Malik [PM90] propose anisotropic diffusion where the smoothing depends on the image structure. They present a diffusion equation that defines gradient driven anisotropic diffusion. Intraregion smoothing is preferred over interregion smoothing, with the goal not to blur and dislocate strong edges. A non-linear scale space can also be created with morphological operations, like dilation and erosion. Chen and Yan [CY89] demonstrate a multiscale opening operator on binary images. 26 Jackway and Deriche [JD96] use scaled structuring functions to induce a so-called multiscale dilation-erosion scale space. Florack et al. [FStHR+95] generalize linear Gaussian scale space to a non-linear one. Sometimes the terms scale or scale invariance are used in a rather simplistic manner. For example, in Del Bimbo and Pala [BP97] scale invariance refers to the assumption that all instances of a certain object have the same aspect ratio of their minimum enclosing rectangles, irrespective of their actual size in the image. The observation behind fractals is that some structures show the same behavior at all scales. Mandelbrot's book [Man77] is an influential early work regarding fractals. Pentland [Pen84] applied them to image processing as scene descriptors. A very different sense of scale is embodied by TINs, Triangulated Irregular Networks, proposed as an efficient way to model terrain elevation [FL79], but applicable to any surface. The level of detail provided by TINs critically depends on the number of triangles, or vertices, in the network. That is, scaling can be implemented by the addition or removal of vertices. A hierarchy of approximations in TINs has been proposed by Evans, Kirkpatrick, and Townsend [EKTar]. Finally, maps (road maps, topographic maps, etc.) give forth an interesting notion of scale. Maps typically provide a scale to indicate the relationship between the distances on the map and the corresponding distances in the real world. This notion of scale is identical to the change of coordinate system mentioned above. However, at the same time, maps contain symbolic representations that correspond to a very different notion of scale. For example, cities are represented by symbols that indicate the size of the city, in terms of number of inhabitants, irrespective of the geographical size. Similarly, rivers are generally represented by a line of a certain width, largely independent of the actual width of the river. 2.4 Cameras and Gamma Correction The origin of the term "gamma correction" goes back to photographic films. The parameter gamma was introduced to model the non-linear mapping from film exposure to optical density. Later on, gamma correction was used in electronic display devices to compensate for the non27 linear transfer function between video signal and screen intensity. The kind of gamma correction that is of interest here, performed by camera hardware at image formation time, came last. An extensive discussion of all these aspects of gamma correction is given by Poynton [Poy96]. The manuals and specifications of the camera we use (a Sony DXC 950) don't discuss gamma correction. They don't even give the value of gamma, which therefore has to be determined from the image data. An example of camera specific gamma correction (for a Kodak Megaplus XRC camera) has been mentioned by Martin [Mar93]. That camera allows for gamma correction with 7=0.45; 0.50; 0.60, or no gamma correction at all (i.e. 7=1.0). As another example, Lenz and Fritsch [LF90] report an experimentally determined value of 7=0.65 for the Panasonic WV-CD50 camera. There are books galore on cameras in general. Here, we mention only [Hol98a, Hol98b] as books that focus on cameras in CCD technology. A recent article dealing with the emergent CMOS image sensor technology is [BL00]. Two articles that discuss noise, calibration, and accuracy of CCD sensors are [LF90] and [HK94]. Feltz and Karim [FK90] analyze the modulation transfer function of CCDs and observe that it depends strongly on the alignment between the image and the sensor cells. 28 Chapter 3 Differential Invariants This chapter presents a family of differential invariants under similarity transformation and brightness changes. Special emphasis is given to invariance under gamma correction. For the sake of simplicity, invariants for functions of only one variable are introduced first, in section 3.1. In section 3.2, we extend the framework to image functions, i.e. functions of two variables. Section 3.3 shows how the proposed invariants can be implemented with Gaussian kernels. The chapter concludes in section 3.4 with a discussion of the computational complexity of the invariant computation and a short mentioning of related but less interesting invariants. 3.1 Differential Invariants for 1-d Functions In this section, we present differential invariants for functions of one variable. First, we deal with the case of similarity transformations and linear brightness changes, focusing on the issue of scaling. Later on, we cover the key issue of gamma correction. The 1-d function f(x), typically representing brightness, is assumed to be continuous and differentiable. 3.1.1 Invariants under Scaling As mentioned above, following Schmid and Mohr [SM96], we treat scaling as a change of variable. In 1-d, if f(x) is our function of interest, then f(ax), where a > 0 is the scaling factor, is a scaled 29 version of f(x). If we rename the scaled variable ax := u(x) = u and name the scaled function g(u), we can define •f(x):=g(u)=g(ax)- (3.1) The derivatives of f(x) with respect to x are f'(x) = ag'(u) (3.2) f"(x) = a g"(u) (3.3) fW(x) = a gM(u) (3.4) 2 or, more generally, i Now it can be seen that the following ratio which we call Q 1 2 is invariant to scale change because the scaling factor a cancels out. ° 1 2 { X ) - J ^ x ) - a V ( « ) " 1 2 ( } ( } Note that Q12 is a ratio of functions of x. That implies that ©12 (z) itself is a function of x, not a constant. The invariance holds with respect to scale, not with respect to location. The invariant @ can be generalized to 1 2 QSM(X)= F { K ) { X ) (3.6) where k, n 6 N refer to the order of the derivatives. Eq. 3.5 is a special case of eq. 3.6 where k = 2, n = 1. In the context of image processing, it is advisable to avoid high derivatives whenever possible. Therefore, this generalization is only of theoretical interest. The invariant ©12 deals only with geometric transformations. It does not consider any invariance under brightness changes. However, such an invariance can be desirable because it would ensure that properties that can be expressed in terms of the level curves of the signal are invariant [Lin94a]. It turns out that such a hybrid invariant under both scale and linear brightness 30 change, i.e. under both geometric and radiometric transformations, can be derived along the same lines as invariant © 1 2 by taking into consideration the third derivative. We define a linear brightness transformation of f(x) to be k f(x)+b where k is the brightness factor and b is an offset. This transformation can be seen as a rough camera model where k is proportional to the exposure time and to the aperture size, and the offset depends mainly on the dark current. Dark current consists of thermally generated electrons in the CCD camera and it is proportional to the exposure time [Hol98b]. For a much more sophisticated camera model see [HK94]. Our function under both brightness transformation and scaling is f(x) := kg(u) + b = kg(ax) + b (3.7) The derivatives of f(x) with respect to x are f'(x) = kag'(u) (3.8) f"(x) = ka g"(u) (3.9) f"'(x) = ka g"'(u) (3.10) 2 3 Now it can be seen that the following ratio which we call ©123 is invariant to both linear brightness change and scale change because both A; and a cancel out. ft M 0123(Z) 3.1.2 _ /'(*) / " ( * ) - - - n x ) 3 fc«V» ( f c a V ( t _ 0 ) a - 9'(u) g"'(u) _ „ 2 g {u) - 0«3(«) ( " ) 3 n Invariants under Gamma Correction Gamma correction is a common non-linear brightness transformation. Interestingly, it turns out that our framework of ratios of derivatives is able to provide invariants under this kind of transformation as well. In 1-d, the gamma correction of function f(x) is defined to be Mx):=pf(xy 31 (3.12) where p is a constant that is determined by the value of 7 . The key idea now is to consider, at least conceptually, the logarithm of f (x) 1 rather than f (x): 1 f (x):=\n{pf(xy)=\np + y\nf(x) 1 The derivatives of / (x) (3.13) with respect to x are 7 ; ; M = ( 3 . 1 4 , „ ^M£M__fML } . {x) (3 15) Now the following ratio, called © i , is invariant under gamma correction since both p and 7 2 7 cancel out. e 127 (cc) lf'{x)/f(x) = (3 1 6 ) l(f{x)f"(x)-f'{x) )/f{x) f(x) f{x) fix) f"(x) - f'(xY 2 2 There is no more logarithm in eq. (3.16). That means that, in the end, no logarithm of / 7 has to be computed. But note that eq. (3.13) assumes p > 0 and f(x) > 0. We also like to point out that the elimination of the factor p implies that © 1 2 7 is invariant under linear brightness change as well. This can be verified by replacing f(x) with f(x) = cf(x)' in eq. (3.16). We obtain /(*)/'(*) ( )= 0 X - cf(x)cf'(x) = f(x) f"(x) - f'{xY 12 = cf(x) cf"(x) - (cf'(x)y f(x)f'(x) f(x) f"(x) - f'(xY The square root function is a special case of the gamma correction power law, with 7=0.5. The invariance of © i under taking the square root is also straightforward to verify. Replace 2 7 f(x) with f(x) = in eq. (3.16), with / ' = f'/(2y/J), () — 0 x 1 2 7 ( ) ff' - f" = /7(2v7) - f'/^fVf)- ^fil ' _ ff " / ^ - / ^ v 7 ( ^ - ^ ) - ( ^ r / / ' ' - / ' 2 Similarly, we can show that © 1 2 7 is not invariant under offsets b since, generally, (/(*) +6)'/'(a) ^ (f(x) + b) f"(x) - f'( y x 32 r f(x) f'(x) f(x) f"(x) - f'{xY Then As before, we can extend the invariant in eq. (3.16) to be simultaneously invariant under both gamma correction and scaling. That is, we consider the function f (x) = ln{pf (x)"*) := Inp + ~f In g(ax) :=~g^u) (3.17) 1 The derivatives of f~,(x) with respect to x are /;(*) = ^ (3.18) 9JM^M_- '(3.19) {X)=W % ( x ) = ja ' Then the invariant ratio @123 ( ) = 1 X ' ©1237 ' sfg'"(u) g'(u)g"(u) , -g'(uf - 3 \r + 2^ V g(u) g(u)* g(uf Q (3.20) is X 7 = fW iag'(u)/g(u) W = (7« ((5W5"H-5'W )/5W ) g(u) g'(u)g"'(u) - 3g(u)g'(u) g"(u) + 2g'(u) _ 2 (g"'(u)/g(u) - Sg'ju) g"(u)/g(u) + 2 g'(uf/g(uf) 2 2 2 2 2 2 g(u) g»(u) -2g(u)g>(u) g»(u) 2 2 2 4 + g>(u)* 2 ^> 123 It is straightforward albeit cumbersome to verify eqs. (3.5), (3.11), (3.16), and (3.21) analytically for differentiable functions. As an example, we proceed to verify the equation (3.16) for the invariant ©127 for the arbitrarily chosen function f(x) = 3x sin(27ra;) + 30 The first three derivatives are f'(x) — 3 sin(27r:r) + 67ra; cos(27ra;) f"(x) = 127rcos(27rz) - 12?r a; sin(27ra;) 2 f'"(x) = -36TT sin(27ra:) - 24K x 2 3 COS(2TTX) Then 012 7 = (3xsin(27rx) + 30) (3a: sin(27rz) + 6TTX COS(2TT:E)) (3xsin(27ra;) + 30) (127rcos(27ra;) - 12TT :C sin(27ra:)) - (3 sin(27ra;) + 6nx COS(2TTS;)) 2 33 2 ( 3 2 1 ) If we assume, for specificity, that 7 = 0.45, p = 2 5 5 = 255 - , then the function with gamma 1-7 0 55 correction corresponding to f(x) is fi ( ) = 255°- (3a; sin(27rx) + 30)°55 x and the first three derivatives are %(x) = 2 5 5 - • 0.45 (3sin(27rx) + 3 0 ) - ° ( 3 sin(27ra;) + 0 f"(x) = -255 0 55 255 ' 0 55 55 5 5 • 0.45 • 0.55 (3 sin(27ra;) + 30) - (3 sin(27ra;) + _1 cos(27ra;)) 6TTX 55 6TVX COS(27TZ)) 2 + • 0.45 (3s sin(27ra;) + 30) - (127rcos(27rx) - 127r a; sin(27ra;)) _0 55 2 f!?(x) =255 - • 0.45 (3 sin(27ra;) + 3 0 ) ~ - • 0 0 55 55 (1.55 • 0.55 (3sin(27rz) + 30)~ (3 sin(27ra;) + 67rx COS(2TT3;)) 2 3 + (-3) • 0.55 (3 sin(27ra;) + 30) (3 sin(27ra;) + Qnx COS(2TTO;)) (127rcos(27ra:) _1 127r zsin(27ra:)) + (-36TT sin(2?ra;) - 247r a:cos(27ra;))) 2 2 3 Then _ 255°- (3ic sin(27ra;) + 30)°- 255 • 0.45 (3 sin(27ra;) + 30)-°- • • • ~ 255°- (3xsin(27ra;) 4 30)°- (-255 • 0.45 • 0.55 (3sin(27ra:) + 30)" - . . . . . . (3 sin(27rx) 4 6irx cos(27rx))... . . . (3sin(27rx) 4 67ra;cos(27rx)) + 255° • 0.45 (3ar sin(27rx) 4 30)-°- . . . ...1... . . . (127rcos(27ra:) - 127T a;sin(27ra:))) - (255° • 0.45 (3sin(27ra;) + 30)-°- . . . ... 1 45 55 " 1 2 7 55 05 5 45 55 05 5 1 5 5 2 . . . (3 sin(27ra:) 4 67ra; cos(27ra;))) _ ~ 55 5 5 2 . 55 55 2 (3sin(27ra;) + 30)~ (3sin(27rx) + 6KXCOS{2TTX))... + 30)°- (-0.55 (3a;sin(27rx) + 30)- - ... 01 45 (3xsin(27rx) 1 55 ...l... ... (3 sin(27ra;) ... + fax (127rcos(27rx) + (3a; sin(27ra:) 4- 30)-°- . . . ...I... - 127rai sin(27rx))) - 0.45 (3a; sin(27ra;) 4 30)" ... cos(27ra;)) 55 2 11 2 l ... (3 sin(27ra;) + 67:2; cos(27ra;)) _ >(3a;sin(27ra;) + 30) (3a;sin(27ra;) + 67ra;cos(27ra:)) (3a: sin(27ra;) 4 30) (127rcos(27ra;) - 12TT Xsin(27rx)) - (3 sin(27ra:) 4 fax L 1 1 2 _ 2 cos(27ra;)) 2 which is identical to the previously computed © 1 2 7 of the function before gamma correction. The algebraically inclined reader is encouraged to verify the invariant 34 ©1237 for the same function. The Inverse Mapping It is instructive to ask which functions map onto the same 0i2 7 This is the inverse of the invariant 0i2 , what can be said about the function / that gave rise to it? And if both fi and fi induce the same invariant ©i2 , can we conclude that f\ = p fl for some p, 7? Is it possible to reconstruct / from ©i2 , given the values of p, 7? That would be a completeness result in the sense that two image functions have the same invariant representation ©i2 if and computation: given a certain 7 7 7 7 only if they are related via gamma correction . Unfortunately, the proof of such a completeness 1 theorem remains elusive in this generality, but we can demonstrate completeness if we restrict the space of admissible image functions. We can solve eq. (3.16) for / , obtaining the differential equation /(e Note that 0i2 . 7 ©i2 is not a constant 1 2 7 /"-/')-Qi2 /' = 0 2 7 (3.22) but a function. The solution for eq. (3.22) clearly depends on 7 Like many inverse problems, eq. (3.22) has no unique solution and therefore is ill-posed. In fact, by the very nature of an invariant, it is obvious that an invariant induces a many-to-one mapping, so that eq. (3.22) defines a whole solution space. ©i2 as defined implicitly by eq. (3.22) is undetermined in by 0i2 can be studied in more detail for restricted cases. For Even though the inverse of general, the constraints imposed 7 7 a first example, let us assume that the 1-d intensity, function / is a polynomial of degree 2 2 that may have undergone gamma correction: That is, f(x) = p(a x + a\x + a^p > 0, with 2 2 f'(x) = py(2a x + ai)(a x - a x-\-a )' ~ , 2 2 2 y r 1 0 1 f"(x) = 2pja (a x 2 2 2 + ax + a ) x 0 7 _ 1 +^7(7- l)(2a a:-|2 T h e term completeness is also used (e.g. in [Hil93]) for sets (or systems) of invariants where it refers to the property that all possible invariants for the transformation of interest can be derived from the given invariants. T h e Weierstrass approximation theorem of 1885 asserts that any continuous function can be uniformly approximated over a closed interval by polynomials of a sufficiently high degree. 1 2 35 ai) (a x 2 2 2 e 127 + a\x + a ) 7 and a > a /(4a ) to assure f(x) > 0. Then 2 2 0 0 /(*)/"(*) - / ' ( * ) 2 2 p j(2a x + a )(a x 2 2p 7a (a a; + a i z + a ) 2 2 2 2 + axx + a ) ~< 2 2 1 2 7 _ 1 0 2 2 + P T ( T - l)(2a :c + a i ) ( a z + a^x + a ) ~ 2 2 2 ...1 . . . - p 7 ( 2 a x + a i ) ( a a ; + aix + 2 2 2 2 1 0 2l 0 2 .. (3.23) a) ~ 2 2l 2 2 2 0 2 _ 2 a x + 3a aia; + ( a + 2a ao)x + aiao 2 3 2 2 2 2 —2a x — 2a a\x + 2a ao — a 2 2 2 _ n\x + n x z 2 2 2 2 + n^x + n.4 d\x + d x + d 2 2 3 That is, the coefficients of 0i2 can be expressed in terms of the coefficients of the polynomial / 7 as 7li (0 2a\ = (ii) n = 3a ai 2 2 (iii) n = 2a a + a 3 (iv) 2 714 = a i a 2 0 (3.24) 0 (v) dl = - 2 a 2 (yi) d = —2a i a 2 2 (vii) d = 2a ao 3 2 — «i This system of equations is overdetermined. Even the numerator is overdetermined, i.e. not every polynomial of degree 3 can be written as the product of a polynomial of degree. 2 and its derivative. We can solve eq. (3.24) for a , a , ao from (i), (ii), (iv) to give . 2 x (i) a 2 VW 2 (ii) a i = n / ( 3 v W 2 ) (3-25) 2 (iii) ao = Zy/ni/2 36 n$jn 2 with the constraints (ci) ' n = 2a a + 3 2 = 3 n 7 i / n + 2n2/(9ni) 0 1 4 2 (cu) ^ = -2a\ = -nx (3.26) (cm) d = -2a a 2 2 = -2n /3 x 2 (civ) d = 2 a a - a\ = 3rcin /7Z2 3 2 0 4 2n\l(9ni) Eq. (3.25) determines uniquely the coefficients of the original polynomial of degree two. Furthermore, according to the laws of algebra, ratios can be expanded as a/b = (k • a)/(k-b). Therefore, if we multiply n\ ... ds by a factor p, the invariant © i remains unchanged, and by eq. (3.25), the 2 7 coefficients a , ai, ao all get multiplied by a factor ^Jp. Such a multiplicative factor is compatible 2 with the assumption that / is reconstructible up to gamma correction, and the constraints from eq. (3.26) also continue to hold. It is also straightforward to show that polynomials of degree higher than two cannot match the © i 2 7 from eq. (3.23). If we try to find a polynomial of degree three that matches it, we see that f(x) = C3X + c x + c\x + c ,'f'(x) 2 3 2 0 = 3c3X f(x) f'(x) = 3 c 3 £ + 5c c a; + (4ciC3 + 2c )x 5 4 2 2 3 + 2c x 2 2 2 + ci, f"(x) — Qc^x + 2c , which results in 2 (2coc + c\)x + c c i . Comparing this expression + 2 0 to eq. (3.23), it follows immediately from the leading term that C3=0, reducing the polynomial to degree two. An analogous argument can be made for any other polynomial of degree larger than two. We can summarize this example by stating that the invariant © i 2 7 is complete if the space of admissible image functions is a priori limited to polynomials of a certain degree. For a second example, we assume an admissible space of sinusoid functions that may have 3 undergone gamma correction.. For simplicity, we restrict the analysis to the sum of one sine and a constant. That is, let f(x) — p(ao + «i sin(o;a; + 6)) > 0, with• f'(x) — pjuai COS(UJX + b)(ao + 7 ai sin^z + fc)) , f"(x) = -p~fu ui sm(ux-rb)(a 7-1 2 b)(a + ai sin(wa; + 6 ) ) 0 7-2 + ai sin^z + fc)) 7-1 0 +^7(7-l)u a\ cos (wz + 2 2 , and a > | « i | to assure f(x) > 0. Then 0 Sinusoids are often used as basis functions in image processing applications. Their properties are well understood (compare e.g. the Fourier transform). 3 37 ©127 f(x) f"(x) - f'(x) 2 p fuja\ cos(u>x + b)(ao + a\ sin(wx + 6)) " "" . . . 2 2 1 1 —p ju) ai sin(wz -f b)(ao + a i sin(wa; + 6)) T~ . . . 2 2 2 1 ...l... • • • + P j(l — l)u a cos (uix 2 2 2 . + b)(a + aisin(u)x + b)) ~<~ ... 2 2 2 0 ^ ... — p 7 w a 2 2 2 2 COS (LUX 2 _ + (3.27) 6)(ao + a\ sin(wa; + 6 ) ) T 2 - 2 cos(wa: -f b)(a + a\ sm(ux + b)) 0 —LO sin(w2; + 6)(ao + a i sin(wx + b)) + (7 — l)uja\ cos {wx + b) — 70101 cos (wa; + 6) 2 2 _ cos(cux + b)(a +aisin(ujx + b)) 0 —u(ai + ao sin(wa; + 6)) Now, given another sinusoid f(x) = a + a"i sin(aJa; + b), can parameters a , ai, U, b different from 0 0 ao, ai,u>, b result in the same invariant Qi2 (a;) as given in eq. (3.27)? The orthogonality property 7 of the sinusoids implies that the identity cos(u;a: + b) (ao + a\ sin(u;a; + 6)) _ cos(wa; + b) (ao + «i sin(cJa; + 6)) -u(a +aosm(tox-\-b)) -w (a + a sin(ux + b)) x x (3.28) 0 can only be fulfilled if u = u and 6 = 6. That leaves us with 4 ao + ai sin(ux + 6) _ ao + ~d\ sin(a;a; + b) ai + ao sin(a;x + 6) ai + ao sin(wa; + 6) (3.29) which can be fulfilled if and only if (ao,ai) = p(a ,ai). Again, such a multiplicative factor p is 0 compatible with the assumption that / is reconstructible up to gamma correction. Therefore, the parameters from eq. (3.27) determine the original image function. Similarly to the polynomial case, we can show that eq. (3.29) cannot be matched by a sum of more than one sine and a constant, i.e. by a function f(x) = ao + Y17-1 a;sin(u;;x + 6;) where n > 1, a{ ^ 0, u>i > 0, and 7^ Uj for i ^ j. This follows immediately from the orthogonality, property of sinusoids. In analogy to the first example, we can summarize this example by stating that the invariant 0 i 2 i complete if the space of admissible image functions is a priori limited to a sum of sinusoids. s 7 Since any differentiable, bandlimited function can be decomposed into a sum of sinusoids, this 5 example actually covers the kind of image functions that we assume. 4 5 ignoring trivial substitutions such as cos(a;) = sin(z + TT/2) implying that the function is also periodic 38 In section 3.1.3, we modify © i 2 7 slightly to cope with arguments where the denominator of the invariant is zero. That modification implies that, strictly speaking, completeness is lost, but it is only an ambiguity between numerator and denominator, i.e. a two-to-one mapping, that is introduced. In section 3.2, we extend © 1 2 7 to 2-d in a rotationally symmetric way. This means that 0 i 2 7 contains information about the gradient magnitude, but the gradient direction is lost. That- implies that the original 2-d image function is not reconstructible from In this section, we have assumed that if we compute 0 i 2 7 @i2 7 © i 2 7 given p, 7. , is available to us in an analytic form. However, on a discrete, sampled image, © i 2 itself will consist of discrete samples. 7 Determining a continuous function from samples is generally ill-defined and requires assumptions about the smoothness of the data or a model for the image function. Therefore, we cannot expect to robustly extract 0 i 2 7 from images in an analytical form. Any function of an invariant is itself an invariant. For example, ©127 : = (©12 ) 7 2 = ^ 2 2 _ - 2 + . 4 '• (3-30) is an invariant under gamma correction as well. So we can ask whether © 1 2 7 is minimal in some sense, i.e. whether there are invariants under gamma correction that are simpler than the proposed © 1 2 7 . No algebraic simplification like factoring suggests itself for © 1 2 7 - Furthermore, the design of © 1 2 7 in eqs. (3.13)-(3.16) suggest that both the first derivative and the second derivative are necessary to eliminate both p and 7. Therefore, we conjecture that there exists no differential invariant under gamma correction that does not include a second derivative or higher . 6 To finish this subsection, we remark that there are infinitely many functions that can result in a specific value of © 1 2 7 at a specific location. A single value of © 1 2 7 necessarily, contains only little information about the scene. But the invariant representation is defined everywhere, and An anonymous reviewer /of [Siear] suggested to calculate the histogram of / ' / / and to normalize it to some predefined standard deviation. Now, j - = ^ ~ ^ =7^-. That is, / / / is linear in 7, and this linearity is addressed by the normalization step. But note that our invariant is a local one where the parameter 7 has been eliminated by a computation that involves only the immediate surrounding of the point of interest. By contrast, this alternative invariant relies on a normalization step that involves the global computation of derivatives at several locations, possibly the whole image. Such a global normalization is sensitive to occlusion, changes in background, etc. 6 1 p 7 7 39 7 therefore it can be sampled densely. Ultimately, it is a set of invariant representation points, organized for instance as a template or as a histogram, that reveals the structure of the corresponding image object and allows to distinguish them from each other, as demonstrated in chapter 4. 3.1.3 Modification of the Invariants The invariants introduced above have the disadvantage that the denominator can be zero, rendering the invariants undefined at those points and making them possibly very large in the neighborhood of those poles. Therefore, the invariants are modified as follows. Whenever the modulus of the numerator num is larger than the modulus of the denominator denom, they are swapped. If both numerator and denominator are simultaneously zero, the invariant is defined to be zero, which typically happens to be a continuous extension at that point. Note that these modifications result in - 1 < 9 < 1. If, for the sake of notational convenience, we drop the variables x and u from the equations, the proposed modified invariants are given by 0 @m!2 — \ if 2 if | / ' | < | / " | f'lf else' if / ' / " ' = 0 and /•t/2 U'f")/f" if l/7'"l<l/" l f" /U'f") else 2 0 if / / ' = 0 and / / " - (//')/(//"- lff"-f' )/(ff) 2 f' ) 2 (3.32) 2 2 \ (3.31) 2 2 0 ©ml27 = = 0 and / " = 0. f /f" 2 ®m!23 — < f' if else 40 <•!//"-/''T f 2 =0 (- ) 3 33 0 © mi23 7 = if condl . \ (f f'f"> - 3 ff' f" + 2 f' )/(f f" 2 2 4 ^ (Pf" - 2 ff' f" 2 where condl is (f f'f" 2 2 - 2 ff' f" 2 2 + f' )/(f f'f" 2 - 3ff' f" 2 4 2 = 0 and / / " - 2 / / 4 4 - 3 ff' f" + 2 f' ) 2 + 2 f' + f' ) 2 2 4 / 2 if cond2 (3-34) else / " + / ' - 0) and cond2 is 4 ( | / / 7 ' " - 3 / / / " + 2/' | < | / 7 " - 2 / / ' 7 " + / ' | ) . 2 , 2 4 2 4 The case where both the numerator and the denominator are exactly zero can be encountered fairly often in analytical mathematics, but it is a rare event when dealing with noisy, sampled data. The invariants could be modified such that 0=0 if num < e and denom < e, but this yields no advantage in practice. It is in the nature of differentials that the proposed invariants are hard to compute robustly where the derivatives are simultaneously zero. The use of a threshold e cannot solve this problem, but it poses the problem of a good choice for the size of e, and it introduces non-linear effects right at that threshold. It should be noted that the above modifications map num/denom to the same value as denom/num for any pair (num, denom). That is, the modified invariants contain information about the relative size of (num, denom) to each other, but the information which of the two was larger is discarded. This introduces a potential two-on-one mapping from image functions onto the invariant. However, a differentiable function that leads to (denom/num) may not even exist. An alternative modification would be to replace ratios of the type r = / 1 / / 2 with a difference over sums, i.e. with R — (f\-f ) 2 i? is undefined only if both /1 and f are zero. Common / (f\+f ). 2 2 factors of /1 and f cancel, and common additives cancel in the numerator and may be negligible 2 in the denominator. This would result in modified invariants like for example f' 2 M12 Q &M123 with -00 < 0 M i 2 i 0 M i 2 3 < 00• = = f 2 2 f' l f l „ - f" , „ + f" (3.35) r + f l l 2 (3-36) That is, the number of poles has been reduced, but they may still occur. We prefer the modification that bounds the invariants in the [—1,1] interval, but it 41 should be pointed out right away that no matter which modification is chosen, there remains a problem where both the numerator and the denominator are zero or close to zero. Divisions by zero can be avoided, but numerical instabilities in the presence of noise can not. Remember that the invariants are based on derivatives, that is on spatial changes of the function. If there are no such changes, no robust computation of derivatives can be expected. A visualization of the function f(x) = 3x sin(27ra;) + 30 in the range [4,9.5], the same one as used in section 3.1.3, its derivatives and its invariants is given in fig. 3.1, on the left, and contrasted with the same function after gamma correction, 7 = 0.45, on the right. The influence of the gamma correction on the values of the derivatives is clearly visible. Notice also that Q i2-y and © O i237 m are basically identical on both sides, while 0 T O i2 shows small differences and m m i 3 2 shows large differences. 3.2 Differential Invariants for 2-d Functions In image processing, we are naturally interested in 2-d functions rather than in 1-d functions. Therefore, in this section, we extend our approach to image functions I(x,y) of two spatial variables. Again, we assume that I(x,y) is continuous and differentiable. In two dimensional space, the elementary derivatives are partial derivatives with respect to a certain variable, usually x or y. The 1-d concept of derivatives carries naturally over to partial derivatives, but they are directionally dependent. In the introduction, we stated our goal to achieve invariants under similarity transformations and certain kinds of brightness changes. Again, similarity transformations consist of translation, rotation, and scaling. Rotational invariance was not an issue in 1-d, but in 2-d it has to be addressed. The standard way to obtain rotational invariance is to use rotationally symmetric operators. Such rotationally symmetric differential operators have been intensively studied. Before we proceed, we will introduce some notation. In analogy with eq. (3.1), we define the scaled image 42 Figure 3.1: A 1-d example function, its derivatives and its invariants. The original function, (left) before gamma correction and (right) after gamma correction. (first row) 1-d function; (second row) first derivative; (third row) second derivative; (fourth row) third derivative; (fifth row) 0 ; (sixth row) 0 i ; (seventh row) 0 i ; (eighth row) 0 i . m l 2 m 2 3 m 43 2 7 m 2 3 7 function where both variables are scaled simultaneously by the same factor a as I(x, y) := J(u, v) = J(ax, ay) (3.37) We will omit the variables whenever this can be done without causing confusion. We will use the notation 1^ to denote the i th partial derivative of I, which is a shorthand for I ... . That is, 1^ Zlt denotes both I = di/dx and I — dl/dy, 7( ) x or I = d I/dy , 2 yy y 2 c a n 2 stand for I = d I/dx 2 xx 2 or I Zi = d I/(dxdy) 2 xy and so on. Later on, we will also use the notation /•(*) to denote the i th rotationally symmetric derivative of 7, which is introduced below. 3.2.1 Rotationally Symmetric Operators In this section, we present some rotationally symmetric operators that can be found in textbooks [Hor86]. As for the first order derivative, there is only one rotationally symmetric operator, which is the well known gradient magnitude. It is defined as \V(x,y)\ = ^/l 2 +I :=/' 2 (3.38) Note that the gradient magnitude is not a linear operator. For the second order derivative, we can use the Laplacian which has been well analyzed in 7 the computer vision literature [MH80, TP86]. V (x,y) = I +I 2 xx :=I" yy (3.39) Horn [Hor86] also presents an alternative second order derivative operator, the quadratic variation QV(*. y) = y/ii + 2i 2 x +i (3.40) 2 y y Horn actually defines the quadratic variation without the square root, but for reasons that will become apparent below, the square root is needed here. The Laplacian is linear but the quadratic variation is not. Also, the Laplacian is easier to compute than the quadratic variation. Therefore, we prefer the Laplacian over the QV as second order derivative operator. The established notation V is somewhat unfortunate since it could be confused with the square of the gradient V. Here, we always deal with the gradient magnitude, so V is the Laplacian and | V | is the square of the gradient magnitude. T 2 2 . 44 2 For the third order derivative, we can define, in close analogy with the quadratic variation, a cubic variation as CV(x, y) = ^P + ZP xxx + 3P xxy + P xyy (3.41) :=/" yyy Like the quadratic variation, this is not a linear operator. The important point here is that we can use these 2-d differential operators to extend the invariants from section 3.1 to 2-d in a rotationally symmetric way. All that has to be done in eqs. (3.31)-(3.34) is to replace / ' with the gradient magnitude, / " with one of the second derivative operators ( V or QV), and / ' " with the cubic variation. Then, for example, the invariant 0 2 1 2 from eq. (3.5) can be realized in 2-d either as I' |V(z,y)| 2 7x + /y 2 2 2 (3.42) 1 yy or as " &i2{x,y) 2 l V(x, )| _ QV(x,y) 7 +/ 2 2 y = — = Ip + 2 / A/ xx ~ ± 2 (3.43) 2 + / 2 xy i yy x Similarly, 0 i 2 from eq. (3.33) becomes m 7 0 if / / ' = 0 and II" - V = 0 2 0 m l 2 = < (II')/{II" - I' ) 2 7 (II" - I' )/{II') 2 if \II'\< 177" - (3.44) 7' | 2 else and one way to realize it is 0 ©ml2 = \ 7 if/• |V| = 0 and/• V - | V | = 0 2 (/•|V|)/(/.V -|V| ) 2 2 2 (3.45) if|/-|V||<|7-V -|V| | 2 (/•V -|V| )/(/-|V|) 2 2 2 else, which, in terms of partial derivatives, is 0 ©m!2 7 —\ if / yZ/ + Iy = 0 and / (I 2 XX + ^) = 0 ni^T^\<\i(i +i )-(i +i )\ [ i(J i^Hi$) + /(/«+/vy)-(^+5i + Iyy) - (IJ xx else 45 yy 2 2 (3.46) 3.2.2 V a l i d a t i o n of the 2-d Invariants Why is it possible to replace the 1-d derivatives in eqs. (3.31)-(3.34) with their rotationally symmetric 2-d counterparts without losing the invariant properties? question, recall how the 1-d invariants were derived. For 0 1 2 In order to answer this in eq. (3.5), the critical property is described by eq. (3.4). That is, the first order derivative of / has to return a factor a if / is scaled by a, and the second order derivative of / has to return a factor a , so that a cancels out 2 when we take the ratio f' /f" 2 (or rather g' /<?")• We require the same property to hold for the 2 2 proposed 2-d operators, that is, the 1-d scaling equation (3.4) should, in 2-d, become I (x,y) =a {i) J (ax,ay) { (3.47) {i) We will proceed to show the validity of eq. (3.47) for each one of the proposed 2-d operators, starting with the fact that partial derivatives act like 1-d derivatives with respect to scaling: I(i)(x,y) = a- (3.48) J (ax,ay) (i) That is, the partial derivatives J and J along the main axes return a factor a, meaning that x J (ax, ay) = al (x,y), x and x J x x x , J x x y y and J (ax,ay) , Jxyyi Jyyy = al (x,y). y y Similarly, J , J, xx xy J return a factor a , 2 yy return a factor a . Then the first order rotationally symmetric operator 3 (3.49) a\V(x,y)\ returns a factor a. Similarly, V (aa;, ay) 2 J (ax, xx ay) + Jyy (ax, ay) a I (x,y) 2 xx + (3.50) a I (x,y) 2 yy a• V (x,y) 2 2 returns a factor a . The same happens with the quadratic variation 2 QV(ax, ay) = ^J (ax, 2 x ay) + 2J% (ax, ay) + J (ax, 2 y y ay) (3.51) = a 2 QV(x,y) 46 Analogously, the cubic variation CV(ax, ay) = ^/J (ax, 2 xx = ^ I e = 2 ay) + 3J% (ax, ay) + ZJ (ax, 2 xy yy ( x , y)'+ 3 « Il {x, y) + 3a I (x, 6 x x 6 xy 2 yy ay) + J (ax, 2 yy y) + a I (x, 6 ay) y) 2 yy (3.52) a CV(x,y) 3 returns a factor a . ' 3 Now we can also subject the input image I(x,y) or I(ax,ay) to a linear brightness transform, resulting in an input image ki + b, with k the brightness factor and b the offset. Going through the same equations (3.49)-(3.52) with the modified input image, it is easy to verify that the output of each of the 2-d operators will be transformed by the same factor k, in strict analogy with the 1-d case. This ensures the validity of the invariant © 1 2 3 in 2-d. As for © 1 2 7 , this invariant depends on the derivative operators returning a factor 7 when applied to the logarithm of an image function with gamma correction. Let I(x, y) = ln(pl(x, y) ) = 7 lnp + 7 In I(x,y). Then the gradient magnitude operator applied to I(x,y) yields + 3 = V| = ^4)' \A + i 2 2 + (7^) 2 = 7 i (3.53) I returning a factor 7. Similarly, for the Laplacian XX +7 P yy P = 7 I(Ixx + lyy) 72 I x ly (3.54) returning a factor 7 as well. Analogously, QV = ^P xx + 2P + P xy yy (3.55) (II - ID + 2(1 I y - I Iyf.+ 2 XX x 47 X (Ilyy - Iff and P V - */f 2 ^ A- y xxx < (( "\T 2 (Ixxx X = yy x V . ^^xW 2 XX T T -LOT T yy i o( /11xxx T 3I I 72— X l^xxy X -\- I Iy xx ^ y\\ , (^,( yyy 1 + 2/ s2 73-) +••• m — or r y yy , - — + 0I \\2 1I I x y + ~J3~>) fJ r 2 1 O f y\\ J3)) 3 2 + X / 1 2 (3.56) \ j • 3 + x - + XX 2I I y Wi / 2 lxl p ( _ 3 ?T T i ~ tixyiy lxl m - ^ 7 2 xyy * yyy JT- + JT)) "+ T 3 -\- f 2 3I I = [vfy-J o (-.( if 4- xxy ' also return a factor 7. The invariant © i 3 is validated along the same lines by showing that | V | returns a factor 2 7 ja, V and QV return a factor 7 a , and C V returns a factor 7 a when applied to J(ax, ay), in 2 2 3 close analogy with ©123. 3.3 Implementation of Differential Invariants While the derivatives of continuous, differentiable functions are uniquely defined, there are many ways to implement derivatives for sampled functions. We follow Koenderink [Koe84], Linde- berg [Lin94a], Schmid and Mohr [SM96], Elder [Eld99], and many other researchers in employing the derivatives of the Gaussian function as filters to compute the derivatives of a sampled image function via convolution. 3.3.1 Gaussian Kernels In 1-d, the zero mean Gaussian function is defined to be G(x;a) = -jLe~£z V27T<7 (3.57) G(x,y,cr)=^e-^ (3.58) In 2-d, it is The analytical derivatives of the Gaussians are given in appendix A and appendix B, in 1-d and 2-d, respectively. The 1-d Gaussian and its derivatives are shown in fig. 3.2, while the partial derivatives of the 2-d Gaussian are shown in fig. 3.3. 48 c ' 1 1 I 1— 0.2 - 0.15 - 0.1 - 0 . 0 5 - N \ \ \ ^ ° 0 . 0 5 .•' \ s < X . - -8 -6 -4 -2 . O - 2 4 6 8 Figure 3.2: The 1-d Gaussian and its derivatives, a = 2.0. (solid) Gaussian; (dashed) first derivative of Gaussian; (dash-dotted) second derivative of Gaussian; (dotted) third derivative of Gaussian. We mention in passing that two Gaussian filters can be combined into a single Gaussian, e.g. for convolution of two Gaussians in 1-d, we have G(x, tn) © G{x, a ) = G{x, 2 3.3.2 + u\) (3.59) Convolution Given the Gaussian kernels, the computation of the derivatives is done by convolving the image function with the appropriate kernel. In 1-d, in the continuous domain, we have oo /(''>(&)= J G {x-x)f{x)dx {i) = G ®f (3.60) (l) —oo where G^ is the i th derivative of the Gaussian, and © denotes convolution. In the discrete domain, we approximate eq. (3.60) by the finite sum n fW(X)= G^{X-X)f(X) = G ®f {i) (3.61) X=—n where the uppercase variables denote the discrete domain. Note that the Gaussian and its derivatives converge quickly towards zero, which makes the truncation error of the sum in eq. (3.61) negligible for reasonable kernel sizes. A kernel size of 7a is generally deemed to be sufficiently 49 large. Some example images and their invariants are presented in fig. 3.5. The example images have been prefiltered with a Gaussian with a = 1.0. The Laplacian has been employed as the second order derivative operator. In 2-d, eqs. (3.60) and (3.61) become CO CO (i){x,y)= J J G {x -x,y-y) I{x, y) dxdy = G I {i) ®I (3.62) = G ®I (3.63) (i) — CO — C O and n m E I (X,Y)= {t) G (X -X,Y (i) -Y)I(X,Y) (l) X~—nY——m respectively, where 1^, G^ denote partial derivatives. invariants in 2-d. For example, 0 i 2 m Eq. (3.63) allows us to compute .the from eq. (3.46), employing the | V | and the V operators, 2 7 can be computed as 0 ©ml27 — \ if condl W(I®G )2+(I®Gy) ' I(I®G +I®G )-((I®G ) +(I®G ) ) 2 yy xx 2 x + (/ © G ) ) (I © G ) 2 x + (I ® G ) )\. 2 y c e l o n a • z + (I®G ) ) (3 \ - 641 J 2 x y g e ^{I®G ) +(I®G ) 2 2 x ®G) 2 x = 0), and cond 2 is \I 2 y 1 1 y - ((I®G y yy I G) nnn 2 x I(I®G +I®G ) where condl is the condition (I ie AO 2 x xx y + (I ® G ) 2 y ®G) 2 x = 0 and I (I ® G xx + (I ® G ) 2 y + I ®G) yy \ < \I {I ® G xx - ((/ © + I ©G ) yy - Note that eq. (3.64) is completely specified in terms of convolutions of the input image with Gaussian kernels, with the a of the Gaussian being the only parameter. It is instructive here to revisit the scaling equation (3.48) for a close look at how the derivatives of scaled images I, J are related when the derivatives are implemented using Gaussian kernels. Exploiting the symmetry of the kernels and using the integral form, eq. (3.48) can be written as CO jjI(x,y) CO G(i)(x,y;a) dx dy = -a* jjJ(u,v) — CO ; G^(u,v;aa) dudv (3.65) — o o which is basically the form used by Schmid and Mohr [SM96, SM97]. The important observation here is the "hidden" parameter a in the partial derivative of the Gaussian G^)(u, v; aa): in order 50 for eq. (3.65) to hold, the size of the Gaussian kernel has to be adjusted according to the scaling factor a. Put another way, the value of the derivatives depends on the filter size, and only if the filter sizes on both sides of the equation are properly adjusted, eq. (3.65) will be true. Note that this is not a particular problem of the Gaussian but a necessary consequence of the fact that we are dealing with sampled images where the concept of an e-environment, employed in analytical mathematics to define derivatives, doesn't hold. 3.3.3 Combined Scaling Operators The proposed implementation of the 2-d operators, as for example in eq. (3.64), proceeds by first computing the partial derivatives along both axes via convolution and then combining these according to the structure of the operators used. A question that comes up here is whether it is possible to combine the partial derivatives of the Gaussian into "one piece" operators, reducing the computations to one convolution per order of derivative and giving additional insight into the nature of the operators. As an example, let's have a look at the gradient magnitude in eq. (3.38). With the.Gaussian kernels, it gets implemented as ®.G ) 2 X + (I ® G ) .- The idea now is to define a combined 2 v operator, the gradient magnitude of Gaussian, GradoG(x,y) = y G ^ + G 2 / (3.66) = \y/x +y G 2 2 and then to compute the first derivative as the convolution I © GradoG. Analogously, we can define the Laplacian of Gaussian LoG(a:,y) = G + G,yy xx = - (x (3.67) + 2 4 y -2^)G 2 the Quadratic Variation of Gaussian QVoG(x, y) = ^G 2 + 2G 2 x +G 2 xy yy (3.68) 2)2 51 G and the Cubic Variation of Gaussian CVoG(x, y) = ^G + 3G 2 + 3G 2 xxx 2 xxy = ^((3a x-x ) 2 3 +G 2 xyy yyy + 3(a y-x y) 2 2 3(a x-xy ) 2 2 + 2 2 + 2 (3.69) (3a y-y ) ) G 2 3 2 1/2 These combined derivative operators are shown in fig. 3.4. Their rotationally symmetry is obvious. We observe that the Laplacian of Gaussian is the only operator that has both negative and positive parts. All the other operators are non-negative. Unfortunately, it turns out that we can not replace / ' , / " , / " ' in eqs. (3.31)-(3.34) with these combined operators and still maintain the invariants. The reasons for this are as follows. • Only the Laplacian of Gaussian is a linear operator, so only for that combined operator we have I © LoG = I ® G xx + I ©G . yy The other combined operators don't commute with convolution. That is, I © GradoG f y/(I © G ) 2 x + (I © G ) , 2 y and similarly for QVoG and CVoG. • The non-negativity of GradoG, QVoG, and CVoG means that they return a non-zero response to a non-zero constant input. However, a zero response to constant regions is critical for most of the proposed invariants. The invariants under gamma correction, 0 i 2 m 0mi237) and 7 rely on the derivative of a constant, namely lnp, to be zero. Similarly, for O i 2 3 , m the response to a brightness offset should be zero. • The non-linear combined operators don't allow any general predictions of the output under scaling, as we did in section 3.2.2 for the validation of the invariants. That is, from the structure of the operator alone we cannot predict how the output of applying the operator to I(x, y) compares to applying it to I(ax, ay). 52 53 54 3.4 Summary 3.4.1 The Whole Family In this chapter, we have so far discussed four invariants that combine invariance under some geometric transformation with some brightness transformation. We have left out some less interesting invariants that we are going to mention here for the sake of completeness. Table 3.1 categorizes systematically the invariants under subsets of the transformations of interest. We assume that all invariants are invariant under translation (since derivatives are translationally invariant) and invariant under rotation (since only rotationally symmetric operators are employed). This leaves six invariants to be considered: three choices with respect to brightness changes (no invariance under brightness changes, invariance under linear brightness changes, invariance under gamma correction), with or without invariance under scaling. no be e no sc sc 0 Gl2 lb Goi ©127 ©123 ©1237 gc Table 3.1: Invariants under similarity transformations and brightness changes. bc=brightness change, lb=linear brightness transformation, gc=gamma correction. sc=scaling, The invariant ©n, invariant only under translation and rotation, is trivial. Any of the rotationally symmetric derivative operators introduced in section 3.2 will do. In fact, one could even employ the original image function itself, or the original function filtered with a 2-d Gaussian. The invariant 0 i , which requires additional invariance under linear brightness change, could be O defined, in 1-d, as e »'W = 7«U ' (3 70) where i ^ j and i, j G N, with @oi = / / / ' being the simplest choice. The extension to 2-d can be done in the same way as above, using the Gaussian, the Gradient Magnitude, the Laplacian, etc. Since an invariant can not be simultaneously invariant under both linear brightness change and gamma correction, 0 3 1 2 and ©1237 can be seen as maximal invariant under the transformations proposed. 55 3.4.2 Computational Complexity We finish this chapter with a look at the computational costs of obtaining the 2-d invariant representations. The computational costs of a whole invariant based retrieval system is given in section 4.3, after the histogram based retrieval scenario has been introduced. Recall that the invariants are implemented via convolutions with partial derivatives of the Gaussian. For example, _ I'' I" _ h 2 1ii + Iy + G) G 2 2 x = I lyy © xx or compare eq. (3.64) for an implementation of @ i2 m 7 + (/ © G ) J Gyy 2 y + © The computational costs are mainly incurred by the convolutions, plus a few operations to combine them. The theoretical upper bound for general convolution is achieved by means of the convolution theorem which states that convolution in the space domain can be done via multiplication in the frequency domain. In turn, the transformation into frequency space and back can be done via the Fast Fourier Transform (see e.g. [Jai89]). Let the input image be of size mxn, and let the filter size be smaller than the image size. Then the cost of computing the F F T of the image is 0(mn logmn), and this cost is the dominating contribution. That is, the computational complexity of each invariant computation is CC = 0(rrin\ogmn) (3.72) In practice, for small filters, it is faster to do the convolution in the space domain. In this case, smaller filters allow a faster computation. The cost of general convolution in the space domain for filter size k x k is CC = 0(mnk ) (3.73) 2 However, the Gaussian and its partial derivatives are separable , which means that the 2-d con8 volution can be achieved by two 1-d convolutions along the axes, at a cost of 0(nk) for the rows and 0(mk) for the columns, resulting in a complexity of CC = 0{(m + n)k) 3 A 2-d function is separable if it can be factored as f(x, y) = fi(x) f2(y). 56 (3.74) As mentioned above, we use a filter of length « 7a. To make sure that the filter length is odd, we actually set the filter length to £: = 2[3<7l + l Here, the computation of © i m (3.75) has been done with CT=1.0, resulting in k — 7. 2 The asymptotic analysis above ignores all constant factors, which nevertheless influence the actual runtime. Some more implementation details are worth mentioning. • In 2-d, 0 i 2 3 and @ i 2 7 require the computation of square roots since the gradient magnim m tude and the cubic variation include square roots. This relatively expensive operation could be avoided by squaring 0 i 2 3 and 0 i 2 . m m 7 • For the Laplacian, we can replace I®G -\-I®G xx yy by / © L o G , as explained in section 3.3.3. This saves one convolution. 9 • If several invariants are to be computed for each image, then the partial derivatives can be reused. It is the cost/benefit ratio of the application that ultimately determines whether the cost of computing an invariant representation is worthwhile. . A different kind of "cost", albeit not in terms of computing time or memory, is the loss of information inherent in the invariant representation. There is no one-to-one mapping from the original image to its invariant representation, so these mappings are not reversible. In fact, we expect them to be irreversible, since by nature of the invariants, whole families of images that are equivalent under the respective invariant are mapped into the same representation. We will look into a pragmatic evaluation of this information loss in chapter 4. 9 T h e L o G is not separable, but it can be computed as the sum of separable functions. 57 Figure 3.5: Example images and their invariants: Mandrill, Barbara, Lena, Building, (first row) original images; (second row) Q \2'i (third row) © i 2 3 ! (fourth row) 0 i 2 7 ; (fifth TOW) © m l 23 m m 7 58 m Chapter 4 Invariance under G a m m a Correction: Empirical Analysis This chapter examines the robustness and the usefulness of the proposed invariant under gamma correction on real world data. A key issue has already been mentioned in the context of the implementation of the invariants in section 3.3, namely the computation of derivatives on discrete data within finite windows, which can only approximate the true derivatives. But with real data, there are other issues: How robust is the invariant computation under noise? How accurately is a physically realized gamma correction in a camera approximated by the mathematical model? Is the invariant representation useful in the face of the inherent information loss? In order to answer these questions, we look at images taken by a camera which allows us to switch gamma correction on or off. In the first section, we compare the invariant representations of those images to the theoretical predictions. In the subsequent sections we present two image retrieval scenarios. One is based on template matching and one is based on histogrammed features. They allow us to quantify the performance gain achievable by using the invariant representation, if the images in the database have undergone gamma correction but the search image has not, or vice versa. The empirical results show that, for the given image data, the invariant representation is advantageous, the loss of discriminatory power that comes with the loss of information notwithstanding. 59 4.1 Application Independent Performance Measures The key idea behind the invariant representation has been schematized in fig. 1.2. No matter whether an image has been subjected to some transformation T or not, it will be transformed into the same invariant representation. That is what the theory from chapter 3 predicts, and the adequacy of this prediction determines whether subsequent image.processing operations will reap the promised benefits. Therefore, our first task is to ask what error range can be expected when computing the invariant from an image with gamma correction, as opposed to computing it from an image without gamma correction. Put another way, how invariant are the invariants? For specificity, we focus here only on the invariant @ m i 2 7 , implemented with the Laplacian as the second order derivative, as given in eq. 3.46, and computed with the Gaussian parameter a = 1 for the kernels. That is, for the purpose of the empirical analysis of invariance under gamma correction, we ignore the issue of scale, which will be analyzed in the following chapter. But first, we have a look at the gamma correction that is produced by our Sony DXC-950 3 CCD video camera. The product specification says only that the gamma correction can be switched on and off, without actually specifying the value of 7. Our goal is to determine the value of 7 that minimizes, in a least squares sense, the difference between the image where the gamma correction has been done by the camera and the image where the gamma correction has been computed from an image without gamma correction. The measurements and computations we perform are as follows. 1 1. Take an image of a scene with gamma correction switched off. Such an image will be referred toas"0GC". 2. Take an image of the identical scene with gamma correction switched on. Such an image will be referred to as "CGC", for "camera gamma correction". 3. Compute the gamma correction from OGC, according to eq. (1.2). Such an image will be We assume a "pure" gamma correction function, as given by eq. (1.2), over the whole intensity range. However, it is quite possible that cameras follow eq. (1.2) only over a certain intensity interval, and that they perform e.g. a different mapping for low intensities. Fitting the image data to a piecewise defined function may lead to a slightly different value of 7, but for the sake of simplicity and generality, we refrain from such refinements. 1 60 referred to as "SGC", for "synthetic gamma correction". 4. Vary the 7 for the SGC images such that the sum of the squares of the differences at each pixel between the C G C image and the SGC image is minimized. The above steps have been done for ten different OGC-CGC pairs. The optimal value of 7 varies between 0.55 and 0.69, with both the mean and the median being 7 = 0.60, which is the value we adopt from now on to create SGC images. Note that the NTSC standard assumes 7 = 2.22 for display devices, which has the inverse 1/2.22=0.45. This deviation, however, is of no concern to us - after all, we want to be invariant under gamma correction, i.e. the invariant representation should be the same irrespective of the particular value of 7, and the invariant computation assumes no knowledge of the particular 7. Figure 4.1: Camera gamma correction vs. synthetical gamma correction, image WoBA, histograms at the bottom, (left) original image, "0GC"; (middle) gamma correction by camera, "CGC"; (right) synthetic gamma correction, "SGC", 7=0.60. Fig. 4.1 shows an example that compares a C G C image to a SGC image computed with 7=0.60, their histograms shown at the bottom. As expected, gamma correction brightens up the image, leading to a higher contrast in the darker areas, e.g. in the lower left corner. The SGC 61 image is not identical to the C G C image because of noise. The C G C image is taken independently of the OGC image and therefore is subject to all the standard noise sources, while the computation of the SGC image does not add any noise. Note also that not all intensity values occur in the SGC image, showing as gaps in the lower intensity range of the histogram, because the SGC image is computed from the 4.1.1 quantized OGC image. Accuracy The first application independent measure we are looking at will be the difference between which is the invariant 0 m i2-y computed on a OGC image, and QCGC OOGC, which is the same invariant computed on the corresponding C G C image. This difference will give us an indication of the accuracy and robustness of the invariant computation. As an example, fig. 4.2 shows 0 i 2 m 7 for a set of OGC, C G C , SGC images, together with the original image and the error measures AcGc(*',i) = \@ccc(i,j) - Qoac{i,j)\ (4-1) &SGc{i,j) ~ ®0Gc{i,j)\ (4.2) and, analogously, computed at each pixel location = \®SGc{hJ) of the invariant representation. Actually, the boundary regions of each invariant representation are ignored and set to zero in the figure since the derivatives and therefore the invariants cannot be computed reliably in those regions. At the image boundaries, the image may be extended by pixel replication or periodic wrap-around or reflection (mirroring). These techniques may preserve image continuity, but they introduce discontinuities in the derivatives. Therefore, the first three boundary rows and columns will be excluded from our analysis. In the difference images ACGC and ASGC > the bright regions indicate areas of low errors, while the dark ones indicate large errors. Since — 1 < 0 i m 2 7 < 1, the maximum difference could be -2, and, indeed, we observe some outliers that come close to that upper bound. We also observe 2 Recall that the camera may not implement "pure" gamma correction, i.e. it may not strictly follow eq. (1.2) everywhere. 2 62 250 10 20 30 M ... ^mmm*** 1150 40 50 60 70 80 90 60 £J 100 80 120 (a) 1.8 10 1.6 20 1.4 30 1.2 40 1 50 0.8 60 0.6 i 20 40 60 100 80 120 0.4 70 0.2 80 0 90 (c) • f i i * ' —'" ' «* 1. I % i 4 5 1 *«-i i *: «••• i f • . *> ..* i " j •s,—•* yf ' ,1 s V' • (e) Figure 4.3: Accuracy of invariant @ I2-Y, prefiltering with CT=1.0, image WoBA. (a) OGC image; TO (b) ©OGC; (c) A C G C ; (d) © C G C ; (e) A G C ; (f) S ©SGC- 64 that regions that are homogeneous in the original image, i.e. regions of almost constant intensity, show large errors. This is hardly surprising, given that the invariants are based on differentials and therefore respond to image discontinuities. This implies that in homogeneous areas the computation of the proposed invariants is more sensitive to noise than in areas of significant intensity variation. The SGC image basically shows none of these noise induced errors since the synthetically computed image assumes the same noise characteristic as the original image. But even in the SGC image there are some points where the error is very high. We observe that these errors occur at steep edges. Recall that the derivatives are computed via Gaussian kernels that combine derivation with smoothing. This inherent smoothing induces an upper limit onto the magnitude of the derivative that can be computed, and it also limits the accuracy of the derivative computation in areas of high contrast and high frequencies. 3 The errors indicated in fig. 4.2 immediately lead to the question of the benefit of prefiltering. Smoothing, or prefiltering, is a standard technique in image processing to cope with noise. However, smoothing, e.g. convolution with a Gaussian G, distorts the functional relation between a OGC image and its C G C counterpart. That is, if 7 — pi , 1 7 then, generally, 7 © G ^ p ( 7 © G ) . 7 7 So there is a trade-off between noise reduction and distortion. It turns out that gentle prefiltering with a Gaussian of size CT=1.0 proves beneficial. This is shown in fig. 4.3(e) where the overall error is significantly reduced for ©CGC- But such is not the case for ®SGC M the right column, because there was no noise in the first place, and all that prefiltering does is add some distortions that actually lead to a larger overall error. In order to quantify the accuracy not just for each pixel but globally for whole images, we compute the following error measures. The maximum absolute difference ACGC = max|e CGC7 (i, j) - @ Gc{i,j)\ 0 (4.3) the mean absolute difference = ^^\9cGc(hJ)-&oGc(i,j)\ (-) 4 4 i,j Also, the computation of the SGC image is done on the quantized OGC image, so the quantization error from the OGC image propagates. 3 65 the mean squared difference CGC A = Jlj2(QcGc(i,j)-e0Gc(iJ)) (4.5) 2 V h3 and the signal-to-noise ratio [Jai89] S N R C G C = 10 log 1 0 - j £ 2 c ( where n is the number of valid pixels, i.e. all pixels minus the boundary pixels, O~Q variance of © C G C , and a\ Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC median mean A CGC 1.8532 1.9623 1.9321 1.9182 1.9003 1.4872 1.9484 1.9354 1.9765 1.9400 1.9337 1.8854 6 ) is the is the variance of ACGC- a -= 0 image 4 AcGG 0.1302 0.2334 0.2951 0.2126 0.2834 0.0907 0.1161 0.2025 0.2759 0.2705 0.2230 0.2110 &CGC a = 1.0 SNR 5.174 0.3430 0.3466 6.615 0.4558 4.112 0.3246 6.940 0.3998 5.371 0.1478 12.714 0.2061 • 9.647 5.174 0.3430 0.4150 4,683 0.4088 4.639 5.272 0.3448 0.3390 6.507 A CGC ACGC -r-2 CGC SNR 1.7067 1.9106 1.9338 1.8210 1.9440 1.0744 1.6171 1.9751 1.9095 1.8563 1.8829 1.7749 0.0850 0.1517 0.2110 0.1242 0.1550 0.0541 0.0637 0.1350 0.1941 0.1698 0.1434 0.1344 0.1285 0.2305 0.3520 0.1956 0.2264 0.0881 0.1113 0.2362 0.2996 0.2616 0.2284 0.2130 14.497 9.872 5.879 11.063 9.776 17.083 14.800 8.109 7.157 8.046 9.824 10.628 A Table 4.1: Error measures for 0 i2~p C G C images. The columns on the left show the error measures without prefiltering, the ones on the right with prefiltering, cr=1.0. m The data for the test images (see appendix C ) is given in table 4.1. We observe that there are almost always some outliers that are close to the maximum error bound, even after prefiltering. Both the mean difference and the mean squared difference are significantly reduced by prefiltering, but they are also significantly larger than zero. That is, each test image has some pixel locations where Q i2-y is not computed robustly. As mentioned above, these locations are TO basically, homogeneous regions where the concept of differentials is not really helpful, plus a few locations where the edges are very steep. Prefiltering increases the signal to noise ratio. The S N R varies quite a lot among the images. 66 4.1.2 Reliability We can take another perspective on the error measures computed in section 4.1.1 and ask whether the relative error at any given pixel is less than a specified threshold e. The points that fulfill this criterion will be called reliable points. Let's define the relative error for the C G C image, in percent, to be 0CGCV,3) = 100 —— (4.7) VoGC(i, J) Then the set R P of reliable points, relative to an error threshold 6, is e RP = {P |*(t,i)<€} £ (4.8) 0 and we can define the percentage of reliable points as IRP I PRP = 100 i (4.9) £ n where n is again the number of valid pixels. Figure 4.4: Reliable points R P for invariant 9 i , in black, image WoBA. (top row) no prefiltering; (bottom row) prefiltering with (7=1.0. (left) ecGC=5.0; (middle) e GC=10.0; (right) e GC=20.0 e m 2 7 C C 67 image Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC median mean (T e = 5.0 — 0 € = 10.0 24.9 29.0 28.7 33.6 23.9 28.3 27.2 10.7 12.0 11.1 26.1 22.9 13.3 15.6 16.5 18.5 13.0 15.4 14.5 5.6 6.1 5.6 13.9 12.4 e = 20.0 43.8 48.2 47.1 53.5 41.9 45.9 44.7 20.1 22.7 20.8 44.3 38.9 e = 5.0 16.0 19.0 21.4 24.0 16.7 22.6 22.0 7.4 8,3 7.9 17.9 16.5 a = 1.0 e = 10.0 29.5 35.7 37.7 41.4 32.6 38.8 38.5 14.7 15.7 15.1 34.2 30.0 e = 20.0 49.3 58.9 58.1 65.3 55.6 57.6 57.6 27.1 28.6 28.3 56.6 48.6 Table 4.2: Percentages of reliable points for 0 i 2 , C G C images, for e^GC =5.0, 10.0, 20.0. The columns on the left show PRP without prefiltering, the ones on the right with prefiltering, m 7 e <7 pre = 1.0. Obviously, the smaller e, the smaller the set of reliable points. For the image WoBA, the sets of reliable points RP5.0, R P 1 0 . 0 , and RP20.O) are shown in fig. 4.4, in the top row before prefiltering, and in the bottom row after prefiltering. The number of reliable points, in black, is a monotonic function of the reliability threshold e. The improvement of the reliability by the prefiltering is also evident. Table 4.2 shows RP5.0, R P 1 0 . 0 , and R P 2 0 . 0 for a set of test images. Roughly, 50% of all pixels, using prefiltering, have a relative error less than 20%. Again, the improvement of the reliability by the prefiltering is evident. It is not obvious why the R P values for the three Tool £ image are so much lower than the rest. 4.1.3 Entropy Entropy is a simple concept from information theory that measures the information or uncertainty of a random variable. Our motivation for looking at entropy stems from the observation that the proposed invariant representation is not information preserving and therefore entails some loss of discriminatory power. In the extreme case, a representation that is zero everywhere is certainly invariant but useless. The entropy of such an invariant would be zero, and we are interested here 68 whether the observed entropy for the proposed invariants goes towards zero. The entropy H is defined as [Jai89] L H = ~Y Pk^g Pk J (4.10) 2 k=\ where the pk are the probabilities of the L possible discrete events. The maximum entropy for L events is H m&x — log L. In order to measure how close the entropy is to this theoretical 2 maximum, we compute a normalized entropy, in percent, as H = 100 H/H (4.11) max Entropy is maximized if the distribution of the random variable is uniform, and it is minimized, i.e. equal to zero, if only one event can happen. Noise can often be seen as uniform random variable, so that a high entropy can be an indicator of useless noise rather than of useful information. On the other hand, low entropy means low information content. Therefore, entropy as a metric for discriminatory power is of only limited use. image Build WoBA WoBB WoBC . WoBD Cycl Sand ToolA ToolB ToolC intensity images OGC 83.8 89.4 67.1 87.1 72.0 93.3 91.2 73.7 62.3 79.4 CGC 85.8 95.8 80.0 91.6 84.0 86.1 81.8 87.9 80.9 89.5 invariant repr. OGC 96.5 96.4 97.1 96.2 96.0 96.7 96.8 96.6 97.1 97.2 CGC 96.9 96.7 97.2 96.7 96.0 96.8 96.9 96.5 96.7 97.2 Table 4.3: Normalized entropy H of intensity images, on the left, vs. their invariant representation, on the right, L=16 bins The computation of the entropy requires a random variable over a finite number of discrete events. In the case of 8 bit intensity images, the data falls naturally into L=256 bins. In the case of the invariant representation, we divide the range —1... 1 into L=256 bins of equal width and compute the corresponding histogram. The binning influences the entropy value, so we also try a 69 . * >L ,* -\ 11 I 1 Figure 4.5: Histograms for entropy computation, 0 i 2 : L=16 bins in the middle, L=256 bins on the right, image WoBA. (first row) intensity image OGC; (second row) intensity image CGC; (third row) invariant representation of OGC; (fourth row) invariant representation of C G C . m 70 7 image Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC median mean intensity images OGC 92.4 92.2 79.0 90.0 ' 84.7. • 95.1 94.7 83.8 75.6 87.0 88.5 87.5 CGC 93.5 95.6 85.1 92.8 92.2 91.2. 89.8 92.4 87.4 93.6 92.3 91.4 invariant repr. OGC 99.2 98.9 99.3 98.8 98.8 99.1 99.3 98.5 99.1 9,9.3 99.1 99.0 CGC 99.4 99.2 99.4 99.1 98.8 99.1 99.3 98.9 99.1 99.5 99.2 99.2 Table 4.4: Normalized entropy H of intensity images, on the left, vs. their invariant representation, on the right, L=256 bins binning with L=16 bins. Fig. 4.5 shows an example of the histograms that are the basis for the entropy computations for an intensity OGC and its C G C image, as well as for their corresponding invariant representations for Qmu-y- We observe that the invariant representation seems to be more uniformly distributed than the intensity images which show clear peaks. This observation is confirmed by the normalized entropy values computed for several images and presented in table 4.3 for L=1Q bins and in table 4.4 for £ = 2 5 6 bins. While for the intensity images we find 70% < H < 96%, for the invariant representation we have 96% < H < 100%, with no significant differences between the OGC and the C G C images. These numbers indicate that the computation of the invariant representation is subject to a fair amount of noise, but we can also cautiously conclude that the invariant representation does not converge towards zero or some other constant value. 4.2 Application Scenario I: Template Matching We have seen above that the invariant representation can not always be computed robustly, that sometimes large errors may occur. We also noted that the invariant representation is not information preserving. This immediately leads to the question whether the use of the invariant 71 representation can be advantageous in an application. In order to answer this question, we have implemented two application prototypes and evaluated them on the set of images given in appendix C, comparing the performance with the invariant representation to the one without the use of the invariant representation. OGC intensity • CGC intensity search template i i i > (correlation) 0y • 1 template search r 1 (correlation) OGC invariant CGC invariant Figure 4.6: The template location problem: A template is cut out from the OGC intensity image and correlated with the corresponding C G C intensity image. We test if the correlation maximum occurs at exactly the same location as in the OGC intensity image. The same process is repeated with the invariant representations of the OGC and C G C images. The first scenario implements template matching by correlation, evaluating what we call the template location problem. The general idea, depicted in fig. 4.6, is as follows. Given an intensity image without gamma correction (OGC), we cut out a small template of size tn x tm at location (x,y). This template represents our search object. Note that we are not interested in the semantic content of that template - the template represents the search object as is, in terms of a graylevel pattern. If we correlate the template with the OGC image, we find, by definition, a correlation maximum at the position where the template was cut out. But the question of interest here is what happens if we correlate the OGC template with the C G C image. Will the correlation maximum still occur at the same position? If so, gamma correction doesn't pose a problem in this 72 context and doesn't have to be addressed. If not, does the invariant representation improve the likelihood to locate the template in the same position in the C G C image as in the OGC image? The correlation function s(x,y), 0 < s < 1, employed here is based on a normalized mean squared difference c(x,y), as used for example by Fua [Fua93]: s/,x(z, y) = max(0,1 - c (x, y)) I}T , . Ei -.((/(* + i«,y + i«) - 7 ) - (T(it,jt) - T)) (4.12) 2 tJ ^ , (nx it n + iuy + jt) -i) 2 E, ,(r(*'t,iO - T ) 2 w where I is an image of size n x TO, T is a template of size tn x tm, I(x,y) is the mean of the subimage I h of I of size tn x £TO positioned at (x,y), T is the mean of the template, su 0 < it < tn — 1 and 0 < < im — 1. In matlab-like vector notation, we can write the subimage as I h = I{x '• x + tn — 1, y : y-\-tm— 1). A perfect match, i.e. identity of subimage and template, su yields c=0, s—1. Note that we have chosen the upper left corner of the template T and of the subimage 7 & to identify their locations (x, y), not their centers. The template dimensions tn, tm SU are important parameters. However, for notational convenience we refrain from adding those parameters explicitly to T, but it is to be'understood that T can vary in size, giving different results. For our empirical analysis, we used the template sizes 6 X 8 and 10 X 10. Correlation is typically employed to search for a best match, i.e. a correlation maximum. That is, rather than correlating T with just one subimage, we correlate T with every subimage Isub(x,y) of size tn X tm. We define the set of maximum correlation points V (I,T) max of the correlation of template T with image I to be max where max si T{i,j)} t (4-13) range over the image size. If the maximum is unique, then the set has just one element, i.e. | P maa; | = 1, but a unique maximum can not be guaranteed. What we want to know is whether the correlation maximum occurs exactly at the position (io,Jo) where the template has been cut out. We call (ioiio) (*o,io) e a correct maximum correlation position, or CMCP for short, if F (I,T). max In fig. 4.7, we demonstrate the template location problem on a synthetic pattern, a simple 2-d sine function. The meshes in the first row show the image function before gamma correction ,73 100 (e) (f) Figure 4.7: Template matching on synthetic images: (a) OGC pattern; (b) SGC pattern; (c) correlation matrix, intensity images; (d) correlation matrix, invariant representations; (e) matched templates, intensity pattern; (f) matched templates, invariant representation. Black box=query template, white box=matched template. 74 (OGC) on the left and after synthetic gamma correction (SGC) on the right. As the template, we picked the one positioned at (40,15) in the OGC image, size 6x8, indicated by the black box in the intensity image in the third row on the left. The white box indicates the position of the maximum correlation. In the intensity image, it turns out to be a wrong match, at position (33,10), while in the invariant representation, on the right, we find a correct maximum correlation position at (40,15). The second row of fig. 4.7 shows the correlation functions for the whole image, on the left for the intensity image, on the right for the invariant representation. The white areas are regions of high correlation. They show us the spatial distribution of good matches. We observe that those areas are smaller in the case of the invariant representation. Fig. 4.8 shows the same data for a real image. Again, we have selected an example where the maximum correlation position is wrong in the intensity images but correct in the invariant representation. In order to systematically evaluate the correlation accuracy over the whole image, we compute the CMCPs for the templates T xtm{iv, jv) at each valid positions (i ,j ) tn v image J , in short denoted as T € J. Then the set VCMCP(I, positions consists of those points (i ,j ) v PcMCp(I,J) where (i ,j ) v v v v J) of correct maximum correlation where the correlation maximum occurs at €F = {{iv,jv)\(iv,jv) over the OGC (I,T), Te J} . MAX (i ,j ): v v (4.14) range over all valid positions. The valid positions are determined by the template size tn x tm and by the number n^ of boundary rows or columns of the image that are not taken into consideration, e.g. because the invariant representation can not be computed robustly in that regions. Then the valid positions range from (n^ + 1, n\> + 1) to (n - nf— tn + 1, m - n}, - tm + 1), resulting in the number of valid positions N to be N = (n — 2 * n\> — tn+1) • (m - 2* nf, — tm +1). v v Here, we set n;,=3. Then we can define the correlation accuracy CA to be the percentage of all correct maximum correlation positions in the image. CA(7, J) = 100 \FCMCP(I,J)\/ N v (4.15) The correlation accuracy CA(7, J) is our metric to evaluate the template matching performance. We compute C A / ( 7 , J) for intensity images where i=0GC and J = C G C , with and without ni prefiltering, and C A / ( 7 , J) for the invariant representations of above OGC and C G C images, nw 75 0.5 |o.4 (a) (b) 250 10 200 20 ill| i ^ i M ^ l i o.a 0.6 0.4 30 150 1 0.2 40 0 50 -0.2 100 60 -0.4 70 50 -0.6 80 -0.8 90 0 (d) (c) Figure 4.8: Template matching on real images, image WoBA: (a) correlation matrix, intensity images; (b) correlation matrix, invariant representations; (c) matched templates, intensity pattern; (d) matched templates, invariant representation. Black box=query template, white box=matched template. again with and without prefiltering. The results are recorded in binary correlation accuracy matrices B C A M . The pseudo code for computing the BCAMs is given in fig. 4.9. The binary correlation accuracy matrices of the example image WoBA are given in fig. 4.10. The differences between the B C A M of the unfiltered intensity image and of the prefiltered invariant representation is further developed in fig. 4.11. Subfigure 4.11(b) shows the regions where correlation fails for both the intensity image and the invariant representation. These regions correspond mostly to the areas of low accuracy, i.e. of high absolute error, in fig. 4.2, which in turn correspond mainly to homogeneous areas in the original image, i.e. areas of no structure. The regions indicated in subfig. 4.11(c) where the intensity image actually produces an accurate 76 Algorithm Template Location . Input: images J , I image size n x m template size tn x tm boundary size B Output: binary correlation accuracy matrix B C A M B C A M := 0 for i := B + l to n-B-tn+1 do for j := B + l to m-B-tm+1 do / * cut out the template * / T := J (i:i+tn-l, j:j+tm-l) / * correlate and keep maximum * / MaxCorr := 0 for x := B + l to n-B-tn+1 do for y := B + l to m-B-tm+1 do if J , T ( x , y ) > MaxCorr S then MaxCorr := s/ T(x,y) I / * decide on correct maximum * / if S J , T (ij) = MaxCorr then BCAM(i j) := 1 Figure 4.9: Pseudocode for the template location problem. I is the same as J, but with gamma correction. I, J may be either intensity images or the invariant representations. I, J, B C A M all have the same size nxm. s/,T(x,y) is the correlation between the template T and the corresponding subimage of I at (x,y). correlation maximum but the invariant representation does not are small and rather spurious. By contrast, there are large regions in subfig. 4.11(d) where only the invariant representation delivers accurate correlation maxima. The correlation accuracy matrices for the other images from the test image database, for template sizes 6 x 8 and 10 X 10, can be found in appendix D. The correlation accuracies for those images, with or without prefiltering, on the intensity images as well as on the corresponding invariant representation, are given in table 4.5 for template size 6x8 and in table 4.6 for template size 10 X 10, and these data, in turn, are visualized in fig. 4.12. There are a couple of important observations to be made. 77 20 40 60 80 100 120 20 40 60 80 100 120 Figure 4.10: Binary correlation accuracy matrices B C A M , template size 6 x 8 , white=PcMCP, image WoBA. (first row) intensity images; (second row) invariant representations; (left column) a —0; (right column) <r =1.0. pre pre • The correlation accuracy C A is higher on the invariant representation than on the intensity images. •. The correlation accuracy is higher on the invariant representation with gentle prefiltering, a pre = 1.0, than without prefiltering. We also observed a decrease in correlation accuracy if we increase the prefiltering well beyond a pre = 1.0. By contrast, prefiltering seems to be always detrimental to the intensity images CA. • In absolute terms, the correlation accuracy shows a large variation, roughly in the range 30%. ..90% for the unfiltered intensity images and 50%. ..100% for prefiltered invariant 78 (b) 40 (d) Figure 4.11: Comparison of BCAMs, template size 6x8, locations of interest in white, image WoBA. (a) intensity image; (b) locations where both representations fail; (c) locations where intensity image is superior; (d) locations where invariant representation (with 0- =l.O) is superior. pre representations. • Similarly, the gain in correlation accuracy by using the invariant representation instead of the intensity images shows a large variation as well. Sometimes, it is close to zero, sometimes as high as 45%. For our test images, it turns out that the invariant representation is always superior, but that doesn't necessarily have to be the case. In fact, we observe some points that are correct maximum correlation points in the intensity images but not in the invariant representation. 79 • The medians and means of the CAs over all test images confirm the gain in correlation accuracy for the invariant representation. • The larger the template size, the higher the correlation accuracy, independent of the representation. This is obvious since a larger template size means more structure, more discriminatory power. This implies that the larger the template size, the smaller the potential gain that can be reaped from the use of the invariant representation. The gain in correlation accuracy by using the invariant representation, in terms of the means and medians, is visualized in fig. 4.13. image Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC median mean intensity images CJ=0 85.0 55.5 39.3 67.2 31.6 60.5 50.5 41.7 29.5 42.1 46.3 50.3 CT=1.0 78.0 45.0 31.0 58.3 29.2 45.4 40.9 35.3 23.4 27.8 38.1 41.4 invariant repr. CT = 0 85.8 75.7 52.7 68.9 48.0 98.6 85.2 60.2 45.7 42.5 64.6 66.3 (7 = 1.0 89.5 80.4 57.6 78.7 67.4 99.4 94.4 68.0 54.1 48.4 73.4 73.8 Table 4.5: Correlation accuracy CA, template size 6x8, without prefiltering and with prefiltering, <V-e=l-0. Clearly, many metrics other than the correlation accuracy are conceivable. For example, instead of just making a binary decision regarding only the position of only the maximum correlation, one could rank the top N matches, say, by correlation magnitude, and derive a metric based on those rankings or those correlation magnitudes. Alternatively, one could base a metric on the geometric distance between the best match and the true match. The choice of the metric has to be motivated by the application in mind. Here, we only argue that there exists a metric, namely the correlation accuracy, in whose terms a performance gain can be observed. Furthermore, it can be argued that in applications like stereo where the depth is determined from the disparity 80 1 i 100 • 100 '"•<( 90 • o. 80 . 0 • 90 • 4::- Q '* \ i 80 O 50 60 '» . 0 o. '* 40 30 0. # -«. 50 " 0 ' Q ci 0 • 30 • 20 0 r " i 1 2 3 4 •5 6 7 8 *... o 9 o - ei • • 20 - • 10 - • 0 10 . 0 °. 6 40 o' 10 • —= o. 70 • o. o. •0. • 0 / . a 0 . • *-<>'•'. s •' '• o: 70 60 p.. •' * 1 2 3 4 5 6 7 8 9 10 Figure 4.12: Correlation accuracies, visualization of the data in tables 4.5 and 4.6. (left) template size 6x8; (right) template size 10 X 10. The entries at x=l refer to Build, the entries at x=2 refer to WoBA, etc. (circles) intensity images; (stars) invariant representation with a =0; (diamonds) invariant representation with <7 =1.0. The markers on the left hand side of the plots indicate the means, the markers on the right hand side the medians. pre pre Figure 4.13: Correlation accuracy statistics over all test images, (left) means; (right) medians. The first triplets refer to template size 6 x 8 , the second triplets refer to template size 10 X 10. The entries at x=l, x=6 refer to intensity images, the entries at x=2, x=7 refer to the invariant representation with <r =0, the entries at x=3, x=8 refer to the invariant representation with 'pre- =1.0. pre 81 image Build WoBA WoBB WoB'C WoBD Cycl Sand ToolA ToolB •ToolC median mean intensity images o=0 (7 = 1.0 89.92 83.30 75.73 62.69 53.36 45.00 78.21 70.43 50.07 43.29 72.62 57.05 63.30 48.14 52.17 43.32 37.12 29.13 55.52 39.07 59.41 46.57 62.70 52.14 invariant repr. CT = 1.0 93.31 95.17 88.58 90.67 59.34 62.81 81.30 86.68 65.35 78.25 99.93 99.96 93.25 98.60 76.86 79.60 63.43 66.10 49.49' 62.27 83.14 • 79.08 78.08 82.01 o=0 Table 4.6: Correlation accuracy CA, template size 10 X 10, without prefiltering and with prefilter- ing, o =1.0. pre which in turn is usually determined by correlation between left-right image pairs [Fua93], finding the template in the correct position is of highest importance. 4.3 A p p l i c a t i o n S c e n a r i o II: H i s t o g r a m B a s e d Retrieval While the previous template matching application scenario involved local search of image substructures, our second application scenario focuses on retrieving whole images from a set of images. The idea is to measure recognition rates if the query images have not undergone gamma correction but the images in the test database have, and compare the recognition rates for intensity images to the one achieved using the invariant representation. The image retrieval system prototype that we have implemented follows the system suggested by Schiele and Crowley [SC96]. In their approach, the images are represented by feature histograms, and the distance between histograms is measured with a simple statistic. As is to be expected, the system design space is huge, with some prominent parameters being • the kind of features used • the histogram resolution and dimensionality 82 ...{- X 1 Histo Histo OGC intensity •• • 0, CGC intensity n 0y ••• ••• X Figure 4.14: Histogram based retrieval: For each image, a feature histogram is computed. The x distances between the histogram of the OGC intensity image, which serves as the query image, and the C G C intensity image histograms are computed and ranked. The same process is repeated with the histograms of the OGC and C G C invariant representations, and the rankings are compared. 83 2 • the distance metric to measure the similarity of histograms For specificity and simplicity, we chose a set of system design parameters that showed good results in the experiments presented by Schiele, as follows. • We used the gradient magnitude and the Laplacian magnitude, referred to as "Mag-Lap", with = 1-0. Note that these features are translationally and rotationally invariant, UMag-Lap and they have been mentioned already in section 3.2.1. • For Mag-Lap, the histograms are 2-dimensional. We experimented with resolutions of 8 and 32 bins per dimension. • Among the distance metrics proposed, we chose the x metric. 2 A schematic overview of our prototype is shown in fig. 4.14. Let Hi,H be 2-d histograms with an equal number of data points. Then the x 2 2 distance between them is defined to be [PTVF92] (4.16) If the two histograms have a different number of data points, then this equation generalizes to (4.17) where c\ = y/Z jH (i,j)/'Zi H (i,j) i 2 tj 1 and c = 2 MhJ)/H (iJ). The 2 2 X distance is a well known statistic often employed to test whether two data sets are consistent with a single distribution function. It is zero for identical distributions. The x distance is a simple, 2 easy to compute distance metric. In the context of image retrieval, it has been used to evaluate the similarity between histograms (see e.g. [SC96, LM99]). But it compares each histogram bin Hi(i,j) only to its corresponding bin #2(2, j), not to its neighbors, say 1) etc. H (i + 1, j) 2 or H (i,j 2 — More sophisticated metrics can be conceived, but for the purposes of evaluating the performance gain obtainable from an invariant representation, the x distance serves us well. 2 The x statistic is computed on histograms, i.e binned data. However, the feature space is in 2 a continuous domain, so a mapping from the feature space into histogram bins has to be designed. 84 For our data, where the 8-bit intensity images are in the range 0....255, and —1 < @ m i 2 7 < 1, the resulting feature space is bounded for both the gradient magnitude and the Laplacian. We could therefore divide the continuous feature space into k intervals of equal size, with, interval width t /k for the gradient magnitude and 2ti /k mag ap bound of the gradient magnitude and ±ti ap for the Laplacian, where t mag is the upper is the upper/lower bound of the Laplacian. However, such a mapping would concentrate most data points into a few bins and leave most bins more or less empty, because in most images the feature values are more likely to be closer to zero than to the theoretical maximum. This is evident in fig. 4.15, for both the gradient magnitude and the Laplacian. The decrease in the histogram counts towards the extrema seems to be approximately exponential, and it is more pronounced in the case of the intensity images than for the invariant representation. In order to obtain a more uniform distribution of.the histogram counts, we subdivide the feature space into k — 1 intervals from 0 to i mag into k — 2 intervals from — ti ap to ti ap for the gradient magnitude and for the Laplacian, where i mag <t , and £ / < £ / , and mag ap ap put the feature values that don't fall into those ranges into the remaining overflow bins. The empirical data presented here was obtained with i mag images, and t mag = 0.5, t\ ap = 60, £ / = 60 for the intensity ap = 0.5 for the invariant representations. The 2-d histograms computed with those settings for image WoBA are given in fig. 4.16 for the intensity images and in fig. 4.17 for the invariant representations. This figure also compares the histograms derived from the OGC images to the ones derived from the C G C images. Clearly, those are not identical, but the histograms indicate that the difference between OGC and C G C derived histograms is smaller in the case of the invariant representation than in the case of the intensity images. We also observe that the distribution of the histogram counters is fairly uniform for the invariant representation but still highly non-uniform for the intensity images. It is not obvious whether a non-linear binning of the intensity image based features would lead to an improved retrieval performance or not. The x distances for the images in our test database, for 8x8 histogram resolution, are given 2 in tables 4.7 to 4.9, for the intensity images without prefiltering, and the invariant representations with and without prefiltering, respectively. Each row shows the distances from the given OGC histograms to the C G C histograms. Note that the tables are not symmetric since, generally, the 85 - 4 S O O t- 3 S O O 3 0 0 0 2 S O O 3 0 0 0 h 2 0 0 0 2 0 0 0 1 O O O 1 O O O --i s o ^ - 1 2 3 2 S O O -I soo I O O O 2 0 0 0 [- 2 S O O Z O O O -I h -I S O O -I O O O \- O O O Figure 4.15: Mag-Lap 1-d feature histograms: (left) gradient magnitude; (right) Laplacian. (first row) features from OGC intensity image, 8 x 8 bins; (second row) features from OGC intensity image, 32 x 32 bins; (third row) features from OGC invariant representation, 8 x 8 bins; (fourth row) features from OGC invariant representation, 32 X 32 bins. 86 3000 2000 1000 1000 Figure 4.16: Mag-Lap 2-d feature histograms for intensity image: (left) OGC; (right) C G C . (first row) 8 x 8 bins; (second row) 32 X 32 bins. distance from image I\ to image I when I\ is a OGC image and I is a C G C image is different 2 2 from when I\ is a C G C image and I is a OGC image. If the \ 2 2 value is smallest on the diagonal, then the query image (we use the term search object synonymously) has been correctly recognized. The percentage of correctly recognized objects is the recognition rate R: R= 100 N /N c (4.18) where N is the total number of correctly recognized objects and N is the number of objects in c the database. We observe three correctly recognized images for the intensity images (table 4.7), i.e. a recognition rate of 30%. Where the query image is not recognized correctly, it is ranked 87 300 300 Figure 4.17: Mag-Lap 2-d feature histograms for invariant representations: (right) C G C . (first row) 8 x 8 bins; (second row) 32 x 32 bins. (left) OGC; second best in five out of ten cases, and ranked third in the remaining two cases. The invariant representation, by contrast, yields recognition rates of 100%. The analogous distance tables for a histogram resolution of 32 x 32 are given in appendix E . They show a recognition rate of 50% for the intensity images, 90% for the unfiltered invariant representation, and 100% for the filtered one. The rankings have changed somewhat, but not dramatically. The x metric allows us to rank the query results. The rankings for the query images Build, 2 WoBA, and ToolA are visualized in figs. 4.18 to 4.20, respectively, for both the intensity images and the invariant representation without and with prefiltering. That is, these rankings are taken 88 Figure 4.18: Retrieval rankings for query image Build, 8 x 8 bins. (top) for intensity images; (middle) for invariant representations, a representations, a = 1.0. = 0; (bottom) for invariant Figure 4.19: Retrieval rankings for query image WoBA, 8 x 8 bins. (top) for intensity images; (middle) for invariant representations, a representations, a — 1.0. = 0; (bottom) for invariant pre pre pre pre Figure 4.20: Retrieval rankings for query image ToolA, 8 x 8 bins. (top) for intensity images; (middle) for invariant representations, a representations, a — 1.0. pre pre 89 = 0; (bottom) for invariant Build Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC 204.7 2018.0 4201.7 1197.9 4045.1 983.1 1453.0 3305.8 6202.4 3540.1 WoBA WoBB 951.9 490.5 2514.2 843.8 3050.5 1309.7 1154.3 1398.8 3264.8 1668.4 WoBC WoBD Cycl 1915.3 1100.7 1835.8 1692.4 1695.2 982.4 1284.4 2488.2 603.6 1993.4 1000.8 1564.0 740.2 469.7 3 8 1 . 7 1299.2 4 9 3 . 5 2246.2 1033.4 803.2 934.9 1097.0 1158.9 3 9 9 . 3 1339.9 7 3 7 . 9 1115.1 1240.1 1034.2 1421.6 1000.1 2313.6 1643.6 2919.5 1692.3 3351.2 1079.5 1717.9 1232.2 2282.5 Sand ToolA ToolB ToolC 1889.2 2458.4 1825.9 949.0 1182.5 490.4 1037.5 2348.9 3470.5 2503.6 1978.0 3546.3 4 6 6 . 6 1214.3 1240.0 5 7 8 . 6 1040.3 1584.1 2003.1 1443.4 1925.1 2691.5 1820.1 2609.2 303.2 2 4 8 . 9 1325.7 5 3 9 . 8 380.5 2 6 1 . 8 2322.3 592.0 1167.5 1028.1 2012.2 2238.1 2133.4 283.1 1029.2 310.4 Table 4.7: Distance matrix, intensity images, 8 x 8 bins, no prefiltering. Minimum distance in bold face. Recognition rate i?=30%. Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC 2 1 4 . 9 438.2 702.0 422.7 628.0 667.4 836.5 6 9 . 4 359.5 295.7 131.1 184.0 762.1 282.7 5 4 . 3 184.8 304.7 298.2 1040.6 249.9 208.9 1 5 0 . 3 151.0 406.3 1135.3 821.6 1147.5 1174.3 1018.6 1152.0 202.0 158.4 205.8 427.8 331.2 428.1 384.3 314.5 547.5 556.0 416.8 413.8 293.3 380.3 607.9 414.4 358.7 365.0 940.9 241.2 430.3 561.2 53.1 340.8 436.7 291.5 4 3 . 2 103.6 406.2 139.6 6 8 . 7 242.2 619.4 813.6 216.2 481.6 666.6 221.1 643.6 875.3 559.2 394.5 501.8 381.4 251.1 591.9 792.7 74.6 118.3 141.4 504.4 256.1 333.4 288.1 165.3 398.6 589.0 141.1 82.1 681.6 326.8 291.0 188.5 203.5 547.0 772.5 269.2 230.8 168.2 1 2 8 . 2 Table 4.8: Distance matrix, invariant representation, 8x8 bins, no prefiltering. Minimum distance in bold face. Recognition rate i?=100%. from rows one, two, and eight of tables 4.7 to 4.9. The search object is the one contained in the leftmost icon of the third row in each figure. We observe that the prefiltering influences the ranking for the invariant representation. The pseudo code for the computation of the recognition rate R is given in fig. 4.21. If the task is to find the ranking for a specific query image Jj. rather than the recognition rate for the whole database, then the computational steps are similar but simpler. The feature histograms of the C G C images are supposed to have been precomputed and stored. Only the feature histogram of the query image has to be computed at query time, and then the distances between that histogram and the stored histograms are computed and sorted. 90 Algorithm Recognition Rate Input: N image pairs (Jk, h) operator size o~Mag-Lap histogram resolution TMag boundary size B O u t p u t : distance m a t r i x dist recognition rate R Lap r x / * compute feature histograms * / for k := 1 to N do / * for a l l image pairs * / n := h e i g h t ( 4 ) m := width(// ) for i := B + l to n - B do / * for a l l non-boundary rows * / for j := B + l to m - B do / * for a l l non-boundary columns * / Ma {ij) := y/(I © G ) + (I ® G ) c 2 gi k 2 x k y Magji^S) := y/(J ® G ) + (J ® G ) 2 k x 2 k y Lap {\ ]):-Ik®G -\-Ik®G y . Lap j ([,}):= Jk ® G + Jk ® Gyy put Magi-Lapi pair into 2-d histogram Hi put Magj-Lapj pair into 2-d histogram Hj I ) xx y xx k k / * compute distance m a t r i x * / for k l := 1 to N do'/* for a l l histograms * / for k2 := 1 to N do-/* for a l l histograms * / dist(kl,k2) :=X (H ,H ) 2 Ikl Jk2 / * check i f (Jk, Ik) pairs have m i n i m u m distance * / N := 0 for k := 1 to N do / * for a l l rows of distance m a t r i x * / if dist(k,k) = m i n ( d i s t ( k , l : N ) ) c then N := N + 1 c c / * compute recognition rate * / R := 100 N j N c Figure 4.21: Pseudocode for recognition rate computation. Ik is like Jk but with gamma correction. The (Jk,Ik) pairs may be either all intensity images or all invariant representations. 91 x Build WoBA WoBB WoBC WoBD Cycl 2 Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC 136.1 613.5 517.8 484.3 1703.0 753.3 1834.4 833.8 699.2 1240.5 353.1 57.9 204.0 264.3 565.1 164.0 587.9 •194.9 171.4 395.4 523.7 236.0 51.6 234.5 670.5 311.6 939.8 299.0 215.9 342.1 Sand ToolA ToolB ToolC 352.9 1340.8 647.1 1411.8 715.5 249.4 345.8 168.3 482.3 144.0 255.3 463.1 240.7 608.5 240.7 85.2 656.9 362.7 787.3 286.1 740.2 59.7 315.8 404.0 368.5 325.3 273.1 25.9 193.6 182.2 920.1 402.7 268.5 60.9 559.1 264.4 258.4 220.2 448.8 52.8 207.3 346.0 281.0 642.4 136.2 437.3 176.5 251.8 507.9 212.2 Table 4.9: Distance matrix, invariant representation, 8 x 8 bins, a in bold face. Recognition rate 72=100%. pre 618.7 1215.1 141.4 395.9 177.7 , 359.4 254.7 '' 517.8 465.3 259.6 234.5 309.0 731.6 691.1 112.2 309.2 82.4 334.1 274.0 57.5 = 1.0. Minimum distance We have also computed the recognition rate R as a function of histogram resolution from 2 x 2, 4 x 4, . . . 32 x 32. The results are shown in fig. 4.22. As to be expected, the recognition rates go down as the number of bins gets close to one. A resolution of 8 X 8 seems sufficient for the given testbed. Actually, too high a resolution is not only computationally expensive but can actually lead to lower recognition rates because of the increased sensitivity. We further observe in fig. 4.22 that the recognition rates are consistently higher for the invariant representations for all resolutions than the ones for the intensity images, and prefiltering yields a small advantage over the unfiltered representation. R r=2 r=4 r=8 r=16 r=32 x Bo B i 30 50 30 50 50 Li 2 h /i 20 50 80 50 90 100 40 100 100 50 100 100 50 90 100 Bo # i 40 50 40 50 40 L 2 h h 20 40 80 40 80 100 40 90 100 60 100 100 70 70 80 Bo S i 40 50 40 60 40 h h 20 40 80 50 90 100 40 100 100 60 100 100 30 80 70 L oo Bo B i h 30 50 40 60 40 20 40 30 30 40 h 40 80 90 100 80 100 70 90 40 70 Table 4.10: Recognition rates R as a function of histogram resolution, varying from 2 X 2 to 32 X 32, for different metrics. 7io=intensity images, CT =0; 73i=intensity images, <7 =1.0; 7o=invariant representation, <7 =0; 7i=invariant repres., <7 =1.0. pre pre pre pre We have implemented some simple alternative metrics for the histogram distance computation, namely the L\, L , and the 2 norms, where the L\ norm is the normalized sum of absolute 92 100 90 80 70 60 50 40 30 20 10 0 15 20 25 2x2 30' 8x8 Figure 4.22: Recognition rates R as a function of histogram resolution, varying from 2 x 2 to 32 x 32, for x metric. Both diagrams represent the same information, (circles/left column) intensity images; (stars/center column) invariant representation, <7 =0; (diamonds/right column) invariant representation, <r =1.0. 2 pre pre differences between the histogram bins, the L norm is the normalized sum of squared differences, 2 and the norm is the normalized maximum absolute difference. The resulting recognition rates are given in table 4.10. The performance difference between the x 2 norms is so small as to be insignificant, while the results for the metric and the L\ and L 2 norm are weaker. One way to look at the data in tables 4.7 to 4.9 is to analyze the relative differences between the lowest x 2 distance and the second lowest x 2 distance in each row of the distance matrices. A high relative difference between those two suggests that the query object can be clearly distinguished from the other objects. In order to quantify the relative differences, we define the so-called runner-up distinction RD \ for each row kl of the distance matrix to be k RD kl = sv^{x {H H ))/vam{ {H H )) 2 2 kll k2 X kly k2 (4.19) where smin denotes the operator that determines the second smallest element of its argument vector. Then the mean runner-up distinction RD oyer the entire image database is defined as i " no = - y RD kl (4.20) fci=i where n is the number of images in the database, equal to the number of rows in the distance matrix. 93 Table 4.11 gives RD for the intensity images as well as for the invariant representation with and without prefiltering, for different histogram resolutions from 2x2 up to 32x32. The same data is visualized in fig. 4.23. The important observation here is that for low histogram resolutions representation intensity invariant, a = 0.0 invariant, cr = 1.0 r=2 4.27 2.21 8.58 r=4 2.30 2.06 6.20 r=8 1.65 1.56 3.32 r=16 1.47 1.34 1.71 r=32 1.24 1.19 1.21 Table 4.11: Mean runner-up distinctions RD for different representations and various histogram resolutions r. (r<16), and in accordance with our recognition rate results, the prefiltered invariant representation yields a significantly higher runner-up distinction than the intensity images. However, such is not the case for the highest histogram resolution - presumably, the increasing sensitivity to noise starts to show. 9 8 7 6 5 ' 4 3 2 1 0 0 5 10 15 20 25 30 Figure 4.23: Mean runner-up distinctions RD for different representations and histogram resolutions varying from 2 X 2 to 32 X 32. (circles) intensity images; (stars) invariant representation, 0pre=O; (diamonds) invariant representation, <r =1.0. pre In the image database literature, retrieval performance is often described in terms of recall and precision, where recall is the percentage of retrieved positives relative to the total number of positives, and precision is the percentage of retrieved positives relative to the total number of 94 retrieved items [FFB96]. The trade-off between recall and precision is often plotted in the form of a receiver operating curve (ROC). Recall and precision certainly are useful concepts, but in our case, the database is too small for them to make sense - it contains only one instance of each object, reducing recall to a binary variable, and we easily retrieve and rank the whole database rather than just a part of it, rendering the notion of precision inadequate. If we analyze the computational cost of the histogram based retrieval system, two interesting questions are: • What are the additional costs incurred by using the invariant representation? • Which computations can be done off-line, i.e. at database insertion time, and which computations have to be done on-line, i.e. at query time? The only additional costs for the invariant representation are the ones given by eq. (3.74) in section 3.4.2, of order ,0((m + n)k), for the invariant computation at each pixel. The other costs, independent of the chosen representation, are the costs for feature extraction and distance computation. Here, the features employed - the gradient magnitude and the Laplacian - are computed the same way as for the invariant representation, at the same cost of order 0((m + n) k). For N database entries, the distance evaluation requires TV computations of the x 2 in total is of order 0(N rMagi"Lap) for 2-d histograms of resolution TMag to determine the minimum of the x 2 X ^Lap, metric, which plus 0(N) work distances. The feature histograms are stored in the database. That is, the computation of the invariant representation, the feature extraction, and the histogramming of all database entries is done offline. At query time, the computation of the invariant representation, the feature extraction, and the histogramming is done for the query image only. The distance evaluation has to be done at query time as well. Of course, for larger databases, more efficient indexing schemes than the simplistic full pass over all database entries that we employed here are called for. We can summarize this section by pointing out that the proposed invariant representation gives superior performance in terms of the recognition rate, reinforcing the empirical findings concerning the improved correlation accuracy from the first application scenario. Clearly, the 95 data presented should be evaluated cautiously. We are aware that the given test image database is small, and it is not hard to predict that the recognition rate for the invariant representation will be less than 100% as the size of the image database increases. The recognition rate depends not only on the number of images in the database, but also on their similarity. The baseline intensity image retrieval results can most likely be improved, making the gap between the recognition rates of intensity images and invariant representations somewhat smaller. Nevertheless, it is interesting to observe that even in a small image database, the occurrence of gamma correction causes retrieval problems if it is not adequately addressed. For the given scenario, a performance gain through the use of invariants is visible, and it shows that the invariant representation has sufficient discriminatory power to tell the given objects apart. The empirical evaluation of the discriminatory power of the proposed invariant representation cannot be done in isolation. The ability of an image processing system to distinguish between images is subject to a multitude of factors. The overall system performance depends not only on the representation of the data, but on factors like the features selected, the matching technique, and especially on the image data itself. In the best case, image retrieval performance on data with gamma correction based on the invariant representation approaches image retrieval performance on the same data where gamma correction does not occur. 96 Chapter 5 Invariance under Scaling: Empirical Analysis Invariance under scaling turns out to be much harder to achieve in practice than invariance under gamma correction. The main reason for this has been mentioned in the context of the scaling equation (3.65) in section 3.3.2: The derivatives have to be computed in a finite window, rather than in an e-environment, and the result depends on the size of that window, turning the window size - or, more precisely, the parameter a of the Gaussian - effectively into another free parameter that influences the computed value of the invariant. That is, although the scaling factor a cancels out in the invariants 0 i23 m and 0 i23 , m 7 there is still an a priori unknown parameter that has to be dealt with. An intriguing question arises here: can the parameter a be eliminated using the same framework, i.e. by finding a suitable ratio such that a cancels out? Clearly, the parameter a is of a different nature than a or 7. The latter describe pointwise changes of the image function, while the former describes a convolution process that involves a finite window around each point. Unconstrained convolution can change the image function almost arbitrarily, so invariance under convolution can reasonably only be achieved if the convolution kernel is restricted, e.g. to being symmetric. Here, the convolution kernel is actually constrained to being a Gaussian. In 1-d, we have the function f(x) and the Gaussian filter g(x), so the derivative of the convolved function is d J f(x—t) g(t) dt j dx. However, it is not obvious to this author how invariance under convolution can be achieved using local differentials. Flusser and Suk [F.S98] actually present some global invariants under convolution, which they refer to as blur invariants. One invariant is the tangent of the Fourier transform phase, another invariant is based on image moments. The goal of our undertaking remains the same as indicated in fig. 1.2. That is, we aim to find a unique representation of an object independent of scale, i.e. its imaged size, and ideally this would be a continuous 2-d function defined and robustly computable everywhere. If this turns out to be not achievable, it would still be useful, for image retrieving purposes, to be able to identify some subset of keypoints where a unique representation is computable. And if there is no unique representation for all scales, we can ask whether we can identify some scales that are unique and therefore characteristic for a given object. This is the place to critically review the assumptions we made about the scaling and the image formation process because they will provide some insight into the expected accuracy of the invariant computations. These assumptions concern mainly the sampling theorem and the point spread function that characterizes a linear system. 5.1 Assumptions and Error Sources A central assumption is that the signal is bandlimited. Only then, according to the sampling theorem (see e.g. [Jai89]), can the continuous image function be reconstructed from the sampled data, with the sine function being the reconstruction filter. Otherwise, aliasing occurs, and there is no way the true image function can be reconstructed from an aliased signal. In practice, there are some limitations of the image formation process that have a significant influence on the results presented in this chapter. 1. The sampling theorem assumes point samples. However, by physical necessity, sensors integrate over a finite area. This process is sometimes referred to as area based sampling or as scanning through the aperture. One consequence is the phase effect [Hol98a], which refers to the phenomenon that the response of a sensor element depends on the phase between the 98 signal and the sensor array because it determines which part of the signal is measured. This effect can be quite pronounced at steep edges. The area based sampling can be modelled as the point sampling of the convolution of the original image function with a suitable lowpass filter, but we note that the original image function is not directly accessible. 2. Real world images are not bandlimited [Jai89]. As the distance between sensor and object 1 decreases, more details of the object become visible. Those details, corresponding to high frequencies, are averaged out or even aliased as the distance between sensor and object increases. 3. The number of sensor elements in a camera is fixed. So even if an image function was 2 bandlimited and sampled above the Nyquist frequency at a certain distance between sensor and object, it may not be so anymore as the distance increases, because the object gets mapped onto fewer pixels, i.e. the so-called nominal resolution decreases. 4. The point spread function PSF of the camera is only approximately Gaussian. We will have a closer look at the PSF of our camera below. 5. The model of scale as variable substitution in the image function, f(x) i - r g(ax), considers neither focus nor perspective projection of real cameras. We will assume here that the objects of interest are fully focused, which can be achieved if they don't vary significantly in depth. We also don't distinguish between optical zoom, brought about by changing the focal length of the lenses, and zooming brought about by changing the distance between the camera and the object. Instead, what we are interested in is the decreasing number of pixels an object gets mapped into during a zoom-out. There is also a limitation to simulating scaling because there are no perfect finite lowpass filters, i.e. filters with a boxcar frequency response. Ideally, there would be realizable filters that let pass "Real world" here refers to the macroscopic world of our daily experience. Of course, it could be argued that at the level of particle physics, there may be a smallest physical size, a smallest scale, an absolute cut-off, but in the context of images taken by standard cameras, this is of no concern. It is necessary in this context to distinguish between the image function that describes the real world and the image as imaged by a camera. The latter is always bandlimited as a result of thefinitesize of the lenses and thefinitenumber of sensor elements. 1 2 99 all frequencies below a defined cut-off frequency and eliminate all higher frequencies, without any transition band in between. But it is well known that such filters cannot exist. In the same vein, the optimal interpolation function, i.e. the sine function sinc(a;) = sin(vra;) / irx has infinite support and can therefore only be approximated by finite filters. That approximation is generally rather poor, and therefore interpolation is usually rather done with splines that have compact support. Fig. 5.1 shows various common interpolation functions on the left. On the right of fig. 5.1, a step edge is sampled at the circled points and then reconstructed using a truncated sine function for interpolation. The reconstruction clearly shows ripples and "overshoots". Note that step edges are not bandlimited, while the finite number of sample points for reconstruction imposes a bandlimit. o.s /•/ // Il o.« Ii n \ y 1 \ \ \ \\ \\ \ \\ ~\ v / \ / ~ N • -0.2 Figure 5.1: Interpolation: (left) Interpolation kernels, (solid) sine (dashed) cubic B-spline (dashdotted) cubic (dotted) linear, (right) Approximation of a step edge by a truncated sine function. The circles mark the sample points. Often, imaging systems are assumed to be linear and space invariant [Jai89j. Under those circumstances, an imaging system is completely described by its point spread function. With respect to linearity, Lenz and Fritsch observe that "Due to their operating principle, direct conversion of photons to electrons, the linearity of CCD cameras seems to be excellent (if their Gamma correction can be turned off)". The space invariance assumption is more problematic. Optical distortions, sensor manufacturing irregularities, frame grabber idiosyncrasies, and the discrete nature of the sensor elements imply that the spatial invariance assumption is only approximately true [LF90, FK90]. 100 What does the PSF of the camera we use look like? Is it close to a Gaussian? How does the shape of the PSF influence the image? In order to answer these questions, we imaged two synthetic patterns: a razor edge (sometimes also called a knife edge, or a step function) and a chessboard. The PSF is defined as the impulse response of a linear system where both input and output are non-negative. In the strict mathematical sense, an impulse is infinitesimal small. Rather than approximating that impulse by a blob of finite size, we instead look at how a razor edge, i.e. a sharp black-white transition, is imaged. We are interested in both the horizontal and the vertical edge, since they may be different [Hol98b]. The result is shown in fig. 5.2, first row, for the Sony camera. For comparison, we collected the same data for the digital Olympus D-600L camera, which delivers images in JPEG format, shown in the second row of fig. 5.2. In appendix F, we replaced the razor edge pattern with a cross pattern. We make the following observations: • The razor edge is blurred. There is a transition area between dark and light. • There is an "overshooting" or "edge sharpening" effect on both sides of the edge, especially for the Olympus camera. That is, at the boundaries of the transition area, the highest intensity is a little bit higher than the average high value and the low intensity is a little bit lower than the average low value. 3 • The imaged values are not perfectly symmetric about the edge. Besides from the camera noise, this is a result of the phase effect, i.e. of the positioning of the sensor elements relative to the edge. • The vertical PSF is similar to the horizontal one. A demonstration of the potential effects of area based sampling and the phase effect is given in fig. 5.3. The object to be imaged is a chessboard pattern, in the upper left (a). In the lower left (d) is the image produced by the camera at a certain distance. The scene-camera geometry was chosen such that one pixel maps seven eighths of a chessboard block, as indicated by the This edge sharpening may be a design feature to achieve a perception of increased contrast, realized by electronic postprocessing. Or it could be an artifact of the JPEG compression algorithm. 3 101 Figure 5.2: Point spread function of the Sony camera on a razor edge, (first row) Sony camera; (second row) Olympus camera, (left) camera image (middle) horizontal cross sections, rows 1, 3, 5 (right) vertical cross sections, columns 1, 3, 5. overlayed grid on the left. A striking phenomenon of the image taken are the gray stripes that are obviously artifacts of the imaging process since they don't occur in the original object. But they can be explained by assuming area based sampling. The image (b) is a synthetic image computed by averaging over the area of the object covered by each pixel. Or, equivalently, the image function is convolved with a boxcar filter of size seven eighths. The overlayed grid indicates the effect of the boxcar filtering and the closely related effects of phasing, i.e. of the relative position between object and sensor elements. The upper left pixel covers only a black square, so it is completely black. The next pixel covers mainly a white square, plus a little of the previous black square, so it is white, but with less than maximum intensity. As we move on, the phase shift between pixels and chessboard squares becomes larger, and when the pixel is centered right on the edge between two blocks, the resulting pixel is gray. This happens periodically. If we compare the resulting image to the camera image, we see the same general 102 pattern, but the camera image has less contrast, it looks more gray. This is indicative of the shape of the PSF not being rectangular but closer to a Gaussian. However, it is not Gaussian either images (e) and (f) show the result of purely Gaussian smoothing of the chessboard pattern. These images do not show the characteristic grey stripes that we observed in the camera image. But if we smooth the boxcar filtered chessboard pattern with a Gaussian of a small a, we obtain the image (c), which is perceptual very close to the camera image. Note that other grid ratios lead to different artifacts. ~ (-:' - + —1 _ _ 1 i T 1 -_1 ! — i _ 1 1 1 U - *.! 1 1 1 1 "" T I - r _ iii • " — j _ _ 1 '"i'f"" - r - " "t - r ) 1 ; i 1 ~ l ... ...... p - » *H ^ * ^ • : 1' !! - J-. i —x '"i - i ~ - . . if . . . . ™ i T "f" d -* - J _ i i i T" 1 * - 4. 1 1 - i — — —. I 1 T 1 1 : ::: 1: 1 - J. -- -- 1 _ """! ; (a) Figure 5.3: Area based sampling of a chessboard pattern, (a) synthetic chessboard pattern with overlayed grid; (b) simulated aperture scanning with boxcar filter; (c) simulated aperture scanning with boxcar filter followed by Gaussian smoothing; (d) image of chessboard pattern taken by a camera; (e) Gaussian smoothing of chessboard, a-2.0; (f) Gaussian smoothing of chessboard, (7=3.0. It is common practice in computer vision research to simulate scaling by Gaussian smoothing 103 100 200 300 400 500 600 50 100 150 200 250 300 360 SO 100 150 200 (a) (b) (c) (d) (e) (f) 250 300 3S0 Figure 5.4: Camera scaled images, image Building: (a) full original, 480x640 pixels; (b) alphal subimage, 291x387; (c) o-=1.09, 267x356; (d) a=1.18, 247x329; (e) a=1.27, 230x306; (f) a=1.35, 215x286. followed by resampling [Low99, FS98], i.e. to use synthetically scaled images. Therefore, the next step in our analysis is to compare camera scaled (CS) images to synthetically scaled (SS) images, where camera scaled means that the scaling stems from an increased distance between object and camera. Fig. 5.4 shows a series of CS images. Actually, we have imaged the printout of a grayscale image of a building. This way, the imaged object is guaranteed to be planar and without depth discontinuities, and the scale change can be done in a controlled lab environment. For the comparison between the CS and the SS images, we selected the CS image that has been scaled down by a factor a=1.35 (fig. 5.4(f)), and produce the SS image of equal size from the full sized CS image, the so-called alphal image, fig. 5.4(b). The comparison procedure comprises the following steps: • compute the SS image by prefiltering the alphal image with a Gaussian, parameter o~Syn, and scaling it down to the same size as the CS image 104 • prefilter the C S image with a Gaussian, parameter ocam • compute the difference image A / = | SS - C S | • compute error measures from the difference image. We scale down the SS image so that it can be compared to the C S image on a pixel by pixel basis. The downscaling is done with cubic spline interpolation. The error measures employed here are 4 the same as already encountered in section 4.1.1. That is, the maximum absolute difference A G = max \SS(i,j)-CS(i,j)\ (5.1) *,3 the mean absolute difference ^ G = ^-^2\SS(i,j)-CS(iJ)\ (5.2) i,3 the mean squared difference G = ^£(SS(^)-CS(i,j)) V *.J A 2 (5.3)' v and the signal-to-noise ratio SNR G = 10 log 10 ^ (5.4) where N is the number of valid, i.e. non-boundary, pixels, Vcs is the variance of the C S image, V VA g is the variance of the absolute difference between the C S and the SS image. Generally, the differences decrease as the smoothing increases, but at the cost of increasingly blurred images. In order to compensate for that, one could compute the dynamic range DR of the images, DR = maxjj j) — min;j I(i, j), where I is either the C S or the SS image. Then the differences could be normalized by dividing AQ, AQ, A G by the minimum dynamic range of DRcs, DRss- However, the signal-to-noise ratio implicitly takes the dynamic range into account, no normalization is needed. How are asyni o~c m to be chosen? If we set as =0, a yn then the subsequent downsampling step will result in aliasing, and consequently in large errors. This is shown in fig. 5.5, first row. Matlab function spline() 4 105 The maximum error A G is around 128 on an image with a dynamic range of about 220. Clearly, we want to prefilter the alphal image before downsampling. Therefore, we set <75 =Q;=1.35, yn ac7 =0, as shown in the second row of fig. 5.5. Interestingly, the resulting error remains very am large. It can be seen that the SS image looks much less sharp than the corresponding CS image. We observe that the simulation of scaling by Gaussian smoothing does not accurately predict what a camera scaled image of the same size looks like. Since the CS image looks sharper than the SS image, a straightforward strategy to reduce the difference between the CS image and the SS image is to smooth the CS image as well, and to increasingly smooth the SS image accordingly. Let us assume that the CS image is smoothed with o~Cam, that it is a factor a smaller than the alphal image, and that the underlying scaling is a Gaussian. Then we can apply eq. (3.59) regarding the combination of Gaussian processes to obtain °lyn = « 2 + {(XO-Cam) 2 . (5.5) where the factor a in front of o~Cam considers the fact that the CS image is reduced in size compared to the alphal image which is prefiltered by os yn If we solve for <7c value a m , we obtain the _ °Cam = ^ ^ S y n - a (-) 2 5 6 Fig. 5.5, third row, shows the results if asyn is set to ce+1.0=2.35, and <7<? =1.42, in accordance am with eq. (5.6). We observe that the error has gone down significantly. Slight improvements are possible if acam is chosen a little bit larger than dcam, e.g. c^cam+0.3. The fourth row of fig. 5.5 shows the results for <7Syn=Q+2.0=3.35, and o-c m—ocam+0.3=2.57. The numerical error a measures are given in table 5.1. The SNR shows the large gain achievable by proper filtering. As mentioned in section 2.3, the Gaussian has been proven to have the so-called evolutionary property under scaling, i.e. as the scale increases, no new structures are created, where, in 2-d, "no new structures" means "no new contour levels". However, unlike in the 1-d case, it is well possible that Gaussian smoothing creates new extrema in 2-d. Fig. 5.6 shows an example very similar to the one suggested in [LP90]. Two cones are connected via a narrow ridge whose center is higher 106. Figure 5.5: Camera scaling and synthetical scaling with the Gaussian, image Building, a = 1.3535: (left) CS; (center) SS; (right) A / . Note the error bars: the error scale changes from row to row. (first row) o =0, CT =0; (second row) aSyn=a=l.S5, a =0; (third row) <T5»n=a+1.0=2.35, <7com=1.42; (fourth row) ^ = 0 + 2 . 0 = 3 . 3 5 , cr =2.57. Syn Cam Cam Cam 107 °~Cam A 127.46 113.80 27.96 21.24 16.6014.76 0~Syn 0 0 1.42 1.72 2.27 2.57 G 0 1.35 2.35 2.35 3.35 3.35 A 12.35 13.34 5.74 5.19 4.98 4.90 G A 17.53 17.93 6.79 5.97 5.69 5.56 G SNR 11.443 10.546 20.410 22.191 22.470 22.857 DR DRss 220.0 220.0 164.3 162.4 160.3 159.5 227.0 174.3 169.0 169.0 167.1 167.1 CS Table 5.1: Error measures, camera scaling vs. synthetical scaling with the Gaussian, image Build, a=1.35. The values for asyn are set to 0, a, a+1, a-f-2, respectively. than the cones, i-e. there is a global maximum in between those cones. Gaussian smoothing will quickly erode the narrow ridge, thereby removing the peak and rendering the two cones the new maxima. This has the unpleasant consequence that, if we search for extrema in Gaussian scale space of images, such extrema can both appear or disappear as the scale increases. We conclude this section by reiterating that the camera PSF is not purely Gaussian, but a mixture between Gaussian and boxcar filter that can lead to artifacts that cannot be predicted by a Gaussian model. 5.2 Performance Measures In this section, we will compute the same performance measures for the invariants Q i2 m as we did for O i 2 ©mi23 m 7 and in section 4.1, i.e. accuracy, reliability, and entropy. We chose 9 i 2 TO and 0 i 2 3 rather than Q i 2 3 for simplicity. The general insights gained should be similar for m m 7 all these invariants. The figures given here show the scaling behavior of 0 i 2 , while the scalingm behavior of 0 i 2 3 is shown in appendix G. m Before.we do so, however, we take another look at the 2-d scaling equation, eq. (3.65). That equation suggests that scaling can be simulated in two ways: either on the full sized image (a=l), as on the left hand side of that equation, or on the reduced sized image (a >1), as on the right hand side. The critical difference is the change in a in accordance with a. We call the process described by the left hand side of eq. (3.65) scaling by filtering (SF), and the one on the right scaling by optical zooming (SO). 108 Figure 5.6: Creation of new extrema in 2-d by Gaussian smoothing: in the first row, there is a narrow ridge between the cones, with a maximum at the center; in the second row, that maximum has been removed by smoothing, and the two peaks of the cones are now maxima that didn't exist before the smoothing. The previous global maximum has turned into a saddle point. These two scaling processes applied to sampled images, rather than the continuous domain, are schematized in fig. 5.7. As discussed earlier, prefiltering is essential in order to reduce aliasing. But notice that in the SF scheme, it is the invariant representation that is prefiltered and downsampled, while in the SO scheme, this is done to the intensity image. Since there are no perfect filters, the outcome of the SF scheme is not the same as the outcome of the SO scheme. In fact, the differences between SF and SO are large, as demonstrated in fig. 5.9 for O i2, m using the example image from fig. 5.8. The filtering of the invariant representation before downscaling in the SF process reduces the range of the invariant (in the given examples, — 1 < by SO, but approximately —0.6 < O i2 m < 0.6 by SF, and similarly for © i23) m O i2 < m 1 and causes the final invariant representation to be much smoother than the SO invariant. The difference between Q(SF) and 0(50) can be reduced by postfiltering 0(50), which is also shown in figs. 5.9. How- ever, the issue is whether it is SO or SF that models the camera zooming more accurately. It 109 Prefiltering with a > = a Convolution with Differ. Operators <• \ prefiltered image derivatives Downscaling by factor a Computation of Invariant e zoomed-out image sF Convolution with Differ. Operators Filtering with o > = a /* > filtered invariant derivatives * >. Computation of Invariant Downscaling by factor a 0 fc) s f Postfiltering r 0 SO Figure 5.7: Synthetic scaling scheme: The left branch demonstrates scaling by filtering (SF), th right one scaling by optical zooming (SO). Figure 5.8: Example alphal image Build, 291x387 pixels, 110 0 i2 m and © i23m turns out that SO, without the need to postfilter, provides the better model, so we will pursue scaling by filtering no further. In the remaining part of this section, we compare invariants computed on camera scaled images to invariants computed via SO. The image data has been taken by a camera installed on a movable gantry. The first image is taken with a small distance between camera and object. This image, the alphal image, serves as input to the SO process. Then the camera is moved away from the object along the optical axis, and another image, the camera scaled (CS) image, is taken. The latter is manually cropped and registered such as to contain only the same scene as the alphal image. In terms of fig. 5.7, the alphal image corresponds to the "original image" that is the input for the SO branch, and the CS image is treated the same way as the "zoomed-out image" in the SO branch. 5.2.1 Accuracy The accuracy measures are the same as above, i.e. maximum difference, mean difference, mean squared difference and SNR. The images that we are comparing are the invariant representation of the camera scaled image and the invariant representation of the corresponding alphal image derived by simulated optical zooming. That is, A s c = rnax j \&cs{hj) — ®so(hJ)\, etc. where t O is either O i 2 or O i23- We use the same image data as for the SO vs. SF comparison above. m m The invariant representations and their differences are shown in fig. 5.10 for O i 2 without m prefiltering, and in fig. 5.11 with strong prefiltering (as =3-0)- The prefiltering is a Gaussian yn smoothing done as a first step on both the camera scaled image and the alphal image, with a large a. The a for the alphal image, asyn, was set to image, cream, was computed according to eq. 3.0, and the corresponding a for the camera scaled (5.6), again with a small constant offset of 0.3. The prefiltering is not to be confused with the smoothing inherent in the computation of the derivatives (with a£)er=1.0), v~AA=®)- or the prefiltering done as an anti-aliasing method before downsampling (with The numerical error measures are summarized in table 5.2 for O i 2 and in table m 5.3 for Oml23- Without the prefiltering, the SNR is very low, near 111 3 for © i 2 and near m 2 for 0 i 2 3 - That m SO 60 100 100 150 150 200 200 250 250 300 300 60 100 150 160 200 200 260 260 300 SO 300 100 150 200 gflp fl Hi i . * 50 MUM 'j 100 150 200 250 • Figure 5.9: Scaling by fltering (SF) vs. scaling by optical zooming (SO), 8 i , image Build, (left) SF; (middle) SO; (right) A=\SO - SF\. Note the error bars: the error scale changes from row to row. (first row) a=1.18, no postfiltering; (second row) a=1.18, with postfiltering; (third row) a=1.35, no postfiltering; (fourth row) «=1.35, with postfiltering. m 112 2 means that the error is almost the same order of magnitude as the signal, rendering the invariant effectively useless. Prefiltering improves the situation, lifting the SNR to around 6 for Q i2 m to around 5 for © m i23- Recall that the SNR was around 10 for Q Not surprisingly, the accuracy is higher for G i2 m m i2-y than for (table 4.1). G i23 m and which employs higher derivatives. There is no clear tendency as a function of a. Generally, wejwould expect a higher accuracy for a closer to 1. The maximum difference is always close to the theoretical maximum, i.e. we always encounter outliers. In fig. 5.11, the black areas that indicate large errors are the upper left corner and the lower right corner. The upper left corner contains gray sky, and the lower right corner contains bushes that have a rather low constrast. From fig. G.3 it is clear that large errors can occur everywhere, even after prefiltering. This suggests that 0 i23 m cannot be computed robustly. a = 0 a Acs Acs 1.09 1.18 1.27 1.35 median mean 1.9680 1.9054 1.9595 1.9682 1.9638 1.9503 0.1764 0.1456 0.1925 0.1857 0.1811 0.1751 -T-2 Acs 0.2886 0.2593 0.3063 0.3021 0.2954 0.2891 a = 3.0 SNR Acs Acs 3.229 3.796 2.996 2.937 3.113 3.240 1.9606 1.8597 1.9355 1.8543 1.8976 1.9025 0.1051 0.0873 0.1309 0.1-214 0.1133 0.1112 Acs SNR 0.1888 6.444 0.1739 6.837 0.2298 4.837 0^2182 5.121 0.2035 5.783 0.2027 5.810 Table 5.2: Error measures for 0 i 2 , image Build. The four numerical columns on the left show the error measures without prefiltering, the four right columns with prefiltering, crs =3.0. m yn a = 0 a Acs Acs 1.09 1.18 1.27 1.35 median mean 0.9979 0.9999 0.9987 0.9991 0.9989 0.9989 0.2487 0.1991 0.2754 0.2608 0.2548 0.2460 -T-2 (7 = 3.0 Acs SNR Acs Acs Acs SNR 0.3309 0.2824 0.3612 0.3461 0.3385 0.3302 2.281 3.026 1.667 1.905 2.093 2.220 0.9953 0.9924 0.9984 0.9979 0.9966 0.9960 0.1516 0.1180 .0.1910 0.1713 0,1615 0.1580 0.2160 0.1847 0.2677 0.2459 0.2310 0.2286 5.288 6.016 3.619 4.111 4.700 4.759 Table 5.3: Error measures for G i 2 3 , image Build. The four numerical columns on the left show the error measures without prefiltering, the four right columns with prefiltering, asyn=3-0m 113 Figure 5.10: Camera scaled invariant vs. SO invariant 6 i2, no prefiltering. (left) CS; (center) SO; (right) A=\CS-SO\. (first row) a=1.09; (second row) a=1.18; (third row) a=1.27; (fourth row) a=1.35. TO 114 Figure 5.11: Camera scaled invariant vs. SO invariant Q i 2 , with prefiltering, os = 3.0. (left) CS; (center) SO; (right) A = | C 5 - SO\. (first row) a=1.09; (second row) a=1.18; (third row) a=1.27; (fourth row) a=1.35. m 115 yn 5.2.2 Reliability The reliability analysis here is analogous to the one in section 4.1.2. That is, we identify those points where the relative error of the invariant, defined in accordance with eq. (4.7) to be 100 | © c s _ @ 5 o | / © 5 0 ) with 0 being either 0 i 2 or 0 i 2 3 , is below a specified threshold. m m Fig. 5.12 shows the reliable points for the thresholds e=5%, 10%, 20% for invariant 0 i 2 for m four different values of a, respectively. Table 5.4 summarizes the data for different values of a. All the results are for prefiltered data, since we have already observed that no reasonable results can be expected without strong prefiltering. Even then, the percentages of reliable points are low, compared to the percentages of reliable points for 0 i 2 7 which are around 50 for e=20% m (table 4.2). As with the accuracy, the percentages of reliable points are lower for 0 i 2 3 than for 0 i 2 m m And again, no clear pattern of reliable points as a function of a appears. Specifically, there is no monotonicity: if a point is reliable at ao, we can not conclude that it is reliable at a\ < ao- This observation reinforces our impression that 0 i 2 3 cannot be computed robustly. m a 1.09 1.17 1.27 1.35 median mean 0ml2 e = 5.0 6.7 11.7 5.7 6.5 6.6 7.7 ©ml23 e = 10.0 13.4 22.6 11.5 13.0 13.2 15.1 e = 20.0 26.6 41.1 23.0 25.3 26.0 29.0 e = 5.0 5.9 9.9 5.3 5.9 5.9 6.8 e = 10.0 11.6 19.1 10.2 11.8 11.7 13.2 e = 20.0 22.9 35.0 19.7 22.6 22.8 25.1 Table 5.4: Percentages of reliable points for 0 i 2 on the left and O i 2 3 on the right, image Build, for e=5.0, 10.0, 20.0. Prefiltering with <T =3.0, <7 =2.86. m m 5yn 5.2.3 Cam Entropy We have encountered entropy before, in section 4.1.3. Here, we repeat the same kind of entropy computation for the invariants 0 i 2 and 0 i 2 3 - Some example images and their corresponding m m histograms, binned into either 16 or 256 bins, are given in fig. 5.13. The precise numbers for the normalized entropy H, computed according to eq. (4.11), are given in table 5.5. We observe that 116 Figure 5.12: Reliable points for invariant 0 i , image Build, with prefiltering, a = 3.0. (left) e=5.0; (middle) e=10.0; (right) e=20.0. (first row) o=1.09; (second row) a=1.18; (third row) a=1.27; (fourth row) a=1.35. m 2 117 S y n J the entropy of the Q i2 m representation tends to be lower than the entropy of the intensity image, while the entropy of the @mi23 representation is generally larger. The shape of the Q i23 m @ i2 m and histograms in fig. 5.13 seems to be typical for real world images. Compare to the much more uniform distribution of © m i 2 7 in fig. 4.5. A change in the scaling factor a has little influence on the entropy. L = 16 L = 256 image Int. ©ml2 ©ml23 Build, a=1.00 Build, a=1.09 83.4 81.2 81.6 81.9 82.6 85.0 82.1 80.1 82.0 83.2 76.0 77.0 76.7 76.3 75.9 81.1 76.2 81.6 76.5 77.6 90.5 90.5 90.6 90.5 90.4 90.9 91.0 91.1 90.6 90.7 Build, a=l. 18 Build, a=1.27 Build, a=1.35 lena mandrill wbarb median mean Table 5.5: Normalized entropy H of intensity images vs. bins; (right columns) L=256 bins 5.3 Int. ©ml2 92.2 88.5 .91.1 89.0 91.3 88.9 91.4 88.7 91.6 88.5 92.2 91.1 89.7 88.7 88.4 . 91.5 91.4 88.8 89.4 91.0 @ i2 m and ©ml23 95.6 95.6 95.6 95.6 95.5 95.7 95.8 95.8 95.6 95.7 © i23m (left columns) L=16 Excursion: The Averaged Wavelet Transform An approach to scaling that has received a lot of attention in recent times is based on wavelets. They provide a mechanism to orthogonally decompose a signal' into frequency bands, corresponding to discrete, dyadic scales. The wavelet transform provides an efficient way to compute this hierarchical decomposition. This framework has been tried for image retrieval. However, wavelets have a well known property that poses a major problem in the context of image retrieval: they are not shift-invariant. Wavelets can be seen as basis functions i) k{x) with local support that can be used to s represent a function / . In 1-d, we can write f(x) = J2b rl>(2 x-k) s sk s,k = J2 *kiPsk(x) b s,k 118 (5.7) Figure 5.13: Histograms for entropy computation, 0 i 2 and 0 i 2 3 : L=16 bins in the middle, L=256 bins on the right, image Build, (first row) intensity image, a=1.0; (second row) intensity image, a=1.35; (third row) 0 i , a=1.0; (fourth row) 0 i , a=1.35; (fifth row) 0 i , a=1.0; (sixth row) 0 i 2 3 , a=1.35. m m 2 m m m 119 2 m 2 3 where s denotes the dilation, or scale, and k the translation. That is, all basis functions are derived via dilations and translations from a single function, the so-called mother wavelet. An important property of the wavelet basis is that it is orthogonal, i.e. J ip(x) ip k(x) dx = 0 for all s integers s, k. The coefficients b sk can be computed via the wavelet transform (WT). Mallat [Mal89] has shown that the W T can be computed in 0(N) time, where N is the number of sample points of the function, by means of a pyramidal scheme: At each scale, starting with the original signal, the input is filtered with both a lowpass filter and a corresponding highpass filter. The coefficients from the highpass filtering are kept, and the process is iterated with the downsampled lowpass output, resulting in as many W T coefficients as there are sample points. The W T is information preserving and therefore invertible. Possibly the simplest wavelet is the Haar wavelet, a piecewise constant function with Haar(x)=l for x G [0..0.5) and Haar(x)=-1 for x € [0.5..1). Another well-known class of wavelets are the Daubechies wavelets. The wavelets differ in their area of support and their degree of smoothness. The key idea that we pursue in this section is as follows [Sie98]. We know that the W T is shift variant, i.e. it depends on the position of the signal of interest. But if we assume a discrete signal of finite length TV, then this signal can only occur at N different positions. Now, if we average the W T over all ./V positions, this average will be independent of the initial position, and therefore shift invariant. Shift invariant systems are well understood systems that can be built with linear filters. That is, once we have a linear shift invariant system, we can analyze it in terms of its impulse response, and we can equivalently describe it in terms of filters that create that impulse response. That allows us to compare the AWT to the Gaussian filters we have proposed to implement the invariants from chapter 3. 120 5.3.1 Shift Variance of the Wavelet Transform Fig. 5.14 presents a 1-d example that shows the effect of a small shift of the signal onto how the signal is represented across the scales. The example function on the left is piecewise constant, sampled at ./V=64 points. On the right, the signalis shifted to the right by exactly one unit. The top row shows the input signal, the second row shows the DC component (s=0), and the rows three to seven show the detail (sometimes also called reconstructed components or backprojections) of the input signal at scale s=l to s=5, respectively. The scale s=l is the smallest scale, corresponding to the highest solution, i.e. containing the finest detail. The details are not the wavelet coefficients but computed from the coefficients by eq. (5.7). That is, the detail D at scale s is the sum over s all k. (5.8) k Note that the reconstructed components plus the DC component add up precisely to the original signal. This implies that the D C component contains all signal information that is not in the details. Put another way, the D C component depends on the number of decomposition levels. The sum of the D C component and the details up to scale s is the approximation of f(x) at scale s. On the left hand side of fig. 5.14, the signal is fully captured by just the D C component and the component at scale s=5, all other components are zero. By contrast, on the right hand side, there are non-zero components at all scales. This could be interpreted as resulting from a positional mismatch between the signal and the basis vectors (namely the dilated wavelets) it is represented by. The finer scales can be seen as just compensating the mismatch between the signal and the component at s=5, sometimes referred to as cancellation. Note that the cancellation effects cause a significant variation of energy across the scales. A similar demonstration of the effects of translation onto the wavelet representation is given in [SFAH92]. That paper actually introduced a weaker type of translational invariance called shiftability which means that all the information in a subband remains in that subband. 121 U' • 10 20 30 s=0 (DC) 40 50 0' 60 I 30 1 20 1 s=0 (DC) 1 s=1 s=1 P ' u s=2 s=2 I I 40 50 60 10 n U — T V , s=3 s=4 s=4 s=5 Figure 5.14: Shift variance of the wavelet transform: A piecewise constant 1-d signal and its decomposition into dyadic scales via the Haar wavelet. The signal on the right is shifted by one unit compared to the one on the left, but otherwise identical. Beneath the input signal are the details D at scale s. s 5.3.2 Shift Invariance by Averaging We call the transform that maps a signal to its wavelet transform details averaged over all circular shifts an Averaged Wavelet Transform (AWT). It can be formalized as follows. Let f(x) be a discrete signal of length N, and S = \og (N) the maximum number of dyadic scales. According 2 to eq. (5.7), f(x) = £ ,fc ^sk ipsk(x)- Let f^(x) be the circular right shift of f(x) by i units. That s is, f^(x) = f(x - i) for x=i+l..N, and f^\x) wavelet representation is denoted by f^(x) = f(N - i + 1) for x=l..i. = £ s ^l{x). k The corresponding Now we can write the AWT as a mapping AWT : R N • • • x R% x H-> AWT(x) = (AWT (:r), A W T ( x ) , . . . A W T ( i ) ) 0 1 r 5 where AWT.(x) = ^ £ £ *2 +0 (-) 5 9 t=0 ..k Note the argument (x + i) of the shifted wavelet tp^. It realigns the shifted wavelets, making sure the averaging is done over corresponding samples. The pseudocode for the AWT is given 122 in fig. 5.15. In fig. 5.16, the AWT for the same function as in fig. 5.14 is shown. But unlike in fig. 5.14, the details on the right are identical to the details to the left, up to the one unit shift. We also observe that the details are smoother compared to the pure WT. Appendix H shows the AWT computed for the same input but with Daubechies wavelets. The extension of the AWT from discrete 1-d functions to 2-d images is straightforward. The shifting has to be done in both dimensions, and the wavelet transform itself has to be 2-d. A 2-d WT is computed by doing 1-d WTs on the rows and columns. The computational costs for the AWT when implemented the way described is 0(N ). 2 The W T itself is O(N), and it has to be done N times, for the N shifts. However, Beylkin [Bey92] has shown that there is an 0(N\og N) 2 2-d, the costs are 0(N ) 4 algorithm to compute the WTs of all A' circular shifts. In for the naive implementation and 0(N \og\ 2 Algorithm Averaged N) for Beylkin's algorithm. Wavelet Transform A W T Input: discrete function f(x) number of samples N wavelet type W Output: AWT decomposition / * compute number of scales * / •S:=floor(log (N)) 2 / * compute wavelet transform for each shifted signal * / for i •:= 0 to N - l do shift f(x) circularly to the right by i units compute wavelet transform of shifted function at S scales using wavelet W for s := 0 to S do reconstruct details from wavelet coefficients shift the details circularly to the left by i units / * average at each position, at each scale * / for i := 0 to N - l do for s := 0 to S do average details over all shifts Figure 5.15: Pseudocode for the averaged wavelet transform. 123 The AWT is information preserving but redundant. The original signal can be reconstructed from the AWT by adding up the details: / W = ^AWT (x) (5.10) s s=0 Each level of detail consists of N samples, so there is a redundancy by a factor 5+1. Clearly, the AWT is not a wavelet transform. The linear averaging operation that gained shift invariance has also eliminated key properties of the pure WT, like orthogonality and non-redundancy. 200 100 10 s=0 (DC) 20 30 40 s=0 (DC) S=1 s=1 i f s=2 s=2 s=3 s=3 Figure 5.16: Shift invariance of the averaged wavelet transform: A piecewise constant 1-d signal and its decomposition into dyadic scales via the AWT(Haar). The signal on the right is shifted by one unit compared to the one on the left, but otherwise identical. Beneath the input signal are the details D at scale s. s 5.3.3 The A W T Filterbank Linear systems are well defined by their impulse responses. Therefore, we can analyze the AWT by computing the AWT output if the input signal is a unit impulse. This has been done in 1-d in fig. 5.17 for three different wavelets. We observe that the impulse responses are symmetric and have finite spatial support. The area of support increases with scale. The AWT is most regular for the Haar wavelet. At scale s=l, the impulse response consists of the vector ( - 0 . 2 5 0 . 5 - 0 . 2 5 ) , and at scale s=2, the impulse response vector is ( - 0 . 0 6 2 5 - 0 . 1 2 5 0 . 0 6 2 5 0 . 2 5 0 . 0 6 2 5 - 0 . 1 2 5 - 0 . 0 6 2 5 ) , etc. That is, the size of the impulse response quasi doubles from scale to scale, while the maximum value, at the center, is halved. In fact, the support sizes of the 124 AWT Haar filters are 3, 7, 15, • • - = 2 i+1 7, 19, 43, • • • = 1 + 3 5TJ*=i > 2fc a n d t h - 1. The support sizes of the AWT Daub filters are 2 e support sizes of the AWT Daub filters are 31, 91, 211, 8 ••• = l + 1 5 - £ J L i 2 * . Figure 5.17: 1-d AWT filterbanks: impulse responses for D C component and three scales, for various wavelets, (left) Haar; (middle) Daub2; (right) Daubg. Some 2-d AWT filters are given in fig. 5.18 for the Haar wavelet, in fig. 5.19 for the Daub 2 wavelet, and in fig. 5.20 for the Daubg wavelet. The coefficients of the smallest scale Haar AWT 2-d filter are -0.0625 -0.1250 -0.0625 -0.1250 0.7500 -0.1250 -0.0625 -0.1250 -0.0625 1 ~ ~16 1 2 1 2 -12 2 1 2 1 which turns out to be similar to some proposed discrete approximations of the Laplacian [Hor86, Jai89]. More coefficients are given in appendix H. The Haar wavelet AWT computed on the image Build is shown in fig. 5.21. The shapes of the impulse responses describe the set of filters that can be equivalently used to achieve the same system response via convolution. This is much more efficient than computing the AWT directly. The AWT filterbank has to be computed only once by computing the AWT of an impulse response, and can be stored for reuse. The properties of the AWT filters can be summarized as follows. 125 Figure 5.18: 2-d AWT filterbank: impulse responses for DC component and two scales, Haar wavelet. Note that the axes have been rescaled for better visualization, (left) D C filter for S=4; (middle) scale s=3; (right) scale s=4. 40 Figure 5.19: 2-d AWT filterbank: impulse responses for D C component and two scales, Daub wavelet. Note that the axes have been rescaled for better visualization, (left) DC filter for S=3; (middle) scale s=2; (right) scale s=3. 2 Figure 5.20: 2-d AWT filterbank: impulse responses for D C component and two scales, Daubs wavelet. Note that the axes have been rescaled for better visualization, (left) DC filter for S=3; (middle) scale s=2; (right) scale s=3. 126 • They are symmetric. • They have finite support. The larger the scale, the larger the support. The exact size depends on the underlying wavelet. • They are not rotationally symmetric. • They are not orthogonal, creating a redundant representation of the input signal. • They have both positive and negative lobes. By nature of their creation, the filters are only computed at certain discrete scales, but it would not be difficult to create intermediate filters by stretching or interpolating the given filters. We are now in a position to compare the AWT filters'to. the Gaussian kernels proposed in the previous chapters. We noted that the Gaussian is the only kernel with certain scale space properties (see section 2.3).. It follows that the AWT filters don't have these properties. Specifically, their shapes imply that they may create certain artifacts. Therefore, there seems to be no obvious advantage to use the AWT niters in place of the Gaussian kernels. 5.4 The Quest for Characteristic Points in O Space We have already compared the scale invariants 0 i2 m and 0 i23 m computed on a camera scaled image to the ones predicted by the Gaussian scaling model, assuming the scaling factor a to be known. In this section, we assume that a is not known a priori. The issue then is what can be said about the values of the invariants. Or put differently, how scale invariant are the scale invariants? And if the computed values of the invariants depend on parameter settings that are not known, can there still some characteristic points like extrema or zero crossings consistently be identified? Recall from section 3.3.2 that for the computation of the invariants on discrete image data with the proposed Gaussian kernels, an appropriate value for the Gaussian a has to be chosen. Here, we vary that parameter, called o~d , over a wide range, from cr^ =1.0 to <7d =3.0. We call er er er the space of invariant representations computed this way the © space, where © may be either 127 (a) (b) (c) Figure 5.21: 2-d AWT, Haar filters, applied to image Build, S=7. (a) input image; component; (c) 8=1; (d) s=2; (e) s=3; (f) s=4; (g) s=5; (h) s=6; (i) s=7. 0 i2 m (b) DC or © i 2 3 - Since section 5.2 has made the necessity of prefiltering obvious, we also have to m look at how prefiltering influences 0 space. The pseudocode for computing the © space is given in fig. 5.22. First, the input image is prefiltered. Then the © space is sampled in scale by determining the a that corresponds to reducing the size of the scaled image by one row at a time, up to an arbitrarily chosen cut-off of a=3.0. Note, however, that in this set-up, the invariant representation is not actually downscaled. That enables us to directly compare points according to their location, and it avoids any aliasing problems that may occur. Once the appropriate a and ader are determined, the invariant © , with t 128 t=ml2 or t=ml23, is computed at that specific scale as described in section 3.3.2. The resulting 0 space matrix can be seen as giving, for each image location P(x,y), a trajectory through scale space. A l g o r i t h m 0 Space Input: Image I image size N x M invariant type t max. a l p h a a prefilter s i g m a <xs O u t p u t : 0 space N x M x L m a t r i x Q (x, y; a) m a x yn t / * compute number of scales * / R o w H i g h := N R o w L o w := round (N/a ) L := R o w H i g h - R o w L o w + 1 max / * prefiltering w i t h Gaussian * / I := I © G ( c r 5 y n ) / * compute discrete 0 space "rowwise" * / CrferO := 1.0 for R o w C u r r := R o w H i g h downto R o w L o w do acurr •— RowHigh/RowCurr °~der '•— &Curr * °~derO / * don't downscale here * / compute 0 ( x , y; a t ) Curr on / using a d e r Figure 5.22: Pseudocode for 0 space creation. Ideally, the trajectories through 0 space were constant. We know already that such is not the case. Fig. 5.25 shows a few examples, for different prefiltering, of trajectories through 0mi2 space, and, analogously, fig. 5.26 for 0 i23 m of the image Build of size 291x387 pixels. In fig. 5.25(f) we observe a highly desirable behavior: 0 i 2 is virtually the same for all three m prefilterings, and it is almost constant even as a changes. By contrast, other points show a huge variation both as a function of the prefiltering and as a function of a. The variation of 0 as a function of a usually decreases as the prefiltering increases. 129 In order to obtain an overview of the variability of 0 space, we have computed, for each and O i23(z, y, a) as a function of pixel location P(x,y), the standard deviation of ® i2{x,y,a) m m a. These standard deviations are shown in figs. 5.23 and 5.24, respectively. They show that there is a large number of pixels with a significant variation in 0 space. Figure 5.23: Standard deviation in Q dle) a =1.0; (right) a =3.0. space, a=[1.0,3.0], image=Build. (left) cr =0; ml2 Syn Syn (mid- Syn Figure 5.24: Standard deviation in 0 (middle) a =1.0; (right) a =3.Q. Syn m i23 space, a=[l.0,3.0], image=Build. (left) as =0; yn Syn A measure similar to the standard deviation is the Max-Min difference M M , defined as the distance between the largest and the smallest value of 0(x, y; a) along a, i.e. M M f i , y) = | max0i(x, y; a) — min QAx, y; a)\ a (5.12) a The Max-Min differences M M for the same image Build are given in appendix I. They show the same tendencies as the standard deviation. That is, we find that at most locations, the variations along a are so large that the computation of the "invariants" becomes meaningless. Furthermore, the trajectory examples from figs. 5.25 and 5.26 suggest that the invariants don't converge towards 130 a stable value as a increases, and they don't show extrema that are independent of the prefiltering either. There are locations where the variation is small, visualized in black in figs. 5.22 and 5.23, but it is not obvious to us where these locations occur, i.e. how these points could be identified without computing the whole 0 space. t.2 1.4 1.8 1.1 2 2.4 2.8 2.8 I 1.2 1.4 (d) 16 . 1.8 1 (e) 22 1.4 2* 2.8 1 12 . 1.4 1.8 Figure 5.25: Trajectories through Q i2 space, a=[l.0,3.0], image=Build. (dashed) (T =l-0; (dash-dotted) o - ^ = 3 . 0 . (a) P(08,15); (b) P(47,ll); (d) P(86,320); (e) P(187,222); (f) P(82,195). m Syn 5 2 1.8 2.2 2.4 2.8 2.8 . ( f ) (solid) <7 =0; (c) P(209,137); Sjm Why is there such a strong variability in 0 space, which has been designed to be scale invariant? We have indicated the answer already above. The main reason is that derivatives have to be computed over a finite sized window, and that window gets larger as the scale increases. As a consequence, the structures in the neighborhood of the point of interested interfere with the computation of the derivatives, and that interference is different with each different window size. Another reason has been indicated in fig. 5.6. New extrema can be created even by Gaussian 131 to • ' (d) (b) (c) (o) (f) Figure 5.26: Trajectories through 0 i 2 3 space, a=[l.0,3.0], image=Build. (dashed) ffs„„=1.0; (dash-dotted) aSyn=S.O. (a) P(08,15); (b) P(47,ll); (d) P(86,320); (e) P(187,222); (f) P(82,195). m (solid) <7 =0; (c) P(209,137); Syn smoothing, and those can change the derivatives significantly, at least locally. If we now combine the results from this section with those from section 5.2 where we showed that the Gaussian model provides only limited accuracy and that the SNR for the invariants is low because of area based sampling, non-Gaussian PSF, alignment problems, and noise in general, we are led to a sobering conclusion: we haven't succeeded in the design of an algorithm that would robustly extract image points whose feature values are invariant under scale. Schmid, Mohr, and Bauckhage [SMB98] have reported a similar lack of success trying to identify key points (or interest points) under scale change. They applied various key point detectors to an image. They recorded the key points, then changed the scale of the image by changing the focal length of their camera, and applied the same key point detectors again. Then they measured the so-called repeatability rate, i.e. the percentage of keypoints found in both images within a 132 distance of e from each other. For e=0.5 and a scaling factor a=1.5, they report a repeatability rate of around 10% for the best key detector, and even less than 5% at « = 2 . 0 . For e=1.5, the repeatability rates are higher, but they drop quickly to below 40% at a=2.0. These results are indicative of the problems inherent in scaled images. The mediocre results reported in this section suggest that differential invariants are generally not robust in the face of scale changes. Differential invariants are attractive for their locality, but it is exactly this locality that turns out to be a problem as the changes in scale significantly change the locally sampled signal. Clearly, human observers are much more robust to scale changes than differential invariants. It seems that humans have an underlying notion of "identity" or "shape" or "gestalt" of objects that is invariant even if local gradients change. Possibly, that notion incorporates other features like color, texture, etc. Differential invariants cannot deliver that sense of invariance. 133 Chapter 6 Conclusion In this work, we have investigated some local, differential invariants for the use in image processing tasks, specifically image retrieval. The guiding idea was to compute a representation from the input images that is invariant under a certain set of geometric and radiometric transformations, with focus on gamma correction. With such an invariant representation, those transformations will not influence subsequent computations, thereby allowing higher retrieval performance. The proposed invariants are based on differentials which have been implemented with the derivatives of the Gaussian. The computation of derivatives on discrete data is an ill-defined problem, also requiring smoothing to cope with noise. The numerical problems inherent in the discrete computation of derivatives put a limit on the accuracy of the invariant representation. Non-Gaussian methods to compute the differentials may prove superior in specific instances. Data dependent computations of derivatives may be tried if a data model is available. Similarly, different filters could be used to prefilter the images. Invariance under gamma correction has been achieved by forming a ratio of derivatives such that the transformation parameters cancel out. A theoretical analysis of the unmodified invariant in 1-d, constrained to some elementary basis functions, shows that the invariant is complete, i.e. the invariant contains all original information up to the transformation parameters of interest. However, completeness is lost by a proposed modification of the invariant that makes the invariant defined everywhere, i.e. also where the denominator of the ratio is zero. 134 The empirical data that we present shows that the gamma correction done by our camera is adequately modelled by an analytical power law equation. However, two kinds of image areas have been identified where the invariant cannot be computed robustly. First, homogeneous areas of constant brightness are not conducive to robust computation of ratios of derivatives. Second, areas of very high contrast induce errors since the discrete derivative computation is inherently bounded by the inevitable smoothing. Nevertheless, we have demonstrated on a small test bed that the invariant representation leads to an increased retrieval performance both for template matching and for histogram based retrieval in a scenario where the database images were subjected to gamma correction but the query image was not. We have shown how the same method of designing suitable ratios of derivatives • can be extended such as to achieve simultaneous invariance under scaling. However, a number of problems severely limits the usefulness of differential invariants in the case of scale changes. Critically, for discrete data, derivatives have to be computed over a finite window size. The size of the window has to match the scaling factor in order to achieve consistent results, and if the scaling factor is a priori unknown, so is the corresponding window size. We searched in vain for scale space features that are. stable and independent of the window size. Furthermore, the empirical data that we present suggests that noise in general and inherent shortcomings of the image formation process in particular render a robust computation of differential invariants impossible in the case of scale changes. Future sensors with a reduced size of their active region, possibly based on CMOS technology, may alleviate the area based sampling induced problems. Can the framework be extended to cope with more transformations? There is no theoretical reason why this couldn't be done, but more transformations, amounting to more transformation parameters to be considered and eliminated, require higher derivatives. The higher the derivative, the higher the sensitivity to noise, and the larger the area of support needed to compute it. Our empirical data indicates robustness problems for derivatives of order larger than two. The proposed invariants are in no way restricted to applications in image processing. And even though gamma correction may look at first like an idiosyncrasy of imaging tools, it is in fact nothing else than a generic power law transformation. The invariants under gamma correction 135 provide a general method to trade off derivatives with a power law parameter. Such a method can be of interest to applications in other fields. Is there a biological realization of the proposed invariants out there? Especially in the case of invariance under gamma correction, this seems like afar-fetched speculation, given that gamma correction is an engineering artifact in the first place. However, it is known that certain cells in the human eye perform a kind of derivative computation. And if the automatic correction for some power law transformation grants higher survival value, it cannot be excluded that such invariants could find some real life incarnation. 136 Bibliography [AH99] S. Aksoy and R. Haralick. Graph-theoretic clustering for image grouping and retrieval. In CVPR, Ft.Collins, 1999. [AM99] S. Abbasi and F. Mokhtarian. Shape similarity retrieval under afTine transform: Application to multi-view object representation and recognition. In ICCV, Kerkyra, 1999. [AndOO] S. Ando. Consistent gradient operations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3):252-265, March 2000. [ASBH90] K. Arbter, W. Snyder, H. Burkhardt, and G. Hirzinger; Application of affineinvariant Fourier descriptors to recogniton of 3d objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):640-647, July 1990. [AW99] R. Alferez and Y. Wang. Geometric and illumination invariants for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(6):505537, June 1999. [BA83] P. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532-540, April 1983. [Bey92] G. Beylkin. On the representation of operators in bases of compactly supported wavelets. SI AM Journal of Numerical Analysis, 6(6):1716-1740, December 1992. [BFF97] J. Barnard, B. Funt, and G. Finlayson. Color constancy for scenes with varying illumination. Computer Vision and Image Understanding, 65(2):311—321, February 1997. [BL99] J. Beis and D. Lowe. Indexing without invariants in 3d object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10):1000-1015, October 1999. [BL00] A. Blanksby and M . Loinaz. Performance analysis of a color CMOS photogate image sensor. IEEE Transactions on Electron Devices, 47(l):55-64, January 2000. [BP97] A. Del Bimbo and P. Pala. Visual image retrieval by elastic matching of user sketches. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2):121-132, February 1997. 137 [BPHB91] E. Barrett, P. Payton, N. Haag, and M . Brill. General methods for determining projective invariants in imagery. CVGIP: Image Understanding, 53(l):46-65, January 1991. [BPJ93] J. Bach, S. Paul, and R. Jain. A visual information management system for the interactive retrieval of faces. IEEE Transactions on Knowledge and Data Engineering, •5(4):619-628, August 1993. [BtHRKV93] J. Blom, B. ter Haar Romeny, J. Koenderink, and M . Viergever. Spatial derivatives and the propagation of noise in Gaussian scale-space. Journal of Visual Communication and Image Representation, 4:1-13, March 1993. [BWR92] J. Burns, R. Weiss, and E . Riseman. The non-existence of general-case viewinvariants. In J. Mundy and A. Zisserman, editors, Geometric Invariance in Computer Vision, pages 120-131. MIT Press, Cambridge, 1992. [BWR93] J. Burns, R. Weiss, and E . Riseman. View variation of point-set and linesegment features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(1):51—68, January 1993. [Can86] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679-698, November 1986. [Chu92] C. Chui. An Introduction to Wavelets. Academic Press, 1992. [CY89] M . Chen and P. Yan. A multiscaling approach based on morphological filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, ll(7):694-700, July 1989. [Dau92] I. Daubechies. Ten Lectures on Wavelets. SIAM, 1992. [Der90] R. Deriche. Fast algorithms for low-level vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(l):78-87, January 1990. [EKTar] W. Evans, D. Kirkpatrick, and G. Townsend. Right-triangulated irregular networks. Algorithmica, to appear. [Eld99] J. Elder. Are edges incomplete? 34(2/3):97-122, 1999. [EZ98] J. Elder and S. Zucker. Local scale control for edge detection and blur estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(7):699-716, July 1998. [FF95] B. Funt and G. Finlayson. Color constant color indexing. IEEE Transactions on International Journal of Computer Vision, Pattern Analysis and Machine Intelligence, 17(5):522-529, May 1995. [FFB96] D. Forsyth, M . Fleck, and C. Bregler. Finding naked people. In ECCV, Cambridge, 1996. 138 [FK90] J. Feltz and M . Karim. Modulation transfer function of charge-coupled devices. Applied Optics, 29(5):717-722, February 1990. [FL79] R. Fowler and J. Little. Automatic extraction of irregular network digital terrain models. Computer Graphics, 13:199-207, 1979. [FMZ+91] D. Forsyth, J. Mundy, A. Zisserman, C. Coelho, A. Heller, and C. Rothwell. Invariant descriptors for 3-d object recognition and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):971-991, October 1991. [FS98] J. Flusser and T. Suk. Degraded image analysis: An invariant approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):590-603, June 1998. [FSN+95] M . Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M . Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: the QBIC system. IEEE Computer, 28(9):23-32, September 1995. See also: http://wwwqbic.almaden.ibm.com. [FStHR 95] L. Florack, A. Salden, B. ter Haar Romeny, J. Koenderink, and M . Viergever. Nonlinear scale-space. Image and Vision Computing, 13(4):279-294, May 1995. [Fua93] P. Fua. A parallel stereo algorithm that produces dense depth maps and preserves + image features. Machine Vision and Applications, 6:35-49, 1993. [GMP095] L. Van Gool, T. Moons, E. Pauwels, and A. Oosterlinck. Vision and Lie's approach to invariance. Image and Vision Computing, 13(4):259-277, May 1995. [GMU96] L. Van Gool, T. Moons, and D. Ungureanu. Affine/photometric invariants for planar intensity patterns. In ECCV, Cambridge, 1996. [Hil93] D. Hilbert. Uber die vollen Invariantensysteme. Mathematische Annalen, 42:313373,1893. [HJ96] G. Healey and A. Jain. Retrieving multispectral satellite images using physics-based invariant representations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):842-848, August 1996. [HK94] G. Healey and R. Kondepudy. Radiometric CCD camera calibration and noise estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(3):267-276, March 1994. [HMS96] P. Havaldar, G. Medioni, and F. Stein. Perceptual grouping for generic recognition. International Journal of Computer Vision, 20(l/2):59-80, 1996. [Hol98a] G. Hoist. CCD arrays, cameras, and displays. JCD Publishing & SPIE Press, 2 edition, 1998. 139 [Hol98b] G. Hoist. Sampling, Aliasing, and Data Fidelity. JCD Publishing & SPIE Press, 1998. [Hor86] B. Horn. Robot Vision. MIT Press, 1986. [IA99] Q. Iqbal and J. Aggarwal. Applying perceptual grouping to content-based image retrieval: Building images. In CVPR, Ft.Collins, 1999. [Jai89] A. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, 1989. [JD96] P. Jackway and M . Deriche. Scale space properties of the multiscale morphological dilation-erosion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1):38-51, January 1996. [JFS95] C. Jacobs, A. Finkelstein, and D. Salesin. Fast multiresolution image querying. In Computer Graphics, 1995. [KK97] Y. Kim and W. Kim. Content-based trademark retrieval system using visually salient feature. In CVPR, Puerto Rico, 1997. [Koe84] J. Koenderink. The structure of images. Biological Cybernetics, 50:363-370, 1984. [KR98] M . Kliot and E. Rivlin. Invariant-based data model for image databases. In ICIP, Chicago, 1998. [KvD87] J. Koenderink and A. van Dorn. Representation of local geometry in the visual system. Biological Cybernetics, 55:367-375, 1987. [KvD9.2] J. Koenderink and A. van Dorn. Generic neighborhood operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):597-605, June 1992. [LF90] R. Lenz and D. Fritsch. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and Remote Sensing, 45:90-110, 1990. [Lin94a] T. Lindeberg. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2):223-261, 1994. [Lin94b] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer, 1994. [Lin98] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):79-116, 1998. [LJ89] Y. Lu and C. Jain. Behavior of edges in scale space. IEEE Transactions on Pattern Analysis and Machine Intelligence, ll(4):337-356, April 1989. [LJ92] Y. Lu and C. Jain. Reasoning about edges in scale space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(4):450-467, April 1992. [LM99] T. Leung and J. Malik. Recognizing surfaces using three-dimensional textons. In ICCV, Kerkyra, 1999. 140 [Low85] D. Lowe. Perceptual Organization and Visual Recognition. Kluwer, 1985. [Low87] D. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31(3):355-395, March 1987. [Low99] D. Lowe. Object recognition from local scale-invariant features. In ICCV, Kerkyra, 1999. [LP90] L. Lifshitz and S. Pizer. A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(6):529-540, June 1990. [Mal89] S. Mallat. A theory for multiresolution signal decomposition: The wavelet representation; IEEE Transactions on Pattern Analysis and Machine Intelligence, ll(7):674-693, July 1989. [Mal91] S. Mallat. Zero-crossings of a wavelet transform. IEEE Transactions on Information Theory, 37(4):1019-1033, July 1991. [Man77] B. Mandelbrot. Fractals: Form, Chance, and Dimension. W.H. Freeman, San Francisco, 1977. [Mar82] D. Marr. Vision. Freeman, 1982. [Mar93] G. Martin. High-resolution color CCD camera. In SPIE Conference on Cameras, Scanners, and Image Acquisition Systems, San Jose, 1993. [MCKV97] Y. Mallet, D. Coomans, J. Kautsky, and O. De Vel. Classification using adaptive wavelets for feature extraction. IEEE Transactions on.Pattern Analysis and Machine Intelligence, 19(10):1058-1066, November 1997. [MCL97] M. De Marsicoi, L. Cinque, and S. Levialdi. Indexing pictorial documents by their content: A survey of current techniques. Image and Vision Computing, 15(2) :119— 141, February 1997. [MH80] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society London B, 207:187-217, 1980. [MM86] F. Mokhtarian and A. Mackworth. Scale based description and recognition of planar curves and two-dimensional shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(l):34-43, January 1986. [Mur96] D. Murgu. Individual tree detection and localization in aerial imagery. Master's thesis, The University of British Columbia, 1996. [MZ92a] S. Mallat and S. Zhong. Characterization of signals from multiscale edges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(7):710-732, July 1992. 141 [MZ92b] J. Mundy and A. Zisserman, editors. Geometric Invariance in Computer Vision. MIT Press, Cambridge, 1992. [MZF93] J. Mundy, A. Zisserman, and D. Forsyth, editors. Applications of Invariance in Computer Vision. Springer, 1993. [01v95] P. Olver. Equivalence, Invariants, and Symmetry. Cambridge University Press, 1995. [OPS 97] M . Oren, C. Papageorgiou, P. Sinha, E . Osuna, and T. Poggio. Pedestrian detection using wavelet templates. In CVPR, Puerto Rico, 1997. [Pen84] A. Pentland. Fractal-based description of natural scenes. IEEE Transactions on + Pattern Analysis and Machine Intelligence, 6(6):661-674, 1984.. [Pic95] R. Picard. Light-years from Lena: Video and image libraries of the future. In ICIP, Washington, 1995. [PM90] P. Perona and J. Malik. Scale space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629-639, July 1990. [Poy96] C. Poynton. A Technical Introduction to Digital Video. John Wiley & Sons, 1996. See also: http://www.inforamp.net/~poynton/GammaFAQ.html. [PP96] R. Picard and A. Pentland. Digital libraries: Representation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 1996. [PPS96] A. Pentland, R. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. International Journal of Computer Vision, 18(3):233-254, 1996. [PTVF92] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge Press, 2 edition, 1992. [Rei93] T. Reiss. Recognizing Planar Objects using Invariant Image Features. Springer, 1993. [RKCJ96] N. Ratha, K. Karu, S. Chen, and A. Jain. A real-time matching system for large fingerprint databases. IEEE)Transactions on Pattern Analysis and Machine Intelligence, 18(8):799-813, August 1996. [RM98] S. Ravela and R. Manmatha. Retrieving images by appearances. In ICCV, Bombay, 1998. [RV91] 0. Rioul and M . Vetterli. Wavelets and signal processing. . IEEE SP Magazine, pages 14-37, October 1991. [RW95] E. Rivlin and I. Weiss. Local invariants for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(3):226-238, March 1995. 142 [SB91] M . Swain and D. Ballard. Vision, 7(l):ll-32, 1991. Color indexing. International Journal of Computer [SC96] B. Schiele and J. Crowley. Object recognition using multidimensional receptive field histograms. In ECCV, Cambridge, 1996. [SCOO] B. Schiele and J. Crowley. Recognition without correspondence using multidimensional receptive field histograms. International Journal of Computer Vision, 36(l):31-50, January 2000. [SFAH92] E. Simoncelli, W. Freeman, E. Adelson, and D. Heeger. Shiftable multiscale transforms. IEEE Transactions on Information Theory, 38(2):587-607, March 1992. [SH96] D. Slater and G. Healey. The illumination-invariant recognition of 3d objects using local color invariants. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(2):206-210, February 1996. [Sie98] A. Siebert. A linear shift invariant multiscale transform. In ICIP, Chicago, 1998. [Siear] A. Siebert. Retrieval of gamma corrected images. Pattern Recognition Letters, to appear. [SM96] C. Schmid and R. Mohr. Combining greyvalue invariants with local constraints for object recognition. In CVPR, San Francisco, 1996. [SM97] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530-535, May . 1997. ;< [SMB98] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points. In ICCV, Bombay, 1998. [Str89] G. Strang. Wavelet and dilation equations: A brief introduction. SIAM Review, 31(4):614-627, December 1989. [TB97] Q. Tieng and W. Boles. Wavelet-based affine invariant representation: A tool for recognizing planar objects in 3d space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(8):846-857, August 1997. [tHR94] B. ter Haar Romeny, editor. Kluwer, 1994. Geometry-Driven Diffusion in Computer Vision. [tHRFSV94] B. ter Haar Romeny, L. Florack, A. Salden, and M . Viergever. Higher order differential structure of images. Image and Vision Computing, 12(6):317-325, July 1994. [TP86] V. Torre and A. Poggio. On edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):147-163, March 1986. 143 [TP91] M . Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(l):71-86, 1991. ' [TSD+94] B. Telfer, H. Szu, G. Dobeck, J. Garcia, H. Ko, A. Dubey, and N. Witherspoon. Adaptive wavelet classification of acoustic backscatter and imagery. Optical Engineering, 33(7):2192-2203, 1994. [UAE93] M . Unser, A. Aldroubi, and M . Eden. The L polynomial spline pyramid. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):364-379, April 1993. [VK95] A. Vellaikal and C. Kuo. Content-based image retrieval using multiresolution histogram representation. In SPIE Conference on Digital Image Storage and Archiving Systems, Philadelphia, 1995. [vVYV98] L. van Vliet, I. Young, and P. Verbeek. Recursive gaussian derivative filters. In ICPR, Brisbane, 1998. [Wei88] I. Weiss. Projective invariants of shape. In DARPA Image Understanding Workshop, 1988. [Wei93] I. Weiss. Geometric invariants and object recognition. International Journal of Computer Vision, 10(3):207-231, 1993. [Wei94] I. Weiss. High-order differentiation filters that work. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(7):734-739, July 1994. [Wit83] A. Witkin. Scale space filtering. In IJCAI, Karlsruhe, 1983. [WL98] Y. Wang and S. Lee. Scale-space derived from B-splines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(10):1040-1054, October 1998. [Woo96] J. Wood. Invariant pattern recognition: A review. Pattern Recognition, 29(1):1-17, 1996. [WS93] D. Williams and M . Shah. Edge characterization using normalized edge detector. 2 CVGIP: Graphical Models and Image Processing, 55(4):311-318, July 1993. [YI99] A. Yoshitaka and T. Ichikawa. A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1) :81—93, January 1999. [ZC97] D. Zhao and J. Chen. Affine curve moment invariants for shape recognition. Pattern Recognition, 30(6):895-901, 1997. [ZH87] O. Zuniga and R. Haralick. Integrated directional derivative gradient operator. IEEE Transactions on Systems, Man, and Cybernetics, 17(3):508-517, May 1987. 144 Appendix A The 1-d Gaussian and its Derivatives The 1-d, zero mean Gaussian is defined as G(x; CT) (A.l) = Then the first derivative of the Gaussian is G'(x; cr) = e~& = G(x; a) the second derivative of the Gaussian is the third derivative of the Gaussian is , '^,1111 \ l^ x x 3 \ 1 3xa 2 - x 3 Note how all derivatives can be expressed in terms of the original Gaussian function. 145 (A.2) Appendix B The 2-d Gaussian and its Derivatives In 2-d, the zero mean Gaussian is defined as G(x,y;o) = — ^ e ^ • (B.l) The first partial derivative with respect to x is dG(x,y;a) G x - = x = - ox - 2 e C T 2 o 2TT (T x n G(x,y; a) = dG(x,y;a) a _^±Z - l and with respect to y Gy = 1 — Z y 9 = dy (B.2) 1 o 9 e 2 < T (B.3) a 2-K a ~ G(x,y; o) 2 2 The second order partial derivatives are d G{x,y- a) 2 G xx = ,x . 1 1 _2i+i = (—-1) — xo - a ' o 2no G(x,y; cr) 2 e ox* K 2 2 2 2 2 cr (B.4) 4 ._ d G(x,y; cr) _ xy 1 *"~ c9xc9y ~ <r*2na* 2 * t T = -j d G(x,y;a) y 2 2 ( 6 2CT (B.5) CT) , 1 1 (B.6) = 2—G(a:,y;.ff) 146 The third order partial derivatives are _ d G(x,y; a) _ ,3a x 3 /J„3 ~ dx 3 l„4 V CT 1 3 „ i ^^2 6 C 7 4 6 / 3cr a; - x 2 6 T _ d G{x,y; a) _ y x 3 - d G(x,y; a) 1 2 IS G ( z , y ; CT) x ,y , 2 = 2 C T (B.8) a y-x y 3 G yy e 2 G ( s , y ; CT) , 2 ( 2 27TCT 9 3 1 X v x ~ xy 2 . 2 B _ &G(XM<T) 9 y 3 _ ,3y (B.9) G(a;,y; cr) y, 1 3 -1^4 = . 6i rcT 2e CT « 27 (B.10) G{x,y;a) Note again how all the derivatives can be expressed in terms of the original Gaussian function. 147 Appendix C Test Images D a t a b a s e The following images that form our test images database have been taken indoors by a Sony 3 CCD color camera DXC 950 and then converted into grayscale images. The camera was mounted on a gantry and controlled by software. The camera allows us to switch gamma correction on or off, and the image with gamma correction on ("CGC") was taken immediately after the corresponding image with gamma correction off ("OGC"), ensuring an identical environment. The camera with its frame grabber delivers images of size 480 X 640, but we used only smaller images ranging from 90 X 90 to 128 X 128 cut out from the full sized images. The important constraint during the 0GC-CGC image pair acquisition is to keep the camera and the scene objects in the same position and to maintain a constant lighting. Other than that, no restrictions are necessary. The selection of the imaged objects is more or less arbitrary, but we tried to select images that contain both high frequency and low frequency areas (i.e. homogeneous and textured regions), as well as high contrast and low contrast edges. 148 20 40 Figure C l : Test images I: (left) OGC (right) C G C (first row) Build05 128x128 (second row) WoBA 90x120 (third row) WoBB 90X120 (fourth row) WoBC 90x120 (fifth row) WoBD 90x120 149 Figure C.2: Test images II: (left) OGC (right) C G C (first row) KetCycl 90x90 (second row) KetSand 110x110 (third row) ToolA 120x100 (fourth row) ToolB 120X120 (fifth row) ToolC 110x110 150 Appendix D Binary Correlation Accuracy Matrices This appendix contains the binary correlation accuracy matrices as described in section 4.2 by eq. 4.14, for all the images from the test database (see appendix C), for template sizes 6 x 8 and 10 x 10. These images visualize the data that is summarized in tables 4.5 and 4.6, as well as in fig. 4.12. A white pixel denotes a correct maximum correlation position, while a black one denotes one at a wrong one. That is, the white areas show the point sets FCMCP{I> J)For template size 6 x 8 , this appendix also shows the differences between the correlation accuracy on intensity images and the one on the invariant representation in analogy with fig. 4.11. 151 Figure D . l : Correct maximum correlation position images, template size 6 x 8 . (first column) intensity images, cr =0; (second column) intensity images, a =1.0; (third column) invariant representations, cr =0; (fourth column) invariant representations, o- =1.0. (first row) B u i l d 0 5 ; (second row) WoBA; (third row) WoBB; (fourth row) WoBC; (fifth row) WoBD. pre pre pre pre 152 ] Figure D.2: Correct maximum correlation position images, template size 6 x 8 . (first column) intensity images, a —0; (second column) intensity images, cr =1.0; (third column) invariant representations, <7 =0; (fourth column) invariant representations, cT =1.0. (first row) KetCycl; (second row) KetSand; (third row) ToolA; (fourth row) ToolB; (fifth row) ToolC. pre pre pre pre 153 3^1 i •' - > * c i l % • .- J • * ... f * < - - h ' t P _ . r ^ : • J**" • • 1", •i " , '•. i A. * r ** Hi A 1 Figure D.3: Comparison of BCAMs, template size 6 x 8 . (first column) intensity image; (second column) locations where both representations fail; (third column) locations where intensity image is superior; (fourth column) locations where invariant representations (with a =1.0) is superior, (first row) B u i l d 0 5 ; (second row) WoBA; (third row) WoBB; (fourth row) WoBC; (fifth row) WoBD. pre 154 Figure D.4: Comparison of BCAMs, template size 6 x 8 . (first column) intensity image; (second column) locations where both representations fail; (third column) locations where intensity image is superior; (fourth column) locations where invariant representations (with a =1.0) is superior, pre (first row) K e t C y c l (second row) K e t S a n d (third row) T o o l A ToolC 155 (fourth row) T o o l B (fifth row) Figure D . 5 : Correct maximum correlation position images, template size 10 X 10. (first column) intensity images, < 7 = 0 (second column) intensity images, a =1.0 (third column) invariant representations, < J = 0 (fourth column) invariant representations, < 7 = 1 . 0 . (first row) B u i l d 0 5 (second row) WoBA (third row) WoBB (fourth row) WoBC (fifth row) WoBD pre p r e p r e p r e 156 Figure D.6: Correct maximum correlation position images, template size 10 X 10. (first column) intensity images, <7 =0 (second column) intensity images, <r =1.0 (third column) invariant representations, CT =0 (fourth column) invariant representations, cr =1.0. (first row) KetCycl (second row) KetSand (third row) ToolA (fourth row) ToolB (fifth row) ToolC pre pre pre pre 157 Appendix E Histogram Distance Matrices This appendix contains tables giving the x distances for 32 x 32 histograms for the histogram 2 based retrieval scenario in sec. 4.3. They are analogous to the tables 4.7 to 4.9, just with a higher histogram resolution. The recognition rates are 50% for the intensity images, 90% for the unfiltered invariant representation, and 100% for the prefiltered representation. Where the query image is not recognized correctly for the intensity images, it is ranked second best in four out of ten cases, and ranked third in the remaining case. Figs. E . l to E.3 visualize the data from tables E . l to E.4 for the query images Build, WoBA, and. Tool A, respectively. Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC Build WoBA WoBB 1333.7 4240.8 7562.0 3482.5 6622.3 1732.5 2538.1 6114.3 10222.4 6899.0 2816.0 1769.6 3874.1 1987.3 3836.3 2567.8 2406.0 2578.6 4909.7 3067.9 5172.5 2979.2 1885.0 2065.1 1633.8 3212.8 3895.6 2284.6 2834.4 2162.3 WoBC WoBD 3026.6 2156.4 3624.5 1610.1 3341.9 2449.3 2181.1 2776.7 •4818.1 3203.5 4307.1 2429.9 2812.2 1824.7 1672.7 3102.9 3068.9 2168.4 3319.7 2711.2 Cycl Sand ToolA 2422.0 3586.9 3716.9 2814.2 2744.5 891.7 1739.7 3973.8 5983.1 4471.4 3071.7 3353.4 3793.9 2273.3 2686.9 1336.1 1849.1 3803.5 5883.5 4371.8 4026.5 1682.6 2999.1 2332.7 3012.4 3186.3 3124.3 1545.9 3352.8 2116.9 ToolB ToolC 6229.6 2367.0 2104.5 3009.0 2252.3 4359.6 4346.1 1320.8 2170.6 1674.5 5059.0 2083.6 3460.5 2675.5 2885.3 4119.0 4095.8 1766.6 3185.9 2157.3 Table E . l : Distance matrix, intensity images, 32 X 32 bins, no prefiltering. Minimum distance in bold face. Recognition rate R=50%. 158 PS 11III 1* : f | 1 iiw-- Ji; —mmm» —— I .-, PS • f vS f i WW ' * •P 1 Figure E . l : Retrieval rankings for query image Build, 32 x 32 bins, (top) for intensity images (middle) for invariant representations, a = 0 (bottom) for invariant representations, a = 1.0 pre n • f I | -* j ™i vS FS pre : \ • I - A PS IP J 1 1 ' i « » a — Figure E.2: Retrieval rankings for query image WoBA, 32 X 32 bins, (top) for intensity images (middle) for invariant representations, a = 0 (bottom) for invariant representations, a = 1.0 pre pre Figure E.3: Retrieval rankings for query image ToolA, 32 X 32 bins, (top) for intensity images (middle) for invariant representations, a = 0 (bottom) for invariant representations, a = 1.0 pre pre 159 Build WoBA WoBB X' Build 1068.5 2781.2 4197.0 WoBA 3936.9 1275.8 2734.3 WoBB 6517.4 3721.4 1647.5 WoBC 2997.9 1907.4 1343.6 WoBD 5555.3 .4040.6 1463.5 Cycl 1075.7 1919.5 2642.4 Sand 2249.1 1764.0 3707.6 ToolA 5641.6 2389.3 1940.2 ToolB 9815.3 4865.2 2798.3 ToolC 6356.1 3048.3 2021.2 WoBC 2285.4 2095.2 3146.2 1069.5 2949.0 1502.4 1941.2 2496.4 4832.2 3095.4 WoBD 3182.5 2238.2 2136.8 1633.2 1549.2 2131.5 2679.9 1899.1 3475.5 2704.6 Cycl 1478.8 3073.2 3258.4 2074.5 2149.2 844.8 1904.5 3461.1 5524.4 3618.9 Table E.2: Distance matrix, intensity images, 32 X 32 bins, a face. Recognition rate R—50%. pre X* Build WoBA WoBB WoBC WoBD Cycl Sand ToolA ToolB ToolC Build 1219.8 1919.1 1805.3 2178.2 2163.2 1807.0 2122.0 2163.9 2111.5 2201.1 WoBA 1490.0 991.4 1294.5 1374.3 1195.4 1186.4 1229.4 1400.3 1381.0 1499.8 WoBB 1880.0 1519.7 1017.6 1322.5 1525.4 1353.1 1628.9 1648.6 1573.7 1520.1 WoBC 1430.6 1287.4 1159.5 1161.0 1274.2 1368.2 1621.6 1473.5 1467.2 1458.6 WoBD 1659.8 1125.3 1392.5 1187.1 982.4 1295.0 1430.3 1231.3 1251.2 1306.0 Cycl 1728.0 1154.1 1327.3 1431.1 1292.5 887.0 1104.3 1558.2 1486.4 1720.6 Sand 2202.0 2632.7 3295.7 1745.8 2760.5 1108.5 1855.6 3155.4 5602.4 3899.0 ToolA 3607.3 1396.2 2367.6 2212.3 2847.7 2575.3 2494.5 1266.1 3130.2 1744.3 ToolB 5362.6 2397.7 1582.8 2742.9 2221.8 3615.6 3846.8 1172.8 2069.0 1303.5 ToolC 4112.5 1844.0 2787.0 2416.8 2905.1 2880.1 3026.2 1533.5 3136.6 1945.5 — 1.0. Minimum distance in bold Sand 1954.5 1209.2 1508.4 1581.0 1439.2 1064.4 932.5 1714.2 1645.5 1896.8 ToolA 1557.3 1363.2 1534.3 1457.3 1227.0 1606.2 1780.8 981.6 1137.5 1141.8 ToolB 1459.7 1208.2 1385.3 4322.1 1102.9 1387.8 1548.8 1004.1 1068.8 1137.3 ToolC 1733.2 1368.1 1369.2 1263.4 1216.1 1558.8 1814.9 1225.8 1237.0 1151.4 Table E.3: Distance matrix, invariant representation, 32 X 32 bins, no prefiltering. Minimum distance in bold face. Recognition rate R=90%. Build Build 1134.6 WoBA 1653.5 WoBB 1651.2 WoBC 1581.2 WoBD 2698.5 Cycl 1747.3 Sand .2919.5 ToolA 1841.6 ToolB 1719.7 ToolC 2309.8 WoBA 1396.5 999.8 1371.2 1416.9 1564.9 1180.0 1621.2 1186.4 1181.1 1443.6 WoBB 1652.6 1374.0 1019.5 1337.3 1742.9 1324.4 2059.0 1387.7 1314.9 1327.3 WoBC 1412.9 1340.7 1356.5 1006.5 1788.3 1357.9 2109.1 1313.4 1272.8 1476.2 WoBD 2426.9 1384.7 1604.0 1728.8 960.9 1283.6 1458.3 1310.2 1329.8 1145.1 Cycl 1610.3 1153.0 1312.2 1447.9 1269.5 889.4 1269.6 1222.8 1223.3 1197.3 Sand 2520.5 1555.2 1871.3 1861.3 1489.9 1259.0 1008.8 15.50.7 1756.2 1561.3 ToolA ToolB ToolC 1739:9 1627.2 2288.2 1146.6 1182.9 1424.2 1309.4 1244.1 1435.6 1309.4 1266.3 1571.0 1407.5 1442.0 1257.9 1152.9 1192.6 1299.3 1685.3 1740.3 1827.7 990.9 1128.3 1318.3 1136.6 1017.5 1292.5 1172.3 1215.0 1056.1 Table E.4: Distance matrix, invariant representation, 32 X 32 bins, apre = 1.0. Minimum distance in bold face. Recognition rate 7i=100%. 160 Appendix F Point Spread Function Patterns Analogous to the razor edge point spread function data in section 5.1, we show here the camera response to a cross pattern for both the Sony and the Olympus camera. We generally observe the same phenomena: the blurring of the sharp edge, the "overshooting" of the PSF, especially in case of the Olympus camera, and the phase effect which manifests itself in the asymmetry of the image in spite of the perfect symmetry of the imaged object. 161 10 12 14 ) 2 4 3 2 4 ! 8 10 12 14 16 e 10 12 14 16 10 12 14 16 16 (c) Figure F . l : Point spread function of the Sony camera on a cross pattern, (a) camera image (b) horizontal cross sections, rows 4, 5, 6 (c) vertical cross sections, columns 4, 5, 6. 2 4 10 6 (a) (b) 12 14 16 (c) Figure F.2: Point spread function of the Olympus camera on a cross pattern, (a) camera image (b) horizontal cross sections, rows 1, 3, 5 (c) vertical cross sections, columns 1, 3, 5. 162 Appendix G Scaling Behavior of the Invariant 0 i23 m This appendix contains the scaling data for the invariant for @ i 2 in section 5.2. m 163 © i23, m in analogy to the data presented Figure G . l : Scaling by fltering (SF) vs. scaling by optical zooming (SO), O i 3 , image Build, (left) SF; (middle) SO; (right) A=\SO - SF\. (first row) a=1.18, no postfiltering; (second row) a=1.18, with postfiltering; (third row) a=1.35, no postfiltering; (fourth row) a=1.35, with postfiltering. m 164 2 50 50 100 100 150 150 200 250 200 50 100 150 200 250 300 250 Figure G.2: Camera scaled invariant vs. SO invariant O i 3 , no prefiltering. (left) CS; (center) SO; (right) A=\CS - SO\. (first row) ct=1.09; (second row) e*=1.18; (third row) e*=1.27; (forth row) ct=1.35. m 165 2 Figure G.3: Camera scaled invariant vs. SO invariant © i 2 3 , with prefiltering, os = 3.0. (left) CS; (center) SO; (right) A=\CS - SO\. (first row) a=1.09; (second row) a=1.18; (third row) a=1.27; (forth row) a=1.35. m 166 yn Figure G.4: Reliable points for invariant Q i 2 3 , image Build, with prefiltering, asyn — 3.0. (left) e=5.0; (middle) €=10.0; (right) e=20.0. (first row) ct=1.09; (second row) a=1.18; (third row) c*=1.27; (forth row) a=1.35. m 167 Appendix H Averaged Wavelet H.l Transform 1-d A W T In analogy with figs. 5.14 and 5.16 in section 5.3.1, we show here some 1-d examples of the AWT computed with Daubechies wavelets rather than with the Haar wavelet, with the number of decomposition levels set to three. 168 169 170 H.2 2-d A W T In section 5.3.3, the 2-d AWT Haar filter coefficients for s=l were given as -0.0625 -0.1250 -0.0625 -0.1250 0.7500 -0.1250 -0.0625 -0.1250 -0.0625 For s=2, we have -0.00390625 -0.00781250 -0.01171875 -0.01562500 -0.01171875 -0 00781250 -0.00390625 -0.00781250 -0.01562500 -0.02343750 -0.03125000 -0.02343750 -0 01562500 -0.00781250 -0.01171875 -0.02343750 0.02734375 0.07812500 0.02734375 -0 02343750 -0.01171875 -0.01562500 -0.03125000 0.07812500 0.18750000 0.07812500 -0 03125000 -0.01562500 -0.01171875 -0.02343750 0.02734375 0.07812500 0.02734375 -0 02343750 -0.01171875 -0.00781250 -0.01562500 -0.02343750 -0.03125000 -0.02343750 -0 01562500 -0.00781250 -0.00390625 -0.00781250 -0.01171875 -0.01562500 -0.01171875 -0 00781250 -0.00390625 For the 2-d AWT Daub filter for s=l, we have the coefficients 2 -0.0009462833 0 0 0 0.0086066723 0.0153808594 0.0086066723 0 0 0 0 -0.0009462833| 0 0 0.0086066723 0 -0.0782797337 -0.1398925781 -0.0782797337 0 0.0086066723 0.0153808594" 0 -0.1398925781 -0.1398925781 0 0.0153808594 0.008.6066723 0 -0.0782797337 -0.1398925781 -0.0782797337 0 0.0086066723 0 0 -0.0009462833 0 0.7500000000 0 0 0 0.0086066723 0.0153808594 0.0086066723 0 0 0 -0.0009462833J In analogy with fig. 5.21 in section 5.3.3, fig. H.3 shows the result of applying the Daub filters 2 to an image. 171 172 Appendix I 6 Space In a slight variation of the standard deviation of the © space given in section 5.4, we show the Max-Min difference M M for the same image Build, for both Q i2 m and @ i23, m according to eq. (5.12). Essentially, the Max-Min difference shows the same results as the standard deviation shown in section 5.4. 173 '• -vr,' : .-, ,.•-> ."'.-isT 1.8 mmk • mi "S 50 f t . >, A * -ft ii * y • *\ 1.6 < ' '„> / ' * -"" :'' ' 1.4 1.2 1 150 0.8 200 0.6 0.4 250 0.2 L ,-5T.ij<f*. >. 0 : 50 100 150 200 250 300 350 50 100 150 200 250 300 350 II n 50 1.4 100 1.2 150 0.8 2001 0.6 250 0.2 50 100 150 200 250 300 0 350 501 0.8 0.7 0.6 0.5 •PlIHRH 0.4 0.3 WaaBMmSBm 0.2 0.4 250 0.9 0.1 50 100 150 200 250 300 350 1.8 0.9 1.6 08 1.4 0.7 1.2 0.6 1 0.5 0.8 04 1001 1501 {firm 0.6 2501 50 100 150 200 250 300 350 200 0.3 0.4 0.2 0.2 0.1 0 50 100 150 200 250 300 350 Figure 1.1: Max-Min difference in 6 space, a=[l.0,3.0], image^Build. (left) © i space; (right) 0 i 2 3 space, (first row) <7s n=Q; (second row) o-Syn—1-Q; (third row) cTs =3.0. m m y yn 174 2
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Invariant representation of image functions under gamma...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Invariant representation of image functions under gamma correction and similarity transformations Siebert, Andreas 2000
pdf
Page Metadata
Item Metadata
Title | Invariant representation of image functions under gamma correction and similarity transformations |
Creator |
Siebert, Andreas |
Date Issued | 2000 |
Description | This work focuses on image retrieval and recognition in environments where the images are subject to a non-linear brightness change known in image processing as gamma correction. Our empirical data shows that gamma correction changes images significantly, resulting in poor retrieval results if not addressed. The proposed solution is based on a novel differential invariant under this kind of radiometric transformation. Since imaged objects are often subject not only to radiometric changes but also to variations of the scene geometry, we propose a representation of two-dimensional image functions that is simultaneously invariant under gamma correction and some geometric transformations, namely translation, rotation, and scaling. An implementation of the proposed invariants based on derivatives of the Gaussian is given. For gamma correction without geometric scaling, improved image retrieval performance based on the invariant representation is demonstrated in both a template matching scenario and a histogram based retrieval system. The proposed invariants perform unsatisfactorily under scaling. The key reasons for this behavior are discussed, and empirical data on the accuracy of the proposed invariants are provided. |
Extent | 34536848 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-07-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051492 |
URI | http://hdl.handle.net/2429/11216 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2000-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2000-566226.pdf [ 32.94MB ]
- Metadata
- JSON: 831-1.0051492.json
- JSON-LD: 831-1.0051492-ld.json
- RDF/XML (Pretty): 831-1.0051492-rdf.xml
- RDF/JSON: 831-1.0051492-rdf.json
- Turtle: 831-1.0051492-turtle.txt
- N-Triples: 831-1.0051492-rdf-ntriples.txt
- Original Record: 831-1.0051492-source.json
- Full Text
- 831-1.0051492-fulltext.txt
- Citation
- 831-1.0051492.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051492/manifest